New Issue: User-defined hash functions for Map

18291, "bmcdonald3", "User-defined hash functions for Map", "2021-08-24T22:21:57Z"

Summary

Currently, the Chapel Map uses the default hash function in all scenarios, so even if the user wanted to use their own hash function, they are not able to. Many languages have hash table implementations that allow users to provide their own hash functions, and I also have observed performance improvements in certain instances with this change. I am proposing that we add this functionality to the map API.

Proposed Solutions

Proposal 1: Detect if there is a hash method provided on the key type

  proc int.hash() {
    return calculateHash();
  }
  var myMap:map(int,string);
  // will use the above hash since it is defined on int, even though not explicitly stated by user
  myMap[1] = "hello";

Proposal 2: Allow map to accept a hash record or first class function

proc doHash(arg, seed:uint): uint {
  return calculateHash();
}
var myMap2: map(int, string, doHash);
myMap2[1] = "hello 2";

Pros/cons of proposed solutions

  1. Detect if there is a hash method provided on the key type:
    a. Pros: Simple to use, not much extra work by user
    b. Cons: Since hash method now has special meaning, it might no longer be used in applications for other purposes, not very intuitive; no one would know how to use it unless they had seen the documentation
  2. Allow map to accept a hash record or first class function:
    a. Pros: Other implementations do this, such as the C++ one that our program is based on so there is precedence
    b. Cons: Changes the Map type to include the "hasher" type when specified, also, a little bit more work to program for a user in my opinion

More info for the highly interested

I ran into this while running performance tests on the k-nucleotide benchmark, where the C++ implementation that our implementation is based on allows a user to pass a hash function. The k-nucleotide implementation does the hashing with a required hash function before using the map, so we are double hashing in that case. If you want more info on the result of using the built in/user defined hash, see: https://github.com/Cray/chapel-private/issues/2365

C++ implementation usage:

struct T {
  ... other stuff ...
   struct hash{
     uint64_t operator()(const T& t)const{ return t.data; }
   };
};

cc_hash_table<T,unsigned,T::hash> myMap;