18291, "bmcdonald3", "User-defined hash functions for Map", "2021-08-24T22:21:57Z"
Summary
Currently, the Chapel Map uses the default hash function in all scenarios, so even if the user wanted to use their own hash function, they are not able to. Many languages have hash table implementations that allow users to provide their own hash functions, and I also have observed performance improvements in certain instances with this change. I am proposing that we add this functionality to the map
API.
Proposed Solutions
Proposal 1: Detect if there is a hash
method provided on the key type
proc int.hash() {
return calculateHash();
}
var myMap:map(int,string);
// will use the above hash since it is defined on int, even though not explicitly stated by user
myMap[1] = "hello";
Proposal 2: Allow map
to accept a hash record or first class function
proc doHash(arg, seed:uint): uint {
return calculateHash();
}
var myMap2: map(int, string, doHash);
myMap2[1] = "hello 2";
Pros/cons of proposed solutions
- Detect if there is a
hash
method provided on the key type:
a. Pros: Simple to use, not much extra work by user
b. Cons: Sincehash
method now has special meaning, it might no longer be used in applications for other purposes, not very intuitive; no one would know how to use it unless they had seen the documentation - Allow
map
to accept a hash record or first class function:
a. Pros: Other implementations do this, such as the C++ one that our program is based on so there is precedence
b. Cons: Changes the Map type to include the "hasher" type when specified, also, a little bit more work to program for a user in my opinion
More info for the highly interested
I ran into this while running performance tests on the k-nucleotide benchmark, where the C++ implementation that our implementation is based on allows a user to pass a hash function. The k-nucleotide implementation does the hashing with a required hash function before using the map, so we are double hashing in that case. If you want more info on the result of using the built in/user defined hash, see: https://github.com/Cray/chapel-private/issues/2365
C++ implementation usage:
struct T {
... other stuff ...
struct hash{
uint64_t operator()(const T& t)const{ return t.data; }
};
};
cc_hash_table<T,unsigned,T::hash> myMap;