Hash Functions

Contents[Show]

The reason for the constant access time (best cast) of the unordered associative containers are the hash functions. As ever, C++ offers a lot of ways to adjust the behaviour of the hash functions. On one hand, C++ has a lot of different hash functions; on the other hand, you can define your one hash function. You can even adjust the number of buckets.

 

Before I write about the hash functions, I want to have at first a closer look to the declaration of the unordered associative containers. So we will not lose the big picture. I take std::unordered_map as the most prominent unordered associative container.

std::unordered_map

The declaration of the hash table std::unordered_map reveals a lot of interesting details.

template<
    class Key,
    class Val,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, Val> >
> class unordered_map;

 

Let's have a closer look at the template parameters. The std::unordered_map associates the key (Key) with its value (Val). The remaining three template parameters are derived from the type of the key and the type of the value. This statement holds true for the hash function (Hash), the equality function (KeyEqual), and the allocator (Allocator). Therefore, it's quite easy to instantiate a std::unordered_map char2int.

std::unordered_map<char,int> char2int;

Now it gets more interesting.By default, the hash function std::hash<Key> and the equality function std::equal_to<Key> is used. Therefore, you can instantiate a std::unordered_map that has a special hash or equality function. But stop. Why do we need an equality function? The hash functions maps the key to the hash value. The hash value determines the bucket. This mapping can cause a collision. That means, different keys goes into the same bucket. The std::unorderd_map has to deal with these collisions. To do that, it uses the equality function. Only for completeness reasons. You can adjust the allocation strategy of the container with the Allocator.

Which requirements have the key and the value of an unordered associative container to fulfil?

The key has to be

  • comparable with the equality function,
  • available as hash value,
  • copyable and moveable.

The value has to be

  • default constructable,
  • copyable and moveable.

The hash function

A hash function is good if their mapping from the keys to the values produces few collisions and the hash values are uniform distributed among the buckets. Because the execution time of the hash function is constant, the access time of the elements can also be constant. Instead of that, the access time in the bucket is linear. Therefore, the overall access time of a value depends on the number of collisions in the bucket, respectively.

The hash function

  • is available for the fundamental data types like booleans, integers, and floating points.
  • is available for the data types std::string and std::wstring,
  • creates for C string const char* a hash value of the pointer address,
  • can be defined for user-defined data types.

By applying the theory to my own data types, which I want to use as key of an unordered associative container, my data type has to fulfil the two requirements: it needs a hash function and an equality function.

 
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
// hashFunction.cpp

#include <iostream>
#include <ostream>
#include <unordered_map>

struct MyInt{
  MyInt(int v):val(v){}
  bool operator== (const MyInt& other) const {
    return val == other.val;
  }
  int val;
};

struct MyHash{
  std::size_t operator()(MyInt m) const {
    std::hash<int> hashVal;
    return hashVal(m.val);
  }
};

struct MyAbsHash{
  std::size_t operator()(MyInt m) const {
    std::hash<int> hashVal;
    return hashVal(abs(m.val));
  }
};

struct MyEq{
  bool operator() (const MyInt& l, const MyInt& r) const {
    return abs(l.val) ==  abs(r.val);
  }
};

std::ostream& operator << (std::ostream& strm, const MyInt& myIn){
  strm << "MyInt(" << myIn.val << ")";
  return strm;
}

int main(){

  std::cout << std::endl;

  std::hash<int> hashVal;

  // a few hash values
  for ( int i= -2; i <= 1 ; ++i){
    std::cout << "hashVal(" << i << "): " << hashVal(i) << std::endl;
  }

  std::cout << std::endl;

  typedef std::unordered_map<MyInt,int,MyHash> MyIntMap;

  std::cout << "MyIntMap: ";
  MyIntMap myMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};

  for(auto m : myMap) std::cout << '{' << m.first << ',' << m.second << '}';

  std::cout << std::endl << std::endl;

  typedef std::unordered_map<MyInt,int,MyAbsHash,MyEq> MyAbsMap;
  std::cout << "MyAbsMap: ";
  MyAbsMap myAbsMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};

  for(auto m : myAbsMap) std::cout << '{' << m.first << ',' << m.second << '}';

  std::cout << std::endl << std::endl;

  std::cout << "myAbsMap[MyInt(-2)]: " << myAbsMap[MyInt(-2)] << std::endl;
  std::cout << "myAbsMap[MyInt(2)]: " << myAbsMap[MyInt(2)] << std::endl;
  std::cout << "myAbsMap[MyInt(3)]: " << myAbsMap[MyInt(3)] << std::endl;

  std::cout << std::endl;

}
 

 

My analysis of the program starts in the main function. The easiest way to get the program is to keep a close look at the output. I create in line 44 the hash function hasVal and use them to calculate the hash values in line 48. hasVal returns a hash value of type std::size_t. std::size_t stands for a sufficient big enough unsigned integer.  MyIntMap in line 53 defines a new name for a type. This type uses MyInt (line 7 - 13) as key. Now, MyIntMap needs a hash function and an equality function. It uses MyHash (line 15 -20) as hash function. The hash function uses internally the hash function of the data type int. I already overload the equality function for MyInt.

MyAbsMap follows a different strategy. According to its name, MyAbsMap creates its hash value based on the absolute value of the integer (line 25). I use the class MyEq with the overloaded call operator as equality function. MyEq is only interested in the absolute value of the integer. The output shows that the hash functions of MyAbsMap returns for different keys the same hash value. The result is that the hash value for MyInt(-2) (line 70) is identical to the hash value of MyInt(2). This holds, although I didn't initialize MyAbsMap with the pair (MyInt(2),2).

 

 hashfunction

 

What's next?

One piece of the puzzle is still missing to get a better understanding of hash tables. The hash functions maps the key onto the value. Therefore, the hash functions maps the key of type int or std::string to its bucket. How is that possible? On one hand, we have an almost infinite number of keys but only a finite number of buckets. But that is not the only question I have. How many elements goes to one bucket? Or, to say it differently. How often does a collision occur? Question to which the next post will give an answer.

 

 

 

 

 

 

 

 

title page smalltitle page small Go to Leanpub/cpplibrary "What every professional C++ programmer should know about the C++ standard library".   Get your e-book. Support my blog.

 

Comments   

0 #1 internet businesses 2016-12-12 21:06
A person necessarily assist to make critically posts I'd state.
That is the very first time I frequented your website page and so far?

I amazed with the research you made to make this actual publish extraordinary.
Fantastic process!
Quote

Add comment


My Newest E-Book

Latest comments

Subscribe to the newsletter (+ pdf bundle)

Blog archive

Source Code

Visitors

Today 168

All 277622

Currently are 143 guests and no members online