The hash function is responsible for the constant access time (best cast) of the unordered associative containers. As ever, C++ offers a lot of ways to adjust the behavior of the hash functions. On one hand, C++ has a lot of different hash functions; on the other hand, you can define your own hash function. You can even adjust the number of buckets.
Before I write about the hash functions, I want to have at first a closer look at the declaration of the unordered associative containers. So we will not lose the big picture. I take std::unordered_map as the most prominent unordered associative container.
std::unordered_map
The declaration of the hash table std::unordered_map reveals a lot of interesting details.
template<
class Key,
class Val,
class Hash = std::hash<Key>,
class KeyEqual = std::equal_to<Key>,
class Allocator = std::allocator< std::pair<const Key, Val> >
> class unordered_map;
Let's have a closer look at the template parameters. The std::unordered_map associates the key (Key) with its value (Val). The remaining three template parameters are derived from the type of the key and the type of the value. This statement holds true for the hash function (Hash), the equality function (KeyEqual), and the allocator (Allocator). Therefore, it's quite easy to instantiate a std::unordered_map char2int.
std::unordered_map<char,int> char2int;
Now it gets more interesting. By default, the hash function std::hash<Key> and the equality function std::equal_to<Key> is used. Therefore, you can instantiate a std::unordered_map that has a special hash or equality function. But stop. Why do we need an equality function? The hash function maps the key to the hash value. The hash value determines the bucket. This mapping can cause a collision. That means, different keys go into the same bucket. The std::unorderd_map has to deal with these collisions. To do that, it uses the equality function. Only for completeness reasons. You can adjust the allocation strategy of the container with the Allocator.
Which requirements have the key and the value of an unordered associative container to fulfill?
The key has to be
- comparable with the equality function,
- available as a hash value,
- copyable and moveable.
The value has to be
- default constructible,
- copyable and moveable.
The hash function
A hash function is good if their mapping from the keys to the values produces few collisions and the hash values are uniformly distributed among the buckets. Because the execution time of the hash function is constant, the access time of the elements can also be constant. Instead of that, the access time in the bucket is linear. Therefore, the overall access time of a value depends on the number of collisions in the bucket, respectively.
The hash function
- is available for fundamental data types like booleans, integers, and floating points.
- is available for the data types std::string and std::wstring,
- creates for C string const char* a hash value of the pointer address,
- can be defined for user-defined data types.
By applying the theory to my own data types, which I want to use as the key of an unordered associative container, my data type must fulfill the two requirements: a hash function and an equality function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
|
// hashFunction.cpp
#include <iostream>
#include <ostream>
#include <unordered_map>
struct MyInt{
MyInt(int v):val(v){}
bool operator== (const MyInt& other) const {
return val == other.val;
}
int val;
};
struct MyHash{
std::size_t operator()(MyInt m) const {
std::hash<int> hashVal;
return hashVal(m.val);
}
};
struct MyAbsHash{
std::size_t operator()(MyInt m) const {
std::hash<int> hashVal;
return hashVal(abs(m.val));
}
};
struct MyEq{
bool operator() (const MyInt& l, const MyInt& r) const {
return abs(l.val) == abs(r.val);
}
};
std::ostream& operator << (std::ostream& strm, const MyInt& myIn){
strm << "MyInt(" << myIn.val << ")";
return strm;
}
int main(){
std::cout << std::endl;
std::hash<int> hashVal;
// a few hash values
for ( int i= -2; i <= 1 ; ++i){
std::cout << "hashVal(" << i << "): " << hashVal(i) << std::endl;
}
std::cout << std::endl;
typedef std::unordered_map<MyInt,int,MyHash> MyIntMap;
std::cout << "MyIntMap: ";
MyIntMap myMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};
for(auto m : myMap) std::cout << '{' << m.first << ',' << m.second << '}';
std::cout << std::endl << std::endl;
typedef std::unordered_map<MyInt,int,MyAbsHash,MyEq> MyAbsMap;
std::cout << "MyAbsMap: ";
MyAbsMap myAbsMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};
for(auto m : myAbsMap) std::cout << '{' << m.first << ',' << m.second << '}';
std::cout << std::endl << std::endl;
std::cout << "myAbsMap[MyInt(-2)]: " << myAbsMap[MyInt(-2)] << std::endl;
std::cout << "myAbsMap[MyInt(2)]: " << myAbsMap[MyInt(2)] << std::endl;
std::cout << "myAbsMap[MyInt(3)]: " << myAbsMap[MyInt(3)] << std::endl;
std::cout << std::endl;
}
|
My analysis of the program starts with the main function. The easiest way to get the program is to examine the output closely. I create in line 44 the hash function hasVal and use them to calculate the hash values in line 48. hasVal returns a hash value of type std::size_t. std::size_t stands for a sufficiently big enough unsigned integer. MyIntMap in line 53 defines a new name for a type. This type uses MyInt (lines 7 - 13) as a key. Now, MyIntMap needs a hash function and an equality function. It uses MyHash (line 15 -20) as a hash function. The hash function uses the hash function of the data type int
internally. I already overload the equality function for MyInt.
MyAbsMap follows a different strategy. According to its name, MyAbsMap creates its hash value based on the absolute value of the integer (line 25). I use the class MyEq with the overloaded call operator as an equality function. MyEq is only interested in the absolute value of the integer. The output shows that the hash function of MyAbsMap returns the same hash value for different keys. The result is that the hash value for MyInt(-2) (line 70) is identical to the hash value of MyInt(2). This holds, although I didn't initialize MyAbsMap with the pair (MyInt(2),2).

What's next?
One piece of the puzzle is still missing to better understand hash tables. The hash function maps the key onto the value. Therefore, the hash functions map the key of type int or std::string to its bucket. How is that possible? On one hand, we have an almost infinite number of keys but only a finite number of buckets. But that is not the only question I have. How many elements go to one bucket? Or, to say it differently. How often does a collision occur? Question to which the next post will give an answer.
Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Animus24, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Kris Kafka, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Matthieu Bolt, Stephen Kelley, Kyle Dean, Tusar Palauri, Dmitry Farberov, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Dominik Vošček, and Rob North.
Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, and Slavko Radman.
My special thanks to Embarcadero 
My special thanks to PVS-Studio 
My special thanks to Tipi.build 
Seminars
I'm happy to give online seminars or face-to-face seminars worldwide. Please call me if you have any questions.
Bookable (Online)
German
Standard Seminars (English/German)
Here is a compilation of my standard seminars. These seminars are only meant to give you a first orientation.
- C++ - The Core Language
- C++ - The Standard Library
- C++ - Compact
- C++11 and C++14
- Concurrency with Modern C++
- Design Pattern and Architectural Pattern with C++
- Embedded Programming with Modern C++
- Generic Programming (Templates) with C++
New
- Clean Code with Modern C++
- C++20
Contact Me
- Phone: +49 7472 917441
- Mobil:: +49 176 5506 5086
- Mail: This email address is being protected from spambots. You need JavaScript enabled to view it.
- German Seminar Page: www.ModernesCpp.de
- Mentoring Page: www.ModernesCpp.org
Modernes C++,

Comments
That is the very first time I frequented your website page and so far?
I amazed with the research you made to make this actual publish extraordinary.
Fantastic process!
error, while I was looking on Aol for something else, Nonetheless I am here now
and would just like to say many thanks for a marvelous post
and a all round entertaining blog (I also love the theme/design), I don’t have time to
look over it all at the moment but I have saved it and also added in your
RSS feeds, so when I have time I will be back to read much
more, Please do keep up the excellent b.
RSS feed for comments to this post