More special Friends with std::map and std::unordered_map
Modern C++ has eight associative containers, but your special friends should be std::map and std::unordered_map. Why? Let me explain it in this post.
In my last post, C++ Core Guidelines: std::array and std::vector are your friends; I stated: In 99 % of your use-cases, you are lovely with a std::array or a std::vector. A similar statement exists for associative containers: In 95 % of your use cases, you are lovely with a std::map or std::unordered_map. In seldom cases, you don’t need the value associated with the key. These are the missing 5 %. Before I begin this post and give an overview and numbers to both associative containers, here is my rule of thumb for today: If you want a container with a key/value association and the keys should be ordered, use std::map; if not, use a std::unordered_map.
Here is the first overview. For more details, please read my previous posts about associative containers.
The eight Variations
To get an ordering into the eight variations of associative containers, you have to answer three questions. Each question can be answered by yes or no. 2 ^ 3 == 8. Here are the three questions:
- Is the container ordered?
- Does the key have an associated value?
- Are several identical keys possible?
And here are the answers.
- When the container is not ordered, it’s called unordered.
- When the key does have a value associated, it’s called map; if not set.
- When the container can have more than one identical key, it’s called multi.
When I speak of the ordered container, I mean the ordering of the keys.
Maybe this taxonomy was too complicated. Let me give you a more straightforward picture.
Modernes C++ Mentoring
Do you want to stay informed: Subscribe.
A Phone Book
The eight variations are just different versions of a phone book. What is a phone book? A phone book is a sequence of key/value pairs. You use the keys (family names) to get the values (phone numbers).
The family names of a phone book can be ordered or unordered; the phone book can have a phone number associated with the family name or not and can have only one or more identical family names. If you want to store your mobile and landline numbers in a phone book, you are pretty happy that you can use two identical keys.
The reason for this post is not to explain the associative containers: The reason is a different one. The access time to an ordered associative container is logarithmic, but the access time to an unordered associative container is amortized constant.
Performance of a std::map and a std::unordered::map
What does amortized constant access time mean for an unordered associative container such as std::unordered_map? It means that your query for a phone number is independent of the size of the phone book. Don’t you believe me? Let me show you a performance test.
I have a phone book with roughly 89,000 entries. I will increase its size successively by ten until it has almost 89,000,000 entries. After each step, I will ask for all its phone numbers. This means I randomly use all family names.
The following image shows you a part of the initial phone book. You can see the name/number pairs separated by a colon and the name separated from the number by a comma.
The program should be pretty easy to read.
// telephoneBook.cpp #include <chrono> #include <fstream> #include <iostream> #include <map> #include <random> #include <regex> #include <sstream> #include <string> #include <unordered_map> #include <vector> using map = std::unordered_map<std::string, int>; // (1) std::ifstream openFile(const std::string& myFile){ std::ifstream file(myFile, std::ios::in); if ( !file ){ std::cerr << "Can't open file "+ myFile + "!" << std::endl; exit(EXIT_FAILURE); } return file; } std::string readFile(std::ifstream file){ std::stringstream buffer; buffer << file.rdbuf(); return buffer.str(); } map createTeleBook(const std::string& fileCont){ map teleBook; std::regex regColon(":"); std::sregex_token_iterator fileContIt(fileCont.begin(), fileCont.end(), regColon, -1); const std::sregex_token_iterator fileContEndIt; std::string entry; std::string key; int value; while (fileContIt != fileContEndIt){ // (2) entry = *fileContIt++; auto comma = entry.find(","); // (3) key = entry.substr(0, comma); value = std::stoi(entry.substr(comma + 1, entry.length() -1)); teleBook[key] = value; // (4) } return teleBook; } std::vector<std::string> getRandomNames(const map& teleBook){ std::vector<std::string> allNames; for (const auto& pair: teleBook) allNames.push_back(pair.first); // (5) std::random_device randDev; std::mt19937 generator(randDev()); std::shuffle(allNames.begin(), allNames.end(), generator); // (6) return allNames; } void measurePerformance(const std::vector<std::string>& names, map& m){ auto start = std::chrono::steady_clock::now(); for (const auto& name: names) m[name]; // (7) std::chrono::duration<double> dur= std::chrono::steady_clock::now() - start; std::cout << "Access time: " << dur.count() << " seconds" << std::endl; } int main(int argc, char* argv[]){ std::cout << std::endl; // get the filename std::string myFile; if ( argc == 2 ){ myFile= {argv[1]}; } else{ std::cerr << "Filename missing !" << std::endl; exit(EXIT_FAILURE); } std::ifstream file = openFile(myFile); std::string fileContent = readFile(std::move(file)); map teleBook = createTeleBook(fileContent); std::cout << "teleBook.size(): " << teleBook.size() << std::endl; std::vector<std::string> randomNames = getRandomNames(teleBook); measurePerformance(randomNames, teleBook); std::cout << std::endl; }
Let me start with the main program. I open the file, read the content, create a phone book (std::map or std::unordered_map), get an arbitrary permutation of the family names, and make the performance test finally. Okay, this was too concise.
Line 1 is the most interesting one. A std::unordered_map supports a superset of the interface of a std::map. This makes it quite convenient for me to make my performance test. I first did it by using map = std::map<std::string, int>; and then changed the line to using map = std::unordered_map<std::string, int>;. The according to relations holds for the pairs (std::set/std::unordered_set),(std::mulitset, std::unordered_multiset), and (std::multimap, std::unordered_multimap). I assume the following functions are also quite interesting for you:
- createTeleBook
- the while loop iterates over all name/number tokens created by the regular expression regColon (line 2)
- each token is separated by a comma (line 3)
- in the end, the name/number pair is added to the phone book (line 4)
- getRandomNames
- puts all names onto a vector (line 5)
- shuffles the names (line 6)
- measurePerformance
- asks for each name in the phone book (line 7)
And now, finally, the performance numbers for a std::map and a std::unordered_map.
std::map
std::unordered_map
The screenshots show precisely how big the phone books are. The numbers confirm the access time I showed in the first table: The access time of a std::map depends logarithmic on its size, and the access time of a std::unordered_map is amortised constant. The following plot shows the performance relation between a std::map and a std::unordered_map.
For 100,000 entries, the std::map is three times slower than the std::unordered_map, and for 100,000,000 entries 7 1/2 times slower.
What’s next?
After this slight detour from the C++ core guidelines, I will write in my next post about bounds errors and how to avoid them.
Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Stephen Kelley, Kyle Dean, Tusar Palauri, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Rob North, Bhavith C Achar, Marco Parri Empoli, Philipp Lenk, Charles-Jianye Chen, Keith Jeffery, Matt Godbolt, and Honey Sukesan.
Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, Slavko Radman, and David Poole.
My special thanks to Embarcadero | |
My special thanks to PVS-Studio | |
My special thanks to Tipi.build | |
My special thanks to Take Up Code | |
My special thanks to SHAVEDYAKS |
Modernes C++ GmbH
Modernes C++ Mentoring (English)
Rainer Grimm
Yalovastraße 20
72108 Rottenburg
Mail: schulung@ModernesCpp.de
Mentoring: www.ModernesCpp.org
Modernes C++ Mentoring,
Leave a Reply
Want to join the discussion?Feel free to contribute!