More special Friends with std::map and std::unordered_map

Contents[Show]

Modern C++ has eight associative containers, but your special friends should be std::map and std::unordered_map. Why? Let me explain it in this post.

 

book 159880 1280

In my last post C++ Core Guidelines:  std::array and std::vector are your friends, I stated: In 99 % of your use-cases, you are totally fine with a std::array or a std::vector. A similar statement exists for associative containers: In 95 % of your use-cases, you are totally fine with a std::map or std::unordered_map.  In seldom cases, you don't need the value which is associated with the key. This are the missing 5 %. Before I begin this post and give an overview and numbers to both associative containers, here is my rule of thumb for today: If you want to have a container with a key/value association and the keys should be ordered, use std::map; if not use a std::unordered_map.

Here is the first overview. For more details, read my previous posts about associative containers.

The eight Variations

OrderedVersusUnorderedContainers

To get an ordering into the eight variations of associative containers, you have to answer three questions. Each question can be answered by yes or no. 2 ^ 3 == 8. Here are the three questions:

  1. Is the container ordered?
  2. Does the key have an associated value?
  3. Are several identical keys possible?

And here are the answers.

  1. When the container is not ordered, it's called unordered.
  2. When the key does have a value associated, it's called map; if not set.
  3. When the container can have more than one identical key, it's called multi.

When I speak of the ordered container, I mean the ordering of the keys.

Maybe this taxonomy was to complicated. Let me give you a more straightforward picture.

A Phone Book

The eight variations are just different versions of a phone book. What is a phone book? A phone book is a sequence of key/value pairs. You use the keys (family names) to get the values (phone numbers).

The family names of a phone book can be ordered or be unordered, the phone book can have a phone number associated with the family name or not, and can have only one family name or more identical family names. If you want to store your mobile number and your landline number in a phone book, you are quite happy that you can use two identical keys.

The reason for this post is not to explaint the associative containers: The reason is a different one. The access time to an ordered associative container is logarithmic, but the access time to an unordered associative container is amortised constant. 

Performance of a std::map and a std::unordered::map

What does amortised constant access time mean for an unordered associative container such as std::unordered_map? It means that your query for a phone number is independent of the size of the phone book. Don't you believe me? Let me present you a performance test.

I have a phone book with roughly 89,000 entries. I will increase its size successively by ten until it has almost 89,000,000 entries. After each step, I will ask for all its phone numbers. This means I randomly use all family names. 

The following image shows you a part of the initial phone book. You can see the name/number pairs separated by a colon and the name separated from the number by a comma.

telebook

 The program should be quite easy to read.

 

// telephoneBook.cpp

#include <chrono>
#include <fstream>
#include <iostream>
#include <map>
#include <random>
#include <regex>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

using map = std::unordered_map<std::string, int>;                   // (1)

std::ifstream openFile(const std::string& myFile){                  

  std::ifstream file(myFile, std::ios::in);
  if ( !file ){
    std::cerr << "Can't open file "+ myFile + "!" << std::endl;
    exit(EXIT_FAILURE);
  }
  return file;
  
}

std::string readFile(std::ifstream file){                        
    
    std::stringstream buffer;
    buffer << file.rdbuf();
    
    return buffer.str();
    
}

map createTeleBook(const std::string& fileCont){                 
    
    map teleBook; 
    
    std::regex regColon(":");
    
    std::sregex_token_iterator fileContIt(fileCont.begin(), fileCont.end(), regColon, -1);
    const std::sregex_token_iterator fileContEndIt;
    
    std::string entry;
    std::string key;
    int value;
    while (fileContIt != fileContEndIt){                               // (2)
        entry = *fileContIt++;
        auto comma = entry.find(",");                                  // (3)
        key = entry.substr(0, comma);
        value = std::stoi(entry.substr(comma + 1, entry.length() -1));
        teleBook[key] = value;                                         // (4)
    }
    return teleBook;
    
}

std::vector<std::string> getRandomNames(const map& teleBook){    
    
    std::vector<std::string> allNames;
    for (const auto& pair: teleBook) allNames.push_back(pair.first);   // (5)
    
    std::random_device randDev;
    std::mt19937 generator(randDev());
 
    std::shuffle(allNames.begin(), allNames.end(), generator);         // (6) 
    
    return allNames;
}
    
void measurePerformance(const std::vector<std::string>& names, map& m){   
        
    auto start = std::chrono::steady_clock::now();
    for (const auto& name: names) m[name];                              // (7)
    std::chrono::duration<double> dur= std::chrono::steady_clock::now() - start;
    std::cout << "Access time: " << dur.count() << " seconds" << std::endl;
    
}
    
int main(int argc, char* argv[]){

    std::cout << std::endl;
  
    // get the filename
    std::string myFile;
    if ( argc == 2 ){
        myFile= {argv[1]};
    }
    else{
        std::cerr << "Filename missing !" << std::endl;
        exit(EXIT_FAILURE);
    } 
  
    std::ifstream file = openFile(myFile);
  
    std::string fileContent = readFile(std::move(file));
  
    map teleBook = createTeleBook(fileContent);
  
    std::cout << "teleBook.size(): " <<  teleBook.size() << std::endl;
  
    std::vector<std::string> randomNames = getRandomNames(teleBook);
  
    measurePerformance(randomNames, teleBook); 
    
    std::cout << std::endl;
    
}

 

Let me start in the main program. I open the file, read the content, create a phone book (std::map or std::unordered_map), get an arbitrary permutation of the family names, and make the performance test finally. Okay, this was too concise.

Line 1 is the most interesting one. A std::unordered_map supports a superset of the interface of a std::map. This makes it quite convenient for me to make my performance test. I first did it by using map = std::map<std::string, int>; and then changed the line to  using map = std::unordered_map<std::string, int>;.  The according relations holds for the pairs (std::set/std::unordered_set),(std::mulitset, std::unordered_multiset), and (std::multimap, std::unordered_multimap). I assume the following functions are also quite interesting for you:

  • createTeleBook
    • the while loop iterates over all name/number tokens, created by the regular expression regColon (line 2)
    • each token is separated by the comma (line 3)
    • at the end, the name/number pair is added to the phone book (line 4)
  • getRandomNames
    • puts all names onto a vector (line 5)
    • shuffles the names (line 6)
  • measurePerformance
    • asks for each name in the phone book (line 7)

And now, finally to the performance numbers for a std::map and a std::unordered_map.

std::map

telebookMap

std::unordered_map

 telebookUnordered

The screenshots show precisely how big the phone books are. The numbers confirm the access time, I showed in the first table: The access time of a std::map depends logarithmic on its size and the access time of a std::unordered_map is amortised constant. The following plot shows the performance relation between a std::map and a std::unordered_map.

comparison

For 100,000 entries the std::map is 3 times slower than the std::unordered_map and for 100,000,000 entries 7 1/2 times slower.

What's next?

After this small detour from the C++ core guidelines, I will write in my next post about bounds error and how to avoid them.

 

 

Thanks a lot to my Patreon Supporters: Paul Baxter,  Meeting C++, Matt Braun, Avi Lachmish, Roman Postanciuc, Venkata Ramesh Gudpati, Tobias Zindl, Marko, Ramesh Jangama, G Prvulovic, Reiner Eiteljörge, Benjamin Huth, Reinhold Dröge, Timo, Abernitzke, Richard Ohnemus , Frank Grimm, Sakib, and Broeserl.

Thanks in particular to:  TakeUpCode 450 60     crp4

 

Get your e-book at Leanpub:

The C++ Standard Library

 

Concurrency With Modern C++

 

Get Both as one Bundle

cover   ConcurrencyCoverFrame   bundle
With C++11, C++14, and C++17 we got a lot of new C++ libraries. In addition, the existing ones are greatly improved. The key idea of my book is to give you the necessary information to the current C++ libraries in about 200 pages.  

C++11 is the first C++ standard that deals with concurrency. The story goes on with C++17 and will continue with C++20.

I'll give you a detailed insight in the current and the upcoming concurrency in C++. This insight includes the theory and a lot of practice with more the 100 source files.

 

Get my books "The C++ Standard Library" (including C++17) and "Concurrency with Modern C++" in a bundle.

In sum, you get more than 600 pages full of modern C++ and more than 100 source files presenting concurrency in practice.

 

Get your interactive course

 

Modern C++ Concurrency in Practice

C++ Standard Library including C++14 & C++17

educative CLibrary

Based on my book "Concurrency with Modern C++" educative.io created an interactive course.

What's Inside?

  • 140 lessons
  • 110 code playgrounds => Runs in the browser
  • 78 code snippets
  • 55 illustrations

Based on my book "The C++ Standard Library" educative.io created an interactive course.

What's Inside?

  • 149 lessons
  • 111 code playgrounds => Runs in the browser
  • 164 code snippets
  • 25 illustrations
 

 

 

Add comment


My Newest E-Books

Course: Modern C++ Concurrency in Practice

Course: C++ Standard Library including C++14 & C++17

Course: Embedded Programming with Modern C++

Course: Generic Programming (Templates)

Subscribe to the newsletter (+ pdf bundle)

Blog archive

Source Code

Visitors

Today 13

All 2998153

Currently are 194 guests and no members online

Kubik-Rubik Joomla! Extensions

Latest comments