C++ Core Guidelines: std::array and std::vector are your Friends

Contents[Show]

In 99 % of your use-cases for a sequential container, you are totally fine with a std::array or a std::vector. What? If you don't believe me, read this post.

 rules 1752626 1280

Okay, I can make it short today. Here is a rule of thumb: If you want to add elements to your container or remove elements from your container, use a std::vector; if not, use a std::array.

If you are busy, you can stop to read, if not, continue.

The Details

Here is the reason for the rule of thumb from the guideline: SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container

std::array and std::vector offer the following advantages:

  1. the fastest general-purpose access (random access, including being vectorization-friendly);
  2. the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
  3. the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).

I already wrote in my last post C++ Core Guidelines: The Standard Library about the third point. The first point of random access via the index operator is apparent. So, if you don't like proof by authority, let me talk about the second point. To get the full picture, here are the sequential containers of the STL.

 sequentialOverview

You see, we have five sequential containers in the standard template library. Depending on your use-case, std::vector may be a 95% fit, because most of the times you have to add or remove elements to your std::vector. Let me add a few additional remarks to the table.

O(i) stands for the complexity (runtime) of an operation. So O(1) means that the runtime of an operation on a container is constant and is independent of the size of the container. Opposite to that, O(n) means, that the runtime depends linear on the number of the elements of the container. What does that mean for a std::vector or a std::array. The access time on an element is independent of the size of the std::vector or a std::array, but the insertion or deletion of an arbitrary element with k-times more elements is k-times slower. Of course, the modification is only possible for a std::vector.

std::array and std::vector provide similar access time guarantees, but there is one big difference between them, which many developers ignore. The std::array is typically created on the stack and the elements of a std::vector are created on the heap. This means, that a std::array can only have a limited number of elements but a std::vector an infinite number of elements.

Although the random access on the elements of a std::vector has the same complexity O(1) as the random access on the element of a std::deque, that doesn’t mean, that both operations are equally fast. I will come to this point later.

std::vector and std::deque support since C++11 the new method shrink_to_fit. The number of elements a std::vector or a std:.deque has (size) are usually smaller than the number of elements for which memory is already reserved (capacity). That is for a simple reason. The size of the std::vector or a std::deque can increase without an expensive allocation of new memory. The new method shrink_to_fit allows it to reduce the capacitiy of a std::vector a std::deque to its size. This call is not binding. That means the runtime can ignore it. But on popular platforms, I always observed the desired behaviour.

The complexity guarantee O(1) for the insertion or deletion into a double (std::list) or single linked list (std::forward_list) is only guaranteed if the iterator points to the right element. std::list and std::forward_list provide an exclusive guarantee, which may sometimes be necessary. When you modify a std::vector or a std::deque, the iterators become invalid. This will not hold for a std::list or a std::forward::list.

You must have an excellent reason to use the very special std::forward_list as your sequential container. std::forward_list is optimised for memory requirements and performance and is applicable if the insertion, extraction or movement of elements only affect adjacent elements. The reason for this special behaviour is quite obvious. As a single linked list, std::forward_list only supports a forward iterator and even don't know its size. This is the reason, why you can not use a std::forward_list ist many algorithms of the STL.

Memory Predictability

I said O(1) for the access time of an element in a std::vector and for an element in a std::deque does not mean the same. Here is my simple experiment, which I already provided in the post C++ Core Guidelines: The Remaining Rules to Performance. This is the reason I make my explanation quite short.

If you read an int from memory more than the size of one int is read from memory. An entire cache line is read from memory and stored in a cache. On modern architectures, a cache line has typically 64 bytes. If you now request an additional variable from memory and this variable is in the previous cache, the read directly uses this cache, and the operation is much faster.

Let's see what this means for a std::vector, a std::deque, std::list, and std::forward_list. I intentionally ignore in my performance test a std::array because of its limited size.

This was the theory of cache lines. Now I'm curious. Does it make a difference to read and accumulate all elements from std::vector, a std::deque, std::list, and std::forward_list. The small program should give an answer.

// memoryAcess.cpp

#include <forward_list>
#include <chrono>
#include <deque>
#include <iomanip>
#include <iostream>
#include <list>
#include <string>
#include <vector>
#include <numeric>
#include <random>

const int SIZE = 100'000'000; 

template <typename T>
void sumUp(T& t, const std::string& cont){            // (6)
  
  std::cout << std::fixed << std::setprecision(10);

  auto begin= std::chrono::steady_clock::now();
  std::size_t res = std::accumulate(t.begin(), t.end(), 0LL);
  std::chrono::duration<double> last=  std::chrono::steady_clock::now() - begin;
  std::cout << cont <<  std::endl;
  std::cout << "time: " << last.count() << std::endl;
  std::cout << "res: " << res << std::endl;
  std::cout << std::endl;
 
  std::cout << std::endl;
     
}

int main(){
    
    std::cout << std::endl;
    
    std::random_device seed;                            // (1)
    std::mt19937 engine(seed());
    std::uniform_int_distribution<int> dist(0, 100);
    std::vector<int> randNumbers;
    randNumbers.reserve(SIZE);
    for (int i=0; i < SIZE; ++i){
        randNumbers.push_back(dist(engine));
    }
    
    {
      std::vector<int> myVec(randNumbers.begin(), randNumbers.end());
      sumUp(myVec,"std::vector<int>");                 // (2)
    }

    
    {
      std::deque<int>myDec(randNumbers.begin(), randNumbers.end());
      sumUp(myDec,"std::deque<int>");                 // (3)
    }
    
    {
      std::list<int>myList(randNumbers.begin(), randNumbers.end());
      sumUp(myList,"std::list<int>");                 // (4)
    }
    
    {
      std::forward_list<int>myForwardList(randNumbers.begin(), randNumbers.end());
      sumUp(myForwardList,"std::forward_list<int>");  // (5)
    } 
    
}

 

The program memoryAccess.cpp creates the first 100 Million random numbers between 0 and 100 (1). Then it accumulate the elements using a std::vector (2), a std::deque (3), a std::list (4), and a std::forward_list (5). The actual work is done in the function sumUp (6). 

I compiled the program with maximum optimisation and executed it on Linux and Windows. I'm not interested in the comparison between Linux and Windows because that would be a comparison between a desktop PC and a Laptop. I'm interested in the read performance of the four containers. Here it is:memoryAccessWin

memoryAccessLinux

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

To make my performance comparison easy to digest, here is a graphic.

performaneSequential

I don't want to overestimate these performance numbers, but one key observation is obvious. The more cache line aware the container is, the faster is the access time of the elements: std::vector > std::deque > (std::list, std::forward_list).

What's next?

I think I should write a similar post to the associative containers in the standard template library. From my perspective, they are underrepresented in the C++ core guidelines.  My next post is about associative containers such as std::map and std::unordered_map.

 

Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, Marko, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Animus24, Jozo Leko, John Breland, espkk, Wolfgang Gärtner, Louis St-Amour, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Kris Kafka, Mario Luoni, Neil Wang, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Tobi Heideman, Daniel Hufschläger, Red Trip, Alexander Schwarz, Tornike Porchxidze, Alessandro Pezzato, and Evangelos Denaxas.

 

Thanks in particular to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, and Richard Sargeant.

My special thanks to Embarcadero CBUIDER STUDIO FINAL ICONS 1024 Small

 

Seminars

I'm happy to give online-seminars or face-to-face seminars world-wide. Please call me if you have any questions.

Bookable (Online)

German

Standard Seminars (English/German)

Here is a compilation of my standard seminars. These seminars are only meant to give you a first orientation.

New

Contact Me

Modernes C++,

RainerGrimmSmall

My Newest E-Books

Course: Modern C++ Concurrency in Practice

Course: C++ Standard Library including C++14 & C++17

Course: Embedded Programming with Modern C++

Course: Generic Programming (Templates)

Course: C++ Fundamentals for Professionals

Subscribe to the newsletter (+ pdf bundle)

Blog archive

Source Code

Visitors

Today 3661

Yesterday 8041

Week 19285

Month 78914

All 6307386

Currently are 275 guests and no members online

Kubik-Rubik Joomla! Extensions

Latest comments