C++ Core Guidelines: std::array and std::vector are your Friends

In 99 % of your use cases for a sequential container, you are outstanding with a std::array or a std::vector. What? If you don’t believe me, read this post.

Okay, I can make it short today. Here is a rule of thumb: If you want to add elements to your container or remove elements from your container, use a std::vector; if not, use a std::array.

If you are busy, you can stop to read; if not, continue.

The Details

Here is the reason for the rule of thumb from the guideline: SL.con.2: Prefer using STL vector by default, unless you have a reason to use a different container

std::array and std::vector offer the following advantages:

the fastest general-purpose access (random access, including being vectorization-friendly);
the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).

I already wrote in my last post C++ Core Guidelines: The Standard Library, about the third point. The first point of random access via the index operator is apparent. So, if you don’t like proof by authority, let me talk about the second point. To get the complete picture, here are the sequential containers of the STL.

You see, we have five sequential containers in the standard template library. Depending on your use case, std::vector may be a 95% fit because you must often add or remove elements to your std::vector. Let me add a few additional remarks to the table.

O(i) stands for an operation’s complexity (runtime). So O(1) means that the runtime of an operation on a container is constant and independent of the container’s size. On opposite to that, O(n) means that the runtime depends linearly on the number of elements of the container. What does that mean for a std::vector or a std::array. The access time on an element is independent of the size of the std::vector or a std::array, but the insertion or deletion of an arbitrary element with k-times more elements is k-times slower. Of course, the modification is only possible for a std::vector.

std::array and std::vector provide similar access time guarantees, but there is one big difference between them, which many developers ignore. The std::array is typically created on the stack, and the elements of a std::vector are created on the heap. This means that a std::array can only have a limited number of elements, but a std::vector has infinite elements.

Although the random access on the elements of a std::vector has the same complexity O(1) as the random access on the element of a std::deque, that doesn’t mean that both operations are equally fast. I will come to this point later.

std::vector and std::deque support since C++11 the new method shrink_to_fit. The number of elements a std::vector or a std:.deque has (size) is usually smaller than the number of elements for which memory is already reserved (capacity). That is for a simple reason. The size of the std::vector or a std::deque can increase without an expensive allocation of new memory. The new method shrink_to_fit allows it to reduce the capacity of a std::vector a std::deque to its size. This call is not binding. That means the runtime can ignore it. But on popular platforms, I always observed the desired behavior.

The complexity guarantee O(1) for the insertion or deletion into a double (std::list) or singly linked list (std::forward_list) is only guaranteed if the iterator points to the correct element. std::list and std::forward_list provide an exclusive guarantee, which may sometimes be necessary. When you modify a std::vector or a std::deque, the iterators become invalid. This will not hold for a std::list or a std::forward::list.

Modernes C++ Mentoring

"Fundamentals for C++ Professionals" (open)

"Design Patterns and Architectural Patterns with C++" (open)

"C++20: Get the Details" (open)

"Concurrency with Modern C++" (open)

"Embedded Programming with Modern C++": (open)

"Generic Programming (Templates) with C++": (open)

"Clean Code: Best Practices for Modern C++": July 2025

Do you want to stay informed: Subscribe.

You must have an excellent reason to use the very special std::forward_list as your sequential container. std::forward_list is optimized for memory requirements and performance and is applicable if the insertion, extraction, or movement of elements only affects adjacent elements. The reason for this special behavior is quite apparent. As a singly linked list, std::forward_list only supports a forward iterator and doesn’t know its size. This is the reason why you can not use a std::forward_list ist many algorithms of the STL.

Memory Predictability

I said O(1) for the access time of an element in a std::vector, and an element in a std::deque does not mean the same. Here is my simple experiment, which I already provided in the post C++ Core Guidelines: The Remaining Rules to Performance. This is the reason I make my explanation quite short.

If you read an int from memory, more than the size of one int is read from memory. An entire cache line is read from memory and stored in a cache. On modern architectures, a cache line typically has 64 bytes. If you now request an additional variable from memory and this variable is in the previous cache, the read directly uses this cache, and the operation is much faster.

Let’s see what this means for a std::vector, a std::deque, std::list, and std::forward_list. I intentionally ignore in my performance test a std::array because of its limited size.

This was the theory of cache lines. Now I’m curious. Does it make a difference to read and accumulate all elements from std::vector, a std::deque, std::list, and std::forward_list. The small program should answer.

// memoryAcess.cpp

#include <forward_list>
#include <chrono>
#include <deque>
#include <iomanip>
#include <iostream>
#include <list>
#include <string>
#include <vector>
#include <numeric>
#include <random>

const int SIZE = 100'000'000; 

template <typename T>
void sumUp(T& t, const std::string& cont){            // (6)
  
  std::cout << std::fixed << std::setprecision(10);

  auto begin= std::chrono::steady_clock::now();
  std::size_t res = std::accumulate(t.begin(), t.end(), 0LL);
  std::chrono::duration<double> last=  std::chrono::steady_clock::now() - begin;
  std::cout << cont <<  std::endl;
  std::cout << "time: " << last.count() << std::endl;
  std::cout << "res: " << res << std::endl;
  std::cout << std::endl;
 
  std::cout << std::endl;
     
}

int main(){
    
    std::cout << std::endl;
    
    std::random_device seed;                            // (1)
    std::mt19937 engine(seed());
    std::uniform_int_distribution<int> dist(0, 100);
    std::vector<int> randNumbers;
    randNumbers.reserve(SIZE);
    for (int i=0; i < SIZE; ++i){
        randNumbers.push_back(dist(engine));
    }
    
    {
      std::vector<int> myVec(randNumbers.begin(), randNumbers.end());
      sumUp(myVec,"std::vector<int>");                 // (2)
    }

    
    {
      std::deque<int>myDec(randNumbers.begin(), randNumbers.end());
      sumUp(myDec,"std::deque<int>");                 // (3)
    }
    
    {
      std::list<int>myList(randNumbers.begin(), randNumbers.end());
      sumUp(myList,"std::list<int>");                 // (4)
    }
    
    {
      std::forward_list<int>myForwardList(randNumbers.begin(), randNumbers.end());
      sumUp(myForwardList,"std::forward_list<int>");  // (5)
    } 
    
}

The program memoryAccess.cpp creates the first 100 Million random numbers between 0 and 100 (1). Then it accumulates the elements using a std::vector (2), a std::deque (3), a std::list (4), and a std::forward_list (5). The actual work is done in the function sumUp (6).

I compiled the program with maximum optimization and executed it on Linux and Windows. I’m not interested in the comparison between Linux and Windows because that would be a comparison between a desktop PC and a Laptop. I’m interested in the read performance of the four containers. Here it is:

To make my performance comparison easy to digest, here is a graphic.

I don’t want to overestimate these performance numbers, but one key observation is obvious. The more cache line aware the container is, the faster the access time of the elements: std::vector > std::deque > (std::list, std::forward_list).

What’s next?

I should write a similar post to the associative containers in the standard template library. From my perspective, they are underrepresented in the C++ core guidelines. My next post is about associative containers such as std::map and std::unordered_map.

Post Views: 14,437

Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Stephen Kelley, Kyle Dean, Tusar Palauri, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Rob North, Bhavith C Achar, Marco Parri Empoli, Philipp Lenk, Charles-Jianye Chen, Keith Jeffery, Matt Godbolt, Honey Sukesan, bruce_lee_wayne, Silviu Ardelean, schnapper79, Seeker, and Sundareswaran Senthilvel.

Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, Slavko Radman, and David Poole.

My special thanks to Embarcadero
My special thanks to PVS-Studio
My special thanks to Tipi.build
My special thanks to Take Up Code
My special thanks to SHAVEDYAKS