My Conclusion: Summation of a Vector in three Variants


After calculating the sum of a std::vector in three different ways, I want to draw my conclusions.


The three strategies

First, an overview of all the numbers: the single-threaded variant; then multiple threads with a shared summation variable; finally, multiple threads with minimal synchronization. I have to admit that I was astonished by the last variant.

Single threaded (1)


Multiple threads with a shared summation variable (2)


Multiple threads with minimal synchronization (3)


My observations

For simplicity, I will only reason about Linux. Thanks to Andreas Schäfer, who gave me deeper insight.

Single threaded

The range-based for-loop and the STL algorithm std::accumulate are in the same league. This holds for both the maximally optimized and the non-optimized program. Interestingly, the maximally optimized version is about 30 times faster than the non-optimized one. In the optimized version, the compiler uses vectorized instructions (SSE or AVX) for the summation. Therefore, the loop counter is increased by 2 (SSE) or 4 (AVX) per iteration.
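The two single-threaded variants can be sketched as follows. This is a minimal illustration, not the measured program: the function names `sumRangeFor` and `sumAccumulate` are hypothetical, and I assume a std::vector<int> of non-negative values that is summed into an unsigned long long.

```cpp
#include <numeric>
#include <vector>

// Variant (1), range-based for-loop.
// Assumption: the values are non-negative, so adding them to an
// unsigned accumulator does not wrap around.
unsigned long long sumRangeFor(const std::vector<int>& values) {
    unsigned long long sum = 0;
    for (int v : values) {
        sum += v;
    }
    return sum;
}

// Variant (1), std::accumulate.
// The type of the initial value (0ULL) determines the accumulator type.
unsigned long long sumAccumulate(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0ULL);
}
```

For a vector holding the numbers 1 to 10, both functions return 55. Whether the compiler vectorizes either loop depends on the optimization level and the target flags.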

Multiple threads with a shared summation variable

The synchronization on each access to the shared variable (2) makes one point clear: synchronization is expensive. Although I break sequential consistency by using relaxed semantics, the program is about 40 times slower than its counterparts (1) and (3). And performance is not the only reason why our goal must be to minimize synchronization on the shared variable.
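To make the cost visible, here is a minimal sketch of variant (2) with an atomic and relaxed semantics. The names `addToShared` and `sumSharedAtomic`, the default of four threads, and the way the vector is split into chunks are my assumptions for illustration. The essential point is that every single element triggers one atomic operation on the shared variable.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Adds values[begin, end) to the shared sum:
// one atomic read-modify-write per element.
void addToShared(std::atomic<unsigned long long>& sum,
                 const std::vector<int>& values,
                 std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        sum.fetch_add(values[i], std::memory_order_relaxed);
    }
}

// Hypothetical driver: splits the vector into numThreads chunks.
unsigned long long sumSharedAtomic(const std::vector<int>& values,
                                   unsigned numThreads = 4) {
    if (numThreads == 0) numThreads = 1;
    std::atomic<unsigned long long> sum{0};
    std::vector<std::thread> threads;
    const std::size_t chunk = values.size() / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes the remainder.
        const std::size_t end =
            (t + 1 == numThreads) ? values.size() : begin + chunk;
        threads.emplace_back(addToShared, std::ref(sum),
                             std::cref(values), begin, end);
    }
    for (auto& th : threads) th.join();
    return sum.load();
}
```

The relaxed memory order only removes ordering guarantees. Each fetch_add is still an atomic operation on a variable that all threads contend for.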

Multiple threads with minimal synchronization

The summation with minimally synchronized threads (four atomic operations or locks) (3) is hardly faster than the range-based for-loop or std::accumulate (1). That holds even though the four threads in the multithreaded variant can work independently on four cores. This surprised me because I was expecting a nearly fourfold improvement. What surprised me even more was that my four cores were not fully utilized.
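A minimal sketch of variant (3): each thread sums its chunk into a local variable and touches the shared atomic only once at the end. The names `addLocalThenShared` and `sumMinimalSync`, the default of four threads, and the chunking are assumptions for illustration, not the measured program.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sums values[begin, end) locally, then publishes the partial result
// with a single atomic operation.
void addLocalThenShared(std::atomic<unsigned long long>& sum,
                        const std::vector<int>& values,
                        std::size_t begin, std::size_t end) {
    unsigned long long local = 0;
    for (std::size_t i = begin; i < end; ++i) {
        local += values[i];
    }
    sum.fetch_add(local, std::memory_order_relaxed);
}

// Hypothetical driver: with four threads there are four atomic
// operations in total, independent of the vector size.
unsigned long long sumMinimalSync(const std::vector<int>& values,
                                  unsigned numThreads = 4) {
    if (numThreads == 0) numThreads = 1;
    std::atomic<unsigned long long> sum{0};
    std::vector<std::thread> threads;
    const std::size_t chunk = values.size() / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes the remainder.
        const std::size_t end =
            (t + 1 == numThreads) ? values.size() : begin + chunk;
        threads.emplace_back(addLocalThenShared, std::ref(sum),
                             std::cref(values), begin, end);
    }
    for (auto& th : threads) th.join();
    return sum.load();
}
```

The local variable lives in each thread's own stack or register, so the inner loop needs no synchronization at all.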




The reason is simple: the cores can't get the data from memory fast enough. Or, to put it the other way around, the memory slows down the cores.

My conclusion

My conclusion from the performance measurements is to use std::accumulate for such a simple operation. There are two reasons. First, the performance boost of variant (3) doesn't justify the effort; second, C++17 will bring a parallel counterpart to std::accumulate: std::reduce with an execution policy. Therefore, it will be very easy to switch from the sequential to the parallel version.
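How small that switch is can be sketched as follows, assuming a C++17 compiler. The function names are hypothetical, and I use std::vector<long long> here so that every intermediate sum has the same type. The sketch calls std::reduce without an execution policy to stay portable; passing a policy such as std::execution::par from <execution> as the first argument requests parallel execution, which on some toolchains needs an additional backend library.

```cpp
#include <numeric>
#include <vector>

// Sequential version: strict left-to-right summation.
long long sumSequential(const std::vector<long long>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// C++17 counterpart: std::reduce may group and reorder the additions,
// which is what allows a parallel implementation.
// With a policy the call would read
//   std::reduce(std::execution::par, values.begin(), values.end(), 0LL);
long long sumReduce(const std::vector<long long>& values) {
    return std::reduce(values.begin(), values.end(), 0LL);
}
```

Because integer addition is associative and commutative, both functions return the same result. For floating-point values the reordering in std::reduce can change the rounding.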

What's next?

The time library is not part of the multithreading library, but it is an important component of the multithreading capabilities of C++. For example, you may have to wait for a lock until an absolute time point, or put your thread to sleep for a relative time duration. So in the next post, I will write about time.








