Performance Comparison of Condition Variables and Atomics in C++20
After the introduction to std::atomic_flag in my last post, Synchronization with Atomics in C++20, I want to dive deeper. Today, I created a ping-pong game using condition variables, std::atomic_flag, and std::atomic<bool>. Let’s play.
The key question I want to answer in this post is: What is the fastest way to synchronize threads in C++20? I use three different data types in this post: std::condition_variable, std::atomic_flag, and std::atomic<bool>.
To get comparable numbers, I implement a ping-pong game. One thread executes a ping function, and the other thread a pong function. For simplicity, I call the thread executing the ping function the ping thread and the other thread the pong thread. The ping thread waits for the notification of the pong thread and then sends its own notification back to the pong thread. The game stops after 1,000,000 ball exchanges. I play each game five times to get comparable performance numbers.
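The measurement procedure described above (play a game several times and average the duration) can be sketched as a small helper. This is only an illustration under assumptions: the name averageSeconds is hypothetical, and it uses std::chrono::steady_clock, a monotonic clock, where the listings in this post use std::chrono::system_clock.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper, not part of the original programs:
// runs `game` `runs` times and returns the mean duration in seconds.
double averageSeconds(const std::function<void()>& game, int runs) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited for intervals
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        auto start = Clock::now();
        game();  // e.g. start the ping and pong threads and join them
        std::chrono::duration<double> dur = Clock::now() - start;
        total += dur.count();
    }
    return total / runs;
}
```

A call such as averageSeconds(playOneGame, 5), where playOneGame is a placeholder for a function that starts and joins both threads, would then yield one averaged number per variant.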
I ran my performance tests with the brand-new Visual Studio compiler because it already supports synchronization with atomics. Additionally, I compiled the examples with maximum optimization (/Ox).
Let me start with C++11.
Condition Variables
// pingPongConditionVariable.cpp

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <atomic>
#include <mutex>
#include <thread>

bool dataReady{false};

std::mutex mutex_;
std::condition_variable condVar1;          // (1)
std::condition_variable condVar2;          // (2)

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {

    while(counter <= countlimit) {
        {
            std::unique_lock<std::mutex> lck(mutex_);
            condVar1.wait(lck, []{return dataReady == false;});
            dataReady = true;
        }
        ++counter;
        condVar2.notify_one();              // (3)
    }
}

void pong() {

    while(counter < countlimit) {
        {
            std::unique_lock<std::mutex> lck(mutex_);
            condVar2.wait(lck, []{return dataReady == true;});
            dataReady = false;
        }
        condVar1.notify_one();              // (3)
    }
}

int main(){

    auto start = std::chrono::system_clock::now();

    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;
}
I use two condition variables in the program: condVar1 and condVar2 (lines 1 and 2). The ping thread waits for the notification of condVar1 and sends its notification with condVar2. dataReady protects against spurious and lost wakeups (see “C++ Core Guidelines: Be Aware of the Traps of Condition Variables”). The ping-pong game ends when counter reaches countlimit. The notify_one calls (line 3) and the counter are thread-safe and are, therefore, outside the critical region.
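To illustrate why a predicate such as dataReady protects against lost wakeups, here is a minimal sketch. The function name notifyBeforeWaitIsNotLost is hypothetical and not part of the program above. The sender notifies before the receiver starts waiting. Without the predicate, the receiver could block forever. With it, wait checks the condition first and returns immediately.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper, not part of the original program.
bool notifyBeforeWaitIsNotLost() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    // The sender finishes before the receiver even starts waiting.
    std::thread sender([&] {
        {
            std::lock_guard<std::mutex> lck(m);
            ready = true;
        }
        cv.notify_one();  // nobody is waiting yet
    });
    sender.join();

    // wait(lck, pred) evaluates the predicate before blocking, so it
    // returns at once although the notification was sent earlier.
    std::unique_lock<std::mutex> lck(m);
    cv.wait(lck, [&] { return ready; });
    return ready;
}
```

The same predicate also handles spurious wakeups, because wait re-checks it after every wakeup and keeps blocking while it is false.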
Here are the numbers:
The average execution time is 0.52 seconds.
Porting this game to std::atomic_flag in C++20 is straightforward.
std::atomic_flag
Here is the play using two atomic flags.
Two Atomic Flags
In the following program, I replace the waiting on the condition variable with the waiting on the atomic flag and the notification of the condition variable with the setting of the atomic flag followed by the notification.
// pingPongAtomicFlags.cpp

#include <chrono>
#include <iostream>
#include <atomic>
#include <thread>

std::atomic_flag condAtomicFlag1{};
std::atomic_flag condAtomicFlag2{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        condAtomicFlag1.wait(false);               // (1)
        condAtomicFlag1.clear();                   // (2)

        ++counter;

        condAtomicFlag2.test_and_set();            // (4)
        condAtomicFlag2.notify_one();              // (3)
    }
}

void pong() {
    while(counter < countlimit) {
        condAtomicFlag2.wait(false);
        condAtomicFlag2.clear();

        condAtomicFlag1.test_and_set();
        condAtomicFlag1.notify_one();
    }
}

int main() {

    auto start = std::chrono::system_clock::now();

    condAtomicFlag1.test_and_set();                // (5)
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;
}
The call condAtomicFlag1.wait(false) (1) blocks if the value of the atomic flag is false. Conversely, it returns if condAtomicFlag1 has the value true. The boolean value serves as a kind of predicate and must, therefore, be set back to false (2). Before the notification (3) is sent to the pong thread, condAtomicFlag2 is set to true (4). The initial setting of condAtomicFlag1 to true (5) starts the game.
Thanks to std::atomic_flag, the game ends earlier.
On average, a game takes 0.32 seconds.
When you analyze the program, you may recognize that one atomic flag is sufficient for the game.
One Atomic Flag
Using one atomic flag makes the play easier to understand.
// pingPongAtomicFlag.cpp

#include <chrono>
#include <iostream>
#include <atomic>
#include <thread>

std::atomic_flag condAtomicFlag{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        condAtomicFlag.wait(true);
        condAtomicFlag.test_and_set();

        ++counter;

        condAtomicFlag.notify_one();
    }
}

void pong() {
    while(counter < countlimit) {
        condAtomicFlag.wait(false);
        condAtomicFlag.clear();
        condAtomicFlag.notify_one();
    }
}

int main() {

    auto start = std::chrono::system_clock::now();

    condAtomicFlag.test_and_set();
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;
}
In this case, the ping thread blocks on true, but the pong thread blocks on false. Using one or two atomic flags makes no difference from the performance perspective.
The average execution time is 0.31 seconds.
In this example, I used std::atomic_flag like an atomic boolean. Let’s give it another try with std::atomic<bool>.
std::atomic<bool>
From the readability perspective, I prefer the following C++20 implementation based on std::atomic<bool>.
// pingPongAtomicBool.cpp

#include <chrono>
#include <iostream>
#include <atomic>
#include <thread>

std::atomic<bool> atomicBool{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        atomicBool.wait(true);
        atomicBool.store(true);

        ++counter;

        atomicBool.notify_one();
    }
}

void pong() {
    while(counter < countlimit) {
        atomicBool.wait(false);
        atomicBool.store(false);
        atomicBool.notify_one();
    }
}

int main() {

    std::cout << std::boolalpha << std::endl;

    std::cout << "atomicBool.is_lock_free(): "              // (1)
              << atomicBool.is_lock_free() << std::endl;

    std::cout << std::endl;

    auto start = std::chrono::system_clock::now();

    atomicBool.store(true);
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;
}
std::atomic<bool> may internally use a locking mechanism such as a mutex. As I assumed, the implementation in my Windows runtime is lock-free (1).
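Besides the runtime query is_lock_free(), C++17 also offers the compile-time constant is_always_lock_free. The following sketch shows both. The names atomicBoolIsLockFree and atomicBoolAlwaysLockFree are hypothetical and only serve the illustration.

```cpp
#include <atomic>

// Hypothetical helper, not part of the original listing.
// Runtime query: is this particular object lock-free?
bool atomicBoolIsLockFree() {
    std::atomic<bool> probe{false};
    return probe.is_lock_free();
}

// Compile-time counterpart (since C++17): true only if every
// std::atomic<bool> object is lock-free on this platform.
constexpr bool atomicBoolAlwaysLockFree =
    std::atomic<bool>::is_always_lock_free;
```

std::atomic_flag needs no such check, because it is the only atomic type the standard guarantees to be lock-free.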
On average, the execution time is 0.38 seconds.
All Numbers
As expected, condition variables are the slowest way to synchronize threads (0.52 seconds on average), and std::atomic_flag is the fastest (0.31 to 0.32 seconds). The performance of std::atomic<bool> (0.38 seconds) is in between.
What’s next?
With C++20, we have a few new mechanisms for thread coordination. In my next post, I will look deeper into latches, barriers, and semaphores. They can also be used to play ping-pong.
Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Stephen Kelley, Kyle Dean, Tusar Palauri, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Rob North, Bhavith C Achar, Marco Parri Empoli, Philipp Lenk, Charles-Jianye Chen, Keith Jeffery, Matt Godbolt, and Honey Sukesan.
Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, Slavko Radman, and David Poole.
Modernes C++ GmbH
Modernes C++ Mentoring (English)
Rainer Grimm
Yalovastraße 20
72108 Rottenburg
Mail: schulung@ModernesCpp.de
Mentoring: www.ModernesCpp.org
I ran your code and found that the program may block forever.
Take the code in pingPongAtomicBool.cpp, for instance, and let countlimit = 1 to make it easier.
The ping thread must iterate twice before exiting. However, the pong thread may iterate only once if the following sequence happens:
pong: checks while condition, gets true
pong: atomicBool.wait(false);
pong: atomicBool.store(false);
pong: atomicBool.notify_one();
ping: checks while condition, gets true
ping: atomicBool.wait(true);
ping: atomicBool.store(true);
ping: ++counter
ping: atomicBool.notify_one();
pong: loads counter and gets 1, exits the loop, and never stores false to atomicBool again
ping: atomicBool.wait(true) never returns
This cannot happen, because the ping thread starts first. Initially, atomicBool is false.
It’s indeed happening. Try to run this code above several times.
But anyway, std::atomic::wait is much faster than std::condition_variable.
I am pretty sure that there is no guarantee that thread1 starts first. It could also be that thread2 starts first. If you want thread1 to start first, then you need to synchronise them.
Hi,
I ran the code on both Windows (MSVC) and Linux (GCC) and saw a huge performance variance.
The condition variable version is much slower on Linux (9 s) than on Windows (0.55 s).
I cannot figure out what is going on here.
Many thanks