Performance Comparison of Condition Variables and Atomics in C++20


After the introduction to std::atomic_flag in my last post "Synchronization with Atomics in C++20", I want to dive deeper. Today, I create a ping-pong game using condition variables, std::atomic_flag, and std::atomic<bool>. Let's play.


The key question I want to answer in this post is: What is the fastest way to synchronize threads in C++20? In this post, I use three different data types: std::condition_variable, std::atomic_flag, and std::atomic<bool>.

To get comparable numbers, I implement a ping-pong game. One thread executes a ping function and the other thread a pong function. For simplicity, I call the thread executing the ping function the ping thread and the other thread the pong thread. The ping thread waits for the notification of the pong thread and sends the notification back to the pong thread. The game stops after 1,000,000 ball changes. I perform each game five times to get comparable performance numbers.

I ran my performance tests with the brand-new Visual Studio compiler because it already supports synchronization with atomics. Additionally, I compiled the examples with maximum optimization (/Ox).


Let me start with C++11.

Condition Variables

 

// pingPongConditionVariable.cpp

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

bool dataReady{false};                     // shared state, only accessed under mutex_

std::mutex mutex_;
std::condition_variable condVar1;          // (1)
std::condition_variable condVar2;          // (2)

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {

    while(counter <= countlimit) {
        {
            std::unique_lock<std::mutex> lck(mutex_);
            condVar1.wait(lck, []{return dataReady == false;});  // wait for pong
            dataReady = true;
        }
        ++counter;                          // atomic, no lock needed
        condVar2.notify_one();              // (3)
    }
}

void pong() {

    while(counter < countlimit) {
        {
            std::unique_lock<std::mutex> lck(mutex_);
            condVar2.wait(lck, []{return dataReady == true;});   // wait for ping
            dataReady = false;
        }
        condVar1.notify_one();              // (3)
    }

}

int main(){

    auto start = std::chrono::steady_clock::now();  // monotonic clock for timing

    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;

}

 

I use two condition variables in the program: condVar1 and condVar2 (lines 1 and 2). The ping thread waits for the notification of condVar1 and sends its notification with condVar2. dataReady protects against spurious and lost wakeups (see "C++ Core Guidelines: Be Aware of the Traps of Condition Variables"). The ping-pong game ends when counter reaches countlimit. The notify_one calls (lines 3) and the atomic counter are thread-safe and are, therefore, outside the critical region. Notifying without holding the lock is fine here because dataReady is only modified while the mutex is locked.

Here are the numbers:


The average execution time is 0.52 seconds.

Porting this game to std::atomic_flag in C++20 is straightforward.

std::atomic_flag

Here is the game using two atomic flags.

Two Atomic Flags

In the following program, I replace waiting on the condition variable with waiting on the atomic flag, and the notification of the condition variable with setting the atomic flag followed by the notification.

// pingPongAtomicFlags.cpp

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic_flag condAtomicFlag1{};               // C++20: starts clear (false)
std::atomic_flag condAtomicFlag2{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        condAtomicFlag1.wait(false);               // (1)
        condAtomicFlag1.clear();                   // (2)

        ++counter;

        condAtomicFlag2.test_and_set();            // (4)
        condAtomicFlag2.notify_one();              // (3)
    }
}

void pong() {
    while(counter < countlimit) {
        condAtomicFlag2.wait(false);               // wait until ping sets flag 2
        condAtomicFlag2.clear();

        condAtomicFlag1.test_and_set();            // hand the ball back to ping
        condAtomicFlag1.notify_one();
    }
}

int main() {

    auto start = std::chrono::steady_clock::now();

    condAtomicFlag1.test_and_set();                // (5)
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;

}

 

A call condAtomicFlag1.wait(false) (1) blocks if the value of the atomic flag is false. Conversely, it returns if condAtomicFlag1 has the value true. The boolean value serves as a kind of predicate and must, therefore, be set back to false (2). Before the notification (3) is sent to the pong thread, condAtomicFlag2 is set to true (4). The initial setting of condAtomicFlag1 to true (5) starts the game.

Thanks to std::atomic_flag the game ends earlier.


On average, a game takes 0.32 seconds.

When you analyze the program, you may notice that one atomic flag is sufficient for the game.

One Atomic Flag

Using one atomic flag makes the game easier to understand.

 

// pingPongAtomicFlag.cpp

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic_flag condAtomicFlag{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        condAtomicFlag.wait(true);         // block while the flag is true
        condAtomicFlag.test_and_set();     // hand the ball to pong

        ++counter;

        condAtomicFlag.notify_one();
    }
}

void pong() {
    while(counter < countlimit) {
        condAtomicFlag.wait(false);        // block while the flag is false
        condAtomicFlag.clear();            // hand the ball back to ping
        condAtomicFlag.notify_one();
    }
}

int main() {

    auto start = std::chrono::steady_clock::now();

    condAtomicFlag.test_and_set();         // pong moves first
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;

}

 

In this case, the ping thread blocks while the flag is true, and the pong thread blocks while it is false. From the performance perspective, using one or two atomic flags makes no difference.


The average execution time is 0.31 seconds.

In this example, I used std::atomic_flag like an atomic boolean. Let's give it another try with std::atomic<bool>.

std::atomic<bool>

From the readability perspective, I prefer the following C++20 implementation based on std::atomic<bool>.

 

// pingPongAtomicBool.cpp

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> atomicBool{};

std::atomic<int> counter{};
constexpr int countlimit = 1'000'000;

void ping() {
    while(counter <= countlimit) {
        atomicBool.wait(true);             // block while the value is true
        atomicBool.store(true);            // hand the ball to pong

        ++counter;

        atomicBool.notify_one();
    }
}

void pong() {
    while(counter < countlimit) {
        atomicBool.wait(false);            // block while the value is false
        atomicBool.store(false);           // hand the ball back to ping
        atomicBool.notify_one();
    }
}

int main() {

    std::cout << std::boolalpha << std::endl;

    std::cout << "atomicBool.is_lock_free(): "              // (1)
              << atomicBool.is_lock_free()  << std::endl;

    std::cout << std::endl;

    auto start = std::chrono::steady_clock::now();

    atomicBool.store(true);                // pong moves first
    std::thread t1(ping);
    std::thread t2(pong);

    t1.join();
    t2.join();

    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    std::cout << "Duration: " << dur.count() << " seconds" << std::endl;

}

std::atomic<bool> can internally use a locking mechanism such as a mutex. As I assumed, it is lock-free on my Windows runtime (1).


On average, the execution time is 0.38 seconds.

All Numbers

As expected, condition variables are the slowest way and atomic flags the fastest way to synchronize threads. The performance of std::atomic<bool> is in between. But std::atomic<bool> has one downside: std::atomic_flag is the only atomic data type that is guaranteed to be lock-free.

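Average execution time (five runs, 1,000,000 ball changes, /Ox):
Condition variables: 0.52 seconds
Two atomic flags: 0.32 seconds
One atomic flag: 0.31 seconds
std::atomic<bool>: 0.38 seconds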

What's next?

With C++20, we have a few new mechanisms for thread coordination. In my next post, I will take a deeper look at latches, barriers, and semaphores. They also let you play ping-pong.

 


Comments

#1 Jonathan OConnor 2021-01-11 16:27
Great article!

I know the condition_variable solution, at least on Linux, does not use a busy wait loop. But do the other solutions avoid busy waits as well? This is critical in the code I write.

#2 Rainer Grimm 2021-01-11 16:37
In reply to #1:
I never thought about a busy wait. I assume that a synchronization mechanism first makes a short spinning wait (busy wait) and then goes into the kernel and falls asleep. You can easily observe a busy wait. Let the condition variable wait and look at your CPU. One of the cores must go to 100 %.

#3 Jonathan OConnor 2021-01-11 18:04
Maybe this is a quality of implementation issue. You're normally a Windows person? I'll try running your samples on my machine later.

#4 Rainer Grimm 2021-01-11 21:12
In reply to #3:
No, I'm a Linux person. So far, only Windows supports the C++20 atomics. This also includes latches, barriers, and semaphores.

#5 Marius 2021-01-11 21:54
Condition variable "notify" method calls need to be protected with associated mutex, otherwise there is a risk of lost updates.

#6 B. Grebenar 2021-01-14 21:09
Great article. I feel like I'm not alone, while reading this.
Thanks!!!

#7 NateK 2021-01-17 23:57
Last year you also had nice blog about Thread Synchronization with c++20 Coroutines... I'm wondering what comparable code would look like, as well as performance comparison. Thanks!
