A release operation synchronizes-with an acquire operation on the same atomic variable. So we can easily synchronise threads, if ... . Today's post is about the if.
What's my motivation for writing a post about the typical misunderstanding of acquire-release semantics? Simple: many of my listeners and trainees, and I myself, have already fallen into the trap. But first, the straightforward case.
Waiting included
I use this simple program as a starting point.
// acquireReleaseWithWaiting.cpp

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::vector<int> mySharedWork;
std::atomic<bool> dataProduced(false);

void dataProducer(){
    mySharedWork={1,0,3};
    dataProduced.store(true, std::memory_order_release);
}

void dataConsumer(){
    // spin until the producer has published the data
    while( !dataProduced.load(std::memory_order_acquire) );
    mySharedWork[1]= 2;
}

int main(){

    std::cout << std::endl;

    std::thread t1(dataConsumer);
    std::thread t2(dataProducer);

    t1.join();
    t2.join();

    for (auto v: mySharedWork){
        std::cout << v << " ";
    }

    std::cout << "\n\n";

}
The consumer thread t1 waits in dataConsumer until the producer thread t2 has set dataProduced to true in dataProducer. dataProduced is the guard: it guarantees that access to the non-atomic variable mySharedWork is synchronized. That means the producer thread t2 first initializes mySharedWork, and then the consumer thread t1 finishes the work by setting mySharedWork[1] to 2. So the program is well-defined.

The graphic shows the happens-before relation within the threads and the synchronizes-with relation between the threads. synchronizes-with establishes a happens-before relation. The rest of the reasoning is the transitivity of the happens-before relation: mySharedWork={1,0,3} happens-before mySharedWork[1]= 2.
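The chain of reasoning can be annotated directly in code. The following is only a sketch under assumptions: the names work, label, ready, and runChain are hypothetical and not part of the original program, and label is a second non-atomic variable added to show that everything written before the release store becomes visible, not just the vector.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

std::vector<int> work;           // non-atomic shared data
std::string label;               // hypothetical second non-atomic variable
std::atomic<bool> ready(false);  // the guard

void produce() {
    work = {1, 0, 3};                              // (1) sequenced before (3)
    label = "produced";                            // (2) sequenced before (3)
    ready.store(true, std::memory_order_release);  // (3) release
}

void consume() {
    // (4) the loop is left only after reading the value stored in (3),
    //     so (3) synchronizes-with the final load
    while (!ready.load(std::memory_order_acquire)) { }
    work[1] = 2;                                   // (5) sequenced after (4)
    label += "+consumed";                          // (6) sequenced after (4)
}

// Runs both threads and returns a textual summary of the shared state.
std::string runChain() {
    // reset the state so that the function can be called repeatedly
    ready.store(false);
    work.clear();
    label.clear();

    std::thread t1(consume);
    std::thread t2(produce);
    t1.join();
    t2.join();

    return label + ":" + std::to_string(work[0]) +
           std::to_string(work[1]) + std::to_string(work[2]);
}
```

By transitivity, (1) and (2) happen-before (5) and (6), so runChain() should always return "produced+consumed:123".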

But which aspect is often missing in this reasoning? The if.
If, ...
What happens if the consumer thread t1 does not wait for the producer thread t2?
// acquireReleaseWithoutWaiting.cpp

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::vector<int> mySharedWork;
std::atomic<bool> dataProduced(false);

void dataProducer(){
    mySharedWork={1,0,3};
    dataProduced.store(true, std::memory_order_release);
}

void dataConsumer(){
    // a single load: the result is ignored and nothing waits here
    dataProduced.load(std::memory_order_acquire);
    mySharedWork[1]= 2;
}

int main(){

    std::cout << std::endl;

    std::thread t1(dataConsumer);
    std::thread t2(dataProducer);

    t1.join();
    t2.join();

    for (auto v: mySharedWork){
        std::cout << v << " ";
    }

    std::cout << "\n\n";

}
The program has undefined behavior because there is a data race on the variable mySharedWork. When I run the program, the undefined behavior becomes visible immediately. That holds for both Linux and Windows.


What's the issue? It holds that dataProduced.store(true, std::memory_order_release) synchronizes-with dataProduced.load(std::memory_order_acquire). Yes, of course, but that doesn't mean the acquire operation waits for the release operation. That is exactly what the graphic displays: the dataProduced.load(std::memory_order_acquire) instruction is performed before the dataProduced.store(true, std::memory_order_release) instruction. So we have no synchronizes-with relation.
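That an acquire load does not wait can be observed without touching any non-atomic data. The sketch below is an illustration only; Counts and countEarlyLoads are hypothetical names. It repeatedly races a single acquire load against a release store and counts how often the load returned false, meaning it ran before the store. The counts depend on scheduling and will differ from run to run.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper type: how often the load saw each value.
struct Counts {
    int sawTrue = 0;
    int sawFalse = 0;
};

Counts countEarlyLoads(int trials) {
    Counts counts;
    for (int i = 0; i < trials; ++i) {
        std::atomic<bool> flag(false);
        bool observed = false;

        // The reader performs exactly one acquire load and does not wait.
        std::thread reader([&] {
            observed = flag.load(std::memory_order_acquire);
        });
        std::thread writer([&] {
            flag.store(true, std::memory_order_release);
        });
        reader.join();
        writer.join();

        // Reading observed after join() is safe: join synchronizes.
        if (observed) {
            ++counts.sawTrue;
        } else {
            ++counts.sawFalse;
        }
    }
    return counts;
}
```

Every trial in which the load returned false is a trial without a synchronizes-with relation between the two operations. There is no data race in this sketch because only the atomic flag is shared between the two threads.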

The solution
synchronizes-with means in this specific case: if dataProduced.store(true, std::memory_order_release) happens before dataProduced.load(std::memory_order_acquire), that is, if the load reads the value written by the store, then all effects of operations before dataProduced.store(true, std::memory_order_release) are visible after dataProduced.load(std::memory_order_acquire). The key is the word if. Exactly that if is guaranteed in the first program by while(!dataProduced.load(std::memory_order_acquire)).
Once again, but more formally.
- All operations before dataProduced.store(true, std::memory_order_release) happen-before all operations after dataProduced.load(std::memory_order_acquire), provided that dataProduced.store(true, std::memory_order_release) happens-before dataProduced.load(std::memory_order_acquire).
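Waiting in a loop is one way to establish the if. Another is to check the result of the load and touch the shared data only when the load returned true. The following is a minimal sketch of that variant, not code from the original post; sharedWork, produced, tryConsume, and runOnce are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::vector<int> sharedWork;
std::atomic<bool> produced(false);

void producer() {
    sharedWork = {1, 0, 3};
    produced.store(true, std::memory_order_release);
}

// Does not wait. Modifies the shared data only if the acquire load
// read the value written by the release store.
bool tryConsume() {
    if (produced.load(std::memory_order_acquire)) {
        sharedWork[1] = 2;  // safe: the store synchronizes-with this load
        return true;
    }
    return false;           // too early: leave the shared data alone
}

// Runs one producer and one non-waiting consumer and returns the result.
std::vector<int> runOnce() {
    // reset the state so that the function can be called repeatedly
    sharedWork.clear();
    produced.store(false);

    std::thread t1(tryConsume);
    std::thread t2(producer);
    t1.join();
    t2.join();
    return sharedWork;
}
```

Depending on scheduling, the result holds either 1 2 3 or 1 0 3, but in both cases the program is well-defined: the consumer accesses sharedWork only when the synchronizes-with relation actually holds.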
What's next?
Acquire-release semantics with operations on atomic variables. Does this work? Yes, with fences. Have a look at the next post.
Comments
Can it just use relaxed?
For example, assume mySharedWork[idx] is not accessed concurrently (and dataProduced is a std::atomic<int> here):
void dataProducer(int idx, int value){
    assert(idx > 0);
    mySharedWork[idx] = value;
    dataProduced.store(idx, std::memory_order_relaxed);
}
void dataConsumer(){
    int idx;
    // wait until a non-zero index has been published
    while( (idx = dataProduced.load(std::memory_order_relaxed)) == 0 );
    int value = mySharedWork[idx];
    // do something.
}
With relaxed ordering, the accesses around the atomic operations can be reordered. For example, mySharedWork[idx] = value can be moved after dataProduced.store, or int value = mySharedWork[idx] can be moved before dataProduced.load. Then you have a concurrent read and write on the non-atomic mySharedWork, which is a data race.
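For comparison, here is a sketch of the same index hand-off with release and acquire instead of relaxed. It is an illustration under assumptions, not code from the post or from the question: slots, publishedIdx, publish, consumePublished, and runIndexHandOff are hypothetical names, and index 0 is reserved to mean "nothing published yet".

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::vector<int> slots(4, 0);      // hypothetical buffer; index 0 is reserved
std::atomic<int> publishedIdx(0);  // 0 means "nothing published yet"

void publish(int idx, int value) {
    assert(idx > 0 && idx < static_cast<int>(slots.size()));
    slots[idx] = value;
    // release: the write to slots[idx] cannot be moved after this store
    publishedIdx.store(idx, std::memory_order_release);
}

int consumePublished() {
    int idx;
    // acquire: the read of slots[idx] cannot be moved before this load;
    // the loop ends only after a non-zero index has been read
    while ((idx = publishedIdx.load(std::memory_order_acquire)) == 0) { }
    return slots[idx];
}

// Publishes value 42 at index 2 and returns what the consumer read.
int runIndexHandOff() {
    publishedIdx.store(0);
    int value = 0;
    std::thread consumer([&value] { value = consumePublished(); });
    std::thread producer(publish, 2, 42);
    consumer.join();
    producer.join();
    return value;
}
```

Because the consumer leaves the loop only after reading the index written by the release store, the write to slots[idx] happens-before the read, and runIndexHandOff() returns 42.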