relaxed memory ordering for thread stop flags - c++11

The author of a video on atomic operations shows the following snippet. While the load of the stop flag is relaxed, the store cannot be relaxed. Is the reason that a relaxed store could become visible only after the threads are joined, or is there some other reason?
Worker threads:
while (!stop.load(std::memory_order_relaxed)) {
    // do something (that's independent of stop flag)
}
Main thread:
int main() {
    launch_workers();
    stop = true; // <-- not relaxed
    join_threads();
    // do something
}

Related

Why does Go support atomic.Load and atomic.Store?

I think atomic.Load(addr) should be equivalent to *addr, and atomic.Store(addr, newval) should be equivalent to *addr = newval. So why is using *addr or *addr = newval not an atomic operation? Won't each eventually be compiled to a single CPU instruction (which is atomic)?
Because of ordering guarantees, and memory operation visibility. For instance:
y:=0
x:=0
x=1
y=1
In the above program, another goroutine can see (0,0), (0,1), (1,0), or (1,1) for x and y. This is because of the compiler reordering the code, compiler optimizations, or memory operation reordering at the hardware level. However:
var y int64
x := 0
x = 1
atomic.StoreInt64(&y, 1)
If another goroutine sees atomic.LoadInt64(&y)==1, then the goroutine is guaranteed to see x=1.
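The guarantee above can be tried out with a small program. This is a minimal sketch, not part of the original answer: the function name publish and the result channel are made up for illustration, and the reader spins with runtime.Gosched only to keep the example short.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publish writes x with a plain store and then sets the flag y atomically.
// It returns the value of x that the reader saw after observing y == 1.
// (Hypothetical helper, for illustration only.)
func publish() int {
	var x int   // plain variable, written before the atomic store
	var y int64 // flag, accessed only through sync/atomic

	result := make(chan int)
	go func() {
		// Spin until the atomic load observes the store to y.
		for atomic.LoadInt64(&y) == 0 {
			runtime.Gosched()
		}
		result <- x // ordered after the write x = 1
	}()

	x = 1
	atomic.StoreInt64(&y, 1)
	return <-result
}

func main() {
	fmt.Println(publish()) // prints 1
}
```

If the atomic store and load were replaced with plain accesses to y, the race detector would be expected to flag the program, and the reader would no longer be guaranteed to see x == 1.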
Another example is busy-waiting. The following example is from the Go memory model:
var a string
var done bool

func setup() {
	a = "hello, world"
	done = true
}

func main() {
	go setup()
	for !done {
	}
	print(a)
}
This program is not guaranteed to terminate, because the for-loop in main is not guaranteed to see the done=true assignment. The program may run indefinitely, may print empty string, or it may print "hello, world".
Replacing done=true with an atomic store, and the check in the for-loop with an atomic load guarantees that the program always finishes and prints "hello, world".
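As a rough sketch of that replacement, the version below uses the typed atomic.Bool, which assumes Go 1.19 or newer. The greeter struct and the run function are names invented here so the example can be called more than once; they are not from the Go memory model document.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// greeter bundles the shared string with its "ready" flag.
// (Hypothetical type; the original example uses package-level variables.)
type greeter struct {
	a    string
	done atomic.Bool // requires Go 1.19+
}

func (g *greeter) setup() {
	g.a = "hello, world"
	g.done.Store(true) // atomic store publishes the write to g.a
}

// run starts setup in a goroutine and busy-waits on the atomic flag.
func run() string {
	g := new(greeter)
	go g.setup()
	for !g.done.Load() { // atomic load instead of a plain read
		runtime.Gosched()
	}
	return g.a
}

func main() {
	fmt.Println(run()) // prints "hello, world"
}
```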
The authoritative document about these is the Go memory model:
https://go.dev/ref/mem

Since len() for slice and map is O(1), why is len() not thread-safe?

I read some posts (Are len(string) and len(slice)s O(1) operation in Go?, How to check if a map is empty in Golang?) and found that len() for slices and maps is O(1); the implementation just reads a member (e.g. size, length) of the structure.
But the test (Is len() thread safe in golang?) shows that len() is not thread-safe. Why?
What you're referring to is called a benign data race. The map size is just an int, so one could assume that reading and writing a single int is atomic, right? Well, no.
The Go memory model is derived from the C memory model, in which accessing the same memory from two or more threads, at least one of which is writing, without synchronization is a data race - a form of undefined behavior. It does not matter if you're accessing a single int or a more complex structure.
There are two reasons why a benign data race does not actually exist:
1. Hardware. CPUs differ in how strongly they order memory operations: some have a strong memory model (e.g. x86) and others a weak one (e.g. ARM). Regular writes to memory may not become "visible" on other cores immediately, or in program order, depending on the hardware. Special atomic or barrier instructions are required to guarantee visibility between cores.
2. Software. According to the memory model, each thread is assumed to execute in isolation (until a synchronization event with happens-before semantics occurs). The compiler is allowed to assume that reading the same memory location will give the same result and can, for example, hoist these reads out of the loop (thus breaking your program). This is why synchronization must be explicit in the code, even when targeting hardware with a strong memory model.
The following program may or may not ever finish, depending on the CPU and compiler optimization flags:
func main() {
	x := 0
	go func() {
		x = 1
	}()
	for x == 0 { // DATA RACE! also, the compiler is allowed to simplify "x == 0" to "true" here
		// wait ...
	}
}
To make it correct, use synchronization to let the compiler know there is concurrency involved:
func main() {
	var mtx sync.Mutex
	x := 0
	go func() {
		mtx.Lock()
		x = 1 // protect writes by a mutex
		mtx.Unlock()
	}()
	for {
		mtx.Lock()
		x_copy := x // yes, also reads must be protected
		mtx.Unlock()
		if x_copy != 0 {
			break
		}
		// wait ...
	}
}
The locking and unlocking of the same mutex creates an acquire-release fence such that all memory writes done before unlocking the mutex are "released" and become visible to the thread that subsequently locks the mutex and "acquires" them.
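Applied to the len() question, the same rule means the length has to be read under the lock that guards the writes. The following is only a sketch: counterMap, Set, Len, and fill are hypothetical names, and sync.RWMutex is one possible choice so that several readers can call Len at once.

```go
package main

import (
	"fmt"
	"sync"
)

// counterMap is a hypothetical wrapper around a map; names are illustrative.
type counterMap struct {
	mu sync.RWMutex
	m  map[int]int
}

func newCounterMap() *counterMap {
	return &counterMap{m: make(map[int]int)}
}

// Set writes under the exclusive lock.
func (c *counterMap) Set(k, v int) {
	c.mu.Lock()
	c.m[k] = v
	c.mu.Unlock()
}

// Len reads under the shared lock, so it never overlaps with a writer.
func (c *counterMap) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.m)
}

// fill inserts n distinct keys from n goroutines that also call Len.
func fill(n int) int {
	c := newCounterMap()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Set(i, i*i)
			_ = c.Len() // concurrent len() is fine because it takes the lock
		}(i)
	}
	wg.Wait()
	return c.Len()
}

func main() {
	fmt.Println(fill(100)) // prints 100
}
```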

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I would like to write a value and do not care whether another goroutine overwrites it later or has just overwritten it.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants, good and bad, of an example program; both seem to produce "correct" output using the current Go runtime:
package main

import (
	"flag"
	"fmt"
	"math/rand"
	"time"
)

var bogus = flag.Bool("bogus", false, "use bogus code")

func pause() {
	time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}

func bad() {
	stop := time.After(100 * time.Millisecond)
	var name string
	// start some producers doing concurrent writes (DANGER!)
	for i := 0; i < 10; i++ {
		go func(i int) {
			pause()
			name = fmt.Sprintf("name = %d", i)
		}(i)
	}
	// start consumer that shows the current value every 10ms
	go func() {
		tick := time.Tick(10 * time.Millisecond)
		for {
			select {
			case <-stop:
				return
			case <-tick:
				fmt.Println("read:", name)
			}
		}
	}()
	<-stop
}

func good() {
	stop := time.After(100 * time.Millisecond)
	names := make(chan string, 10)
	// start some producers concurrently writing to a channel (GOOD!)
	for i := 0; i < 10; i++ {
		go func(i int) {
			pause()
			names <- fmt.Sprintf("name = %d", i)
		}(i)
	}
	// start consumer that shows the current value every 10ms
	go func() {
		tick := time.Tick(10 * time.Millisecond)
		var name string
		for {
			select {
			case name = <-names:
			case <-stop:
				return
			case <-tick:
				fmt.Println("read:", name)
			}
		}
	}()
	<-stop
}

func main() {
	flag.Parse()
	if *bogus {
		bad()
	} else {
		good()
	}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
Running this program with go run --race bogus.go reports no race.
However, go run --race bogus.go -bogus warns about the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the concurrent writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
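The extra layer of indirection mentioned above could look roughly like this. It is a sketch under assumptions, not the answer author's code: it relies on the generic atomic.Pointer type (Go 1.19 or newer), and latestName, Set, Get, and runWriters are hypothetical names. Each stored string is treated as immutable; only the pointer to it is swapped atomically.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// latestName holds a pointer to an immutable string.
// Only the pointer is replaced, and that replacement is atomic.
type latestName struct {
	p atomic.Pointer[string] // requires Go 1.19+
}

// Set stores the address of a fresh copy of s.
func (l *latestName) Set(s string) { l.p.Store(&s) }

// Get returns the most recently stored string, or "" if none was stored.
func (l *latestName) Get() string {
	if p := l.p.Load(); p != nil {
		return *p
	}
	return ""
}

// runWriters starts n concurrent writers and returns whichever value
// ended up stored last; which one that is depends on scheduling.
func runWriters(n int) string {
	var l latestName
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			l.Set(fmt.Sprintf("name = %d", i))
		}(i)
	}
	wg.Wait()
	return l.Get()
}

func main() {
	fmt.Println("read:", runWriters(10))
}
```

The printed name varies from run to run, but it is always one complete string that some writer stored, never a mix of a pointer from one write and a length from another.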
Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: You cannot, under any circumstances whatsoever, read and write concurrently to anything in Go.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels?
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup // wait group to close channel
	var buffer int = 1    // buffer of the channel
	// channel to get the share data
	cName := make(chan string, buffer)
	for i := 0; i < 10; i++ {
		wg.Add(1) // add to wait group
		go func(i int) {
			cName <- fmt.Sprintf("name = %d", i)
			wg.Done() // decrease wait group.
		}(i)
	}
	go func() {
		wg.Wait()    // wait of wait group to be 0
		close(cName) // close the channel
	}()
	// process all the data
	for n := range cName {
		println("read:", n)
	}
}
The above code returns the following output:
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

Is there a race condition in the Go implementation of Mutex, where m.state is read without an atomic function?

In Go, if two goroutines read and write a variable without a mutex or atomic operations, that can cause a data race.
Running go run --race xxx.go will detect the racing accesses.
However, the implementation of Mutex in src/sync/mutex.go uses the following code:
func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		if race.Enabled {
			race.Acquire(unsafe.Pointer(m))
		}
		return
	}
	var waitStartTime int64
	starving := false
	awoke := false
	iter := 0
	old := m.state // This line confuses me!
	......
The line old := m.state confuses me, because m.state is read and written by different goroutines.
The following function Test obviously has a race condition. But if I put it in mutex.go, no race condition is detected.
# mutex.go
func Test() {
	a := int32(1)
	go func() {
		atomic.CompareAndSwapInt32(&a, 1, 4)
	}()
	_ = a
}
If I put it in another package, such as src/os/exec.go, the race condition is detected.
package main

import (
	"os"
	"sync"
)

func main() {
	sync.Test() // race condition is not detected
	os.Test()   // race condition is detected
}
First of all, the Go source changes all the time, so let's make sure we are looking at the same thing. Take release 1.12 at
https://github.com/golang/go/blob/release-branch.go1.12/src/sync/mutex.go
As you said, the Lock function begins:
func (m *Mutex) Lock() {
	// fast path: set the locked bit and return if the mutex was not locked
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		return
	}
	// reads the state to decide what to do based on the remaining bits
	for {
		// if statements involving CompareAndSwaps on the remaining bits
	}
}
What is this CompareAndSwap doing? It atomically looks at that int32, and if it is 0 it swaps it to mutexLocked (which is 1, defined as a const above) and returns true to report that it swapped.
Then Lock promptly returns. That is its fast path. The goroutine has acquired the lock and can start running its protected path.
If the value is already 1 (mutexLocked), it doesn't swap it and returns false.
Then Lock reads the state and enters a loop in which it does atomic compare-and-swaps to determine how it should behave.
What are the possible states? Combinations of locked, woken and starving, as you can see from the const block.
Depending on how long the goroutine has been waiting on the wait list, it gets priority when checking again whether the mutex is free.
But also observe that only Unlock() can set the mutexLocked bit back to 0.
In the Lock() CAS loop, the only bits that are set are the starving and woken ones. Yes, you can have multiple readers, but only one writer at any time, and that writer is the one who is holding the mutex and executing its protected path until it calls Unlock(). Check out this article for more details.
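To make the compare-and-swap fast path concrete, here is a toy lock built on the same call. It is deliberately not how sync.Mutex works under contention: there is no wait list and there are no woken or starving bits, and the names spinLock and countWithLock are hypothetical. It only illustrates how a successful swap from 0 to 1 hands the lock to exactly one goroutine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const locked int32 = 1

// spinLock is a toy lock for illustration only (hypothetical type).
type spinLock struct {
	state int32 // 0 = unlocked, 1 = locked; accessed only atomically
}

// Lock retries the CAS until it is the goroutine that changes 0 to 1.
func (l *spinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&l.state, 0, locked) {
		runtime.Gosched() // yield; the real Mutex parks the goroutine instead
	}
}

// Unlock is the only place where the state goes back to 0.
func (l *spinLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}

// countWithLock increments a shared counter from several goroutines,
// with every increment guarded by the toy lock.
func countWithLock(workers, perWorker int) int {
	var l spinLock
	var wg sync.WaitGroup
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				l.Lock()
				total++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithLock(8, 1000)) // prints 8000
}
```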
Disassembling the binary shows that the Test function compiles to different code depending on the package it is in.
The reason is that the compiler does not generate race-detector instrumentation for the sync package.
The relevant code is:
var norace_inst_pkgs = []string{"sync", "sync/atomic"} // https://github.com/golang/go/blob/release-branch.go1.12/src/cmd/compile/internal/gc/racewalk.go

Setting and accessing a pointer from concurrent goroutines

I have a map which is used by goroutine A and replaced once in a while in goroutine B. By replacement I mean:
var a map[T]N

// uses the map
func goroutineA() {
	for (...) {
		tempA := a
		// ...uses tempA in some way...
	}
}

// refreshes the map
func goroutineB() {
	for (...) {
		time.Sleep(10 * time.Second)
		otherTempA := make(map[T]N)
		// ...initializes otherTempA...
		a = otherTempA
	}
}
Do you see any problem in this pseudo code? (in terms of concurrency)
The code isn't safe, since assignments and reads to a pointer value are not guaranteed to be atomic. This can mean that as one goroutine writes the new pointer value, the other may see a mix of bytes from the old and new value, which will cause your program to die in a nasty way. Another thing that may happen is that since there's no synchronisation in your code, the compiler may notice that nothing can change a in goroutineA, and lift the tempA := a statement out of the loop. This will mean that you'll never see new map assignments as the other goroutine updates them.
You can use go test -race to find these sorts of problems automatically.
One solution is to lock all access to the map with a mutex.
You may wish to read the Go Memory Model document, which explains clearly when changes to variables are visible inside goroutines.
When unsure about data races, run go run -race file.go. That being said, yes, there will be a race.
The easiest way to fix that is to use a sync.RWMutex:
var a map[T]N
var lk sync.RWMutex

// uses the map
func goroutineA() {
	for (...) {
		lk.RLock()
		// actions on a
		lk.RUnlock()
	}
}

// refreshes the map
func goroutineB() {
	for (...) {
		otherTempA := make(map[T]N)
		// ...initializes otherTempA...
		lk.Lock()
		a = otherTempA
		lk.Unlock()
	}
}
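For reference, a runnable variant of the RWMutex approach might look like the sketch below. The concrete map type map[string]int, the "version" key, and the function names lookup, refresh, and run are assumptions made here so the example compiles; the original uses placeholder types T and N and unbounded loops.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	lk sync.RWMutex
	a  = map[string]int{"version": 0} // assumed concrete type for the sketch
)

// lookup reads from the current map while holding the read lock.
func lookup(key string) int {
	lk.RLock()
	defer lk.RUnlock()
	return a[key]
}

// refresh builds the new map without holding any lock,
// then swaps it in under the write lock.
func refresh(version int) {
	next := map[string]int{"version": version}
	lk.Lock()
	a = next
	lk.Unlock()
}

// run lets one reader and one refresher work concurrently for n rounds
// and returns the version that is visible after both have finished.
func run(n int) int {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			_ = lookup("version")
		}
	}()
	go func() {
		defer wg.Done()
		for i := 1; i <= n; i++ {
			refresh(i)
		}
	}()
	wg.Wait()
	return lookup("version")
}

func main() {
	fmt.Println(run(1000)) // prints 1000
}
```

Building the replacement map before taking the write lock keeps the time during which readers are blocked as short as the single assignment.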

Resources