Since len() for slices and maps is O(1), why is len() not thread-safe?

I read some posts (Are len(string) and len(slice)s O(1) operation in Go?, How to check if a map is empty in Golang?) and found that len() for slices and maps is O(1): the implementation just reads a field (e.g. size or length) of the structure.
But the test (Is len() thread safe in golang?) shows that len() is not thread-safe. Why?

What you're referring to is called a benign data race. The map size is just an int, so one could assume that reading and writing a single int is atomic, right? Well, no.
The Go memory model is derived from the C memory model, in which accessing the same memory from two or more threads, at least one of which is writing, without synchronization is a data race - a form of undefined behavior. It does not matter if you're accessing a single int or a more complex structure.
There are two reasons why a data race is never actually benign:
1. Hardware. CPU architectures differ in memory ordering: some are strongly ordered (e.g. x86) and some weakly ordered (e.g. ARM). Regular writes to memory may not become "visible" on other cores immediately, depending on the hardware. Special "atomic" instructions are required to make data visible between cores in a defined order.
2. Software. According to the memory model, each thread is assumed to execute in isolation (until a synchronization event with happens-before semantics occurs). The compiler is allowed to assume that reading the same memory location repeatedly will yield the same result, and may, for example, hoist such reads out of a loop (thus breaking your program). This is why synchronization must be explicit in the code, even when targeting hardware with strong memory ordering.
The following program may or may not ever finish, depending on the CPU and compiler optimization flags:
func main() {
    x := 0
    go func() {
        x = 1
    }()
    for x == 0 { // DATA RACE! also, the compiler is allowed to simplify "x == 0" to "true" here
        // wait ...
    }
}
To make it correct, use synchronization to let the compiler know there is concurrency involved:
import "sync"

func main() {
    var mtx sync.Mutex
    x := 0
    go func() {
        mtx.Lock()
        x = 1 // protect writes by a mutex
        mtx.Unlock()
    }()
    for {
        mtx.Lock()
        xCopy := x // yes, reads must be protected too
        mtx.Unlock()
        if xCopy != 0 {
            break
        }
        // wait ...
    }
}
The locking and unlocking of the same mutex creates an acquire-release fence such that all memory writes done before unlocking the mutex are "released" and become visible to the thread that subsequently locks the mutex and "acquires" them.
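For a single flag like this, the sync/atomic package gives the same guarantee with less machinery. A minimal sketch, assuming Go 1.19+ for the atomic.Int32 type (on older versions, atomic.LoadInt32 and atomic.StoreInt32 on a plain int32 work the same way):

package main

import "sync/atomic"

func main() {
    var x atomic.Int32
    go func() {
        x.Store(1) // atomic store: establishes a happens-before edge with the loads below
    }()
    for x.Load() == 0 {
        // wait ... the compiler cannot hoist the atomic load out of the loop
    }
}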

Related

Deadlock-free locking multiple locks in Go

Is there a proven programmatic way to achieve mutual exclusion of multiple Mutexes / Locks / whatever in Golang?
E.g.
mutex1.Lock()
defer mutex1.Unlock()
mutex2.Lock()
defer mutex2.Unlock()
mutex3.Lock()
defer mutex3.Unlock()
would keep mutex1 locked, I guess, while waiting for mutex2/mutex3. With several goroutines, each using a different subset of several locks, this can easily get deadlocked.
So is there any way to acquire those locks only if all of them are available? Or is there any other pattern (using channels, maybe?) for achieving the same?
So is there any way to acquire those locks only if all of them are available?
No. Not with the standard library mutex, at least (though see the note on TryLock at the end of this answer). There is no way to "check" if a lock is available, or to "try" acquiring a lock. Every call of Lock() will block until locking is successful.
The implementation of mutex relies on atomic operations, which only act on a single value at once. In order to achieve what you describe, you'd need some kind of "meta locking" where the execution of the lock and unlock methods is itself protected by a lock, but this probably isn't necessary just to have correct and safe locking in your program:
Penelope Stevens' comment explains correctly: As long as the order of acquisition is consistent between different goroutines, you won't get deadlocks, even for arbitrary subsets of the locks, if each goroutine eventually releases the locks it acquires.
If the structure of your program provides some obvious locking order, then use that. If not, you can create your own special mutex type that has intrinsic ordering.
import (
    "sort"
    "sync"
    "sync/atomic"
)

type Mutex struct {
    sync.Mutex
    id uint64
}

var mutexIDCounter uint64

func NewMutex() *Mutex {
    return &Mutex{
        id: atomic.AddUint64(&mutexIDCounter, 1),
    }
}

func MultiLock(locks ...*Mutex) {
    // Sort by ID so that every caller acquires the locks in the same order.
    sort.Slice(locks, func(i, j int) bool { return locks[i].id < locks[j].id })
    for i := range locks {
        locks[i].Lock()
    }
}

func MultiUnlock(locks ...*Mutex) {
    for i := range locks {
        locks[i].Unlock()
    }
}
Usage:
a := NewMutex()
b := NewMutex()
c := NewMutex()
MultiLock(a, b, c)
MultiUnlock(a, b, c)
Each mutex is assigned an incrementing ID and the mutexes are locked in ascending order of ID; the unlock order does not matter for deadlock avoidance.
Try it for yourself in this example program on the playground. To prevent the deadlock, change const safe = true.
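(As an aside to the "No" above: Go 1.18 later added a TryLock method to sync.Mutex, which does make an all-or-nothing acquisition expressible, though its own documentation notes that correct uses are rare. A minimal sketch, with the retry/backoff policy left to the caller:)

// tryLockAll acquires every mutex or none: if any TryLock fails,
// the locks already acquired are released again.
func tryLockAll(locks ...*sync.Mutex) bool {
    for i, l := range locks {
        if !l.TryLock() { // non-blocking attempt, Go 1.18+
            for j := i - 1; j >= 0; j-- {
                locks[j].Unlock()
            }
            return false
        }
    }
    return true
}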
You might as well just put all the mutexes into a slice (not a map, since map iteration order is randomized in Go) and implement a function that calls the Lock() method on each of them using a range loop, so the order of locking remains the same.
The function might also take a list of indexes it shall skip (or not skip, if you like).
This way there is no way to change the order of locking the mutexes, as the iteration order over a slice is consistent. A sketch follows below.
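In this sketch, the fixed global slice and the skip set are illustrative choices, not a prescribed API:

var locks [3]sync.Mutex // shared mutexes, always traversed by ascending index

func lockSubset(skip map[int]bool) {
    for i := range locks {
        if !skip[i] {
            locks[i].Lock()
        }
    }
}

func unlockSubset(skip map[int]bool) {
    for i := range locks {
        if !skip[i] {
            locks[i].Unlock()
        }
    }
}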

Why is atomic.StoreUint32 preferred over a normal assignment in sync.Once?

While reading the source code of Go, I have a question about the code in src/sync/once.go:
func (o *Once) Do(f func()) {
    // Note: Here is an incorrect implementation of Do:
    //
    //    if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
    //        f()
    //    }
    //
    // Do guarantees that when it returns, f has finished.
    // This implementation would not implement that guarantee:
    // given two simultaneous calls, the winner of the cas would
    // call f, and the second would return immediately, without
    // waiting for the first's call to f to complete.
    // This is why the slow path falls back to a mutex, and why
    // the atomic.StoreUint32 must be delayed until after f returns.
    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}
Why is atomic.StoreUint32 used, rather than, say o.done = 1? Are these not equivalent? What are the differences?
Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with weak memory model?
Remember, unless you are writing the assembly by hand, you are not programming to your machine's memory model, you are programming to Go's memory model. This means that even if primitive assignments are atomic with your architecture, Go requires the use of the atomic package to ensure correctness across all supported architectures.
Access to the done flag outside of the mutex only needs to be safe, not strictly ordered, so atomic operations can be used instead of always obtaining a lock with a mutex. This is an optimization to make the fast path as efficient as possible, allowing sync.Once to be used in hot paths.
The mutex used for doSlow is for mutual exclusion within that function alone, to ensure that only one caller ever makes it to f() before the done flag is set. The flag is written using atomic.StoreUint32, because it may happen concurrently with atomic.LoadUint32 outside of the critical section protected by the mutex.
Reading the done field concurrently with writes, even atomic writes, is a data race. Just because the field is read atomically does not mean you can use normal assignment to write it; hence the flag is checked first with atomic.LoadUint32 and written with atomic.StoreUint32.
The direct read of done within doSlow is safe, because it is protected from concurrent writes by the mutex. Reading the value concurrently with atomic.LoadUint32 is safe because both are read operations.
Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with weak memory model?
Yes, you are thinking in the right direction, but note that even if the target machine has a strong memory model, the Go compiler can and will reorder instructions as long as the result adheres to the Go memory model. Conversely, even if the machine's memory model is weaker than the language's, the compiler has to emit additional barriers so that the final code complies with the language specification.
Let's consider an implementation of sync.Once without sync/atomic, with modifications to make the explanation easier:
func (o *Once) Do(f func()) {
    if o.done == 0 { // (1)
        o.m.Lock()         // (2)
        defer o.m.Unlock() // (3)
        if o.done == 0 { // (4)
            f()         // (5)
            o.done = 1 // (6)
        }
    }
}
If a goroutine observes that o.done != 0, it returns immediately; as a result, the function must ensure that f() happens before any read that can observe a 1 from o.done.
If the read is at (4), it is protected by the mutex, so it is guaranteed to happen after the earlier critical section that executed f and set o.done to 1.
If the read is at (1), we don't have the protection of the mutex, so we must construct a synchronizes-with relationship from the write at (6) in the writing goroutine to the read at (1) in the current goroutine. Once we have that, since (5) is sequenced before (6), a read of 1 at (1) is guaranteed to happen after the execution of (5), by the transitivity of the happens-before relationship.
As a result, the write at (6) must have release semantics and the read at (1) acquire semantics. Since Go does not expose acquire loads and release stores, we must resort to the stronger ordering, sequential consistency, which atomic.LoadUint32 and atomic.StoreUint32 provide.
Final note: on the platforms Go supports, aligned accesses to memory locations no larger than a machine word do not tear, so this usage of atomic has little to do with atomicity and everything to do with synchronisation.
func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 { // (1)
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 { // (2)
        defer atomic.StoreUint32(&o.done, 1) // (3)
        f()
    }
}
(1) and (3): (1) is a read and (3) is a write of the same flag, and they can happen concurrently, so both must be atomic to be safe.
(2) and (3): both happen inside the critical section protected by the mutex, so they are safe.
Atomic operations can be used to synchronize the execution of different goroutines.
Without synchronization, even if a goroutine observes o.done == 1, there is no guarantee that it will observe the effect of f().
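That synchronization is exactly what makes the usual lazy-initialization pattern safe for every caller, not just the first one (config and getConfig are illustrative names, not part of sync):

var (
    once   sync.Once
    config map[string]string
)

func getConfig() map[string]string {
    once.Do(func() {
        config = map[string]string{"endpoint": "https://example.com"}
    })
    // Safe: Do only returns after f has completed, and the synchronization
    // inside Once guarantees the write to config is visible here.
    return config
}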

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I want to write a value and do not care if another goroutine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants, good and bad, of an example program; both seem to produce "correct" output using the current Go runtime:
package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"
)

var bogus = flag.Bool("bogus", false, "use bogus code")

func pause() {
    time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}

func bad() {
    stop := time.After(100 * time.Millisecond)
    var name string
    // start some producers doing concurrent writes (DANGER!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            name = fmt.Sprintf("name = %d", i)
        }(i)
    }
    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        for {
            select {
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()
    <-stop
}

func good() {
    stop := time.After(100 * time.Millisecond)
    names := make(chan string, 10)
    // start some producers concurrently writing to a channel (GOOD!)
    for i := 0; i < 10; i++ {
        go func(i int) {
            pause()
            names <- fmt.Sprintf("name = %d", i)
        }(i)
    }
    // start consumer that shows the current value every 10ms
    go func() {
        tick := time.Tick(10 * time.Millisecond)
        var name string
        for {
            select {
            case name = <-names:
            case <-stop:
                return
            case <-tick:
                fmt.Println("read:", name)
            }
        }
    }()
    <-stop
}

func main() {
    flag.Parse()
    if *bogus {
        bad()
    } else {
        good()
    }
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name = [0-9] is correct output for this program; receiving any other string as output would be an error.
When running this program with go run --race bogus.go, no race is reported.
However, go run --race bogus.go -bogus warns about the concurrent reads and writes.
For map types and when appending to slices, I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutex, without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the concurrent-writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware and Go as a programming language give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
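For example, a minimal sketch of publication via a channel: the memory model guarantees that the close happens before the receive completes, so the receiver must observe the write to x:

package main

import "fmt"

func main() {
    var x string
    done := make(chan struct{})
    go func() {
        x = "hello" // this write happens before the close ...
        close(done) // ... which happens before the receive below completes
    }()
    <-done
    fmt.Println(x) // guaranteed to print "hello"
}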
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
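For completeness, here is a sketch of the extra level of indirection described in the note above, using the generic atomic.Pointer type added in Go 1.19 (and, as the answer advises, benchmark the mutex or channel version first); it assumes imports of fmt and sync/atomic:

var name atomic.Pointer[string]

func writer(i int) {
    s := fmt.Sprintf("name = %d", i)
    name.Store(&s) // publish a whole new string with one atomic pointer store
}

func reader() string {
    if p := name.Load(); p != nil {
        return *p // never garbage: the pointed-to string is never mutated
    }
    return ""
}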
Which Go data types can I safely read and safely write concurrently without a mutex, without producing segfaults and without reading garbage from memory?
None.
It really is that simple: you cannot, under any circumstance whatsoever, read and write anything concurrently in Go without synchronization.
(Btw: your "correct" program is not correct; it is racy, and even if you got rid of the race condition it would not deterministically produce the output.)
Why not use channels?
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup // wait group to close the channel
    var buffer int = 1    // buffer of the channel

    // channel to share the data
    cName := make(chan string, buffer)

    for i := 0; i < 10; i++ {
        wg.Add(1) // add to the wait group
        go func(i int) {
            cName <- fmt.Sprintf("name = %d", i)
            wg.Done() // decrease the wait group
        }(i)
    }

    go func() {
        wg.Wait()    // wait for the wait group to reach 0
        close(cName) // close the channel
    }()

    // process all the data
    for n := range cName {
        println("read:", n)
    }
}
The above code produces output like the following (the order of lines may vary between runs):
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS

Is there a race condition in the golang implementation of mutex? The m.state is read without an atomic function

In golang, if two goroutines read and write a variable without a mutex or atomic operations, that may produce a data race.
The command go run --race xxx.go will detect the race point.
Yet the implementation of Mutex in src/sync/mutex.go uses the following code:
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }

    var waitStartTime int64
    starving := false
    awoke := false
    iter := 0
    old := m.state // This line confuses me!
    ......
The line old := m.state confuses me, because m.state is read and written by different goroutines.
The following function Test obviously has a race condition. But if I put it in mutex.go, no race condition is detected.
// mutex.go
func Test() {
    a := int32(1)
    go func() {
        atomic.CompareAndSwapInt32(&a, 1, 4)
    }()
    _ = a
}
If I put it in another package, such as src/os/exec.go, the race condition is detected.
package main

import (
    "os"
    "sync"
)

func main() {
    sync.Test() // race condition will not be detected
    os.Test()   // race condition will be detected
}
First of all, the golang source always changes, so let's make sure we are looking at the same thing. Take release 1.12 at
https://github.com/golang/go/blob/release-branch.go1.12/src/sync/mutex.go
As you said, the Lock function begins:
func (m *Mutex) Lock() {
    // fast path where it will set the high order bit and return if not locked
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        return
    }

    // reads value to decide on the lower order bits
    for {
        // if statements involving CompareAndSwaps on the lower order bits
    }
}
What is this CompareAndSwap doing? It looks atomically at that int32 and, if it is 0, swaps it to mutexLocked (the constant 1 defined in the const block above) and returns true, meaning it performed the swap.
Then Lock promptly returns. That is the fast path: the goroutine has acquired the lock and can start running its protected code.
If the state is already 1 (mutexLocked), the CAS doesn't swap it and returns false.
Then Lock reads the state and enters a loop of atomic compare-and-swaps to determine how it should behave.
What are the possible states? Combinations of locked, woken, and starving, as you can see from the const block.
Depending on how long the goroutine has been waiting on the wait list, it gets priority on when to check again whether the mutex is now free.
But also observe that only Unlock() can set the mutexLocked bit back to 0;
in the Lock() CAS loop the only bits that are set are the starving and woken ones. Yes, you can have multiple readers, but only one writer at any time, and that writer is the one holding the mutex, executing its protected path until it calls Unlock(). Check out this article for more details.
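To see the CAS fast path in isolation, here is a toy spinlock sketch (illustration only; it assumes imports of runtime and sync/atomic, and real code should use sync.Mutex, which parks waiting goroutines instead of spinning):

type spinLock int32

func (l *spinLock) Lock() {
    // Same fast path as sync.Mutex: the CAS 0 -> 1 succeeds only if the lock is free.
    for !atomic.CompareAndSwapInt32((*int32)(l), 0, 1) {
        runtime.Gosched() // lock is held by someone else; yield and retry
    }
}

func (l *spinLock) Unlock() {
    // Atomic store of 0, so concurrent CAS attempts in Lock observe it safely.
    atomic.StoreInt32((*int32)(l), 0)
}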
Disassembling the binary shows that the Test function generates different code depending on the package.
The reason is that the compiler does not emit race-detector instrumentation for the sync package.
The code is:
var norace_inst_pkgs = []string{"sync", "sync/atomic"} // https://github.com/golang/go/blob/release-branch.go1.12/src/cmd/compile/internal/gc/racewalk.go
