What happens if concurrent processes write to a global variable the same value?


I'm just wondering if there is potential for corruption as a result of writing the same value to a global variable at the same time. My brain is telling me there is nothing wrong with this because it's just a location in memory, but I figure I should probably double-check this assumption.
I have concurrent processes writing to a global map var linksToVisit map[string]bool. The map is actually tracking what links on a website need to be further crawled.
However, concurrent processes may find the same link on their respective pages, and each will then mark that same link as true concurrently. There's nothing wrong with NOT using locks in this case, right? NOTE: I never change the value back to false, so either the key exists and its value is true, or it doesn't exist.
I.e.
var linksToVisit = map[string]bool{}
...
// somewhere later a goroutine finds a link and marks it as true
// it is never marked as false anywhere
linksToVisit[someLink] = true

What happens if concurrent processes write to a global variable the
same value?
The results of a data race are undefined.
Run the Go data race detector.
References:
Wikipedia: Race condition
Benign Data Races: What Could Possibly Go Wrong?
The Go Blog: Introducing the Go Race Detector
Go: Data Race Detector
Go 1.8 Release Notes
Concurrent Map Misuse
In Go 1.6, the runtime added lightweight, best-effort detection of
concurrent misuse of maps. This release improves that detector with
support for detecting programs that concurrently write to and iterate
over a map.
As always, if one goroutine is writing to a map, no other goroutine
should be reading (which includes iterating) or writing the map
concurrently. If the runtime detects this condition, it prints a
diagnosis and crashes the program. The best way to find out more about
the problem is to run the program under the race detector, which will
more reliably identify the race and give more detail.
For example,
package main
import "time"
var linksToVisit = map[string]bool{}
func main() {
someLink := "someLink"
go func() {
for {
linksToVisit[someLink] = true
}
}()
go func() {
for {
linksToVisit[someLink] = true
}
}()
time.Sleep(100 * time.Millisecond)
}
Output:
$ go run racer.go
fatal error: concurrent map writes
$
$ go run -race racer.go
==================
WARNING: DATA RACE
Write at 0x00c000078060 by goroutine 6:
runtime.mapassign_faststr()
/home/peter/go/src/runtime/map_faststr.go:190 +0x0
main.main.func2()
/home/peter/gopath/src/racer.go:16 +0x6a
Previous write at 0x00c000078060 by goroutine 5:
runtime.mapassign_faststr()
/home/peter/go/src/runtime/map_faststr.go:190 +0x0
main.main.func1()
/home/peter/gopath/src/racer.go:11 +0x6a
Goroutine 6 (running) created at:
main.main()
/home/peter/gopath/src/racer.go:14 +0x88
Goroutine 5 (running) created at:
main.main()
/home/peter/gopath/src/racer.go:9 +0x5b
==================
fatal error: concurrent map writes
$

It is better to use locks if multiple goroutines change the same value concurrently. A mutex protects a value from being accessed while another function is changing it, much like guarding a database table against reads while it is being written.
As for writing to a map from multiple goroutines, even with different keys, this is not safe in Go. From the Go FAQ:
The typical use of maps did not require safe access from multiple
goroutines, and in those cases where it did, the map was probably part
of some larger data structure or computation that was already
synchronized. Therefore requiring that all map operations grab a mutex
would slow down most programs and add safety to few.
Map access is unsafe only when updates are occurring. As long as all
goroutines are only reading—looking up elements in the map, including
iterating through it using a for range loop—and not changing the map
by assigning to elements or doing deletions, it is safe for them to
access the map concurrently without synchronization.
So unsynchronized concurrent map updates are not recommended. For more information, check the FAQ entry on why map operations are not defined to be atomic.
If you really do need concurrent updates, you must synchronize them. From the Go blog:
Maps are not safe for concurrent use: it's not defined what happens
when you read and write to them simultaneously. If you need to read
from and write to a map from concurrently executing goroutines, the
accesses must be mediated by some kind of synchronization mechanism.
One common way to protect maps is with sync.RWMutex.
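As a minimal sketch of that advice applied to the crawler in the question, the map can be wrapped together with a sync.RWMutex. The linkSet type and its Mark, Has and Len methods are hypothetical names chosen for illustration; they are not part of any library.

```go
package main

import (
	"fmt"
	"sync"
)

// linkSet is a hypothetical wrapper that keeps the map and its lock together.
type linkSet struct {
	mu    sync.RWMutex
	links map[string]bool
}

func newLinkSet() *linkSet {
	return &linkSet{links: make(map[string]bool)}
}

// Mark records a link while holding the write lock.
func (s *linkSet) Mark(link string) {
	s.mu.Lock()
	s.links[link] = true
	s.mu.Unlock()
}

// Has reports whether a link was marked, holding only the read lock.
func (s *linkSet) Has(link string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.links[link]
}

// Len returns the number of distinct links recorded so far.
func (s *linkSet) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.links)
}

func main() {
	s := newLinkSet()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Mark("https://example.com/a") // every goroutine writes the same key
		}()
	}
	wg.Wait()
	fmt.Println(s.Has("https://example.com/a"), s.Len()) // true 1
}
```

Because every access goes through the lock, writing the same key from many goroutines is no longer a data race, and the program should run cleanly under go run -race.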

Concurrent map writes are not OK, so you will most likely get a fatal error. I think a lock should be used.

As of Go 1.6, simultaneous map writes that the runtime detects will crash the program with a fatal error. Use a sync.Map (available since Go 1.9) to synchronize access.
See the map value assign implementation:
https://github.com/golang/go/blob/fe8a0d12b14108cbe2408b417afcaab722b0727c/src/runtime/hashmap.go#L519
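A short sketch of the sync.Map suggestion, applied to the linksToVisit example from the question. The markLink helper is a hypothetical name; LoadOrStore is the standard sync.Map method that stores a value only if the key is absent and reports whether the key was already there.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// linksToVisit is safe for concurrent use without additional locking.
var linksToVisit sync.Map

// markLink (hypothetical helper) records a link and returns true only for
// the first caller that records it.
func markLink(link string) bool {
	_, loaded := linksToVisit.LoadOrStore(link, true)
	return !loaded
}

func main() {
	var firsts int32
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if markLink("someLink") {
				atomic.AddInt32(&firsts, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt32(&firsts)) // 1
}
```

A side benefit for a crawler is that the return value tells each goroutine whether it was the first to see the link, so only that goroutine needs to schedule the crawl.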

Related

How does a mutex.Lock() know which variables to lock?

I'm a go-newbie, so please be gentle.
So I've been using mutexes in some of my code for a couple weeks now. I understand the concept behind it: lock access to a certain resource, interact with it (read or write), and then unlock it for others again.
The mutex code I use is mostly copy-paste-adjust. The code runs, but I'm still trying to wrap my head around its internal workings. Until now I've always used a mutex within a struct to lock the struct. Today I found this example though, which made it completely unclear to me what the mutex actually locks. Below is a piece of the example code:
var state = make(map[int]int)
var mutex = &sync.Mutex{}
var readOps uint64
var writeOps uint64
// Here we start 100 goroutines to execute repeated reads against the state, once per millisecond in each goroutine.
for r := 0; r < 100; r++ {
go func() {
total := 0
for {
key := rand.Intn(5)
mutex.Lock()
total += state[key]
mutex.Unlock()
atomic.AddUint64(&readOps, 1)
time.Sleep(time.Millisecond)
}
}()
}
What puzzles me here is that there doesn't seem to be any connection between the mutex and the value it is supposed to lock. Until today I thought that the mutex can lock a specific variable, but looking at this code it seems to somehow lock the whole program into doing only the lines below the lock, until the unlock is run again. I suppose that means that all the other goroutines are paused for a moment until the unlock is run again. Since the code is compiled I suppose it can know what variables are accessed between the lock() and the unlock(), but I'm not sure if that is the case.
If all other programs pause for a moment, it doesn't sound like real multi-processing, so I'm guessing I don't have a good understanding of what's going on.
Could anybody help me out in understanding how the computer knows which variables it should lock?

lock access to a certain resource, interact with it (read or write), and then unlock it for others again.
Basically yes.
What puzzles me here is that there doesn't seem to be any connection between the mutex and the value it is supposed to lock.
A mutex is just a mutual exclusion object that synchronizes access to a resource. If two goroutines try to lock the same mutex, only the first succeeds; the second waits until the first unlocks it and it can take the lock itself. There is no connection to variables whatsoever: the association between a mutex and the data it protects exists only by convention in your code. You can use a mutex to serialize anything, for example allowing only one HTTP request, only one database read/write operation, or only one variable assignment at a time. While I don't advise using a mutex for all of those, the general idea should be clear.
but looking at this code it seems to somehow lock the whole program into doing only the lines below the lock, until the unlock is run again.
Not the whole program; only goroutines that try to lock the same mutex wait until they can.
I suppose that means that all the other goroutines are paused for a moment until the unlock is run again.
No, they don't pause. They keep executing until they try to lock the same mutex.
If you want to group your mutex specifically with a variable, why not create a struct?
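The struct suggestion can be sketched as follows. The safeState type and its Add and Get methods are hypothetical names used only for illustration. The mutex still knows nothing about the map; the pairing is enforced purely by only touching data inside methods that hold mu.

```go
package main

import (
	"fmt"
	"sync"
)

// safeState (hypothetical) groups a map with the mutex that guards it.
type safeState struct {
	mu   sync.Mutex
	data map[int]int
}

// Add increments the value stored under key while holding the lock.
func (s *safeState) Add(key, delta int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] += delta
}

// Get returns the value stored under key while holding the lock.
func (s *safeState) Get(key int) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

func main() {
	s := &safeState{data: make(map[int]int)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Add(1, 1) // goroutines only wait for each other inside Add
		}()
	}
	wg.Wait()
	fmt.Println(s.Get(1)) // 100
}
```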

Why is the code in the loop not executed when I have two goroutines?

I'm facing a problem in golang
var a = 0
func main() {
go func() {
for {
a = a + 1
}
}()
time.Sleep(time.Second)
fmt.Printf("result=%d\n", a)
}
expected: result=(a big int number)
result: result=0

You have a data race. Run your program with the -race flag:
go run -race main.go
==================
WARNING: DATA RACE
Read at 0x0000005e9600 by main goroutine:
main.main()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:17 +0x6c
Previous write at 0x0000005e9600 by goroutine 6:
main.main.func1()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:13 +0x56
Goroutine 6 (running) created at:
main.main()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:11 +0x46
==================
result=119657339
Found 1 data race(s)
exit status 66
What is the solution?
There are several; one is to use a mutex:
var a = 0
func main() {
var mu sync.Mutex
go func() {
for {
mu.Lock()
a = a + 1
mu.Unlock()
}
}()
time.Sleep(3*time.Second)
mu.Lock()
fmt.Printf("result=%d\n", a)
mu.Unlock()
}
Lock the mutex before every read and write, and unlock it afterwards. Now there is no data race, and the result will be a large integer at the end.
For more information read this topic.
Data races in Go(Golang) and how to fix them
and this
Golang concurrency - data races
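For a plain counter like this one, the sync/atomic package is another way to remove the race. The following is a sketch under that assumption, not the answer author's code; the bump and read helpers are hypothetical names. The printed value varies from run to run.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// bump (hypothetical helper) atomically increments the counter n times.
func bump(counter *int64, n int) {
	for i := 0; i < n; i++ {
		atomic.AddInt64(counter, 1)
	}
}

// read (hypothetical helper) atomically loads the current counter value.
func read(counter *int64) int64 {
	return atomic.LoadInt64(counter)
}

func main() {
	var a int64
	go func() {
		for {
			bump(&a, 1)
		}
	}()
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("result=%d\n", read(&a)) // some large number, no data race
}
```

Atomics suit a single numeric value well; once several variables have to change together, a mutex is usually the simpler choice.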

As other writers have mentioned, you have a data race, but if you are comparing this behavior to, say, a program written in C using pthreads, you are missing some important data. Your problem is not just about timing, it's about the very language definition. Because concurrency primitives are baked into the language itself, the Go language memory model (https://golang.org/ref/mem) describes exactly when and how changes in one goroutine -- think of goroutines as "super-lightweight user-space threads" and you won't be too far off -- are guaranteed to be visible to code running in another goroutine.
Without any synchronizing actions, like channel sends/receives or sync.Mutex locks/unlocks, the Go memory model says that any changes you make to 'a' inside that goroutine don't ever have to be visible to the main goroutine. And, since the compiler knows that, it is free to optimize away pretty much everything in your for loop. Or not.
It's a similar situation to when you have, say, a local int variable in C set to 1, and maybe you have a while loop reading that variable in a loop waiting for it to be set to 0 by an ISR, but then your compiler gets too clever and decides to optimize away the test for zero because it thinks your variable can't ever change within the loop and you really just wanted an infinite loop, and so you have to declare the variable as volatile to fix the 'bug'.
If you are going to be working in Go, (my current favorite language, FWIW,) take time to read and thoroughly grok the Go memory model linked above, and it will really pay off in the future.

Your program has a race condition, and Go can detect such scenarios.
Try running your program with go run -race main.go, assuming your file is named main.go. It will show how the race occurred:
a write inside the goroutine,
a simultaneous read by the main goroutine.
It will also print a large integer, as you expected.

How safe are Golang maps for concurrent Read/Write operations?

According to the Go blog,
Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism.
(source: https://blog.golang.org/go-maps-in-action)
Can anyone elaborate on this? Concurrent read operations seem permissible across routines, but concurrent read/write operations may generate a race condition if one attempts to read from and write to the same key.
Can this last risk be reduced in some cases? For example:
Function A generates k and sets m[k]=0. This is the only time A writes to map m. k is known to not be in m.
A passes k to function B running concurrently
A then reads m[k]. If m[k]==0, it waits, continuing only when m[k]!=0
B looks for k in the map. If it finds it, B sets m[k] to some positive integer. If it doesn't it waits until k is in m.
This isn't code (obviously) but I think it shows the outlines of a case where even if A and B both try to access m there won't be a race condition, or if there is it won't matter because of the additional constraints.

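Before Go 1.6, the runtime did not detect concurrent map misuse, so a write with concurrent reads often appeared to work even though it was never safe. Concurrent reads alone were and are fine. Since Go 1.6, the runtime detects a map being read while it is written and crashes the program.
So, since Go 1.6, concurrent map access should look like this: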
package main
import (
"sync"
"time"
)
var m = map[string]int{"a": 1}
var lock = sync.RWMutex{}
func main() {
go Read()
time.Sleep(1 * time.Second)
go Write()
time.Sleep(1 * time.Minute)
}
func Read() {
for {
read()
}
}
func Write() {
for {
write()
}
}
func read() {
lock.RLock()
defer lock.RUnlock()
_ = m["a"]
}
func write() {
lock.Lock()
defer lock.Unlock()
m["b"] = 2
}
Otherwise the program crashes with a fatal error reporting the concurrent map access.
You can detect the race by running go run -race race.go.
To reproduce it, remove the locking from the read function:
func read() {
// lock.RLock()
// defer lock.RUnlock()
_ = m["a"]
}
Another choice:
As we know, a map is implemented with buckets, and a single sync.RWMutex locks all of them at once. concurrent-map uses fnv32 to shard keys, and every shard has its own sync.RWMutex.

Concurrent read (read only) is ok. Concurrent write and/or read is not ok.
Multiple goroutines can only write and/or read the same map if access is synchronized, e.g. via the sync package, with channels or via other means.
Your example:
Function A generates k and sets m[k]=0. This is the only time A writes to map m. k is known to not be in m.
A passes k to function B running concurrently
A then reads m[k]. If m[k]==0, it waits, continuing only when m[k]!=0
B looks for k in the map. If it finds it, B sets m[k] to some positive integer. If it doesn't it waits until k is in m.
Your example has 2 goroutines: A and B, and A tries to read m (in step 3) and B tries to write it (in step 4) concurrently. There is no synchronization (you didn't mention any), so this alone is not permitted / not determined.
What does it mean? Not determined means even though B writes m, A may never observe the change. Or A may observe a change that didn't even happen. Or a panic may occur. Or the Earth may explode due to this non-synchronized concurrent access (although the chance of this latter case is extremely small, maybe even less than 1e-40).
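One way to make the A and B scenario well defined is to let only A touch the map and have B return its result over a channel. The sketch below assumes that restructuring is acceptable; runAB and the len(key) + 1 computation are hypothetical stand-ins for the real work.

```go
package main

import "fmt"

// runAB (hypothetical) plays the role of A: it inserts k, hands k to B over
// a channel, and stores the value B sends back. Only A ever touches m.
func runAB(m map[string]int, k string) int {
	m[k] = 0 // step 1: A inserts the key

	keys := make(chan string)
	results := make(chan int)

	go func() { // B
		key := <-keys           // B waits until A hands over the key
		results <- len(key) + 1 // stand-in for computing a positive integer
	}()

	keys <- k        // step 2: A passes k to B
	m[k] = <-results // step 3: A blocks here instead of polling m[k]
	return m[k]
}

func main() {
	m := map[string]int{}
	fmt.Println(runAB(m, "abc")) // 4
}
```

The channel operations give the ordering guarantees that polling m[k] cannot, and since the map is only accessed from one goroutine it needs no lock.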
Related questions:
Map with concurrent access
what does not being thread safe means about maps in Go?
What is the danger of neglecting goroutine/thread-safety when using a map in Go?

Go 1.6 Release Notes
The runtime has added lightweight, best-effort detection of concurrent
misuse of maps. As always, if one goroutine is writing to a map, no
other goroutine should be reading or writing the map concurrently. If
the runtime detects this condition, it prints a diagnosis and crashes
the program. The best way to find out more about the problem is to run
the program under the race detector, which will more reliably identify
the race and give more detail.
Maps are complex, self-reorganizing data structures. Concurrent read and write access is undefined.
Without code, there's not much else to say.

After long discussion it was decided that the typical use of maps did not require safe access from multiple goroutines, and in those cases where it did, the map was probably part of some larger data structure or computation that was already synchronized. Therefore requiring that all map operations grab a mutex would slow down most programs and add safety to few. This was not an easy decision, however, since it means uncontrolled map access can crash the program.
The language does not preclude atomic map updates. When required, such as when hosting an untrusted program, the implementation could interlock map access.
Map access is unsafe only when updates are occurring. As long as all goroutines are only reading—looking up elements in the map, including iterating through it using a for range loop—and not changing the map by assigning to elements or doing deletions, it is safe for them to access the map concurrently without synchronization.
As an aid to correct map use, some implementations of the language contain a special check that automatically reports at run time when a map is modified unsafely by concurrent execution.

You can use sync.Map which is safe for concurrent use. The only caveat is that you are gonna give up on type safety and change all the reads and writes to your map to use the methods defined for this type

You can store a pointer to an int in the map, and have multiple goroutines read the int being pointed to while another writes a new value to the int. The map is not being updated in this case.
This wouldn't be idiomatic for Go and not what you were asking.
Or instead of passing a key to a map, you could pass the index to an array, and have that updated by one goroutine while others read the location.
But you're probably just wondering why a map's value can't be updated with a new value when the key is already in the map. Presumably nothing about the map's hashing scheme is being changed - at least not given their current implementation. It would seem the Go authors don't want to make allowances for such special cases. Generally they want code to be easy to read and understand, and a rule like not allowing map writes when other goroutines could be reading keeps things simple and now in 1.6 they can even start to catch misuse during normal runtimes - saving many people many hours of debugging.

As the other answers here stated, the native map type is not goroutine-safe. A couple of notes after reading the current answers:
Do not use defer to unlock, it has some overhead that affects performance (see this nice post). Call unlock directly.
You can achieve better performance by reducing time spent between locks. For example, by sharding the map.
There is a common package (approaching 400 stars on GitHub) used to solve this called concurrent-map here which has performance and usability in mind. You could use it to handle the concurrency issues for you.
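To illustrate the sharding idea only, here is a rough sketch. It is not the concurrent-map package's actual code or API; shardedMap, shardFor, Set, Get and the shard count of 16 are all assumptions made for the example. Keys are hashed with FNV-1a from the standard hash/fnv package, and each shard has its own lock.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16 // arbitrary choice for the sketch

// shard is one independently locked piece of the map.
type shard struct {
	mu sync.RWMutex
	m  map[string]int
}

// shardedMap (hypothetical) spreads keys over shardCount shards.
type shardedMap [shardCount]*shard

func newShardedMap() *shardedMap {
	var s shardedMap
	for i := range s {
		s[i] = &shard{m: make(map[string]int)}
	}
	return &s
}

// shardFor picks the shard for a key from its FNV-1a hash.
func (s *shardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return s[h.Sum32()%shardCount]
}

// Set stores a value, locking only the shard that owns the key.
func (s *shardedMap) Set(key string, value int) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	sh.m[key] = value
	sh.mu.Unlock()
}

// Get reads a value, taking only that shard's read lock.
func (s *shardedMap) Get(key string) (int, bool) {
	sh := s.shardFor(key)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	v, ok := sh.m[key]
	return v, ok
}

func main() {
	sm := newShardedMap()
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sm.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
	}
	wg.Wait()
	v, ok := sm.Get("key-42")
	fmt.Println(v, ok) // 42 true
}
```

Goroutines working on keys in different shards no longer contend for the same lock, which is where the performance gain comes from.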

A map is safe for concurrent use in Go only when it is read-only. If your map is fully written first and never written again, you don't need a mutex or any other mechanism to serialize access from goroutines. Below is an example of safe concurrent reads from a map.
package main
import (
"fmt"
"sync"
)
var freq map[int]int
// An example of concurrent read from a map
func main() {
// Map is written before accessing from go routines
freq = make(map[int]int)
freq[1] = 1
freq[2] = 2
wg := sync.WaitGroup{}
wg.Add(10)
for i:=1;i<=10;i++ {
// In go routine we are only reading val from map
go func(id int, loop int) {
defer wg.Done()
fmt.Println("In loop ", loop)
fmt.Println("Freq of 1: ", freq[id])
}(1, i)
}
wg.Wait()
}

How are Go channels implemented?

After (briefly) reviewing the Go language spec, effective Go, and the Go memory model, I'm still a little unclear as to how Go channels work under the hood.
What kind of structure are they? They act kind of like a thread-safe queue /array.
Does their implementation depend on the architecture?

The source file for channels is (from your go source code root) in /src/pkg/runtime/chan.go.
hchan is the central data structure for a channel, with send and receive linked lists (holding a pointer to their goroutine and the data element) and a closed flag. There's a Lock embedded structure that is defined in runtime2.go and that serves as a mutex (futex) or semaphore depending on the OS. The locking implementation is in lock_futex.go (Linux/Dragonfly/Some BSD) or lock_sema.go (Windows/OSX/Plan9/Some BSD), based on the build tags.
Channel operations are all implemented in this chan.go file, so you can see the makechan, send and receive operations, as well as the select construct, close, len and cap built-ins.
For a great in-depth explanation on the inner workings of channels, you have to read Go channels on steroids by Dmitry Vyukov himself (Go core dev, goroutines, scheduler and channels among other things).

Here is a good talk that describes roughly how channels are implemented:
https://youtu.be/KBZlN0izeiY
Talk description:
GopherCon 2017: Kavya Joshi - Understanding Channels
Channels provide a simple mechanism for goroutines to communicate, and a powerful construct to build sophisticated concurrency patterns. We will delve into the inner workings of channels and channel operations, including how they're supported by the runtime scheduler and memory management systems.

You asked two questions:
What kind of structure are they?
Channels in go are indeed "kind of like a thread-safe queue", to be more precise, channels in Go have the following properties:
goroutine-safe
Provide FIFO semantics
Can store and pass values between goroutines
Cause goroutines to block and unblock
Every time you create a channel, an hchan struct is allocated on the heap, and a pointer to the hchan memory location is returned represented as a channel, this is how go-routines can share it.
The first two properties described above are implemented similarly to a queue with a lock.
The elements that the channel can pass to different go-routines are implemented as a circular queue (ring buffer) with indices in the hchan struct, the indices account for the position of elements in the buffer.
Circular queue:
qcount uint // total data in the queue
dataqsiz uint // size of the circular queue
buf unsafe.Pointer // points to an array of dataqsiz elements
And the indices:
sendx uint // send index
recvx uint // receive index
Every time a go-routine needs to access the channel structure and modify its state, it holds the lock, e.g. to copy elements to or from the buffer, or to update lists or an index. Some operations are optimized to be lock-free, but this is out of scope for this answer.
The block and un-block property of go channels is achieved using two queues (linked lists) that hold the blocked go-routines
recvq waitq // list of recv waiters
sendq waitq // list of send waiters
Every time a go-routine wants to add a task to a full channel (buffer is full), or to take a task from an empty channel (buffer is empty), a pseudo go-routine sudog struct is allocated and the go-routine adds the sudog as a node to the send or receive waiters list accordingly. Then the go-routine updates the go runtime scheduler using special calls, which hints when they should be taken out of execution (gopark) or ready to run (goready).
Note that this is a very simplified explanation that hides some complexities.
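The circular-queue part of the description can be sketched as a toy type. This is not the runtime's implementation: the ring type, trySend and tryRecv are hypothetical, only the field names qcount, sendx and recvx are borrowed from hchan, and where a real channel would park the goroutine on sendq or recvq this sketch simply returns false.

```go
package main

import (
	"fmt"
	"sync"
)

// ring is a toy lock-protected circular queue of ints.
type ring struct {
	mu     sync.Mutex
	buf    []int // fixed-size buffer, like hchan.buf
	qcount int   // number of elements currently queued
	sendx  int   // index where the next send writes
	recvx  int   // index where the next receive reads
}

func newRing(size int) *ring {
	return &ring{buf: make([]int, size)}
}

// trySend enqueues v, or returns false if the buffer is full.
func (r *ring) trySend(v int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.qcount == len(r.buf) {
		return false // a real channel would block the sender here
	}
	r.buf[r.sendx] = v
	r.sendx = (r.sendx + 1) % len(r.buf) // wrap around
	r.qcount++
	return true
}

// tryRecv dequeues the oldest value, or returns false if the buffer is empty.
func (r *ring) tryRecv() (int, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.qcount == 0 {
		return 0, false // a real channel would block the receiver here
	}
	v := r.buf[r.recvx]
	r.recvx = (r.recvx + 1) % len(r.buf) // wrap around
	r.qcount--
	return v, true
}

func main() {
	r := newRing(2)
	fmt.Println(r.trySend(1), r.trySend(2), r.trySend(3)) // true true false
	v, ok := r.tryRecv()
	fmt.Println(v, ok)        // 1 true
	fmt.Println(r.trySend(3)) // true
}
```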
Does their implementation depend on the architecture?
Besides the lock implementation, which is OS-specific as @mna already explained, I'm not aware of any architecture-specific constraints, optimizations or differences.

A simpler way to look at channels: you may want to hold part of a program up until some condition is met. This is typically used to prevent a race condition, where one goroutine has not finished before another starts, so something your later code depends on is sometimes not ready.
For example, one goroutine retrieves data from a database or another server and places it into a variable, slice or map, and for some reason it gets delayed. Another part of the program then uses that variable, but since it has not been initialised or has not received its data yet, the program fails.
So a simple way to look at it in code is as follows:
package main

import "fmt"

var doneA = make(chan bool)
var doneB = make(chan bool)
var doneC = make(chan bool)

func init() { // init runs automatically when the program starts.
	go func() {
		doneA <- true // Signal that step A is complete.
	}()
}

func initB() {
	go func() {
		a := <-doneA // Blocks here until a value arrives on doneA.
		// Do something here.
		fmt.Println(a)
		doneB <- true // Signal that step B is finished.
	}()
}

func initC() {
	go func() {
		<-doneB // Still blocks, but the value itself is ignored.
		// Some code here.
		doneC <- true // Signal that step C is finished.
	}()
}

func main() {
	initB()
	initC()
	<-doneC // Wait for the chain to finish; without this, main may return before the goroutines run.
}
I hope this helps. It is not the selected answer above, but I believe it should help remove the mystery. I wonder if I should make a question and self-answer?

Why is this Go code blocking?

I wrote the following program:
package main
import (
"fmt"
)
func processevents(list chan func()) {
for {
//a := <-list
//a()
}
}
func test() {
fmt.Println("Ho!")
}
func main() {
eventlist := make(chan func(), 100)
go processevents(eventlist)
for {
eventlist <- test
fmt.Println("Hey!")
}
}
Since the channel eventlist is a buffered channel, I think I should get the output "Hey!" exactly 100 times, but it is displayed only once. Where is my mistake?

Update (Go version 1.2+)
As of Go 1.2, the scheduler works on the principle of pre-emptive multitasking.
This means that the problem in the original question (and the solution presented below) are no longer relevant.
From the Go 1.2 release notes
Pre-emption in the scheduler
In prior releases, a goroutine that was looping forever could starve out other goroutines
on the same thread, a serious problem when GOMAXPROCS provided only one user thread.
In Go 1.2, this is partially addressed: The scheduler is invoked occasionally upon
entry to a function. This means that any loop that includes a (non-inlined) function
call can be pre-empted, allowing other goroutines to run on the same thread.
Short answer
It is not blocking on the writes. It is stuck in the infinite loop of processevents.
This loop never yields to the scheduler, causing all other goroutines to stall indefinitely.
If you comment out the call to processevents, you will get results as expected, right until the 100th write. At that point the program crashes with a deadlock error, because nobody reads from the channel.
Another solution is to put a call to runtime.Gosched() in the loop.
Long answer
With Go1.0.2, Go's scheduler works on the principle of Cooperative multitasking.
This means that it allocates CPU time to the various goroutines running within a given OS thread by having these routines interact with the scheduler in certain conditions.
These 'interactions' occur when certain types of code are executed in a goroutine.
In go's case this involves doing some kind of I/O, syscalls or memory allocation (in certain conditions).
In the case of an empty loop, no such conditions are ever encountered. The scheduler is therefore never allowed to run its scheduling algorithms for as long as that loop is running. This consequently prevents it from allotting CPU time to other goroutines waiting to be run and the result you observed ensues: You effectively created a deadlock that can not be detected or broken out of by the scheduler.
The empty loop is usually never desired in Go and will, in most cases, indicate a bug in the program. If you do need it for whatever reason, you have to manually yield to the scheduler by calling runtime.Gosched() in every iteration.
for {
runtime.Gosched()
}
Setting GOMAXPROCS to a value > 1 was mentioned as a solution. While this will get rid of the immediate problem you observed, it will effectively move the problem to a different OS thread, if the scheduler decides to move the looping goroutine to its own OS thread that is. There is no guarantee of this, unless you call runtime.LockOSThread() at the start of the processevents function. Even then, I would still not rely on this approach to be a good solution. Simply calling runtime.Gosched() in the loop itself, will solve all the issues, regardless of which OS thread the goroutine is running in.

Here is another solution - use range to read from the channel. This code will yield to the scheduler correctly and also terminate properly when the channel is closed.
func processevents(list chan func()) {
for a := range list{
a()
}
}
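A complete version of this approach might look like the sketch below. The done channel, the runEvents helper and the counting closure are assumptions added so the example terminates and can be checked; they are not part of the original program.

```go
package main

import "fmt"

// processEvents runs every queued function until the channel is closed,
// then signals completion by closing done.
func processEvents(list <-chan func(), done chan<- struct{}) {
	for f := range list {
		f()
	}
	close(done)
}

// runEvents (hypothetical helper) queues n events and returns how many ran.
func runEvents(n int) int {
	count := 0
	events := make(chan func(), 100)
	done := make(chan struct{})
	go processEvents(events, done)

	for i := 0; i < n; i++ {
		events <- func() { count++ } // only the consumer goroutine runs this
	}
	close(events) // lets the range loop in processEvents finish
	<-done        // wait until every queued event has been processed
	return count
}

func main() {
	fmt.Println(runEvents(5)) // 5
}
```

The receive in the range loop blocks while the channel is empty, so the consumer never spins, and closing the channel gives it a clean way to exit.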

Good news: since Go 1.2 (December 2013) the original program works as expected.
You may try it on Playground.
This is explained in the Go 1.2 release notes, section "Pre-emption in the scheduler" :
In prior releases, a goroutine that was looping forever could starve
out other goroutines on the same thread, a serious problem when
GOMAXPROCS provided only one user thread. In Go 1.2, this is partially
addressed: The scheduler is invoked occasionally upon entry to a
function.
