According to the Go blog,
Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism.
(source: https://blog.golang.org/go-maps-in-action)
Can anyone elaborate on this? Concurrent read operations seem permissible across routines, but concurrent read/write operations may generate a race condition if one attempts to read from and write to the same key.
Can this last risk be reduced in some cases? For example:
Function A generates k and sets m[k]=0. This is the only time A writes to map m. k is known to not be in m.
A passes k to function B running concurrently
A then reads m[k]. If m[k]==0, it waits, continuing only when m[k]!=0
B looks for k in the map. If it finds it, B sets m[k] to some positive integer. If it doesn't it waits until k is in m.
This isn't code (obviously) but I think it shows the outlines of a case where even if A and B both try to access m there won't be a race condition, or if there is it won't matter because of the additional constraints.
Before Go 1.6, concurrent reads were OK and concurrent writes were not OK; a write concurrent with reads often appeared to work, but was still undefined behavior. Since Go 1.6, the runtime detects a map being read while it is being written and crashes the program.
So since Go 1.6, concurrent map access should look like this:
package main

import (
    "sync"
    "time"
)

var m = map[string]int{"a": 1}
var lock = sync.RWMutex{}

func main() {
    go Read()
    time.Sleep(1 * time.Second)
    go Write()
    time.Sleep(1 * time.Minute)
}

func Read() {
    for {
        read()
    }
}

func Write() {
    for {
        write()
    }
}

func read() {
    lock.RLock() // multiple readers may hold the read lock at once
    defer lock.RUnlock()
    _ = m["a"]
}

func write() {
    lock.Lock() // the write lock excludes all readers and writers
    defer lock.Unlock()
    m["b"] = 2
}
Otherwise you will get a fatal error like fatal error: concurrent map read and map write.
ADDED:
You can detect the race by using go run -race race.go
Change the read function:
func read() {
    // lock.RLock()
    // defer lock.RUnlock()
    _ = m["a"]
}
Another choice:
As we know, Go maps are implemented with buckets, and a single sync.RWMutex locks all of them at once. concurrent-map instead uses an fnv32 hash to shard the keys, and every shard is guarded by its own sync.RWMutex.
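A minimal sketch of that sharding idea, assuming a fixed shard count and string keys (this is illustrative, not the actual concurrent-map implementation):

package main

import (
    "hash/fnv"
    "sync"
)

const shardCount = 32 // illustrative; concurrent-map uses a similar fixed count

type shard struct {
    sync.RWMutex
    items map[string]int
}

type shardedMap []*shard

func newShardedMap() shardedMap {
    m := make(shardedMap, shardCount)
    for i := range m {
        m[i] = &shard{items: make(map[string]int)}
    }
    return m
}

// getShard picks a shard by hashing the key with FNV-32,
// so contention is spread across shardCount locks.
func (m shardedMap) getShard(key string) *shard {
    h := fnv.New32()
    h.Write([]byte(key))
    return m[h.Sum32()%shardCount]
}

func (m shardedMap) Set(key string, value int) {
    s := m.getShard(key)
    s.Lock()
    defer s.Unlock()
    s.items[key] = value
}

func (m shardedMap) Get(key string) (int, bool) {
    s := m.getShard(key)
    s.RLock()
    defer s.RUnlock()
    v, ok := s.items[key]
    return v, ok
}

Because keys hash to different shards, writers to different shards no longer contend on a single lock.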
Concurrent read (read only) is ok. Concurrent write and/or read is not ok.
Multiple goroutines can only write and/or read the same map if access is synchronized, e.g. via the sync package, with channels or via other means.
Your example:
Function A generates k and sets m[k]=0. This is the only time A writes to map m. k is known to not be in m.
A passes k to function B running concurrently
A then reads m[k]. If m[k]==0, it waits, continuing only when m[k]!=0
B looks for k in the map. If it finds it, B sets m[k] to some positive integer. If it doesn't it waits until k is in m.
Your example has 2 goroutines: A and B, and A tries to read m (in step 3) and B tries to write it (in step 4) concurrently. There is no synchronization (you didn't mention any), so this alone is not permitted; the behavior is not determined.
What does that mean? "Not determined" means that even though B writes m, A may never observe the change. Or A may observe a change that didn't even happen. Or a panic may occur. Or the Earth may explode due to this non-synchronized concurrent access (although the chance of this last case is extremely small, maybe even less than 1e-40).
Related questions:
Map with concurrent access
what does not being thread safe means about maps in Go?
What is the danger of neglecting goroutine/thread-safety when using a map in Go?
Go 1.6 Release Notes
The runtime has added lightweight, best-effort detection of concurrent
misuse of maps. As always, if one goroutine is writing to a map, no
other goroutine should be reading or writing the map concurrently. If
the runtime detects this condition, it prints a diagnosis and crashes
the program. The best way to find out more about the problem is to run
the program under the race detector, which will more reliably identify
the race and give more detail.
Maps are complex, self-reorganizing data structures. Concurrent read and write access is undefined.
Without code, there's not much else to say.
After long discussion it was decided that the typical use of maps did not require safe access from multiple goroutines, and in those cases where it did, the map was probably part of some larger data structure or computation that was already synchronized. Therefore requiring that all map operations grab a mutex would slow down most programs and add safety to few. This was not an easy decision, however, since it means uncontrolled map access can crash the program.
The language does not preclude atomic map updates. When required, such as when hosting an untrusted program, the implementation could interlock map access.
Map access is unsafe only when updates are occurring. As long as all goroutines are only reading—looking up elements in the map, including iterating through it using a for range loop—and not changing the map by assigning to elements or doing deletions, it is safe for them to access the map concurrently without synchronization.
As an aid to correct map use, some implementations of the language contain a special check that automatically reports at run time when a map is modified unsafely by concurrent execution.
You can use sync.Map (added in Go 1.9), which is safe for concurrent use. The only caveat is that you give up type safety and must change all the reads and writes to your map to use the methods defined on this type.
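For example, a minimal sketch (the key and values here are illustrative):

package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map

    m.Store("a", 1) // write

    // Load returns (interface{}, bool), so reads need a type assertion.
    if v, ok := m.Load("a"); ok {
        fmt.Println(v.(int)) // 1
    }

    // LoadOrStore writes only if the key is absent.
    actual, loaded := m.LoadOrStore("a", 2)
    fmt.Println(actual.(int), loaded) // 1 true
}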
You can store a pointer to an int in the map, and have multiple goroutines read the int being pointed to while another writes a new value to it. The map itself is not being updated in this case, although the reads and writes of the int still need to be synchronized (for example, with sync/atomic) to avoid racing on the int itself.
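A sketch of that approach under the stated assumptions, using sync/atomic for the int accesses (the key name and values are illustrative):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    // The map itself is written once, before any goroutines start.
    m := map[string]*int64{"counter": new(int64)}

    var wg sync.WaitGroup
    wg.Add(2)

    go func() { // writer: updates the int, not the map
        defer wg.Done()
        atomic.StoreInt64(m["counter"], 42)
    }()

    go func() { // reader: reads the int through the map
        defer wg.Done()
        _ = atomic.LoadInt64(m["counter"])
    }()

    wg.Wait()
    fmt.Println(*m["counter"])
}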
This wouldn't be idiomatic for Go and not what you were asking.
Or instead of passing a key to a map, you could pass the index to an array, and have that updated by one goroutine while others read the location.
But you're probably just wondering why a map's value can't be updated with a new value when the key is already in the map. Presumably nothing about the map's hashing scheme changes in that case, at least not in the current implementation. It would seem the Go authors don't want to make allowances for such special cases. Generally they want code to be easy to read and understand, and a rule like "no map writes while other goroutines could be reading" keeps things simple. Now, in 1.6, they can even start to catch misuse at run time, saving many people many hours of debugging.
As the other answers here stated, the native map type is not goroutine-safe. A couple of notes after reading the current answers:
Do not use defer to unlock; it has some overhead that affects performance (see this nice post). Call unlock directly.
You can achieve better performance by reducing the time spent between locks, for example by sharding the map.
There is a common package (approaching 400 stars on GitHub) used to solve this, called concurrent-map, which has performance and usability in mind. You could use it to handle the concurrency issues for you.
Maps are concurrency-safe in Go for reads only. Let's say your map is written first and never written again; then you don't need a mutex or anything similar to make sure only one goroutine accesses it at a time. I have given an example of concurrency-safe map reading below.
package main

import (
    "fmt"
    "sync"
)

var freq map[int]int

// An example of concurrent reads from a map.
func main() {
    // The map is fully written before the goroutines start.
    freq = make(map[int]int)
    freq[1] = 1
    freq[2] = 2

    wg := sync.WaitGroup{}
    wg.Add(10)
    for i := 1; i <= 10; i++ {
        // Each goroutine only reads values from the map.
        go func(id int, loop int) {
            defer wg.Done()
            fmt.Println("In loop ", loop)
            fmt.Println("Freq of 1: ", freq[id])
        }(1, i)
    }
    wg.Wait()
}
Related
I'm trying to implement double compare-and-swap (and maybe MWCAS) in both Go and Rust, and since no CPU supports such an instruction, I need to find a way using what the language offers me.
Let's take this code:
func HttpServer(parallelism int) {
    runtime.GOMAXPROCS(parallelism)
    called := 0
    http.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
        time.Sleep(2 * time.Second)
        called++
        fmt.Fprint(w, called)
    })
    http.ListenAndServe(":8090", nil)
}
In general it's wrong, but if we call HttpServer(1) it works as expected: let's say I call the endpoint 1000 times; I will get 1000 different values of called, from 1 to 1000, in less than 3 seconds. This is because, with parallelism = 1, the Go runtime executes the code after time.Sleep in each handler as if it were an atomic block, while all the other handlers wait at once.
Using this trick I can implement DCAS and MWCAS with little effort, without using atomics (5x slower) or mutexes (25x slower)
My questions are:
Is anything wrong with this code? Am I missing some details? Why does the race detector still detect races even with parallelism set to 1? (The results are correct, so it doesn't seem to affect the output.)
Is there a way to set runtime.GOMAXPROCS per goroutine and its descendants? so for example one goroutine and its children will execute with parallelism 1, while another will execute with parallelism 10. For now the only way I see is to launch multiple processes and then communicate with IPC.
Is anything wrong with this code?
Yes, it's racy.
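Some context on why: runtime.GOMAXPROCS(1) limits parallelism, not concurrency. The scheduler may still switch goroutines between the read and the write inside called++ (and since Go 1.14 goroutines can even be preempted asynchronously), so the increment is a data race regardless of the observed output. A minimal sketch of one conventional fix, using sync/atomic (my suggestion, not something prescribed by the original answer):

package main

import (
    "fmt"
    "net/http"
    "sync/atomic"
    "time"
)

func main() {
    var called int64 // accessed only via sync/atomic
    http.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
        time.Sleep(2 * time.Second)
        n := atomic.AddInt64(&called, 1) // one indivisible read-modify-write
        fmt.Fprint(w, n)
    })
    http.ListenAndServe(":8090", nil)
}

The question measures atomics as slower than unsynchronized increments, but they are among the cheapest ways to make this correct.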
Is there a way to set runtime.GOMAXPROCS per goroutine and its descendants?
No.
Let’s say I use a WaitGroup to make the main thread of an application wait until all the goroutines I have launched from said main have completed.
Is there a safe, straightforward, way to assess at any point in time how many goroutines associated with said WaitGroup are still running?
The internal state of the WaitGroup is not exposed, and it won't be: https://github.com/golang/go/issues/7202
I don't think we're likely to make this API more complex. I don't see any way to use
counter and waiters that is not subject to race conditions, other than simply printing
them out. And for that you can maintain your own counts.
You could implement a counter yourself:
import (
    "sync"
    "sync/atomic"
)

type WaitGroupCount struct {
    sync.WaitGroup
    count int64
}

func (wg *WaitGroupCount) Add(delta int) {
    atomic.AddInt64(&wg.count, int64(delta))
    wg.WaitGroup.Add(delta)
}

func (wg *WaitGroupCount) Done() {
    atomic.AddInt64(&wg.count, -1)
    wg.WaitGroup.Done()
}

func (wg *WaitGroupCount) GetCount() int {
    return int(atomic.LoadInt64(&wg.count))
}
// Wait() promoted from the embedded field
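A hypothetical usage sketch (the worker body and counts are illustrative; it assumes the WaitGroupCount type above is in the same package):

package main

import (
    "fmt"
    "time"
)

func main() {
    var wg WaitGroupCount
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            time.Sleep(100 * time.Millisecond) // simulated work
        }()
    }
    fmt.Println("still running:", wg.GetCount()) // stale the moment it is read
    wg.Wait()
    fmt.Println("still running:", wg.GetCount()) // 0
}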
However, even if the counter access is synchronized, it will become stale immediately after you read it, since other goroutines may go on and call Add or Done irrespective of what you are doing with the count — unless you synchronize the entire operation that depends on the count. But in that case, you might need a more complex data structure altogether.
Is there a safe, straightforward, way to assess at any point in time how many goroutines associated with said waitgroup are still running?
No, there isn't.
Simply because Go (the language) has no notion of "goroutines associated with [a] waitgroup".
I'm just wondering if there is potential for corruption as a result of writing the same value to a global variable at the same time. My brain tells me there is nothing wrong with this, because it's just a location in memory, but I figured I should double-check this assumption.
I have concurrent processes writing to a global map var linksToVisit map[string]bool. The map is actually tracking which links on a website need to be further crawled.
However, it can be the case that concurrent processes have the same link on their respective pages, and therefore each will mark that same link as true concurrently. There's nothing wrong with NOT using locks in this case, right? NOTE: I never change the value back to false, so either the key exists and its value is true, or it doesn't exist.
I.e.
var linksToVisit = map[string]bool{}
...
// somewhere later a goroutine finds a link and marks it as true
// it is never marked as false anywhere
linksToVisit[someLink] = true
What happens if concurrent processes write to a global variable the
same value?
The results of a data race are undefined.
Run the Go data race detector.
References:
Wikipedia: Race condition
Benign Data Races: What Could Possibly Go Wrong?
The Go Blog: Introducing the Go Race Detector
Go: Data Race Detector
Go 1.8 Release Notes
Concurrent Map Misuse
In Go 1.6, the runtime added lightweight, best-effort detection of
concurrent misuse of maps. This release improves that detector with
support for detecting programs that concurrently write to and iterate
over a map.
As always, if one goroutine is writing to a map, no other goroutine
should be reading (which includes iterating) or writing the map
concurrently. If the runtime detects this condition, it prints a
diagnosis and crashes the program. The best way to find out more about
the problem is to run the program under the race detector, which will
more reliably identify the race and give more detail.
For example,
package main

import "time"

var linksToVisit = map[string]bool{}

func main() {
    someLink := "someLink"
    go func() {
        for {
            linksToVisit[someLink] = true
        }
    }()
    go func() {
        for {
            linksToVisit[someLink] = true
        }
    }()
    time.Sleep(100 * time.Millisecond)
}
Output:
$ go run racer.go
fatal error: concurrent map writes
$
$ go run -race racer.go
==================
WARNING: DATA RACE
Write at 0x00c000078060 by goroutine 6:
runtime.mapassign_faststr()
/home/peter/go/src/runtime/map_faststr.go:190 +0x0
main.main.func2()
/home/peter/gopath/src/racer.go:16 +0x6a
Previous write at 0x00c000078060 by goroutine 5:
runtime.mapassign_faststr()
/home/peter/go/src/runtime/map_faststr.go:190 +0x0
main.main.func1()
/home/peter/gopath/src/racer.go:11 +0x6a
Goroutine 6 (running) created at:
main.main()
/home/peter/gopath/src/racer.go:14 +0x88
Goroutine 5 (running) created at:
main.main()
/home/peter/gopath/src/racer.go:9 +0x5b
==================
fatal error: concurrent map writes
$
It is better to use locks if you are changing the same value concurrently from multiple goroutines. Mutexes and locks exist to protect a value from being accessed while another function is changing it, much like guarding writes to a database table while the same table is being read.
As for your question on using maps with different keys, it is not preferable in Go, as:
The typical use of maps did not require safe access from multiple
goroutines, and in those cases where it did, the map was probably part
of some larger data structure or computation that was already
synchronized. Therefore requiring that all map operations grab a mutex
would slow down most programs and add safety to few.
Map access is unsafe only when updates are occurring. As long as all
goroutines are only reading—looking up elements in the map, including
iterating through it using a for range loop—and not changing the map
by assigning to elements or doing deletions, it is safe for them to
access the map concurrently without synchronization.
So in the case of map updates, it is not recommended. For more information, check the FAQ on why map operations are not defined as atomic.
Also, if you really want to go for it, you should synchronize the accesses in some way.
Maps are not safe for concurrent use: it's not defined what happens
when you read and write to them simultaneously. If you need to read
from and write to a map from concurrently executing goroutines, the
accesses must be mediated by some kind of synchronization mechanism.
One common way to protect maps is with sync.RWMutex.
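A minimal sketch of that idiom, wrapping the map and its lock in one type (the type and method names are illustrative):

package main

import "sync"

// RWMap guards a map with a sync.RWMutex.
type RWMap struct {
    mu sync.RWMutex
    m  map[string]bool
}

func NewRWMap() *RWMap {
    return &RWMap{m: make(map[string]bool)}
}

func (r *RWMap) Set(k string, v bool) {
    r.mu.Lock() // exclusive: no readers or writers may hold the lock
    defer r.mu.Unlock()
    r.m[k] = v
}

func (r *RWMap) Get(k string) (bool, bool) {
    r.mu.RLock() // shared: many readers may hold the lock together
    defer r.mu.RUnlock()
    v, ok := r.m[k]
    return v, ok
}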
Concurrent map writes are not OK, so you will most likely get a fatal error; a lock should be used.
As of Go 1.6, simultaneous map writes cause a fatal runtime error. Use a sync.Map (added in Go 1.9), or guard the map with a mutex, to synchronize access.
See the map value assign implementation:
https://github.com/golang/go/blob/fe8a0d12b14108cbe2408b417afcaab722b0727c/src/runtime/hashmap.go#L519
I feel like the answer to my question is no, but I'm asking for certainty as I've only been playing around with Go for a few days. Should we encapsulate IO-bound tasks (like HTTP requests) into goroutines even in a sequential use case?
Here's my naive example. Say I have a method that makes 3 HTTP requests that need to be executed sequentially. Is there any benefit in launching the invoke methods as goroutines? I understand the example below would actually take a performance hit.
func myMethod() {
    chan1 := make(chan int)
    chan2 := make(chan int)
    chan3 := make(chan int)

    go invoke1(chan1)
    res1 := <-chan1
    go invoke2(res1, chan2) // launched as a goroutine so its send on chan2 doesn't deadlock
    res2 := <-chan2
    go invoke3(res2, chan3)
    // Do something with <-chan3
}
One possible reason that comes to mind is to future-proof the invoke methods for when they're called in a concurrent context later on, when other developers start re-using them. Any other reasons?
There's nothing standard that would say yes or no to this question.
Although you can do it correctly this way, it is much simpler to stick to plain sequential execution.
Three reasons come to mind:
this is still sequential: you're waiting for every single goroutine sequentially, so this buys you nothing. Performance probably doesn't change much if it's only doing an http request, both cases will spend most of their time waiting for the response.
error handling is much simpler if you just write res, err := invoke(...); if err != nil { ... }, rather than having to pass both results and errors through channels
over-generalization is a more apt word than "future proofing". If you need to call your invoke methods asynchronously in the future, then change your code in the future. It will be just as easy then to add asynchronous wrappers around your functions.
I am a little bit late to the party but I think I still have something to share.
Before answering the question, I would like to dive a little into the Go proverb, Concurrency is not parallelism. It is a very common misunderstanding of goroutines and Go's language features that when people think of goroutines, they think about the ability to run in parallel.
But as Rob Pike pointed out in many Go talks, what Go and goroutines actually provide is concurrency. Concurrency is a model for better interpreting the real world, a way of structuring code and of how code interacts.
So back to the question: should goroutines be used when steps are sequential? It depends on the design. If your code is composed of individual parts that talk to each other very naturally, or if some of your code preserves state where frequently returning just does not make sense, or if your code fits any other concurrency design, it is perfectly fine to use goroutines, channels, and select statements. Rob Pike, again, gives a Go talk on the lexer now used by Go's text/template, where the lexer uses a goroutine but the parser, obviously, only uses the lexer sequentially. He stated that by using goroutines and channels, at the cost of a little performance, a better API is achieved.
But on the other hand, in your example and probably what you are thinking about (code that may require future parallelism), I agree with @Marc: stick with blocking calls, at least for now.
You could use a channel as a buffer, or use a []string (a slice of URL strings). You don't get any benefit from goroutines if you only need sequential execution, because you cannot control when a goroutine starts.
But you can ask the scheduler to yield without waiting, using runtime.Gosched().
From documentation:
Gosched yields the processor, allowing other goroutines to run. It does not suspend the current goroutine, so execution resumes automatically.
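A tiny sketch of what that yield looks like (whether the other goroutine actually runs first is up to the scheduler, not guaranteed):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    go fmt.Println("from the goroutine")
    runtime.Gosched() // yield, usually letting the goroutine above run first
    fmt.Println("from main")
}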
Example of sequential execution:
package main

func main() {
    urls := []string{
        "https://google.com",
        "https://yahoo.com",
        "https://youtube.com",
    }

    buf := make([][]byte, 0, len(urls))
    for _, v := range urls {
        buf = append(buf, sendRequest(v))
    }
}

func sendRequest(url string) []byte {
    // send the request here
    return []byte("")
}
As for the solution as such, I see no gain in solving your problem this way in the real world. That doesn't mean there isn't gain in your solution. If you're after practice, for example in synchronising goroutines, then you have a plethora of possibilities to experiment with and learn from.
So to me this is far from a clear no, but no clear yes either. I would refrain from saying maybe; it depends on what exactly you're after. Are you after actually solving a problem, or after learning?
After (briefly) reviewing the Go language spec, effective Go, and the Go memory model, I'm still a little unclear as to how Go channels work under the hood.
What kind of structure are they? They act kind of like a thread-safe queue /array.
Does their implementation depend on the architecture?
The source file for channels is (from your go source code root) in /src/pkg/runtime/chan.go.
hchan is the central data structure for a channel, with send and receive linked lists (holding a pointer to their goroutine and the data element) and a closed flag. There's a Lock embedded structure that is defined in runtime2.go and that serves as a mutex (futex) or semaphore depending on the OS. The locking implementation is in lock_futex.go (Linux/Dragonfly/Some BSD) or lock_sema.go (Windows/OSX/Plan9/Some BSD), based on the build tags.
Channel operations are all implemented in this chan.go file, so you can see the makechan, send and receive operations, as well as the select construct, close, len and cap built-ins.
For a great in-depth explanation on the inner workings of channels, you have to read Go channels on steroids by Dmitry Vyukov himself (Go core dev, goroutines, scheduler and channels among other things).
Here is a good talk that describes roughly how channels are implemented:
https://youtu.be/KBZlN0izeiY
Talk description:
GopherCon 2017: Kavya Joshi - Understanding Channels
Channels provide a simple mechanism for goroutines to communicate, and a powerful construct to build sophisticated concurrency patterns. We will delve into the inner workings of channels and channel operations, including how they're supported by the runtime scheduler and memory management systems.
You asked two questions:
What kind of structure are they?
Channels in Go are indeed "kind of like a thread-safe queue"; to be more precise, channels in Go have the following properties:
goroutine-safe
Provide FIFO semantics
Can store and pass values between goroutines
Cause goroutines to block and unblock
Every time you create a channel, an hchan struct is allocated on the heap, and a pointer to the hchan memory location is returned, represented as a channel; this is how goroutines can share it.
The first two properties described above are implemented similarly to a queue with a lock.
The elements that the channel can pass to different go-routines are implemented as a circular queue (ring buffer) with indices in the hchan struct, the indices account for the position of elements in the buffer.
Circular queue:
qcount uint // total data in the queue
dataqsiz uint // size of the circular queue
buf unsafe.Pointer // points to an array of dataqsiz elements
And the indices:
sendx uint // send index
recvx uint // receive index
Every time a goroutine needs to access the channel structure and modify its state, it holds the lock, e.g. to copy elements to/from the buffer or to update the waiter lists or an index. Some operations are optimized to be lock-free, but that is out of scope for this answer.
The block-and-unblock property of Go channels is achieved using two queues (linked lists) that hold the blocked goroutines:
recvq waitq // list of recv waiters
sendq waitq // list of send waiters
Every time a goroutine wants to add a task to a full channel (the buffer is full), or to take a task from an empty channel (the buffer is empty), a pseudo-goroutine sudog struct is allocated, and the goroutine adds the sudog as a node to the send or receive waiters list accordingly. Then the goroutine updates the runtime scheduler using special calls that hint when it should be taken out of execution (gopark) or made ready to run (goready).
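From the user's side, that parking and waking looks like this (a small self-contained sketch; the sleeps only make the ordering visible):

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan int, 1) // buffered channel with capacity 1

    ch <- 1 // fills the buffer; does not block

    go func() {
        // This send blocks: the sender is parked on sendq
        // until a receiver frees a buffer slot.
        ch <- 2
        fmt.Println("send of 2 completed")
    }()

    time.Sleep(100 * time.Millisecond) // let the goroutine block
    fmt.Println(<-ch)                  // receives 1 (FIFO), unblocking the sender
    fmt.Println(<-ch)                  // receives 2
    time.Sleep(100 * time.Millisecond) // let the final print run
}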
Notice this is a very simplified explanation that hides some complexities.
Does their implementation depend on the architecture?
Besides the lock implementation, which is OS-specific as @mna already explained, I'm not aware of any architecture-specific constraints, optimizations, or differences.
A simpler way to look at channels is this: you may want to hold a program up while waiting for a condition to complete, typically to prevent a race condition, where one thread might not finish before another, and something your later thread or code depends on has not completed yet.
An example: you have a thread that retrieves some data from a database or another server and places it into a variable, slice, or map, and for some reason it gets delayed. Then you have a process that uses that variable, but since it hasn't been initialised, or hasn't received its data yet, the program fails.
So a simple way to look at it in code is as follows:
package main

import "fmt"

var doneA = make(chan bool)
var doneB = make(chan bool)
var doneC = make(chan bool)

func init() { // this runs when your program starts
    go func() {
        doneA <- true // send true on doneA
    }()
}

func initB() { // blocking
    go func() {
        a := <-doneA // will wait here until doneA receives a value
        // do something here
        fmt.Print(a)
        doneB <- true // state that you finished
    }()
}

func initC() {
    go func() {
        <-doneB // still blocking, but we don't care about the value
        // some code here
        doneC <- true // indicate this function finished
    }()
}

func main() {
    initB()
    initC()
    <-doneC // wait for the chain to complete before main exits
}
So I hope this helps. It's not the selected answer above, but I believe it should help to remove the mystery. I wonder if I should make a question and self-answer?