Concurrent read/write of a map var snapshot - go

I encounter a situation that I can not understand. In my code, I use functions have the need to read a map (but not write, only loop through a snapshot of existing datas in this map). There is my code :
type MyStruct struct {
*sync.RWMutex
MyMap map[int]MyDatas
}
var MapVar = MyStruct{ &sync.RWMutex{}, make(map[int]MyDatas) }
func MyFunc() {
MapVar.Lock()
MapSnapshot := MapVar.MyMap
MapVar.Unlock()
for _, a := range MapSnapshot { // Map concurrent write/read occur here
//Some stuff
}
}
main() {
go MyFunc()
}
The function "MyFunc" is run in a go routine, only once, there is no multiple runs of this func. Many other functions are accessing to the same "MapVar" with the same method and it randomly produce a "map concurrent write/read". I hope someone will explain to me why my code is wrong.
Thank you for your time.
edit: To clarify, I am just asking why my range MapSnapshot produce a concurrent map write/read. I cant understand how this map can be concurrently used since I save the real global var (MapVar) in a local var (MapSnapshot) using a sync mutex.
edit: Solved. To copy the content of a map in a new variable without using the same reference (and so to avoid map concurrent read/write), I must loop through it and write each index and content to a new map with a for loop.
Thanks xpare and nilsocket.

there is no multiple runs of this func. Many other functions are accessing to the same "MapVar" with the same method and it randomly produce a "map concurrent write/read"
When you pass the value of MapVar.MyMap to MapSnapshot, the Map concurrent write/read will never be occur, because the operation is wrapped with mutex.
But on the loop, the error could happen since practically reading process is happening during loop. So better to wrap the loop with mutex as well.
MapVar.Lock() // lock begin
MapSnapshot := MapVar.MyMap
for _, a := range MapSnapshot {
// Map concurrent write/read occur here
// Some stuff
}
MapVar.Unlock() // lock end
UPDATE 1
Here is my response to your argument below:
This for loop takes a lot of time, there is many stuff in this loop, so locking will slow down other routines
As per your statement The function "MyFunc" is run in a go routine, only once, there is no multiple runs of this func, then I think making the MyFunc to be executed as goroutine is not a good choice.
And to increase the performance, better to make the process inside the loop to be executed in a goroutine.
func MyFunc() {
for _, a := range MapVar.MyMap {
go func(a MyDatas) {
// do stuff here
}(a)
}
}
main() {
MyFunc() // remove the go keyword
}
UPDATE 2
If you really want to copy the MapVar.MyMap into another object, passing it to another variable will not solve that (map is different type compared to int, float32 or other primitive type).
Please refer to this thread How to copy a map?

Related

Lock slice before reading and modifying it

My experience working with Go is recent and in reviewing some code, I have seen that while it is write-protected, there is a problem with reading the data. Not with the reading itself, but with possible modifications that can occur between the reading and the modification of the slice.
type ConcurrentSlice struct {
sync.RWMutex
items []Item
}
type Item struct {
Index int
Value Info
}
type Info struct {
Name string
Labels map[string]string
Failure bool
}
As mentioned, the writing is protected in this way:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
for inList := range cs.Iter() {
if item.Name == inList.Value.Name{
cs.items[i] = item
found = true
}
i++
}
if !found {
cs.Lock()
defer cs.Unlock()
cs.items = append(cs.items, item)
}
}
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.Lock()
defer cs.Unlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
But between collecting the content of the slice and modifying it, modifications can occur.It may be that another routine modifies the same slice and when it is time to assign a value, it no longer exists: slice[i] = item
What would be the right way to deal with this?
I have implemented this method:
func GetList() *ConcurrentSlice {
if list == nil {
denylist = NewConcurrentSlice()
return denylist
}
return denylist
}
And I use it like this:
concurrentSlice := GetList()
concurrentSlice.UpdateOrAppend(item)
But I understand that between the get and the modification, even if it is practically immediate, another routine may have modified the slice. What would be the correct way to perform the two operations atomically? That the slice I read is 100% the one I modify. Because if I try to assign an item to a index that no longer exists, it will break the execution.
Thank you in advance!
The way you are doing the blocking is incorrect, because it does not ensure that the items you return have not been removed. In case of an update, the array would still be at least the same length.
A simpler solution that works could be the following:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
found := false
i := 0
cs.Lock()
defer cs.Unlock()
for _, it := range cs.items {
if item.Name == it.Name{
cs.items[i] = it
found = true
}
i++
}
if !found {
cs.items = append(cs.items, item)
}
}
Use a sync.Map if the order of the values is not important.
type Items struct {
m sync.Map
}
func (items *Items) Update(item Info) {
items.m.Store(item.Name, item)
}
func (items *Items) Range(f func(Info) bool) {
items.m.Range(func(key, value any) bool {
return f(value.(Info))
})
}
Data structures 101: always pick the best data structure for your use case. If you’re going to be looking up objects by name, that’s EXACTLY what map is for. If you still need to maintain the order of the items, you use a treemap
Concurrency 101: like transactions, your mutex should be atomic, consistent, and isolated. You’re failing isolation here because the data structure read does not fall inside your mutex lock.
Your code should look something like this:
func {
mutex.lock
defer mutex.unlock
check map or treemap for name
if exists update
else add
}
After some tests, I can say that the situation you fear can indeed happen with sync.RWMutex. I think it could happen with sync.Mutex too, but I can't reproduce that. Maybe I'm missing some informations, or maybe the calls are in order because they all are blocked and the order they redeem the right to lock is ordered in some way.
One way to keep your two calls safe without other routines getting in 'conflict' would be to use an other mutex, for every task on that object. You would lock that mutex before your read and write, and release it when you're done. You would also have to use that mutex on any other call that write (or read) to that object. You can find an implementation of what I'm talking about here in the main.go file. In order to reproduce the issue with RWMutex, you can simply comment the startTask and the endTask calls and the issue is visible in the terminal output.
EDIT : my first answer was wrong as I misinterpreted a test result, and fell in the situation described by OP.
tl;dr;
If ConcurrentSlice is to be used from a single goroutine, the locks are unnecessary, because the way algorithm written there is not going to be any concurrent read/writes to slice elements, or the slice.
If ConcurrentSlice is to be used from multiple goroutines, existings locks are not sufficient. This is because UpdateOrAppend may modify slice elements concurrently.
A safe version woule need two versions of Iter:
This can be called by users of ConcurrentSlice, but it cannot be called from `UpdateOrAppend:
func (cs *ConcurrentSlice) Iter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
cs.RLock()
defer cs.RUnlock()
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
and this is only to be called from UpdateOrAppend:
func (cs *ConcurrentSlice) internalIter() <-chan ConcurrentSliceItem {
c := make(chan ConcurrentSliceItem)
f := func() {
// No locking
for index, value := range cs.items {
c <- ConcurrentSliceItem{index, value}
}
close(c)
}
go f()
return c
}
And UpdateOrAppend should be synchronized at the top level:
func (cs *ConcurrentSlice) UpdateOrAppend(item ScalingInfo) {
cs.Lock()
defer cs.Unlock()
....
}
Here's the long version:
This is an interesting piece of code. Based on my understanding of the go memory model, the mutex lock in Iter() is only necessary if there is another goroutine working on this code, and even with that, there is a possible race in the code. However, UpdateOrAppend only modifies elements of the slice with lower indexes than what Iter is working on, so that race never manifests itself.
The race can happen as follows:
The for-loop in iter reads element 0 of the slice
The element is sent through the channel. Thus, the slice receive happens after the first step.
The receiving end potentially updates element 0 of the slice. There is no problem up to here.
Then the sending goroutine reads element 1 of the slice. This is when a race can happen. If step 3 updated index 1 of the slice, the read at step 4 is a race. That is: if step 3 reads the update done by step 4, it is a race. You can see this if you start with i:=1 in UpdateOrAppend, and running it with the -race flag.
But UpdateOrAppend always modifies slice elements that are already seen by Iter when i=0, so this code is safe, even without the lock.
If there will be other goroutines accessing and modifying the structure, you need the Mutex, but you need it to protect the complete UpdateOrAppend method, because only one goroutine should be allowed to run that. You need the mutex to protect the potential updates in the first for-loop, and that mutex has to also include the slice append case, because that may actually modify the slice of the underlying object.
If Iter is only called from UpdateOrAppend, then this single mutex should be sufficient. If however Iter can be called from multiple goroutines, then there is another race possibility. If one UpdateOrAppend is running concurrently with multiple Iter instances, then some of those Iter instances will read from the modified slice elements concurrently, causing a race. So, it should be such that multiple Iters can only run if there are no UpdateOrAppend calls. That is a RWMutex.
But Iter can be called from UpdateOrAppend with a lock, so it cannot really call RLock, otherwise it is a deadlock.
Thus, you need two versions of Iter: one that can be called outside UpdateOrAppend, and that issues RLock in the goroutine, and another that can only be called from UpdateOrAppend and does not call RLock.

How to implement a singleton

I want to implement a singleton with Go. The difference between normal singleton is the instance is singleton with different key in map struct. Something like this code. I am not sure is there any data race with the demo code.
var instanceLock sync.Mutex
var instances map[string]string
func getDemoInstance(key string) string {
if value, ok := instances[key]; ok {
return value
}
instanceLock.Lock()
defer instanceLock.Unlock()
if value, ok := instances[key]; ok {
return value
} else {
instances[key] = key + key
return key + key
}
}
Yes, there is data race, you can confirm by running it with go run -race main.go. If one goroutine locks and modifies the map, another goroutine may be reading it before the lock.
You may use sync.RWMutex to read-lock when just reading it (multiple readers are allowed without blocking each other).
For example:
var (
instancesMU sync.RWMutex
instances = map[string]string{}
)
func getDemoInstance(key string) string {
instancesMU.RLock()
if value, ok := instances[key]; ok {
instancesMU.RUnlock()
return value
}
instancesMU.RUnlock()
instancesMU.Lock()
defer instancesMU.Unlock()
if value, ok := instances[key]; ok {
return value
}
value := key + key
instances[key] = value
return value
}
You can try this as well: sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time.
The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.
Note: In the third paragraph it mentions why using sync.Map is beneficial rather than using Go Map simply paired up with sync.RWMutex.
So this perfectly fits your case, I guess?
Little late to answer, anyways this should help: https://github.com/ashwinshirva/api/tree/master/dp/singleton
This shows two ways to implement singleton:
Using the sync.Mutex
Using sync.Once

What happens when reading or writing concurrently without a mutex

In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I like to write a value and do not care if another go-routine overwrites it later or has just overwritten it before.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants good and bad of an example program, where both seem to produce "correct" output using the current Go runtime:
package main
import (
"flag"
"fmt"
"math/rand"
"time"
)
var bogus = flag.Bool("bogus", false, "use bogus code")
func pause() {
time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}
func bad() {
stop := time.After(100 * time.Millisecond)
var name string
// start some producers doing concurrent writes (DANGER!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
name = fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
for {
select {
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func good() {
stop := time.After(100 * time.Millisecond)
names := make(chan string, 10)
// start some producers concurrently writing to a channel (GOOD!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
names <- fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
var name string
for {
select {
case name = <-names:
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func main() {
flag.Parse()
if *bogus {
bad()
} else {
good()
}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name=[0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run --race bogus.go it is safe.
However, go run --race bogus.go -bogus warns of the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the the concurrent writes issue. The important leanings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go-runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppoose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, gives us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
Which Go data types can I safely read and safely write concurrently without a mutext and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: You cannot, under no circumstance whatsoever, read and write concurrently to anything in Go.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup // wait group to close channel
var buffer int = 1 // buffer of the channel
// channel to get the share data
cName := make(chan string, buffer)
for i := 0; i < 10; i++ {
wg.Add(1) // add to wait group
go func(i int) {
cName <- fmt.Sprintf("name = %d", i)
wg.Done() // decrease wait group.
}(i)
}
go func() {
wg.Wait() // wait of wait group to be 0
close(cName) // close the channel
}()
// process all the data
for n := range cName {
println("read:", n)
}
}
The above code returns the following output
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels

In sync.Map is it necessary to use Load followed by LoadOrStore for complex values

In code where a global map with an expensive to generate value structure may be modified by multiple concurrent threads, which pattern is correct?
// equivalent to map[string]*activity where activity is a
// fairly heavyweight structure
var ipActivity sync.Map
// version 1: not safe with multiple threads, I think
func incrementIP(ip string) {
val, ok := ipActivity.Load(ip)
if !ok {
val = buildComplexActivityObject()
ipActivity.Store(ip, val)
}
updateTheActivityObject(val.(*activity), ip)
}
// version 2: inefficient, I think, because a complex object is built
// every time even through it's only needed the first time
func incrementIP(ip string) {
tmp := buildComplexActivityObject()
val, _ := ipActivity.LoadOrStore(ip, tmp)
updateTheActivity(val.(*activity), ip)
}
// version 3: more complex but technically correct?
func incrementIP(ip string) {
val, found := ipActivity.Load(ip)
if !found {
tmp := buildComplexActivityObject()
// using load or store incase the mapping was already made in
// another store
val, _ = ipActivity.LoadOrStore(ip, tmp)
}
updateTheActivity(val.(*activity), ip)
}
Is version three the correct pattern given Go's concurrency model?
Option 1 obviously can be called by multiple goroutines with a new ip concurrently, and only the last one in the if block would get stored. This possibility is greatly increased the longer buildComplexActivityObject takes, as there is more time in the critical section.
Option 2 works, but calls buildComplexActivityObject every time, which you state is not what you want.
Given that you want to call buildComplexActivityObject as infrequently as possible, the third option is the only one that makes sense.
The sync.Map however cannot protect the actual activity values referenced by the stored pointers. You also need synchronization there when updating the activity value.

Settings and accessing a pointer from concurrent goroutines

I have a map which is used by goroutine A and replaced once in a time in goroutine B. By replacement I mean:
var a map[T]N
// uses the map
func goroutineA() {
for (...) {
tempA = a
..uses tempA in some way...
}
}
//refreshes the map
func gorountineB() {
for (...) {
time.Sleep(10 * time.Seconds)
otherTempA = make(map[T]N)
...initializes other tempA....
a = otherTempA
}
}
Do you see any problem in this pseudo code? (in terms of concurrecy)
The code isn't safe, since assignments and reads to a pointer value are not guaranteed to be atomic. This can mean that as one goroutine writes the new pointer value, the other may see a mix of bytes from the old and new value, which will cause your program to die in a nasty way. Another thing that may happen is that since there's no synchronisation in your code, the compiler may notice that nothing can change a in goroutineA, and lift the tempA := a statement out of the loop. This will mean that you'll never see new map assignments as the other goroutine updates them.
You can use go test -race to find these sorts of problems automatically.
One solution is to lock all access to the map with a mutex.
You may wish to read the Go Memory Model document, which explains clearly when changes to variables are visible inside goroutines.
When unsure about data races, run go run -race file.go, that being said, yes there will be a race.
The easiest way to fix that is using a sync.RWMutex :
var a map[T]N
var lk sync.RWMutex
// uses the map
func goroutineA() {
for (...) {
lk.RLock()
//actions on a
lk.RUnlock()
}
}
//refreshes the map
func gorountineB() {
for (...) {
otherTempA = make(map[T]N)
//...initializes other tempA....
lk.Lock()
a = otherTempA
lk.Unlock()
}
}

Resources