I want to know exactly what can happen when Go maps are accessed by multiple goroutines. Let's assume we have a map[int]*User. Can modifying fields of the User structure from multiple goroutines cause data corruption? Or are only operations like len() not thread safe? What would be different if the map were thread safe in Go?
Concurrently modifying the *User could cause corruption regardless of the map. Reading the pointer from the map concurrently is safe, as long as there are no modifications to the map. Modifying the data *User points to makes no changes to the map itself.
Concurrently modifying the map[int]*User itself also risks data corruption.
There are no benign data races, always test your code with the race detector.
Simplest example:
go WorkerMethodOne(myMapReference)
go WorkerMethodTwo(myMapReference)
In WorkerMethodOne I have some code like this (example):
for i := 0; i < len(myMapReference); i++ {
    if i%2 == 0 {
        delete(myMapReference, i)
    }
}
Then when WorkerMethodTwo is iterating that same map and tries to access the item that just got deleted, what happens? While a v, ok := m[key] lookup may still be memory safe, unlike in many languages where you'd throw, the result doesn't make sense and is unpredictable. Ultimately worse things can happen, like attempts to concurrently write to the value of some *User. That could cause concurrent modification of the actual value (what's at the pointer), or you could have the pointer pulled out from under you and suddenly be working with a value different from what you expected, etc. It's really no different than if you made two closures run as goroutines and started modifying a non-atomic int without locking/using a mutex. You don't know what's going to happen, since there is contention for that memory between two fully decoupled executions.
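To make the pattern above safe, every access to the map and to the *User values has to be serialized. A minimal sketch of one way to do that, assuming a single sync.Mutex guards both (the User fields and the map contents are invented for the example):

package main

import "sync"

// User stands in for the struct from the question.
type User struct{ Name string }

var (
    mu    sync.Mutex
    myMap = map[int]*User{0: {Name: "a"}, 1: {Name: "b"}, 2: {Name: "c"}}
)

func WorkerMethodOne() {
    mu.Lock()
    defer mu.Unlock()
    for k := range myMap { // deleting during range is allowed in Go
        if k%2 == 0 {
            delete(myMap, k)
        }
    }
}

func WorkerMethodTwo() {
    mu.Lock()
    defer mu.Unlock()
    for _, u := range myMap {
        u.Name = "updated" // safe: the same mutex guards the *User values
    }
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); WorkerMethodOne() }()
    go func() { defer wg.Done(); WorkerMethodTwo() }()
    wg.Wait()
}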
Related
I am using Go race detection (the -race argument), and it detects some race condition issues that I think should not be reported. I've created this sample code to explain my findings. Please do not comment on the goal of this example, as it has no goal other than to explain the issue.
This code:
package main

import (
    "fmt"
    "time"
)

var count int

func main() {
    go update()
    for {
        fmt.Println(count)
        time.Sleep(time.Second)
    }
}

func update() {
    for {
        time.Sleep(time.Second)
        count++
    }
}
is reported with a race condition.
While this code:
package main

import (
    "fmt"
    "sync"
    "time"
)

var count int
var mutex sync.RWMutex

func main() {
    go update()
    for {
        mutex.RLock()
        fmt.Println(count)
        mutex.RUnlock()
        time.Sleep(time.Second)
    }
}

func update() {
    for {
        time.Sleep(time.Second)
        mutex.Lock()
        count++
        mutex.Unlock()
    }
}
is not reported with any race condition issues.
My question is why?
There is no bug in the first code.
The main function is reading a variable that another goroutine is updating.
There is no potential hidden bug here.
The second code's mutex does not provide any different behavior.
Where am I wrong here?
Your code contains a very clear race.
Your for loop is accessing count at the same time that the other goroutine is updating it. That's the definition of a race.
The main function is reading a variable that another goroutine is updating.
Yes, exactly. That's what a race is.
The second code's mutex does not provide any different behavior.
Yes, it does. It prevents the variable from being read and written at the same time from different goroutines.
You need to draw a distinction between a synchronization bug and a data race. A synchronization bug is a property of the code, whereas a data race is a property of a particular execution of the program. The latter is a manifestation of the former, but is in general not guaranteed to occur.
There is no bug in the first code. The main function is reading a variable that another goroutine is updating. There is no potential hidden bug here.
The race detector only detects data races, not synchronization bugs. It may miss some data races (false negatives), but it never reports false positives:
The race detector is a powerful tool for checking the correctness of concurrent programs. It will not issue false positives, so take its warnings seriously.
In other words, when the race detector reports a data race, you can be sure that your code contains at least one synchronization bug. You need to fix such bugs; otherwise, all bets are off.
Lo and behold, your first code snippet does indeed contain a synchronization bug: package-level variable count is accessed (by main) and updated (by update, started as a goroutine) concurrently without any synchronization. Here is a relevant passage of the Go Memory Model:
Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.
To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.
Using a reader/writer mutual-exclusion lock, as you did in your second snippet, fixes your synchronization bug.
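As an aside, a mutex is not the only possible fix. A minimal sketch of the same program using the sync/atomic package instead (assuming the counter is widened to int64 so the atomic functions apply):

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

var count int64 // accessed only through atomic operations

func main() {
    go update()
    for {
        // LoadInt64 synchronizes with the AddInt64 in update,
        // so there is no data race to report.
        fmt.Println(atomic.LoadInt64(&count))
        time.Sleep(time.Second)
    }
}

func update() {
    for {
        time.Sleep(time.Second)
        atomic.AddInt64(&count, 1)
    }
}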
The second code's mutex does not provide any different behavior.
You just got lucky, when you executed the first program, that the data race did not visibly corrupt anything. In general, you have no such guarantee.
This is off topic for Go (and the sample Go code won't trigger the problem even on x86 CPUs), but I have a demonstration proof, from roughly a decade ago at this point, that "torn reads" can produce inconsistent values even if the read and write operations are done with LOCK CMPXCHG8B, on some x86 CPUs (I think we were using early Haswell implementations).
The particular conditions that trigger this are a little difficult to set up. We had a custom allocator that had a bug: it only did four-byte alignment.¹ We then had a "lock-free" (single locking instruction) algorithm to add entries to a queue, with single-writer multi-reader semantics.
It turns out that LOCK CMPXCHG8B instructions "work" on misaligned pointers as long as they do not cross page boundaries. As soon as they do, though, the readers can see a torn read, in which they get half the old value and half the new value, when a writer is doing an atomic write.
The result was an extremely difficult-to-track-down bug, where the system would run well for hours or even days before tripping over one of these. I finally diagnosed it by observing the data patterns, and eventually tracked the problem down to the allocator.
¹ Whether this is a bug depends on how one uses the allocated objects, but we were using them as 8-byte-wide pointers with LOCK CMPXCHG8B instructions.
As the title says, I am referring to the Go package sync.Map. Can its functions be considered atomic? Mainly the Load, Store, LoadOrStore, and Delete functions.
I also built a simple example on the Go Playground. Is it guaranteed that only one goroutine can enter the code range of lines 15-17? From my tests, it seems to be guaranteed.
Please help me understand.
The godoc page for the sync package says: "Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination."
This statement guarantees that there's no need for additional mutexes or synchronization across goroutines. I wouldn't call that claim "atomic" (which has a very precise meaning), but it does mean that you don't have to worry about multiple goroutines being able to enter a LoadOrStore block (with the same key) like in your example.
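A minimal sketch of that guarantee (the key name and goroutine count are made up for illustration): no matter how many goroutines race on the same key, exactly one sees loaded == false.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            // LoadOrStore is atomic: exactly one goroutine stores
            // the value and gets loaded == false for this key.
            if _, loaded := m.LoadOrStore("config", id); !loaded {
                fmt.Println("initialized by goroutine", id)
            }
        }(i)
    }
    wg.Wait()
}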
I am adding to a map[[]byte]int concurrently from multiple goroutines.
Will I get a runtime panic for doing this?
I don't care if the data in the map becomes corrupted, because it can't: I am only inserting each value once and never again. But I can't have a runtime panic, because the whole program will abort.
Maps are not safe for concurrent writes; the runtime can detect concurrent map writes and will crash the program with a fatal error that cannot be recovered. Use a mutex to access the map safely.
Furthermore map[[]byte]int is not valid -- the key must be comparable. Slices are not comparable.
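A minimal sketch addressing both points, using a hypothetical wrapper type (all names here are invented) that converts the []byte key to a comparable string and guards the map with a mutex:

package main

import (
    "fmt"
    "sync"
)

// safeMap converts the []byte key to a string, which is comparable,
// and serializes every map access with a mutex.
type safeMap struct {
    mu sync.Mutex
    m  map[string]int
}

func (s *safeMap) Set(key []byte, v int) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.m[string(key)] = v
}

func (s *safeMap) Get(key []byte) (int, bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    v, ok := s.m[string(key)]
    return v, ok
}

func main() {
    sm := &safeMap{m: make(map[string]int)}
    sm.Set([]byte("hello"), 1)
    fmt.Println(sm.Get([]byte("hello")))
}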
There is a value that is read often and written rarely, and I don't want to use a mutex. I got it done like this with unsafe and atomic:
import (
    "sync/atomic"
    "unsafe"
)

type tmp struct{}

var value unsafe.Pointer

// GetValue atomically loads the current pointer and converts it back.
func GetValue() *tmp {
    return (*tmp)(atomic.LoadPointer(&value))
}

// SetValue atomically replaces the stored pointer.
func SetValue(p *tmp) {
    atomic.StorePointer(&value, unsafe.Pointer(p))
}
Is this thread-safe? And does atomic.StorePointer happen before atomic.LoadPointer?
It will be thread safe in the sense that the update itself is atomic, though you don't know which operation happens first. Have you considered using an RWMutex instead? It won't block readers unless a write is going on.
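For comparison, a minimal sketch of the RWMutex variant, keeping the same GetValue/SetValue shape as the snippet above but without unsafe:

import "sync"

type tmp struct{}

var (
    mu    sync.RWMutex
    value *tmp
)

// GetValue takes only a read lock, so concurrent readers never block
// each other.
func GetValue() *tmp {
    mu.RLock()
    defer mu.RUnlock()
    return value
}

// SetValue takes the write lock, blocking readers only for the brief
// pointer assignment.
func SetValue(p *tmp) {
    mu.Lock()
    defer mu.Unlock()
    value = p
}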
As far as I know, storing and retrieving a pointer using atomic.LoadPointer and atomic.StorePointer is thread-safe (in the sense that the pointer itself will not be corrupted).
Of course, the pointed-to object must be immutable, and this is not enforced by this mechanism. It is your job to make sure that updating the configuration results in a new object being created before calling SetValue.
However, the main issue is that the code relies on unsafe operations. Making it type safe is difficult, since two pointers actually have to be stored (one for the object, one for the type). Atomic pointer operations are no longer enough to guarantee consistency.
This is why a specific type has recently been added to the sync/atomic package: the atomic.Value type. It is designed to do exactly what you want (i.e. optimize access to mostly constant data).
You can find an example here: http://golang.org/pkg/sync/atomic/#example_Value_config
If you look in the implementation, you will realize why atomic pointer operations alone are not enough to implement a type safe mechanism.
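A minimal sketch along the lines of that documentation example, with a made-up Config type, showing how atomic.Value replaces the unsafe.Pointer approach:

package main

import (
    "fmt"
    "sync/atomic"
)

// Config is a hypothetical immutable configuration value.
type Config struct {
    Endpoint string
}

var config atomic.Value // always holds a *Config

func main() {
    config.Store(&Config{Endpoint: "https://example.com"})

    // Load returns interface{}; the type assertion is safe because
    // atomic.Value enforces a consistent stored type at runtime.
    cfg := config.Load().(*Config)
    fmt.Println(cfg.Endpoint)

    // To update, store a fresh value; never mutate the old one.
    config.Store(&Config{Endpoint: "https://example.org"})
}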
According to https://code.google.com/p/go/issues/detail?id=5045
Regarding Add/CAS, it should be formulated in more general terms, along the lines of: an atomic operation that stores a value (incl. Add/CAS) happens before an atomic operation that reads that value from the memory location (incl. Add/CAS).
I think we can use atomic to ensure it is thread-safe.
Question 1
I am building/searching for a RAM memory cache layer for my server. It is a simple LRU cache that needs to handle concurrent requests (both Gets and Sets).
I have found https://github.com/pmylund/go-cache claiming to be thread safe.
This is true as far as getting the stored interface. But if multiple goroutines request the same data, they are all retrieving a pointer (stored in the interface) to the same block of memory. If any goroutine changes the data, this is no longer very safe.
Are there any cache-packages out there that tackles this problem?
Question 1.1
If the answer to Question 1 is No, then what would be the suggested solution?
I see two options:
Alternative 1
Solution: Storing the values in a wrapping struct with a sync.Mutex so that each goroutine needs to lock the data before reading/writing to it.
type cacheElement struct {
    value interface{}
    lock  sync.Mutex
}
Drawbacks: The cache becomes unaware of changes made to the data, or might even have dropped it out of the cache. One goroutine might also block others.
Alternative 2
Solution: Make a copy of the data (assuming the data in itself doesn't contain pointers)
Drawbacks: Memory allocation every time a cache Get is performed, more garbage collection.
Sorry for the multipart question. But you don't have to answer all of them. If you have a good answer to Question 1, that would be sufficient for me!
Alternative 2 sounds good to me, but please note that you do not have to copy the data for each cache.Get(). As long as your data can be considered immutable, you can access it with many readers at once.
You only have to create a copy if you intend to modify it. This idiom is called COW (copy on write) and is quite common in concurrent software design. It's especially well suited for scenarios with a high read/write ratio (just like a cache).
So, whenever you want to modify a cached entry, you basically have to:
create a copy of the old cached data, if any.
modify the data (after this step, the data should be considered immutable and must not be changed anymore)
add / replace the existing element in the cache. You could either use the go-cache library you have pointed out earlier (which is based on locks) for that, or write your own lock-free library that simply swaps the pointers to the data element atomically.
At this point any goroutine that performs a cache.Get operation will get the new data. Existing goroutines however, might still be reading the old data. So, your program might operate on many different versions of the same data at once. But don't worry, as soon as all goroutines have finished accessing the old data, the GC will collect it automatically.
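A minimal sketch of the final swap step under these assumptions (the entry type is invented, and atomic.Value stands in for the lock-free pointer swap):

package main

import (
    "fmt"
    "sync/atomic"
)

// entry is a hypothetical cached value; once published it must be
// treated as immutable.
type entry struct {
    Data string
}

var cached atomic.Value // holds a *entry

// update follows the copy-on-write steps above: build a fresh value
// (or a modified copy of the old one), then atomically replace the
// published pointer.
func update(data string) {
    cached.Store(&entry{Data: data})
}

func main() {
    update("v1")
    old := cached.Load().(*entry)
    update("v2")
    // A reader holding the old pointer still sees a consistent value;
    // the GC reclaims it once the last reference goes away.
    fmt.Println(old.Data, cached.Load().(*entry).Data)
}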
tux21b gave a good answer. I'll just point out that you don't have to return pointers to data. You can store non-pointer values in your cache, and Go will pass them by value, which makes a copy. Then your Get and Set methods will be safe, since nothing can actually modify the cache contents.