I have a hash like this:
var TransfersInFlight map[string]string = make(map[string]string)
Before I send a file I create a key for it, store it, send the file, and then delete the key:
timeKey := fmt.Sprintf("%v",time.Now().UnixNano())
TransfersInFlight[timeKey] = filename
total, err := sendTheFile(filename)
delete(TransfersInFlight, timeKey)
That is, while the file is being sent, the map holds a key (a timestamp) pointing to the filename.
The function sendTheFile always either succeeds or returns an err; it never panics and crashes the whole program, so the line:
delete(TransfersInFlight, timeKey)
should be called 100% of the time. And yet, I sometimes find cases where it's like this line was never called and the file is stuck in TransfersInFlight forever. How is this possible?
Maps are not safe for concurrent access. I would either guard the map with a mutex, or have a single goroutine own the map and receive requests over a channel of "op" structs (or over separate "add" and "delete" channels).
Multiple concurrent read-only accesses are safe, but once writes are in the mix you must ensure that only one goroutine accesses the map at a time.
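As a rough illustration of the mutex option, here is a minimal sketch. The inFlight type and its method names are hypothetical, and sendTheFile is only a stand-in for the real transfer function from the question.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlight is a hypothetical wrapper that guards the map with a mutex.
type inFlight struct {
	mu    sync.Mutex
	files map[string]string
}

func newInFlight() *inFlight {
	return &inFlight{files: make(map[string]string)}
}

func (f *inFlight) add(key, filename string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.files[key] = filename
}

func (f *inFlight) remove(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.files, key)
}

func (f *inFlight) count() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return len(f.files)
}

// sendTheFile is a stand-in for the real transfer function.
func sendTheFile(filename string) (int, error) { return len(filename), nil }

func main() {
	transfers := newInFlight()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("transfer-%d", i)
			transfers.add(key, "file.bin")
			defer transfers.remove(key) // runs however the function exits
			sendTheFile("file.bin")
		}(i)
	}
	wg.Wait()
	fmt.Println("still in flight:", transfers.count())
}
```

Every read and write of the map goes through the same mutex, so a delete can no longer be lost to a concurrent write.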
If you are set on using a goroutine to manage the count, one way would be something like:
import "sync/atomic"
var TransferChan = make(chan int32) // must be initialized: ranging over a nil channel blocks forever
var TransfersInFlight int32
func TransferManager() {
TransfersInFlight = 0
for delta := range TransferChan {
// Only this goroutine writes, but other goroutines should read the
// counter with atomic.LoadInt32, so the update is atomic as well.
atomic.AddInt32(&TransfersInFlight, delta)
}
}
That way, you only need to start go TransferManager() once and then send your increments (+1) and decrements (-1) over the TransferChan channel.
I want to implement a singleton in Go. The difference from a normal singleton is that there is one instance per key, stored in a map, as in the code below. I am not sure whether there is a data race in this demo code.
var instanceLock sync.Mutex
var instances map[string]string
func getDemoInstance(key string) string {
if value, ok := instances[key]; ok {
return value
}
instanceLock.Lock()
defer instanceLock.Unlock()
if value, ok := instances[key]; ok {
return value
} else {
instances[key] = key + key
return key + key
}
}
Yes, there is a data race; you can confirm it by running the program with go run -race main.go. While one goroutine holds the lock and modifies the map, another goroutine may be reading it in the first check, which happens before the lock is taken.
You may use sync.RWMutex to read-lock when just reading it (multiple readers are allowed without blocking each other).
For example:
var (
instancesMU sync.RWMutex
instances = map[string]string{}
)
func getDemoInstance(key string) string {
instancesMU.RLock()
if value, ok := instances[key]; ok {
instancesMU.RUnlock()
return value
}
instancesMU.RUnlock()
instancesMU.Lock()
defer instancesMU.Unlock()
if value, ok := instances[key]; ok {
return value
}
value := key + key
instances[key] = value
return value
}
You can try this as well: sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time.
The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.
Note: the third paragraph explains when sync.Map is preferable to a plain Go map paired with a sync.RWMutex.
So this perfectly fits your case, I guess?
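A minimal sketch of how the keyed lookup might look with sync.Map, assuming the value can be computed cheaply up front (here key + key, as in the question). LoadOrStore is part of the standard sync.Map API; the rest mirrors the question's names.

```go
package main

import (
	"fmt"
	"sync"
)

// instances plays the role of map[string]string from the question.
var instances sync.Map

func getDemoInstance(key string) string {
	// LoadOrStore returns the existing value if the key is present;
	// otherwise it stores and returns the given value.
	value, _ := instances.LoadOrStore(key, key+key)
	return value.(string)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getDemoInstance("ab")
		}()
	}
	wg.Wait()
	fmt.Println(getDemoInstance("ab"))
}
```

Note that the candidate value is built on every call, so this shape only fits when construction is cheap.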
Little late to answer, anyways this should help: https://github.com/ashwinshirva/api/tree/master/dp/singleton
This shows two ways to implement singleton:
Using the sync.Mutex
Using sync.Once
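For reference, a minimal sketch of the sync.Once variant. This is not the code from the linked repository; the config type, the builds counter, and getInstance are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// config is a hypothetical singleton type.
type config struct{ name string }

var (
	once     sync.Once
	instance *config
	builds   int // counts how often the constructor body ran
)

func getInstance() *config {
	// once.Do runs the function exactly once, even when many
	// goroutines call getInstance at the same time.
	once.Do(func() {
		builds++
		instance = &config{name: "default"}
	})
	return instance
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			getInstance()
		}()
	}
	wg.Wait()
	fmt.Println("constructor runs:", builds)
}
```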
In Go, a sync.Mutex or chan is used to prevent concurrent access of shared objects. However, in some cases I am just interested in the "latest" value of a variable or field of an object.
Or I would like to write a value and do not care if another goroutine overwrites it later or has just overwritten it.
Update: TLDR; Just don't do this. It is not safe. Read the answers, comments, and linked documents!
Update 2021: The Go memory model is going to be specified more thoroughly and there are three great articles by Russ Cox that will teach you more about the surprising effects of unsynchronized memory access. These articles summarize a lot of the below discussions and learnings.
Here are two variants, good and bad, of an example program. Both seem to produce "correct" output using the current Go runtime:
package main
import (
"flag"
"fmt"
"math/rand"
"time"
)
var bogus = flag.Bool("bogus", false, "use bogus code")
func pause() {
time.Sleep(time.Duration(rand.Uint32()%100) * time.Millisecond)
}
func bad() {
stop := time.After(100 * time.Millisecond)
var name string
// start some producers doing concurrent writes (DANGER!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
name = fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
for {
select {
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func good() {
stop := time.After(100 * time.Millisecond)
names := make(chan string, 10)
// start some producers concurrently writing to a channel (GOOD!)
for i := 0; i < 10; i++ {
go func(i int) {
pause()
names <- fmt.Sprintf("name = %d", i)
}(i)
}
// start consumer that shows the current value every 10ms
go func() {
tick := time.Tick(10 * time.Millisecond)
var name string
for {
select {
case name = <-names:
case <-stop:
return
case <-tick:
fmt.Println("read:", name)
}
}
}()
<-stop
}
func main() {
flag.Parse()
if *bogus {
bad()
} else {
good()
}
}
The expected output is as follows:
...
read: name = 3
read: name = 3
read: name = 5
read: name = 4
...
Any combination of read: and read: name = [0-9] is correct output for this program. Receiving any other string as output would be an error.
When running this program with go run --race bogus.go, the race detector reports nothing.
However, go run --race bogus.go -bogus warns about the concurrent reads and writes.
For map types and when appending to slices I always need a mutex or a similar method of protection to avoid segfaults or unexpected behavior. However, reading and writing literals (atomic values) to variables or field values seems to be safe.
Question: Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
Please explain why something is safe or unsafe in Go in your answer.
Update: I rewrote the example to better reflect the original code, where I had the concurrent writes issue. The important learnings are already in the comments. I will accept an answer that summarizes these learnings with enough detail (esp. on the Go runtime).
However, in some cases I am just interested in the latest value of a variable or field of an object.
Here is the fundamental problem: What does the word "latest" mean?
Suppose that, mathematically speaking, we have a sequence of values Xi, with 0 <= i < N. Then obviously Xj is "later than" Xi if j > i. That's a nice simple definition of "latest" and is probably the one you want.
But when two separate CPUs within a single machine—including two goroutines in a Go program—are working at the same time, time itself loses meaning. We cannot say whether i < j, i == j, or i > j. So there is no correct definition for the word latest.
To solve this kind of problem, modern CPU hardware, and Go as a programming language, give us certain synchronization primitives. If CPUs A and B execute memory fence instructions, or synchronization instructions, or use whatever other hardware provisions exist, the CPUs (and/or some external hardware) will insert whatever is required for the notion of "time" to regain its meaning. That is, if the CPU uses barrier instructions, we can say that a memory load or store that was executed before the barrier is a "before" and a memory load or store that is executed after the barrier is an "after".
(The actual implementation, in some modern hardware, consists of load and store buffers that can rearrange the order in which loads and stores go to memory. The barrier instruction either synchronizes the buffers, or places an actual barrier in them, so that loads and stores cannot move across the barrier. This particular concrete implementation gives an easy way to think about the problem, but isn't complete: you should think of time as simply not existing outside the hardware-provided synchronization, i.e., all loads from, and stores to, some location are happening simultaneously, rather than in some sequential order, except for these barriers.)
In any case, Go's sync package gives you a simple high level access method to these kinds of barriers. Compiled code that executes before a mutex Lock call really does complete before the lock function returns, and the code that executes after the call really does not start until after the lock function returns.
Go's channels provide the same kinds of before/after time guarantees.
Go's sync/atomic package provides much lower level guarantees. In general you should avoid this in favor of the higher level channel or sync.Mutex style guarantees. (Edit to add note: You could use sync/atomic's Pointer operations here, but not with the string type directly, as Go strings are actually implemented as a header containing two separate values: a pointer, and a length. You could solve this with another layer of indirection, by updating a pointer that points to the string object. But before you even consider doing that, you should benchmark the use of the language's preferred methods and verify that these are a problem, because code that works at the sync/atomic level is hard to write and hard to debug.)
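If benchmarks do show that a mutex or channel is too slow, one way to publish a "latest" string is sync/atomic's Value type, which stores the string behind an interface and so provides the extra layer of indirection mentioned above. This is only a sketch; setName and getName are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// latest only ever holds values of type string.
var latest atomic.Value

func setName(s string) { latest.Store(s) }

func getName() string {
	// Load returns nil until the first Store; the comma-ok
	// assertion then yields the empty string.
	v, _ := latest.Load().(string)
	return v
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			setName(fmt.Sprintf("name = %d", i))
		}(i)
	}
	wg.Wait()
	// Which writer "wins" is not deterministic, but the value read
	// is always one complete string written by some goroutine.
	fmt.Println("read:", getName())
}
```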
Which Go data types can I safely read and safely write concurrently without a mutex and without producing segfaults and without reading garbage from memory?
None.
It really is that simple: You cannot, under any circumstances whatsoever, read and write concurrently to anything in Go without synchronization.
(Btw: Your "correct" program is not correct, it is racy and even if you get rid of the race condition it would not deterministically produce the output.)
Why can't you use channels?
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup // wait group to close channel
var buffer int = 1 // buffer of the channel
// channel to get the share data
cName := make(chan string, buffer)
for i := 0; i < 10; i++ {
wg.Add(1) // add to wait group
go func(i int) {
cName <- fmt.Sprintf("name = %d", i)
wg.Done() // decrease wait group.
}(i)
}
go func() {
wg.Wait() // wait of wait group to be 0
close(cName) // close the channel
}()
// process all the data
for n := range cName {
println("read:", n)
}
}
The above code produces output such as the following (the order varies from run to run):
read: name = 0
read: name = 5
read: name = 1
read: name = 2
read: name = 3
read: name = 4
read: name = 7
read: name = 6
read: name = 8
read: name = 9
https://play.golang.org/p/R4n9ssPMOeS
Article about channels
I have encountered a situation that I cannot understand. In my code, some functions need to read a map (but not write to it; they only loop through a snapshot of the existing data in this map). Here is my code:
type MyStruct struct {
*sync.RWMutex
MyMap map[int]MyDatas
}
var MapVar = MyStruct{ &sync.RWMutex{}, make(map[int]MyDatas) }
func MyFunc() {
MapVar.Lock()
MapSnapshot := MapVar.MyMap
MapVar.Unlock()
for _, a := range MapSnapshot { // Map concurrent write/read occur here
//Some stuff
}
}
func main() {
go MyFunc()
}
The function "MyFunc" is run in a goroutine, only once; there are no multiple runs of this func. Many other functions access the same "MapVar" with the same method, and it randomly produces a "map concurrent write/read" error. I hope someone can explain why my code is wrong.
Thank you for your time.
edit: To clarify, I am just asking why my range over MapSnapshot produces a concurrent map write/read. I can't understand how this map can be used concurrently, since I save the real global var (MapVar) in a local var (MapSnapshot) while holding a sync mutex.
edit: Solved. To copy the contents of a map into a new variable that does not share the same reference (and so avoid concurrent map read/write), I must loop through it and write each key and value to a new map.
Thanks xpare and nilsocket.
there is no multiple runs of this func. Many other functions are accessing to the same "MapVar" with the same method and it randomly produce a "map concurrent write/read"
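When you assign MapVar.MyMap to MapSnapshot, no concurrent map write/read can occur during the assignment itself, because that operation is wrapped with the mutex. But the assignment copies only the map reference, not its contents, so MapSnapshot still refers to the same underlying map.
The error can happen in the loop, because that is where the map is actually read, after the lock has been released. So it is better to hold the mutex around the loop as well.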
MapVar.Lock() // lock begin
MapSnapshot := MapVar.MyMap
for _, a := range MapSnapshot {
// Map concurrent write/read occur here
// Some stuff
}
MapVar.Unlock() // lock end
UPDATE 1
Here is my response to your argument below:
This for loop takes a lot of time, there is many stuff in this loop, so locking will slow down other routines
As per your statement that the function "MyFunc" is run in a goroutine only once, with no multiple runs of this func, I think running MyFunc itself as a goroutine is not a good choice.
To improve performance, it is better to run the work inside the loop in goroutines.
func MyFunc() {
MapVar.RLock() // hold the read lock while ranging over the shared map
for _, a := range MapVar.MyMap {
go func(a MyDatas) {
// do stuff here
}(a)
}
MapVar.RUnlock()
}
func main() {
MyFunc() // remove the go keyword
}
UPDATE 2
If you really want to copy MapVar.MyMap into another object, assigning it to another variable will not do that: a map value is a reference to the underlying data, unlike int, float32, or other primitive types, so the assignment copies only the reference.
Please refer to this thread How to copy a map?
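A minimal sketch of taking a real snapshot under the read lock, so that the slow loop can run without holding any lock. The snapshot helper and the Value field of MyDatas are assumptions made for illustration; the other names follow the question.

```go
package main

import (
	"fmt"
	"sync"
)

// MyDatas stands in for the question's value type.
type MyDatas struct{ Value int }

type MyStruct struct {
	sync.RWMutex
	MyMap map[int]MyDatas
}

var MapVar = MyStruct{MyMap: make(map[int]MyDatas)}

// snapshot copies every entry into a new map while holding the
// read lock, so the result shares no storage with MapVar.MyMap.
func snapshot() map[int]MyDatas {
	MapVar.RLock()
	defer MapVar.RUnlock()
	out := make(map[int]MyDatas, len(MapVar.MyMap))
	for k, v := range MapVar.MyMap {
		out[k] = v
	}
	return out
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // concurrent writer
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			MapVar.Lock()
			MapVar.MyMap[i] = MyDatas{Value: i}
			MapVar.Unlock()
		}
	}()
	total := 0
	for _, a := range snapshot() { // safe: ranges over a private copy
		total += a.Value
	}
	wg.Wait()
	fmt.Println("entries after writer finished:", len(snapshot()))
}
```

The copy is shallow: if the values contained pointers, slices, or maps, those would still be shared with the original.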
In code where a global map with an expensive to generate value structure may be modified by multiple concurrent threads, which pattern is correct?
// equivalent to map[string]*activity where activity is a
// fairly heavyweight structure
var ipActivity sync.Map
// version 1: not safe with multiple threads, I think
func incrementIP(ip string) {
val, ok := ipActivity.Load(ip)
if !ok {
val = buildComplexActivityObject()
ipActivity.Store(ip, val)
}
updateTheActivity(val.(*activity), ip)
}
// version 2: inefficient, I think, because a complex object is built
// every time even though it's only needed the first time
func incrementIP(ip string) {
tmp := buildComplexActivityObject()
val, _ := ipActivity.LoadOrStore(ip, tmp)
updateTheActivity(val.(*activity), ip)
}
// version 3: more complex but technically correct?
func incrementIP(ip string) {
val, found := ipActivity.Load(ip)
if !found {
tmp := buildComplexActivityObject()
// using load or store in case the mapping was already made in
// another store
val, _ = ipActivity.LoadOrStore(ip, tmp)
}
updateTheActivity(val.(*activity), ip)
}
Is version three the correct pattern given Go's concurrency model?
Option 1 obviously can be called by multiple goroutines with a new ip concurrently, and only the last one in the if block would get stored. This possibility is greatly increased the longer buildComplexActivityObject takes, as there is more time in the critical section.
Option 2 works, but calls buildComplexActivityObject every time, which you state is not what you want.
Given that you want to call buildComplexActivityObject as infrequently as possible, the third option is the only one that makes sense.
The sync.Map however cannot protect the actual activity values referenced by the stored pointers. You also need synchronization there when updating the activity value.
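A sketch of version 3 combined with per-value synchronization. The fields and methods of activity (mu, hits, record, count) are assumptions, since the question does not show the real structure, and buildComplexActivityObject is reduced to a trivial stand-in.

```go
package main

import (
	"fmt"
	"sync"
)

// activity is a hypothetical stand-in for the heavyweight structure.
type activity struct {
	mu   sync.Mutex
	hits int
}

func (a *activity) record() {
	a.mu.Lock()
	a.hits++
	a.mu.Unlock()
}

func (a *activity) count() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.hits
}

var ipActivity sync.Map // effectively map[string]*activity

func buildComplexActivityObject() *activity { return &activity{} }

func incrementIP(ip string) {
	val, found := ipActivity.Load(ip)
	if !found {
		// Another goroutine may have stored a value in the meantime;
		// LoadOrStore keeps whichever value got there first.
		val, _ = ipActivity.LoadOrStore(ip, buildComplexActivityObject())
	}
	// The map only protects the pointer; the value needs its own lock.
	val.(*activity).record()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			incrementIP("10.0.0.1")
		}()
	}
	wg.Wait()
	val, _ := ipActivity.Load("10.0.0.1")
	fmt.Println("hits:", val.(*activity).count())
}
```

With this shape the constructor may still run more than once for a brand-new key under contention, but only one result is ever stored and updated.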
I recently started writing Go after years of programming in C#, and I'm having a hard time wrapping my head around several concepts of the language. Here's an example of what I'm trying to solve: I'd like to be able to create a routine that iterates over a list, calls a function, and stores the output in a buffered channel. The issue is I want to return a distinct set of these output values, as the function can return similar results for two different elements in the list.
Since Go doesn't have a built-in set type, I'm trying to use a map[string]bool to store distinct values (using map[string]bool or map[string]struct is what others suggested as a replacement for a set); and I'm using a buffered channel to insert into this map, however I'm not certain what the right syntax for inserting 1 element into a map would look like. Here's what I'm trying to do:
resultsChnl := make(chan map[string]bool, len(myList))
go func(myList []string, resultsChnl chan map[string]bool) {
for _, item := range myList {
result, err := getResult(item)
/* error checking */
resultsChnl <- {result: true}
}
close(resultsChnl)
}(myList, resultsChnl)
for item := range resultsChnl {
...
}
Obviously this doesn't compile due to invalid syntax of resultsChnl <- {result: true}. I know this sounds impractical since naturally in this particular case I could create a local map inside the for loop and assign one map[string]bool object to a non-buffered channel and return that, but let's assume I was creating a go routine for each item in the list and really wanted to use a buffered channel (as opposed to using a mutex to grab a lock on a shared map). So is there any way to insert one key-value pair in a map channel? Or am I thinking about this completely wrong?
To answer the question directly, you would want
resultsChnl <- map[string]bool{result: true}
But this doesn't seem useful at all. You may want to collect the results in a map, but there's no reason to pass a map over the channel for each result when you know it will only have one element. Simply use a channel of string, do
resultsChnl <- result
for each result in your producer goroutine, and
seenResult[item] = true
in your consumer loop to collect the results (where seenResult is a map[string]bool).
Or forget about the channel entirely and have your producer goroutines write directly into a sync.Map.
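Putting the channel-of-strings suggestion together, a minimal sketch might look like this. getResult is a hypothetical stand-in (it just lowercases its input so that two items can map to the same result), and distinctResults is a name invented for the example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// getResult is a hypothetical stand-in for the real function.
func getResult(item string) (string, error) {
	return strings.ToLower(item), nil
}

func distinctResults(myList []string) map[string]bool {
	// The buffer holds one result per item, so no producer blocks.
	resultsChnl := make(chan string, len(myList))
	var wg sync.WaitGroup
	for _, item := range myList {
		wg.Add(1)
		go func(item string) {
			defer wg.Done()
			result, err := getResult(item)
			if err != nil {
				return // skip failed items in this sketch
			}
			resultsChnl <- result
		}(item)
	}
	wg.Wait()
	close(resultsChnl)

	// Only this goroutine touches the map, so no lock is needed.
	seenResult := make(map[string]bool)
	for item := range resultsChnl {
		seenResult[item] = true
	}
	return seenResult
}

func main() {
	seen := distinctResults([]string{"A", "a", "B", "b", "C"})
	fmt.Println("distinct results:", len(seen))
}
```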