The Go memory model states the following:
Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access. To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.
Sadly, the documentation doesn't specify which synchronization primitives it refers to.
My question is: is WaitGroup included among those synchronization primitives?
According to the above statement, in the following code, done might never be seen as true:
var a string
var done bool

func setup() {
    a = "hello, world"
    done = true
}

func main() {
    go setup()
    for !done {
    }
    print(a)
}
Does WaitGroup guarantee that once waitgroup.Wait() has returned, values written to shared memory during the group's goroutines' execution will be seen with their most up-to-date values?
In other words, is the following code guaranteed to print "hello"? (Playground)
package main

import (
    "sync"
)

var a string = "start"
var wg sync.WaitGroup

func hello() {
    wg.Add(1)
    go func() {
        a = "hello"
        wg.Done()
    }()
    wg.Wait()
    print(a)
}

func main() {
    hello()
}
The answers to your questions are all: yes. WaitGroup is a synchronization primitive which is why it is in the sync package. The call to Wait() guarantees that you will see the correct value of a. The code is guaranteed to print "hello".
There's not much else to say.
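For illustration only, here is a minimal sketch (not part of the original answer) of the first snippet rewritten with a WaitGroup in place of the busy loop on done; it relies on the documented guarantee that a call to wg.Done() happens before the wg.Wait() call it unblocks returns:

package main

import "sync"

var a string
var wg sync.WaitGroup

func setup() {
    defer wg.Done()
    a = "hello, world"
}

func main() {
    wg.Add(1)
    go setup()
    wg.Wait() // Done happens before Wait returns, so the write to a is visible here
    print(a)
}

A channel would work equally well; the WaitGroup version is shown only because it is what the question asks about.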
Related
Are there any concurrency issues if we access mutually exclusive fields of a struct inside different goroutines?
I remember reading somewhere that if two parallel threads access the same object, they might run on different CPU cores, each with its own CPU-level cache holding a different copy of the object in question. (Not related to Go.)
Will the code below be sufficient to achieve this correctly, or does an additional synchronization mechanism need to be used?
package main

import (
    "fmt"
    "sync"
)

type structure struct {
    x string
    y string
}

func main() {
    val := structure{}
    wg := new(sync.WaitGroup)
    wg.Add(2)
    go func1(&val, wg)
    go func2(&val, wg)
    wg.Wait()
    fmt.Println(val)
}

func func1(val *structure, wg *sync.WaitGroup) {
    val.x = "Test 1"
    wg.Done()
}

func func2(val *structure, wg *sync.WaitGroup) {
    val.y = "Test 2"
    wg.Done()
}
Edit: for people asking "why not channels": unfortunately this is not the actual code I am working on. Both funcs call different APIs and get their data in a struct, and those structs have a pragma.DoNotCopy in them (ask the protobuf auto-generator why they thought that was a good idea). So that data can't be sent over a channel, or else I would have to create another struct to send the data, or ask the linter to stop complaining. Or I could send a pointer to the object, but that feels like sharing memory too.
You must synchronize when at least one of the accesses to a shared resource is a write.
Your code is doing write access, yes, but different struct fields have different memory locations. So you are not accessing shared variables.
If you run your program with the race detector, e.g. go run -race main.go, it will not print a warning.
Now add fmt.Println(val.y) in func1 and run again; it will print:
WARNING: DATA RACE
Write at 0x00c0000c0010 by goroutine 8:
... rest of race warning
The preferred way in Go is to share memory by communicating, as opposed to communicating by sharing memory.
In practice that means you should use Go channels, as I show in this blog post:
https://marcofranssen.nl/concurrency-in-go
If you really want to stick with sharing memory, you will have to use a mutex.
https://tour.golang.org/concurrency/9
However, that causes context switching and goroutine synchronization, which slows down your program.
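For reference, here is a minimal sketch (not taken from the question) of what the mutex variant of the struct example could look like; as noted above, the separate field writes don't strictly require it, since the WaitGroup already provides the synchronization main needs before reading val:

package main

import (
    "fmt"
    "sync"
)

type structure struct {
    x string
    y string
}

func main() {
    var mu sync.Mutex
    var wg sync.WaitGroup
    val := structure{}

    wg.Add(2)
    go func() {
        defer wg.Done()
        mu.Lock() // serialize access to val
        val.x = "Test 1"
        mu.Unlock()
    }()
    go func() {
        defer wg.Done()
        mu.Lock()
        val.y = "Test 2"
        mu.Unlock()
    }()
    wg.Wait()
    fmt.Println(val)
}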
Example using channels
package main

import (
    "fmt"
    "time"
)

type structure struct {
    x string
    y string
}

func main() {
    val := structure{}
    c := make(chan structure)
    go func1(c)
    go func2(c)
    func(c chan structure) {
        for {
            select {
            case v, ok := <-c:
                if !ok {
                    return
                }
                if v.x != "" {
                    fmt.Printf("Received %v\n", v)
                    val.x = v.x
                }
                if v.y != "" {
                    fmt.Printf("Received %v\n", v)
                    val.y = v.y
                }
                if val.x != "" && val.y != "" {
                    close(c)
                }
            }
        }
    }(c)
    fmt.Printf("%v\n", val)
}

func func1(c chan<- structure) {
    time.Sleep(1 * time.Second)
    c <- structure{x: "Test 1"}
}

func func2(c chan<- structure) {
    c <- structure{y: "Test 2"}
}
A very simple and common case in Go, shown below, but the result is not what I expected.
package main

import (
    "fmt"
    "time"
)

func main() {
    consumer(generator())
    for {
        time.Sleep(time.Duration(time.Second))
    }
}

// simple generator through channel
func generator() <-chan []byte {
    ret := make(chan []byte)
    go func() {
        // make buf outside of loop, and result is not expected
        var ch = byte('A')
        count := 0
        buf := make([]byte, 1)
        for {
            if count > 10 {
                return
            }
            // make buf inside loop, and result is expected
            // buf := make([]byte, 1)
            buf[0] = ch
            ret <- buf
            ch++
            count++
            // time.Sleep(time.Duration(time.Second))
        }
    }()
    return ret
}

// simple consumer through channel
func consumer(recv <-chan []byte) {
    go func() {
        for buf := range recv {
            fmt.Println("received:" + string(buf[0]))
        }
    }()
}
output:
received:A
received:B
received:D
received:D
received:F
received:F
received:H
received:H
received:J
received:J
received:K
In generator, if I put the buf variable inside the for loop, the result is what I expected:
received:A
received:B
received:C
received:D
received:E
received:F
received:G
received:H
received:I
received:J
received:K
I was thinking that even with buf outside the for loop and reused, after we write it to the channel the receiver will read it before the next write can happen, so its content should not be overwritten. But it looks like Go doesn't behave this way. What is going wrong here?
Problem: your code contains a data race
Save your program in a file named main.go; then run it with the race detector: go run -race main.go. You should see something like the following:
$ go run -race main.go
received:A
==================
WARNING: DATA RACE
Write at 0x00c000180000 by goroutine 7:
main.generator.func1()
/redacted/main.go:29 +0x8c
Previous read at 0x00c000180000 by goroutine 8:
main.consumer.func1()
/redacted/main.go:43 +0x55
The race detector tells you your program contains a data race because two goroutines are writing and reading to some shared memory without synchronisation:
the anonymous function launched as a goroutine in your generator function updates its local variable named buf at line 29;
the anonymous function launched as a goroutine in your consumer function reads from its local variable named buf at line 43.
The data race stems from the conjunction of two things:
Although local variable buf in consumer is just a copy of the homonymous local variable in generator, those slice variables are coupled because they refer to the same underlying array.
See [the relevant section of the language specification](https://golang.org/ref/spec#Slice_types):
A slice, once initialized, is always associated with an underlying array that holds its elements. A slice therefore shares storage with its array and with other slices of the same array [...]
Operations on slices are not concurrency-safe and require proper synchronisation if performed concurrently (i.e. from multiple goroutines at the same time).
What your code displays is a typical case of aliasing. You should familiarise yourself better with how slices work.
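As a tiny standalone illustration of that aliasing (no goroutines involved, hypothetical variable names):

package main

import "fmt"

func main() {
    a := make([]byte, 1)
    b := a // b copies only the slice header; both slices share one underlying array
    a[0] = 'X'
    fmt.Println(string(b[0])) // prints "X": the write through a is visible through b
}

A write through one slice variable is observable through every other slice sharing the same backing array, which is exactly what couples buf in generator to buf in consumer.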
Solution
You could eliminate the data race by using a one-byte array ([1]byte) instead of a slice, but arrays are quite inflexible in Go. Whether you really need to use a slice of bytes at all here is unclear. Since you're effectively only sending one byte at a time to the channel, why not simply use a chan byte rather than a chan []byte?
Other improvements unrelated to the data race include:
modifying the API of your two functions to make them synchronous (and therefore, easier to reason about);
simplifying the generator logic and closing the channel so that main can actually terminate;
simplifying the consumer logic and not spawning a goroutine for it.
package main

import "fmt"

func main() {
    ch := make(chan byte)
    go generator(ch)
    consumer(ch)
}

func generator(ch chan<- byte) {
    var c byte = 'A'
    for i := 0; i < 10; i++ {
        ch <- c
        c++
    }
    close(ch)
}

func consumer(ch <-chan byte) {
    for c := range ch {
        fmt.Printf("received: %c\n", c)
    }
}
The case is very simple. Both goroutines share the buffer, so the channel alone does not guarantee synchronization. While the consumer is reading from the channel, the generator is fast enough to modify the buffer, which is why the character skips happen. To fix this you have to introduce another channel (that sends the buffer back) or pass a copy of the buffer.
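For completeness, here is a minimal sketch of the "pass a copy" option mentioned above, keeping the question's chan []byte API rather than switching to chan byte; the 'A'..'K' range stands in for the question's eleven values:

package main

import "fmt"

// generator sends a fresh copy of the buffer on every iteration, so the
// receiver never aliases the slice that the generator keeps mutating.
func generator() <-chan []byte {
    ret := make(chan []byte)
    go func() {
        defer close(ret)
        buf := make([]byte, 1)
        for ch := byte('A'); ch <= 'K'; ch++ {
            buf[0] = ch
            out := make([]byte, len(buf))
            copy(out, buf) // hand the receiver its own copy
            ret <- out
        }
    }()
    return ret
}

func main() {
    for buf := range generator() {
        fmt.Println("received:" + string(buf[0]))
    }
}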
I am reading the book Concurrency in Go by Katherine Cox-Buday, and there is an example of how to use Cond.Broadcast to wake up waiting goroutines. Here is the example from the book:
package main

import (
    "fmt"
    "sync"
)

type Button struct {
    Clicked *sync.Cond
}

func main() {
    button := &Button{Clicked: &sync.Cond{L: &sync.Mutex{}}}

    subscribe := func(c *sync.Cond, fn func()) {
        var goroutineRunning sync.WaitGroup
        goroutineRunning.Add(1)
        go func() {
            goroutineRunning.Done() //1
            c.L.Lock()
            // 2
            defer c.L.Unlock()
            c.Wait()
            fn()
        }()
        goroutineRunning.Wait()
    }

    var clickedRegistred sync.WaitGroup
    clickedRegistred.Add(2)
    subscribe(button.Clicked, func() {
        fmt.Println("Func #1")
        clickedRegistred.Done()
    })
    subscribe(button.Clicked, func() {
        fmt.Println("Func #2")
        clickedRegistred.Done()
    })

    button.Clicked.Broadcast()
    clickedRegistred.Wait()
}
Is this code 100% concurrency safe?
Is it possible that the main goroutine continues its work and calls Broadcast before the func goroutines have parked and are waiting for the signal/broadcast from the condition?
I mean the situation where the func goroutines have done their work at line //1 but have not yet acquired the condition lock, and the main goroutine does the broadcast at that moment; is that possible?
My assumption is the following: should line //1 be moved to place //2 to avoid this situation?
Or is there no problem, and the situation I described does not make sense?
Your question has two parts, I think:
Does this code have a potential race bug? Yes: see, e.g., How do I know that all my goroutines are indeed waiting for a condition using golang's sync package
Is moving the Done call a fix? In this particular code, yes, but I would say, perhaps not a particularly good fix. As Andy Schweig notes in a comment on his own answer, it's more typical for code that uses the condition's Wait method to loop:
for some_boolean { c.Wait() }
The code that calls c.Signal or c.Broadcast would set the boolean appropriately (here, to false) before calling Signal or Broadcast, to indicate that the waiter should now stop waiting.
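Here is a minimal standalone sketch of that pattern (not the book's code); the boolean is called waiting here and plays the role of some_boolean:

package main

import (
    "fmt"
    "sync"
)

func main() {
    c := sync.NewCond(&sync.Mutex{})
    waiting := true // the boolean the waiters loop on; only accessed under c.L

    var done sync.WaitGroup
    done.Add(2)
    for i := 1; i <= 2; i++ {
        go func(id int) {
            defer done.Done()
            c.L.Lock()
            for waiting { // loop instead of a bare c.Wait()
                c.Wait()
            }
            c.L.Unlock()
            fmt.Printf("Func #%d\n", id)
        }(i)
    }

    // The broadcaster flips the flag under the lock before broadcasting, so a
    // goroutine that has not parked yet will see waiting == false and skip
    // c.Wait() entirely; the wakeup cannot be lost.
    c.L.Lock()
    waiting = false
    c.L.Unlock()
    c.Broadcast()

    done.Wait()
}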
This might be a rookie mistake. I have a slice with a string value and a map of channels. For each string in the slice, a channel is created and a map entry is created for it, with the string as the key.
I watch the channels and pass a value to one of them, but it is never received.
package main

import (
    "fmt"
    "time"
)

type TestStruct struct {
    Test string
}

var channelsMap map[string](chan *TestStruct)

func main() {
    stringsSlice := []string{"value1"}
    channelsMap := make(map[string](chan *TestStruct))
    for _, value := range stringsSlice {
        channelsMap[value] = make(chan *TestStruct, 1)
        go watchChannel(value)
    }
    <-time.After(3 * time.Second)
    testStruct := new(TestStruct)
    testStruct.Test = "Hello!"
    channelsMap["value1"] <- testStruct
    <-time.After(3 * time.Second)
    fmt.Println("Program ended")
}

func watchChannel(channelMapKey string) {
    fmt.Println("Watching channel: " + channelMapKey)
    for channelValue := range channelsMap[channelMapKey] {
        fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
    }
}
Playground link: https://play.golang.org/p/IbucTqMjdGO
Output:
Watching channel: value1
Program ended
How do I execute something when the message is fed into the channel?
There are many problems with your approach.
The first one is that you're redeclaring ("shadowing") the global variable channelsMap in your main function. (Had you completed at least a basic intro to Go, you would have had no such problem.)
This means that your watchChannel (actually, all the goroutines which execute that function) reads the global channelsMap while your main function writes to its local channelsMap.
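A tiny standalone illustration of that shadowing (hypothetical names, not taken from the question's code):

package main

import "fmt"

var m map[string]int // package-level variable

func main() {
    m := make(map[string]int) // := declares a NEW local m that shadows the package-level one
    m["a"] = 1
    fmt.Println(m)             // the local map: map[a:1]
    fmt.Println(globalIsNil()) // the package-level m was never touched: true
}

func globalIsNil() bool { return m == nil }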
What happens next is as follows:
- The range statement in watchChannel has a simple map lookup expression as its source—channelsMap[channelMapKey].
- In Go, this form of map lookup never fails, but if the map has no such key (or if the map is not initialized, that is, it's nil), the so-called "zero value" of the appropriate type is returned.
- Since the global channelsMap is always empty, any call to watchChannel performs a map lookup which always returns the zero value of type chan *TestStruct. The zero value for any channel is nil.
- The range statement executed over a nil channel produces zero iterations. In other words, the for loop in watchChannel always executes zero times.
The more complex problem, still, is not the shadowing of the global variable but rather the complete absence of synchronization between the goroutines. You're using "sleeping" as a sort of band-aid in an attempt to perform implicit synchronization between goroutines, but while this does appear to be okay judged by so-called "common sense", it's not going to work in practice for two reasons:
- Sleeping is always a naïve approach to synchronization, as it solely depends on the fact that all the goroutines will run relatively freely and uncontended. This is far from being true in many (if not most) production settings and hence is always the reason for subtle bugs. Don't ever do that again, please.
- Nothing in the Go memory model says that waiting against wall-clock timing is considered by the runtime as establishing the order in which the execution of different goroutines relates to each other.
There exist various ways to synchronize execution between goroutines. Basically they amount to sends and receives over channels and using the types provided by the sync package.
In your particular case the simplest approach is probably using the sync.WaitGroup type.
Here is what we would have after fixing the problems explained above:
- Initialize the map variable right at the point of its definition and do not mess with it in main.
- Use sync.WaitGroup to make main properly wait for all the goroutines it spawned to signal they're done:
package main

import (
    "fmt"
    "sync"
)

type TestStruct struct {
    Test string
}

var channelsMap = make(map[string](chan *TestStruct))

func main() {
    stringsSlice := []string{"value1"}

    var wg sync.WaitGroup
    wg.Add(len(stringsSlice))
    for _, value := range stringsSlice {
        channelsMap[value] = make(chan *TestStruct, 1)
        go watchChannel(value, &wg)
    }

    testStruct := new(TestStruct)
    testStruct.Test = "Hello!"
    channelsMap["value1"] <- testStruct

    wg.Wait()
    fmt.Println("Program ended")
}

func watchChannel(channelMapKey string, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Watching channel: " + channelMapKey)
    for channelValue := range channelsMap[channelMapKey] {
        fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
    }
}
The next two problems with your code become apparent once we have fixed the former two—after you make the "watcher" goroutines use the same map variable as the goroutine running main, and make the latter properly wait for the watchers:
- There is a data race over the map variable, between the code which updates the map after the for loop spawning the watcher goroutines has ended and the code which accesses this variable in all the watcher goroutines.
- There is a deadlock between the watcher goroutines and the main goroutine which waits for them to complete. The reason for the deadlock is that the watcher goroutines never receive any signal that they have to quit processing and hence are stuck forever trying to read from their respective channels.
The ways to fix these two new problems are simple, but they might actually "break" your original idea of structuring your code.
First, I'd remove the data race by simply making the watchers not access the map variable. As you can see, each call to watchChannel receives a single value to use as the key to read a value off the shared map, and hence each watcher always reads a single value exactly once during its run time. The code would become much clearer if we remove this extra map access altogether and instead pass the appropriate channel value directly to each watcher.
A nice byproduct of this is that we do not need a global map variable anymore.
Here's what we'll get:
package main

import (
    "fmt"
    "sync"
)

type TestStruct struct {
    Test string
}

func main() {
    stringsSlice := []string{"value1"}
    channelsMap := make(map[string](chan *TestStruct))

    var wg sync.WaitGroup
    wg.Add(len(stringsSlice))
    for _, value := range stringsSlice {
        channelsMap[value] = make(chan *TestStruct, 1)
        go watchChannel(value, channelsMap[value], &wg)
    }

    testStruct := new(TestStruct)
    testStruct.Test = "Hello!"
    channelsMap["value1"] <- testStruct

    wg.Wait()
    fmt.Println("Program ended")
}

func watchChannel(channelMapKey string, ch <-chan *TestStruct, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Watching channel: " + channelMapKey)
    for channelValue := range ch {
        fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
    }
}
Okay, we still have the deadlock.
There are multiple approaches to solving this, but they depend on the actual circumstances, and with this toy example any attempt to iterate over even a subset of them would just muddy the waters.
Instead, let's employ the simplest one for this case: closing a channel makes any pending receive operation on it immediately unblock and produce the zero value for the channel's type. For a channel being iterated over using the range statement, it simply means the statement terminates without producing any value from the channel.
In other words, let's just close all the channels to unblock the range statements being run by the watcher goroutines and then wait for these goroutines to report their completion via the wait group.
To not make the answer overly long, I also added programmatic initialization of the string slice to make the example more interesting by having multiple watchers—not just a single one—actually do useful work:
package main

import (
    "fmt"
    "sync"
)

type TestStruct struct {
    Test string
}

func main() {
    var stringsSlice []string
    channelsMap := make(map[string](chan *TestStruct))
    for i := 1; i <= 10; i++ {
        stringsSlice = append(stringsSlice, fmt.Sprintf("value%d", i))
    }

    var wg sync.WaitGroup
    wg.Add(len(stringsSlice))
    for _, value := range stringsSlice {
        channelsMap[value] = make(chan *TestStruct, 1)
        go watchChannel(value, channelsMap[value], &wg)
    }

    for _, value := range stringsSlice {
        testStruct := new(TestStruct)
        testStruct.Test = fmt.Sprint("Hello! ", value)
        channelsMap[value] <- testStruct
    }

    for _, ch := range channelsMap {
        close(ch)
    }

    wg.Wait()
    fmt.Println("Program ended")
}

func watchChannel(channelMapKey string, ch <-chan *TestStruct, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Watching channel: " + channelMapKey)
    for channelValue := range ch {
        fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
    }
}
Playground link.
As you can see, there are things you should learn about in much greater detail before embarking on working with concurrency.
I'd recommend proceeding in the following order:
The Go tour would make you accustomed with the bare bones of concurrency.
The Go Programming Language has two chapters dedicated to providing the readers with a gentle introduction with tackling concurrency both using channels and the types from the sync package.
Concurrency In Go goes on with presenting more hard-core details of how one deals with concurrency in Go, including advanced topics approaching the real-world problems concurrent programs face in production—such as ways to rate-limit incoming requests.
The shadowing in main of channelsMap mentioned above was a critical bug, but aside from that, the program was playing "Russian roulette" with the calls to time.After so that main wouldn't finish before the watcher goroutines did. This is unstable and unreliable, so I recommend the following approach using a channel to signal when all watcher goroutines are done:
package main

import (
    "fmt"
)

type TestStruct struct {
    Test string
}

var channelsMap map[string](chan *TestStruct)

func main() {
    stringsSlice := []string{"value1", "value2", "value3"}
    structsSlice := []TestStruct{
        {"Hello1"},
        {"Hello2"},
        {"Hello3"},
    }

    channelsMap = make(map[string](chan *TestStruct))

    // Signal channel to wait for watcher goroutines.
    done := make(chan struct{})

    for _, s := range stringsSlice {
        channelsMap[s] = make(chan *TestStruct)
        // Give watcher goroutines the signal channel.
        go watchChannel(s, done)
    }

    for _, ts := range structsSlice {
        ts := ts // per-iteration copy, so each send gets its own pointer (needed before Go 1.22)
        for _, s := range stringsSlice {
            channelsMap[s] <- &ts
        }
    }

    // Close the channels so watcher goroutines can finish.
    for _, s := range stringsSlice {
        close(channelsMap[s])
    }

    // Wait for all watcher goroutines to finish.
    for range stringsSlice {
        <-done
    }

    // Now we're really done!
    fmt.Println("Program ended")
}

func watchChannel(channelMapKey string, done chan<- struct{}) {
    fmt.Println("Watching channel: " + channelMapKey)
    for channelValue := range channelsMap[channelMapKey] {
        fmt.Printf("Channel '%s' used. Passed value: '%s'\n", channelMapKey, channelValue.Test)
    }
    done <- struct{}{}
}
(Go Playground link: https://play.golang.org/p/eP57Ru44-NW)
Of importance is the use of the done channel to let watcher goroutines signal to main that they're finished. Another critical part is the closing of the channels once you're done with them. If you don't close them, the range loops in the watcher goroutines will never end, waiting forever. Once you close a channel, the range loop exits and the watcher goroutine can send on the done channel, signaling that it has finished working.
Finally, back in main, you have to receive on the done channel once for each watcher goroutine you created. Since the number of watcher goroutines is equal to the number of items in stringsSlice, you simply range over stringsSlice to receive the correct number of times from the done channel. Once that's finished, the main function can exit with a guarantee that all watchers have finished.
I'm trying to understand one of the typical Go data races, where access to an unprotected global variable from multiple goroutines may cause a race condition:
var service map[string]net.Addr

func RegisterService(name string, addr net.Addr) {
    service[name] = addr
}

func LookupService(name string) net.Addr {
    return service[name]
}
It goes on to say that we can solve this by protecting it with a mutex:
var (
    service   map[string]net.Addr
    serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
    serviceMu.Lock()
    defer serviceMu.Unlock()
    service[name] = addr
}

func LookupService(name string) net.Addr {
    serviceMu.Lock()
    defer serviceMu.Unlock()
    return service[name]
}
So far, so good. What's confusing me is this:
The accepted answer to this question suggests that a CPU-bound goroutine will starve any other goroutines that have been multiplexed onto the same OS thread (unless we explicitly yield with runtime.Gosched()). Which makes sense.
To me, the RegisterService() and LookupService() functions above look to be CPU bound, as there's no IO and no yield. Is this correct?
If it is, and if GOMAXPROCS is set to 1, then is a mutex in the above example still strictly necessary? Doesn't the fact that the goroutines are CPU-bound at the point where race conditions might occur take care of it?
Even if it does, I presume that in real life using a mutex here is still a good idea, as we may not be able to guarantee what GOMAXPROCS is set to. Are there other reasons?
As FUZxxl and Nick Craig-Wood noted, the current behavior of goroutines is implementation-specific, so a read from or write to the map may yield. Given that maps are not thread-safe, a mutex or another form of synchronization is required for correct concurrent access.
As an alternative to a mutex, you can spawn a goroutine that performs all operations on your map and talk to it over channels:
type writereq struct {
    key   string
    value net.Addr
    reply chan struct{}
}

type readreq struct {
    key   string
    reply chan net.Addr
}

var service = make(map[string]net.Addr)
var reads = make(chan readreq)
var writes = make(chan writereq)

func RegisterService(name string, addr net.Addr) {
    w := writereq{name, addr, make(chan struct{})}
    writes <- w
    <-w.reply // wait for registration confirmation
}

func LookupService(name string) net.Addr {
    r := readreq{name, make(chan net.Addr)}
    reads <- r
    return <-r.reply
}

func serveRegistry() {
    for {
        select {
        case r := <-reads:
            r.reply <- service[r.key]
        case w := <-writes:
            service[w.key] = w.value
            w.reply <- struct{}{}
        }
    }
}
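Presumably serveRegistry would be started once, before any callers use the registry; here is a hedged usage sketch (a hypothetical main, assuming the fmt and net imports alongside the code above):

func main() {
    go serveRegistry() // this goroutine is the only one that ever touches the service map

    addr := &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 5432}
    RegisterService("db", addr)      // blocks until the registry confirms the write
    fmt.Println(LookupService("db")) // 127.0.0.1:5432
}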
}