Race condition with waitgroup and unbuffered channel - go

After getting (the right) solution to my initial problem in this post Understanding golang channels: deadlock, I have come up with a slightly different solution (which in my opinion reads better:
// Binary histogram counts the occurences of each word.
package main
import (
"fmt"
"strings"
"sync"
)
var data = []string{
"The yellow fish swims slowly in the water",
"The brown dog barks loudly after a drink ...",
"The dark bird bird of prey lands on a small ...",
}
func main() {
histogram := make(map[string]int)
words := make(chan string)
var wg sync.WaitGroup
for _, line := range data {
wg.Add(1)
go func(l string) {
for _, w := range strings.Split(l, " ") {
words <- w
}
wg.Done()
}(line)
}
go func() {
for w := range words {
histogram[w]++
}
}()
wg.Wait()
close(words)
fmt.Println(histogram)
}
It does work, but unfortunately running it against race, it shows 2 race conditions:
==================
WARNING: DATA RACE
Read at 0x00c420082180 by main goroutine:
...
Previous write at 0x00c420082180 by goroutine 9:
...
Goroutine 9 (running) created at:
main.main()
Can you help me understand where is the race condition?

You are trying to read from histogram in fmt.Println(histogram) which is not synchronized to the write of the goroutine mutating it histogram[w]++. You can add a lock to synchronize the writes and reads.
e.g.
var lock sync.Mutex
go func() {
lock.Lock()
defer lock.Unlock()
for w := range words {
histogram[w]++
}
}()
//...
lock.Lock()
fmt.Println(histogram)
Note you can also use a sync.RWMutex.
Another thing you could do is to wait for the goroutine mutating histogram to finish.
var histWG sync.WaitGroup
histWG.Add(1)
go func() {
for w := range words {
histogram[w]++
}
histWG.Done()
}()
wg.Wait()
close(words)
histWG.Wait()
fmt.Println(histogram)
Or simply use a channel to wait.
done := make(chan bool)
go func() {
for w := range words {
histogram[w]++
}
done <- true
}()
wg.Wait()
close(words)
<-done
fmt.Println(histogram)

Related

Why does this goroutine not call wg.Done()?

Suppose there are a maximum two elements (worker addresses) on registerChan at any point. Then for some reason, the following code does not call wg.Done() in the last two goroutines.
func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
var ntasks int
var nOther int // number of inputs (for reduce) or outputs (for map)
switch phase {
case mapPhase:
ntasks = len(mapFiles)
nOther = nReduce
case reducePhase:
ntasks = nReduce
nOther = len(mapFiles)
}
fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, nOther)
const rpcname = "Worker.DoTask"
var wg sync.WaitGroup
for taskNumber := 0; taskNumber < ntasks; taskNumber++ {
file := mapFiles[taskNumber%len(mapFiles)]
taskArgs := DoTaskArgs{jobName, file, phase, taskNumber, nOther}
wg.Add(1)
go func(taskArgs DoTaskArgs) {
workerAddr := <-registerChan
print("hello\n")
// _ = call(workerAddr, rpcname, taskArgs, nil)
registerChan <- workerAddr
wg.Done()
}(taskArgs)
}
wg.Wait()
fmt.Printf("Schedule: %v done\n", phase)
}
If I put wg.Done() before registerChan <- workerAddr it works just fine and I have no idea why. I have also tried deferring wg.Done() but that doesn't seem to work even though I expected it to. I think I have some misunderstanding of how go routines and channels work which is causing my confusion.
Because it stopped here:
workerAddr := <-registerChan
For a buffered channel:
To get this workerAddr := <-registerChan to work: the channel registerChan must have a value; otherwise, the code will stop here waiting for the channel.
I managed to run your code this way (try this):
package main
import (
"fmt"
"sync"
)
func main() {
registerChan := make(chan int, 1)
for i := 1; i <= 10; i++ {
wg.Add(1)
go fn(i, registerChan)
}
registerChan <- 0 // seed
wg.Wait()
fmt.Println(<-registerChan)
}
func fn(taskArgs int, registerChan chan int) {
workerAddr := <-registerChan
workerAddr += taskArgs
registerChan <- workerAddr
wg.Done()
}
var wg sync.WaitGroup
Output:
55
Explanation:
This code adds 1 to 10 using a channel and 10 goroutines plus one main goroutine.
I hope this helps.
When you run this statement registerChan <- workerAddr, if the channel capacity is full you cannot add in it and it will block. If you have a pool of, say 10, workerAddr, you could add all of them in a buffered channel of capacity 10 before calling schedule. Do not add after the call, to guarantee that if you take a value from the channel, there is space to add it again after. Using defer at the beginning of your goroutine is good.

Channels terminate prematurely

I am prototyping a series of go routines for a pipeline that each perform a transformation. The routines are terminating before all the data has passed through.
I have checked Donavan and Kernighan book and Googled for solutions.
Here is my code:
package main
import (
"fmt"
"sync"
)
func main() {
a1 := []string{"apple", "apricot"}
chan1 := make(chan string)
chan2 := make(chan string)
chan3 := make(chan string)
var wg sync.WaitGroup
go Pipe1(chan2, chan1, &wg)
go Pipe2(chan3, chan2, &wg)
go Pipe3(chan3, &wg)
func (data []string) {
defer wg.Done()
for _, s := range data {
wg.Add(1)
chan1 <- s
}
go func() {
wg.Wait()
close(chan1)
}()
}(a1)
}
func Pipe1(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
out <- s + "s are"
}
}
func Pipe2(out chan<- string, in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
out <- s + " good for you"
}
}
func Pipe3(in <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for s := range in {
wg.Add(1)
fmt.Println(s)
}
}
My expected output is:
apples are good for you
apricots are good for you
The results of running main are inconsistent. Sometimes I get both lines. Sometimes I just get the apples. Sometimes nothing is output.
As Adrian already pointed out, your WaitGroup.Add and WaitGroup.Done calls are mismatched. However, in cases like this the "I am done" signal is typically given by closing the output channel. WaitGroups are only necessary if work is shared between several goroutines (i.e. several goroutines consume the same channel), which isn't the case here.
package main
import (
"fmt"
)
func main() {
a1 := []string{"apple", "apricot"}
chan1 := make(chan string)
chan2 := make(chan string)
chan3 := make(chan string)
go func() {
for _, s := range a1 {
chan1 <- s
}
close(chan1)
}()
go Pipe1(chan2, chan1)
go Pipe2(chan3, chan2)
// This range loop terminates when chan3 is closed, which Pipe2 does after
// chan2 is closed, which Pipe1 does after chan1 is closed, which the
// anonymous goroutine above does after it sent all values.
for s := range chan3 {
fmt.Println(s)
}
}
func Pipe1(out chan<- string, in <-chan string) {
for s := range in {
out <- s + "s are"
}
close(out) // let caller know that we're done
}
func Pipe2(out chan<- string, in <-chan string) {
for s := range in {
out <- s + " good for you"
}
close(out) // let caller know that we're done
}
Try it on the playground: https://play.golang.org/p/d2J4APjs_lL
You're calling wg.Wait in a goroutine, so main is allowed to return (and therefore your program exits) before the other routines have finished. This would cause the behavior your see, but taking out of a goroutine alone isn't enough.
You're also misusing the WaitGroup in general; your Add and Done calls don't relate to one another, and you don't have as many Dones as you have Adds, so the WaitGroup will never finish. If you're calling Add in a loop, then every loop iteration must also result in a Done call; as you have it now, you defer wg.Done() before each of your loops, then call Add inside the loop, resulting in one Done and many Adds. This code would need to be significantly revised to work as intended.

Understanding golang channels: deadlock

The following code:
package main
import (
"fmt"
"strings"
)
var data = []string{
"The yellow fish swims slowly in the water",
"The brown dog barks loudly after a drink ...",
"The dark bird bird of prey lands on a small ...",
}
func main() {
histogram := make(map[string]int)
words := make(chan string)
for _, line := range data {
go func(l string) {
for _, w := range strings.Split(line, " ") {
words <- w
}
}(line)
}
defer close(words)
for w := range words {
histogram[w]++
}
fmt.Println(histogram)
}
ends with deadlock:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/tmp/sandbox780076580/main.go:28 +0x1e0
My understanding is that channel words will block writers and readers to achieve some synchronization. I'm trying to use a single channel for all goroutines (writers) and a single reader in main (using "range" command).
I have tried also with buffered channels - similar failures.
I have problems to understand why this is not working. Any tips towards understanding?
Thank you.
As stated in the comments to the question, the defer is not executed until main returns. As a result, the range over words blocks forever.
To fix the issue, the application must close words when all of the goroutines are done sending. One way to do this is to use a wait group. The wait group is incremented for each goroutine, decremented when the goroutines exit. Yet another goroutine waits on the group and closes the channel.
func main() {
histogram := make(map[string]int)
words := make(chan string)
var wg sync.WaitGroup
for _, line := range data {
wg.Add(1)
go func(l string) {
for _, w := range strings.Split(l, " ") {
words <- w
}
wg.Done()
}(line)
}
go func() {
wg.Wait()
close(words)
}()
for w := range words {
histogram[w]++
}
fmt.Println(histogram)
}
Bonus fix: The goroutine in the question referred to the loop variable iine instead of the argument l. The FAQ explains why this is a problem.

Closing a Go channel, and syncing a go routine

Im unable to terminate my WaitGroup in go and consequently can't exit the range loop. Can anybody tell me why. Or a better way of limiting the number of go routines while still being able to exit on chan close!
Most examples i have seen relate to a statically typed chan length, but this channel is dynamically resized as a result of other processes.
The print statement ("DONE!") in the example are printed showing that the testValProducer prints the right amount of times but the code never reaches ("--EXIT--") which means wg.Wait is still blocking somehow.
type TestValContainer chan string
func StartFunc(){
testValContainer := make(TestValContainer)
go func(){testValContainer <- "string val 1"}()
go func(){testValContainer <- "string val 2"}()
go func(){testValContainer <- "string val 3"}()
go func(){testValContainer <- "string val 4"}()
go func(){testValContainer <- "string val 5"}()
go func(){testValContainer <- "string val 6"}()
go func(){testValContainer <- "string val 7"}()
wg := sync.WaitGroup{}
// limit the number of worker goroutines
for i:=0; i < 3; i++ {
wg.Add(1)
go func(){
v := i
fmt.Printf("launching %v", i)
for str := range testValContainer{
testValProducer(str, &wg)
}
fmt.Println(v, "--EXIT --") // never called
}()
}
wg.Wait()
close(testValContainer)
}
func get(url string){
http.Get(url)
ch <- getUnvisited()
}
func testValProducer(testStr string, wg *sync.WaitGroup){
doSomething(testStr)
fmt.Println("done !") // called
wg.Done() // NO EFFECT??
}
I might do something like this, it keeps everything easy to follow. I define a structure which implements a semaphore to control the number of active Go routines spinning up... and allows me to read from the channel as they come in.
package main
import (
"fmt"
"sync"
)
type TestValContainer struct {
wg sync.WaitGroup
sema chan struct{}
data chan int
}
func doSomething(number int) {
fmt.Println(number)
}
func main() {
//semaphore limit 10 routines at time
tvc := TestValContainer{
sema: make(chan struct{}, 10),
data: make(chan int),
}
for i := 0; i <= 100; i++ {
tvc.wg.Add(1)
go func(i int) {
tvc.sema <- struct{}{}
defer func() {
<-tvc.sema
tvc.wg.Done()
}()
tvc.data <- i
}(i)
}
// wait in the background so that waiting and closing the channel dont
// block the for loop below
go func() {
tvc.wg.Wait()
close(tvc.data)
}()
// get channel results
for res := range tvc.data {
doSomething(res)
}
}
In your example you have two errors:
You are calling wg.Done in side the loop in each worker thread rather than at the end of the worker thread (right before it completes). The calls to wg.Done must be matched one-to-one with wg.Add(1)s.
With that fixed, there is a deadlock where the main thread is waiting for the worker threads to complete, while the worker threads area waiting for the input channel to be closed by the main thread.
The logic will be cleaner and easier to understand if you separate the producer side from the consumer side more clearly. Run a separate goroutine for each side. Example:
// Producer side (only write and close allowed).
go func() {
testValContainer <- "string val 1"
testValContainer <- "string val 2"
testValContainer <- "string val 3"
testValContainer <- "string val 4"
testValContainer <- "string val 5"
testValContainer <- "string val 6"
testValContainer <- "string val 7"
close(testValContainer) // Signals that production is done.
}()
// Consumer side (only read allowed).
for i:=0; i < 3; i++ {
wg.Add(1)
go func() {
defer wg.Done()
v := i
fmt.Printf("launching %v", i)
for str := range testValContainer {
doSomething(str)
}
fmt.Println(v, "--EXIT --")
}()
}
wg.Wait()
If the items are being produced from some other source, potentially a collection of goroutines, you should still have either: 1) a separate goroutine or logic somewhere that oversees that production and calls close once it's done, or 2) make your main thread wait for the production side to complete (e.g. with a WaitGroup waiting for the producer goroutines) and close the channel before waiting for the consumptions side.
If you think about it, no matter how you arrange the logic you need to have some "side-channel" way of detecting, in one single synchronised place, that there are no more messages being produced. Otherwise you can never know when the channel should be closed.
In other words, you can't wait for the range loops on the consumer side to complete to trigger the close, as this leads to a catch 22.

Best way of using sync.WaitGroup with external function

I have some issues with the following code:
package main
import (
"fmt"
"sync"
)
// This program should go to 11, but sometimes it only prints 1 to 10.
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, wg) //
go func(){
for i := 1; i <= 11; i++ {
ch <- i
}
close(ch)
defer wg.Done()
}()
wg.Wait() //deadlock here
}
// Print prints all numbers sent on the channel.
// The function returns when the channel is closed.
func Print(ch <-chan int, wg sync.WaitGroup) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
defer wg.Done()
}
I get a deadlock at the specified place. I have tried setting wg.Add(1) instead of 2 and it solves my problem. My belief is that I'm not successfully sending the channel as an argument to the Printer function. Is there a way to do that? Otherwise, a solution to my problem is replacing the go Print(ch, wg)line with:
go func() {
Print(ch)
defer wg.Done()
}
and changing the Printer function to:
func Print(ch <-chan int) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
}
What is the best solution?
Well, first your actual error is that you're giving the Print method a copy of the sync.WaitGroup, so it doesn't call the Done() method on the one you're Wait()ing on.
Try this instead:
package main
import (
"fmt"
"sync"
)
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, &wg)
go func() {
for i := 1; i <= 11; i++ {
ch <- i
}
close(ch)
defer wg.Done()
}()
wg.Wait() //deadlock here
}
func Print(ch <-chan int, wg *sync.WaitGroup) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
defer wg.Done()
}
Now, changing your Print method to remove the WaitGroup of it is a generally good idea: the method doesn't need to know something is waiting for it to finish its job.
I agree with #Elwinar's solution, that the main problem in your code caused by passing a copy of your Waitgroup to the Print function.
This means the wg.Done() is operated on a copy of wg you defined in the main. Therefore, wg in the main could not get decreased, and thus a deadlock happens when you wg.Wait() in main.
Since you are also asking about the best practice, I could give you some suggestions of my own:
Don't remove defer wg.Done() in Print. Since your goroutine in main is a sender, and print is a receiver, removing wg.Done() in receiver routine will cause an unfinished receiver. This is because only your sender is synced with your main, so after your sender is done, your main is done, but it's possible that the receiver is still working. My point is: don't leave some dangling goroutines around after your main routine is finished. Close them or wait for them.
Remember to do panic recovery everywhere, especially anonymous goroutine. I have seen a lot of golang programmers forgetting to put panic recovery in goroutines, even if they remember to put recover in normal functions. It's critical when you want your code to behave correctly or at least gracefully when something unexpected happened.
Use defer before every critical calls, like sync related calls, at the beginning since you don't know where the code could break. Let's say you removed defer before wg.Done(), and a panic occurrs in your anonymous goroutine in your example. If you don't have panic recover, it will panic. But what happens if you have a panic recover? Everything's fine now? No. You will get deadlock at wg.Wait() since your wg.Done() gets skipped because of panic! However, by using defer, this wg.Done() will be executed at the end, even if panic happened. Also, defer before close is important too, since its result also affects the communication.
So here is the code modified according to the points I mentioned above:
package main
import (
"fmt"
"sync"
)
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, &wg)
go func() {
defer func() {
if r := recover(); r != nil {
println("panic:" + r.(string))
}
}()
defer func() {
wg.Done()
}()
for i := 1; i <= 11; i++ {
ch <- i
if i == 7 {
panic("ahaha")
}
}
println("sender done")
close(ch)
}()
wg.Wait()
}
func Print(ch <-chan int, wg *sync.WaitGroup) {
defer func() {
if r := recover(); r != nil {
println("panic:" + r.(string))
}
}()
defer wg.Done()
for n := range ch {
fmt.Println(n)
}
println("print done")
}
Hope it helps :)

Resources