The following code:
package main
import (
	"fmt"
	"strings"
)
var data = []string{
	"The yellow fish swims slowly in the water",
	"The brown dog barks loudly after a drink ...",
	"The dark bird bird of prey lands on a small ...",
}
func main() {
	histogram := make(map[string]int)
	words := make(chan string)
	for _, line := range data {
		go func(l string) {
			for _, w := range strings.Split(line, " ") {
				words <- w
			}
		}(line)
	}
	defer close(words)
	for w := range words {
		histogram[w]++
	}
	fmt.Println(histogram)
}
ends with a deadlock:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/tmp/sandbox780076580/main.go:28 +0x1e0
My understanding is that the channel words will block writers and readers to achieve some synchronization. I'm trying to use a single channel for all goroutines (writers) and a single reader in main (using a range loop).
I have also tried buffered channels, with similar failures.
I have trouble understanding why this is not working. Any tips towards understanding?
Thank you.
As stated in the comments to the question, the deferred close is not executed until main returns. As a result, words is never closed, and the range over words blocks forever once the senders have finished: no goroutine is left that could send or close, so the runtime reports a deadlock.
To fix the issue, the application must close words when all of the goroutines are done sending. One way to do this is to use a wait group. The wait group is incremented for each goroutine and decremented when each goroutine exits. Yet another goroutine waits on the group and closes the channel.
func main() {
	histogram := make(map[string]int)
	words := make(chan string)
	var wg sync.WaitGroup
	for _, line := range data {
		wg.Add(1)
		go func(l string) {
			for _, w := range strings.Split(l, " ") {
				words <- w
			}
			wg.Done()
		}(line)
	}
	go func() {
		wg.Wait()
		close(words)
	}()
	for w := range words {
		histogram[w]++
	}
	fmt.Println(histogram)
}
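Bonus fix: The goroutine in the question referred to the loop variable line instead of the argument l. The Go FAQ ("What happens with closures running as goroutines?") explains why this is a problem: before Go 1.22, every iteration shared a single line variable, so the goroutines could see whatever value it held when they ran rather than the value from their own iteration.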
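To make the fix easy to run and check on its own, here is a minimal self-contained sketch of the same pattern. It assumes the counting is wrapped in a hypothetical countWords function, and it swaps strings.Split(l, " ") for strings.Fields, which splits on runs of whitespace; otherwise it follows the answer: one sender goroutine per line, a closer goroutine that waits on the WaitGroup, and main as the only goroutine touching the map.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords is a hypothetical wrapper around the answer's main.
// Each line gets its own sender goroutine; the calling goroutine is
// the only one that reads the channel and writes the map.
func countWords(lines []string) map[string]int {
	histogram := make(map[string]int)
	words := make(chan string)

	var wg sync.WaitGroup
	for _, line := range lines {
		wg.Add(1)
		go func(l string) { // l is this goroutine's own copy of the line
			defer wg.Done()
			for _, w := range strings.Fields(l) {
				words <- w
			}
		}(line)
	}

	// Close words only after every sender has finished, so the range
	// loop below terminates instead of blocking forever.
	go func() {
		wg.Wait()
		close(words)
	}()

	for w := range words {
		histogram[w]++
	}
	return histogram
}

func main() {
	data := []string{
		"The yellow fish swims slowly in the water",
		"The brown dog barks loudly",
	}
	fmt.Println(countWords(data))
}
```

Because the closer goroutine is the only place that calls close, there is no risk of closing the channel while a sender is still active.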
I got this code from someone on GitHub and I am trying to play around with it to understand concurrency.
package main
import (
	"bufio"
	"fmt"
	"os"
	"sync"
	"time"
)
var wg sync.WaitGroup
func sad(url string) string {
	fmt.Printf("gonna sleep a bit\n")
	time.Sleep(2 * time.Second)
	return url + " added stuff"
}
func main() {
	sc := bufio.NewScanner(os.Stdin)
	urls := make(chan string)
	results := make(chan string)
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urls {
				n := sad(url)
				results <- n
			}
		}()
	}
	for sc.Scan() {
		url := sc.Text()
		urls <- url
	}
	for result := range results {
		fmt.Printf("%s arrived\n", result)
	}
	wg.Wait()
	close(urls)
	close(results)
}
I have a few questions:
Why does this code give me a deadlock?
How can that for loop come before the code that reads input from the user? Do the goroutines wait until something is passed into the urls channel and then start doing work? I don't get this because it's not sequential: why is reading input from the user, putting every input into the urls channel, and then running the goroutines considered wrong?
Inside the for loop I have another loop that iterates over the urls channel. Does each goroutine deal with exactly one line of input, or does one goroutine handle multiple lines? How does any of this work?
Am I gathering the output correctly here?
Mostly you're doing things correctly, but you have things a little out of order. The for sc.Scan() loop keeps going until the Scanner is done, and until then the for result := range results loop never runs, so no goroutine (main, in this case) is receiving from results. Each worker therefore blocks on results <- n, and once all 20 are blocked, urls <- url blocks too. When running your example, I started the for result := range results loop before for sc.Scan(), and in its own goroutine; otherwise for sc.Scan() would never be reached.
go func() {
	for result := range results {
		fmt.Printf("%s arrived\n", result)
	}
}()
for sc.Scan() {
	url := sc.Text()
	urls <- url
}
Also, because you call wg.Wait() before close(urls), the main goroutine is left blocked waiting for the 20 worker goroutines (the ones calling sad()) to finish. But they can't finish until close(urls) is called, because their range loops only end when urls is closed. So just close that channel before waiting for the WaitGroup.
close(urls)
wg.Wait()
close(results)
The for loop creates 20 goroutines, all waiting for input from the urls channel. When someone writes to this channel, one of the goroutines will pick the value up and work on it. This is a typical worker-pool implementation.
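To see how a worker pool distributes items (which also answers whether each goroutine handles exactly one line), here is a small sketch. The names runPool and result are hypothetical, and the input is a fixed slice instead of stdin. Every worker ranges over the same jobs channel, so each item is received by exactly one worker, but a worker keeps looping and may end up handling many items or none.

```go
package main

import (
	"fmt"
	"sync"
)

// result records which worker handled which input item (hypothetical type).
type result struct {
	worker int
	item   string
}

// runPool starts n workers that all range over the same jobs channel.
func runPool(items []string, n int) []result {
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for it := range jobs { // loops until jobs is closed
				results <- result{worker: id, item: it}
			}
		}(id)
	}

	// Producer: send every item, then close jobs to release the workers.
	go func() {
		for _, it := range items {
			jobs <- it
		}
		close(jobs)
	}()

	// Close results once every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []result
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range runPool([]string{"a", "b", "c", "d", "e"}, 2) {
		fmt.Printf("worker %d handled %s\n", r.worker, r.item)
	}
}
```

Which worker gets which item varies from run to run; only the total is fixed.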
Then the scanner reads input line by line and sends it to the urls channel, where one of the goroutines will pick it up and write the response to the results channel. At this point, no other goroutine is reading from the results channel, so this send will block.
As the scanner reads more URLs, other goroutines pick them up and block the same way. So if the scanner reads more than 20 URLs, the program deadlocks: all 20 goroutines are waiting for a reader of results, and main is blocked sending the 21st URL.
If there are 20 URLs or fewer, the scanner for loop will end, and the results will be read. However, that will eventually deadlock as well, because the range loop over results only terminates when the channel is closed, and nothing closes it.
To fix this, first close the urls channel right after you finish reading. That will release all the for loops in the goroutines. Then put the for loop that reads from the results channel into a goroutine, so you can call wg.Wait while results are being processed. After wg.Wait returns, you can close the results channel.
This does not guarantee that all items in the results channel will be read: the program may terminate before all messages are processed. So use a third channel, which you close at the end of the goroutine that reads from the results channel. That is:
done := make(chan struct{})
go func() {
	defer close(done)
	for result := range results {
		fmt.Printf("%s arrived\n", result)
	}
}()
wg.Wait()
close(results)
<-done
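Putting the steps of this answer together, here is a runnable sketch. It assumes input comes from a strings.Reader instead of os.Stdin, replaces sad() with a hypothetical process function that does not sleep, and collects results into a slice so they can be checked. The order follows the answer: start the workers, start the result reader with its done channel, send the input, close(urls), wg.Wait(), close(results), then <-done.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

// process is a hypothetical stand-in for the question's sad().
func process(url string) string { return url + " added stuff" }

func run(input string, workers int) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	urls := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urls {
				results <- process(url)
			}
		}()
	}

	// The reader runs in its own goroutine and closes done once
	// results has been drained, so main can wait for it.
	var got []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for r := range results {
			got = append(got, r)
		}
	}()

	for sc.Scan() {
		urls <- sc.Text()
	}
	close(urls)    // lets every worker's range loop end
	wg.Wait()      // all workers returned: no more sends on results
	close(results) // ends the reader's range loop
	<-done         // reader finished, so got is safe to read here
	return got
}

func main() {
	for _, r := range run("a\nb\nc\n", 20) {
		fmt.Printf("%s arrived\n", r)
	}
}
```

Without the final <-done, run could return (and a real main could exit) while the reader goroutine is still appending the last result.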
I am not super happy with the previous answers, so here is a solution based on the behavior documented in the Go Tour, the Go documentation, and the Go specification.
package main
import (
	"bufio"
	"fmt"
	"strings"
	"sync"
	"time"
)
var wg sync.WaitGroup
func sad(url string) string {
	fmt.Printf("gonna sleep a bit\n")
	time.Sleep(2 * time.Millisecond)
	return url + " added stuff"
}
func main() {
	// sc := bufio.NewScanner(os.Stdin)
	sc := bufio.NewScanner(strings.NewReader(strings.Repeat("blah blah\n", 15)))
	urls := make(chan string)
	results := make(chan string)
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urls {
				n := sad(url)
				results <- n
			}
		}()
	}
	// results is written to by many goroutines, so
	// we must wait for all of them to finish before closing results,
	// but we don't want to block here, so put that into a goroutine.
	go func() {
		wg.Wait()
		close(results)
	}()
	go func() {
		for sc.Scan() {
			url := sc.Text()
			urls <- url
		}
		close(urls) // done sending on urls: close it right away so the workers' loops end.
	}()
	for result := range results {
		fmt.Printf("%s arrived\n", result)
	} // the program will finish when it gets out of this loop.
	// It will get out of this loop because you have made sure the results channel is closed.
}
I am new to Go and I am trying to learn some basic signaling between goroutines. I have an infinite for loop in Go. Through this for loop, I pass values to a goroutine through a channel. I also have a threshold value after which I would like to stop sending values to the goroutine (i.e. close the channel). When the threshold value is reached, I would like to break out of the for loop. Following is what I have tried so far.
In this particular example, thresholdValue = 10 and I would like to print the values 0, ..., 9 and then stop.
I followed this post on Medium and this post on Stack Overflow. I picked elements from these posts that I could use.
This is what I have done so far. In the main function of my code, I purposefully make the for loop an infinite loop. My main intention is to learn how to have the goroutine readValues() take the threshold value and then stop the transmission of values on the channel.
package main
import (
	"fmt"
)
func main() {
	ch := make(chan int)
	quitCh := make(chan struct{}) // signal channel
	thresholdValue := 10 //I want to stop the incoming data to readValues() after this value
	go readValues(ch, quitCh, thresholdValue)
	for i := 0; ; i++ {
		ch <- i
	}
}
func readValues(ch chan int, quitCh chan struct{}, thresholdValue int) {
	for value := range ch {
		fmt.Println(value)
		if value == thresholdValue {
			close(quitCh)
		}
	}
}
The goroutine in my code still misses the threshold. I would appreciate any direction as to how I should proceed from here.
To show good faith, here is the program rewritten.
package main
import (
	"log"
	"sync"
	"time"
)
func main() {
	ch := make(chan int, 5) // capacity increased for demonstration
	thresholdValue := 10
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		readValues(ch)
		wg.Done()
	}()
	for i := 0; i < thresholdValue; i++ {
		ch <- i
	}
	close(ch)
	log.Println("sending done.")
	wg.Wait()
}
func readValues(ch chan int) {
	for value := range ch {
		<-time.After(time.Second) // for demonstration purposes.
		log.Println(value)
	}
}
In this version, readValues exits because main's for loop stops at the threshold and main then closes ch, which ends the range loop in readValues.
In other words, a stop condition takes effect and triggers the exit sequence: signal the end of input, then wait for the processing to finish.
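If you want to keep the question's original design, where the reader decides when to stop and signals the sender through quitCh, a select in the sending loop is one way to do it. This is a sketch under a few assumptions: "threshold" is taken to mean the number of values to read (so 10 gives 0 through 9), the seen parameter and the run function are hypothetical additions so the result can be checked, and the threshold must be at least 1.

```go
package main

import "fmt"

// readValues reads until it has seen threshold values, then closes
// quit so the sender stops. seen is a hypothetical out-parameter.
func readValues(ch <-chan int, quit chan<- struct{}, threshold int, seen *[]int) {
	for value := range ch {
		*seen = append(*seen, value)
		if len(*seen) == threshold {
			close(quit) // signal: no more values wanted
			return
		}
	}
}

// run keeps the question's infinite loop; it ends only when quit is
// closed. threshold must be >= 1, otherwise quit is never closed.
func run(threshold int) []int {
	ch := make(chan int)
	quit := make(chan struct{})
	var seen []int
	go readValues(ch, quit, threshold, &seen)

	for i := 0; ; i++ {
		select {
		case ch <- i: // proceeds only while the reader is receiving
		case <-quit:
			close(ch)
			// Safe: the reader's last append happened before close(quit).
			return seen
		}
	}
}

func main() {
	for _, v := range run(10) {
		fmt.Println(v)
	}
}
```

Once the reader has returned, nobody receives on the unbuffered ch, so the only ready case in the select is <-quit and the loop exits deterministically.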
After getting the right solution to my initial problem in the post "Understanding golang channels: deadlock", I have come up with a slightly different solution (which, in my opinion, reads better):
// Binary histogram counts the occurrences of each word.
package main
import (
	"fmt"
	"strings"
	"sync"
)
var data = []string{
	"The yellow fish swims slowly in the water",
	"The brown dog barks loudly after a drink ...",
	"The dark bird bird of prey lands on a small ...",
}
func main() {
	histogram := make(map[string]int)
	words := make(chan string)
	var wg sync.WaitGroup
	for _, line := range data {
		wg.Add(1)
		go func(l string) {
			for _, w := range strings.Split(l, " ") {
				words <- w
			}
			wg.Done()
		}(line)
	}
	go func() {
		for w := range words {
			histogram[w]++
		}
	}()
	wg.Wait()
	close(words)
	fmt.Println(histogram)
}
It does work, but unfortunately, when I run it with the race detector (-race), it reports two data races:
==================
WARNING: DATA RACE
Read at 0x00c420082180 by main goroutine:
...
Previous write at 0x00c420082180 by goroutine 9:
...
Goroutine 9 (running) created at:
main.main()
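Can you help me understand where the race condition is?
You are reading histogram in fmt.Println(histogram), and that read is not synchronized with the writes (histogram[w]++) made by the goroutine that mutates it. wg.Wait() only waits for the sending goroutines; after close(words), the counting goroutine may still be processing the last word while main prints the map. You can add a lock to synchronize the writes and reads.
For example: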
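var lock sync.Mutex
go func() {
	// Taken before the first word is received and held until words
	// is closed and the loop has finished.
	lock.Lock()
	defer lock.Unlock()
	for w := range words {
		histogram[w]++
	}
}()
//...
// Blocks until the goroutine above has released the lock.
lock.Lock()
fmt.Println(histogram)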
Note you can also use a sync.RWMutex.
Another thing you could do is to wait for the goroutine mutating histogram to finish.
var histWG sync.WaitGroup
histWG.Add(1)
go func() {
	for w := range words {
		histogram[w]++
	}
	histWG.Done()
}()
wg.Wait()
close(words)
histWG.Wait()
fmt.Println(histogram)
Or simply use a channel to wait.
done := make(chan bool)
go func() {
	for w := range words {
		histogram[w]++
	}
	done <- true
}()
wg.Wait()
close(words)
<-done
fmt.Println(histogram)
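A further variant, not from the answer above, avoids shared state entirely: the counting goroutine owns its own map and hands it back over a channel when it is done, so no lock or extra WaitGroup is needed. This is a sketch; the names tally and countWords are hypothetical, and the input lines are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// tally owns its map: only this goroutine writes it, and it hands the
// finished map back over out once words is closed.
func tally(words <-chan string, out chan<- map[string]int) {
	h := make(map[string]int)
	for w := range words {
		h[w]++
	}
	out <- h // the send is the synchronization point
}

func countWords(lines []string) map[string]int {
	words := make(chan string)
	out := make(chan map[string]int)
	go tally(words, out)

	var wg sync.WaitGroup
	for _, line := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			for _, w := range strings.Split(l, " ") {
				words <- w
			}
		}(line)
	}
	wg.Wait()
	close(words)
	return <-out // received only after tally has finished writing
}

func main() {
	fmt.Println(countWords([]string{"The dog", "The bird"}))
}
```

Because main only sees the map after receiving it from out, every write in tally happens before the read, which is what the race detector checks for.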
I'm unable to terminate my WaitGroup in Go and consequently can't exit the range loop. Can anybody tell me why? Or suggest a better way of limiting the number of goroutines while still being able to exit when the channel is closed?
Most examples I have seen relate to a statically known channel length, but this channel is dynamically resized as a result of other processes.
The "done !" print statements in the example are printed the right number of times, showing that testValProducer runs as expected, but the code never reaches "--EXIT --", which means wg.Wait is still blocking somehow.
type TestValContainer chan string
func StartFunc() {
	testValContainer := make(TestValContainer)
	go func() { testValContainer <- "string val 1" }()
	go func() { testValContainer <- "string val 2" }()
	go func() { testValContainer <- "string val 3" }()
	go func() { testValContainer <- "string val 4" }()
	go func() { testValContainer <- "string val 5" }()
	go func() { testValContainer <- "string val 6" }()
	go func() { testValContainer <- "string val 7" }()
	wg := sync.WaitGroup{}
	// limit the number of worker goroutines
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			v := i
			fmt.Printf("launching %v", i)
			for str := range testValContainer {
				testValProducer(str, &wg)
			}
			fmt.Println(v, "--EXIT --") // never called
		}()
	}
	wg.Wait()
	close(testValContainer)
}
func get(url string) {
	http.Get(url)
	ch <- getUnvisited()
}
func testValProducer(testStr string, wg *sync.WaitGroup) {
	doSomething(testStr)
	fmt.Println("done !") // called
	wg.Done()             // NO EFFECT??
}
I might do something like this; it keeps everything easy to follow. I define a struct that uses a buffered channel as a semaphore to control the number of active goroutines, which also lets me read from the data channel as values come in.
package main
import (
	"fmt"
	"sync"
)
type TestValContainer struct {
	wg   sync.WaitGroup
	sema chan struct{}
	data chan int
}
func doSomething(number int) {
	fmt.Println(number)
}
func main() {
	// semaphore: at most 10 goroutines in the guarded section at a time
	tvc := TestValContainer{
		sema: make(chan struct{}, 10),
		data: make(chan int),
	}
	for i := 0; i <= 100; i++ {
		tvc.wg.Add(1)
		go func(i int) {
			tvc.sema <- struct{}{} // acquire a slot
			defer func() {
				<-tvc.sema // release the slot
				tvc.wg.Done()
			}()
			tvc.data <- i
		}(i)
	}
	// wait in the background so that waiting and closing the channel don't
	// block the for loop below
	go func() {
		tvc.wg.Wait()
		close(tvc.data)
	}()
	// get channel results
	for res := range tvc.data {
		doSomething(res)
	}
}
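To check that a buffered channel really caps concurrency at its capacity, here is a small sketch that measures the peak number of goroutines inside the guarded section. The function name maxConcurrent is hypothetical; it uses sync/atomic counters and a compare-and-swap loop to record the peak. Note that all n goroutines are still created; the semaphore only limits how many run the guarded part at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxConcurrent launches n goroutines guarded by a semaphore of the
// given capacity and returns the highest number seen inside at once.
func maxConcurrent(n, capacity int) int64 {
	sema := make(chan struct{}, capacity)
	var active, peak int64
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sema <- struct{}{}        // acquire: blocks while all slots are taken
			defer func() { <-sema }() // release (runs before wg.Done)

			cur := atomic.AddInt64(&active, 1)
			// Raise peak to cur if cur is larger.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&active, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", maxConcurrent(101, 10))
}
```

The exact peak depends on scheduling, but it can never exceed the channel's capacity.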
In your example you have two errors:
You are calling wg.Done inside the loop in each worker thread rather than at the end of the worker thread (right before it completes). The calls to wg.Done must be matched one-to-one with wg.Add(1)s.
With that fixed, there is a deadlock where the main thread is waiting for the worker threads to complete, while the worker threads are waiting for the input channel to be closed by the main thread.
The logic will be cleaner and easier to understand if you separate the producer side from the consumer side more clearly. Run a separate goroutine for each side. Example:
// Producer side (only write and close allowed).
go func() {
	testValContainer <- "string val 1"
	testValContainer <- "string val 2"
	testValContainer <- "string val 3"
	testValContainer <- "string val 4"
	testValContainer <- "string val 5"
	testValContainer <- "string val 6"
	testValContainer <- "string val 7"
	close(testValContainer) // Signals that production is done.
}()
// Consumer side (only read allowed).
for i := 0; i < 3; i++ {
	wg.Add(1)
	go func(v int) { // pass i so each goroutine gets its own copy
		defer wg.Done()
		fmt.Printf("launching %v\n", v)
		for str := range testValContainer {
			doSomething(str)
		}
		fmt.Println(v, "--EXIT --")
	}(i)
}
wg.Wait()
If the items are being produced from some other source, potentially a collection of goroutines, you should still have either 1) a separate goroutine or logic somewhere that oversees that production and calls close once it's done, or 2) your main thread wait for the production side to complete (e.g. with a WaitGroup waiting for the producer goroutines) and close the channel before waiting for the consumption side.
If you think about it, no matter how you arrange the logic you need to have some "side-channel" way of detecting, in one single synchronised place, that there are no more messages being produced. Otherwise you can never know when the channel should be closed.
In other words, you can't wait for the range loops on the consumer side to complete to trigger the close, as this leads to a Catch-22: those loops only end once the channel is closed.
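Here is a runnable sketch of option 1 with several producer goroutines. It is one possible arrangement, not code from the question: the function run and its parameters are hypothetical, and counting handled items with sync/atomic stands in for doSomething. One WaitGroup tracks the producers so that a single goroutine closes the channel once production is over; a second WaitGroup tracks the consumers, each calling Done exactly once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// run starts several producers and a fixed pool of consumers and
// returns how many items the consumers handled.
func run(producers, perProducer, consumers int) int64 {
	ch := make(chan string)

	var prodWG sync.WaitGroup
	for p := 0; p < producers; p++ {
		prodWG.Add(1)
		go func(p int) {
			defer prodWG.Done()
			for i := 0; i < perProducer; i++ {
				ch <- fmt.Sprintf("producer %d, value %d", p, i)
			}
		}(p)
	}

	// The single place that decides production is finished.
	go func() {
		prodWG.Wait()
		close(ch)
	}()

	var handled int64
	var consWG sync.WaitGroup
	for c := 0; c < consumers; c++ {
		consWG.Add(1)
		go func() {
			defer consWG.Done() // once per consumer, not once per item
			for range ch {
				atomic.AddInt64(&handled, 1)
			}
		}()
	}
	consWG.Wait()
	return atomic.LoadInt64(&handled)
}

func main() {
	fmt.Println("handled:", run(7, 1, 3))
}
```

The consumers never decide when the channel closes; they only react to it, which removes the Catch-22.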
I was simply experimenting in golang. I came across an interesting result. This is my code.
package main
import (
	"fmt"
	"sync"
)
func main() {
	var wg sync.WaitGroup
	var str1, str2 string
	wg.Add(2)
	go func() {
		fmt.Scanf("%s", &str1)
		wg.Done()
	}()
	go func() {
		fmt.Scanf("%s", &str2)
		wg.Done()
	}()
	wg.Wait()
	fmt.Printf("%s %s\n", str1, str2)
}
I gave the following input.
beat
it
I was expecting the result to be either
it beat
or
beat it
But I got:
eat bit
Can anyone please help me figure out why?
fmt.Scanf isn't an atomic operation. Here's the implementation: http://golang.org/src/pkg/fmt/scan.go#L1115
There's no mutex or semaphore, nothing preventing two parallel executions. So what happens is simply that the executions really are parallel, and since os.Stdin is not buffered, fmt reads it one byte at a time; every byte read is a separate I/O operation and thus a natural point for the Go scheduler to switch goroutines. That is how the characters of the two words got split between the two calls.
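One way to apply this reasoning is to add the missing mutual exclusion, so that each goroutine reads a whole word before the other may read. This sketch assumes the input comes from a strings.Reader (with real input you would wrap os.Stdin the same way), and readTwo is a hypothetical name. It uses a bufio.Reader so that fmt.Fscan can unread the delimiter instead of losing it. Which goroutine gets which word is still up to the scheduler; the lock only guarantees that words are no longer torn apart.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

// readTwo reads two words concurrently from one shared reader.
func readTwo(input string) (string, string) {
	// With real input: in := bufio.NewReader(os.Stdin)
	in := bufio.NewReader(strings.NewReader(input))
	var mu sync.Mutex // serializes whole-word reads from the shared reader
	var wg sync.WaitGroup
	var str1, str2 string

	read := func(dst *string) {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		fmt.Fscan(in, dst) // error ignored for brevity
	}

	wg.Add(2)
	go read(&str1)
	go read(&str2)
	wg.Wait()
	return str1, str2
}

func main() {
	a, b := readTwo("beat\nit\n")
	fmt.Println(a, b) // "beat it" or "it beat", never split words
}
```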
The problem is that you are sharing a single resource (the stdin byte stream) across multiple goroutines.
Each goroutine can be scheduled at different, non-deterministic times, e.g.:
goroutine 1 reads all of stdin, then goroutine 2 starts
goroutine 2 reads all of stdin, then goroutine 1 starts
goroutine 1 blocks on a read, goroutine 2 starts and reads one character, then goroutine 1 resumes
... and so on ...
In most cases it is enough to use a single goroutine to access a sequential resource such as a byte stream, attach a channel to it, and then spawn multiple consumers that listen on that channel.
For example:
package main
import (
	"fmt"
	"io"
	"sync"
)
func main() {
	var wg sync.WaitGroup
	words := make(chan string, 10)
	// Single producer: the only goroutine that touches stdin.
	go func() {
		for {
			var buff string
			_, err := fmt.Scanf("%s", &buff)
			if err != nil {
				if err != io.EOF {
					fmt.Println("Error: ", err)
				}
				break
			}
			words <- buff
		}
		close(words) // ends the consumers' range loops
	}()
	// Multiple consumers. Wait for them (not just the producer) so
	// that every word is printed before main returns.
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for word := range words {
				fmt.Printf("%s\n", word)
			}
		}()
	}
	wg.Wait()
}