I've just installed Go on Mac, and here's the code
package main
import (
"fmt"
"time"
)
func Product(ch chan<- int) {
for i := 0; i < 100; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 100; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(500)
}
When I run "go run producer_consumer.go", there is no output on screen, and then it quits.
Is there a problem with my program? How can I fix it?
This is a rather verbose answer, but to put it simply:
Using time.Sleep to wait until hopefully other routines have completed their jobs is bad.
The consumer and producer shouldn't know anything about each other, apart from the type they exchange over the channel. Your code relies on both consumer and producer knowing how many ints will be passed around, which is not a realistic scenario.
Channels can be iterated over (think of them as a thread-safe, shared slice).
Channels should be closed.
At the bottom of this answer, in which I attempt to explain some basic concepts and best practices (well, better practices), you'll find your code rewritten to work and display all the values without relying on time.Sleep. I've not tested that code, but it should be fine.
Right, there are a couple of problems here. As a bullet list:
Your channel is buffered to 1, which is fine, but it's not necessary
Your channel is never closed
You're waiting 500ns, then exit regardless of the routines having completed, or even started processing for that matter.
There's no centralised control over the routines: once you've started them, you have zero control. If you hit ctrl+c, you might want to cancel routines when writing code that'll handle important data. Check signal handling and context for this.
Channel buffer
Seeing as you already know how many values you're going to push onto your channel, why not simply create ch := make(chan int, 100)? That way your publisher can continue to push messages onto the channel, regardless of what the consumer does.
You don't need to do this, but adding a sensible buffer to your channel, depending on what you're trying to do, is definitely worth checking out. At the moment, though, both routines are using fmt.Println & co, which is going to be a bottleneck either way. Writes to os.Stdout are safe for concurrent use, but they are not buffered: each call to fmt.Print* ends in a write that takes a lock on the file, so that text from the two routines is not interleaved within a single call.
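To make the buffering point concrete, here is a minimal sketch, assuming the 100 values from the question. The helper name fill is made up for the illustration. With room for every value, the sends never block, so the producer can finish before any consumer has started.

```go
package main

import "fmt"

// fill is a hypothetical helper: it pushes n values onto ch and returns.
// Without a receiver it only returns if ch has room for all n values.
func fill(ch chan<- int, n int) {
	for i := 0; i < n; i++ {
		ch <- i
	}
}

func main() {
	ch := make(chan int, 100) // room for every value the producer will send
	fill(ch, 100)             // no goroutine needed: the sends never block
	close(ch)
	fmt.Println("buffered values:", len(ch))

	sum := 0
	for v := range ch { // a closed channel still yields its buffered values
		sum += v
	}
	fmt.Println("sum:", sum)
}
```

With a buffer of 1, the same call to fill from main would block on the second send, because nothing is receiving.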
Closing the channel
You could simply push all the values onto your channel, and then close it. This is, however, bad form. The rule of thumb WRT channels is that channels are created and closed in the same routine. Meaning: you're creating the channel in the main routine, that's where it should be closed.
You need a mechanism to sync up, or at least keep tabs on whether or not your routines have completed their job. That's done using the sync package, or through a second channel.
// using a done channel
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
// all values have been published
// close done channel
close(done)
}()
return done
}
func main() {
ch := make(chan int, 1)
done := produce(ch)
go consume(ch)
<-done // if producer has done its thing
close(ch) // we can close the channel
}
func consume(ch <-chan int) {
// we can now simply loop over the channel until it's closed
for i := range ch {
fmt.Printf("Consumed %d\n", i)
}
}
OK, but here you'll still need to wait for the consume routine to complete.
You may have already noticed that the done channel technically isn't closed in the same routine that creates it either. Because the routine is defined as a closure, however, this is an acceptable compromise. Now let's see how we could use a waitgroup:
import (
"fmt"
"sync"
)
func produce(wg *sync.WaitGroup, ch chan<- int) {
defer wg.Done() // signal we've done our job
for i := 0; i < 100; i++ {
ch <- i
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
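wg.Add(1) // I'm adding a routine to the waitgroup
go produce(&wg, ch)
go consume(ch) // consume as defined earlier; without a receiver, produce blocks once the buffer is full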
wg.Wait() // will return once `produce` has finished
close(ch)
}
OK, so this looks promising, I can have the routines tell me when they've finished their tasks. But if I add both consumer and producer to the waitgroup, I can't simply iterate over the channel. The channel will only ever get closed if both routines invoke wg.Done(), but if the consumer is stuck looping over a channel that'll never get closed, then I've created a deadlock.
Solution:
A hybrid would be the easiest solution at this point: Add the consumer to a waitgroup, and use the done channel in the producer to get:
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
close(done)
}()
return done
}
func consume(wg *sync.WaitGroup, ch <-chan int) {
defer wg.Done()
for i := range ch {
fmt.Printf("Consumer: %d\n", i)
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
done := produce(ch)
wg.Add(1)
go consume(&wg, ch)
<- done // produce done
close(ch)
wg.Wait()
// consumer done
fmt.Println("All done, exit")
}
I have slightly changed your code (expanded the time.Sleep). It works fine on my Linux x86_64 machine.
func Product(ch chan<- int) {
for i := 0; i < 10; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 10; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(10000)
}
Output
go run s1.go
Product: 0
Product: 1
Product: 2
As JimB hinted at, time.Sleep takes a time.Duration, not an integer. The godoc shows an example of how to call this correctly. In your case, you probably want:
time.Sleep(500 * time.Millisecond)
The reason that your program is exiting quickly (but not giving you an error) is due to the (somewhat surprising) way that time.Duration is implemented.
time.Duration is a defined type whose underlying type is int64 (type Duration int64). Internally, the value represents the duration in nanoseconds. When you call time.Sleep(500), the compiler will gladly convert the untyped constant 500 to a time.Duration. Unfortunately, that means 500 ns.
time.Millisecond is a constant equal to the number of nanoseconds in a millisecond (1,000,000). The nice thing is that requiring you to do that multiplication explicitly makes it obvious to the caller what the units are for that argument. Unfortunately, time.Sleep(500) is perfectly valid Go code but doesn't do what most beginners would expect.
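A small sketch of the difference. The helper describe is made up for the illustration; like time.Sleep, it takes a time.Duration, so it accepts a bare 500 in exactly the same way.

```go
package main

import (
	"fmt"
	"time"
)

// describe is a hypothetical helper; like time.Sleep, it takes a time.Duration.
func describe(d time.Duration) string {
	return d.String()
}

func main() {
	fmt.Println(describe(500))                    // 500ns: the bare constant is taken as nanoseconds
	fmt.Println(describe(500 * time.Millisecond)) // 500ms
	fmt.Println(int64(time.Millisecond))          // 1000000
}
```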
I am trying to implement a simple logic where a Producer sends data to a channel ch in an infinite for loop and a Consumer reads from the channel ch.
The Producer stops producing and exits the loop when it receives a signal on the channel quit.
The code is this (see also this playground)
func main() {
ch := make(chan int)
quit := make(chan bool)
var wg sync.WaitGroup
wg.Add(1)
go produce(ch, quit, &wg)
go consume(ch)
time.Sleep(1 * time.Millisecond)
fmt.Println("CLOSE")
close(quit)
wg.Wait()
}
func produce(ch chan int, quit chan bool, wg *sync.WaitGroup) {
for i := 0; ; i++ {
select {
case <-quit:
close(ch)
fmt.Println("exit")
wg.Done()
return //we exit
default:
ch <- i
fmt.Println("Producer sends", i)
}
}
}
func consume(ch chan int) {
for {
runtime.Gosched() // give the opportunity to the main goroutine to close the "quit" channel
select {
case i, more := <-ch:
if !more {
fmt.Println("exit consumer")
return
}
fmt.Println("Consumer receives", i)
}
}
}
If I run this piece of code on my machine (a Mac with 4 cores) everything works fine. If I try the same code on the Go Playground it always times out. I guess that this is because the Go Playground is single core, so the infinite loop does not give other goroutines the chance to run, but then I do not understand why the instruction runtime.Gosched() does not have any effect.
Just to complete the picture I have seen that, if I set GOMAXPROCS=1 on my Mac, the program still works fine and exits as expected. If I set GOMAXPROCS=1 on my Mac and remove the runtime.Gosched() instruction, the behavior gets brittle: sometimes the program terminates as expected, some other times it seems to never exit the infinite loop.
You created a pathological situation that shouldn't happen in a real program, so the scheduler is not optimized to handle this. Combine that with the fake time implementation in the playground, and you get far too many cycles of the producer and consumer before hitting a timeout.
The producer goroutine is creating values as fast as possible, while the consumer is always ready to receive them. With GOMAXPROCS=1, the scheduler spends all its time bouncing between the two before it is forced to preempt the available work to check on the main goroutine, which takes longer than the playground will allow.
If we add something for the producer-consumer pair to do, we can limit the amount of time they have to monopolize the scheduler. For example, adding a time.Sleep(time.Microsecond) to the consumer will cause the playground to print 1000 values. This also goes to show how "accurate" the simulated time is in the playground, since that would not be possible with normal hardware, which takes a non-zero amount of time to process each message.
While an interesting case, this has little bearing on real programs.
A few notes: you can range over a channel to receive all values; you should always defer wg.Done at the start of the goroutine when possible; you can send values in a select case, which lets you actually cancel the for-select loop when the send isn't ready; and if you want the "exit consumer" message, you need to pass the WaitGroup to the consumer as well.
https://play.golang.org/p/WyPmpY9pFl7
func main() {
ch := make(chan int)
quit := make(chan bool)
var wg sync.WaitGroup
wg.Add(2)
go produce(ch, quit, &wg)
go consume(ch, &wg)
time.Sleep(50 * time.Microsecond)
fmt.Println("CLOSE")
close(quit)
wg.Wait()
}
func produce(ch chan int, quit chan bool, wg *sync.WaitGroup) {
defer wg.Done()
for i := 0; ; i++ {
select {
case <-quit:
close(ch)
fmt.Println("exit")
return
case ch <- i:
fmt.Println("Producer sends", i)
}
}
}
func consume(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for i := range ch {
fmt.Println("Consumer receives", i)
time.Sleep(time.Microsecond)
}
fmt.Println("exit consumer")
return
}
I'm trying to build a generic pipeline library using worker pools. I created an interface for a source, pipe, and sink. You see, the pipe's job is to receive data from an input channel, process it, and output the result onto a channel. Here is its intended behavior:
Receive data from an input channel.
Delegate the data to an available worker.
The worker sends the result to the output channel.
Close the output channel once all workers are finished.
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
return
}
func (p *pipe) work(jobs <-chan interface{}, out chan<- interface{}, wg *sync.WaitGroup) {
for j := range jobs {
func(j Job) {
defer wg.Done()
wg.Add(1)
res := doSomethingWith(j)
out <- res
}(j)
}
}
However, running it may either exit without processing all of the inputs or panic with a send on closed channel message. Building the source with the -race flag gives out a data race warning between close(out) and out <- res.
Here's what I think might happen. Once a number of workers have finished their jobs, there's a split second where wg's counter reaches zero. Hence, wg.Wait() returns and the program proceeds to close(out). Meanwhile, the job channel hasn't finished producing data, meaning some workers are still running in another goroutine. Since the out channel is already closed, it results in a panic.
Should the wait group be placed somewhere else? Or is there a better way to wait for all workers to finish?
It's not clear why you want one worker per job, but if you do, you can restructure your outer loop setup (see untested code below). This kind of obviates the need for worker pools in the first place.
Always, though, do a wg.Add before spinning off any worker. Right here, you are spinning off exactly 100 workers:
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
You could therefore do this:
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
wg.Add(100) // ADDED - count the 100 workers
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
Note that you can now move wg itself down into the goroutine that spins off the workers. This can make things cleaner, if you give up on the notion of having each worker spin off jobs as new goroutines. But if each worker is going to spin off another goroutine, that worker itself must also use wg.Add, like this:
for j := range jobs {
wg.Add(1) // ADDED - count the spun-off goroutines
func(j Job) {
res := doSomethingWith(j)
out <- res
wg.Done() // MOVED (for illustration only, can defer as before)
}(j)
}
wg.Done() // ADDED - our work in `p.work` is now done
That is, each anonymous function is another user of the channel, so increment the users-of-channel count (wg.Add(1)) before spinning off a new goroutine. When you have finished reading the input channel jobs, call wg.Done() (perhaps via an earlier defer, but I showed it at the end here).
The key to thinking about this is that wg counts the number of active goroutines that could, at this point, write to the channel. It only goes to zero when no goroutines intend to write any more. That makes it safe to close the channel.
Consider using the rather simpler (but untested):
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
out = make(chan interface{})
var wg sync.WaitGroup
go func() {
defer close(out)
for j := range in {
wg.Add(1)
go func(j Job) {
res := doSomethingWith(j)
out <- res
wg.Done()
}(j)
}
wg.Wait()
}()
return out
}
You now have one goroutine that is reading the in channel as fast as it can, spinning off jobs as it goes. You'll get one goroutine per incoming job, except when they finish their work early. There is no pool, just one worker per job (same as your code except that we knock out the pools that aren't doing anything useful).
Or, since there are only some number of CPUs available, spin off some number of goroutines as you did before at the start, but have each one run one job to completion, and deliver its result, then go back to reading the next job:
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
out = make(chan interface{})
go func() {
defer close(out)
var wg sync.WaitGroup
ncpu := runtime.NumCPU() // or something fancier if you like
wg.Add(ncpu)
for i := 0; i < ncpu; i++ {
go func() {
defer wg.Done()
for j := range in {
out <- doSomethingWith(j)
}
}()
}
wg.Wait()
}()
return out
}
By using runtime.NumCPU() we get only as many workers reading jobs as there are CPUs to run jobs. Those are the pools and they only do one job at a time.
There's generally no need to buffer the output channel, if the output-channel readers are well-structured (i.e., don't cause the pipeline to constipate). If they're not, the depth of buffering here limits how many jobs you can "work ahead" of whoever is consuming the results. Set it based on how useful it is to do this "working ahead"—not necessarily the number of CPUs, or the number of expected jobs, or whatever.
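For something you can run as-is, here is a sketch of the fixed-size pool with concrete types. It is an illustration under assumptions, not the asker's library: process is a free function instead of a method on pipe, the jobs are plain ints, and squaring stands in for doSomethingWith.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// process is a hypothetical stand-in for the pipe: it squares each input.
func process(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // runs after wg.Wait, so no worker can still send
		var wg sync.WaitGroup
		n := runtime.NumCPU()
		wg.Add(n) // count the workers before starting any of them
		for i := 0; i < n; i++ {
			go func() {
				defer wg.Done()
				for j := range in {
					out <- j * j
				}
			}()
		}
		wg.Wait()
	}()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		defer close(in) // closing the input is what lets the workers' range loops end
		for i := 1; i <= 10; i++ {
			in <- i
		}
	}()

	sum := 0
	for r := range process(in) {
		sum += r
	}
	fmt.Println("sum of squares:", sum) // the order of results varies, the sum does not
}
```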
It's possible that the jobs are being completed just as fast as they're being sent. In this case the WaitGroup will be floating near zero even while there's many more items to process.
One fix for this is to add one before sending jobs and decrement that one after sending them all, effectively considering the sender to be one of the 'jobs'. In this case, it's better if we do the wg.Add in the sender:
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
wg.Add(1)
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
return
}
func (p *pipe) work(jobs <-chan interface{}, out chan<- interface{}, wg *sync.WaitGroup) {
defer wg.Done() // exactly one Done per worker, matching the wg.Add(1) in Process
for j := range jobs {
out <- doSomethingWith(j)
}
}
One thing I notice in the code is that a goroutine is started for each job. At the same time each job processes the jobs channel in a loop until empty/closed. It doesn't seem necessary to do both.
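If you do want one goroutine per job, the "count the sender as one of the jobs" idea can be sketched like this. The names are made up for the illustration: fanOut takes plain ints and doubles them in place of the real work. The dispatcher holds one count of its own until it has handed out every job, so the counter cannot reach zero early.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut is a hypothetical helper: one goroutine per job, doubling each value.
func fanOut(in <-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(1) // count the dispatcher itself so the counter cannot hit zero early
	go func() {
		defer wg.Done() // released only after every job has been handed out
		for j := range in {
			wg.Add(1) // count the job before its goroutine starts
			go func(j int) {
				defer wg.Done()
				out <- j * 2
			}(j)
		}
	}()
	go func() {
		wg.Wait() // zero means: dispatcher finished and no job is still sending
		close(out)
	}()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		defer close(in)
		for i := 1; i <= 5; i++ {
			in <- i
		}
	}()

	sum := 0
	for r := range fanOut(in) {
		sum += r
	}
	fmt.Println("sum:", sum) // 30
}
```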
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
func main() {
fmt.Println("Welcome to the csv Calculator")
file_path := os.Args[1]
fd, _ := os.Open(file_path)
reader := csv.NewReader(bufio.NewReader(fd))
var totalColumnsCount int64 = 0
var totallettersCount int64 = 0
linesCount := 0
numWorkers := 10000
rc := make(chan Result, numWorkers)
in := make(chan []string, numWorkers)
quit := make(chan int)
t1 := time.Now()
for i := 0; i < numWorkers; i++ {
go GoCountColumns(in, rc, quit)
}
// start workers
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
if linesCount%1000000 == 0 {
fmt.Println("Adding to the channel")
}
in <- record
//data := countColumns(record)
linesCount++
//totalColumnsCount = totalColumnsCount + data.ColumnCount
//totallettersCount = totallettersCount + data.LettersCount
}
close(in)
}()
for i := 0; i < numWorkers; i++ {
quit <- 1 // quit goroutines from main
}
close(rc)
for i := 0; i < linesCount; i++ {
data := <-rc
totalColumnsCount = totalColumnsCount + data.ColumnCount
totallettersCount = totallettersCount + data.LettersCount
}
fmt.Printf("I counted %d lines\n", linesCount)
fmt.Printf("I counted %d columns\n", totalColumnsCount)
fmt.Printf("I counted %d letters\n", totallettersCount)
elapsed := time.Now().Sub(t1)
fmt.Printf("It took %f seconds\n", elapsed.Seconds())
}
My Hello World is a program that reads a csv file and passes it to a channel. Then the goroutines should consume from this channel.
My Problem is I have no idea how to detect from the main thread that all data was processed and I can exit my program.
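In addition to the other answers:
Take (great) care that a channel is closed on the writing side, not the reading side. In GoCountColumns the r channel is the one being written to, so the responsibility for closing it lies with GoCountColumns. The technical reason is that the writer is the only actor that knows for sure that the channel will not be written to anymore, and thus that it is safe to close. (This holds as written for a single writer. Closing an already closed channel panics, so when several goroutines write to the same channel, exactly one party must close it, after all of them have finished.)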
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
defer close(r) // this line.
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
The naming convention for function parameters, if I may say, is to have the destination as the first parameter, the source as the second, and the other parameters after that. GoCountColumns is preferably written:
func GoCountColumns(dst chan Result, src chan []string, quit chan int) {
defer close(dst)
for {
select {
case data := <-src:
dst <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
You are sending on quit right after the processing has started. That is illogical. The quit command is a forced exit sequence: it should be triggered once an exit signal is detected, to force the current processing to exit in the best state possible, possibly all broken. In other words, you should rely on signal.Notify from the os/signal package to capture exit events and notify your workers to quit. See https://golang.org/pkg/os/signal/#example_Notify
To write better parallel code, first list the routines you need in order to manage the program lifetime, and identify those you need to block on to ensure the work has finished before exiting.
In your code there are two: read and map. To ensure complete processing, the main function must capture a signal that map has exited before exiting itself. Notice that the read function does not matter.
Then, you will also need the code required to capture an exit event from user input.
Overall, it appears we need to block on two events to manage the lifetime. Schematically,
func main(){
go read()
go map(mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
}
}
This simple code is good enough to process or die. Indeed, when the user event is caught, the program exits immediately, without giving the other routines a chance to do whatever is required upon stop.
To improve this behavior, you first need a way to signal to the other routines that the program wants to leave, and second, a way to wait for those routines to finish their stop sequence before leaving.
To signal an exit event, or cancellation, you can use a context.Context: pass it to the workers and make them listen to it.
Again, schematically,
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
}
}
(more onto read and map later)
To wait for completion, many things are possible, as long as they are thread safe. Usually a sync.WaitGroup is used. Or, in cases like yours where there is only one routine to wait for, we can reuse the current mapDone channel.
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
<-mapDone
}
}
That is simple and straightforward, but it is not totally correct. The last receive from mapDone might block forever and make the program unstoppable. So you might implement a second signal handler, or a timeout.
Schematically, the timeout solution is
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
select {
case <-mapDone:
case <-time.After(time.Second):
}
}
}
You might also combine signal handling and a timeout in the last select.
Finally, there are a few things to say about how read and map listen to the context.
Starting with map, the implementation needs to check the ctx.Done() channel regularly to detect cancellation.
That is the easy part: it only requires updating the select statement.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string) {
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
Now the read part is a bit more tricky: it is I/O, which does not provide a selectable programming interface, so listening to the context's cancellation channel might seem contradictory. It is. Since the I/O call blocks, it is impossible to listen to the context at the same time, and while waiting on the context channel it is impossible to read. In your case, the solution is to understand that your read loop is not relevant to your program lifetime (recall that we only listen on mapDone), so it can just ignore the context.
In other cases, for example if you wanted to restart at the last byte read (so at every read we increment a counter n of bytes read, and we want to save that value upon stop), a new routine has to be started, and thus there are multiple routines to wait for. In such cases a sync.WaitGroup is more appropriate.
Schematically,
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go saveN(ctx,&wg)
wg.Add(1)
go map(ctx,&wg)
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
In this last code, the waitgroup is passed around. Each routine is responsible for calling wg.Done(); when all routines are done, the processDone channel is closed to signal the select.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string, wg *sync.WaitGroup) {
defer wg.Done()
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
There is no consensus on which pattern is preferable, but you might also see the waitgroup managed at the call sites only.
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go func(){
defer wg.Done()
saveN(ctx)
}()
wg.Add(1)
go func(){
defer wg.Done()
map(ctx)
}()
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
Beyond all of that and the OP's questions, you must always evaluate upfront whether parallel processing is pertinent for a given task. There is no single recipe: practice, and measure your code's performance. See pprof.
There is way too much going on in this code. You should restructure your code into short functions that serve specific purposes to make it possible for someone to help you out easily (and help yourself as well).
You should read the following Go article, which goes into concurrency patterns:
https://blog.golang.org/pipelines
There are multiple ways to make one go-routine wait on some other work to finish. The most common ways are with wait groups (example I have provided) or channels.
func processSomething(...) {
...
}
func main() {
workers := &sync.WaitGroup{}
for i := 0; i < numWorkers; i++ {
workers.Add(1) // you want to call this from the calling go-routine and before spawning the worker go-routine
go func() {
defer workers.Done() // you want to call this from the worker go-routine when the work is done (NOTE the defer, which ensures it is called no matter what)
processSomething(....) // your async processing
}()
}
// this will block until all workers have finished their work
workers.Wait()
}
You can use a channel to block main until completion of a goroutine.
package main
import (
"log"
"time"
)
func main() {
c := make(chan struct{})
go func() {
time.Sleep(3 * time.Second)
log.Println("bye")
close(c)
}()
// This blocks until the channel is closed by the routine
<-c
}
There is no need to write anything into the channel. The receive blocks until a value is sent or, as we use here, the channel is closed.
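A tiny sketch of what a receive returns once the channel is closed. The helper recv is made up for the illustration.

```go
package main

import "fmt"

// recv is a hypothetical helper that reports what a single receive returned.
func recv(c <-chan int) (int, bool) {
	v, ok := <-c
	return v, ok
}

func main() {
	c := make(chan int, 1)
	c <- 42
	close(c)
	fmt.Println(recv(c)) // 42 true: a buffered value survives the close
	fmt.Println(recv(c)) // 0 false: closed and drained, returns immediately
}
```

The second form, a zero value with ok set to false and no blocking, is what makes close(c) usable as a broadcast "done" signal.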
I've been trying to solve this simple problem I encountered in Golang concurrency. I've searched for possible solutions, but found nothing specific to my problem (or I might have missed one). Here's my code:
package main
import (
"fmt"
"time"
)
func producer(ch chan int, d time.Duration, num int) {
for i:=0; i<num; i++ {
ch <- i
time.Sleep(d)
}
}
func main() {
ch := make(chan int)
go producer(ch, 100*time.Millisecond, 2)
go producer(ch, 200*time.Millisecond, 5)
for {
fmt.Println(<-ch)
}
close(ch)
}
It prints error:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
D:/Code/go/src/testconcurrency/main.go:23 +0xca
exit status 2
What is an efficient way to avoid this error? Thank you.
You have producers which are "short-lived", they only send values on the channel for a finite amount of time, and you have an endless for loop which receives values from the channel endlessly, without a termination condition, and the channel is only closed after this endless loop. Once the producers stop sending values, it's a deadlock.
Channels must be closed by the producer(s), signalling that no more values will be sent on it. Since you have multiple producers without synchronization (producers are not synchronized with each other), in general you can't tell which one will finish first, so you can't designate one to close the channel (and a channel can only be closed once, see Why Go's channel can close twice?; and Closing channel of unknown length).
You have to "coordinate" the producers, and when all have finished their jobs, the coordinator should close the channel.
And the consumer should use a for range on the channel, as the for range construct receives all values from the channel that were sent on it before it was closed, then it terminates automatically.
For the coordination it is recommended to use sync.WaitGroup. Whether you use a global one in this case or a local one and you pass it to producers is up to you. Using a local will make the solution more general and easier to extend. One thing to note is that you must pass a pointer to sync.WaitGroup. Whenever you spin up a new producer, increment the waitgroup using WaitGroup.Add(). When a producer is done, it can signal this using WaitGroup.Done(), preferably using defer (so it runs no matter what, mitigating the deadlock in case of abnormal circumstances). And the controller can wait for all producers to finish using WaitGroup.Wait().
Here's a complete solution:
func producer(ch chan int, d time.Duration, num int, wg *sync.WaitGroup) {
defer wg.Done()
for i := 0; i < num; i++ {
ch <- i
time.Sleep(d)
}
}
func main() {
wg := &sync.WaitGroup{}
ch := make(chan int)
wg.Add(1)
go producer(ch, 100*time.Millisecond, 2, wg)
wg.Add(1)
go producer(ch, 200*time.Millisecond, 5, wg)
go func() {
wg.Wait()
close(ch)
}()
for v := range ch {
fmt.Println(v)
}
}
Output (try it on the Go Playground):
0
0
1
1
2
3
4
See related question: Prevent the main() function from terminating before goroutines finish in Golang
This problem can be solved in an elegant way using two wait groups. By closing channel ch we signal to the consumers that there is no more data.
The solution scales well with more consumers.
package main
import (
"fmt"
"sync"
"time"
)
func producer(ch chan<- int, d time.Duration, num int, wg *sync.WaitGroup) {
defer wg.Done()
for i := 0; i < num; i++ {
ch <- i
time.Sleep(d)
}
}
func consumer(ch <-chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
}
func main() {
ch := make(chan int)
producers := &sync.WaitGroup{}
consumers := &sync.WaitGroup{}
producers.Add(2)
go producer(ch, 100*time.Millisecond, 2, producers)
go producer(ch, 200*time.Millisecond, 5, producers)
consumers.Add(1)
go consumer(ch, consumers)
producers.Wait()
close(ch)
consumers.Wait()
}
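To illustrate the scaling claim, here is a sketch with several consumers sharing the channel. The harness run and its parameters are made up for the illustration, and the sleeps are left out to keep it short. Counting with sync/atomic stands in for real per-value work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// run is a hypothetical harness: nProd producers each send perProd values,
// nCons consumers share the channel. It returns how many values were consumed.
func run(nProd, perProd, nCons int) int64 {
	ch := make(chan int)
	var producers, consumers sync.WaitGroup
	var consumed int64

	producers.Add(nProd)
	for p := 0; p < nProd; p++ {
		go func() {
			defer producers.Done()
			for i := 0; i < perProd; i++ {
				ch <- i
			}
		}()
	}

	consumers.Add(nCons)
	for c := 0; c < nCons; c++ {
		go func() {
			defer consumers.Done()
			for range ch {
				atomic.AddInt64(&consumed, 1)
			}
		}()
	}

	producers.Wait() // every send has completed
	close(ch)        // tells all consumers there is no more data
	consumers.Wait() // every received value has been counted
	return atomic.LoadInt64(&consumed)
}

func main() {
	fmt.Println(run(3, 10, 4)) // 30
}
```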
The problem is that <-ch is blocking, so if no new values are sent on the channel it will block forever. One way is to replace it with a select statement, which also blocks but allows listening on multiple channels. You would also have to add an exit channel. In your example, as soon as the exit channel has received two values we can break. The break statement needs a label because we want to exit from both the select and the for loop.
https://play.golang.org/p/wGdCulZDnrx
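In case the playground link goes stale, here is a sketch of the exit-channel approach. It is an approximation and not necessarily the linked code: the exit parameter and the collect helper are assumptions, and the sleeps are left out.

```go
package main

import "fmt"

// producer sends num values, then reports on exit that it is finished.
func producer(ch chan<- int, exit chan<- struct{}, num int) {
	for i := 0; i < num; i++ {
		ch <- i
	}
	exit <- struct{}{}
}

// collect is a hypothetical helper: it receives until both producers have exited.
func collect() []int {
	ch := make(chan int)
	exit := make(chan struct{})
	go producer(ch, exit, 2)
	go producer(ch, exit, 5)

	var got []int
	finished := 0
loop:
	for {
		select {
		case v := <-ch:
			got = append(got, v)
		case <-exit:
			finished++
			if finished == 2 {
				break loop // a bare break would only leave the select
			}
		}
	}
	return got
}

func main() {
	fmt.Println(len(collect())) // 7
}
```

Because ch is unbuffered, a producer can only reach its exit send after all of its values have been received, so no value is lost when the loop ends.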
Another way is to have multiple input channels and close each one as soon as its producer has finished sending. For this, each goroutine needs its own channel, otherwise we will exit when the first goroutine is finished.
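A sketch of that second approach, assuming each producer owns and closes its own channel. The names producer and drain are made up for the illustration. Setting a closed channel's variable to nil disables its select case, because a receive from a nil channel blocks forever.

```go
package main

import "fmt"

// producer is a hypothetical variant that owns its channel: it creates it,
// fills it from a goroutine, and closes it when done.
func producer(num int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < num; i++ {
			ch <- i
		}
	}()
	return ch
}

// drain receives from both channels until both are closed.
func drain(a, b <-chan int) []int {
	var got []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // a nil channel blocks forever, so this case is never chosen again
				continue
			}
			got = append(got, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			got = append(got, v)
		}
	}
	return got
}

func main() {
	fmt.Println(len(drain(producer(2), producer(5)))) // 7
}
```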
A third option is to create a merge function which merges multiple channels into one. This allows for moving the creation of the channels into the producers, so they are created, filled and closed in one location. The merge function is relatively complex, but it's removed from the business logic code and can be understood and tested separately. The main code is then reduced to just:
ch1 := producer(100*time.Millisecond, 2)
ch2 := producer(200*time.Millisecond, 5)
for i := range merge(ch1, ch2) {
fmt.Println(i)
}
https://play.golang.org/p/2mv8ILhJPIB
merge func is from https://blog.golang.org/pipelines
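For completeness, a sketch of what producer and merge could look like. The merge here follows the shape of the one in that article, adapted so that each forwarding goroutine defers wg.Done. The body of producer is my assumption about the playground code, which is not reproduced above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// producer creates its own channel, sends num values d apart, then closes it.
func producer(d time.Duration, num int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < num; i++ {
			ch <- i
			time.Sleep(d)
		}
	}()
	return ch
}

// merge forwards every value from the input channels to one output channel
// and closes the output once all inputs are closed.
func merge(cs ...<-chan int) <-chan int {
	var wg sync.WaitGroup
	out := make(chan int)
	wg.Add(len(cs))
	for _, c := range cs {
		go func(c <-chan int) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(c)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	ch1 := producer(time.Millisecond, 2)
	ch2 := producer(2*time.Millisecond, 5)
	count := 0
	for range merge(ch1, ch2) {
		count++
	}
	fmt.Println("received:", count) // 7
}
```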
You need to synchronize the asynchronous work in your goroutines. Your main goroutine and the producer goroutines are not synchronized, so main never knows when to stop receiving from the channel. Since main loops over the channel, it keeps trying to receive; once the producers have finished and nothing is sent anymore, main can never get another value, hence the deadlock. To avoid this, use sync.WaitGroup to synchronize the goroutines.
Here's the code:
package main
import (
	"fmt"
	"sync"
	"time"
)
// producer sends num values on ch, pausing for d after each send.
func producer(ch chan int, d time.Duration, num int, wg *sync.WaitGroup) {
	// Defer at the top so Done runs however the function returns.
	defer wg.Done()
	for i := 0; i < num; i++ {
		ch <- i
		time.Sleep(d)
	}
}
func main() {
	wg := &sync.WaitGroup{}
	ch := make(chan int)
	wg.Add(2)
	go producer(ch, 100*time.Millisecond, 2, wg)
	go producer(ch, 200*time.Millisecond, 5, wg)
	// Close the channel once both producers are done, so the range below ends.
	go func() {
		wg.Wait()
		close(ch)
	}()
	// print the outputs
	for i := range ch {
		fmt.Println(i)
	}
}
https://play.golang.org/p/euMTGTIs83g
Hope it helps.
Since my solution looks rather similar to one already posted, I have changed it back to my original answer, from before I adapted it to the OP's question.
Here's the code:
package main
import (
	"fmt"
	"sync"
	"time"
)
// producer produces values to be sent to the consumer
func producer(ch chan int, d time.Duration, num int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < num; i++ {
		ch <- i
		time.Sleep(d)
	}
}
// consumer consumes all values from the producers and forwards them to out
func consumer(ch chan int, out chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := range ch {
		out <- i
	}
}
// synchronizer closes each channel once its senders are done, to avoid deadlocks
func synchronizer(ch chan int, out chan int, wgp *sync.WaitGroup, wgc *sync.WaitGroup) {
	wgp.Wait()
	close(ch)
	wgc.Wait()
	close(out)
}
func main() {
	wgp := &sync.WaitGroup{}
	wgc := &sync.WaitGroup{}
	ch := make(chan int)
	out := make(chan int)
	wgp.Add(2)
	go producer(ch, 100*time.Millisecond, 2, wgp)
	go producer(ch, 200*time.Millisecond, 5, wgp)
	wgc.Add(1)
	go consumer(ch, out, wgc)
	go synchronizer(ch, out, wgp, wgc)
	// print the outputs
	for i := range out {
		fmt.Println(i)
	}
}
This uses a consumer goroutine to fan in the values from multiple producer goroutines; main then reads every value from the consumer's output channel.
Hope it helps.
Simpler answer: one of the producers needs to close the channel, and the consumer can just range over the channel.
package main
import (
"fmt"
"time"
)
func producer(ch chan int, d time.Duration, num int, closer bool) {
for i:=0; i<num; i++ {
ch <- i
time.Sleep(d)
}
if closer {
close(ch)
}
}
func main() {
ch := make(chan int)
go producer(ch, 100*time.Millisecond, 2, false)
go producer(ch, 200*time.Millisecond, 5, true)
for i := range ch {
fmt.Println(i)
}
}
Of course, unless you have a situation where you know which producer will always finish last, you would not want to do this in real code. Better designs are in the WaitGroup-based patterns in the other answers. But this is the simplest way for this code to avoid deadlock.
I wish to know the proper way of waiting for a goroutine to finish before exiting the program. Reading some other answers, it seems that a bool chan will do the trick, as in Playground link
func do_stuff(done chan bool) {
fmt.Println("Doing stuff")
done <- true
}
func main() {
fmt.Println("Main")
done := make(chan bool)
go do_stuff(done)
<-done
//<-done
}
I have two questions here:
why the <- done works at all?
what happens if I uncomment the last line? I have a deadlock error. Is this because the channel is empty and there is no other function sending values to it?
Receiving from the channel with <-done is a blocking operation, so your program won't continue until true or false is sent, i.e. until done <- true runs.
Your question can have a few different answers depending on the circumstance.
For instance, suppose you wanted to parallelize a series of function calls that take a long time.
I would use the sync package for this
package main
import (
"fmt"
"sync"
"time"
)
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
longOp()
wg.Done()
}()
}
// will wait until wg.Done is called 10 times
// since we made wg.Add(1) call 10 times
wg.Wait()
}
func longOp() {
time.Sleep(time.Second * 2)
fmt.Println("long op done")
}
Why the <- done works at all?
It works because <-done blocks the main goroutine until do_stuff sends a value on the channel, so main cannot exit before the goroutine has finished.
what happens if I uncomment the last line?
The second receive blocks forever, because no goroutine is left to send on the channel. The runtime detects that every goroutine is blocked and aborts the program with a deadlock error.
Bonus: if you're extremely limited on memory, you can use done := make(chan struct{}) and done <- struct{}{}; a struct{} value occupies zero bytes.
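As an illustration, here is a small sketch of a struct{} done channel. It differs from the suggestion above in one respect: instead of sending a value, the goroutine closes the channel. A receive on a closed channel returns immediately, so a second <-done no longer deadlocks. The name doStuff is just a renamed version of the question's do_stuff.

```go
package main

import "fmt"

// doStuff does its work, then closes done to signal completion.
// Closing (rather than sending) is a variation chosen for this sketch.
func doStuff(done chan<- struct{}) {
	fmt.Println("Doing stuff")
	close(done)
}

func main() {
	fmt.Println("Main")
	done := make(chan struct{})
	go doStuff(done)
	<-done // blocks until doStuff closes the channel
	<-done // returns immediately: the channel is already closed
}
```

Closing also works as a broadcast: any number of goroutines waiting on done are released at once, whereas a send wakes only one receiver.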