Need run a goroutine for WaitGroup in a function - go

I've read a pipeline article from https://go.dev/blog/pipelines.
I've tried to remove the goroutine that covers wg.Wait() but it turns out that caused all goroutines are asleep - deadlock!
I really don't understand why.
func merge(cs ...<-chan int) <-chan int {
var wg sync.WaitGroup
out := make(chan int)
// Start an output goroutine for each input channel in cs. output
// copies values from c to out until c is closed, then calls wg.Done.
output := func(c <-chan int) {
for n := range c {
out <- n
}
wg.Done()
}
wg.Add(len(cs))
for _, c := range cs {
go output(c)
}
// Start a goroutine to close out once all the output goroutines are
// done. This must start after the wg.Add call.
go func() {
wg.Wait()
close(out)
}()
return out
}

out is a blocking channel. You need to return out and let something consume from out, or else the output goroutines will block as soon as they try to write anything. If merge waits on the waitgroup (i.e. waits for everything to finish) before the consumer even has a chance to start, that's an inevitable deadlock.

Related

Channels and Wait Groups Entering Deadlock

I'm having trouble wrangling go routines and getting them to communicate back to a channel on the main go routine. To simplify, my code looks something like this:
func main() {
channel := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go performTest(channel, &wg, i)
}
wg.Wait()
close(channel)
for line := range channel {
fmt.Print(line)
}
}
func performTest(channel chan string, wg *sync.WaitGroup, i int) {
defer wg.Done()
// perform some work here
result := fmt.sprintf("Pretend result %d", i)
channel <- result
}
This seems to enter into some kind of a deadlock, but I don't understand why. It gets stuck on wg.Wait(), even though I would expect it to continue once all the goroutines have called Done on the wait group. What am I missing here? I'd like to wait for the goroutines, and then iterate over all results in the channel.
You can wait for the group and close the channel in a separate go routine. If the channel is closed, your range over the channel will end after the last sent value has been received.
If you just wait, nothing will receive from the channel. Since the channel is unbuffered, the performTest goroutines won't be able to send. For an unbuffered channel, the send operation will block until it has been received. Therefore, the deferred wg.Done call would never happen, and your program is deadlocked. Since Done is only called after the forever-blocking send has been performed.
func main() {
channel := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go performTest(channel, &wg, i)
}
// this is the trick
go func() {
wg.Wait()
close(channel)
}()
for line := range channel {
fmt.Print(line)
}
}
func performTest(channel chan string, wg *sync.WaitGroup, i int) {
defer wg.Done()
// perform some work here
result := fmt.Sprintf("Pretend result %d\n", i)
channel <- result
}
https://play.golang.com/p/5pACJzwL4Hi

Exiting forever loop using channels - issues with Go Playground

I am trying to implement a simple logic where a Producer sends data to a channel ch with an forever for loop and a Consumer reads from the channel ch.
The Producer stops producing and exit the forever loop when it receives a signal on the channel quit.
The code is this (see also this playground)
func main() {
ch := make(chan int)
quit := make(chan bool)
var wg sync.WaitGroup
wg.Add(1)
go produce(ch, quit, &wg)
go consume(ch)
time.Sleep(1 * time.Millisecond)
fmt.Println("CLOSE")
close(quit)
wg.Wait()
}
func produce(ch chan int, quit chan bool, wg *sync.WaitGroup) {
for i := 0; ; i++ {
select {
case <-quit:
close(ch)
fmt.Println("exit")
wg.Done()
return //we exit
default:
ch <- i
fmt.Println("Producer sends", i)
}
}
}
func consume(ch chan int) {
for {
runtime.Gosched() // give the opportunity to the main goroutine to close the "quit" channel
select {
case i, more := <-ch:
if !more {
fmt.Println("exit consumer")
return
}
fmt.Println("Consumer receives", i)
}
}
}
If I run this piece of code on my machine (a Mac with 4 cores) everything works fine. If I try the same code on the Go Playgroud it always times out. I guess that this because the Go Playground is a single core and so the infinite loop does not give the chance to other goroutines to run, but then I do not understand why the instruction runtime.Gosched() does not have any effect.
Just to complete the picture I have seen that, if I set GOMAXPROCS=1 on my Mac, the program still works fine and exits as expected. If I set GOMAXPROCS=1 on my Mac and remove the runtime.Gosched() instruction, the behavior gets brittle: sometimes the program terminates as expected, some other times it seems to never exit the infinite loop.
You created a pathological situation that shouldn't happen in a real program, so the scheduler is not optimized to handle this. Combined with the fake time implementation in the playground, and you get far too many cycles of the producer and consumer before hitting a timeout.
The producer goroutine is creating values as fast as possible, while the consumer is always ready to receive them. With GOMAPXPROCS=1, the scheduler spends all its time bouncing between the two before it is forced to preempt the available work to check on the main goroutine, which takes longer than the playground will allow.
If we add something for the producer-consumer pair to do, we can limit the amount of time they have to monopolize the scheduler. For example, adding a time.Sleep(time.Microsecond) to the consumer will cause the playground to print 1000 values. This also goes to show how "accurate" the simulated time is in the playground, since that would not be possible with normal hardware which takes a non-zero amount time to process each message.
While an interesting case, this has little bearing on real programs.
A few notes, you can range over a channel to receive all values, you should always defer wg.Done at the start of the goroutine when possible, you can send values in the select case which allows you to actually cancel the for-select loop when the send isn't ready, and if you want the "exit consumer" message you need to send the WaitGroup to the consumer as well.
https://play.golang.org/p/WyPmpY9pFl7
func main() {
ch := make(chan int)
quit := make(chan bool)
var wg sync.WaitGroup
wg.Add(2)
go produce(ch, quit, &wg)
go consume(ch, &wg)
time.Sleep(50 * time.Microsecond)
fmt.Println("CLOSE")
close(quit)
wg.Wait()
}
func produce(ch chan int, quit chan bool, wg *sync.WaitGroup) {
defer wg.Done()
for i := 0; ; i++ {
select {
case <-quit:
close(ch)
fmt.Println("exit")
return
case ch <- i:
fmt.Println("Producer sends", i)
}
}
}
func consume(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for i := range ch {
fmt.Println("Consumer receives", i)
time.Sleep(time.Microsecond)
}
fmt.Println("exit consumer")
return
}

Race condition between close and send to channel

I'm trying to build a generic pipeline library using worker pools. I created an interface for a source, pipe, and sink. You see, the pipe's job is to receive data from an input channel, process it, and output the result onto a channel. Here is its intended behavior:
Receive data from an input channel.
Delegate the data to an available worker.
The worker sends the result to the output channel.
Close the output channel once all workers are finished.
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
return
}
func (p *pipe) work(jobs <-chan interface{}, out chan<- interface{}, wg *sync.WaitGroup) {
for j := range jobs {
func(j Job) {
defer wg.Done()
wg.Add(1)
res := doSomethingWith(j)
out <- res
}(j)
}
}
However, running it may either exit without processing all of the inputs or panic with a send on closed channel message. Building the source with the -race flag gives out a data race warning between close(out) and out <- res.
Here's what I think might happen. Once a number of workers have finished their jobs, there's a split second where wg's counter reach zero. Hence, wg.Wait() is done and the program proceeds to close(out). Meanwhile, the job channel isn't finished producing data, meaning some workers are still running in another goroutine. Since the out channel is already closed, it results in a panic.
Should the wait group be placed somewhere else? Or is there a better way to wait for all workers to finish?
It's not clear why you want one worker per job, but if you do, you can restructure your outer loop setup (see untested code below). This kind of obviates the need for worker pools in the first place.
Always, though, do a wg.Add before spinning off any worker. Right here, you are spinning off exactly 100 workers:
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
You could therefore do this:
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
wg.Add(100) // ADDED - count the 100 workers
for i := 1; i <= 100; i++ {
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
Note that you can now move wg itself down into the goroutine that spins off the workers. This can make things cleaner, if you give up on the notion of having each worker spin off jobs as new goroutines. But if each worker is going to spin off another goroutine, that worker itself must also use wg.Add, like this:
for j := range jobs {
wg.Add(1) // ADDED - count the spun-off goroutines
func(j Job) {
res := doSomethingWith(j)
out <- res
wg.Done() // MOVED (for illustration only, can defer as before)
}(j)
}
wg.Done() // ADDED - our work in `p.work` is now done
That is, each anonymous function is another user of the channel, so increment the users-of-channel count (wg.Add(1)) before spinning off a new goroutine. When you have finished reading the input channel jobs, call wg.Done() (perhaps via an earlier defer, but I showed it at the end here).
The key to thinking about this is that wg counts the number of active goroutines that could, at this point, write to the channel. It only goes to zero when no goroutines intend to write any more. That makes it safe to close the channel.
Consider using the rather simpler (but untested):
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
out = make(chan interface{})
var wg sync.WaitGroup
go func() {
defer close(out)
for j := range in {
wg.Add(1)
go func(j Job) {
res := doSomethingWith(j)
out <- res
wg.Done()
}(j)
}
wg.Wait()
}()
return out
}
You now have one goroutine that is reading the in channel as fast as it can, spinning off jobs as it goes. You'll get one goroutine per incoming job, except when they finish their work early. There is no pool, just one worker per job (same as your code except that we knock out the pools that aren't doing anything useful).
Or, since there are only some number of CPUs available, spin off some number of goroutines as you did before at the start, but have each one run one job to completion, and deliver its result, then go back to reading the next job:
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
out = make(chan interface{})
go func() {
defer close(out)
var wg sync.WaitGroup
ncpu := runtime.NumCPU() // or something fancier if you like
wg.Add(ncpu)
for i := 0; i < ncpu; i++ {
go func() {
defer wg.Done()
for j := range in {
out <- doSomethingWith(j)
}
}()
}
wg.Wait()
}
return out
}
By using runtime.NumCPU() we get only as many workers reading jobs as there are CPUs to run jobs. Those are the pools and they only do one job at a time.
There's generally no need to buffer the output channel, if the output-channel readers are well-structured (i.e., don't cause the pipeline to constipate). If they're not, the depth of buffering here limits how many jobs you can "work ahead" of whoever is consuming the results. Set it based on how useful it is to do this "working ahead"—not necessarily the number of CPUs, or the number of expected jobs, or whatever.
It's possible that the jobs are being completed just as fast as they're being sent. In this case the WaitGroup will be floating near zero even while there's many more items to process.
One fix for this is to add one before sending jobs, and decrement that one after sending them all, effectively consider the sender to be one of the 'jobs'. In this case, it's better if we do the wg.Add in the sender:
func (p *pipe) Process(in chan interface{}) (out chan interface{}) {
var wg sync.WaitGroup
out = make(chan interface{}, 100)
go func() {
for i := 1; i <= 100; i++ {
wg.Add(1)
go p.work(in, out, &wg)
}
wg.Wait()
close(out)
}()
return
}
func (p *pipe) work(jobs <-chan interface{}, out chan<- interface{}, wg *sync.WaitGroup) {
for j := range jobs {
func(j Job) {
res := doSomethingWith(j)
out <- res
wg.Done()
}(j)
}
}
One thing I notice in the code is that a goroutine is started for each job. At the same time each job processes the jobs channel in a loop until empty/closed. It doesn't seem necessary to do both.

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, I'm getting a all goroutines are asleep, deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and propsed a solution - basically consume the data in a goroutine:
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}

Channel has a strange behaviorior, why block?

go version go1.11.4 darwin/amd64
A new channel and goroutine were created, and the content of the old channel was transferred to the new channel through goroutine. It should not block, but after testing, it was found to be blocked.
thanks.
type waiter struct {
ch1 chan struct{}
ch2 <-chan time.Time
limit int
count int
}
func (w *waiter) recv1Block() chan struct{} {
ch := make(chan struct{})
go func() {
for m := range w.ch1 {
ch <- m
}
}()
return ch
}
func (w *waiter) runBlock(wg *sync.WaitGroup) {
defer wg.Done()
i := 0
for i < w.limit {
select {
case <-w.recv1Block(): **// why block here?**
i++
case <-w.recv2():
}
}
w.count = i
}
why recv1Block will be block.
Every time you call recv1Block(), it creates a new channel and launches a background goroutine that tries to read all of the data from it. Since you're calling it in a loop, you will have many things all trying to consume the data from the channel; since the channel never closes, all of the goroutines will run forever.
As an exercise, you might try changing your code to pass around a chan int instead of a chan struct{}, and write a series of sequential numbers, and print them out as they're ultimately received. A sequence like this is valid:
run on goroutine #1 calls recv1Block().
recv1Block() on GR#1 spawns GR#2, and returns channel#2.
run on GR#1 blocks receiving on channel#2.
recv1Block() on GR#2 reads 0 from w.c1.
recv1Block() on GR#2 writes 0 to channel#2 (where run on GR#1 is ready to read).
recv1Block() on GR#2 reads 1 from w.c1.
recv1Block() on GR#2 wants to write 0 to channel#2 but blocks.
run on GR#1 wakes up, and receives the 0.
run on GR#1 calls recv1Block().
recv1Block() on GR#1 spawns GR#3, and returns channel #3.
recv1Block() on GR#3 reads 2 from w.c1.
...
Notice that the value 1 in this sequence will never be written anywhere, and in fact there is nothing left that could read it.
The easy solution here is to not call the channel-creating function in a loop:
func (w *waiter) runBlock(wg *sync.WaitGroup) {
defer wg.Done()
ch1 := w.recv1Block()
ch2 := w.recv2()
for {
select {
case _, ok := <-ch1:
if !ok {
return
}
w.count++
case <-ch2:
}
}
It's also standard practice to close channels when you're done with them. This will terminate a for ... range ch loop, and it will appear as readable to a select statement. In your top-level generator function:
for i := 0; i < w.limit; i++ {
w.ch1 <- struct{}{}
}
close(w.ch1)
And in your "copy the channel" function:
func (w *waiter) recv1Block() chan struct{} {
ch := make(chan struct{})
go func() {
for m := range w.ch1 {
ch <- m
}
close(ch)
}()
return ch
}
This also means that you don't need to run the main loop by "dead reckoning", expecting it to produce exactly 100 items then stop; you can stop whenever its channel exits. The consumer loop I show above does this.

Resources