How to write to the same channel from multiple goroutines - go

I need several goroutines to write in the same channel. Then all the data is read in one place until all the goroutines complete the process. But I'm not sure how best to close this channel.
this is my example implementation:
func main() {
ch := make(chan data)
wg := &sync.WaitGroup{}
for instance := range dataSet {
wg.Add(1)
go doStuff(ch, instance)
}
go func() {
wg.Wait()
close(ch)
}()
for v := range ch { //range until it closes
//proceed v
}
}
func doStuff(ch chan data, instance data) {
//do some stuff with instance...
ch <- instance
}
but I'm not sure that it is idiomatic.

As you are using WaitGroup and increase the counter at the time of starting new goroutine, you have to notify WaitGroup when a goroutine is finished by calling the Done() method. Also you have to pass the same WaitGroup to the goroutine. You can do it by passing the address of WaitGroup. Otherwise each goroutine will use it's own WaitGroup which will be on different scope.
func main() {
ch := make(chan data)
wg := &sync.WaitGroup{}
for _, instance := range dataSet {
wg.Add(1)
go doStuff(ch, instance, wg)
}
go func() {
wg.Wait()
close(ch)
}()
for v := range ch { //range until it closes
//proceed v
}
}
func doStuff(ch chan data, instance data, wg *sync.WaitGroup) {
//do some stuff with instance...
ch <- instance
// call done method to decrease the counter of WaitGroup
wg.Done()
}

Related

Channels and Wait Groups Entering Deadlock

I'm having trouble wrangling go routines and getting them to communicate back to a channel on the main go routine. To simplify, my code looks something like this:
func main() {
channel := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go performTest(channel, &wg, i)
}
wg.Wait()
close(channel)
for line := range channel {
fmt.Print(line)
}
}
func performTest(channel chan string, wg *sync.WaitGroup, i int) {
defer wg.Done()
// perform some work here
result := fmt.sprintf("Pretend result %d", i)
channel <- result
}
This seems to enter into some kind of a deadlock, but I don't understand why. It gets stuck on wg.Wait(), even though I would expect it to continue once all the goroutines have called Done on the wait group. What am I missing here? I'd like to wait for the goroutines, and then iterate over all results in the channel.
You can wait for the group and close the channel in a separate go routine. If the channel is closed, your range over the channel will end after the last sent value has been received.
If you just wait, nothing will receive from the channel. Since the channel is unbuffered, the performTest goroutines won't be able to send. For an unbuffered channel, the send operation will block until it has been received. Therefore, the deferred wg.Done call would never happen, and your program is deadlocked. Since Done is only called after the forever-blocking send has been performed.
func main() {
channel := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go performTest(channel, &wg, i)
}
// this is the trick
go func() {
wg.Wait()
close(channel)
}()
for line := range channel {
fmt.Print(line)
}
}
func performTest(channel chan string, wg *sync.WaitGroup, i int) {
defer wg.Done()
// perform some work here
result := fmt.Sprintf("Pretend result %d\n", i)
channel <- result
}
https://play.golang.com/p/5pACJzwL4Hi

Quit all recursively spawned goroutines at once

I have a function that recursively spawns goroutines to walk a DOM tree, putting the nodes they find into a channel shared between all of them.
import (
"golang.org/x/net/html"
"sync"
)
func walk(doc *html.Node, ch chan *html.Node) {
var wg sync.WaitGroup
defer close(ch)
var f func(*html.Node)
f = func(n *html.Node) {
defer wg.Done()
ch <- n
for c := n.FirstChild; c != nil; c = c.NextSibling {
wg.Add(1)
go f(c)
}
}
wg.Add(1)
go f(doc)
wg.Wait()
}
Which I'd call like
// get the webpage using http
// parse the html into doc
ch := make(chan *html.Node)
go walk(doc, ch)
for c := range ch {
if someCondition(c) {
// do something with c
// quit all goroutines spawned by walk
}
}
I am wondering how I could quit all of these goroutines--i.e. close ch--once I have found a node of a certain type or some other condition has been fulfilled. I have tried using a quit channel that'd be polled before spawning the new goroutines and close ch if a value was received but that lead to race conditions where some goroutines tried sending on the channel that had just been closed by another one. I was pondering using a mutex but it seems inelegant and against the spirit of go to protect a channel with a mutex. Is there an idiomatic way to do this using channels? If not, is there any way at all? Any input appreciated!
The context package provides similar functionality. Using context.Context with a few Go-esque patterns, you can achieve what you need.
To start you can check this article to get a better feel of cancellation with context: https://www.sohamkamani.com/blog/golang/2018-06-17-golang-using-context-cancellation/
Also make sure to check the official GoDoc: https://golang.org/pkg/context/
So to achieve this functionality your function should look more like:
func walk(ctx context.Context, doc *html.Node, ch chan *html.Node) {
var wg sync.WaitGroup
defer close(ch)
var f func(*html.Node)
f = func(n *html.Node) {
defer wg.Done()
ch <- n
for c := n.FirstChild; c != nil; c = c.NextSibling {
select {
case <-ctx.Done():
return // quit the function as it is cancelled
default:
wg.Add(1)
go f(c)
}
}
}
select {
case <-ctx.Done():
return // perhaps it was cancelled so quickly
default:
wg.Add(1)
go f(doc)
wg.Wait()
}
}
And when calling the function, you will have something like:
// ...
ctx, cancelFunc := context.WithCancel(context.Background())
walk(ctx, doc, ch)
for value := range ch {
// ...
if someCondition {
cancelFunc()
// the for loop will automatically exit as the channel is being closed for the inside
}
}

Best way of using sync.WaitGroup with external function

I have some issues with the following code:
package main
import (
"fmt"
"sync"
)
// This program should go to 11, but sometimes it only prints 1 to 10.
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, wg) //
go func(){
for i := 1; i <= 11; i++ {
ch <- i
}
close(ch)
defer wg.Done()
}()
wg.Wait() //deadlock here
}
// Print prints all numbers sent on the channel.
// The function returns when the channel is closed.
func Print(ch <-chan int, wg sync.WaitGroup) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
defer wg.Done()
}
I get a deadlock at the specified place. I have tried setting wg.Add(1) instead of 2 and it solves my problem. My belief is that I'm not successfully sending the channel as an argument to the Printer function. Is there a way to do that? Otherwise, a solution to my problem is replacing the go Print(ch, wg)line with:
go func() {
Print(ch)
defer wg.Done()
}
and changing the Printer function to:
func Print(ch <-chan int) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
}
What is the best solution?
Well, first your actual error is that you're giving the Print method a copy of the sync.WaitGroup, so it doesn't call the Done() method on the one you're Wait()ing on.
Try this instead:
package main
import (
"fmt"
"sync"
)
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, &wg)
go func() {
for i := 1; i <= 11; i++ {
ch <- i
}
close(ch)
defer wg.Done()
}()
wg.Wait() //deadlock here
}
func Print(ch <-chan int, wg *sync.WaitGroup) {
for n := range ch { // reads from channel until it's closed
fmt.Println(n)
}
defer wg.Done()
}
Now, changing your Print method to remove the WaitGroup of it is a generally good idea: the method doesn't need to know something is waiting for it to finish its job.
I agree with #Elwinar's solution, that the main problem in your code caused by passing a copy of your Waitgroup to the Print function.
This means the wg.Done() is operated on a copy of wg you defined in the main. Therefore, wg in the main could not get decreased, and thus a deadlock happens when you wg.Wait() in main.
Since you are also asking about the best practice, I could give you some suggestions of my own:
Don't remove defer wg.Done() in Print. Since your goroutine in main is a sender, and print is a receiver, removing wg.Done() in receiver routine will cause an unfinished receiver. This is because only your sender is synced with your main, so after your sender is done, your main is done, but it's possible that the receiver is still working. My point is: don't leave some dangling goroutines around after your main routine is finished. Close them or wait for them.
Remember to do panic recovery everywhere, especially anonymous goroutine. I have seen a lot of golang programmers forgetting to put panic recovery in goroutines, even if they remember to put recover in normal functions. It's critical when you want your code to behave correctly or at least gracefully when something unexpected happened.
Use defer before every critical calls, like sync related calls, at the beginning since you don't know where the code could break. Let's say you removed defer before wg.Done(), and a panic occurrs in your anonymous goroutine in your example. If you don't have panic recover, it will panic. But what happens if you have a panic recover? Everything's fine now? No. You will get deadlock at wg.Wait() since your wg.Done() gets skipped because of panic! However, by using defer, this wg.Done() will be executed at the end, even if panic happened. Also, defer before close is important too, since its result also affects the communication.
So here is the code modified according to the points I mentioned above:
package main
import (
"fmt"
"sync"
)
func main() {
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(2)
go Print(ch, &wg)
go func() {
defer func() {
if r := recover(); r != nil {
println("panic:" + r.(string))
}
}()
defer func() {
wg.Done()
}()
for i := 1; i <= 11; i++ {
ch <- i
if i == 7 {
panic("ahaha")
}
}
println("sender done")
close(ch)
}()
wg.Wait()
}
func Print(ch <-chan int, wg *sync.WaitGroup) {
defer func() {
if r := recover(); r != nil {
println("panic:" + r.(string))
}
}()
defer wg.Done()
for n := range ch {
fmt.Println(n)
}
println("print done")
}
Hope it helps :)

Implementing a job worker pool in Go

Since Go does not have generics, all the premade solutions use type casting which I do not like very much. I also want to implement it on my own and tried the following code. However, sometimes it does not wait for all goroutines, am I closing the jobs channel prematurely? I do not have anything to fetch from them. I might have used a pseudo output channel too and waited to fetch the exact amount from them however I believe the following code should work too. What am I missing?
func jobWorker(id int, jobs <-chan string, wg sync.WaitGroup) {
wg.Add(1)
defer wg.Done()
for job := range jobs {
item := ParseItem(job)
item.SaveItem()
MarkJobCompleted(item.ID)
log.Println("Saved", item.Title)
}
}
// ProcessJobs processes the jobs from the list and deletes them
func ProcessJobs() {
jobs := make(chan string)
list := GetJobs()
// Start workers
var wg sync.WaitGroup
for w := 0; w < 10; w++ {
go jobWorker(w, jobs, wg)
}
for _, url := range list {
jobs <- url
}
close(jobs)
wg.Wait()
}
Call wg.Add outside of the goroutine and pass a pointer to the wait group.
If Add is called from inside the goroutine, it's possible for the main goroutine to call Wait before the goroutines get a chance to run. If Add has not been called, then Wait will return immediately.
Pass a pointer to the goroutine. Otherwise, the goroutines use their own copy of the wait group.
func jobWorker(id int, jobs <-chan string, wg *sync.WaitGroup) {
defer wg.Done()
for job := range jobs {
item := ParseItem(job)
item.SaveItem()
MarkJobCompleted(item.ID)
log.Println("Saved", item.Title)
}
}
// ProcessJobs processes the jobs from the list and deletes them
func ProcessJobs() {
jobs := make(chan string)
list := GetJobs()
// Start workers
var wg sync.WaitGroup
for w := 0; w < 10; w++ {
wg.Add(1)
go jobWorker(w, jobs, &wg)
}
for _, url := range list {
jobs <- url
}
close(jobs)
wg.Wait()
}
You need to pass a pointer to the waitgroup, or else every job receives it's own copy.
func jobWorker(id int, jobs <-chan string, wg *sync.WaitGroup) {
wg.Add(1)
defer wg.Done()
for job := range jobs {
item := ParseItem(job)
item.SaveItem()
MarkJobCompleted(item.ID)
log.Println("Saved", item.Title)
}
}
// ProcessJobs processes the jobs from the list and deletes them
func ProcessJobs() {
jobs := make(chan string)
list := GetJobs()
// Start workers
var wg sync.WaitGroup
for w := 0; w < 10; w++ {
go jobWorker(w, jobs, &wg)
}
for _, url := range list {
jobs <- url
}
close(jobs)
wg.Wait()
}
See the difference here: without pointer, with pointer.

Go WaitGroup with goroutine

I wonder why we need to run the wg.Wait() in goroutine
// This one works as expected...
func main() {
var wg sync.WaitGroup
for i:=0; i<5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
}()
time.Sleep(time.Second)
}
go func() {
wg.Wait()
}()
}
But this one never ends waiting forever
func main() {
var wg sync.WaitGroup
for i:=0; i<5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
}()
time.Sleep(time.Second)
}
wg.Wait()
}
Can anybody explain why I need to wait in another goroutine?
Thanks!
why we need to run the wg.Wait() in goroutine?
In the example you mention (coop/blob/master/coop.go#L85), the wait is in a goroutine in order to return immediately a channel that will indicate when all the other goroutines have completed.
Those are the goroutines to be started:
for _, fn := range fns {
go func(f func()) {
f()
wg.Done()
}(fn)
}
They mention the completion through the var wg sync.WaitGroup.
That WaitGroup is set to wait for the right number of goroutines to finish:
wg.Add(len(fns))
The wait is done in a goroutine because it will in turn signal the global completion to a channel:
go func() {
wg.Wait()
doneSig(ch, true)
}()
But the channel is returned immediately.
ch := make(chan bool, 1)
...
return ch
That is why the Wait is done asynchronously: you don't want to wait in this function. You just want the channel.

Resources