Why does this WaitGroup sometimes not wait for all goroutines? - go

The code below outputs 2 sometimes. Why isn't the wait group waiting for all the goroutines to complete ?
type Scratch struct {
//sync.RWMutex
Itch []int
}
func (s *Scratch) GoScratch(done chan bool, j int) error {
var ws sync.WaitGroup
if len(s.Itch) == 0 {
s.Rash = make([]int, 0)
}
for i := 0; i < j; i++ {
ws.Add(1)
go func (i int) {
defer ws.Done()
s.Rash = append(s.Rash, i)
}(i)
}
ws.Wait()
done<- true
return nil
}
func main() {
done := make(chan bool, 3)
s := &Scratch{}
err := s.GoScratch(done, 3)
if err != nil {
log.Println("Error:%v",err)
}
<-done
log.Println("Length: ", len(s.Rash))
}`
Strangely i can't get it to output 2 with a main function but when I use a test case it outputs 2 sometimes.

There is a race condition in your code. It is right here:
go func (i int) {
defer ws.Done()
// race condition on s.Rash access
s.Rash = append(s.Rash, i)
}(i)
Since all the goroutines access s.Rash concurrently, this may cause the slice updates to be overwritten. Try running the same code with sync.Mutex locking to prevent this:
// create a global mutex
var mutex = &sync.Mutex{}
// use mutex to prevent race condition
go func (i int) {
defer ws.Done()
defer mutex.Unlock() // ensure that mutex unlocks
// Lock the resource before accessing it
mutex.Lock()
s.Rash = append(s.Rash, i)
}(i)
You can read more about this here and here.

If you run your code with race detector
go test -race .
you will find out race condition on slice s.Rash.

Related

Golang, proper way to restart a routine that panicked

I have the following sample code. I want to maintain 4 goroutines running at all times. They have the possibility of panicking. In the case of the panic, I have a recover where I restart the goroutine.
The way I implemented works but I am not sure whether its the correct and proper way to do this. Any thoughts
package main
import (
"fmt"
"time"
)
var gVar string
var pCount int
func pinger(c chan int) {
for i := 0; ; i++ {
fmt.Println("adding ", i)
c <- i
}
}
func printer(id int, c chan int) {
defer func() {
if err := recover(); err != nil {
fmt.Println("HERE", id)
fmt.Println(err)
pCount++
if pCount == 5 {
panic("TOO MANY PANICS")
} else {
go printer(id, c)
}
}
}()
for {
msg := <-c
fmt.Println(id, "- ping", msg, gVar)
if msg%5 == 0 {
panic("PANIC")
}
time.Sleep(time.Second * 1)
}
}
func main() {
var c chan int = make(chan int, 2)
gVar = "Preflight"
pCount = 0
go pinger(c)
go printer(1, c)
go printer(2, c)
go printer(3, c)
go printer(4, c)
var input string
fmt.Scanln(&input)
}
You can extract the recover logic in a function such as:
func recoverer(maxPanics, id int, f func()) {
defer func() {
if err := recover(); err != nil {
fmt.Println("HERE", id)
fmt.Println(err)
if maxPanics == 0 {
panic("TOO MANY PANICS")
} else {
go recoverer(maxPanics-1, id, f)
}
}
}()
f()
}
And then use it like:
go recoverer(5, 1, func() { printer(1, c) })
Like Zan Lynx's answer, I'd like to share another way to do it (although it's pretty much similar to OP's way.) I used an additional buffered channel ch. When a goroutine panics, the recovery function inside the goroutine send its identity i to ch. In for loop at the bottom of main(), it detects which goroutine's in panic and whether to restart by receiving values from ch.
Run in Go Playground
package main
import (
"fmt"
"time"
)
func main() {
var pCount int
ch := make(chan int, 5)
f := func(i int) {
defer func() {
if err := recover(); err != nil {
ch <- i
}
}()
fmt.Printf("goroutine f(%v) started\n", i)
time.Sleep(1000 * time.Millisecond)
panic("goroutine in panic")
}
go f(1)
go f(2)
go f(3)
go f(4)
for {
i := <-ch
pCount++
if pCount >= 5 {
fmt.Println("Too many panics")
break
}
fmt.Printf("Detected goroutine f(%v) panic, will restart\n", i)
f(i)
}
}
Oh, I am not saying that the following is more correct than your way. It is just another way to do it.
Create another function, call it printerRecover or something like it, and do your defer / recover in there. Then in printer just loop on calling printerRecover. Add in function return values to check if you need the goroutine to exit for some reason.
The way you implemented is correct. Just for me the approach to maintain exactly 4 routines running at all times looks not much go_way, either handling routine's ID, either spawning in defer which may leads unpredictable stack due to closure. I don't think you can efficiently balance resource this way. Why don't you like to simple spawn worker when it needed
func main() {
...
go func(tasks chan int){ //multiplexer
for {
task = <-tasks //when needed
go printer(task) //just spawns handler
}
}(ch)
...
}
and let runtime do its job? This way things are done in stdlib listeners/servers and them known to be efficient enough. goroutines are very lightweight to spawn and runtime is quite smart to balance load. Sure you must to recover either way. It is my very personal opinion.

Idiomatic goroutine termination and error handling

I have a simple concurrency use case in go, and I cannot figure out an elegant solution to my problem.
I want to write a method fetchAll that queries an unspecified number of resources from remote servers in parallel. If any of the fetches fails, I want to return that first error immediately.
My initial implementation leaks goroutines:
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
func fetchAll() error {
wg := sync.WaitGroup{}
errs := make(chan error)
leaks := make(map[int]struct{})
defer fmt.Println("these goroutines leaked:", leaks)
// run all the http requests in parallel
for i := 0; i < 4; i++ {
leaks[i] = struct{}{}
wg.Add(1)
go func(i int) {
defer wg.Done()
defer delete(leaks, i)
// pretend this does an http request and returns an error
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
errs <- fmt.Errorf("goroutine %d's error returned", i)
}(i)
}
// wait until all the fetches are done and close the error
// channel so the loop below terminates
go func() {
wg.Wait()
close(errs)
}()
// return the first error
for err := range errs {
if err != nil {
return err
}
}
return nil
}
func main() {
fmt.Println(fetchAll())
}
Playground: https://play.golang.org/p/Be93J514R5
I know from reading https://blog.golang.org/pipelines that I can create a signal channel to cleanup the other threads. Alternatively, I could probably use context to accomplish it. But it seems like such a simple use case should have a simpler solution that I'm missing.
Using Error Group makes this even simpler. This automatically waits for all the supplied Go Routines to complete successfully, or cancels all those remaining in the case of any one routine returning an error (in which case that error is the one bubble back up to the caller).
package main
import (
"context"
"fmt"
"math/rand"
"time"
"golang.org/x/sync/errgroup"
)
func fetchAll(ctx context.Context) error {
errs, ctx := errgroup.WithContext(ctx)
// run all the http requests in parallel
for i := 0; i < 4; i++ {
errs.Go(func() error {
// pretend this does an http request and returns an error
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
return fmt.Errorf("error in go routine, bailing")
})
}
// Wait for completion and return the first error (if any)
return errs.Wait()
}
func main() {
fmt.Println(fetchAll(context.Background()))
}
All but one of your goroutines are leaked, because they're still waiting to send to the errs channel - you never finish the for-range that empties it. You're also leaking the goroutine who's job is to close the errs channel, because the waitgroup is never finished.
(Also, as Andy pointed out, deleting from map is not thread-safe, so that'd need protection from a mutex.)
However, I don't think maps, mutexes, waitgroups, contexts etc. are even necessary here. I'd rewrite the whole thing to just use basic channel operations, something like the following:
package main
import (
"fmt"
"math/rand"
"time"
)
func fetchAll() error {
var N = 4
quit := make(chan bool)
errc := make(chan error)
done := make(chan error)
for i := 0; i < N; i++ {
go func(i int) {
// dummy fetch
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
err := error(nil)
if rand.Intn(2) == 0 {
err = fmt.Errorf("goroutine %d's error returned", i)
}
ch := done // we'll send to done if nil error and to errc otherwise
if err != nil {
ch = errc
}
select {
case ch <- err:
return
case <-quit:
return
}
}(i)
}
count := 0
for {
select {
case err := <-errc:
close(quit)
return err
case <-done:
count++
if count == N {
return nil // got all N signals, so there was no error
}
}
}
}
func main() {
rand.Seed(time.Now().UnixNano())
fmt.Println(fetchAll())
}
Playground link: https://play.golang.org/p/mxGhSYYkOb
EDIT: There indeed was a silly mistake, thanks for pointing it out. I fixed the code above (I think...). I also added some randomness for added Realism™.
Also, I'd like to stress that there really are multiple ways to approach this problem, and my solution is but one way. Ultimately it comes down to personal taste, but in general, you want to strive towards "idiomatic" code - and towards a style that feels natural and easy to understand for you.
Here's a more complete example using errgroup suggested by joth. It shows processing successful data, and will exit on the first error.
https://play.golang.org/p/rU1v-Mp2ijo
package main
import (
"context"
"fmt"
"golang.org/x/sync/errgroup"
"math/rand"
"time"
)
func fetchAll() error {
g, ctx := errgroup.WithContext(context.Background())
results := make(chan int)
for i := 0; i < 4; i++ {
current := i
g.Go(func() error {
// Simulate delay with random errors.
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
if rand.Intn(2) == 0 {
return fmt.Errorf("goroutine %d's error returned", current)
}
// Pass processed data to channel, or receive a context completion.
select {
case results <- current:
return nil
// Close out if another error occurs.
case <-ctx.Done():
return ctx.Err()
}
})
}
// Elegant way to close out the channel when the first error occurs or
// when processing is successful.
go func() {
g.Wait()
close(results)
}()
for result := range results {
fmt.Println("processed", result)
}
// Wait for all fetches to complete.
return g.Wait()
}
func main() {
fmt.Println(fetchAll())
}
As long as each goroutine completes, you won't leak anything. You should create the error channel as buffered with the buffer size equal to the number of goroutines so that the send operations on the channel won't block. Each goroutine should always send something on the channel when it finishes, whether it succeeds or fails. The loop at the bottom can then just iterate for the number of goroutines and return if it gets a non-nil error. You don't need the WaitGroup or the other goroutine that closes the channel.
I think the reason it appears that goroutines are leaking is that you return when you get the first error, so some of them are still running.
By the way, maps are not goroutine safe. If you share a map among goroutines and some of them are making changes to the map, you need to protect it with a mutex.
This answer includes the ability to get the responses back into doneData -
package main
import (
"fmt"
"math/rand"
"os"
"strconv"
)
var doneData []string // responses
func fetchAll(n int, doneCh chan bool, errCh chan error) {
partialDoneCh := make(chan string)
for i := 0; i < n; i++ {
go func(i int) {
if r := rand.Intn(100); r != 0 && r%10 == 0 {
// simulate an error
errCh <- fmt.Errorf("e33or for reqno=" + strconv.Itoa(r))
} else {
partialDoneCh <- strconv.Itoa(i)
}
}(i)
}
// mutation of doneData
for d := range partialDoneCh {
doneData = append(doneData, d)
if len(doneData) == n {
close(partialDoneCh)
doneCh <- true
}
}
}
func main() {
// rand.Seed(1)
var n int
var e error
if len(os.Args) > 1 {
if n, e = strconv.Atoi(os.Args[1]); e != nil {
panic(e)
}
} else {
n = 5
}
doneCh := make(chan bool)
errCh := make(chan error)
go fetchAll(n, doneCh, errCh)
fmt.Println("main: end")
select {
case <-doneCh:
fmt.Println("success:", doneData)
case e := <-errCh:
fmt.Println("failure:", e, doneData)
}
}
Execute using go run filename.go 50 where N=50 i.e amount of parallelism

Why all goroutines are asleep - deadlock. Identifying bottleneck

package main
import (
"fmt"
"runtime"
"sync"
"time"
)
func main() {
intInputChan := make(chan int, 50)
var wg sync.WaitGroup
for i := 0; i < 3; i++ {
wg.Add(1)
go worker(intInputChan, wg)
}
for i := 1; i < 51; i++ {
fmt.Printf("Inputs. %d \n", i)
intInputChan <- i
}
close(intInputChan)
wg.Wait()
fmt.Println("Existing Main App... ")
panic("---------------")
}
func worker(input chan int, wg sync.WaitGroup) {
defer func() {
fmt.Println("Executing defer..")
wg.Done()
}()
for {
select {
case intVal, ok := <-input:
time.Sleep(100 * time.Millisecond)
if !ok {
input = nil
return
}
fmt.Printf("%d %v\n", intVal, ok)
default:
runtime.Gosched()
}
}
}
error thrown is.
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [semacquire]:
sync.(*WaitGroup).Wait(0xc082004600)
c:/go/src/sync/waitgroup.go:132 +0x170
main.main()
E:/Go/go_projects/go/src/Test.go:22 +0x21a
I just tried it (playground) passing a wg *sync.WaitGroup and it works.
Passing sync.WaitGroup means passing a copy of the sync.WaitGroup (passing by value): the goroutine mentions Done() to a different sync.WaitGroup.
var wg sync.WaitGroup
for i := 0; i < 3; i++ {
wg.Add(1)
go worker(intInputChan, &wg)
}
Note the &wg: you are passing by value the pointer to the original sync.WaitGroup, for the goroutine to use.
As mentioned, don't pass types from the sync package around by value, right near the top of the sync package documentation: "Values containing the types defined in this package should not be copied." That also includes the types themselves (sync.Mutex, sync.WaitGroup, etc).
However, several notes:
You can use just a single call to wg.Add if you know how many you're going to add (but as documented make sure it's done before anything can call Wait).
You don't want to call runtime.Gosched like that; it makes the workers busy loop.
You can use range to read from the channel to simplify stopping when it's closed.
For small functions you can use a closure and not bother to pass the channel or wait group at all.
That turns it into this:
package main
import (
"fmt"
"sync"
"time"
)
func main() {
const numWorkers = 3
c := make(chan int, 10)
var wg sync.WaitGroup
wg.Add(numWorkers)
for i := 0; i < numWorkers; i++ {
go func() {
defer func() {
fmt.Println("Executing defer…")
wg.Done()
}()
for v := range c {
fmt.Println("recv:", v)
time.Sleep(100 * time.Millisecond)
}
}()
}
for i := 1; i < 51; i++ {
fmt.Println("send:", i)
c <- i
}
fmt.Println("closing…")
close(c)
fmt.Println("waiting…")
wg.Wait()
fmt.Println("Exiting Main App... ")
}
playground

Always have x number of goroutines running at any time

I see lots of tutorials and examples on how to make Go wait for x number of goroutines to finish, but what I'm trying to do is have ensure there are always x number running, so a new goroutine is launched as soon as one ends.
Specifically I have a few hundred thousand 'things to do' which is processing some stuff that is coming out of MySQL. So it works like this:
db, err := sql.Open("mysql", connection_string)
checkErr(err)
defer db.Close()
rows,err := db.Query(`SELECT id FROM table`)
checkErr(err)
defer rows.Close()
var id uint
for rows.Next() {
err := rows.Scan(&id)
checkErr(err)
go processTheThing(id)
}
checkErr(err)
rows.Close()
Currently that will launch several hundred thousand threads of processTheThing(). What I need is that a maximum of x number (we'll call it 20) goroutines are launched. So it starts by launching 20 for the first 20 rows, and from then on it will launch a new goroutine for the next id the moment that one of the current goroutines has finished. So at any point in time there are always 20 running.
I'm sure this is quite simple/standard, but I can't seem to find a good explanation on any of the tutorials or examples or how this is done.
You may find Go Concurrency Patterns article interesting, especially Bounded parallelism section, it explains the exact pattern you need.
You can use channel of empty structs as a limiting guard to control number of concurrent worker goroutines:
package main
import "fmt"
func main() {
maxGoroutines := 10
guard := make(chan struct{}, maxGoroutines)
for i := 0; i < 30; i++ {
guard <- struct{}{} // would block if guard channel is already filled
go func(n int) {
worker(n)
<-guard
}(i)
}
}
func worker(i int) { fmt.Println("doing work on", i) }
Here I think something simple like this will work :
package main
import "fmt"
const MAX = 20
func main() {
sem := make(chan int, MAX)
for {
sem <- 1 // will block if there is MAX ints in sem
go func() {
fmt.Println("hello again, world")
<-sem // removes an int from sem, allowing another to proceed
}()
}
}
Thanks to everyone for helping me out with this. However, I don't feel that anyone really provided something that both worked and was simple/understandable, although you did all help me understand the technique.
What I have done in the end is I think much more understandable and practical as an answer to my specific question, so I will post it here in case anyone else has the same question.
Somehow this ended up looking a lot like what OneOfOne posted, which is great because now I understand that. But OneOfOne's code I found very difficult to understand at first because of the passing functions to functions made it quite confusing to understand what bit was for what. I think this way makes a lot more sense:
package main
import (
"fmt"
"sync"
)
const xthreads = 5 // Total number of threads to use, excluding the main() thread
func doSomething(a int) {
fmt.Println("My job is",a)
return
}
func main() {
var ch = make(chan int, 50) // This number 50 can be anything as long as it's larger than xthreads
var wg sync.WaitGroup
// This starts xthreads number of goroutines that wait for something to do
wg.Add(xthreads)
for i:=0; i<xthreads; i++ {
go func() {
for {
a, ok := <-ch
if !ok { // if there is nothing to do and the channel has been closed then end the goroutine
wg.Done()
return
}
doSomething(a) // do the thing
}
}()
}
// Now the jobs can be added to the channel, which is used as a queue
for i:=0; i<50; i++ {
ch <- i // add i to the queue
}
close(ch) // This tells the goroutines there's nothing else to do
wg.Wait() // Wait for the threads to finish
}
Create channel for passing data to goroutines.
Start 20 goroutines that processes the data from channel in a loop.
Send the data to the channel instead of starting a new goroutine.
Grzegorz Żur's answer is the most efficient way to do it, but for a newcomer it could be hard to implement without reading code, so here's a very simple implementation:
type idProcessor func(id uint)
func SpawnStuff(limit uint, proc idProcessor) chan<- uint {
ch := make(chan uint)
for i := uint(0); i < limit; i++ {
go func() {
for {
id, ok := <-ch
if !ok {
return
}
proc(id)
}
}()
}
return ch
}
func main() {
runtime.GOMAXPROCS(4)
var wg sync.WaitGroup //this is just for the demo, otherwise main will return
fn := func(id uint) {
fmt.Println(id)
wg.Done()
}
wg.Add(1000)
ch := SpawnStuff(10, fn)
for i := uint(0); i < 1000; i++ {
ch <- i
}
close(ch) //should do this to make all the goroutines exit gracefully
wg.Wait()
}
playground
This is a simple producer-consumer problem, which in Go can be easily solved using channels to buffer the paquets.
To put it simple: create a channel that accept your IDs. Run a number of routines which will read from the channel in a loop then process the ID. Then run your loop that will feed IDs to the channel.
Example:
func producer() {
var buffer = make(chan uint)
for i := 0; i < 20; i++ {
go consumer(buffer)
}
for _, id := range IDs {
buffer <- id
}
}
func consumer(buffer chan uint) {
for {
id := <- buffer
// Do your things here
}
}
Things to know:
Unbuffered channels are blocking: if the item wrote into the channel isn't accepted, the routine feeding the item will block until it is
My example lack a closing mechanism: you must find a way to make the producer to wait for all consumers to end their loop before returning. The simplest way to do this is with another channel. I let you think about it.
I've wrote a simple package to handle concurrency for Golang. This package will help you limit the number of goroutines that are allowed to run concurrently:
https://github.com/zenthangplus/goccm
Example:
package main
import (
"fmt"
"goccm"
"time"
)
func main() {
// Limit 3 goroutines to run concurrently.
c := goccm.New(3)
for i := 1; i <= 10; i++ {
// This function have to call before any goroutine
c.Wait()
go func(i int) {
fmt.Printf("Job %d is running\n", i)
time.Sleep(2 * time.Second)
// This function have to when a goroutine has finished
// Or you can use `defer c.Done()` at the top of goroutine.
c.Done()
}(i)
}
// This function have to call to ensure all goroutines have finished
// after close the main program.
c.WaitAllDone()
}
Also can take a look here: https://github.com/LiangfengChen/goutil/blob/main/concurrent.go
The example can refer the test case.
func TestParallelCall(t *testing.T) {
format := "test:%d"
data := make(map[int]bool)
mutex := sync.Mutex{}
val, err := ParallelCall(1000, 10, func(pos int) (interface{}, error) {
mutex.Lock()
defer mutex.Unlock()
data[pos] = true
return pos, errors.New(fmt.Sprintf(format, pos))
})
for i := 0; i < 1000; i++ {
if _, ok := data[i]; !ok {
t.Errorf("TestParallelCall pos not found: %d", i)
}
if val[i] != i {
t.Errorf("TestParallelCall return value is not right (%d,%v)", i, val[i])
}
if err[i].Error() != fmt.Sprintf(format, i) {
t.Errorf("TestParallelCall error msg is not correct (%d,%v)", i, err[i])
}
}
}

Getting "fatal error: all goroutines are asleep - deadlock!" when using sync.WaitGroup

I'm trying to spin off a set of goroutines, and then wait for them all to finish.
import "sync"
func doWork(wg sync.WaitGroup) error {
defer wg.Done()
// Do some heavy lifting... request URL's or similar
return nil
}
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go doWork(wg)
}
wg.Wait()
}
However when I run this code I get the following error:
fatal error: all goroutines are asleep - deadlock!
goroutine 16 [semacquire]:
sync.runtime_Semacquire(0xc20818c658)
/usr/local/Cellar/go/1.3/libexec/src/pkg/runtime/sema.goc:199 +0x30
sync.(*WaitGroup).Wait(0xc2080544e0)
/usr/local/Cellar/go/1.3/libexec/src/pkg/sync/waitgroup.go:129 +0x14b
main.main()
/Users/kevin/code/vrusability/scripts/oculus_share_ratings.go:150 +0x398
I'm confused because I wrote it pretty much exactly as the documentation example demonstrates.
You need to pass a pointer to the WaitGroup, and not the WaitGroup object. When you pass the actual WaitGroup, Go makes a copy of the value, and calls Done() on the copy. The result is the original WaitGroup will have ten Add's and no Done's, and each copy of the WaitGroup will have one Done() and however many Add's were there when the WaitGroup was passed to the function.
Pass a pointer instead, and every function will reference the same WaitGroup.
import "sync"
func doWork(wg *sync.WaitGroup) error {
defer wg.Done()
// Do some heavy lifting... request URL's or similar
return nil
}
func main() {
wg := &sync.WaitGroup{}
for i := 0; i < 10; i++ {
wg.Add(1)
go doWork(wg)
}
}
As #Kevin mentioned, you will need to pass a reference to your WaitGroup. This is actually the one thing I do not like about WaitGroup because you would be mixing your concurrency logic with your business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
var waitGroup sync.WaitGroup
waitGroup.Add(len(functions))
defer waitGroup.Wait()
for _, function := range functions {
go func(copy func()) {
defer waitGroup.Done()
copy()
}(function)
}
}
Here is an example:
func1 := func() {
for char := 'a'; char < 'a' + 3; char++ {
fmt.Printf("%c ", char)
}
}
func2 := func() {
for number := 1; number < 4; number++ {
fmt.Printf("%d ", number)
}
}
Parallelize(func1, func2) // a 1 b 2 c 3
If you would like to use it, you can find it here https://github.com/shomali11/util

Resources