What's the best way to resume goroutines work

What's the best way to resume goroutines work - go

I have to update thousands of structs with datas available on some remote servers. So, I have to deal with thousands of Goroutines querying these remote servers (http requests or db requests) to update the struct with the responses. But the update (or not) of the struct is depending of the results of other structs.
So I imagined a simple code in which the goroutines are running, each of them performs its own request, put the result in a global struct that contain any information retrieved by the goroutines and informs the main func that the first part of the job is done (waiting for the signal that every goroutines do the same before deciding of updating or not its struct.)
The goroutine should now waits for a signal of the main thread that all goroutines are done before deciding updating or not
the simplified code looks like that :
type struct StructToUpdate {
//some properties
}
type struct GlobalWatcher {
//some userful informations for all structs to update (or not)
}
func main() {
//retrieving all structs to update
structsToUpdates := foo()
//launching goroutines
c := make(chan bool)
for _,structToUpdate := range structsToUpdates {
go updateStruct(&structToUpdate,c)
}
//waiting for all goroutines do their first part of job
for i:=0; i<len(structsToUpdates); i++ {
<- c
}
//HERE IS THE CODE TO INFORM GOROUTINES THEY CAN RESUME
}
func updateStruct(s *StructToUpdate, c chan bool) {
result := performSomeRequest(s)
informGlobalWatcherOfResult(result)
c <- true //I did my job
//HERE IS THE CODE TO WAIT FOR THE SIGNAL TO RESUME
}
The question is : What the more performant / idiomatic / elegant wait to :
send a signal from the main script ?
wait for this signal from the goroutine ?
I can imagine 3 ways to do this
in a first way, I can imagine a global bool var resume := false that will be turned to true by the main func when all goroutines do the first part of job. In this case, each goroutine can use an ugly for !resume { continue }....
in a more idiomatic code, I can imagine do the same thing but instead of using a bool, I can use a context.WithValue(ctx, "resume", false) and pass it to goroutine, but I still have a for !ctx.Value("resume")
in a last more elegant code, I can imagine using another resume := make(chan bool) passed to the goroutine. The main func could inform goroutines to resume closing this chan with a simple close(resume). The goroutine will wait the signal with something than :
for {
_, more := <-resume
if !more {
break
}
}
//update or not
Is there any other good idea to do this ? One of the above solutions is better than others ?

I'm not sure if I understand your question completely, but a simple solution to blocking the main thread is with an OS signal ie.:
done := make(chan os.Signal, 1)
signal.Notify(done, os.Interrupt, syscall.SIGTERM)
// Blocks main thread
<-done
This doesn't have to be an OS signal, this could be a channel that receives a struct{}{} value from somewhere else in your runtime.
If you need to do this as a result of multiple go routines working, I'd look into using sync.WaitGroup.

OK. Thanks to #blixenkrone I found an elegant solution using WaitGroup !
Usually, we use a sync.WaitGroup to synchronize goroutines and be sure that they finish their job : you add "1" to the WaitGroup and then wait... Goroutines, when they finish their job, use the Done() function to decrement the Waitgroup counter. In fact, the Wait() function of the WaitGroup must act like a for loop, looking for its internal counter to become "0" to resume.
What I do is exactly the opposite !!! I create a WaitGroup and increment it to 1 in the main program before launching the goroutines, and pass it to them. The goroutine do their first past of job, notify the main program and then wait for the WaitGroup. When the main program get every notification of first part job, it release the waitgroup, then each goroutine can finish the job. Here is a squeleton of what I did :
func main() {
structsToUpdate := getStructs() //some func to get my structs
communication := make(chan bool)
//the waitgroup used to block the goroutine
var resume sync.WaitGroup
resume.Add(1) //The waitgroup will be blocked !!!
success := 0
for _,structToUpdate := range structsToUpdates {
go updateStruct(&structToUpdate,communication,&resume)
}
//waiting for all goroutines do their first part of job
for i:=0; i<len(structsToUpdates); i++ {
res := <- communication
if res {
success++
}
}
//all structs should have done with first part now... Resume their job !
resume.Done()
//receive second part job
for success != 0 {
<-communication
success--
}
}
func updateStruct(s *StructToUpdate, c chan bool, resume *sync.WaitGroup) {
//First part of the job
result, err := performSomeRequest(s)
if err != nil {
//something wrong happened
c <- false
return
}
informGlobalWatcherOfResult(result)
c <- true //I did my job
//wait for all goroutines do their first part job
resume.Wait()
//here the main program 'Done()' the WaitGroup so it resumes the execution of the goroutine
updateOrNotStruct() //Some operations...
c <- true
}
Even if I think the wg.Wait() func should call a for loop itself, I think this code is cleaner than the solutions a proposed above...

Related

Recursive calls from function started as goroutine & Idiomatic way to continue caller when all worker goroutines finished

I am implementing a (sort of a) combinatorial backtracking algorithm in go utilising goroutines. My problem can be represented as a tree with a certain degree/spread where I want to visit each leaf and calculate a result depending on the path taken. On a given level, I want to spawn goroutines to process the subproblems concurrently, i.e. if I have a tree with degree 3 and I want to start the concurrency after level 2, I'd spawn 3*3=9 goroutines that proceed with processing the subproblems concurrently.
func main() {
cRes := make(chan string, 100)
res := []string{}
numLevels := 5
spread := 3
startConcurrencyAtLevel := 2
nTree("", numLevels, spread, startConcurrencyAtLevel, cRes)
for {
select {
case r := <-cRes:
res = append(res, r)
case <-time.After(10 * time.Second):
fmt.Println("Caculation timed out")
fmt.Println(len(res), math.Pow(float64(spread), float64(numLevels)))
return
}
}
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string) {
if len(path) == maxLevels {
// some longer running task here associated with the found path, also using a lookup table
// real problem actually returns not the path but the result if it satisfies some condition
cRes <- path
return
}
for i := 1; i <= spread; i++ {
nextPath := path + fmt.Sprint(i)
if len(path) == startConcurrencyAtLevel {
go nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
} else {
nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
}
}
}
The above code works, however I rely on the for select statement timing out. I am looking for a way to continue with main() as soon as all goroutines have finished, i.e. all subproblems have been processed.
I already came up with two possible (unpreferred/unelegant) solutions:
Using a mutex protected result map + a waitgroup instead of a channel-based approach should do the trick, but I'm curious if there is a neat solution with channels.
Using a quit channel (of type int). Every time a goroutine is spawned, the quit channel gets a +1 int, everytime a comptutation finished in a leaf, it gets a -1 int and the caller sums up the values. See the following snippet, this however is not a good solution as it (rather blatantly) runs into timing issues I don't want to deal with. It quits prematurely if for instance the first goroutine finishes before another one has been spawned.
for {
select {
case q := <-cRunningRoutines:
runningRoutines += q
if runningRoutines == 0 {
fmt.Println("Calculation complete")
return res
}
// ...same cases as above
}
Playground: https://go.dev/play/p/9jzeCvl8Clj
Following questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just cant wrap my head around how to integrate it in this case.
Happy about any ideas, thanks!

reading the description and the snippet I am not able to understand exactly what you are trying to achieve, but I have some hints and patterns for channels that I use daily and think are helpful.
the context package is very helpful to manage goroutines' state in a safe way. In your example, time.After is used to end the main program, but in non-main functions it could be leaking goroutines: if instead you use context.Context and pass it into the gorotuines (it's usually passed first arg of a function) you will be able to control cancellation of downstream calls. This explains it briefly.
it is common practice to create channels (and return them) in functions that produce messages and send them in the channel. The same function should be responsible of closing the channel, e,g, with defer close(channel) when it's done writing.
This is handy because when the channel is buffered, closing it is possible even when it still has data in it: the Go runtime will actually wait until all messages are polledbefore closing. For unbuffered channels, the function won't be able to send a message over the channel until a reader of the channel is ready to poll it, thus won;t be able to exit.
This is an example (without recursion).
We can close the channel both when it is buffered or unbuffered in this example, because the send will block until the for := range on the channel in the main goroutine reads from it.
This is a variant for the same principle, with the channel passed as argument.
we can use sync.WaitGroup in tandem with channels, to signal completion for individual goroutines, and let know to an "orchestrating" goroutine that the channel can be closed, because all message producers are done sending data into the channel. The same considerations as point 1 apply on the close operation.
This is an example showing the use of waitGroup and external closer of channel.
channels can have a direction! notice that in the example, I added/removed arrows next to the channel (e.g. <-chan string, or chan<- string) when passing them in/outside functions. This tells the compiler that a channel is read or write only respectively in the scope of that function.
This is helping in 2 ways:
the compiler will produce more efficient code, as the channels with direction will have a single lock instead of 2.
the signature of the function describes if it will only use the channel for writing to it (and possibly close()) or reading: remember that reading from a channel with a range automatically stops the iteration when the channel is closed.
you can build channels of channels: make(chan chan string) is a valid (and helpful) construct to build processing pipelines.
A common usage of it is a fan-in goroutine that is collecting multiple outputs of a series of channel-producing goroutines.
This is an example of how to use them.
In essence, to answer your initial questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
If you really need recursion, it's probably better to handle it separately from the concurrent code: create a dedicated function that recursively sends data into a channel, and orchestrate the closing of the channel in the caller.
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just cant wrap my head around how to integrate it in this case.
A good reference is Go Concurrency Patterns: Pipelines and cancellation: this is a rather old post (before the context package existedin the std lib) and I think Parallel digestion is what you're looking for to address the original question.

As mentioned by torek, I spun off an anonymous function closing the channel after the waitgroup finished waiting. Also needed some logic around calling the wg.Done() of the spawned goroutines only after the the recursion of the goroutine spawning level returns.
Generally I think this is a useful idiom (correct me if I'm wrong :))
Playground: https://go.dev/play/p/bQjHENsZL25
func main() {
cRes := make(chan string, 100)
numLevels := 3
spread := 3
startConcurrencyAtLevel := 2
var wg sync.WaitGroup
nTree("", numLevels, spread, startConcurrencyAtLevel, cRes, &wg)
go func() {
// time.Sleep(1 * time.Second) // edit: code should work without this initial sleep
wg.Wait()
close(cRes)
}()
for r := range cRes {
fmt.Println(r)
}
fmt.Println("Done!")
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string, wg *sync.WaitGroup) {
if len(path) == maxLevels {
// some longer running task here associated with the found path
cRes <- path
return
}
for i := 1; i <= spread; i++ {
nextPath := path + fmt.Sprint(i)
if len(path) == startConcurrencyAtLevel {
go nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
} else {
nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
}
}
}

how to batch dealing with files using Goroutine?

Assuming I have a bunch of files to deal with(say 1000 or more), first they should be processed by function A(), function A() will generate a file, then this file will be processed by B().
If we do it one by one, that's too slow, so I'm thinking process 5 files at a time using goroutine(we can not process too much at a time cause the CPU cannot bear).
I'm a newbie in golang, I'm not sure if my thought is correct, I think the function A() is a producer and the function B() is a consumer, function B() will deal with the file that produced by function A(), and I wrote some code below, forgive me, I really don't know how to write the code, can anyone give me a help? Thank you in advance!
package main
import "fmt"
var Box = make(chan string, 1024)
func A(file string) {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
Box <- fileGenByA
}
func B(file string) {
fmt.Println(file, "is processing in func B()...")
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
// it seems I can't do this, because fileList may have 1000 or more file
for _, v := range fileList {
go A(v)
}
// can I do this?
for file := range Box {
go B(file)
}
}
Update:
sorry, maybe I haven’t made myself clear, actually the file generated by function A() is stored in the hard disk(generated by a command line tool, I just simple execute it using exec.Command()), not in a variable(the memory), so it doesn't have to be passed to function B() immediately.
I think there are 2 approach:
approach1
approach2
Actually I prefer approach2, as you can see, the first B() doesn't have to process the file1GenByA, it's the same for B() to process any file in the box, because file1GenByA may generated after file2GenByA(maybe the file is larger so it takes more time).

You could spawn 5 goroutines that read from a work channel. That way you have at all times 5 goroutines running and don't need to batch them so that you have to wait until 5 are finished to start the next 5.
func main() {
stack := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
work := make(chan string)
results := make(chan string)
// create worker 5 goroutines
wg := sync.WaitGroup{}
for i := 0; i < 5; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for s := range work {
results <- B(A(s))
}
}()
}
// send the work to the workers
// this happens in a goroutine in order
// to not block the main function, once
// all 5 workers are busy
go func() {
for _, s := range stack {
// could read the file from disk
// here and pass a pointer to the file
work <- s
}
// close the work channel after
// all the work has been send
close(work)
// wait for the workers to finish
// then close the results channel
wg.Wait()
close(results)
}()
// collect the results
// the iteration stops if the results
// channel is closed and the last value
// has been received
for result := range results {
// could write the file to disk
fmt.Println(result)
}
}
https://play.golang.com/p/K-KVX4LEEoK

you're halfway there. There's a few things you need to fix:
your program deadlocks because nothing closes Box, so the main function can never get done rangeing over it.
You aren't waiting for your goroutines to finish, and there than 5 goroutines. (The solutions to these are too intertwined to describe them separately)
1. Deadlock
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
When you range over a channel, you read each value from the channel until it is both closed and empty. Since you never close the channel, the range over that channel can never complete, and the program can never finish.
This is a fairly easy problem to solve in your case: we just need to close the channel when we know there will be no more writes to the channel.
for _, v := range fileList {
go A(v)
}
close(Box)
Keep in mind that closeing a channel doesn't stop it from being read, only written. Now consumers can distinguish between an empty channel that may receive more data in the future, and an empty channel that will never receive more data.
Once you add the close(Box), the program doesn't deadlock anymore, but it still doesn't work.
2. Too Many Goroutines and not waiting for them to complete
To run a certain maximum number of concurrent executions, instead of creating a goroutine for each input, create the goroutines in a "worker pool":
Create a channel to pass the workers their work
Create a channel for the goroutines to return their results, if any
Start the number of goroutines you want
Start at least one additional goroutine to either dispatch work or collect the result, so you don't have to try doing both from the main goroutine
use a sync.WaitGroup to wait for all data to be processed
close the channels to signal to the workers and the results collector that their channels are done being filled.
Before we get into the implementation, let's talk aobut how A and B interact.
first they should be processed by function A(), function A() will generate a file, then this file will be processed by B().
A() and B() must, then, execute serially. They can still pass their data through a channel, but since their execution must be serial, it does nothing for you. Simpler is to run them sequentially in the workers. For that, we'll need to change A() to either call B, or to return the path for B and the worker can call. I choose the latter.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1"
return fileGenByA
}
Before we write our worker function, we also must consider the result of B. Currently, B returns nothing. In the real world, unless B() cannot fail, you would at least want to either return the error, or at least panic. I'll skip over collecting results for now.
Now we can write our worker function.
func worker(wg *sync.WaitGroup, incoming <-chan string) {
defer wg.Done()
for file := range incoming {
B(A(file))
}
}
Now all we have to do is start 5 such workers, write the incoming files to the channel, close it, and wg.Wait() for the workers to complete.
incoming_work := make(chan string)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work)
}
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
Full example at https://go.dev/play/p/A1H4ArD2LD8
Returning Results.
It's all well and good to be able to kick off goroutines and wait for them to complete. But what if you need results back from your goroutines? In all but the simplest of cases, you would at least want to know if files failed to process so you could investigate the errors.
We have only 5 workers, but we have many files, so we have many results. Each worker will have to return several results. So, another channel. It's usually worth defining a struct for your return:
type result struct {
file string
err error
}
This tells us not just whether there was an error but also clearly defines which file from which the error resulted.
How will we test an error case in our current code? In your example, B always gets the same value from A. If we add A's incoming file name to the path it passes to B, we can mock an error based on a substring. My mocked error will be that file3 fails.
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
Our workers will be sending results, but we need to collect them somewhere. main() is busy dispatching work to the workers, blocking on its write to incoming_work when the workers are all busy. So the simplest place to collect the results is another goroutine. Our results collector goroutine has to read from a results channel, print out errors for debugging, and the return the total number of failures so our program can return a final exit status indicating overall success or failure.
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
Now we have another channel to close, and it's important we close it after all workers are done. So we close(results) after we wg.Wait() for the workers.
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
Putting all that together, we end up with this code:
package main
import (
"fmt"
"os"
"strings"
"sync"
)
func A(file string) string {
fmt.Println(file, "is processing in func A()...")
fileGenByA := "/path/to/fileGenByA1/" + file
return fileGenByA
}
func B(file string) (r result) {
r.file = file
fmt.Println(file, "is processing in func B()...")
if strings.Contains(file, "file3") {
r.err = fmt.Errorf("Test error")
}
return
}
func worker(wg *sync.WaitGroup, incoming <-chan string, results chan<- result) {
defer wg.Done()
for file := range incoming {
results <- B(A(file))
}
}
type result struct {
file string
err error
}
func main() {
// assuming that this is the file list read from a directory
fileList := []string{
"/path/to/file1",
"/path/to/file2",
"/path/to/file3",
}
incoming_work := make(chan string)
results := make(chan result)
var wg sync.WaitGroup
for i := 0; i < 5; i++ {
wg.Add(1)
go worker(&wg, incoming_work, results)
}
failures_chan := make(chan int)
go func() {
var failures int
for result := range results {
if result.err != nil {
failures++
fmt.Printf("File %s failed: %s", result.file, result.err.Error())
}
}
failures_chan <- failures
}()
for _, v := range fileList {
incoming_work <- v
}
close(incoming_work)
wg.Wait()
close(results)
if failures := <-failures_chan; failures > 0 {
os.Exit(1)
}
}
And when we run it, we get:
/path/to/file1 is processing in func A()...
/path/to/fileGenByA1//path/to/file1 is processing in func B()...
/path/to/file2 is processing in func A()...
/path/to/fileGenByA1//path/to/file2 is processing in func B()...
/path/to/file3 is processing in func A()...
/path/to/fileGenByA1//path/to/file3 is processing in func B()...
File /path/to/fileGenByA1//path/to/file3 failed: Test error
Program exited.
A final thought: buffered channels.
There is nothing wrong with buffered channels. Especially if you know the overall size of incoming work and results, buffered channels can obviate the results collector goroutine because you can allocate a buffered channel big enough to hold all results. However, I think it's more straightforward to understand this pattern if the channels are unbuffered. The key takeaway is that you don't need to know the number of incoming or outgoing results, which could indeed be different numbers or based on something that can't be predetermined.

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, I'm getting a all goroutines are asleep, deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE

In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)

The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and propsed a solution - basically consume the data in a goroutine:
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}

exit program when all goroutines finish

I have two go routine functions, one creates data and one adds the data to the database. When I'm done creating data I want to finish adding data but I don't know how many times I'm going to call the function.
for {
select {
case c <- dataChan:
go func() {
if data == false {
return // keeps the program running but never exits
}
createData(c.Data) // sends data to dataChan until data == false
}
go func() {
addData(c.Data) // need to keep the program running until all these goroutines finish
}
}
}
createData() creates and sends data to the dataChan and based on some variables it will eventually set data to false and cause the program to be held open by returning over and over again and stops creating more data, but I feel like there should be a cleaner way that allows me to exit when all the addData() goroutines finish. I have seen people use a channel for this but I don't know exactly how many times I'm going to call the function.
UPDATE: WORKING CODE
var wg sync.WaitGroup
for {
select {
case c <- dataChan:
wg.Add(1)
go func() {
if data == false {
wg.Wait()
os.Exit(0)
}
createData(c.Data)
}
go func() {
addData(c.Data)
defer wg.Done()
}
}
}

sync.WaitGroup is what you want here. You wg.Add(1) before starting each goroutine (or wg.Add(n) if you know the count up front), wg.Done() in each routine when finished, and wg.Wait() in main to wait untill all finish. The linked docs have an example, and as the docs note, copying it won't do what you want; you may want to make the variable a pointer (wg := new(sync.WaitGroup)).

Running a Go routine indefinitely

I have a function that calls another function that returns a chan array. I currently have a for loop looping over the range in the array which runs the program indefinitely outputting the channel updates as the come through.
func getRoutes() {
for r := range rtu {
if r.Type == 24 {
fmt.Printf("Route added: %s via %s\n",r.Dst.String(),r.Gw.String())
} else if r.Type == 25 {
fmt.Printf("Route deleted: %s via %s\n",r.Dst.String(),r.Gw.String())
}
}
}
When I call getRoutes() from main() everything works as planned though, it's blocking the application. I've tried calling go getRoutes() from main() though it appears as if the function isn't being called at all.
How can I run this function in the background in a non-blocking way using go routines?

When the primary goroutine at main exits, all goroutines you might have spawned will be orphaned and eventually die.
You could keep the main goroutine running forever with an infinite loop with for {}, but you might want to keep an exit channel instead:
exit := make(chan string)
// Spawn all you worker goroutines, and send a message to exit when you're done.
for {
select {
case <-exit:
os.Exit(0)
}
}
Update: Cerise Limón pointed out that goroutines are just killed immediately when main exits. I think this is supposed to be the officially specified behaviour.
Another update: The for-select works better when you have multiple ways to exit / multiple exit channels. For a single exit channel you could just do
<-exit
at the end instead of a for and select loop.

Although other people have answered this, I think their solutions can be simplified in some cases.
Take this code snippet:
func main() {
go doSomething()
fmt.Println("Done.")
}
func doSomething() {
for i := 0; i < 10; i++ {
fmt.Println("Doing something...")
time.Sleep(time.Second)
}
}
When main() begins executing, it spawns a thread to concurrently execute doSomething(). It then immediately executes fmt.Println("Done."), and the main() finishes.
When main() finishes, all other goroutines will be orphaned, and die.
To avoid this, we can put a blocking operation at the end of main() that waits for input from a goroutine. It's easiest to do this with a channel:
var exit = make(chan bool)
func main() {
go doSomething()
<-exit // This blocks until the exit channel receives some input
fmt.Println("Done.")
}
func doSomething() {
for i := 0; i < 10; i++ {
fmt.Println("Doing something...")
time.Sleep(time.Second)
}
exit<-true // Notify main() that this goroutine has finished
}

Your main() returns before getRoutes()-goroutine finishes. When main() returns, the program exits, thereby killing all the running goroutines. (It's also totally possible that main() returns even before the goroutine gets a chance to get scheduled by go runtime.)
If you want main() (or any other function) to wait for a group of goroutines to finish, you'd have to make the function explicitly wait somehow. There're multiple ways to do that. sync.WaitGroup.Wait() can be used to wait for a group of the goroutines to finish. You could also use channels to communicate when goroutines are done.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

What's the best way to resume goroutines work - go

Related

Recursive calls from function started as goroutine & Idiomatic way to continue caller when all worker goroutines finished

how to batch dealing with files using Goroutine?

A case of `all goroutines are asleep - deadlock!` I can't figure out why

exit program when all goroutines finish

Running a Go routine indefinitely

Categories

Resources