Hi there, I am having some fun with Go and am curious about something I am trying to achieve. I have a package here that just gets a feed from Reddit, nothing special. When I receive the parent JSON file I would then like to retrieve child data. In the code below I launch a series of goroutines, which I then block on, waiting for them to finish using the sync package. What I would like is that once the first series of goroutines finishes, a second series of goroutines starts, using the previous results. There are a few ways I was thinking of, such as a for loop or a switch statement, but what is the best and most efficient way to do this?
func (m redditMatcher) retrieve(dataPoint *collect.DataPoint) (*redditCommentsDocument, error) {
    if dataPoint.URI == "" {
        return nil, errors.New("No datapoint uri provided")
    }

    // Get options data -> returns empty struct
    // if no options are present
    options := m.options(dataPoint.Options)
    if len(options.subreddit) <= 0 {
        return nil, fmt.Errorf("Matcher fail: Reddit - Subreddit option mandatory\n")
    }

    // Create a buffered channel to receive match results to display.
    results := make(chan *redditCommentsDocument, len(options.subreddit))

    // Generate requests for each subreddit using the
    // goroutine concurrency model.
    for _, s := range options.subreddit {
        // Set the number of goroutines we need to wait for while
        // they process the individual subreddit.
        waitGroup.Add(1)
        go retrieveComment(s.(string), dataPoint.URI, results)
    }

    // Launch a goroutine to monitor when all the work is done.
    waitGroup.Wait()

    // HERE I WOULD LIKE TO CALL ANOTHER SERIES OF GOROUTINES
    for commentFeed := range results {
        // HERE I WOULD LIKE TO CALL GOROUTINES USING THE RESULTS
        // PROVIDED FROM THE PREVIOUS FUNCTIONS
        waitGroup.Add(1)
        log.Printf("%s\n\n", commentFeed.Kind)
    }
    waitGroup.Wait()

    close(results)
    return nil, nil
}
If you want to wait for all of the first series to complete, you can just pass a pointer to your WaitGroup into the first-series functions (each of which calls Done() on it), wait after launching them all, and then start the second series. Here's a runnable, annotated example that does that:
package main

import (
    "fmt"
    "sync"
    "time"
)

func first(wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Starting a first")
    // do some stuff... here's a sleep to make some time pass
    time.Sleep(250 * time.Millisecond)
    fmt.Println("Done with a first")
}

func second(wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Starting a second")
    // do some followup stuff
    time.Sleep(50 * time.Millisecond)
    fmt.Println("Done with a second")
}

func main() {
    wg := new(sync.WaitGroup) // you'll need a pointer to avoid a copy when passing as parameter to goroutine function

    // let's start 5 firsts and then wait for them to finish
    wg.Add(5)
    go first(wg)
    go first(wg)
    go first(wg)
    go first(wg)
    go first(wg)
    wg.Wait()

    // now that we're done with all the firsts, let's do the seconds
    // how about two of these
    wg.Add(2)
    go second(wg)
    go second(wg)
    wg.Wait()

    fmt.Println("All done")
}
It outputs:
Starting a first
Starting a first
Starting a first
Starting a first
Starting a first
Done with a first
Done with a first
Done with a first
Done with a first
Done with a first
Starting a second
Starting a second
Done with a second
Done with a second
All done
But if you want a "second" to start as soon as a "first" has finished, just have the seconds block on a receive from the channel while the firsts are running:
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func first(res chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Starting a first")
    // do some stuff... here's a sleep to make some time pass
    time.Sleep(250 * time.Millisecond)
    fmt.Println("Done with a first")
    res <- rand.Int() // this will block until a second is ready
}

func second(res chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("Wait for a value from first")
    val := <-res // this will block until a first is ready
    fmt.Printf("Starting a second with val %d\n", val)
    // do some followup stuff
    time.Sleep(50 * time.Millisecond)
    fmt.Println("Done with a second")
}

func main() {
    wg := new(sync.WaitGroup) // you'll need a pointer to avoid a copy when passing as parameter to goroutine function
    ch := make(chan int)

    // lets run first twice, and second once for each first result, for a total of four workers:
    wg.Add(4)
    go first(ch, wg)
    go first(ch, wg)
    // don't wait before starting the seconds
    go second(ch, wg)
    go second(ch, wg)
    wg.Wait()

    fmt.Println("All done")
}
Which outputs:
Wait for a value from first
Starting a first
Starting a first
Wait for a value from first
Done with a first
Starting a second with val 5577006791947779410
Done with a first
Starting a second with val 8674665223082153551
Done with a second
Done with a second
All done
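If, like in your retrieve method, the second stage needs to range over everything the first stage produced, a third variant is to close the results channel from a small goroutine once the first-stage WaitGroup is done. This is just a sketch with placeholder strings rather than your redditCommentsDocument type, and the two prints stand in for retrieveComment and the child request:

package main

import (
    "fmt"
    "sync"
)

func main() {
    subreddits := []string{"golang", "programming", "news"}

    results := make(chan string, len(subreddits))

    // First stage: one goroutine per subreddit.
    var firstWG sync.WaitGroup
    for _, s := range subreddits {
        firstWG.Add(1)
        go func(name string) {
            defer firstWG.Done()
            results <- name + ": parent feed" // stand-in for retrieveComment
        }(s)
    }

    // Close results once every first-stage goroutine is done,
    // so the range below can terminate.
    go func() {
        firstWG.Wait()
        close(results)
    }()

    // Second stage: one goroutine per first-stage result.
    var secondWG sync.WaitGroup
    for commentFeed := range results {
        secondWG.Add(1)
        go func(feed string) {
            defer secondWG.Done()
            fmt.Println("fetching children for:", feed) // stand-in for the child request
        }(commentFeed)
    }
    secondWG.Wait()
}

Using two separate WaitGroups keeps the two stages independent: the first one only gates the close of results, the second one only gates the end of main.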
Related
In this example, we have a worker. The idea is to simulate a clean shutdown of all goroutines based on a condition.
In this case, goroutines get spun up based on the worker count. Each goroutine reads from the channel, does some work, and sends the output to outputChannel.
The main goroutine reads this output and prints it. To simulate a stop condition, the doneChannel is closed. The expected outcome is that the select inside each goroutine will pick this up and execute the return, which in turn will run the deferred println. The actual behavior is that it never gets called and main exits.
I'm not sure what the reason behind this is.
package main

import (
    "log"
    "time"
)

const jobs = 100
const workers = 1

var timeout = time.After(5 * time.Second)

func main() {
    doneChannel := make(chan interface{})
    outputChannel := make(chan int)

    numberStream := generator()

    for i := 1; i <= workers; i++ {
        go worker(doneChannel, numberStream, outputChannel)
    }

    // listen for output
loop:
    for {
        select {
        case i := <-outputChannel:
            log.Println(i)
        case <-timeout:
            // before you timeout cleanup go routines
            break loop
        }
    }

    close(doneChannel)
    time.Sleep(5 * time.Second)
    log.Println("main exited")
}

func generator() <-chan int {
    defer log.Println("generator completed !")
    c := make(chan int)
    go func() {
        for i := 1; i <= jobs; i++ {
            c <- i
        }
        defer close(c)
    }()
    return c
}

func worker(done <-chan interface{}, c <-chan int, output chan<- int) {
    // this will be a goroutine
    // Do some work and send results to the output channel.
    // In case the done channel is closed, kill the goroutine.
    defer log.Println("go routines exited")
    for {
        select {
        case <-done:
            log.Println("here")
            return
        case i := <-c:
            time.Sleep(1 * time.Second) // worker delay
            output <- i * 100
        }
    }
}
When your main loop exits on the timeout, your program simply continues:
it closes the done channel,
prints the message,
and exits.
Nothing waits for the goroutines to process the signal from that channel.
If you add a small sleep you will see some of their messages.
In real scenarios we use a WaitGroup to be sure all goroutines finish properly.
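Here is a minimal sketch of that pattern. The worker, done and output names mirror the code above, but the jobs producer, the nested select around the send, and the WaitGroup are my additions, so take it as one possible shape rather than the way to do it:

package main

import (
    "log"
    "sync"
    "time"
)

func worker(done <-chan struct{}, jobs <-chan int, output chan<- int, wg *sync.WaitGroup) {
    defer wg.Done()
    defer log.Println("worker exited")
    for {
        select {
        case <-done:
            return
        case i := <-jobs:
            time.Sleep(100 * time.Millisecond) // simulated work
            select {
            case output <- i * 100:
            case <-done: // guard the send too, so a worker blocked on an
                return // unread output channel can still shut down
            }
        }
    }
}

func main() {
    done := make(chan struct{})
    jobs := make(chan int)
    output := make(chan int)

    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go worker(done, jobs, output, &wg)
    }

    // a simple producer that also stops when done is closed
    go func() {
        for i := 1; ; i++ {
            select {
            case jobs <- i:
            case <-done:
                return
            }
        }
    }()

    timeout := time.After(2 * time.Second)
loop:
    for {
        select {
        case v := <-output:
            log.Println(v)
        case <-timeout:
            break loop
        }
    }

    close(done)
    wg.Wait() // blocks until every worker has seen done and returned
    log.Println("main exited")
}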
I got this code from someone on GitHub and I am trying to play around with it to understand concurrency.
package main

import (
    "bufio"
    "fmt"
    "os"
    "sync"
    "time"
)

var wg sync.WaitGroup

func sad(url string) string {
    fmt.Printf("gonna sleep a bit\n")
    time.Sleep(2 * time.Second)
    return url + " added stuff"
}

func main() {
    sc := bufio.NewScanner(os.Stdin)

    urls := make(chan string)
    results := make(chan string)

    for i := 0; i < 20; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for url := range urls {
                n := sad(url)
                results <- n
            }
        }()
    }

    for sc.Scan() {
        url := sc.Text()
        urls <- url
    }

    for result := range results {
        fmt.Printf("%s arrived\n", result)
    }

    wg.Wait()
    close(urls)
    close(results)
}
I have a few questions:
Why does this code give me a deadlock?
How can that for loop exist before the operation of taking input from the user? Do the goroutines wait until something is passed into the urls channel and then start doing work? I don't get this because it's not sequential; why would taking the input from the user first, then putting every input into the urls channel, and only then running the goroutines be considered wrong?
Inside the for loop I have another loop which iterates over the urls channel. Does each goroutine deal with exactly one line of input, or does one goroutine handle multiple lines at once? How does any of this work?
Am I gathering the output correctly here?
Mostly you're doing things correctly, but you have things a little out of order. The for sc.Scan() loop will continue until the Scanner is done, so the for result := range results loop will never run, and thus no goroutine ('main' in this case) will be able to receive from results. When running your example, I started the for result := range results loop before for sc.Scan(), and in its own goroutine; otherwise for sc.Scan() would never be reached.
go func() {
    for result := range results {
        fmt.Printf("%s arrived\n", result)
    }
}()

for sc.Scan() {
    url := sc.Text()
    urls <- url
}
Also, because you run wg.Wait() before close(urls), the main goroutine is left blocked waiting for the 20 sad() goroutines to finish. But they can't finish until close(urls) is called. So just close that channel before waiting for the WaitGroup.
close(urls)
wg.Wait()
close(results)
The for-loop creates 20 goroutines, all waiting for input from the urls channel. When someone writes into this channel, one of the goroutines will pick it up and work on it. This is a typical worker-pool implementation.
Then the scanner reads the input line by line and sends it to the urls channel, where one of the goroutines will pick it up and write the response to the results channel. At this point there are no other goroutines reading from the results channel, so that worker will block.
As the scanner keeps reading URLs, the other goroutines will pick them up and block as well. So if the scanner reads more than 20 URLs, it will deadlock, because all goroutines will be waiting for a reader.
If there are fewer than 20 URLs, the scanner for-loop will end and the results will be read. However, that will eventually deadlock as well, because the for-loop over results only terminates when the channel is closed, and there is no one there to close it.
To fix this, first close the urls channel right after you finish reading. That will release all the for-loops in the goroutines. Then put the for-loop reading from the results channel into a goroutine, so you can call wg.Wait while results are being processed. After wg.Wait you can close the results channel.
This does not guarantee that all items in the results channel will be read, though; the program may terminate before all messages are processed. So use a third channel, which you close at the end of the goroutine that reads from the results channel. That is:
done := make(chan struct{})
go func() {
    defer close(done)
    for result := range results {
        fmt.Printf("%s arrived\n", result)
    }
}()

wg.Wait()
close(results)
<-done
I am not entirely happy with the previous answers, so here is a solution based on the documented behavior in the Go tour, the Go documentation, and the language specification.
package main

import (
    "bufio"
    "fmt"
    "strings"
    "sync"
    "time"
)

var wg sync.WaitGroup

func sad(url string) string {
    fmt.Printf("gonna sleep a bit\n")
    time.Sleep(2 * time.Millisecond)
    return url + " added stuff"
}

func main() {
    // sc := bufio.NewScanner(os.Stdin)
    sc := bufio.NewScanner(strings.NewReader(strings.Repeat("blah blah\n", 15)))

    urls := make(chan string)
    results := make(chan string)

    for i := 0; i < 20; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for url := range urls {
                n := sad(url)
                results <- n
            }
        }()
    }

    // results is written to by many goroutines;
    // we must wait for them to finish before closing results,
    // but we don't want to block here, so put that into a goroutine.
    go func() {
        wg.Wait()
        close(results)
    }()

    go func() {
        for sc.Scan() {
            url := sc.Text()
            urls <- url
        }
        close(urls) // done sending on this channel, close it right away.
    }()

    for result := range results {
        fmt.Printf("%s arrived\n", result)
    } // the program will finish when it gets out of this loop.
    // It will get out of this loop because you have made sure the results channel is closed.
}
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
    for {
        select {
        case data := <-in:
            r <- countColumns(data) // some calculation function
        case <-quit:
            return // stop goroutine
        }
    }
}

func main() {
    fmt.Println("Welcome to the csv Calculator")
    file_path := os.Args[1]
    fd, _ := os.Open(file_path)
    reader := csv.NewReader(bufio.NewReader(fd))

    var totalColumnsCount int64 = 0
    var totallettersCount int64 = 0
    linesCount := 0
    numWorkers := 10000

    rc := make(chan Result, numWorkers)
    in := make(chan []string, numWorkers)
    quit := make(chan int)

    t1 := time.Now()
    for i := 0; i < numWorkers; i++ {
        go GoCountColumns(in, rc, quit)
    }

    // start workers
    go func() {
        for {
            record, err := reader.Read()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            if linesCount%1000000 == 0 {
                fmt.Println("Adding to the channel")
            }
            in <- record
            //data := countColumns(record)
            linesCount++
            //totalColumnsCount = totalColumnsCount + data.ColumnCount
            //totallettersCount = totallettersCount + data.LettersCount
        }
        close(in)
    }()

    for i := 0; i < numWorkers; i++ {
        quit <- 1 // quit goroutines from main
    }
    close(rc)

    for i := 0; i < linesCount; i++ {
        data := <-rc
        totalColumnsCount = totalColumnsCount + data.ColumnCount
        totallettersCount = totallettersCount + data.LettersCount
    }

    fmt.Printf("I counted %d lines\n", linesCount)
    fmt.Printf("I counted %d columns\n", totalColumnsCount)
    fmt.Printf("I counted %d letters\n", totallettersCount)

    elapsed := time.Now().Sub(t1)
    fmt.Printf("It took %f seconds\n", elapsed.Seconds())
}
My Hello World is a program that reads a csv file and passes its records to a channel. The goroutines should then consume from this channel.
My problem is that I have no idea how to detect from the main thread that all the data has been processed, so that I can exit my program.
On top of the other answers:
Take (great) care that closing a channel should happen at the write call site, not the read call site. In GoCountColumns the r channel is being written to, so the responsibility for closing it is on the GoCountColumns function. The technical reason is that it is the only actor that knows for sure the channel will not be written to anymore and is therefore safe to close.
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
    defer close(r) // this line.
    for {
        select {
        case data := <-in:
            r <- countColumns(data) // some calculation function
        case <-quit:
            return // stop goroutine
        }
    }
}
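As a generic illustration of that sender-closes rule, here is a standalone sketch with a single producer, separate from the worker pool above (with several writers sharing one channel, the close has to be coordinated, for example with a WaitGroup as shown in an earlier answer):

package main

import "fmt"

// produce is the only writer of dst, so it owns the close.
func produce(dst chan<- int) {
    defer close(dst)
    for i := 0; i < 5; i++ {
        dst <- i
    }
}

func main() {
    ch := make(chan int)
    go produce(ch)
    for v := range ch {
        fmt.Println(v) // the range ends cleanly once produce has closed ch
    }
}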
As for the naming convention for function parameters, if I may, it is to have the destination as the first parameter, the source as the second, and the other parameters after that. GoCountColumns is preferably written:
func GoCountColumns(dst chan Result, src chan []string, quit chan int) {
    defer close(dst)
    for {
        select {
        case data := <-src:
            dst <- countColumns(data) // some calculation function
        case <-quit:
            return // stop goroutine
        }
    }
}
You are calling quit right after the processing has started. That is illogical. The quit command is a forced-exit sequence; it should be triggered once an exit signal has been detected, to force the current processing to stop in the best state possible. In other words, you should rely on the signal.Notify package to capture exit events and notify your workers to quit. See https://golang.org/pkg/os/signal/#example_Notify
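A minimal sketch of that wiring follows. Closing quit here, instead of sending one token per worker, is a simplification on my part; a closed channel is readable by every worker's select, so it broadcasts the stop to all of them at once:

package main

import (
    "fmt"
    "os"
    "os/signal"
)

func main() {
    quit := make(chan int)

    // Relay Ctrl-C (SIGINT) into the workers' quit channel instead of
    // sending to quit unconditionally right after startup.
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, os.Interrupt)

    go func() {
        <-sig       // block until an interrupt arrives
        close(quit) // broadcast the stop request to every worker
    }()

    // ... start the GoCountColumns workers reading from quit here ...

    <-quit
    fmt.Println("shutting down")
}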
To write better parallel code, first list the routines you need in order to manage the program's lifetime, and identify which of them you need to block on to ensure the program has finished before exiting.
In your code there are two: read and map. To ensure complete processing, the main function must make sure it captures a signal when map exits, before exiting itself. Notice that the read routine does not matter here.
Then you will also need the code required to capture an exit event from user input.
Overall, it appears we need to block on two events to manage the lifetime. Schematically,
func main() {
    go read()
    go map(mapDone)
    go signal()
    select {
    case <-mapDone:
    case <-sig:
    }
}
This simple code is fine for process-or-die: when the user event is caught, the program exits immediately, without giving the other routines a chance to do anything that might be required upon stopping.
To improve that behavior, you need, first, a way to signal to the other routines that the program wants to leave, and second, a way to wait for those routines to finish their stop sequence before leaving.
To signal the exit event, or cancellation, you can use a context.Context, pass it around to the workers, and make them listen to it.
Again, schematically,
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    go map(ctx, mapDone)
    go signal()
    select {
    case <-mapDone:
    case <-sig:
        cancel()
    }
}
(more on read and map later)
To wait for completion, many things are possible, as long as they are thread safe. Usually a sync.WaitGroup is used. Or, in cases like yours where there is only one routine to wait for, we can reuse the current mapDone channel.
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    go map(ctx, mapDone)
    go signal()
    select {
    case <-mapDone:
    case <-sig:
        cancel()
        <-mapDone
    }
}
That is simple and straightforward, but it is not totally correct: the last receive from mapDone might block forever and make the program unstoppable. So you might implement a second signal handler, or a timeout.
Schematically, the timeout solution is
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    go map(ctx, mapDone)
    go signal()
    select {
    case <-mapDone:
    case <-sig:
        cancel()
        select {
        case <-mapDone:
        case <-time.After(time.Second):
        }
    }
}
You might also combine the signal handling and the timeout in that last select.
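Schematically, that combined select could look like this (same placeholder names read, map, signal, sig and mapDone as in the snippets above; this is a sketch of the shape only, not compilable code):

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    go map(ctx, mapDone)
    go signal()
    select {
    case <-mapDone:
    case <-sig:
        cancel()
        select {
        case <-mapDone:
        case <-sig: // a second interrupt: the user insists, stop waiting
        case <-time.After(time.Second):
        }
    }
}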
Finally, there are a few things to say about read and map listening to the context.
Starting with map: the implementation needs to read from the context.Done() channel regularly to detect cancellation.
That is the easy part; it only requires updating the select statement.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string) {
    defer close(dst)
    for {
        select {
        case <-ctx.Done():
            <-time.After(time.Minute) // do something more useful.
            return // quit. Notice the defer will be called.
        case data := <-src:
            dst <- countColumns(data) // some calculation function
        }
    }
}
Now the read part is a bit trickier: since it is IO, it does not provide a selectable programming interface, and listening for the context's cancellation might seem contradictory. It is: because IO calls are blocking, it is impossible to listen to the context while reading, and while receiving from the context channel, impossible to read the IO. In your case, the solution is to realize that your read loop is not relevant to your program's lifetime (recall that we only listen on mapDone?), so we can simply ignore the context there.
In other cases, you might for example want to restart at the last byte read (so at every read we increment a byte counter n, and we want to save that value upon stop). Then a new routine has to be started, and thus multiple routines have to be waited for. In such cases a sync.WaitGroup is more appropriate.
Schematically,
func main() {
    var wg sync.WaitGroup
    processDone := make(chan struct{})
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    wg.Add(1)
    go saveN(ctx, &wg)
    wg.Add(1)
    go map(ctx, &wg)
    go signal()
    go func() {
        wg.Wait()
        close(processDone)
    }()
    select {
    case <-processDone:
    case <-sig:
        cancel()
        select {
        case <-processDone:
        case <-time.After(time.Second):
        }
    }
}
In this last code, the WaitGroup is passed around. The routines are responsible for calling wg.Done(); when all routines are done, the processDone channel is closed, which signals the select.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string, wg *sync.WaitGroup) {
    defer wg.Done()
    defer close(dst)
    for {
        select {
        case <-ctx.Done():
            <-time.After(time.Minute) // do something more useful.
            return // quit. Notice the defer will be called.
        case data := <-src:
            dst <- countColumns(data) // some calculation function
        }
    }
}
It is undecided which pattern is preferred, but you might also see the WaitGroup being managed at the call sites only.
func main() {
    var wg sync.WaitGroup
    processDone := make(chan struct{})
    ctx, cancel := context.WithCancel(context.Background())
    go read(ctx)
    wg.Add(1)
    go func() {
        defer wg.Done()
        saveN(ctx)
    }()
    wg.Add(1)
    go func() {
        defer wg.Done()
        map(ctx)
    }()
    go signal()
    go func() {
        wg.Wait()
        close(processDone)
    }()
    select {
    case <-processDone:
    case <-sig:
        cancel()
        select {
        case <-processDone:
        case <-time.After(time.Second):
        }
    }
}
Beyond all of that and the OP's questions, you must always evaluate upfront whether parallel processing is pertinent for a given task. There is no unique recipe: practice and measure your code's performance. See pprof.
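For the measuring part, one low-effort way to get profiles out of a running program is the net/http/pprof handler. This is just a sketch (the port and the HTTP approach are arbitrary choices on my part; runtime/pprof or go test -bench with -cpuprofile work as well):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // Serve the profiling endpoints on a side port while the real work runs.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the CSV processing pipeline here ...

    select {} // placeholder so the process stays up long enough to profile
}

While it runs, go tool pprof http://localhost:6060/debug/pprof/profile collects a CPU profile you can inspect interactively.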
There is way too much going on in this code. You should restructure your code into short functions that serve specific purposes to make it possible for someone to help you out easily (and help yourself as well).
You should read the following Go article, which goes into concurrency patterns:
https://blog.golang.org/pipelines
There are multiple ways to make one go-routine wait on some other work to finish. The most common ways are with wait groups (example I have provided) or channels.
func processSomething(...) {
    ...
}

func main() {
    workers := &sync.WaitGroup{}

    for i := 0; i < numWorkers; i++ {
        workers.Add(1) // you want to call this from the calling goroutine and before spawning the worker goroutine
        go func() {
            defer workers.Done() // you want to call this from the worker goroutine when the work is done (NOTE the defer, which ensures it is called no matter what)
            processSomething(...) // your async processing
        }()
    }

    // this will block until all workers have finished their work
    workers.Wait()
}
You can use a channel to block main until completion of a goroutine.
package main

import (
    "log"
    "time"
)

func main() {
    c := make(chan struct{})

    go func() {
        time.Sleep(3 * time.Second)
        log.Println("bye")
        close(c)
    }()

    // This blocks until the channel is closed by the goroutine
    <-c
}
No need to write anything into the channel: a receive blocks until either a value is sent or, as we use here, the channel is closed.
I am trying to understand context in Go. I copied an example from https://golang.org/pkg/context/#example_WithCancel and changed it a bit:
Playground: https://play.golang.org/p/Aczc2CqcVZR
package main

import (
    "context"
    "fmt"
    "time"
)

func main() {
    // gen generates integers in a separate goroutine and
    // sends them to the returned channel.
    // The callers of gen need to cancel the context once
    // they are done consuming generated integers not to leak
    // the internal goroutine started by gen.
    gen := func(ctx context.Context) <-chan int {
        dst := make(chan int)
        n := 1
        go func() {
            for {
                select {
                case <-ctx.Done():
                    fmt.Println("DONE")
                    return // returning not to leak the goroutine
                case dst <- n:
                    n++
                }
            }
            fmt.Println("END")
        }()
        return dst
    }

    ctx, cancel := context.WithCancel(context.Background())
    defer time.Sleep(1 * time.Second)
    defer fmt.Println("Before cancel")
    defer cancel() // cancel when we are finished consuming integers
    defer fmt.Println("After cancel")

    channel := gen(ctx)
    for n := range channel {
        fmt.Println(n)
        if n == 5 {
            break
        }
    }
    fmt.Println(<-channel)
}
When commenting out the
defer time.Sleep(1 * time.Second)
the "DONE" never gets printed. Playground: https://play.golang.org/p/K0OcyZaj_xK
I would expect the goroutine which was started in the anonymous function to still be active. Once cancel() is called, due to being deferred, the select should no longer block, as
case <-ctx.Done():
should be ready. However, it seems to just end, unless I wait for one second and give it time. This behavior seems very wrong.
This behavior seems very wrong.
It's not. That's how program execution is specified. After main and its deferred functions return, the program exits.
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
https://golang.org/ref/spec#Program_execution
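If you want the example to print "DONE" without the arbitrary one-second sleep, main has to wait for the generator goroutine to observe the cancellation before returning. Here is a sketch of one way to do that; the extra stopped channel is my addition, not part of the original example:

package main

import (
    "context"
    "fmt"
)

func main() {
    gen := func(ctx context.Context) (<-chan int, <-chan struct{}) {
        dst := make(chan int)
        stopped := make(chan struct{})
        n := 1
        go func() {
            defer close(stopped) // signal that this goroutine has really exited
            for {
                select {
                case <-ctx.Done():
                    fmt.Println("DONE")
                    return
                case dst <- n:
                    n++
                }
            }
        }()
        return dst, stopped
    }

    ctx, cancel := context.WithCancel(context.Background())
    channel, stopped := gen(ctx)
    for n := range channel {
        fmt.Println(n)
        if n == 5 {
            break
        }
    }
    cancel()
    <-stopped // block until the goroutine has printed DONE and returned
}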
context.WithDeadline while passing a context to a goroutine?
I have put together some sample code that starts a new goroutine for every item in my slice.
At the moment, this waits for the done channel to receive a value len(slice) times.
However, I would also like to implement a timeout in the goroutines to prevent leaking.
It seems that context.WithDeadline (or maybe WithTimeout?) is the appropriate function to use.
For example, let's say I want to pass in a 23-second deadline for all goroutines that are started from main().
However, it's not clear to me how I should do this.
I have read the godoc along with Go Concurrency Patterns: Context (on the Go blog), but as a new gopher
I am none the wiser. Many of the examples that I have found use http.Handler (or similar) as examples, and so they are a source of some confusion for me.
What is an appropriate way to pass in a context with a deadline / timeout here?
package main

import (
    "fmt"
    "time"
)

func sleepNow(i int, done chan bool) {
    time.Sleep(time.Duration(i) * time.Second)
    fmt.Println(i, "has just woken up from sleep and the time is", time.Now())
    done <- true
}

func main() {
    done := make(chan bool)
    numbersSlice := []int{10, 20, 30, 12}

    for _, v := range numbersSlice {
        go sleepNow(v, done)
    }

    for x := 0; x < len(numbersSlice); x++ {
        <-done
    }

    fmt.Println("Looks like we are all done here!")
}
All you need to do is get the context into the function where you want to use it. In many cases you can use a simple closure, or in this case, add it to the function arguments.
Once you have the context in place, you can select on the Context.Done() channel to determine when it has expired.
https://play.golang.org/p/q-n_2mIW2X
func sleepNow(i int, ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()
    select {
    case <-time.After(time.Duration(i) * time.Second):
        fmt.Println(i, "has just woken up from sleep and the time is", time.Now())
    case <-ctx.Done():
        fmt.Println(i, "has just been canceled")
    }
}

func main() {
    var wg sync.WaitGroup
    numbersSlice := []int{1, 5, 4, 2}
    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)

    for _, v := range numbersSlice {
        wg.Add(1)
        go sleepNow(v, ctx, &wg)
    }

    wg.Wait()
    cancel()

    fmt.Println("Looks like we are all done here!")
}
You should also use a sync.WaitGroup rather than relying on counting tokens over a channel, and use defer to call Done.