func GoCountColumns(in chan []string, r chan Result, quit chan int) {
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
func main() {
fmt.Println("Welcome to the csv Calculator")
file_path := os.Args[1]
fd, _ := os.Open(file_path)
reader := csv.NewReader(bufio.NewReader(fd))
var totalColumnsCount int64 = 0
var totallettersCount int64 = 0
linesCount := 0
numWorkers := 10000
rc := make(chan Result, numWorkers)
in := make(chan []string, numWorkers)
quit := make(chan int)
t1 := time.Now()
for i := 0; i < numWorkers; i++ {
go GoCountColumns(in, rc, quit)
}
//start worksers
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
if linesCount%1000000 == 0 {
fmt.Println("Adding to the channel")
}
in <- record
//data := countColumns(record)
linesCount++
//totalColumnsCount = totalColumnsCount + data.ColumnCount
//totallettersCount = totallettersCount + data.LettersCount
}
close(in)
}()
for i := 0; i < numWorkers; i++ {
quit <- 1 // quit goroutines from main
}
close(rc)
for i := 0; i < linesCount; i++ {
data := <-rc
totalColumnsCount = totalColumnsCount + data.ColumnCount
totallettersCount = totallettersCount + data.LettersCount
}
fmt.Printf("I counted %d lines\n", linesCount)
fmt.Printf("I counted %d columns\n", totalColumnsCount)
fmt.Printf("I counted %d letters\n", totallettersCount)
elapsed := time.Now().Sub(t1)
fmt.Printf("It took %f seconds\n", elapsed.Seconds())
}
My Hello World is a program that reads a csv file and passes it to a channel. Then the goroutines should consume from this channel.
My Problem is I have no idea how to detect from the main thread that all data was processed and I can exit my program.
on top of other answers.
Take (great) care that closing a channel should happen on the write call site, not the read call site. In GoCountColumns the r channel being written, the responsibility to close the channel are onto GoCountColumns function. Technical reasons are, it is the only actor knowing for sure that the channel will not being written anymore and thus is safe for close.
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
defer close(r) // this line.
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
The function parameters naming convention, if i might say, is to have the destination as first parameter, the source as second, and others parameters along. The GoCountColumns is preferably written:
func GoCountColumns(dst chan Result, src chan []string, quit chan int) {
defer close(dst)
for {
select {
case data := <-src:
dst <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
You are calling quit right after the process started. Its illogical. This quit command is a force exit sequence, it should be called once an exit signal is detected, to force exit the current processing in best state possible, possibly all broken. In other words, you should be relying on the signal.Notify package to capture exit events, and notify your workers to quit. see https://golang.org/pkg/os/signal/#example_Notify
To write better parallel code, list at first the routines you need to manage the program lifetime, identify those you need to block onto to ensure the program has finished before exiting.
In your code, exists read, map. To ensure complete processing, the program main function must ensure that it captures a signal when map exits before exiting itself. Notice that the read function does not matter.
Then, you will also need the code required to capture an exit event from user input.
Overall, it appears we need to block onto two events to manage lifetime. Schematically,
func main(){
go read()
go map(mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
}
}
This simple code is good to process or die. Indeed, when the user event is caught, the program exits immediately, without giving a chance to others routines to do something required upon stop.
To improve those behaviors, you need first a way to signal the program wants to leave to other routines, second, a way to wait for those routines to finish their stop sequence before leaving.
To signal exit event, or cancellation, you can make use of a context.Context, pass it around to the workers, make them listen to it.
Again, schematically,
func main(){
ctx,cancel := context.WithCancel(context.WithBackground())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
}
}
(more onto read and map later)
To wait for completion, many things are possible, for as long as they are thread safe. Usually, a sync.WaitGroup is being used. Or, in cases like yours where there is only one routine to wait for, we can re use the current mapDone channel.
func main(){
ctx,cancel := context.WithCancel(context.WithBackground())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
<-mapDone
}
}
That is simple and straight forward. But it is not totally correct. The last mapDone chan might block forever and make the program unstoppable. So you might implement a second signal handler, or a timeout.
Schematically, the timeout solution is
func main(){
ctx,cancel := context.WithCancel(context.WithBackground())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
select {
case <-mapDone:
case <-time.After(time.Second):
}
}
}
You might also accumulate a signal handling and a timeout in the last select.
Finally, there are few things to tell about read and map context listening.
Starting with map, the implementation requires to read for context.Done channel regularly to detect cancellation.
It is the easy part, it requires to only update the select statement.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string) {
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
Now the read part is bit more tricky as it is an IO it does not provide a selectable programming interface and listening to the context channel cancellation might seem contradictory. It is. As IOs are blocking, impossible to listen the context. And while reading from the context channel, impossible to read the IO. In your case, the solution requires to understand that your read loop is not relevant to your program lifetime (recall we only listen onto mapDone?), and that we can just ignore the context.
In other cases, if for example you wanted to restart at last byte read (so at every read, we increment an n, counting bytes, and we want to save that value upon stop). Then, a new routine is required to be started, and thus, multiple routines are to wait for completion. In such cases a sync.WaitGroup will be more appropriate.
Schematically,
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx,cancel := context.WithCancel(context.WithBackground())
go read(ctx)
wg.Add(1)
go saveN(ctx,&wg)
wg.Add(1)
go map(ctx,&wg)
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
In this last code, the waitgroup is being passed around. Routines are responsible to call for wg.Done(), when all routines are done, the processDone channel is closed, to signal the select.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string, wg *sync.WaitGroup) {
defer wg.Done()
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
It is undecided which patterns is preferred, but you might also see waitgroup being managed at call sites only.
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx,cancel := context.WithCancel(context.WithBackground())
go read(ctx)
wg.Add(1)
go func(){
defer wg.Done()
saveN(ctx)
}()
wg.Add(1)
go func(){
defer wg.Done()
map(ctx)
}()
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
Beyond all of that and OP questions, you must always evaluate upfront the pertinence of parallel processing for a given task. There is no unique recipe, practice and measure your code performances. see pprof.
There is way too much going on in this code. You should restructure your code into short functions that serve specific purposes to make it possible for someone to help you out easily (and help yourself as well).
You should read the following Go article, which goes into concurrency patterns:
https://blog.golang.org/pipelines
There are multiple ways to make one go-routine wait on some other work to finish. The most common ways are with wait groups (example I have provided) or channels.
func processSomething(...) {
...
}
func main() {
workers := &sync.WaitGroup{}
for i := 0; i < numWorkers; i++ {
workers.Add(1) // you want to call this from the calling go-routine and before spawning the worker go-routine
go func() {
defer workers.Done() // you want to call this from the worker go-routine when the work is done (NOTE the defer, which ensures it is called no matter what)
processSomething(....) // your async processing
}()
}
// this will block until all workers have finished their work
workers.Wait()
}
You can use a channel to block main until completion of a goroutine.
package main
import (
"log"
"time"
)
func main() {
c := make(chan struct{})
go func() {
time.Sleep(3 * time.Second)
log.Println("bye")
close(c)
}()
// This blocks until the channel is closed by the routine
<-c
}
No need to write anything into the channel. Reading is blocked until data is read or, which we use here, the channel is closed.
I was just experimenting with Go channels on my Ubuntu 64 bit environment and got confused with the output the following program produced.
I got the output:
0
1
2
3
Exit
The output when I uncommented the two commented lines:
0
1
2
3
4
Exit
Please explain the behavior.
TIA.
package main
import (
"fmt"
//"time"
)
func main() {
ch := make(chan int)
done := make(chan bool)
go func() {
for i := 0; i < 5; i++ {
ch <- i
}
//time.Sleep(1 * time.Second)
done <- false
}()
go func() {
for {
select {
case message := <-ch:
fmt.Println(message)
case <-done:
return
}
}
}()
<-done
fmt.Println("Exit")
}
Your main thread is waiting on done, then exiting. Meanwhile your first go function pipes 5 values into ch, then sends to done.
The value in done then gets read from the main thread and happens to occur before the second go function reads the last value from ch. When it does so, it exits the program.
Note that if your second thread did happen to both read from ch and done, then your program would deadlock since the main thread would never receive on done and all running go threads would be blocked waiting to receive on channels.
You're not waiting for both goroutines, and only sending a single value over done to 2 receivers, which will deadlock if the second receiver happens to be main.
Using a WaitGroup simplifies the code, and allows you to easily wait for as many goroutines as needed. https://play.golang.org/p/MWknv_9AFKp
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
defer close(ch)
for i := 0; i < 5; i++ {
ch <- i
}
}()
wg.Add(1)
go func() {
defer wg.Done()
for message := range ch {
fmt.Println(message)
}
}()
wg.Wait()
fmt.Println("Exit")
You have two go routines running in parallel. One inserts 5 numbers into the channel and then signals the main thread to exit, and another one reads the numbers from the channel.
Note that once the go routine that is responsible for enqueuing the numbers into the channel finishes, it signals the main thread to exit, regardless of whether the go routine that reads the numbers finished or not. So you could end up with a case where the enqueueing go routine finished before the dequeuing finished, and the main thread exited.
By adding the sleep, you make the enqueueing go routine live a bit longer and give a chance to the dequeueing go routine to read and print all the numbers before the enqueueing go routine signals the main thread to exit.
To solve that, you could just run the dequeuing code in the main thread. No need to run it in a go routine in this case.
How do I deal with a situation where undetected deadlock occurs when reading results of execution of uncertain number tasks from a channel in a complex program, e.g. web server?
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
rand.Seed(time.Now().UTC().UnixNano())
results := make(chan int, 100)
// we can't know how many tasks there will be
for i := 0; i < rand.Intn(1<<8)+1<<8; i++ {
go func(i int) {
time.Sleep(time.Second)
results <- i
}(i)
}
// can't close channel here
// because it is still written in
//close(results)
// something else is going on other threads (think web server)
// therefore a deadlock won't be detected
go func() {
for {
time.Sleep(time.Second)
}
}()
for j := range results {
fmt.Println(j)
// we just stuck in here
}
}
In case of simpler programs go detects a deadlock and properly fails. Most examples either fetch a known number of results, or write to the channel sequentially.
The trick is to use sync.WaitGroup and wait for the tasks to finish in a non-blocking way.
var wg sync.WaitGroup
// we can't know how many tasks there will be
for i := 0; i < rand.Intn(1<<8)+1<<8; i++ {
wg.Add(1)
go func(i int) {
time.Sleep(time.Second)
results <- i
wg.Done()
}(i)
}
// wait for all tasks to finish in other thread
go func() {
wg.Wait()
close(results)
}()
// execution continues here so you can print results
See also: Go Concurrency Patterns: Pipelines and cancellation - The Go Blog
I am facing a synchronization issue when using goroutines. My program outputs unpredictable results. I checked the docu and for unbuffered channels there is no way to check if all msgs have been processed. I simplified the issue to this little demo code that still demonstrates the problem. Clearly this is not an issue with Golang but with my code. Obviously I am not using the right concurrency pattern.
Question is how to resolve this. If possible I would neither want to close the channel nor stop the hive goroutine. It think it would be great if I could assume that once all bee goroutines are finished that hive is done for now, too (that is what I tried by using wg.Wait()).
package main
import(
"fmt"
"sync"
"time"
)
func main() {
count := int64(0)
c := make(chan int64)
var wg sync.WaitGroup
// bees
for i:=0; i<5000;i++{
wg.Add(1)
go func(in chan int64) {
defer wg.Done()
time.Sleep(100)
in <- 2
}(c)
}
// hive
go func() {
for out := range c {
count += out
}
}()
wg.Wait()
// bang! but why?
fmt.Println(count)
}
// every now and again the program prints out before it is finished
// $ go run pattern1.go
// 10000
// $ go run pattern1.go
// 9998
// $ go run pattern1.go
// 9998
// $ go run pattern1.go
// 10000
// $ go run pattern1.go
// 10000
// $ go run pattern1.go
// 9998
You're never waiting for the "hive" loop to finish, so sometimes you print the count value before it's complete.
It's easiest to use the WaitGroup to signal when to close the channel, and block main on the for range loop:
go func() {
wg.Wait()
close(c)
}()
for out := range c {
count += out
}
http://play.golang.org/p/jK24dtG2je
I wish to know what is the proper way of waiting for a go routine to finish before exiting the program. Reading some other answers it seems that a bool chan will do the trick, as in Playground link
func do_stuff(done chan bool) {
fmt.Println("Doing stuff")
done <- true
}
func main() {
fmt.Println("Main")
done := make(chan bool)
go do_stuff(done)
<-done
//<-done
}
I have two questions here:
why the <- done works at all?
what happens if I uncomment the last line? I have a deadlock error. Is this because the channel is empty and there is no other function sending values to it?
Listening to channel <- done, is a blocking operation, so your program won't continue until true or false is sent i.e. done <- true.
Your question can have a few different answers depending on the circumstance.
For instance, suppose you wanted to parallelize a series of function calls that take a long time.
I would use the sync package for this
package main
import (
"fmt"
"sync"
"time"
)
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
longOp()
wg.Done()
}()
}
// will wait until wg.Done is called 10 times
// since we made wg.Add(1) call 10 times
wg.Wait()
}
func longOp() {
time.Sleep(time.Second * 2)
fmt.Println("long op done")
}
Why the <- done works at all?
It works because the runtime detects that you're writing something to the channel somewhere else.
what happens if I uncomment the last line?
The runtime is smart enough to know that there's nothing else being written and it deadlocks.
Bonus, if you're extremely limited on memory, you can use done := make(chan struct{}) and done <- struct{}{}, struct{} is guaranteed to use 0 memory.