func GoCountColumns(in chan []string, r chan Result, quit chan int) {
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
func main() {
fmt.Println("Welcome to the csv Calculator")
file_path := os.Args[1]
fd, _ := os.Open(file_path)
reader := csv.NewReader(bufio.NewReader(fd))
var totalColumnsCount int64 = 0
var totallettersCount int64 = 0
linesCount := 0
numWorkers := 10000
rc := make(chan Result, numWorkers)
in := make(chan []string, numWorkers)
quit := make(chan int)
t1 := time.Now()
for i := 0; i < numWorkers; i++ {
go GoCountColumns(in, rc, quit)
}
// start workers
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
if linesCount%1000000 == 0 {
fmt.Println("Adding to the channel")
}
in <- record
//data := countColumns(record)
linesCount++
//totalColumnsCount = totalColumnsCount + data.ColumnCount
//totallettersCount = totallettersCount + data.LettersCount
}
close(in)
}()
for i := 0; i < numWorkers; i++ {
quit <- 1 // quit goroutines from main
}
close(rc)
for i := 0; i < linesCount; i++ {
data := <-rc
totalColumnsCount = totalColumnsCount + data.ColumnCount
totallettersCount = totallettersCount + data.LettersCount
}
fmt.Printf("I counted %d lines\n", linesCount)
fmt.Printf("I counted %d columns\n", totalColumnsCount)
fmt.Printf("I counted %d letters\n", totallettersCount)
elapsed := time.Now().Sub(t1)
fmt.Printf("It took %f seconds\n", elapsed.Seconds())
}
My Hello World is a program that reads a CSV file and passes each record to a channel. The goroutines should then consume from this channel.
My problem is that I have no idea how to detect from the main thread that all data has been processed so that I can exit my program.
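On top of the other answers:
Take (great) care that a channel is closed on the writing side, not the reading side. In GoCountColumns the r channel is the one being written to, so the responsibility for closing it lies with the GoCountColumns function. The technical reason is that the writer is the only actor that knows for sure the channel will not be written to anymore, and thus that it is safe to close. Note that this holds as written only while a single goroutine writes to r: with several workers sharing the channel, as in the question, a second close would panic, so the close has to happen exactly once, after the last writer has returned.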
func GoCountColumns(in chan []string, r chan Result, quit chan int) {
defer close(r) // this line.
for {
select {
case data := <-in:
r <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
The convention for function parameters, if I may say, is to put the destination first, the source second, and the other parameters after them. GoCountColumns is preferably written:
func GoCountColumns(dst chan Result, src chan []string, quit chan int) {
defer close(dst)
for {
select {
case data := <-src:
dst <- countColumns(data) // some calculation function
case <-quit:
return // stop goroutine
}
}
}
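For the question's setup, where many workers write to the same result channel, the "writer closes" rule can be sketched as follows. This is a minimal sketch, not the asker's program: processAll is a hypothetical helper, and the Result fields and the body of countColumns are assumptions based on how the question uses them. The producer closes src because it is its only writer; dst has several writers, so it is closed once by a small goroutine that waits for all of them. Ranging over dst then tells main when everything has been processed.

```go
package main

import (
	"fmt"
	"sync"
)

// Result mirrors the question's result type; the fields are assumed.
type Result struct {
	ColumnCount  int64
	LettersCount int64
}

// countColumns is a stand-in for the question's calculation function.
func countColumns(record []string) Result {
	var letters int64
	for _, field := range record {
		letters += int64(len(field)) // counts bytes, good enough for a sketch
	}
	return Result{ColumnCount: int64(len(record)), LettersCount: letters}
}

// processAll fans records out to numWorkers goroutines and returns the totals.
func processAll(records [][]string, numWorkers int) Result {
	src := make(chan []string)
	dst := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for record := range src { // ends once src is closed and drained
				dst <- countColumns(record)
			}
		}()
	}

	// The producer is the only writer of src, so it closes src.
	go func() {
		defer close(src)
		for _, record := range records {
			src <- record
		}
	}()

	// dst has several writers: close it once, after all of them have returned.
	go func() {
		wg.Wait()
		close(dst)
	}()

	var total Result
	for r := range dst { // ends when dst is closed, i.e. all data is processed
		total.ColumnCount += r.ColumnCount
		total.LettersCount += r.LettersCount
	}
	return total
}

func main() {
	records := [][]string{{"a", "bc"}, {"def"}, {"g", "h", "i"}}
	total := processAll(records, 4)
	fmt.Printf("columns=%d letters=%d\n", total.ColumnCount, total.LettersCount)
}
```

With this shape no quit channel is needed for the normal path: closing src is the "no more work" signal, and closing dst is the "all results delivered" signal.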
You are calling quit right after the process has started, which is illogical. This quit command is a forced exit sequence: it should be called once an exit signal is detected, to force the current processing to stop in the best state possible, which may still be a broken one. In other words, you should rely on signal.Notify from the os/signal package to capture exit events and notify your workers to quit. See https://golang.org/pkg/os/signal/#example_Notify
To write better parallel code, first list the routines you need to manage the program lifetime, then identify those you need to block on to ensure the program has finished before exiting.
Your code has two routines, read and map. To ensure complete processing, the main function must capture a signal that map has exited before exiting itself. Notice that the read function does not matter.
You will also need the code required to capture an exit event from user input.
Overall, it appears we need to block on two events to manage the lifetime. Schematically,
func main(){
go read()
go map(mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
}
}
This simple code is good for process-or-die. Indeed, when the user event is caught, the program exits immediately, without giving the other routines a chance to do whatever is required upon stop.
To improve on that, you need first a way to signal to the other routines that the program wants to leave, and second a way to wait for those routines to finish their stop sequence before leaving.
To signal an exit event, or cancellation, you can use a context.Context: pass it around to the workers and make them listen to it.
Again, schematically,
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
}
}
(more on read and map later)
To wait for completion, many things are possible, as long as they are thread safe. Usually a sync.WaitGroup is used. Or, in cases like yours where there is only one routine to wait for, we can reuse the current mapDone channel.
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
<-mapDone
}
}
That is simple and straightforward, but it is not totally correct. The last receive on mapDone might block forever and make the program unstoppable. So you might implement a second signal handler, or a timeout.
Schematically, the timeout solution is
func main(){
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
go map(ctx,mapDone)
go signal()
select {
case <-mapDone:
case <-sig:
cancel()
select {
case <-mapDone:
case <-time.After(time.Second):
}
}
}
You might also combine signal handling and a timeout in the last select.
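The schematic versions above are not compilable as they stand (map is a reserved word, for instance), so here is a runnable sketch of the same "cancel, then wait with a timeout" shape. Everything named here is hypothetical: supervise, the three work functions, and the grace and workTime durations are made up for illustration. The OS signal is bridged to a plain stop channel in main so that supervise itself does not depend on os/signal.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"time"
)

// Hypothetical durations for the demo.
const (
	grace    = 50 * time.Millisecond
	workTime = 10 * time.Millisecond
)

// quickWork finishes on its own after workTime unless cancelled first.
func quickWork(ctx context.Context) {
	select {
	case <-time.After(workTime):
	case <-ctx.Done():
	}
}

// cooperativeWork runs until it is cancelled, then returns promptly.
func cooperativeWork(ctx context.Context) {
	<-ctx.Done()
}

// stubbornWork ignores cancellation for much longer than the grace period.
func stubbornWork(ctx context.Context) {
	time.Sleep(20 * grace)
}

// supervise runs work and reports how the program would end.
func supervise(stop <-chan struct{}, work func(context.Context)) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	workDone := make(chan struct{})
	go func() {
		defer close(workDone)
		work(ctx)
	}()

	select {
	case <-workDone:
		return "completed"
	case <-stop:
		cancel() // ask the worker to stop
		select {
		case <-workDone:
			return "stopped gracefully"
		case <-time.After(grace):
			return "gave up waiting"
		}
	}
}

func main() {
	// Bridge Ctrl+C to the stop channel.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	stop := make(chan struct{})
	go func() {
		<-sig
		close(stop)
	}()

	fmt.Println(supervise(stop, quickWork))
}
```

Passing cooperativeWork or stubbornWork instead of quickWork, with a stop channel that is already closed, exercises the other two outcomes.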
Finally, there are a few things to say about how read and map listen to the context.
Starting with map: the implementation needs to read from the ctx.Done() channel regularly to detect cancellation.
That is the easy part; it only requires updating the select statement.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string) {
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
The read part is a bit trickier. Because it is I/O, it does not provide a selectable programming interface, so listening to the context's cancellation channel might seem contradictory. It is: while blocked in an I/O call you cannot listen to the context, and while waiting on the context channel you cannot read from the I/O. In your case, the solution is to understand that your read loop is not relevant to your program's lifetime (recall that we only wait on mapDone), so we can just ignore the context there.
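If you do want the reader to notice cancellation, one common compromise is to leave the blocking Read alone and check the context at the point where the record is handed over. The sketch below assumes each individual Read returns reasonably quickly; readRecords and readUpTo are hypothetical helpers, and readUpTo exists only to drive the demo.

```go
package main

import (
	"context"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readRecords sends every record to dst and closes dst when it returns.
// The Read call itself cannot be interrupted; the send can.
func readRecords(ctx context.Context, r *csv.Reader, dst chan<- []string) error {
	defer close(dst) // the reader is the only writer of dst
	for {
		record, err := r.Read() // blocking, not selectable
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		select {
		case dst <- record:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// readUpTo consumes at most limit records, then cancels the reader.
// It returns the number of records received and the reader's error.
func readUpTo(input string, limit int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	dst := make(chan []string)
	errc := make(chan error, 1)
	go func() {
		errc <- readRecords(ctx, csv.NewReader(strings.NewReader(input)), dst)
	}()

	n := 0
	for n < limit {
		if _, ok := <-dst; !ok {
			break // reader reached the end of the input
		}
		n++
	}
	cancel()
	return n, <-errc
}

func main() {
	n, err := readUpTo("a,b\nc,d\ne,f\n", 2)
	fmt.Println(n, err) // 2 context canceled
}
```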
In other cases, for example if you wanted to restart at the last byte read (so at every read we increment a byte counter n, and we want to save that value upon stop), a new routine has to be started, and thus there are multiple routines whose completion must be waited for. In such cases a sync.WaitGroup is more appropriate.
Schematically,
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go saveN(ctx,&wg)
wg.Add(1)
go map(ctx,&wg)
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
In this last code, the WaitGroup is passed around. Each routine is responsible for calling wg.Done(); when all routines are done, the processDone channel is closed to signal the select.
func GoCountColumns(ctx context.Context, dst chan Result, src chan []string, wg *sync.WaitGroup) {
defer wg.Done()
defer close(dst)
for {
select {
case <-ctx.Done():
<-time.After(time.Minute) // do something more useful.
return // quit. Notice the defer will be called.
case data := <-src:
dst <- countColumns(data) // some calculation function
}
}
}
There is no consensus on which pattern is preferred, but you might also see the WaitGroup managed at the call sites only.
func main(){
var wg sync.WaitGroup
processDone:=make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go read(ctx)
wg.Add(1)
go func(){
defer wg.Done()
saveN(ctx)
}()
wg.Add(1)
go func(){
defer wg.Done()
map(ctx)
}()
go signal()
go func(){
wg.Wait()
close(processDone)
}()
select {
case <-processDone:
case <-sig:
cancel()
select {
case <-processDone:
case <-time.After(time.Second):
}
}
}
Beyond all of that and the OP's question, you must always evaluate upfront whether parallel processing is pertinent for a given task. There is no unique recipe: practice, and measure your code's performance. See pprof.
There is way too much going on in this code. You should restructure your code into short functions that serve specific purposes to make it possible for someone to help you out easily (and help yourself as well).
You should read the following Go article, which goes into concurrency patterns:
https://blog.golang.org/pipelines
There are multiple ways to make one goroutine wait for some other work to finish. The most common ways are wait groups (as in the example I have provided) or channels.
func processSomething(...) {
...
}
func main() {
workers := &sync.WaitGroup{}
for i := 0; i < numWorkers; i++ {
workers.Add(1) // you want to call this from the calling go-routine and before spawning the worker go-routine
go func() {
defer workers.Done() // you want to call this from the worker go-routine when the work is done (NOTE the defer, which ensures it is called no matter what)
processSomething(....) // your async processing
}()
}
// this will block until all workers have finished their work
workers.Wait()
}
You can use a channel to block main until completion of a goroutine.
package main
import (
"log"
"time"
)
func main() {
c := make(chan struct{})
go func() {
time.Sleep(3 * time.Second)
log.Println("bye")
close(c)
}()
// This blocks until the channel is closed by the routine
<-c
}
There is no need to write anything into the channel. The receive blocks until a value is sent or, as we use here, the channel is closed.
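To make the "closed channel unblocks the reader" behaviour concrete, here is a small sketch. The helper names releaseAll and receiveFromClosed are made up for the demo. It shows two things: a receive on a closed channel returns the zero value with ok set to false, and a single close releases every goroutine waiting on the channel, which a single send could not do.

```go
package main

import (
	"fmt"
	"sync"
)

// releaseAll starts n goroutines that block on the same channel and
// returns how many of them were released by a single close.
func releaseAll(n int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	released := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // blocks until done is closed
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}

	close(done) // one close wakes every receiver
	wg.Wait()
	return released
}

// receiveFromClosed reports what a receive on a closed channel yields.
func receiveFromClosed() (int, bool) {
	c := make(chan int)
	close(c)
	v, ok := <-c
	return v, ok
}

func main() {
	fmt.Println(releaseAll(5)) // 5
	v, ok := receiveFromClosed()
	fmt.Println(v, ok) // 0 false
}
```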
I'm trying to build some sort of semaphore in Go. However, when the channel receives the signal, it just sleeps forever.
I've tried changing the way it sleeps and the sleep duration, but it still just stops forever.
Here is a representation of what I've tried:
func main() {
backOffChan := make(chan struct{})
go func() {
time.Sleep(2)
backOffChan <- struct{}{}
}()
for {
select {
case <-backOffChan:
d := time.Duration(5 * time.Second)
log.Println("reconnecting in %s", d)
select {
case <-time.After(d):
log.Println("reconnected after %s", d)
return
}
default:
}
}
}
I expect it to just return after printing the log message.
Thanks!
This code has a number of problems, mainly a tight loop using for/select that may not allow the other goroutine to ever get to send on the channel. Since the default case is empty and the select has only one case, the whole select is unnecessary. The following code works correctly:
backOffChan := make(chan struct{})
go func() {
time.Sleep(1 * time.Millisecond)
backOffChan <- struct{}{}
}()
for range backOffChan {
d := time.Duration(10 * time.Millisecond)
log.Printf("reconnecting in %s", d)
select {
case <-time.After(d):
log.Printf("reconnected after %s", d)
return
}
}
This will wait until the backOffChan gets a message without burning a tight loop.
(Note that this code also addresses issues using log.Println with formatting directives - these were corrected to log.Printf).
See it in action here: https://play.golang.org/p/ksAzOq5ekrm
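One more detail worth checking in the question's code is the argument of time.Sleep(2). A time.Duration is a count of nanoseconds, so a bare 2 means two nanoseconds, not two seconds; the corrected code above multiplies by a unit constant instead. A minimal sketch (describe is a hypothetical helper that only formats the value):

```go
package main

import (
	"fmt"
	"time"
)

// describe formats a duration the way the time package prints it.
func describe(d time.Duration) string {
	return d.String()
}

func main() {
	fmt.Println(describe(2))                    // 2ns
	fmt.Println(describe(2 * time.Second))      // 2s
	fmt.Println(describe(1 * time.Millisecond)) // 1ms
}
```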
I have been reading "Building Microservices with Go", and the book introduces the eapache/go-resiliency/deadline package for handling timeouts.
deadline.go
// Package deadline implements the deadline (also known as "timeout") resiliency pattern for Go.
package deadline
import (
"errors"
"time"
)
// ErrTimedOut is the error returned from Run when the deadline expires.
var ErrTimedOut = errors.New("timed out waiting for function to finish")
// Deadline implements the deadline/timeout resiliency pattern.
type Deadline struct {
timeout time.Duration
}
// New constructs a new Deadline with the given timeout.
func New(timeout time.Duration) *Deadline {
return &Deadline{
timeout: timeout,
}
}
// Run runs the given function, passing it a stopper channel. If the deadline passes before
// the function finishes executing, Run returns ErrTimedOut to the caller and closes the stopper
// channel so that the work function can attempt to exit gracefully. It does not (and cannot)
// simply kill the running function, so if it doesn't respect the stopper channel then it may
// keep running after the deadline passes. If the function finishes before the deadline, then
// the return value of the function is returned from Run.
func (d *Deadline) Run(work func(<-chan struct{}) error) error {
result := make(chan error)
stopper := make(chan struct{})
go func() {
result <- work(stopper)
}()
select {
case ret := <-result:
return ret
case <-time.After(d.timeout):
close(stopper)
return ErrTimedOut
}
}
deadline_test.go
package deadline
import (
"errors"
"testing"
"time"
)
func takesFiveMillis(stopper <-chan struct{}) error {
time.Sleep(5 * time.Millisecond)
return nil
}
func takesTwentyMillis(stopper <-chan struct{}) error {
time.Sleep(20 * time.Millisecond)
return nil
}
func returnsError(stopper <-chan struct{}) error {
return errors.New("foo")
}
func TestDeadline(t *testing.T) {
dl := New(10 * time.Millisecond)
if err := dl.Run(takesFiveMillis); err != nil {
t.Error(err)
}
if err := dl.Run(takesTwentyMillis); err != ErrTimedOut {
t.Error(err)
}
if err := dl.Run(returnsError); err.Error() != "foo" {
t.Error(err)
}
done := make(chan struct{})
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
return nil
})
if err != ErrTimedOut {
t.Error(err)
}
<-done
}
func ExampleDeadline() {
dl := New(1 * time.Second)
err := dl.Run(func(stopper <-chan struct{}) error {
// do something possibly slow
// check stopper function and give up if timed out
return nil
})
switch err {
case ErrTimedOut:
// execution took too long, oops
default:
// some other error
}
}
1st question
// in deadline_test.go
if err := dl.Run(takesTwentyMillis); err != ErrTimedOut {
t.Error(err)
}
I have a problem understanding the execution flow of the above code. As far as I understand, because the takesTwentyMillis function sleeps longer than the set timeout duration of 10 milliseconds,
// in deadline.go
case <-time.After(d.timeout):
close(stopper)
return ErrTimedOut
time.After emits the current time, and this case is selected. Then the stopper channel is closed and ErrTimedOut is returned.
What I do not understand is what closing the stopper channel does to the anonymous goroutine that might still be running.
I think that when the stopper channel is closed, the goroutine below might still be running.
go func() {
result <- work(stopper)
}()
(Please correct me if I'm wrong here) I think after close(stopper), this goroutine will call takesTwentyMillis(=work function) with stopper channel as its parameter. And the function will proceed and sleep for 20 milliseconds and return nil to pass to result channel. And the main() ends here, right?
I do not see what is the point of closing the stopper channel here. The takesTwentyMillis function does not seem to use the channel within the function body anyway :(.
2nd question
// in deadline_test.go within TestDeadline()
done := make(chan struct{})
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
return nil
})
if err != ErrTimedOut {
t.Error(err)
}
<-done
This is the part I do not understand completely. I think when dl.Run is run, stopper channel is initialized. But because there is no value in the stopper channel, the function call will be blocked at <-stopper...but because I do not understand this code, I do not see why this code exists in the first place (i.e. what this code is trying to test, and how it is executed, etc).
3rd (additional) question regarding the 2nd question
So I understand that when the Run function in the 2nd question triggers the stopper channel to close, the worker function gets the signal. The worker then closes the done channel and returns nil.
I used delve (the Go debugger) to see this, and it takes me to the goroutine in deadline.go after the line return nil.
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
--> return nil
})
After typing n for stepping over to the next line, delve takes me here
go func() {
--> result <- work(stopper)
}()
And the process kind of finishes here, because when I type n again the command line prints PASS and the process exits. Why does the process finish here? work(stopper) seems to return nil, which should then be passed to the result channel, right? But this line does not seem to execute for some reason.
I know the main goroutine, which is the Run function, has already returned ErrTimedOut. So I guess it has something to do with this?
1st question
The stopper channel is used to signal to the function, e.g. takesTwentyMillis, that its deadline has been reached and the caller no longer cares about its result. Usually this means that a worker function like takesTwentyMillis should check whether the stopper channel is already closed so that it can cancel its work. Still, checking the stopper channel is the worker function's choice. It may or may not check the channel.
func takesTwentyMillis(stopper <-chan struct{}) error {
for i := 0; i < 20; i++ {
select {
case <-stopper:
// caller doesn't care anymore might as well stop working
return nil
case <-time.After(time.Millisecond): // simulating one millisecond of work
}
}
// work is done
return nil
}
2nd question
This part of Deadline.Run() will close the stopper channel.
case <-time.After(d.timeout):
close(stopper)
Receiving from a closed channel (<-stopper) returns the zero value of the channel's element type immediately. I think it's just testing a worker function that ultimately times out.
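On the 3rd question: in the Run method quoted in the question, result is an unbuffered channel. Once Run has returned ErrTimedOut nobody receives from it anymore, so the goroutine stays blocked on result <- work(stopper), and it simply disappears when the test binary exits. A Go program does not wait for other goroutines once the main one is done, which is why the debugger shows PASS right there. The sketch below is modelled on the quoted code but is not the library's implementation; runWithDeadline, errTimedOut and shortTimeout are hypothetical names. It gives result a capacity of one so that the late send always completes and the goroutine can end.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimedOut = errors.New("timed out waiting for function to finish")

// shortTimeout is a hypothetical value for the demo.
const shortTimeout = 20 * time.Millisecond

// runWithDeadline runs work and gives up waiting after timeout.
func runWithDeadline(timeout time.Duration, work func(<-chan struct{}) error) error {
	result := make(chan error, 1) // capacity 1: the send below never blocks
	stopper := make(chan struct{})

	go func() {
		result <- work(stopper)
	}()

	select {
	case err := <-result:
		return err
	case <-time.After(timeout):
		close(stopper) // every receive on stopper now returns immediately
		return errTimedOut
	}
}

func main() {
	finished := make(chan struct{})
	err := runWithDeadline(shortTimeout, func(stopper <-chan struct{}) error {
		<-stopper // unblocked by close(stopper) after the timeout
		close(finished)
		return nil
	})
	<-finished // proves the worker saw the stop signal
	fmt.Println(err)
}
```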
I would like to poll for a token on a regular basis. The token itself also carries information about when it expires.
This should run forever until the user presses Ctrl+C.
I tried the same with
span := timeLeft(*expDate)
timer := time.NewTimer(span).C
ticker := time.NewTicker(time.Second * 5).C
which also does not work (the application hangs after the countdown). So I decided to try it with <-time.After(...)
This is my code that does not work. You will see the countdown, but it never breaks on expiration.
This is a small extract with the polling logic, kept in a single main.go for simplicity's sake:
func refreshToken() (time.Time, error) {
//This should simulate a http request and returns the new target date for the next refresh
time.Sleep(2 * time.Second)
return time.Now().Add(10 * time.Second), nil
}
func timeLeft(d time.Time) time.Duration {
exactLeft := d.Sub(time.Now())
floorSeconds := math.Floor(exactLeft.Seconds())
return time.Duration(floorSeconds) * time.Second
}
func poller(expDate *time.Time) {
exp := timeLeft(*expDate)
done := make(chan bool)
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
for {
select {
// print time left on the screen
case <-time.After(3 * time.Second):
go func() {
fmt.Printf("\rNext Token refresh will be in: %v", timeLeft(*expDate))
}()
// mark as done when date is due
case <-time.After(exp):
fmt.Println("Refresh token now!")
done <- true
// exit app
case <-c:
os.Exit(0)
break
// exit function when done
case <-done:
break
}
}
}
func main() {
var expiration time.Time
expiration = time.Now().Add(10 * time.Second)
// loop and refresh token as long as the app does not exit
for {
poller(&expiration)
ex, err := refreshToken()
expiration = ex
if err != nil {
panic(err)
}
fmt.Println("next round poller")
}
}
I am also not sure if I need the done channel at all?
What is required to listen to two timers and call itself until someone hits ctrl+c?
Found a solution. While #ain was right about the buffered done channel, it is not really required in the code now. It works without it.
The trick was to create the timer outside of the for loop and the ticker within it. The reason is that time.After is a function that returns a new channel on every iteration. This seems perfectly fine for the ticker, but not for the timer: the expiration timer was being recreated with the full duration each time the shorter 3-second case fired, so it never got the chance to expire.
With the following changes it worked =) ...
func poller(expDate *time.Time) {
exp := timeLeft(*expDate)
timer := time.After(exp)
fmt.Printf("Next Token refresh will be in: %v\n", exp)
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
for {
select {
// print time left on the screen
case <-time.After(3 * time.Second):
go func() {
fmt.Printf("\r ")
fmt.Printf("\r%v", timeLeft(*expDate))
}()
// mark as done when date is due
case <-timer:
fmt.Println("Refresh token now!")
return
// exit app
case <-c:
os.Exit(0)
break
}
}
}
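The same idea can be sketched with time.NewTimer and time.NewTicker, both created once before the loop and stopped when the function returns, so no new timer is allocated on each iteration. This is a minimal sketch with hypothetical names (pollUntil, demoDeadline, demoInterval) and short durations so that it finishes quickly; the signal handling from the code above is left out.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical durations for the demo.
const (
	demoDeadline = 175 * time.Millisecond
	demoInterval = 50 * time.Millisecond
)

// pollUntil counts ticker ticks until the deadline timer fires.
func pollUntil(deadline, interval time.Duration) int {
	timer := time.NewTimer(deadline) // created once, outside the loop
	defer timer.Stop()
	ticker := time.NewTicker(interval) // also created once
	defer ticker.Stop()

	ticks := 0
	for {
		select {
		case <-ticker.C:
			ticks++ // the place to print the remaining time
		case <-timer.C:
			return ticks // the place to refresh the token
		}
	}
}

func main() {
	fmt.Println("ticks before deadline:", pollUntil(demoDeadline, demoInterval))
}
```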
I want to stop goroutine execution on timeout, but it does not seem to work for me. I am using the iris framework.
type Response struct {
data interface{}
status bool
}
func (s *CicService) Find() (interface{}, bool) {
ch := make(chan Response, 1)
go func() {
time.Sleep(10 * time.Second)
fmt.Println("test")
fmt.Println("test1")
ch <- Response{data: "data", status: true}
}()
select {
case <-ch:
fmt.Println("Read from ch")
res := <-ch
return res.data, res.status
case <-time.After(50 * time.Millisecond):
return "Timed out", false
}
}
Output:
Timed out
test
test1
Expected Output:
Timed out
Can somebody point out what is missing here? It does time out, but it still runs the goroutine that prints test and test1. I just want to stop the execution of the goroutine as soon as there is a timeout.
There's no good way to "interrupt" the execution of a goroutine in the middle of its execution.
Go uses a fork-join model of concurrency: you "fork" by creating a new goroutine and then have no control over how that goroutine is scheduled until you get to a "join point". A join point is some kind of synchronisation between more than one goroutine, e.g. sending a value on a channel.
Taking your specific example, this line:
ch <- Response{data: "data", status: true}
... will be able to send the value no matter what, because it's a buffered channel. But the timeouts you've created:
case <-time.After(50 * time.Millisecond):
return "Timed out", false
These timeouts are on the "receiver" or "reader" of the channel, and not on the "sender". As mentioned at the top of this answer, there's no way to interrupt the execution of a goroutine without using some synchronisation techniques.
Because the timeouts are on the goroutine "reading" from the channel, there's nothing to stop the execution of the goroutine that sends on the channel.
The best way to control your goroutine's processing is context (from the Go standard library).
You can cancel work inside the goroutine and stop its execution without a possible goroutine leak.
Here is a simple example, for your case, of cancellation on timeout.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
ch := make(chan Response, 1)
go func() {
time.Sleep(1 * time.Second)
select {
default:
ch <- Response{data: "data", status: true}
case <-ctx.Done():
fmt.Println("Canceled by timeout")
return
}
}()
select {
case <-ch:
fmt.Println("Read from ch")
case <-time.After(500 * time.Millisecond):
fmt.Println("Timed out")
}
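A variant of the same idea uses context.WithTimeout, so the context itself carries the deadline and the worker stops waiting as soon as it expires instead of sleeping to the end. This is a sketch only: find is a standalone stand-in for the question's Find method, and the short and long durations are made-up values for the demo.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type Response struct {
	data   interface{}
	status bool
}

// Hypothetical durations for the demo.
const (
	short = 20 * time.Millisecond
	long  = 500 * time.Millisecond
)

// find waits at most timeout for a worker that needs workTime.
func find(timeout, workTime time.Duration) (interface{}, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ch := make(chan Response, 1)
	go func() {
		select {
		case <-time.After(workTime): // simulated work
			ch <- Response{data: "data", status: true}
		case <-ctx.Done():
			return // stop before producing any output
		}
	}()

	select {
	case res := <-ch:
		return res.data, res.status
	case <-ctx.Done():
		return "Timed out", false
	}
}

func main() {
	fmt.Println(find(short, long)) // Timed out false
	fmt.Println(find(long, short)) // data true
}
```

Real work is rarely a single timer, so in practice the worker would check ctx.Done() between its steps, or pass ctx on to calls that accept one.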
You have a goroutine leak. You must handle some done signal so that the goroutine returns before its own timeout, like this:
func (s *CicService) Find() (interface{}, bool) {
ch := make(chan Response, 1)
done := make(chan struct{})
go func() {
select {
case <-time.After(10 * time.Second):
case <-done:
return
}
fmt.Println("test")
fmt.Println("test1")
ch <- Response{data: "data", status: true}
}()
select {
case res := <-ch:
return res.data, res.status
case <-time.After(50 * time.Millisecond):
done <- struct{}{}
return "Timed out", false
}
}