I'm posting here for the first time because I couldn't find a clean solution on the internet.
My goal is simple: I need to create a background operation (a goroutine, a process, or whatever) that I can kill properly, without leaving it running in the background.
I've tried many things, such as using channels or a context, but I could never find the right way to avoid leaks.
Here is an example:
package main
import (
"log"
"strconv"
"runtime"
"time"
"math/rand"
)
func main() {
log.Println("goroutines: " + strconv.Itoa(runtime.NumGoroutine()))
func1()
leak := ""
if runtime.NumGoroutine() > 1 {
leak = " there is one LEAK !!"
}
log.Println("goroutines: " + strconv.Itoa(runtime.NumGoroutine()) + leak)
}
func func1() {
done := make(chan struct{})
quit := make(chan struct{})
go func() {
log.Println("goroutines: " + strconv.Itoa(runtime.NumGoroutine()))
select {
case <-quit:
log.Println("USEFUL ???")
return
default:
func2()
done<-struct{}{}
}
}()
select {
case <-time.After(4 * time.Second):
quit<-struct{}{}
log.Println("TIMEOUT")
case <-done:
log.Println("NO TIMEOUT")
}
}
func func2() {
log.Println("JOB START")
rand.Seed(time.Now().UnixNano())
val := rand.Intn(10)
log.Println("JOB DURATION: " + strconv.Itoa(val))
time.Sleep(time.Duration(val) * time.Second) // fake a long process with an unknown duration
log.Println("JOB DONE")
}
In this example, if the job finishes before the 4-second timeout, everything is fine and the final number of goroutines is 1. Otherwise it is 2, as in every example I could find.
But it's just an example; maybe it's not possible with goroutines, or maybe it's not even possible in Go.
Your problem is here:
quit := make(chan struct{})
go func() {
for {
select {
case <-quit:
log.Println("USEFUL ???")
return
default:
func2()
quit<-struct{}{}
return
}
}
}()
This goroutine is signalling itself on an unbuffered channel. That means when it gets to the quit<-struct{}{}, that send will block forever, because it's waiting on itself to receive. It's not entirely clear how this is intended to work, though; there are a few odd things happening here:
A goroutine is signalling itself via channel, which seems wrong - it shouldn't need to communicate to itself
The channel and loop seem unnecessary; quit<-struct{}{} could be replaced with log.Println("USEFUL ???") and the whole for/select/channel business could be deleted
The function returns in each case of the select, so putting it in a loop is pointless anyway - there is no scenario where this code can ever execute a second iteration of the loop
I'm trying to get a better understanding of channels in Go.
I want 5 goroutines to be running at all times. At specific times during the first goroutine, I want to try to start another one. If 5 goroutines are already running, I want to queue up the next one and run it as soon as one of the others has completed.
My logic was to pass a message to spawner, check whether 5 processes are already running, and if so keep waiting until there aren't, then start a new one. From what I can tell, p.complete <- struct{}{} isn't working as expected and isn't removing a process. It works fine outside of the goroutine.
package main
import (
"fmt"
"os"
"os/signal"
"syscall"
"time"
)
type ProcessManager struct {
spawner chan struct{}
complete chan struct{}
process int
}
func NewProcessManager() *ProcessManager {
return &ProcessManager{
spawner: make(chan struct{}),
complete: make(chan struct{}),
process: 0,
}
}
func (p *ProcessManager) Run(limit int) {
for {
select {
case <-p.spawner:
for {
if p.process <= limit {
fmt.Println("breaking for new process")
break
}
time.Sleep(time.Second * 10)
}
p.process++
go func() {
fmt.Println("+ Starting goroutine")
p.spawner <- struct{}{}
time.Sleep(time.Second * 2)
fmt.Println("- Stopping goroutine")
p.complete <- struct{}{}
}()
case <-p.complete:
fmt.Println("complete")
p.process--
}
}
}
func main() {
interruptChannel := make(chan os.Signal, 1)
signal.Notify(interruptChannel, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
pm := NewProcessManager()
go pm.Run(5)
pm.spawner <- struct{}{}
<-interruptChannel
}
I managed to solve this by changing process to a buffered channel: process: make(chan int, 4)
I then use p.process <- 1 to block, instead of the for loop,
and <-p.process to mark a goroutine as completed.
The capacity of the channel (4) determines the maximum number of processes allowed to run at any given time. The updated code is below:
func NewProcessManager() *ProcessManager {
return &ProcessManager{
spawner: make(chan struct{}),
complete: make(chan struct{}),
process: make(chan int, 4),
}
}
func (p *ProcessManager) Run() {
for {
select {
case <-p.spawner:
p.process <- 1
go func() {
fmt.Println("+ Starting goroutine")
// do stuff before starting next routine
p.spawner <- struct{}{}
// do stuff rest of stuff
fmt.Println("- Stopping goroutine")
<-p.process
}()
}
}
}
The whole purpose of the ProcessManager, in my eyes, is to serialize access to the metadata that keeps track of running processes. As such, it runs sequentially.
case <-p.spawner:
for {
if p.process <= limit {
fmt.Println("breaking for new process")
break
}
time.Sleep(time.Second * 10)
}
In this section of code, the goroutine is sleeping. While sleeping, the complete case in the select statement cannot run.
Recall the following property of unbuffered channels:
By default, sends and receives block until the other side is ready.
Because the channels are unbuffered, this line of code must block until that complete case is triggered:
p.complete <- struct{}{}
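The buffered-channel approach from the self-answer avoids this blocking, because the channel itself acts as a counting semaphore and no separate manager goroutine has to be awake to receive. The following is a minimal sketch of that technique under my own assumptions: runLimited, sem, running and peak are illustrative names, and the 10 ms sleep stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited starts n jobs but lets at most limit of them run at once.
// It returns the highest number of jobs observed running simultaneously.
func runLimited(n, limit int) int {
	sem := make(chan struct{}, limit) // capacity = maximum concurrency
	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for i := 0; i < n; i++ {
		sem <- struct{}{} // blocks while limit jobs are already running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the job ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stand-in for real work

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", runLimited(20, 5))
}
```

The slot is acquired before the goroutine starts and released in a defer, so the number of running jobs can never exceed the channel's capacity.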
I noticed a weird behavior that is not entirely obvious when using select statement inside a loop, for example if I have:
package main
import (
"fmt"
"time"
)
func main() {
done := make(chan bool)
go func() {
fmt.Println("here we go")
for {
select {
case <-done:
fmt.Println("Bye")
return
default:
fmt.Println("this should continue to print")
}
fmt.Println("loop continues")
}
}()
time.Sleep(2 * time.Second)
done <- true
}
I would assume that the default case should print for no longer than 2 seconds. However, this is not the case: at least on my CPU, it lasts for about 10 seconds before I see the Bye message.
However, if I add time.Sleep(1 * time.Second) right after "loop continues", it is more or less in sync with the 2-second sleep before done <- true.
Given this, my assumption is that the print calls are getting stacked up, and the extended completion time is due to the call stack taking longer to get through.
Is this a correct description of what's happening? Or am I missing something more obvious?
Edit: To put my hypothesis to the test I did this, and it now works as expected; the program takes 2 seconds to finish:
go func() {
fmt.Println("here we go")
for {
select {
case <-done:
fmt.Println("Bye")
return
default:
}
}
}()
Trying to get 2 Goroutines to play nicely within my project. Even though this is all sorta working, I am 100% sure it is being done very poorly... maybe even wrong...ly (is that even a word?).
Anyway, the basic concept of this project is to run some repeated work over a pre-selected amount of time while allowing the ability to abort the run before the time is up.
Here's the mess I have so far (I've only included the important parts for everyone's sanity... and to hide my horrible coding):
Two Goroutines:
BenchTimer() is a simple run-time countdown timer that does some repeating work for a set amount of time.
AbortTest() is a keyboard listener used to catch an 'ESC' (or whatever else I want) keypress from the keyboard to act as a "User Abort".
Each Goroutine, upon a successful run (i.e. BenchTimer() completes the countdown OR AbortTest() catches an abort keypress), sends a message down a common channel testAction. I use the one channel since this is an OR kinda thing (i.e. You can't get a completed countdown and an abort at the same time.). If BenchTimer() completes, then it sends "Complete" down the channel. If AbortTest() "completes" it sends "Abort" down the channel. [So far this all seems to be working...]
The next problem I ran into with this setup is how to kill the Goroutine that wasn't the "winner"... (i.e. If BenchTimer() completes normally, then I need to somehow kill AbortTest()... and vice-versa.) After a bunch of searching, I found that it isn't possible to kill a Goroutine externally, but it can be done internally... so I came up with using a second channel for each Goroutine to act as a sort of "kill signal" line: killAbortTest and killBenchTimer.
To tie this all together, I evaluate the result of the testAction channel. Because this channel will tell me which Goroutine "won", I can use this knowledge to send the correct (i.e. opposite) "kill signal" to have the "loser" Goroutine self-terminate.
Note: ... just means other code exists, but was removed due to not being needed for this post.
func main() {
...
testAction := make(chan string) // Action Result (Timer "Complete" or User "Abort")
killAbortTest := make(chan bool) // Kill AbortTest() Goroutine when BenchTimer() completes.
killBenchTimer := make(chan bool) // Kill BenchTimer() Goroutine when AbortTest() completes.
go BenchTimer(testAction, killBenchTimer) // Run BenchTimer() as Goroutine
go AbortTest(testAction, killAbortTest) // Run AbortTest() as Goroutine
// Program should wait here until it receives something on testAction channel.
actionVal := <-testAction
// Evaluate the testAction to kill the "loser" Goroutine
switch actionVal {
case "Abort":
killBenchTimer <- true // Abort received, signal BenchTimer() Goroutine to Quit
fmt.Println()
fmt.Println("Test Aborted")
case "Complete":
killAbortTest <- true // Countdown finished, signal AbortTest() Goroutine to Quit
fmt.Println()
fmt.Println("Test Completed")
}
...
}
// AbortTest - Listen for User Abort
func AbortTest(c chan<- string, k <-chan bool) {
if err := keyboard.Open(); err != nil {
panic(err)
}
defer func() {
_ = keyboard.Close()
}()
for {
select {
case <-k:
return
default:
_, key, err := keyboard.GetKey() // Poll for keypress
if err != nil {
panic(err)
}
if key == keyboard.KeyEsc { // ESC key was pressed
c <- "Abort"
return
}
}
}
}
// BenchTimer - Countdown Timer for BenchTest
func BenchTimer(c chan<- string, k <-chan bool) {
seconds := 0
switch testTime {
case "2-minute (fast)":
seconds = 120
case "5-minute (short)":
seconds = 300
case "10-minute (long)":
seconds = 600
case "20-minute (slow)":
seconds = 1200
}
ticker := time.Tick(time.Second)
for i := seconds; i >= 0; i-- {
select {
case <-k: // Kill Signal Received
return
default:
<-ticker
...
}
}
c <- "Complete"
}
There it is. My mess. There are many like it, but this one is my own. Like I said, it sorta works now, but I'm looking to make it better.
Am I just overthinking this whole process and making it way more complex than it needs to be?
Any help would be great.
This is the simplest example I can think of. I left out the keyboard part, but it's basically the idea behind your code:
As mentioned by mh-cbon, this is safe:
package main
import (
"context"
"fmt"
"math/rand"
"sync"
"time"
)
var wg sync.WaitGroup
func main() {
rand.Seed(time.Now().UnixNano())
ctx, cancel := context.WithCancel(context.Background())
wg.Add(2)
go DoSomeTask(ctx, cancel)
go CancelTask(ctx, cancel)
wg.Wait()
}
func DoSomeTask(ctx context.Context, cancel func()) {
defer wg.Done()
defer cancel() // force cancellation
for i := 1; i < 10; i++ {
select {
case <-ctx.Done():
fmt.Println("context cancelled")
return
case <-time.After(time.Second):
}
fmt.Println("Done something!", i)
}
}
func CancelTask(ctx context.Context, cancel func()) {
defer wg.Done()
defer cancel() // force cancellation
duration := time.Second * time.Duration((rand.Intn(20-1) + 1))
fmt.Println("Will cancel in", duration) // a time.Duration already prints with its unit, e.g. 5s
select {
case <-ctx.Done():
fmt.Println("context cancelled")
case <-time.After(duration):
}
}
Here's the code:
package main
import "fmt"
func main() {
messages := make(chan string, 1)
go func(c chan string) {
c <- "Hi"
}(messages)
select {
case msg := <-messages:
fmt.Println("received message", msg)
default:
fmt.Println("no message received")
}
}
It outputs no message received.
Or this code:
package main
import (
"fmt"
"time"
)
func f(from string) {
for i := 0; i < 3; i++ {
fmt.Println(from, ":", i)
}
}
func main() {
go f("goroutine")
go func(msg string) {
fmt.Println(msg)
}("going")
time.Sleep(time.Second)
fmt.Println("done")
}
unexpectedly prints
going
goroutine : 0
goroutine : 1
goroutine : 2
This happens despite the fact that the goroutine printing going is started later than the counter goroutine. Why?
There are no execution ordering guarantees between multiple goroutines. Ordering guarantees can be established only when two goroutines exchange data using a channel or synchronize using another synchronization mechanism. In your case, you happened to observe an execution where one goroutine ran before another. When you run it multiple times, you may observe different orderings.
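To illustrate how a channel establishes such a guarantee, here is a minimal sketch in which the second goroutine waits for the first before it records its line. The function name ordered and the channels firstDone and allDone are assumptions made for this example; lines are collected in a slice rather than printed so the order can be inspected.

```go
package main

import "fmt"

// ordered runs two goroutines and uses channels to force an order:
// the "going" goroutine waits until the counter goroutine has finished.
func ordered() []string {
	var out []string
	firstDone := make(chan struct{})
	allDone := make(chan struct{})

	go func() {
		for i := 0; i < 3; i++ {
			out = append(out, fmt.Sprintf("goroutine : %d", i))
		}
		close(firstDone) // everything written above is visible after this
	}()

	go func() {
		<-firstDone // wait for the counter goroutine
		out = append(out, "going")
		close(allDone)
	}()

	<-allDone // wait for both goroutines instead of sleeping
	return out
}

func main() {
	for _, line := range ordered() {
		fmt.Println(line)
	}
}
```

With this synchronization in place, the three counter lines always come before going, and main no longer needs time.Sleep to wait for the goroutines.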
Say I have a function
type Foo struct {}
func (a *Foo) Bar() {
// some expensive work - does some calls to redis
}
which gets executed within a goroutine at some point in my app. Lots of these may be executing at any given point. Prior to application termination, I would like to ensure all remaining goroutines have finished their work.
Can I do something like this:
type Foo struct {
wg sync.WaitGroup
}
func (a *Foo) Close() {
a.wg.Wait()
}
func (a *Foo) Bar() {
a.wg.Add(1)
defer a.wg.Done()
// some expensive work - does some calls to redis
}
This assumes that Bar is executed within a goroutine, that many of these may be running at a given time, that Bar is not called once Close has been called, and that Close is called on SIGTERM or SIGINT.
Does this make sense?
Usually I would see the Bar function look like this:
func (a *Foo) Bar() {
a.wg.Add(1)
go func() {
defer a.wg.Done()
// some expensive work - does some calls to redis
}()
}
Yes, WaitGroup is the right answer. You can call WaitGroup.Add at any time while the counter is greater than zero, as per the documentation:
Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned. See the WaitGroup example.
The one trick is that you should keep the counter greater than zero at all times until Close is called. That usually means calling wg.Add in NewFoo (or something like that) and wg.Done in Close. To prevent multiple calls to Done from ruining the wait group, you should wrap the Done call in Close with sync.Once. You may also want to prevent new calls to Bar() once Close has been called.
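The trick described above could look roughly like the following sketch. The closeOnce and finished fields and the Finished method are my own additions for illustration, the 10 ms sleep stands in for the Redis calls, and the sketch does not guard against Bar being called after Close.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type Foo struct {
	wg        sync.WaitGroup
	closeOnce sync.Once
	finished  int32 // illustrative counter of completed Bar calls
}

// NewFoo adds one to the counter up front, so it stays above zero
// until Close is called and later Add calls are always safe.
func NewFoo() *Foo {
	f := &Foo{}
	f.wg.Add(1)
	return f
}

// Bar starts the expensive work in its own goroutine.
func (f *Foo) Bar() {
	f.wg.Add(1)
	go func() {
		defer f.wg.Done()
		time.Sleep(10 * time.Millisecond) // stand-in for the expensive work
		atomic.AddInt32(&f.finished, 1)
	}()
}

// Close releases the initial count exactly once, then waits for all
// outstanding Bar goroutines. Calling Close twice is harmless.
func (f *Foo) Close() {
	f.closeOnce.Do(f.wg.Done)
	f.wg.Wait()
}

// Finished reports how many Bar calls have completed.
func (f *Foo) Finished() int32 { return atomic.LoadInt32(&f.finished) }

func main() {
	foo := NewFoo()
	for i := 0; i < 5; i++ {
		foo.Bar()
	}
	foo.Close()
	fmt.Println("finished:", foo.Finished())
}
```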
WaitGroup is one way; however, the Go team introduced errgroup for exactly your use case. The most inconvenient part of leaf bebop's answer is its disregard for error handling. Error handling is the reason errgroup exists, and idiomatic Go code should never swallow errors.
However, keeping the signatures of your Foo struct (except for a cosmetic workerNumber), and still without error handling, my proposal looks like this:
package main
import (
"fmt"
"math/rand"
"time"
"golang.org/x/sync/errgroup"
)
type Foo struct {
errg errgroup.Group
}
func NewFoo() *Foo {
foo := &Foo{
errg: errgroup.Group{},
}
return foo
}
func (a *Foo) Bar(workerNumber int) {
a.errg.Go(func() error {
select {
// simulates the long-running calls
case <-time.After(time.Second * time.Duration(rand.Intn(10))):
fmt.Printf("worker %d completed its work\n", workerNumber)
return nil
}
})
}
func (a *Foo) Close() {
a.errg.Wait()
}
func main() {
foo := NewFoo()
for i := 0; i < 10; i++ {
foo.Bar(i)
}
<-time.After(time.Second * 5)
fmt.Println("Waiting for workers to complete...")
foo.Close()
fmt.Println("Done.")
}
The benefit here is that if you introduce error handling in your code (you should), you only need to modify this code slightly. In short, errg.Wait() would return the first Redis error, and Close() could propagate it up through the stack (to main, in this case).
Using the context package as well, you would also be able to immediately cancel any running Redis call if one fails. There are examples of this in the errgroup documentation.
I think waiting indefinitely for all the goroutines to finish is not the right way.
If one of the goroutines gets blocked, or hangs for some reason and never terminates, what should happen: kill the process, or keep waiting for the goroutines to finish?
Instead, you should wait with a timeout and then kill the app regardless of whether all the goroutines have finished.
Edit: Original answer
Thanks, leaf bebop, for pointing it out. I misunderstood the question.
The context package can be used to signal all the goroutines to handle the kill signal.
appCtx, cancel := context.WithCancel(context.Background())
Here appCtx has to be passed to all the goroutines.
On the exit signal, call cancel().
Functions running as goroutines can then decide how to handle the cancelled context.
Using context cancellation in Go
A pattern I use a lot is: https://play.golang.org/p/ibMz36TS62z
package main
import (
"fmt"
"sync"
"time"
)
type response struct {
message string
}
func task(i int, done chan response) {
time.Sleep(1 * time.Second)
done <- response{fmt.Sprintf("%d done", i)}
}
func main() {
responses := GetResponses(10)
fmt.Println("all done", len(responses))
}
func GetResponses(n int) []response {
donequeue := make(chan response)
wg := sync.WaitGroup{}
for i := 0; i < n; i++ {
wg.Add(1)
go func(value int) {
defer wg.Done()
task(value, donequeue)
}(i)
}
go func() {
wg.Wait()
close(donequeue)
}()
responses := []response{}
for result := range donequeue {
responses = append(responses, result)
}
return responses
}
This makes it easy to throttle as well: https://play.golang.org/p/a4MKwJKj634
package main
import (
"fmt"
"sync"
"time"
)
type response struct {
message string
}
func task(i int, done chan response) {
time.Sleep(1 * time.Second)
done <- response{fmt.Sprintf("%d done", i)}
}
func main() {
responses := GetResponses(10, 2)
fmt.Println("all done", len(responses))
}
func GetResponses(n, concurrent int) []response {
throttle := make(chan int, concurrent)
for i := 0; i < concurrent; i++ {
throttle <- i
}
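donequeue := make(chan response, n) // buffered so tasks can finish before the results are drained
wg := sync.WaitGroup{}
for i := 0; i < n; i++ {
wg.Add(1)
<-throttle // acquire a slot; blocks while `concurrent` tasks are running
go func(value int) {
defer wg.Done()
defer func() { throttle <- 1 }() // release the slot only after the task has finished
task(value, donequeue)
}(i)
}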
go func() {
wg.Wait()
close(donequeue)
}()
responses := []response{}
for result := range donequeue {
responses = append(responses, result)
}
return responses
}