How to close a goroutine which runs the flow based logics? - go

I have some goroutine logics like this:
go func() {
do_things_0()
do_things_1()
do_things_2()
do_things_3()
...
...
} ()
When the service receives a request, it will create such goroutine. And the goroutine maybe memory consuming and needs to run more than 30 minutes.
Sometimes, the service may notice the lack of memory, and needs to terminate some goroutines.
My questions are:
How can I terminate the goroutine in the above example?
Is there any way to know the used memory of each goroutine?
Update
I read other SO answers that goroutine can't be killed outside
I suppose that send a signal to the channel handled by the goroutine to make the goroutine quit is only suitable for the for loop based logics.
I am looking for some best practice to close the goroutine for the flow based logics.

You should be able to adapt this easily to a for loop if you range over a list of the functions that are your steps:
package main
import (
"fmt"
"time"
)
func a() { fmt.Printf("a") }
func b() { fmt.Printf("b") }
func c() { fmt.Printf("c") }
func d() { fmt.Printf("d") }
func e() { fmt.Printf("e") }
func f(quit <-chan struct{}) {
for i := 0; i < 10000; i++ {
for _, fn := range []func(){a, b, c, d, e} {
select {
case _, _ = <-quit:
fmt.Println("quit f")
return
default:
fn()
time.Sleep(1 * time.Millisecond)
}
}
}
}
func main() {
quit := make(chan struct{})
fmt.Println("go f")
go f(quit)
fmt.Println("sleep")
time.Sleep(100 * time.Millisecond)
fmt.Println("\nquit")
close(quit)
time.Sleep(10 * time.Millisecond)
fmt.Println("exit")
}
Try it on the playground.
The outer loop is just there to repeat the steps for long enough that we can witness the quit command happening.

Related

Why is my custom "Timeout waiting on channel" not working and how to make it work?

I'm trying to make my own custom "Channel Timeout". More precisely, the time.After function inside it. In other words, I'm trying to implement this:
select {
case v := <-c:
fmt.Println("Value v: ", v)
case <-time.After(1 * time.Second):
fmt.Println("Timeout")
}
But unfortunately I ran into a problem.
My implementation is:
func waitFun(wait int) chan int {
time.Sleep(time.Duration(wait) * time.Second)
c := make(chan int)
c <- wait
return c
}
func main() {
c := make(chan int)
go func() {
time.Sleep(3 * time.Second)
c <- 10
}()
select {
case v := <-c:
fmt.Println("Value v: ", v)
case <-waitFun(1):
fmt.Println("Timeout")
}
time.Sleep(4 * time.Second)
}
For some reason this doesn't work. The error is: all goroutines are asleep - deadlock!. I understand that at some point all goroutines (main and the one introduced with an anonymous function) will go to sleep, but is this the reason for the error or something else? I mean, it's not "infinite sleep" or "infinite wait for something", so it's not a dead lock, right? Additionally, using time.After also sleeps goroutines, right? What do I need to change to make my program work correctly?
select statement will evaluate all cases when it is run, so this code will actually wait for the waitFun to return before it starts listening to any channels. You have to change the waitFun to return the channel immediately:
func waitFun(wait int) chan int {
c := make(chan int)
go func() {
time.Sleep(time.Duration(wait) * time.Second)
c <- wait
}()
return c
}

GO routine never exits based on stop condition - unable to find the reason

In this example, we have a worker. The idea here is simulate clean shutdown of all go routines based on a condition.
In this case, go routines get spun - based on workers count. Each go routine reads the channel, does some work and sends output to the outputChannel.
The main go routine reads this output and prints it. To simulate a stop condition, the doneChannel is closed. Expected outcome is that select inside each go routine will pick this up and execute return which in turn will call the defer println. The actual output is that it never gets called and main exits.
Not sure what's the reason behind this.
package main
import (
"log"
"time"
)
const jobs = 100
const workers = 1
var timeout = time.After(5 * time.Second)
func main() {
doneChannel := make(chan interface{})
outputChannel := make(chan int)
numberStream := generator()
for i := 1; i <= workers; i++ {
go worker(doneChannel, numberStream, outputChannel)
}
// listen for output
loop:
for {
select {
case i := <-outputChannel:
log.Println(i)
case <-timeout:
// before you timeout cleanup go routines
break loop
}
}
close(doneChannel)
time.Sleep(5 * time.Second)
log.Println("main exited")
}
func generator() <-chan int {
defer log.Println("generator completed !")
c := make(chan int)
go func() {
for i := 1; i <= jobs; i++ {
c <- i
}
defer close(c)
}()
return c
}
func worker(done <-chan interface{}, c <-chan int, output chan<- int) {
// this will be a go routine
// Do some work and send results to output Channel.
// Incase if the done channel is called kill the go routine.
defer log.Println("go routines exited")
for {
select {
case <-done:
log.Println("here")
return
case i := <-c:
time.Sleep(1 * time.Second) // worker delay
output <- i * 100
}
}
}
When your main loop finishes during the timeout, you continue your program and
Close done channel
Print message
Exit
There is no reason to wait for any goroutine to process the signal of this channel.
If you add a small sleep you will see some messages
In real scenarios we use a waitgroup to be sure all goroutine finish properly

Do the 'done' channel and default case lead to goroutine leak

There are two similar cases I would like to compare with you - the only difference is the way of handling values generation
the first case: values generation in one of case of select
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
generateValues := func(done <-chan interface{}) <-chan int {
values := make(chan int)
go func() {
defer fmt.Println("All values generated")
defer close(values)
for {
select {
case <-done:
fmt.Println("DONE")
return
case values <- rand.Int():
fmt.Println("Generated")
}
}
}()
return values
}
done := make(chan interface{})
values := generateValues(done)
for i := 0; i < 3; i++ {
fmt.Printf("Received value: %v\n", <-values)
}
fmt.Println("Closing the channel")
close(done)
time.Sleep(5 * time.Second)
}
Go playground: https://go.dev/play/p/edlOSqdZ9ys
the second case: value generation in default case
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
generateValues := func(done <-chan interface{}) <-chan int {
values := make(chan int)
go func() {
defer fmt.Println("All values generated")
defer close(values)
for {
select {
case <-done:
fmt.Println("DONE")
return
default:
values <- rand.Int()
fmt.Println("Generated")
}
}
}()
return values
}
done := make(chan interface{})
values := generateValues(done)
for i := 0; i < 3; i++ {
fmt.Printf("Received value: %v\n", <-values)
}
fmt.Println("Closing the channel")
close(done)
time.Sleep(5 * time.Second)
}
Go playground: https://go.dev/play/p/edlOSqdZ9ys
As you can see the second case seems leading to the situation that the 'Done' is not printed and calls related to 'defer' are not invoked. I believe we have here goroutine leaks, but cannot clearly explain it. I expected the same behaviour like in the first case.
Could someone please help in understanding the difference between them?
In the second case, the generating goroutine is not likely to receive the done message. Since the default case is always enabled, after the main goroutine receives the last value, the generating goroutine makes another round, falls into the default case, and blocks waiting to write to the values channel. While it is waiting there, the main goroutine closes the done channel and terminates.
This doesn't mean there doesn't exist an execution path where the generating goroutine doesn't receive the done channel. For this to happen, immediately after sending the last value, the generating goroutine must be preempted by the main goroutine that runs until it closes the done channel. Then if the generating goroutine is scheduled, it can receive the done signal. However, this sequence of events is highly unlikely.

How to coordinate shutdown with many goroutines

Say I have a function
type Foo struct {}
func (a *Foo) Bar() {
// some expensive work - does some calls to redis
}
which gets executed within a goroutine at some point in my app. Lots of these may be executing at any given point. Prior to application termination, I would like to ensure all remaining goroutines have finished their work.
Can I do something like this:
type Foo struct {
wg sync.WaitGroup
}
func (a *Foo) Close() {
a.wg.Wait()
}
func (a *Foo) Bar() {
a.wg.Add(1)
defer a.wg.Done()
// some expensive work - does some calls to redis
}
Assuming here that Bar gets executed within a goroutine and many of these may be running at a given time and that Bar should not be called once Close is called and Close is called upon a sigterm or sigint.
Does this make sense?
Usually I would see the Bar function look like this:
func (a *Foo) Bar() {
a.wg.Add(1)
go func() {
defer a.wg.Done()
// some expensive work - does some calls to redis
}()
}
Yes, WaitGroup is the right answer. You can use WaitGroup.Add at anytime that the counter is greater than zero, as per doc.
Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned. See the WaitGroup example.
But one trick is that, you should always keep the counter greater than zero, before Close is called. That usually means you should call wg.Add in NewFoo (or something like that) and wg.Done in Close. And to prevent multiple calls to Done ruining the wait group, you should wrap Close into sync.Once. You may also want to prevent new Bar() from being called.
WaitGroup is one way, however, the Go team introduced the errgroup for your use case exactly. The most inconvenient part of leaf bebop's answer, is the disregard for error handling. Error handling is the reason errgroup exists. And idiomatic go code should never swallow errors.
However, keeping the signatures of your Foo struct, (except a cosmetic workerNumber)—and no error handling—my proposal looks like this:
package main
import (
"fmt"
"math/rand"
"time"
"golang.org/x/sync/errgroup"
)
type Foo struct {
errg errgroup.Group
}
func NewFoo() *Foo {
foo := &Foo{
errg: errgroup.Group{},
}
return foo
}
func (a *Foo) Bar(workerNumber int) {
a.errg.Go(func() error {
select {
// simulates the long running clals
case <-time.After(time.Second * time.Duration(rand.Intn(10))):
fmt.Println(fmt.Sprintf("worker %d completed its work", workerNumber))
return nil
}
})
}
func (a *Foo) Close() {
a.errg.Wait()
}
func main() {
foo := NewFoo()
for i := 0; i < 10; i++ {
foo.Bar(i)
}
<-time.After(time.Second * 5)
fmt.Println("Waiting for workers to complete...")
foo.Close()
fmt.Println("Done.")
}
The benefit here, is that if you introduce error handling in your code (you should), you only need to slightly modify this code: In short, errg.Wait() would return the first redis error, and Close() could propagate this up through the stack (to main, in this case).
Utilizing the context.Context package as well, you would also be able to immediately cancel any running redis call, if one fails. There are examples of this in the errgroup documentation.
I think waiting indefinitely for all the go routines to finish is not the right way.
If one of the go routines get blocked or say it hangs due to some reason and never terminates successfully, what should happen kill the process or wait for go routines to finish ?
Instead you should wait with some timeout and kill the app irrespective of whether all the routines have finished or not.
Edit: Original ans
Thanks #leaf bebop for pointing it out. I misunderstood the question.
Context package can be used to signal all the go routines to handle kill signal.
appCtx, cancel := context.WithCancel(context.Background())
Here appCtx will have to be passed to all the go routines.
On exit signal call cancel().
functions running as go routines can handle how to handle cancel context.
Using context cancellation in Go
A pattern i use a lot is: https://play.golang.org/p/ibMz36TS62z
package main
import (
"fmt"
"sync"
"time"
)
type response struct {
message string
}
func task(i int, done chan response) {
time.Sleep(1 * time.Second)
done <- response{fmt.Sprintf("%d done", i)}
}
func main() {
responses := GetResponses(10)
fmt.Println("all done", len(responses))
}
func GetResponses(n int) []response {
donequeue := make(chan response)
wg := sync.WaitGroup{}
for i := 0; i < n; i++ {
wg.Add(1)
go func(value int) {
defer wg.Done()
task(value, donequeue)
}(i)
}
go func() {
wg.Wait()
close(donequeue)
}()
responses := []response{}
for result := range donequeue {
responses = append(responses, result)
}
return responses
}
this makes it easy to throttle as well: https://play.golang.org/p/a4MKwJKj634
package main
import (
"fmt"
"sync"
"time"
)
type response struct {
message string
}
func task(i int, done chan response) {
time.Sleep(1 * time.Second)
done <- response{fmt.Sprintf("%d done", i)}
}
func main() {
responses := GetResponses(10, 2)
fmt.Println("all done", len(responses))
}
func GetResponses(n, concurrent int) []response {
throttle := make(chan int, concurrent)
for i := 0; i < concurrent; i++ {
throttle <- i
}
donequeue := make(chan response)
wg := sync.WaitGroup{}
for i := 0; i < n; i++ {
wg.Add(1)
<-throttle
go func(value int) {
defer wg.Done()
throttle <- 1
task(value, donequeue)
}(i)
}
go func() {
wg.Wait()
close(donequeue)
}()
responses := []response{}
for result := range donequeue {
responses = append(responses, result)
}
return responses
}

Golang, proper way to restart a routine that panicked

I have the following sample code. I want to maintain 4 goroutines running at all times. They have the possibility of panicking. In the case of the panic, I have a recover where I restart the goroutine.
The way I implemented works but I am not sure whether its the correct and proper way to do this. Any thoughts
package main
import (
"fmt"
"time"
)
var gVar string
var pCount int
func pinger(c chan int) {
for i := 0; ; i++ {
fmt.Println("adding ", i)
c <- i
}
}
func printer(id int, c chan int) {
defer func() {
if err := recover(); err != nil {
fmt.Println("HERE", id)
fmt.Println(err)
pCount++
if pCount == 5 {
panic("TOO MANY PANICS")
} else {
go printer(id, c)
}
}
}()
for {
msg := <-c
fmt.Println(id, "- ping", msg, gVar)
if msg%5 == 0 {
panic("PANIC")
}
time.Sleep(time.Second * 1)
}
}
func main() {
var c chan int = make(chan int, 2)
gVar = "Preflight"
pCount = 0
go pinger(c)
go printer(1, c)
go printer(2, c)
go printer(3, c)
go printer(4, c)
var input string
fmt.Scanln(&input)
}
You can extract the recover logic in a function such as:
func recoverer(maxPanics, id int, f func()) {
defer func() {
if err := recover(); err != nil {
fmt.Println("HERE", id)
fmt.Println(err)
if maxPanics == 0 {
panic("TOO MANY PANICS")
} else {
go recoverer(maxPanics-1, id, f)
}
}
}()
f()
}
And then use it like:
go recoverer(5, 1, func() { printer(1, c) })
Like Zan Lynx's answer, I'd like to share another way to do it (although it's pretty much similar to OP's way.) I used an additional buffered channel ch. When a goroutine panics, the recovery function inside the goroutine send its identity i to ch. In for loop at the bottom of main(), it detects which goroutine's in panic and whether to restart by receiving values from ch.
Run in Go Playground
package main
import (
"fmt"
"time"
)
func main() {
var pCount int
ch := make(chan int, 5)
f := func(i int) {
defer func() {
if err := recover(); err != nil {
ch <- i
}
}()
fmt.Printf("goroutine f(%v) started\n", i)
time.Sleep(1000 * time.Millisecond)
panic("goroutine in panic")
}
go f(1)
go f(2)
go f(3)
go f(4)
for {
i := <-ch
pCount++
if pCount >= 5 {
fmt.Println("Too many panics")
break
}
fmt.Printf("Detected goroutine f(%v) panic, will restart\n", i)
f(i)
}
}
Oh, I am not saying that the following is more correct than your way. It is just another way to do it.
Create another function, call it printerRecover or something like it, and do your defer / recover in there. Then in printer just loop on calling printerRecover. Add in function return values to check if you need the goroutine to exit for some reason.
The way you implemented is correct. Just for me the approach to maintain exactly 4 routines running at all times looks not much go_way, either handling routine's ID, either spawning in defer which may leads unpredictable stack due to closure. I don't think you can efficiently balance resource this way. Why don't you like to simple spawn worker when it needed
func main() {
...
go func(tasks chan int){ //multiplexer
for {
task = <-tasks //when needed
go printer(task) //just spawns handler
}
}(ch)
...
}
and let runtime do its job? This way things are done in stdlib listeners/servers and them known to be efficient enough. goroutines are very lightweight to spawn and runtime is quite smart to balance load. Sure you must to recover either way. It is my very personal opinion.

Resources