Behavior of server.GracefulStop() in golang - go

I have a gRPC server, and I have implemented graceful shutdown of my gRPC server something like this
fun main() {
//Some code
term := make(chan os.Signal)
go func() {
if err := grpcServer.Serve(lis); err != nil {
term <- syscall.SIGINT
}
}()
signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)
<-term
server.GracefulStop()
closeDbConnections()
}
This works fine.
If instead I write the grpcServer.Serve() logic in main goroutine and instead put the shutdown handler logic into another goroutine, statements after server.GracefulStop() usually do not execute. Some DbConnections are closed, if closeDbConnections() is executed at all.
server.GracefulStop() is a blocking call. Definitely grpcServer.Serve() finishes before server.GracefulStop() completes. So, how long does main goroutine take to stop after this call returns?
The problematic code
func main() {
term := make(chan os.Signal)
go func() {
signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)
<-term
server.GracefulStop()
closeDbConnections()
}()
if err := grpcServer.Serve(lis); err != nil {
term <- syscall.SIGINT
}
}
This case does not work as expected. After server.GracefulStop() is done, closeDbConnections() may or may not run (usually does not run to completion). I was testing the later case by sending SIGINT by hitting Ctrl-C from my terminal.
Can someone please explain this behavior?

I'm not sure about your question (please clarify it), but I would suggest you to refactor your main in this way:
func main() {
// ...
errChan := make(chan error)
stopChan := make(chan os.Signal)
// bind OS events to the signal channel
signal.Notify(stopChan, syscall.SIGTERM, syscall.SIGINT)
// run blocking call in a separate goroutine, report errors via channel
go func() {
if err := grpcServer.Serve(lis); err != nil {
errChan <- err
}
}()
// terminate your environment gracefully before leaving main function
defer func() {
server.GracefulStop()
closeDbConnections()
}()
// block until either OS signal, or server fatal error
select {
case err := <-errChan:
log.Printf("Fatal error: %v\n", err)
case <-stopChan:
}
I don't think it's a good idea to mix system events and server errors, like you do in your example: in case if Serve fails, you just ignore the error and emit system event, which actually didn't happen. Try another approach when there are two transports (channels) for two different kind of event that cause process termination.

Related

How to include goroutine into a context?

I'm working on a Go project that require calling an initiation function (initFunction) in a separated goroutine (to ensure this function does not interfere with the rest of the project). initFunction must not take more than 30 seconds, so I thought I would use context.WithTimeout. Lastly, initFunction must be able to notify errors to the caller, so I thought of making an error channel and calling initFunction from an anonymous function, to recieve and report the error.
func RunInitGoRoutine(initFunction func(config string)error) error {
initErr := make(chan error)
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Seconds)
go func() {
<-ctx.Done() // Line 7
err := initFunction(config)
initErr <-err
}()
select {
case res := <-initErr:
return res.err
case <-ctx.Done():
err := errors.New("Deadline")
return err
}
}
I'm quite new to Go, so I'm asking for feedbacks about the above code.
I have some doubt about Line 7. I used this to ensure the anonymous function is "included" under ctx and is therefore killed and freed and everything once timeout expires, but I'm not sure I have done the right thing.
Second thing is, I know I should be calling cancel( ) somewhere, but I can't put my finger around where.
Lastly, really any feedback is welcome, being it about efficency, style, correctness or anything.
In Go the pratice is to communicate via channels. So best thing is probably to share a channel on your context so others can consume from the channel.
As you are stating you are new to Go, I wrote a whole bunch of articles on Go (Beginner level) https://marcofranssen.nl/categories/golang.
Read from old to new to get familiar with the language.
Regarding the channel specifics you should have a look at this article.
https://marcofranssen.nl/concurrency-in-go
A pratical example of a webserver listening for ctrl+c and then gracefully shutting down the server using channels is described in this blog post.
https://marcofranssen.nl/improved-graceful-shutdown-webserver
In essence we run the server in a background routine
go func() {
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
srv.l.Fatal("Could not listen on", zap.String("addr", srv.Addr), zap.Error(err))
}
}()
and then we have some code that is blocking the main routine by listening on a channel for the shutdown signal.
quit := make(chan os.Signal, 1)
signal.Notify(quit, os.Interrupt)
sig := <-quit
srv.l.Info("Server is shutting down", zap.String("reason", sig.String()))
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
srv.SetKeepAlivesEnabled(false)
if err := srv.Shutdown(ctx); err != nil {
srv.l.Fatal("Could not gracefully shutdown the server", zap.Error(err))
}
srv.l.Info("Server stopped")
This is very similar to your usecase. So running your init in a background routine and then consume the channel waiting for the result of this init routine.
package main
import (
"fmt"
"time"
)
type InitResult struct {
Message string
}
func main() {
initResult := make(chan InitResult, 0)
go func(c chan<- InitResult) {
time.Sleep(5 * time.Second)
// here we are publishing the result on the channel
c <- InitResult{Message: "Initialization succeeded"}
}(initResult)
fmt.Println("Started initializing")
// here we have a blocking operation consuming the channel
res := <-initResult
fmt.Printf("Init result: %s", res.Message)
}
https://play.golang.org/p/_YGIrdNVZx6
You could also add an error field on the struct so you could do you usual way of error checking.

How chan bool is making goroutine waiting?

I'm building an app to run a command every time the code changes. I'm using a fsnotify for this feature. But, I can't understand how it is waiting a main goroutine.
I found that using sync.WaitGroup is more idiomatic, but I'm curious how done chan bool makes a goroutine is waiting in fsnotify example code.
I've tried to remove done in the example code of fsnotify, but it's not waiting a goroutine, just is exited.
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.Fatal(err)
}
defer watcher.Close()
done := make(chan bool)
go func() {
for {
select {
case event, ok := <-watcher.Events:
if !ok {
return
}
log.Println("event:", event)
if event.Op&fsnotify.Write == fsnotify.Write {
log.Println("modified file:", event.Name)
}
case err, ok := <-watcher.Errors:
if !ok {
return
}
log.Println("error:", err)
}
}
}()
err = watcher.Add("/tmp/foo")
if err != nil {
log.Fatal(err)
}
<-done
I'm not entirely sure what you're asking, but there's a subtle bug in the code you've provided.
A done channel is a common way to block until an action completes. It is used like this:
done := make(chan X) // Where X is any type
go func() {
// Some logic, possibly in a loop
close(done)
}()
// Other logic
<-done // Wait for `done` to be closed
The type of the channel is unimportant, as no data is (necissarily) sent over the channel, so bool works, but struct{} is more idiomatic, as it indicates that no data can be sent.
Your example almost does this, except that it never calls close(done). This is a bug. It means that the code will always wait forever at <-done, thus negating the entire purpose of a done channel. Your example code will never exit.
This means the code, as you have provided, could be also written as:
go func() {
// Do stuff
}()
// Do other stuff
<Any code that blocks forever>
Because there are countless ways to block forever--none of them ever useful in practice--the channel in your example is not needed.
As per my study, I found an answer from one guy in reddit.com. This is kind of trick though, using <-done makes the main goroutine waiting an any value from chan done, eventually this app keeps running for fsnotify to watch and send a event to the main goroutine.

Do goroutines with receiving channel as parameter stop, when the channel is closed?

I have been reading "Building microservices with go" and the book introduces apache/go-resiliency/deadline package for handling timeouts.
deadline.go
// Package deadline implements the deadline (also known as "timeout") resiliency pattern for Go.
package deadline
import (
"errors"
"time"
)
// ErrTimedOut is the error returned from Run when the deadline expires.
var ErrTimedOut = errors.New("timed out waiting for function to finish")
// Deadline implements the deadline/timeout resiliency pattern.
type Deadline struct {
timeout time.Duration
}
// New constructs a new Deadline with the given timeout.
func New(timeout time.Duration) *Deadline {
return &Deadline{
timeout: timeout,
}
}
// Run runs the given function, passing it a stopper channel. If the deadline passes before
// the function finishes executing, Run returns ErrTimeOut to the caller and closes the stopper
// channel so that the work function can attempt to exit gracefully. It does not (and cannot)
// simply kill the running function, so if it doesn't respect the stopper channel then it may
// keep running after the deadline passes. If the function finishes before the deadline, then
// the return value of the function is returned from Run.
func (d *Deadline) Run(work func(<-chan struct{}) error) error {
result := make(chan error)
stopper := make(chan struct{})
go func() {
result <- work(stopper)
}()
select {
case ret := <-result:
return ret
case <-time.After(d.timeout):
close(stopper)
return ErrTimedOut
}
}
deadline_test.go
package deadline
import (
"errors"
"testing"
"time"
)
func takesFiveMillis(stopper <-chan struct{}) error {
time.Sleep(5 * time.Millisecond)
return nil
}
func takesTwentyMillis(stopper <-chan struct{}) error {
time.Sleep(20 * time.Millisecond)
return nil
}
func returnsError(stopper <-chan struct{}) error {
return errors.New("foo")
}
func TestDeadline(t *testing.T) {
dl := New(10 * time.Millisecond)
if err := dl.Run(takesFiveMillis); err != nil {
t.Error(err)
}
if err := dl.Run(takesTwentyMillis); err != ErrTimedOut {
t.Error(err)
}
if err := dl.Run(returnsError); err.Error() != "foo" {
t.Error(err)
}
done := make(chan struct{})
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
return nil
})
if err != ErrTimedOut {
t.Error(err)
}
<-done
}
func ExampleDeadline() {
dl := New(1 * time.Second)
err := dl.Run(func(stopper <-chan struct{}) error {
// do something possibly slow
// check stopper function and give up if timed out
return nil
})
switch err {
case ErrTimedOut:
// execution took too long, oops
default:
// some other error
}
}
1st question
// in deadline_test.go
if err := dl.Run(takesTwentyMillis); err != ErrTimedOut {
t.Error(err)
}
I have problem understanding the execution flow of above code. As far as I understand, because the takesTwentyMillis function sleeps longer than the set timeout duration of 10 milliseconds,
// in deadline.go
case <-time.After(d.timeout):
close(stopper)
return ErrTimedOut
time.After emits current time, and this case is selected. Then the stopper channel is closed and ErrTimeout is returned.
What I do not understand is, what closing the stopper channel does to the anonymous goroutine that might still be running
I think, when the stopper channel is closed, the below goroutine might still be running.
go func() {
result <- work(stopper)
}()
(Please correct me if I'm wrong here) I think after close(stopper), this goroutine will call takesTwentyMillis(=work function) with stopper channel as its parameter. And the function will proceed and sleep for 20 milliseconds and return nil to pass to result channel. And the main() ends here, right?
I do not see what is the point of closing the stopper channel here. The takesTwentyMillis function does not seem to use the channel within the function body anyway :(.
2nd question
// in deadline_test.go within TestDeadline()
done := make(chan struct{})
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
return nil
})
if err != ErrTimedOut {
t.Error(err)
}
<-done
This is the part I do not understand completely. I think when dl.Run is run, stopper channel is initialized. But because there is no value in the stopper channel, the function call will be blocked at <-stopper...but because I do not understand this code, I do not see why this code exists in the first place (i.e. what this code is trying to test, and how it is executed, etc).
3rd(additional) question regarding the 2nd question
So I understand that when Run function in the 2nd question triggers the stopper channel to close, the worker function gets the signal. And the worker closes the done channel and returns nil.
I used delve(=go debugger) to see this, and the gdb takes me to the goroutine in deadline.go after the line return nil.
err := dl.Run(func(stopper <-chan struct{}) error {
<-stopper
close(done)
--> return nil
})
After typing n for stepping over to the next line, delve takes me here
go func() {
--> result <- work(stopper)
}()
And the process kind of finishes here because when I type n again the command line prompts PASS and process exits. Why does the process finishes here? The work(stopper) seems to return nil, which should then be passed to result channel right? But this line does not seem to execute for some reason.
I know the main goroutine, which is the Run function, has already returned ErrTimedOut. So I guess it has something to do with this?
1st question
The use of the stopper channel is to signal the function e.g. takesTwentyMillis that it's deadline is reached and the caller no longer cares about its result. Usually this means that the worker function like takesTwentyMillis should check if the stopper channel is already closed so that it may cancel it's work. Still, checking for the stopper channel is the worker function's choice. It may or may not check the channel.
func takesTwentyMillis(stopper <-chan struct{}) error {
for i := 0; i < 20; i++ {
select {
case <-stopper:
// caller doesn't care anymore might as well stop working
return nil
case <-time.After(time.Second): // simulating work
}
}
// work is done
return nil
}
2nd question
This part of Deadline.Run() will close the stopper channel.
case <-time.After(d.timeout):
close(stopper)
Reading on a closed channel (<-stopper) will return a zero value for that channel immediately. I think it's just testing for a worker function that ultimately times-out.

A question about main routine and a child routine listening on the same channel simultaneously

func main() {
c := make(chan os.Signal, 1)
signal.Notify(c)
ticker := time.NewTicker(time.Second)
stop := make(chan bool)
go func() {
defer func() { stop <- true }()
for {
select {
case <-ticker.C:
fmt.Println("Tick")
case <-stop:
fmt.Println("Goroutine closing")
return
}
}
}()
<-c
ticker.Stop()
stop <- true
<-stop
fmt.Println("Application stopped")
}
No matter how many times I run the code above, I got the same result. That is, "Goroutine closing" is always printed before "Application stopped" after I press Ctrl+C.
I think, theoretically, there is a chance that "Goroutine closing" won't be printed at all. Am I right? Unfortunately, I never get this theoretical result.
BTW: I know reading and writing a channel in one routine should be avoided. Ignore that temporarily.
In your case, Goroutine closing will always be executed and it will always be printed before Application stopped, because your stop channel is not buffered. This means that the sending will block until the result is received.
In your code, the stop <- true in your main will block until the goroutine has received the value, causing the channel to be empty again. Then the <-stop in your main will block until another value is sent to the channel, which happens when your goroutine returns after printing Goroutine closing.
If you would initialize your channel in a buffered fashion
stop := make(chan bool, 1)
then Goroutine closing might not be executed. To see this, you can add a time.Sleep right after printing Tick, as this makes this case more likely (it will occur everytime you press Ctrl+C during the sleep).
Using a sync.WaitGroup to wait for goroutines to finish is a good alternative, especially if you have to wait for more than one goroutine. You can also use a context.Context to stop goroutines. Reworking your code to use these two methods could look something like this:
func main() {
c := make(chan os.Signal, 1)
signal.Notify(c)
ticker := time.NewTicker(time.Second)
ctx, cancel := context.WithCancel(context.Background())
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer func() { wg.Done() }()
for {
select {
case <-ctx.Done():
fmt.Println("Goroutine closing")
return
case <-ticker.C:
fmt.Println("Tick")
time.Sleep(time.Second)
}
}
}()
<-c
ticker.Stop()
cancel()
wg.Wait()
fmt.Println("Application stopped")
}

Leaking goroutine when a non-blocking readline hangs

Assuming you have a structure like this:
ch := make(chan string)
errCh := make(chan error)
go func() {
line, _, err := bufio.NewReader(r).ReadLine()
if err != nil {
errCh <- err
} else {
ch <- string(line)
}
}()
select {
case err := <-errCh:
return "", err
case line := <-ch:
return line, nil
case <-time.After(5 * time.Second):
return "", TimeoutError
}
In the case of the 5 second timeout, the goroutine hangs until ReadLine returns, which may never happen. My project is a long-running server, so I don't want a buildup of stuck goroutines.
ReadLine will not return until either the process exits or the method reads a line. There's no deadline or timeout mechanism for pipes.
The goroutine will block if the call to ReadLine returns after the timeout. This can be fixed by using buffered channels:
ch := make(chan string, 1)
errCh := make(chan error, 1)
The application should call Wait to cleanup resources associated with the command. The goroutine is a good place to call it:
go func() {
line, _, err := bufio.NewReader(r).ReadLine()
if err != nil {
errCh <- err
} else {
ch <- string(line)
}
cmd.Wait() // <-- add this line
}()
This will cause the goroutine to block, the very thing you are trying to avoid. The alternative is that the application leaks resources for each command.

Resources