package main
import (
"context"
"fmt"
"sync"
"time"
)
func myfunc(ctx context.Context) {
for {
select {
case <-ctx.Done():
fmt.Printf("Ctx is kicking in with error:%+v\n", ctx.Err())
return
default:
time.Sleep(15 * time.Second)
fmt.Printf("I was not canceled\n")
return
}
}
}
func main() {
ctx, cancel := context.WithTimeout(
context.Background(),
time.Duration(3*time.Second))
defer cancel()
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
myfunc(ctx)
}()
wg.Wait()
fmt.Printf("In main, ctx err is %+v\n", ctx.Err())
}
The snippet above prints the following output:
I was not canceled
In main, ctx err is context deadline exceeded
Process finished with exit code 0
I understand that the context times out after 3 seconds, which is why I get the expected error when I call ctx.Err() at the end. I also understand that in myfunc, once select matches the default case, it won't match on ctx.Done(). What I do not understand is how to make myfunc abort after 3 seconds using the context. As written, it does not terminate after 3 seconds, so I am trying to understand how Go's context can help me with this.
If you want to use the timeout and cancellation features of the context, then in your case ctx.Done() needs to be handled synchronously, meaning your code has to block while waiting on it.
From the documentation at https://golang.org/pkg/context/#Context:
Done returns a channel that's closed when work done on behalf of this context should be canceled. Done may return nil if this context can never be canceled. Successive calls to Done return the same value.
So a receive from ctx.Done() unblocks in two situations:
when the context's timeout expires
when the context is canceled explicitly (by calling cancel())
Once either happens, ctx.Err() returns a non-nil error.
We can check that error to find out whether the context was canceled explicitly or its deadline was exceeded.
The context package provides two error values, context.DeadlineExceeded and context.Canceled; these tell us why <-ctx.Done() unblocked.
Example #1 scenario: context cancelled by force (via cancel())
In this test, we cancel the context before the timeout expires, so <-ctx.Done() unblocks and ctx.Err() is context.Canceled.
ctx, cancel := context.WithTimeout(
context.Background(),
time.Duration(3*time.Second))
go func(ctx context.Context) {
// simulate a process that takes 2 seconds to complete
time.Sleep(2 * time.Second)
// cancel context by force, assuming the whole process is complete
cancel()
}(ctx)
select {
case <-ctx.Done():
switch ctx.Err() {
case context.DeadlineExceeded:
fmt.Println("context timeout exceeded")
case context.Canceled:
fmt.Println("context cancelled by force. whole process is complete")
}
}
Output:
$ go run test.go
context cancelled by force. whole process is complete
Example #2 scenario: context timeout exceeded
In this scenario, the process takes longer than the context timeout, so <-ctx.Done() unblocks when the deadline passes, before cancel() is ever called.
ctx, cancel := context.WithTimeout(
context.Background(),
time.Duration(3*time.Second))
go func(ctx context.Context) {
// simulate a process that takes 4 seconds to complete
time.Sleep(4 * time.Second)
// cancel context by force, assuming the whole process is complete
cancel()
}(ctx)
select {
case <-ctx.Done():
switch ctx.Err() {
case context.DeadlineExceeded:
fmt.Println("context timeout exceeded")
case context.Canceled:
fmt.Println("context cancelled by force. whole process is complete")
}
}
Output:
$ go run test.go
context timeout exceeded
Example #3 scenario: process stopped early because an error occurred
There might be situations where we need to stop the goroutine in the middle of the process because an error occurred. Sometimes we might also need to retrieve that error object in the main routine.
To achieve that, we need an additional channel to transport the error object from the goroutine to the main routine.
In the example below, I've prepared a channel called chErr. Whenever an error happens in the middle of the goroutine's process, we send that error object through the channel and then stop the process immediately.
ctx, cancel := context.WithTimeout(
context.Background(),
time.Duration(3*time.Second))
chErr := make(chan error, 1) // buffered, so the goroutine can send and exit even if the select below has already returned
go func(ctx context.Context) {
// ... some process ...
if err != nil {
// an error occurred: report it to the main routine and stop
chErr <- err
return
}
// ... some other process ...
// cancel context by force, assuming the whole process is complete
cancel()
}(ctx)
select {
case <-ctx.Done():
switch ctx.Err() {
case context.DeadlineExceeded:
fmt.Println("context timeout exceeded")
case context.Canceled:
fmt.Println("context cancelled by force. whole process is complete")
}
case err := <-chErr:
fmt.Println("process fail causing by some error:", err.Error())
}
Additional info #1: deferring cancel() right after the context is initialized
As per context documentation regarding the cancel() function:
Canceling this context releases resources associated with it, so code should call cancel as soon as the operations running in this Context complete.
It's good practice to always defer cancel() right after the context is created, regardless of whether it is also called within the goroutine: a CancelFunc may be called more than once, and calls after the first do nothing. This ensures the context is always canceled, and its resources released, once the enclosing function returns.
ctx, cancel := context.WithTimeout(
context.Background(),
time.Duration(3*time.Second))
defer cancel()
// ...
Additional info #2: defer cancel() call within goroutine
You can use defer on the cancel() statement within the goroutine (if you want).
// ...
go func(ctx context.Context) {
defer cancel()
// ...
}(ctx)
// ...
In your for ... select, you have 2 cases: case <-ctx.Done(): and default:. When your code reaches the select, it enters the default case because the context is not yet canceled. There it sleeps for 15 seconds (time.Sleep cannot be interrupted by the context) and then returns, breaking your loop. In other words, it isn't blocking/waiting for your context to be canceled.
If you want your code to do what you are describing, your select needs cases for both the context being canceled and your own 15-second timeout.
select {
case <-ctx.Done(): // context was cancelled
fmt.Printf("Ctx is kicking in with error:%+v\n", ctx.Err())
return
case <-time.After(15 * time.Second): // 15 seconds have elapsed
fmt.Printf("I was not canceled\n")
return
}
Now, your code will block when it hits select, rather than entering the default case and breaking your loop.
Hello all,
I'm new to Go and was debugging timeout issues in a production environment. Before making a call to the server, we add a 50 ms timeout to the context and fire the server call. If the response is not received within 50 ms, we expect the application to move on and not wait for the response.
But while debugging, I captured the duration between firing the server call and receiving the response (or an error), and to my surprise the value is at times much higher than 50 ms.
Client code:
ctx, cancel := context.WithTimeout(ctx, e.opts.Timeout)
defer cancel()
fireServerCall(ctx)
..
..
func fireServerCall(ctx context.Context) {
	startTime := time.Now()
	// call to the server
	res, err := callToServer(ctx)
	if err != nil {
		// capture failure latency
		return ....
	}
	// capture success latency
	return ....
}
Has anyone ever faced a similar issue? Is this expected behaviour? How did you handle such cases?
Am I doing something incorrectly? Suggestions are welcome :)
Edit:
I am passing the context in my original code but forgot to show it here; I've just added it. That means I am passing the same context on which my client waits up to 50 ms for the server to respond.
You should pass the created context to the fireServerCall and callToServer functions.
callToServer should use the passed context and monitor its ctx.Done() channel to stop its execution accordingly.
In reply to the comment by @Bishnu:
Don't think this is needed. Did a test and even without passing ctx to callToServer() it works. The behaviour is not as expected under high load. Can you kindly share some documentation/test for what you have pointed out here?
A context timeout simply can't work unless the context is passed down and its Done() channel is checked. A context is not magic: simplified, it is just a struct with a done channel that is closed when the cancel function is called or when the timeout occurs. Monitoring this channel is the responsibility of the innermost function that accepts the context.
Example:
package main
import (
"context"
"fmt"
"time"
)
func callToServer(ctx context.Context) {
now := time.Now()
select {
case <-ctx.Done(): // context cancelled via cancel or deadline
case <-time.After(1 * time.Second): // emulate external call
}
fmt.Printf("callToServer: %v\n", time.Since(now))
}
func callToServerContextAgnostic(ctx context.Context) {
now := time.Now()
select {
case <-time.After(2 * time.Second): // emulate external call
}
fmt.Printf("callToServerContextAgnostic: %v\n", time.Since(now))
}
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
callToServer(ctx)
ctx2, cancel2 := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel2()
callToServerContextAgnostic(ctx2)
}
Results:
callToServer: 100ms
callToServerContextAgnostic: 2s
You can launch it on Go Playground: https://go.dev/play/p/tIxjHxUzYfh
Note that many clients (from the standard library or third-party libraries) monitor the Done channel themselves.
For example, the standard HTTP client:
c := &http.Client{} // client for all requests
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(time.Millisecond*100))
defer cancel()
req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://google.com", nil)
if err != nil {
log.Fatal(err)
}
resp, err := c.Do(req) // will monitor `Done` channel for you
Some docs and articles:
https://pkg.go.dev/context
https://www.digitalocean.com/community/tutorials/how-to-use-contexts-in-go
I'm having difficulty getting my head around context cancel functions, and at which point calling the cancel func causes a deadlock.
I have a main method that declares a context, and I am passing its cancel function to two goroutines:
ctx := context.Background()
ctx, cancel := context.WithCancel(ctx)
go runService(ctx, wg, cancel, apiChan)
go api.Run(cancel, wg, apiChan, aviorDb)
I use this context in a service function (an infinite loop that stops once the context is canceled).
I control this by calling the cancel function from another goroutine.
runService is a long-running operation and looks similar to this:
func runService(ctx context.Context, wg *sync.WaitGroup, cancel context.CancelFunc, apiChan chan string) {
MainLoop:
for {
// this is the long running operation
worker.ProcessJob(dataStore, client, job, resumeChan)
select {
case <-ctx.Done():
_ = glg.Info("service stop signal received")
break MainLoop
default:
}
select {
case <-resumeChan:
continue
default:
}
waitCtx, cancel := context.WithTimeout(context.Background(), time.Duration(sleepTime)*time.Minute)
globalstate.WaitCtxCancel = cancel
<-waitCtx.Done()
}
_ = dataStore.SignOutClient(client)
apiChan <- "stop"
wg.Done()
cancel()
}
api has a global variable for the context cancel function:
var appCancel context.CancelFunc
It is set in the beginning by the api.Run method like so:
func Run(cancel context.CancelFunc, wg *sync.WaitGroup, stopChan chan string, db *db.DataStore) {
...
appCancel = cancel
...
}
api has a stop function which calls the cancel function:
func requestStop(w http.ResponseWriter, r *http.Request) {
_ = glg.Info("endpoint hit: shut down service")
if globalstate.WaitCtxCancel != nil {
globalstate.WaitCtxCancel()
}
state := globalstate.Instance()
state.ShutdownPending = true
appCancel()
encoder := json.NewEncoder(w)
encoder.SetIndent("", " ")
_ = encoder.Encode("stop signal received")
}
When the requestStop function is called and thus the context is cancelled, the long running operation (worker.ProcessJob) immediately halts and the entire program deadlocks. Before its next line of code is executed, the code jumps to gopark with reason waitReasonSemAcquire. (scratch that, was just the debugger)
The context cancel function is only called in these two locations.
So it seems like the runService goroutine prevents the api.Run goroutine from acquiring a lock for some reason.
My understanding up to now was that the cancel function can be passed around to different goroutines and there are no synchronization issues attached when calling it.
For example, the WaitCtxCancel function never causes a deadlock when I call it.
I could either:
replace the context with a 1-buffered channel and send a message to break out of the loop, or
use my global state struct and a boolean to determine whether the loop should keep running.
However, I want to understand what's happening here and why.
Also, is there any solution or approach I could use using contexts?
It seemed like the "correct" thing to use for use cases like mine.
UPDATE:
I have recently found out that changing
appCancel()
to
go appCancel()
seems to fix the issue, which confuses me even more.
Reusing parent context with context.WithTimeout with a new timeout
Hi there, I'm new to Go. I was wondering if it's possible to reuse a parent context to create multiple contexts with context.WithTimeout(). The use case is that I have to make multiple network requests in sequence and would like to set a timeout for each request while still deriving it from the parent's context.
Rationale
When the parent's context is cancelled, all the requests made would be cancelled too.
Problem
In the code below, LongProcess stands in for the network request. However, the context is already done before the second LongProcess call can run, so it fails with context deadline exceeded.
The documentation for WithDeadline states: The returned context's Done channel is closed when the deadline expires, when the returned cancel function is called, or when the parent context's Done channel is closed, whichever happens first.
So if that's the case, is there a way to reset the timer for WithTimeout? Or do I have to create a new context from context.Background() for every request? That would mean the parent context is not passed along. :(
// LongProcess refers to a long network request
func LongProcess(ctx context.Context, duration time.Duration, msg string) error {
c1 := make(chan string, 1)
go func() {
// Simulate processing
time.Sleep(duration)
c1 <- msg
}()
select {
case m := <-c1:
fmt.Println(m)
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func main() {
ctx := context.Background()
t := 2 * time.Second
ctx, cancel := context.WithTimeout(ctx, t)
defer cancel()
// Simulate a 2 second process time
err := LongProcess(ctx, 2*time.Second, "first process")
fmt.Println(err)
// Reusing the context.
s, cancel := context.WithTimeout(ctx, t)
defer cancel()
// Simulate a 1 second process time
err = LongProcess(s, 1*time.Second, "second process")
fmt.Println(err) // context deadline exceeded
}
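It looks like the first call to context.WithTimeout overwrites the parent context: ctx, which held context.Background(), is reassigned to the 2-second child. The second call then derives from that child, whose deadline has already passed (a derived context can never have a later deadline than its parent), hence the error. You have to derive each timeout from the original parent. Here is the updated example: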
func main() {
// Keep the parent in its own variable so it is not overwritten by a child context
parent := context.Background()
t := 2 * time.Second
// Use the parent context.
ctx1, cancel := context.WithTimeout(parent, t)
defer cancel()
err := LongProcess(ctx1, 2*time.Second, "first process")
fmt.Println(err)
// Use the parent context not the canceled one.
ctx2, cancel := context.WithTimeout(parent, t)
defer cancel()
err = LongProcess(ctx2, 1*time.Second, "second process")
fmt.Println(err)
}
I would like to cancel a running command on demand. For this I am trying exec.CommandContext, currently like this:
https://play.golang.org/p/0JTD9HKvyad
package main
import (
"context"
"log"
"os/exec"
"time"
)
func Run(quit chan struct{}) {
ctx, cancel := context.WithCancel(context.Background())
cmd := exec.CommandContext(ctx, "sleep", "300")
err := cmd.Start()
if err != nil {
log.Fatal(err)
}
go func() {
log.Println("waiting cmd to exit")
err := cmd.Wait()
if err != nil {
log.Println(err)
}
}()
go func() {
select {
case <-quit:
log.Println("calling ctx cancel")
cancel()
}
}()
}
func main() {
ch := make(chan struct{})
Run(ch)
select {
case <-time.After(3 * time.Second):
log.Println("closing via ctx")
ch <- struct{}{}
}
}
The problem I am facing is that cancel() is called but the process is not killed. My guess is that the main goroutine exits first and does not wait for cancel() to properly terminate the command, mainly because if I add a time.Sleep(time.Second) at the end of the main function, the running command does get killed before the program exits.
Any idea how I could make sure the command has been killed before exiting, without using a sleep? Could a channel be used to signal that cancel() has successfully killed the command?
In an attempt to use a single goroutine I tried this: https://play.golang.org/p/r7IuEtSM-gL but cmd.Wait() seems to block the select the whole time, so cancel() is never reached.
In Go, the program will stop if the end of the main method (in the main package) is reached. This behavior is described in the Go language specification under a section on program execution (emphasis my own):
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
Defects
I will consider each of your examples and their associated control flow defects. You will find links to the Go playground below, but the code in these examples will not execute in the restrictive playground sandbox as the sleep executable cannot be found. Copy and paste to your own environment for testing.
Multiple goroutine example
case <-time.After(3 * time.Second):
log.Println("closing via ctx")
ch <- struct{}{}
After the timer fires and you signal to the goroutine it is time to kill the child and stop work, there is nothing to cause the main method to block and wait for this to complete, so it returns. In accordance with the language spec, the program exits.
The scheduler may fire after the channel transmit, so there may be a race between main exiting and the other goroutines waking up to receive from ch. However, it is unsafe to assume any particular interleaving of behavior, and, for practical purposes, it is unlikely that any useful work will happen before main quits. The sleep child process will be orphaned; on Unix systems, the operating system will normally re-parent the process onto the init process.
Single goroutine example
Here, you have the opposite problem: main does not return, so the child process is not killed. This situation is only resolved when the child process exits (after 5 minutes). This occurs because:
The call to cmd.Wait in the Run method is a blocking call (docs). The select statement is blocked waiting for cmd.Wait to return an error value, so cannot receive from the quit channel.
The quit channel (declared as ch in main) is an unbuffered channel. Send operations on unbuffered channels will block until a receiver is ready to receive the data. From the language spec on channels (again, emphasis my own):
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is zero or absent, the channel is unbuffered and communication succeeds only when both a sender and receiver are ready.
As Run is blocked in cmd.Wait, there is no ready receiver to receive the value transmitted on the channel by the ch <- struct{}{} statement in the main method. main blocks waiting to transmit this data, which prevents the process returning.
We can demonstrate both issues with minor code tweaks.
cmd.Wait is blocking
To expose the blocking nature of cmd.Wait, declare the following function and use it in place of the Wait call. This function is a wrapper with the same behavior as cmd.Wait, but with additional side effects that print what is happening to STDOUT:
func waitOn(cmd *exec.Cmd) error {
fmt.Printf("Waiting on command %p\n", cmd)
err := cmd.Wait()
fmt.Printf("Returning from waitOn %p\n", cmd)
return err
}
// Change the select statement call to cmd.Wait to use the wrapper
case e <- waitOn(cmd):
Upon running this modified program, you will observe the output Waiting on command <pointer> to the console. After the timers fire, you will observe the output calling ctx cancel, but no corresponding Returning from waitOn <pointer> text. This will only occur when the child process returns, which you can observe quickly by reducing the sleep duration to a smaller number of seconds (I chose 5 seconds).
Send on the quit channel, ch, blocks
main cannot return because the signal channel used to propagate the quit request is unbuffered and there is no corresponding listener. By changing the line:
ch := make(chan struct{})
to
ch := make(chan struct{}, 1)
the send on the channel in main will proceed (to the channel's buffer) and main will quit, giving the same behavior as the multiple goroutine example. However, this implementation is still broken: the value will not be read from the channel's buffer to actually start stopping the child process before main returns, so the child process will still be orphaned.
Fixed version
I have produced a fixed version for you, code below. There are also some stylistic improvements to convert your example to more idiomatic Go:
Indirection via a channel to signal when it is time to stop is unnecessary. Instead, we can avoid declaring a channel by hoisting declaration of the context and cancellation function to the main method. The context can be cancelled directly at the appropriate time.
I have retained the separate Run function to demonstrate passing the context in this way, but in many cases, its logic could be embedded into the main method, with a goroutine spawned to perform the cmd.Wait blocking call.
The select statement in the main method is unnecessary as it only has one case statement.
sync.WaitGroup is introduced to explicitly solve the problem of main exiting before the child process (waited on in a separate goroutine) has been killed. The wait group implements a counter; the call to Wait blocks until all goroutines have finished working and called Done.
package main
import (
"context"
"log"
"os/exec"
"sync"
"time"
)
func Run(ctx context.Context) {
cmd := exec.CommandContext(ctx, "sleep", "300")
err := cmd.Start()
if err != nil {
// Run could also return this error and push the program
// termination decision to the `main` method.
log.Fatal(err)
}
err = cmd.Wait()
if err != nil {
log.Println("waiting on cmd:", err)
}
}
func main() {
var wg sync.WaitGroup
ctx, cancel := context.WithCancel(context.Background())
// Increment the WaitGroup synchronously in the main method, to avoid
// racing with the goroutine starting.
wg.Add(1)
go func() {
Run(ctx)
// Signal the goroutine has completed
wg.Done()
}()
<-time.After(3 * time.Second)
log.Println("closing via ctx")
cancel()
// Wait for the child goroutine to finish, which will only occur when
// the child process has stopped and the call to cmd.Wait has returned.
// This prevents main() exiting prematurely.
wg.Wait()
}
I'm implementing a feature where I need to read files from a directory, parse them, and export them to a REST service at a regular interval. As part of this I would like to gracefully handle program termination (SIGKILL, SIGQUIT, etc.).
To that end, I would like to know how to implement context-based cancellation of the process.
For executing the flow in regular interval I'm using gocron.
cmd/scheduler.go
func scheduleTask(){
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
s := gocron.NewScheduler()
s.Every(10).Minutes().Do(processTask, ctx)
s.RunAll() // run immediate
<-s.Start() // schedule
for {
select {
case <-(ctx).Done():
fmt.Print("context done")
s.Remove(processTask)
s.Clear()
cancel()
default:
}
}
}
func processTask(ctx *context.Context){
task.Export(ctx)
}
task/export.go
func Export(ctx *context.Context){
pendingFiles, err := filepath.Glob("/tmp/pending/" + "*_task.json")
//error handling
//as there can be 100s of files, I would like to break the loop when context.Done() to return asap & clean up the resources here as well
for _, fileName := range pendingFiles {
exportItem(fileName string)
}
}
func exportItem(fileName string){
data, err := ReadFile(fileName) //not shown here for brevity
//err handling
err = postHTTPData(string(data)) //not shown for brevity
//err handling
}
For the process management, I think the other component is the actual handling of signals, and managing the context from those signals.
I'm not sure of the specifics of go-cron (they have an example on their GitHub showing some of these concepts), but in general I think the steps involved are:
Registering an OS signal handler (SIGKILL cannot be caught, so only signals such as SIGINT, SIGTERM and SIGQUIT can trigger a graceful shutdown)
Waiting to receive a signal
Canceling the top-level context in response to the signal
Example:
sigCh := make(chan os.Signal, 1)
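defer signal.Stop(sigCh) // unregister instead of closing: a channel still registered with signal.Notify must not be closed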
signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
<-sigCh
cancel()
I'm not sure how this will look in the context of go-cron, but the context that the signal handling code cancels should be a parent of the context that the task and job is given.
Worked this out myself just now. I've always felt the blog post on contexts was A LOT of material to try and understand, so a simpler demonstration would be nice.
There are many scenarios you may encounter. Each one is different and will require adaptation. Here's one example:
Say you have a channel that could run for an indeterminate amount of time.
indeterminateChannel := make(chan string)
for s := range indeterminateChannel{
fmt.Println(s)
}
Your producer might look something like:
for {
indeterminateChannel <- "Terry"
}
We don't control the producer, so we need some way to cut out of our print loop if the producer exceeds our time limit.
indeterminateChannel := make(chan string)
// Close the channel when our code exits so OUR for loop no longer occupies
// resources and the goroutine exits.
// The producer will have a problem (a send on a closed channel panics, which
// crashes the whole program if the producer runs in the same process), but in
// this instance we don't care about their problems.
defer close(indeterminateChannel)
// We block on this context below the goroutine until it times out.
ctx, cancel := context.WithTimeout(context.TODO(), time.Minute*1)
defer cancel()
go func() {
	for s := range indeterminateChannel {
		fmt.Println(s)
	}
}()
<-ctx.Done() // wait for the context to terminate based on its timeout
You can also check ctx.Err() to see whether the context ended because of a timeout or because it was canceled.
You might also want to learn about how to properly check if the context failed due to a deadline: How to check if an error is "deadline exceeded" error?
Or if the context was canceled: How to check if a request was cancelled