I would like to cancel a running command on demand. For this I am trying exec.CommandContext; currently I have this:
https://play.golang.org/p/0JTD9HKvyad
package main
import (
"context"
"log"
"os/exec"
"time"
)
func Run(quit chan struct{}) {
ctx, cancel := context.WithCancel(context.Background())
cmd := exec.CommandContext(ctx, "sleep", "300")
err := cmd.Start()
if err != nil {
log.Fatal(err)
}
go func() {
log.Println("waiting cmd to exit")
err := cmd.Wait()
if err != nil {
log.Println(err)
}
}()
go func() {
select {
case <-quit:
log.Println("calling ctx cancel")
cancel()
}
}()
}
func main() {
ch := make(chan struct{})
Run(ch)
select {
case <-time.After(3 * time.Second):
log.Println("closing via ctx")
ch <- struct{}{}
}
}
The problem I am facing is that cancel() is called but the process is not killed. My guess is that the main goroutine exits first and doesn't wait for cancel() to terminate the command properly. I think this mainly because if I add a time.Sleep(time.Second) at the end of the main function, the running command is killed.
Any idea how I could make sure the command has been killed before exiting, without using a sleep? Could cancel() be combined with a channel that signals once the command has been successfully killed?
In an attempt to use a single goroutine, I tried this: https://play.golang.org/p/r7IuEtSM-gL. However, cmd.Wait() seems to block the select the whole time, so cancel() is never called.
In Go, the program will stop if the end of the main method (in the main package) is reached. This behavior is described in the Go language specification under a section on program execution (emphasis my own):
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
Defects
I will consider each of your examples and their associated control flow defects. You will find links to the Go playground below, but the code in these examples will not execute in the restrictive playground sandbox as the sleep executable cannot be found. Copy and paste to your own environment for testing.
Multiple goroutine example
case <-time.After(3 * time.Second):
log.Println("closing via ctx")
ch <- struct{}{}
After the timer fires and you signal to the goroutine it is time to kill the child and stop work, there is nothing to cause the main method to block and wait for this to complete, so it returns. In accordance with the language spec, the program exits.
After the channel send, the scheduler may or may not switch to the other goroutines before main returns. There is therefore a race between main exiting and the other goroutines waking up to receive from ch and cancel the context. However, it is unsafe to assume any particular interleaving of behavior. For practical purposes, it is also unlikely that any useful work will happen before main quits. The sleep child process will be orphaned; on Unix systems, the operating system will normally re-parent the process onto the init process.
Single goroutine example
Here, you have the opposite problem: main does not return, so the child process is not killed. This situation is only resolved when the child process exits (after 5 minutes). This occurs because:
The call to cmd.Wait in the Run method is a blocking call (docs). The select statement is blocked waiting for cmd.Wait to return an error value, so cannot receive from the quit channel.
The quit channel (declared as ch in main) is an unbuffered channel. Send operations on unbuffered channels will block until a receiver is ready to receive the data. From the language spec on channels (again, emphasis my own):
The capacity, in number of elements, sets the size of the buffer in the channel. If the capacity is zero or absent, the channel is unbuffered and communication succeeds only when both a sender and receiver are ready.
As Run is blocked in cmd.Wait, no receiver is ready to take the value sent on the channel by the ch <- struct{}{} statement in main. main blocks waiting to send this value, which prevents it from returning.
We can demonstrate both issues with minor code tweaks.
cmd.Wait is blocking
To expose the blocking nature of cmd.Wait, declare the following function and use it in place of the Wait call. This function is a wrapper with the same behavior as cmd.Wait, but additional side-effects to print what is happening to STDOUT. (Playground link):
func waitOn(cmd *exec.Cmd) error {
fmt.Printf("Waiting on command %p\n", cmd)
err := cmd.Wait()
fmt.Printf("Returning from waitOn %p\n", cmd)
return err
}
// In the select statement, replace the call to cmd.Wait with the wrapper:
case e <- waitOn(cmd):
When you run this modified program, you will see the output Waiting on command <pointer> in the console. After the timer fires, you will see the output calling ctx cancel, but no corresponding Returning from waitOn <pointer> line. That line only appears once the child process exits. You can see this quickly by reducing the sleep duration to a few seconds (I chose 5 seconds).
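The underlying rule is in the Go spec for select statements. The channel operands and the right-hand-side expressions of send cases are evaluated exactly once, upon entering the select, before any case is chosen. The minimal sketch below uses illustrative names, and slowWait only stands in for cmd.Wait. It shows that the value of a send case is computed first even when another case (an already-closed quit channel) is ready.

```go
package main

import "fmt"

// selectOrder records the order of events when a select statement has a
// ready quit case and a send case whose value comes from a function call.
func selectOrder() []string {
	var events []string
	quit := make(chan struct{})
	close(quit) // the quit case is ready before select is entered

	results := make(chan error, 1)
	slowWait := func() error { // stands in for cmd.Wait
		events = append(events, "send value evaluated")
		return nil
	}

	// Both cases are ready, so either may be chosen, but slowWait()
	// has always run before the choice is made.
	select {
	case <-quit:
		events = append(events, "quit received")
	case results <- slowWait():
		events = append(events, "result sent")
	}
	return events
}

func main() {
	fmt.Println(selectOrder())
}
```

If slowWait blocked (as cmd.Wait does while the child runs), the select would never be reached, so the quit case could not fire. Calling cmd.Wait in its own goroutine avoids this.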
Send on the quit channel, ch, blocks
main cannot return because the signal channel used to propagate the quit request is unbuffered and there is no corresponding listener. By changing the line:
ch := make(chan struct{})
to
ch := make(chan struct{}, 1)
the send on the channel in main will proceed (to the channel's buffer) and main will quit – the same behavior as the multiple goroutine example. However, this implementation is still broken: the value will not be read from the channel's buffer to actually start stopping the child process before main returns, so the child process will still be orphaned.
Fixed version
I have produced a fixed version for you; the code is below. I have also made some stylistic changes to make your example more idiomatic Go:
Indirection via a channel to signal when it is time to stop is unnecessary. Instead, we can avoid declaring a channel by hoisting declaration of the context and cancellation function to the main method. The context can be cancelled directly at the appropriate time.
I have retained the separate Run function to demonstrate passing the context in this way, but in many cases, its logic could be embedded into the main method, with a goroutine spawned to perform the cmd.Wait blocking call.
The select statement in the main method is unnecessary as it only has one case statement.
sync.WaitGroup is introduced to explicitly solve the problem of main exiting before the child process (waited on in a separate goroutine) has been killed. The wait group implements a counter; the call to Wait blocks until all goroutines have finished working and called Done.
package main
import (
"context"
"log"
"os/exec"
"sync"
"time"
)
func Run(ctx context.Context) {
cmd := exec.CommandContext(ctx, "sleep", "300")
err := cmd.Start()
if err != nil {
// Run could also return this error and push the program
// termination decision to the `main` method.
log.Fatal(err)
}
err = cmd.Wait()
if err != nil {
log.Println("waiting on cmd:", err)
}
}
func main() {
var wg sync.WaitGroup
ctx, cancel := context.WithCancel(context.Background())
// Increment the WaitGroup synchronously in the main method, to avoid
// racing with the goroutine starting.
wg.Add(1)
go func() {
Run(ctx)
// Signal the goroutine has completed
wg.Done()
}()
<-time.After(3 * time.Second)
log.Println("closing via ctx")
cancel()
// Wait for the child goroutine to finish, which will only occur when
// the child process has stopped and the call to cmd.Wait has returned.
// This prevents main() exiting prematurely.
wg.Wait()
}
(Playground link)
Related
I'm writing a function that needs to run and finish as fast as possible.
It needs to make 3 REST calls and should any of these return a bad result, it needs to exit.
Each REST call is made in its own goroutine and returns its result to the main goroutine via a buffered channel.
Because I'm using buffered channels, I know that the sending goroutines will send the results of their REST requests into the buffered channel and exit, so there is no possibility of a goroutine leak.
My question is this. Say I get the response from the first REST request and it's a bad result (meaning the function as a whole needs to exit). Is it OK for me to close the other two channels and exit without reading the contents of the other two buffered channels?
I have a feeling this isn't recommended, and if that's the case, why?
You shouldn't close a channel that another goroutine is going to write to. The usual pattern is for the writer to close the channel it writes to when it's done. If you want to cancel the work a goroutine is doing, use a context.Context instead. Here is sample code that synchronizes two goroutines, using a Context to cancel one of them.
package main
import (
"context"
"fmt"
"time"
)
func f(ctx context.Context, ch chan<- struct{}) {
select {
case <-time.After(time.Hour):
fmt.Println("sending data on the channel")
ch <- struct{}{}
case <-ctx.Done():
fmt.Println("closing channel")
close(ch)
}
}
func main() {
ch := make(chan struct{})
ctx, cancel := context.WithCancel(context.Background())
go f(ctx, ch)
cancel()
<-ch
}
I read it on (https://www.geeksforgeeks.org/channel-in-golang/) that:
"In the channel, the send and receive operation block until another side is not ready by default.
It allows goroutine to synchronize with each other without explicit locks or condition variables."
To test above statement, I have written a sample program mentioned below:
Program:
package main
import (
"fmt"
"time"
)
func myFunc(ch chan int) {
fmt.Println("Inside goroutine:: myFunc()")
fmt.Println(10 + <-ch) //<-- According to rule, control will be blocked here until 'ch' sends some data so that it will be received in our myFunc() go routine.
}
func main() {
fmt.Println("Start Main method")
// Creating a channel
ch := make(chan int)
go myFunc(ch) //<-- This go routine started in a new thread
time.Sleep(2 * time.Second) //<--- introduced a Sleep of 2 seconds to ensure that myFunc() go routine executes before main thread
ch <- 10
fmt.Println("End Main method")
}
I was expecting below output:
Start Main method
Inside goroutine:: myFunc()
20
End Main method
But the actual output is:
Start Main method
Inside goroutine:: myFunc()
End Main method
Why is the value sent through the channel not printed?
I think it is because the main goroutine finished its execution first, and so all other goroutines were also terminated.
If that is the case, why does the rule say that "it allows goroutines to synchronize with each other without explicit locks or condition variables"?
To get the expected output, I have to use sync.WaitGroup to tell the main goroutine to wait for the other goroutine to finish. Isn't that violating the above rule, since I am using a lock in the form of a WaitGroup?
PS: I am learning Go, so please forgive me if I've got the concept totally wrong.
The main goroutine exits before the myFunc goroutine is able to print the output. Here is an implementation which ensures that the myFunc goroutine finishes before the main goroutine exits.
package main
import (
"fmt"
"sync"
"time"
)
func myFunc(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Inside goroutine:: myFunc()")
fmt.Println(10 + <-ch) //<-- According to rule, control will be blocked here until 'ch' sends some data so that it will be received in our myFunc() go routine.
}
func main() {
fmt.Println("Start Main method")
// Creating a channel
ch := make(chan int)
wg := sync.WaitGroup{}
wg.Add(1)
go myFunc(ch, &wg) //<-- This go routine started in a new thread
time.Sleep(2 * time.Second) //<--- introduced a Sleep of 2 seconds to ensure that myFunc() go routine executes before main thread
ch <- 10
wg.Wait()
fmt.Println("End Main method")
}
The channel is used here for synchronization, and it works as described in the documentation. That does not mean the code after the synchronization point runs at the same speed in both goroutines. It only means that the main goroutine will not continue until the myFunc goroutine is receiving from the channel, and that myFunc will wait for the main goroutine to send data on the channel. After this hand-off, both goroutines continue their execution independently.
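Because the channel only synchronizes at the hand-off, main has to wait on something else if it needs the result. One option that uses only channels, with no WaitGroup, is for the goroutine to send its result back. Main's receive then becomes the synchronization point. A minimal sketch (addTen and roundTrip are illustrative names, not from the question):

```go
package main

import "fmt"

// addTen receives one value and sends back that value plus 10. The out
// channel carries both the result and the "I'm finished" signal.
func addTen(in <-chan int, out chan<- int) {
	out <- 10 + <-in
}

// roundTrip hands v to addTen and waits for its answer.
func roundTrip(v int) int {
	in := make(chan int)
	out := make(chan int)
	go addTen(in, out)
	in <- v      // blocks until addTen is ready to receive
	return <-out // blocks until addTen has sent its result
}

func main() {
	fmt.Println("Start Main method")
	fmt.Println(roundTrip(10))
	fmt.Println("End Main method")
}
```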
Try this, using your code as a basis:
package main
import (
"fmt"
"time"
)
func myFunc(ch chan int, done chan struct{}) {
defer close(done) // channel will be closed in the function exit phase
fmt.Println("Inside goroutine:: myFunc()")
fmt.Println(10 + <-ch) //<-- According to rule, control will be blocked here until 'ch' sends some data so that it will be received in our myFunc() go routine.
}
func main() {
fmt.Println("Start Main method")
// Creating a channel
ch := make(chan int)
done := make(chan struct{}) // signal channel
go myFunc(ch, done) //<-- This go routine started in a new thread
time.Sleep(2 * time.Second) //<--- introduced a Sleep of 2 seconds to ensure that myFunc() go routine executes before main thread
ch <- 10
<-done // waiting for function complete
fmt.Println("End Main method")
}
Or use Jaroslaw's suggestion.
Because Go is so fast... https://play.golang.org/p/LNyDAA3mGYY
After you send on the channel, the scheduler isn't fast enough to run myFunc, and the program exits. I have introduced an additional context switch for the scheduler to show the effect.
Yes, you are right:
I think, it is because main thread finished its execution first and hence, all other goroutine also terminated.
Look at how the program above executes. The sleep happens before the main goroutine writes to the channel. Which goroutine gets CPU time is completely arbitrary. In this case, though, main sleeps before sending, so myFunc blocks because there is no data in ch yet. As soon as main sends, nothing stops it from returning before myFunc gets to print.
Here I made a slight change to the above code so that main sleeps after writing the data into the channel. It gives the expected output without using a WaitGroup or a quit channel, although it still relies on timing rather than on synchronization.
package main
import (
"fmt"
"time"
)
func myFunc(ch chan int) {
fmt.Println("Inside goroutine:: myFunc()")
fmt.Println(10 + <-ch) //<-- According to rule, control will be blocked here until 'ch' sends some data so that it will be received in our myFunc() go routine.
}
func main() {
fmt.Println("Start Main method")
// Creating a channel
ch := make(chan int)
go myFunc(ch) //<-- This go routine started in a new thread
ch <- 10
time.Sleep(2 * time.Second) //<--- Sleep after sending so that myFunc() has time to print before main returns
fmt.Println("End Main method")
}
There is currently a race between myFunc printing and your main function exiting.
If we look at the spec for program execution at
https://golang.org/ref/spec#Program_execution
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
It is still your job to make sure that all spawned goroutines complete before your main goroutine exits.
In your case, you could use a WaitGroup as you mentioned, or a done channel.
https://play.golang.org/p/RVr0HXuUMgn
Given your code, you could also close the channel you use to send the integer, since you pass it to the function as a bidirectional channel. However, that is not strictly idiomatic.
https://play.golang.org/p/wGvexC5ZgIi
TL;DR: A typical case of "all goroutines are asleep - deadlock!", but I can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, and I'm getting a "fatal error: all goroutines are asleep - deadlock!" error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
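Here is a self-contained sketch of the loop with the rescuer goroutine in place. The parsing work is replaced by a hypothetical strings.ToUpper call, and the input texts are made up. Only the channel and WaitGroup structure mirrors the question.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// parseAll mimics the structure of the question: one goroutine per text,
// an unbuffered results channel, an opportunistic non-blocking read inside
// the loop, and a final range loop. strings.ToUpper stands in for parsing.
func parseAll(texts []string) []string {
	results := make(chan string)
	var wait sync.WaitGroup
	var collected []string

	for _, text := range texts {
		wait.Add(1)
		go func(t string) {
			defer wait.Done()
			results <- strings.ToUpper(t)
		}(text)

		// Consume a result if one is ready, without blocking.
		select {
		case r := <-results:
			collected = append(collected, r)
		default:
		}
	}

	// The rescuer: started after the last wait.Add, it closes results once
	// every writer has called Done, which ends the range loop below.
	go func() {
		wait.Wait()
		close(results)
	}()

	for r := range results {
		collected = append(collected, r)
	}
	sort.Strings(collected) // goroutines finish in any order
	return collected
}

func main() {
	fmt.Println(parseAll([]string{"cat", "dog", "bird"}))
}
```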
The problem is that nothing closes the results channel, yet the range loop only exits when the channel is closed. I've simplified your code to illustrate this and proposed a solution: basically, consume the data in a goroutine:
package main
import (
"fmt"
"sync"
)
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}
(I don't believe my issue is a duplicate of this QA: go routine blocking the others one, because I'm running Go 1.9 which has the preemptive scheduler whereas that question was asked for Go 1.2).
My Go program calls into a C library wrapped by another Go-lang library that makes a blocking call that can last over 60 seconds. I want to add a timeout so it returns in 3 seconds:
Old code with long block:
// InvokeSomething is part of a Go wrapper library that calls the C library read_something function. I cannot change this code.
func InvokeSomething() ([]Something, error) {
ret := clib.read_something(&input) // this can block for 60 seconds
if ret.Code > 1 {
return nil, CreateError(ret)
}
return ret.Something, nil
}
// This is my code I can change:
func MyCode() {
something, err := InvokeSomething()
// etc
}
My code with a go-routine, channel, and timeout, based on this Go example: https://gobyexample.com/timeouts
type somethingResult struct {
	Something []Something
	Err       error
}
func MyCodeWithTimeout() {
	ch := make(chan somethingResult, 1)
	go func() {
		something, err := InvokeSomething() // blocks here for 60 seconds
		ret := somethingResult{something, err}
		ch <- ret
	}()
	select {
	case result := <-ch:
		// etc
	case <-time.After(time.Second * 3):
		// report timeout
	}
}
However when I run MyCodeWithTimeout it still takes 60 seconds before it executes the case <-time.After(time.Second * 3) block.
I know that attempting to read from an unbuffered channel with nothing in it will block, but I created the channel with a buffered size of 1 so as far as I can tell I'm doing it correctly. I'm surprised the Go scheduler isn't preempting my goroutine, or does that depend on execution being in go-lang code and not an external native library?
Update:
I read that the Go-scheduler, at least in 2015, is actually "semi-preemptive" and it doesn't preempt OS threads that are in "external code": https://github.com/golang/go/issues/11462
you can think of the Go scheduler as being partially preemptive. It's by no means fully cooperative, since user code generally has no control over scheduling points, but it's also not able to preempt at arbitrary points
I heard that runtime.LockOSThread() might help, so I changed the function to this:
func MyCodeWithTimeout() {
ch = make(chan somethingResult, 1);
defer close(ch)
go func() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
...however it didn't help at all and it still blocks for 60 seconds.
Your proposed solution of locking the thread in the goroutine started by MyCodeWithTimeout() does not guarantee that MyCodeWithTimeout() will return after 3 seconds, for two reasons. First, there is no guarantee that the started goroutine will be scheduled and reach the point where it locks the thread. Second, even if the external command or syscall returns within 3 seconds, there is no guarantee that the goroutine running MyCodeWithTimeout() will be scheduled to receive the result.
Instead do the thread locking in MyCodeWithTimeout(), not in the goroutine it starts:
func MyCodeWithTimeout() {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	// Don't close ch: if the timeout wins, the goroutine below still sends
	// on it later, and a send on a closed channel panics. The buffer of 1
	// lets that late send complete without a receiver.
	ch := make(chan somethingResult, 1)
	go func() {
		something, err := InvokeSomething() // blocks here for 60 seconds
		ret := somethingResult{something, err}
		ch <- ret
	}()
	select {
	case result := <-ch:
		// etc
	case <-time.After(time.Second * 3):
		// report timeout
	}
}
Now, once MyCodeWithTimeout() starts executing, it locks its goroutine to the OS thread, and you can be sure that this goroutine will notice the value sent on the timer's channel.
NOTE: This is better if you want it to return within 3 seconds, but it still does not guarantee it. The timer that fires (sends a value on its channel) runs in its own goroutine, and this thread locking has no effect on the scheduling of that goroutine.
If you want a guarantee, you can't rely on other goroutines giving the "exit" signal. You can only rely on this happening in the goroutine running MyCodeWithTimeout() (because you locked the thread, you can be sure it gets scheduled).
An "ugly" solution which spins up CPU usage for a given CPU core would be:
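for end := time.Now().Add(time.Second * 3); time.Now().Before(end); {
	// Do a non-blocking check:
	select {
	case result := <-ch:
		// Process result, then stop spinning:
		return
	default: // Must have a default case to be non-blocking
	}
}
// Reaching this point means the 3 seconds elapsed: report the timeout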
Note that giving in to the temptation to use time.Sleep() in this loop would take away the guarantee. time.Sleep() may use goroutines in its implementation, and it certainly does not guarantee to return exactly after the given duration.
Also note that if you have 8 CPU cores and runtime.GOMAXPROCS(0) returns 8 for you, and your goroutines are still "starving", this may be a temporary solution, but you still have more serious problems using Go's concurrency primitives in your app (or a lack of using them), and you should investigate and "fix" those. Locking a thread to a goroutine may even make it worse for the rest of the goroutines.
Is there any API to let the main goroutine sleep forever?
In other words, I want my program to keep running until I stop it.
"Sleeping"
You can use numerous constructs that block forever without "eating" up your CPU.
For example a select without any case (and no default):
select{}
Or receiving from a channel where nobody sends anything:
<-make(chan int)
Or receiving from a nil channel also blocks forever:
<-(chan int)(nil)
Or sending on a nil channel also blocks forever:
(chan int)(nil) <- 0
Or locking an already locked sync.Mutex:
mu := sync.Mutex{}
mu.Lock()
mu.Lock()
Quitting
If you do want to provide a way to quit, a simple channel can do it. Provide a quit channel, and receive from it. When you want to quit, close the quit channel as "a receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received".
var quit = make(chan struct{})
func main() {
// Startup code...
// Then blocking (waiting for quit signal):
<-quit
}
// And in another goroutine if you want to quit:
close(quit)
Note that issuing a close(quit) may terminate your app at any time. Quoting from Spec: Program execution:
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
When close(quit) is executed, the last statement of our main() function can proceed which means the main goroutine can return, so the program exits.
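Closing the channel rather than sending on it matters when more than one goroutine waits for the signal. A receive on a closed channel never blocks, so a single close releases every waiting receiver at once, whereas one send would wake only one. A small sketch with a hypothetical waitAll helper:

```go
package main

import (
	"fmt"
	"sync"
)

// waitAll starts n goroutines that all block on quit, then closes quit once.
// It returns how many goroutines were released by that single close.
func waitAll(n int) int {
	quit := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	woken := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-quit // blocks until quit is closed
			mu.Lock()
			woken++
			mu.Unlock()
		}()
	}
	close(quit) // broadcast: every receiver proceeds
	wg.Wait()
	return woken
}

func main() {
	fmt.Println("goroutines woken:", waitAll(5))
}
```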
Sleeping without blocking
The above constructs block the goroutine. If you don't have other goroutines running, that causes a deadlock, which the runtime reports as a fatal error.
If you don't want to block on such a construct but just don't want main to end, you may use time.Sleep() with a sufficiently large duration; a sleeping goroutine is not reported as deadlocked. The max duration value is
const maxDuration time.Duration = 1<<63 - 1
which is approximately 292 years.
time.Sleep(time.Duration(1<<63 - 1))
If you fear your app will run longer than 292 years, put the above sleep in an endless loop:
for {
time.Sleep(time.Duration(1<<63 - 1))
}
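The 292-year figure is easy to check. time.Duration counts nanoseconds in an int64, so its maximum is math.MaxInt64 nanoseconds (the same value as 1<<63 - 1). A quick sketch, assuming 365-day years:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// maxSleepYears converts the largest time.Duration into 365-day years.
func maxSleepYears() float64 {
	longest := time.Duration(math.MaxInt64) // same value as 1<<63 - 1 nanoseconds
	return longest.Hours() / 24 / 365
}

func main() {
	fmt.Printf("max time.Duration is about %.2f years\n", maxSleepYears())
}
```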
Which kind of sleep to choose depends on your use case.
@icza provides a good and simple solution for literally sleeping forever. I want to add a bit more in case you want your system to be able to shut down gracefully.
You could do something like this:
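func mainloop() {
	// signal.Notify does not block when delivering a signal, so the
	// channel must be buffered or a signal may be missed.
	exitSignal := make(chan os.Signal, 1)
	signal.Notify(exitSignal, syscall.SIGINT, syscall.SIGTERM)
	<-exitSignal
	systemTeardown()
}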
And in your main:
func main() {
systemStart()
mainloop()
}
This way, main not only sleeps forever, but you can also do some graceful shutdown work after your program receives an INT or TERM signal from the OS, for example from Ctrl+C or kill.
Another way to block a goroutine, which prevents the Go runtime from complaining about a deadlock:
package main
import "time"
func main() {
	for {
		time.Sleep(1138800 * time.Hour)
	}
}