I created an example in which I run a function concurrently, panic inside it, and recover:
package main
import "fmt"
func main() {
// "main" recovery
defer func() {
if r := recover(); r != nil {
fmt.Println("main goroutine panicked:", r)
}
}()
// run a function concurrently and panic inside it
chanStr := make(chan string)
go func() {
// this "internal" goroutine's recovery
defer func() {
if r := recover(); r != nil {
fmt.Println("internal goroutine panicked:", r)
}
chanStr <- "hello world"
}()
// panicking and wanting recovery not only in "internal" recovery but in "main" recovery as well
panic("NOT main goroutine")
}()
// waiting for chan with "internal" goroutine panicking and recovery
str := <-chanStr
fmt.Println(str)
// panic("main")
}
It gives output:
internal goroutine panicked: NOT main goroutine
hello world
Is it possible to change my code so that the recovery is passed from the "internal" goroutine to "main"? In other words, I want it to print to the console:
internal goroutine panicked: NOT main goroutine
main goroutine panicked: main
hello world
I tried removing the "internal" recovery function entirely, but in that case the "main" recovery does not recover the panic raised inside the "internal" goroutine.
Update
I tried to follow @Momer's advice to send an error through the channel and handle it in the main goroutine, instead of trying to bubble the panic up:
package main
import (
"errors"
"fmt"
)
func main() {
// "main" recovery
defer func() {
if r := recover(); r != nil {
fmt.Println("main goroutine paniced:", r)
}
}()
// run a function concurrently and panic inside it
chanStr := make(chan string)
chanErr := make(chan error)
var err error
go func() {
// this "internal" goroutine's recovery
defer func() {
if r := recover(); r != nil {
fmt.Println("internal goroutine paniced:", r)
switch t := r.(type) {
case string:
fmt.Println("err is string")
err = errors.New(t)
case error:
fmt.Println("err is error")
err = t
default:
fmt.Println("err is unknown")
err = errors.New("Unknown error")
}
chanErr <- err
chanStr <- ""
}
}()
// panicking, and wanting recovery not only in the "internal" recovery but in the "main" recovery as well
panic("NOT main goroutine")
chanStr <- "hello world"
chanErr <- nil
}()
// waiting on the channels while the "internal" goroutine panics and recovers
str := <-chanStr
err = <-chanErr
fmt.Println(str)
fmt.Println(err)
// panic("main")
}
It fails with the error:
all goroutines are asleep - deadlock
Full output:
go run /goPath/parentRecoverty2.go
internal goroutine paniced: NOT main goroutine
err is string
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/goPath/parentRecoverty2.go:48 +0x1d4
goroutine 5 [chan send]:
main.func·002()
/goPath/parentRecoverty2.go:37 +0x407
main.func·003()
/goPath/parentRecoverty2.go:42 +0x130
created by main.main
/goPath/parentRecoverty2.go:46 +0x190
exit status 2
I think of panic/recover in Go as similar to try/catch/finally blocks in Java or C++.
For more detail, see "Handling panics" in the Go specification.
So you can pass a panic on to a function's caller.
A simple example is below; I hope it helps.
Note: in Foo(), I use recover() to catch what went wrong and then panic again so that the outer caller can catch it.
package main
import (
"fmt"
"time"
)
func Foo() {
defer func() {
if x := recover(); x != nil {
fmt.Printf("Runtime panic: %v \n", x)
panic("Ah oh ... Panic in defer")
}
}()
panic("Panic in Foo() !")
}
func Game() {
defer func(){
fmt.Println("Clean up in Game()")
}()
defer func() {
if x := recover(); x != nil {
fmt.Println("Catch recover panic !!! In Game()")
}
}()
Foo()
}
func main() {
defer func() {
fmt.Println("Program Quit ... ")
}()
fmt.Println("-----------Split-------------")
go Game()
time.Sleep(1 * time.Millisecond)
fmt.Println("-----------Split-------------")
}
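Re-panicking as in Foo() only reaches callers in the same goroutine: Game() catches it because Foo() runs inside the goroutine started by go Game(), and main never sees it. To get the behaviour the original question asks for, the recovered value has to be carried across goroutines explicitly. Below is a minimal sketch of that, assuming it is acceptable for the receiving goroutine to re-panic with the forwarded value; runWorker and the panics channel are illustrative names, not part of the original code.

```go
package main

import "fmt"

// runWorker starts a goroutine that panics, recovers the panic inside that
// goroutine, and forwards the recovered value over a channel. It then
// re-panics with that value in its own goroutine, where its own deferred
// recover can catch it.
func runWorker() (mainRecovered interface{}) {
	// "main" recovery: only sees panics raised in this goroutine.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("main goroutine panicked:", r)
			mainRecovered = r
		}
	}()

	panics := make(chan interface{}, 1) // buffered so the worker never blocks on send

	go func() {
		// "internal" recovery: must live in the goroutine that panics.
		defer func() {
			r := recover()
			if r != nil {
				fmt.Println("internal goroutine panicked:", r)
			}
			panics <- r // nil means the worker finished normally
		}()
		panic("NOT main goroutine")
	}()

	if r := <-panics; r != nil {
		panic(r) // re-panic in this goroutine so the deferred recover above runs
	}
	return nil
}

func main() {
	fmt.Println("recovered in main:", runWorker())
}
```

The deferred send runs on both the panicking and the normal path, so the receive in runWorker never waits forever.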
In your updated question, one goroutine is blocked receiving from chanStr while the other is blocked sending on chanErr.
Switching the order of the two sends resolves the deadlock:
defer func() {
if r := recover(); r != nil {
fmt.Println("internal goroutine panicked:", r)
switch t := r.(type) {
case string:
fmt.Println("err is string")
err = errors.New(t)
case error:
fmt.Println("err is error")
err = t
default:
fmt.Println("err is unknown")
err = errors.New("Unknown error")
}
// main receives from chanStr first and then from chanErr, so send in that order
chanStr <- ""
chanErr <- err
}
}()
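An alternative design, not what this answer proposes, is to send a single value that carries both the string and the error, so the receiver performs exactly one receive and there is no send order to get wrong. A minimal sketch; result, panicToError and runWorker are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

// result bundles everything the worker reports, so the caller does a
// single receive and the order of sends no longer matters.
type result struct {
	str string
	err error
}

// panicToError converts a recovered panic value into an error.
func panicToError(r interface{}) error {
	switch t := r.(type) {
	case string:
		return errors.New(t)
	case error:
		return t
	default:
		return fmt.Errorf("unknown panic: %v", t)
	}
}

// runWorker runs the work in a goroutine; exactly one result is sent,
// either from the normal path or from the deferred recovery.
func runWorker(shouldPanic bool) result {
	results := make(chan result, 1) // buffered: the send never blocks

	go func() {
		defer func() {
			if r := recover(); r != nil {
				results <- result{err: panicToError(r)}
			}
		}()
		if shouldPanic {
			panic("NOT main goroutine")
		}
		results <- result{str: "hello world"}
	}()

	return <-results
}

func main() {
	fmt.Printf("%+v\n", runWorker(true))
	fmt.Printf("%+v\n", runWorker(false))
}
```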
Related
I have two tasks running in goroutines, and I am using errgroup. I am not sure how to use errgroup.WithContext correctly.
In the following code, task1 returns an error and I would like to terminate task2 (long-running) when that happens. Please note that in this example time.Sleep is added just to simulate my problem. In reality, task1 and task2 do real work and do not call sleep.
package main
import (
"context"
"fmt"
"golang.org/x/sync/errgroup"
"time"
)
func task1(ctx context.Context) error {
time.Sleep(5 * time.Second)
fmt.Println("first finished, pretend error happened")
return ctx.Err()
}
func task2(ctx context.Context) error {
select {
case <-ctx.Done():
fmt.Println("task 1 is finished with error")
return ctx.Err()
default:
fmt.Println("second started")
time.Sleep(50 * time.Second)
fmt.Println("second finished")
}
return nil
}
func test() (err error) {
ctx := context.Background()
g, gctx := errgroup.WithContext(ctx)
g.Go(func() error {
return task1(gctx)
})
g.Go(func() error {
return task2(gctx)
})
err = g.Wait()
if err != nil {
fmt.Println("wait done")
}
return err
}
func main() {
fmt.Println("main")
err := test()
if err != nil {
fmt.Println("main err")
fmt.Println(err.Error())
}
}
It's up to your tasks to handle context cancellation properly, rather than calling time.Sleep inside a select.
As stated in errgroup documentation:
WithContext returns a new Group and an associated Context derived from ctx.
The derived Context is canceled the first time a function passed to Go returns a non-nil error or the first time Wait returns, whichever occurs first.
You are using errgroup correctly, but your context handling needs refactoring.
Here is a refactor of your task2:
func task2(ctx context.Context) error {
errCh := make(chan bool, 1) // buffered, so this goroutine can still exit if ctx.Done() wins the select
go func() {
time.Sleep(50 * time.Second)
errCh <- true
}()
select {
case <-ctx.Done():
return fmt.Errorf("context done: %w", ctx.Err())
case <-errCh:
return errors.New("task 2 failed")
}
}
With such a select, you wait for whichever channel fires first. In this case that is the context cancellation, unless you lower the sleep duration.
I have a goroutine that is created when an inbound request comes in, and it is traced with OpenTelemetry:
qsSpan.AddEvent("does this print? yes it does")
go func(natsContext context.Context, nr *NewRequest) {
defer func() {
if panicked := recover(); panicked != nil {
err = fmt.Errorf("this doesn't print")
qsSpan.RecordError(err) // IF PANIC, HOW DO I USE qsSPAN HERE??
qsSpan.SetStatus(codes.Error, err.Error())
}
}()
natsContext = callDatabase(natsContext, *nr) // HERE IS WHERE PANIC ORIGINATES
defer wg.Done()
defer func() { responseChannel <- *msg }()
defer natsContext.Done()
defer qsSpan.End()
}(natsContext, nr)
I'd like to record an error on qsSpan when the goroutine panics, but I can't, because the context dies.
I'm forcing a panic in callDatabase with:
func callDatabase(ctx context.Context, request NewRequest) context.Context {
var callDatabaseSpan trace.Span
ctx, callDatabaseSpan = otel.Tracer("").Start(ctx, "callDatabase")
defer callDatabaseSpan.End()
minTime := 2.0
maxTime := 3.0
time.Sleep(time.Second * (time.Duration(minTime + rand.Float64()*(maxTime-minTime)))) // simulate db doing something
callDatabaseSpan.AddEvent(fmt.Sprintf("refreshCount: %d %s %s %s\n", request.RefreshCount, request.DbQueryToRun, request.Requestid, time.Now().String()[:24]))
panicVar := []string{"a", "b"}
_ = panicVar[2] // FORCE PANIC HERE
return ctx
}
I believe the issue has to do with my context being killed when callDatabase panics, so the trace is lost. But I want to use the span created when the request comes in and record the panic as an error there, instead of having to initialise a new span for the panic.
EDIT:
I don't want to have to do this (which works); I want to use qsSpan:
qsSpan.AddEvent("does this print? yes it does")
go func(natsContext context.Context, nr *NewRequest) {
defer func() {
if panicked := recover(); panicked != nil {
ctxPanic := context.Background()
var pSpan trace.Span
ctxPanic, pSpan = otel.Tracer("").Start(ctxPanic, "panic") // INITIALISING NEW SPAN FOR PANIC WORKS, BUT I WANT TO USE qsSPAN TO TRACE PANIC ORIGIN
err := fmt.Errorf("calling calldatabase caused panic")
pSpan.RecordError(err)
pSpan.SetStatus(codes.Error, err.Error())
pSpan.End()
ctxPanic.Done()
}
}()
natsContext = callDatabase(natsContext, *nr)
defer wg.Done()
defer func() { responseChannel <- *msg }()
defer natsContext.Done()
defer qsSpan.End()
}(natsContext, nr)
Thanks to @SDJ for giving me the idea of how to fix this.
I needed to create a span inside the goroutine that handles the database call (callDatabase) and then use that span to record the error/panic. In other words, create an intermediate span between the inbound request and the work done inside the goroutine that processes it:
go func(natsContext context.Context, nr *NewRequest) {
var hsSpan trace.Span
natsContext, hsSpan = otel.Tracer("").Start(natsContext, "handlerequest") // CREATE THE SPAN HERE
defer func() {
if panicked := recover(); panicked != nil {
err := fmt.Errorf("calling calldatabase caused panic") // RECORD PANIC USING SPAN CREATED IN THIS ROUTINE
hsSpan.RecordError(err)
hsSpan.SetStatus(codes.Error, err.Error())
response = err.Error()
msg.Respond([]byte(response))
} else {
func() { responseChannel <- *msg }() // IF NO PANIC SIGNAL THAT callDatabase DONE WHEN ROUTINE FINISHES
}
}()
natsContext = callDatabase(natsContext, *nr) // CONTEXT PASSED AND RETURNED TO CONTINUE TRACING callDatabase FUNCTION
defer wg.Done()
defer natsContext.Done()
defer hsSpan.End()
}(natsContext, nr)
Now I have a trace for the request and for its processing, with any panic recorded on a span linked to them, so I can identify which request triggered a panic.
I try to send an error on a channel during recovery.
Why is this error not sent to the channel?
package main
import (
"fmt"
"sync"
"errors"
)
func main() {
var wg sync.WaitGroup
wg.Add(1)
batchErrChan := make(chan error)
go func(errchan chan error) {
defer func() {
if r := recover(); r != nil {
errchan <- errors.New("recover err")
}
close(batchErrChan)
wg.Done()
}()
panic("ddd")
}(batchErrChan)
go func() {
for _ = range batchErrChan {
fmt.Println("err in range")
}
}()
wg.Wait()
}
https://play.golang.org/p/0ytunuYDWZU
I expect "err in range" to be printed, but it is not. Why?
Your program ends before the goroutine gets a chance to print the message. Try waiting for it:
...
done := make(chan struct{})
go func() {
for _ = range batchErrChan {
fmt.Println("err in range")
}
close(done)
}()
wg.Wait()
<-done
}
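Putting the fix together, a complete version of the program could look like this; collect and the count it returns are added only to make the result checkable:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// collect runs the panicking producer and a consumer, waits for both, and
// returns how many errors the consumer received.
func collect() int {
	var wg sync.WaitGroup
	wg.Add(1)
	batchErrChan := make(chan error)

	go func(errchan chan error) {
		defer func() {
			if r := recover(); r != nil {
				errchan <- errors.New("recover err")
			}
			close(errchan) // ends the consumer's range loop
			wg.Done()
		}()
		panic("ddd")
	}(batchErrChan)

	done := make(chan struct{})
	count := 0
	go func() {
		for range batchErrChan {
			fmt.Println("err in range")
			count++
		}
		close(done) // signals that the consumer has drained the channel
	}()

	wg.Wait()
	<-done // without this, main may return before "err in range" is printed
	return count
}

func main() {
	fmt.Println("errors received:", collect())
}
```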
I understand that recover is used to handle a panic, but the following block fails to recover when the panic arises in a goroutine:
func main() {
done := make(chan int64)
defer fmt.Println("Graceful End of program")
defer func() {
r := recover()
if _, ok := r.(error); ok {
fmt.Println("Recovered")
}
}()
go handle(done)
for {
select{
case <- done:
return
}
}
}
func handle(done chan int64) {
var a *int64
a = nil
fmt.Println(*a)
done <- *a
}
However, the following block executes as expected:
func main() {
done := make(chan int64)
defer fmt.Println("Graceful End of program")
defer func() {
r := recover()
if _, ok := r.(error); ok {
fmt.Println("Recovered")
}
}()
handle(done)
for {
select{
case <- done:
return
}
}
}
func handle(done chan int64) {
var a *int64
a = nil
fmt.Println(*a)
done <- *a
}
How can I recover from panics that arise in goroutines? Here is the playground link: https://play.golang.org/p/lkvKUxMHjhi
Recover only works when called from the same goroutine as the panic is called in. From the Go blog:
The process continues up the stack until all functions in the current
goroutine have returned, at which point the program crashes
You would have to have a deferred recover within the goroutine.
https://blog.golang.org/defer-panic-and-recover
The spec says the same:
While executing a function F, an explicit call to panic or a run-time
panic terminates the execution of F. Any functions deferred by F are
then executed as usual. Next, any deferred functions run by F's caller
are run, and so on up to any deferred by the top-level function in the
executing goroutine. At that point, the program is terminated and the
error condition is reported, including the value of the argument to
panic. This termination sequence is called panicking
https://golang.org/ref/spec#Handling_panics
I handle this case as follows, and it works as expected:
package main
import (
"fmt"
"time"
)
func main() {
defer fmt.Println("Graceful End of program")
defer func() {
if r := recover(); r != nil {
fmt.Println("Recovered:", r)
}
}()
done := make(chan int64)
var panicVar interface{}
go handle(done, &panicVar)
WAIT:
for {
select {
case <-done:
break WAIT // break WAIT: goroutine exit normally
default:
if panicVar != nil {
break WAIT // break WAIT: goroutine exit panicked
}
// wait for goroutine exit
}
time.Sleep(1 * time.Microsecond)
}
if panicVar != nil {
panic(panicVar) // panic again
}
}
func handle(done chan int64, panicVar *interface{}) {
defer func() {
if r := recover(); r != nil {
// pass panic variable outside
*panicVar = r
}
}()
var a *int64
a = nil
fmt.Println(*a)
done <- *a
}
Playground link: https://play.golang.org/p/t0wXwB02pa3
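In the approach above, panicVar is written by one goroutine and read by another without synchronization, which is a data race (go run -race should report it), and the loop busy-waits with time.Sleep. A minimal sketch of a variant that sends either the result or the recovered panic value over one channel, so the receiver just blocks on a single receive; outcome and run are hypothetical names:

```go
package main

import "fmt"

// outcome carries either a normal result or a recovered panic value.
type outcome struct {
	value    int64
	panicVal interface{}
}

// handle runs the risky work and always sends exactly one outcome, even if
// it panics, so the receiver never needs to poll or sleep.
func handle(out chan<- outcome, p *int64) {
	defer func() {
		if r := recover(); r != nil {
			out <- outcome{panicVal: r}
		}
	}()
	out <- outcome{value: *p} // panics with a nil-pointer dereference if p is nil
}

// run waits for the goroutine and re-panics in the calling goroutine if the
// worker panicked, so the deferred recover here can handle it.
func run(p *int64) (v int64, recovered interface{}) {
	defer func() {
		if r := recover(); r != nil {
			recovered = r
		}
	}()
	out := make(chan outcome, 1)
	go handle(out, p)
	o := <-out
	if o.panicVal != nil {
		panic(o.panicVal) // panic again, now in this goroutine
	}
	return o.value, nil
}

func main() {
	defer fmt.Println("Graceful End of program")
	x := int64(42)
	fmt.Println(run(&x))
	fmt.Println(run(nil))
}
```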
I have a problem using sync.WaitGroup and select together. In the following HTTP request pool, if an error occurs it is never reported, because wg.Wait() blocks and nothing reads from the channel anymore.
package pool
import (
"fmt"
"log"
"net/http"
"sync"
)
var (
MaxPoolQueue = 100
MaxPoolWorker = 10
)
type Pool struct {
wg *sync.WaitGroup
queue chan *http.Request
errors chan error
}
func NewPool() *Pool {
return &Pool{
wg: &sync.WaitGroup{},
queue: make(chan *http.Request, MaxPoolQueue),
errors: make(chan error),
}
}
func (p *Pool) Add(r *http.Request) {
p.wg.Add(1)
p.queue <- r
}
func (p *Pool) Run() error {
for i := 0; i < MaxPoolWorker; i++ {
go p.doWork()
}
select {
case err := <-p.errors:
return err
default:
p.wg.Wait()
}
return nil
}
func (p *Pool) doWork() {
for r := range p.queue {
fmt.Printf("Request to %s\n", r.Host)
p.wg.Done()
_, err := http.DefaultClient.Do(r)
if err != nil {
log.Fatal(err)
p.errors <- err
} else {
fmt.Printf("no error\n")
}
}
}
How can I still use WaitGroup but also get errors from goroutines?
I found the answer myself while writing the question, and since I think it is an interesting case, I would like to share it.
The trick to using sync.WaitGroup and a channel together is to wrap:
select {
case err := <-p.errors:
return err
default:
p.wg.Wait()
}
in a for loop:
for {
select {
case err := <-p.errors:
return err
default:
p.wg.Wait()
}
}
In this case select will always check for errors and wait if nothing happens :)
It looks a bit like the fail-fast mechanism enabled by the Tomb library (Tomb V2 GoDoc):
The tomb package handles clean goroutine tracking and termination.
If any of the tracked goroutines returns a non-nil error, or the Kill or Killf method is called by any goroutine in the system (tracked or not), the tomb Err is set, Alive is set to false, and the Dying channel is closed to flag that all tracked goroutines are supposed to willingly terminate as soon as possible.
Once all tracked goroutines terminate, the Dead channel is closed, and Wait unblocks and returns the first non-nil error presented to the tomb via a result or an explicit Kill or Killf method call, or nil if there were no errors.
You can see an example in this playground (extract):
// start runs all the given functions concurrently
// until either they all complete or one returns an
// error, in which case it returns that error.
//
// The functions are passed a channel which will be closed
// when the function should stop.
func start(funcs []func(stop <-chan struct{}) error) error {
var tomb tomb.Tomb
var wg sync.WaitGroup
allDone := make(chan struct{})
// Start all the functions.
for _, f := range funcs {
f := f
wg.Add(1)
go func() {
defer wg.Done()
if err := f(tomb.Dying()); err != nil {
tomb.Kill(err)
}
}()
}
// Start a goroutine to wait for them all to finish.
go func() {
wg.Wait()
close(allDone)
}()
// Wait for them all to finish, or one to fail
select {
case <-allDone:
case <-tomb.Dying():
}
tomb.Done()
return tomb.Err()
}
A simpler implementation is shown below (see it on the Go playground: https://play.golang.org/p/TYxxsDRt5Wu):
package main
import (
"fmt"
"sync"
"time"
)
type Error struct {
message string
}
func (e Error) Error() string {
return e.message
}
func main() {
var wg sync.WaitGroup
waitGroupLength := 8
errChannel := make(chan error, 1)
// Setup waitgroup to match the number of go routines we'll launch off
wg.Add(waitGroupLength)
finished := make(chan bool, 1) // this along with wg.Wait() are why the error handling works and doesn't deadlock
for i := 0; i < waitGroupLength; i++ {
go func(i int) {
fmt.Printf("Go routine %d executed\n", i+1)
time.Sleep(time.Duration(waitGroupLength - i)) // simulate work (the duration is in nanoseconds)
if i%4 == 1 {
errChannel <- Error{fmt.Sprintf("Errored on routine %d", i+1)}
}
// Mark the wait group as Done so it does not hang
wg.Done()
}(i)
}
go func() {
wg.Wait()
close(finished)
}()
L:
for {
select {
case <-finished:
break L // this will break from loop
case err := <-errChannel:
if err != nil {
fmt.Println("error ", err)
// handle your error
}
}
}
fmt.Println("Executed all go routines")
}