Why doesn't panic show all running goroutines? - go

Page 253 of The Go Programming Language states:
... if instead of returning from main in the event of cancellation, we execute a call to panic, then the runtime will dump the stack of every goroutine in the program.
This code deliberately leaks a goroutine by waiting on a channel that never has anything to receive:
package main
import (
"fmt"
"time"
)
func main() {
never := make(chan struct{})
go func() {
defer fmt.Println("End of child")
<-never
}()
time.Sleep(10 * time.Second)
panic("End of main")
}
However, the runtime only lists the main goroutine when panic is called:
panic: End of main
goroutine 1 [running]:
main.main()
/home/simon/panic/main.go:15 +0x7f
exit status 2
If I press Ctrl-\ to send SIGQUIT during the ten seconds before main panics, I do see the child goroutine listed in the output:
goroutine 1 [sleep]:
time.Sleep(0x2540be400)
/usr/lib/go-1.17/src/runtime/time.go:193 +0x12e
main.main()
/home/simon/panic/main.go:14 +0x6c
goroutine 18 [chan receive]:
main.main.func1()
/home/simon/panic/main.go:12 +0x76
created by main.main
/home/simon/panic/main.go:10 +0x5d
I thought maybe the channel was getting closed as panic runs (which still wouldn't guarantee the deferred fmt.Println had time to execute), but I get the same behaviour if the child goroutine does a time.Sleep instead of waiting on a channel.
I know there are ways to dump goroutine stacktraces myself, but my question is why doesn't panic behave as described in the book? The language spec only says that a panic will terminate the program, so is the book simply describing implementation-dependent behaviour?

Thanks to kostix for pointing me to the GOTRACEBACK runtime environment variable. Setting it to all instead of leaving it at the default of single restores the behaviour described in TGPL. Note that this variable is significant to the runtime, but you can't manipulate it with go env.
The default of listing only the panicking goroutine is a change made in Go 1.6. My edition of the book is copyrighted 2016 and gives Go 1.5 as the prerequisite for its example code, so it must predate the change. Reading the change discussion, it's interesting that there was concern about hiding useful information (as the recipient of many an incomplete error report, I can sympathise with this), but nobody called out the issue of scaling to large production systems that kostix mentioned.
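As a rough sketch of the same idea from inside the program: runtime/debug.SetTraceback accepts the same levels as GOTRACEBACK, and runtime.Stack with its second argument set to true returns the stacks of every goroutine without panicking. The helper name childListedInDump is made up for this illustration, and the 50 ms sleep is an arbitrary assumption to let the child reach its receive.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"strings"
	"time"
)

// childListedInDump (hypothetical helper) leaks a goroutine that blocks
// on a channel nobody sends to, then reports whether a dump of all
// goroutine stacks mentions a goroutine parked on a channel receive.
func childListedInDump() bool {
	never := make(chan struct{})
	go func() {
		<-never
	}()

	// Assumption: 50 ms is plenty for the child to reach its receive.
	time.Sleep(50 * time.Millisecond)

	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true) // true: include every goroutine
	return strings.Contains(string(buf[:n]), "chan receive")
}

func main() {
	// Takes the same level names as the GOTRACEBACK environment variable;
	// an unrecovered panic after this call should list all goroutines.
	debug.SetTraceback("all")

	fmt.Println("child goroutine listed:", childListedInDump())
}
```

Replacing the final Println with a panic should then produce output similar to the SIGQUIT dump in the question, though the exact format depends on the Go version.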

Related

Why does my simple Go program end in a deadlock?

This is my entire Go code. What confuses me is that case balances <- balance: doesn't occur, and I don't know why.
package main
import (
"fmt"
)
func main() {
done := make(chan int)
var balance int
balances := make(chan int)
balance = 1
go func() {
fmt.Println(<-balances)
done <- 1
}()
select {
case balances <- balance:
fmt.Println("done case")
default:
fmt.Println("default case")
}
<-done
}
default case
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/tmp/sandbox575832950/prog.go:29 +0x13d
goroutine 18 [chan receive]:
main.main.func1()
/tmp/sandbox575832950/prog.go:17 +0x38
created by main.main
/tmp/sandbox575832950/prog.go:16 +0x97
The main goroutine executes the select before the anonymous goroutine function executes the receive from balances. The main goroutine executes the default clause in the select because there is no ready receiver on balances. The main goroutine continues on to receive on done.
The goroutine blocks on receive from balances because there is no sender. Main continued past the send by taking the default clause.
The main goroutine blocks on receive from done because there is no sender. The goroutine is blocked on receive from balances.
Fix by replacing the select statement with balances <- balance. The default clause causes the problem. When the default clause is removed, all that remains in the select is the send to balances.
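A minimal sketch of that fix, wrapped in a hypothetical helper named deliver so the result can be checked. The plain send blocks until the goroutine is ready to receive, so the order in which the two goroutines start no longer matters.

```go
package main

import "fmt"

// deliver (hypothetical helper) sends balance on an unbuffered channel
// with a plain send and returns the value the receiving goroutine saw.
func deliver(balance int) int {
	balances := make(chan int)
	done := make(chan int)

	go func() {
		// Receive the balance, then hand it back to the caller.
		done <- <-balances
	}()

	// No select and no default: this blocks until the goroutine receives.
	balances <- balance
	return <-done
}

func main() {
	fmt.Println(deliver(1))
}
```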
Because of concurrency, there's no guarantee that the goroutine will execute before the select. We can see this by adding a print to the goroutine.
go func() {
fmt.Println("Here")
fmt.Println(<-balances)
done <- 1
}()
$ go run test.go
default case
Here
fatal error: all goroutines are asleep - deadlock!
...
If the select runs first, balances <- balance would block; balances has no buffer and nothing is trying to read from it. case balances <- balance would block so select skips it and executes its default.
Then the goroutine runs and blocks reading balances. Meanwhile the main code blocks reading done. Deadlock.
You can solve this by either removing the default case from the select and allowing it to block until balances is ready to be written to.
select {
case balances <- balance:
fmt.Println("done case")
}
Or you can add a buffer to balances so it can be written to before it is read from. Then case balances <- balance does not block.
balances := make(chan int, 1)
What confuses me is that case balances <- balance: doesn't occur
To be specific: it's because of select with a default case.
Whenever you create a new goroutine with go ...(), there is no guarantee about whether the invoking goroutine, or the invoked goroutine, will run next.
In practice it's likely that the next statements in the invoking goroutine will execute next (there being no particularly good reason to stop it). Of course, we should write programs that function correctly all the time, not just some, most, or even almost all the time! Concurrent programming with go ...() is all about synchronizing the goroutines so that the intended behavior must occur. Channels can do that, if used properly.
I think the balances channel can receive data
It's an unbuffered channel, so it can receive data if someone is reading from it. Otherwise, that write to the channel will block. Which brings us back to select.
Since you provided a default case, it's quite likely that the goroutine that invoked go ...() will continue to execute, and a select that can't immediately choose another case will choose default. So it is very unlikely that the invoked goroutine will be ready to read from balances before the main goroutine has already tried to write to it, failed, and gone on to the default case.
You can solve this by either removing the default case from the select and allowing it to block until balances is ready to be written to.
You sure can, as @Schwern points out. But it's important that you understand you don't necessarily need select to use channels. Instead of a select with just one case, you could simply write
balances <- balance
fmt.Println("done")
select is not required in this case, default is working against you, and there's just one case otherwise, so there's no need for select. You want the main function to block on that channel.
you can add a buffer to balances so it can be written to before it is read from.
Sure. But again, it's important to understand that a channel blocking both sender and receiver until they're both ready to communicate is a valid, useful, and common use of channels. Unbuffered channels are not the cause of your problem. Providing a default case for your select, and thus a path for unintended behavior, is the cause.

How is an os.Signal channel handled internally in Go?

Consider this code:
package main
import (
"os"
"os/signal"
)
func main() {
sig := make(chan os.Signal, 1)
signal.Notify(sig)
<-sig
}
It runs without problem, of course, blocking until you send a signal that interrupts the program.
But:
package main
func main() {
sig := make(chan int, 1)
<-sig
}
throws this error:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/home/user/project/src/main.go:5 +0x4d
exit status 2
While I understand why reading from the int channel causes a deadlock, I only have a suspicion that the os.Signal version doesn't because its channel can receive writes from "the outside": it handles signals, and they come from outside the program.
Is my suspicion somewhat correct? If so, how does the runtime handle this differently from other channel types?
Thank you!
You have a deadlock because you try to receive a message from a channel while no other goroutine is running, so no sender exists. By contrast, the call to signal.Notify starts a watchSignalLoop() goroutine in the background; you can verify the implementation details at https://golang.org/src/os/signal/signal.go.
Channels don't care about the element type unless it is larger than 64kB (strictly speaking, there are other nuances; please check the implementation).
Don't guess about how the runtime works; research it. For example, you can check what happens when you call make(chan int). Run go tool compile -S main.go | grep main.go:line of make chan and check which function from the runtime package is called. Then jump to that file and invest the time to understand the implementation. You will see that the implementation of channels is thin and straightforward compared to other things.
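One way to observe that background goroutine, as a rough sketch: register a channel with signal.Notify and look for frames from the os/signal package in a dump of all goroutine stacks. The helper name signalLoopRunning is invented here, the 50 ms sleep is an arbitrary assumption, and the frame names in the dump are an implementation detail that may differ between Go versions.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"strings"
	"time"
)

// signalLoopRunning (hypothetical helper) registers for os.Interrupt and
// reports whether a goroutine from the os/signal package then appears in
// a dump of all goroutine stacks.
func signalLoopRunning() bool {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	defer signal.Stop(sig)

	// Assumption: 50 ms is enough for the background goroutine to start.
	time.Sleep(50 * time.Millisecond)

	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true) // true: include every goroutine
	return strings.Contains(string(buf[:n]), "os/signal.")
}

func main() {
	fmt.Println("os/signal goroutine present:", signalLoopRunning())
}
```

If the check reports true, the program contains a second goroutine that can still deliver a value to the channel, which is consistent with the runtime not reporting a deadlock.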
Hope it helps!

Why is a goroutine blocked on a channel considered a deadlock?

As per the definition here, deadlock is related to resource contention.
In an operating system, a deadlock occurs when a process or thread enters a waiting state because a requested system resource is held by another waiting process, which in turn is waiting for another resource held by another waiting process. If a process is unable to change its state indefinitely because the resources requested by it are being used by another waiting process, then the system is said to be in a deadlock.
In the below code:
package main
import "fmt"
func main() {
c := make(chan string)
c <- "John"
fmt.Println("main() stopped")
}
The main() goroutine blocks until some other goroutine (there is none) reads the data from that channel,
but the output shows:
$ bin/cs61a
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send]:
main.main()
/home/user/../myhub/cs61a/Main.go:8 +0x54
$
edit:
For the point: "the main goroutine–which is blocked, hence all goroutines are blocked, hence it's a deadlock." In the code below, the non-main goroutine is also blocked on a channel. Aren't all goroutines supposed to be blocked?
package main
import "fmt"
func makeRandom(randoms chan int) {
var ch chan int
fmt.Printf("print 1\n")
<-ch
fmt.Printf("print 2\n")
}
func main() {
randoms := make(chan int)
go makeRandom(randoms)
}
Edit 2:
For your point in the answer: "not all your goroutines are blocked so it's not a deadlock". In the below code, only main() goroutine is blocked, but not worker():
package main
import (
"fmt"
)
func worker() {
fmt.Printf("some work\n")
}
func main() {
ch := make(chan int)
go worker()
<-ch
}
and the output says deadlock:
$ bin/cs61a
some work
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
main.main()
/home/user/code/src/github.com/myhub/cs61a/Main.go:18 +0x6f
$
Ideally main() should not exit, because the channel resource is in use by one of the goroutines.
Why is a goroutine blocked on a channel considered a deadlock?
In Go a deadlock is when all existing goroutines are blocked.
Your example has a single goroutine–the main goroutine–which is blocked, hence all goroutines are blocked, hence it's a deadlock.
Note: since all goroutines are blocked, new goroutines will not (cannot) be launched (because they can only be launched from running goroutines). And if all goroutines are blocked and cannot do anything, there is no point in waiting forever for nothing. So the runtime exits.
Edit:
Your edited code where you use a sleep in main is a duplicate of this: Go channel deadlock is not happening. Basically a sleep is not a blocking forever operation (the sleep duration is finite), so a goroutine sleeping is not considered in deadlock detection.
Edit #2:
Since then you removed the sleep() but it doesn't change anything. You have 2 goroutines: the main and the one executing makeRandom(). makeRandom() is blocked and main() isn't. So not all your goroutines are blocked so it's not a deadlock.
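To illustrate that point, here is a small sketch in which a child goroutine blocks forever on a nil channel while the calling goroutine only sleeps for a finite time. The helper name blockedWhileMainRuns and the 50 ms sleep are assumptions made for this example. The program is expected to finish normally, with the blocked goroutine still in existence when main returns.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// blockedWhileMainRuns (hypothetical helper) starts a goroutine that
// blocks forever and returns how many extra goroutines exist afterwards.
func blockedWhileMainRuns() int {
	before := runtime.NumGoroutine()

	go func() {
		var ch chan int
		<-ch // receiving from a nil channel blocks forever
	}()

	// A finite sleep is not "blocked forever", so the deadlock detector
	// does not fire while the caller waits here.
	time.Sleep(50 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("extra goroutines still blocked:", blockedWhileMainRuns())
	// main returns here and the program exits without a deadlock error,
	// because main itself was never blocked indefinitely.
}
```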
Edit #3:
In your last example, when the runtime detects the deadlock, only a single goroutine still exists: main(). It's true that you launch a goroutine executing worker(), but that only prints some text and terminates. "Past" goroutines do not count; terminated goroutines also can't do anything to change the blocked state of existing goroutines. Only existing goroutines count.
Check out this article to understand exactly why a goroutine blocked on a channel is considered a deadlock:
http://dmitryvorobev.blogspot.com/2016/08/golang-channels-implementation.html
In your example above, the main goroutine gets added to the waiting queue(sendq) and cannot be released until Go runs some goroutine that receives a value from the channel.

Why is the code in a loop not executed when I have two goroutines?

I'm facing a problem in golang
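package main
import (
"fmt"
"time"
)
var a = 0
func main() {
// The goroutine writes a without any synchronization.
go func() {
for {
a = a + 1
}
}()
time.Sleep(time.Second)
// Unsynchronized read of a: this is the data race discussed below.
fmt.Printf("result=%d\n", a)
}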
expected: result=(a big int number)
result: result=0
You have a race condition.
Run your program with the -race flag:
go run -race main.go
==================
WARNING: DATA RACE
Read at 0x0000005e9600 by main goroutine:
main.main()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:17 +0x6c
Previous write at 0x0000005e9600 by goroutine 6:
main.main.func1()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:13 +0x56
Goroutine 6 (running) created at:
main.main()
/home/jack/Project/GoProject/src/gitlab.com/hooshyar/GoNetworkLab/StackOVerflow/race/main.go:11 +0x46
==================
result=119657339
Found 1 data race(s)
exit status 66
What is the solution?
There are several solutions. One is to use a mutex:
package main
import (
"fmt"
"sync"
"time"
)
var a = 0
func main() {
var mu sync.Mutex // guards a
go func() {
for {
mu.Lock()
a = a + 1
mu.Unlock()
}
}()
time.Sleep(3*time.Second)
mu.Lock()
fmt.Printf("result=%d\n", a)
mu.Unlock()
}
Lock the mutex before every read and write, and unlock it afterwards. Now you no longer have a race, and the result will be a big int at the end.
For more information, read this topic:
Data races in Go(Golang) and how to fix them
and this
Golang concurrency - data races
As other writers have mentioned, you have a data race, but if you are comparing this behavior to, say, a program written in C using pthreads, you are missing some important data. Your problem is not just about timing, it's about the very language definition. Because concurrency primitives are baked into the language itself, the Go language memory model (https://golang.org/ref/mem) describes exactly when and how changes in one goroutine -- think of goroutines as "super-lightweight user-space threads" and you won't be too far off -- are guaranteed to be visible to code running in another goroutine.
Without any synchronizing actions, like channel sends/receives or sync.Mutex locks/unlocks, the Go memory model says that any changes you make to 'a' inside that goroutine don't ever have to be visible to the main goroutine. And, since the compiler knows that, it is free to optimize away pretty much everything in your for loop. Or not.
It's a similar situation to when you have, say, a local int variable in C set to 1, and maybe you have a while loop reading that variable in a loop waiting for it to be set to 0 by an ISR, but then your compiler gets too clever and decides to optimize away the test for zero because it thinks your variable can't ever change within the loop and you really just wanted an infinite loop, and so you have to declare the variable as volatile to fix the 'bug'.
If you are going to be working in Go, (my current favorite language, FWIW,) take time to read and thoroughly grok the Go memory model linked above, and it will really pay off in the future.
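Besides channels and mutexes, the sync/atomic package also provides synchronized access to a single integer. The following is only a sketch under that assumption, not the approach of any answer above; the helper name countForMillis and the stop/done channels are invented for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// countForMillis (hypothetical helper) lets a second goroutine increment
// a shared counter for roughly ms milliseconds and returns the result.
func countForMillis(ms int) int64 {
	var a int64
	stop := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		for {
			atomic.AddInt64(&a, 1) // synchronized write
			select {
			case <-stop:
				return
			default:
			}
		}
	}()

	time.Sleep(time.Duration(ms) * time.Millisecond)
	close(stop)
	<-done // wait until the loop has exited

	return atomic.LoadInt64(&a) // synchronized read
}

func main() {
	fmt.Printf("result=%d\n", countForMillis(1000))
}
```

Because every access to the counter goes through sync/atomic, the race detector should have nothing to report, and the compiler is not free to drop the increments.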
Your program is running into a race condition. Go can detect such scenarios.
Try running your program using go run -race main.go, assuming your file name is main.go. It will show how the race occurred:
an attempted write inside the goroutine,
a simultaneous read by the main goroutine.
It will also print a random int number, as you expected.

Ensuring goroutine cleanup, best practice

I have a fundamental understanding problem about how to make sure that spawned goroutines are "closed" properly in the context of long-running processes. I watched talks regarding that topic and read about best practices. In order to understand my question please refer to the video "Advanced Go Concurrency Patterns" here
For the following, if you run code on your machine please export the environment variable GOTRACEBACK=all so you are able to see routine states after panic.
I put the code for the original example here: naive (it does not execute on the Go playground, I guess because a time statement is used. Please copy the code and execute it locally)
The result of the panic of the naive implementation after execution is
panic: show me the stacks
goroutine 1 [running]:
panic(0x48a680, 0xc4201d8480)
/usr/lib/go/src/runtime/panic.go:500 +0x1a1
main.main()
/home/flx/workspace/go/go-rps/playground/ball-naive.go:18 +0x16b
goroutine 5 [chan receive]:
main.player(0x4a4ec4, 0x2, 0xc42006a060)
/home/flx/workspace/go/go-rps/playground/ball-naive.go:23 +0x61
created by main.main
/home/flx/workspace/go/go-rps/playground/ball-naive.go:13 +0x76
goroutine 6 [chan receive]:
main.player(0x4a4ec6, 0x2, 0xc42006a060)
/home/flx/workspace/go/go-rps/playground/ball-naive.go:23 +0x61
created by main.main
/home/flx/workspace/go/go-rps/playground/ball-naive.go:14 +0xad
exit status 2
That demonstrates the underlying problem of leaving dangling goroutines on the system, which is especially bad for long running processes.
So for my personal understanding I tried two slightly more sophisticated variants to be found here:
for-select with default
generator pattern with quit channel
(again, not executable on the playground, cause "process takes too long")
The first solution is not fitting for various reasons, even leading to non-determinism in executed steps, depending on goroutine execution speed.
Now I thought -- and here finally comes the question! -- that the second solution with the quit channel would be appropriate to eliminate all executional traces from the system before exiting. Anyhow, "sometimes" the program exits too fast and the panic reports an additional goroutine runnable still residing on the system. The panic output:
panic: show me the stacks
goroutine 1 [running]:
panic(0x48d8e0, 0xc4201e27c0)
/usr/lib/go/src/runtime/panic.go:500 +0x1a1
main.main()
/home/flx/workspace/go/go-rps/playground/ball-perfect.go:20 +0x1a9
goroutine 20 [runnable]:
main.player.func1(0xc420070060, 0x4a8986, 0x2, 0xc420070120)
/home/flx/workspace/go/go-rps/playground/ball-perfect.go:27 +0x211
created by main.player
/home/flx/workspace/go/go-rps/playground/ball-perfect.go:36 +0x7f
exit status 2
My question is: that should not happen, right? I do use a quit channel to cleanup state before stepping forward to panicking.
I did a final try of implementing safe cleanup behavior here:
artificial wait time for runnables to close
Anyhow, that solution does not feel right and may as well not be applicable to large amounts of runnables?
What would be the recommended and most idiomatic pattern to ensure correct cleanup?
Thanks for your time
You are fooled by the output: your "generator pattern with quit channel" works perfectly fine, and the two goroutines actually are terminated properly.
You see them in the trace because you panic too early. Remember: you have two goroutines running concurrently with main. main "stops" these goroutines by signaling on the quit channel. After these two sends on lines 18 and 19, the two receives on line 32 have happened. And nothing more! You still have three goroutines running: main is between lines 19 and 20, and the player goroutines are between lines 32 and 33. If the panic in main now happens before the return in player, the player goroutines are still there and are shown in the panic stack trace. These goroutines would have ended several milliseconds later if only the scheduler had had time to execute the return on line 33 (which it hadn't, as you killed it by panicking).
This is an instance of the "main ends too early to see concurrent goroutines do work" problem asked once a month here. You do see the concurrent goroutines doing work, but not all of it. You might try sleeping 2 milliseconds before the panic; your player goroutines will then have time to execute the return, and everything is fine.
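Instead of sleeping for a guessed duration, a commonly used alternative is to wait on a sync.WaitGroup. This is a different technique from the sleep suggested above, sketched here with invented names (runPlayers, finished) rather than the question's actual player code. Note that Wait only guarantees each goroutine has reached its deferred Done call, so a goroutine might in principle still be visible for an instant while it exits.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runPlayers (hypothetical helper) starts n goroutines that wait for a
// quit signal, signals them, and blocks until all of them have finished.
// It returns how many goroutines ran to completion.
func runPlayers(n int) int {
	quit := make(chan struct{})
	var wg sync.WaitGroup
	var finished int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done() // runs as the goroutine returns
			<-quit          // stand-in for the player's work loop
			atomic.AddInt64(&finished, 1)
		}()
	}

	close(quit) // one close is seen by every waiting goroutine
	wg.Wait()   // returns only after every goroutine has called Done
	return int(atomic.LoadInt64(&finished))
}

func main() {
	fmt.Println("players finished:", runPlayers(2))
	// A panic placed here runs after both players have finished their work.
}
```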
