Understanding Go channel deadlocks - go

package main
import (
"fmt"
"time"
)
func main() {
p := producer()
for c := range p {
fmt.Println(c)
}
}
func producer() <-chan string {
ch := make(chan string)
go func() {
for i := 0; i < 5; i++ {
ch <- fmt.Sprint("hello", i)
time.Sleep(1 * time.Second)
}
// commented the below to show the issue
// close(ch)
}()
return ch
}
Running the above code will print 5 messages and then give a "all go routines are a sleep - deadlock error". I understand that if I close the channel the error is gone.
The thing I would like to understand is how does go runtime know that the code will be waiting infinitely on the channel and that there is nothing else that will be sending data into the channel.
Now if I add an additional go routine to the main() function.. it does not throw any error and keeps waiting on the channel.
go func() {
for {
time.Sleep(2 * time.Millisecond)
}
}()
So does this mean.. the go runtime is just looking for presence of a running go routine that could potentially send data into the channel and hence not throwing the deadlock error ?

If you want some more insight into how Go implements the deadlock detection, have a look at the place in the code that throws the "all goroutines are asleep - deadlock!": https://github.com/golang/go/blob/master/src/runtime/proc.go#L3751
It looks like the Go runtime keeps some fairly simple accounting on how many goroutines there are, how many are idle, and how many are sleeping for locks (not sure which one sleep on channel I/O will increment). At any given time (serialized with the rest of the runtime), it just does some arithmetic and checks if all - idle - locked > 0... if so, then the program could still make progress... if it's 0, then you're definitely deadlocked.
It's possible you could introduce a livelock by preventing a goroutine from sleeping via an infinite loop (like what you did in your experiment, and apparently sleep for timers isn't treated the same by the runtime). The runtime wouldn't be able to detect a deadlock in that case, and run forever.
Furthermore, I'm not sure when exactly the runtime checks for deadlocks- further inspection of who calls that checkdead() may yield some insight there, if you're interested.
DISCLAIMER- I'm not a Go core developer, I just play one on TV :-)

The runtime panics with the "all go routines are a sleep - deadlock error" error when all goroutines are blocked on channel and mutex operations.
The sleeping goroutine does not block on one of these operations. There is no deadlock and therefore no panic.

Related

Sending data from one goroutine to multiple other goroutines

In a project the program receives data via websocket. This data needs to be processed by n algorithms. The amount of algorithms can change dynamically.
My attempt is to create some pub/sub pattern where subscriptions can be started and canceled on the fly. Turns out that this is a bit more challenging than expected.
Here's what I came up with (which is based on https://eli.thegreenplace.net/2020/pubsub-using-channels-in-go/):
package pubsub
import (
"context"
"sync"
"time"
)
type Pubsub struct {
sync.RWMutex
subs []*Subsciption
closed bool
}
func New() *Pubsub {
ps := &Pubsub{}
ps.subs = []*Subsciption{}
return ps
}
func (ps *Pubsub) Publish(msg interface{}) {
ps.RLock()
defer ps.RUnlock()
if ps.closed {
return
}
for _, sub := range ps.subs {
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
}
}
func (ps *Pubsub) Subscribe() (context.Context, *Subsciption, error) {
ps.Lock()
defer ps.Unlock()
// prep channel
ctx, cancel := context.WithCancel(context.Background())
sub := &Subsciption{
Data: make(chan interface{}, 1),
cancel: cancel,
ps: ps,
}
// prep subsciption
ps.subs = append(ps.subs, sub)
return ctx, sub, nil
}
func (ps *Pubsub) unsubscribe(s *Subsciption) bool {
ps.Lock()
defer ps.Unlock()
found := false
index := 0
for i, sub := range ps.subs {
if sub == s {
index = i
found = true
}
}
if found {
s.cancel()
ps.subs[index] = ps.subs[len(ps.subs)-1]
ps.subs = ps.subs[:len(ps.subs)-1]
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
}
return found
}
func (ps *Pubsub) Close() {
ps.Lock()
defer ps.Unlock()
if !ps.closed {
ps.closed = true
for _, sub := range ps.subs {
sub.cancel()
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(sub.Data)
}
}
}
type Subsciption struct {
Data chan interface{}
cancel func()
ps *Pubsub
}
func (s *Subsciption) Unsubscribe() {
s.ps.unsubscribe(s)
}
As mentioned in the comments there are (at least) two issues with this:
ISSUE1:
After a while of running in implementation of this I get a few errors of this kind:
goroutine 120624 [runnable]:
bm/internal/pubsub.(*Pubsub).Publish.func1(0x8586c0, 0xc00b44e880, 0xc008617740)
/home/X/Projects/bm/internal/pubsub/pubsub.go:30
created by bookmaker/internal/pubsub.(*Pubsub).Publish
/home/X/Projects/bm/internal/pubsub/pubsub.go:30 +0xbb
Without really understanding this it appears to me that the goroutines created in Publish() do accumulate/leak. Is this correct and what am I doing wrong here?
ISSUE2:
When I end a subscription via Unsubscribe() it occurs that Publish() tried to write to a closed channel and panics. To mitigate this I have created a goroutine to close the channel with a delay. This feel really off-best-practice but I was not able to find a proper solution to this. What would be a deterministic way to do this?
Heres a little playground for you to test with: https://play.golang.org/p/K-L8vLjt7_9
Before diving into your solution and its issues, let me recommend again another Broker approach presented in this answer: How to broadcast message using channel
Now on to your solution.
Whenever you launch a goroutine, always think of how it will end and make sure it does if the goroutine is not ought to run for the lifetime of your app.
// ISSUE1: These goroutines apparently do not exit properly...
go func(ch chan interface{}) {
ch <- msg
}(sub.Data)
This goroutine tries to send a value on ch. This may be a blocking operation: it will block if ch's buffer is full and there is no ready receiver on ch. This is out of the control of the launched goroutine, and also out of the control of the pubsub package. This may be fine in some cases, but this already places a burden on the users of the package. Try to avoid these. Try to create APIs that are easy to use and hard to misuse.
Also, launching a goroutine just to send a value on a channel is a waste of resources (goroutines are cheap and light, but you shouldn't spam them whenever you can).
You do it because you don't want to get blocked. To avoid blocking, you may use a buffered channel with a "reasonable" high buffer. Yes, this doesn't solve the blocking issue, in only helps with "slow" clients receiving from the channel.
To "truly" avoid blocking without launching a goroutine, you may use non-blocking send:
select {
case ch <- msg:
default:
// ch's buffer is full, we cannot deliver now
}
If send on ch can proceed, it will happen. If not, the default branch is chosen immediately. You have to decide what to do then. Is it acceptable to "lose" a message? Is it acceptable to wait for some time until "giving up"? Or is it acceptable to launch a goroutine to do this (but then you'll be back at what we're trying to fix here)? Or is it acceptable to get blocked until the client can receive from the channel...
Choosing a reasonable high buffer, if you encounter a situation when it still gets full, it may be acceptable to block until the client can advance and receive from the message. If it can't, then your whole app might be in an unacceptable state, and it might be acceptable to "hang" or "crash".
// ISSUE2: close the channel async with a delay to ensure
// nothing will be written to the channel anymore
// via a pending goroutine from Publish()
go func(ch chan interface{}) {
time.Sleep(500 * time.Millisecond)
close(ch)
}(s.Data)
Closing a channel is a signal to the receiver(s) that no more values will be sent on the channel. So always it should be the sender's job (and responsibility) to close the channel. Launching a goroutine to close the channel, you "hand" that job and responsibility to another "entity" (a goroutine) that will not be synchronized to the sender. This may easily end up in a panic (sending on a closed channel is a runtime panic, for other axioms see How does a non initialized channel behave?). Don't do that.
Yes, this was necessary because you launched goroutines to send. If you don't do that, then you may close "in-place", without launching a goroutine, because then the sender and closer will be the same entity: the Pubsub itself, whose sending and closing operations are protected by a mutex. So solving the first issue solves the second naturally.
In general if there are multiple senders for a channel, then closing the channel must be coordinated. There must be a single entity (often not any of the senders) that waits for all senders to finish, practically using a sync.WaitGroup, and then that single entity can close the channel, safely. See Closing channel of unknown length.

Deadlock in goroutine with GopherJS

Why is there a deadlock in the following code? I am trying to return something from the goroutine to outside
package main
import (
"fmt"
"syscall/js"
"time"
)
func test(this js.Value, i []js.Value) interface{} {
done := make(chan string, 1)
go func() {
doRequest := func(this js.Value, i []js.Value) interface{} {
time.Sleep(time.Second)
return 0
}
js.Global().Set("doRequest", js.FuncOf(doRequest))
args := []js.Value{js.ValueOf("url")}
var x js.Value
doRequest(x, args)
done <- "true"
}()
aa := <-done
fmt.Println(aa)
return 0
}
func main() {
c := make(chan bool)
js.Global().Set("test", js.FuncOf(test))
<-c
}
When I run this on a browser and call test(), the following error will be shown
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
.....
Pretty much what it says in the error message. All goroutines are asleep. main doesn't start anything and just does a channel receive, so it's blocked, and no other goroutines are running, so there's no possibility that main could ever wake up again, so the runtime panics.
If I recall correctly, unlike regular Go, GopherJS doesn't shut everything down and exit if main exits (in part because: what exactly would that even mean? The closest analog to a Go program would be to close the webpage! Which would kind of suck. So GopherJS doesn't do that.). So what you're doing to keep main alive is not necessary, strictly speaking, in GopherJS.
That said, if you say (for example) time.Sleep(time.Hour) at the end instead, then while all goroutines are still asleep (strictly speaking), main will eventually wake up, which the runtime knows, so it doesn't panic in that case.
As to your actual test function, once you do try it, you get a related error message: Uncaught Error: runtime error: cannot block in JavaScript callback, fix by wrapping code in goroutine. test does a blocking call on a channel, and GopherJS won't allow that in a function called directly from Javascript, so it panics. (When I run it in the playground, I also get Uncaught TypeError: r is not a function, but that's just fallout from the earlier error.)
I think what you're trying to do is wait for doRequest to finish, print the value, and return, but that won't work. You'll need to use a native Javascript promise, or some other asynchronous mechanism, for that.
func main() {
c := make(chan bool)
js.Global().Set("test", js.FuncOf(test))
<-c
}
You have made a channel c, and then wait to receive a value from it. Notice how c is a variable local to the main function. The reference to c is never passed anywhere else in the program, so there can never be a value sent on the c channel, and thus your main goroutine will wait forever to receive.

Golang intermittent behaviour on timed out Goroutine

I am trying to implement concurrency for repetitive task. I want to implement an http request on a different Goroutine (pictured by longRunningTask function). I provide a timer for a mechanism to stop the Goroutine and sends a timeout signal to the main Goroutine if the heavy load task proceed the predefined timeout. The problem that I currently have is that I am getting intermittent behaviour.
The code has been simplified to look like below.
package main
import (
"fmt"
"time"
)
func main() {
var iteration int = 5
timeOutChan := make(chan struct{})
resultChan := make(chan string)
for i := 0; i < iteration; i++ {
go longRunningTaks(timeOutChan, resultChan)
}
for i := 0; i < iteration; i++ {
select {
case data := <-resultChan:
fmt.Println(data)
case <-timeOutChan:
fmt.Println("timed out")
}
}
}
func longRunningTaks(tc chan struct{}, rc chan string) {
timer := time.NewTimer(time.Nanosecond * 1)
defer timer.Stop()
// Heavy load task
time.Sleep(time.Second * 1)
select {
case <-timer.C:
tc <- struct{}{}
case rc <- "success":
return
}
}
I believe every tries should be printing out
timeout
timeout
timeout
timeout
timeout
Instead I got an intermittent
success
timeout
timeout
timeout
timeout
The doc mentions:
NewTimer creates a new Timer that will send the current time on its
channel after at least duration d.
"at least means" timer will take specified time for sure, however this also implicitly means can take more time than specified. Timer starts its own go routine and write to channel on expiry.
Because of scheduler or garbage collection or processes of writing to other channel can get delayed. Besides simulated work load is very short considering above possibilities.
Update:
As Peter mentioned in comment writing "success" to rc channel is action which is equally likely to complete because that can be read from the other end by main routine. The select has to choose between 1) writing "success" to rc channel & 2) expired timer. And both are possible.
The likelihood of No1 is more in the beginning because the main routine is yet to read it from other end. Once that happens. Other remaining routines will have to compete for the channel (to write "success") (since it is blocking with buffer size 0) so for rest of the times the likelihood of expired timer getting selected is more as cannot say how fast the main routine will read from the resultChan channel (other end of rc).

Go project's main goroutine sleep forever?

Is there any API to let the main goroutine sleep forever?
In other words, I want my project always run except when I stop it.
"Sleeping"
You can use numerous constructs that block forever without "eating" up your CPU.
For example a select without any case (and no default):
select{}
Or receiving from a channel where nobody sends anything:
<-make(chan int)
Or receiving from a nil channel also blocks forever:
<-(chan int)(nil)
Or sending on a nil channel also blocks forever:
(chan int)(nil) <- 0
Or locking an already locked sync.Mutex:
mu := sync.Mutex{}
mu.Lock()
mu.Lock()
Quitting
If you do want to provide a way to quit, a simple channel can do it. Provide a quit channel, and receive from it. When you want to quit, close the quit channel as "a receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received".
var quit = make(chan struct{})
func main() {
// Startup code...
// Then blocking (waiting for quit signal):
<-quit
}
// And in another goroutine if you want to quit:
close(quit)
Note that issuing a close(quit) may terminate your app at any time. Quoting from Spec: Program execution:
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
When close(quit) is executed, the last statement of our main() function can proceed which means the main goroutine can return, so the program exits.
Sleeping without blocking
The above constructs block the goroutine, so if you don't have other goroutines running, that will cause a deadlock.
If you don't want to block the main goroutine but you just don't want it to end, you may use a time.Sleep() with a sufficiently large duration. The max duration value is
const maxDuration time.Duration = 1<<63 - 1
which is approximately 292 years.
time.Sleep(time.Duration(1<<63 - 1))
If you fear your app will run longer than 292 years, put the above sleep in an endless loop:
for {
time.Sleep(time.Duration(1<<63 - 1))
}
It depends on use cases to choose what kind of sleep you want.
#icza provides a good and simple solution for literally sleeping forever, but I want to give you some more sweets if you want your system could shutdown gracefully.
You could do something like this:
func mainloop() {
exitSignal := make(chan os.Signal)
signal.Notify(exitSignal, syscall.SIGINT, syscall.SIGTERM)
<-exitSignal
systemTeardown()
}
And in your main:
func main() {
systemStart()
mainloop()
}
In this way, you could not only ask your main to sleep forever, but you could do some graceful shutdown stuff after your code receives INT or TERM signal from OS, like ctrl+C or kill.
Another solution to block a goroutine. This solution prevents Go-Runtime to complain about the deadlock:
import "time"
func main() {
for {
time.Sleep(1138800 * time.Hour)
}
}

Need help to understand this weird bahaviour of go routines

I've following code using go routines:
package main
import (
"fmt"
"time"
)
func thread_1(i int) {
time.Sleep(time.Second * 1)
fmt.Println("thread_1: i: ",i)
}
func thread_2(i int) {
time.Sleep(time.Second * 1)
fmt.Println("thread_2: i: ",i)
}
func main() {
for i := 0; i < 100; i++ {
go thread_1(i)
go thread_2(i)
}
var input string
fmt.Scanln(&input)
}
I was expecting each go routine would wait for a second and then print its value of i.
However, both the go routines waited for 1 second each and printed all the values of i at once.
Another question in the same context is:
I've a server-client application where server spawns a receiver go routine to receive data from client one for each client connection. Then the receiver go routine spawns a worker go routine called processor to process the data. There could be multiple receiver go routines and hence multiple processor go routines.
In such a case some receiver and processor go routines may hog indefinitely.
Please help me understand this behaviour.
You span 100 goroutines running thread_1 and 100 goroutines running thread_2 in one big batch. Each of these 200 goroutines sleeps for one second and then prints and ends. So yes, the behavior is to be expected: 200 goroutines sleeping each 1 second in parallel and then 200 goroutines printing in parallel.
(And I do not understand your second question)

Resources