In a Go case statement featuring channels, where does the blocking happen? - go

I am a Go rookie.
I'm looking at this construct:
for {
	select {
	case <-resyncCh:
	case <-stopCh:
		return
	case <-cancelCh:
		return
	}
	if r.ShouldResync == nil || r.ShouldResync() {
		// do stuff
	}
	resyncCh = r.resyncChan()
}
I understand that the for loop runs forever.
I understand that break is implicit in Go.
I understand that channel operations in a select statement are blocking if there's no default clause (which there isn't here).
Suppose resyncCh does not have a message on it.
Are all the cases evaluated (blocked on) in parallel? Or is there another path through this I'm not seeing?
I read this as:
Block on the resyncCh, the stopCh and the cancelCh chans in parallel waiting for messages
If a message is received on resyncCh, we effectively fall through to the r.ShouldResync stuff, but the other blocks on the other chans remain.
If a message is received at any point on either the stopCh or the cancelCh chan, return, effectively "disconnecting" from all chans here.
Is that correct?

In direct answer to your questions:
Block on the resyncCh, the stopCh and the cancelCh chans in parallel waiting for messages. YES.
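If a message is received on resyncCh, we effectively fall through to the r.ShouldResync stuff, but the other blocks on the other chans remain. No, they don't remain: once one case is chosen you are past the select, and the other pending receives are abandoned. However, since this is inside a for loop, the next iteration blocks on all three channels again. Note that the fallthrough keyword is not allowed in a select statement (it is only valid in a switch), so a select case can never continue into another case.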
If a message is received at any point on either the stopCh or the cancelCh chan, return, effectively "disconnecting" from all chans here. Correct - they would return from this function.
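To make the three points above concrete, here is a minimal, self-contained sketch of the same loop shape. The names run and demo are hypothetical, and the "do stuff" step is reduced to a counter; it only illustrates the control flow, not the original code's behavior.

```go
package main

import "fmt"

// run mirrors the loop from the question: it waits on all three channels at
// once, counts one unit of work per resync signal, and returns on stop or cancel.
func run(resyncCh, stopCh, cancelCh <-chan struct{}) int {
	resyncs := 0
	for {
		select { // blocks until any ONE of the three receives can proceed
		case <-resyncCh:
			// empty case body: control continues after the select
		case <-stopCh:
			return resyncs
		case <-cancelCh:
			return resyncs
		}
		resyncs++ // reached only through the resyncCh case
	}
}

// demo sends three resync signals, then closes stopCh.
func demo() int {
	resyncCh := make(chan struct{})
	stopCh := make(chan struct{})
	cancelCh := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ {
			resyncCh <- struct{}{} // completes only when run receives it
		}
		close(stopCh)
	}()
	return run(resyncCh, stopCh, cancelCh)
}

func main() {
	fmt.Println("resyncs handled:", demo()) // resyncs handled: 3
}
```

Each pass through the loop sets up a fresh wait on all three channels; nothing stays "blocked" on stopCh or cancelCh while the code after the select runs.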
Also, bear in mind what you can do with a default --> https://gobyexample.com/non-blocking-channel-operations
for {
	select {
	case <-resyncCh:
	case <-stopCh:
		return
	case <-cancelCh:
		return
	default:
		fmt.Printf("will keep printing\n")
	}
	if r.ShouldResync == nil || r.ShouldResync() {
		// do stuff
	}
	resyncCh = r.resyncChan()
}
update: Another useful pattern, I'm using right now, which takes advantage of this:
select {
case m := <-c:
	handle(m)
case <-time.After(5 * time.Minute):
	fmt.Println("timed out")
}
Here you can wait, blocking, on a channel, but eventually time out, using only Go's time package. Very succinct and easy to read. Compare that to poll() with timespec values.
https://golang.org/pkg/time/#After
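A runnable sketch of the timeout pattern, wrapped in a helper so both outcomes can be seen. The helper name recvWithTimeout and the durations are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// recvWithTimeout waits for a value on c and gives up after d.
// The boolean reports whether a value arrived in time.
func recvWithTimeout(c <-chan string, d time.Duration) (string, bool) {
	select {
	case m := <-c:
		return m, true
	case <-time.After(d):
		return "", false
	}
}

func main() {
	ready := make(chan string, 1)
	ready <- "hello"
	m, ok := recvWithTimeout(ready, time.Second)
	fmt.Printf("%q %v\n", m, ok) // "hello" true

	empty := make(chan string) // nobody ever sends on this one
	m, ok = recvWithTimeout(empty, 50*time.Millisecond)
	fmt.Printf("%q %v\n", m, ok) // "" false
}
```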

select proceeds with the first communication that becomes ready (picking one at random if several are ready at once), and execution then continues after the select.
Block on the resyncCh, the stopCh and the cancelCh chans in parallel waiting for messages
Yes, waiting for first of them.
If a message is received on resyncCh, we effectively fall through to the r.ShouldResync stuff, but the other blocks on the other chans remain.
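There is no fall-through here. Unlike in some other languages, fallthrough is explicit in Go, and the fallthrough keyword is only valid in a switch statement, not in a select. After the resyncCh case finishes, execution continues after the select, and the loop brings you back to waiting on all three channels.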
If a message is received at any point on either the stopCh or the cancelCh chan, return, effectively "disconnecting" from all chans here.
This returns from the function that contains the code. Yes, after that we no longer wait for new messages.

Related

Select array of buffered channels respecting order [duplicate]

I have the following piece of code:
func sendRegularHeartbeats(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(1 * time.Second):
			sendHeartbeat()
		}
	}
}
This function is executed in a dedicated go-routine and sends a heartbeat-message every second. The whole process should stop immediately when the context is canceled.
Now consider the following scenario:
ctx, cancel := context.WithCancel(context.Background())
cancel()
go sendRegularHeartbeats(ctx)
This starts the heartbeat-routine with a closed context. In such a case, I don't want any heartbeats to be transmitted. So the first case block in the select should be entered immediately.
However, it seems that the order in which case blocks are evaluated is not guaranteed, and that the code sometimes sends a heartbeat message, even though the context is already canceled.
What is the correct way to implement such a behaviour?
I could add a "isContextclosed"-check in the second case, but that looks more like an ugly workaround for the problem.
The accepted answer has a wrong suggestion:
func sendRegularHeartbeats(ctx context.Context) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		//first select
		select {
		case <-ctx.Done():
			return
		default:
		}
		//second select
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sendHeartbeat()
		}
	}
}
This doesn't help, because of the following scenario:
both channels are empty
first select runs
both channels get a message concurrently
you are in the same probability game as if you haven't done anything in the first select
An alternative but still imperfect way is to guard against concurrent Done() events (the "wrong select") after consuming the ticker event i.e.
func sendRegularHeartbeats(ctx context.Context) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		//select as usual
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			//give priority to a possible concurrent Done() event non-blocking way
			select {
			case <-ctx.Done():
				return
			default:
			}
			sendHeartbeat()
		}
	}
}
Caveat: the problem with this one is that it allows for "close enough" events to be confused - e.g. even though a ticker event arrived earlier, the Done event came soon enough to preempt the heartbeat. There is no perfect solution as of now.
Note beforehand:
Your example will work as you intend it to, as if the context is already cancelled when sendRegularHeartbeats() is called, the case <-ctx.Done() communication will be the only one ready to proceed and therefore chosen. The other case <-time.After(1 * time.Second) will only be ready to proceed after 1 second, so it will not be chosen at first. But to explicitly handle priorities when multiple cases might be ready, read on.
Unlike the case branches of a switch statement (where the evaluation order is the order they are listed), there is no priority or any order guaranteed in the case branches of a select statement.
Quoting from Spec: Select statements:
If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. If there is no default case, the "select" statement blocks until at least one of the communications can proceed.
If more than one communication can proceed, one is chosen randomly. Period.
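The random choice is easy to observe. The sketch below (countPicks is a hypothetical helper) keeps two cases permanently ready by receiving from closed channels and counts how often each one is picked. The exact split varies from run to run.

```go
package main

import "fmt"

// countPicks runs a select n times with both cases always ready and reports
// how often each case was chosen.
func countPicks(n int) (first, second int) {
	a := make(chan struct{})
	b := make(chan struct{})
	close(a) // a receive from a closed channel can always proceed
	close(b)
	for i := 0; i < n; i++ {
		select {
		case <-a:
			first++
		case <-b:
			second++
		}
	}
	return first, second
}

func main() {
	first, second := countPicks(1000)
	// Roughly half each; listing order gives the first case no priority.
	fmt.Println("first case:", first, "second case:", second)
}
```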
If you want to maintain priority, you have to do that yourself (manually). You may do it using multiple select statements (subsequent, not nested), listing the cases with higher priority in an earlier select. Be sure to add a default branch to the earlier select so that it does not block when those cases are not ready to proceed. Your example requires 2 select statements, the first one checking <-ctx.Done(), as that is the one you want higher priority for.
I also recommend using a single time.Ticker instead of calling time.After() in each iteration (time.After() is equivalent to time.NewTimer(d).C, so each call creates a new time.Timer that is "thrown away" after one use instead of being reused).
Here's an example implementation:
func sendRegularHeartbeats(ctx context.Context) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sendHeartbeat()
		}
	}
}
This will send no heartbeat if the context is already cancelled when sendRegularHeartbeats() is called, as you can check / verify it on the Go Playground.
If you delay the cancel() call for 2.5 seconds, then exactly 2 heartbeats will be sent:
ctx, cancel := context.WithCancel(context.Background())
go sendRegularHeartbeats(ctx)
time.Sleep(time.Millisecond * 2500)
cancel()
time.Sleep(time.Second * 2)
Try this one on the Go Playground.
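For readers without the Playground links, here is a self-contained variant of the pre-cancelled check. It is a sketch under assumptions: the interval and the heartbeat action are passed in as parameters (they are not in the original signature), and heartbeatsWhenPreCancelled is a hypothetical helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendRegularHeartbeats is the two-select version from the answer, with the
// interval and the heartbeat action passed in so it can be exercised quickly.
func sendRegularHeartbeats(ctx context.Context, interval time.Duration, sendHeartbeat func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// Priority check: never blocks, returns at once if already cancelled.
		select {
		case <-ctx.Done():
			return
		default:
		}
		// Normal wait: whichever of the two becomes ready first.
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sendHeartbeat()
		}
	}
}

// heartbeatsWhenPreCancelled runs the loop with an already cancelled context
// and reports how many heartbeats were sent.
func heartbeatsWhenPreCancelled() int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	sent := 0
	sendRegularHeartbeats(ctx, time.Millisecond, func() { sent++ })
	return sent
}

func main() {
	fmt.Println("heartbeats sent:", heartbeatsWhenPreCancelled()) // heartbeats sent: 0
}
```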
If it is absolutely critical to maintain that priority of operations, you can:
Consume from each channel in a separate goroutine
Have each of those goroutines write a message to a shared third channel indicating its type
Have a third goroutine consume from that channel, reading the messages it receives to determine if it is a tick and should sendHeartbeat or if it is a cancel and it should exit
This way, messages received on the other channels will (probably - you can't guarantee order of execution of concurrent routines) come in on the third channel in the order they're triggered, allowing you to handle them appropriately.
However, it's worth noting that this is probably not necessary. A select does not guarantee which case will execute if multiple cases succeed simultaneously. That is probably a rare event; the cancel and ticker would both have to fire before either was handled by the select. The vast majority of the time, only one or the other will fire at any given loop iteration, so it will behave exactly as expected. If you can tolerate rare occurrences of firing one additional heartbeat after a cancellation, you're better off keeping the code you have, as it is more efficient and more readable.

Timer example using timer.Reset() not working as described

I've been working with examples trying to get my first "go routine" running and while I got it running, it won't work as prescribed by the go documentation with the timer.Reset() function.
In my case I believe that the way I am doing it is just fine because I don't actually care what's in the chan buffer, if anything. All this is meant to do is trigger case <-tmr.C: if anything happened on case _, ok := <-watcher.Events: and everything then goes quiet for at least one second. The reason for this is that case _, ok := <-watcher.Events: can get from one to dozens of events microseconds apart and I only care once they are all done and things have settled down again.
However I'm concerned that doing it the way that the documentation says you "must do" doesn't work. If I knew go better I would say the documentation is flawed because it assumes there is something in the buffer when there may not be but I don't know go well enough to have confidence in making that determination so I'm hoping some experts out there can enlighten me.
Below is the code. I haven't put this up on playground because I would have to do some cleaning up (remove calls to other parts of the program) and I'm not sure how I would make it react to filesystem changes for showing it working.
I've clearly marked in the code which alternative works and which doesn't.
func (pm *PluginManager) LoadAndWatchPlugins() error {
	// DOING OTHER STUFF HERE
	fmt.Println(`m1`)
	done := make(chan interface{})
	terminated := make(chan interface{})
	go pm.watchDir(done, terminated, nil)
	fmt.Println(`m2.pre-10`)
	time.Sleep(10 * time.Second)
	fmt.Println(`m3-post-10`)
	go pm.cancelWatchDir(done)
	fmt.Println(`m4`)
	<-terminated
	fmt.Println(`m5`)
	os.Exit(0) // Temporary for testing
	return Err
}
func (pm *PluginManager) cancelWatchDir(done chan interface{}) {
	fmt.Println(`t1`)
	time.Sleep(5 * time.Second)
	fmt.Println()
	fmt.Println(`t2`)
	close(done)
}
func (pm *PluginManager) watchDir(done <-chan interface{}, terminated chan interface{}, strings <-chan string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		Logger("watchDir::"+err.Error(), `plugins`, Error)
	}
	//err = watcher.Add(pm.pluginDir)
	err = watcher.Add(`/srv/plugins/`)
	if err != nil {
		Logger("watchDir::"+err.Error(), `plugins`, Error)
	}
	var tmr = time.NewTimer(time.Second)
	tmr.Stop()
	defer close(terminated)
	defer watcher.Close()
	defer tmr.Stop()
	for {
		select {
		case <-tmr.C:
			fmt.Println(`UPDATE FIRED`)
			tmr.Stop()
		case _, ok := <-watcher.Events:
			if !ok {
				return
			}
			fmt.Println(`Ticker: STOP`)
			/*
			 * START OF ALTERNATIVES
			 *
			 * THIS IS BY EXAMPLE AND STATED THAT IT "MUST BE" AT:
			 * https://golang.org/pkg/time/#Timer.Reset
			 *
			 * BUT DOESN'T WORK
			 */
			if !tmr.Stop() {
				fmt.Println(`Ticker: CHAN DRAIN`)
				<-tmr.C // STOPS HERE AND GOES NO FURTHER
			}
			/*
			 * BUT IF I JUST DO THIS IT WORKS
			 */
			tmr.Stop()
			/*
			 * END OF ALTERNATIVES
			 */
			fmt.Println(`Ticker: RESET`)
			tmr.Reset(time.Second)
		case <-done:
			fmt.Println(`DONE TRIGGERED`)
			return
		}
	}
}
Besides what icza said (q.v.), note that the documentation says:
For example, assuming the program has not received from t.C already:
if !t.Stop() {
	<-t.C
}
This cannot be done concurrent to other receives from the Timer's channel.
One could argue that this is not a great example since it assumes that the timer was running at the time you called t.Stop. But it does go on to mention that this is a bad idea if there's already some existing goroutine that is or may be reading from t.C.
(The Reset documentation repeats all of this, and kind of in the wrong order because Reset sorts before Stop.)
Essentially, the whole area is a bit fraught. There's no good general answer, because there are at least three possible situations during the return from t.Stop back to your call:
No one is listening to the channel, and no timer-tick is in the channel now. This is often the case if the timer was already stopped before the call to t.Stop. If the timer was already stopped, t.Stop always returns false.
No one is listening to the channel, and a timer-tick is in the channel now. This is always the case when the timer was running but t.Stop was unable to stop it from firing. In this case, t.Stop returns false. It's also the case when the timer was running but fired before you even called t.Stop, and had therefore stopped on its own, so that t.Stop was not able to stop it and returned false.
Someone else is listening to the channel.
In the last situation, you should do nothing. In the first situation, you should do nothing. In the second situation, you probably want to receive from the channel so as to clear it out. That's what their example is for.
One could argue that:
if !t.Stop() {
	select {
	case <-t.C:
	default:
	}
}
is a better example. It does one non-blocking attempt that will consume the timer-tick if present, and does nothing if there is no timer-tick. This works whether or not the timer was actually running when you called t.Stop. Indeed, it even works if t.Stop returns true, though in that case, t.Stop stopped the timer, so the timer never managed to put a timer-tick into the channel. (Thus, if there is a datum in the channel, it must necessarily be left over from a previous failure to clear the channel. If there are no such bugs, the attempt to receive was in turn unnecessary.)
But, if someone else—some other goroutine—is or may be reading the channel, you should not do any of this at all. There is no way to know who (you or them) will get any timer tick that might be in the channel despite the call to Stop.
Meanwhile, if you're not going to use the timer any further, it's relatively harmless just to leave a timer-tick, if there is one, in the channel. It will be garbage-collected when the channel itself is garbage-collected. Of course, whether this is sensible depends on what you are doing with the timer, but in these cases it suffices to just call t.Stop and ignore its return value.
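The non-blocking variant can be packaged as a small helper. This is a sketch, not part of the time package: stopAndDrain and reuseAfterExpiry are hypothetical names, and the helper is only safe when no other goroutine receives from the timer's channel.

```go
package main

import (
	"fmt"
	"time"
)

// stopAndDrain stops t and removes a pending tick from t.C if there is one.
// It never blocks. Only safe when no other goroutine receives from t.C.
func stopAndDrain(t *time.Timer) {
	if !t.Stop() {
		select {
		case <-t.C: // a stale tick was waiting; discard it
		default: // nothing to drain: already stopped, or tick already consumed
		}
	}
}

// reuseAfterExpiry lets a timer expire unread, rearms it for d, and reports
// how long the next receive from t.C took.
func reuseAfterExpiry(d time.Duration) time.Duration {
	t := time.NewTimer(time.Millisecond)
	time.Sleep(20 * time.Millisecond) // the timer expires; nobody reads t.C
	stopAndDrain(t)
	start := time.Now()
	t.Reset(d)
	<-t.C // must be the new tick, not a leftover from the first expiry
	return time.Since(start)
}

func main() {
	t := time.NewTimer(time.Hour)
	t.Stop()
	stopAndDrain(t) // Stop returns false here; a plain <-t.C would hang
	fmt.Println("no hang on an already stopped timer")

	waited := reuseAfterExpiry(50 * time.Millisecond)
	fmt.Println("waited the full duration:", waited >= 50*time.Millisecond)
}
```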
You create a timer and you stop it immediately:
var tmr = time.NewTimer(time.Second)
tmr.Stop()
This doesn't make any sense; I assume this is just an "accident" on your part.
But going further, inside your loop:
case _, ok := <-watcher.Events:
When this happens, you claim this doesn't work:
if !tmr.Stop() {
	fmt.Println(`Ticker: CHAN DRAIN`)
	<-tmr.C // STOPS HERE AND GOES NO FURTHER
}
Timer.Stop() documents that it returns true if this call stops the timer, and false if the timer has already been stopped (or expired). But your timer was already stopped, right after its creation, so tmr.Stop() returns false properly, so you go inside the if and try to receive from tmr.C, but since the timer was "long" stopped, nothing will be sent on its channel, so this is a blocking (forever) operation.
If you're the one stopping the timer explicitly with timer.Stop(), the recommended "pattern" to drain its channel doesn't make any sense and doesn't work for the 2nd Timer.Stop() call.
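What the question is after (fire once after a burst of events has settled) is usually called a debounce. Below is a minimal sketch of it using a non-blocking drain, so a timer that was already stopped does not hang the loop. It does not use fsnotify: debounce, burstFires, the plain struct{} event channel and the 100ms quiet period are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// debounce calls fire once after events has been quiet for the given duration.
// It returns when done is closed or events is closed. Only this goroutine
// receives from the timer channel, so the non-blocking drain is safe.
func debounce(events, done <-chan struct{}, quiet time.Duration, fire func()) {
	t := time.NewTimer(quiet)
	defer t.Stop()
	disarm := func() {
		if !t.Stop() {
			select {
			case <-t.C: // discard a tick that was already delivered
			default: // nothing pending; a plain <-t.C would block here
			}
		}
	}
	disarm() // start with the timer stopped, as in the question
	for {
		select {
		case <-t.C:
			fire()
		case _, ok := <-events:
			if !ok {
				return
			}
			disarm()
			t.Reset(quiet) // restart the quiet period on every event
		case <-done:
			return
		}
	}
}

// burstFires sends n back-to-back events and reports how often fire ran.
func burstFires(n int) int {
	events := make(chan struct{})
	done := make(chan struct{})
	finished := make(chan struct{})
	fired := 0
	go func() {
		defer close(finished)
		debounce(events, done, 100*time.Millisecond, func() { fired++ })
	}()
	for i := 0; i < n; i++ {
		events <- struct{}{}
	}
	time.Sleep(400 * time.Millisecond) // let the quiet period elapse
	close(done)
	<-finished // fired is safe to read once the goroutine has exited
	return fired
}

func main() {
	fmt.Println("fires after a burst of 5 events:", burstFires(5))
}
```

A burst of events that arrive close together should produce a single call to fire, because every event stops and rearms the timer before it can expire.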

Go Timer Deadlock on Stop

I am trying to reuse timers by stopping and resetting them. I am following the pattern provided by the documentation. Here is a simple example which can be run in go playground that demonstrates the issue I am experiencing.
Is there a correct way to stop and reset a timer that doesn't involve deadlock or race conditions? I am aware that using a select with default involves a race condition on channel message delivery timing and cannot be depended on.
package main

import (
	"fmt"
	"time"
	"sync"
)

func main() {
	fmt.Println("Hello, playground")
	timer := time.NewTimer(1 * time.Second)
	wg := &sync.WaitGroup{}
	wg.Add(1)
	go func(_wg *sync.WaitGroup) {
		<-timer.C
		fmt.Println("Timer done")
		_wg.Done()
	}(wg)
	wg.Wait()
	fmt.Println("Checking timer")
	if !timer.Stop() {
		<-timer.C
	}
	fmt.Println("Done")
}
According to the timer.Stop docs, there is a caveat for draining the channel:
assuming the program has not received from t.C already ...
This cannot be done concurrent to other receives from the Timer's
channel.
Since the channel has already been drained - and will never fire again, the second <-timer.C will block forever.
The question asks, in the first place, why the timer hangs. That's a good question, because even in the absence of bugs in the user program, there is at least some weird ambiguity in how time.Timer works in Go. The package documentation for Stop says specifically this:
Stop prevents the Timer from firing. It returns true if the call stops
the timer, false if the timer has already expired or been stopped.
Stop does not close the channel, to prevent a read from the channel
succeeding incorrectly.
To ensure the channel is empty after a call to Stop, check the return
value and drain the channel. For example, assuming the program has not
received from t.C already:
if !t.Stop() {
	<-t.C
}
This cannot be done concurrent to other receives from the Timer's
channel or other calls to the Timer's Stop method.
The wording is short and precise, but it may not be easy to understand (at least it was not for me). I tried to use the Timer repeatedly in a piece of code, resetting it each time before the next use. Each time I do so, I may want to Stop() it first, just to be sure. The documentation above implies how you should do that and provides an example, and it may not work! It depends on where you apply the Stop idiom. If you apply it after you have already received from this very timer (for example in a select case), it will hang the program.
Specifically, I do not have any concurrent receivers, only a single goroutine. So let's make a simple test program, and try to experiment with it (https://play.golang.org/p/d7BlNReE9Jz):
package main

import (
	"fmt"
	"time"
)

func main() {
	i := 0
	d2s := time.Second * 1
	i++; fmt.Println(i)
	t := time.NewTimer(d2s)
	<-t.C
	i++; fmt.Println(i)
	t.Reset(d2s)
	<-t.C
	i++; fmt.Println(i)
	// if !t.Stop() { <-t.C }
	// if !t.Stop() { select { case <-t.C: default: } }
	t.Reset(d2s)
	<-t.C
	i++; fmt.Println(i)
}
This code WORKS. It prints 1, 2, 3, 4, each delayed by 1 second, and that's what it is expected to print. So far so good.
Now, try to un-comment the first commented line. According to the documentation it is 100% right (is it?) and must work, but it does not: it hangs. Why? Because, according to the documentation, it must hang! I have already read from the channel and the timer has expired, so Stop returns false, the if body runs, and the channel drain blocks forever.
Is this a bug? No. Is the documentation wrong? No, it's correct. But it's contrary to what a typical timer user would want. (Maybe a subject for a proposal to Go?) All we need is something like:
t.SafeStopDrain()
Which would do this right, and never hang. But, sadly, it does not exist.
The workaround that makes this work is the second commented line. Un-comment it, and it will both work and do what you wanted: make sure the timer is stopped, the channel is drained, and the whole thing is fresh for re-use.

Is there a resource leak here?

func First(query string, replicas ...Search) Result {
	c := make(chan Result)
	searchReplica := func(i int) {
		c <- replicas[i](query)
	}
	for i := range replicas {
		go searchReplica(i)
	}
	return <-c
}
This function is from Rob Pike's 2012 slides on Go concurrency patterns. I think there is a resource leak in this function. The function returns after the first send and receive pair happens on channel c, while the other goroutines are still trying to send on channel c. So there is a resource leak here. Can anyone who knows Go well confirm this? And which Go tooling can I use to detect this leak?
Yes, you are right (for reference, here's the link to the slide). In the above code only one launched goroutine will terminate, the rest will hang on attempting to send on channel c.
Detailing:
c is an unbuffered channel
there is only a single receive operation, in the return statement
A new goroutine is launched for each element of replicas
each launched goroutine sends a value on channel c
since there is only 1 receive from it, one goroutine will be able to send a value on it, the rest will block forever
Note that depending on the number of elements of replicas (which is len(replicas)):
if it's 0: First() would block forever (no one sends anything on c)
if it's 1: would work as expected
if it's > 1: then it leaks resources
The following modified version will not leak goroutines, by using a non-blocking send (with the help of select with default branch):
searchReplica := func(i int) {
	select {
	case c <- replicas[i](query):
	default:
	}
}
The first goroutine ready with the result will send it on channel c which will be received by the goroutine running First(), in the return statement. All other goroutines when they have the result will attempt to send on the channel, and "seeing" that it's not ready (send would block because nobody is ready to receive from it), the default branch will be chosen, and thus the goroutine will end normally.
Another way to fix it would be to use a buffered channel:
c := make(chan Result, len(replicas))
And this way the send operations would not block. And of course only one (the first sent) value will be received from the channel and returned.
Note that the solution with any of the above fixes would still block if len(replicas) is 0. To avoid that, First() should check this explicitly, e.g.:
func First(query string, replicas ...Search) Result {
	if len(replicas) == 0 {
		return Result{}
	}
	// ...rest of the code...
}
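Putting the buffered-channel fix and the empty check together gives a complete version that can be run. This is a sketch under assumptions: Result is modeled here as a small struct (so that Result{} compiles), Search as a function type, and fakeSearch is a hypothetical stand-in for a real replica.

```go
package main

import (
	"fmt"
	"time"
)

// Result and Search are assumed shapes for illustration only.
type Result struct{ Text string }
type Search func(query string) Result

// First returns the first result to arrive. The channel has room for every
// replica, so slower goroutines can still send and exit instead of leaking.
func First(query string, replicas ...Search) Result {
	if len(replicas) == 0 {
		return Result{} // nothing to wait for; avoid blocking forever
	}
	c := make(chan Result, len(replicas))
	for _, replica := range replicas {
		go func(s Search) { c <- s(query) }(replica)
	}
	return <-c
}

// fakeSearch is a hypothetical replica that answers after a fixed delay.
func fakeSearch(name string, delay time.Duration) Search {
	return func(query string) Result {
		time.Sleep(delay)
		return Result{Text: name + ": " + query}
	}
}

func main() {
	r := First("golang",
		fakeSearch("fast", 10*time.Millisecond),
		fakeSearch("slow", 300*time.Millisecond))
	fmt.Println(r.Text) // fast: golang

	fmt.Printf("%q\n", First("golang").Text) // ""
}
```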
Some tools / resources to detect leaks:
https://github.com/fortytw2/leaktest
https://github.com/zimmski/go-leak
https://medium.com/golangspec/goroutine-leak-400063aef468
https://blog.minio.io/debugging-go-routine-leaks-a1220142d32c
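Without any third-party tool, a rough first check is to compare runtime.NumGoroutine() before and after the suspect call. The sketch below reproduces the unbuffered pattern with plain int senders; leakedAfterFirst is a hypothetical helper, and the 50ms pause is an assumption that gives the remaining senders time to block.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakedAfterFirst reproduces the unbuffered First pattern with n senders and
// reports how many extra goroutines are still alive afterwards.
func leakedAfterFirst(n int) int {
	before := runtime.NumGoroutine()
	c := make(chan int) // unbuffered, as in the original First
	for i := 0; i < n; i++ {
		go func(i int) { c <- i }(i)
	}
	<-c                               // the single receive
	time.Sleep(50 * time.Millisecond) // give the remaining senders time to block
	return runtime.NumGoroutine() - before
}

func main() {
	// With 4 senders and one receive, 3 goroutines stay blocked on the send.
	fmt.Println("goroutines still blocked:", leakedAfterFirst(4))
}
```

A count that keeps growing across repeated calls is the signal to look for; the exact numbers in a larger program depend on what else is running.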

golang channel can't consume or publish

In my code below (just part of the whole program), I initialize a channel, but the channel can't be consumed from or published to. I don't know what makes this happen.
//init at the beginning of program
var stopSvr chan bool
stopSvr = make(chan bool)
var stopSvrDone chan bool
stopSvrDone = make(chan bool)
//somewhere use,in a goroutine
select {
case <-stopSvr:
	stopSvrDone <- true
	fmt.Println("son svr exit")
default:
	//do its job
}
//somewhere use,in a goroutine
stopSvr <- true //block here
<-stopSvrDone
fmt.Println("svr exit")
//here to do other things,but it's blocked at "stopSvr<-true",
//what condition could make this happen?
Conclusion:
I did not clearly understand when channel operations block and unblock.
I did not clearly understand the default keyword in a select statement.
That is why my program did not run.
Thanks jimt, I have solved the problem.
I am unsure what you are trying to achieve. But your example code is guaranteed to block on the send, stopSvr <- true.
The default case of a select provides a fallback when none of the channel reads or writes can proceed immediately. This means that in your code, the default case is executed whenever no value is being sent on stopSvr at that exact moment. If no sender is already waiting when the select runs, the case statement is never run.
The send then never succeeds and blocks indefinitely, because there is no space in the channel to store the value and nobody else is reading from it in any other goroutine.
A simple solution to your immediate problem would be:
stopSvr=make(chan bool, 1) // 1 slot buffer to store a value
However, without understanding what you want to achieve, I can't guarantee that this will solve all your problems.
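The difference the one-slot buffer makes can be checked with a non-blocking send. This is a small sketch; canSendNow is a hypothetical helper, not part of the original code.

```go
package main

import "fmt"

// canSendNow reports whether a send on ch can proceed immediately, that is,
// without any goroutine currently waiting to receive.
func canSendNow(ch chan bool) bool {
	select {
	case ch <- true:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan bool)
	buffered := make(chan bool, 1)

	fmt.Println("unbuffered:", canSendNow(unbuffered)) // unbuffered: false
	fmt.Println("buffered:", canSendNow(buffered))     // buffered: true
}
```

With the buffer, stopSvr <- true returns at once and the value waits in the channel until the worker's select runs again; with the unbuffered channel, the sender has to wait for a receiver to show up.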
