In the following scenario, a network entity always waits TimeOutTime seconds before doing a particular task X. Call this timer TimerT. If, during this wait of TimeOutTime seconds, the entity receives any external message, it should reset TimerT to TimeOutTime again. If no external messages are received, the expected behaviour is as follows:
Timer Expired
Do task X
Reset the Timer again to TimeOutTime
(by "reset" I mean: stop the timer and start it over again)
To simulate the scenario I wrote the following code in Go.
package main

import (
    "log"
    "math/rand"
    "sync"
    "time"
)

const TimeOutTime = 3
const MeanArrivalTime = 4

func main() {
    rand.Seed(time.Now().UTC().UnixNano())

    var wg sync.WaitGroup
    t := time.NewTimer(time.Second * time.Duration(TimeOutTime))

    wg.Add(1)
    // goroutine for the timeout event
    go func() {
        defer wg.Done()
        for {
            t1 := time.Now()
            <-t.C
            t2 := time.Now()
            // Do.. task X .. on timeout...
            log.Println("Timeout after ", t2.Sub(t1))
            t.Reset(time.Second * time.Duration(TimeOutTime))
        }
    }()

    // second goroutine: simulates incoming messages
    go func() {
        for {
            // simulates an incoming message at any time
            time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
            // once a message is received, reset the timer to TimeOutTime seconds again
            t.Reset(time.Second * time.Duration(TimeOutTime))
        }
    }()

    wg.Wait()
}
Running this program with the -race flag reports a data race:
==================
WARNING: DATA RACE
Write at 0x00c0000c2068 by goroutine 8:
time.(*Timer).Reset()
/usr/local/go/src/time/sleep.go:125 +0x98
main.main.func1()
/home/deka/Academic/go/src/main/test.go:29 +0x18f
Previous write at 0x00c0000c2068 by goroutine 9:
time.(*Timer).Reset()
/usr/local/go/src/time/sleep.go:125 +0x98
main.main.func2()
/home/deka/Academic/go/src/main/test.go:42 +0x80
Goroutine 8 (running) created at:
main.main()
/home/deka/Academic/go/src/main/test.go:20 +0x1d3
Goroutine 9 (running) created at:
main.main()
/home/deka/Academic/go/src/main/test.go:35 +0x1f5
==================
Then I wrapped the Reset() calls in a Mutex:
package main

import (
    "log"
    "math/rand"
    "sync"
    "time"
)

const TimeOutTime = 3
const MeanArrivalTime = 4

func main() {
    rand.Seed(time.Now().UTC().UnixNano())

    var wg sync.WaitGroup
    t := time.NewTimer(time.Second * time.Duration(TimeOutTime))

    wg.Add(1)
    var mu sync.Mutex

    // goroutine for the timeout event
    go func() {
        defer wg.Done()
        for {
            t1 := time.Now()
            <-t.C
            t2 := time.Now()
            // Do.. task X .. on timeout...
            log.Println("Timeout after ", t2.Sub(t1))
            mu.Lock()
            t.Reset(time.Second * time.Duration(TimeOutTime))
            mu.Unlock()
        }
    }()

    // second goroutine: simulates incoming messages
    go func() {
        for {
            // simulates an incoming message at any time
            time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
            // once a message is received, reset the timer to TimeOutTime seconds again
            mu.Lock()
            t.Reset(time.Second * time.Duration(TimeOutTime))
            mu.Unlock()
        }
    }()

    wg.Wait()
}
After this, the code seems to work fine, based on the following observation.
If I replace the line
time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
in the second goroutine with a constant sleep of 4 seconds (TimeOutTime stays constant at 3 seconds), the output of the program is:
2020/02/29 20:10:11 Timeout after 3.000160828s
2020/02/29 20:10:15 Timeout after 4.000444017s
2020/02/29 20:10:19 Timeout after 4.000454657s
2020/02/29 20:10:23 Timeout after 4.000304877s
In the above execution, the second goroutine resets the active timer after it has already been counting down for one second. Because of this, the timer expires 4 seconds after each reset, from the second print onward.
When I checked the documentation of Reset(), I found the following:
// Reset changes the timer to expire after duration d.
// It returns true if the timer had been active, false if the timer had
// expired or been stopped.
//
// Reset should be invoked only on stopped or expired timers with drained channels.
// If a program has already received a value from t.C, the timer is known
// to have expired and the channel drained, so t.Reset can be used directly.
// If a program has not yet received a value from t.C, however,
// the timer must be stopped and—if Stop reports that the timer expired
// before being stopped—the channel explicitly drained:
//
// if !t.Stop() {
// <-t.C
// }
// t.Reset(d)
//
// This should not be done concurrent to other receives from the Timer's
// channel.
//
// Note that it is not possible to use Reset's return value correctly, as there
// is a race condition between draining the channel and the new timer expiring.
// Reset should always be invoked on stopped or expired channels, as described above.
// The return value exists to preserve compatibility with existing programs.
I also found this diagram: https://blogtitle.github.io/go-advanced-concurrency-patterns-part-2-timers/
With the diagram in mind, it seems that I need to use
if !t.Stop() {
    <-t.C
}
t.Reset(d)
in the second goroutine. In that case I would also need proper locking in both goroutines to avoid an infinite wait on the channel.
I don't understand the scenario in which t.Stop() plus draining the channel (<-t.C) has to be performed. When is it required? In my example I don't use the value read from the channel. Can I call Reset() without calling Stop()?
I simplified the code using the time.After function:
package main

import (
    "log"
    "math/rand"
    "time"
)

const TimeOutTime = 3
const MeanArrivalTime = 4

func main() {
    const interval = time.Second * TimeOutTime

    // channel for incoming messages
    var incomeCh = make(chan struct{})

    go func() {
        for {
            // On each iteration a new timer is created
            select {
            case <-time.After(interval):
                time.Sleep(time.Second)
                log.Println("Do task")
            case <-incomeCh:
                log.Println("Handle income message and move to the next iteration")
            }
        }
    }()

    go func() {
        for {
            time.Sleep(time.Duration(rand.Intn(MeanArrivalTime)) * time.Second)
            // generate incoming message
            incomeCh <- struct{}{}
        }
    }()

    // prevent main from exiting for a while
    <-time.After(10 * time.Second)
}
Note that:
After waits for the duration to elapse and then sends the current time
on the returned channel.
It is equivalent to NewTimer(d).C.
The underlying Timer is not recovered by the garbage collector
until the timer fires. If efficiency is a concern, use NewTimer
instead and call Timer.Stop if the timer is no longer needed.
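If the allocation of a new timer on every iteration is a concern (as the note above warns), the same loop can reuse a single timer that only this goroutine receives from. This is just a sketch of that variant, not part of the original answer; it follows the Stop/drain pattern from the Reset documentation quoted earlier, which is safe here because nobody else touches t.C:
// timerLoop is a sketch of the first goroutine above, reusing a single timer.
// incomeCh and interval are assumed to be the same as in the program above.
func timerLoop(incomeCh <-chan struct{}, interval time.Duration) {
    t := time.NewTimer(interval)
    defer t.Stop()
    for {
        select {
        case <-t.C:
            log.Println("Do task")
            // t.C was just drained by this receive, so Reset is safe here.
            t.Reset(interval)
        case <-incomeCh:
            log.Println("Handle income message and move to the next iteration")
            // The timer may have fired while we were selecting on incomeCh:
            // stop it and drain the channel before resetting.
            if !t.Stop() {
                <-t.C
            }
            t.Reset(interval)
        }
    }
}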
Assume you have:
t.Stop()
t.Reset()
If the timer has already expired and its channel has been drained before you call Stop, this works fine. The problem manifests itself when Stop races with the timer firing: you may end up with a stopped timer whose expiry value is already sitting in the t.C channel. Stop returns false in that case, and you have to read that value from t.C yourself; otherwise it stays in the channel and is delivered to a later receive, making the next wait return immediately.
So, as you already observed, you have to do:
if !t.Stop() {
    <-t.C
}
t.Reset(d)
However, even with that, I think your solution is flawed because of the use of asynchronous resets. Instead, try using a new timer for each simulated event.
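For example, here is a sketch (mine, not the answer's code) of "a new timer for each simulated event"; interval and incomeCh stand in for the timeout duration and the incoming-message channel. Because each timer lives for only one iteration and is never Reset, no draining is needed, and Stop merely releases it early for the garbage collector:
// runWithFreshTimer is a sketch, not the answer's code: interval and incomeCh
// stand in for the timeout duration and the incoming-message channel.
func runWithFreshTimer(incomeCh <-chan struct{}, interval time.Duration) {
    for {
        t := time.NewTimer(interval)
        select {
        case <-t.C:
            // Timer expired: do task X, then loop to create a fresh timer.
        case <-incomeCh:
            // Message received: discard this timer and start a new cycle.
            t.Stop()
        }
    }
}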
You might consider a different overall design.
Suppose for instance that we write a routine or interface called Deadliner—it could become its own package if you like, or just be an interface, and we'll see a pretty strong resemblance to something Go already has—whose job / contract is described this way:
The user of a Deadliner creates a Deadline whenever they like.
The Deadliner waits until the deadline occurs, then flags the deadline as having occurred.
A Deadliner can be canceled by any goroutine at any time. This flags the deadliner as canceled, so that anyone waiting on it will stop waiting, and can tell that the reason they stopped waiting was "canceled" (not "expired"). It also helps clean up resources for the GC, in case you create a lot of Deadliners and then discard them before their timeout fires.
Now in your top level, before you start waiting for a message, you simply set up a deadline. This isn't a timer (even if it may use one internally), it's just a Deadliner instance. Then you wait for one of two events:
d, cancel = newDeadline(when)
for {
    select {
    case <-d.Done():
        // Deadline expired.
        // ... handle it ...
        d, cancel = newDeadline(when) // if/as appropriate
    case m := <-msgC:
        // got message - cancel existing deadline and get new one
        cancel()
        d, cancel = newDeadline(when)
        // ... handle the message
    }
}
Now we just note that Go already has this: it's in package context. d is a context; newDeadline is context.WithDeadline or context.WithTimeout (depending on whether you want to compute the deadline time yourself, or have the timeout code add a duration to "now").
There is no need to fiddle with timers and time-tick channels and no need to spin off your own separate goroutines.
If the deadline doesn't reset on a single message, but rather on a particular combination of messages, you just write that in your case m := <-msgC section. If messages aren't currently received via channels, make that happen by putting messages into a channel, so that you can use this very simple wait-for-deadline-or-message pattern.
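As a concrete sketch of that wait loop written directly against package context (run, msgC and the log lines are illustrative names, not from the question; it assumes the context, log and time imports):
func run(msgC <-chan string, timeout time.Duration) {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer func() { cancel() }() // release whichever context is current when we exit
    for {
        select {
        case <-ctx.Done():
            log.Println("deadline expired: do task X")
            cancel()
            ctx, cancel = context.WithTimeout(context.Background(), timeout)
        case m := <-msgC:
            log.Println("got message, resetting deadline:", m)
            cancel()
            ctx, cancel = context.WithTimeout(context.Background(), timeout)
        }
    }
}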
Related
I am implementing a very simple concurrent program in Go. There are 2 channels, todo and done, used to signal which task is done. There are 5 routines being executed, and each one requires its own time to complete. I would like to see the status of what is happening every 100ms.
However, the polling branch case <-time.After(100 * time.Millisecond): seems to never be hit. It is hit sometimes (not in a consistent way) if I reduce the time to something less than 100ms.
My understanding is that go func executes the function on a separate goroutine managed by the Go scheduler. I therefore do not understand why the polling case is never hit. I tried moving that case branch before/after the others, but nothing changed.
Any suggestions?
package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

func concurrent(id int, done chan int, todo chan int) {
    for {
        // doing a task
        t := randInt(50, 100)
        time.Sleep(time.Duration(t) * time.Millisecond)
        done <- id

        // redo again this task
        t = randInt(50, 100)
        time.Sleep(time.Duration(t) * time.Millisecond)
        todo <- id
    }
}

func randInt(min int, max int) int {
    return (min + rand.Intn(max-min))
}

func seedRandom() {
    rand.Seed(time.Now().UTC().UnixNano())
}

func main() {
    seedRandom()

    todo := make(chan int, 5)
    done := make(chan int, 5)
    for i := 0; i < 5; i++ {
        todo <- i
    }

    timeout := make(chan bool)
    go func() {
        time.Sleep(1 * time.Second)
        timeout <- true
    }()

    var mu sync.Mutex
    var output []int

loop:
    for {
        select {
        case <-time.After(100 * time.Millisecond):
            // this branch is never hit?
            fmt.Printf("\nPolling status: %v\n", output)
        case <-timeout:
            fmt.Printf("\nDing ding, time is up!\n")
            break loop
        case id := <-done:
            mu.Lock()
            output = append(output, id)
            fmt.Printf(".")
            mu.Unlock()
        case id := <-todo:
            go concurrent(id, done, todo)
        }
    }
}
Update: after following the answers I created this version in the Go Playground: https://play.golang.org/p/f08t984BdPt. It works as expected.
You are creating 5 goroutines (func concurrent), and your select has a case reading from the todo channel; that same channel is also written to inside concurrent, so you end up creating a lot of goroutines.
func concurrent(id int, done chan int, todo chan int) {
    for {
        // doing a task
        t := randInt(50, 100)
        time.Sleep(time.Duration(t) * time.Millisecond)
        done <- id

        // redo again this task
        t = randInt(50, 100)
        time.Sleep(time.Duration(t) * time.Millisecond)
        // by sending on todo here, you make the main loop's `case id := <-todo:`
        // fire again, which re-creates the goroutine
        todo <- id
    }
}
When I ran your code and checked runtime.NumGoroutine(), I got:
number of goroutines still running: 347
Because the time.After(100 * time.Millisecond) call is inside the for loop, the 100 ms wait effectively restarts every time some other case is hit, and in your case id := <-todo and id := <-done will always be hit within 100 milliseconds; that's why you didn't get the expected output. (From how your code is written now, the number of goroutines grows exponentially, and each of them is waiting to send a value on done or todo, so your loop never gets a quiet 100 ms in which to wait on time.After.)
loop:
    for {
        select {
        // this timer effectively gets reset on every iteration; a time.Ticker
        // would instead be a single object that signals every 100ms:
        // https://golang.org/pkg/time/#NewTicker
        case <-time.After(100 * time.Millisecond):
            // this branch is never hit?
            fmt.Printf("\nPolling status: %v\n", output)
        case <-timeout:
            fmt.Printf("\nDing ding, time is up!\n")
            break loop
        // this will get called
        case id := <-done:
            // the mutex is actually not very useful here: this case only runs
            // once per loop iteration and is perfectly thread-safe in this code
            mu.Lock()
            output = append(output, id)
            fmt.Printf(".")
            mu.Unlock()
        // this will get called
        case id := <-todo:
            go concurrent(id, done, todo)
        }
    }
}
I have made some modifications to make your code work as expected: https://play.golang.org/p/SmlSIUIF5jn
For a better understanding of Go channels and goroutines, see https://tour.golang.org/concurrency/1
time.After(100*time.Millisecond) creates a brand new channel, with a brand new timer, which starts at the moment that function is called.
So, in your loop:
for {
    select {
    // this statement resets the 100ms timer each time you execute the loop:
    case <-time.After(100 * time.Millisecond):
        ...
Your branch never gets hit because, with 5 goroutines sending signals within less than 100ms on one of the other cases, this time.After(100ms) never reaches completion.
You need to choose a way to keep the same timer between iterations.
Here is one way to adapt your time.After(...) call:
// store the timer in a variable *outside* the loop:
statusTimer := time.After(100 * time.Millisecond)

for {
    select {
    case <-statusTimer:
        fmt.Printf("\nPolling status: %v\n", output)
        // reset the timer:
        statusTimer = time.After(100 * time.Millisecond)
    case <-timeout:
        ...
Another way is, as #blackgreen suggests, to use a time.Ticker:
statusTicker := time.NewTicker(100 * time.Millisecond)

for {
    select {
    case <-statusTicker.C:
        fmt.Printf("\nPolling status: %v\n", output)
    case <-timeout:
        ...
Side notes
a. If the output slice is not shared with other goroutines, you don't need a mutex around its accesses:
for {
    select {
    case <-statusTicker.C:
        fmt.Printf("\nPolling status: %v\n", output)
    ...
    case id := <-done:
        // no race condition here: all happens within the same goroutine,
        // the 'select' statement makes sure that 'case's are executed
        // one at a time
        output = append(output, id)
        fmt.Printf(".")
b. For your timeout channel: another generic way to "signal" that some event occurred is to close the channel instead of sending a value on it:
// if you don't actually care about the value you send over this channel:
// you can make it unbuffered, and use the empty 'struct{}' type
timeout := make(chan struct{})

go func() {
    // wait for some condition ...
    <-time.After(1 * time.Second)
    close(timeout)
}()

select {
case <-statusTimer:
    ...
case <-timeout: // this branch will also be taken once timeout is closed
    fmt.Printf("\nDing ding, time is up!\n")
    break loop
case ...
The bug you will avoid is the following: suppose you want to use that timeout channel in two goroutines.
If you send a value over the timeout channel, only one goroutine will get signaled - it will "eat up" the value from the channel, and the other goroutine will stay blocked on the channel.
If you close the channel, both goroutines will correctly "receive" the signal.
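Here is a small, self-contained sketch of that difference (the names are mine): closing done wakes every waiting goroutine, while sending a single value would wake only one of them.
package main

import (
    "fmt"
    "sync"
)

func main() {
    done := make(chan struct{})
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            <-done // both goroutines unblock when done is closed
            fmt.Println("goroutine", id, "saw the signal")
        }(i)
    }
    close(done) // broadcast: every receiver proceeds
    wg.Wait()
}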
In the absence of a default case, when multiple cases are ready, select executes one of them at random. It's not deterministic.
To make sure the polling case runs, you should run it in a separate goroutine. (In that case, you must synchronize access to the output variable.)
Moreover, you say "I would like to see every 100ms", but time.After sends on the channel only once.
To execute the case periodically, use <-time.NewTicker(100 * time.Millisecond).C instead.
var mu sync.Mutex
var output []int

go func() {
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            // TODO: must synchronize access
            fmt.Printf("\nPolling status: %v\n", output)
        case <-timeout:
            return
        }
    }
}()

loop:
    for {
        select {
        // other cases
        }
    }
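To resolve the TODO above, the status print and the append must hold the same lock. A sketch of guarded helpers (these names are mine, not from the answer), assuming mu and output are the variables declared above:
// printStatus prints the current contents of output while holding the lock.
func printStatus(mu *sync.Mutex, output *[]int) {
    mu.Lock()
    defer mu.Unlock()
    fmt.Printf("\nPolling status: %v\n", *output)
}

// appendDone records a finished id while holding the same lock.
func appendDone(mu *sync.Mutex, output *[]int, id int) {
    mu.Lock()
    defer mu.Unlock()
    *output = append(*output, id)
}
The ticker case would then call printStatus(&mu, &output), and the case id := <-done branch of the main loop would call appendDone(&mu, &output, id) before printing the dot.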
I am trying to implement concurrency for a repetitive task. I want to run an HTTP request on a different goroutine (represented by the longRunningTaks function). I use a timer as a mechanism to stop that goroutine and send a timeout signal to the main goroutine if the heavy-load task exceeds the predefined timeout. The problem I currently have is that I am getting intermittent behaviour.
The code has been simplified to look like the following.
package main

import (
    "fmt"
    "time"
)

func main() {
    var iteration int = 5
    timeOutChan := make(chan struct{})
    resultChan := make(chan string)

    for i := 0; i < iteration; i++ {
        go longRunningTaks(timeOutChan, resultChan)
    }

    for i := 0; i < iteration; i++ {
        select {
        case data := <-resultChan:
            fmt.Println(data)
        case <-timeOutChan:
            fmt.Println("timed out")
        }
    }
}

func longRunningTaks(tc chan struct{}, rc chan string) {
    timer := time.NewTimer(time.Nanosecond * 1)
    defer timer.Stop()

    // Heavy load task
    time.Sleep(time.Second * 1)

    select {
    case <-timer.C:
        tc <- struct{}{}
    case rc <- "success":
        return
    }
}
I believe every try should print:
timeout
timeout
timeout
timeout
timeout
Instead I intermittently get:
success
timeout
timeout
timeout
timeout
The doc mentions:
NewTimer creates a new Timer that will send the current time on its
channel after at least duration d.
"at least means" timer will take specified time for sure, however this also implicitly means can take more time than specified. Timer starts its own go routine and write to channel on expiry.
Because of scheduler or garbage collection or processes of writing to other channel can get delayed. Besides simulated work load is very short considering above possibilities.
Update:
As Peter mentioned in a comment, writing "success" to the rc channel is an action that is equally ready to proceed, because the main goroutine can read it from the other end. The select has to choose between 1) writing "success" to the rc channel and 2) the expired timer, and both are possible.
The likelihood of option 1 is higher at the beginning, because the main goroutine has not yet read anything from the other end. Once that happens, the remaining goroutines have to compete to write "success" to the channel (sends block, since it has buffer size 0), so for the rest of the attempts the expired timer is more likely to be selected, as there is no telling how quickly the main goroutine will read from the resultChan channel (the other end of rc).
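If the intention is that an expired timer should always win over delivering the result, one option (a sketch, not the asker's code) is to check the timer first with a non-blocking select and only offer the result if it has not fired yet:
func longRunningTaks(tc chan struct{}, rc chan string) {
    timer := time.NewTimer(time.Nanosecond * 1)
    defer timer.Stop()

    // Heavy load task
    time.Sleep(time.Second * 1)

    select {
    case <-timer.C:
        // Deadline already passed: report only the timeout.
        tc <- struct{}{}
    default:
        // Timer has not fired yet: report the result.
        rc <- "success"
    }
}
With the 1 ns timer and the 1 s sleep above, this version always reports a timeout, which matches the expected output.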
(I don't believe my issue is a duplicate of this Q&A: "go routine blocking the others one", because I'm running Go 1.9, which has a preemptive scheduler, whereas that question was asked about Go 1.2.)
My Go program calls into a C library (wrapped by another Go library) that makes a blocking call which can last over 60 seconds. I want to add a timeout so it returns after 3 seconds:
Old code with long block:
// InvokeSomething is part of a Go wrapper library that calls the C library
// read_something function. I cannot change this code.
func InvokeSomething() ([]Something, error) {
    ret := clib.read_something(&input) // this can block for 60 seconds
    if ret.Code > 1 {
        return nil, CreateError(ret)
    }
    return ret.Something, nil
}

// This is my code I can change:
func MyCode() {
    something, err := InvokeSomething()
    // etc
}
My code with a goroutine, a channel, and a timeout, based on this Go by Example page: https://gobyexample.com/timeouts
type somethingResult struct {
    Something []Something
    Err       error
}

func MyCodeWithTimeout() {
    ch := make(chan somethingResult, 1)
    go func() {
        something, err := InvokeSomething() // blocks here for 60 seconds
        ret := somethingResult{something, err}
        ch <- ret
    }()
    select {
    case result := <-ch:
        // etc
    case <-time.After(time.Second * 3):
        // report timeout
    }
}
However, when I run MyCodeWithTimeout, it still takes 60 seconds before it executes the case <-time.After(time.Second * 3) block.
I know that attempting to read from an unbuffered channel with nothing in it will block, but I created the channel with a buffer size of 1, so as far as I can tell I'm doing it correctly. I'm surprised the Go scheduler isn't preempting my goroutine; or does that depend on execution being in Go code rather than in an external native library?
Update:
I read that the Go scheduler, at least as of 2015, is actually "semi-preemptive" and doesn't preempt OS threads that are in "external code": https://github.com/golang/go/issues/11462
you can think of the Go scheduler as being partially preemptive. It's by no means fully cooperative, since user code generally has no control over scheduling points, but it's also not able to preempt at arbitrary points
I heard that runtime.LockOSThread() might help, so I changed the function to this:
func MyCodeWithTimeout() {
    ch := make(chan somethingResult, 1)
    defer close(ch)
    go func() {
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        something, err := InvokeSomething() // blocks here for 60 seconds
        ret := somethingResult{something, err}
        ch <- ret
    }()
    select {
    case result := <-ch:
        // etc
    case <-time.After(time.Second * 3):
        // report timeout
    }
}
...however it didn't help at all and it still blocks for 60 seconds.
Doing the thread locking in the goroutine started by MyCodeWithTimeout() does not guarantee that MyCodeWithTimeout() will return after 3 seconds, for two reasons: first, there is no guarantee that the started goroutine will get scheduled and reach the point where it locks the thread; second, even if the external call or syscall returns within 3 seconds, there is no guarantee that the goroutine running MyCodeWithTimeout() will get scheduled to receive the result.
Instead do the thread locking in MyCodeWithTimeout(), not in the goroutine it starts:
func MyCodeWithTimeout() {
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()

    ch := make(chan somethingResult, 1)
    defer close(ch)
    go func() {
        something, err := InvokeSomething() // blocks here for 60 seconds
        ret := somethingResult{something, err}
        ch <- ret
    }()
    select {
    case result := <-ch:
        // etc
    case <-time.After(time.Second * 3):
        // report timeout
    }
}
Now, if MyCodeWithTimeout() starts executing, it locks the goroutine to the OS thread, and you can be sure that this goroutine will notice the value sent on the timer's channel.
NOTE: This is better if you want it to return within 3 seconds, but it still does not give a guarantee, as the timer that fires (sends a value on its channel) runs in its own goroutine, and this thread locking has no effect on the scheduling of that goroutine.
If you want a guarantee, you can't rely on other goroutines delivering the "exit" signal; you can only rely on what happens in the goroutine running the MyCodeWithTimeout() function itself (because, since you did the thread locking, you can be sure it gets scheduled).
An "ugly" solution which spins up CPU usage for a given CPU core would be:
for end := time.Now().Add(time.Second * 3); time.Now().Before(end); {
    // Do non-blocking check:
    select {
    case result := <-ch:
        // Process result
    default: // Must have default to be non-blocking
    }
}
Note that the "urge" of using time.Sleep() in this loop would take away the guarantee, as time.Sleep() may use goroutines in its implementation and certainly does not guarantee to return exactly after the given duration.
Also note that if you have 8 CPU cores and runtime.GOMAXPROCS(0) returns 8 for you, and your goroutines are still "starving", this may be a temporary solution, but you still have more serious problems using Go's concurrency primitives in your app (or a lack of using them), and you should investigate and "fix" those. Locking a thread to a goroutine may even make it worse for the rest of the goroutines.
Is there any API to let the main goroutine sleep forever?
In other words, I want my program to keep running until I stop it.
"Sleeping"
You can use numerous constructs that block forever without "eating" up your CPU.
For example a select without any case (and no default):
select{}
Or receiving from a channel where nobody sends anything:
<-make(chan int)
Or receiving from a nil channel also blocks forever:
<-(chan int)(nil)
Or sending on a nil channel also blocks forever:
(chan int)(nil) <- 0
Or locking an already locked sync.Mutex:
mu := sync.Mutex{}
mu.Lock()
mu.Lock()
Quitting
If you do want to provide a way to quit, a simple channel can do it. Provide a quit channel, and receive from it. When you want to quit, close the quit channel as "a receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received".
var quit = make(chan struct{})

func main() {
    // Startup code...

    // Then blocking (waiting for quit signal):
    <-quit
}

// And in another goroutine if you want to quit:
close(quit)
Note that issuing a close(quit) may terminate your app at any time. Quoting from Spec: Program execution:
Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
When close(quit) is executed, the last statement of our main() function can proceed which means the main goroutine can return, so the program exits.
Sleeping without blocking
The above constructs block the goroutine, so if you don't have other goroutines running, that will cause a deadlock.
If you don't want to block the main goroutine but you just don't want it to end, you may use a time.Sleep() with a sufficiently large duration. The max duration value is
const maxDuration time.Duration = 1<<63 - 1
which is approximately 292 years.
time.Sleep(time.Duration(1<<63 - 1))
If you fear your app will run longer than 292 years, put the above sleep in an endless loop:
for {
    time.Sleep(time.Duration(1<<63 - 1))
}
It depends on your use case what kind of sleep you want.
#icza provides a good and simple solution for literally sleeping forever, but I want to give you some more sweets if you want your system to be able to shut down gracefully.
You could do something like this:
func mainloop() {
    // buffered, so a signal sent before we start receiving is not lost
    exitSignal := make(chan os.Signal, 1)
    signal.Notify(exitSignal, syscall.SIGINT, syscall.SIGTERM)
    <-exitSignal

    systemTeardown()
}
And in your main:
func main() {
    systemStart()
    mainloop()
}
In this way, not only does your main sleep forever, but you can also do some graceful shutdown work after your code receives an INT or TERM signal from the OS (e.g. Ctrl+C or kill).
Another solution to block a goroutine; this one prevents the Go runtime from complaining about a deadlock:
import "time"
func main() {
for {
time.Sleep(1138800 * time.Hour)
}
}
In a simple scheduler for timers that I'm writing, I'm using a monitor goroutine to synchronize start/stop and timer-done events.
The monitor goroutine, stripped down to its essentials, looks like this:
actions := make(chan func(), 1024)

// monitor goroutine
go func() {
    for a := range actions {
        a()
    }
}()

actions <- func() {
    actions <- func() {
        // causes deadlock when buffer size is reached
    }
}
This works great, until an action is sent that sends another action.
It's possible for a scheduled action to schedule another action, which causes a deadlock when the buffer size is reached.
Is there any clean way to solve this issue without resorting to shared state (which I've tried in my particular problem, but is quite ugly)?
The problem arises from the fact that when your monitor goroutine "takes out" (receives) a value (a function) and executes it (this also happens on the monitor goroutine), during its execution it sends a value on the monitored channel.
This in itself would not cause a deadlock: by the time the function is executed (a()), it has already been taken out of the buffered channel, so there is at least one free slot in it, and sending a new value on the channel inside a() can proceed without blocking.
A problem may arise if there are other goroutines that may also send values on the monitored channel, which is your case.
One way to avoid the deadlock is for the executed function to "put back" (send) a new function not on the same goroutine (which is the monitor goroutine) but on a new goroutine, so the monitor goroutine is not blocked:
actions <- func() {
    // Send new func value in a new goroutine:
    // this will never block the monitor goroutine
    go func() {
        actions <- func() {
            // No deadlock.
        }
    }()
}
By doing this, the monitor goroutine will not be blocked even if the buffer of actions is full, because sending the value happens in a new goroutine (which may be blocked until there is free space in the buffer).
If you want to avoid always spawning a new goroutine when sending a value on actions, you may use a select to first try to send without spawning one. Only when the buffer of actions is full and the send cannot proceed without blocking do you spawn a goroutine for the send, to avoid the deadlock; depending on your actual case this may happen very rarely, and when it does, the extra goroutine is inevitable anyway:
actions <- func() {
    newfv := func() { /* do something */ }

    // First try to send with select:
    select {
    case actions <- newfv:
        // Success!
    default:
        // Buffer is full, must do it in new goroutine:
        go func() {
            actions <- newfv
        }()
    }
}
If you have to do this in many places, it's recommended to create a helper function for it:
func safeSend(fv func()) {
    // First try to send with select:
    select {
    case actions <- fv:
        // Success!
    default:
        // Buffer is full, must do it in new goroutine:
        go func() {
            actions <- fv
        }()
    }
}
And using it:
actions <- func() {
    safeSend(func() {
        // something to do
    })
}
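One trade-off of this fallback worth noting: once the buffer is full, the functions handed off to spawned goroutines are no longer guaranteed to arrive on actions in the order safeSend was called.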