Golang intermittent behaviour on timed out Goroutine - go

I am trying to implement concurrency for a repetitive task. I want to run an HTTP request on a separate goroutine (represented by the longRunningTask function). I use a timer as a mechanism to stop the goroutine and send a timeout signal to the main goroutine if the heavy task exceeds the predefined timeout. The problem is that I am getting intermittent behaviour.
The code has been simplified to look like below.
package main
import (
	"fmt"
	"time"
)
func main() {
	var iteration int = 5
	timeOutChan := make(chan struct{})
	resultChan := make(chan string)
	for i := 0; i < iteration; i++ {
		go longRunningTask(timeOutChan, resultChan)
	}
	for i := 0; i < iteration; i++ {
		select {
		case data := <-resultChan:
			fmt.Println(data)
		case <-timeOutChan:
			fmt.Println("timed out")
		}
	}
}
func longRunningTask(tc chan struct{}, rc chan string) {
	timer := time.NewTimer(time.Nanosecond * 1)
	defer timer.Stop()
	// Heavy load task
	time.Sleep(time.Second * 1)
	select {
	case <-timer.C:
		tc <- struct{}{}
	case rc <- "success":
		return
	}
}
I expected every attempt to print
timed out
timed out
timed out
timed out
timed out
Instead I intermittently get
success
timed out
timed out
timed out
timed out

The doc mentions:
NewTimer creates a new Timer that will send the current time on its
channel after at least duration d.
"At least" means the timer will not fire before the specified duration, but it also implicitly means it can take longer than specified. On expiry, the current time is written to the timer's channel.
That write can be delayed by the scheduler, by garbage collection, or by the writes to the other channel. Besides, the simulated workload is very short relative to these delays.
Update:
As Peter mentioned in a comment, writing "success" to the rc channel is an action that is equally likely to complete, because the main goroutine can read it from the other end. The select has to choose between 1) writing "success" to the rc channel and 2) the expired timer. Both are possible.
Option 1 is more likely at the beginning, because the main goroutine has not read from the channel yet. Once that has happened, the remaining goroutines have to compete for the channel to write "success" (it is unbuffered, so every send blocks until it is received). For the rest of the run the expired timer is more likely to be selected, since we cannot say how fast the main goroutine will read from resultChan (the other end of rc).

Related

timer reset in separate go routine

In the following scenario a network entity always waits for a TimeOutTime seconds before doing a particular task X. Assume this time as TimerT. During this wait of TimeOutTime seconds if the entity receives a set of external messages, it should reset the same TimerT to TimeOutTime again. If no external messages are received the expected behaviour is as follows:
Timer Expired
Do task X
Reset the Timer again to TimeOutTime
(by reset I mean, stop the timer and start over again)
To simulate the scenario I wrote the following code in Go.
package main
import (
"log"
"math/rand"
"sync"
"time"
)
const TimeOutTime = 3
const MeanArrivalTime = 4
func main() {
rand.Seed(time.Now().UTC().UnixNano())
var wg sync.WaitGroup
t := time.NewTimer(time.Second * time.Duration(TimeOutTime))
wg.Add(1)
// go routine for doing timeout event
go func() {
defer wg.Done()
for {
t1 := time.Now()
<-t.C
t2 := time.Now()
// Do.. task X .. on timeout...
log.Println("Timeout after ", t2.Sub(t1))
t.Reset(time.Second * time.Duration(TimeOutTime))
}
}()
// go routine to simulate incoming messages ...
// second go routine
go func() {
for {
// simulates a incoming message at any time
time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
// once any message is received reset the timer to TimeOutTime seconds again
t.Reset(time.Second * time.Duration(TimeOutTime))
}
}()
wg.Wait()
}
After running this program using -race flag it shows DATA_RACE:
==================
WARNING: DATA RACE
Write at 0x00c0000c2068 by goroutine 8:
time.(*Timer).Reset()
/usr/local/go/src/time/sleep.go:125 +0x98
main.main.func1()
/home/deka/Academic/go/src/main/test.go:29 +0x18f
Previous write at 0x00c0000c2068 by goroutine 9:
time.(*Timer).Reset()
/usr/local/go/src/time/sleep.go:125 +0x98
main.main.func2()
/home/deka/Academic/go/src/main/test.go:42 +0x80
Goroutine 8 (running) created at:
main.main()
/home/deka/Academic/go/src/main/test.go:20 +0x1d3
Goroutine 9 (running) created at:
main.main()
/home/deka/Academic/go/src/main/test.go:35 +0x1f5
==================
Then I wrapped each Reset() call in a Mutex.
package main
import (
"log"
"math/rand"
"sync"
"time"
)
const TimeOutTime = 3
const MeanArrivalTime = 4
func main() {
rand.Seed(time.Now().UTC().UnixNano())
var wg sync.WaitGroup
t := time.NewTimer(time.Second * time.Duration(TimeOutTime))
wg.Add(1)
var mu sync.Mutex
// go routine for doing timeout event
go func() {
defer wg.Done()
for {
t1 := time.Now()
<-t.C
t2 := time.Now()
// Do.. task X .. on timeout...
log.Println("Timeout after ", t2.Sub(t1))
mu.Lock()
t.Reset(time.Second * time.Duration(TimeOutTime))
mu.Unlock()
}
}()
// go routine to simulate incoming messages ...
// second go routine
go func() {
for {
// simulates a incoming message at any time
time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
// once any message is received reset the timer to TimeOutTime seconds again
mu.Lock()
t.Reset(time.Second * time.Duration(TimeOutTime))
mu.Unlock()
}
}()
wg.Wait()
}
After this change the code seems to work fine, based on the following observation.
If I replace the line
time.Sleep(time.Second * time.Duration(rand.Intn(MeanArrivalTime)))
in the second goroutine with a constant sleep of 4 seconds, while TimeOutTime stays at 3 seconds, the output of the program is:
2020/02/29 20:10:11 Timeout after 3.000160828s
2020/02/29 20:10:15 Timeout after 4.000444017s
2020/02/29 20:10:19 Timeout after 4.000454657s
2020/02/29 20:10:23 Timeout after 4.000304877s
In the above execution, the second goroutine resets the active timer after it has already been running for one second. Because of this, from the second line onward the timer expires 4 seconds after the previous timeout.
Now when I checked the documentation of Reset() I found the following:
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Reset changes the timer to expire after duration d.
// It returns true if the timer had been active, false if the timer had
// expired or been stopped.
//
// Reset should be invoked only on stopped or expired timers with drained channels.
// If a program has already received a value from t.C, the timer is known
// to have expired and the channel drained, so t.Reset can be used directly.
// If a program has not yet received a value from t.C, however,
// the timer must be stopped and—if Stop reports that the timer expired
// before being stopped—the channel explicitly drained:
//
// if !t.Stop() {
// <-t.C
// }
// t.Reset(d)
//
// This should not be done concurrent to other receives from the Timer's
// channel.
//
// Note that it is not possible to use Reset's return value correctly, as there
// is a race condition between draining the channel and the new timer expiring.
// Reset should always be invoked on stopped or expired channels, as described above.
// The return value exists to preserve compatibility with existing programs.
I found this diagram (link: https://blogtitle.github.io/go-advanced-concurrency-patterns-part-2-timers/).
With the diagram in mind, it seems that I need to use
if !t.Stop() {
<-t.C
}
t.Reset(d)
in the second goroutine. In this case I also need proper locking in both goroutines to avoid waiting forever on the channel.
I don't understand in which scenario t.Stop() plus draining the channel (<-t.C) should be performed. In which cases is it required? In my example I don't use the values read from the channel. Can I call Reset() without calling Stop()?
I simplified the code using time.After function:
package main
import (
"log"
"math/rand"
"time"
)
const TimeOutTime = 3
const MeanArrivalTime = 4
func main() {
const interval = time.Second * TimeOutTime
// channel for incoming messages
var incomeCh = make(chan struct{})
go func() {
for {
// On each iteration new timer is created
select {
case <-time.After(interval):
time.Sleep(time.Second)
log.Println("Do task")
case <-incomeCh:
log.Println("Handle income message and move to the next iteration")
}
}
}()
go func() {
for {
time.Sleep(time.Duration(rand.Intn(MeanArrivalTime)) * time.Second)
// generate incoming message
incomeCh <- struct{}{}
}
}()
// prevent main to stop for a while
<-time.After(10 * time.Second)
}
Note that:
After waits for the duration to elapse and then sends the current time
on the returned channel.
It is equivalent to NewTimer(d).C.
The underlying Timer is not recovered by the garbage collector
until the timer fires. If efficiency is a concern, use NewTimer
instead and call Timer.Stop if the timer is no longer needed.
Assume you have:
t.Stop()
t.Reset()
If the timer is stopped and drained before calling Stop, this works fine. The problem appears when the timer fires at about the same time Stop is called. Stop then returns false, and the fired value may still be sitting in t.C. If you do not read it, the next receive after Reset returns that stale value immediately instead of waiting for the new duration.
So, as you already observed, you have to do:
if !t.Stop() {
<-t.C
}
t.Reset(d)
However, even with that, I think your solution is flawed because of the use of asynchronous resets. Instead, try using a new timer for each simulated event.
You might consider a different overall design.
Suppose for instance that we write a routine or interface called Deadliner—it could become its own package if you like, or just be an interface, and we'll see a pretty strong resemblance to something Go already has—whose job / contract is described this way:
The user of a Deadliner creates a Deadline whenever they like.
The Deadliner waits until the deadline occurs, then flags the deadline as having occurred.
A Deadliner can be canceled by any goroutine at any time. This flags the Deadliner as canceled, so that anyone waiting on it stops waiting and can tell that the reason was "canceled" (not "expired"). It also helps clean up resources for the GC, in case you create a lot of Deadliners and then discard them before their timeout fires.
Now in your top level, before you start waiting for a message, you simply set up a deadline. This isn't a timer (even if it may use one internally), it's just a Deadliner instance. Then you wait for one of two events:
d, cancel = newDeadline(when)
for {
select {
case <-d.Done():
// Deadline expired.
// ... handle it ...
d, cancel = newDeadline(when) // if/as appropriate
case m := <-msgC:
// got message - cancel existing deadline and get new one
cancel()
d, cancel = newDeadline(when)
// ... handle the message
}
}
Now we just note that Go already has this: it's in package context. d is a context; newDeadline is context.WithDeadline or context.WithTimeout (depending on whether you want to compute the deadline time yourself, or have the timeout code add a duration to "now").
There is no need to fiddle with timers and time-tick channels and no need to spin off your own separate goroutines.
If the deadline doesn't reset on a single message, but rather on a particular combination of messages, you just write that in your case m := <-msgC section. If messages aren't currently received via channels, make that happen by putting messages into a channel, so that you can use this very simple wait-for-deadline-or-message pattern.

How to run for x seconds in a http handler

I want to run my function InsertRecords for 30 seconds and test how many records I can insert in a given time.
How can I stop processing InsertRecords after x seconds and then return a result from my handler?
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
counter := InsertRecords()
w.WriteHeader(200)
io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}
func InsertRecords() int {
counter := 0
// db code goes here
return counter
}
Cancellations and timeouts are often done with a context.Context.
While this simple example could be done with a channel alone, using the context here makes it more flexible, and can take into account the client disconnecting as well.
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
defer cancel()
counter := InsertRecords(ctx)
w.WriteHeader(200)
io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}
func InsertRecords(ctx context.Context) int {
counter := 0
done := ctx.Done()
for {
select {
case <-done:
return counter
default:
}
// db code goes here
counter++
}
return counter
}
This will run for at least 30 seconds, returning the number of complete database iterations. If you want to be sure that the handler always returns immediately after 30s, even if the DB call is blocked, then you need to push the DB code into another goroutine and let it return later. The shortest example would be to use a similar pattern as above, but synchronize access to the counter variable, since it could be written by the DB loop while returning.
func InsertRecords(ctx context.Context) int {
counter := int64(0)
done := ctx.Done()
go func() {
for {
select {
case <-done:
return
default:
}
// db code goes here
atomic.AddInt64(&counter, 1)
}
}()
<-done
return int(atomic.LoadInt64(&counter))
}
See @JoshuaKolden's answer for an example with a producer and a timeout, which could also be combined with the existing request context.
As JimB pointed out, cancellation to limit the time taken by an HTTP request can be handled with context.WithTimeout. However, since you asked for the purposes of benchmarking, you may want to use a more direct method.
The purpose of context.Context is to allow for numerous cancelation events to occur and have the same net effect of gracefully stopping all downstream tasks. In JimB's example it's possible that some other process will cancel the context before the 30 seconds have elapsed, and this is desirable from the resource utilization point of view. For example, if the connection is terminated prematurely there is no point in doing any more work on building a response.
If benchmarking is your goal, you'd want to minimize the effect of superfluous events on the code being benchmarked. Here is an example of how to do that:
func InsertRecords() int {
stop := make(chan struct{})
defer close(stop)
countChan := make(chan int)
go func() {
defer close(countChan)
for {
// db code goes here
select {
case countChan <- 1:
case <-stop:
return
}
}
}()
var counter int
timeoutCh := time.After(30 * time.Second)
for {
select {
case n := <-countChan:
counter += n
case <-timeoutCh:
return counter
}
}
}
Essentially, we create an infinite loop over discrete db operations and count iterations through the loop; we stop when time.After fires.
A problem in JimB's example is that despite checking ctx.Done() in the loop the loop can still block if the "db code" blocks. This is because ctx.Done() is only evaluated inline with the "db code" block.
To avoid this problem we separate the timing function and the benchmarking loop so that nothing can prevent us from receiving the timeout event when it occurs. Once the timeout event occurs we immediately return the value of the counter. The "db code" may still be in mid-execution, but InsertRecords will exit and return its result anyway.
If the "db code" is in mid-execution when InsertRecords exits, the goroutine will be left running, so to clean this up we defer close(stop) so that on function exit we'll be sure to signal the goroutine to exit on the next iteration. When the goroutine exits, it cleans up the channel it was using to send the count.
As a general pattern the above is an example of how you can get precise timing in Go without regard to the actual execution time of the code being timed.
sidenote: A somewhat more advanced observation is that my example does not attempt to synchronize the start times between the timer and the goroutine. It seemed a bit pedantic to address that issue here. However, you can easily synchronize the two threads by creating a channel that blocks the main thread until the goroutine closes it just before starting the loop.

How do I timeout a blocking external library call?

(I don't believe my issue is a duplicate of this QA: go routine blocking the others one, because I'm running Go 1.9 which has the preemptive scheduler whereas that question was asked for Go 1.2).
My Go program calls into a C library wrapped by another Go-lang library that makes a blocking call that can last over 60 seconds. I want to add a timeout so it returns in 3 seconds:
Old code with long block:
// InvokeSomething is part of a Go wrapper library that calls the C library read_something function. I cannot change this code.
func InvokeSomething() ([]Something, error) {
ret := clib.read_something(&input) // this can block for 60 seconds
if ret.Code > 1 {
return nil, CreateError(ret)
}
return ret.Something, nil
}
// This is my code I can change:
func MyCode() {
something, err := InvokeSomething()
// etc
}
My code with a go-routine, channel, and timeout, based on this Go example: https://gobyexample.com/timeouts
type somethingResult struct {
Something []Something
Err error
}
func MyCodeWithTimeout() {
ch := make(chan somethingResult, 1)
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
However when I run MyCodeWithTimeout it still takes 60 seconds before it executes the case <-time.After(time.Second * 3) block.
I know that attempting to read from an unbuffered channel with nothing in it will block, but I created the channel with a buffered size of 1 so as far as I can tell I'm doing it correctly. I'm surprised the Go scheduler isn't preempting my goroutine, or does that depend on execution being in go-lang code and not an external native library?
Update:
I read that the Go-scheduler, at least in 2015, is actually "semi-preemptive" and it doesn't preempt OS threads that are in "external code": https://github.com/golang/go/issues/11462
you can think of the Go scheduler as being partially preemptive. It's by no means fully cooperative, since user code generally has no control over scheduling points, but it's also not able to preempt at arbitrary points
I heard that runtime.LockOSThread() might help, so I changed the function to this:
func MyCodeWithTimeout() {
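ch := make(chan somethingResult, 1)
// ch is not closed here: after a timeout the goroutine below would otherwise panic when it sends on a closed channel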
go func() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
...however it didn't help at all and it still blocks for 60 seconds.
Your proposed solution, locking the thread inside the goroutine started by MyCodeWithTimeout(), does not guarantee that MyCodeWithTimeout() returns after 3 seconds. First, there is no guarantee that the started goroutine gets scheduled and reaches the point where it locks the thread. Second, even if the external command or syscall is called and returns within 3 seconds, there is no guarantee that the goroutine running MyCodeWithTimeout() gets scheduled to receive the result.
Instead do the thread locking in MyCodeWithTimeout(), not in the goroutine it starts:
func MyCodeWithTimeout() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
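ch := make(chan somethingResult, 1)
// ch is not closed here: after a timeout the goroutine below would otherwise panic when it sends on a closed channel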
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
Now when MyCodeWithTimeout() starts executing, it locks its goroutine to the OS thread, and you can be sure that this goroutine will notice the value sent on the timer's channel.
NOTE: This is better if you want it to return within 3 seconds, but it still gives no guarantee, as the timer that fires (sends a value on its channel) runs in its own goroutine, and this thread locking has no effect on the scheduling of that goroutine.
If you want a guarantee, you can't rely on other goroutines giving the "exit" signal; you can only rely on this happening in the goroutine running the MyCodeWithTimeout() function (because, since you did thread locking, you can be sure it gets scheduled).
An "ugly" solution which spins up CPU usage for a given CPU core would be:
for end := time.Now().Add(time.Second * 3); time.Now().Before(end); {
// Do non-blocking check:
select {
case result := <-ch:
// Process result
default: // Must have default to be non-blocking
}
}
Note that giving in to the urge to use time.Sleep() in this loop would take away the guarantee, as time.Sleep() may use goroutines in its implementation and certainly does not guarantee to return exactly after the given duration.
Also note that if you have 8 CPU cores and runtime.GOMAXPROCS(0) returns 8 for you, and your goroutines are still "starving", this may be a temporary solution, but you still have more serious problems using Go's concurrency primitives in your app (or a lack of using them), and you should investigate and "fix" those. Locking a thread to a goroutine may even make it worse for the rest of the goroutines.

Understanding Go channel deadlocks

package main
import (
"fmt"
"time"
)
func main() {
p := producer()
for c := range p {
fmt.Println(c)
}
}
func producer() <-chan string {
ch := make(chan string)
go func() {
for i := 0; i < 5; i++ {
ch <- fmt.Sprint("hello", i)
time.Sleep(1 * time.Second)
}
// commented the below to show the issue
// close(ch)
}()
return ch
}
Running the above code prints 5 messages and then fails with "fatal error: all goroutines are asleep - deadlock!". I understand that if I close the channel the error is gone.
What I would like to understand is how the Go runtime knows that the code will wait forever on the channel and that nothing else will send data into it.
Now if I add an additional goroutine to the main() function, it does not throw any error and keeps waiting on the channel.
go func() {
for {
time.Sleep(2 * time.Millisecond)
}
}()
So does this mean the Go runtime is just looking for the presence of a running goroutine that could potentially send data into the channel, and hence does not throw the deadlock error?
If you want some more insight into how Go implements the deadlock detection, have a look at the place in the code that throws the "all goroutines are asleep - deadlock!": https://github.com/golang/go/blob/master/src/runtime/proc.go#L3751
It looks like the Go runtime keeps some fairly simple accounting on how many goroutines there are, how many are idle, and how many are sleeping for locks (not sure which one sleep on channel I/O will increment). At any given time (serialized with the rest of the runtime), it just does some arithmetic and checks if all - idle - locked > 0... if so, then the program could still make progress... if it's 0, then you're definitely deadlocked.
It's possible you could introduce a livelock by preventing a goroutine from sleeping via an infinite loop (like what you did in your experiment, and apparently sleep for timers isn't treated the same by the runtime). The runtime wouldn't be able to detect a deadlock in that case, and run forever.
Furthermore, I'm not sure when exactly the runtime checks for deadlocks- further inspection of who calls that checkdead() may yield some insight there, if you're interested.
DISCLAIMER- I'm not a Go core developer, I just play one on TV :-)
The runtime aborts with the "all goroutines are asleep - deadlock!" fatal error when all goroutines are blocked on channel and mutex operations.
The sleeping goroutine does not block on one of these operations. There is no deadlock and therefore no fatal error.
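For completeness, a minimal sketch of the fix mentioned in the question: closing the channel lets the range loop end, so the main goroutine is never left blocked on a receive. The collect helper and the parameter n are hypothetical additions, and the sleep is dropped to keep the example short.

```go
package main

import "fmt"

// producer sends n messages and then closes the channel,
// which ends any range loop reading from it.
func producer(n int) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- fmt.Sprint("hello", i)
		}
	}()
	return ch
}

// collect drains the producer's channel into a slice.
func collect(n int) []string {
	var out []string
	for c := range producer(n) {
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range collect(5) {
		fmt.Println(c)
	}
}
```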

Goroutine schedule algorithm

package main
import ()
func main() {
msgQueue := make(chan int, 1000000)
netAddr := "127.0.0.1"
token := make(chan int, 10)
for i := 0; i < 10; i++ {
token <- i
}
go RecvReq(netAddr, msgQueue)
for {
select {
case req := <-msgQueue:
go HandleReq(req, token)
}
}
}
func RecvReq(addr string,msgQueue chan int){
msgQueue<-//get from network
}
func HandleReq(msg int, token chan int) {
//step 1
t := <-token
//step 2
//code here...(don't call runtime.park)
//step 3
//code here...(may call runtime.park)
//step 4
token <- t
}
System: 1cpu 2core
Go version:go1.3 linux/amd64
Problem description:
msgQueue receives requests from RecvReq all the time, so the main goroutine keeps creating new goroutines, but the waiting goroutines never get to run. The first 10 goroutines stop at step 3; the goroutines that follow stop at step 1.
Q1: How can I make the waiting goroutines run while new goroutines are being created all the time?
Q2: How can I balance RecvReq and HandleReq? Messages are received 10 times faster than they are handled.
Alas this is not very clear from your question. But there are several issues here.
You create a buffered channel of size n then insert n items into it. Don't do this - or to be clearer, don't do this until you know it's needed. Buffered channels usually fall into the 'premature optimisation' category. Start with unbuffered channels so you can work out how the goroutines co-operate. When it's working (free of deadlocks), measure the performance, add buffering, try again.
Your select has only one guard. So it behaves just like the select wasn't there and the case body was the only code there.
You are trying to spawn off new goroutines for every message. Is this really what you wanted? You may find you can use a static mesh of goroutines, perhaps 10 in your case, and the result may be a program in which the intent is clearer. It would also give a small saving because the runtime would not have to spawn and clean up goroutines dynamically (however, you should be concerned with correct behaviour first, before worrying about any inefficiencies).
Your RecvReq is missing from the playground example, which is not executable.
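The "static mesh of goroutines" suggestion could look roughly like the sketch below: a fixed set of workers reads from an unbuffered channel, so the producer is slowed down naturally when all workers are busy. handleAll, the worker count, and the summing of request values are hypothetical stand-ins for the real request handling.

```go
package main

import (
	"fmt"
	"sync"
)

// handleAll starts a fixed number of workers that read requests from an
// unbuffered channel. It returns the sum of all handled request values.
func handleAll(reqs []int, workers int) int {
	msgQueue := make(chan int)
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range msgQueue {
				// Real request handling would go here.
				mu.Lock()
				sum += req
				mu.Unlock()
			}
		}()
	}
	for _, r := range reqs {
		msgQueue <- r // blocks while every worker is busy
	}
	close(msgQueue) // lets the workers' range loops finish
	wg.Wait()
	return sum
}

func main() {
	reqs := make([]int, 100)
	for i := range reqs {
		reqs[i] = i + 1
	}
	fmt.Println(handleAll(reqs, 10)) // prints 5050
}
```

No token channel is needed in this shape, because the number of workers already bounds how many requests are handled at once.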

Resources