How do I timeout a blocking external library call? - go

(I don't believe my issue is a duplicate of this QA: go routine blocking the others one, because I'm running Go 1.9 which has the preemptive scheduler whereas that question was asked for Go 1.2).
My Go program calls into a C library wrapped by another Go-lang library that makes a blocking call that can last over 60 seconds. I want to add a timeout so it returns in 3 seconds:
Old code with long block:
// InvokeSomething is part of a Go wrapper library that calls the C library read_something function. I cannot change this code.
func InvokeSomething() ([]Something, error) {
ret := clib.read_something(&input) // this can block for 60 seconds
if ret.Code > 1 {
return nil, CreateError(ret)
}
return ret.Something, nil
}
// This is my code I can change:
func MyCode() {
something, err := InvokeSomething()
// etc
}
My code with a go-routine, channel, and timeout, based on this Go example: https://gobyexample.com/timeouts
type somethingResult struct {
Something []Something
Err error
}
func MyCodeWithTimeout() {
ch = make(chan somethingResult, 1);
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
However when I run MyCodeWithTimeout it still takes 60 seconds before it executes the case <-time.After(time.Second * 3) block.
I know that attempting to read from an unbuffered channel with nothing in it will block, but I created the channel with a buffered size of 1 so as far as I can tell I'm doing it correctly. I'm surprised the Go scheduler isn't preempting my goroutine, or does that depend on execution being in go-lang code and not an external native library?
Update:
I read that the Go-scheduler, at least in 2015, is actually "semi-preemptive" and it doesn't preempt OS threads that are in "external code": https://github.com/golang/go/issues/11462
you can think of the Go scheduler as being partially preemptive. It's by no means fully cooperative, since user code generally has no control over scheduling points, but it's also not able to preempt at arbitrary points
I heard that runtime.LockOSThread() might help, so I changed the function to this:
func MyCodeWithTimeout() {
ch = make(chan somethingResult, 1);
defer close(ch)
go func() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
...however it didn't help at all and it still blocks for 60 seconds.

Your proposed solution to do thread locking in the goroutine started in MyCodeWithTimeout() does not give guarantee MyCodeWithTimeout() will return after 3 seconds, and the reason for this is that first: no guarantee that the started goroutine will get scheduled and reach the point to lock the thread to the goroutine, and second: because even if the external command or syscall gets called and returns within 3 seconds, there is no guarantee that the other goroutine running MyCodeWithTimeout() will get scheduled to receive the result.
Instead do the thread locking in MyCodeWithTimeout(), not in the goroutine it starts:
func MyCodeWithTimeout() {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
ch = make(chan somethingResult, 1);
defer close(ch)
go func() {
something, err := InvokeSomething() // blocks here for 60 seconds
ret := somethingResult{ something, err }
ch <- ret
}()
select {
case result := <-ch:
// etc
case <-time.After(time.Second *3):
// report timeout
}
}
Now if MyCodeWithTimeout() execution starts, it will lock the goroutine to the OS thread, and you can be sure that this goroutine will notice the value sent on the timers value.
NOTE: This is better if you want it to return within 3 seconds, but this sill will not give guarantee, as the timer that fires (sends a value on its channel) runs in its own goroutine, and this thread locking has no effect on the scheduling of that goroutine.
If you want guarantee, you can't rely on other goroutines giving the "exit" signal, you can only rely on this happening in your goroutine running the MyCodeWithTimeout() function (because since you did thread locking, you can be sure it gets scheduled).
An "ugly" solution which spins up CPU usage for a given CPU core would be:
for end := time.Now().Add(time.Second * 3); time.Now().Before(end); {
// Do non-blocking check:
select {
case result := <-ch:
// Process result
default: // Must have default to be non-blocking
}
}
Note that the "urge" of using time.Sleep() in this loop would take away the guarantee, as time.Sleep() may use goroutines in its implementation and certainly does not guarantee to return exactly after the given duration.
Also note that if you have 8 CPU cores and runtime.GOMAXPROCS(0) returns 8 for you, and your goroutines are still "starving", this may be a temporary solution, but you still have more serious problems using Go's concurrency primitives in your app (or a lack of using them), and you should investigate and "fix" those. Locking a thread to a goroutine may even make it worse for the rest of the goroutines.

Related

Code executes successfully even though there are issues with WaitGroup implementation

I have this snippet of code which concurrently runs a function using an input and output channel and associated WaitGroups, but I was clued in to the fact that I've done some things wrong. Here's the code:
func main() {
concurrency := 50
var tasksWG sync.WaitGroup
tasks := make(chan string)
output := make(chan string)
for i := 0; i < concurrency; i++ {
tasksWG.Add(1)
// evidentally because I'm processing tasks in a groutine then I'm not blocking and I end up closing the tasks channel almost immediately and stopping tasks from executing
go func() {
for t := range tasks {
output <- process(t)
continue
}
tasksWG.Done()
}()
}
var outputWG sync.WaitGroup
outputWG.Add(1)
go func() {
for o := range output {
fmt.Println(o)
}
outputWG.Done()
}()
go func() {
// because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
tasksWG.Wait()
close(output)
}()
f, err := os.Open(os.Args[1])
if err != nil {
log.Panic(err)
}
s := bufio.NewScanner(f)
for s.Scan() {
tasks <- s.Text()
}
close(tasks)
// and finally the output wait group finishes almost immediately as well because tasks gets closed right away due to my improper use of goroutines
outputWG.Wait()
}
func process(t string) string {
time.Sleep(3 * time.Second)
return t
}
I've indicated in the comments where I've implementing things wrong. Now these comments make sense to me. The funny thing is that this code does indeed seem to run asynchronously and dramatically speeds up execution. I want to understand what I've done wrong but it's hard to wrap my head around it when the code seems to execute in an asynchronous way. I'd love to understand this better.
Your main goroutine is doing a couple of things sequentially and others concurrently, so I think your order of execution is off
f, err := os.Open(os.Args[1])
if err != nil {
log.Panic(err)
}
s := bufio.NewScanner(f)
for s.Scan() {
tasks <- s.Text()
}
Shouldn't you move this up top? So then you have values sent to tasks
THEN have your loop which ranges over tasks 50 times in the concurrency named for loop (you want to have something in tasks before calling code that ranges over it)
go func() {
// because of what was mentioned in the previous comment, the tasks wait group finishes almost immediately which then closes the output channel almost immediately as well which ends ranging over output early
tasksWG.Wait()
close(output)
}()
The logic here is confusing me, you're spawning a goroutine to wait on the waitgroup, so here the wait is nonblocking on the main goroutine - is that what you want to do? It won't wait for tasksWG to be decremented to zero inside main, it'll do that inside the goroutine that you've created. I don't believe you want to do that?
It might be easier to debug if you could give more details on the expected output?

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, I'm getting a all goroutines are asleep, deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and propsed a solution - basically consume the data in a goroutine:
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}

Golang intermittent behaviour on timed out Goroutine

I am trying to implement concurrency for repetitive task. I want to implement an http request on a different Goroutine (pictured by longRunningTask function). I provide a timer for a mechanism to stop the Goroutine and sends a timeout signal to the main Goroutine if the heavy load task proceed the predefined timeout. The problem that I currently have is that I am getting intermittent behaviour.
The code has been simplified to look like below.
package main
import (
"fmt"
"time"
)
func main() {
var iteration int = 5
timeOutChan := make(chan struct{})
resultChan := make(chan string)
for i := 0; i < iteration; i++ {
go longRunningTaks(timeOutChan, resultChan)
}
for i := 0; i < iteration; i++ {
select {
case data := <-resultChan:
fmt.Println(data)
case <-timeOutChan:
fmt.Println("timed out")
}
}
}
func longRunningTaks(tc chan struct{}, rc chan string) {
timer := time.NewTimer(time.Nanosecond * 1)
defer timer.Stop()
// Heavy load task
time.Sleep(time.Second * 1)
select {
case <-timer.C:
tc <- struct{}{}
case rc <- "success":
return
}
}
I believe every tries should be printing out
timeout
timeout
timeout
timeout
timeout
Instead I got an intermittent
success
timeout
timeout
timeout
timeout
The doc mentions:
NewTimer creates a new Timer that will send the current time on its
channel after at least duration d.
"at least means" timer will take specified time for sure, however this also implicitly means can take more time than specified. Timer starts its own go routine and write to channel on expiry.
Because of scheduler or garbage collection or processes of writing to other channel can get delayed. Besides simulated work load is very short considering above possibilities.
Update:
As Peter mentioned in comment writing "success" to rc channel is action which is equally likely to complete because that can be read from the other end by main routine. The select has to choose between 1) writing "success" to rc channel & 2) expired timer. And both are possible.
The likelihood of No1 is more in the beginning because the main routine is yet to read it from other end. Once that happens. Other remaining routines will have to compete for the channel (to write "success") (since it is blocking with buffer size 0) so for rest of the times the likelihood of expired timer getting selected is more as cannot say how fast the main routine will read from the resultChan channel (other end of rc).

How to run for x seconds in a http handler

I want to run my function InsertRecords for 30 seconds and test how many records I can insert in a given time.
How can I stop processing InsertRecords after x seconds and then return a result from my handler?
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
counter := InsertRecords()
w.WriteHeader(200)
io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}
func InsertRecords() int {
counter := 0
// db code goes here
return counter
}
Cancellations and timeouts are often done with a context.Context.
While this simple example could be done with a channel alone, using the context here makes it more flexible, and can take into account the client disconnecting as well.
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
defer cancel()
counter := InsertRecords(ctx)
w.WriteHeader(200)
io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}
func InsertRecords(ctx context.Context) int {
counter := 0
done := ctx.Done()
for {
select {
case <-done:
return counter
default:
}
// db code goes here
counter++
}
return counter
}
This will run for at least 30 seconds, returning the number of complete database iterations. If you want to be sure that the handler always returns immediately after 30s, even if the DB call is blocked, then you need to push the DB code into another goroutine and let it return later. The shortest example would be to use a similar pattern as above, but synchronize access to the counter variable, since it could be written by the DB loop while returning.
func InsertRecords(ctx context.Context) int {
counter := int64(0)
done := ctx.Done()
go func() {
for {
select {
case <-done:
return
default:
}
// db code goes here
atomic.AddInt64(&counter, 1)
}
}()
<-done
return int(atomic.LoadInt64(&counter))
}
See #JoshuaKolden's answer for an example with a producer and a timeout, which could also be combined with the existing request context.
As JimB pointed out cancelation for limiting the time taken by an http requests can be handled with context.WithTimeout, however since you asked for the purposes of benchmarking you may want to use a more direct method.
The purpose of context.Context is to allow for numerous cancelation events to occur and have the same net effect of gracefully stopping all downstream tasks. In JimB's example it's possible that some other process will cancel the context before the 30 seconds have elapsed, and this is desirable from the resource utilization point of view. For example, if the connection is terminated prematurely there is no point in doing any more work on building a response.
If benchmarking is your goal you'd want to minimized the effect of superfluous events on the code being benchmarked. Here is an example of how to do that:
func InsertRecords() int {
stop := make(chan struct{})
defer close(stop)
countChan := make(chan int)
go func() {
defer close(countChan)
for {
// db code goes here
select {
case countChan <- 1:
case <-stop:
return
}
}
}()
var counter int
timeoutCh := time.After(30 * time.Second)
for {
select {
case n := <-countChan:
counter += n
case <-timeoutCh:
return counter
}
}
}
Essentially what we are doing is creating an infinite loop over discrete db operations, and counting iterations through the loop, we stop when time.After is triggered.
A problem in JimB's example is that despite checking ctx.Done() in the loop the loop can still block if the "db code" blocks. This is because ctx.Done() is only evaluated inline with the "db code" block.
To avoid this problem we separate the timing function and the benchmarking loop so that nothing can prevent us from receiving the timeout event when it occurs. Once the time out even occurs we immediately return the result of the counter. The "db code" may still be in mid execution but InsertRecords will exit and return its results anyway.
If the "db code" is in mid-execution when InsertRecords exits, the goroutine will be left running, so to clean this up we defer close(stop) so that on function exit we'll be sure to signal the goroutine to exit on the next iteration. When the goroutine exits, it cleans up the channel it was using to send the count.
As a general pattern the above is an example of how you can get precise timing in Go without regard to the actual execution time of the code being timed.
sidenote: A somewhat more advanced observation is that my example does not attempt to synchronize the start times between the timer and the goroutine. It seemed a bit pedantic to address that issue here. However, you can easily synchronize the two threads by creating a channel that blocks the main thread until the goroutine closes it just before starting the loop.

Golang transaction exit handling

I have a system with about 100 req/sec. And sometimes it doesn't respond until I restart my go program. I found out that this is because I open transaction and don't close it in some places. That's why all connections were occupied by opened transactions and I couldn't open another one After this I added this code
defer func() {
if r := recover(); r != nil {
tx.Rollback()
return
}
if err == nil {
err = tx.Commit()
} else {
tx.Rollback()
}
}()
This made my program work for a month without interruption. But just now it happened again. Probably because of this issue. Is there a better way to close transaction? Or maybe a way to close a transaction if it was open for 1 minute?
If you want to roll back the transaction after 1 minute, you can use the standard Go timeout pattern (below).
However, there may be better ways to deal with your problems. You don’t give a lot of information about it, but you are probably using standard HTTP requests with contexts. In that case, it may help you to know that a transaction started within a context is automatically rolled back when the context is cancelled. One way to make sure that a context is cancelled if something gets stuck is to give it a deadline.
Excerpted from Channels - Timeouts. The original authors were Chris Lucas, Kwarrtz and Rodolfo Carvalho. Attribution details can be found on the contributor page. The source is licenced under CC BY-SA 3.0 and may be found in the Documentation archive. Reference topic ID: 1263 and example ID: 6050.
Timeouts
Channels are often used to implement timeouts.
func main() {
// Create a buffered channel to prevent a goroutine leak. The buffer
// ensures that the goroutine below can eventually terminate, even if
// the timeout is met. Without the buffer, the send on the channel
// blocks forever, waiting for a read that will never happen, and the
// goroutine is leaked.
ch := make(chan struct{}, 1)
go func() {
time.Sleep(10 * time.Second)
ch <- struct{}{}
}()
select {
case <-ch:
// Work completed before timeout.
case <-time.After(1 * time.Second):
// Work was not completed after 1 second.
}
}

Resources