Concurrent limited consumers using goroutins and channels - go

I've tried to reproduce "approach that manages resources well is to start a fixed number of handle goroutines all reading from the request channel." from Effective Go and found
fatal error: all goroutines are asleep - deadlock!
The idea is simple: have 1 queue and 1 results channels and several limited number of "workers".
My code is in Go Playground
queue := make(chan *Request)
result := make(chan int)
quit := make(chan bool)
go Serve(queue, quit)
for i := 0; i < 10; i++ {
req := Request{i, result}
queue <- &req
}
close(queue)
for i := 0; i < 10; i++ {
fmt.Printf("Finished %d\n", <-result)
}
fmt.Printf("All finished\n")
quit <- true
Function Serve:
func handle(queue chan *Request) {
for r := range queue {
//process(r)
fmt.Printf("Processing %d\n", r.i)
r.r <- r.i
}
}
func Serve(clientRequests chan *Request, quit chan bool) {
MaxOutstanding := 2
// Start handlers
for i := 0; i < MaxOutstanding; i++ {
go handle(clientRequests)
}
<-quit // Wait to be told to exit.
}
What is wrong? Or may be there is simplier solution for implementing limited number of workers that process requests?

The result channel isn't buffered, so each send blocks the goroutine until something receives the result. So after the first two items are processed, the two handler routines are waiting for their results to be received. But the code to receive work items isn't running yet, so they're stuck. Then main() tries to send the next work item, and the handlers aren't ready to receive it so main() is now stuck too.
One way to fix it is to start a goroutine to receive results in the background before you queue the work items. Here's a version of your code that receives the results in the background, plus makes main() wait until all results have been received and handled: http://play.golang.org/p/_CKn3CxQFc.
In your example, you can know when it's safe to quit just by counting results received. But if a situation comes up where you need to figure out when you're done and counting is not enough, then 1) each worker can signal when it's done, 2) Serve can close the result channel once all workers signal done, 3) your result-handling goroutine can send a quit to main() when all the results have been handled. There are lots of variations on that; you can see code at http://play.golang.org/p/12wZbm0rxa

Related

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7GB and is processed in under 2 minutes in my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, I'm getting a all goroutines are asleep, deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
wg.Add(1)
go parseTagText(chan, wg, tag.text)
// consume a channel message if available
select {
case msg := <-chan:
// do something with msg
default:
}
}
// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
// do something with msg
}
// Sometimes this point is never reached, I get a deadlock
wg.Wait()
----
func parseTagText(chan, wg, tag.text) {
defer wg.Done()
// parse tag.text
chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
wait.Add(1)
go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
...
}
without ever calling close(results) in exactly one place, after all writers finish. This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without having the whole system stop, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Any other remaining goroutines might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
wait.Wait()
close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and propsed a solution - basically consume the data in a goroutine:
// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
ch <- i
fmt.Println(i, "done")
}
// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
defer wg.Done()
for x := range ch {
fmt.Println(x)
}
fmt.Println("ALL DONE")
}
func main() {
ch := make(chan int)
wg := sync.WaitGroup{}
// create the producers
for i := 0; i < 10; i++ {
wg.Add(1)
go foo(i, ch, &wg)
}
// create the consumer on a different goroutine, and sync using another WG
consumeWg := sync.WaitGroup{}
consumeWg.Add(1)
go consumeData(ch,&consumeWg)
wg.Wait() // <<<< means that the producers are done
close(ch) // << Signal the consumer to exit
consumeWg.Wait() // << Wait for the consumer to exit
}

Go Channels and delays

I was just experimenting with Go channels on my Ubuntu 64 bit environment and got confused with the output the following program produced.
I got the output:
0
1
2
3
Exit
The output when I uncommented the two commented lines:
0
1
2
3
4
Exit
Please explain the behavior.
TIA.
package main
import (
"fmt"
//"time"
)
func main() {
ch := make(chan int)
done := make(chan bool)
go func() {
for i := 0; i < 5; i++ {
ch <- i
}
//time.Sleep(1 * time.Second)
done <- false
}()
go func() {
for {
select {
case message := <-ch:
fmt.Println(message)
case <-done:
return
}
}
}()
<-done
fmt.Println("Exit")
}
Your main thread is waiting on done, then exiting. Meanwhile your first go function pipes 5 values into ch, then sends to done.
The value in done then gets read from the main thread and happens to occur before the second go function reads the last value from ch. When it does so, it exits the program.
Note that if your second thread did happen to both read from ch and done, then your program would deadlock since the main thread would never receive on done and all running go threads would be blocked waiting to receive on channels.
You're not waiting for both goroutines, and only sending a single value over done to 2 receivers, which will deadlock if the second receiver happens to be main.
Using a WaitGroup simplifies the code, and allows you to easily wait for as many goroutines as needed. https://play.golang.org/p/MWknv_9AFKp
ch := make(chan int)
var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()
defer close(ch)
for i := 0; i < 5; i++ {
ch <- i
}
}()
wg.Add(1)
go func() {
defer wg.Done()
for message := range ch {
fmt.Println(message)
}
}()
wg.Wait()
fmt.Println("Exit")
You have two go routines running in parallel. One inserts 5 numbers into the channel and then signals the main thread to exit, and another one reads the numbers from the channel.
Note that once the go routine that is responsible for enqueuing the numbers into the channel finishes, it signals the main thread to exit, regardless of whether the go routine that reads the numbers finished or not. So you could end up with a case where the enqueueing go routine finished before the dequeuing finished, and the main thread exited.
By adding the sleep, you make the enqueueing go routine live a bit longer and give a chance to the dequeueing go routine to read and print all the numbers before the enqueueing go routine signals the main thread to exit.
To solve that, you could just run the dequeuing code in the main thread. No need to run it in a go routine in this case.

Go channel didn't work for producer/consumer sample

I've just installed Go on Mac, and here's the code
package main
import (
"fmt"
"time"
)
func Product(ch chan<- int) {
for i := 0; i < 100; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 100; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(500)
}
I "go run producer_consumer.go", there's no output on screen, and then it quits.
Any problem with my program ? How to fix it ?
This is a rather verbose answer, but to put it simply:
Using time.Sleep to wait until hopefully other routines have completed their jobs is bad.
The consumer and producer shouldn't know anything about each other, apart from the type they exchange over the channel. Your code relies on both consumer and producer knowing how many ints will be passed around. Not a realistic scenario
Channels can be iterated over (think of them as a thread-safe, shared slice)
channels should be closed
At the bottom of this rather verbose answer where I attempt to explain some basic concepts and best practices (well, better practices), you'll find your code rewritten to work and display all the values without relying on time.Sleep. I've not tested that code, but should be fine
Right, there's a couple of problems here. Just as a bullet-list:
Your channel is buffered to 1, which is fine, but it's not necessary
Your channel is never closed
You're waiting 500ns, then exit regardless of the routines having completed, or even started processing for that matter.
There's no centralised control on over the routines, once you've started them, you have 0 control. If you hit ctrl+c, you might want to cancel routines when writing code that'll handle important data. Check signal handling, and context for this
Channel buffer
Seeing as you already know how many values you're going to push onto your channel, why not simply create ch := make(chan int, 100)? That way your publisher can continue to push messages onto the channel, regardless of what the consumer does.
You don't need to do this, but adding a sensible buffer to your channel, depending on what you're trying to do, is definitely worth checking out. At the moment, though, both routines are using fmt.Println & co, which is going to be a bottleneck either way. Printing to STDOUT is thread-safe, and buffered. This means that each call to fmt.Print* is going to acquire a lock, to avoid text from both routines to be combined.
Closing the channel
You could simply push all the values onto your channel, and then close it. This is, however, bad form. The rule of thumb WRT channels is that channels are created and closed in the same routine. Meaning: you're creating the channel in the main routine, that's where it should be closed.
You need a mechanism to sync up, or at least keep tabs on whether or not your routines have completed their job. That's done using the sync package, or through a second channel.
// using a done channel
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
// all values have been published
// close done channel
close(done)
}()
return done
}
func main() {
ch := make(chan int, 1)
done := produce(ch)
go consume(ch)
<-done // if producer has done its thing
close(ch) // we can close the channel
}
func consume(ch <-chan int) {
// we can now simply loop over the channel until it's closed
for i := range ch {
fmt.Printf("Consumed %d\n", i)
}
}
OK, but here you'll still need to wait for the consume routine to complete.
You may have already noticed that the done channel technically isn't closed in the same routine that creates it either. Because the routine is defined as a closure, however, this is an acceptable compromise. Now let's see how we could use a waitgroup:
import (
"fmt"
"sync"
)
func product(wg *sync.WaitGroup, ch chan<- int) {
defer wg.Done() // signal we've done our job
for i := 0; i < 100; i++ {
ch <- i
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
wg.Add(1) // I'm adding a routine to the channel
go produce(&wg, ch)
wg.Wait() // will return once `produce` has finished
close(ch)
}
OK, so this looks promising, I can have the routines tell me when they've finished their tasks. But if I add both consumer and producer to the waitgroup, I can't simply iterate over the channel. The channel will only ever get closed if both routines invoke wg.Done(), but if the consumer is stuck looping over a channel that'll never get closed, then I've created a deadlock.
Solution:
A hybrid would be the easiest solution at this point: Add the consumer to a waitgroup, and use the done channel in the producer to get:
func produce(ch chan<- int) <-chan struct{} {
done := make(chan struct{})
go func() {
for i := 0; i < 100; i++ {
ch <- i
}
close(done)
}()
return done
}
func consume(wg *sync.WaitGroup, ch <-chan int) {
defer wg.Done()
for i := range ch {
fmt.Printf("Consumer: %d\n", i)
}
}
func main() {
ch := make(chan int, 1)
wg := sync.WaitGroup{}
done := produce(ch)
wg.Add(1)
go consume(&wg, ch)
<- done // produce done
close(ch)
wg.Wait()
// consumer done
fmt.Println("All done, exit")
}
I have changed slightly(expanded time.Sleep) your code. Works fine on my Linux x86_64
func Product(ch chan<- int) {
for i := 0; i < 10; i++ {
fmt.Println("Product:", i)
ch <- i
}
}
func Consumer(ch <-chan int) {
for i := 0; i < 10; i++ {
a := <-ch
fmt.Println("Consmuer:", a)
}
}
func main() {
ch := make(chan int, 1)
go Product(ch)
go Consumer(ch)
time.Sleep(10000)
}
Output
go run s1.go
Product: 0
Product: 1
Product: 2
As JimB hinted at, time.Sleep takes a time.Duration, not an integer. The godoc shows an example of how to call this correctly. In your case, you probably want:
time.Sleep(500 * time.Millisecond)
The reason that your program is exiting quickly (but not giving you an error) is due to the (somewhat surprising) way that time.Duration is implemented.
time.Duration is simply a type alias for int64. Internally, it uses the value to represent the duration in nanoseconds. When you call time.Sleep(500), the compiler will gladly interpret the numeric literal 500 as a time.Duration. Unfortunately, that means 500 ns.
time.Millisecond is a constant equal to the number of nanoseconds in a millisecond (1,000,000). The nice thing is that requiring you to do that multiplication explicitly makes it obvious to that caller what the units are on that argument. Unfortunately, time.Sleep(500) is perfectly valid go code but doesn't do what most beginners would expect.

How does make(chan bool) behave differently from make(chan bool, 1)?

My question arises from trying to read a channel, if I can, or write it, if I can, using a select statement.
I know that channels specified like make(chan bool, 1) are buffered, and part of my question is what is the difference between that, and make(chan bool) -- which this page says is the same thing as make(chan bool, 0) --- what is the point of a channel that can fit 0 values in it?
See playground A:
chanFoo := make(chan bool)
for i := 0; i < 5; i++ {
select {
case <-chanFoo:
fmt.Println("Read")
case chanFoo <- true:
fmt.Println("Write")
default:
fmt.Println("Neither")
}
}
A output:
Neither
Neither
Neither
Neither
Neither
(Removing the default case results in a deadlock!!)
Now see playground B:
chanFoo := make(chan bool, 1) // the only difference is the buffer size of 1
for i := 0; i < 5; i++ {
select {
case <-chanFoo:
fmt.Println("Read")
case chanFoo <- true:
fmt.Println("Write")
default:
fmt.Println("Neither")
}
}
B output:
Write
Read
Write
Read
Write
In my case, B output is what I want. What good are unbuffered channels? All the examples I see on golang.org appear to use them to send one signal/value at a time (which is all I need) -- but as in playground A, the channel never gets read or written. What am I missing here in my understanding of channels?
what is the point of a channel that can fit 0 values in it
First I want to point out that the second parameter here means buffer size, so that is simply a channel without buffers (un-buffered channel).
Actually that's the reason why your problem is generated. Un-buffered channels are only writable when there's someone blocking to read from it, which means you shall have some coroutines to work with -- instead of this single one.
Also see The Go Memory Model:
A receive from an unbuffered channel happens before the send on that channel completes.
An unbuffered channel means that read and writes from and to the channel are blocking.
In a select statement:
the read would work if some other goroutine was currently blocked in writing to the channel
the write would work if some other goroutine was currently blocked in reading to the channel
otherwise the default case is executed, which happens in your case A.
Unbuffered channels (created without capacity) will block the sender until somebody can read from them, so to make it work as you expect, you should use two goroutines to avoid the deadlock in the same thread.
For example, with this code: http://play.golang.org/p/KWJ1gbdSqf
It also includes a mechanism for the main func to detect when both goroutines have finished.
package main
import "fmt"
func send(out, finish chan bool) {
for i := 0; i < 5; i++ {
out <- true
fmt.Println("Write")
}
finish <- true
close(out)
}
func recv(in, finish chan bool) {
for _ = range in {
fmt.Println("Read")
}
finish <- true
}
func main() {
chanFoo := make(chan bool)
chanfinish := make(chan bool)
go send(chanFoo, chanfinish)
go recv(chanFoo, chanfinish)
<-chanfinish
<-chanfinish
}
It won't show the alternate Read and Write, because as soon as send writes to the channel, it is blocked, before it can print the "Write", so the execution moves to recv that receives the channel and prints "Read". It will try to read the channel again but it will be blocked and the execution moves to send. Now send can write the first "Write", send to the channel (without blocking because now there is a receiver ready) and write the second "Write". In any case, this is not deterministic and the scheduler may move the execution at any point to any other running goroutine (at least in the latest 1.2 release).

What is correct way to use goroutine?

I need to apply some tests to each request and fire responce based on result of tests. If one of the test fail, I need to send responce imediatelly, otherwise I wait when all tests are done succesfully. I want to do that tests with concurrency.
Now, I do that like this (simplified):
func handler_request_checker(w http.ResponseWriter, r *http.Request) {
done := make(chan bool)
quit := make(chan bool)
counter := 0
go TestOne(r,done,quit)
go TestTwo(r,done,quit)
..............
go TestTen(r,done,quit)
for {
select {
case <- quit:
fmt.Println("got quit signal")
return
case <- done:
counter++
if counter == 10 {
fmt.Println("All checks passed succesfully")
return
}
}
}
func main() {
http.HandleFunc("/", handler_request_checker)
http.ListenAndServe()
}
Example of one of goroutine:
func TestOne(r *http.Request, done,quit chan bool) {
ip,_,ok := net.SplitHostPort(r.RemoteAddr)
if ok == nil {
for _,item := range BAD_IP_LIST {
if strings.Contains(ip,item) {
quit <- true
return
}
}
done <- true
return
} else {
quit <- true
return
}
}
Problem is that goroutines doesnt' free memory after I go t quit signal. I suppose that happens because there is something in done chanel.
I'm completely new in GO, so maybe I use them wrong way?
For example, when I start load check http_load -parallel 10 -seconds 10, that goroutine easily eats 100+ MB of RAM and dont'g give it back to the system. On next check, it eat 100+ MB more, and so on.
If I do that tests without go (step-by-step), program takes no more than 10-15mb with any load checks.
Your guess is right. You are using synchronous channels, which means that both the sender and the receiver must be available in order to transmit a value.
Once your handler_request_checker function gets a quit signal, it stops retrieving any values of the done and the quit channel. Further TestOne, TestTwo, etc. goroutines will be blocked when they try to send their result. Even worse, they will stay in memory forever because they are still running since they haven't transmitted their result yet.
One way to solve the problem would be to use buffered (asynchronous) channels for done and quit. For example:
done := make(chan bool, 10)
quit := make(chan bool, 10)
If you use a buffer size of 10 for your 10 tests, then all goroutines are able to send their results, even when there is no reader available anymore. All goroutines will exit cleanly and the channels (that might contain some unread results) will get garbage collected once all goroutines have quit.

Resources