What is the correct way to use goroutines?

I need to apply some tests to each request and fire a response based on the result of those tests. If one of the tests fails, I need to send the response immediately; otherwise I wait until all tests have finished successfully. I want to run the tests concurrently.
Now I do it like this (simplified):
func handler_request_checker(w http.ResponseWriter, r *http.Request) {
	done := make(chan bool)
	quit := make(chan bool)
	counter := 0
	go TestOne(r, done, quit)
	go TestTwo(r, done, quit)
	// ... TestThree through TestNine ...
	go TestTen(r, done, quit)
	for {
		select {
		case <-quit:
			fmt.Println("got quit signal")
			return
		case <-done:
			counter++
			if counter == 10 {
				fmt.Println("All checks passed successfully")
				return
			}
		}
	}
}

func main() {
	http.HandleFunc("/", handler_request_checker)
	http.ListenAndServe(":8080", nil) // address assumed for this simplified example
}
Example of one of the test goroutines:
func TestOne(r *http.Request, done, quit chan bool) {
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err == nil {
		for _, item := range BAD_IP_LIST {
			if strings.Contains(ip, item) {
				quit <- true
				return
			}
		}
		done <- true
		return
	} else {
		quit <- true
		return
	}
}
The problem is that the goroutines don't free memory after I get a quit signal. I suppose that happens because there is still something in the done channel.
I'm completely new to Go, so maybe I'm using goroutines the wrong way?
For example, when I run a load test with http_load -parallel 10 -seconds 10, the handler easily eats 100+ MB of RAM and doesn't give it back to the system. On the next test it eats 100+ MB more, and so on.
If I run the tests without goroutines (step by step), the program takes no more than 10-15 MB under any load test.

Your guess is right. You are using synchronous channels, which means that both the sender and the receiver must be available in order to transmit a value.
Once your handler_request_checker function gets a quit signal, it stops reading values from the done and quit channels. Any remaining TestOne, TestTwo, etc. goroutines will then block when they try to send their result. Even worse, they will stay in memory forever, because they are still running: they haven't delivered their result yet.
One way to solve the problem would be to use buffered (asynchronous) channels for done and quit. For example:
done := make(chan bool, 10)
quit := make(chan bool, 10)
If you use a buffer size of 10 for your 10 tests, then all goroutines are able to send their results, even when there is no reader available anymore. All goroutines will exit cleanly and the channels (that might contain some unread results) will get garbage collected once all goroutines have quit.
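For illustration, here is a minimal, self-contained sketch of the handler along those lines. The checks slice, the checkRemoteAddr rule, and the :8080 address are invented for the example; only the buffered-channel idea comes from the answer above.

package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// checks stands in for the ten Test functions from the question.
var checks = []func(r *http.Request, done, quit chan bool){
	checkRemoteAddr,
	// ... the remaining checks would be listed here ...
}

// checkRemoteAddr is a hypothetical check: reject a made-up "bad" prefix.
func checkRemoteAddr(r *http.Request, done, quit chan bool) {
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil || strings.HasPrefix(ip, "10.") {
		quit <- true
		return
	}
	done <- true
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Buffer sizes match the number of checks, so every goroutine can
	// send its result even after this handler has already returned.
	done := make(chan bool, len(checks))
	quit := make(chan bool, len(checks))

	for _, check := range checks {
		go check(r, done, quit)
	}

	for passed := 0; passed < len(checks); passed++ {
		select {
		case <-quit:
			http.Error(w, "check failed", http.StatusForbidden)
			return
		case <-done:
			// one more check passed
		}
	}
	fmt.Fprintln(w, "all checks passed")
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}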

Related

How does a channel for a signal inside another channel work together?

I have recently started studying Go and am trying to understand how channels work. I'm reading a book where they give this example:
At times you will be working with channels from disparate parts of your system.
Unlike with pipelines, you can’t make any assertions about how a channel will behave
when code you’re working with is canceled via its done channel. That is to say, you
don’t know if the fact that your goroutine was canceled means the channel you’re
reading from will have been canceled. For this reason we need to wrap our read from the channel with a select statement that also selects from a done channel. This is perfectly fine, but
doing so takes code that’s easily read like this
func orDone(done, c <-chan interface{}) <-chan interface{} {
	valStream := make(chan interface{})
	go func() {
		defer close(valStream)
		for {
			select {
			case <-done:
				return
			case v, ok := <-c:
				if ok == false {
					return
				}
				select {
				case valStream <- v:
				case <-done:
					fmt.Println("inside", count)
				}
			}
		}
	}()
	return valStream
}
And I don't quite understand how closing a channel through a done channel works.
Let's say I have this code:
func main() {
	done1 := make(chan interface{})
	done2 := make(chan interface{})
	inf := infGenerator(done1)
	for v := range orDone(done2, inf) {
		if count == 3 {
			close(done1)
		}
		count++
		fmt.Println(v)
	}
}
If, for example, the first channel (done1) is closed, then everything is clear; this branch is executed:
if ok == false {
return
}
But I don't quite understand how the program behaves when we close the second channel (done2) instead of the first.
First, the program goes to
case <-done:
	fmt.Println("inside", count)
And then it gets into the outer one. I don't understand why it works this way. Shouldn't the nested case <-done: receive the close signal and consume it? Or, when we close a channel, is the close signal not actually taken off the channel? And is there a chance that another case could be executed when the channel is closed?
Assuming infGenerator closes inf after receiving the done signal:
Closing done1 results in closing inf, which causes orDone to return as you said.
If you close done2 instead, orDone executes the nested done case first and then the outer done case, simply because the nested done case does not return from the function. It reads the done signal (a receive on a closed channel succeeds immediately), continues looping, and on the next iteration the outer select reads the done signal again and returns.
An unbuffered channel does not hold any data. Thus, if you close an unbuffered channel, any goroutines blocked reading from it are unblocked and receive the zero value (with ok == false in the two-value receive form). Any further read attempts also return immediately. Because of this, closing a channel is a good way to broadcast completion to all goroutines.
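A small runnable sketch of that broadcast behaviour (the worker count and the Sleep are arbitrary, just to make the effect visible):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	done := make(chan struct{})
	var wg sync.WaitGroup

	// Several goroutines all block reading from the same done channel.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-done // unblocks as soon as done is closed
			fmt.Println("worker", id, "saw the close")
		}(i)
	}

	time.Sleep(100 * time.Millisecond)
	close(done) // a single close wakes up every blocked reader at once
	wg.Wait()
}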

A case of `all goroutines are asleep - deadlock!` I can't figure out why

TL;DR: A typical case of all goroutines are asleep, deadlock! but can't figure it out
I'm parsing the Wiktionary XML dump to build a DB of words. I defer the parsing of each article's text to a goroutine hoping that it will speed up the process.
It's 7 GB and is processed in under 2 minutes on my machine when doing it serially, but if I can take advantage of all cores, why not.
I'm new to threading in general, and I'm getting an all goroutines are asleep - deadlock! error.
What's wrong here?
This may not be performant at all, as it uses an unbuffered channel, so all goroutines effectively end up executing serially, but my idea is to learn and understand threading and to benchmark how long it takes with different alternatives:
unbuffered channel
different sized buffered channel
only calling as many goroutines at a time as there are runtime.NumCPU()
The summary of my code in pseudocode:
while tag := xml.getNextTag() {
	wg.Add(1)
	go parseTagText(chan, wg, tag.text)

	// consume a channel message if available
	select {
	case msg := <-chan:
		// do something with msg
	default:
	}
}

// reading tags finished, wait for running goroutines, consume what's left on the channel
for msg := range chan {
	// do something with msg
}

// Sometimes this point is never reached, I get a deadlock
wg.Wait()

----

func parseTagText(chan, wg, tag.text) {
	defer wg.Done()
	// parse tag.text
	chan <- whatever // just inform that the text has been parsed
}
Complete code:
https://play.golang.org/p/0t2EqptJBXE
In your complete example on the Go Playground, you:
Create a channel (line 39, results := make(chan langs)) and a wait-group (line 40, var wait sync.WaitGroup). So far so good.
Loop: in the loop, sometimes spin off a task:
if ...various conditions... {
	wait.Add(1)
	go parseTerm(results, &wait, text)
}
In the loop, sometimes do a non-blocking read from the channel (as shown in your question). No problem here either. But...
At the end of the loop, use:
for res := range results {
	...
}
without ever calling close(results). (A channel should be closed in exactly one place, after all writers finish.) This loop uses a blocking read from the channel. As long as some writer goroutine is still running, the blocking read can block without bringing the whole program to a halt, but when the last writer finishes writing and exits, there are no remaining writer goroutines. Some other remaining goroutine might rescue you, but there are none.
Since you use the var wait correctly (adding 1 in the right place, and calling Done() in the right place in the writer), the solution is to add one more goroutine, which will be the one to rescue you:
go func() {
	wait.Wait()
	close(results)
}()
You should spin off this rescuer goroutine just before entering the for res := range results loop. (If you spin it off any earlier, it might see the wait variable count down to zero too soon, just before it gets counted up again by spinning off another parseTerm.)
This anonymous function will block in the wait variable's Wait() function until the last writer goroutine has called the final wait.Done(), which will unblock this goroutine. Then this goroutine will call close(results), which will arrange for the for loop in your main goroutine to finish, unblocking that goroutine. When this goroutine (the rescuer) returns and thus terminates, there are no more rescuers, but we no longer need any.
(This main code then calls wait.Wait() unnecessarily: Since the for didn't terminate until the wait.Wait() in the new goroutine already unblocked, we know that this next wait.Wait() will return immediately. So we can drop this second call, although leaving it in is harmless.)
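Here is a runnable toy version of that arrangement, with parseTerm reduced to a stub; the string slice and the stub body are invented for the example, the structure follows the answer:

package main

import (
	"fmt"
	"sync"
)

// parseTerm stands in for the real parsing goroutine from the question.
func parseTerm(results chan<- string, wait *sync.WaitGroup, text string) {
	defer wait.Done()
	results <- "parsed: " + text
}

func main() {
	results := make(chan string)
	var wait sync.WaitGroup

	// Spin off all writers first, counting each one in the wait-group.
	for _, text := range []string{"a", "b", "c"} {
		wait.Add(1)
		go parseTerm(results, &wait, text)
	}

	// The rescuer: close results only after every writer has called Done.
	go func() {
		wait.Wait()
		close(results)
	}()

	// The blocking range loop now terminates once results is closed.
	for res := range results {
		fmt.Println(res)
	}
}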
The problem is that nothing is closing the results channel, yet the range loop only exits when it closes. I've simplified your code to illustrate this and proposed a solution: consume the data in a separate goroutine:
package main

import (
	"fmt"
	"sync"
)

// This is our producer
func foo(i int, ch chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	ch <- i
	fmt.Println(i, "done")
}

// This is our consumer - it uses a different WG to signal it's done
func consumeData(ch chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for x := range ch {
		fmt.Println(x)
	}
	fmt.Println("ALL DONE")
}

func main() {
	ch := make(chan int)
	wg := sync.WaitGroup{}

	// create the producers
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go foo(i, ch, &wg)
	}

	// create the consumer on a different goroutine, and sync using another WG
	consumeWg := sync.WaitGroup{}
	consumeWg.Add(1)
	go consumeData(ch, &consumeWg)

	wg.Wait()        // <<<< means that the producers are done
	close(ch)        // << Signal the consumer to exit
	consumeWg.Wait() // << Wait for the consumer to exit
}

How to run for x seconds in a http handler

I want to run my function InsertRecords for 30 seconds and test how many records I can insert in a given time.
How can I stop processing InsertRecords after x seconds and then return a result from my handler?
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
	counter := InsertRecords()
	w.WriteHeader(200)
	io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}

func InsertRecords() int {
	counter := 0
	// db code goes here
	return counter
}
Cancellations and timeouts are often done with a context.Context.
While this simple example could be done with a channel alone, using the context here makes it more flexible, and can take into account the client disconnecting as well.
func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
	defer cancel()

	counter := InsertRecords(ctx)
	w.WriteHeader(200)
	io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}

func InsertRecords(ctx context.Context) int {
	counter := 0
	done := ctx.Done()
	for {
		select {
		case <-done:
			return counter
		default:
		}
		// db code goes here
		counter++
	}
}
This will run for at least 30 seconds, returning the number of complete database iterations. If you want to be sure that the handler always returns immediately after 30s, even if the DB call is blocked, then you need to push the DB code into another goroutine and let it return later. The shortest example would be to use a similar pattern as above, but synchronize access to the counter variable, since it could be written by the DB loop while returning.
func InsertRecords(ctx context.Context) int {
	counter := int64(0)
	done := ctx.Done()

	go func() {
		for {
			select {
			case <-done:
				return
			default:
			}
			// db code goes here
			atomic.AddInt64(&counter, 1)
		}
	}()

	<-done
	return int(atomic.LoadInt64(&counter))
}
See #JoshuaKolden's answer for an example with a producer and a timeout, which could also be combined with the existing request context.
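One possible way to combine the two, as a sketch: reuse the producer/timeout loop from the answer below, but also select on the request context so a disconnecting client ends the benchmark early. The 30-second limit and the db placeholder follow the answers above; the rest is illustrative.

func benchmarkHandler(w http.ResponseWriter, r *http.Request) {
	counter := InsertRecords(r.Context())
	w.WriteHeader(200)
	io.WriteString(w, fmt.Sprintf("counter is %d", counter))
}

func InsertRecords(ctx context.Context) int {
	stop := make(chan struct{})
	defer close(stop)
	countChan := make(chan int)

	go func() {
		for {
			// db code goes here
			select {
			case countChan <- 1:
			case <-stop:
				return
			}
		}
	}()

	var counter int
	timeoutCh := time.After(30 * time.Second)
	for {
		select {
		case n := <-countChan:
			counter += n
		case <-timeoutCh:
			return counter
		case <-ctx.Done(): // client went away; stop early
			return counter
		}
	}
}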
As JimB pointed out, cancelation for limiting the time taken by an HTTP request can be handled with context.WithTimeout. However, since you asked for benchmarking purposes, you may want to use a more direct method.
The purpose of context.Context is to allow for numerous cancelation events to occur and have the same net effect of gracefully stopping all downstream tasks. In JimB's example it's possible that some other process will cancel the context before the 30 seconds have elapsed, and this is desirable from the resource utilization point of view. For example, if the connection is terminated prematurely there is no point in doing any more work on building a response.
If benchmarking is your goal, you'd want to minimize the effect of superfluous events on the code being benchmarked. Here is an example of how to do that:
func InsertRecords() int {
	stop := make(chan struct{})
	defer close(stop)
	countChan := make(chan int)

	go func() {
		defer close(countChan)
		for {
			// db code goes here
			select {
			case countChan <- 1:
			case <-stop:
				return
			}
		}
	}()

	var counter int
	timeoutCh := time.After(30 * time.Second)
	for {
		select {
		case n := <-countChan:
			counter += n
		case <-timeoutCh:
			return counter
		}
	}
}
Essentially what we are doing is creating an infinite loop over discrete db operations and counting iterations through the loop; we stop when the time.After timer fires.
A problem with JimB's example is that, despite checking ctx.Done() in the loop, the loop can still block if the "db code" blocks. This is because ctx.Done() is only evaluated inline with the "db code" block.
To avoid this problem we separate the timing function and the benchmarking loop, so that nothing can prevent us from receiving the timeout event when it occurs. Once the timeout event occurs, we immediately return the value of the counter. The "db code" may still be in mid execution, but InsertRecords will exit and return its result anyway.
If the "db code" is in mid-execution when InsertRecords exits, the goroutine will be left running, so to clean this up we defer close(stop) so that on function exit we'll be sure to signal the goroutine to exit on the next iteration. When the goroutine exits, it cleans up the channel it was using to send the count.
As a general pattern the above is an example of how you can get precise timing in Go without regard to the actual execution time of the code being timed.
sidenote: A somewhat more advanced observation is that my example does not attempt to synchronize the start times between the timer and the goroutine. It seemed a bit pedantic to address that issue here. However, you can easily synchronize the two threads by creating a channel that blocks the main thread until the goroutine closes it just before starting the loop.
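A sketch of that start synchronization, assuming the same InsertRecords shape as above (the ready channel name is invented here):

func InsertRecords() int {
	stop := make(chan struct{})
	defer close(stop)
	countChan := make(chan int)
	ready := make(chan struct{}) // closed by the goroutine right before its loop starts

	go func() {
		defer close(countChan)
		close(ready) // signal that the benchmark loop is about to begin
		for {
			// db code goes here
			select {
			case countChan <- 1:
			case <-stop:
				return
			}
		}
	}()

	<-ready // don't start the timer until the worker is running
	var counter int
	timeoutCh := time.After(30 * time.Second)
	for {
		select {
		case n := <-countChan:
			counter += n
		case <-timeoutCh:
			return counter
		}
	}
}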

Concurrent limited consumers using goroutines and channels

I've tried to reproduce the "approach that manages resources well is to start a fixed number of handle goroutines all reading from the request channel" example from Effective Go and got
fatal error: all goroutines are asleep - deadlock!
The idea is simple: have one queue channel and one results channel, and a limited number of "workers".
My code is in Go Playground
queue := make(chan *Request)
result := make(chan int)
quit := make(chan bool)

go Serve(queue, quit)

for i := 0; i < 10; i++ {
	req := Request{i, result}
	queue <- &req
}
close(queue)

for i := 0; i < 10; i++ {
	fmt.Printf("Finished %d\n", <-result)
}
fmt.Printf("All finished\n")

quit <- true
Function Serve:
func handle(queue chan *Request) {
	for r := range queue {
		// process(r)
		fmt.Printf("Processing %d\n", r.i)
		r.r <- r.i
	}
}

func Serve(clientRequests chan *Request, quit chan bool) {
	MaxOutstanding := 2
	// Start handlers
	for i := 0; i < MaxOutstanding; i++ {
		go handle(clientRequests)
	}
	<-quit // Wait to be told to exit.
}
What is wrong? Or maybe there is a simpler solution for implementing a limited number of workers that process requests?
The result channel isn't buffered, so each send blocks the goroutine until something receives the result. So after the first two items are processed, the two handler goroutines are waiting for their results to be received. But the code that receives results isn't running yet, so they're stuck. Then main() tries to send the next work item, and the handlers aren't ready to receive it, so main() is now stuck too.
One way to fix it is to start a goroutine to receive results in the background before you queue the work items. Here's a version of your code that receives the results in the background, plus makes main() wait until all results have been received and handled: http://play.golang.org/p/_CKn3CxQFc.
In your example, you can know when it's safe to quit just by counting results received. But if a situation comes up where you need to figure out when you're done and counting is not enough, then 1) each worker can signal when it's done, 2) Serve can close the result channel once all workers signal done, 3) your result-handling goroutine can send a quit to main() when all the results have been handled. There are lots of variations on that; you can see code at http://play.golang.org/p/12wZbm0rxa
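For reference, here is a runnable sketch of the first fix (receiving results in a background goroutine); the Request type is reconstructed from the snippets in the question, so treat it as an approximation of the linked playground code:

package main

import "fmt"

type Request struct {
	i int
	r chan int
}

func handle(queue chan *Request) {
	for r := range queue {
		fmt.Printf("Processing %d\n", r.i)
		r.r <- r.i
	}
}

func Serve(clientRequests chan *Request, quit chan bool) {
	MaxOutstanding := 2
	for i := 0; i < MaxOutstanding; i++ {
		go handle(clientRequests)
	}
	<-quit // Wait to be told to exit.
}

func main() {
	queue := make(chan *Request)
	result := make(chan int)
	quit := make(chan bool)
	allDone := make(chan bool)

	go Serve(queue, quit)

	// Receive results in the background so the handlers never block on result.
	go func() {
		for i := 0; i < 10; i++ {
			fmt.Printf("Finished %d\n", <-result)
		}
		allDone <- true
	}()

	for i := 0; i < 10; i++ {
		req := Request{i, result}
		queue <- &req
	}
	close(queue)

	<-allDone
	fmt.Printf("All finished\n")
	quit <- true
}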

How does make(chan bool) behave differently from make(chan bool, 1)?

My question arises from trying to read from a channel if I can, or write to it if I can, using a select statement.
I know that channels specified like make(chan bool, 1) are buffered, and part of my question is: what is the difference between that and make(chan bool), which this page says is the same thing as make(chan bool, 0)? What is the point of a channel that can fit 0 values in it?
See playground A:
chanFoo := make(chan bool)
for i := 0; i < 5; i++ {
	select {
	case <-chanFoo:
		fmt.Println("Read")
	case chanFoo <- true:
		fmt.Println("Write")
	default:
		fmt.Println("Neither")
	}
}
A output:
Neither
Neither
Neither
Neither
Neither
(Removing the default case results in a deadlock!!)
Now see playground B:
chanFoo := make(chan bool, 1) // the only difference is the buffer size of 1
for i := 0; i < 5; i++ {
select {
case <-chanFoo:
fmt.Println("Read")
case chanFoo <- true:
fmt.Println("Write")
default:
fmt.Println("Neither")
}
}
B output:
Write
Read
Write
Read
Write
In my case, B output is what I want. What good are unbuffered channels? All the examples I see on golang.org appear to use them to send one signal/value at a time (which is all I need) -- but as in playground A, the channel never gets read or written. What am I missing here in my understanding of channels?
what is the point of a channel that can fit 0 values in it
First I want to point out that the second parameter here means buffer size, so make(chan bool) is simply a channel without a buffer (an unbuffered channel).
That is exactly where your problem comes from. Unbuffered channels are only writable when someone is blocked reading from them, which means you need some other goroutines to work with, instead of this single one.
Also see The Go Memory Model:
A receive from an unbuffered channel happens before the send on that channel completes.
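A tiny runnable sketch of that point: the select write only succeeds because another goroutine is already blocked reading from the channel (the Sleep calls are only there to make the ordering obvious in this demo):

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan bool) // unbuffered

	// Reader goroutine: once it is blocked on <-ch, a send can succeed.
	go func() {
		fmt.Println("read", <-ch)
	}()

	time.Sleep(100 * time.Millisecond) // give the reader time to block

	select {
	case ch <- true:
		fmt.Println("Write") // taken, because a reader is waiting
	default:
		fmt.Println("Neither")
	}

	time.Sleep(100 * time.Millisecond) // let the reader print before exiting
}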
An unbuffered channel means that reads from and writes to the channel are blocking.
In a select statement:
the read would work if some other goroutine was currently blocked writing to the channel
the write would work if some other goroutine was currently blocked reading from the channel
otherwise the default case is executed, which happens in your case A.
Unbuffered channels (created without capacity) will block the sender until somebody can read from them, so to make it work as you expect, you should use two goroutines to avoid the deadlock in the same thread.
For example, with this code: http://play.golang.org/p/KWJ1gbdSqf
It also includes a mechanism for the main func to detect when both goroutines have finished.
package main

import "fmt"

func send(out, finish chan bool) {
	for i := 0; i < 5; i++ {
		out <- true
		fmt.Println("Write")
	}
	finish <- true
	close(out)
}

func recv(in, finish chan bool) {
	for range in {
		fmt.Println("Read")
	}
	finish <- true
}

func main() {
	chanFoo := make(chan bool)
	chanfinish := make(chan bool)
	go send(chanFoo, chanfinish)
	go recv(chanFoo, chanfinish)
	<-chanfinish
	<-chanfinish
}
It won't show alternating Read and Write, because as soon as send writes to the channel, it is blocked before it can print "Write", so execution moves to recv, which receives from the channel and prints "Read". It then tries to read the channel again, but it is blocked, and execution moves back to send. Now send can print the first "Write", send to the channel (without blocking, because now there is a receiver ready) and print the second "Write". In any case, this is not deterministic, and the scheduler may move execution at any point to any other running goroutine (at least in the latest 1.2 release).
