Golang: limit concurrency levels of a blocking operation - go

I have the following scenario:
I am receiving a message on a channel telling me to upload a file. The upload is made by the blocking function uploadToServer. The zipGen channel may receive several messages per second, and I want to upload maximum 5 files simultaneously (not more, but possibly less - depending on how many messages are sent on zipGen by a third worker that is out of the scope of this question).
The listenToZips function runs inside a go routine (go listenToZips() on the file's init function):
func listenToZips() {
for {
select {
case zip := <-zipGen:
uploadToServer(zip) // this is blocking
}
}
}
If I launch go uploadToServer(zip) instead of just uploadToServer(zip) - I get too much concurrency (so for example my program will try to upload 10 files at the same time, but I want a maximum of 5).
On the other hand, without go uploadToServer(zip) (just using uploadToServer(zip) like in the above function), I only upload one file at a time (since the uploadToServer(zip) is blocking).
How can I achieve this level of control to allow me a max upload of 5 files simultaneously?
Thanks!

The simplest option - prespawn N goroutines that take input from the channel, and upload it, in a loop. In each goroutine's context the operation will be blocking, but N goroutines do this. Only one goroutine will receive each message, of course.
func listenToZips(concurrent int) {
for i:=0; i < concurrent; i++ {
// spawn a listener goroutine
go func() {
for {
select {
case zip := <-zipGen:
uploadToServer(zip) // this is blocking
}
}
}()
}
}
Of course you can then add stop condition, probably using a different channel, but the basic idea is just the same.

try this:
https://github.com/korovkin/limiter
limiter := NewConcurrencyLimiter(10)
limiter.Execute(func() {
uploadToServer()
})
limiter.Wait()

Related

Go Concurrency Circular Logic

I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel. If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is appended onto the stack slice in the dispatcher and the dispatcher will try to send again later when a worker becomes available, and/or no more messages are received on the dispatchchan. This is because the dispatchchan and the jobchan are unbuffered, and the go routine the workers are running will append other messages to the dispatcher up to a certain point and I don't want the workers blocked waiting on the dispatcher and creating a deadlock. Here's the dispatcher code I've come up with so far:
func dispatch() {
var stack []string
acount := 0
for {
select {
case d := <-dispatchchan:
stack = append(stack, d)
case c := <-mw:
acount = acount + c
case jobchan <-stack[0]:
if len(stack) > 1 {
stack[0] = stack[len(stack)-1]
stack = stack[:len(stack)-1]
} else {
stack = nil
}
default:
if acount == 0 && len(stack) == 0 {
close(jobchan)
close(dispatchchan)
close(mw)
wg.Done()
return
}
}
}
Complete example at https://play.golang.wiki/p/X6kXVNUn5N7
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool. If the worker routine is doing [m]eaningful [w]ork it throws int 1 on the mw channel and when it finishes its work and goes back into the for loop listening to the jobchan it throws int -1 on the mw. This way the dispatcher knows if there's any work being done by the worker pool, or if the pool is idle. If the pool is idle and there are no more messages on the stack, then the dispatcher closes the channels and return control to the main func.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error. What I'm trying to figure out is how to ensure that when I hit that case, either stack[0] has a value in it or not. I don't want that case to send an empty string to the jobchan.
Any help is greatly appreciated. If there's a more idomatic concurrency pattern I should consider, I'd love to hear about it. I'm not 100% sold on this solution but this is the farthest I've gotten so far.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error.
I can't reproduce it with your playground link, but it's believable, because at lest one gofunc worker might have been ready to receive on that channel.
My output has been Msgcnt: 0, which is also easily explained, because gofunc might not have been ready to receive on jobschan when dispatch() runs its select. The order of these operations is not defined.
trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel
A channel needs no dispatcher. A channel is the dispatcher.
If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is [...] will [...] send again later when a worker becomes available, [...] or no more messages are received on the dispatchchan.
With a few creative edits, it was easy to turn that into something close to the definition of a buffered channel. It can be read from immediately, or it can take up to some "limit" of messages that can't be immediately dispatched. You do define limit, though it's not used elsewhere within your code.
In any function, defining a variable you don't read will result in a compile time error like limit declared but not used. This stricture improves code quality and helps identify typeos. But at package scope, you've gotten away with defining the unused limit as a "global" and thus avoided a useful error - you haven't limited anything.
Don't use globals. Use passed parameters to define scope, because the definition of scope is tantamount to functional concurrency as expressed with the go keyword. Pass the relevant channels defined in local scope to functions defined at package scope so that you can easily track their relationships. And use directional channels to enforce the producer/consumer relationship between your functions. More on this later.
Going back to "limit", it makes sense to limit the quantity of jobs you're queueing because all resources are limited, and accepting more messages than you have any expectation of processing requires more durable storage than process memory provides. If you don't feel obligated to fulfill those requests no matter what, don't accept "too many" of them in the first place.
So then, what function has dispatchchan and dispatch()? To store a limited number of pending requests, if any, before they can be processed, and then to send them to the next available worker? That's exactly what a buffered channel is for.
Circular Logic
Who "knows" when your program is done? main() provides the initial input, but you close all 3 channels in `dispatch():
close(jobchan)
close(dispatchchan)
close(mw)
Your workers write to their own job queue so only when the workers are done writing to it can the incoming job queue be closed. However, individual workers also don't know when to close the jobs queue because other workers are writing to it. Nobody knows when your algorithm is done. There's your circular logic.
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool.
There's a race condition here. Consider the case where all n workers have just received the last n jobs. They've each read from jobschan and they're checking the value of ok. disptatcher proceeds to run its select. Nobody is writing to dispatchchan or reading from jobschan right now so the default case is immediately matched. len(stack) is 0 and there's no current job so dispatcher closes all channels including mw. At some point thereafter, a worker tries to write to a closed channel and panics.
So finally I'm ready to provide some code, but I have one more problem: I don't have a clear problem statement to write code around.
I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel.
Channels between goroutines are like the teeth that synchronize gears. But to what end do the gears turn? You're not trying to keep time, nor construct a wind-up toy. Your gears could be made to turn, but what would success look like? Their turning?
Let's try to define a more specific use case for channels: given an arbitrarily long set of durations coming in as strings on standard input*, sleep that many seconds in one of n workers. So that we actually have a result to return, we'll say each worker will return the start and end time the duration was run for.
So that it can run in the playground, I'll simulate standard input with a hard-coded byte buffer.
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strings"
"sync"
"time"
)
type SleepResult struct {
worker_id int
duration time.Duration
start time.Time
end time.Time
}
func main() {
var num_workers = 2
workchan := make(chan time.Duration)
resultschan := make(chan SleepResult)
var wg sync.WaitGroup
var resultswg sync.WaitGroup
resultswg.Add(1)
go results(&resultswg, resultschan)
for i := 0; i < num_workers; i++ {
wg.Add(1)
go worker(i, &wg, workchan, resultschan)
}
// playground doesn't have stdin
var input = bytes.NewBufferString(
strings.Join([]string{
"3ms",
"1 seconds",
"3600ms",
"300 ms",
"5s",
"0.05min"}, "\n") + "\n")
var scanner = bufio.NewScanner(input)
for scanner.Scan() {
text := scanner.Text()
if dur, err := time.ParseDuration(text); err != nil {
fmt.Fprintln(os.Stderr, "Invalid duration", text)
} else {
workchan <- dur
}
}
close(workchan) // we know when our inputs are done
wg.Wait() // and when our jobs are done
close(resultschan)
resultswg.Wait()
}
func results(wg *sync.WaitGroup, resultschan <-chan SleepResult) {
for res := range resultschan {
fmt.Printf("Worker %d: %s : %s => %s\n",
res.worker_id, res.duration,
res.start.Format(time.RFC3339Nano), res.end.Format(time.RFC3339Nano))
}
wg.Done()
}
func worker(id int, wg *sync.WaitGroup, jobchan <-chan time.Duration, resultschan chan<- SleepResult) {
var res = SleepResult{worker_id: id}
for dur := range jobchan {
res.duration = dur
res.start = time.Now()
time.Sleep(res.duration)
res.end = time.Now()
resultschan <- res
}
wg.Done()
}
Here I use 2 wait groups, one for the workers, one for the results. This makes sure Im done writing all the results before main() ends. I keep my functions simple by having each function do exactly one thing at a time: main reads inputs, parses durations from them, and sends them off to the next worker. The results function collects results and prints them to standard output. The worker does the sleeping, reading from jobchan and writing to resultschan.
workchan can be buffered (or not, as in this case); it doesn't matter because the input will be read at the rate it can be processed. We can buffer as much input as we want, but we can't buffer an infinite amount. I've set channel sizes as big as 1e6 - but a million is a lot less than infinite. For my use case, I don't need to do any buffering at all.
main knows when the input is done and can close the jobschan. main also knows when jobs are done (wg.Wait()) and can close the results channel. Closing these channels is an important signal to the worker and results goroutines - they can distinguish between a channel that is empty and a channel that is guaranteed not to have any new additions.
for job := range jobchan {...} is shorthand for your more verbose:
for {
job, ok := <- jobchan
if !ok {
wg.Done()
return
}
...
}
Note that this code creates 2 workers, but it could create 20 or 2000, or even 1. The program functions regardless of how many workers are in the pool. It can handle any volume of input (though interminable input of course leads to an interminable program). It does not create a cyclic loop of output to input. If your use case requires jobs to create more jobs, that's a more challenging scenario that can typically be avoided with careful planning.
I hope this gives you some good ideas about how you can better use concurrency in your Go applications.
https://play.golang.wiki/p/cZuI9YXypxI

Safe to read from shared data structure in goroutines without channels

I am new to Golang and starting to use goroutines. I am curious if it is safe to read from a data structure that is being written to by another goroutines without using channel or mutex. In the example below, the events array is visible to main and goroutine, but main only reads from it and the goroutine is modifying it. Is this considered safe?
var events = []string{}
func main() {
go func() {
for {
if len(events) > 30 {
events = nil
}
event := "The time is now " + time.Now().String()
events = append(events, event)
time.Sleep(time.Millisecond * 200)
}
}()
for {
for i:=0; i < len(events); i++ {
fmt.Println(events[i])
}
time.Sleep(time.Millisecond * 100)
fmt.Println("--------------------------------------------------------")
}
}
Thanks.
No, this is absolutely not safe. You should use synchronization primitives around access to events, or redesign to use channels.
Your code may work fine today, on a specific architecture/OS, but it may break unexpectedly with small changes and/or on another architecture/OS.
Try to run your code with -race to verify. You'll probably see WARNING: DATA RACE.
See also this answer.

Is there a resource leak here?

func First(query string, replicas ...Search) Result {
c := make(chan Result)
searchReplica := func(i int) {
c <- replicas[i](query)
}
for i := range replicas {
go searchReplica(i)
}
return <-c
}
This function is from the slides of Rob Pike on go concurrency patterns in 2012. I think there is a resource leak in this function. As the function return after the first send & receive pair happens on channel c, the other go routines try to send on channel c. So there is a resource leak here. Anyone knows golang well can confirm this? And how can I detect this leak using what kind of golang tooling?
Yes, you are right (for reference, here's the link to the slide). In the above code only one launched goroutine will terminate, the rest will hang on attempting to send on channel c.
Detailing:
c is an unbuffered channel
there is only a single receive operation, in the return statement
A new goroutine is launched for each element of replicas
each launched goroutine sends a value on channel c
since there is only 1 receive from it, one goroutine will be able to send a value on it, the rest will block forever
Note that depending on the number of elements of replicas (which is len(replicas)):
if it's 0: First() would block forever (no one sends anything on c)
if it's 1: would work as expected
if it's > 1: then it leaks resources
The following modified version will not leak goroutines, by using a non-blocking send (with the help of select with default branch):
searchReplica := func(i int) {
select {
case c <- replicas[i](query):
default:
}
}
The first goroutine ready with the result will send it on channel c which will be received by the goroutine running First(), in the return statement. All other goroutines when they have the result will attempt to send on the channel, and "seeing" that it's not ready (send would block because nobody is ready to receive from it), the default branch will be chosen, and thus the goroutine will end normally.
Another way to fix it would be to use a buffered channel:
c := make(chan Result, len(replicas))
And this way the send operations would not block. And of course only one (the first sent) value will be received from the channel and returned.
Note that the solution with any of the above fixes would still block if len(replicas) is 0. To avoid that, First() should check this explicitly, e.g.:
func First(query string, replicas ...Search) Result {
if len(replicas) == 0 {
return Result{}
}
// ...rest of the code...
}
Some tools / resources to detect leaks:
https://github.com/fortytw2/leaktest
https://github.com/zimmski/go-leak
https://medium.com/golangspec/goroutine-leak-400063aef468
https://blog.minio.io/debugging-go-routine-leaks-a1220142d32c

What happens when returning on for-select loop with running goroutines

I'm trying to figure how can I shorten run times as much as possible when waiting for results on multiple goroutines. The idea is doing a for-select loop on retrieving messages from a channel (result channel) and breaking out of the loop when a result is false. Subsequently, potentially one or more goroutines are left running and I don't quite know what would happen in the background.
Consider this:
results := make(chan bool, int NumRequests)
go DoSomething(results) // DoSomething sends the result on results channel
go DoSomething(results) // DoSomething sends the result on results channel
go DoSomething(results) // DoSomething sends the result on results channel
for {
select {
case r := <- results:
if !r {
return
}
}
}
My question is - What would happen if I return while there are goroutines trying to sent their result to the channel? I've made a buffered results channel as seen above so the goroutines wouldn't deadlock while running. What would happen memory-wise when doing this? Will there be a goroutine leakage? What is the idiomatic way of doing something like this?
This is the exact purpose behind cancellable contexts. Take a look at the example for context.WithCancel, it shows how to do exactly what you describe.

Writing Sleep function based on time.After

EDIT: My question is different from How to write my own Sleep function using just time.After? It has a different variant of the code that's not working for a separate reason and I needed explanation as to why.
I'm trying to solve the homework problem here: https://www.golang-book.com/books/intro/10 (Write your own Sleep function using time.After).
Here's my attempt so far based on the examples discussed in that chapter:
package main
import (
"fmt"
"time"
)
func myOwnSleep(duration int) {
for {
select {
case <-time.After(time.Second * time.Duration(duration)):
fmt.Println("slept!")
default:
fmt.Println("Waiting")
}
}
}
func main() {
go myOwnSleep(3)
var input string
fmt.Scanln(&input)
}
http://play.golang.org/p/fb3i9KY3DD
My thought process is that the infinite for will keep executing the select statement's default until the time.After function's returned channel talks. Problem with the current code being, the latter does not happen, while the default statement is called infinitely.
What am I doing wrong?
In each iteration of your for loop the select statement is executed which involves evaluating the channel operands.
In each iteration time.After() will be called and a new channel will be created!
And if duration is more than 0, this channel is not ready to receive from, so the default case will be executed. This channel will not be tested/checked again, the next iteration creates a new channel which will again not be ready to receive from, so the default case is chosen again - as always.
The solution is really simple though as can be seen in this answer:
func Sleep(sec int) {
<-time.After(time.Second* time.Duration(sec))
}
Fixing your variant:
If you want to make your variant work, you have to create one channel only (using time.After()), store the returned channel value, and always check this channel. And if the channel "kicks in" (a value is received from it), you must return from your function because more values will not be received from it and so your loop will remain endless!
func myOwnSleep(duration int) {
ch := time.After(time.Second * time.Duration(duration))
for {
select {
case <-ch:
fmt.Println("slept!")
return // MUST RETURN, else endless loop!
default:
fmt.Println("Waiting")
}
}
}
Note that though until a value is received from the channel, this function will not "rest" and just execute code relentlessly - loading one CPU core. This might even give you trouble if only 1 CPU core is available (runtime.GOMAXPROCS()), other goroutines (including the one that will (or would) send the value on the channel) might get blocked and never executed. A sleep (e.g. time.Sleep(time.Millisecond)) could release the CPU core from doing endless work (and allow other goroutines to run).

Resources