Go Concurrency Circular Logic - go

I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel. If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is appended onto the stack slice in the dispatcher and the dispatcher will try to send again later when a worker becomes available, and/or no more messages are received on the dispatchchan. This is because the dispatchchan and the jobchan are unbuffered, and the go routine the workers are running will append other messages to the dispatcher up to a certain point and I don't want the workers blocked waiting on the dispatcher and creating a deadlock. Here's the dispatcher code I've come up with so far:
func dispatch() {
var stack []string
acount := 0
for {
select {
case d := <-dispatchchan:
stack = append(stack, d)
case c := <-mw:
acount = acount + c
case jobchan <-stack[0]:
if len(stack) > 1 {
stack[0] = stack[len(stack)-1]
stack = stack[:len(stack)-1]
} else {
stack = nil
}
default:
if acount == 0 && len(stack) == 0 {
close(jobchan)
close(dispatchchan)
close(mw)
wg.Done()
return
}
}
}
Complete example at https://play.golang.wiki/p/X6kXVNUn5N7
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool. If the worker routine is doing [m]eaningful [w]ork it throws int 1 on the mw channel and when it finishes its work and goes back into the for loop listening to the jobchan it throws int -1 on the mw. This way the dispatcher knows if there's any work being done by the worker pool, or if the pool is idle. If the pool is idle and there are no more messages on the stack, then the dispatcher closes the channels and return control to the main func.
This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error. What I'm trying to figure out is how to ensure that when I hit that case, either stack[0] has a value in it or not. I don't want that case to send an empty string to the jobchan.
Any help is greatly appreciated. If there's a more idomatic concurrency pattern I should consider, I'd love to hear about it. I'm not 100% sold on this solution but this is the farthest I've gotten so far.

This is all good but the issue I have is that the stack itself could be zero length so the case where I attempt to send stack[0] to the jobchan, if the stack is empty, I get an out of bounds error.
I can't reproduce it with your playground link, but it's believable, because at lest one gofunc worker might have been ready to receive on that channel.
My output has been Msgcnt: 0, which is also easily explained, because gofunc might not have been ready to receive on jobschan when dispatch() runs its select. The order of these operations is not defined.
trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel
A channel needs no dispatcher. A channel is the dispatcher.
If a message comes into my dispatch function via the dispatchchan channel and my other go routines are busy, the message is [...] will [...] send again later when a worker becomes available, [...] or no more messages are received on the dispatchchan.
With a few creative edits, it was easy to turn that into something close to the definition of a buffered channel. It can be read from immediately, or it can take up to some "limit" of messages that can't be immediately dispatched. You do define limit, though it's not used elsewhere within your code.
In any function, defining a variable you don't read will result in a compile time error like limit declared but not used. This stricture improves code quality and helps identify typeos. But at package scope, you've gotten away with defining the unused limit as a "global" and thus avoided a useful error - you haven't limited anything.
Don't use globals. Use passed parameters to define scope, because the definition of scope is tantamount to functional concurrency as expressed with the go keyword. Pass the relevant channels defined in local scope to functions defined at package scope so that you can easily track their relationships. And use directional channels to enforce the producer/consumer relationship between your functions. More on this later.
Going back to "limit", it makes sense to limit the quantity of jobs you're queueing because all resources are limited, and accepting more messages than you have any expectation of processing requires more durable storage than process memory provides. If you don't feel obligated to fulfill those requests no matter what, don't accept "too many" of them in the first place.
So then, what function has dispatchchan and dispatch()? To store a limited number of pending requests, if any, before they can be processed, and then to send them to the next available worker? That's exactly what a buffered channel is for.
Circular Logic
Who "knows" when your program is done? main() provides the initial input, but you close all 3 channels in `dispatch():
close(jobchan)
close(dispatchchan)
close(mw)
Your workers write to their own job queue so only when the workers are done writing to it can the incoming job queue be closed. However, individual workers also don't know when to close the jobs queue because other workers are writing to it. Nobody knows when your algorithm is done. There's your circular logic.
The mw channel is a buffered channel the same length as the number of worker go routines. It acts as a semaphore for the worker pool.
There's a race condition here. Consider the case where all n workers have just received the last n jobs. They've each read from jobschan and they're checking the value of ok. disptatcher proceeds to run its select. Nobody is writing to dispatchchan or reading from jobschan right now so the default case is immediately matched. len(stack) is 0 and there's no current job so dispatcher closes all channels including mw. At some point thereafter, a worker tries to write to a closed channel and panics.
So finally I'm ready to provide some code, but I have one more problem: I don't have a clear problem statement to write code around.
I'm just getting into concurrency in Go and trying to create a dispatch go routine that will send jobs to a worker pool listening on the jobchan channel.
Channels between goroutines are like the teeth that synchronize gears. But to what end do the gears turn? You're not trying to keep time, nor construct a wind-up toy. Your gears could be made to turn, but what would success look like? Their turning?
Let's try to define a more specific use case for channels: given an arbitrarily long set of durations coming in as strings on standard input*, sleep that many seconds in one of n workers. So that we actually have a result to return, we'll say each worker will return the start and end time the duration was run for.
So that it can run in the playground, I'll simulate standard input with a hard-coded byte buffer.
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strings"
"sync"
"time"
)
type SleepResult struct {
worker_id int
duration time.Duration
start time.Time
end time.Time
}
func main() {
var num_workers = 2
workchan := make(chan time.Duration)
resultschan := make(chan SleepResult)
var wg sync.WaitGroup
var resultswg sync.WaitGroup
resultswg.Add(1)
go results(&resultswg, resultschan)
for i := 0; i < num_workers; i++ {
wg.Add(1)
go worker(i, &wg, workchan, resultschan)
}
// playground doesn't have stdin
var input = bytes.NewBufferString(
strings.Join([]string{
"3ms",
"1 seconds",
"3600ms",
"300 ms",
"5s",
"0.05min"}, "\n") + "\n")
var scanner = bufio.NewScanner(input)
for scanner.Scan() {
text := scanner.Text()
if dur, err := time.ParseDuration(text); err != nil {
fmt.Fprintln(os.Stderr, "Invalid duration", text)
} else {
workchan <- dur
}
}
close(workchan) // we know when our inputs are done
wg.Wait() // and when our jobs are done
close(resultschan)
resultswg.Wait()
}
func results(wg *sync.WaitGroup, resultschan <-chan SleepResult) {
for res := range resultschan {
fmt.Printf("Worker %d: %s : %s => %s\n",
res.worker_id, res.duration,
res.start.Format(time.RFC3339Nano), res.end.Format(time.RFC3339Nano))
}
wg.Done()
}
func worker(id int, wg *sync.WaitGroup, jobchan <-chan time.Duration, resultschan chan<- SleepResult) {
var res = SleepResult{worker_id: id}
for dur := range jobchan {
res.duration = dur
res.start = time.Now()
time.Sleep(res.duration)
res.end = time.Now()
resultschan <- res
}
wg.Done()
}
Here I use 2 wait groups, one for the workers, one for the results. This makes sure Im done writing all the results before main() ends. I keep my functions simple by having each function do exactly one thing at a time: main reads inputs, parses durations from them, and sends them off to the next worker. The results function collects results and prints them to standard output. The worker does the sleeping, reading from jobchan and writing to resultschan.
workchan can be buffered (or not, as in this case); it doesn't matter because the input will be read at the rate it can be processed. We can buffer as much input as we want, but we can't buffer an infinite amount. I've set channel sizes as big as 1e6 - but a million is a lot less than infinite. For my use case, I don't need to do any buffering at all.
main knows when the input is done and can close the jobschan. main also knows when jobs are done (wg.Wait()) and can close the results channel. Closing these channels is an important signal to the worker and results goroutines - they can distinguish between a channel that is empty and a channel that is guaranteed not to have any new additions.
for job := range jobchan {...} is shorthand for your more verbose:
for {
job, ok := <- jobchan
if !ok {
wg.Done()
return
}
...
}
Note that this code creates 2 workers, but it could create 20 or 2000, or even 1. The program functions regardless of how many workers are in the pool. It can handle any volume of input (though interminable input of course leads to an interminable program). It does not create a cyclic loop of output to input. If your use case requires jobs to create more jobs, that's a more challenging scenario that can typically be avoided with careful planning.
I hope this gives you some good ideas about how you can better use concurrency in your Go applications.
https://play.golang.wiki/p/cZuI9YXypxI

Related

How to create channels in loop?

I am learning concurrency in go and how it works.
What I am trying to do ?
Loop through slice of data
Create struct for required/needed data
Create channel for that struct
Call worker func using go rutine and pass that channel to that rutine
Using data from channel do some processing
Set the processed output back into channel
Wait in main thread to get output from all the channels which we kicked off
Code Which I tried
package main
import (
"fmt"
"github.com/pkg/errors"
"time"
)
type subject struct {
Name string
Class string
StartDate time.Time
EndDate time.Time
}
type workerData struct {
Subject string
Class string
Result string
Error error
}
func main () {
// Creating test data
var subjects []subject
st,_ := time.Parse("01/02/2016","01/01/2015")
et,_ := time.Parse("01/02/2016","01/01/2016")
s1 := subject{Name:"Math", Class:"3", StartDate:st,EndDate:et }
s2 := subject{Name:"Geo", Class:"3", StartDate:st,EndDate:et }
s3 := subject{Name:"Bio", Class:"3", StartDate:st,EndDate:et }
s4 := subject{Name:"Phy", Class:"3", StartDate:st,EndDate:et }
s5 := subject{Name:"Art", Class:"3", StartDate:st,EndDate:et }
subjects = append(subjects, s1)
subjects = append(subjects, s2)
subjects = append(subjects, s3)
subjects = append(subjects, s4)
subjects = append(subjects, s5)
c := make(chan workerData) // I am sure this is not how I should be creating channel
for i := 0 ; i< len(subjects) ; i++ {
go worker(c)
}
for _, v := range subjects {
// Setting required data in channel
data := workerData{Subject:v.Name, Class:v.Class}
// set the data and start the routine
c <- data // I think this will update data for all the routines ? SO how should create separate channel for each routine
}
// I want to wait till all the routines set the data in channel and return the data from workers.
for {
select {
case data := <- c :
fmt.Println(data)
}
}
}
func worker (c chan workerData) {
data := <- c
// This can be any processing
time.Sleep(100 * time.Millisecond)
if data.Subject != "Math" {
data.Result = "Pass"
} else {
data.Error = errors.New("Subject not found")
}
fmt.Println(data.Subject)
// returning processed data and error to channel
c <- data
// Rightfully this closes channel and here after I get error send on Closed channel.
close(c)
}
Playgorund Link - https://play.golang.org/p/hs1-B1UR98r
Issue I am Facing
I am not sure how to create different channel for each data item. The way I am currently doing will update the channel data for all routines. I want to know is there way to create diffrent channel for each data item in loop and pass that to the go rutine. And then wait in main rutine to get the result back from rutines from all channels.
Any pointers/ help would be great ? If any confusion feel free to comment.
"// I think this will update data for all the routines ?"
A channel (to simplify) is not a data structure to store data.
It is a structure to send and receive data over different goroutines.
As such, notice that your worker function is doing send and receive on the same channel within each goroutine instances. If you were having only one instance of such worker, this would deadlock (https://golang.org/doc/articles/race_detector.html).
In the version of the code you posted, for a beginner this might seem to work because you have many workers exchanging works to each other. But it is wrong for a correct program.
As a consequence, if a worker can not read and write the same channel, then it must consume a specific writable channel to send its results to some other routines.
// I want to wait till all the routines set the data in channel and
return the data from workers.
This is part of the synchronization mechanisms required to ensure that a pusher waits until all its workers has finished their job before proceeding further. (this blog post talks about it https://medium.com/golangspec/synchronized-goroutines-part-i-4fbcdd64a4ec)
// Rightfully this closes channel and here after I get error send on
Closed channel.
Take care that you have n routines of workers executing in parallel. The first of this worker to reach the end of its function will close the channel, making it unwritable to other workers, and false signaling its end to main.
Normally one use the close statement on the writer side to indicate that there is no more data into the channel. To indicate it has ended. This signal is consumed by readers to quit their read-wait operation of the channel.
As an example, lets review this loop
for {
select {
case data := <- c :
fmt.Println(data)
}
}
it is bad, really bad.
It is an infinite loop with no exit statement
The select is superfluous and does not contain exit statement, remember that a read on a channel is a blocking operation.
It is a bad rewrite of a standard pattern provided by the language, the range loop over a channel
The range loop over a channel is very simply written
for data := range c {
fmt.Println(data)
}
This pattern has one great advantage, it automatically detect a closed channel to exit the loop! letting you loop over only the relevant data to process. It is also much more succint.
Also, your worker is a awkward in that it read and write only one element before quitting.
Spawning go routines is cheap, but not free. You should always evaluate the trade-off between the costs of async processing and its actual workload.
Overall, your code should be closer to what is demonstrated here
https://gobyexample.com/worker-pools

Computing mod inverse

I want to compute the inverse element of a prime in modular arithmetic.
In order to speed things up I start a few goroutines which try to find the element in a certain range. When the first one finds the element, it sends it to the main goroutine and at this point I want to terminate the program. So I call close in the main goroutine, but I don't know if the goroutines will finish their execution (I guess not). So a few questions arise:
1) Is this a bad style, should I have something like a WaitGroup?
2) Is there a more idiomatic way to do this computation?
package main
import "fmt"
const (
Procs = 8
P = 1000099
Base = 1<<31 - 1
)
func compute(start, end uint64, finished chan struct{}, output chan uint64) {
for i := start; i < end; i++ {
select {
case <-finished:
return
default:
break
}
if i*P%Base == 1 {
output <- i
}
}
}
func main() {
finished := make(chan struct{})
output := make(chan uint64)
for i := uint64(0); i < Procs; i++ {
start := i * (Base / Procs)
end := (i + 1) * (Base / Procs)
go compute(start, end, finished, output)
}
fmt.Println(<-output)
close(finished)
}
Is there a more idiomatic way to do this computation?
You don't actually need a loop to compute this.
If you use the GCD function (part of the standard library), you get returned numbers x and y such that:
x*P+y*Base=1
this means that x is the answer you want (because x*P = 1 modulo Base):
package main
import (
"fmt"
"math/big"
)
const (
P = 1000099
Base = 1<<31 - 1
)
func main() {
bigP := big.NewInt(P)
bigBase := big.NewInt(Base)
// Compute inverse of bigP modulo bigBase
bigGcd := big.NewInt(0)
bigX := big.NewInt(0)
bigGcd.GCD(bigX,nil,bigP,bigBase)
// x*bigP+y*bigBase=1
// => x*bigP = 1 modulo bigBase
fmt.Println(bigX)
}
Is this a bad style, should I have something like a WaitGroup?
A wait group solves a different problem.
In general, to be a responsible go citizen here and ensure your code runs and tidies up behind itself, you may need to do a combination of:
Signal to the spawned goroutines to stop their calculations when the result of the computation has been found elsewhere.
Ensure a synchronous process waits for the goroutines to stop before returning. This is not mandatory if they properly respond to the signal in #1, but if you don't wait, there will be no guarantee they have terminated before the parent goroutine continues.
In your example program, which performs this task and then quits, there is strictly no need to do either. As this comment indicates, your program's main method terminates upon a satisfactory answer being found, at which point the program will end, any goroutines will be summarily terminated, and the operating system will tidy up any consumed resources. Waiting for goroutines to stop is unnecessary.
However, if you wrapped this code up into a library or it became part of a long running "inverse prime calculation" service, it would be desirable to tidy up the goroutines you spawned to avoid wasting cycles unnecessarily. Additionally, in general, you may have other scenarios in which goroutines store state, hold handles to external resources, or hold handles to internal objects which you risk leaking if not properly tidied away – it is desirable to properly close these.
Communicating the requirement to stop working
There are several approaches to communicate this. I don't claim this is an exhaustive list! (Please do suggest other general-purpose methods in the comments or by proposing edits to the post.)
Using a special channel
Signal the child goroutines by closing a special "shutdown" channel reserved for the purpose. This exploits the channel axiom:
A receive from a closed channel returns the zero value immediately
On receiving from the shutdown channel, the goroutine should immediately arrange to tidy any local state and return from the function. Your earlier question had example code which implemented this; a version of the pattern is:
func myGoRoutine(shutdownChan <-chan struct{}) {
select {
case <-shutdownChan:
// tidy up behaviour goes here
return
// You may choose to listen on other channels here to implement
// the primary behaviour of the goroutine.
}
}
func main() {
shutdownChan := make(chan struct{})
go myGoRoutine(shutdownChan)
// some time later
close(shutdownChan)
}
In this instance, the shutdown logic is wasted because the main() method will immediately return after the call to close. This will race with the shutdown of the goroutine, but we should assume it will not properly execute its tidy-up behaviour. Point 2 addresses ways to fix this.
Using a context
The context package provides the option to create a context which can be cancelled. On cancellation, a channel exposed by the context's Done() method will be closed, which signals time to return from the goroutine.
This approach is approximately the same as the previous method, with the exception of neater encapsulation and the availability of a context to pass to downstream calls in your goroutine to cancel nested calls where desired. Example:
func myGoRoutine(ctx context.Context) {
select {
case <-ctx.Done():
// tidy up behaviour goes here
return
// Put real behaviour for the goroutine here.
}
}
func main() {
// Get a context (or use an existing one if you are provided with one
// outside a `main` method:
ctx := context.Background()
// Create a derived context with a cancellation method
ctx, cancel := context.WithCancel(ctx)
go myGoRoutine(ctx)
// Later, when ready to quit
cancel()
}
This has the same bug as the other case in that the main method will not wait for the child goroutines to quit before returning.
Waiting (or "join"ing) for child goroutines to stop
The code which closes the shutdown channel or closes the context in the above examples will not wait for child goroutines to stop working before continuing. This may be acceptable in some instances, while in others you may require the guarantee that goroutines have stopped before continuing.
sync.WaitGroup can be used to implement this requirement. The documentation is comprehensive. A wait group is a counter which should be incremented using its Add method on starting a goroutine and decremented using its Done method when a goroutine completes. Code can wait for the counter to return to zero by calling its Wait method, which blocks until the condition is true. All calls to Add must occur before a call to Wait.
Example code:
func main() {
var wg sync.WaitGroup
// Increment the WaitGroup with the number of goroutines we're
// spawning.
wg.Add(1)
// It is common to wrap a goroutine in a function which performs
// the decrement on the WaitGroup once the called function returns
// to avoid passing references of this control logic to the
// downstream consumer.
go func() {
// TODO: implement a method to communicate shutdown.
callMyFunction()
wg.Done()
}()
// Indicate shutdown, e.g. by closing a channel or cancelling a
// context.
// Wait for goroutines to stop
wg.Wait()
}
Is there a more idiomatic way to do this computation?
This algorithm is certainly parallelizable through use of goroutines in the manner you have defined. As the work is CPU-bound, the limitation of goroutines to the number of available CPUs makes sense (in the absence of other work on the machine) to benefit from the available compute resource.
See peterSO's answer for a bug fix.

Distribute the same keyword to multiple goroutines

I have something like this mock (code below) which distributes the same keyword out to multiple goroutines, except the goroutines all take different amount of times doing things with the keyword but can operate independently of each other so they don't need any synchronization. The solution given below to distribute clearly synchronizes the goroutines.
I just want to toss this idea out there to see how other people would deal with this type of distribution, as I assume it is fairly common and someone else has thought about it before.
Here are some other solutions I have thought up and why they seem kinda meh to me:
One goroutine for each keyword
Each time a new keyword comes in spawn a goroutine to handle the distribution
Give the keyword a bitmask or something for each goroutine to update
This way once all of the workers have touched the keyword it can be deleted and we can move on
Give each worker its own stack to work off of
This seems kinda appealing, just give each worker a stack to add each keyword to, but we would eventually run into a problem of a ton of memory being taken up since it is planned to run so long
The problem with all of these is that my code is supposed to run for a long time, unwatched, and that would lead to either a huge build up of keywords or goroutines due to the lazy worker taking longer than the others. It almost seems like it'd be nice to give each worker its own Amazon SQS queue or implement something similar to that myself.
EDIT:
Store the keyword outside the program
I just thought of doing it this way instead, I could perhaps just store the keyword outside the program until they all grab it and then delete it once it has been used up. This sits ok with me actually, I don't have a problem with using up disk space
Anyway here is an example of the approach that waits for all to finish:
package main
import (
"flag"
"fmt"
"math/rand"
"os"
"os/signal"
"strconv"
"time"
)
var (
shutdown chan struct{}
count = flag.Int("count", 5, "number to run")
)
type sleepingWorker struct {
name string
sleep time.Duration
ch chan int
}
func NewQuicky(n string) sleepingWorker {
var rq sleepingWorker
rq.name = n
rq.ch = make(chan int)
rq.sleep = time.Duration(rand.Intn(5)) * time.Second
return rq
}
func (r sleepingWorker) Work() {
for {
fmt.Println(r.name, "is about to sleep, number:", <-r.ch)
time.Sleep(r.sleep)
}
}
func NewLazy() sleepingWorker {
var rq sleepingWorker
rq.name = "Lazy slow worker"
rq.ch = make(chan int)
rq.sleep = 20 * time.Second
return rq
}
func distribute(gen chan int, workers ...sleepingWorker) {
for kw := range gen {
for _, w := range workers {
fmt.Println("sending keyword to:", w.name)
select {
case <-shutdown:
return
case w.ch <- kw:
fmt.Println("keyword sent to:", w.name)
}
}
}
}
func main() {
flag.Parse()
shutdown = make(chan struct{})
go func() {
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
<-c
close(shutdown)
}()
x := make([]sleepingWorker, *count)
for i := 0; i < (*count)-1; i++ {
x[i] = NewQuicky(strconv.Itoa(i))
go x[i].Work()
}
x[(*count)-1] = NewLazy()
go x[(*count)-1].Work()
gen := make(chan int)
go distribute(gen, x...)
go func() {
i := 0
for {
i++
select {
case <-shutdown:
return
case gen <- i:
}
}
}()
<-shutdown
os.Exit(0)
}
Let's assume I understand the problem correctly:
There's not too much you can do about it I'm afraid. You have limited resources (assuming all resources are limited) so if data to your input is written faster then you process it, there will be some synchronisation needed. At the end the whole process will run as quickly as the slowest worker anyway.
If you really need data from the workers available as soon as possible, the best you can do is to add some kind of buffering. But the buffer must be limited in size (even if you run in the cloud it would be limited by your wallet) so assuming never ending torrent of input it will only postpone the choke until some time in the future where you will start seeing "synchronisation" again.
All the ideas you presented in your questions are based on buffering the data. Even if you run a routine for every keyword-worker pair, this will buffer one element in every routine and, unless you implement the limit on total number of routines, you'll run out of memory. And even if you always leave some room for the quickest worker to spawn a new routine, the input queue won't be able to deliver new items as it would be choked on the slowest worker.
Buffering would solve your problem if on average you input is slower than processing time, but you have occasional spikes. If your buffer is big enough you can than accommodate the increase of throughput and maybe your quickest worker won't notice a thing.
Solution?
As go comes with buffered channels, this is the easiest to implement (also suggested by icza in the comment). Just give each worker a buffer. If you know which worker is the slowest, you can give it a bigger buffer. In this scenario you're limited by the memory of your machine.
If you're not happy with the single-machine memory limit then yes, per one of your ideas, you can "simply" store the buffer (queue) for each worker on the hard drive. But this is also limited and just postpones the blocking scenario until later. This is essentially the same as your Amazon SQS proposal (you could keep buffer in the cloud, but you need either limit it reasonably or prepare for the bill.)
The final note, depending on the system you're building, it might be not a good idea to buffer items in such a massive scale allowing to build up the backlog for the slower workers – it's often not desirable to have a worker hours, days, weeks behind the input flow and this is what would happen with an infinite buffer. The real answer then would be: improve your slowest worker to process things faster. (And add some buffering to improve the experience.)

How to block all goroutines except the one running

I have two (but later I'll be three) go routines that are handling incoming messages from a remote server (from a ampq channel). But because they are handling on the same data/state, I want to block all other go routines, except the one running.
I come up with a solution to use chan bool where each go routine blocks and then release it, the code is like:
package main
func a(deliveries <-chan amqp, handleDone chan bool) {
for d := range deliveries {
<-handleDone // Data comes always, wait for other channels
handleDone <- false // Block other channels
// Do stuff with data...
handleDone <- true // I'm done, other channels are free to do anything
}
}
func b(deliveries <-chan amqp, handleDone chan bool) {
for d := range deliveries {
<-handleDone
handleDone <- false
// Do stuff with data...
handleDone <- true
}
}
func main() {
handleDone := make(chan bool, 1)
go a(arg1, handleDone)
go b(arg2, handleDone)
// go c(arg3, handleDone) , later
handleDone <- true // kickstart
}
But for the first time each of the function will get handleDone <- true, which they will be executed. And later if I add another third function, things will get more complicated. How can block all other go routines except the running? Any other better solutions?
You want to look at the sync package.
http://golang.org/pkg/sync/
You would do this with a mutex.
If you have an incoming stream of messages and you have three goroutines listening on that stream and processing and you want to ensure that only one goroutine is running at a time, the solution is quite simple: kill off two of the goroutines.
You're spinning up concurrency and adding complexity and then trying to prevent them from running concurrently. The end result is the same as a single stream reader, but with lots of things that can go wrong.
I'm puzzled why you want this - why can't each message on deliveries be handled independently? and why are there two different functions handling those message? If each is responsible for a particular type of message, it seems like you want one deliveries receiver that dispatches to appropriate logic for the type.
But to answer your question, I don't think it's true that each function will get a true from handleDone on start. One (let's say it's a) is receiving the true sent from main; the other (b then) is getting the false sent from the first. Because you're discarding the value received, you can't tell this. And then both are running, and you're using a buffered channel (you probably want make(chan bool) instead for an unbuffered one), so confusion ensues, particularly when you add that third goroutine.
The handleDone <- false doesn't actually accomplish anything. Just treat any value on handleDone as the baton in a relay race. Once a goroutine receives this value, it can do its thing; when it's done, it should send it to the channel to hand it to the next goroutine.

More idiomatic way of adding channel result to queue on completion

So, right now, I just pass a pointer to a Queue object (implementation doesn't really matter) and call queue.add(result) at the end of goroutines that should add things to the queue.
I need that same sort of functionality—and of course doing a loop checking completion with the comma ok syntax is unacceptable in terms of performance versus the simple queue add function call.
Is there a way to do this better, or not?
There are actually two parts to your question: how does one queue data in Go, and how does one use a channel without blocking.
For the first part, it sounds like what you need to do is instead of using the channel to add things to the queue, use the channel as a queue. For example:
var (
ch = make(chan int) // You can add an int parameter to this make call to create a buffered channel
// Do not buffer these channels!
gFinished = make(chan bool)
processFinished = make(chan bool)
)
func f() {
go g()
for {
// send values over ch here...
}
<-gFinished
close(ch)
}
func g() {
// create more expensive objects...
gFinished <- true
}
func processObjects() {
for val := range ch {
// Process each val here
}
processFinished <- true
}
func main() {
go processObjects()
f()
<-processFinished
}
As for how you can make this more asynchronous, you can (as cthom06 pointed out) pass a second integer to the make call in the second line which will make send operations asynchronous until the channel's buffer is full.
EDIT: However (as cthom06 also pointed out), because you have two goroutines writing to the channel, one of them has to be responsible for closing the channel. Also, my previous revision would exit before processObjects could complete. The way I chose to synchronize the goroutines is by creating a couple more channels that pass around dummy values to ensure that the cleanup gets finished properly. Those channels are specifically unbuffered so that the sends happen in lock-step.

Resources