Should I use sync or blocking channels? - go

I have several goroutines and I use unbuffered channels as a sync mechanism.
I'm wondering if there is anything wrong with this (e.g. compared with a WaitGroup implementation). A known "drawback" I'm aware of is that two goroutines may stay blocked until the third (last) one completes, because the channels are not buffered, but I don't know the internals of what this really means.
func main() {
    chan1, chan2, chan3 := make(chan bool), make(chan bool), make(chan bool)
    go fn(chan1)
    go fn(chan2)
    go fn(chan3)
    res1, res2, res3 := <-chan1, <-chan2, <-chan3
}

This implementation isn't inherently better or worse; I've written code in this style instead of using a WaitGroup with success, and I've also implemented the same functionality with a WaitGroup and had roughly the same results. As mentioned in the comments, which is better is situational, and likely subjective in many cases (the difference will be in maintainability or readability rather than performance).
Personally, I really like this style when I'm spinning off one worker per item in a collection. I was already running into a lot of pain shutting down cleanly (signaling on an abort or close channel that had to be made available to the worker methods one way or another), so it felt very convenient to take that communication one step further and enclose all the work in a channel select. To have a working abort you more or less have to anyway.
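For comparison, here is a minimal sketch of the WaitGroup variant mentioned above; fn is a made-up stand-in for the question's worker, reworked to take an id instead of a channel:

package main

import (
    "fmt"
    "sync"
)

// fn stands in for the worker from the question, minus the signalling channel.
func fn(id int) {
    fmt.Println("worker", id, "done")
}

func main() {
    var wg sync.WaitGroup
    wg.Add(3) // register all three workers before any of them can finish
    for i := 1; i <= 3; i++ {
        go func(id int) {
            defer wg.Done()
            fn(id)
        }(i)
    }
    wg.Wait() // blocks until every worker has called Done
}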

Channels and WaitGroups are both available for you to use as appropriate. A given problem could be solved with channels, with a WaitGroup, or with a combination of both.
From what I understand, channels are more appropriate when your goroutines need to communicate with each other (i.e. they are not independent). A WaitGroup is usually used when your goroutines are independent of each other.
Personally, I like WaitGroup because it is more readable and simpler to use. However, just as with channels, I don't like having to pass the WaitGroup reference into the goroutines, because that mixes the concurrency logic with the business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
    var waitGroup sync.WaitGroup
    waitGroup.Add(len(functions))
    defer waitGroup.Wait()
    for _, function := range functions {
        go func(copy func()) {
            defer waitGroup.Done()
            copy()
        }(function)
    }
}
Here is an example:
func1 := func() {
    for char := 'a'; char < 'a'+3; char++ {
        fmt.Printf("%c ", char)
    }
}
func2 := func() {
    for number := 1; number < 4; number++ {
        fmt.Printf("%d ", number)
    }
}
Parallelize(func1, func2) // a 1 b 2 c 3 (the interleaving may differ between runs)
If you would like to use it, you can find it here https://github.com/shomali11/util

Related

Recursive calls from function started as goroutine & Idiomatic way to continue caller when all worker goroutines finished

I am implementing a (sort of a) combinatorial backtracking algorithm in go utilising goroutines. My problem can be represented as a tree with a certain degree/spread where I want to visit each leaf and calculate a result depending on the path taken. On a given level, I want to spawn goroutines to process the subproblems concurrently, i.e. if I have a tree with degree 3 and I want to start the concurrency after level 2, I'd spawn 3*3=9 goroutines that proceed with processing the subproblems concurrently.
func main() {
    cRes := make(chan string, 100)
    res := []string{}
    numLevels := 5
    spread := 3
    startConcurrencyAtLevel := 2
    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes)
    for {
        select {
        case r := <-cRes:
            res = append(res, r)
        case <-time.After(10 * time.Second):
            fmt.Println("Calculation timed out")
            fmt.Println(len(res), math.Pow(float64(spread), float64(numLevels)))
            return
        }
    }
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path, also using a lookup table
        // real problem actually returns not the path but the result if it satisfies some condition
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            go nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        }
    }
}
The above code works, but it relies on the for/select statement timing out. I am looking for a way to continue with main() as soon as all goroutines have finished, i.e. all subproblems have been processed.
I already came up with two possible (though inelegant) solutions:
Using a mutex-protected result map plus a WaitGroup instead of a channel-based approach should do the trick, but I'm curious whether there is a neat solution with channels.
Using a quit channel (of type int): every time a goroutine is spawned, the quit channel gets a +1; every time a computation finishes in a leaf, it gets a -1; and the caller sums up the values. See the following snippet. This, however, is not a good solution, as it (rather blatantly) runs into timing issues I don't want to deal with: it quits prematurely if, for instance, the first goroutine finishes before another one has been spawned.
for {
    select {
    case q := <-cRunningRoutines:
        runningRoutines += q
        if runningRoutines == 0 {
            fmt.Println("Calculation complete")
            return res
        }
    // ...same cases as above
    }
}
Playground: https://go.dev/play/p/9jzeCvl8Clj
Following questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just can't wrap my head around how to integrate it in this case.
Happy about any ideas, thanks!
Reading the description and the snippet I am not able to understand exactly what you are trying to achieve, but I have some hints and patterns for channels that I use daily and think are helpful.
The context package is very helpful for managing goroutine state in a safe way. In your example, time.After is used to end the main program, but in non-main functions it could be leaking goroutines: if instead you use context.Context and pass it into the goroutines (it is usually the first argument of a function), you will be able to control cancellation of downstream calls. This explains it briefly.
It is common practice for the function that produces messages to create the channel, send into it, and return it. The same function should be responsible for closing the channel, e.g. with defer close(channel) when it is done writing (see the sketch after these hints).
This is handy because a buffered channel can be closed even while it still has data in it: receivers can keep draining the remaining values after the close, and a range over the channel only stops once the buffer is empty. With an unbuffered channel, the function won't be able to send a message until a reader of the channel is ready to receive it, so it won't be able to exit until then.
This is an example (without recursion).
We can close the channel in this example whether it is buffered or unbuffered, because each send blocks until the for ... range over the channel in the main goroutine reads it.
This is a variant of the same principle, with the channel passed as an argument.
We can use sync.WaitGroup in tandem with channels to signal completion of individual goroutines and to let an "orchestrating" goroutine know that the channel can be closed, because all message producers are done sending data into it. The same considerations as above apply to the close operation.
This is an example showing the use of a WaitGroup and an external closer of the channel.
Channels can have a direction! Notice that in the example I added arrows next to the channel types (e.g. <-chan string, or chan<- string) when passing them into functions. This tells the compiler that the channel is receive-only or send-only, respectively, in the scope of that function.
This helps in two ways:
the compiler will reject misuse, such as sending on (or closing) a channel the function is only supposed to receive from;
the signature of the function documents whether it only writes to the channel (and possibly close()s it) or only reads from it; remember that reading from a channel with range automatically stops the iteration when the channel is closed.
You can build channels of channels: make(chan chan string) is a valid (and helpful) construct for building processing pipelines.
A common usage is a fan-in goroutine that collects the outputs of a series of channel-producing goroutines.
This is an example of how to use them.
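To tie the last few hints together, here is a minimal sketch (the produce and fanIn names are invented for this example): each producer owns and closes its own channel, the signatures carry directions, and a WaitGroup lets an orchestrating goroutine close the fan-in channel once all producers are done.

package main

import (
    "fmt"
    "sync"
)

// produce owns its channel: it creates it, sends on it, and closes it when done.
func produce(prefix string, n int) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out)
        for i := 0; i < n; i++ {
            out <- fmt.Sprintf("%s-%d", prefix, i)
        }
    }()
    return out
}

// fanIn copies every input channel into one output channel and closes it
// once all inputs are drained.
func fanIn(inputs ...<-chan string) <-chan string {
    out := make(chan string)
    var wg sync.WaitGroup
    wg.Add(len(inputs))
    for _, in := range inputs {
        go func(in <-chan string) {
            defer wg.Done()
            for v := range in {
                out <- v
            }
        }(in)
    }
    go func() {
        wg.Wait() // all producers are done sending
        close(out)
    }()
    return out
}

func main() {
    for v := range fanIn(produce("a", 3), produce("b", 3)) {
        fmt.Println(v)
    }
}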
In essence, to answer your initial questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
If you really need recursion, it's probably better to handle it separately from the concurrent code: create a dedicated function that recursively sends data into a channel, and orchestrate the closing of the channel in the caller.
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just cant wrap my head around how to integrate it in this case.
A good reference is Go Concurrency Patterns: Pipelines and cancellation: this is a rather old post (from before the context package existed in the standard library) and I think Parallel digestion is what you're looking for to address the original question.
As mentioned by torek, I spun off an anonymous function that closes the channel after the WaitGroup finishes waiting. I also needed some logic so that wg.Done() for a spawned goroutine is only called after the recursion below the goroutine-spawning level returns.
Generally I think this is a useful idiom (correct me if I'm wrong :))
Playground: https://go.dev/play/p/bQjHENsZL25
func main() {
    cRes := make(chan string, 100)
    numLevels := 3
    spread := 3
    startConcurrencyAtLevel := 2
    var wg sync.WaitGroup
    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes, &wg)
    go func() {
        // time.Sleep(1 * time.Second) // edit: code should work without this initial sleep
        wg.Wait()
        close(cRes)
    }()
    for r := range cRes {
        fmt.Println(r)
    }
    fmt.Println("Done!")
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string, wg *sync.WaitGroup) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            wg.Add(1) // track the spawned goroutine before it starts
            go func(p string) {
                defer wg.Done() // signal completion only after the whole subtree returns
                nTree(p, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
            }(nextPath)
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
        }
    }
}

Which type of UML diagram is suited for depicting goroutines collaborating via channel?

Let's assume there is a simple integer calculator that only supports addition and multiplication. It receives an integer generator and an integer to use as the addend or multiplier, and applies the corresponding calculation to each element that comes from the generator.
I think the following rough sequence diagram depicts this logic appropriately.
But when I use goroutines and channels to implement the same logic, the straightforward method/function-call relationships disappear, because the goroutines use channels to send and receive data.
generator := func(integers ...int) <-chan int {
    intStream := make(chan int)
    go func() {
        defer close(intStream)
        for _, i := range integers {
            intStream <- i
        }
    }()
    return intStream
}
multiply := func(intStream <-chan int, multiplier int) <-chan int {
    multipliedStream := make(chan int)
    go func() {
        defer close(multipliedStream)
        for i := range intStream {
            multipliedStream <- i * multiplier
        }
    }()
    return multipliedStream
}
add := func(intStream <-chan int, additive int) <-chan int {
    addedStream := make(chan int)
    go func() {
        defer close(addedStream)
        for i := range intStream {
            addedStream <- i + additive
        }
    }()
    return addedStream
}
intStream := generator(1, 2, 3, 4)
pipeline := multiply(add(intStream, 1), 2)
for v := range pipeline {
    fmt.Println(v)
}
The goroutine born in the generator acts as a producer to send integers; the goroutines born in the add and multiply are both producers and consumers; they receive integers, handle them, and put them into new channels. Finally, two channels connect these 3 goroutines as a pipeline, but I have no idea to present it to be clear at a glance.
Is there a kind of goroutines-oriented UML diagram?
There is no one-size-fits-all in this domain. It all depends on where you want to set the focus in your design:
if you want to insist on the fact that a goroutine is a lightweight thread, and on the channel (buffered or not) that is consumed, you may be interested in activity diagrams. Activity diagrams are also suitable for highlighting the flow of values (i.e. object flows) in a functional design.
if you want to show how objects (including functors) interact in a specific scenario, keep the sequence diagram, but complete it to show what actually happens: you need at least some messages between the consumers and the generator (these correspond to the exchanges via the channel; arrows are not only function calls, they are messages that can correspond to a function call but also to other forms of communication). If the channel is very important, you may even consider adding a lifeline for it: this would address most of your expressed concerns.
Not related: using UML diagrams to visually document low-level code, or to do some kind of visual programming, is perfectly valid, but it tends to produce very complex diagrams that are harder to read than the code, so it may not be the best use of UML.

Select on blocked call and channel

I am pretty sure I've seen a question on this before but can't find it now.
Essentially I want to select on a blocked call as well as a channel.
I know I can push the blocked call into a goroutine and wait on the result via a channel, however that feels like the wrong solution.
Is there an idiomatic way to write this that I'm missing?
Optimally there would be something like:
select {
case a <- c:
    ...
case ans := connection.Read():
    ...
}
If you have a channel and a function call you want to select over, using a goroutine and a channel is the idiomatic solution. Note, though, that receiving a value from the channel will not affect the function; it will continue to run. You may use context.Context to signal that its result is no longer needed so it can terminate early.
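As a rough sketch of that idea, blockingCall below is a made-up stand-in for the blocked call, and it is assumed to honour context cancellation (which the real connection.Read() may not):

package main

import (
    "context"
    "fmt"
    "time"
)

// blockingCall stands in for the blocked call; it is assumed to stop early
// when the context is cancelled.
func blockingCall(ctx context.Context) (string, error) {
    select {
    case <-time.After(2 * time.Second):
        return "result", nil
    case <-ctx.Done():
        return "", ctx.Err()
    }
}

func main() {
    c := make(chan int, 1)
    c <- 42 // simulate the other channel being ready first

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel() // tells blockingCall its result is no longer needed once we return

    res := make(chan string, 1) // buffered so the goroutine never blocks on send
    go func() {
        if r, err := blockingCall(ctx); err == nil {
            res <- r
        }
    }()

    select {
    case a := <-c:
        fmt.Println("received from channel:", a)
    case r := <-res:
        fmt.Println("call finished:", r)
    }
}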
If you're allowed to refactor, though, you can "make" the function send on the same channel, so you only need to receive from a single channel.
Another refactoring idea is for the function to monitor the same channel and return early, so you can just make a single call without a select.
Note that if you need to do this in many places, you may create a helper function to launch it asynchronously:
func launch(f func()) <-chan struct{} {
    done := make(chan struct{})
    go func() {
        defer close(done)
        f()
    }()
    return done
}
Example function:
func test() {
    time.Sleep(time.Second)
}
And then using it:
select {
case a := <-c:
    fmt.Println("received from channel:", a)
case <-launch(test):
    fmt.Println("test() finished")
}
Try it on the Go Playground.

Proper way to gain access to a channel length in Go

I have been using Go for a little while and I'm still getting better every day, but I'm not an expert per se. Currently I am tackling concurrency and goroutines, as I think that is the final unknown in my Go tool belt. I think I am getting the hang of it, but I'm still definitely a beginner.
The task I am having an issue with seems pretty basic to me, but nothing I have tried works. I would like to figure out a way to calculate the length of a channel.
From what I have gathered, len() only works on buffered channels, so that won't help me in this case. What I am doing is reading values from the DB in batches. I have a generator func that goes like:
func gen() chan Result {
    out := make(chan Result)
    go func() {
        // ... query db
        for rows.Next() {
            out <- row
        }
        close(out)
    }()
    return out
}
then I am using it as such
c := gen()
...
// do other stuff
I would either like to return the count with the out channel, or wrap all of it in a struct type and just return that.
like so:
c, len := gen()
or:
a := gen()
fmt.Println(a.c)
fmt.Println(a.len)
I believe I have tried everything except atomic, which I think would actually work, but from what I've read it apparently isn't the right thing to use atomic for. What other options do I have that either don't leave me with a 0 or block forever?
Thanks!
The len built-in will return the "length" of a channel:
func len(v Type) int
The len built-in function returns the length of v, according to its type:
Array: the number of elements in v.
Pointer to array: the number of elements in *v (even if v is nil).
Slice, or map: the number of elements in v; if v is nil, len(v) is zero.
String: the number of bytes in v.
Channel: the number of elements queued (unread) in the channel buffer;
if v is nil, len(v) is zero.
But I don't think that will help you.
What you really need is a new approach to your problem: counting the items in queue in a channel is not an appropriate way to handle "batches" of tasks.
What do you need this length for?
You are using unbuffered channels. Thank you for that 👍👍👌🙌
An unbuffered channel has no buffer, so it never holds anything!
The only purpose of unbuffered channels is to synchronize goroutines by handing an element from one to the other. That's it!
go func() {
    c := make(chan struct{})
    c <- struct{}{} // Definitely locked
}()
Another deadlock:
go func() {
    c := make(chan struct{})
    <-c             // Definitely locked
    c <- struct{}{} // Never gets here
}()
Use another goroutine to read the channel:
go func() {
    c := make(chan struct{})
    go func() { <-c }()
    c <- struct{}{}
}()
In your case you have a generator, which means you have to read the channel until the producer goroutine closes it. This is good design: it ensures your goroutines are not left dangling.
// Read the channel until the producer goroutine finishes and closes it.
for r := range gen() {
    ...
}
// at this point the goroutine inside gen() has finished
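If what you ultimately need is how many results came back, one option is simply to count while draining. A minimal sketch, with gen stubbed out here to produce ints instead of DB rows:

package main

import "fmt"

// gen is a stub standing in for the question's generator.
func gen() chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for i := 0; i < 5; i++ {
            out <- i
        }
    }()
    return out
}

func main() {
    count := 0
    for r := range gen() {
        _ = r // handle the row here
        count++
    }
    // the total is only known once the producer has closed the channel
    fmt.Println("received", count, "results")
}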
I am assuming, from your follow-on comments, that you actually want to know "good" values for the worker pool size and the buffer on the channel to keep everything working "optimally".
This is extremely hard and depends on what the workers are doing, but as a first guess I'd look at a minimally buffered channel and a pool of runtime.GOMAXPROCS(0) workers. If you have a lot of resources then you could go as far as "infinite" workers.
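As a rough illustration of that guess (the jobs channel and the per-job work are made up here), a pool of runtime.GOMAXPROCS(0) workers reading from a small buffered channel might look like this:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    numWorkers := runtime.GOMAXPROCS(0) // one worker per available CPU, as a first guess
    jobs := make(chan int, numWorkers)  // small buffer so the producer rarely blocks

    var wg sync.WaitGroup
    wg.Add(numWorkers)
    for w := 0; w < numWorkers; w++ {
        go func(id int) {
            defer wg.Done()
            for j := range jobs {
                fmt.Printf("worker %d processed job %d\n", id, j)
            }
        }(w)
    }

    for j := 0; j < 20; j++ {
        jobs <- j
    }
    close(jobs) // workers exit once the channel is drained
    wg.Wait()
}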

Should I consume all of a golang channel's values if I know they are finite?

I'm working through the Concurrency section of A Tour of Go, and I'm curious about proper Go convention for consuming finite channels. In this exercise, I need to read values from two channels and determine if the values are the same and in the same order. If not, I can immediately return false from my method. However, if I do that, will Go clean up my channels for me automatically, or will my goroutines be left hanging forever and consuming resources?
The best way to handle this would be to pass a cancel channel into my goroutines, but since the goroutines read a finite amount of data, it seems fine to just consume all the data. What is the best way to handle this case in real life?
Andrew Gerrand's talk at Gophercon covers this exact question on slide 37.
Create a quit channel and pass it to each walker. By closing quit when Same exits, any running walkers are terminated.
func Same(t1, t2 *tree.Tree) bool {
    quit := make(chan struct{})
    defer close(quit)
    w1, w2 := Walk(t1, quit), Walk(t2, quit)
    for {
        v1, ok1 := <-w1
        v2, ok2 := <-w2
        if v1 != v2 || ok1 != ok2 {
            return false
        }
        if !ok1 {
            return true
        }
    }
}
Using quit channels, as discussed in the Go Concurrency Patterns: Pipelines and cancellation blog article and Heath Borders' answer, is often a good idea.
There is also the golang.org/x/net/context package, discussed in the Go Concurrency Patterns: Context blog article, which adds timeouts, deadlines, and other features.
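As a rough sketch of the context approach (using the context package that is now in the standard library, with a made-up walk function in place of the tree walkers):

package main

import (
    "context"
    "fmt"
    "time"
)

// walk is a made-up stand-in for a tree walker; it stops sending as soon as
// the context is cancelled, so no goroutine is left blocked.
func walk(ctx context.Context) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for i := 0; ; i++ {
            select {
            case out <- i:
            case <-ctx.Done():
                return
            }
        }
    }()
    return out
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()

    for v := range walk(ctx) {
        fmt.Println(v)
        time.Sleep(10 * time.Millisecond)
    }
    fmt.Println("walker stopped:", ctx.Err())
}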
However, to directly address:
will Go clean up my channels for me automatically
It depends on the channel buffering and how the channels are written to.
E.g.
func writeValues(n int, c chan int) {
    for i := 0; i < n; i++ {
        c <- i
    }
    log.Println("writeValues", n, "done")
}
func main() {
    ch1 := make(chan int, 12)
    ch2 := make(chan int, 6)
    ch3 := make(chan int)
    go writeValues(10, ch1)
    go writeValues(11, ch2)
    go writeValues(12, ch3)
    time.Sleep(time.Second) // XXX
}
Playground
Here the first goroutine will complete, and ch1 (and anything buffered in it) will be garbage collected and cleaned up.
However, the latter two goroutines will block, waiting until they can write all their values. The garbage collector will never touch ch2 and ch3, since the blocked goroutines keep a reference to them.
Note that ch2 would get cleaned up if as few as five items were read from the channel.
Usually, you only rely on this when doing something like:
errc := make(chan error, 1)
go func() { errc <- someErrorReturningFunc() }()
If the function being called has no way to cancel it, then this is a common idiom.
You can do this and abort/return early without reading from errc, knowing that the goroutine and channel will be cleaned up when the function eventually returns.
The buffer size of the errc channel is important here.
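To make the role of that buffer concrete, here is a minimal runnable sketch (someErrorReturningFunc is just a slow placeholder): because errc has capacity 1, the goroutine's send succeeds even if the caller has already moved on, so both the goroutine and the channel can eventually be collected.

package main

import (
    "errors"
    "fmt"
    "time"
)

func someErrorReturningFunc() error {
    time.Sleep(2 * time.Second) // pretend this is slow, uncancellable work
    return errors.New("took too long anyway")
}

func main() {
    errc := make(chan error, 1) // capacity 1 lets the goroutine finish even if nobody reads
    go func() { errc <- someErrorReturningFunc() }()

    select {
    case err := <-errc:
        fmt.Println("finished:", err)
    case <-time.After(500 * time.Millisecond):
        fmt.Println("gave up waiting; the goroutine and channel will still be cleaned up")
    }
}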
