Proper way to gain access to a channel length in Go

I have been using Go for a little while and am still getting better every day, but I'm not an expert per se. Currently I am tackling concurrency and goroutines, as I think that is the final unknown in my Go toolbelt. I think I am getting the hang of it, but I'm still definitely a beginner.
The task I am having an issue with seems pretty basic to me, but nothing I have tried works. I would like to figure out a way to calculate the length of a channel.
From what I have gathered, len() only works on buffered channels, so that won't help me in this case. What I am doing is reading values from the DB in batches. I have a generator func that goes like this:
func gen() chan Result {
    out := make(chan Result)
    go func() {
        // ... query db
        for rows.Next() {
            out <- row
        }
        close(out)
    }()
    return out
}
then I am using it as such
c := gen()
...
// do other stuff
I would either like to return the count with the out channel, or wrap all of it in a struct type and just return that.
like so:
c, len := gen()
or:
a := gen()
fmt.Println(a.c)
fmt.Println(a.len)
I believe I have tried everything except atomic, which I think would actually work, but from what I have read, that apparently isn't the right use for atomic. What other options do I have that don't either leave me with a 0 or block forever?
Thanks!

The len built-in will return the "length" of a channel:
func len(v Type) int
The len built-in function returns the length of v, according to its type:
Array: the number of elements in v.
Pointer to array: the number of elements in *v (even if v is nil).
Slice, or map: the number of elements in v; if v is nil, len(v) is zero.
String: the number of bytes in v.
Channel: the number of elements queued (unread) in the channel buffer;
if v is nil, len(v) is zero.
But I don't think that will help you.
What you really need is a new approach to your problem: counting the items in queue in a channel is not an appropriate way to handle "batches" of tasks.
What do you need this length for?

You are using unbuffered channels, which is a good thing.
An unbuffered channel uses no memory for storage, and thus never contains anything!
The only purpose of unbuffered channels is to achieve synchronization between goroutines by passing an element from one to another. That's it!
go func() {
    c := make(chan struct{})
    c <- struct{}{} // definitely blocked: no receiver will ever appear
}()
Another deadlock:
go func() {
    c := make(chan struct{})
    <-c             // definitely blocked
    c <- struct{}{} // never reached
}()
Use another goroutine to read the channel:
go func() {
    c := make(chan struct{})
    go func() { <-c }()
    c <- struct{}{} // succeeds: the inner goroutine receives
}()
In your case you have a generator, which means you have to read the channel until the producer goroutine closes it. This is a good design: it ensures that your goroutines are not left dangling.
// Read the channel until the producer goroutine finishes and closes it.
for r := range gen() {
    ...
}
// the goroutine inside gen() has finished

I am assuming, from your follow-on answers, that you actually want to know "good" values for the worker pool and the buffer on the channel to keep everything working "optimally".
This is extremely hard, and depends on what the workers are doing, but as a first guess I'd look at a minimally buffered channel and a pool of runtime.GOMAXPROCS(0) workers. If you have a lot of resources then you could go as far as "infinite" workers.

Related

Recursive calls from function started as goroutine & Idiomatic way to continue caller when all worker goroutines finished

I am implementing a (sort of a) combinatorial backtracking algorithm in go utilising goroutines. My problem can be represented as a tree with a certain degree/spread where I want to visit each leaf and calculate a result depending on the path taken. On a given level, I want to spawn goroutines to process the subproblems concurrently, i.e. if I have a tree with degree 3 and I want to start the concurrency after level 2, I'd spawn 3*3=9 goroutines that proceed with processing the subproblems concurrently.
func main() {
    cRes := make(chan string, 100)
    res := []string{}
    numLevels := 5
    spread := 3
    startConcurrencyAtLevel := 2
    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes)
    for {
        select {
        case r := <-cRes:
            res = append(res, r)
        case <-time.After(10 * time.Second):
            fmt.Println("Calculation timed out")
            fmt.Println(len(res), math.Pow(float64(spread), float64(numLevels)))
            return
        }
    }
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path, also using a lookup table
        // real problem actually returns not the path but the result if it satisfies some condition
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            go nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes)
        }
    }
}
The above code works, however I rely on the for select statement timing out. I am looking for a way to continue with main() as soon as all goroutines have finished, i.e. all subproblems have been processed.
I already came up with two possible (but inelegant) solutions:
Using a mutex-protected result map + a waitgroup instead of a channel-based approach should do the trick, but I'm curious if there is a neat solution with channels.
Using a quit channel (of type int). Every time a goroutine is spawned, the quit channel gets a +1 int; every time a computation finishes in a leaf, it gets a -1 int, and the caller sums up the values. See the following snippet; this however is not a good solution, as it (rather blatantly) runs into timing issues I don't want to deal with. It quits prematurely if, for instance, the first goroutine finishes before another one has been spawned.
for {
    select {
    case q := <-cRunningRoutines:
        runningRoutines += q
        if runningRoutines == 0 {
            fmt.Println("Calculation complete")
            return res
        }
        // ...same cases as above
    }
}
Playground: https://go.dev/play/p/9jzeCvl8Clj
Following questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just cant wrap my head around how to integrate it in this case.
Happy about any ideas, thanks!
Reading the description and the snippet I am not able to understand exactly what you are trying to achieve, but I have some hints and patterns for channels that I use daily and think are helpful.
the context package is very helpful to manage goroutines' state in a safe way. In your example, time.After is used to end the main program, but in non-main functions it could be leaking goroutines: if instead you use context.Context and pass it into the goroutines (it's usually passed as the first argument of a function) you will be able to control cancellation of downstream calls. This explains it briefly.
it is common practice to create channels (and return them) in functions that produce messages and send them on the channel. The same function should be responsible for closing the channel, e.g. with defer close(channel) when it's done writing.
This is handy because a buffered channel can be closed even when it still has data in it: receivers keep draining the remaining messages before their range loops end. For an unbuffered channel, the function won't be able to send a message over the channel until a reader of the channel is ready to receive it, and thus won't be able to exit.
This is an example (without recursion).
We can close the channel whether it is buffered or unbuffered in this example, because the send will block until the for ... range on the channel in the main goroutine reads from it.
This is a variant for the same principle, with the channel passed as argument.
we can use sync.WaitGroup in tandem with channels, to signal completion of individual goroutines, and to let an "orchestrating" goroutine know that the channel can be closed, because all message producers are done sending data into the channel. The same considerations as above apply to the close operation.
This is an example showing the use of waitGroup and external closer of channel.
channels can have a direction! Notice that in the example, I added/removed arrows next to the channel (e.g. <-chan string, or chan<- string) when passing them in/outside functions. This tells the compiler that a channel is read-only or write-only respectively in the scope of that function.
This helps in two ways:
the compiler enforces the direction, so misusing the channel (e.g. sending on a receive-only channel) becomes a compile-time error.
the signature of the function describes whether it will only use the channel for writing to it (and possibly close()) or reading: remember that reading from a channel with a range automatically stops the iteration when the channel is closed.
you can build channels of channels: make(chan chan string) is a valid (and helpful) construct to build processing pipelines.
A common usage of it is a fan-in goroutine that is collecting multiple outputs of a series of channel-producing goroutines.
This is an example of how to use them.
In essence, to answer your initial questions:
Is doing recursive calls from a function started as a goroutine to itself a valid approach?
If you really need recursion, it's probably better to handle it separately from the concurrent code: create a dedicated function that recursively sends data into a channel, and orchestrate the closing of the channel in the caller.
What would be an idiomatic way of reading the results from cRes until all spawned goroutines finish? I read somewhere that channels should be closed when computation is done, but I just cant wrap my head around how to integrate it in this case.
A good reference is Go Concurrency Patterns: Pipelines and cancellation: this is a rather old post (before the context package existed in the std lib) and I think Parallel digestion is what you're looking for to address the original question.
As mentioned by torek, I spun off an anonymous function closing the channel after the waitgroup finished waiting. Also needed some logic around calling the wg.Done() of the spawned goroutines only after the recursion of the goroutine-spawning level returns.
Generally I think this is a useful idiom (correct me if I'm wrong :))
Playground: https://go.dev/play/p/bQjHENsZL25
func main() {
    cRes := make(chan string, 100)
    numLevels := 3
    spread := 3
    startConcurrencyAtLevel := 2
    var wg sync.WaitGroup
    nTree("", numLevels, spread, startConcurrencyAtLevel, cRes, &wg)
    go func() {
        // time.Sleep(1 * time.Second) // edit: code should work without this initial sleep
        wg.Wait()
        close(cRes)
    }()
    for r := range cRes {
        fmt.Println(r)
    }
    fmt.Println("Done!")
}
func nTree(path string, maxLevels int, spread int, startConcurrencyAtLevel int, cRes chan string, wg *sync.WaitGroup) {
    if len(path) == maxLevels {
        // some longer running task here associated with the found path
        cRes <- path
        return
    }
    for i := 1; i <= spread; i++ {
        nextPath := path + fmt.Sprint(i)
        if len(path) == startConcurrencyAtLevel {
            wg.Add(1) // count the goroutine before spawning it
            go func(nextPath string) {
                defer wg.Done() // Done only after the whole subtree returns
                nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
            }(nextPath)
        } else {
            nTree(nextPath, maxLevels, spread, startConcurrencyAtLevel, cRes, wg)
        }
    }
}

Event driven pattern in golang

I am using golang to implement a simple event driven worker. It's like this:
go func() {
    for {
        select {
        case data := <-ch:
            time.Sleep(1)
            someGlobalMap[data.key] = data.value
        }
    }
}()
And the main function will create several goroutines, and each of them will do thing like this:
ch <- data
fmt.Println(someGlobalMap[data.key])
As you can see, because my worker needs some time to do the work, I get a nil result in my main function. How can I control this workflow properly?
EDIT: I may have misread your question, I see that you mention that main will start many producer goroutines. I thought it was many consumer goroutines, and a single producer. Leaving the answer here in case it can be useful for others looking for that pattern, though the bullet points still apply to your case.
So if I understand your use-case correctly, you can't expect to send on a channel and read the results immediately after. You don't know when the worker will process that send; you need to communicate between the goroutines, and that is done with channels. Assuming that just calling a function with a return value doesn't work in your scenario, and you really need to send to a worker and then block until you have the result, you could send a channel as part of the data structure, and block-receive on it after the send, i.e.:
resCh := make(chan Result)
ch <- Data{key, value, resCh}
res := <-resCh
But you should probably try to break down the work as a pipeline of independent steps instead, see the blog post that I linked to in the original answer.
Original answer where I thought it was a single producer - multiple consumers/workers pattern:
This is a common pattern for which Go's goroutines and channels semantics are very well suited. There are a few things you need to keep in mind:
The main function will not automatically wait for goroutines to finish. If there's nothing else to do in the main, then the program exits and you don't have your results.
The global map that you use is not thread-safe. You need to synchronize access via a mutex, but there's a better way - use an output channel for results, which is already synchronized.
You can use a for..range over a channel, and you can safely share a channel between multiple goroutines. As we'll see, that makes this pattern quite elegant to write.
Playground: https://play.golang.org/p/WqyZfwldqp
For more on Go pipelines and concurrency patterns, to introduce error handling, early cancellation, etc.: https://blog.golang.org/pipelines
Commented code for the use-case you mention:
// could be a command-line flag, a config, etc.
const numGoros = 10

// Data is a similar data structure to the one mentioned in the question.
type Data struct {
    key   string
    value int
}

func main() {
    var wg sync.WaitGroup
    // create the input channel that sends work to the goroutines
    inch := make(chan Data)
    // create the output channel that sends results back to the main function
    outch := make(chan Data)
    // the WaitGroup keeps track of pending goroutines, you can add numGoros
    // right away if you know how many will be started, otherwise do .Add(1)
    // each time before starting a worker goroutine.
    wg.Add(numGoros)
    for i := 0; i < numGoros; i++ {
        // because it uses a closure, it could've used inch and outch automatically,
        // but if the func gets bigger you may want to extract it to a named function,
        // and I wanted to show the directed channel types: within that function, you
        // can only receive from inch, and only send (and close) to outch.
        //
        // It also receives the index i, just for fun so it can set the goroutines'
        // index as key in the results, to show that it was processed by different
        // goroutines. Also, big gotcha: do not capture a for-loop iteration variable
        // in a closure, pass it as argument, otherwise it very likely won't do what
        // you expect.
        go func(i int, inch <-chan Data, outch chan<- Data) {
            // make sure WaitGroup.Done is called on exit, so Wait unblocks
            // eventually.
            defer wg.Done()
            // range over a channel gets the next value to process, safe to share
            // concurrently between all goroutines. It exits the for loop once
            // the channel is closed and drained, so wg.Done will be called once
            // ch is closed.
            for data := range inch {
                // process the data...
                time.Sleep(10 * time.Millisecond)
                outch <- Data{strconv.Itoa(i), data.value}
            }
        }(i, inch, outch)
    }
    // start the goroutine that prints the results, use a separate WaitGroup to track
    // it (could also have used a "done" channel but the for-loop would be more
    // complex, with a select).
    var wgResults sync.WaitGroup
    wgResults.Add(1)
    go func(ch <-chan Data) {
        defer wgResults.Done()
        // to prove it processed everything, keep a counter and print it on exit
        var n int
        for data := range ch {
            fmt.Println(data.key, data.value)
            n++
        }
        // for fun, try commenting out the wgResults.Wait() call at the end, the output
        // will likely miss this line.
        fmt.Println(">>> Processed: ", n)
    }(outch)
    // send work, wherever that comes from...
    for i := 0; i < 1000; i++ {
        inch <- Data{"main", i}
    }
    // when there's no more work to send, close the inch, so the goroutines will begin
    // draining it and exit once all values have been processed.
    close(inch)
    // wait for all goroutines to exit
    wg.Wait()
    // at this point, no more results will be written to outch, close it to signal
    // to the results goroutine that it can terminate.
    close(outch)
    // and wait for the results goroutine to actually exit, otherwise the program would
    // possibly terminate without printing the last few values.
    wgResults.Wait()
}
In real-life scenarios, where the amount of work is not known ahead of time, the closing of the in-channel could come from e.g. a SIGINT signal. Just make sure no code path can send work after the channel was closed as that would panic.

Spread sequential tests into 4 go routines and terminate all if one fails

Suppose I have a simple loop which does sequential tests like this.
for f := 1; f <= 1000; f++ {
    if doTest(f) {
        break
    }
}
I loop through range of numbers and do a test for each number. If test fails for one number, I break and exit the main thread. Simple enough.
Now, how do I correctly feed the test numbers into, say, four or more goroutines? Basically, I want to test the numbers from 1 to 1000 in batches of 4 (or whatever the number of goroutines is).
Do I create 4 routines reading from one channel and feed the numbers sequentially into this channel? Or do I make 4 routines with an individual channel?
And another question. How do I stop all 4 routines if one of them fails the test? I've been reading some texts on channels but I cannot put the pieces together.
You can create a producer/consumer system: https://play.golang.org/p/rks0gB3aDb
func main() {
    ch := make(chan int)
    clients := 4
    // make it buffered, so all clients can fail without hanging
    notifyCh := make(chan struct{}, clients)
    go produce(100, ch, notifyCh)
    var wg sync.WaitGroup
    wg.Add(clients)
    for i := 0; i < clients; i++ {
        go func() {
            consumer(ch, notifyCh)
            wg.Done()
        }()
    }
    wg.Wait()
}

func consumer(in chan int, notifyCh chan struct{}) {
    fmt.Printf("Start consumer\n")
    for i := range in {
        <-time.After(100 * time.Millisecond)
        if i == 42 {
            fmt.Printf("%d fails\n", i)
            notifyCh <- struct{}{}
            return
        } else {
            fmt.Printf("%d\n", i)
        }
    }
    fmt.Printf("Consumer stopped working\n")
}

func produce(N int, out chan int, notifyCh chan struct{}) {
    for i := 0; i < N; i++ {
        select {
        case out <- i:
        case <-notifyCh:
            close(out)
            return
        }
    }
    close(out)
}
The producer pushes numbers from 0 to 99 to the channel, the consumer consumes until the channel is closed. In main we create 4 clients and add them to a waitgroup to reliably check if every goroutine returned.
Every consumer can signal on the notifyCh; the producer then stops working and no further numbers are generated, therefore all consumers return after their current number.
There's also the option to create 4 goroutines, wait for all of them to return, then start the next 4 goroutines. But that adds quite an overhead of waiting.
Since you mentioned prime numbers, here's a really cool prime sieve: https://golang.org/doc/play/sieve.go
Whether you create one common channel or a channel per goroutine depends on what you want.
If you only want to put numbers (or, more generally, requests) into a channel and you don't care which goroutine serves them, then it is better to share a single channel. If, for example, you want the first 250 requests to be served by goroutine 1, then of course you cannot share a channel.
It is good practice to use a channel as either input or output. And the simplest way a sender can signal that it is finished is to close the channel. A good article about that is https://blog.golang.org/pipelines
What is not mentioned in the question is that you also need another channel (or channels), or some other communication primitive, to get the results back. And that channel is more interesting than the feeding one.
What information should be sent back? A bool after every doTest, or just a notification when everything is done (in that case not even a bool is necessary; just close a channel)?
If you prefer the program to fail fast, I would use a buffered shared channel to feed the numbers. Don't forget to close it when all numbers have been fed.
And use another, unbuffered channel to let the main thread know that the tests are done. It can be a channel where you only put the number on which the test failed, or, if you also want positive results, a channel of structs containing the number and the result (or any other information returned from doTest).
A very good article about channels is also http://dave.cheney.net/2014/03/19/channel-axioms
Each of your four goroutines can report a failure (by sending an error and closing the channel). But the gotcha is what the goroutines should do when all numbers have passed and the feeding channel is closed. About that there is also a nice article http://nathanleclaire.com/blog/2014/02/15/how-to-wait-for-all-goroutines-to-finish-executing-before-continuing/

Should I use sync or blocking channels?

I have several goroutines and I use unbuffered channels as the sync mechanism.
I'm wondering if there is anything wrong with this (e.g. compared with a WaitGroup implementation). A known "drawback" that I'm aware of is that two goroutines may stay blocked until the 3rd (last) one completes because the channels are not buffered, but I don't know the internals/what this really means.
func main() {
    chan1, chan2, chan3 := make(chan bool), make(chan bool), make(chan bool)
    go fn(chan1)
    go fn(chan2)
    go fn(chan3)
    res1, res2, res3 := <-chan1, <-chan2, <-chan3
}
This implementation isn't inherently worse or better, and I've written code of this style in favor of using a WaitGroup with success. I've also implemented the same functionality with a WaitGroup and had roughly the same results. As mentioned in the comments, which is better is situational, and likely subjective in many cases (the difference will be in maintainability or readability rather than performance).
Personally, I really like this style for a situation where I'm spinning off one worker per item in a collection. I was already encountering a lot of pain shutting down cleanly (signaling on an abort or close channel, which had to be made available to the worker methods one way or another), so I thought it was very convenient to take that communication loop one step further and enclose all the work in a channel select. In order to have a working abort you kind of have to anyway.
Channels and WaitGroups are both available for you to use as appropriate. For a given problem, your solution could be solved using channels or WaitGroup or even a combination of both.
From what I understand, channels are more appropriate when your goroutines need to communicate with each other (as in they are not independent). WaitGroup is usually used when your goroutines are independent of each other.
Personally, I like WaitGroup because it is more readable and simpler to use. However, just like channels, I do not like that we need to pass the reference to the goroutine because that would mean that the concurrency logic would be mixed with your business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
    var waitGroup sync.WaitGroup
    waitGroup.Add(len(functions))
    defer waitGroup.Wait()
    for _, function := range functions {
        go func(copy func()) {
            defer waitGroup.Done()
            copy()
        }(function)
    }
}
Here is an example:
func1 := func() {
    for char := 'a'; char < 'a'+3; char++ {
        fmt.Printf("%c ", char)
    }
}
func2 := func() {
    for number := 1; number < 4; number++ {
        fmt.Printf("%d ", number)
    }
}
Parallelize(func1, func2) // a 1 b 2 c 3
If you would like to use it, you can find it here https://github.com/shomali11/util

What's wrong with the following go code that I receive 'all goroutines are asleep - deadlock!'

I'm trying to implement an Observer Pattern suggested here; Observer pattern in Go language
(the code listed above doesn't compile and is incomplete). Here is complete code that compiles, but I get a deadlock error.
package main

import (
    "fmt"
)

type Publisher struct {
    listeners []chan int
}

type Subscriber struct {
    Channel chan int
    Name    string
}

func (p *Publisher) Sub(c chan int) {
    p.listeners = append(p.listeners, c)
}

func (p *Publisher) Pub(m int, quit chan int) {
    for _, c := range p.listeners {
        c <- m
    }
    quit <- 0
}

func (s *Subscriber) ListenOnChannel() {
    data := <-s.Channel
    fmt.Printf("Name: %v; Data: %v\n", s.Name, data)
}

func main() {
    quit := make(chan int)
    p := &Publisher{}
    subscribers := []*Subscriber{
        &Subscriber{Channel: make(chan int), Name: "1"},
        &Subscriber{Channel: make(chan int), Name: "2"},
        &Subscriber{Channel: make(chan int), Name: "3"},
    }
    for _, v := range subscribers {
        p.Sub(v.Channel)
        go v.ListenOnChannel()
    }
    p.Pub(2, quit)
    <-quit
}
Also, if I get rid of 'quit' completely, I get no error, but it only prints the first record.
The problem is that you're sending to quit on the same goroutine that's receiving from quit.
quit has a buffer size of 0, which means that in order to proceed there has to be a sender on one side and a receiver on the other at the same time. You're sending, but no one's on the other end, so you wait forever. In this particular case the Go runtime is able to detect the problem and panic.
The reason only the first value is printed when you remove quit is that your main goroutine is exiting before your remaining two are able to print.
Do not just increase channel buffer sizes to get rid of problems like this. It can help (although in this case it doesn't), but it only covers up the problem and doesn't truly fix the underlying cause. Increasing a channel's buffer size is strictly an optimization. In fact, it's usually better to develop with no buffer because it makes concurrency problems more obvious.
There are two ways to fix the problem:
Keep quit, but send 0 on it in each goroutine inside ListenOnChannel. In main, make sure you receive a value from each goroutine before moving on. (In this case, you'll wait for three values.)
Use a WaitGroup. There's a good example of how it works in the documentation.
In general this looks good, but there is one problem. Remember that channels are either buffered or unbuffered (synchronous or asynchronous). When you send to an unbuffered channel or to a channel with a full buffer the sender will block until the data has been removed from the channel by a receiver.
So with that, I'll ask a question or two of my own:
Is the quit channel synchronous or asynchronous?
What happens in Pub when execution hits quit<-0?
One solution that fixes your problem and allows the code to run is to change the second-to-last code line to be go p.Pub(2, quit). But there is another solution. Can you see what it is?
I don't actually get the same behavior you do if I remove <-quit from the original code. And this should not affect the output because as it is written that line is never executed.
