How do I close a channel multiple goroutines are sending on? - go

I am attempting to do some computation in parallel. The program is designed so that each worker goroutine sends "pieces" of a solved puzzle back to the controller goroutine that waits to receive and assembles everything sent from the worker routines.
What is the idiomatic Go way to close the single channel? I cannot call close on the channel in each goroutine because then I could possibly send on a closed channel. Likewise, there is no way to predetermine which goroutine will finish first. Is a sync.WaitGroup necessary here?

Here is an example that uses sync.WaitGroup to do what you are looking for.
This example accepts a lengthy list of integers, then sums them by handing each of N parallel workers an equal-sized chunk of the input data. It can be run on the Go playground:
package main
import (
	"fmt"
	"sync"
)
const WorkerCount = 10
func main() {
	// Some input data to operate on.
	// Each worker gets an equal share to work on.
	data := make([]int, WorkerCount*10)
	for i := range data {
		data[i] = i
	}
	// Sum all the entries.
	result := sum(data)
	fmt.Printf("Sum: %d\n", result)
}
// sum adds up the numbers in the given list, by having the operation delegated
// to workers operating in parallel on sub-slices of the input data.
func sum(data []int) int {
	result := make(chan int)
	// The WaitGroup will track completion of all our workers.
	wg := new(sync.WaitGroup)
	wg.Add(WorkerCount)
	// Divide the work up over the number of workers.
	chunkSize := len(data) / WorkerCount
	// Spawn workers.
	for i := 0; i < WorkerCount; i++ {
		go func(i int) {
			defer wg.Done()
			offset := i * chunkSize
			worker(result, data[offset:offset+chunkSize])
		}(i)
	}
	// Close the channel once every worker has sent its result.
	// No sender is left at that point, so the close is safe.
	go func() {
		wg.Wait()
		close(result)
	}()
	// Accumulate results in this goroutine until the channel is closed.
	// This avoids a data race on sum between an accumulator goroutine and the return.
	var sum int
	for value := range result {
		sum += value
	}
	return sum
}
// worker sums up the numbers in the given list.
func worker(result chan<- int, data []int) {
	var sum int
	for _, v := range data {
		sum += v
	}
	result <- sum
}

Yes, this is a perfect use case for sync.WaitGroup.
Your other option is to use one channel per goroutine and one multiplexer goroutine that feeds from each channel into a single channel. But that would get unwieldy fast, so I'd just go with a sync.WaitGroup.

Related

Gracefully closing channel and not sending on closed channel

I am new to Golang concurrency and have been working to understand the piece of code below.
I see a few things that I am unable to explain:
When the loop in main is bounded by 100000 (for i <= 100000 {), the last two statements sometimes print different values for nResults and jobWrites:
fmt.Printf("number of result writes %d\n", nResults)
fmt.Printf("Number of job writes %d\n", jobWrites)
When the bound is more than 1000000, it panics with: send on closed channel
How can I make sure that values are never sent to jobs after it has been closed, and that results can be closed once all values have been received, without a deadlock?
package main
import (
"fmt"
"sync"
)
func worker(wg *sync.WaitGroup, id int, jobs <-chan int, results chan<- int, countWrites *int64) {
defer wg.Done()
for j := range jobs {
*countWrites += 1
go func(j int) {
if j%2 == 0 {
results <- j * 2
} else {
results <- j
}
}(j)
}
}
func main() {
wg := &sync.WaitGroup{}
jobs := make(chan int)
results := make(chan int)
var i int = 1
var jobWrites int64 = 0
for i <= 10000000 {
go func(j int) {
if j%2 == 0 {
i += 99
j += 99
}
jobWrites += 1
jobs <- j
}(i)
i += 1
}
var nResults int64 = 0
for w := 1; w < 1000; w++ {
wg.Add(1)
go worker(wg, w, jobs, results, &nResults)
}
close(jobs)
wg.Wait()
var sum int32 = 0
var count int64 = 0
for r := range results {
count += 1
sum += int32(r)
if count == nResults {
close(results)
}
}
fmt.Println(sum)
fmt.Printf("number of result writes %d\n", nResults)
fmt.Printf("Number of job writes %d\n", jobWrites)
}
There are quite a few problems in your code.
Sending on closed channel
One general principle of using Go channels is
don't close a channel from the receiver side and don't close a channel if the channel has multiple concurrent senders
(https://go101.org/article/channel-closing.html)
The solution for you is simple: don't have multiple concurrent senders, and then you can close the channel from the sender side.
Instead of starting a separate goroutine for each of the millions of jobs you add to the channel, run one goroutine that executes the whole loop and adds all jobs to the channel, then closes the channel after the loop. The workers will consume the channel as fast as they can.
Data races by modifying shared variables in multiple goroutines
You're modifying two shared variables without any synchronization:
1. nResults, which you pass as countWrites *int64 to the worker.
2. i in the loop that writes to the jobs channel: you're adding 99 to it from multiple goroutines, making it unpredictable how many values you actually write to the jobs channel.
To solve 1, there are many options, including using sync.Mutex. However, since you're just adding to it, the easiest solution is to use atomic.AddInt64(countWrites, 1) instead of *countWrites += 1.
To solve 2, don't use one goroutine per write to the channel, but one goroutine for the entire loop (see above).

How to recover the input of multiple go routines called in a loop

I have a loop that starts multiple goroutines; each calls a function that makes an HTTP GET request and computes an object.
I want to collect the results of all those goroutines.
I tried using channels, but they are empty, even if I force a wait for all the goroutines to finish.
This is the code that starts the routines:
func main() {
pairs := getPairs() //Returns an array of strings
c := make(chan result)
for _, product := range pairs {
go getScore(product.Symbol, 1, c)
}
fmt.Println(len(c))
time.Sleep(5000 * time.Millisecond)
fmt.Println(len(c))
}
And at the end of getScore() I do this, c being the name of the channel in the function and res the result of the function
c <- res
The length of the channel is 0 in both prints.
What's the best way to get the result of the functions?
A channel is a synchronization primitive over shared memory (from a simple point of view). A buffered channel has a length, but len on an unbuffered channel is always 0. Buffered channels are useful in a few cases, but not as a general approach.
The simplest way is to add a receive loop that runs once per element of pairs:
// start processing
for _, product := range pairs {
	go getScore(product.Symbol, 1, c)
}
// getting a result
for i := 0; i < len(pairs); i++ {
	res := <-c
	// process the res value
}
Another way is to collect the results in another goroutine (this fragment assumes c carries int32 values):
// total and sync variable
var (
	wait  sync.WaitGroup
	total int32
)
// start processing
for _, product := range pairs {
	wait.Add(1)
	go getScore(product.Symbol, 1, c)
	go func() {
		defer wait.Done()
		// simple accumulate or maybe more complicated actions
		atomic.AddInt32(&total, <-c)
	}()
}
// wait finishing
wait.Wait()
c := make(chan result)
Creates an unbuffered channel. Therefore send statements, such as
c <- res
cannot proceed until another goroutine is attempting a receive operation.
In other words, execute the number of receive operations in your main goroutine matching the number of sends that will be attempted from other goroutines. Like this:
for _, product := range pairs {
go getScore(product.Symbol, 1, c)
}
for x := 0; x < len(pairs); x++ {
fmt.Println(<-c)
}
See the Go Tour section on channels, and the Effective Go section on channels for more information.

How to prevent deadlocks without using sync.WaitGroup?

concurrent.go:
package main
import (
"fmt"
"sync"
)
// JOBS represents the number of jobs workers do
const JOBS = 2
// WORKERS represents the number of workers
const WORKERS = 5
func work(in <-chan int, out chan<- int, wg *sync.WaitGroup) {
for n := range in {
out <- n * n
}
wg.Done()
}
var wg sync.WaitGroup
func main() {
in := make(chan int, JOBS)
out := make(chan int, JOBS)
for w := 1; w <= WORKERS; w++ {
wg.Add(1)
go work(in, out, &wg)
}
for j := 1; j <= JOBS; j++ {
in <- j
}
close(in)
wg.Wait()
close(out)
for r := range out {
fmt.Println("result:", r)
}
// This is a solution but I want to do it with `range out`
// and also without WaitGroups
// for r := 1; r <= JOBS; r++ {
// fmt.Println("result:", <-out)
// }
}
Example is here on goplay.
Goroutines run concurrently and independently. Spec: Go statements:
A "go" statement starts the execution of a function call as an independent concurrent thread of control, or goroutine, within the same address space.
If you want to use for range to receive values from the out channel, that means the out channel can only be closed once all goroutines are done sending on it.
Since goroutines run concurrently and independently, without synchronization you can't have this.
Using a WaitGroup is one way to do it (to ensure we wait for all goroutines to do their job before closing out).
Your commented code is another way: it receives exactly as many values from the channel as the goroutines ought to send on it, which is only possible if all goroutines do send their values. Here the synchronization is provided by the send statements and receive operations.
Notes:
Usually, receiving results from the channel is done asynchronously, in a dedicated goroutine, or even in multiple goroutines. That way you are not required to use channels with buffers capable of holding all the results. You will still need synchronization to wait for all workers to finish their job; you can't avoid this due to the concurrent and independent nature of goroutine scheduling and execution.

What is an elegant way to shut down a chain of goroutines linked by channels?

I'm a Go learner. In order better to understand the care and feeding of channels and goroutines, I'm trying to build a Sieve of Eratosthenes as a set of goroutines connected into a pipeline by channels.
Here's what I have so far:
// esieve implements a Sieve of Eratosthenes
// as a series of channels connected together
// by goroutines
package main
import "fmt"
func sieve(mine int, inch chan int) {
start := true // First-number switch
ouch := make(chan int) // Output channel for this instance
fmt.Printf("%v\n", mine) // Print this instance's prime
for next := <-inch; next > 0; next = <-inch { // Read input channel
fmt.Printf("%v <- %v\n",mine,next) // (Trace)
if (next % mine) > 0 { // Divisible by my prime?
if start { // No; is it the first number through?
go sieve(next, ouch) // First number - create instance for it
start = false // First time done
} else { // Not first time
ouch <- next // Pass it to the next instance
}
}
}
}
func main() {
lim := 30 // Let's do up to 30
fmt.Printf("%v\n", 2) // Treat 2 as a special case
ouch := make(chan int) // Create the first segment of the pipe
go sieve(3, ouch) // Create the instance for '3'
for prime := 3; prime < lim; prime += 2 { // Generate 3, 5, ...
fmt.Printf("Send %v\n", prime) // Trace
ouch <- prime // Send it down the pipe
}
}
And as far as it goes, it works nicely.
However, when I finish the main loop, main exits before all the numbers still in the pipeline of sieve instances have propagated down to the end.
What is the simplest, most elegant, or generally accepted way to make a main routine wait for a set of goroutines (about which it only 'knows' of the first one) to complete?
As for your title question, killing worker goroutines when you don't need them anymore:
You could use the Done idiom. Reads from a closed channel yield the zero value.
Make a new channel done. When reads from this channel succeed, the goroutines know they should quit. Close the channel in main when you have all the values you need.
Check whether you can read from the done channel and exit by returning, or read from inch when a value is available there. This partially replaces the assignment to next in your for loop:
select {
case <-done:
return
case next = <- inch:
}
Ranging over a channel also works, since closing that channel exits the loop.
As for the reverse, your body question, waiting for a set of goroutines to finish:
Use sync.WaitGroup.
var wg sync.WaitGroup
wg.Add(goroutineCount)
And when each goroutine finishes:
wg.Done()
Or use defer:
defer wg.Done()
To wait for all of them to report as Done:
wg.Wait()
In your example, simply call wg.Add(1) whenever you start a new goroutine, and do so before the goroutine that starts it calls wg.Done() and returns. As long as the counter only reaches zero once, wg.Wait() works as expected, so call wg.Add(1) before wg.Done().
After #izca unblocked my logjam, and after a few false starts involving deadlocks when everything finished, here's my solution working correctly:
// esieve implements a Sieve of Eratosthenes
// as a series of channels connected together
// by goroutines
package main
import "fmt"
func sieve(mine int, // This instance's own prime
inch chan int, // Input channel from lower primes
done chan int, // Channel for signalling shutdown
count int) { // Number of primes - counter
start := true // First-number switch
ouch := make(chan int) // Output channel, this instance
fmt.Printf("%v ", mine) // Print this instance's prime
for next := <-inch; next > 0; next = <-inch { // Read input channel
if (next % mine) > 0 { // Divisible by my prime?
if start { // No; first time through?
go sieve(next, ouch, done, count+1) // First number,
// create instance for it
start = false // First time done
} else { // Not first time
ouch <- next // Pass to next instance
}
}
}
if start { // Just starting?
close(done) // Yes - we're last in pipe - signal done
print("\n",count," primes\n") // Number of primes/goroutines
} else {
close(ouch) // No - send the signal down the pipe
}
}
func main() {
lim := 100 // Let's do up to 100
done := make(chan int) // Create the done return channel
ouch := make(chan int) // Create the first segment of the pipe
go sieve(2, ouch, done, 1) // Create the first instance for '2'
for prime := 3; prime < lim; prime += 1 { // Generate 3, 4, 5, ...
ouch <- prime // Send numbers down the pipe
}
close(ouch) // Send the done signal down the pipe
<- done // and wait for it to come back
}
I'm tremendously impressed with the elegance and simplicity of Go for this kind of programming, when compared with many other languages. Of course, the warts I claim for myself.
If appropriate here, I'd welcome critical comments.

Goroutines channels and "stopping short"

I'm reading/working through Go Concurrency Patterns: Pipelines and cancellation, but I'm having trouble understanding the Stopping short section. We have the following functions:
func sq(in <-chan int) <-chan int {
out := make(chan int)
go func() {
for n := range in {
out <- n * n
}
close(out)
}()
return out
}
func gen(nums ...int) <-chan int {
out := make(chan int)
go func() {
for _, n := range nums {
out <- n
}
close(out)
}()
return out
}
func merge(cs ...<-chan int) <-chan int {
var wg sync.WaitGroup
out := make(chan int, 1) // enough space for the unread inputs
// Start an output goroutine for each input channel in cs. output
// copies values from c to out until c is closed, then calls wg.Done.
output := func(c <-chan int) {
for n := range c {
out <- n
}
wg.Done()
}
wg.Add(len(cs))
for _, c := range cs {
go output(c)
}
// Start a goroutine to close out once all the output goroutines are
// done. This must start after the wg.Add call.
go func() {
wg.Wait()
close(out)
}()
return out
}
func main() {
in := gen(2, 3)
// Distribute the sq work across two goroutines that both read from in.
c1 := sq(in)
c2 := sq(in)
// Consume the first value from output.
out := merge(c1, c2)
fmt.Println(<-out) // 4 or 9
return
// Apparently if we had not set the merge out buffer size to 1
// then we would have a hanging go routine.
}
Now, if you notice line 2 in merge, it says we make the out chan with buffer size 1, because this is enough space for the unread inputs. However, I'm almost positive that we should allocate a chan with buffer size 2. In accordance with this code sample:
c := make(chan int, 2) // buffer size 2
c <- 1 // succeeds immediately
c <- 2 // succeeds immediately
c <- 3 // blocks until another goroutine does <-c and receives 1
Because this section implies that a chan of buffer size 3 would not block. Can anyone please clarify/assist my understanding?
The program sends two values to the channel out and reads one value from the channel out. One of the values is not received.
If the channel is unbuffered (capacity 0), then one of the sending goroutines will block until the program exits. This is a leak.
If the channel is created with a capacity of 1, then both goroutines can send to the channel and exit. The first value sent to the channel is received by main. The second value remains in the channel.
If the main function does not receive a value from the channel out, then a channel of capacity 2 is required to prevent the goroutines from blocking indefinitely.

Resources