Why Go channels limit the buffer size - go

I am new to Go and I might be missing the point but why are Go channels limited in the maximum buffer size buffered channels can have? For example if I make a channel like so
channel := make(chan int, 100)
I cannot add more than 100 elements to the channel without blocking, is there a reason for this? Further they cannot dynamically be resized, because the channel API does not support that.
This seems sort of limiting in the language's support for universal synchronization with a single mechanism since it lacks convenience compared to an unbounded semaphore. For example a generalized semaphore's value can be increased without bounds.

If one component of a program can't keep up with its input, it needs to put back-pressure on the rest of the system, rather than letting it run ahead and generate gigabytes of data that will never get processed because the system ran out of memory and crashed.
There is really no such thing as an unlimited buffer, because machines have limits on what they can handle. Go requires you to specify a size for buffered channels so that you will think about what size buffer your program actually needs and can handle. If it really needs a billion items, and can handle them, you can create a channel that big. But in most cases a buffer size of 0 or 1 is actually what is needed.

This is because Channels are designed for efficient communication between concurrent goroutines, but the need you have is something different: the fact you are blocking denotes that the recipient is not attending the work queue, and "dynamic" is rarely free.
There are a variety of different patterns and algorithms you can use to solve the problem you have: you could change your channel to accept arrays of ints, you could add additional goroutines to better balance or filter work. Or you could implement your own dynamic channel. Doing so is certainly a useful exercise for seeing why dynamic channels aren't a great way to build concurrency.
package main
import "fmt"
func dynamicChannel(initial int) (chan <- interface{}, <- chan interface{}) {
in := make(chan interface{}, initial)
out := make(chan interface{}, initial)
go func () {
defer close(out)
buffer := make([]interface{}, 0, initial)
loop:
for {
packet, ok := <- in
if !ok {
break loop
}
select {
case out <- packet:
continue
default:
}
buffer = append(buffer, packet)
for len(buffer) > 0 {
select {
case packet, ok := <-in:
if !ok {
break loop
}
buffer = append(buffer, packet)
case out <- buffer[0]:
buffer = buffer[1:]
}
}
}
for len(buffer) > 0 {
out <- buffer[0]
buffer = buffer[1:]
}
} ()
return in, out
}
func main() {
in, out := dynamicChannel(4)
in <- 10
fmt.Println(<-out)
in <- 20
in <- 30
fmt.Println(<-out)
fmt.Println(<-out)
for i := 100; i < 120; i++ {
in <- i
}
fmt.Println("queued 100-120")
fmt.Println(<-out)
close(in)
fmt.Println("in closed")
for i := range out {
fmt.Println(i)
}
}
Generally, if you are blocking, it indicates your concurrency is not well balanced. Consider a different strategy. For example, a simple tool to look for files with a matching .checksum file and then check the hashes:
func crawl(toplevel string, workQ chan <- string) {
defer close(workQ)
for _, path := filepath.Walk(toplevel, func (path string, info os.FileInfo, err error) error {
if err == nil && info.Mode().IsRegular() {
workQ <- path
}
}
// if a file has a .checksum file, compare it with the file's checksum.
func validateWorker(workQ <- chan string) {
for path := range workQ {
// If there's a .checksum file, read it, limit to 256 bytes.
expectedSum, err := os.ReadFile(path + ".checksum")[:256]
if err != nil { // ignore
continue
}
file, err := os.Open(path)
if err != nil {
continue
}
defer close(file)
hash := sha256.New()
if _, err := io.Copy(hash, file); err != nil {
log.Printf("couldn't hash %s: %w", path, err)
continue
}
actualSum := fmt.Sprintf("%x", hash.Sum(nil))
if actualSum != expectedSum {
log.Printf("%s: mismatch: expected %s, got %s", path, expectedSum, actualSum)
}
}
Even without any .checksum files, the crawl function will tend to outpace the worker queue. When .checksum files are encountered, especially if the files are large, the worker could take much, much longer to perform a single checksum.
A better aim here would be to achieve more consistent throughput by reducing the number of things the "validateWorker" does. Right now it is sometimes fast, because it checks for the checksum file. Othertimes it is slow, because it also loads has to read and checksum the files.
type ChecksumQuery struct {
Filepath string
ExpectSum string
}
func crawl(toplevel string, workQ chan <- ChecksumQuery) {
// Have a worker filter out files which don't have .checksums, and allow it
// to get a little ahead of the crawl function.
checkupQ := make(chan string, 4)
go func () {
defer close(workQ)
for path := range checkupQ {
expected, err := os.ReadFile(path + ".checksum")[:256]
if err == nil && len(expected) > 0 {
workQ <- ChecksumQuery{ path, string(expected) }
}
}
}()
go func () {
defer close(checkupQ)
for _, path := filepath.Walk(toplevel, func (path string, info os.FileInfo, err error) error {
if err == nil && info.Mode().IsRegular() {
checkupQ <- path
}
}
}()
}
Run a suitable number of validate workers, assign the workQ a suitable size, but if the crawler or validate functions block, it is because validate is doing useful work.
If your validate workers are all busy, they are all consuming large files from disk and hashing them. Having other workers interrupt this by crawling for more filenames, allocating and passing strings, isn't advantageous.
Other scenarios might be passing large lists to workers, in which pass the slices over channels (its cheap); or dynamic sized groups of things, in which case consider passing channels or captures over channels.

The buffer size is the number of elements that can be sent to the channel without the send blocking. By default, a channel has a buffer size of 0 (you get this with make(chan int)). This means that every single send will block until another goroutine receives from the channel. A channel of buffer size 1 can hold 1 element until sending blocks, so you'd get
c := make(chan int, 1)
c <- 1 // doesn't block
c <- 2 // blocks until another goroutine receives from the channel
I suggest you to look this for more clarification:
https://rogpeppe.wordpress.com/2010/02/10/unlimited-buffering-with-low-overhead/
http://openmymind.net/Introduction-To-Go-Buffered-Channels/

Related

What's the idiomatic way to avoid channel deadlocks in a recursive worker pool chain?

Suppose you have a basic toy system that finds and processes all files in a directory (for some definition of "processes"). A basic diagram of how it operates could look like:
If this were a real-world distributed system, the "arrows" could actually be unbounded queues, and then it just works.
In a self-contained go application, it's tempting to model the "arrows" as channels. However, due to the self-referential nature of "generating more work by needing to list subdirectories", it's easy to see that a naive implementation would deadlock. For example (untested, forgive compile errors):
func ListDirWorker(dirs, files chan string) {
for dir := range dirs {
for _, path := range ListDir(dir) {
if isDir(path) {
dirs <- path
} else {
files <- path
}
}
}
}
}
If we imagine we've configured just a single List worker, all it takes is for a directory to have two subdirectories to basically deadlock this thing.
My brain wants there to be "unbounded channels" in golang, but the creators don't want that. What's the correct idiomatic way to model this stuff? I imagine there's something simpler than implementing a thread-safe queue and using that instead of channels. :)
Had a very similar problem to solve. Needed:
finite number of recursive workers (bounded parallelism)
content.Context for early cancelations (enforce timeout limits etc.)
partial results (some goroutines hit errors while others did not)
crawl completion (worker clean-up etc.) via recursive depth tracking
Below I describe the problem and the gist of the solution I arrived at
Problem: scrape a HR LDAP directory with no pagination support. Server-side limits also precluded bulk queries greater than 100K records. Needed small queries to work around these limitations. So recursively navigated the tree from the top (CEO) - listing employees (nodes) and recursing on managers (branches).
To avoid deadlocks - a single workItem channel was used not only by workers to read (get) work, but also to write (delegate) to other idle workers. This approach allowed for fast worker saturation.
Note: not included here, but worth adding, is to use a common API rate-limiter to avoid multiple workers collectively abusing/exceeding any server-side API rate limits.
To start the crawl, create the workers and return a results channel and an error channel. Some notes:
c.in the workItem channel must be unbuffered for delegation to work (more on this later)
c.rwg tracks collective recursion depth for all worker. When it reaches zero, all recursion is done and the crawl is complete
func (c *Crawler) Crawl(ctx context.Context, root Branch, workers int) (<-chan Result, <-chan error) {
errC := make(chan error, 1)
c.rwg = sync.WaitGroup{} // recursion depth waitgroup (to determine when all workers are done)
c.rwg.Add(1) // add to waitgroups *BEFORE* starting workers
c.in = make(chan workItem) // input channel: shared by all workers (to read from and also to write to when they need to delegate)
c.out = make(chan Result) // output channel: where all workers write their results
go func() {
workerErrC := c.createWorkers(ctx, workers)
c.in <- workItem{
branch: root, // initial place to start crawl
}
for err := range workerErrC {
if err != nil {
// tally for partial results - or abort on first error (see werr)
}
}
// summarize crawl success/failure via a single write to errC
errC <- werr // nil, partial results, aborted early etc.
close(errC)
}
return c.out, errC
}
Create a finite number of individual workers. The returned error channel receives an error for each individual worker:
func (c *Crawler) createWorkers(ctx context.Context, workers int) (<-chan error) {
errC := make(chan error)
var wg sync.WaitGroup
wg.Add(workers)
for i := 0; i < workers; i++ {
i := i
go func() {
defer wg.Done()
var err error
defer func() {
errC <- err
}()
conn := Dial("somewhere:8080") // worker prep goes here (open network connect etc.)
for workItem := range c.in {
err = c.recurse(ctx, i+1, conn, workItem)
if err != nil {
return
}
}
}()
}
go func() {
c.rwg.Wait() // wait for all recursion to finish ...
close(c.in) // ... so safe to close input channel ...
wg.Wait() // ... wait for all workers to complete ...
close(errC) // .. finally signal to caller we're truly done
}()
return errC
}
recurse logic:
for any potentially blocking channel write, always check the ctx for cancelation, so we can abort early
c.in is deliberately unbuffered to ensure delegation works (see final note)
func (c *Crawler) recurse(ctx context.Context, workID int, conn *net.Conn, wi workItem) error {
defer c.rwg.Done() // decrement recursion count
select {
case <-ctx.Done():
return ctx.Err() // canceled/timeout etc.
case c.out <- Result{ /* Item: wi.. */}: // write to results channel (manager or employee)
}
items, err := getItems(conn) // WORKER CODE (e.g. get manager employees etc.)
if err != nil {
return err
}
for _, i := range items {
// leaf case
if i.IsLeaf() {
select {
case <-ctx.Done():
return ctx.Err()
case c.out <- Result{ Item: i.Leaf }:
}
continue
}
// branch case
wi := workItem{
branch: i.Branch,
}
c.rwg.Add(1) // about to recurse (or delegate-recursion)
select {
case c.in <- wi:
// delegated to another worker!
case <-ctx.Done(): // context canceled...
c.rwg.Done() // ... so undo above `c.rwg.Add(1)`
return ctx.Err()
default:
// no-one to delegated to (all busy) - so this worker will keep working
err = c.recurse(ctx, workID, conn, wi)
if err != nil {
return err
}
}
}
return nil
}
Delegation is key:
if a worker successfully writes to the worker channel, then it knows work has been delegated to another worker.
if it cannot, then the worker knows all workers are busy working (i.e. not waiting on work items) - and so it must recurse itself
So one gets both the benefits of recursion, but also leveraging a fixed-sized worker pool.

Handle goroutine termination and error handling via error group?

I am trying to read multiple files in parallel in such a way so that each go routine that is reading a file write its data to that channel, then have a single go-routine that listens to that channel and adds the data to the map. Here is my play.
Below is the example from the play:
package main
import (
"fmt"
"sync"
)
func main() {
var myFiles = []string{"file1", "file2", "file3"}
var myMap = make(map[string][]byte)
dataChan := make(chan fileData, len(myFiles))
wg := sync.WaitGroup{}
defer close(dataChan)
// we create a wait group of N
wg.Add(len(myFiles))
for _, file := range myFiles {
// we create N go-routines, one per file, each one will return a struct containing their filename and bytes from
// the file via the dataChan channel
go getBytesFromFile(file, dataChan, &wg)
}
// we wait until the wait group is decremented to zero by each instance of getBytesFromFile() calling waitGroup.Done()
wg.Wait()
for i := 0; i < len(myFiles); i++ {
// we can now read from the data channel N times.
file := <-dataChan
myMap[file.name] = file.bytes
}
fmt.Printf("%+v\n", myMap)
}
type fileData struct {
name string
bytes []byte
}
// how to handle error from this method if reading file got messed up?
func getBytesFromFile(file string, dataChan chan fileData, waitGroup *sync.WaitGroup) {
bytes := openFileAndGetBytes(file)
dataChan <- fileData{name: file, bytes: bytes}
waitGroup.Done()
}
func openFileAndGetBytes(file string) []byte {
return []byte(fmt.Sprintf("these are some bytes for file %s", file))
}
Problem Statement
How can I use golang.org/x/sync/errgroup to wait on and handle errors from goroutines or if there is any better way like using semaphore? For example if any one of my go routine fails to read data from file then I want to cancels all those remaining in the case of any one routine returning an error (in which case that error is the one bubble back up to the caller). And it should automatically waits for all the supplied go routines to complete successfully for success case.
I also don't want to spawn 100 go-routines if total number of files is 100. I want to control the parallelism if possible if there is any way.
How can I use golang.org/x/sync/errgroup to wait on and handle errors from goroutines or if there is any better way like using semaphore? For example [...] I want to cancels all those remaining in the case of any one routine returning an error (in which case that error is the one bubble back up to the caller). And it should automatically waits for all the supplied go routines to complete successfully for success case.
There are many ways to communicate error states across goroutines. errgroup does a bunch of heavy lifting though, and is appropriate for this case. Otherwise you're going to end up implementing the same thing.
To use errgroup we'll need to handle errors (and for your demo, generate some). In addition, to cancel existing goroutines, we'll use a context from errgroup.NewWithContext.
From the errgroup reference,
Package errgroup provides synchronization, error propagation, and Context cancelation for groups of goroutines working on subtasks of a common task.
Your play doesn't support any error handling. We can't collect and cancel on errors if we don't do any error handling. So I added some code to inject error handling:
func openFileAndGetBytes(file string) (string, error) {
if file == "file2" {
return "", fmt.Errorf("%s cannot be read", file)
}
return fmt.Sprintf("these are some bytes for file %s", file), nil
}
Then that error had to be passed back from getBytesFromFile as well:
func getBytesFromFile(file string, dataChan chan fileData) error {
bytes, err := openFileAndGetBytes(file)
if err == nil {
dataChan <- fileData{name: file, bytes: bytes}
}
return err
}
Now that we've done that, we can turn our attention to how we're going to start up a number of goroutines.
I also don't want to spawn 100 go-routines if total number of files is 100. I want to control the parallelism if possible if there is any way.
Written well, the number of tasks, channel size, and number of workers are typically independent values. The trick is to use channel closure - and in your case, context cancellation - to communicate state between the goroutines. We'll need an additional channel for the distribution of filenames, and an additional goroutine for the collection of the results.
To illustate this point, my code uses 3 workers, and adds a few more files. My channels are unbuffered. This allows us to see some of the files get processed, while others are aborted. If you buffer the channels, the example will still work, but it's more likely for additional work to be processed before the cancellation is handled. Experiment with buffer size along with worker count and number of files to process.
var myFiles = []string{"file1", "file2", "file3", "file4", "file5", "file6"}
fileChan := make(chan string)
dataChan := make(chan fileData)
To start up the workers, instead of starting one for each file, we start the number we desire - here, 3.
for i := 0; i < 3; i++ {
worker_num := i
g.Go(func() error {
for file := range fileChan {
if err := getBytesFromFile(file, dataChan); err != nil {
fmt.Println("worker", worker_num, "failed to process", file, ":", err.Error())
return err
} else if err := ctx.Err(); err != nil {
fmt.Println("worker", worker_num, "context error in worker:", err.Error())
return err
}
}
fmt.Println("worker", worker_num, "processed all work on channel")
return nil
})
}
The workers call your getBytesFromFile function. If it returns an err, we return an err. errgroup will cancel our context automatically in this case. However, the exact order of operations is not deterministic, so more files may or may not get processed before the context is cancelled. I'll show several possibilties below.
by rangeing over fileChan, the worker automatically picks up end of work from the channel closure. If we get an error, we can return it to errgroup immediately. Otherwise, if the context has been cancelled, we can return the cancellation error immediately.
You might think that g.Go would automatically cancel our function. But it cannot. There is no way to cancel a running function in Go other than process termination. errgroup.Group.Go's function argument must cancel itself when appropriate based on the state of its context.
Now we can turn our attention to the thing that puts the files on fileChan. We have 2 options here: we can use a buffered channel of the size of myFiles, like you did. We can fill the entire channel with pending jobs. This is only an option if you know the number of jobs when you create the channel. The other option is to use an additional "distribution" goroutine that can block on writes to fileChan so that our "main" goroutine can continue.
// dispatch files
g.Go(func() error {
defer close(fileChan)
done := ctx.Done()
for _, file := range myFiles {
select {
case fileChan <- file:
continue
case <-done:
break
}
}
return ctx.Err()
})
I'm not sure it's strictly necessary to put this in the same errgroup in this case, because we can't get an error in the distributor goroutine. But this general pattern, drawn from the Pipeline example from errgroup, works regardless of whether the work dispatcher might generate errors.
This functions pretty simple, but the magic is in select along with ctx.Done() channel. Either we write to the work channel, or we fail if our context is done. This allows us to stop distributing work when one worker has failed one file.
We defer close(fileChan) so that, regardless of why we have finished (either we distributed all work, or the context was cancelled), the workers know there will be no more work on the incoming work queue (ie fileChan).
We need one more synchronization mechanism: once all the work is distributed, and all the results are in or work was finished being cancelled, (eg, after our errgroup's Wait() returns), we need to close our results channel, dataChan. This signals the results collector that there are no more results to be collected.
var err error // we'll need this later!
go func() {
err = g.Wait()
close(dataChan)
}()
We can't - and don't need to - put this in the errgroup.Group. The function can't return an error, and it can't wait for itself to close(dataChan). So it goes into a regular old goroutine, sans errgroup.
Finally we can collect the results. With dedicated worker goroutines, a distributor goroutine, and a goroutine waiting on the work and notifying that there will be no more writes to the dataChan, we can collect all the results right in the "primary" goroutine in main.
for data := range dataChan {
myMap[data.name] = data.bytes
}
if err != nil { // this was set in our final goroutine, remember
fmt.Println("errgroup Error:", err.Error())
}
I made a few small changes so that it was easier to see the output. You may already have noticed I changed the file contents from []byte to string. This was purely so that the results were easy to read. Pursuant also to that end, I am using encoding/json to format the results so that it is easy to read them and paste them into SO. This is a common pattern that I often use to indent structured data:
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", " ")
if err := enc.Encode(myMap); err != nil {
panic(err)
}
Finally we're ready to run. Now we can see a number of different results depending on just what order the goroutines execute. But all of them are valid execution paths.
worker 2 failed to process file2 : file2 cannot be read
worker 0 context error in worker: context canceled
worker 1 context error in worker: context canceled
errgroup Error: file2 cannot be read
{
"file1": "these are some bytes for file file1",
"file3": "these are some bytes for file file3"
}
Program exited.
In this result, the remaining work (file4 and file5) were not added to the channel. Remember, an unbuffered channel stores no data. For those tasks to be written to the channel, a worker would have to be there to read them. Instead, the context was cancelled after file2 failed, and the distribution function followed the <-done path within its select. file1 and file3 were already processed.
Here's a different result (I just ran the playground share a few times to get different results).
worker 1 failed to process file2 : file2 cannot be read
worker 2 processed all work on channel
worker 0 processed all work on channel
errgroup Error: file2 cannot be read
{
"file1": "these are some bytes for file file1",
"file3": "these are some bytes for file file3",
"file4": "these are some bytes for file file4",
"file5": "these are some bytes for file file5",
"file6": "these are some bytes for file file6"
}
In this case, it looks a little like our cancellation failed. but what really happened is that the goroutines just "happened" to queue and finish the rest of the work before errorgroup picked upon worker `'s failure and cancelled the context.
what errorgroup does
When you use errorgroup, you're really getting 2 things out of it:
easily accessing the first error your workers returned;
getting a context that errorgroup will cancel for you when
Keep in mind that errorgroup does not cancel goroutines. This tripped me up a bit at first. Errorgroup cancels the context. It's your responsibility to apply the status of that context to your goroutines (remember, the goroutine must end itself, errorgroup can't end it).
A final aside about contexts with file operations, and failing outstanding work
Most of your file operations, eg io.Copy or os.ReadFile, are actually a loop of subsequent Read operations. But io and os don't support contexts directly. so if you have a worker reading a file, and you don't implement the Read loop yourself, you won't have an opportunity to cancel based on context. That's probably okay in your case - sure, you may have read some more files than you really needed to, but only because you were already reading them when the error occurred. I would personally accept this state of affairs and not implement my own read loop.
The code
https://go.dev/play/p/9qfESp_eB-C
package main
import (
"context"
"encoding/json"
"fmt"
"os"
"golang.org/x/sync/errgroup"
)
func main() {
var myFiles = []string{"file1", "file2", "file3", "file4", "file5", "file6"}
fileChan := make(chan string)
dataChan := make(chan fileData)
g, ctx := errgroup.WithContext(context.Background())
for i := 0; i < 3; i++ {
worker_num := i
g.Go(func() error {
for file := range fileChan {
if err := getBytesFromFile(file, dataChan); err != nil {
fmt.Println("worker", worker_num, "failed to process", file, ":", err.Error())
return err
} else if err := ctx.Err(); err != nil {
fmt.Println("worker", worker_num, "context error in worker:", err.Error())
return err
}
}
fmt.Println("worker", worker_num, "processed all work on channel")
return nil
})
}
// dispatch files
g.Go(func() error {
defer close(fileChan)
done := ctx.Done()
for _, file := range myFiles {
if err := ctx.Err(); err != nil {
return err
}
select {
case fileChan <- file:
continue
case <-done:
break
}
}
return ctx.Err()
})
var err error
go func() {
err = g.Wait()
close(dataChan)
}()
var myMap = make(map[string]string)
for data := range dataChan {
myMap[data.name] = data.bytes
}
if err != nil {
fmt.Println("errgroup Error:", err.Error())
}
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", " ")
if err := enc.Encode(myMap); err != nil {
panic(err)
}
}
type fileData struct {
name,
bytes string
}
func getBytesFromFile(file string, dataChan chan fileData) error {
bytes, err := openFileAndGetBytes(file)
if err == nil {
dataChan <- fileData{name: file, bytes: bytes}
}
return err
}
func openFileAndGetBytes(file string) (string, error) {
if file == "file2" {
return "", fmt.Errorf("%s cannot be read", file)
}
return fmt.Sprintf("these are some bytes for file %s", file), nil
}

How to parallelize a recursive function

I am trying to parallelize a recursive problem in Go, and I am unsure what the best way to do this is.
I have a recursive function, which works like this:
func recFunc(input string) (result []string) {
for subInput := range getSubInputs(input) {
subOutput := recFunc(subInput)
result = result.append(result, subOutput...)
}
result = result.append(result, getOutput(input)...)
}
func main() {
output := recFunc("some_input")
...
}
So the function calls itself N times (where N is 0 at some level), generates its own output and returns everything in a list.
Now I want to make this function run in parallel. But I am unsure what the cleanest way to do this is. My Idea:
Have a "result" channel, to which all function calls send their result.
Collect the results in the main function.
Have a wait group, which determines when all results are collected.
The Problem: I need to wait for the wait group and collect all results in parallel. I can start a separate go function for this, but how do I ever quit this separate go function?
func recFunc(input string) (result []string, outputChannel chan []string, waitGroup &sync.WaitGroup) {
defer waitGroup.Done()
waitGroup.Add(len(getSubInputs(input))
for subInput := range getSubInputs(input) {
go recFunc(subInput)
}
outputChannel <-getOutput(input)
}
func main() {
outputChannel := make(chan []string)
waitGroup := sync.WaitGroup{}
waitGroup.Add(1)
go recFunc("some_input", outputChannel, &waitGroup)
result := []string{}
go func() {
nextResult := <- outputChannel
result = append(result, nextResult ...)
}
waitGroup.Wait()
}
Maybe there is a better way to do this? Or how can I ensure the anonymous go function, that collects the results, is quited when done?
tl;dr;
recursive algorithms should have bounded limits on expensive resources (network connections, goroutines, stack space etc.)
cancelation should be supported - to ensure expensive operations can be cleaned up quickly if a result is no longer needed
branch traversal should support error reporting; this allows errors to bubble up the stack & partial results to be returned without the entire recursion traversal to fail.
For asychronous results - whether using recursions or not - use of channels is recommended. Also, for long running jobs with many goroutines, provide a method for cancelation (context.Context) to aid with clean-up.
Since recursion can lead to exponential consumption of resources it's important to put limits in place (see bounded parallelism).
Below is a design patten I use a lot for asynchronous tasks:
always support taking a context.Context for cancelation
number of workers needed for the task
return a chan of results & a chan error (will only return one error or nil)
var (
workers = 10
ctx = context.TODO() // use request context here - otherwise context.Background()
input = "abc"
)
resultC, errC := recJob(ctx, workers, input) // returns results & `error` channels
// asynchronous results - so read that channel first in the event of partial results ...
for r := range resultC {
fmt.Println(r)
}
// ... then check for any errors
if err := <-errC; err != nil {
log.Fatal(err)
}
Recursion:
Since recursion quickly scales horizontally, one needs a consistent way to fill the finite list of workers with work but also ensure when workers are freed up, that they quickly pick up work from other (over-worked) workers.
Rather than create a manager layer, employ a cooperative peer system of workers:
each worker shares a single inputs channel
before recursing on inputs (subIinputs) check if any other workers are idle
if so, delegate to that worker
if not, current worker continues recursing that branch
With this algorithm, the finite count of workers quickly become saturated with work. Any workers which finish early with their branch - will quickly be delegated a sub-branch from another worker. Eventually all workers will run out of sub-branches, at which point all workers will be idled (blocked) and the recursion task can finish up.
Some careful coordination is needed to achieve this. Allowing the workers to write to the input channel helps with this peer coordination via delegation. A "recursion depth" WaitGroup is used to track when all branches have been exhausted across all workers.
(To include context support and error chaining - I updated your getSubInputs function to take a ctx and return an optional error):
func recFunc(ctx context.Context, input string, in chan string, out chan<- string, rwg *sync.WaitGroup) error {
defer rwg.Done() // decrement recursion count when a depth of recursion has completed
subInputs, err := getSubInputs(ctx, input)
if err != nil {
return err
}
for subInput := range subInputs {
rwg.Add(1) // about to recurse (or delegate recursion)
select {
case in <- subInput:
// delegated - to another goroutine
case <-ctx.Done():
// context canceled...
// but first we need to undo the earlier `rwg.Add(1)`
// as this work item was never delegated or handled by this worker
rwg.Done()
return ctx.Err()
default:
// noone available to delegate - so this worker will need to recurse this item themselves
err = recFunc(ctx, subInput, in, out, rwg)
if err != nil {
return err
}
}
select {
case <-ctx.Done():
// always check context when doing anything potentially blocking (in this case writing to `out`)
// context canceled
return ctx.Err()
case out <- subInput:
}
}
return nil
}
Connecting the Pieces:
recJob creates:
input & output channels - shared by all workers
"recursion" WaitGroup detects when all workers are idle
"output" channel can then safely be closed
error channel for all workers
kicks-off recursion workload by writing initial input to input channel
func recJob(ctx context.Context, workers int, input string) (resultsC <-chan string, errC <-chan error) {
// RW channels
out := make(chan string)
eC := make(chan error, 1)
// R-only channels returned to caller
resultsC, errC = out, eC
// create workers + waitgroup logic
go func() {
var err error // error that will be returned to call via error channel
defer func() {
close(out)
eC <- err
close(eC)
}()
var wg sync.WaitGroup
wg.Add(1)
in := make(chan string) // input channel: shared by all workers (to read from and also to write to when they need to delegate)
workerErrC := createWorkers(ctx, workers, in, out, &wg)
// get the ball rolling, pass input job to one of the workers
// Note: must be done *after* workers are created - otherwise deadlock
in <- input
errCount := 0
// wait for all worker error codes to return
for err2 := range workerErrC {
if err2 != nil {
log.Println("worker error:", err2)
errCount++
}
}
// all workers have completed
if errCount > 0 {
err = fmt.Errorf("PARTIAL RESULT: %d of %d workers encountered errors", errCount, workers)
return
}
log.Printf("All %d workers have FINISHED\n", workers)
}()
return
}
Finally, create the workers:
func createWorkers(ctx context.Context, workers int, in chan string, out chan<- string, rwg *sync.WaitGroup) (errC <-chan error) {
eC := make(chan error) // RW-version
errC = eC // RO-version (returned to caller)
// track the completeness of the workers - so we know when to wrap up
var wg sync.WaitGroup
wg.Add(workers)
for i := 0; i < workers; i++ {
i := i
go func() {
defer wg.Done()
var err error
// ensure the current worker's return code gets returned
// via the common workers' error-channel
defer func() {
if err != nil {
log.Printf("worker #%3d ERRORED: %s\n", i+1, err)
} else {
log.Printf("worker #%3d FINISHED.\n", i+1)
}
eC <- err
}()
log.Printf("worker #%3d STARTED successfully\n", i+1)
// worker scans for input
for input := range in {
err = recFunc(ctx, input, in, out, rwg)
if err != nil {
log.Printf("worker #%3d recurseManagers ERROR: %s\n", i+1, err)
return
}
}
}()
}
go func() {
rwg.Wait() // wait for all recursion to finish
close(in) // safe to close input channel as all workers are blocked (i.e. no new inputs)
wg.Wait() // now wait for all workers to return
close(eC) // finally, signal to caller we're truly done by closing workers' error-channel
}()
return
}
I can start a separate go function for this, but how do I ever quit this separate go function?
You can range over the output channel in the separate go-routine. The go-routine, in that case, will exit safely, when the channel is closed
go func() {
for nextResult := range outputChannel {
result = append(result, nextResult ...)
}
}
So, now the thing that we need to take care of is that the channel is closed after all the go-routines spawned as part of the recursive function call have successfully existed
For that, you can use a shared waitgroup across all the go-routines and wait on that waitgroup in your main function, as you are already doing. Once the wait is over, close the outputChannel, so that the other go-routine also exits safely
func recFunc(input string, outputChannel chan, wg &sync.WaitGroup) {
defer wg.Done()
for subInput := range getSubInputs(input) {
wg.Add(1)
go recFunc(subInput)
}
outputChannel <-getOutput(input)
}
func main() {
outputChannel := make(chan []string)
waitGroup := sync.WaitGroup{}
waitGroup.Add(1)
go recFunc("some_input", outputChannel, &waitGroup)
result := []string{}
go func() {
for nextResult := range outputChannel {
result = append(result, nextResult ...)
}
}
waitGroup.Wait()
close(outputChannel)
}
PS: If you want to have bounded parallelism to limit the exponential growth, check this out

Why is there a fatal error: all goroutines are asleep - deadlock! in this code?

This is in reference to following code in The Go Programming Language - Chapter 8 p.238 copied below from this link
// makeThumbnails6 makes thumbnails for each file received from the channel.
// It returns the number of bytes occupied by the files it creates.
func makeThumbnails6(filenames <-chan string) int64 {
sizes := make(chan int64)
var wg sync.WaitGroup // number of working goroutines
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb) // OK to ignore error
fmt.Println(info.Size())
sizes <- info.Size()
}(f)
}
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
return total
}
Why do we need to put the closer in a goroutine? Why can't below work?
// closer
// go func() {
fmt.Println("waiting for reset")
wg.Wait()
fmt.Println("closing sizes")
close(sizes)
// }()
If I try running above code it gives:
waiting for reset
3547
2793
fatal error: all goroutines are asleep - deadlock!
Why is there a deadlock in above? fyi, In the method that calls makeThumbnail6 I do close the filenames channel
Your channel is unbuffered (you didn't specify any buffer size when make()ing the channel). This means that a write to the channel blocks until the value written is read. And you read from the channel after your call to wg.Wait(), so nothing ever gets read and all your goroutines get stuck on the blocking write.
That said, you do not need WaitGroup here. WaitGroups are good when you don't know when your goroutine is done, but you are sending results back, so you know. Here is a sample code that does a similar thing to what you are trying to do (with fake worker payload).
package main
import (
"fmt"
"time"
)
func main() {
var procs int = 0
filenames := []string{"file1", "file2", "file3", "file4"}
mychan := make(chan string)
for _, f := range filenames {
procs += 1
// worker
go func(f string) {
fmt.Printf("Worker processing %v\n", f)
time.Sleep(time.Second)
mychan <- f
}(f)
}
for i := 0; i < procs; i++ {
select {
case msg := <-mychan:
fmt.Printf("got %v from worker channel\n", msg)
}
}
}
Test it in the playground here https://play.golang.org/p/RtMkYbAqtGO
Although it's been a while since the question was raised, I encountered the same issue. Initially my main looked like the following:
func main() {
filenames := make(chan string, len(os.Args))
for _, f := range os.Args[1:] {
filenames <- f
}
sizes := makeThumbnails6(filenames)
close(filenames)
log.Println("Total size: ", sizes)}
This version deadlocks as the call range filenames in makeThumbnails6 is synchronous, thus close(filenames) in main was never called. The channel in makeThumbnails6 is unbuffered so goroutines block when trying to send back the size.
The solution was to move close(filenames) before making the function call in main.
The code is wrong. In short, the channel sizes is unbuffered. To fix it, we need to use a buffered channel with enough capacity when creating sizes. One-liner fix is enough, as shown. Here I just made a simple assumption that 1024 is big enough.
func makeThumbnails6(filenames chan string) int64 {
sizes := make(chan int64, 1024) // CHANGE
var wg sync.WaitGroup // number of working goroutines
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb) // OK to ignore error
fmt.Println(info.Size())
sizes <- info.Size()
}(f)
}
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
return total
}

How to multiplex channel output in go

I'm looking for a solution to multiplex some channel output in go.
I have a source of data which is a read from an io.Reader that I send to a single channel. On the other side I have a websocket request handler that reads from the channel. Now it happens that two clients create a websocket connection, both reading from the same channel but each of them only getting a part of the messages.
Code example (simplified):
func (b *Bootloader) ReadLog() (<-chan []byte, error) {
if b.logCh != nil {
logrus.Warn("ReadLog called while channel already exists!")
return b.logCh, nil // This is where we get problems
}
b.logCh = make(chan []byte, 0)
go func() {
buf := make([]byte, 1024)
for {
n, err := b.p.Read(buf)
if err == nil {
msg := make([]byte, n)
copy(msg, buf[:n])
b.logCh <- msg
} else {
break
}
}
close(b.logCh)
b.logCh = nil
}()
return b.logCh, nil
}
Now when ReadLog() is called twice, the second call just returns the channel created in the first call, which leads to the problem explained above.
The question is: How to do proper multiplexing?
Is it better/easier/more ideomatic to care about the multiplexing on the sending or receiving site?
Should I hide the channel from the receiver and work with callbacks?
I'm a little stuck at the moment. Any hints are welcome.
Mutiplexing is pretty straightforward: make a slice of channels you want to multiplex to, start up a goroutine that reads from the original channel and copies each message to each channel in the slice:
// Really this should be in Bootloader but this is just an example
var consumers []chan []byte
func (b *Bootloader) multiplex() {
// We'll use a sync.once to make sure we don't start a bunch of these.
sync.Once(func(){
go func() {
// Every time a message comes over the channel...
for v := range b.logCh {
// Loop over the consumers...
for _,cons := range consumers {
// Send each one the message
cons <- v
}
}
}()
})
}

Resources