Selecting between time interval and length of channel - go

I'm here to find out the most idiomatic way to do the follow task.
Task:
Write data from a channel to a file.
Problem:
I have a channel ch := make(chan int, 100)
I need to read from the channel and write the values I read from the channel to a file. My question is basically how do I do so given that
If channel ch is full, write the values immediately
If channel ch is not full, write every 5s.
So essentially, data needs to be written to the file at least every 5s (assuming that data will be filled into the channel at least every 5s)
Whats the best way to use select, for and range to do my above task?
Thanks!

There is no such "event" as "buffer of channel is full", so you can't detect that [*]. This means you can't idiomatically solve your problem with language primitives using only 1 channel.
[*] Not entirely true: you could detect if the buffer of a channel is full by using select with default case when sending on the channel, but that requires logic from the senders, and repetitive attempts to send.
I would use another channel from which I would receive as values are sent on it, and "redirect", store the values in another channel which has a buffer of 100 as you mentioned. At each redirection you may check if the internal channel's buffer is full, and if so, do an immediate write. If not, continue to monitor the "incoming" channel and a timer channel with a select statement, and if the timer fires, do a "regular" write.
You may use len(chInternal) to check how many elements are in the chInternal channel, and cap(chInternal) to check its capacity. Note that this is "safe" as we are the only goroutine handling the chInternal channel. If there would be multiple goroutines, value returned by len(chInternal) could be outdated by the time we use it to something (e.g. comparing it).
In this solution chInternal (as its name says) is for internal use only. Others should only send values on ch. Note that ch may or may not be a buffered channel, solution works in both cases. However, you may improve efficiency if you also give some buffer to ch (so chances that senders get blocked will be lower).
var (
chInternal = make(chan int, 100)
ch = make(chan int) // You may (should) make this a buffered channel too
)
func main() {
delay := time.Second * 5
timer := time.NewTimer(delay)
for {
select {
case v := <-ch:
chInternal <- v
if len(chInternal) == cap(chInternal) {
doWrite() // Buffer is full, we need to write immediately
timer.Reset(delay)
}
case <-timer.C:
doWrite() // "Regular" write: 5 seconds have passed since last write
timer.Reset(delay)
}
}
}
If an immediate write happens (due to a "buffer full" situation), this solution will time the next "regular" write 5 seconds after this. If you don't want this and you want the 5-second regular writes be independent from the immediate writes, simply do not reset the timer following the immediate write.
An implementation of doWrite() may be as follows:
var f *os.File // Make sure to open file for writing
func doWrite() {
for {
select {
case v := <-chInternal:
fmt.Fprintf(f, "%d ", v) // Write v to the file
default: // Stop when no more values in chInternal
return
}
}
}
We can't use for ... range as that only returns when the channel is closed, but our chInternal channel is not closed. So we use a select with a default case so when no more values are in the buffer of chInternal, we return.
Improvements
Using a slice instead of 2nd channel
Since the chInternal channel is only used by us, and only on a single goroutine, we may also choose to use a single []int slice instead of a channel (reading/writing a slice is much faster than a channel).
Showing only the different / changed parts, it could look something like this:
var (
buf = make([]int, 0, 100)
)
func main() {
// ...
for {
select {
case v := <-ch:
buf = append(buf, v)
if len(buf) == cap(buf) {
// ...
}
}
func doWrite() {
for _, v := range buf {
fmt.Fprintf(f, "%d ", v) // Write v to the file
}
buf = buf[:0] // "Clear" the buffer
}
With multiple goroutines
If we stick to leave chInternal a channel, the doWrite() function may be called on another goroutine to not block the other one, e.g. go doWrite(). Since data to write is read from a channel (chInternal), this requires no further synchronization.

if you just use 5 seconds write, to increase the file write performance,
you may fill the channel any time you need,
then writer goroutine writes that data to the buffered file,
see this very simple and idiomatic sample without using timer
with just using for...range:
package main
import (
"bufio"
"fmt"
"os"
"sync"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
defer wg.Done()
defer f.Close()
defer w.Flush()
for v := range ch {
fmt.Fprintf(w, "%d ", v)
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 500000; i++ {
ch <- i // do the job
}
close(ch) // Finish the job and close output file
wg.Wait()
}
and notice the defers order.
and in case of 5 seconds write, you may add one interval timer just to flush the buffer of this file to the disk, like this:
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
ticker := time.NewTicker(5 * time.Second)
quit := make(chan struct{})
go func() {
for {
select {
case <-ticker.C:
if w.Buffered() > 0 {
fmt.Println(w.Buffered())
w.Flush()
}
case <-quit:
ticker.Stop()
return
}
}
}()
defer wg.Done()
defer f.Close()
defer w.Flush()
defer close(quit)
for v := range ch {
fmt.Fprintf(w, "%d ", v)
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 25; i++ {
ch <- i // do the job
time.Sleep(500 * time.Millisecond)
}
close(ch) // Finish the job and close output file
wg.Wait()
}
here I used time.NewTicker(5 * time.Second) for interval timer with quit channel, you may use time.AfterFunc() or time.Tick() or time.Sleep().
with some optimizations ( removing quit channel):
package main
import (
"bufio"
"fmt"
"os"
"sync"
"time"
)
var wg sync.WaitGroup
func WriteToFile(filename string, ch chan int) {
f, e := os.Create(filename)
if e != nil {
panic(e)
}
w := bufio.NewWriterSize(f, 4*1024*1024)
ticker := time.NewTicker(5 * time.Second)
defer wg.Done()
defer f.Close()
defer w.Flush()
for {
select {
case v, ok := <-ch:
if ok {
fmt.Fprintf(w, "%d ", v)
} else {
fmt.Println("done.")
ticker.Stop()
return
}
case <-ticker.C:
if w.Buffered() > 0 {
fmt.Println(w.Buffered())
w.Flush()
}
}
}
}
func main() {
ch := make(chan int, 100)
wg.Add(1)
go WriteToFile("file.txt", ch)
for i := 0; i < 25; i++ {
ch <- i // do the job
time.Sleep(500 * time.Millisecond)
}
close(ch) // Finish the job and close output file
wg.Wait()
}
I hope this helps.

Related

Running a maximum of two go routines continuously forever

I'm trying to run a function concurrently. It makes a call to my DB that may take 2-10 seconds. I would like it to continue on to the next routine once it has finished, even if the other one is still processing, but only ever want it be processing a max of 2 at a time. I want this to happen indefinitely. I feel like I'm almost there, but waitGroup forces both routines to wait until completion prior to continuing to another iteration.
const ROUTINES = 2;
for {
var wg sync.WaitGroup
_, err:= db.Exec(`Random DB Call`)
if err != nil {
panic(err)
}
ch := createRoutines(db, &wg)
wg.Add(ROUTINES)
for i := 1; i <= ROUTINES; i++ {
ch <- i
time.Sleep(2 * time.Second)
}
close(ch)
wg.Wait()
}
func createRoutines(db *sqlx.DB, wg *sync.WaitGroup) chan int {
var ch = make(chan int, 5)
for i := 0; i < ROUTINES ; i++ {
go func(db *sqlx.DB) {
defer wg.Done()
for {
_, ok := <-ch
if !ok {
return
}
doStuff(db)
}
}(db)
}
return ch
}
If you need to only have n number of goroutines running at the same time, you can have a buffered channel of size n and use that to block creating new goroutines when there is no space left, something like this
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
const ROUTINES = 2
rand.Seed(time.Now().UnixNano())
stopper := make(chan struct{}, ROUTINES)
var counter int
for {
counter++
stopper <- struct{}{}
go func(c int) {
fmt.Println("+ Starting goroutine", c)
time.Sleep(time.Duration(rand.Intn(3)) * time.Second)
fmt.Println("- Stopping goroutine", c)
<-stopper
}(counter)
}
}
In this example you see how you can only have ROUTINES number of goroutines that live 0, 1 or 2 seconds. In the output you can also see how every time one goroutine ends another one starts.
This adds an external dependency, but consider this implementation:
package main
import (
"context"
"database/sql"
"log"
"github.com/MicahParks/ctxerrpool"
)
func main() {
// Create a pool of 2 workers for database queries. Log any errors.
databasePool := ctxerrpool.New(2, func(_ ctxerrpool.Pool, err error) {
log.Printf("Failed to execute database query.\nError: %s", err.Error())
})
// Get a list of queries to execute.
queries := []string{
"SELECT first_name, last_name FROM customers",
"SELECT price FROM inventory WHERE sku='1234'",
"other queries...",
}
// TODO Make a database connection.
var db *sql.DB
for _, query := range queries {
// Intentionally shadow the looped variable for scope.
query := query
// Perform the query on a worker. If no worker is ready, it will block until one is.
databasePool.AddWorkItem(context.TODO(), func(workCtx context.Context) (err error) {
_, err = db.ExecContext(workCtx, query)
return err
})
}
// Wait for all workers to finish.
databasePool.Wait()
}

How to always get the latest value from a Go channel?

I'm starting out with Go and I'm now writing a simple program which reads out data from a sensor and puts that into a channel to do some calculations with it. I now have it working as follows:
package main
import (
"fmt"
"time"
"strconv"
)
func get_sensor_data(c chan float64) {
time.Sleep(1 * time.Second) // wait a second before sensor data starts pooring in
c <- 2.1 // Sensor data starts being generated
c <- 2.2
c <- 2.3
c <- 2.4
c <- 2.5
}
func main() {
s := 1.1
c := make(chan float64)
go get_sensor_data(c)
for {
select {
case s = <-c:
fmt.Println("the next value of s from the channel: " + strconv.FormatFloat(s, 'f', 1, 64))
default:
// no new values in the channel
}
fmt.Println(s)
time.Sleep(500 * time.Millisecond) // Do heavy "work"
}
}
This works fine, but the sensor generates a lot of data, and I'm always only interested in the latest data. With this setup however, it only reads out the next item with every loop, which means that if the channel at some point contains 20 values, the newest value only is read out after 10 seconds.
Is there a way for a channel to always only contain one value at a time, so that I always only get the data I'm interested in, and no unnecessary memory is used by the channel (although the memory is the least of my worries)?
Channels are best thought of as queues (FIFO). Therefore you can't really skip around. However there are libraries out there that do stuff like this: https://github.com/cloudfoundry/go-diodes is an atomic ring buffer that will overwrite old data. You can set a smaller size if you like.
All that being said, it doesn't sound like you need a queue (or ring buffer). You just need a mutex:
type SensorData struct{
mu sync.RWMutex
last float64
}
func (d *SensorData) Store(data float64) {
mu.Lock()
defer mu.Unlock()
d.last = data
}
func (d *SensorData) Get() float64 {
mu.RLock()
defer mu.RUnlock()
return d.last
}
This uses a RWMutex which means many things can read from it at the same time while only a single thing can write. It will store a single entry much like you said.
No. Channels are FIFO buffers, full stop. That is how channels work and their only purpose. If you only want the latest value, consider just using a single variable protected by a mutex; write to it whenever new data comes in, and whenever you read it, you will always be reading the latest value.
Channels serves a specific purpose. You might want to use a code that is inside a lock and update the variable whenever new value is to be set.
This way reciever will always get the latest value.
You cannot get that from one channel directly, but you can use one channel per value and get notified when there are new values:
package main
import (
"fmt"
"strconv"
"sync"
"time"
)
type LatestChannel struct {
n float64
next chan struct{}
mu sync.Mutex
}
func New() *LatestChannel {
return &LatestChannel{next: make(chan struct{})}
}
func (c *LatestChannel) Push(n float64) {
c.mu.Lock()
c.n = n
old := c.next
c.next = make(chan struct{})
c.mu.Unlock()
close(old)
}
func (c *LatestChannel) Get() (float64, <-chan struct{}) {
c.mu.Lock()
n := c.n
next := c.next
c.mu.Unlock()
return n, next
}
func getSensorData(c *LatestChannel) {
time.Sleep(1 * time.Second)
c.Push(2.1)
time.Sleep(100 * time.Millisecond)
c.Push(2.2)
time.Sleep(100 * time.Millisecond)
c.Push(2.3)
time.Sleep(100 * time.Millisecond)
c.Push(2.4)
time.Sleep(100 * time.Millisecond)
c.Push(2.5)
}
func main() {
s := 1.1
c := New()
_, hasNext := c.Get()
go getSensorData(c)
for {
select {
case <-hasNext:
s, hasNext = c.Get()
fmt.Println("the next value of s from the channel: " + strconv.FormatFloat(s, 'f', 1, 64))
default:
// no new values in the channel
}
fmt.Println(s)
time.Sleep(250 * time.Millisecond) // Do heavy "work"
}
}
If you do not need the notify about new value, you can try to read Channels inside channels pattern in Golang.
Try this package https://github.com/subbuv26/chanup
It allows the producer to update the channel with latest value, which replaces the latest value. And produces does not get blocked. (with this, stale values gets overridden).
So, on the consumer side, always only the latest item gets read.
import "github.com/subbuv26/chanup"
ch := chanup.GetChan()
_ := ch.Put(testType{
a: 10,
s: "Sample",
})
_ := ch.Update(testType{
a: 20,
s: "Sample2",
})
// Continue updating with latest values
...
...
// On consumer end
val := ch.Get()
// val contains latest value
There is another way to solve this problem (trick)
sender work faster: sender remove channel if channel_length > 1
go func() {
for {
msg:=strconv.Itoa(int(time.Now().Unix()))
fmt.Println("make: ",msg," at:",time.Now())
messages <- msg
if len(messages)>1{
//remove old message
<-messages
}
time.Sleep(2*time.Second)
}
}()
receiver work slower:
go func() {
for {
channLen :=len(messages)
fmt.Println("len is ",channLen)
fmt.Println("received",<-messages)
time.Sleep(10*time.Second)
}
}()
OR, we can delete old message from receiver side
(read message like delete it)
There is an elegant channel-only solution. If you're OK with adding one more channel and goroutine - you can introduce a buferless channel and a goroutine that tries to send the latest value from your channel to it:
package main
import (
"fmt"
"time"
)
func wrapLatest(ch <-chan int) <-chan int {
result := make(chan int) // important that this one i unbuffered
go func() {
defer close(result)
value, ok := <-ch
if !ok {
return
}
LOOP:
for {
select {
case value, ok = <-ch:
if !ok {
return
}
default:
break LOOP
}
}
for {
select {
case value, ok = <-ch:
if !ok {
return
}
case result <- value:
if value, ok = <-ch; !ok {
return
}
}
}
}()
return result
}
func main() {
sendChan := make(chan int, 10) // may be buffered or not
for i := 0; i < 10; i++ {
sendChan <- i
}
go func() {
for i := 10; i < 20; i++ {
sendChan <- i
time.Sleep(time.Second)
}
close(sendChan)
}()
recvChan := wrapLatest(sendChan)
for i := range recvChan {
fmt.Println(i)
time.Sleep(time.Second * 2)
}
}

Why is there a fatal error: all goroutines are asleep - deadlock! in this code?

This is in reference to following code in The Go Programming Language - Chapter 8 p.238 copied below from this link
// makeThumbnails6 makes thumbnails for each file received from the channel.
// It returns the number of bytes occupied by the files it creates.
func makeThumbnails6(filenames <-chan string) int64 {
sizes := make(chan int64)
var wg sync.WaitGroup // number of working goroutines
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb) // OK to ignore error
fmt.Println(info.Size())
sizes <- info.Size()
}(f)
}
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
return total
}
Why do we need to put the closer in a goroutine? Why can't below work?
// closer
// go func() {
fmt.Println("waiting for reset")
wg.Wait()
fmt.Println("closing sizes")
close(sizes)
// }()
If I try running above code it gives:
waiting for reset
3547
2793
fatal error: all goroutines are asleep - deadlock!
Why is there a deadlock in above? fyi, In the method that calls makeThumbnail6 I do close the filenames channel
Your channel is unbuffered (you didn't specify any buffer size when make()ing the channel). This means that a write to the channel blocks until the value written is read. And you read from the channel after your call to wg.Wait(), so nothing ever gets read and all your goroutines get stuck on the blocking write.
That said, you do not need WaitGroup here. WaitGroups are good when you don't know when your goroutine is done, but you are sending results back, so you know. Here is a sample code that does a similar thing to what you are trying to do (with fake worker payload).
package main
import (
"fmt"
"time"
)
func main() {
var procs int = 0
filenames := []string{"file1", "file2", "file3", "file4"}
mychan := make(chan string)
for _, f := range filenames {
procs += 1
// worker
go func(f string) {
fmt.Printf("Worker processing %v\n", f)
time.Sleep(time.Second)
mychan <- f
}(f)
}
for i := 0; i < procs; i++ {
select {
case msg := <-mychan:
fmt.Printf("got %v from worker channel\n", msg)
}
}
}
Test it in the playground here https://play.golang.org/p/RtMkYbAqtGO
Although it's been a while since the question was raised, I encountered the same issue. Initially my main looked like the following:
func main() {
filenames := make(chan string, len(os.Args))
for _, f := range os.Args[1:] {
filenames <- f
}
sizes := makeThumbnails6(filenames)
close(filenames)
log.Println("Total size: ", sizes)}
This version deadlocks as the call range filenames in makeThumbnails6 is synchronous, thus close(filenames) in main was never called. The channel in makeThumbnails6 is unbuffered so goroutines block when trying to send back the size.
The solution was to move close(filenames) before making the function call in main.
The code is wrong. In short, the channel sizes is unbuffered. To fix it, we need to use a buffered channel with enough capacity when creating sizes. One-liner fix is enough, as shown. Here I just made a simple assumption that 1024 is big enough.
func makeThumbnails6(filenames chan string) int64 {
sizes := make(chan int64, 1024) // CHANGE
var wg sync.WaitGroup // number of working goroutines
for f := range filenames {
wg.Add(1)
// worker
go func(f string) {
defer wg.Done()
thumb, err := thumbnail.ImageFile(f)
if err != nil {
log.Println(err)
return
}
info, _ := os.Stat(thumb) // OK to ignore error
fmt.Println(info.Size())
sizes <- info.Size()
}(f)
}
// closer
go func() {
wg.Wait()
close(sizes)
}()
var total int64
for size := range sizes {
total += size
}
return total
}

Idiomatic goroutine termination and error handling

I have a simple concurrency use case in go, and I cannot figure out an elegant solution to my problem.
I want to write a method fetchAll that queries an unspecified number of resources from remote servers in parallel. If any of the fetches fails, I want to return that first error immediately.
My initial implementation leaks goroutines:
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
func fetchAll() error {
wg := sync.WaitGroup{}
errs := make(chan error)
leaks := make(map[int]struct{})
defer fmt.Println("these goroutines leaked:", leaks)
// run all the http requests in parallel
for i := 0; i < 4; i++ {
leaks[i] = struct{}{}
wg.Add(1)
go func(i int) {
defer wg.Done()
defer delete(leaks, i)
// pretend this does an http request and returns an error
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
errs <- fmt.Errorf("goroutine %d's error returned", i)
}(i)
}
// wait until all the fetches are done and close the error
// channel so the loop below terminates
go func() {
wg.Wait()
close(errs)
}()
// return the first error
for err := range errs {
if err != nil {
return err
}
}
return nil
}
func main() {
fmt.Println(fetchAll())
}
Playground: https://play.golang.org/p/Be93J514R5
I know from reading https://blog.golang.org/pipelines that I can create a signal channel to cleanup the other threads. Alternatively, I could probably use context to accomplish it. But it seems like such a simple use case should have a simpler solution that I'm missing.
Using Error Group makes this even simpler. This automatically waits for all the supplied Go Routines to complete successfully, or cancels all those remaining in the case of any one routine returning an error (in which case that error is the one bubble back up to the caller).
package main
import (
"context"
"fmt"
"math/rand"
"time"
"golang.org/x/sync/errgroup"
)
func fetchAll(ctx context.Context) error {
errs, ctx := errgroup.WithContext(ctx)
// run all the http requests in parallel
for i := 0; i < 4; i++ {
errs.Go(func() error {
// pretend this does an http request and returns an error
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
return fmt.Errorf("error in go routine, bailing")
})
}
// Wait for completion and return the first error (if any)
return errs.Wait()
}
func main() {
fmt.Println(fetchAll(context.Background()))
}
All but one of your goroutines are leaked, because they're still waiting to send to the errs channel - you never finish the for-range that empties it. You're also leaking the goroutine who's job is to close the errs channel, because the waitgroup is never finished.
(Also, as Andy pointed out, deleting from map is not thread-safe, so that'd need protection from a mutex.)
However, I don't think maps, mutexes, waitgroups, contexts etc. are even necessary here. I'd rewrite the whole thing to just use basic channel operations, something like the following:
package main
import (
"fmt"
"math/rand"
"time"
)
func fetchAll() error {
var N = 4
quit := make(chan bool)
errc := make(chan error)
done := make(chan error)
for i := 0; i < N; i++ {
go func(i int) {
// dummy fetch
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
err := error(nil)
if rand.Intn(2) == 0 {
err = fmt.Errorf("goroutine %d's error returned", i)
}
ch := done // we'll send to done if nil error and to errc otherwise
if err != nil {
ch = errc
}
select {
case ch <- err:
return
case <-quit:
return
}
}(i)
}
count := 0
for {
select {
case err := <-errc:
close(quit)
return err
case <-done:
count++
if count == N {
return nil // got all N signals, so there was no error
}
}
}
}
func main() {
rand.Seed(time.Now().UnixNano())
fmt.Println(fetchAll())
}
Playground link: https://play.golang.org/p/mxGhSYYkOb
EDIT: There indeed was a silly mistake, thanks for pointing it out. I fixed the code above (I think...). I also added some randomness for added Realism™.
Also, I'd like to stress that there really are multiple ways to approach this problem, and my solution is but one way. Ultimately it comes down to personal taste, but in general, you want to strive towards "idiomatic" code - and towards a style that feels natural and easy to understand for you.
Here's a more complete example using errgroup suggested by joth. It shows processing successful data, and will exit on the first error.
https://play.golang.org/p/rU1v-Mp2ijo
package main
import (
"context"
"fmt"
"golang.org/x/sync/errgroup"
"math/rand"
"time"
)
func fetchAll() error {
g, ctx := errgroup.WithContext(context.Background())
results := make(chan int)
for i := 0; i < 4; i++ {
current := i
g.Go(func() error {
// Simulate delay with random errors.
time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
if rand.Intn(2) == 0 {
return fmt.Errorf("goroutine %d's error returned", current)
}
// Pass processed data to channel, or receive a context completion.
select {
case results <- current:
return nil
// Close out if another error occurs.
case <-ctx.Done():
return ctx.Err()
}
})
}
// Elegant way to close out the channel when the first error occurs or
// when processing is successful.
go func() {
g.Wait()
close(results)
}()
for result := range results {
fmt.Println("processed", result)
}
// Wait for all fetches to complete.
return g.Wait()
}
func main() {
fmt.Println(fetchAll())
}
As long as each goroutine completes, you won't leak anything. You should create the error channel as buffered with the buffer size equal to the number of goroutines so that the send operations on the channel won't block. Each goroutine should always send something on the channel when it finishes, whether it succeeds or fails. The loop at the bottom can then just iterate for the number of goroutines and return if it gets a non-nil error. You don't need the WaitGroup or the other goroutine that closes the channel.
I think the reason it appears that goroutines are leaking is that you return when you get the first error, so some of them are still running.
By the way, maps are not goroutine safe. If you share a map among goroutines and some of them are making changes to the map, you need to protect it with a mutex.
This answer includes the ability to get the responses back into doneData -
package main
import (
"fmt"
"math/rand"
"os"
"strconv"
)
var doneData []string // responses
func fetchAll(n int, doneCh chan bool, errCh chan error) {
partialDoneCh := make(chan string)
for i := 0; i < n; i++ {
go func(i int) {
if r := rand.Intn(100); r != 0 && r%10 == 0 {
// simulate an error
errCh <- fmt.Errorf("e33or for reqno=" + strconv.Itoa(r))
} else {
partialDoneCh <- strconv.Itoa(i)
}
}(i)
}
// mutation of doneData
for d := range partialDoneCh {
doneData = append(doneData, d)
if len(doneData) == n {
close(partialDoneCh)
doneCh <- true
}
}
}
func main() {
// rand.Seed(1)
var n int
var e error
if len(os.Args) > 1 {
if n, e = strconv.Atoi(os.Args[1]); e != nil {
panic(e)
}
} else {
n = 5
}
doneCh := make(chan bool)
errCh := make(chan error)
go fetchAll(n, doneCh, errCh)
fmt.Println("main: end")
select {
case <-doneCh:
fmt.Println("success:", doneData)
case e := <-errCh:
fmt.Println("failure:", e, doneData)
}
}
Execute using go run filename.go 50 where N=50 i.e amount of parallelism

Always have x number of goroutines running at any time

I see lots of tutorials and examples on how to make Go wait for x number of goroutines to finish, but what I'm trying to do is have ensure there are always x number running, so a new goroutine is launched as soon as one ends.
Specifically I have a few hundred thousand 'things to do' which is processing some stuff that is coming out of MySQL. So it works like this:
db, err := sql.Open("mysql", connection_string)
checkErr(err)
defer db.Close()
rows,err := db.Query(`SELECT id FROM table`)
checkErr(err)
defer rows.Close()
var id uint
for rows.Next() {
err := rows.Scan(&id)
checkErr(err)
go processTheThing(id)
}
checkErr(err)
rows.Close()
Currently that will launch several hundred thousand threads of processTheThing(). What I need is that a maximum of x number (we'll call it 20) goroutines are launched. So it starts by launching 20 for the first 20 rows, and from then on it will launch a new goroutine for the next id the moment that one of the current goroutines has finished. So at any point in time there are always 20 running.
I'm sure this is quite simple/standard, but I can't seem to find a good explanation on any of the tutorials or examples or how this is done.
You may find Go Concurrency Patterns article interesting, especially Bounded parallelism section, it explains the exact pattern you need.
You can use channel of empty structs as a limiting guard to control number of concurrent worker goroutines:
package main
import "fmt"
func main() {
maxGoroutines := 10
guard := make(chan struct{}, maxGoroutines)
for i := 0; i < 30; i++ {
guard <- struct{}{} // would block if guard channel is already filled
go func(n int) {
worker(n)
<-guard
}(i)
}
}
func worker(i int) { fmt.Println("doing work on", i) }
Here I think something simple like this will work :
package main
import "fmt"
const MAX = 20
func main() {
sem := make(chan int, MAX)
for {
sem <- 1 // will block if there is MAX ints in sem
go func() {
fmt.Println("hello again, world")
<-sem // removes an int from sem, allowing another to proceed
}()
}
}
Thanks to everyone for helping me out with this. However, I don't feel that anyone really provided something that both worked and was simple/understandable, although you did all help me understand the technique.
What I have done in the end is I think much more understandable and practical as an answer to my specific question, so I will post it here in case anyone else has the same question.
Somehow this ended up looking a lot like what OneOfOne posted, which is great because now I understand that. But OneOfOne's code I found very difficult to understand at first because of the passing functions to functions made it quite confusing to understand what bit was for what. I think this way makes a lot more sense:
package main
import (
"fmt"
"sync"
)
const xthreads = 5 // Total number of threads to use, excluding the main() thread
func doSomething(a int) {
fmt.Println("My job is",a)
return
}
func main() {
var ch = make(chan int, 50) // This number 50 can be anything as long as it's larger than xthreads
var wg sync.WaitGroup
// This starts xthreads number of goroutines that wait for something to do
wg.Add(xthreads)
for i:=0; i<xthreads; i++ {
go func() {
for {
a, ok := <-ch
if !ok { // if there is nothing to do and the channel has been closed then end the goroutine
wg.Done()
return
}
doSomething(a) // do the thing
}
}()
}
// Now the jobs can be added to the channel, which is used as a queue
for i:=0; i<50; i++ {
ch <- i // add i to the queue
}
close(ch) // This tells the goroutines there's nothing else to do
wg.Wait() // Wait for the threads to finish
}
Create channel for passing data to goroutines.
Start 20 goroutines that processes the data from channel in a loop.
Send the data to the channel instead of starting a new goroutine.
Grzegorz Żur's answer is the most efficient way to do it, but for a newcomer it could be hard to implement without reading code, so here's a very simple implementation:
type idProcessor func(id uint)
func SpawnStuff(limit uint, proc idProcessor) chan<- uint {
ch := make(chan uint)
for i := uint(0); i < limit; i++ {
go func() {
for {
id, ok := <-ch
if !ok {
return
}
proc(id)
}
}()
}
return ch
}
func main() {
runtime.GOMAXPROCS(4)
var wg sync.WaitGroup //this is just for the demo, otherwise main will return
fn := func(id uint) {
fmt.Println(id)
wg.Done()
}
wg.Add(1000)
ch := SpawnStuff(10, fn)
for i := uint(0); i < 1000; i++ {
ch <- i
}
close(ch) //should do this to make all the goroutines exit gracefully
wg.Wait()
}
playground
This is a simple producer-consumer problem, which in Go can be easily solved using channels to buffer the paquets.
To put it simple: create a channel that accept your IDs. Run a number of routines which will read from the channel in a loop then process the ID. Then run your loop that will feed IDs to the channel.
Example:
func producer() {
var buffer = make(chan uint)
for i := 0; i < 20; i++ {
go consumer(buffer)
}
for _, id := range IDs {
buffer <- id
}
}
func consumer(buffer chan uint) {
for {
id := <- buffer
// Do your things here
}
}
Things to know:
Unbuffered channels are blocking: if the item wrote into the channel isn't accepted, the routine feeding the item will block until it is
My example lack a closing mechanism: you must find a way to make the producer to wait for all consumers to end their loop before returning. The simplest way to do this is with another channel. I let you think about it.
I've wrote a simple package to handle concurrency for Golang. This package will help you limit the number of goroutines that are allowed to run concurrently:
https://github.com/zenthangplus/goccm
Example:
package main
import (
"fmt"
"goccm"
"time"
)
func main() {
// Limit 3 goroutines to run concurrently.
c := goccm.New(3)
for i := 1; i <= 10; i++ {
// This function have to call before any goroutine
c.Wait()
go func(i int) {
fmt.Printf("Job %d is running\n", i)
time.Sleep(2 * time.Second)
// This function have to when a goroutine has finished
// Or you can use `defer c.Done()` at the top of goroutine.
c.Done()
}(i)
}
// This function have to call to ensure all goroutines have finished
// after close the main program.
c.WaitAllDone()
}
Also can take a look here: https://github.com/LiangfengChen/goutil/blob/main/concurrent.go
The example can refer the test case.
func TestParallelCall(t *testing.T) {
format := "test:%d"
data := make(map[int]bool)
mutex := sync.Mutex{}
val, err := ParallelCall(1000, 10, func(pos int) (interface{}, error) {
mutex.Lock()
defer mutex.Unlock()
data[pos] = true
return pos, errors.New(fmt.Sprintf(format, pos))
})
for i := 0; i < 1000; i++ {
if _, ok := data[i]; !ok {
t.Errorf("TestParallelCall pos not found: %d", i)
}
if val[i] != i {
t.Errorf("TestParallelCall return value is not right (%d,%v)", i, val[i])
}
if err[i].Error() != fmt.Sprintf(format, i) {
t.Errorf("TestParallelCall error msg is not correct (%d,%v)", i, err[i])
}
}
}

Resources