Concurrently process a lot of files and upload to S3 in Go

I'm migrating a lot of files that are currently stored in a relational database to Amazon S3. I'm using Go because I had heard about its concurrency support, but I'm getting very low throughput. I'm new to Go, so I'm probably not doing it in the best way possible.
This is what I have at the moment:
type Attachment struct {
BinaryData []byte `db:"BinaryData"`
CreatedAt time.Time `db:"CreatedDT"`
Id int `db:"Id"`
}
func main() {
connString := os.Getenv("CONNECTION_STRING")
log.SetFlags(log.Ltime)
db, err := sqlx.Connect("sqlserver", connString)
if err != nil {
panic(err)
}
log.Print("Connected to database")
sql := "SELECT TOP 1000 Id,CreatedDT, BinaryData FROM Attachment"
attachmentsDb := []Attachment{}
err = db.Select(&attachmentsDb, sql)
if err != nil {
log.Fatal(err)
}
session, err := session.NewSession(&aws.Config{
Region: aws.String("eu-west-1"),
})
if err != nil {
log.Fatal(err)
return
}
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
err := <-saveAttachment(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done
}
wg.Wait()
//close(in)
defer db.Close()
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
func getKey(att *Attachment) string {
return fmt.Sprintf("%s/%d", os.Getenv("KEY"), att.Id)
}

This loop executes sequentially: on every iteration it waits for the result from the done channel, so there isn't any benefit from running multiple goroutines. There is also no need to create a new goroutine in saveAttachment(), because you already create one in the loop.
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
//New goroutine
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
//Already in a goroutine now, but in func saveAttachment() will create a new goroutine?
err := <-saveAttachment(&att, svc) //There is a goroutine created in this func
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done //This blocks until it receives the result; only then does the next iteration continue
}
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
//Why new goroutine?
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
If you want to upload in parallel, don't block on each goroutine inside the loop. You can quickly fix it like this:
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Number of goroutines = number of attachments
for _, att := range attachmentsDb {
wg.Add(1)
//One goroutine uploads each Attachment
go func(wg *sync.WaitGroup, att Attachment) {
err := saveAtt(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
}(wg, att)
//No blocking after creating a goroutine; the loop continues to create the next one
}
wg.Wait()
fmt.Println("done")
}
//This func will be executed in goroutine, so no need to create a goroutine inside it
func saveAtt(att *Attachment, svc *s3.S3) error {
bucket := os.Getenv("BUCKET")
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
return err
}
But this approach isn't good when there are very many attachments, because the number of goroutines equals the number of attachments. In that case you need a goroutine pool so you can limit the number of goroutines running at once.
Warning: this is just an example to show the goroutine pool logic; you need to adapt it to your own code.
//....
//Create an attachment queue
queue := make(chan *Attachment) //Or use a buffered channel: queue := make(chan *Attachment, bufferedSize)
//Producer: send all attachments to the queue, then close it
go func() {
//Closing the queue once everything has been sent ends the workers' `for att := range queue` loops
defer close(queue)
for i := range attachmentsDb {
//Send the address of the slice element, not &att of a range variable:
//before Go 1.22, &att would point to the same variable on every iteration
queue <- &attachmentsDb[i]
}
}()
//....
//Create a goroutine pool
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Use this as a const
workerCount := 5
//Number of goroutines = workerCount
for i := 1; i <= workerCount; i++ {
wg.Add(1)
//New goroutine
go func() {
defer wg.Done()
//Take attachments from the queue and upload them. The loop blocks while the queue is empty
//and ends once the queue is closed and drained
for att := range queue {
err := saveAtt(att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
}
}()
}
//Wait until every worker has finished, i.e. all attachments were processed
wg.Wait()
//....

Related

all goroutines are asleep in my async code

I read this and this and this, but none of them solved my issue.
I'm trying to read two files asynchronously, so I wrote the following:
//readlines.go
package main
import (
"bufio"
"os"
)
// readLines reads a whole file into memory
// and returns a slice of its lines.
func readLines(path string) ([]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
var lines []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines = append(lines, scanner.Text())
}
return lines, scanner.Err()
}
And calling it as:
package main
import (
"fmt"
"os"
"github.com/gocarina/gocsv"
)
func (s *stocks) Read() {
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
panic(err)
}
*s = stocks
}
package main
import (
"fmt"
"os"
"github.com/gocarina/gocsv"
)
func (t *transactions) Read() {
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load stocks from file
panic(err)
}
*t = trx
}
The above works fine with:
stock := stocks{}
trx := transactions{}
stock.Read()
trx.Read()
for _, s := range stock {
fmt.Println("Hello", s.Code)
}
But it gives the error "fatal error: all goroutines are asleep - deadlock!" when I try to read them as:
cs, ct := readData()
for _, s := range cs {
fmt.Println("Hello", s.Code)
}
for _, t := range ct {
fmt.Println("Hello trx of ", t.Code)
}
using:
import "sync"
//func readData(cs chan stocks, ct chan transactions) (stocks, transactions) {
func readData() (stocks, transactions) {
var wg sync.WaitGroup
defer wg.Done()
stock := stocks{}
trx := transactions{}
wg.Add(1)
go stock.Read()
wg.Add(1)
go trx.Read()
wg.Wait()
return stock, trx
}
So the error must be related to something I did wrong (or do not understand) in the last block.

To run the Read methods for stocks and transactions concurrently, these methods need to have a way of signaling when they are finished executing. This can be done in a lot of ways, but here are two which require the least modifications to your code.
Solution 1
Use sync.WaitGroup from the sync package. With it, each Read method calls wg.Done() when it has finished executing. It should look something like this:
func (s *stocks) Read(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
panic(err)
}
*s = stocks
}
func (t *transactions) Read(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load transactions from file
panic(err)
}
*t = trx
}
func readData() (stocks, transactions) {
var wg sync.WaitGroup
wg.Add(2)
stock := stocks{}
trx := transactions{}
go stock.Read(&wg)
go trx.Read(&wg)
wg.Wait()
return stock, trx
}
Solution 2
This approach uses the golang.org/x/sync/errgroup package. In this case you do not need to handle the synchronization and signaling yourself, but functions passed to the errgroup Group's Go method must have the signature func() error. Your code should look like this:
func (s *stocks) Read() error {
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
return err
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
return err
}
*s = stocks
return nil
}
func (t *transactions) Read() error {
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
return err
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load transactions from file
return err
}
*t = trx
return nil
}
func readData() (stocks, transactions) {
g, _ := errgroup.WithContext(context.Background())
stock := stocks{}
trx := transactions{}
g.Go(stock.Read)
g.Go(trx.Read)
if err:= g.Wait(); err != nil {
panic(err)
}
return stock, trx
}
Solution 3
You’re (correctly) adding 1 to the wait group when you start reading from each CSV, bringing the wait group’s internal counter to 2, but wg.Wait() will wait until that counter goes down to zero and you don’t have any calls to wg.Done() to do that. I recommend changing go stock.Read() to:
go func() {
defer wg.Done()
stock.Read()
}()
So the full working code would be:
func readData() (stocks, transactions) {
var wg sync.WaitGroup
stock := stocks{}
trx := transactions{}
wg.Add(1)
go func() {
defer wg.Done()
stock.Read()
}()
wg.Add(1)
go func() {
defer wg.Done()
trx.Read()
}()
wg.Wait()
return stock, trx
}

Optimize writing to CSV in Go

The following snippet validates a phone number and writes the details to a CSV file.
func Parse(phone Input, output *PhoneNumber) error {
var n PhoneNumber
num, _ := phonenumbers.Parse(phone.Number, phone.Prefix)
n.PhoneNumber = phonenumbers.Format(num, phonenumbers.E164)
n.CountryCode = num.GetCountryCode()
n.PhoneType = phonenumbers.GetNumberType(num)
n.NetworkName, _ = phonenumbers.GetCarrierForNumber(num, "EN")
n.Region = phonenumbers.GetRegionCodeForNumber(num)
*output = n
return nil
}
func createFile(path string) {
// detect if file exists
var _, err = os.Stat(path)
// create file if not exists
if os.IsNotExist(err) {
var file, err = os.Create(path)
if err != nil {
return
}
defer file.Close()
}
}
func worker(ctx context.Context, dst chan string, src chan []string) {
for {
select {
case dataArray, ok := <-src: // you must check for readable state of the channel.
if !ok {
return
}
go processNumber(dataArray[0])
case <-ctx.Done(): // if the context is cancelled, quit.
return
}
}
}
func processNumber(number string) {
num, e := phonenumbers.Parse(number, "")
if e != nil {
return
}
region := phonenumbers.GetRegionCodeForNumber(num)
carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
path := "sample_all.csv"
createFile(path)
var csvFile, _ = os.OpenFile(path, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
csvwriter := csv.NewWriter(csvFile)
_ = csvwriter.Write([]string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region})
defer csvFile.Close()
csvwriter.Flush()
}
func ParseFile(phone Input, output *PhoneNumber) error {
// create a context
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// that cancels at ctrl+C
go onSignal(os.Interrupt, cancel)
numberOfWorkers := 2
start := time.Now()
csvfile, err := os.Open(phone.File)
if err != nil {
log.Fatal(err)
}
defer csvfile.Close()
reader := csv.NewReader(csvfile)
// create the pair of input/output channels for the controller=>workers com.
src := make(chan []string)
out := make(chan string)
// use a waitgroup to manage synchronization
var wg sync.WaitGroup
// declare the workers
for i := 0; i < numberOfWorkers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
worker(ctx, out, src)
}()
}
// read the csv and write it to src
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
src <- record // you might select on ctx.Done().
}
close(src) // close src to signal workers that no more job are incoming.
}()
// wait for worker group to finish and close out
go func() {
wg.Wait() // wait for writers to quit.
close(out) // when you close(out) it breaks the below loop.
}()
// drain the output
for res := range out {
fmt.Println(res)
}
fmt.Printf("\n%2fs", time.Since(start).Seconds())
return nil
}
In the processNumber function, if I skip writing to the CSV file, verifying the numbers completes in 6 seconds, but writing one record at a time to the CSV file stretches the time to 15 seconds.
How can I optimize the code?
Can I chunk the records and write them in chunks instead of writing one row at a time?

Do the work directly in the worker goroutine instead of firing off a goroutine per task.
Open the output file once, and flush it once.
func worker(ctx context.Context, dst chan []string, src chan []string) {
for {
select {
case dataArray, ok := <-src: // you must check for readable state of the channel.
if !ok {
return
}
// do the work in this goroutine and skip numbers that fail to parse
if rec := processNumber(dataArray[0]); rec != nil {
dst <- rec
}
case <-ctx.Done(): // if the context is cancelled, quit.
return
}
}
}
// processNumber returns the CSV row for number, or nil if it cannot be parsed.
func processNumber(number string) []string {
num, e := phonenumbers.Parse(number, "")
if e != nil {
return nil
}
region := phonenumbers.GetRegionCodeForNumber(num)
carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
return []string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region}
}
func ParseFile(phone Input, output *PhoneNumber) error {
// create a context
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// that cancels at ctrl+C
go onSignal(os.Interrupt, cancel)
numberOfWorkers := 2
start := time.Now()
csvfile, err := os.Open(phone.File)
if err != nil {
log.Fatal(err)
}
defer csvfile.Close()
reader := csv.NewReader(csvfile)
// create the pair of input/output channels for the controller=>workers com.
src := make(chan []string)
out := make(chan []string) // results are CSV rows, matching worker's dst parameter
// use a waitgroup to manage synchronization
var wg sync.WaitGroup
// declare the workers
for i := 0; i < numberOfWorkers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
worker(ctx, out, src)
}()
}
// read the csv and write it to src
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
src <- record // you might select on ctx.Done().
}
close(src) // close src to signal workers that no more job are incoming.
}()
// wait for worker group to finish and close out
go func() {
wg.Wait() // wait for writers to quit.
close(out) // when you close(out) it breaks the below loop.
}()
path := "sample_all.csv"
file, err := os.Create(path)
if err != nil {
return err
}
defer file.Close()
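csvwriter := csv.NewWriter(file)
// drain the output; csv.Writer buffers rows internally
for res := range out {
csvwriter.Write(res)
}
// flush once at the end, then report any error from the buffered writes
csvwriter.Flush()
if err := csvwriter.Error(); err != nil {
return err
}
fmt.Printf("\n%.2fs", time.Since(start).Seconds())
return nil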
}

Go routines with kafka consumer channel and context

I have a simple Kafka consumer for which I have created a handle, and I'm trying to read from it using a goroutine:
func process(ctx context.Context){
consumer := queueHandle.Consume(topic_ops_req, consumerHandler)
// Get signal for finish
doneCh := make(chan struct{})
go func(consumer chan *sarama.ConsumerMessage, ctx context.Context) {
for {
select {
case msg, ok := <-consumer:
if !ok {
logger.Info("Channel has been closed")
doneCh <- struct{}{}
return
}
var request queue.Request
err := json.Unmarshal(msg.Value, &request)
if err != nil {
logger.Error("consumer unmarshal err", err)
panic(err)
}
res, err := new_process(ctx, request, service) // call another func
if err != nil {
//TODO
}
result = res
doneCh <- struct{}{}
case <-ctx.Done():
logger.Info(fmt.Sprintf("Context ended with err : %s", ctx.Err()))
doneCh <- struct{}{}
}
}
}(consumer, ctx)
<-doneCh
}
The issue I am seeing is that once I introduce the "case <-ctx.Done()", the goroutine never enters the "case msg, ok := <-consumer" branch and always reports that the context ended. How do I make my goroutine work with both the consumer channel and ctx.Done()?

Confusion regarding channel directions and blocking in Go

In a function definition, if a channel is an argument without a direction, does it have to send or receive something?
func makeRequest(url string, ch chan<- string, results chan<- string) {
start := time.Now()
resp, err := http.Get(url)
defer resp.Body.Close()
if err != nil {
fmt.Printf("%v", err)
}
resp, err = http.Post(url, "text/plain", bytes.NewBuffer([]byte("Hey")))
defer resp.Body.Close()
secs := time.Since(start).Seconds()
if err != nil {
fmt.Printf("%v", err)
}
// Cannot move past this.
ch <- fmt.Sprintf("%f", secs)
results <- <- ch
}
func MakeRequestHelper(url string, ch chan string, results chan string, iterations int) {
for i := 0; i < iterations; i++ {
makeRequest(url, ch, results)
}
for i := 0; i < iterations; i++ {
fmt.Println(<-ch)
}
}
func main() {
args := os.Args[1:]
threadString := args[0]
iterationString := args[1]
url := args[2]
threads, err := strconv.Atoi(threadString)
if err != nil {
fmt.Printf("%v", err)
}
iterations, err := strconv.Atoi(iterationString)
if err != nil {
fmt.Printf("%v", err)
}
channels := make([]chan string, 100)
for i := range channels {
channels[i] = make(chan string)
}
// results aggregate all the things received by channels in all goroutines
results := make(chan string, iterations*threads)
for i := 0; i < threads; i++ {
go MakeRequestHelper(url, channels[i], results, iterations)
}
resultSlice := make([]string, threads*iterations)
for i := 0; i < threads*iterations; i++ {
resultSlice[i] = <-results
}
}
In the above code,
ch <- or <-results
seems to be blocking every goroutine that executes makeRequest.
I am new to Go's concurrency model. I understand that sending to and receiving from a channel can block, but I find it difficult to tell what is blocking what in this code.

I'm not really sure what you are doing... It seems really convoluted. I suggest you read up on how to use channels:
https://tour.golang.org/concurrency/2
That being said, you have so much going on in your code that it was much easier to just gut it into something simpler (it can be simplified further). I left comments to help you understand the code.
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
"sync"
"time"
)
// using structs is a nice way to organize your code
type Worker struct {
wg sync.WaitGroup
semaphore chan struct{}
result chan Result
client http.Client
}
// group returns so that you don't have to send to many channels
type Result struct {
duration float64
results string
}
// closing your channels will stop the for loop in main
func (w *Worker) Close() {
close(w.semaphore)
close(w.result)
}
func (w *Worker) MakeRequest(url string) {
// a semaphore is a simple way to rate limit the amount of goroutines running at any single point of time
// google them, Go uses them often
w.semaphore <- struct{}{}
defer func() {
w.wg.Done()
<-w.semaphore
}()
start := time.Now()
resp, err := w.client.Get(url)
if err != nil {
log.Println("error", err)
return
}
defer resp.Body.Close()
// don't have any examples where I need to also POST anything but the point should be made
// resp, err = http.Post(url, "text/plain", bytes.NewBuffer([]byte("Hey")))
// if err != nil {
// log.Println("error", err)
// return
// }
// defer resp.Body.Close()
secs := time.Since(start).Seconds()
b, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Println("error", err)
return
}
w.result <- Result{duration: secs, results: string(b)}
}
func main() {
urls := []string{"https://facebook.com/", "https://twitter.com/", "https://google.com/", "https://youtube.com/", "https://linkedin.com/", "https://wordpress.org/",
"https://instagram.com/", "https://pinterest.com/", "https://wikipedia.org/", "https://wordpress.com/", "https://blogspot.com/", "https://apple.com/",
}
workerNumber := 5
worker := Worker{
semaphore: make(chan struct{}, workerNumber),
result: make(chan Result),
client: http.Client{Timeout: 5 * time.Second},
}
// use a sync.WaitGroup to allow your code to wait for
// all your goroutines to finish
for _, url := range urls {
worker.wg.Add(1)
go worker.MakeRequest(url)
}
// by running wait and close in a separate goroutine
// I can get to the for loop below and iterate on the results
// while the requests are still running
go func() {
worker.wg.Wait()
worker.Close()
}()
// do something with the results channel
for res := range worker.result {
fmt.Printf("Request took %.2f seconds.\nResults: %s\n\n", res.duration, res.results)
}
}

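In makeRequest, ch is unbuffered, and the same goroutine first sends on it (ch <- ...) and only afterwards receives from it (results <- <-ch). A send on an unbuffered channel completes only when another goroutine receives, and MakeRequestHelper calls makeRequest synchronously, so nothing is receiving: every goroutine blocks at that send. (Likewise, if the channels in channels were nil, i.e. the slice made but not the channels, any send or receive on them would block forever.) I'm not sure exactly what you're trying to do here, but that's the basic problem.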
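On the directions question: a parameter declared as plain chan T is bidirectional, and nothing requires the function to use it for sending or receiving at all; chan<- T restricts it to sends and <-chan T to receives, and a bidirectional channel converts implicitly to either when passed in. The sketch below shows which sends can proceed immediately. trySend is a hypothetical helper whose select with a default case never blocks.

```go
package main

import "fmt"

// trySend reports whether a send on ch could proceed right now.
// ch is send-only inside this function; the default case means it never blocks.
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	var nilCh chan string            // nil channel: a send blocks forever
	unbuffered := make(chan string)  // no receiver waiting: a send blocks
	buffered := make(chan string, 1) // room for one value: a send proceeds

	fmt.Println(trySend(nilCh, "a"))      // false
	fmt.Println(trySend(unbuffered, "a")) // false
	fmt.Println(trySend(buffered, "a"))   // true
	fmt.Println(<-buffered)               // a
}
```

The unbuffered case is the situation in makeRequest: with no other goroutine receiving, the send cannot complete.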
See https://golang.org/doc/effective_go.html#channels for an explanation of how channels work.

Go channel infinite loop

I am trying to catch errors from a group of goroutines using a channel, but the goroutine reading the channel enters an infinite loop and starts consuming CPU.
func UnzipFile(f *bytes.Buffer, location string) error {
zipReader, err := zip.NewReader(bytes.NewReader(f.Bytes()), int64(f.Len()))
if err != nil {
return err
}
if err := os.MkdirAll(location, os.ModePerm); err != nil {
return err
}
errorChannel := make(chan error)
errorList := []error{}
go errorChannelWatch(errorChannel, errorList)
fileWaitGroup := &sync.WaitGroup{}
for _, file := range zipReader.File {
fileWaitGroup.Add(1)
go writeZipFileToLocal(file, location, errorChannel, fileWaitGroup)
}
fileWaitGroup.Wait()
close(errorChannel)
log.Println(errorList)
return nil
}
func errorChannelWatch(ch chan error, list []error) {
for {
select {
case err := <- ch:
list = append(list, err)
}
}
}
func writeZipFileToLocal(file *zip.File, location string, ch chan error, wg *sync.WaitGroup) {
defer wg.Done()
zipFilehandle, err := file.Open()
if err != nil {
ch <- err
return
}
defer zipFilehandle.Close()
if file.FileInfo().IsDir() {
if err := os.MkdirAll(filepath.Join(location, file.Name), os.ModePerm); err != nil {
ch <- err
}
return
}
localFileHandle, err := os.OpenFile(filepath.Join(location, file.Name), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
if err != nil {
ch <- err
return
}
defer localFileHandle.Close()
if _, err := io.Copy(localFileHandle, zipFilehandle); err != nil {
ch <- err
return
}
ch <- fmt.Errorf("Test error")
}
So I am looping over a slice of files and writing them to disk; when there is an error, I report it to the errorChannel to save that error into a slice.
I use a sync.WaitGroup to wait for all goroutines and when they are done I want to print errorList and check if there was any error during the execution.
The list is always empty, even if I add ch <- fmt.Errorf("test") at the end of writeZipFileToLocal and the channel always hangs up.
I am not sure what I am missing here.

1. For the first point, the infinite loop:
Quoting the Go language specification:
A receive operation on a closed channel can always proceed immediately, yielding the element type's zero value after any previously sent values have been received.
So in this function
func errorChannelWatch(ch chan error, list []error) {
for {
select {
case err := <- ch:
list = append(list, err)
}
}
}
after ch gets closed this turns into an infinite loop adding nil values to list.
Try this instead:
func errorChannelWatch(ch chan error, list []error) {
for err := range ch {
list = append(list, err)
}
}
2. For the second point, why you don't see anything in your error list:
The problem is this call:
errorChannel := make(chan error)
errorList := []error{}
go errorChannelWatch(errorChannel, errorList)
Here you hand errorChannelWatch the errorList as a value, so the slice errorList will not be changed by the function. What is changed is the underlying array, and only as long as the append calls don't need to allocate a new one.
To remedy the situation, either hand a slice pointer to errorChannelWatch or rewrite it as a closure that captures errorList.
For the first proposed solution, change errorChannelWatch to
func errorChannelWatch(ch chan error, list *[]error) {
for err := range ch {
*list = append(*list, err)
}
}
and the call to
errorChannel := make(chan error)
errorList := []error{}
go errorChannelWatch(errorChannel, &errorList)
For the second proposed solution, just change the call to
errorChannel := make(chan error)
errorList := []error{}
go func() {
for err := range errorChannel {
errorList = append(errorList, err)
}
} ()
3. A minor remark:
One might think that there is a synchronisation problem here:
fileWaitGroup.Wait()
close(errorChannel)
log.Println(errorList)
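How can you be sure that errorList isn't modified after the call to close? One could reason that you can't know how many values the goroutine errorChannelWatch still has to process.
Because wg.Done() is deferred, it runs after the send to the error channel, so all error values have been sent when fileWaitGroup.Wait() returns. However, a completed send on an unbuffered channel only guarantees that the receiver has taken the value, not that it has finished appending it, so the last append can still race with log.Println(errorList). Adding a buffer to the channel, or otherwise altering the code later, would widen that gap.
So I would advise having the collecting goroutine signal when it has drained the channel (for example by closing a done channel that you wait on after close(errorChannel)), and explaining the synchronisation in a comment.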
