all goroutines are asleep in my async code - go

I read this and this and this but none of them solving my issue..
I'm trying to read 2 files async, so I wrote the below:
//readlines.go
package main
import (
"bufio"
"os"
)
// readLines reads a whole file into memory
// and returns a slice of its lines.
func readLines(path string) ([]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
var lines []string
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines = append(lines, scanner.Text())
}
return lines, scanner.Err()
}
And calling it as:
package main
import (
"fmt"
"os"
"github.com/gocarina/gocsv"
)
func (s *stocks) Read() {
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
panic(err)
}
*s = stocks
}
package main
import (
"fmt"
"os"
"github.com/gocarina/gocsv"
)
func (t *transactions) Read() {
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load stocks from file
panic(err)
}
*t = trx
}
The above working very fine with:
stock := stocks{}
trx := transactions{}
stock.Read()
trx.Read()
for _, s := range stock {
fmt.Println("Hello", s.Code)
}
But give the error fatal error: all goroutines are asleep - deadlock! when I tried to read them as:
cs, ct := readData()
for _, s := range cs {
fmt.Println("Hello", s.Code)
}
for _, t := range ct {
fmt.Println("Hello trx of ", t.Code)
}
Using
import "sync"
//func readData(cs chan stocks, ct chan transactions) (stocks, transactions) {
func readData() (stocks, transactions) {
var wg sync.WaitGroup
defer wg.Done()
stock := stocks{}
trx := transactions{}
wg.Add(1)
go stock.Read()
wg.Add(1)
go trx.Read()
wg.Wait()
return stock, trx
}
So the error is related for something wrong I made (or do not understand) in the last block~

To run the Read methods for stocks and transactions concurrently, these methods need to have a way of signaling when they are finished executing. This can be done in a lot of ways, but here are two which require the least modifications to your code.
Solution 1
Use the sync.WaitGroup package. With this package, the Read methods should execute wg.Done() statement when they are done with executing. It should look something like this:
func (s *stocks) Read(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
panic(err)
}
*s = stocks
}
func (t *transactions) Read(wg *sync.WaitGroup) {
defer wg.Done()
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
panic(err)
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load stocks from file
panic(err)
}
*t = trx
}
func readData() (stocks, transactions) {
var wg sync.WaitGroup
wg.Add(2)
stock := stocks{}
trx := transactions{}
go stock.Read(&wg)
go trx.Read(&wg)
wg.Wait()
return stock, trx
}
Solution 2
This approach uses the golang.org/x/sync/errgroup package. In this case, you do not need to handle the synchronization and signaling yourself, but functions that are added with errgroup.Go method need to have a strict func() error signature. Your code should look like this:
func (s *stocks) Read() error {
fmt.Println("Reading")
stockFile, err := os.OpenFile("current_invenory.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
return err
}
defer stockFile.Close()
stocks := []systemStock{}
if err := gocsv.UnmarshalFile(stockFile, &stocks); err != nil { // Load stocks from file
return err
}
*s = stocks
return nil
}
func (t *transactions) Read() error {
fmt.Println("Reading")
trxFile, err := os.OpenFile("current_transactions.csv", os.O_RDWR|os.O_CREATE, os.ModePerm)
if err != nil {
return err
}
defer trxFile.Close()
trx := []systemTransactions{}
if err := gocsv.UnmarshalFile(trxFile, &trx); err != nil { // Load stocks from file
return err
}
*t = trx
return nil
}
func readData() (stocks, transactions) {
g, _ := errgroup.WithContext(context.Background())
stock := stocks{}
trx := transactions{}
g.Go(stock.Read)
g.Go(trx.Read)
if err:= g.Wait(); err != nil {
panic(err)
}
return stock, trx
}
Solution 3
You’re (correctly) adding 1 to the wait group when you start reading from each CSV, bringing the wait group’s internal counter to 2, but wg.Wait() will wait until that counter goes down to zero and you don’t have any calls to wg.Done() to do that. I recommend changing go stock.Read() to:
go func() {
defer wg Done()
stock.Read()
}()
So, the full working code be:
func readData() (stocks, transactions) {
var wg sync.WaitGroup
stock := stocks{}
trx := transactions{}
wg.Add(1)
go func() {
defer wg.Done()
stock.Read()
}()
wg.Add(1)
go func() {
defer wg.Done()
trx.Read()
}()
wg.Wait()
return stock, trx
}

Related

Concurrently process a lot of files and upload to S3 in Go

I'm migrating a lot of files that are currently stored in a relational database to amazon S3. I'm using go because I had heard about the concurrency of it, but I'm getting very low throughput. I'm new to go so I'm probably not doing it in the best way possible.
This is what I have at the moment
type Attachment struct {
BinaryData []byte `db:"BinaryData"`
CreatedAt time.Time `db:"CreatedDT"`
Id int `db:"Id"`
}
func main() {
connString := os.Getenv("CONNECTION_STRING")
log.SetFlags(log.Ltime)
db, err := sqlx.Connect("sqlserver", connString)
if err != nil {
panic(err)
}
log.Print("Connected to database")
sql := "SELECT TOP 1000 Id,CreatedDT, BinaryData FROM Attachment"
attachmentsDb := []Attachment{}
err = db.Select(&attachmentsDb, sql)
if err != nil {
log.Fatal(err)
}
session, err := session.NewSession(&aws.Config{
Region: aws.String("eu-west-1"),
})
if err != nil {
log.Fatal(err)
return
}
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
err := <-saveAttachment(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done
}
wg.Wait()
//close(in)
defer db.Close()
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
func getKey(att *Attachment) string {
return fmt.Sprintf("%s/%d", os.Getenv("KEY"), att.Id)
}
These loops will executes sequentially because in every loop, it waits for result from channel done so there aren't any benifit from running multiple goroutines. And no need to create a new goroutine in func saveAttachment(), because you already create it in the loops.
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
for _, att := range attachmentsDb {
done := make(chan error)
//New goroutine
go func(wg *sync.WaitGroup, att Attachment, out chan error) {
wg.Add(1)
//Already in a goroutine now, but in func saveAttachment() will create a new goroutine?
err := <-saveAttachment(&att, svc) //There is a goroutine created in this func
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
out<-err
}(wg, att, done)
<-done //This will block until receives the result, after that a new loop countinues
}
}
func saveAttachment(att *Attachment, svc *s3.S3 )<-chan error {
out := make(chan error)
bucket := os.Getenv("BUCKET")
//Why new goroutine?
go func() {
defer close(out)
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
//log.Fatal(err)
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
out <- err
}()
return out
}
If you want to upload in parallel, don't do that. You can quickly fix it like this
func main() {
//....
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Number of goroutines = number of attachments
for _, att := range attachmentsDb {
wg.Add(1)
//One goroutine to uploads for each Attachment
go func(wg *sync.WaitGroup, att Attachment) {
err := saveAtt(&att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
wg.Done()
}(wg, att)
//No blocking after created a goroutine, loops countines to create new goroutine
}
wg.Wait()
fmt.Println("done")
}
//This func will be executed in goroutine, so no need to create a goroutine inside it
func saveAtt(att *Attachment, svc *s3.S3) error {
bucket := os.Getenv("BUCKET")
key := getKey(att)
input := &s3.PutObjectInput{Bucket: &bucket,
Key: &key,
Body: bytes.NewReader(att.BinaryData),
}
_, err := svc.PutObject(input)
if err != nil {
log.Printf("Error uploading CV %d error %v", att.Id, err)
}
return err
}
But this approach isn't good when there are so many attachments beacause number of goroutines = number of attachments. In this case, you will need a goroutine pool so you can limit number of goroutines to run.
Warining!!!, This is just an example to show goroutine pool logic, you need to implement it by your way
//....
//Create a attachment queue
queue := make(chan *Attachment) //Or use buffered channel: queue := make(chan *Attachment, bufferedSize)
//Send all attachment to queue
go func() {
for _, att := range attachmentsDb {
queue <- &att
}
}()
//....
//Create a goroutine pool
svc := s3.New(session)
wg := &sync.WaitGroup{}
//Use this as const
workerCount := 5
//Number of goroutines = Number of workerCount
for i := 1; i <= workerCount; i++ {
//New goroutine
go func() {
//Get attachment from queue to upload. When the queue channel is empty, this code will blocks
for att := range queue {
err := saveAtt(att, svc)
if err == nil {
log.Printf("CV Written %d", att.Id)
}
}
}()
}
//....
//Warning!!! You need to call close channel only WHEN all attachments was uploaded, this code just show how you can end the goroutine pool
//Just close queue channel when all attachments was uploaded, all upload goroutines will end (because of `att := range queue`)
close(queue)
//....

Optimize writing to CSV in Go

The following snippet validates a phone number and write the details to CSV.
func Parse(phone Input, output *PhoneNumber) error {
var n PhoneNumber
num, _ := phonenumbers.Parse(phone.Number, phone.Prefix)
n.PhoneNumber = phonenumbers.Format(num, phonenumbers.E164)
n.CountryCode = num.GetCountryCode()
n.PhoneType = phonenumbers.GetNumberType(num)
n.NetworkName, _ = phonenumbers.GetCarrierForNumber(num, "EN")
n.Region = phonenumbers.GetRegionCodeForNumber(num)
*output = n
return nil
}
func createFile(path string) {
// detect if file exists
var _, err = os.Stat(path)
// create file if not exists
if os.IsNotExist(err) {
var file, err = os.Create(path)
if err != nil {
return
}
defer file.Close()
}
}
func worker(ctx context.Context, dst chan string, src chan []string) {
for {
select {
case dataArray, ok := <-src: // you must check for readable state of the channel.
if !ok {
return
}
go processNumber(dataArray[0])
case <-ctx.Done(): // if the context is cancelled, quit.
return
}
}
}
func processNumber(number string) {
num, e := phonenumbers.Parse(number, "")
if e != nil {
return
}
region := phonenumbers.GetRegionCodeForNumber(num)
carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
path := "sample_all.csv"
createFile(path)
var csvFile, _ = os.OpenFile(path, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
csvwriter := csv.NewWriter(csvFile)
_ = csvwriter.Write([]string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region})
defer csvFile.Close()
csvwriter.Flush()
}
func ParseFile(phone Input, output *PhoneNumber) error {
// create a context
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// that cancels at ctrl+C
go onSignal(os.Interrupt, cancel)
numberOfWorkers := 2
start := time.Now()
csvfile, err := os.Open(phone.File)
if err != nil {
log.Fatal(err)
}
defer csvfile.Close()
reader := csv.NewReader(csvfile)
// create the pair of input/output channels for the controller=>workers com.
src := make(chan []string)
out := make(chan string)
// use a waitgroup to manage synchronization
var wg sync.WaitGroup
// declare the workers
for i := 0; i < numberOfWorkers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
worker(ctx, out, src)
}()
}
// read the csv and write it to src
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
src <- record // you might select on ctx.Done().
}
close(src) // close src to signal workers that no more job are incoming.
}()
// wait for worker group to finish and close out
go func() {
wg.Wait() // wait for writers to quit.
close(out) // when you close(out) it breaks the below loop.
}()
// drain the output
for res := range out {
fmt.Println(res)
}
fmt.Printf("\n%2fs", time.Since(start).Seconds())
return nil
}
In processNumber function, if I skip writing to CSV, the process of verifying number completes 6 seconds but writing one record at a time on CSV stretch the time consumption to 15s.
How can I optimize the code?
Can I chunk the records and write them in chunks instead of writing one row at a time?
Do work directly in worker goroutine instead of firing off goroutine per task.
Open file output file once. Flush output file once.
func worker(ctx context.Context, dst chan []string, src chan []string) {
for {
select {
case dataArray, ok := <-src: // you must check for readable state of the channel.
if !ok {
return
}
dst <- processNumber(dataArray[0])
case <-ctx.Done(): // if the context is cancelled, quit.
return
}
}
}
func processNumber(number string) []string {
num, e := phonenumbers.Parse(number, "")
if e != nil {
return
}
region := phonenumbers.GetRegionCodeForNumber(num)
carrier, _ := phonenumbers.GetCarrierForNumber(num, "EN")
return []string{phonenumbers.Format(num, phonenumbers.E164), fmt.Sprintf("%v", num.GetCountryCode()), fmt.Sprintf("%v", phonenumbers.GetNumberType(num)), carrier, region}
}
func ParseFile(phone Input, output *PhoneNumber) error {
// create a context
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// that cancels at ctrl+C
go onSignal(os.Interrupt, cancel)
numberOfWorkers := 2
start := time.Now()
csvfile, err := os.Open(phone.File)
if err != nil {
log.Fatal(err)
}
defer csvfile.Close()
reader := csv.NewReader(csvfile)
// create the pair of input/output channels for the controller=>workers com.
src := make(chan []string)
out := make(chan string)
// use a waitgroup to manage synchronization
var wg sync.WaitGroup
// declare the workers
for i := 0; i < numberOfWorkers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
worker(ctx, out, src)
}()
}
// read the csv and write it to src
go func() {
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
src <- record // you might select on ctx.Done().
}
close(src) // close src to signal workers that no more job are incoming.
}()
// wait for worker group to finish and close out
go func() {
wg.Wait() // wait for writers to quit.
close(out) // when you close(out) it breaks the below loop.
}()
path := "sample_all.csv"
file, err := os.Create(path)
if err != nil {
return err
}
defer file.Close()
csvwriter := csv.NewWriter(csvFile)
// drain the output
for res := range out {
csvwriter.Write(res)
}
csvwriter.Flush()
fmt.Printf("\n%2fs", time.Since(start).Seconds())
return nil
}

go use fsnotify watch file ischanged not working

I use the library (http://github.com/fsnotify/fsnotify) to monitor the file system.
I m trying to adjust the repository example to match my requirements, but when i do so the program is not working anymore.
I commented the done channel within the ExampleNewWatcher function
done := make(chan bool)
<-done
As a result, now when i run the example, this channel does not output anything anymore.
event, ok := <-watcher.Events
Complete code:
package main
import (
"github.com/fsnotify/fsnotify"
"log"
"os"
"strconv"
"time"
)
func ExampleNewWatcher() {
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.Fatal(err)
}
defer watcher.Close()
done := make(chan bool) // if i commen this line the `panic` not norking
go func() {
for {
select {
case event, ok := <-watcher.Events:
if !ok {
return
}
log.Println("event:", event)
if event.Op&fsnotify.Write == fsnotify.Write {
log.Println("modified file:", event.Name)
}
panic("just for test") // new output this line
case err, ok := <-watcher.Errors:
if !ok {
return
}
log.Println("error:", err)
}
}
}()
err = watcher.Add("/tmp/foo")
if err != nil {
log.Fatal(err)
}
<-done // comment this
}
func check(err error) {
if err != nil {
panic(err)
}
}
func main() {
// create the test file
var err error
file, err := os.Create("/tmp/foo")
check(err)
_, err = file.Write([]byte("hello world"))
check(err)
stopchan := make(chan struct{})
// test change file
go func() {
for i := 0; i < 10; i++ {
file1, err := os.OpenFile("/tmp/foo", os.O_RDWR, 0644)
check(err)
d := strconv.Itoa(i) + "hello world"
_, err = file1.Write([]byte(d))
check(err)
err = file1.Close()
check(err)
time.Sleep(2 * time.Second) // wait the context writed to the file
}
}()
ExampleNewWatcher() // monitor file
stopchan <- struct{}{}
}

how to repeat shutting down and establish go routine?

every one,I am new to golang.I wanna get the data from log file generated by my application.cuz roll-back mechanism, I met some problem.For instance,my target log file is chats.log,it will be renamed to chats.log.2018xxx and a new chats.log will be created.so my go routine that read log file will fail to work.
so I need detect the change and shutdown the previous go routine and then establish the new go routine.
I looked for modules that can help me,and I found
func ExampleNewWatcher(fn string, createnoti chan string, wg sync.WaitGroup) {
wg.Add(1)
defer wg.Done()
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.Fatal(err)
}
defer watcher.Close()
done := make(chan bool)
go func() {
for {
select {
case event := <-watcher.Events:
if event.Op == fsnotify.Create && event.Name==fn{
createnoti <- "has been created"
}
case err := <-watcher.Errors:
log.Println("error:", err)
}
}
}()
err = watcher.Add("./")
if err != nil {
log.Fatal(err)
}
<-done
}
I use fsnotify to detech the change,and make sure the event of file is my log file,and then send some message to a channel.
this is my worker go routine:
func tailer(fn string,isfollow bool, outchan chan string, done <-chan interface{},wg sync.WaitGroup) error {
wg.Add(1)
defer wg.Done()
_, err := os.Stat(fn)
if err != nil{
panic(err)
}
t, err := tail.TailFile(fn, tail.Config{Follow:isfollow})
if err != nil{
panic(err)
}
defer t.Stop()
for line := range t.Lines{
select{
case outchan <- line.Text:
case <- done:
return nil
}
}
return nil
}
I using tail module to read the log file,and I add a done channel to it to shutdown the cycle(I don't know whether I put it in the right way)
And I will send every log content to a channel to consuming it.
So here is the question:how should I put it together?
ps: Actually,I can use some tool to do this job.like apache-flume,but all of those tools need dependency.
Thank you a lot!
Here is a complete example that reloads and rereads the file as it changes or gets deleted and recreated:
package main
import (
"github.com/fsnotify/fsnotify"
"io/ioutil"
"log"
)
const filename = "myfile.txt"
func ReadFile(filename string) string {
data, err := ioutil.ReadFile(filename)
if err != nil {
log.Println(err)
}
return string(data)
}
func main() {
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.Fatal(err)
}
defer watcher.Close()
err = watcher.Add("./")
if err != nil {
log.Fatal(err)
}
for {
select {
case event := <-watcher.Events:
if event.Op == fsnotify.Create && event.Name == filename {
log.Println(ReadFile(filename))
}
case err := <-watcher.Errors:
log.Println("error:", err)
}
}
}
Note this doesn't require goroutines, channels or a WaitGroup. Better to keep things simple and reserve those for when they're actually needed.

golang sync.WaitGroup never completes

I have the below code that fetches a list of URL's and then conditionally downloads a file and saves it to the filesystem. The files are fetched concurrently and the main goroutine waits for all the files to be fetched. But, the program never exits (and there are no errors) after completing all the requests.
What I think is happening is that somehow the amount of go routines in the WaitGroup is either incremented by too many to begin with (via Add) or not decremented by enough (a Done call is not happening).
Is there something I am obviously doing wrong? How would I inspect how many go routines are presently in the WaitGroup so I can better debug what's happening?
package main
import (
"fmt"
"io"
"io/ioutil"
"net/http"
"os"
"strings"
"sync"
)
func main() {
links := parseLinks()
var wg sync.WaitGroup
for _, url := range links {
if isExcelDocument(url) {
wg.Add(1)
go downloadFromURL(url, wg)
} else {
fmt.Printf("Skipping: %v \n", url)
}
}
wg.Wait()
}
func downloadFromURL(url string, wg sync.WaitGroup) error {
tokens := strings.Split(url, "/")
fileName := tokens[len(tokens)-1]
fmt.Printf("Downloading %v to %v \n", url, fileName)
content, err := os.Create("temp_docs/" + fileName)
if err != nil {
fmt.Printf("Error while creating %v because of %v", fileName, err)
return err
}
resp, err := http.Get(url)
if err != nil {
fmt.Printf("Could not fetch %v because %v", url, err)
return err
}
defer resp.Body.Close()
_, err = io.Copy(content, resp.Body)
if err != nil {
fmt.Printf("Error while saving %v from %v", fileName, url)
return err
}
fmt.Printf("Download complete for %v \n", fileName)
defer wg.Done()
return nil
}
func isExcelDocument(url string) bool {
return strings.HasSuffix(url, ".xlsx") || strings.HasSuffix(url, ".xls")
}
func parseLinks() []string {
linksData, err := ioutil.ReadFile("links.txt")
if err != nil {
fmt.Printf("Trouble reading file: %v", err)
}
links := strings.Split(string(linksData), ", ")
return links
}
There are two problems with this code. First, you have to pass a pointer to the WaitGroup to downloadFromURL(), otherwise the object will be copied and Done() will not be visible in main().
See:
func main() {
...
go downloadFromURL(url, &wg)
...
}
Second, defer wg.Done() should be one of the first statements in downloadFromURL(), otherwise if you return from the function before that statement, it won't get "registered" and won't get called.
func downloadFromURL(url string, wg *sync.WaitGroup) error {
defer wg.Done()
...
}
Arguments in Go are always passed by value. Use a pointer when an argument may be modified. Also, make sure that you always execute wg.Done().For example,
package main
import (
"fmt"
"io"
"io/ioutil"
"net/http"
"os"
"strings"
"sync"
)
func main() {
links := parseLinks()
wg := new(sync.WaitGroup)
for _, url := range links {
if isExcelDocument(url) {
wg.Add(1)
go downloadFromURL(url, wg)
} else {
fmt.Printf("Skipping: %v \n", url)
}
}
wg.Wait()
}
func downloadFromURL(url string, wg *sync.WaitGroup) error {
defer wg.Done()
tokens := strings.Split(url, "/")
fileName := tokens[len(tokens)-1]
fmt.Printf("Downloading %v to %v \n", url, fileName)
content, err := os.Create("temp_docs/" + fileName)
if err != nil {
fmt.Printf("Error while creating %v because of %v", fileName, err)
return err
}
resp, err := http.Get(url)
if err != nil {
fmt.Printf("Could not fetch %v because %v", url, err)
return err
}
defer resp.Body.Close()
_, err = io.Copy(content, resp.Body)
if err != nil {
fmt.Printf("Error while saving %v from %v", fileName, url)
return err
}
fmt.Printf("Download complete for %v \n", fileName)
return nil
}
func isExcelDocument(url string) bool {
return strings.HasSuffix(url, ".xlsx") || strings.HasSuffix(url, ".xls")
}
func parseLinks() []string {
linksData, err := ioutil.ReadFile("links.txt")
if err != nil {
fmt.Printf("Trouble reading file: %v", err)
}
links := strings.Split(string(linksData), ", ")
return links
}
As #Bartosz mentioned, you will need to pass a reference to your WaitGroup object. He did a great job discussing the importance of defer ws.Done()
I like WaitGroup's simplicity. However, I do not like that we need to pass the reference to the goroutine because that would mean that the concurrency logic would be mixed with your business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
var waitGroup sync.WaitGroup
waitGroup.Add(len(functions))
defer waitGroup.Wait()
for _, function := range functions {
go func(copy func()) {
defer waitGroup.Done()
copy()
}(function)
}
}
So your example could be solved this way:
func main() {
links := parseLinks()
functions := []func(){}
for _, url := range links {
if isExcelDocument(url) {
function := func(url string){
return func() { downloadFromURL(url) }
}(url)
functions = append(functions, function)
} else {
fmt.Printf("Skipping: %v \n", url)
}
}
Parallelize(functions...)
}
func downloadFromURL(url string) {
...
}
If you would like to use it, you can find it here https://github.com/shomali11/util

Resources