Go routines not executing - go

The following code is giving me a problem. What I want to achieve is to create all of the tables in parallel. After all the tables are created, I want to exit the function.
func someFunction(){
....
gos := 5
proc := make(chan bool, gos)
allDone := make(chan bool)
for i:=0; i<gos; i++ {
go func() {
for j:=i; j<len(tables); j+=gos {
r, err := db.Exec(tables[j])
fmt.Println(r)
if err != nil {
methods.CheckErr(err, err.Error())
}
}
proc <- true
}()
}
go func() {
for i:=0; i<gos; i++{
<-proc
}
allDone <- true
}()
for {
select {
case <-allDone:
return
}
}
}
I'm creating two channels: one (proc) to keep track of the number of tables created, and another (allDone) to see whether all are done.
When I run this code, the goroutines that create the tables start executing, but someFunction returns before they complete.
However, there is no problem if I run the code sequentially.
What is the mistake in my design pattern, and how do I correct it?

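The usual pattern for what you're trying to achieve uses sync.WaitGroup.
I think the problem you're facing is that i is captured by each goroutine's closure while the outer loop keeps incrementing it. Each inner loop starts at whatever value i holds when the goroutine actually runs, and since the outer loop has usually finished by then, each goroutine starts at 5. (Go 1.22 and later give every iteration its own copy of the loop variable, so this applies to earlier versions.)
Try passing the loop variable as a parameter to the goroutine so that each one gets its own copy.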
func someFunction(){
....
gos := 5
var wg sync.WaitGroup
wg.Add(gos)
for i:=0; i< gos; i++ {
go func(n int) {
defer wg.Done()
for j:=n; j<len(tables); j+=gos {
r, err := db.Exec(tables[j])
fmt.Println(r)
if err != nil {
methods.CheckErr(err, err.Error())
}
}
}(i)
}
wg.Wait()
}
Note that because the inner loop advances by gos, goroutine n handles tables n, n+gos, n+2*gos and so on, so between them the goroutines execute each statement exactly once.
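As a rough, self-contained sketch of the WaitGroup pattern above, the following replaces db.Exec with a hypothetical process callback (runStrided and process are illustrative names, not part of the original code) and counts how often each index is handled:

```go
package main

import (
	"fmt"
	"sync"
)

// runStrided calls process(j) once for every j in [0, n), splitting the
// indices across `workers` goroutines: goroutine w handles w, w+workers, ...
// process is a hypothetical stand-in for the db.Exec call.
func runStrided(n, workers int, process func(j int)) {
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func(start int) { // start is this goroutine's own copy of w
			defer wg.Done()
			for j := start; j < n; j += workers {
				process(j)
			}
		}(w)
	}
	wg.Wait() // returns only after every goroutine has called Done
}

func main() {
	seen := make([]int, 12)
	// Each index is touched by exactly one goroutine, so no lock is needed.
	runStrided(len(seen), 5, func(j int) { seen[j]++ })
	fmt.Println(seen) // prints [1 1 1 1 1 1 1 1 1 1 1 1]
}
```

Because wg.Wait() is called in the function itself, no extra signalling channels are needed to know when the work is finished.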

Related

How do I terminate an infinite loop from inside of a goroutine?

I'm writing an app using Go that is interacting with Spotify's API and I find myself needing to use an infinite for loop to call an endpoint until the length of the returned slice is less than the limit, signalling that I've reached the end of the available entries.
For my user account, there are 1644 saved albums (I determined this by looping through without using goroutines). However, when I add goroutines in, I'm getting back 2544 saved albums with duplicates. I'm also using the semaphore pattern to limit the number of goroutines so that I don't exceed the rate limit.
I assume that the issue is with using the active variable rather than channels, but my attempt at using channels just resulted in an infinite loop.
wg := &sync.WaitGroup{}
sem := make(chan bool, 20)
active := true
offset := 0
for {
sem <- true
if active {
// add each new goroutine to waitgroup
wg.Add(1)
go func() error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
// release the worker
defer func() { <-sem }()
savedAlbums, err := client.CurrentUsersAlbums(ctx, spotify.Limit(50), spotify.Offset(offset))
if err != nil {
return err
}
userAlbums = append(userAlbums, savedAlbums.Albums...)
if len(savedAlbums.Albums) < 50 {
// since the limit is set to 50, we know that if the number of returned albums
// is less than 50 that we're done retrieving data
active = false
return nil
} else {
offset += 50
return nil
}
}()
} else {
wg.Wait()
break
}
}
Thanks in advance!
I suspect that your main issue may be a misunderstanding of what the go keyword does; from the docs:
A "go" statement starts the execution of a function call as an independent concurrent thread of control, or goroutine, within the same address space.
So go func() error { schedules the closure to run concurrently; it does not mean that any of its code has run by the time the loop continues. Because offset is only updated after client.CurrentUsersAlbums returns, and that call takes a while, it's likely you will be requesting the first 50 items 20 times. This can be demonstrated with a simplified version of your application (playground):
func main() {
wg := &sync.WaitGroup{}
sem := make(chan bool, 20)
active := true
offset := 0
for {
sem <- true
if active {
// add each new goroutine to waitgroup
wg.Add(1)
go func() error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
// release the worker
defer func() { <-sem }()
fmt.Println("Getting from:", offset)
time.Sleep(time.Millisecond) // Simulate the query
// Pretend that we got back 50 albums
offset += 50
if offset > 2000 {
active = false
}
return nil
}()
} else {
wg.Wait()
break
}
}
}
Running this will produce somewhat unpredictable results (note that the playground caches results, so try it on your machine), but you will probably see Getting from: 0 printed 20 times.
A further issue is data races. Updating a variable from multiple goroutines without protection (e.g. sync.Mutex) results in undefined behaviour.
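As a minimal illustration of the mutex protection mentioned above (collect and the album-N strings are made up for this sketch), each goroutine takes a lock before appending to the shared slice. Note that this only removes the data race; it does not fix the duplicate requests caused by the shared offset.

```go
package main

import (
	"fmt"
	"sync"
)

// collect simulates fetching `pages` pages concurrently. Each goroutine
// appends to the shared slice while holding mu, so the appends cannot race.
func collect(pages int) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		albums []string
	)
	for page := 0; page < pages; page++ {
		wg.Add(1)
		go func(page int) {
			defer wg.Done()
			item := fmt.Sprintf("album-%d", page) // placeholder for a real response
			mu.Lock()
			albums = append(albums, item)
			mu.Unlock()
		}(page)
	}
	wg.Wait()
	return albums
}

func main() {
	fmt.Println(len(collect(20))) // prints 20
}
```

Running the original program with go run -race is a quick way to confirm where unprotected accesses occur.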
You will want to know how to fix this but unfortunately you will need to rethink your algorithm. Currently the process you are following is:
Set pos to 0
Get 50 records starting from pos
If we got 50 records then pos=pos+50 and loop back to step 2
This is a sequential algorithm; you don't know whether you have all of the data until you have requested the previous section. I guess you could make speculative queries (and handle failures) but a better solution would be to find some way to determine the number of results expected and then split the queries to get that number of records between multiple goroutines.
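The sequential process described above can be sketched as follows. fetchPage is a hypothetical stand-in for the real API call, and total exists only so the example can run without a network:

```go
package main

import "fmt"

const pageSize = 50

// fetchPage is a hypothetical stand-in for the real API call. It serves
// `total` numbered records, at most `limit` per call.
func fetchPage(total, offset, limit int) []int {
	if offset >= total {
		return nil
	}
	end := offset + limit
	if end > total {
		end = total
	}
	page := make([]int, 0, end-offset)
	for i := offset; i < end; i++ {
		page = append(page, i)
	}
	return page
}

// fetchAll pages through the records one request at a time. Whether another
// request is needed is only known once the previous page has arrived.
func fetchAll(total int) []int {
	var all []int
	for offset := 0; ; offset += pageSize {
		page := fetchPage(total, offset, pageSize)
		all = append(all, page...)
		if len(page) < pageSize {
			return all // a short page means there is nothing left
		}
	}
}

func main() {
	fmt.Println(len(fetchAll(1644))) // prints 1644
}
```

The dependency of each iteration on the previous page's length is what prevents this loop from being split across goroutines as it stands.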
Note that if you do know the number of responses then you can do something like the following (playground):
noOfResultsToGet := 1644 // In the below we are getting 0-1643
noOfResultsPerRequest := 50
noOfSimultaneousRequests := 20 // You may not need this but many services will limit the number of simultaneous requests you can make (or, at least, rate limit them)
requestChan := make(chan int) // Will be passed the starting #
responseChan := make(chan []string) // Response from whatever request we are making (can be any type really)
// Start goroutines to make the requests
var wg sync.WaitGroup
wg.Add(noOfSimultaneousRequests)
for i := 0; i < noOfSimultaneousRequests; i++ {
go func(routineNo int) {
defer wg.Done()
for startPos := range requestChan {
// Simulate making the request
maxResult := startPos + noOfResultsPerRequest
if maxResult > noOfResultsToGet {
maxResult = noOfResultsToGet
}
rsp := make([]string, 0, noOfResultsPerRequest)
for x := startPos; x < maxResult; x++ {
rsp = append(rsp, strconv.Itoa(x))
}
responseChan <- rsp
fmt.Printf("Goroutine %d handling data from %d to %d\n", routineNo, startPos, maxResult-1)
}
}(i)
}
// Close the response channel when all goroutines have shut down
go func() {
wg.Wait()
close(responseChan)
}()
// Send the requests
go func() {
for reqFrom := 0; reqFrom < noOfResultsToGet; reqFrom += noOfResultsPerRequest {
requestChan <- reqFrom
}
close(requestChan) // Allow goroutines to exit
}()
// Receive responses (note that these may be out of order)
result := make([]string, 0, noOfResultsToGet)
for x := range responseChan {
result = append(result, x...)
}
// Order the results and output (results from goroutines may come back in any order)
sort.Slice(result, func(i, j int) bool {
a, _ := strconv.Atoi(result[i])
b, _ := strconv.Atoi(result[j])
return a < b
})
fmt.Printf("Result: %v", result)
Relying on channels to pass messages often makes this kind of thing easier to think about and reduces the chance that you will make a mistake.
Pass offset as an argument: go func(offset int) error {.
Increment offset by 50 after starting each goroutine.
Change the type of active to chan bool.
To avoid a data race on userAlbums = append(userAlbums, res...), create a channel of the same type as userAlbums, run the dispatch loop inside a goroutine, and send the results to that channel; the main goroutine then appends them in a single loop.
Here is an example: https://go.dev/play/p/yzk8qCURZFC
Applied to your code:
wg := &sync.WaitGroup{}
worker := 20
active := make(chan bool, worker)
for i := 0; i < worker; i++ {
active <- true
}
// I assume the type of userAlbums is []string
resultsChan := make(chan []string, worker)
go func() {
offset := 0
for {
if <-active {
// add each new goroutine to waitgroup
wg.Add(1)
go func(offset int) error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
savedAlbums, err := client.CurrentUsersAlbums(ctx, spotify.Limit(50), spotify.Offset(offset))
if err != nil {
// active <- false // maybe you need this
return err
}
resultsChan <- savedAlbums.Albums
if len(savedAlbums.Albums) < 50 {
// since the limit is set to 50, we know that if the number of returned albums
// is less than 50 that we're done retrieving data
active <- false
return nil
} else {
active <- true
return nil
}
}(offset)
offset += 50
} else {
wg.Wait()
close(resultsChan)
break
}
}
}()
for res := range resultsChan {
userAlbums = append(userAlbums, res...)
}

How can I use goroutine to execute for loop?

Suppose I have a slice like:
stu = [{"id":"001","name":"A"} {"id":"002", "name":"B"}], possibly with more elements like these. Each element of the slice is a long JSON string, and I want to use json.Unmarshal to parse it.
type Student struct {
Id string `json:"id"`
Name string `json:"name"`
}
studentList := make([]Student,len(stu))
for i, st := range stu {
go func(st string){
studentList[i], err = getList(st)
if err != nil {
return ... //just example
}
}(st)
}
//and a function like this
func getList(stu string) (Student, error) {
var student Student
err := json.Unmarshal([]byte(stu), &student)
if err != nil {
return Student{}, err
}
return student, nil
}
I got a nil result, so I suspect the goroutines execute out of order, and I don't know whether I can use studentList[i] to store the values.
Here are a few potential issues with your code:
Value of i is probably not what you expect
for i, st := range stu {
go func(st string){
studentList[i], err = getList(st)
if err != nil {
return ... //just example
}
}(st)
}
You kick off a number of goroutines and, within them, reference i. The issue is that i is likely to have changed between the time you started the goroutine and the time the goroutine references it (the for loop runs concurrently with the goroutines it starts). It is quite possible that the for loop completes before any of the goroutines do, meaning that all output will be stored in the last element of studentList (they will overwrite each other, so you will end up with one value).
A simple solution is to pass i into the goroutine function, e.g. go func(st string, i int){}(st, i), which creates a copy. See this for more info.
Output of studentList
You don't say in the question, but I suspect you are running fmt.Println(studentList[1]) (or similar) immediately after the for loop completes. As mentioned above, it's quite possible that none of the goroutines have completed at that point (or they may have; you can't know). Using a WaitGroup is a fairly easy way around this:
var wg sync.WaitGroup
wg.Add(len(stu))
for i, st := range stu {
go func(st string, i int) {
var err error
studentList[i], err = getList(st)
if err != nil {
panic(err)
}
wg.Done()
}(st, i)
}
wg.Wait()
I have corrected these issues in the playground.
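For reference, a self-contained version of the corrected approach might look like the sketch below. parseAll is an illustrative name, and returning the first error instead of panicking is a choice made for this example rather than something taken from the question:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

type Student struct {
	Id   string `json:"id"`
	Name string `json:"name"`
}

// parseAll unmarshals each JSON document in its own goroutine. Every goroutine
// writes only to its own index of students and errs, so no lock is required.
func parseAll(docs []string) ([]Student, error) {
	students := make([]Student, len(docs))
	errs := make([]error, len(docs))
	var wg sync.WaitGroup
	wg.Add(len(docs))
	for i, doc := range docs {
		go func(i int, doc string) { // i and doc are per-goroutine copies
			defer wg.Done()
			errs[i] = json.Unmarshal([]byte(doc), &students[i])
		}(i, doc)
	}
	wg.Wait() // results are safe to read only after this returns
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return students, nil
}

func main() {
	stu := []string{`{"id":"001","name":"A"}`, `{"id":"002","name":"B"}`}
	students, err := parseAll(stu)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(students) // prints [{001 A} {002 B}]
}
```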
It is not because of this:
the goroutine is out-of-order to execute
There are at least two issues here:
You should not use the for loop variable i inside the goroutine.
Multiple goroutines read i while the for loop modifies it, which is a race condition. To make i work as expected, change the code to:
for i, st := range stu {
go func(i int, st string){
studentList[i], err = getList(st)
if err != nil {
return ... //just example
}
}(i, st)
}
What's more, use sync.WaitGroup to wait for all the goroutines.
var wg sync.WaitGroup
for i, st := range stu {
wg.Add(1)
go func(i int, st string){
defer wg.Done()
studentList[i], err = getList(st)
if err != nil {
return ... //just example
}
}(i, st)
}
wg.Wait()
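P.S. (warning: this may not always hold):
The line studentList[i], err = getList(st) may not cause a data race on studentList, since each goroutine writes a different element, but neighbouring elements can share a CPU cache line, so concurrent writes may cause false sharing. The err variable, however, is shared by all goroutines as written and should be declared inside the goroutine. It is better to avoid writing code like this.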

How to parallelize a recursive function

I am trying to parallelize a recursive problem in Go, and I am unsure what the best way to do this is.
I have a recursive function, which works like this:
func recFunc(input string) (result []string) {
for _, subInput := range getSubInputs(input) {
subOutput := recFunc(subInput)
result = append(result, subOutput...)
}
result = append(result, getOutput(input)...)
return result
}
func main() {
output := recFunc("some_input")
...
}
So the function calls itself N times (where N is 0 at some level), generates its own output and returns everything in a list.
Now I want to make this function run in parallel, but I am unsure what the cleanest way to do this is. My idea:
Have a "result" channel, to which all function calls send their result.
Collect the results in the main function.
Have a wait group, which determines when all results are collected.
The Problem: I need to wait for the wait group and collect all results in parallel. I can start a separate go function for this, but how do I ever quit this separate go function?
func recFunc(input string) (result []string, outputChannel chan []string, waitGroup &sync.WaitGroup) {
defer waitGroup.Done()
waitGroup.Add(len(getSubInputs(input))
for subInput := range getSubInputs(input) {
go recFunc(subInput)
}
outputChannel <-getOutput(input)
}
func main() {
outputChannel := make(chan []string)
waitGroup := sync.WaitGroup{}
waitGroup.Add(1)
go recFunc("some_input", outputChannel, &waitGroup)
result := []string{}
go func() {
nextResult := <- outputChannel
result = append(result, nextResult ...)
}
waitGroup.Wait()
}
Maybe there is a better way to do this? Or how can I ensure that the anonymous goroutine that collects the results exits when done?
tl;dr;
recursive algorithms should have bounded limits on expensive resources (network connections, goroutines, stack space etc.)
cancelation should be supported - to ensure expensive operations can be cleaned up quickly if a result is no longer needed
branch traversal should support error reporting; this allows errors to bubble up the stack and partial results to be returned without the entire traversal failing.
For asynchronous results, whether using recursion or not, channels are recommended. Also, for long-running jobs with many goroutines, provide a method for cancelation (context.Context) to aid with clean-up.
Since recursion can lead to exponential consumption of resources, it's important to put limits in place (see bounded parallelism).
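One simple way to bound the parallelism of a recursive traversal is a counting semaphore with a synchronous fallback. The sketch below is an illustration only, not the design developed in the rest of this answer; tree, walker and walk are hypothetical names, and the map stands in for getSubInputs:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// tree is hypothetical sample data: each key maps to its sub-inputs.
var tree = map[string][]string{
	"root": {"a", "b"},
	"a":    {"a1", "a2"},
	"b":    {"b1"},
}

// walker visits the tree with at most cap(sem) extra goroutines in flight.
type walker struct {
	sem     chan struct{}
	wg      sync.WaitGroup
	mu      sync.Mutex
	visited []string
}

func (w *walker) visit(input string) {
	defer w.wg.Done()
	for _, sub := range tree[input] {
		w.wg.Add(1)
		select {
		case w.sem <- struct{}{}: // a slot is free: recurse in a new goroutine
			go func(s string) {
				defer func() { <-w.sem }() // release the slot afterwards
				w.visit(s)
			}(sub)
		default: // no slot free: recurse in the current goroutine instead
			w.visit(sub)
		}
	}
	w.mu.Lock()
	w.visited = append(w.visited, input)
	w.mu.Unlock()
}

// walk returns every node reachable from root, sorted for stable output.
func walk(root string, limit int) []string {
	w := &walker{sem: make(chan struct{}, limit)}
	w.wg.Add(1)
	w.visit(root)
	w.wg.Wait()
	sort.Strings(w.visited)
	return w.visited
}

func main() {
	fmt.Println(walk("root", 2)) // prints [a a1 a2 b b1 root]
}
```

With a limit of 0 the select always takes the default branch, so the traversal degrades to plain sequential recursion.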
Below is a design pattern I use a lot for asynchronous tasks:
always support taking a context.Context for cancelation
number of workers needed for the task
return a chan of results & a chan error (will only return one error or nil)
var (
workers = 10
ctx = context.TODO() // use request context here - otherwise context.Background()
input = "abc"
)
resultC, errC := recJob(ctx, workers, input) // returns results & `error` channels
// asynchronous results - so read that channel first in the event of partial results ...
for r := range resultC {
fmt.Println(r)
}
// ... then check for any errors
if err := <-errC; err != nil {
log.Fatal(err)
}
Recursion:
Since recursion quickly scales horizontally, one needs a consistent way to fill the finite list of workers with work but also ensure when workers are freed up, that they quickly pick up work from other (over-worked) workers.
Rather than create a manager layer, employ a cooperative peer system of workers:
each worker shares a single inputs channel
before recursing on inputs (subInputs) check if any other workers are idle
if so, delegate to that worker
if not, current worker continues recursing that branch
With this algorithm, the finite pool of workers quickly becomes saturated with work. Any worker that finishes its branch early will quickly be delegated a sub-branch by another worker. Eventually all workers run out of sub-branches, at which point they are all idle (blocked) and the recursion task can finish up.
Some careful coordination is needed to achieve this. Allowing the workers to write to the input channel helps with this peer coordination via delegation. A "recursion depth" WaitGroup is used to track when all branches have been exhausted across all workers.
(To include context support and error chaining - I updated your getSubInputs function to take a ctx and return an optional error):
func recFunc(ctx context.Context, input string, in chan string, out chan<- string, rwg *sync.WaitGroup) error {
defer rwg.Done() // decrement recursion count when a depth of recursion has completed
subInputs, err := getSubInputs(ctx, input)
if err != nil {
return err
}
for subInput := range subInputs {
rwg.Add(1) // about to recurse (or delegate recursion)
select {
case in <- subInput:
// delegated - to another goroutine
case <-ctx.Done():
// context canceled...
// but first we need to undo the earlier `rwg.Add(1)`
// as this work item was never delegated or handled by this worker
rwg.Done()
return ctx.Err()
default:
// no one available to delegate to - so this worker will need to recurse this item itself
err = recFunc(ctx, subInput, in, out, rwg)
if err != nil {
return err
}
}
select {
case <-ctx.Done():
// always check context when doing anything potentially blocking (in this case writing to `out`)
// context canceled
return ctx.Err()
case out <- subInput:
}
}
return nil
}
Connecting the Pieces:
recJob creates:
input & output channels - shared by all workers
"recursion" WaitGroup detects when all workers are idle
"output" channel can then safely be closed
error channel for all workers
kicks-off recursion workload by writing initial input to input channel
func recJob(ctx context.Context, workers int, input string) (resultsC <-chan string, errC <-chan error) {
// RW channels
out := make(chan string)
eC := make(chan error, 1)
// R-only channels returned to caller
resultsC, errC = out, eC
// create workers + waitgroup logic
go func() {
var err error // error that will be returned to the caller via the error channel
defer func() {
close(out)
eC <- err
close(eC)
}()
var wg sync.WaitGroup
wg.Add(1)
in := make(chan string) // input channel: shared by all workers (to read from and also to write to when they need to delegate)
workerErrC := createWorkers(ctx, workers, in, out, &wg)
// get the ball rolling, pass input job to one of the workers
// Note: must be done *after* workers are created - otherwise deadlock
in <- input
errCount := 0
// wait for all worker error codes to return
for err2 := range workerErrC {
if err2 != nil {
log.Println("worker error:", err2)
errCount++
}
}
// all workers have completed
if errCount > 0 {
err = fmt.Errorf("PARTIAL RESULT: %d of %d workers encountered errors", errCount, workers)
return
}
log.Printf("All %d workers have FINISHED\n", workers)
}()
return
}
Finally, create the workers:
func createWorkers(ctx context.Context, workers int, in chan string, out chan<- string, rwg *sync.WaitGroup) (errC <-chan error) {
eC := make(chan error) // RW-version
errC = eC // RO-version (returned to caller)
// track the completeness of the workers - so we know when to wrap up
var wg sync.WaitGroup
wg.Add(workers)
for i := 0; i < workers; i++ {
i := i
go func() {
defer wg.Done()
var err error
// ensure the current worker's return code gets returned
// via the common workers' error-channel
defer func() {
if err != nil {
log.Printf("worker #%3d ERRORED: %s\n", i+1, err)
} else {
log.Printf("worker #%3d FINISHED.\n", i+1)
}
eC <- err
}()
log.Printf("worker #%3d STARTED successfully\n", i+1)
// worker scans for input
for input := range in {
err = recFunc(ctx, input, in, out, rwg)
if err != nil {
log.Printf("worker #%3d recurseManagers ERROR: %s\n", i+1, err)
return
}
}
}()
}
go func() {
rwg.Wait() // wait for all recursion to finish
close(in) // safe to close input channel as all workers are blocked (i.e. no new inputs)
wg.Wait() // now wait for all workers to return
close(eC) // finally, signal to caller we're truly done by closing workers' error-channel
}()
return
}
I can start a separate go function for this, but how do I ever quit this separate go function?
You can range over the output channel in the separate goroutine. In that case, the goroutine exits cleanly when the channel is closed:
go func() {
for nextResult := range outputChannel {
result = append(result, nextResult...)
}
}()
So the thing we now need to take care of is closing the channel after all the goroutines spawned by the recursive calls have exited.
For that, you can share one WaitGroup across all the goroutines and wait on it in your main function, as you are already doing. Once the wait is over, close outputChannel so that the collecting goroutine also exits. Main must then wait for the collector to finish (here via a done channel) before reading result:
func recFunc(input string, outputChannel chan<- []string, wg *sync.WaitGroup) {
defer wg.Done()
for _, subInput := range getSubInputs(input) {
wg.Add(1)
go recFunc(subInput, outputChannel, wg)
}
outputChannel <- getOutput(input)
}
func main() {
outputChannel := make(chan []string)
waitGroup := sync.WaitGroup{}
waitGroup.Add(1)
go recFunc("some_input", outputChannel, &waitGroup)
result := []string{}
done := make(chan struct{}) // closed once the collector has drained the channel
go func() {
defer close(done)
for nextResult := range outputChannel {
result = append(result, nextResult...)
}
}()
waitGroup.Wait()
close(outputChannel)
<-done // result is safe to read only after this point
}
PS: If you want to have bounded parallelism to limit the exponential growth, check this out

Why does the goroutine only run once as part of a waitgroup

func check(name string) string {
resp, err := http.Get(endpoint + name)
if err != nil {
panic(err)
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
panic(err)
}
return string(body)
}
func worker(name string, wg *sync.WaitGroup, names chan string) {
defer wg.Done()
var a = check(name)
names <- a
}
func main() {
names := make(chan string)
var wg sync.WaitGroup
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker("www"+strconv.Itoa(i), &wg, names)
}
fmt.Println(<-names)
}
The expected result would be 5 results but, only one executes and the process ends.
Is there something I am missing? New to go.
The endpoint is a generic API that returns json
You started 5 goroutines, but read the result of only one. Also, you are not waiting for your goroutines to end.
// If there are always exactly 5 goroutines, you don't need the wg
// (worker is assumed to be changed to take only name and names)
for i := 1; i <= 5; i++ {
go worker("www"+strconv.Itoa(i), names)
}
for i:=1;i<=5;i++ {
fmt.Println(<-names)
}
If, however, you don't know how many goroutines you're waiting for, then the WaitGroup is necessary.
for i := 1; i <= 5; i++ {
wg.Add(1)
go worker("www"+strconv.Itoa(i), &wg, names)
}
// Read from the channel until it is closed
done:=make(chan struct{})
go func() {
for x:=range names {
fmt.Println(x)
}
// Signal that println is completed
close(done)
}()
// Wait for goroutines to end
wg.Wait()
// Close the channel to terminate the reader goroutine
close(names)
// Wait until println completes
<-done
You are launching 5 goroutines, but are reading from the names channel only once.
fmt.Println(<-names)
As soon as that first channel read is done, main() exits.
That means everything stops before having the time to be executed.
To know more about channels, see "Concurrency made easy" from Dave Cheney:
If you have to wait for the result of an operation, it’s easier to do it yourself.
Release locks and semaphores in the reverse order you acquired them.
Channels aren’t resources like files or sockets, you don’t need to close them to free them.
Acquire semaphores when you’re ready to use them.
Avoid mixing anonymous functions and goroutines
Before you start a goroutine, always know when, and how, it will stop
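A compact variant of the fix discussed in these answers closes the channel from a helper goroutine once all workers are done, so the caller can simply range over the results. This is a sketch under assumptions: gather is an illustrative name, and check is a stand-in for the HTTP call in the question, passed in as a function so the example runs without a network.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// gather runs check(name) for every name concurrently and returns all results.
// check is a hypothetical stand-in for the HTTP call in the question.
func gather(names []string, check func(string) string) []string {
	results := make(chan string)
	var wg sync.WaitGroup
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			results <- check(name)
		}(name)
	}
	// Close the channel once every worker has sent its result, so that the
	// range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	names := []string{"www1", "www2", "www3", "www4", "www5"}
	got := gather(names, func(n string) string { return "body of " + n })
	sort.Strings(got) // results arrive in no particular order
	fmt.Println(len(got), got[0]) // prints 5 body of www1
}
```

Every goroutine here has a known stopping point: workers exit after one send, and the closer exits after closing the channel.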

What am I missing on concurrency?

I have a very simple script that makes a GET request and then does something with the response. I have two versions, one using a goroutine and one without. I benchmarked both and there was no difference in speed. Here is a dumbed-down version of what I'm doing:
Regular Version:
func main() {
url := "http://finance.yahoo.com/q?s=aapl"
for i := 0; i < 250; i++ {
resp, err := http.Get(url)
if err != nil {
fmt.Println(err)
}
fmt.Println(resp.Status)
}
}
Go Routine:
func main() {
url := "http://finance.yahoo.com/q?s=aapl"
for i := 0; i < 250; i++ {
wg.Add(1)
go run(url, &wg)
wg.Wait()
}
}
func run(url string, wg *sync.WaitGroup) {
defer wg.Done()
resp, err := http.Get(url)
if err != nil {
fmt.Println(err)
}
fmt.Println(resp.Status)
}
In most cases when I used a go routine the program took longer to execute. What concept am I missing to understand using concurrency efficiently?
The main problem with your example is that you're calling wg.Wait() inside the for loop. This causes execution to block until the deferred wg.Done() call inside run has executed. As a result, the execution isn't concurrent: each request happens in a goroutine, but you block after starting goroutine i and before starting goroutine i+1. If you move that statement after the loop, as below, your code won't block until all goroutines have been started (some may already have completed by then).
func main() {
var wg sync.WaitGroup
url := "http://finance.yahoo.com/q?s=aapl"
for i := 0; i < 250; i++ {
wg.Add(1)
go run(url, &wg)
// don't call wg.Wait() here: it would serialize execution
}
wg.Wait() // wait here, now that all goroutines have been started
}
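To see the effect without hitting a real server, the following sketch replaces the HTTP request with a 20 ms sleep (simulatedRequest, runSerial and runAll are illustrative names) and times both approaches:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// simulatedRequest stands in for http.Get; the 20 ms latency is made up.
func simulatedRequest() { time.Sleep(20 * time.Millisecond) }

// runSerial calls task n times, one after another.
func runSerial(n int, task func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		task()
	}
	return time.Since(start)
}

// runAll starts n goroutines that each call task, then waits for all of them.
func runAll(n int, task func()) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			task()
		}()
	}
	wg.Wait() // a single wait, after every goroutine has been started
	return time.Since(start)
}

func main() {
	fmt.Println("serial:    ", runSerial(10, simulatedRequest))
	fmt.Println("concurrent:", runAll(10, simulatedRequest))
}
```

The serial run should take roughly ten times the simulated latency, while the concurrent run should take close to a single latency. With real requests the gain also depends on the server and on any rate limiting.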

Resources