I wrote a Go program to scan for open ports, and I use sync.WaitGroup to coordinate the worker goroutines.
When the number of goroutines is large, such as 2000, the result differs from what I get with 1000.
It looks as if the program exits early. The code is shown below:
func worker(wg *sync.WaitGroup) {
for job := range jobs {
_, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", job.host, job.port), time.Millisecond*1500)
if err != nil {
results <- Result{job, false}
} else {
results <- Result{job, true}
}
}
wg.Done()
}
func main() {
go func() {
for i := 1; i < 65535; i++ {
jobs <- Job{host, i}
}
close(jobs)
}()
go func() {
for result := range results {
if result.status {
fmt.Println(result.job, "open")
}
}
}()
wg := sync.WaitGroup{}
for i := 1; i < 1000; i++ {
wg.Add(1)
go worker(&wg)
}
wg.Wait()
}
Output with 1000 goroutines:
{127.0.0.1 80} open
{127.0.0.1 631} open
{127.0.0.1 3306} open
{127.0.0.1 6379} open
{127.0.0.1 33060} open
Output with 2000 goroutines:
{127.0.0.1 80} open
{127.0.0.1 631} open
I want the run with 2000 goroutines to print all the open ports, just like the run with 1000.
You do not wait for the two "non-worker" goroutines in main, so as soon as wg.Wait() there returns, the process shuts down, tearing down any outstanding goroutines.
Since one of them is processing the results, this appears to you as if not all the tasks were processed (and this is true).
Close the results channel when workers are done. Process the results in the main goroutine.
wg := sync.WaitGroup{}
for i := 1; i < 1000; i++ {
wg.Add(1)
go worker(&wg)
}
go func() {
for i := 1; i < 65535; i++ {
jobs <- Job{host, i}
}
// No more jobs, exit from worker loops.
close(jobs)
// Wait for workers to write all results and exit.
wg.Wait()
// No more results, exit from main loop.
close(results)
}()
for result := range results {
if result.status {
fmt.Println(result.job, "open")
}
}
View the complete program on the GoLang PlayGround.
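For reference, here is the same shape as a self-contained sketch. The probe callback stands in for the net.DialTimeout check so the example runs without a network; scan, probe, and the port numbers are illustrative names and values, not part of the original program.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scan fans ports out to `workers` goroutines and returns the open ones.
// probe is a hypothetical stand-in for the net.DialTimeout check.
func scan(ports []int, workers int, probe func(port int) bool) []int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for port := range jobs {
				if probe(port) {
					results <- port
				}
			}
		}()
	}

	go func() {
		for _, p := range ports {
			jobs <- p
		}
		close(jobs)    // no more jobs: the worker loops end
		wg.Wait()      // every worker has sent its last result
		close(results) // no more results: the collecting loop ends
	}()

	var open []int
	for port := range results { // runs in the caller, so nothing is lost
		open = append(open, port)
	}
	sort.Ints(open)
	return open
}

func main() {
	ports := make([]int, 0, 1000)
	for p := 1; p <= 1000; p++ {
		ports = append(ports, p)
	}
	isOpen := func(port int) bool { return port == 80 || port == 631 }
	fmt.Println(scan(ports, 200, isOpen)) // [80 631]
}
```

Because the caller itself ranges over results, scan cannot return until the channel has been closed, and the channel is only closed after every worker has finished.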
I'm writing an app using Go that is interacting with Spotify's API and I find myself needing to use an infinite for loop to call an endpoint until the length of the returned slice is less than the limit, signalling that I've reached the end of the available entries.
For my user account, there are 1644 saved albums (I determined this by looping through without using goroutines). However, when I add goroutines in, I'm getting back 2544 saved albums with duplicates. I'm also using the semaphore pattern to limit the number of goroutines so that I don't exceed the rate limit.
I assume that the issue is with using the active variable rather than channels, but my attempt at using channels just resulted in an infinite loop:
wg := &sync.WaitGroup{}
sem := make(chan bool, 20)
active := true
offset := 0
for {
sem <- true
if active {
// add each new goroutine to waitgroup
wg.Add(1)
go func() error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
// release the worker
defer func() { <-sem }()
savedAlbums, err := client.CurrentUsersAlbums(ctx, spotify.Limit(50), spotify.Offset(offset))
if err != nil {
return err
}
userAlbums = append(userAlbums, savedAlbums.Albums...)
if len(savedAlbums.Albums) < 50 {
// since the limit is set to 50, we know that if the number of returned albums
// is less than 50 that we're done retrieving data
active = false
return nil
} else {
offset += 50
return nil
}
}()
} else {
wg.Wait()
break
}
}
Thanks in advance!
I suspect that your main issue may be a misunderstanding of what the go keyword does; from the docs:
A "go" statement starts the execution of a function call as an independent concurrent thread of control, or goroutine, within the same address space.
So go func() error { starts the execution of the closure concurrently; it does not mean that any of its code runs immediately. In fact, because client.CurrentUsersAlbums takes a while, it's likely you will request the first 50 items 20 times: up to 20 goroutines read offset before any of them has updated it. This can be demonstrated with a simplified version of your application (playground):
func main() {
wg := &sync.WaitGroup{}
sem := make(chan bool, 20)
active := true
offset := 0
for {
sem <- true
if active {
// add each new goroutine to waitgroup
wg.Add(1)
go func() error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
// release the worker
defer func() { <-sem }()
fmt.Println("Getting from:", offset)
time.Sleep(time.Millisecond) // Simulate the query
// Pretend that we got back 50 albums
offset += 50
if offset > 2000 {
active = false
}
return nil
}()
} else {
wg.Wait()
break
}
}
}
Running this will produce somewhat unpredictable results (note that the playground caches results, so try it on your machine), but you will probably see "Getting from: 0" printed 20 times.
A further issue is data races. Updating a variable from multiple goroutines without protection (e.g. sync.Mutex) results in undefined behaviour.
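As a minimal illustration of that point (it does not fix the pagination logic itself), here is a shared counter guarded by a sync.Mutex. countWithMutex is a made-up name; running an unguarded equivalent with go run -race should report the race.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from n goroutines.
func countWithMutex(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // only one goroutine at a time executes this line
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithMutex(1000)) // 1000
}
```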
You will want to know how to fix this but unfortunately you will need to rethink your algorithm. Currently the process you are following is:
1. Set pos to 0
2. Get 50 records starting from pos
3. If we got 50 records then pos=pos+50 and loop back to step 2
This is a sequential algorithm; you don't know whether you have all of the data until you have requested the previous section. I guess you could make speculative queries (and handle failures) but a better solution would be to find some way to determine the number of results expected and then split the queries to get that number of records between multiple goroutines.
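To make the dependency between requests concrete, here is the sequential loop as a small sketch. fetchPage is a hypothetical stand-in for the API call (it just serves slices of a pretend collection), and fetchAll is an invented name.

```go
package main

import "fmt"

// fetchPage is a hypothetical stand-in for the API call: it returns up to
// limit items starting at offset from a pretend collection of total items.
func fetchPage(total, offset, limit int) []int {
	var page []int
	for i := offset; i < offset+limit && i < total; i++ {
		page = append(page, i)
	}
	return page
}

// fetchAll pages through sequentially; each request is only issued after the
// previous one has shown that more data may exist.
func fetchAll(total, limit int) []int {
	var all []int
	for offset := 0; ; offset += limit {
		page := fetchPage(total, offset, limit)
		all = append(all, page...)
		if len(page) < limit {
			return all // a short page means we have reached the end
		}
	}
}

func main() {
	fmt.Println(len(fetchAll(1644, 50))) // 1644
}
```

Nothing in this loop can run in parallel as written, because the decision to request the next page depends on the length of the current one.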
Note that if you do know the number of responses then you can do something like the following (playground):
noOfResultsToGet := 1644 // In the below we are getting 0-1643
noOfResultsPerRequest := 50
noOfSimultaneousRequests := 20 // You may not need this but many services will limit the number of simultaneous requests you can make (or, at least, rate limit them)
requestChan := make(chan int) // Will be passed the starting #
responseChan := make(chan []string) // Response from whatever request we are making (can be any type really)
// Start goroutines to make the requests
var wg sync.WaitGroup
wg.Add(noOfSimultaneousRequests)
for i := 0; i < noOfSimultaneousRequests; i++ {
go func(routineNo int) {
defer wg.Done()
for startPos := range requestChan {
// Simulate making the request
maxResult := startPos + noOfResultsPerRequest
if maxResult > noOfResultsToGet {
maxResult = noOfResultsToGet
}
rsp := make([]string, 0, noOfResultsPerRequest)
for x := startPos; x < maxResult; x++ {
rsp = append(rsp, strconv.Itoa(x))
}
responseChan <- rsp
fmt.Printf("Goroutine %d handling data from %d to %d\n", routineNo, startPos, startPos+noOfResultsPerRequest)
}
}(i)
}
// Close the response channel when all goroutines have shut down
go func() {
wg.Wait()
close(responseChan)
}()
// Send the requests
go func() {
for reqFrom := 0; reqFrom < noOfResultsToGet; reqFrom += noOfResultsPerRequest {
requestChan <- reqFrom
}
close(requestChan) // Allow goroutines to exit
}()
// Receive responses (note that these may be out of order)
result := make([]string, 0, noOfResultsToGet)
for x := range responseChan {
result = append(result, x...)
}
// Order the results and output (results from goroutines may come back in any order)
sort.Slice(result, func(i, j int) bool {
a, _ := strconv.Atoi(result[i])
b, _ := strconv.Atoi(result[j])
return a < b
})
fmt.Printf("Result: %v", result)
Relying on channels to pass messages often makes this kind of thing easier to think about and reduces the chance that you will make a mistake.
Pass offset as an argument: go func(offset int) error {.
Increment offset by 50 after each go func call.
Change the type of active to chan bool.
To avoid the data race on userAlbums = append(userAlbums, res...), create a channel with the same type as userAlbums, run the spawning for loop inside its own goroutine, send each page of results to that channel, and append to userAlbums only in the goroutine that reads from it.
Here is an example: https://go.dev/play/p/yzk8qCURZFC
Applied to your code:
wg := &sync.WaitGroup{}
worker := 20
active := make(chan bool, worker)
for i := 0; i < worker; i++ {
active <- true
}
// I assume the type of userAlbums is []string
resultsChan := make(chan []string, worker)
go func() {
offset := 0
for {
if <-active {
// add each new goroutine to waitgroup
wg.Add(1)
go func(offset int) error {
// remove from waitgroup when goroutine is complete
defer wg.Done()
savedAlbums, err := client.CurrentUsersAlbums(ctx, spotify.Limit(50), spotify.Offset(offset))
if err != nil {
// active <- false // maybe you need this
return err
}
resultsChan <- savedAlbums.Albums
if len(savedAlbums.Albums) < 50 {
// since the limit is set to 50, we know that if the number of returned albums
// is less than 50 that we're done retrieving data
active <- false
return nil
} else {
active <- true
return nil
}
}(offset)
offset += 50
} else {
wg.Wait()
close(resultsChan)
break
}
}
}()
for res := range resultsChan {
userAlbums = append(userAlbums, res...)
}
I am new to Go and am trying to run a function that creates a file and returns its filename, and to have this run concurrently.
I've decided to try to accomplish this with goroutines and a WaitGroup. When I use this approach, I end up with a list that is a couple of hundred entries shorter than the input size, e.g. for 5,000 files I get around 4,700 files created.
I believe this is due to a race condition:
wg := sync.WaitGroup{}
filenames := make([]string, 0)
for i := 0; i < totalFiles; i++ {
wg.Add(1)
go func() {
defer wg.Done()
filenames = append(filenames, createFile())
}()
}
wg.Wait()
return filenames, nil
Don't communicate by sharing memory; share memory by communicating.
I tried using channels to "share memory by communicating". Whenever I do this, I get a deadlock and I can't work out why. Would anyone be able to point me in the right direction for using channels and WaitGroups together properly, so that all of the created files are saved to a shared data structure?
This is the code that produces the deadlock for me (fatal error: all goroutines are asleep - deadlock!):
wg := sync.WaitGroup{}
filenames := make([]string, 0)
ch := make(chan string)
for i := 0; i < totalFiles; i++ {
wg.Add(1)
go func() {
defer wg.Done()
ch <- createFile()
}()
}
wg.Wait()
for i := range ch {
filenames = append(filenames, i)
}
return filenames, nil
Thanks!
The first one has a race. You have to protect access to filenames:
mu := sync.Mutex{}
for i := 0; i < totalFiles; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		name := createFile() // do the slow work outside the lock
		mu.Lock()
		defer mu.Unlock()
		filenames = append(filenames, name)
	}()
}
For the second case, you are waiting for the goroutines to finish, but goroutines can only finish once you read from the channel, so deadlock. You can fix it by reading from the channel in a separate goroutine.
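done := make(chan struct{})
go func() {
	defer close(done)
	for i := range ch {
		filenames = append(filenames, i)
	}
}()
wg.Wait()
close(ch) // Required, so the collecting goroutine can terminate
<-done    // Wait until the collecting goroutine has appended everything
return filenames, nil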
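Another arrangement, sketched here with a placeholder createFile that only formats a name, is to do the waiting and closing in a helper goroutine and range over the channel in the calling goroutine. That way the function cannot return before every value has been received. createAll is an invented name.

```go
package main

import (
	"fmt"
	"sync"
)

// createFile is a placeholder for the real function; it only builds a name.
func createFile(i int) string {
	return fmt.Sprintf("file-%d.txt", i)
}

func createAll(totalFiles int) []string {
	var wg sync.WaitGroup
	ch := make(chan string)

	for i := 0; i < totalFiles; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ch <- createFile(i)
		}(i)
	}

	// Close ch once every sender is done, without blocking the receiver.
	go func() {
		wg.Wait()
		close(ch)
	}()

	filenames := make([]string, 0, totalFiles)
	for name := range ch { // ends when ch is closed
		filenames = append(filenames, name)
	}
	return filenames
}

func main() {
	fmt.Println(len(createAll(5000))) // 5000
}
```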
There is a lock-free version, if the number of files is fixed:
filenames := make([]string, totalFiles)
for i := 0; i < totalFiles; i++ {
wg.Add(1)
go func(index int) {
defer wg.Done()
filenames[index]=createFile()
}(i)
}
wg.Wait()
I'm learning Go and I am trying to implement a job queue.
What I'm trying to do is:
Have the main goroutine feed lines through a channel to multiple parser workers (that parse a line into a struct), and have each parser send the struct to a channel of structs that other workers (goroutines) will process (send to database, etc).
The code looks like this:
lineParseQ := make(chan string, 5)
jobProcessQ := make(chan myStruct, 5)
doneQ := make(chan myStruct, 5)
fileName := "myfile.csv"
file, err := os.Open(fileName)
if err != nil {
log.Fatal(err)
}
defer file.Close()
reader := bufio.NewReader(file)
// Start line parsing workers and send to jobProcessQ
for i := 1; i <= 2; i++ {
go lineToStructWorker(i, lineParseQ, jobProcessQ)
}
// Process myStruct from jobProcessQ
for i := 1; i <= 5; i++ {
go WorkerProcessStruct(i, jobProcessQ, doneQ)
}
lineCount := 0
countSend := 0
for {
line, err := reader.ReadString('\n')
if err != nil && err != io.EOF {
log.Fatal(err)
}
if err == io.EOF {
break
}
lineCount++
if lineCount > 1 {
countSend++
lineParseQ <- line[:len(line)-1] // Avoid last char '\n'
}
}
for i := 0; i < countSend; i++ {
fmt.Printf("Received %+v.\n", <-doneQ)
}
close(doneQ)
close(jobProcessQ)
close(lineParseQ)
Here's a simplified playground: https://play.golang.org/p/yz84g6CJraa
the workers look like this:
func lineToStructWorker(workerID int, lineQ <-chan string, strQ chan<- myStruct ) {
for j := range lineQ {
strQ <- lineToStruct(j) // just parses the csv to a struct...
}
}
func WorkerProcessStruct(workerID int, strQ <-chan myStruct, done chan<- myStruct) {
for a := range strQ {
time.Sleep(time.Millisecond * 500) // fake long operation...
done <- a
}
}
I know the problem is related to the "done" channel because if I don't use it, there's no error, but I can't figure out how to fix it.
You don't start reading from doneQ until you've finished sending all the lines to lineParseQ, which is more lines than there is buffer space. So once the doneQ buffer is full, that send blocks, which starts filling the lineParseQ buffer, and once that's full, it deadlocks. Move either the loop sending to lineParseQ, the loop reading from doneQ, or both, to separate goroutine(s), e.g.:
go func() {
for _, line := range lines {
countSend++
lineParseQ <- line
}
close(lineParseQ)
}()
This will still deadlock at the end, because you've got a range over a channel and the close after it in the same goroutine: range continues until the channel is closed, but the close is only reached after the range finishes. You need to put the closes in appropriate places: either in the sending goroutine, or, if there are multiple senders for a given channel, in a goroutine that blocks on a WaitGroup tracking those senders.
// Start line parsing workers and send to jobProcessQ
wg := new(sync.WaitGroup)
for i := 1; i <= 2; i++ {
	wg.Add(1)
	go lineToStructWorker(i, lineParseQ, jobProcessQ, wg)
}
// Process myStruct from jobProcessQ
processWG := new(sync.WaitGroup)
for i := 1; i <= 5; i++ {
	processWG.Add(1)
	go WorkerProcessStruct(i, jobProcessQ, doneQ, processWG)
}
countSend := 0
go func() {
	for _, line := range lines {
		countSend++
		lineParseQ <- line
	}
	close(lineParseQ)
}()
go func() {
	wg.Wait()
	close(jobProcessQ)
}()
go func() {
	// Close doneQ exactly once, after every processing worker has exited;
	// closing it inside each of the five workers would panic on the second close.
	processWG.Wait()
	close(doneQ)
}()
for a := range doneQ {
	fmt.Printf("Received %v.\n", a)
}
// ...
func lineToStructWorker(workerID int, lineQ <-chan string, strQ chan<- myStruct, wg *sync.WaitGroup) {
	for j := range lineQ {
		strQ <- lineToStruct(j) // just parses the csv to a struct...
	}
	wg.Done()
}
func WorkerProcessStruct(workerID int, strQ <-chan myStruct, done chan<- myStruct, wg *sync.WaitGroup) {
	defer wg.Done()
	for a := range strQ {
		time.Sleep(time.Millisecond * 500) // fake long operation...
		done <- a
	}
}
Full working example here: https://play.golang.org/p/XsnewSZeb2X
Coordinate the pipeline with sync.WaitGroup, breaking each piece into stages. When you know one piece of the pipeline is complete (and no one is writing to a particular channel any more), close the channel to instruct all "workers" reading from it to exit, e.g.:
var wg sync.WaitGroup
for i := 1; i <= 5; i++ {
i := i
wg.Add(1)
go func() {
Worker(i)
wg.Done()
}()
}
// wg.Wait() signals the above have completed
Buffered channels are handy to handle burst workloads, but sometimes they are used to avoid deadlocks in poor designs. If you want to avoid running certain parts of your pipeline in a goroutine you can buffer some channels (matching the number of workers typically) to avoid a blockage in your main goroutine.
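A small sketch of that last option: if the results channel has one slot per worker and each worker sends a single result, no send can block, so the main goroutine can wait first and drain afterwards. squareAll is an invented example, and one-result-per-worker is an assumption of this sketch rather than something from the pipeline above.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll starts one worker per input; each worker sends exactly one result.
func squareAll(inputs []int) int {
	results := make(chan int, len(inputs)) // one slot per worker: sends never block

	var wg sync.WaitGroup
	for _, v := range inputs {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			results <- v * v
		}(v)
	}

	wg.Wait()      // safe: no worker can be stuck on a full channel
	close(results) // all senders are done

	sum := 0
	for r := range results { // drains the buffered values, then stops
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // 30
}
```

With an unbuffered channel the same code would deadlock, since wg.Wait() would run before anything receives from results.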
If you have dependent pieces that read and write and you want to avoid deadlock, ensure they are in separate goroutines. Having each part of the pipeline in its own goroutine will even remove the need for buffered channels:
// putting all channel work into separate goroutines
// removes the need for buffered channels
lineParseQ := make(chan string, 0)
jobProcessQ := make(chan myStruct, 0)
doneQ := make(chan myStruct, 0)
It's a trade-off, of course: a goroutine starts with a stack of about 2 KB, whereas a small buffered channel costs much less. As with most designs, it depends on how it is used.
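Also don't get caught by the notorious Go for-loop gotcha: in Go versions before 1.22, all iterations share a single loop variable, so copy it inside the loop to avoid this:
for i := 1; i <= 5; i++ {
	i := i // new i (not the i above)
	go func() {
		myfunc(i) // otherwise (before Go 1.22) the goroutines share one i and will most likely all see its final value
	}()
}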
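An equivalent way to get a per-goroutine copy is to pass the loop variable as an argument, which behaves the same on every Go version. The sketch below uses an invented collectIDs helper; each goroutine writes to the slot named by its own copy, so a shared variable would leave slots unset.

```go
package main

import (
	"fmt"
	"sync"
)

// collectIDs starts n goroutines and records the value each one was given.
func collectIDs(n int) []int {
	seen := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) { // id is a copy made when the goroutine is started
			defer wg.Done()
			seen[id] = id + 1 // distinct index per goroutine, so no lock is needed
		}(i)
	}
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(collectIDs(5)) // [1 2 3 4 5]
}
```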
Finally ensure you wait for all results to be processed before exiting.
It's a common mistake to return from a channel based function and believe all results have been processed. In a service this will eventually be true. But in a standalone executable the processing loop may still be working on results.
go func() {
wgW.Wait() // waiting on worker goroutines to finish
close(doneQ) // safe to close results channel now
}()
// ensure we don't return until all results have been processed
for a := range doneQ {
fmt.Printf("Received %v.\n", a)
}
By processing the results in the main goroutine, we ensure we don't return prematurely without having processed everything.
Pulling it all together:
https://play.golang.org/p/MjLpQ5xglP3
I've been experimenting with goroutines and channels, and I wanted to test the WaitGroup feature. Here I'm trying to execute an HTTP flood job, where the parent thread spawns a lot of goroutines which will make infinite requests, unless receiving a stop message:
func (hf *HTTPFlood) Run() {
childrenStop := make(chan int, hf.ConcurrentCalls)
stop := false
totalRequests := 0
requestsChan := make(chan int)
totalErrors := 0
errorsChan := make(chan int)
var wg sync.WaitGroup
for i := 0; i < hf.ConcurrentCalls; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for {
select {
case <-childrenStop:
fmt.Printf("stop child\n")
return
default:
_, err := Request(hf.Victim.String())
requestsChan <- 1
if err != nil {
errorsChan <- 1
}
}
}
}()
}
timeout := time.NewTimer(time.Duration(MaximumJobTime) * time.Second)
for !stop {
select {
case req := <- requestsChan:
totalRequests += req
case err := <- errorsChan:
totalErrors += err
case <- timeout.C:
fmt.Printf("%s timed up\n", hf.Victim.String())
for i := 0; i < hf.ConcurrentCalls; i++ {
childrenStop <- 1
}
close(childrenStop)
stop = true
break
}
}
fmt.Printf("waiting\n")
wg.Wait()
fmt.Printf("after wait\n")
close(requestsChan)
close(errorsChan)
fmt.Printf("end\n")
}
Once the timeout fires, the parent successfully exits the loop and reaches the Wait call, but even though the childrenStop channel is filled, the child goroutines never seem to receive messages on it.
What am I missing?
EDIT:
So the issue was how the channels and their sends and receives were managed.
First of all, the childrenStop channel was closed before all children had received the message. The channel should be closed after the Wait.
In addition, since nothing read from requestsChan or errorsChan once the parent had sent the stop signal, most of the children stayed blocked sending on those two channels. I tried to keep reading in the parent, outside the loop just before the Wait, but that didn't work, so I switched the implementation to atomic counters, which seem a more suitable way to manage this specific use case.
func (hf *HTTPFlood) Run() {
childrenStop := make(chan int, hf.ConcurrentCalls)
var totalReq uint64
var totalErrors uint64
var wg sync.WaitGroup
for i := 0; i < hf.ConcurrentCalls; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for {
select {
case <-childrenStop:
fmt.Printf("stop child\n")
return
default:
_, err := Request(hf.Victim.String())
atomic.AddUint64(&totalReq, 1)
if err != nil {
atomic.AddUint64(&totalErrors, 1)
}
}
}
}()
}
timeout := time.NewTimer(time.Duration(MaximumJobTime) * time.Second)
<- timeout.C
fmt.Printf("%s timed up\n", hf.Victim.String())
for i := 0; i < hf.ConcurrentCalls; i++ {
childrenStop <- 1
}
fmt.Printf("waiting\n")
wg.Wait()
fmt.Printf("after wait\n")
close(childrenStop)
fmt.Printf("end\n")
}
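A variation worth knowing, though it is not what the code above does: closing a channel is observed by every receiver, so a single close(stop) can replace the loop that sends one value per child. The sketch below uses a made-up runFor helper and a counter plus a short sleep in place of the HTTP request.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runFor starts `workers` goroutines that loop until stop is closed and
// returns how many iterations they completed in total.
func runFor(workers int, d time.Duration) uint64 {
	stop := make(chan struct{})
	var total uint64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stop: // a closed channel is always ready to receive
					return
				default:
					atomic.AddUint64(&total, 1)  // stands in for one request
					time.Sleep(time.Millisecond) // pretend the request takes time
				}
			}
		}()
	}

	time.Sleep(d)
	close(stop) // one close is seen by every worker
	wg.Wait()
	return atomic.LoadUint64(&total)
}

func main() {
	fmt.Println("iterations:", runFor(4, 20*time.Millisecond))
}
```

The exact count varies from run to run; the point is only that wg.Wait() returns, because no worker is left blocked on a send.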
Your goroutines can be blocked at requestsChan <- 1.
case <- timeout.C:
fmt.Printf("%s timed up\n", hf.Victim.String())
for i := 0; i < hf.ConcurrentCalls; i++ {
childrenStop <- 1
}
close(childrenStop)
stop = true
break
Here you are sending a number to childrenStop and expecting the goroutines to receive it. But while you are sending the childrenStop signal, your goroutines could already be trying to send something on requestsChan. Because you break out of the loop after sending the stop signals, there is no one left listening on requestsChan to receive.
You can confirm this behaviour by printing something just before and after requestsChan <- 1.
A send on an unbuffered channel blocks until someone is receiving on the other end.
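One way to see that rule in isolation is a non-blocking send: a select with a default case takes the default branch exactly when the send would have blocked. trySend is an illustrative helper, not something from the code in the question.

```go
package main

import "fmt"

// trySend reports whether v could be sent on ch without blocking.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // no receiver is ready and there is no free buffer slot
	}
}

func main() {
	unbuffered := make(chan int)
	fmt.Println(trySend(unbuffered, 1)) // false: nobody is receiving

	buffered := make(chan int, 1)
	fmt.Println(trySend(buffered, 1)) // true: the buffer has room
	fmt.Println(trySend(buffered, 2)) // false: the buffer is full
}
```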
Here is a possible modification.
package main
import (
"fmt"
"time"
)
func main() {
requestsChan := make(chan int)
done := make(chan chan bool)
for i := 0; i < 5; i++ {
go func(it int) {
for {
select {
case c := <-done:
c <- true
return
default:
requestsChan <- it
}
}
}(i)
}
max := time.NewTimer(1 * time.Millisecond)
allChildrenDone := make(chan bool)
childrenDone := 0
childDone := make(chan bool)
go func() {
for {
select {
case i := <-requestsChan:
fmt.Printf("received %d;", i)
case <-max.C:
fmt.Println("\nTimeup")
for i := 0; i < 5; i++ {
go func() {
done <- childDone
fmt.Println("sent done")
}()
}
case <-childDone:
childrenDone++
fmt.Println("child done ", childrenDone)
if childrenDone == 5 {
allChildrenDone <- true
return
}
}
}
}()
fmt.Println("Waiting")
<-allChildrenDone
}
Note that I am sending the done signal from separate goroutines, so that the loop can keep receiving while I wait for all the children to exit cleanly.
Please watch this talk by Rob Pike which covers these details clearly.
[Edit]: The previous code would have resulted in a running routine after exiting.
I'm trying to execute things asynchronously with multiple goroutines. I pass in the number of "threads" to use to process the files. files is a slice of strings to process.
queue := make(chan string)
threadCount := c.Int("threads")
if c.Int("threads") < len(files) {
threadCount = len(files)
}
log.Infof("Starting %d processes", c.Int("threads"))
for i := 0; i < threadCount; i++ {
go renderGoRoutine(queue)
}
for _, f := range files {
queue <- f
}
close(queue)
And the routine itself looks like this:
func renderGoRoutine(queue chan string) {
for file := range queue {
// do some heavy lifting stuff with the file
}
}
This works fine when I use just one thread. As soon as I use more than one, the program exits before all the goroutines are done.
How do I make it process everything?
Previous question: Using a channel for dispatching tasks to go routine
Using WaitGroups is an option.
At the beginning, you add the number of tasks to the WaitGroup, and after each task is done you decrement the WaitGroup's counter. At the end of your code flow, wait until all tasks have finished.
See the example: https://godoc.org/sync#WaitGroup
Your code will look like this:
queue := make(chan string)
wg := sync.WaitGroup{}
wg.Add(len(files))
threadCount := c.Int("threads")
if c.Int("threads") < len(files) {
	threadCount = len(files)
}
log.Infof("Starting %d processes", c.Int("threads"))
for i := 0; i < threadCount; i++ {
	go renderGoRoutine(queue, &wg)
}
for _, f := range files {
	queue <- f
}
close(queue)
wg.Wait()
renderGoRoutine:
func renderGoRoutine(queue chan string, wg *sync.WaitGroup) {
	for file := range queue {
		// do some heavy lifting stuff with the file
		// decrement the waitGroup counter
		wg.Done()
	}
}
You are using the channel to publish work to be done. As soon as the last item is taken from the queue (not finished processing), your program exits.
You could use a channel to write to at the end of renderGoRoutine to signal the end of processing.
At the top:
sync := make(chan bool)
In renderGoRoutine at the end (Assuming it is in the same file):
sync <- true
At the bottom:
for range files {
<-sync
}
Now your program waits until all files have been processed.
Or to have a complete example:
queue := make(chan string)
sync := make(chan bool)
threadCount := c.Int("threads")
if c.Int("threads") < len(files) {
threadCount = len(files)
}
log.Infof("Starting %d processes", c.Int("threads"))
for i := 0; i < threadCount; i++ {
go renderGoRoutine(queue)
}
for _, f := range files {
queue <- f
}
close(queue)
for range files {
<-sync
}
And the routine should be changed like this:
func renderGoRoutine(queue chan string) {
for file := range queue {
// do some heavy lifting stuff with the file
sync <- true
}
}
I forgot to wait for all tasks to finish. This can be done simply by waiting for all the worker loops to end. Since close(channel) ends a for range over that channel, a simple synchronization channel can be used like so:
sync := make(chan bool)
queue := make(chan string)
threadCount := c.Int("threads")
if c.Int("threads") < len(files) {
threadCount = len(files)
}
log.Infof("Starting %d processes", c.Int("threads"))
for i := 0; i < threadCount; i++ {
go renderGoRoutine(queue)
}
for _, f := range files {
queue <- f
}
close(queue)
for i := 0; i < threadCount; i++ {
<- sync
}
And last but not least, write to the channel once the worker's loop has ended:
func renderGoRoutine(queue chan string) {
for file := range queue { //whatever is done here
}
sync <- true
}
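For comparison, the same wait can be expressed with a sync.WaitGroup that counts workers rather than files. This is only a sketch: processAll and the strings.ToUpper call are placeholders for the real rendering work, and the result slice exists just so the example has something to check.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// processAll hands files to `workers` goroutines (workers must be at least 1)
// and returns once every worker has finished.
func processAll(files []string, workers int) []string {
	queue := make(chan string)
	out := make([]string, 0, len(files))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done() // signals once, when this worker's loop has ended
			for file := range queue {
				rendered := strings.ToUpper(file) // placeholder for the heavy lifting
				mu.Lock()
				out = append(out, rendered)
				mu.Unlock()
			}
		}()
	}

	for _, f := range files {
		queue <- f
	}
	close(queue) // lets every worker's range loop finish
	wg.Wait()    // returns only after all workers are done
	return out
}

func main() {
	fmt.Println(len(processAll([]string{"a.txt", "b.txt", "c.txt"}, 2))) // 3
}
```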