Closure inconsistency between parallel and sequential execution - go

I have attempted to write a generic function that can execute functions either in parallel or sequentially. While testing it, I found some very unexpected behavior involving closures. In the code below, I define a list of functions that take no parameters and return an error. Each function closes over a for-loop variable, but I use the trick of defining a new variable inside the loop to avoid capturing the loop variable itself.
I expect to be able to call these functions sequentially or concurrently with the same effect, but I'm seeing different results. It's as if the closure variable is being captured, but only when the functions are run concurrently.
As far as I can tell, this is not the usual case of capturing a loop variable. As mentioned, I define a new variable within the loop. Also, I'm not running the closures within the loop: I build the list of functions inside the loop, but I execute them after the loop.
I'm using go version go1.8.3 linux/amd64.
package closure_test

import (
	"sync"
	"testing"
)

// MergeErrors merges multiple channels of errors.
// Based on https://blog.golang.org/pipelines.
func MergeErrors(cs ...<-chan error) <-chan error {
	var wg sync.WaitGroup
	out := make(chan error)
	// Start an output goroutine for each input channel in cs. output
	// copies values from c to out until c is closed, then calls wg.Done.
	output := func(c <-chan error) {
		for n := range c {
			out <- n
		}
		wg.Done()
	}
	wg.Add(len(cs))
	for _, c := range cs {
		go output(c)
	}
	// Start a goroutine to close out once all the output goroutines are
	// done. This must start after the wg.Add call.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// WaitForPipeline waits for results from all error channels.
// It returns early on the first error.
func WaitForPipeline(errs ...<-chan error) error {
	errc := MergeErrors(errs...)
	for err := range errc {
		if err != nil {
			return err
		}
	}
	return nil
}
func RunInParallel(funcs ...func() error) error {
	var errcList [](<-chan error)
	for _, f := range funcs {
		errc := make(chan error, 1)
		errcList = append(errcList, errc)
		go func() {
			err := f()
			if err != nil {
				errc <- err
			}
			close(errc)
		}()
	}
	return WaitForPipeline(errcList...)
}

func RunSequentially(funcs ...func() error) error {
	for _, f := range funcs {
		err := f()
		if err != nil {
			return err
		}
	}
	return nil
}

func validateOutputChannel(t *testing.T, out chan int, n int) {
	m := map[int]bool{}
	for i := 0; i < n; i++ {
		m[<-out] = true
	}
	if len(m) != n {
		t.Errorf("Output channel has %v unique items; wanted %v", len(m), n)
	}
}
// This fails because j is being captured.
func TestClosure1sp(t *testing.T) {
	n := 4
	out := make(chan int, n*2)
	var funcs [](func() error)
	for i := 0; i < n; i++ {
		j := i // define a new variable that has scope only inside the current loop iteration
		t.Logf("outer i=%v, j=%v", i, j)
		f := func() error {
			t.Logf("inner i=%v, j=%v", i, j)
			out <- j
			return nil
		}
		funcs = append(funcs, f)
	}
	t.Logf("Running funcs sequentially")
	if err := RunSequentially(funcs...); err != nil {
		t.Fatal(err)
	}
	validateOutputChannel(t, out, n)
	t.Logf("Running funcs in parallel")
	if err := RunInParallel(funcs...); err != nil {
		t.Fatal(err)
	}
	close(out)
	validateOutputChannel(t, out, n)
}
Below is the output from the test function above.
closure_test.go:91: outer i=0, j=0
closure_test.go:91: outer i=1, j=1
closure_test.go:91: outer i=2, j=2
closure_test.go:91: outer i=3, j=3
closure_test.go:99: Running funcs sequentially
closure_test.go:93: inner i=4, j=0
closure_test.go:93: inner i=4, j=1
closure_test.go:93: inner i=4, j=2
closure_test.go:93: inner i=4, j=3
closure_test.go:104: Running funcs in parallel
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:80: Output channel has 1 unique items; wanted 4
Any ideas? Is this a bug in Go?

Always run your tests with -race. In your case, you forgot to recreate f on each iteration in RunInParallel:
func RunInParallel(funcs ...func() error) error {
	var errcList [](<-chan error)
	for _, f := range funcs {
		f := f // << HERE
		errc := make(chan error, 1)
		errcList = append(errcList, errc)
		go func() {
			err := f()
			if err != nil {
				errc <- err
			}
			close(errc)
		}()
	}
	return WaitForPipeline(errcList...)
}
As a result, you always launched the last f instead of each one.
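For illustration, here is a stripped-down sketch of the same capture-and-fix outside the pipeline machinery (purely illustrative, not part of the original test). Without the f := f line, every goroutine calls whatever f happens to hold when it finally runs, which on the asker's Go version (1.8) is typically the last function; with the line, each goroutine gets its own copy.
package main

import (
	"fmt"
	"sync"
)

func main() {
	var funcs []func()
	for i := 0; i < 4; i++ {
		j := i // per-iteration copy; this part of the asker's code is fine
		funcs = append(funcs, func() { fmt.Println("j =", j) })
	}

	var wg sync.WaitGroup
	for _, f := range funcs {
		f := f // without this shadowing, every goroutine would call the same f
		wg.Add(1)
		go func() {
			defer wg.Done()
			f()
		}()
	}
	wg.Wait()
}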

I believe your problem lies in your RunInParallel func.
func RunInParallel(funcs ...func() error) error {
	var errcList [](<-chan error)
	for _, f := range funcs {
		errc := make(chan error, 1)
		errcList = append(errcList, errc)
		go func() {
			// This line probably isn't being reached until your range
			// loop has completed, meaning f is the last func by the time
			// each goroutine starts. If you capture f
			// in another variable inside the range, you won't have this issue.
			err := f()
			if err != nil {
				errc <- err
			}
			close(errc)
		}()
	}
	return WaitForPipeline(errcList...)
}
You could also pass f as a parameter to your anonymous function to avoid this issue.
for _, f := range funcs {
	errc := make(chan error, 1)
	errcList = append(errcList, errc)
	go func(g func() error) {
		err := g()
		if err != nil {
			errc <- err
		}
		close(errc)
	}(f)
}
Here is a live example in the playground.

Related

Concurrency not running any faster [closed]

I have written some code and tried to use concurrency, but it isn't running any faster. How can I improve that?
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"sync"
)

var wg sync.WaitGroup

func checkerr(e error) {
	if e != nil {
		fmt.Println(e)
	}
}

func readFile() {
	file, err := os.Open("data.txt")
	checkerr(err)
	fres, err := os.Create("resdef.txt")
	checkerr(err)
	defer file.Close()
	defer fres.Close()
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		wg.Add(1)
		go func() {
			words := strings.Fields(scanner.Text())
			shellsort(words)
			writeToFile(fres, words)
			wg.Done()
		}()
		wg.Wait()
	}
}
func shellsort(words []string) {
	for inc := len(words) / 2; inc > 0; inc = (inc + 1) * 5 / 11 {
		for i := inc; i < len(words); i++ {
			j, temp := i, words[i]
			for ; j >= inc && strings.ToLower(words[j-inc]) > strings.ToLower(temp); j -= inc {
				words[j] = words[j-inc]
			}
			words[j] = temp
		}
	}
}

func writeToFile(f *os.File, words []string) {
	datawriter := bufio.NewWriter(f)
	for _, s := range words {
		datawriter.WriteString(s + " ")
	}
	datawriter.WriteString("\n")
	datawriter.Flush()
}

func main() {
	readFile()
}
Everything works, except that it takes the same amount of time as it does without concurrency.
You must place wg.Wait() after the for loop:
for condition {
	wg.Add(1)
	go func() {
		// a concurrent job here
		wg.Done()
	}()
}
wg.Wait()
Note: the work itself should have a concurrent nature.
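To make the shape of the fix concrete, here is a self-contained sketch of the corrected pattern, with a hypothetical squaring step standing in for the real per-line work (illustrative only, not the asker's file processing):
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	results := make(chan int, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) { // n is passed in so each goroutine works on its own value
			defer wg.Done()
			results <- n * n // stand-in for the real concurrent work
		}(i)
	}
	wg.Wait() // wait once, after all goroutines have been launched
	close(results)
	for r := range results {
		fmt.Println(r)
	}
}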
Here is my tested solution: read from the input file sequentially, then do n concurrent sort tasks, and finally write to the output file sequentially and in order. Try this:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"runtime"
	"sort"
	"strings"
	"sync"
)

type sortQueue struct {
	index int
	data  []string
}

func main() {
	n := runtime.NumCPU()
	a := make(chan sortQueue, n)
	b := make(chan sortQueue, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go parSort(a, b, &wg)
	}
	go func() {
		file, err := os.Open("data.txt")
		if err != nil {
			log.Fatal(err)
		}
		defer file.Close()
		scanner := bufio.NewScanner(file)
		i := 0
		for scanner.Scan() {
			a <- sortQueue{index: i, data: strings.Fields(scanner.Text())}
			i++
		}
		close(a)
		err = scanner.Err()
		if err != nil {
			log.Fatal(err)
		}
	}()
	fres, err := os.Create("resdef.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer fres.Close()
	go func() {
		wg.Wait()
		close(b)
	}()
	writeToFile(fres, b, n)
}
func writeToFile(f *os.File, b chan sortQueue, n int) {
	m := make(map[int][]string, n)
	order := 0
	for v := range b {
		m[v.index] = v.data
		var slice []string
		exist := true
		for exist {
			slice, exist = m[order]
			if exist {
				delete(m, order)
				order++
				s := strings.Join(slice, " ")
				fmt.Println(s)
				_, err := f.WriteString(s + "\n")
				if err != nil {
					log.Fatal(err)
				}
			}
		}
	}
}

func parSort(a, b chan sortQueue, wg *sync.WaitGroup) {
	defer wg.Done()
	for q := range a {
		sort.Slice(q.data, func(i, j int) bool { return q.data[i] < q.data[j] })
		b <- q
	}
}
data.txt file:
1 2 0 3
a 1 b d 0 c
aa cc bb
Output:
0 1 2 3
0 1 a b c d
aa bb cc
You're not parallelizing anything, because for every call to wg.Add(1) you have a matching call to wg.Wait(). It's one to one: you spawn a goroutine, and then immediately block the main goroutine waiting for the newly spawned one to finish.
The point of a WaitGroup is to wait for many things to finish, with a single call to wg.Wait() once all the goroutines have been spawned.
However, in addition to fixing your call to wg.Wait, you need to control concurrent access to your scanner. One approach might be to use a channel through which your scanner emits lines of text to waiting goroutines:
lines := make(chan string)
go func() {
	for line := range lines {
		go func(line string) {
			words := strings.Fields(line)
			shellsort(words)
			writeToFile(fres, words)
		}(line)
	}
}()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	lines <- scanner.Text()
}
close(lines)
Note that this may lead to garbled output in your file, as many concurrent goroutines are all writing their results at the same time. You can control the output through a second channel:
lines := make(chan string)
out := make(chan []string)
go func() {
	for line := range lines {
		go func(line string) {
			words := strings.Fields(line)
			shellsort(words)
			out <- words
		}(line)
	}
}()
go func() {
	for words := range out {
		writeToFile(fres, words)
	}
}()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	lines <- scanner.Text()
}
close(lines)
close(out)
At this point, you can refactor into a "reader", a "processor", and a "writer", which form a pipeline that communicates via channels.
The reader and writer each use a single goroutine to prevent concurrent access to a resource, while the processor spawns many goroutines (currently unbounded) to "fan out" the work across many processors:
package main

import (
	"bufio"
	"os"
	"strings"
	"sync"
)

func main() {
	lines := reader()
	out := processor(lines)
	writer(out)
}

func reader() <-chan string {
	lines := make(chan string)
	file, err := os.Open("data.txt")
	checkerr(err)
	go func() {
		scanner := bufio.NewScanner(file)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		close(lines)
	}()
	return lines
}

func processor(lines <-chan string) <-chan []string {
	out := make(chan []string)
	go func() {
		var wg sync.WaitGroup
		for line := range lines {
			wg.Add(1)
			go func(line string) {
				defer wg.Done()
				words := strings.Fields(line)
				shellsort(words)
				out <- words
			}(line)
		}
		wg.Wait() // close out only after every worker has sent its result
		close(out)
	}()
	return out
}

func writer(out <-chan []string) {
	fres, err := os.Create("resdef.txt")
	checkerr(err)
	for words := range out {
		writeToFile(fres, words)
	}
}
As other answers have said, by waiting on the WaitGroup each loop iteration, you're limiting your concurrency to 1 (no concurrency). There are a number of ways to solve this, but what's correct depends entirely on what is taking time, and that hasn't been shown in the question. Concurrency doesn't magically make things faster; it just lets things happen at the same time, which only makes things faster if things that take a lot of time can happen concurrently.
Presumably, in your code, the thing that takes a long time is the sort. If that is the case, you could do something like this:
results := make(chan []string)
for scanner.Scan() {
	wg.Add(1)
	go func(line string) {
		defer wg.Done()
		words := strings.Fields(line)
		shellsort(words)
		results <- words
	}(scanner.Text())
}
go func() {
	wg.Wait()
	close(results)
}()
for words := range results {
	writeToFile(fres, words)
}
This moves the Wait to where it should be, and avoids concurrent use of the scanner and writer. This should be faster than serial processing, if the sort is taking a significant amount of processing time.

Slice automatically be sorted?

While writing my own pipeline to practice with goroutines, I ran into something particularly weird.
I use rand.Perm to generate some ints in random order, write them to an io.Writer, and then read them back from an io.Reader. Since it's a binary source, I print them out afterwards, and they come out sorted!
Here's the code:
func RandomSource(tally int) chan int {
	out := make(chan int)
	sli := rand.Perm(tally)
	fmt.Println(sli)
	go func() {
		for num := range sli {
			out <- num
		}
		close(out)
	}()
	return out
}

func ReaderSource(reader io.Reader) chan int {
	out := make(chan int)
	go func() {
		buffer := make([]byte, 8)
		for {
			n, err := reader.Read(buffer)
			if n > 0 {
				v := int(binary.BigEndian.Uint64(buffer))
				out <- v
			}
			if err != nil {
				break
			}
		}
		close(out)
	}()
	return out
}
func WriterSink(writer io.Writer, in chan int) {
	for v := range in {
		buffer := make([]byte, 8)
		binary.BigEndian.PutUint64(buffer, uint64(v))
		writer.Write(buffer)
	}
}

func main() {
	fileName := "small.in"
	file, err := os.Create(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	p := RandomSource(500)
	WriterSink(file, p)
	file, err = os.Open(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	p = ReaderSource(file)
	for v := range p {
		fmt.Println(v)
	}
}
range returns an index as the first value for an array or slice, which always goes from 0 up to len - 1. Use for _, num := range sli { if you want to iterate over the values themselves rather than the set of indices.
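A minimal sketch of the difference (illustrative only). Because RandomSource iterates with for num := range sli, it sends the indices 0 through tally-1 rather than the shuffled values, which is why the output appears sorted.
package main

import "fmt"

func main() {
	sli := []int{42, 7, 19}
	for num := range sli {
		fmt.Println(num) // indices: 0, 1, 2 - always in ascending order
	}
	for _, num := range sli {
		fmt.Println(num) // values: 42, 7, 19
	}
}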

Why does this for loop never exit?

The following code leads to an infinite loop. The values received from the channel are correct, and the value of the variable sum is also correct. All the goroutines finish without errors.
func responseHandler(w http.ResponseWriter, r *http.Request) {
	var c = make(chan string)
	for i := 0; i < 100; i++ {
		url := fmt.Sprintf("someurl/page%v/etc", i)
		go parse(url, i, c)
		if i%5 == 0 {
			time.Sleep(1000 * time.Millisecond)
		}
	}
	for range c {
		sum = append(sum, <-c)
	}
	fmt.Println("Exit from channel wait")
	fmt.Fprintln(w, sum)
}
func parse(url string, num int, c chan string) {
	response, err1 := http.Get(url)
	if err1 != nil {
		log.Fatal(err1)
	}
	defer response.Body.Close()
	if response.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", response.StatusCode, response.Status)
	}
	res, err := DecodeHTMLBody(response.Body, "windows-1251")
	doc, err := goquery.NewDocumentFromReader(res)
	if err != nil {
		log.Fatal(err)
	}
	doc.Find(".b-advItem__content").Each(func(i int, s *goquery.Selection) {
		title := strings.TrimSpace(s.Find(".someclass").Text())
		price := strings.TrimSpace(s.Find(".someclass").Text())
		formatPrice := parsePrice(price)
		c <- fmt.Sprintf("output %d: %s:%s\n", i, title, formatPrice)
	})
	fmt.Printf("Channel %d - exit\n", num)
}
sum is a global []string.
The range statement over a channel exits only when the channel is closed (well, think about it: how else would the range detect that there is no more data to fetch?), and nothing closes the channel in your code.
func responseHandler(w http.ResponseWriter, r *http.Request) {
	...
	for range c {
		sum = append(sum, <-c)
		if len(aa) == 100 {
			close(c)
		}
	}
	fmt.Fprintln(w, sum)
}
func parse(...) {
	...
	aa = append(aa, num)
}
Adding such a check allows you to exit the loop correctly.
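A common alternative (a sketch, not part of the original answer): have each parse goroutine signal a sync.WaitGroup when it finishes, and close the channel from a separate goroutine once they are all done, so the range loop terminates on its own without counting results. This also reads each value exactly once, via the range variable, instead of receiving from the channel again inside the loop body.
var wg sync.WaitGroup
c := make(chan string)
for i := 0; i < 100; i++ {
	url := fmt.Sprintf("someurl/page%v/etc", i)
	wg.Add(1)
	go func(url string, num int) {
		defer wg.Done()
		parse(url, num, c)
	}(url, i)
}
go func() {
	wg.Wait()
	close(c) // the range below exits once every producer has finished
}()
for s := range c {
	sum = append(sum, s)
}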

how to limit goroutine

I'm developing a Gmail client based on the Google API.
I have a list of labels obtained through this call
r, err := s.gClient.Service.Users.Labels.List(s.gClient.User).Do()
Then, for every label, I need to get its details:
for _, l := range r.Labels {
	d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
}
I'd like to handle the loop in a more powerful way so I have implemented a goroutine in the loop:
ch := make(chan label.Label)
for _, l := range r.Labels {
	go func(gmailLabels *gmailclient.Label, gClient *gmail.Client, ch chan<- label.Label) {
		d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
		if err != nil {
			panic(err)
		}
		// Performs some operation with the label `d`
		preparedLabel := ....
		ch <- preparedLabel
	}(l, s.gClient, ch)
}
for i := 0; i < len(r.Labels); i++ {
	lab := <-ch
	fmt.Printf("Processed %v\n", lab.LabelID)
}
The problem with this code is that gmail api has a rate limit, so, I get this error:
panic: googleapi: Error 429: Too many concurrent requests for user, rateLimitExceeded
What is the correct way to handle this situation?
How about starting only, say, 10 goroutines and passing the values in from a for loop running in another goroutine? The channels have a small buffer to decrease synchronisation time.
chIn := make(chan *gmailclient.Label, 20)
chOut := make(chan label.Label, 20)
for i := 0; i < 10; i++ {
	go func(gClient *gmail.Client, chIn chan *gmailclient.Label, chOut chan<- label.Label) {
		for gmailLabels := range chIn {
			d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, gmailLabels.Id).Do()
			if err != nil {
				panic(err)
			}
			// Performs some operation with the label `d`
			preparedLabel := ....
			chOut <- preparedLabel
		}
	}(s.gClient, chIn, chOut)
}
go func(chIn chan *gmailclient.Label) {
	defer close(chIn)
	for _, l := range r.Labels {
		chIn <- l
	}
}(chIn)
for i := 0; i < len(r.Labels); i++ {
	lab := <-chOut
	fmt.Printf("Processed %v\n", lab.LabelID)
}
EDIT:
Here is a playground sample.
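Another common way to cap concurrency (a sketch with a hypothetical fetchLabel function standing in for the rate-limited Gmail API call; not part of the original answer) is to launch one goroutine per item but gate them with a buffered channel used as a counting semaphore:
package main

import (
	"fmt"
	"sync"
)

// fetchLabel is a hypothetical stand-in for the rate-limited API call.
func fetchLabel(id int) string {
	return fmt.Sprintf("label-%d", id)
}

func main() {
	const maxInFlight = 10
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	results := make(chan string, 100)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire one of maxInFlight slots
			defer func() { <-sem }() // release the slot when done
			results <- fetchLabel(id)
		}(i)
	}
	wg.Wait()
	close(results)
	for r := range results {
		fmt.Println(r)
	}
}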

write string to file in goroutine

I am using goroutines in my code as follows:
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
for i := 0; i < clvl; i++ {
	go func(i int) {
		f, err := os.Create(fmt.Sprintf("/tmp/sample_match_%d.csv", i))
		if nil != err {
			panic(err)
		}
		defer f.Close()
		w := bufio.NewWriter(f)
		for jdId := range work {
			for _, itemId := range itemIdList {
				w.WriteString("test")
			}
			w.Flush()
			c <- fmt.Sprintf("done %s", jdId)
		}
	}(i)
}
go func() {
	for _, jdId := range jdIdList {
		work <- jdId
	}
	close(work)
}()
for resp := range c {
	fmt.Println(resp)
}
This works, but can all the goroutines just write to one file, like this?
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
f, err := os.Create("/tmp/sample_match_%d.csv")
if nil != err {
	panic(err)
}
defer f.Close()
w := bufio.NewWriter(f)
for i := 0; i < clvl; i++ {
	go func(i int) {
		for jdId := range work {
			for _, itemId := range itemIdList {
				w.WriteString("test")
			}
			w.Flush()
			c <- fmt.Sprintf("done %s", jdId)
		}
	}(i)
}
This does not work; it fails with: panic: runtime error: slice bounds out of range
The bufio.Writer type does not support concurrent access. Protect it with a mutex.
Because the short strings are flushed on every write, there's no point in using a bufio.Writer. Write to the file directly (and protect it with a mutex).
There's no code to ensure that the goroutines complete before the file is closed or the program exits. Use a sync.WaitGroup.
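Putting those three points together, here is a minimal sketch (illustrative only; jdIdList and the "test" payload are hypothetical stand-ins for the real data):
package main

import (
	"fmt"
	"os"
	"runtime"
	"sync"
)

func main() {
	jdIdList := []string{"a", "b", "c", "d"} // hypothetical input
	work := make(chan string, len(jdIdList))
	for _, jdId := range jdIdList {
		work <- jdId
	}
	close(work)

	f, err := os.Create("/tmp/sample_match.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var mu sync.Mutex // serializes writes to the shared file
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for jdId := range work {
				line := fmt.Sprintf("test %s\n", jdId) // stand-in for the real row
				mu.Lock()
				_, werr := f.WriteString(line) // write directly; no bufio.Writer needed
				mu.Unlock()
				if werr != nil {
					panic(werr)
				}
			}
		}()
	}
	wg.Wait() // every goroutine finishes before the deferred f.Close runs
}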
