Solving a data race in a benchmark function that simulates a stream - go

I'm trying to write a function that benchmarks streaming CSV to an HTTP endpoint.
To do this I want to generate data and POST it.
However, Go's data race detector reports a data race, and the benchmark finishes faster than seems reasonable, so I suspect the HTTP request is not fully processed.
How should I structure my test code to avoid this?
Is there a way to wait until the HTTP request has been processed?
func BenchmarkStream(b *testing.B) {
    header := "header\n"
    buf := bytes.NewBufferString(header)
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        for i := 0; i < b.N; i++ {
            buf.WriteString(fmt.Sprintf("%d\n", i+1))
        }
        wg.Done()
    }() // <-- this line is flagged by the data race detector
    w := httptest.NewRecorder()
    r, _ := http.NewRequest("POST", "/", buf)
    h := &MyHandler{}
    h.ServeHTTP(w, r)
    wg.Wait()
    if w.Code != 200 {
        b.Errorf("test failed")
    }
}
EDIT: Grzegorz Żur's comment made me question my approach to begin with, so I refactored it with an io.Pipe:
func BenchmarkStream(b *testing.B) {
    pr, pw := io.Pipe()
    go func() {
        defer pw.Close() // close the write end so the handler's read sees EOF
        pw.Write([]byte("header\n"))
        for i := 0; i < b.N; i++ {
            pw.Write([]byte(fmt.Sprintf("%d\n", i+1)))
        }
    }()
    w := httptest.NewRecorder()
    r, _ := http.NewRequest("POST", "/", pr)
    h := &MyHandler{}
    h.ServeHTTP(w, r)
    if w.Code != 200 {
        b.Errorf("test failed")
    }
}

You are sharing buf between two goroutines: the writer goroutine appends to it while the handler reads from it, and bytes.Buffer is not safe for concurrent use.
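The io.Pipe from your edit is the idiomatic way to stream generated data to a reader, since each write blocks until the handler reads it, so nothing is shared unsynchronized. Below is a minimal runnable sketch; the MyHandler here is a stand-in that simply consumes the body (your real handler differs), and the write end of the pipe is closed so the handler sees EOF:
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
)

// MyHandler is a stand-in for the handler under test: it counts the bytes streamed to it.
type MyHandler struct{}

func (h *MyHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    n, _ := io.Copy(io.Discard, r.Body)
    fmt.Fprintf(w, "read %d bytes", n)
}

func main() {
    pr, pw := io.Pipe()
    go func() {
        defer pw.Close() // without this the handler would block waiting for EOF
        io.WriteString(pw, "header\n")
        for i := 0; i < 1000; i++ {
            fmt.Fprintf(pw, "%d\n", i+1)
        }
    }()
    w := httptest.NewRecorder()
    r, _ := http.NewRequest("POST", "/", pr)
    (&MyHandler{}).ServeHTTP(w, r)
    fmt.Println(w.Code, w.Body.String())
}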

You're not going to get useful benchmark results if you only invoke the handler once. Build the request body once and then invoke your handler over and over again.
buf := &bytes.Buffer{}
buf.WriteString("header\n")
buf.WriteString(strings.Repeat("1\n", 1000))
body := buf.Bytes()
b.ResetTimer()
for i := 0; i < b.N; i++ {
    w := httptest.NewRecorder()
    r, err := http.NewRequest("POST", "/", bytes.NewReader(body))
    if err != nil {
        b.Fatal(err)
    }
    h := &MyHandler{}
    h.ServeHTTP(w, r)
    if w.Code != 200 {
        b.Errorf("test failed")
    }
}
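A side note on this fixed-body benchmark: calling b.SetBytes(int64(len(body))) before b.ResetTimer() makes go test -bench report throughput as well, and b.ReportAllocs() adds per-iteration allocation counts.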

Related

How to read a data file using golang?

I have a txt file:
2
Data 5 1.32
DataSecond 4 5.41
4
...
And so on. How do I read the first line to get the count, then split each of the following lines to get the individual parameters? I tried the following, but it is obviously wrong.
f, err := os.Open("DATA.txt")
check(err)
s := bufio.NewScanner(f)
for s.Scan() {
    line := s.Text()
    count, err := strconv.Atoi(line)
    check(err)
    for i := 0; i < count; i++ {
        testArray := strings.Fields(s.Text())
        for _, v := range testArray {
            fmt.Println(v)
        }
    }
}
You just forgot to Scan() inside the inner loop.
f, err := os.Open("DATA.txt")
check(err)
s := bufio.NewScanner(f)
for s.Scan() {
    line := s.Text()
    count, err := strconv.Atoi(line)
    check(err)
    for i := 0; i < count && s.Scan(); i++ {
        testArray := strings.Fields(s.Text())
        for _, v := range testArray {
            fmt.Println(v)
        }
    }
}
You could do something like this: read single lines into a count int and use it to drive the parsing state. While count is zero, treat the next line as a new count; while it is > 0, use fmt.Sscanf to get the other values:
func main() {
    f, err := os.Open("DATA.txt")
    check(err)
    s := bufio.NewScanner(f)
    count := 0
    for s.Scan() {
        line := s.Text()
        if count < 1 {
            count, err = strconv.Atoi(line)
            check(err)
            continue
        }
        count--
        var tag string
        var n int
        var f float64
        _, err = fmt.Sscanf(line, "%s %d %f", &tag, &n, &f)
        check(err)
        // not sure what you really want to do with the data!
        fmt.Println(n, f, tag)
    }
}

Why does this for loop never exit?

The following code results in an infinite loop. The values received from the channel are correct, and the value of the variable sum is also correct. All the goroutines finish without errors.
func responseHandler(w http.ResponseWriter, r *http.Request) {
    var c = make(chan string)
    for i := 0; i < 100; i++ {
        url := fmt.Sprintf("someurl/page%v/etc", i)
        go parse(url, i, c)
        if i%5 == 0 {
            time.Sleep(1000 * time.Millisecond)
        }
    }
    for range c {
        sum = append(sum, <-c)
    }
    fmt.Println("Exit from channel wait")
    fmt.Fprintln(w, sum)
}

func parse(url string, num int, c chan string) {
    response, err1 := http.Get(url)
    if err1 != nil {
        log.Fatal(err1)
    }
    defer response.Body.Close()
    if response.StatusCode != 200 {
        log.Fatalf("status code error: %d %s", response.StatusCode, response.Status)
    }
    res, err := DecodeHTMLBody(response.Body, "windows-1251")
    doc, err := goquery.NewDocumentFromReader(res)
    if err != nil {
        log.Fatal(err)
    }
    doc.Find(".b-advItem__content").Each(func(i int, s *goquery.Selection) {
        title := strings.TrimSpace(s.Find(".someclass").Text())
        price := strings.TrimSpace(s.Find(".someclass").Text())
        formatPrice := parsePrice(price)
        c <- fmt.Sprintf("output %d: %s:%s\n", i, title, formatPrice)
    })
    fmt.Printf("Channel %d - exit\n", num)
}
sum is a global []string.
A range over a channel exits only when the channel is closed (think about it: how else would range detect that there's no more data to fetch?), and nothing closes the channel in your code.
func responseHandler(w http.ResponseWriter, r *http.Request) {
    ...
    for s := range c {
        sum = append(sum, s)
        if len(aa) == 100 {
            close(c)
        }
    }
    fmt.Fprintln(w, sum)
}

func parse(...) {
    ...
    aa = append(aa, num)
}
Adding such a check allows you to exit the loop correctly.
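A more conventional fix is to have a sync.WaitGroup track the parse goroutines and close the channel once all of them have finished; the range then terminates on its own and no counting is needed inside the loop. A minimal runnable sketch, with stand-in workers in place of parse:
package main

import (
    "fmt"
    "sync"
)

func main() {
    c := make(chan string)
    var wg sync.WaitGroup

    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(num int) {
            defer wg.Done()
            // a real parse would send zero or more results here
            c <- fmt.Sprintf("output from page %d", num)
        }(i)
    }

    // Close the channel once every worker is done; the range below then exits.
    go func() {
        wg.Wait()
        close(c)
    }()

    var sum []string
    for s := range c {
        sum = append(sum, s)
    }
    fmt.Println("Exit from channel wait; received", len(sum), "results")
}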

How to limit goroutines

I'm developing a Gmail client based on the Google API.
I have a list of labels obtained through this call:
r, err := s.gClient.Service.Users.Labels.List(s.gClient.User).Do()
Then, for every label, I need to get its details:
for _, l := range r.Labels {
    d, err := s.gClient.Service.Users.Labels.Get(s.gClient.User, l.Id).Do()
}
I'd like to handle the loop in a more efficient way, so I have launched a goroutine for each iteration:
ch := make(chan label.Label)
for _, l := range r.Labels {
    go func(gmailLabels *gmailclient.Label, gClient *gmail.Client, ch chan<- label.Label) {
        d, err := gClient.Service.Users.Labels.Get(gClient.User, gmailLabels.Id).Do()
        if err != nil {
            panic(err)
        }
        // Performs some operation with the label `d`
        preparedLabel := ....
        ch <- preparedLabel
    }(l, s.gClient, ch)
}
for i := 0; i < len(r.Labels); i++ {
    lab := <-ch
    fmt.Printf("Processed %v\n", lab.LabelID)
}
The problem with this code is that the Gmail API has a rate limit, so I get this error:
panic: googleapi: Error 429: Too many concurrent requests for user, rateLimitExceeded
What is the correct way to handle this situation?
How about starting only, say, 10 worker goroutines and passing the values in from a for loop running in another goroutine? The channels have a small buffer to reduce synchronization time.
chIn := make(chan label.Label, 20)
chOut := make(chan label.Label, 20)
for i := 0; i < 10; i++ {
    go func(gClient *gmail.Client, chIn chan label.Label, chOut chan<- label.Label) {
        for gmailLabels := range chIn {
            d, err := gClient.Service.Users.Labels.Get(gClient.User, gmailLabels.Id).Do()
            if err != nil {
                panic(err)
            }
            // Performs some operation with the label `d`
            preparedLabel := ....
            chOut <- preparedLabel
        }
    }(s.gClient, chIn, chOut)
}
go func(chIn chan label.Label) {
    defer close(chIn)
    for _, l := range r.Labels {
        chIn <- l
    }
}(chIn)
for i := 0; i < len(r.Labels); i++ {
    lab := <-chOut
    fmt.Printf("Processed %v\n", lab.LabelID)
}
EDIT:
Here's a playground sample.
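Another common way to cap concurrency is a buffered channel used as a semaphore: each goroutine acquires a slot before calling the API and releases it when done. A minimal runnable sketch, where doWork stands in for the rate-limited Gmail call:
package main

import (
    "fmt"
    "sync"
    "time"
)

// doWork stands in for the rate-limited API call.
func doWork(id int) string {
    time.Sleep(100 * time.Millisecond)
    return fmt.Sprintf("label-%d", id)
}

func main() {
    const maxConcurrent = 10
    sem := make(chan struct{}, maxConcurrent) // semaphore with maxConcurrent slots
    var wg sync.WaitGroup

    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot; blocks while all slots are busy
            defer func() { <-sem }() // release the slot
            fmt.Println("Processed", doWork(id))
        }(i)
    }
    wg.Wait()
}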

Closure inconsistency between parallel and sequential execution

I have attempted to write a generic function that can execute functions in parallel or sequentially. While testing it, I found some very unexpected behavior regarding closures. In the code below, I define a list of functions that accept no parameters and return an error. The functions also use a for-loop variable in a closure, but I'm using the trick of defining a new variable within the loop in an attempt to avoid capture.
I'm expecting that I can call these functions sequentially or concurrently with the same effect, but I'm seeing different results. It's as if the closure variable is being captured, but only when run concurrently.
As far as I can tell, this is not the usual case of capturing a loop variable. As I mentioned, I'm defining a new variable within the loop. Also, I'm not running the closure function within the loop. I'm generating a list of functions within the loop but I'm executing the functions after the loop.
I'm using go version go1.8.3 linux/amd64.
package closure_test

import (
    "sync"
    "testing"
)

// MergeErrors merges multiple channels of errors.
// Based on https://blog.golang.org/pipelines.
func MergeErrors(cs ...<-chan error) <-chan error {
    var wg sync.WaitGroup
    out := make(chan error)
    // Start an output goroutine for each input channel in cs. output
    // copies values from c to out until c is closed, then calls wg.Done.
    output := func(c <-chan error) {
        for n := range c {
            out <- n
        }
        wg.Done()
    }
    wg.Add(len(cs))
    for _, c := range cs {
        go output(c)
    }
    // Start a goroutine to close out once all the output goroutines are
    // done. This must start after the wg.Add call.
    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}

// WaitForPipeline waits for results from all error channels.
// It returns early on the first error.
func WaitForPipeline(errs ...<-chan error) error {
    errc := MergeErrors(errs...)
    for err := range errc {
        if err != nil {
            return err
        }
    }
    return nil
}

func RunInParallel(funcs ...func() error) error {
    var errcList [](<-chan error)
    for _, f := range funcs {
        errc := make(chan error, 1)
        errcList = append(errcList, errc)
        go func() {
            err := f()
            if err != nil {
                errc <- err
            }
            close(errc)
        }()
    }
    return WaitForPipeline(errcList...)
}

func RunSequentially(funcs ...func() error) error {
    for _, f := range funcs {
        err := f()
        if err != nil {
            return err
        }
    }
    return nil
}

func validateOutputChannel(t *testing.T, out chan int, n int) {
    m := map[int]bool{}
    for i := 0; i < n; i++ {
        m[<-out] = true
    }
    if len(m) != n {
        t.Errorf("Output channel has %v unique items; wanted %v", len(m), n)
    }
}

// This fails because j is being captured.
func TestClosure1sp(t *testing.T) {
    n := 4
    out := make(chan int, n*2)
    var funcs [](func() error)
    for i := 0; i < n; i++ {
        j := i // define a new variable that has scope only inside the current loop iteration
        t.Logf("outer i=%v, j=%v", i, j)
        f := func() error {
            t.Logf("inner i=%v, j=%v", i, j)
            out <- j
            return nil
        }
        funcs = append(funcs, f)
    }
    t.Logf("Running funcs sequentially")
    if err := RunSequentially(funcs...); err != nil {
        t.Fatal(err)
    }
    validateOutputChannel(t, out, n)
    t.Logf("Running funcs in parallel")
    if err := RunInParallel(funcs...); err != nil {
        t.Fatal(err)
    }
    close(out)
    validateOutputChannel(t, out, n)
}
Below is the output from the test function above.
closure_test.go:91: outer i=0, j=0
closure_test.go:91: outer i=1, j=1
closure_test.go:91: outer i=2, j=2
closure_test.go:91: outer i=3, j=3
closure_test.go:99: Running funcs sequentially
closure_test.go:93: inner i=4, j=0
closure_test.go:93: inner i=4, j=1
closure_test.go:93: inner i=4, j=2
closure_test.go:93: inner i=4, j=3
closure_test.go:104: Running funcs in parallel
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:93: inner i=4, j=3
closure_test.go:80: Output channel has 1 unique items; wanted 4
Any ideas? Is this a bug in Go?
Always run your tests with -race. In your case, you forgot to recreate f on each iteration in RunInParallel:
func RunInParallel(funcs ...func() error) error {
    var errcList [](<-chan error)
    for _, f := range funcs {
        f := f // << HERE
        errc := make(chan error, 1)
        errcList = append(errcList, errc)
        go func() {
            err := f()
            if err != nil {
                errc <- err
            }
            close(errc)
        }()
    }
    return WaitForPipeline(errcList...)
}
As a result, you always launched the last f instead of each one.
I believe your problem lies in your RunInParallel func.
func RunInParallel(funcs ...func() error) error {
    var errcList [](<-chan error)
    for _, f := range funcs {
        errc := make(chan error, 1)
        errcList = append(errcList, errc)
        go func() {
            // This line probably isn't being reached until your range
            // loop has completed, meaning f is the last func by the time
            // each goroutine starts. If you capture f
            // in another variable inside the range, you won't have this issue.
            err := f()
            if err != nil {
                errc <- err
            }
            close(errc)
        }()
    }
    return WaitForPipeline(errcList...)
}
You could also pass f as a parameter to your anonymous function to avoid this issue.
for _, f := range funcs {
    errc := make(chan error, 1)
    errcList = append(errcList, errc)
    go func(g func() error) {
        err := g()
        if err != nil {
            errc <- err
        }
        close(errc)
    }(f)
}
Here is a live example in the playground.

Writing strings to a file from goroutines

I am using goroutines in my code as follows:
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
for i := 0; i < clvl; i++ {
    go func(i int) {
        f, err := os.Create(fmt.Sprintf("/tmp/sample_match_%d.csv", i))
        if nil != err {
            panic(err)
        }
        defer f.Close()
        w := bufio.NewWriter(f)
        for jdId := range work {
            for range itemIdList {
                w.WriteString("test")
            }
            w.Flush()
            c <- fmt.Sprintf("done %s", jdId)
        }
    }(i)
}
go func() {
    for _, jdId := range jdIdList {
        work <- jdId
    }
    close(work)
}()
for resp := range c {
    fmt.Println(resp)
}
This is OK, but can all the goroutines just write to one file, like this:
c := make(chan string)
work := make(chan string, 1000)
clvl := runtime.NumCPU()
f, err := os.Create("/tmp/sample_match.csv")
if nil != err {
    panic(err)
}
defer f.Close()
w := bufio.NewWriter(f)
for i := 0; i < clvl; i++ {
    go func(i int) {
        for jdId := range work {
            for range itemIdList {
                w.WriteString("test")
            }
            w.Flush()
            c <- fmt.Sprintf("done %s", jdId)
        }
    }(i)
}
This does not work; it fails with: panic: runtime error: slice bounds out of range
The bufio.Writer type does not support concurrent access. Protect it with a mutex.
Because the short strings are flushed on every write, there's no point in using a bufio.Writer. Write to the file directly (and protect it with a mutex).
There's no code to ensure that the goroutines complete before the file is closed or the program exits. Use a sync.WaitGroup.
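Putting those three points together, a minimal runnable sketch (jdIdList and the "test" payload are stand-ins for the real data):
package main

import (
    "fmt"
    "os"
    "runtime"
    "sync"
)

func main() {
    jdIdList := []string{"a", "b", "c", "d"} // stand-in input data

    f, err := os.Create("/tmp/sample_match.csv")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    var mu sync.Mutex // serializes writes to the shared file
    var wg sync.WaitGroup
    work := make(chan string, 1000)

    for i := 0; i < runtime.NumCPU(); i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for jdId := range work {
                line := fmt.Sprintf("%s,test\n", jdId)
                mu.Lock()
                f.WriteString(line) // write directly; no bufio.Writer needed
                mu.Unlock()
                fmt.Println("done", jdId)
            }
        }()
    }

    for _, jdId := range jdIdList {
        work <- jdId
    }
    close(work)
    wg.Wait() // all writes finish before main returns and f is closed
}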
