Why does for loop with goroutines result in missing data - go

Ok, so I have two bits of code. First off is a simple for loop that works great:
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "os"

    elasticsearch "github.com/elastic/go-elasticsearch/v7"
    "github.com/elastic/go-elasticsearch/v7/esapi"
    "github.com/mitchellh/mapstructure"
)

type Esindices struct {
    Health       string `json:"health"`
    Status       string `json:"status"`
    Index        string `json:"index"`
    Uuid         string `json:"uuid"`
    Pri          string `json:"pri"`
    Rep          string `json:"rep"`
    DocsCount    string `json:"docs.count"`
    DocsDeleted  string `json:"docs.deleted"`
    StoreSize    string `json:"store.size"`
    PriStoreSize string `json:"pri.store.size"`
}

func main() {
    var r []map[string]interface{}
    es, err := elasticsearch.NewDefaultClient()
    if err != nil {
        log.Fatalf("Error creating client: %s", err)
    }
    req := esapi.CatIndicesRequest{
        Format: "json",
        Pretty: false,
    }
    res, err := req.Do(context.Background(), es)
    if err != nil {
        log.Fatalf("Error getting response: %s", err)
    }
    defer res.Body.Close()
    if err := json.NewDecoder(res.Body).Decode(&r); err != nil {
        log.Printf("Error parsing the response body: %s", err)
    }
    indexSlice := make([]*Esindices, len(r))
    for i, element := range r {
        result := &Esindices{}
        cfg := &mapstructure.DecoderConfig{
            Metadata: nil,
            Result:   &result,
            TagName:  "json",
        }
        decoder, _ := mapstructure.NewDecoder(cfg)
        decoder.Decode(element)
        indexSlice[i] = result
    }
    thisisjson, err := json.MarshalIndent(indexSlice, "", " ")
    if err != nil {
        log.Fatal("Can't encode to JSON", err)
    }
    fmt.Fprintf(os.Stdout, "%s", thisisjson)
}
Most of this is pretty self-explanatory, but to clarify: I am using the Elasticsearch client and the cat.indices API to get a list of all the indices in a local Elasticsearch install, store them as a slice of map[string]interface{}, and then loop over that to fill a slice of result structs. This works fine, actually, but I want to be mindful of performance, and while I can't improve the latency of the request itself, I should at least be able to improve the performance of the loop.
So when I try the below instead I get weird results.
var wg sync.WaitGroup
defer wg.Wait()
for i, element := range r {
    wg.Add(1)
    go func(i int, element map[string]interface{}) {
        defer wg.Done()
        result := Esindices{}
        cfg := &mapstructure.DecoderConfig{
            Metadata: nil,
            Result:   &result,
            TagName:  "json",
        }
        decoder, _ := mapstructure.NewDecoder(cfg)
        decoder.Decode(element)
        indexSlice[i] = result
    }(i, element)
}
The issue is, specifically, that some of the values of the keys of the elements in the slice are empty. This makes me think the code is adding the results to the slice, but execution moves on even when the goroutines aren't done.
Thoughts?

Instead of defer wg.Wait, use wg.Wait at the end of the for-loop. You use the data constructed by the goroutines right after the for-loop completes, and you're not waiting for all the goroutines to finish before you use that data.
When you use defer wg.Wait, the waiting happens at the end of the function, so the code after the for-loop operates on incomplete data while the goroutines are still running.
When you use wg.Wait at the end of the for-loop, you first wait for all the goroutines to end, and then use the data generated by them.
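A minimal sketch of that fix, assuming the same r and indexSlice from the question; the only change is that wg.Wait() now runs before indexSlice is used:
var wg sync.WaitGroup
for i, element := range r {
    wg.Add(1)
    go func(i int, element map[string]interface{}) {
        defer wg.Done()
        result := &Esindices{}
        cfg := &mapstructure.DecoderConfig{
            Metadata: nil,
            Result:   &result,
            TagName:  "json",
        }
        decoder, _ := mapstructure.NewDecoder(cfg)
        decoder.Decode(element)
        indexSlice[i] = result // each goroutine writes a distinct index, so no extra locking is needed
    }(i, element)
}
wg.Wait() // all goroutines are done; indexSlice is now safe to marshal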

Related

Trouble figuring out data race in goroutine

I started learning Go recently and I've been chipping away at this for a while now, but figured it was time to ask for some specific help. My program requests paginated data from an API, and there are about 160 pages of data, which seems like a good use of goroutines, except I have race conditions and I can't seem to figure out why. It's probably because I'm new to the language, but my impression was that parameters of a function are passed as a copy of the data in the calling function, unless it's a pointer.
According to what I think I know, this should be making copies of my data, which leaves me free to change it in the main function, but I end up requesting some pages multiple times and other pages just once.
My main.go
package main

import (
    "bufio"
    "encoding/json"
    "log"
    "net/http"
    "net/url"
    "os"
    "strconv"
    "sync"

    "github.com/joho/godotenv"
)

func main() {
    err := godotenv.Load()
    if err != nil {
        log.Fatalln(err)
    }
    httpClient := &http.Client{}
    baseURL := "https://api.data.gov/ed/collegescorecard/v1/schools.json"
    filters := make(map[string]string)
    page := 0
    filters["school.degrees_awarded.predominant"] = "2,3"
    filters["fields"] = "id,school.name,school.city,2018.student.size,2017.student.size,2017.earnings.3_yrs_after_completion.overall_count_over_poverty_line,2016.repayment.3_yr_repayment.overall"
    filters["api_key"] = os.Getenv("API_KEY")
    outFile, err := os.Create("./out.txt")
    if err != nil {
        log.Fatalln(err)
    }
    writer := bufio.NewWriter(outFile)
    requestURL := getRequestURL(baseURL, filters)
    response := requestData(requestURL, httpClient)
    wg := sync.WaitGroup{}
    for (page+1)*response.Metadata.ResultsPerPage < response.Metadata.TotalResults {
        page++
        filters["page"] = strconv.Itoa(page)
        wg.Add(1)
        go func() {
            defer wg.Done()
            requestURL := getRequestURL(baseURL, filters)
            response := requestData(requestURL, httpClient)
            _, err = writer.WriteString(response.TextOutput())
            if err != nil {
                log.Fatalln(err)
            }
        }()
    }
    wg.Wait()
}

func getRequestURL(baseURL string, filters map[string]string) *url.URL {
    requestURL, err := url.Parse(baseURL)
    if err != nil {
        log.Fatalln(err)
    }
    query := requestURL.Query()
    for key, value := range filters {
        query.Set(key, value)
    }
    requestURL.RawQuery = query.Encode()
    return requestURL
}

func requestData(url *url.URL, httpClient *http.Client) CollegeScoreCardResponseDTO {
    request, _ := http.NewRequest(http.MethodGet, url.String(), nil)
    resp, err := httpClient.Do(request)
    if err != nil {
        log.Fatalln(err)
    }
    defer resp.Body.Close()
    var parsedResponse CollegeScoreCardResponseDTO
    err = json.NewDecoder(resp.Body).Decode(&parsedResponse)
    if err != nil {
        log.Fatalln(err)
    }
    return parsedResponse
}
I know another issue I will be running into is writing to the output file in the correct order, but I believe using channels to tell each routine which request finished writing could solve that. If I'm incorrect about that, I would appreciate any advice on how to approach it as well.
Thanks in advance.
Goroutines do not receive copies of data. When the compiler detects that a variable "escapes" the current function, it allocates that variable on the heap. In this case, filters is one such variable. When the goroutine starts, the filters it accesses is the same map as the one in the main goroutine. Since you keep modifying filters in the main goroutine without locking, there is no guarantee of what the goroutine sees.
I suggest you keep filters read-only: create a new map in the goroutine by copying all items from filters, and add the "page" key in the goroutine. You have to be careful to pass a copy of page as well:
go func(page int) {
    flt := make(map[string]string)
    for k, v := range filters {
        flt[k] = v
    }
    flt["page"] = strconv.Itoa(page)
    ...
}(page)
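If in doubt, the race detector will point at the shared map directly; -race is a standard Go toolchain flag:
go run -race main.go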

Too Many open files/ No such host error while running a go program which makes concurrent requests

I have a Go program which is supposed to call an API with different payloads. The web application is a Dropwizard application running on localhost, and the Go program is below:
package main

import (
    "bufio"
    "encoding/json"
    "log"
    "net"
    "net/http"
    "os"
    "strings"
    "time"
)

type Data struct {
    PersonnelId  string `json:"personnel_id"`
    DepartmentId string `json:"department_id"`
}

type PersonnelEvent struct {
    EventType string `json:"event_type"`
    Data      `json:"data"`
}

const (
    maxIdleConnections        = 20
    maxIdleConnectionsPerHost = 20
    timeout                   = time.Duration(5 * time.Second)
)

var transport = http.Transport{
    Dial:                dialTimeout,
    MaxIdleConns:        maxIdleConnections,
    MaxIdleConnsPerHost: 20,
}

var client = &http.Client{
    Transport: &transport,
}

func dialTimeout(network, addr string) (net.Conn, error) {
    return net.DialTimeout(network, addr, timeout)
}

func makeRequest(payload string) {
    req, _ := http.NewRequest("POST", "http://localhost:9350/v1/provider-location-personnel/index", strings.NewReader(payload))
    req.Header.Set("X-App-Token", "TESTTOKEN1")
    req.Header.Set("Content-Type", "application/json")
    resp, err := client.Do(req)
    if err != nil {
        log.Println("Api invocation returned an error ", err)
    } else {
        defer resp.Body.Close()
        log.Println(resp.Body)
    }
}

func indexPersonnels(personnelPayloads []PersonnelEvent) {
    for _, personnelEvent := range personnelPayloads {
        payload, err := json.Marshal(personnelEvent)
        if err != nil {
            log.Println("Error while marshalling payload ", err)
        }
        log.Println(string(payload))
        // go makeRequest(string(payload))
    }
}

func main() {
    ch := make(chan PersonnelEvent)
    for i := 0; i < 20; i++ {
        go func() {
            for personnelEvent := range ch {
                payload, err := json.Marshal(personnelEvent)
                if err != nil {
                    log.Println("Error while marshalling payload", err)
                }
                go makeRequest(string(payload))
                //log.Println("Payload ", string(payload))
            }
        }()
    }
    file, err := os.Open("/Users/tmp/Desktop/personnels.txt")
    defer file.Close()
    if err != nil {
        log.Fatalf("Error opening personnel id file %v", err)
    }
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        go func() {
            ch <- PersonnelEvent{EventType: "provider_location_department_personnel_linked", Data: Data{DepartmentId: "2a8d9687-aea8-4a2c-bc08-c64d7716d973", PersonnelId: scanner.Text()}}
        }()
    }
}
It's reading some ids from a file, creating a payload out of each, and invoking a POST request on the web server, but when I run the program it gives "too many open files"/"no such host" errors. I feel that the program is too concurrent; how do I make it run gracefully?
Inside your 20 goroutines started in main(), "go makeRequest(...)" creates yet another goroutine for each event. You don't need to start an extra goroutine there.
Besides, I don't think you need to start a goroutine in your scan loop either; a buffered channel is enough, because the bottleneck is doing the HTTP requests.
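A minimal sketch of that suggestion, reusing ch, makeRequest, and the scanner from the question; the 20 workers call makeRequest directly, and a buffered channel replaces the goroutine-per-line sends:
ch := make(chan PersonnelEvent, 100) // buffered: the scan loop can stay ahead of the workers
var wg sync.WaitGroup
for i := 0; i < 20; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for personnelEvent := range ch {
            payload, err := json.Marshal(personnelEvent)
            if err != nil {
                log.Println("Error while marshalling payload", err)
                continue
            }
            makeRequest(string(payload)) // no extra goroutine: the 20 workers already bound concurrency
        }
    }()
}
for scanner.Scan() {
    ch <- PersonnelEvent{EventType: "provider_location_department_personnel_linked", Data: Data{DepartmentId: "2a8d9687-aea8-4a2c-bc08-c64d7716d973", PersonnelId: scanner.Text()}}
}
close(ch) // lets the workers' range loops finish
wg.Wait() // main only exits after every request is done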
You can use a buffered channel, a.k.a. a counting semaphore, to limit the parallelism:
// The capacity of the buffered channel is 10,
// which means you can have 10 goroutines
// running the makeRequest function in parallel.
var tokens = make(chan struct{}, 10)

func makeRequest(payload string) {
    tokens <- struct{}{}        // acquire the token or block here
    defer func() { <-tokens }() // release the token to wake another goroutine
    // other code...
}

GoLang Unmarshal JSON From Elastic Search Result

I have data returned from Elasticsearch, using "github.com/olivere/elastic". That sort of works when I add it to my struct and string it, like so:
data := Api{
    Total: myTotal,
    Data:  string(result),
}
c.JSON(http.StatusOK, totalData)
The Api is a struct like so:
type Api struct {
    Total interface{}
    Data  interface{}
}
This returns data OK, from 1 to any number of results per request. However, the results loaded into the Data interface are escaped, e.g.
"Data":"{\"CID\":\"XXXXXXXXXX\",\"Link\":\"XXXXXXXXX\",
So I have tried to unmarshal the data before adding it to the api struct.
var p DataApi
err := json.Unmarshal(result, &p)
if err != nil {
    panic(err)
}
totalData := Api{
    Total: myTotal,
    Data:  p,
}
c.JSON(http.StatusOK, totalData)
This sort of works fine, returns the data in the correct way, but only when loading one result. When 2 or more results are requested, I get this error from the unmarshal panic
invalid character '{' after top-level value
I have tried and googled all over but cannot find a solution to this. I am not sure what I am doing wrong. DataApi is a nested set of structs; I was not sure if there was anything I should be doing because of that?
This is being run within the Gin framework.
Thanks.
EDIT
So when I use fmt.Println on string(result) I can print any number of results to the screen. How can I add this to the Api struct, which I then need converted into JSON data? Is there some way of appending this string data onto the JSON-converted Api struct?
Try to unmarshal multiple results into a slice:
var q []Api
err = json.Unmarshal(result, &q)
See on playground https://play.golang.org/p/D_bVAd4jBlI
package main

import (
    "encoding/json"
    "fmt"
)

type Api struct {
    Total interface{}
    Data  interface{}
}

func main() {
    data := Api{
        Total: 1,
        Data:  "2",
    }
    result, err := json.Marshal(data)
    if err != nil {
        panic(err)
    }
    fmt.Printf("single data: %s\n", result)
    var p Api
    err = json.Unmarshal(result, &p)
    if err != nil {
        panic(err)
    }
    dataSlice := []Api{data}
    result, err = json.Marshal(dataSlice)
    if err != nil {
        panic(err)
    }
    fmt.Printf("slice of data: %s\n", result)
    var q []Api
    err = json.Unmarshal(result, &q)
    if err != nil {
        panic(err)
    }
}
Use json.RawMessage to store arbitrary JSON documents:
var p json.RawMessage
err := json.Unmarshal(result, &p)
if err != nil {
    panic(err)
}
totalData := Api{
    Total: myTotal,
    Data:  p,
}
c.JSON(http.StatusOK, totalData)
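For what it's worth, "invalid character '{' after top-level value" is what encoding/json reports when the input holds two or more JSON documents back to back rather than a single array. If that is what the result bytes look like, one option is to drain them with a json.Decoder in a loop; a sketch, assuming result, myTotal, Api, and c from the question, plus the "bytes" and "io" imports:
dec := json.NewDecoder(bytes.NewReader(result))
var docs []json.RawMessage
for {
    var doc json.RawMessage
    if err := dec.Decode(&doc); err == io.EOF {
        break // no more documents in the stream
    } else if err != nil {
        panic(err)
    }
    docs = append(docs, doc)
}
totalData := Api{
    Total: myTotal,
    Data:  docs,
}
c.JSON(http.StatusOK, totalData)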
I have a working solution to my problem: I just use the Hits.Hits from the data returned by Elasticsearch. I would like just the source data, but I think it does what I need it to do... for now.
Thanks.

Confusion regarding channel directions and blocking in Go

In a function definition, if a channel is an argument without a direction, does it have to send or receive something?
func makeRequest(url string, ch chan<- string, results chan<- string) {
    start := time.Now()
    resp, err := http.Get(url)
    defer resp.Body.Close()
    if err != nil {
        fmt.Printf("%v", err)
    }
    resp, err = http.Post(url, "text/plain", bytes.NewBuffer([]byte("Hey")))
    defer resp.Body.Close()
    secs := time.Since(start).Seconds()
    if err != nil {
        fmt.Printf("%v", err)
    }
    // Cannot move past this.
    ch <- fmt.Sprintf("%f", secs)
    results <- <-ch
}

func MakeRequestHelper(url string, ch chan string, results chan string, iterations int) {
    for i := 0; i < iterations; i++ {
        makeRequest(url, ch, results)
    }
    for i := 0; i < iterations; i++ {
        fmt.Println(<-ch)
    }
}

func main() {
    args := os.Args[1:]
    threadString := args[0]
    iterationString := args[1]
    url := args[2]
    threads, err := strconv.Atoi(threadString)
    if err != nil {
        fmt.Printf("%v", err)
    }
    iterations, err := strconv.Atoi(iterationString)
    if err != nil {
        fmt.Printf("%v", err)
    }
    channels := make([]chan string, 100)
    for i := range channels {
        channels[i] = make(chan string)
    }
    // results aggregate all the things received by channels in all goroutines
    results := make(chan string, iterations*threads)
    for i := 0; i < threads; i++ {
        go MakeRequestHelper(url, channels[i], results, iterations)
    }
    resultSlice := make([]string, threads*iterations)
    for i := 0; i < threads*iterations; i++ {
        resultSlice[i] = <-results
    }
}
In the above code,
ch <- or <-results
seems to be blocking every goroutine that executes makeRequest.
I am new to the concurrency model of Go. I understand that sending to and receiving from a channel blocks, but I find it difficult to see what is blocking what in this code.
I'm not really sure what you are doing... It seems really convoluted. I suggest you read up on how to use channels.
https://tour.golang.org/concurrency/2
That being said, you have so much going on in your code that it was much easier to just gut it into something a bit simpler. (It can be simplified further.) I left comments to help you understand the code.
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "time"
)

// using structs is a nice way to organize your code
type Worker struct {
    wg        sync.WaitGroup
    semaphore chan struct{}
    result    chan Result
    client    http.Client
}

// group returns so that you don't have to send to many channels
type Result struct {
    duration float64
    results  string
}

// closing your channels will stop the for loop in main
func (w *Worker) Close() {
    close(w.semaphore)
    close(w.result)
}

func (w *Worker) MakeRequest(url string) {
    // a semaphore is a simple way to rate limit the amount of goroutines running at any single point of time
    // google them, Go uses them often
    w.semaphore <- struct{}{}
    defer func() {
        w.wg.Done()
        <-w.semaphore
    }()
    start := time.Now()
    resp, err := w.client.Get(url)
    if err != nil {
        log.Println("error", err)
        return
    }
    defer resp.Body.Close()
    // don't have any examples where I need to also POST anything but the point should be made
    // resp, err = http.Post(url, "text/plain", bytes.NewBuffer([]byte("Hey")))
    // if err != nil {
    //     log.Println("error", err)
    //     return
    // }
    // defer resp.Body.Close()
    secs := time.Since(start).Seconds()
    b, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Println("error", err)
        return
    }
    w.result <- Result{duration: secs, results: string(b)}
}

func main() {
    urls := []string{
        "https://facebook.com/", "https://twitter.com/", "https://google.com/", "https://youtube.com/", "https://linkedin.com/", "https://wordpress.org/",
        "https://instagram.com/", "https://pinterest.com/", "https://wikipedia.org/", "https://wordpress.com/", "https://blogspot.com/", "https://apple.com/",
    }
    workerNumber := 5
    worker := Worker{
        semaphore: make(chan struct{}, workerNumber),
        result:    make(chan Result),
        client:    http.Client{Timeout: 5 * time.Second},
    }
    // use sync groups to allow your code to wait for
    // all your goroutines to finish
    for _, url := range urls {
        worker.wg.Add(1)
        go worker.MakeRequest(url)
    }
    // by declaring wait and close as a separate goroutine
    // I can get to the for loop below and iterate on the results
    // in a non-blocking fashion
    go func() {
        worker.wg.Wait()
        worker.Close()
    }()
    // do something with the results channel
    for res := range worker.result {
        fmt.Printf("Request took %.2f seconds.\nResults: %s\n\n", res.duration, res.results)
    }
}
The channels in channels are nil (no make is executed; you make the slice but not the channels), so any send or receive will block. I'm not sure exactly what you're trying to do here, but that's the basic problem.
See https://golang.org/doc/effective_go.html#channels for an explanation of how channels work.
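For what it's worth, the blocking in the question can be reduced to a small sketch: MakeRequestHelper calls makeRequest synchronously, so the first send on the unbuffered ch has no receiver yet (the fmt.Println receive loop only starts after all the sends would have finished), and the goroutine blocks forever:
ch := make(chan string) // unbuffered, as in the question
for i := 0; i < iterations; i++ {
    ch <- "result" // blocks: the receive loop below is never reached
}
for i := 0; i < iterations; i++ {
    fmt.Println(<-ch)
}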

Anonymous function doesn't seem to execute in Go routine

I have the following code. Pay special attention to the anonymous function:
func saveMatterNodes(matterId int, nodes []doculaw.LitigationNode) (bool, error) {
    var (
        err  error
        resp *http.Response
    )
    // Do this in multiple threads
    for _, node := range nodes {
        fmt.Println("in loops")
        go func() {
            postValues := doculaw.LitigationNode{
                Name:        node.Name,
                Description: node.Description,
                Days:        node.Days,
                Date:        node.Date,
                IsFinalStep: false,
                Completed:   false,
                Matter:      matterId}
            b := new(bytes.Buffer)
            json.NewEncoder(b).Encode(postValues)
            resp, err = http.Post("http://127.0.0.1:8001/matterNode/", "application/json", b)
            io.Copy(os.Stdout, resp.Body)
            fmt.Println("Response from http post", resp)
            if err != nil {
                fmt.Println(err)
            }
        }()
    }
    if err != nil {
        return false, err
    } else {
        return true, nil
    }
}
If I remove the go func() {}() part and just leave the code in between, it seems to execute fine, but the moment I add it back it does not execute. Any idea why that is? I initially thought maybe it was because it's executing on a different thread, but that doesn't seem to be the case, as I can see from my webservice access logs that it is not executing.
I think this behaviour is because the function never yields back to the main goroutine: after you launch the goroutines, there is no construct in the program to wait for them to finish their work before the function returns.
Use of channels, IO operations, sync.WaitGroup, etc. can yield control back to the main thread.
You may want to try sync.WaitGroup
Example: https://play.golang.org/p/Zwn0YBynl2
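A minimal sketch of that approach applied to the loop in the question; note it also passes node as a parameter, since otherwise all the goroutines close over the same loop variable (a separate bug before Go 1.22):
var wg sync.WaitGroup
for _, node := range nodes {
    wg.Add(1)
    go func(node doculaw.LitigationNode) {
        defer wg.Done()
        // ... build postValues from node and POST exactly as in the question ...
    }(node)
}
wg.Wait() // saveMatterNodes now returns only after every request has finished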
