Trouble figuring out data race in goroutine - go

I started learning Go recently and I've been chipping away at this for a while now, but figured it was time to ask for some specific help. My program requests paginated data from an API, and since there are about 160 pages of data it seemed like a good use of goroutines, except I have race conditions and I can't figure out why. It's probably because I'm new to the language, but my impression was that parameters are passed to a function as a copy of the data in the caller unless they're pointers.
According to what I think I know, this should be making copies of my data, which leaves me free to change it in the main function, but I end up requesting some pages multiple times and other pages just once.
My main.go
package main
import (
"bufio"
"encoding/json"
"log"
"net/http"
"net/url"
"os"
"strconv"
"sync"
"github.com/joho/godotenv"
)
func main() {
err := godotenv.Load()
if err != nil {
log.Fatalln(err)
}
httpClient := &http.Client{}
baseURL := "https://api.data.gov/ed/collegescorecard/v1/schools.json"
filters := make(map[string]string)
page := 0
filters["school.degrees_awarded.predominant"] = "2,3"
filters["fields"] = "id,school.name,school.city,2018.student.size,2017.student.size,2017.earnings.3_yrs_after_completion.overall_count_over_poverty_line,2016.repayment.3_yr_repayment.overall"
filters["api_key"] = os.Getenv("API_KEY")
outFile, err := os.Create("./out.txt")
if err != nil {
log.Fatalln(err)
}
writer := bufio.NewWriter(outFile)
requestURL := getRequestURL(baseURL, filters)
response := requestData(requestURL, httpClient)
wg := sync.WaitGroup{}
for (page+1)*response.Metadata.ResultsPerPage < response.Metadata.TotalResults {
page++
filters["page"] = strconv.Itoa(page)
wg.Add(1)
go func() {
defer wg.Done()
requestURL := getRequestURL(baseURL, filters)
response := requestData(requestURL, httpClient)
_, err = writer.WriteString(response.TextOutput())
if err != nil {
log.Fatalln(err)
}
}()
}
wg.Wait()
}
func getRequestURL(baseURL string, filters map[string]string) *url.URL {
requestURL, err := url.Parse(baseURL)
if err != nil {
log.Fatalln(err)
}
query := requestURL.Query()
for key, value := range filters {
query.Set(key, value)
}
requestURL.RawQuery = query.Encode()
return requestURL
}
func requestData(url *url.URL, httpClient *http.Client) CollegeScoreCardResponseDTO {
request, _ := http.NewRequest(http.MethodGet, url.String(), nil)
resp, err := httpClient.Do(request)
if err != nil {
log.Fatalln(err)
}
defer resp.Body.Close()
var parsedResponse CollegeScoreCardResponseDTO
err = json.NewDecoder(resp.Body).Decode(&parsedResponse)
if err != nil {
log.Fatalln(err)
}
return parsedResponse
}
I know another issue I will be running into is writing to the output file in the correct order, but I believe using channels to tell each routine what request finished writing could solve that. If I'm incorrect on that I would appreciate any advice on how to approach that as well.
Thanks in advance.

Goroutines do not receive copies of data. When the compiler detects that a variable "escapes" the current function, it allocates that variable on the heap. In this case, filters is one such variable. When the goroutine starts, the filters it accesses is the same map the main goroutine is using. Since you keep modifying filters in the main goroutine without locking, there is no guarantee of what the goroutine sees.
I suggest you keep filters read-only, create a new map in the goroutine by copying all items from filters, and add the "page" entry in the goroutine. You also have to be careful to pass a copy of the page variable:
go func(page int) {
    flt := make(map[string]string)
    for k, v := range filters {
        flt[k] = v
    }
    flt["page"] = strconv.Itoa(page)
    ...
}(page)
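Putting that together with the ordering concern from your question, here is a minimal sketch of how the second half of main could look, assuming the Metadata fields and the TextOutput() method behave as in your posted code. Each goroutine copies the filters map, and results go into a page-indexed slice (each goroutine writes only its own index, which is safe), so the file can be written in order after wg.Wait():

numPages := (response.Metadata.TotalResults + response.Metadata.ResultsPerPage - 1) /
    response.Metadata.ResultsPerPage
results := make([]string, numPages) // one slot per page, so output order is preserved
results[0] = response.TextOutput()  // page 0 came from the initial request

var wg sync.WaitGroup
for page := 1; page < numPages; page++ {
    wg.Add(1)
    go func(page int) {
        defer wg.Done()
        // Copy the shared filters map so this goroutine has its own.
        flt := make(map[string]string, len(filters)+1)
        for k, v := range filters {
            flt[k] = v
        }
        flt["page"] = strconv.Itoa(page)

        resp := requestData(getRequestURL(baseURL, flt), httpClient)
        results[page] = resp.TextOutput() // distinct index per goroutine: no race
    }(page)
}
wg.Wait()

// Write everything in page order once all goroutines are done.
for _, out := range results {
    if _, err := writer.WriteString(out); err != nil {
        log.Fatalln(err)
    }
}
writer.Flush()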

Related

Stop all running nested loops inside for -> go func -> for loop

The best way I figured out to "multi-thread" requests with a different proxy each time was to nest a go func and for loop inside another for loop, but I can't figure out how to stop all the loops the way a break normally would. I tried a regular break, and I also tried a labeled break (adding out: above the loops and using break out), but that didn't stop it.
package main
import (
"log"
"encoding/json"
"github.com/parnurzeal/gorequest"
)
func main(){
rep := 100
for i := 0; i < rep; i++ {
log.Println("starting loop")
go func() {
for{
request := gorequest.New()
resp, body, errs := request.Get("https://discord.com/api/v9/invites/family").End()
if errs != nil {
return
}
if resp.StatusCode == 200{
var result map[string]interface{}
json.Unmarshal([]byte(body), &result)
serverName := result["guild"].(map[string]interface{})["name"]
log.Println(sererName +" response 200, closing all loops")
//break all loops and goroutine here
}
}
}
}
log.Println("response 200,closed all loops")
Answering this is complicated by your use of parnurzeal/gorequest, because that package does not provide any obvious way to cancel requests (see this issue). Because your focus appears to be on the process rather than that specific package, I've used the standard library (net/http) instead (if you do need to use gorequest then perhaps ask a question specifically about that).
Anyway, the solution below demonstrates a few things:
It uses a WaitGroup so it knows when all the goroutines are done (not essential here, but often you want to know you have shut down cleanly).
It passes the result out via a channel (updating shared variables from a goroutine leads to data races).
It uses a context for cancellation. The cancel function is called when we have a result, and this stops the in-progress requests.
package main

import (
    "context"
    "encoding/json"
    "errors"
    "fmt"
    "log"
    "net/http"
    "sync"
)

func main() {
    // Get the context and a function to cancel it
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel() // Not really required here but it's good practice to ensure the context is cancelled eventually.

    results := make(chan string)
    const goRoutineCount = 100

    var wg sync.WaitGroup
    wg.Add(goRoutineCount) // we will be waiting on 100 goroutines

    for i := 0; i < goRoutineCount; i++ {
        go func() {
            defer wg.Done() // Decrement WaitGroup when goroutine exits
            req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://discord.com/api/v9/invites/family", nil)
            if err != nil {
                panic(err)
            }
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                if errors.Is(err, context.Canceled) {
                    return // The error is due to the context being cancelled so just shut down
                }
                panic(err)
            }
            defer resp.Body.Close() // Ensure body is closed
            if resp.StatusCode == 200 {
                var result map[string]interface{}
                if err = json.NewDecoder(resp.Body).Decode(&result); err != nil {
                    panic(err)
                }
                serverName := result["guild"].(map[string]interface{})["name"]
                results <- serverName.(string) // Should error check this...
                cancel()                       // We have a result so all goroutines can stop now!
            }
        }()
    }

    // We need to process results until everything has shut down; the simple approach is to close the channel when done
    go func() {
        wg.Wait()
        close(results)
    }()

    var firstResult string
    requestsProcessed := 0
    for x := range results {
        fmt.Println("got result")
        if requestsProcessed == 0 {
            firstResult = x
        }
        requestsProcessed++ // Possible that we will get more than one result (remember that requests are running in parallel)
    }

    // At this point all goroutines have shut down
    if requestsProcessed == 0 {
        log.Println("No results received")
    } else {
        log.Printf("%s response 200, closing all loops (requests processed: %d)", firstResult, requestsProcessed)
    }
}

Semantic way of http.Response receiver functions in Go

I just started learning Go and wrote this piece of code that writes an http.Response.Body to os.Stdout or to a file, but I'm not happy about the semantics of this.
I want the http.Response struct to have these receiver functions, so I can use it more easily throughout the entire app.
I know that the answers might get flagged as opinionated, but I still wonder, is there a better way of writing this?
Is there some sort of best practice?
package main
import (
"fmt"
"io"
"io/ioutil"
"net/http"
"os"
)
type httpResp http.Response
func main() {
res, err := http.Get("http://www.stackoverflow.com")
if err != nil {
fmt.Println("Error: ", err)
os.Exit(1)
}
defer res.Body.Close()
response := httpResp(*res)
response.toFile("stckovrflw.html")
response.toStdOut()
}
func (r httpResp) toFile(filename string) {
str, err := ioutil.ReadAll(r.Body)
if err != nil {
panic(err)
}
ioutil.WriteFile(filename, []byte(str), 0666)
}
func (r httpResp) toStdOut() {
_, err := io.Copy(os.Stdout, r.Body)
if err != nil {
panic(err)
}
}
On a side note, is there a way to make the http.Get method spit out a custom type that already has access to these receiver functions, without the need for casting? So I could do something like this:
func main() {
res, err := http.Get("http://www.stackoverflow.com")
if err != nil {
fmt.Println("Error: ", err)
os.Exit(1)
}
defer res.Body.Close()
res.toFile("stckovrflw.html")
res.toStdOut()
}
Thanks!
You don't have to implement these functions. *http.Response already has a Write method that does this:
Write writes r to w in the HTTP/1.x server response format, including the status line, headers, body, and optional trailer.
package main

import (
    "net/http"
    "os"
)

func main() {
    r := &http.Response{}
    r.Write(os.Stdout)
}
In the example above, the zero value prints:
HTTP/0.0 000 status code 0
Content-Length: 0
Playground: https://play.golang.org/p/2AUEAUPCA8j
In case you need additional business logic in the write methods, you can embed *http.Response in your defined type:
type RespWrapper struct {
    *http.Response
}

func (w *RespWrapper) toStdOut() {
    _, err := io.Copy(os.Stdout, w.Body)
    if err != nil {
        panic(err)
    }
}
But then you must construct a variable of type RespWrapper with the *http.Response:
func main() {
    // resp with a fake body
    r := &http.Response{Body: io.NopCloser(strings.NewReader("foo"))}
    // or r, _ := http.Get("example.com")

    // construct the wrapper
    wrapper := &RespWrapper{Response: r}
    wrapper.toStdOut()
}
is there a way to make the http.Get method spit out a custom type
No. The return types of http.Get are (resp *http.Response, err error); that's part of the function signature, so you can't change it.
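What you can do is wrap the call in a small helper of your own, so callers get the wrapper type back directly. A minimal sketch building on the RespWrapper type above (the helper name getWrapped is made up here, not part of net/http):

// getWrapped is a hypothetical convenience helper: it performs the GET and
// returns the embedded-wrapper type, so the caller never does the conversion.
func getWrapped(url string) (*RespWrapper, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    return &RespWrapper{Response: resp}, nil
}

func main() {
    wrapper, err := getWrapped("http://www.stackoverflow.com")
    if err != nil {
        fmt.Println("Error: ", err)
        os.Exit(1)
    }
    defer wrapper.Body.Close()
    wrapper.toStdOut()
}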

Why does for loop with goroutines result in missing data

Ok, so I have two bits of code. First off is a simple for loop that works great
package main
import (
"context"
"encoding/json"
"fmt"
"log"
"os"
elasticsearch "github.com/elastic/go-elasticsearch/v7"
"github.com/elastic/go-elasticsearch/v7/esapi"
"github.com/mitchellh/mapstructure"
)
type Esindices struct {
Health string `json:"health"`
Status string `json:"status"`
Index string `json:"index"`
Uuid string `json:"uuid"`
Pri string `json:"pri"`
Rep string `json:"rep"`
DocsCount string `json:"docs.count"`
DocsDeleted string `json:"docs.deleted"`
StoreSize string `json:"store.size"`
PriStoreSize string `json:"pri.store.size"`
}
func main() {
var r []map[string]interface{}
es, err := elasticsearch.NewDefaultClient()
if err != nil {
log.Fatalf("Error creating client: %s", err)
}
req := esapi.CatIndicesRequest{
Format: "json",
Pretty: false,
}
res, err := req.Do(context.Background(), es)
if err != nil {
log.Fatalf("Error getting response: %s", err)
}
defer res.Body.Close()
if err := json.NewDecoder(res.Body).Decode(&r); err != nil {
log.Printf("Error parsing the response body: %s", err)
}
indexSlice := make([]*Esindices, len(r))
for i, element := range r {
result := &Esindices{}
cfg := &mapstructure.DecoderConfig{
Metadata: nil,
Result: &result,
TagName: "json",
}
decoder, _ := mapstructure.NewDecoder(cfg)
decoder.Decode(element)
indexSlice[i] = result
}
thisisjson, err := json.MarshalIndent(indexSlice, "", " ")
if err != nil {
log.Fatal("Can't encode to JSON", err)
}
fmt.Fprintf(os.Stdout, "%s", thisisjson)
Most of this is pretty self-explanatory, but to clarify: I am using the Elasticsearch client and the api.cat.indices API to get a list of all the indices in a local Elasticsearch install, store them as a slice of map[string]interface{}, and then loop over this to add them to a slice of structs holding the results. This is fine, actually, but I want to be mindful of performance, and while I can't improve the latency of the request itself, I can certainly improve the performance of the loop, or at least I think I should be able to.
So when I try the below instead I get weird results.
var wg sync.WaitGroup
defer wg.Wait()
for i, element := range r {
wg.Add(1)
go func(i int, element map[string]interface{}) {
defer wg.Done()
result := Esindices{}
cfg := &mapstructure.DecoderConfig{
Metadata: nil,
Result: &result,
TagName: "json",
}
decoder, _ := mapstructure.NewDecoder(cfg)
decoder.Decode(element)
indexSlice[i] = result
}(i, element)
}
The issue is, specifically, that some of the values of the keys of the elements in the slice are empty. This makes me think the code is trying to add to the slice, but execution moves on even though the goroutines aren't done.
Thoughts?
Instead of defer wg.Wait(), call wg.Wait() right after the for loop. You use the data produced by the goroutines immediately after the loop completes, but you're not waiting for all the goroutines to finish before using that data.
With defer wg.Wait(), the waiting happens at the end of the function, so the code that uses the data operates on incomplete data because the goroutines are still running.
With wg.Wait() right after the for loop, you first wait for all the goroutines to finish, and only then use the data they generated.
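Applied to the code in the question, that just means waiting before indexSlice is used; a minimal sketch (it mirrors the original working loop, storing a *Esindices so the element type matches indexSlice):

var wg sync.WaitGroup
for i, element := range r {
    wg.Add(1)
    go func(i int, element map[string]interface{}) {
        defer wg.Done()
        result := &Esindices{}
        cfg := &mapstructure.DecoderConfig{
            Metadata: nil,
            Result:   &result,
            TagName:  "json",
        }
        decoder, _ := mapstructure.NewDecoder(cfg)
        decoder.Decode(element)
        indexSlice[i] = result // each goroutine writes only its own index
    }(i, element)
}
wg.Wait() // wait here, before indexSlice is marshalled below

thisisjson, err := json.MarshalIndent(indexSlice, "", " ")
if err != nil {
    log.Fatal("Can't encode to JSON", err)
}
fmt.Fprintf(os.Stdout, "%s", thisisjson)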

Iterating over multiple returns in golang

I am trying to take input from a text file containing domains (unknown amount), then use each one as an argument to get its server type. As expected, this only returns the last domain. How do I iterate over multiple return values?
Below is the code.
// Test
package main
import (
"bufio"
"time"
"os"
"fmt"
"net/http"
//"github.com/gocolly/colly"
)
var Domain string
var Target string
func main() {
Domain := DomainGrab()
Target := BannerGrab(Domain)
//CheckDB if not listed then add else skip
//RiskDB
//Email
fmt.Println(Domain)
fmt.Println(Target)
}
func BannerGrab(s string) string {
client := &http.Client{}
req, err := http.NewRequest("GET", s, nil)
if err != nil {
log.Fatalln(err)
}
req.Header.Set("User-Agent", "Authac/0.1")
resp, _ := client.Do(req)
serverEntry := resp.Header.Get("Server")
return serverEntry
}
func DomainGrab() string {
//c := colly.NewCollector()
// Open the file.
f, _ := os.Open("domains.txt")
defer f.Close()
// Create a new Scanner for the file.
scanner := bufio.NewScanner(f)
// Loop over all lines in the file and print them.
for scanner.Scan() {
line := scanner.Text()
time.Sleep(2 * time.Second)
//fmt.Println(line)
return line
}
return Domain
}
If you wanted to do it "concurrently", you would return a channel through which you will send the multiple things you want to return:
https://play.golang.org/p/iYBGPwfYLYR
func DomainGrab() <-chan string {
    ch := make(chan string, 1)
    f, _ := os.Open("domains.txt")
    scanner := bufio.NewScanner(f)
    go func() {
        defer f.Close()  // close the file only once the goroutine is done scanning
        defer close(ch)
        // Loop over all lines in the file and send them on the channel.
        for scanner.Scan() {
            line := scanner.Text()
            time.Sleep(2 * time.Second)
            ch <- line
        }
    }()
    return ch
}
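The caller then ranges over the channel, so every domain gets processed instead of just the first; for example, reusing BannerGrab from the question:

func main() {
    for domain := range DomainGrab() {
        target := BannerGrab(domain)
        fmt.Println(domain)
        fmt.Println(target)
    }
}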
If I understand your question, you want to read the file, somehow detect that the file was modified, and have a method that emits these modifications to the client code.
That is not how files work.
You have two options:
Listen for file changes using an OS-specific API - https://www.linuxjournal.com/content/linux-filesystem-events-inotify
Read the file in a loop: read it once, save a copy in memory, then read the same file again and again until the new contents differ from the copy, and calculate the delta.
Check whether it is possible to use push instead of pull for getting new domains. Could the system that controls the domain names in the file push the data to you directly?
If a loop is the only possible option, set up some pause time between file reads to reduce system load, as in the sketch after this list.
Use channels as #dave suggested once you manage to get new domains and need to process them concurrently.
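If polling turns out to be the only option, a rough sketch of that approach could look like the following (the function name, the poll interval, and the seen map used to detect new lines are all made up for illustration):

// pollDomains re-reads the file on an interval and sends only lines it has
// not seen before, so the caller receives just the newly added domains.
func pollDomains(path string, interval time.Duration) <-chan string {
    ch := make(chan string)
    go func() {
        seen := make(map[string]bool)
        for {
            f, err := os.Open(path)
            if err == nil {
                scanner := bufio.NewScanner(f)
                for scanner.Scan() {
                    line := scanner.Text()
                    if line != "" && !seen[line] {
                        seen[line] = true
                        ch <- line
                    }
                }
                f.Close()
            }
            time.Sleep(interval) // pause between reads to reduce system load
        }
    }()
    return ch
}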
Probably not the BEST solution, but I decided to get rid of a separate function altogether just to cover more ground. I'll post below the code of what I expect. Now I need to parse the domains so that root URLs and subdomains are only scanned once.
// Main
package main
import (
"log"
"fmt"
"time"
"net/http"
"github.com/gocolly/colly"
)
//var Domain string
var Target string
func main() {
c := colly.NewCollector()
c.OnError(func(r *colly.Response, err error) {
fmt.Println("Request URL:", r.Request.URL, "\n Failed with response:", r.StatusCode)
})
// Find and visit all links
c.OnHTML("a", func(e *colly.HTMLElement) {
e.Request.Visit(e.Attr("href"))
})
c.OnRequest(func(r *colly.Request) {
Domain := r.URL.String()
Target := BannerGrab(Domain)
fmt.Println(Domain)
fmt.Println(Target)
fmt.Println("Dropping By.. ", r.URL)
time.Sleep(1000 * time.Millisecond)
})
c.Visit("http://www.milliondollarhomepage.com/")
}
//CheckDB if not listed else add
//RiskDB
//Email
func BannerGrab(s string) string {
client := &http.Client{}
req, err := http.NewRequest("GET", s, nil)
if err != nil {
log.Fatalln(err)
}
req.Header.Set("User-Agent", "Authac/0.1")
resp, _ := client.Do(req)
serverEntry := resp.Header.Get("Server")
return serverEntry
}

How to mock second try of http call?

As part of my first project I am creating a tiny library to send an SMS to any user. I have added logic to wait and retry if it doesn't receive a positive status on the first go. It's a basic HTTP call to an SMS sending service. My algorithm looks like this (the comments explain the flow of the code):
for {
//send request
resp, err := HTTPClient.Do(req)
checkOK, checkSuccessUrl, checkErr := CheckSuccessStatus(resp, err)
//if successful don't continue
if !checkOK && checkErr != nil {
err = checkErr
return resp, SUCCESS, int8(RetryMax-remain+1), err
}
remain := remain - 1
if remain == 0 {
break
}
//calculate wait time
wait := Backoff(RetryWaitMin, RetryWaitMax, RetryMax-remain, resp)
//wait for time calculated in backoff above
time.Sleep(wait)
//check the status of last call, if unsuccessful then continue the loop
if checkSuccessUrl != "" {
req, err := GetNotificationStatusCheckRequest(checkSuccessUrl)
resp, err := HTTPClient.Do(req)
checkOK, _, checkErr = CheckSuccessStatusBeforeRetry(resp, err)
if !checkOK {
if checkErr != nil {
err = checkErr
}
return resp,SUCCESS, int8(RetryMax-remain), err
}
}
}
Now I want to test this logic using any HTTP mock framework available. The best I've got is https://github.com/jarcoal/httpmock
But this one does not provide functionality to mock the responses of the first and second call separately, so I cannot test success on the second or third retry. I can either test success on the first go or failure altogether.
Is there a package out there which suits my needs for testing this particular feature? If not, how can I achieve this using current tools?
This can easily be achieved using the test server that comes in the standard library's httptest package. With a slight modification to the example contained within it you can set up functions for each of the responses you want up front by doing this:
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "net/http/httptest"
)

func main() {
    responseCounter := 0
    responses := []func(w http.ResponseWriter, r *http.Request){
        func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "First response")
        },
        func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "Second response")
        },
    }
    ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        responses[responseCounter](w, r)
        responseCounter++
    }))
    defer ts.Close()

    printBody(ts.URL)
    printBody(ts.URL)
}

func printBody(url string) {
    res, err := http.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    resBody, err := ioutil.ReadAll(res.Body)
    res.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", resBody)
}
Which outputs:
First response
Second response
Executable code here:
https://play.golang.org/p/YcPe5hOSxlZ
Not sure you still need an answer, but github.com/jarcoal/httpmock provides a way to do this using ResponderFromMultipleResponses.
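A rough sketch of that approach, written from memory of the httpmock API (the URL is made up, and the exact signatures are worth checking against the package docs): register one responder that returns a different canned response on each successive call, then point the retry logic at the mocked URL.

package sms_test

import (
    "net/http"
    "testing"

    "github.com/jarcoal/httpmock"
)

func TestRetrySucceedsOnSecondAttempt(t *testing.T) {
    httpmock.Activate() // intercepts http.DefaultTransport
    defer httpmock.DeactivateAndReset()

    // First call returns a 500, the retry then gets a 200.
    httpmock.RegisterResponder("POST", "https://sms.example.com/send",
        httpmock.ResponderFromMultipleResponses([]*http.Response{
            httpmock.NewStringResponse(500, `{"status":"error"}`),
            httpmock.NewStringResponse(200, `{"status":"sent"}`),
        }),
    )

    // Call the retrying send logic here, pointed at the mocked URL, and
    // assert that it reports success after the second attempt.
}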
