Why does HTTP request always take as long as the full timeout? - go

I am writing a Go git bruteforcer. It's acting a bit weird; I guess it has something to do with concurrency (I'm using sync.WaitGroup).
Here's the code: https://dpaste.org/vO7y
package main

import ( <snipped for brevity> )

// ReadFile : Reads a file and returns its contents
func ReadFile(fileName string) []string { <snipped for brevity> }

func joinString(strs ...string) string { <snipped for brevity> }

// MakeRequest : Makes requests concurrently
func MakeRequest(client *http.Client, url string, useragent string, ch chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    // start := time.Now()
    request, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }
    request.Header.Set("User-Agent", useragent)
    response, err := client.Do(request)
    if err != nil {
        return
    }
    // secs := time.Since(start).Seconds()
    if response.StatusCode < 400 {
        // fmt.Printf("Time elapsed %f", secs)
        bodyBytes, err := ioutil.ReadAll(response.Body)
        if err != nil {
            log.Fatal(err)
        }
        defer response.Body.Close()
        bodyString := string(bodyBytes)
        notGit, err := regexp.MatchString("<html>", strings.ToLower(bodyString))
        if !notGit && len(bodyString) > 0 { // empty pages and HTML pages shouldn't be included
            fmt.Println(bodyString)
            ch <- fmt.Sprintf(" %s ", Green(url))
        }
    }
}
func main() {
    start := time.Now()
    useragent := "Mozilla/10.0 (Windows NT 10.0) AppleWebKit/538.36 (KHTML, like Gecko) Chrome/69.420 Safari/537.36"
    gitEndpoint := []string{"/.git/", "/.git/HEAD", "/.gitignore", "/.git/description", "/.git/index"}
    timeout := 10 * time.Second
    var tr = &http.Transport{
        MaxIdleConns:      30,
        IdleConnTimeout:   time.Second,
        DisableKeepAlives: true,
        TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
        DialContext: (&net.Dialer{
            Timeout:   timeout,
            KeepAlive: time.Second,
        }).DialContext,
    }
    re := func(req *http.Request, via []*http.Request) error {
        return http.ErrUseLastResponse
    }
    client := &http.Client{
        Transport:     tr,
        CheckRedirect: re,
        Timeout:       timeout,
    }
    output := ReadFile(os.Args[1])
    // start := time.Now()
    ch := make(chan string)
    var wg sync.WaitGroup
    for _, url := range output {
        for _, endpoint := range gitEndpoint {
            wg.Add(1)
            go MakeRequest(client, "https://"+url+endpoint, useragent, ch, &wg)
        }
    }
    go func() {
        wg.Wait()
        close(ch)
    }()
    f, err := os.OpenFile("git_finder.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    for val := range ch {
        if err != nil {
            fmt.Println(Red(err))
        }
        _, err = fmt.Fprintln(f, val)
        fmt.Println(val)
    }
    f.Close()
    fmt.Printf("Total time taken %.2fs elapsed\n", time.Since(start).Seconds())
}
Working:
It reads the URLs from a file and checks for /.git, /.git/HEAD, /.git/description and /.git/index on the web server.
Problem:
If I change the http.Client timeout to 2 seconds it will finish in 2 seconds; if it's 50 seconds it will wait till 50 seconds. It doesn't matter whether the input file contains 10 URLs or 500 URLs.
My understanding is that if there are more URLs, it will wait till the timeout of the last URL that's passed to a goroutine.
Update 1:
As Adrian mentioned in the comments, it doesn't look like a concurrency problem. That's one of the main issues here: I can't put a finger on what the exact problem is.

In your code, you are reading URLs from a file, then firing requests in parallel to all those URLs, then waiting for all the parallel requests to finish.
So this actually makes sense and would not indicate an issue:
If I change the http.Client timeout to 2 seconds it will finish in 2 seconds; if it's 50 seconds it will wait till 50 seconds. It doesn't matter whether the input file contains 10 URLs or 500 URLs.
Let's say your file has 500 URLs.
You fire the 500 requests in parallel... then wait for all of them to finish (remember, they are all executing in parallel). How long would that take?
In the worst case (all of the requests timeout at 50 seconds), it will just take 50 seconds in total (since they are all waiting for those 50 seconds in parallel).
In the best case (all requests go through successfully with no timeouts) it should take a few seconds.
In the average case, which is probably what you are seeing (a few requests time out at 50 seconds), it still takes 50 seconds: you are waiting for those few slow requests to run out their 50 seconds in parallel, just as in the worst case.
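To make the timing argument concrete, here is a small self-contained sketch (time.Sleep stands in for request latency; the durations are made up) showing that goroutines running in parallel finish in roughly the time of the slowest one, not the sum:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    // Pretend latencies: two fast "requests" and one that runs into a long timeout.
    latencies := []time.Duration{1 * time.Second, 2 * time.Second, 10 * time.Second}

    start := time.Now()
    var wg sync.WaitGroup
    for _, d := range latencies {
        wg.Add(1)
        go func(d time.Duration) {
            defer wg.Done()
            time.Sleep(d) // simulate a request that takes d to complete
        }(d)
    }
    wg.Wait()

    // Prints roughly 10s: the slowest goroutine dominates the total wall-clock time.
    fmt.Printf("total: %.1fs\n", time.Since(start).Seconds())
}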

Related

How to use channel properly for concurrent POST API call and log the data in File

I am trying to design an HTTP client in Go that is capable of making concurrent API calls to a web service and writing some data to a text file.
func getTotalCalls() int {
    reader := bufio.NewReader(os.Stdin)
    ...
    return callInt
}
getTotalCalls decides how many calls I want to make; the input comes from the terminal.
func writeToFile(s string, namePrefix string) {
    fileStore := fmt.Sprintf("./data/%s_calls.log", namePrefix)
    ...
    defer f.Close()
    if _, err := f.WriteString(s); err != nil {
        log.Println(err)
    }
}
writeToFile writes the data read from the buffered channel to a file synchronously.
func makeRequest(url string, ch chan<- string, id int) {
    var jsonStr = []byte(`{"from": "Saru", "message": "Saru to Discovery. Over!"}`)
    req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonStr))
    req.Header.Set("Content-Type", "application/json")
    client := &http.Client{}
    start := time.Now()
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    secs := time.Since(start).Seconds()
    defer resp.Body.Close()
    body, _ := ioutil.ReadAll(resp.Body)
    ch <- fmt.Sprintf("%d, %.2f, %d, %s, %s\n", id, secs, len(body), url, body)
}
This is the function that makes the API call in a goroutine.
Finally, here is the main function, which sends data from the goroutines to a buffered channel; later I range over the buffered channel of strings and write the data to a file.
func main() {
    urlPrefix := os.Getenv("STARCOMM_GO")
    url := urlPrefix + "discovery"
    totalCalls := getTotalCalls()
    queue := make(chan string, totalCalls)
    for i := 1; i <= totalCalls; i++ {
        go makeRequest(url, queue, i)
    }
    for item := range queue {
        fmt.Println(item)
        writeToFile(item, fmt.Sprint(totalCalls))
    }
}
The problem is that at the end of all the calls the buffered channel somehow blocks and the program waits forever. Does someone have a better way to design such a use case? My final goal is to check, for different numbers of concurrent POST requests, how much time each call takes, to benchmark the API endpoint for 5, 10, 50, 100, 500, 1000 ... concurrent calls.
Something has to close(queue). Otherwise range queue will block. If you want to range queue, you have to ensure that this channel is closed once the final client is done.
However, it's not even clear that you need to range over queue, since you know exactly how many results you'll get: it's totalCalls. You just need to loop that many times, receiving from queue.
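For example, a minimal sketch of that second option, reusing the names from the question's main (url, queue, totalCalls, makeRequest, writeToFile):

// Fire all requests; each makeRequest sends exactly one value on queue.
for i := 1; i <= totalCalls; i++ {
    go makeRequest(url, queue, i)
}
// Receive exactly totalCalls values; no close() is needed because we never range over the channel.
for i := 0; i < totalCalls; i++ {
    item := <-queue
    fmt.Println(item)
    writeToFile(item, fmt.Sprint(totalCalls))
}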
I believe your use case is similar to the Worker Pools example on gobyexample, so you may want to check that one out. Here's the code from that example:
// In this example we'll look at how to implement
// a _worker pool_ using goroutines and channels.
package main

import (
    "fmt"
    "time"
)

// Here's the worker, of which we'll run several
// concurrent instances. These workers will receive
// work on the `jobs` channel and send the corresponding
// results on `results`. We'll sleep a second per job to
// simulate an expensive task.
func worker(id int, jobs <-chan int, results chan<- int) {
    for j := range jobs {
        fmt.Println("worker", id, "started job", j)
        time.Sleep(time.Second)
        fmt.Println("worker", id, "finished job", j)
        results <- j * 2
    }
}

func main() {
    // In order to use our pool of workers we need to send
    // them work and collect their results. We make 2
    // channels for this.
    const numJobs = 5
    jobs := make(chan int, numJobs)
    results := make(chan int, numJobs)

    // This starts up 3 workers, initially blocked
    // because there are no jobs yet.
    for w := 1; w <= 3; w++ {
        go worker(w, jobs, results)
    }

    // Here we send 5 `jobs` and then `close` that
    // channel to indicate that's all the work we have.
    for j := 1; j <= numJobs; j++ {
        jobs <- j
    }
    close(jobs)

    // Finally we collect all the results of the work.
    // This also ensures that the worker goroutines have
    // finished. An alternative way to wait for multiple
    // goroutines is to use a [WaitGroup](waitgroups).
    for a := 1; a <= numJobs; a++ {
        <-results
    }
}
Your "worker" makes HTTP requests, otherwise it's pretty much the same pattern. Note the for loop at the end which reads from the channel a known number of times.
If you need to limit a number of simultaneous requests, you can use a semaphore implemented with a buffered channel.
func makeRequest(url string, id int) string {
    var jsonStr = []byte(`{"from": "Saru", "message": "Saru to Discovery. Over!"}`)
    req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonStr))
    req.Header.Set("Content-Type", "application/json")
    client := &http.Client{}
    start := time.Now()
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    secs := time.Since(start).Seconds()
    defer resp.Body.Close()
    body, _ := ioutil.ReadAll(resp.Body)
    return fmt.Sprintf("%d, %.2f, %d, %s, %s\n", id, secs, len(body), url, body)
}

func main() {
    urlPrefix := os.Getenv("STARCOMM_GO")
    url := urlPrefix + "discovery"
    totalCalls := getTotalCalls()
    concurrencyLimit := 50 // 5, 10, 50, 100, 500, 1000.

    // Declare semaphore as a buffered channel with capacity limited by concurrency level.
    semaphore := make(chan struct{}, concurrencyLimit)

    for i := 1; i <= totalCalls; i++ {
        // Take a slot in the semaphore before proceeding.
        // Once all slots are taken this call will block until a slot is freed.
        semaphore <- struct{}{}

        go func(i int) { // pass i as an argument so each goroutine gets its own copy
            // Release the slot on job finish.
            defer func() { <-semaphore }()

            item := makeRequest(url, i)
            fmt.Println(item)
            // Beware that writeToFile will be called concurrently and may need some synchronization.
            writeToFile(item, fmt.Sprint(totalCalls))
        }(i)
    }

    // Wait for jobs to finish by filling the semaphore to full capacity.
    for i := 0; i < cap(semaphore); i++ {
        semaphore <- struct{}{}
    }
    close(semaphore)
}

How to optimise processing large data

The objective of my backend service is to process 90 million records, at least 10 million of them per day.
My system config:
RAM: 2000 MB
CPU: 2 core(s)
What I am doing right now is something like this:
var wg sync.WaitGroup
// length of evs is 4455
for range evs { // the index and element aren't used, so drop them
    wg.Add(1)
    go migrate(&wg)
}
wg.Wait()

func migrate(wg *sync.WaitGroup) {
    defer wg.Done()
    // processing
    time.Sleep(time.Second)
}
Without knowing more detail about the type of work you need to do, your approach seems good. Some things to think about:
Re-use variables and/or clients in your processing loop. For example, reuse an HTTP client instead of recreating one.
Depending on how your use case needs to handle failures, it might be efficient to use errgroup. It's a convenience wrapper that stops all the goroutines on error, possibly saving you a lot of time.
In the migrate function, be sure to be aware of the caveats regarding closures and goroutines.
package main

import (
    "fmt"
    "net/http"

    "golang.org/x/sync/errgroup"
)

func main() {
    g := new(errgroup.Group)
    var urls = []string{
        "http://www.someasdfasdfstupidname.com/",
        "ftp://www.golang.org/",
        "http://www.google.com/",
    }
    for _, url := range urls {
        url := url // https://golang.org/doc/faq#closures_and_goroutines
        g.Go(func() error {
            resp, err := http.Get(url)
            if err == nil {
                resp.Body.Close()
            }
            return err
        })
    }
    fmt.Println("waiting")
    if err := g.Wait(); err == nil {
        fmt.Println("Successfully fetched all URLs.")
    } else {
        fmt.Println(err)
    }
}
I have got the solution. To achieve this much processing, I limited the number of goroutines to 50 and increased the number of cores from 2 to 5.
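For reference, a minimal self-contained sketch of that kind of cap, using a buffered-channel semaphore (the limit of 50 mirrors what the question settled on; the migrate body is the placeholder from the question):

package main

import (
    "sync"
    "time"
)

func migrate(wg *sync.WaitGroup) {
    defer wg.Done()
    // processing
    time.Sleep(time.Second)
}

func main() {
    evs := make([]struct{}, 4455) // stands in for the question's evs slice

    const limit = 50 // at most 50 migrations run at once
    sem := make(chan struct{}, limit)

    var wg sync.WaitGroup
    for range evs {
        sem <- struct{}{} // blocks once 'limit' goroutines are in flight
        wg.Add(1)
        go func() {
            defer func() { <-sem }() // free the slot when done
            migrate(&wg)
        }()
    }
    wg.Wait()
}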

"Infinitely" high data transfer with Golang TCP connection on localhost

Problem
I have written a TCP echo server in Go and I am trying to write/read as often as I can within 10s to measure how much data got transferred in this time. Weirdly, the value is way too high and does not depend on the length of the byte array I am transferring (but it should!). It is always around 600k connections in these 10 seconds (the length of the "result" array shows how many connections were made in the 10s). As soon as I add, say, a print statement to the server so that the values get processed, I get more realistic values that depend on the length of the byte array.
Why doesn't the length of the byte array matter in the first case?
Code
Server
package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    tcpAddr, err := net.ResolveTCPAddr("tcp", fmt.Sprintf("127.0.0.1:8888"))
    checkError(err)
    ln, err := net.ListenTCP("tcp", tcpAddr)
    checkError(err)
    for {
        conn, err := ln.Accept()
        checkError(err)
        go handleConnection(conn)
    }
}

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func handleConnection(conn net.Conn) {
    var input [1000000]byte
    for {
        n, err := conn.Read(input[0:])
        checkError(err)
        // fmt.Println(input[0:n])
        _, err = conn.Write(input[0:n])
        checkError(err)
    }
}
Client
package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

var (
    result  []int
    elapsed time.Duration
)

func main() {
    input := make([]byte, 1000)
    tcpAddr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:8888")
    checkError(err)
    conn, err := net.DialTCP("tcp", nil, tcpAddr)
    checkError(err)
    for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
        startTimer := time.Now()
        _, err = conn.Write(input)
        checkError(err)
        _, err := conn.Read(input[0:])
        checkError(err)
        elapsed = time.Since(startTimer)
        result = append(result, int(elapsed))
    }
    fmt.Println(fmt.Sprintf("result: %v", len(result)))
}

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}
Read in the client loop is not guaranteed to read all of the data sent in the previous call to Write.
When input is small enough to be transmitted in a single packet on the network, Read in the client returns all of the data in the previous call to Write in the client. In this mode, the application measures the time to execute request/response pairs.
For larger sizes of input, read on the client can fall behind what the client is writing. When this happens, the calls to Read complete faster because the calls return data from an earlier call to Write. The application is pipelining in this mode. The throughput for pipelining is higher than the throughput for request/response pairs. The client will not read all data in this mode, but the timing impact of that is not significant.
Use the following code to time request/response pairs for arbitrary sizes of input.
for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
    startTimer := time.Now()
    _, err = conn.Write(input)
    checkError(err)
    _, err := io.ReadFull(conn, input) // <-- read all of the data
    checkError(err)
    elapsed = time.Since(startTimer)
    result = append(result, int(elapsed))
}
To measure full-on pipelining, modify the client to read and write from different goroutines. An example follows.
go func() {
    for start := time.Now(); time.Since(start) < time.Second*time.Duration(10); {
        _, err = conn.Write(input)
        checkError(err)
    }
    conn.CloseWrite() // tell the server that we are done sending data
}()

start := time.Now()
output := make([]byte, 4096)
for {
    _, err := conn.Read(output)
    if err != nil {
        if err == io.EOF {
            break
        }
        checkError(err)
    }
}
fmt.Println(time.Since(start))

How do I handle panic in goroutines?

I am quite new to Go, so please spare me the sword (if possible).
I was trying to get data from the web by studying the tutorial here
Now, the tutorial goes all well, but I wanted to check for edge cases and error handling (just to be thorough with my new learning of the language; I don't want to be the one with half-baked knowledge).
Here's my go-playground code.
Before asking I looked at a lot of references like :
Go blog defer,panic and recover
handling panics in goroutines
how-should-i-write-goroutine
And a few more, however I couldn't figure it out much.
Here's the code in case you don't want to go to the playground (for reasons yet unknown to man):
// MakeRequest : Makes requests concurrently
func MakeRequest(url string, ch chan<- string, wg *sync.WaitGroup) {
    start := time.Now()
    resp, err := http.Get(url)
    defer func() {
        resp.Body.Close()
        wg.Done()
        if r := recover(); r != nil {
            fmt.Println("Recovered in f", r)
        }
    }()
    if err != nil {
        fmt.Println(err)
        panic(err)
    }
    secs := time.Since(start).Seconds()
    body, _ := ioutil.ReadAll(resp.Body)
    ch <- fmt.Sprintf("%.2f elapsed with response length: %d %s", secs, len(body), url)
}
func main() {
    var wg sync.WaitGroup
    output := []string{
        "https://www.facebook.com",
        "",
    }
    start := time.Now()
    ch := make(chan string)
    for _, url := range output {
        wg.Add(1)
        go MakeRequest(url, ch, &wg)
    }
    for range output {
        fmt.Println(<-ch)
    }
    fmt.Printf("%.2fs elapsed\n", time.Since(start).Seconds())
}
Update
I changed the code to (let's say) handle the error in the goroutine like this (go-playground here):
func MakeRequest(url string, ch chan<- string, wg *sync.WaitGroup) {
    start := time.Now()
    resp, err := http.Get(url)
    if err == nil {
        secs := time.Since(start).Seconds()
        body, _ := ioutil.ReadAll(resp.Body)
        ch <- fmt.Sprintf("%.2f elapsed with response length: %d %s", secs, len(body), url)
        // fmt.Println(err)
        // panic(err)
    }
    defer wg.Done()
}
Update 2:
After an answer I changed the code to this and it successfully removes the channel deadlock; however, now I need to handle this in main:
func MakeRequest(url string, ch chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    start := time.Now()
    resp, err := http.Get(url)
    if err == nil {
        secs := time.Since(start).Seconds()
        body, _ := ioutil.ReadAll(resp.Body)
        ch <- fmt.Sprintf("%.2f elapsed with response length: %d %s", secs, len(body), url)
        // fmt.Println(err)
        // panic(err)
    }
    // defer resp.Body.Close()
    ch <- fmt.Sprintf("")
}
Isn't there a more elegant way to handle this?
But now I get locked up in deadlock.
Thanks and regards.
Temporarya
(a golang noobie)
You are using recover correctly. You have two problems:
You are using panic incorrectly. You should only panic when there was a programming error. Avoid using panics unless you believe taking down the program is a reasonable response to what happened. In this case, I would just return the error, not panic.
You are panicking during your panic. What is happening is that you first panic at panic(err). Then in your deferred function, you panic again at resp.Body.Close(). When http.Get returns an error, it returns a nil response. That means that resp.Body.Close() is acting on a nil value.
The idiomatic way to handle this would be something like the following:
func MakeRequest(url string, ch chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    start := time.Now()
    resp, err := http.Get(url)
    if err != nil {
        // handle the error without panicking
    }
    // there was no error, so resp.Body is guaranteed to exist.
    defer resp.Body.Close()
    ...
Response to update: If http.Get() returns an error, you never send on the channel. At some point all goroutines except the main goroutine stop running and the main goroutine is waiting on <-ch. Since that channel receive will never complete and there is nothing else for the Go runtime to schedule, it panics (unrecoverably).
Response to comment: To ensure the channel doesn't hang, you need some sort of coordination to know when messages will stop coming. How this is implemented depends on your real program, and an example cannot necessarily extrapolate to reality. For this example, I would simply close the channel when the WaitGroup is done.
Playground
func main() {
    var wg sync.WaitGroup
    output := []string{
        "https://www.facebook.com",
        "",
    }
    start := time.Now()
    ch := make(chan string)
    for _, url := range output {
        wg.Add(1)
        go MakeRequest(url, ch, &wg)
    }
    go func() {
        wg.Wait()
        close(ch)
    }()
    for val := range ch {
        fmt.Println(val)
    }
    fmt.Printf("%.2fs elapsed\n", time.Since(start).Seconds())
}

Redigo: getting errors on apache load testing

I am connecting my Go program to Redis using the redigo library. When I run one request I get correct results. But on load testing using the Apache benchmark tool, it works when:
ab -n 1000 -k -c 10 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
However, when the request is:
ab -n 1000 -k -c 15 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
I am getting an error:
panic: dial tcp :6379: too many open files
This is my code:
func newPool() *redis.Pool {
    return &redis.Pool{
        MaxIdle: 80, // max number of connections
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", ":6379")
            if err != nil {
                panic(err.Error())
            }
            return c, err
        },
        // If Wait is true and the pool is at the MaxActive limit, then Get() waits
        // for a connection to be returned to the pool before returning.
        Wait: true,
    }
}

var pool = newPool()

func checkError(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func func1(pool *redis.Pool, id int) string {
    c := pool.Get()
    defer c.Close()
    m, err := redis.String(c.Do("HGET", "key", id))
    checkError(err)
    return m
}

func func2(pool *redis.Pool, string_ids []string) chan string {
    c := make(chan string)
    var ids []int
    var temp int
    for _, v := range string_ids {
        temp, _ = strconv.Atoi(v)
        ids = append(ids, temp)
    }
    go func() {
        var wg sync.WaitGroup
        wg.Add(len(ids))
        for _, v := range ids {
            go func(id int) {
                defer wg.Done()
                c <- func1(pool, id)
            }(v)
        }
        wg.Wait()
        close(c)
    }()
    return c
}

func getReq(w http.ResponseWriter, req *http.Request) {
    err := req.ParseForm()
    checkError(err)
    ids := req.Form["ids[]"]
    for v := range func2(pool, ids) {
        fmt.Println(v)
    }
}

func main() {
    http.HandleFunc("/abcd", getReq)
    log.Fatal(http.ListenAndServe(":8084", nil))
}
How do I handle at least 40 concurrent requests using the Apache benchmark tool?
Note: I haven't changed anything in my Redis conf file.
I am getting the following response when running the Apache benchmark tool. Only 15 requests are completed.
$ ab -n 1000 -k -c 15 -p post.txt -T application/x-www-form-urlencoded http://localhost:8084/abcd
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
apr_socket_recv: Connection refused (111)
Total of 15 requests completed
To fix the issue immediately, set MaxActive on your redis.Pool, which you mention in the comments but don't set yourself.
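For illustration, a sketch of the pool with that limit set, as a drop-in for the question's newPool (the value 80 is arbitrary; tune it to what your Redis server and file-descriptor limit can handle):

func newPool() *redis.Pool {
    return &redis.Pool{
        MaxIdle:   80,
        MaxActive: 80, // hard cap on open connections
        // With Wait true, Get() blocks until a connection is returned
        // to the pool instead of dialing beyond MaxActive.
        Wait: true,
        Dial: func() (redis.Conn, error) {
            return redis.Dial("tcp", ":6379")
        },
    }
}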
Fundamentally though, you should not be dispatching goroutines for every id lookup. Your maximum possible concurrency would then be (number of client connections) x (number of ids in the request), each of which can open a new Redis connection. It would be far faster and more efficient to have a single Redis connection read each of the ids serially. There's no need for any of the extra concurrency you have here; do it all serially from the handler, and don't convert the strings to ints, since Redis only operates on string keys to begin with.
func getReq(w http.ResponseWriter, req *http.Request) {
    err := req.ParseForm()
    checkError(err)
    c := pool.Get()
    defer c.Close()
    for _, id := range req.Form["ids[]"] {
        m, err := redis.String(c.Do("HGET", "key", id))
        checkError(err)
        fmt.Println(m)
    }
}
If you want to optimize this further, you can use pipelines to reduce the round trips to the redis server.
for _, id := range req.Form["ids[]"] {
    c.Send("HGET", "key", id)
}
c.Flush()
// Receive returns one reply per Send, so read the replies back in a loop.
for range req.Form["ids[]"] {
    m, err := redis.String(c.Receive())
    checkError(err)
    fmt.Println(m)
}
...
