I have a very strange problem with a simple HTTP GET request in Golang.
Every request in Golang to https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook takes about 6-8 seconds (!)
If the same request is fired in Chrome, with Postman or with PowerShell, it takes less than a second.
Does anybody have a clue why this happens?
My code:
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook", nil)
	if err != nil {
		log.Fatalf("Error creating request. %v", err)
	}
	response, err := client.Do(req)
	if err != nil {
		log.Fatalf("Error on request. %v", err)
	}
	defer response.Body.Close()
	body, err := ioutil.ReadAll(response.Body)
	if err != nil {
		log.Fatalf("Couldn't get response body. %v", err)
	}
	fmt.Print(string(body))
}
The site you are trying to access is behind the Akamai CDN:
$ dig www.alltron.ch
...
www.alltron.ch. 152 IN CNAME competec.botmanager.edgekey.net.
competec.botmanager.edgekey.net. 7052 IN CNAME e9179.f.akamaiedge.net.
e9179.f.akamaiedge.net. 162 IN A 2.20.176.40
Akamai offers its customers detection of web clients that are not browsers, so that the customers can keep bots away or slow them down.
As can be seen from the questions "Strange CURL issue with a particular website SSL certificate" and "Scraping attempts getting 403 error", this kind of detection mainly cares about having an Accept-Language header, having a Connection header with the value Keep-Alive, and having a User-Agent that matches Mozilla/....
This means the following code changes result in an immediate response:
req, _ := http.NewRequest("GET", "https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook", nil)
req.Header.Set("Connection", "Keep-Alive")
req.Header.Set("Accept-Language", "en-US")
req.Header.Set("User-Agent", "Mozilla/5.0")
Still, the site obviously does not like bots, and you should respect that and not stress the site too much (e.g. by scraping lots of information). Also, the bot detection done by Akamai might change without notice, i.e. even if this code fixes the problem now, it might no longer work in the future. Such changes are especially likely if many clients bypass the bot detection.
Try disabling the cache in Chrome and comparing the timing with Go.
I just created a lambda and have given it the default VPC, Security Group, and Subnets. Gave it a role which has AWSLambdaVPCAccessExecutionRole. Verified outbound rules show 0.0.0.0/0 for all ports and protocols. Verified that lambda.amazonaws.com is a trusted entity on the policy.
I gave it this code (which works locally):
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	res, err := http.Get("http://www.google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	robots, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s", robots)
}
And when I use the "test" function in Lambda, all I get for my troubles is:
Get "http://www.google.com/robots.txt": dial tcp 172.217.1.196:80: i/o timeout
I've tried looking through every AWS document on the subject, and it doesn't appear that I'm missing something, but maybe somebody else would know?
I wasn't originally trying to access the google robots.txt file, but after getting the same result no matter what I tried, I figured I would pull an example straight from the documentation to rule out anything I'm doing. At least I know my code is executing, otherwise we wouldn't get this far. Any ideas at all for what to try?
The one thing I overlooked here is that the subnets need a public NAT gateway, not an internet gateway. Once we added those, the function was able to reach outside the AWS walls and download the needed files.
Hi, I developed a little Go server that (at the moment) does nothing but forward the request to a local service on the machine it is running on.
So it is nearly the same as nginx acting as a reverse proxy.
But I observe really bad performance: it even uses up all the resources of the server and runs into timeouts on further requests.
I know that this cannot be as performant as nginx, but I don't think that it should be that slow.
Here is the server I use for forwarding the request:
package main

import (
	"bytes"
	"io/ioutil"
	"net/http"

	"github.com/gorilla/mux"
	"github.com/sirupsen/logrus"
)

func main() {
	router := mux.NewRouter()
	router.HandleFunc("/", forwarder).Methods("POST")
	server := http.Server{
		Handler: router,
		Addr:    ":8443",
	}
	logrus.Fatal(server.ListenAndServeTLS("cert.pem", "key.pem"))
}

var client = &http.Client{}

// ServerError is a helper from my code base (not shown here).
func forwarder(w http.ResponseWriter, r *http.Request) {
	// read request
	body, err := ioutil.ReadAll(r.Body)
	if err != nil {
		logrus.Error(err.Error())
		ServerError(w, nil)
		return
	}
	// create forwarding request
	req, err := http.NewRequest("POST", "http://localhost:8000", bytes.NewReader(body))
	if err != nil {
		logrus.Error(err.Error())
		ServerError(w, nil)
		return
	}
	resp, err := client.Do(req)
	if err != nil {
		logrus.Error(err.Error())
		ServerError(w, nil)
		return
	}
	// close the upstream body on every path, including read errors
	defer resp.Body.Close()
	// read response
	respBody, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		logrus.Error(err.Error())
		ServerError(w, nil)
		return
	}
	// return response
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.WriteHeader(resp.StatusCode)
	w.Write(respBody)
}
On the client side I just measure the round-trip time. When I fire 100 requests per second, the response time goes up quite fast.
It starts with a response time of about 50ms. After 10 seconds the response time is at 500ms. After 10 more seconds it is at 8000ms, and so on, until I get timeouts.
When I use nginx instead of my server, there is no problem running 100 requests per second; with nginx it stays at 40ms per request.
Some observations:
Using nginx, lsof -i | grep nginx shows no more than 2 open connections.
Using my server, the number of connections increases up to 500, then the number of connections in state SYN_SENT increases, and then the requests run into timeouts.
Another finding: I measured the delay of this code line:
resp, err := client.Do(req)
That is where most of the time is spent, but that could also just be because the goroutines are starving!?
What I also tried:
r.Close = true (or KeepAlive = false)
I modified timeouts on the server side
I modified all this stuff on the http client used by my forward server (keepalive false, request.Close = true) etc.
I don't know why I get such bad performance.
My guess is that Go runs into problems because of the huge number of goroutines. Maybe most of the time is spent scheduling these goroutines, and so the latency goes up?
I also tried to use the included httputil.NewSingleHostReverseProxy(). Performance is a little bit better, but still the same problem.
UPDATE:
Now I tried fasthttp:
package main

import (
	"github.com/sirupsen/logrus"
	"github.com/valyala/fasthttp"
)

func StartNodeManager() {
	logrus.Fatal(fasthttp.ListenAndServeTLS(":8443", "cert.pem", "key.pem", forwarder))
}

var client = fasthttp.Client{}

func forwarder(ctx *fasthttp.RequestCtx) {
	resp := fasthttp.AcquireResponse()
	req := fasthttp.AcquireRequest()
	// release the pooled objects on every path, including the error path
	defer fasthttp.ReleaseRequest(req)
	defer fasthttp.ReleaseResponse(resp)

	req.Header.SetMethod("POST")
	req.SetRequestURI("http://127.0.0.1:8000")
	req.SetBody(ctx.Request.Body())

	err := client.Do(req, resp)
	if err != nil {
		logrus.Error(err.Error())
		ctx.Response.SetStatusCode(500)
		return
	}
	ctx.Response.SetBody(resp.Body())
}
A little bit better, but after 30 seconds the first timeouts arrive and the response time goes up to 5 seconds.
The root cause of the problem is that Go's http package, with its default settings, does not manage connections to the upstream well: the default transport keeps at most 2 idle connections per host (http.DefaultMaxIdleConnsPerHost), so under load lots of new connections are opened, closed after use, and left in TIME_WAIT state.
So as the number of connections increases, performance decreases.
You just have to set
// 1000 is the value I am using
http.DefaultTransport.(*http.Transport).MaxIdleConns = 1000
http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 1000
in your forwarder (e.g. at the start of main, before the server starts), and this will solve your problem. It works because an http.Client with a nil Transport, like the one in the question, uses http.DefaultTransport.
By the way, use the Go standard library's reverse proxy (httputil.ReverseProxy); this will take away a lot of headaches.
But even with the reverse proxy, you still need to set MaxIdleConns and MaxIdleConnsPerHost in its transport.
Follow the article given below.
First of all, you should profile your app and find out where the bottleneck is.
Second, I would look for ways to write code with fewer heap allocations and more allocations on the stack.
Few ideas:
Do you need read request body for all request?
Do you need always read response body?
Can you pass the body of the client request straight into the request to the upstream server? func NewRequest(method, url string, body io.Reader) (*Request, error) accepts any io.Reader, including r.Body.
Use sync.Pool
Consider using fasthttp, as it creates less pressure on the garbage collector
Check whether your server uses the same optimisations as Nginx, e.g. Keep-Alive, caching, etc.
Again profile and compare against Nginx.
Seems there is a lot of space for optimization.
......
resp, err := httplib.Get(url)
if err != nil {
	fmt.Println(err)
	return
}
defer resp.Body.Close()
......
Is it necessary to close the response body every time?
Quoting from the official documentation of the http package:
The client must close the response body when finished with it
Contrary to the top-voted answer: yes, it is necessary to close resp.Body whether you consume it or not.
This is a good question, and the docs are very misleading here. In this thread of the official Go forums, the diagnosis and conclusion -- which I have experienced for myself -- is:
It was leading to leaking open files on the server, so I confirm, you MUST close the body, even if you don't read it
What happens if we do not close the response body?
It is a resource leak: the connection can remain open, and the client will not reuse it.
It is recommended to close immediately after checking the error.
client := http.DefaultClient
resp, err := client.Do(req)
if err != nil {
	return nil, err
}
defer resp.Body.Close()
From the http.Client documentation:
If the returned error is nil, the Response will contain a non-nil Body which the user is expected to close. If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.
The request Body, if non-nil, will be closed by the underlying Transport, even on errors.
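What is a RoundTripper?
It is an interface representing the ability to execute a single HTTP transaction. The Client's Transport field holds one, and the docs say: "Transport specifies the mechanism by which individual HTTP requests are made. If nil, DefaultTransport is used."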
I have a function that makes a call to an external API using a Go http.Client, parses the result, and uses the result in the template executed afterwards. Occasionally, the external API will respond slowly (~20s), and the template execution will fail citing "i/o timeout", or more specifically,
template: :1:0: executing "page.html" at <"\n\t\t\t\t\t\t\t\t\...>: write tcp 127.0.0.1:35107: i/o timeout
This always coincides with a slow API response, but there is always a valid response in the JSON object, so the http.Client is receiving a proper response. I am just wondering if anyone could point me towards what could be causing the i/o timeout in the ExecuteTemplate call.
I have tried ResponseHeaderTimeout and DisableKeepAlives in the client transport (both with and without those options) to no avail. I've also tried setting the request's auto-close value to true to no avail. A stripped-down version of the template generation code is below:
func viewPage(w http.ResponseWriter, r *http.Request) {
	tmpl := pageTemplate{}

	duration, _ := time.ParseDuration("120s")
	tr := &http.Transport{
		ResponseHeaderTimeout: duration,
		DisableKeepAlives:     true,
	}
	client := &http.Client{Transport: tr}

	req, _ := http.NewRequest("GET", "http://example.com/some_function", nil)
	req.Close = true
	resp, _ := client.Do(req)
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	var res api_response // some struct that matches the JSON response
	err = json.Unmarshal(body, &res)

	// stripped down: the template is assumed to be loaded from page.html
	t, _ := template.New("page.html").ParseFiles("page.html")
	err = t.ExecuteTemplate(w, "page.html", tmpl)
	if err != nil {
		log.Println(err) // this is where the "i/o timeout" shows up
	}
}
The timeout on this line:
err = t.ExecuteTemplate(w, "page.html", tmpl)
means that writing the outgoing response is timing out, so nothing you change in the locally created client should affect it. It also makes sense that a slow response from the external API increases the chance of a timeout on w: the write deadline is set when the response is created, before your handler is called, so any slow activity in your handler increases the chances of a timeout.
There's no write timeout on the http.Server instance used by http.ListenAndServe, so you must be setting the Server.WriteTimeout field explicitly on the created server.
As a side note, there are errors being ignored in that handler, which is a strongly discouraged practice.
Right now I'm fetching URLs from Indiegogo as part of a side project, using the basic GET request template found [here][1]. I then convert the byte data into a string using
responseText, err := ioutil.ReadAll(response.Body)
trueText := string(responseText)
with appropriate error handling where needed
It works fine for repeated attempts at getting and some other URLs of varying length (at least as large as the previous URL and some longer than the next).
Strangely, when I attempt to get it breaks and throws a runtime error of
panic: runtime error: index out of range
and exits with a status of 2. I'm curious as to what the issue could be.
I know it isn't Indiegogo getting angry about my once-a-minute requests and cutting my connection, because I can request continuously for 20 minutes at with no issue. Give it a bit of downtime and it still completely breaks on
Thanks for the assistance
EDIT: it appears that a malformed bit of HTML in some of the pages messed with a loop I was running over the content, which caused the runtime panic on only some URLs. Thanks for the help
[1]:
There is no error when getting from the url and converting the body to the Go string type. For example,
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	url := "http://www.indiegogo.com/projects/culcharge-smallest-usb-charge-and-data-cable-for-iphone-and-android"
	res, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	body, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	text := string(body)
	fmt.Println(len(body), len(text))
}
Output:
66363 66363
You didn't provide us with a small fragment of code which compiles, runs, and fails in the manner you describe. That leaves us all guessing.