Error timeout get HTTP request golang - go

I tried to get html source from Reddit with Golang:
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	client := http.Client{
		Timeout: 5 * time.Second,
	}
	resp, err := client.Get("https://www.reddit.com/")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	// Defer the close right after the error check, before reading the body.
	defer resp.Body.Close()
	bytes, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("HTML:\n\n", string(bytes))
	var input string
	fmt.Scanln(&input)
}
The first attempt worked, but the second one returned an error:
<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>
<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>
<p>please wait 6 second(s) and try again.</p>
<p>as a reminder to developers, we recommend that clients make no
more than <a href="http://github.com/reddit/reddit/wiki/API">one
request every two seconds</a> to avoid seeing this message.</p>
I tried adding a delay, but it still doesn't work.
Sorry about my bad English.

Reddit doesn't want automated scanners/grabbers on its site and has a bot-protection mechanism.
Here's a recommendation from them:
one request every two seconds
Just add a delay between requests.

A timeout serves a different purpose: it is an upper limit on how long a request may run. What you need is a sleep between subsequent requests:
time.Sleep(6 * time.Second)
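To show where the sleep belongs, here is a minimal sketch of a paced fetch loop. It also sets a descriptive User-Agent, since Reddit's message mentions the user agent string. The names fetchPaced and demo and the "my-go-client/0.1" value are hypothetical, and the demo talks to a local test server rather than to Reddit.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchPaced requests each URL in turn, sleeping for interval before every
// request after the first. userAgent is sent with every request.
func fetchPaced(client *http.Client, urls []string, interval time.Duration, userAgent string) ([]string, error) {
	bodies := make([]string, 0, len(urls))
	for i, u := range urls {
		if i > 0 {
			time.Sleep(interval) // a pause between requests, not a timeout
		}
		req, err := http.NewRequest(http.MethodGet, u, nil)
		if err != nil {
			return bodies, err
		}
		req.Header.Set("User-Agent", userAgent)
		resp, err := client.Do(req)
		if err != nil {
			return bodies, err
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close() // close inside the loop, not with defer
		if err != nil {
			return bodies, err
		}
		bodies = append(bodies, string(b))
	}
	return bodies, nil
}

// demo runs three paced requests against a local server that echoes the
// User-Agent header, and reports how long the whole run took.
func demo(interval time.Duration) ([]string, time.Duration, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("User-Agent"))
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	start := time.Now()
	bodies, err := fetchPaced(client, []string{srv.URL, srv.URL, srv.URL}, interval, "my-go-client/0.1")
	return bodies, time.Since(start), err
}

func main() {
	bodies, elapsed, err := demo(50 * time.Millisecond)
	fmt.Println(bodies, elapsed.Round(10*time.Millisecond), err)
}
```

Against Reddit itself the interval would be at least the two seconds they recommend; the Timeout on the client stays independent of that pause.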

Related

Golang Handlefunc (Runtime output display on browser) [duplicate]

I am trying to send a page response as soon as a request is received and then process something, but I found that the response does not get sent out "first" even though it comes first in the code. In real life I have a page for uploading an Excel sheet that gets saved into the database, which takes time (50,0000+ rows), and I would like to update the user on progress. Here is a simplified example (depending on how much RAM you have, you may need to add a couple of zeros to the counter to see the result):
package main

import (
	"fmt"
	"net/http"
)

func writeAndCount(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Starting to count"))
	for i := 0; i < 1000000; i++ {
		if i%1000 == 0 {
			fmt.Println(i)
		}
	}
	w.Write([]byte("Finished counting"))
}

func main() {
	http.HandleFunc("/", writeAndCount)
	http.ListenAndServe(":8080", nil)
}
The original concept of the HTTP protocol is a simple request-response server-client computation model. There was no streaming or "continuous" client update support. It is (was) always the client that contacted the server first, whenever it needed some kind of information.
Also, since most web servers buffer the response until it is fully ready (or until a certain limit, typically the buffer size, is reached), data you write (send) to the client won't be transmitted immediately.
Several techniques were "developed" to get around this "limitation" so that the server is able to notify the client about changes or progress, such as HTTP Long polling, HTTP Streaming, HTTP/2 Server Push or Websockets. You can read more about these in this answer: Is there a real server push over http?
So to achieve what you want, you have to step around the original "borders" of the HTTP protocol.
If you want to send data periodically, or stream data to the client, you have to tell this to the server. The easiest way is to check if the http.ResponseWriter handed to you implements the http.Flusher interface (using a type assertion), and if it does, calling its Flusher.Flush() method will send any buffered data to the client.
Using http.Flusher is only half of the solution. Since this is a non-standard usage of the HTTP protocol, usually client support is also needed to handle this properly.
First, you have to let the client know about the "streaming" nature of the response by setting the Content-Type: text/event-stream response header.
Next, to avoid clients caching the response, be sure to also set Cache-Control: no-cache.
And last, to let the client know that you might not send the response as a single unit (but rather as periodic updates or as a stream), so that the client keeps the connection alive and waits for further data, set the Connection: keep-alive response header.
Once the response headers are set as the above, you may start your long work, and whenever you want to update the client about the progress, write some data and call Flusher.Flush().
Let's see a simple example that does everything "right":
package main

import (
	"fmt"
	"net/http"
	"time"
)

func longHandler(w http.ResponseWriter, r *http.Request) {
	// Flushing is only possible if the ResponseWriter supports it.
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "Server does not support Flusher!",
			http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")
	start := time.Now()
	for rows, max := 0, 50*1000; rows < max; {
		time.Sleep(time.Second) // Simulating work...
		rows += 10 * 1000
		fmt.Fprintf(w, "Rows done: %d (%d%%), elapsed: %v\n",
			rows, rows*100/max, time.Since(start).Truncate(time.Millisecond))
		flusher.Flush() // push the buffered progress line to the client now
	}
}

func main() {
	http.HandleFunc("/long", longHandler)
	panic(http.ListenAndServe("localhost:8080", nil))
}
Now if you open http://localhost:8080/long in your browser, you will see an output "growing" by every second:
Rows done: 10000 (20%), elapsed: 1s
Rows done: 20000 (40%), elapsed: 2s
Rows done: 30000 (60%), elapsed: 3s
Rows done: 40000 (80%), elapsed: 4.001s
Rows done: 50000 (100%), elapsed: 5.001s
Also note that when using SSE, you should "pack" updates into SSE frames, that is you should start them with "data:" prefix, and end each frame with 2 newline chars: "\n\n".
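The example above writes plain lines, so here is a hedged sketch of what SSE framing could look like. Each update is prefixed with "data: " and terminated by a blank line, and a multi-line message gets one data: field per line. The names sseFrame, progressHandler and renderProgress are hypothetical, and the handler is driven in-process with httptest instead of a real browser.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// sseFrame wraps msg in a Server-Sent Events frame: one "data: " field per
// line of the message, followed by an empty line that ends the frame.
func sseFrame(msg string) string {
	var b strings.Builder
	for _, line := range strings.Split(msg, "\n") {
		b.WriteString("data: ")
		b.WriteString(line)
		b.WriteString("\n")
	}
	b.WriteString("\n")
	return b.String()
}

// progressHandler streams two progress updates as SSE frames, flushing
// after each one so the client sees it immediately.
func progressHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for pct := 50; pct <= 100; pct += 50 {
		fmt.Fprint(w, sseFrame(fmt.Sprintf("%d%% done", pct)))
		flusher.Flush()
	}
}

// renderProgress runs the handler against an in-memory recorder and
// returns everything that was written to the response body.
func renderProgress() string {
	rec := httptest.NewRecorder()
	progressHandler(rec, httptest.NewRequest(http.MethodGet, "/progress", nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(renderProgress())
}
```

On the browser side, an EventSource pointed at such an endpoint would receive each frame as a separate message event.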
"Literature" and further reading / tutorials
Read more about Server-sent events on Wikipedia.
See a Golang HTML5 SSE example.
See Golang SSE server example with client codes using it.
See w3schools.com's tutorial on Server-Sent Events - One Way Messaging.
You can check if the ResponseWriter is a http.Flusher, and if so, force the flush to network:
if f, ok := w.(http.Flusher); ok {
	f.Flush()
}
However, bear in mind that this is a very unconventional HTTP handler. Streaming out progress messages to the response as if it were a terminal presents a few problems, particularly if the client is a web browser.
You might want to consider something more fitting with the nature of HTTP, such as returning a 202 Accepted response immediately, with a unique identifier the client can use to check on the status of processing using subsequent calls to your API.
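As a rough illustration of the 202 Accepted approach, the sketch below answers the upload immediately with a Location header and lets the client poll a status endpoint. Everything here is an assumption made for the example: the jobStore type, the /upload and /status paths, the integer job IDs, and the fake "work" loop standing in for the database inserts.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// jobStore tracks the progress of background jobs (hypothetical helper).
type jobStore struct {
	mu       sync.Mutex
	wg       sync.WaitGroup // lets the demo wait for background work
	nextID   int
	progress map[int]int // job ID -> percent complete
}

func newJobStore() *jobStore {
	return &jobStore{progress: make(map[int]int)}
}

func (s *jobStore) set(id, pct int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.progress[id] = pct
}

// handleUpload registers a job, starts it in the background, and answers
// immediately with 202 Accepted plus the URL the client should poll.
func (s *jobStore) handleUpload(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	s.nextID++
	id := s.nextID
	s.progress[id] = 0
	s.mu.Unlock()

	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		for pct := 25; pct <= 100; pct += 25 {
			s.set(id, pct) // stand-in for saving a batch of rows
		}
	}()

	w.Header().Set("Location", "/status?id="+strconv.Itoa(id))
	w.WriteHeader(http.StatusAccepted)
	fmt.Fprintf(w, "job %d accepted\n", id)
}

// handleStatus reports the percentage done for the job named by ?id=.
func (s *jobStore) handleStatus(w http.ResponseWriter, r *http.Request) {
	id, err := strconv.Atoi(r.URL.Query().Get("id"))
	s.mu.Lock()
	pct, ok := s.progress[id]
	s.mu.Unlock()
	if err != nil || !ok {
		http.NotFound(w, r)
		return
	}
	fmt.Fprintf(w, "%d%%\n", pct)
}

// demo drives both handlers in-process and returns the upload status code,
// the Location header, and the status body once the job has finished.
func demo() (int, string, string) {
	s := newJobStore()
	up := httptest.NewRecorder()
	s.handleUpload(up, httptest.NewRequest(http.MethodPost, "/upload", nil))
	s.wg.Wait()
	loc := up.Header().Get("Location")
	st := httptest.NewRecorder()
	s.handleStatus(st, httptest.NewRequest(http.MethodGet, loc, nil))
	return up.Code, loc, st.Body.String()
}

func main() {
	code, loc, body := demo()
	fmt.Println(code, loc, body)
}
```

In a real server the two handlers would be registered with http.HandleFunc and the page would poll the status URL from JavaScript; the WaitGroup exists only so the demo can wait for the job deterministically.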

net/http server: too many open files error

I'm trying to develop a simple job queue server with some workers that query it, but I encountered a problem with my net/http server. I'm surely doing something wrong, but after ~3 minutes my server starts displaying:
http: Accept error: accept tcp [::]:4200: accept4: too many open files; retrying in 1s
For information, it receives 10 requests per second in my test case.
Here are two files to reproduce this error:
// server.go
package main

import (
	"net/http"
)

func main() {
	http.HandleFunc("/get", func(rw http.ResponseWriter, r *http.Request) {
		http.Error(rw, "Try again", http.StatusInternalServerError)
	})
	http.ListenAndServe(":4200", nil)
}

// worker.go
package main

import (
	"net/http"
	"time"
)

func main() {
	for {
		res, _ := http.Get("http://localhost:4200/get")
		defer res.Body.Close()
		if res.StatusCode == http.StatusInternalServerError {
			time.Sleep(100 * time.Millisecond)
			continue
		}
		return
	}
}
I have already searched for this error and found some interesting answers, but none of them fixed my issue.
The first answer I saw said to correctly close the Body of the http.Get response; as you can see, I did that.
The second answer was to change the file descriptor ulimit of my system, but since I won't control where my app will run, I'd prefer not to use this solution (for information, it's set to 1024 on my system).
Can someone explain why this problem happens and how I can fix it in my code?
Thanks a lot for your time.
EDIT :
EDIT 2: In a comment, Martin says that I'm not closing the Body. I tried closing it (without defer) and that fixed the issue. Thanks, Martin! I thought that continue would execute my defer; I was wrong.
I found a post explaining the root problem in a lot more detail.
Nathan Smith even explains how to control timeouts on the TCP level, if needed.
Below is a summary of everything I could find on this particular problem, as well as the best practices to avoid this problem in future.
Problem
When a response is received, the connection is kept alive until the response-body stream is closed, regardless of whether the body is needed. So, as mentioned in this thread, always close the response body, even if you do not need to read its content:
func Ping(url string) bool {
	// simple GET request on given URL
	res, err := http.Get(url)
	if err != nil {
		// if unable to GET given URL, then ping must fail
		return false
	}
	// always close the response-body, even if content is not required
	defer res.Body.Close()
	// is the page status okay?
	return res.StatusCode == http.StatusOK
}
Best Practice
As mentioned by Nathan Smith, never use http.DefaultClient in production systems; this includes calls like http.Get, which use http.DefaultClient under the hood.
Another reason to avoid http.DefaultClient is that it is a singleton (a package-level variable), meaning that the garbage collector will not try to clean it up, which will leave idle streams/sockets alive.
Instead, create your own instance of http.Client and remember to always specify a sane Timeout:
func Ping(url string) bool {
	// create a new instance of http client struct, with a timeout of 2sec
	client := http.Client{Timeout: time.Second * 2}
	// simple GET request on given URL
	res, err := client.Get(url)
	if err != nil {
		// if unable to GET given URL, then ping must fail
		return false
	}
	// always close the response-body, even if content is not required
	defer res.Body.Close()
	// is the page status okay?
	return res.StatusCode == http.StatusOK
}
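Nathan Smith's post is not reproduced here, so the following is only a sketch of what connection-level timeouts can look like with the standard library, in addition to the overall Client.Timeout. The function name newHTTPClient and all durations are illustrative assumptions, not recommendations from that post.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newHTTPClient builds a client with an overall request timeout plus
// finer-grained limits on the underlying transport.
func newHTTPClient() *http.Client {
	transport := &http.Transport{
		// Limit how long establishing the TCP connection may take.
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout:   5 * time.Second,  // limit for the TLS handshake
		ResponseHeaderTimeout: 5 * time.Second,  // wait for response headers
		IdleConnTimeout:       30 * time.Second, // close idle keep-alive connections
		MaxIdleConnsPerHost:   2,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   10 * time.Second, // upper bound for the whole request
	}
}

func main() {
	c := newHTTPClient()
	fmt.Println(c.Timeout)
}
```

A client built this way is meant to be created once and reused, so that its idle connections can be shared across requests.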
Safety Net
The safety net is for that newbie on the team, who does not know the shortfalls of http.DefaultClient usage. Or even that very useful, but not so active, open-source library that is still riddled with http.DefaultClient calls.
Since http.DefaultClient is a Singleton we can easily change the Timeout setting, just to ensure that legacy code does not cause idle connections to remain open.
I find it best to set this on the package main file in the init function:
package main

import (
	"net/http"
	"time"
)

func init() {
	/*
		Safety net for 'too many open files' issue on legacy code.
		Set a sane timeout duration for the http.DefaultClient, to ensure idle connections are terminated.
		Reference: https://stackoverflow.com/questions/37454236/net-http-server-too-many-open-files-error
	*/
	http.DefaultClient.Timeout = time.Minute * 10
}
As Martin says in the comments, I wasn't really closing the Body after the Get request. I used defer res.Body.Close(), but it is never executed because I stay inside the for loop. continue doesn't trigger defer; deferred calls only run when the surrounding function returns.
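A minimal sketch of the worker loop with the body closed on every iteration might look like this. The helper names pollUntilOK and demo are hypothetical, and the demo uses a local test server that fails twice before succeeding instead of the job queue on port 4200.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// pollUntilOK requests url until the server stops answering with a 500,
// closing the response body on every iteration. It returns the number of
// attempts that were made.
func pollUntilOK(client *http.Client, url string, wait time.Duration) (int, error) {
	for attempts := 1; ; attempts++ {
		res, err := client.Get(url)
		if err != nil {
			return attempts, err
		}
		// Drain and close right away: a defer here would pile up and only
		// run when the function returns, leaking one descriptor per loop.
		io.Copy(io.Discard, res.Body)
		res.Body.Close()
		if res.StatusCode != http.StatusInternalServerError {
			return attempts, nil
		}
		time.Sleep(wait)
	}
}

// demo polls a local server that answers 500 twice and then 200.
func demo() (int, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&hits, 1) < 3 {
			http.Error(w, "Try again", http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "job")
	}))
	defer srv.Close()
	return pollUntilOK(srv.Client(), srv.URL, 10*time.Millisecond)
}

func main() {
	attempts, err := demo()
	fmt.Println(attempts, err)
}
```

Another option is to move the body of the loop into its own function, so that a defer inside it runs at the end of each iteration.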
Please note that in some cases the setting in /etc/sysctl.conf
net.ipv4.tcp_tw_recycle = 1
could cause this error because TCP connections remain open.
As a temporary workaround, you can raise the open-file limit:
ulimit -Sn 10000

Go amqp method to list all currently declared queues?

I'm using streadway/amqp to do a tie in from rabbitmq to our alert system. I need a method that can return a list of all the currently declared queues (exchanges would be nice too!) so that I can go through and get all the message counts.
I'm digging through the api documentation here...
http://godoc.org/github.com/streadway/amqp#Queue
...but I don't seem to be finding what I'm looking for. We're currently using a bash call to 'rabbitmqctl list_queues' but that's a kludge way to get this information, requires a custom sudo setting, and fires off hundreds of log entries a day to the secure log.
edit: method meaning, 'a way to get this piece of information' as opposed to an actual call, though a call would be great I don't believe it exists.
Answered my own question. There isn't a way! The amqp spec doesn't have a standard way of finding this out which seems like a glaring oversight to me. However, since my backend is rabbitmq with the management plugin, I can make a call to that to get this information.
from https://stackoverflow.com/a/21286370/5076297 (in python, I'll just have to translate this and probably also figure out the call to get vhosts):
import requests

def rest_queue_list(user='guest', password='guest', host='localhost', port=15672, virtual_host=None):
    url = 'http://%s:%s/api/queues/%s' % (host, port, virtual_host or '')
    response = requests.get(url, auth=(user, password))
    queues = [q['name'] for q in response.json()]
    return queues
edit: In golang (this was a headache to figure out as I haven't done anything with structures in years)
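package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Struct tags need the quoted form `json:"name"` to be recognized.
	type Queue struct {
		Name  string `json:"name"`
		VHost string `json:"vhost"`
	}
	manager := "http://127.0.0.1:15672/api/queues/"
	client := &http.Client{}
	req, err := http.NewRequest("GET", manager, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.SetBasicAuth("guest", "guest")
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	value := make([]Queue, 0)
	if err := json.NewDecoder(resp.Body).Decode(&value); err != nil {
		log.Fatal(err)
	}
	fmt.Println(value)
}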
Output looks like this (I have two queues named hello and test)
[{hello /} {test /}]

In Golang, does http.HandleFunc block?

I'm writing an HTTP server in Golang, but I find that http.HandleFunc blocks when multiple requests come in from the web browser. How can I make the server handle multiple requests at the same time? Thanks.
my code is:
func DoQuery(w http.ResponseWriter, r *http.Request) {
	r.ParseForm()
	fmt.Printf("%d path %s\n", time.Now().Unix(), r.URL.Path)
	time.Sleep(10 * time.Second)
	fmt.Fprintf(w, "hello...")
	//why this function block when multi request ?
}

func main() {
	fmt.Printf("server start working...\n")
	http.HandleFunc("/query", DoQuery)
	s := &http.Server{
		Addr:         ":9090",
		ReadTimeout:  30 * time.Second,
		WriteTimeout: 30 * time.Second,
		//MaxHeaderBytes: 1 << 20,
	}
	log.Fatal(s.ListenAndServe())
	fmt.Printf("server stop...")
}
I ran your code and everything worked as expected. I did two requests at the same time (curl localhost:9090/query) and they both finished 10 seconds later, together. Maybe the problem is elsewhere? Here's the command I used: time curl -s localhost:9090/query | echo $(curl -s localhost:9090/query) – tjameson
Thanks.
That's strange.
When I request the same URL from Chrome, the two requests are not handled at the same time, but when I test with curl they are.
When I send two requests to different URLs, though, they are handled at the same time.
[root@localhost httpserver]# ./httpServer
server start working...
1374301593 path /query?form=chrome
1374301612 path /query?from=cur2
1374301614 path /query?from=cur1
1374301618 path /query?form=chrome
1374301640 path /query?form=chrome2
1374301643 path /query?form=chrome1
*1374301715 path /query?form=chrome
1374301725 path /query?form=chrome*
**1374301761 path /query?form=chrome1
1374301763 path /query?form=chrome2**
Yes, the standard HTTP server will start a new goroutine for each request. You should be able to do thousands of requests in parallel depending on the operating system settings.
Your browser might be limiting how many requests it will send to one server; be sure you are testing with a client that doesn't have that limitation/"optimization".
Here are the Go docs explaining that the HTTP server creates a new goroutine for each request: http://golang.org/pkg/net/http/#Server.Serve
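One way to check this without a browser is to fire several requests from Go itself and time them. The sketch below is an illustration under assumptions: timeParallel is a hypothetical helper, the handler sleeps 200ms instead of 10 seconds, and the server is a local httptest server rather than the one on port 9090.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// timeParallel starts a local server whose handler sleeps for delay, fires
// n GET requests at it concurrently, and returns the total elapsed time
// together with the number of 200 responses.
func timeParallel(n int, delay time.Duration) (time.Duration, int) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		fmt.Fprint(w, "hello...")
	}))
	defer srv.Close()

	var ok int32
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := srv.Client().Get(srv.URL + "/query")
			if err != nil {
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body)
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt32(&ok, 1)
			}
		}()
	}
	wg.Wait()
	return time.Since(start), int(atomic.LoadInt32(&ok))
}

func main() {
	elapsed, ok := timeParallel(10, 200*time.Millisecond)
	fmt.Printf("%d requests OK in %v\n", ok, elapsed.Round(10*time.Millisecond))
}
```

If the handlers ran one after another, ten requests would need about two seconds; a total close to a single delay indicates they were served concurrently.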

Why is my webserver in golang not handling concurrent requests?

This simple HTTP server contains a call to time.Sleep() that makes
each request take five seconds. When I try quickly loading multiple
tabs in a browser, it is obvious that each request
is queued and handled sequentially. How can I make it handle concurrent requests?
package main

import (
	"fmt"
	"net/http"
	"time"
)

func serve(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello, world.")
	time.Sleep(5 * time.Second)
}

func main() {
	http.HandleFunc("/", serve)
	http.ListenAndServe(":1234", nil)
}
Actually, I just found the answer to this after writing the question, and it is very subtle. I am posting it anyway, because I couldn't find the answer on Google. Can you see what I am doing wrong?
Your program already handles the requests concurrently. You can test it with ab, a benchmark tool which is shipped with Apache 2:
ab -c 500 -n 500 http://localhost:1234/
On my system, the benchmark takes a total of 5043ms to serve all 500 concurrent requests. It's just your browser which limits the number of connections per website.
Benchmarking Go programs isn't that easy by the way, because you need to make sure that your benchmark tool isn't the bottleneck and that it is also able to handle that many concurrent connections. Therefore, it's a good idea to use a couple of dedicated computers to generate load.
From server.go: the goroutine is spawned in the Serve function when a connection is accepted. Below is the snippet:
// Serve accepts incoming connections on the Listener l, creating a
// new service goroutine for each. The service goroutines read requests and
// then call srv.Handler to reply to them.
func (srv *Server) Serve(l net.Listener) error {
	for {
		rw, e := l.Accept()
		if e != nil {
			......
		c, err := srv.newConn(rw)
		if err != nil {
			continue
		}
		c.setState(c.rwc, StateNew) // before Serve can return
		go c.serve()
	}
}
If you use XHR requests, make sure that the xhr instance is a local variable.
For example, xhr = new XMLHttpRequest() creates a global variable. When you make parallel requests with the same xhr variable, you receive only one result. So you must declare xhr locally, like this: var xhr = new XMLHttpRequest().

Resources