Limit size of response read by rest-client - ruby

I'm using the Ruby gem rest-client (1.6.7) to retrieve data using HTTP GET requests. However, sometimes the responses are bigger than I want to handle, so I would like some way to have RestClient stop reading once the response exceeds a size limit I set. The documentation says:
"For cases not covered by the general API, you can use the RestClient::Request class, which provides a lower-level API."
but I do not see how that helps me. I do not see anything that looks like a hook into processing the incoming data stream, only operations I could perform after the whole thing is read. I don't want to waste time and memory reading a huge response into a buffer only to discard it.
How can I set a limit on the amount of data read by RestClient in a GET request? Or is there a different client I can use that makes it easy to set such a limit?

rest-client uses ruby's Net::HTTP underneath: https://github.com/rest-client/rest-client/blob/master/lib/restclient/request.rb#L303
Unfortunately, it doesn't seem like Net::HTTP will let you abandon a response based on its length; after all, it uses this method to issue all requests:
http://docs.ruby-lang.org/en/2.0.0/Net/HTTP.html#method-i-transport_request
As you can see, it uses HTTPResponse to read an HTTP response from the server:
http://ruby-doc.org/stdlib-2.0.0/libdoc/net/http/rdoc/Net/HTTPResponse.html#method-i-read_body
HTTPResponse seems like the place where you could control whether to read the whole response and store it in memory, or read it chunk by chunk and throw it away.
If you don't even want to read the response, I guess you'll need to close the socket.
I don't know whether there are REST clients with the functionality you need. I guess you'll need to write your own little REST client if you want such fine-grained control.
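To illustrate, here is a minimal sketch that drops down to Net::HTTP directly and aborts the transfer once a limit is exceeded (the URL and limit are made up):

require "net/http"
require "uri"

MAX_BYTES = 1_000_000 # hypothetical limit
uri = URI("http://example.com/large-resource")
body = ""

Net::HTTP.start(uri.host, uri.port) do |http|
  http.request_get(uri.path) do |response|
    # read_body yields the body in chunks as they arrive
    response.read_body do |chunk|
      body << chunk
      raise "response exceeded #{MAX_BYTES} bytes" if body.bytesize > MAX_BYTES
    end
  end
end

Raising out of the read_body block stops the read mid-stream; Net::HTTP.start then closes the socket in its ensure clause, so the rest of the response is never downloaded.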

Related

aiohttp download only first n-bytes of body

We are using aiohttp to post data into an Elasticsearch server. On such insertions, Elasticsearch generates a response for each inserted line, which results in massive unwanted traffic coming back to the client application. We wanted to get around this problem using the following code:
response = await http_session.request("POST", url, data=data, params=params)
# read only the first n_bytes of the body, then dispose of the response
first_n_bytes = (await response.content.read(n_bytes)).decode("utf-8")
response.release()
# response.close()
First we tried the release() method, but from the documentation and from bandwidth measurements it seems to download the whole content anyway. Then we tried response.close(), but we are quite unsure whether it is a safe thing to do while keeping the same http_session for other requests.
The question is whether response.close() is safe and whether it would even solve our problem, or alternatively whether there is some other way of doing it asynchronously.
Yes, calling resp.close() is safe.
It closes the open connection to the server without reading the response tail.
Obviously, keep-alive is not supported with explicit connection closing, which is why resp.release() is recommended for default usage.
But in your case, resp.close() should work pretty well.

http HEAD vs GET performance

I am setting up a REST web service that just needs to answer YES or NO, as fast as possible.
Designing a HEAD service seems like the best way to do it, but I would like to know whether I will really gain any time versus doing a GET request.
I suppose I save the body stream from being opened/closed on my server (about 1 millisecond?).
Since the number of bytes to return is very low, do I gain any time in transport, or in the number of IP packets?
Edit:
To explain the context further:
I have a set of REST services executing some processes if they are in an active state.
I have another REST service indicating the state of all these first services.
Since that last service will be called very often by a very large set of clients (one call expected every 5 ms), I was wondering whether using the HEAD method could be a valuable optimization. About 250 characters are returned in the response body. The HEAD method would at least save the transport of those 250 characters, but what is the impact?
I tried to benchmark the difference between the two methods (HEAD vs GET), running the calls 1000 times, but saw no gain at all (< 1 ms)...
A RESTful URI should represent a "resource" at the server. Resources are often stored as a record in a database or a file on the filesystem. Unless the resource is large or slow to retrieve at the server, you might not see a measurable gain by using HEAD instead of GET. It could be that retrieving the metadata is not any faster than retrieving the entire resource.
You could implement both options and benchmark them to see which is faster, but rather than micro-optimize, I would focus on designing the ideal REST interface. A clean REST API is usually more valuable in the long run than a kludgey API that may or may not be faster. I'm not discouraging the use of HEAD, just suggesting that you only use it if it's the "right" design.
If the information you need really is metadata about a resource that can be represented nicely in the HTTP headers, or if you just need to check whether the resource exists, HEAD might work nicely.
For example, suppose you want to check if resource 123 exists. A 200 means "yes" and a 404 means "no":
HEAD /resources/123 HTTP/1.1
[...]
HTTP/1.1 404 Not Found
[...]
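In Ruby, for instance, such an existence check might look like this (a sketch using Net::HTTP; the host and path are made up):

require "net/http"
require "uri"

uri = URI("http://api.example.com/resources/123")
Net::HTTP.start(uri.host, uri.port) do |http|
  response = http.head(uri.path)
  # Net::HTTPSuccess covers 2xx; a 404 arrives as Net::HTTPNotFound
  puts response.is_a?(Net::HTTPSuccess) ? "yes" : "no"
end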
However, if the "yes" or "no" you want from your REST service is a part of the resource itself, rather than meta data, you should use GET.
I found this reply when looking for an answer to the same question that the requester asked. I also found this at http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
It would seem to me that the correct answer to the requester's question is that it depends on what is represented by the REST protocol. For example, in my particular case, my REST protocol is used to retrieve fairly large (more than 10K) images. If I have a large number of such resources being checked on a constant basis, and given that I make use of the request headers, then it would make sense to use a HEAD request, per w3.org's recommendations.
GET fetches headers + body, HEAD fetches headers only. It should not be a matter of opinion which one is faster. I don't understand the upvoted answers above. If you are looking for meta information, then go for HEAD, which is meant for this purpose.
I strongly discourage this kind of approach.
A RESTful service should respect the HTTP verbs semantics. The GET verb is meant to retrieve the content of the resource, while the HEAD verb will not return any content and may be used, for example, to see if a resource has changed, to know its size or its type, to check if it exists, and so on.
And remember: premature optimization is the root of all evil.
HEAD requests are just like GET requests, except the body of the response is empty. This kind of request can be used when all you want is metadata about a file but don't need to transport all of the file's data.
Your performance will hardly change by using a HEAD request instead of a GET request.
Furthermore, when you want it to be RESTful and you want to get data, you should use a GET request instead of a HEAD request.
I don't understand your concern about the 'body stream being opened/closed'. The response body is sent over the same stream as the HTTP response headers and does NOT create a second connection (which, by the way, would be more in the range of 3-6 ms).
This seems like a very premature optimization attempt on something that just won't make a significant or even measurable difference. The real difference is conformity with REST in general, which recommends using GET to get data.
My answer is NO: use GET if it makes sense; there's no performance gain in using HEAD.
You could easily make a small test to measure the performance yourself. I think the performance difference would be negligible, because if you're only returning 'Y' or 'N' in the body, it's a single extra byte appended to an already open stream.
I'd also go with GET since it's more correct. You're not supposed to return content in HTTP headers, only metadata.
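For what it's worth, such a measurement in Ruby might look like this (a sketch; the URL and iteration count are made up):

require "net/http"
require "uri"
require "benchmark"

uri = URI("http://localhost:4567/status") # hypothetical service under test
N = 1_000

Net::HTTP.start(uri.host, uri.port) do |http|
  Benchmark.bm(5) do |bm|
    bm.report("GET")  { N.times { http.get(uri.path) } }
    bm.report("HEAD") { N.times { http.head(uri.path) } }
  end
end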

Ruby HTTP server without networking

I am trying to add an HTTP server to an existing Ruby application. The application is based around a select loop, and I want to handle incoming HTTP requests there too (it is important to process the requests in the same thread, or I have to jump through hoops to marshal them there).
Ruby has plenty of solutions for standalone HTTP servers, but I can't seem to find a library which implements an HTTP server on an existing socket. I don't want the HTTP library to open a port and wait, I want to feed it sockets.
The basic logic I'm looking for is this:
handler = SomeHTTPParsingLibrary.new
# set up handler callbacks, etc on handler...
while socket = get_incoming_connection()
  handler.handle_request(socket)
end
Are there any existing Ruby libraries that can work like this? HTTP is a simple enough protocol, but there are enough irritating details involved (I need cookies, basic auth, etc) that I'd rather not roll my own.
You may have to roll your sleeves up a bit to figure out what methods to call, but I'd suggest trying the HTTPParser class from within mongrel.
A quick glance through the code in httprequest.rb (webrick - from ruby stdlib) seems like it might suit your purpose.
A WEBrick::HTTPRequest object is able to accept a socket as an argument to its parse() method. It will then block, and return when the request object has been fully populated with the incoming HTTP request.
eg:
# @config is a WEBrick configuration hash, e.g. WEBrick::Config::HTTP
res = WEBrick::HTTPResponse.new(@config)
req = WEBrick::HTTPRequest.new(@config)
# some code to "select" a socket goes here
# sock is active, hand it over to the req object for reading.
req.parse(sock)
res.request_method = req.request_method
Of course, this assumes that this thread will block until the current request handling is complete.
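Put together with a select loop, the whole approach might look roughly like this (a sketch; the port and response body are made up, and error handling is omitted):

require "socket"
require "webrick"

server  = TCPServer.new(8080)
clients = []
config  = WEBrick::Config::HTTP

loop do
  readable, = IO.select([server] + clients)
  readable.each do |io|
    if io == server
      clients << server.accept
    else
      req = WEBrick::HTTPRequest.new(config)
      res = WEBrick::HTTPResponse.new(config)
      req.parse(io)                 # blocks until the request is fully read
      res.status = 200
      res.body   = "state: active"  # placeholder payload
      res.send_response(io)         # writes status line, headers and body
      io.close
      clients.delete(io)
    end
  end
end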
OTOH, something like tmm1/http_parser.rb might also fit your needs, but it sacrifices other things (like handling cookies) in favor of speed.

Throttle Mechanize gem

Is there any built-in way to throttle the Mechanize gem?
I'm looking for something like a callback on making an HTTP request.
Later edit:
I would like to implement bandwidth throttling, to avoid flooding the sites being parsed.
E.g.: only allow one request per second.
It may be that pre_connect_hooks is what you are looking for. Sadly, I am unable to find any way to add one other than pushing a lambda/Proc directly onto the array.
They are called here, and this method is called here.
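For example, a one-request-per-second throttle could be hooked in like this (a sketch; note that the arguments passed to the hook differ between Mechanize versions, so the proc below ignores them):

require "mechanize"

agent = Mechanize.new
last_request_at = nil

throttle = proc do |*_args|
  if last_request_at
    elapsed = Time.now - last_request_at
    sleep(1 - elapsed) if elapsed < 1 # enforce at least 1 second between requests
  end
  last_request_at = Time.now
end

agent.pre_connect_hooks << throttle
agent.get("http://example.com/")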

Ruby: send HTTP GET request, receive JSON output - what is the fastest way?

My application will call the Facebook API multiple times: see the example.
What is the fastest / most reliable way to send HTTP GET requests and parse the returned output in JSON format?
Should I use Curl::Easy? If yes, how does it deal with JSON?
Use httparty
It includes crack for JSON. From the command line, you can use it like so:
httparty "http://twitter.com/statuses/public_timeline.json"
What does "multiple times" mean? Twice? X times every n minutes? Thousands of times an hour?
Ruby has Typhoeus/Hydra, which handles huge numbers of concurrent requests. Processing the JSON is easy compared to handling multiple requests.
The Times example is a good starting point. Stick your JSON processing in the on_complete handler.
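A sketch of that pattern (the URLs are made up, and the exact API has shifted between Typhoeus versions):

require "typhoeus"
require "json"

hydra = Typhoeus::Hydra.new

%w[http://example.com/a.json http://example.com/b.json].each do |url|
  request = Typhoeus::Request.new(url)
  request.on_complete do |response|
    data = JSON.parse(response.body) # do the JSON work in the completion handler
    puts data.inspect
  end
  hydra.queue(request)
end

hydra.run # runs all queued requests concurrently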
Check out crack. Dead simple and just works.
