Reading OkHttp ResponseBody data asynchronously - okhttp

I'm using OkHttp's Call#enqueue to issue a couple dozen HTTP requests in parallel. In my callback, I'm given a Response with a ResponseBody. Because I have several requests in flight, I'd like to read data from the ResponseBody's source() without blocking. Is there some way to do that?
For example, suppose there are 324 bytes available in a given ResponseBody. Is there some way to read those bytes and then wait asynchronously for more data to become available (potentially reading data from other ResponseBody objects for other in-flight requests in the interim)?

Nope! OkHttp doesn't yet offer anything that flexible. You might want to look at Parallel Universe, which has hooked up OkHttp to fibers.

Related

Send Concurrent HTTP Requests From Array In Springboot

I have an array of objects that I need to send to an endpoint. I am currently looping through the array and sending the requests one by one. The issue is that I now have over 35,000 requests to make, and I need to update the database with each response. In my limited knowledge of Spring Boot, I am not aware of any method I can use to send the 35,000 requests at once (without looping through them one by one).
Is the best method still to loop but use asynchronous calls, or is there a method I can use to send the 35,000 HTTP requests at once? I just need a pointer, because I am not sure how threads can be used here, since this is already an array and each element needs to be sent.
Thank you
Well, first off 35,000 at a time of, well, anything, is a bad idea.
However, if you look into the Java ExecutorService, this gives you the ability to fill a queue with tasks, and then each task will be performed by a thread taken from a thread pool. As the threads complete, the service pulls another request from the queue and handles that. So, you simply provide a Runnable that performs your web requests, create an Adequately Sized Thread Pool (which is basically sized through experimentation to give the best throughput), and then let the threads crunch away on the queue of tasks.
You will need a queue large enough to absorb all of your tasks, or you can look at something like the NotifyingBlockingThreadPoolExecutor. This will allow you to just gorge a queue and block when the queue gets too full, until all of your tasks are complete.
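A minimal sketch of the thread-pool approach described above, using only the JDK. The sendRequest method is a hypothetical stand-in for the real HTTP call, and the class name and pool size are assumptions made for illustration. CompletableFuture is used here instead of a bare Runnable so that each response can be collected for the database update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PooledSender {
    // Hypothetical placeholder: replace the body with the real HTTP request.
    static String sendRequest(int item) {
        return "ok-" + item;
    }

    // Queues one task per item on a fixed-size pool and waits for every result.
    // Results come back in the same order as the input list.
    static List<String> sendAll(List<Integer> items, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int item : items) {
                pending.add(CompletableFuture.supplyAsync(() -> sendRequest(item), pool));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                responses.add(future.join()); // blocks until this task has finished
            }
            return responses;
        } finally {
            pool.shutdown(); // stop accepting tasks; worker threads exit once idle
        }
    }
}
```

Executors.newFixedThreadPool uses an unbounded queue, so all 35,000 tasks can be queued up front. The pool size is the knob to tune by experiment.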
Addenda:
I don't know enough about Spring Boot to comment about whether a "batch job" would do what you want or not.
However, on that note, as an alternative to creating 35,000 individual entries for the ExecutorService, you could, indeed, send a subset. For example, 3,500 entries representing 10 items each, or 350 with 100 each. The idea there is to leverage any potential gains from reusing HTTP connections and whatnot, so there's less stand-up and tear-down for each request. Standing up 350 connections is far cheaper than standing up 35,000.
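As a rough illustration of that grouping, the sketch below splits a list into fixed-size batches so that each batch can become one task. The Batcher class and partition method are names invented for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class Batcher {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 35,000 items and a batch size of 100 this yields the 350 tasks mentioned above. Each task would then send its 100 items over one reused connection.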

Batch HTTP Request Performance gain

I want to know the performance gain from doing an HTTP batch request. Is it only reducing the number of round trips to one instead of n, where n is the number of HTTP requests? If so, I guess you could keep the HTTP connection open, send your HTTP messages through it, and close it once you finish to get the same performance gain.
The performance gain of doing batch requests depends on what you are doing with them. However just as an agnostic approach here you go:
If you can manage a keep-alive connection, yes, this means you don't have to do the initial handshake for each request. That reduces some overhead and certainly saves time spent handling subsequent packets along this connection. Because of this you can "pipeline" requests and decrease overall load latency (all else not considered). However, requests in HTTP/1.1 are still bound to be FIFO, so one slow request can hold up the ones behind it (HTTP/2 will allow asynchronous handling). So even with a keep-alive connection you can still have some significant latency between requests.
This can be mitigated further by batching. If possible you lump all the data needed for subsequent requests into one and this way everything is processed together and sent back as one response. Sure it may take a bit longer to handle a single packet as opposed to the sequential method, but your throughput is increased per time because roundtrip latency for request->response is not multiplied. Thus you get an even better performance gain in terms of requests handling speeds.
Naturally this approach depends on what you're doing with the requests for it to be effective. Sometimes batching can put too much stress on a server if you have a lot of users doing this with a lot of data so to increase overall concurrent throughput across all users you sometimes need to take the technically slower sequential approach to balance things out. However, the best approach will be known by you upon some simple monitoring and analysis.
And as always, don't optimize prematurely :)
Consider this typical scenario: the client has the identifier of a resource which resides in a database behind an HTTP server, of which resource they want to get an object representation.
The general flow to execute that goes like this:
The client code constructs an HTTP client.
The client builds an URI and sets the proper HTTP request fields.
Client issues the HTTP request.
Client OS initiates a TCP connection, which the server accepts.
Client sends the request to the server.
Server OS or webserver parses the request.
Server middleware parses the request components into a request for the server application.
Server application gets initialized, the relevant module is loaded and passed the request components.
The module obtains an SQL connection.
Module builds an SQL query.
The SQL server finds the record and returns that to the module.
Module parses the SQL response into an object.
Module selects the proper serializer through content negotiation, JSON in this case.
The JSON serializer serializes the object into a JSON string.
The response containing the JSON string is returned by the module.
Middleware returns this response to the HTTP server.
Server sends the response to the client.
Client fires up their version of the JSON serializer.
Client deserializes the JSON into an object.
And there you have it, one object obtained from a webserver.
Now each of those steps along the way is heavily optimized, because a typical server and client execute them so many times. However, even if each of those steps takes only a millisecond, when you have, for example, fifty resources to obtain, those milliseconds add up fast.
So yes, HTTP keep-alive cuts away the time the TCP connection takes to build up and warm up, but each and every other step will still have to be executed fifty times. Yes, there's SQL connection pooling, but every query to the database adds overhead.
So instead of going through this flow fifty separate times, if you have an endpoint that can accept fifty identifiers at once, for example through a comma-separated query string or even a POST with a body, and return their JSON representation at once, that will always be way faster than individual requests.
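A small sketch of the comma-separated variant, under stated assumptions: the base URL, the "ids" parameter name, and the BatchUri class are all hypothetical, and the server would need an endpoint that actually accepts such a list.

```java
import java.util.List;
import java.util.stream.Collectors;

class BatchUri {
    // Builds one request URI carrying every identifier, instead of one URI per identifier.
    // The "ids" query parameter is an assumed convention, not a standard.
    static String batchUri(String baseUrl, List<Long> ids) {
        String joined = ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return baseUrl + "?ids=" + joined;
    }
}
```

For very long identifier lists, the POST-with-a-body option avoids URL length limits.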

OKHttp / retrofit: thread-safety / immutability of Call data

Can I pass the requestBody(), headers(), or anything else I retrieve from a finished OkHttp Call<> around to other threads, or is it necessary to copy the relevant data first?
You can pass the RequestBody to another thread, but only one thread is allowed to read the body. If multiple threads attempt to read it, you’re going to have a bad time.
Request and response headers are immutable.

Understanding goroutines for web API

Just starting out with Go and hoping to create a simple Web API. I'm looking into using Gorilla mux (http://www.gorillatoolkit.org/pkg/mux) to handle web requests.
I'm not sure how to best use Go's concurrency options to handle the requests. Did I read somewhere that the main function is actually a goroutine or should I dispatch each request to a goroutine as they are received? Apologies if I'm "way off".
Assuming you're using Go's http.ListenAndServe to serve your HTTP requests, the documentation clearly states that each incoming connection is handled by a separate goroutine for you. http://golang.org/pkg/net/http/#Server.Serve
You would usually call ListenAndServe from your main function.
Gorilla mux is simply a package for more flexible routing of requests to your handlers than the http.DefaultServeMux. It doesn't actually handle the incoming connection or request; it simply relays it to your handler.
I highly suggest you read a bit of the documentation, specifically this guide https://golang.org/doc/articles/wiki/#tmp_3 on writing web applications.
I'm providing an answer even though I voted to close for being too broad.
Anyway, none of that is really necessary. You're overthinking it. If you haven't read this, it looks like a decent tutorial: http://thenewstack.io/make-a-restful-json-api-go/
You can really just set up routes like you would with most typical REST frameworks and let the webserver/framework worry about concurrency at the request handling level. You would only employ goroutines to generate the response to a request, say if you needed to aggregate data from 10 files that are all in a folder. Contrived example, but this is where you would spin off one goroutine per file, aggregate all the information by reading off a channel in a non-blocking select, and then return the result. You can expect that all points of entry to your code are called in an asynchronous, non-blocking fashion, if that makes sense...

Limit size of response read by rest-client

I'm using the Ruby gem rest-client (1.6.7) to retrieve data using HTTP GET requests. However, sometimes the responses are bigger than I want to handle, so I would like some way to have the RestClient stop reading once it exceeds a size limit I set. The documentation says
"For cases not covered by the general API, you can use the RestClient::Request class which provide a lower-level API."
but I do not see how that helps me. I do not see anything that looks like a hook into processing the incoming data stream, only operations I could perform after the whole thing is read. I don't want to waste time and memory reading a huge response into a buffer only to discard it.
How can I set a limit on the amount of data read by RestClient in a GET request? Or is there a different client I can use that makes it easy to set such a limit?
rest-client uses Ruby's Net::HTTP underneath: https://github.com/rest-client/rest-client/blob/master/lib/restclient/request.rb#L303
Unfortunately, it doesn't seem like Net::HTTP will let you abandon a response based on its length, since it uses this method to issue all requests:
http://docs.ruby-lang.org/en/2.0.0/Net/HTTP.html#method-i-transport_request
As you can see, it uses HTTPResponse to read an HTTP response from the server:
http://ruby-doc.org/stdlib-2.0.0/libdoc/net/http/rdoc/Net/HTTPResponse.html#method-i-read_body
HTTPResponse seems like the place where you could control whether to read the whole response and store it in memory, or read it and throw it away.
If you don't want to read the response at all, I guess you'll need to close the socket.
I don't know whether there are REST clients with the functionality you need. I guess you'll need to write your own little REST client if you want such fine-grained control.
