Throttle HTTP Request based on Available Memory

I have a REST API that is expected to receive a large payload as the request body. The API calls a blocking method that takes 2 seconds to process each request and then returns 200 OK. I wish to introduce throttling based on available memory such that the API returns 429 Too Many Requests when the available memory falls below a threshold.
When the threshold condition is met, I wish to reject subsequent requests right away, even before loading the large request payloads in my application memory. This will also give me some protection against denial of service attacks.
In a Java EE, Tomcat environment, if I use a Filter to check available memory, I understand the complete request is already loaded in memory. Is it then better to add the check in ServletRequestListener.requestInitialized method so that I can reject the request even before the app receives it?
P.S. I use the below formula to calculate available memory based on this SO post:
long presumableFreeMemory =
    Runtime.getRuntime().maxMemory()
    - Runtime.getRuntime().totalMemory()
    + Runtime.getRuntime().freeMemory();
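The check can be wrapped in a small helper so that it reads as a single yes/no decision. This is only a minimal sketch: it assumes the estimate is maxMemory() minus the heap currently in use (totalMemory() - freeMemory()), and the names MemoryGate, shouldReject and the 64 MiB threshold are made up for illustration. Where the helper gets called (Filter, listener, or elsewhere) is deliberately left open.

```java
// Minimal sketch: estimate how much heap the JVM could still hand out.
// MemoryGate and shouldReject are hypothetical names, not from the post.
final class MemoryGate {

    // Heap that is either free right now or not yet claimed by the JVM.
    static long presumableFreeMemory() {
        Runtime rt = Runtime.getRuntime();
        long allocated = rt.totalMemory() - rt.freeMemory(); // bytes in use
        return rt.maxMemory() - allocated;
    }

    // True when a new request should be answered with 429 Too Many Requests.
    static boolean shouldReject(long thresholdBytes) {
        return presumableFreeMemory() < thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = 64L * 1024 * 1024; // 64 MiB, an arbitrary example value
        System.out.println("free=" + presumableFreeMemory()
                + " reject=" + shouldReject(threshold));
    }
}
```

Note that the value is only an estimate: memory occupied by garbage that has not been collected yet counts as "in use" here.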


What's the body request size limit for a PUT request?

I've created a PUT endpoint on a Tomcat server using Spring Boot. For safety reasons, I don't want the body size to exceed a certain amount, so I was checking whether there is a limit like the one for POST requests (which I know is 2 MB).
However, I can't find any documentation about PUT requests. Is there a limit for those too?

Batch HTTP Request Performance gain

I want to know the performance gain from doing an HTTP batch request. Is it only that the number of round trips is reduced to one instead of n, where n is the number of HTTP requests? If so, I guess you could get the same gain by keeping the HTTP connection open, sending your HTTP messages through it, and closing it once you are finished.
The performance gain from batching depends on what you are doing with the requests. Still, here is a general, use-case-agnostic view:
If you can maintain a keep-alive connection, then yes, you skip the connection handshake for every request after the first. That removes some overhead and saves time on subsequent requests over the same connection. You can also "pipeline" requests to reduce overall latency (all else being equal). However, responses in HTTP/1.1 must still come back in FIFO order, so one slow request can hold up everything queued behind it (HTTP/2 relaxes this by multiplexing requests over one connection). So even with a keep-alive connection you can still see significant latency between requests.
Batching mitigates this further. Where possible, you lump all the data needed for the subsequent requests into a single request, so everything is processed together and sent back as one response. The single batched request may take a bit longer to handle than any one of the sequential requests, but throughput improves because the request->response round-trip latency is no longer multiplied by the number of requests.
Naturally, how effective this is depends on what the requests do. With many users batching large amounts of data, batching can put too much stress on the server, so to increase overall concurrent throughput across all users you sometimes need to take the technically slower sequential approach to balance things out. Some simple monitoring and analysis will tell you which approach works best for you.
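To make the round-trip argument concrete, here is a rough back-of-the-envelope model. It is a sketch under simplifying assumptions: requests are sent strictly one after another (no pipelining), connection setup is ignored, and the batched request costs one round trip plus the same per-item processing time. The class name and all numbers are invented for illustration.

```java
// Back-of-the-envelope latency model; all numbers are made-up inputs.
final class BatchLatency {

    // n requests sent one after another: each pays a round trip plus processing.
    static long sequentialMillis(int n, long roundTripMs, long processMs) {
        return n * (roundTripMs + processMs);
    }

    // One batched request: a single round trip, processing still done n times.
    static long batchedMillis(int n, long roundTripMs, long processMs) {
        return roundTripMs + n * processMs;
    }

    public static void main(String[] args) {
        int n = 50;
        long rtt = 40;  // assumed round-trip time in ms
        long proc = 5;  // assumed server-side time per item in ms
        System.out.println("sequential: " + sequentialMillis(n, rtt, proc) + " ms"); // 2250 ms
        System.out.println("batched:    " + batchedMillis(n, rtt, proc) + " ms");    // 290 ms
    }
}
```

In this model the saving is (n - 1) round trips, so the benefit grows with network latency and shrinks as processing time dominates.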
And as always, don't optimize prematurely :)
Consider this typical scenario: the client has the identifier of a resource that resides in a database behind an HTTP server, and wants to get an object representation of that resource.
The general flow to execute that goes like this:
The client code constructs an HTTP client.
The client builds a URI and sets the proper HTTP request fields.
Client issues the HTTP request.
Client OS initiates a TCP connection, which the server accepts.
Client sends the request to the server.
Server OS or webserver parses the request.
Server middleware parses the request components into a request for the server application.
Server application gets initialized, the relevant module is loaded and passed the request components.
The module obtains an SQL connection.
Module builds an SQL query.
The SQL server finds the record and returns that to the module.
Module parses the SQL response into an object.
Module selects the proper serializer through content negotiation, JSON in this case.
The JSON serializer serializes the object into a JSON string.
The response containing the JSON string is returned by the module.
Middleware returns this response to the HTTP server.
Server sends the response to the client.
Client fires up their version of the JSON serializer.
Client deserializes the JSON into an object.
And there you have it, one object obtained from a webserver.
Now each of those steps along the way is heavily optimized, because a typical server and client execute them so many times. However, even if each of those steps takes only a millisecond, when you have, for example, fifty resources to obtain, those milliseconds add up fast.
So yes, HTTP keep-alive cuts away the time the TCP connection takes to build up and warm up, but each and every other step will still have to be executed fifty times. Yes, there's SQL connection pooling, but every query to the database adds overhead.
So instead of going through this flow fifty separate times, if you have an endpoint that can accept fifty identifiers at once, for example through a comma-separated query string or even a POST with a body, and return their JSON representations in one response, that will typically be much faster than fifty individual requests.
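The comma-separated query string idea could look roughly like this. It is a minimal sketch: the parameter name "ids", the class name BatchIds and the use of numeric identifiers are assumptions, and the database lookup and JSON serialization are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a batch endpoint such as GET /resources?ids=1,2,3.
final class BatchIds {

    // Client side: turn identifiers into one comma-separated query string.
    static String toQuery(List<Long> ids) {
        StringBuilder sb = new StringBuilder("ids=");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ids.get(i));
        }
        return sb.toString();
    }

    // Server side: parse the value of the "ids" parameter back into identifiers.
    // Throws NumberFormatException if any part is not a number.
    static List<Long> parseIds(String value) {
        List<Long> ids = new ArrayList<>();
        if (value == null || value.isEmpty()) {
            return ids;
        }
        for (String part : value.split(",")) {
            ids.add(Long.parseLong(part.trim()));
        }
        return ids;
    }
}
```

With the identifiers parsed into one list, the server can fetch all records with a single query instead of one query per identifier.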

Would you violate the idempotency principle of REST for the sake of performance?

I'm working on a low-latency app for the telecommunications industry where the main workflow triggers a computation as follows:
Call a REST API (POST /workflow + payload)
The REST web app performs highly parallelized processing in a fast-access cache store
The REST call returns a response (maybe a JSON object of 4 or 5 fields)
Now, my initial idea adhering to REST design principles, is to do 2 REST API calls, one that POSTS to trigger the processing, then returns a 201 with the location of the processing result in the header (because my understanding is REST calls can either change or return a resource, but not both), then automatically redirects to the GET call.
Now remember I'm trying to reduce latency as much as possible, and HTTP redirects obviously increase that. Is it OK if I make my POST return the payload instead of redirecting to a GET? What are the implications?
First, POST is not an idempotent method to begin with, so it is not really possible for a POST to "violate idempotency."
Second, there is no reason a POST may not return a representation of the newly created resource. In fact, according to RFC 7231 (one of the new replacements for RFC 2616), it may even be cached for subsequent GETs. See section 4.3.3:
For cases where an origin server wishes the client to be able to cache
the result of a POST in a way that can be reused by a later GET, the
origin server MAY send a 200 (OK) response containing the result and a
Content-Location header field...
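As an illustration of returning the result directly from the POST, here is a framework-free sketch of the response such a handler might build. The class WorkflowResponse, the method handlePost and the URI passed in are hypothetical; which value Content-Location must carry for the response to be cacheable is defined by the RFC section quoted above and is not checked here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, framework-independent model of the POST response.
final class WorkflowResponse {
    final int status;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    WorkflowResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }

    // Return the computed result in the POST response itself (no redirect),
    // together with a Content-Location header chosen by the caller.
    static WorkflowResponse handlePost(String contentLocation, String resultJson) {
        WorkflowResponse response = new WorkflowResponse(200, resultJson);
        response.headers.put("Content-Location", contentLocation);
        response.headers.put("Content-Type", "application/json");
        return response;
    }
}
```

The client gets the payload in a single round trip, which is the latency saving the question is after.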

Does throttle responses?

I have an angularjs app that calls a RESTful service at
Does throttle responses and delay responses after a certain number have been received?
If so what are the parameters?
Currently, Apiary limits you to 120 requests per minute per IP.
There are no artificial delays, but occasionally someone floods Apiary with production traffic, and even though rate limiting is fairly efficient, it may temporarily degrade service for other users.
You can (and should) check the X-Apiary-RateLimit-Limit and X-Apiary-RateLimit-Remaining headers. Once you hit the limit, Apiary will send a Retry-After header, which you should obey.
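A client could act on those headers roughly as follows. This is a sketch with assumptions: the headers are already available as a plain map with exactly these names, Retry-After is given as a number of seconds (the HTTP-date form is not handled), and RateLimitHeaders is an invented class name.

```java
import java.util.Map;

// Hypothetical client-side helper for reading rate-limit headers.
final class RateLimitHeaders {

    // Seconds to wait before the next call; 0 if Retry-After is absent
    // or is not a plain number of seconds.
    static long secondsToWait(Map<String, String> headers) {
        String retryAfter = headers.get("Retry-After");
        if (retryAfter == null) {
            return 0L;
        }
        try {
            return Math.max(0L, Long.parseLong(retryAfter.trim()));
        } catch (NumberFormatException e) {
            return 0L; // e.g. an HTTP-date value, not handled in this sketch
        }
    }

    // True when the server reports that no calls are left in the current window.
    static boolean exhausted(Map<String, String> headers) {
        String remaining = headers.get("X-Apiary-RateLimit-Remaining");
        if (remaining == null) {
            return false;
        }
        try {
            return Long.parseLong(remaining.trim()) <= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

A caller would typically sleep for secondsToWait(...) after a rejected call and slow down on its own once exhausted(...) returns true.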
From their docs:
API Call Limit
API calls are subject to the default limit of 15 requests per second and exceeding this limit will result in all endpoints returning an HTTP status code of 429. Limits are per API key. If the limit is exceeding then the API Key will be blocked for the remainder of the sample period. If an API key continually hits the call limit we reserve the right to permanently block the key and to charge a fee to unblock the key.
To determine the API call amount we monitor the traffic over a sample period. If the traffic results in a particular API key reaching 80% of the limit (i.e., 12 if the limit is 15) over the sample period then the responses will start to contain a throttle node which contains useful information on how close you are to reaching the call limit.

Cache Policy - caching only if request succeeded

I have enabled some cache policies on a few resource endpoints. The system works quite well: the response is cached, the following requests hit the cache, and the cache is correctly refreshed when I set it to be refreshed.
My only concern is that sometimes a client makes a request that does not hit the cache (for example, because the cache must be refreshed) and the server returns an error at that moment (statistically, it can happen), so the cached response is not a "normal" response (e.g. 2xx) but a 4xx or 5xx response.
I would like to know if it is possible to cache the response only if, for example, the server response code is 2xx.
I didn't find any example of this in the Apigee docs, although the cache policy has a parameter called "SkipCachePopulation" that I think I could use for this purpose.
Any suggestion?
Yes, you can use the SkipCachePopulation field of ResponseCache. It uses a condition to determine when the cache population will not occur. Here is an example:
<SkipCachePopulation>response.status.code >= 400</SkipCachePopulation>
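For readers who want to see the logic that this condition expresses, here is the same rule as a plain Java sketch. It is not how Apigee implements ResponseCache; SuccessOnlyCache and its methods are invented names that only illustrate "do not populate the cache when the status code is 400 or higher".

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: an in-memory cache that refuses to store error responses.
final class SuccessOnlyCache {
    private final Map<String, String> entries = new HashMap<>();

    // Same condition as above: skip cache population when status >= 400.
    void maybeStore(String key, int statusCode, String body) {
        if (statusCode >= 400) {
            return; // an error response never replaces or creates an entry
        }
        entries.put(key, body);
    }

    // Returns the cached body, or null if nothing has been stored for the key.
    String lookup(String key) {
        return entries.get(key);
    }
}
```

Because error responses are skipped, a previously cached successful response stays in place until a later successful response overwrites it.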
