Why would AWS API Gateway let cached queries through to the backend?

I have a GET method in AWS API Gateway. The cache is enabled for the stage and works for most requests. However, some requests seem to slip through to the backend no matter what I do; that is, some requests going through the API are not cached.
I have defined the parameters a, b & c to be cached by checking their respective "Caching" boxes under the method request settings. There are also other parameters which are not cached.
The request can either have all three parameters or just one:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=foo&d=qux
a, b & c can each take on between 3 and 25 different values, but a can only have one value if b & c are present. Also, b cannot be present without c and vice versa.
An example: say the cache's TTL is 60 s and I send these requests between time 0 and 10:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
example.com/?a=baz&d=qux
And then between time 30 and 40 I send the same requests, and I might see the following in the backend log:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=baz&d=qux
So these requests were cached while the others weren't:
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
In the example above most requests were not cached, but that is not the case in reality; most queries are cached. In the real case there is a fairly big number of requests coming in on the second run, about 600/s. In the first run the request rate is about 1/s. The queries I see slipping through are among the first ones the application would request.
It seems unlikely that AWS API Gateway couldn't handle query rates like these (throttling is enabled at 10,000 requests/s with a burst of 5,000), yet the first few queries the application sends seem to slip through. Is this to be expected from API Gateway?
I was also thinking that there might be a cache size issue, but increasing the cache size does not seem to help.
So what reasons could there be for API Gateway to let seemingly cached requests slip through to the backend?
UPDATE: The nature of the application creating the requests is that it starts a request chain. There are about 500-600 application instances which all start at the same time. When they start, they make a handful of requests asynchronously and then a chain of about 300-500 requests (synchronously).
With this in mind, the burst rate at 0 s is probably much higher. The ~600 requests/s stated above is the average of ~36,000 queries over 60 s. Most of the requests are made at the beginning of those 60 s, but I don't have a number on the exact rate. An estimate might be about 1,000-2,000 requests/s for the first few seconds and maybe even more (say 3,000+) for the first second.

In short, I still don't know why this happens but I did manage to minimize the number of requests that slipped through.
I did this by having the requesting application delay the start (I explained the nature of the start sequence in the update to the question) by some random time. I let the application pick a random start time between 0 and 3 minutes to avoid spikes to API Gateway.
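For reference, a minimal sketch of that jittered start on the client side (the function names are hypothetical, not the actual application code):

```python
import random
import time


def start_with_jitter(run_request_chain, max_delay_s=180):
    """Delay the start of the request chain by a random time between 0 and
    3 minutes, so the ~500-600 instances don't all hit API Gateway at t=0."""
    delay = random.uniform(0, max_delay_s)
    time.sleep(delay)
    run_request_chain()


if __name__ == "__main__":
    # Hypothetical stand-in for the real start sequence: a handful of async
    # requests followed by the synchronous request chain.
    start_with_jitter(lambda: print("starting request chain"))
```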
This didn't eliminate the phenomenon of requests slipping through, but it lowered the number from about 500-1,500 over 60 s to between 0-10 over 3 minutes, something my backend could easily handle compared to the 1,000+ over 60 s, which was on the edge.
It seems to me that when API Gateway is flooded with a large number of requests over a short time it will just pass these requests through. I am surprised (and a little skeptical) that these numbers would be so large as to cause problems for AWS but that is what I see.
Perhaps this can be solved by changing the throttling levels, but I found no difference when playing around with it (mind you, I'm no expert!).

Related

Incorrect Google Cloud metrics? Or what is going on?

My background is more from the Twitter side, where all stats are recorded per minute, so you might have 120 requests per minute. Inside Twitter someone had the bright idea to divide by 60, so instead of 120 requests per minute most graphs report 2 requests per second (except for some teams who realize that dividing by 60 is NOT the true rps at all, since the rate fluctuates within a minute). Google seems to be doing the same, EXCEPT the math doesn't show that. At Twitter we could multiply by 60 and the answer was always a whole integer: the number of requests that occurred in that minute.
In Google, however, we see 0.02 requests/second, which multiplied by 60 is 1.2 requests per minute. IF the granularity is one minute, they are definitely counting it wrong or something is wrong with their math.
This is from the Cloud Run metrics shown when we click into the instance itself.
What am I missing here? AND BETTER yet, can we please report requests per minute? Requests per second is really the average req/second for that minute, and it can be really confusing when we end up discussing how you can get 0.5 requests/second.
I AM assuming that this is not requests per second 'at' the minute boundary, because that would be VERY hard to calculate, BUT it would also be a whole number, i.e. 0 requests or 1, not 0.2, and that would be quite useless to be honest.
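One possible source of the fractional numbers, offered only as a guess: if the chart's alignment period is longer than a minute, an integer request count divided by that period gives a non-integer rate. A toy calculation to illustrate the arithmetic (the 300 s window is an assumption, not necessarily what the chart uses):

```python
# Toy arithmetic: an integer count over a wide alignment window still yields
# fractional per-second and per-minute rates on the chart.
requests_in_window = 6       # whole requests actually served
alignment_period_s = 300     # assumed alignment period of 5 minutes

rate_per_second = requests_in_window / alignment_period_s  # 0.02 req/s
rate_per_minute = rate_per_second * 60                     # 1.2 req/min
print(rate_per_second, rate_per_minute)
```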
EVERY Cloud Run instance creates this chart, so I assume it's the same for everyone, but if I click 'view in metrics explorer' it then gives this picture of how 'Google configured it'...
As stated in the Cloud Run metrics documentation, the Request Count metric is sampled every 60 seconds, and it excludes requests that do not reach your container instances; the examples given are unauthorized requests or requests sent after the maximum number of instances has been reached. These are obviously not your case, but again, something to consider.
Assuming that the calculation of the request count is wrong, I did some digging in Google's Issue Tracker for the Monitoring and Cloud Run components to check whether there are any open bugs related to this, but I could not find any. I would advise that you file a bug in their system so that Google can address it and so that you are notified once it is fixed.

Is combining REST API calls to reduce the number of requests worth doing?

My server used to handle bursts of 700+ users, and now it is failing at around 200 users.
(Users are connecting to the server almost at the same time after clicking a push message)
I think this is due to a change in how the requests are made.
Back then, the web server collected all the information into a single HTML response.
Now, each section in a page makes its own REST API request, resulting in probably 10+ requests per page instead of one.
I'm considering making an API endpoint that aggregates those requests for the pages users open when they click on a push notification.
Another solution I'm thinking of is caching those frequently used REST API responses.
Is it a good idea to combine API calls to reduce the number of requests?
It is always a good idea to reduce API calls. The optimal solution is to get all the necessary data in one go without any unused information.
This results in less traffic, fewer requests (and less load) on the server, less RAM and CPU usage, as well as fewer concurrent DB operations.
Caching is also a great choice. You can consider caching both the entire response and separate parts of it.
A combined API response means that there will be just one response, which will reduce the pre-execution time (where the app is loading everything), but will increase the processing time, because it's doing everything in one thread. This will result in less traffic, but a slightly slower response time.
From the user's perspective this would mean that if you combine everything, the page will load slower, but when it does it will load up entirely.
It's a matter of finding the balance.
And for the question if it's worth doing - it depends on your set-up. You should measure the start-up time of the application and the execution time and do the math.
Another thing you should consider is the amount of development time this might require. There is also the option of increasing server capacity, for example creating a clustered cache and using a load balancer to split the load. You should compare the time needed for both approaches and work from there.
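For what it's worth, here is a minimal sketch of the aggregation-plus-caching idea in Python, using only the standard library. The section URLs, the cache TTL and the endpoint layout are made up for illustration:

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-section endpoints that the page currently calls one by one.
SECTION_URLS = {
    "profile": "https://api.example.com/profile",
    "feed": "https://api.example.com/feed",
    "notifications": "https://api.example.com/notifications",
}

_cache = {}        # url -> (expiry_timestamp, parsed_json)
CACHE_TTL_S = 30   # assumed TTL for frequently used responses


def _fetch(url):
    """Fetch one section, serving it from the in-process cache if still fresh."""
    now = time.time()
    cached = _cache.get(url)
    if cached and cached[0] > now:
        return cached[1]
    with urllib.request.urlopen(url, timeout=5) as resp:
        data = json.load(resp)
    _cache[url] = (now + CACHE_TTL_S, data)
    return data


def get_page_bundle():
    """Aggregate all sections into one combined response, fetching them
    concurrently so the bundle is roughly as slow as the slowest section."""
    with ThreadPoolExecutor(max_workers=len(SECTION_URLS)) as pool:
        results = list(pool.map(_fetch, SECTION_URLS.values()))
    return dict(zip(SECTION_URLS.keys(), results))
```

Fetching the sections concurrently on the server side is one way to avoid the single-threaded slowdown mentioned above, at the cost of a few worker threads per bundled request.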

How to estimate requests per second for a GPS tracking app

I am going to develop an Android application that enables a specific user to track (and monitor on a map interface) multiple other users. For this reason, I want to build on an mBaaS, Parse. However, I cannot figure out how many requests per second such an app would perform, given the number of users. For example, if I choose the free plan, the limit will be 30 requests per second. I have some doubts about whether this number is sufficient for this app.
In other words, there will be periodic API requests (let's say every 30 seconds) for all the users doing the tracking. I think it is quite possible to exceed the limit of 30 requests per second with very few active users. Even if just 5 different users track 10 different users at the same time, the probability of hitting 30 requests per second is very high.
Considering all this, what kind of strategy would you advise? How can I manage periodic geolocation requests in this system? Is Parse the right choice? If not, is there a better alternative?
The approach used in the Traccar GPS tracking system is to return all of a user's objects in one request. So, say you want one user to track 100 other users; you still need only one request to get all 100 locations.
You can optimize it further by not sending a location if it hasn't changed. So, if only 10 users out of 100 changed their location since the last request, you can return only those 10 location items in the response.
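A minimal sketch of that batched, delta-only idea, assuming a hypothetical in-memory location store; the data layout and names are made up for illustration:

```python
import time

# Hypothetical in-memory store: user_id -> (lat, lon, last_update_timestamp)
locations = {
    "alice": (59.33, 18.06, 1700000000.0),
    "bob": (48.85, 2.35, 1700000400.0),
}


def locations_changed_since(tracked_ids, since_ts):
    """Return only the tracked users whose location changed after since_ts,
    so one request covers all tracked users and unchanged ones are omitted."""
    changed = {}
    for user_id in tracked_ids:
        entry = locations.get(user_id)
        if entry and entry[2] > since_ts:
            lat, lon, ts = entry
            changed[user_id] = {"lat": lat, "lon": lon, "ts": ts}
    return changed


# Client-side polling (hypothetical): one request per poll every 30 seconds,
# regardless of how many users are being tracked.
last_poll_ts = 0.0
updates = locations_changed_since(["alice", "bob"], last_poll_ts)
last_poll_ts = time.time()
```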

Parse API Limit/requests and reporting on Parse.com

We are building a new application on Parse and are trying to estimate our requests/second and optimize the application to keep it below 30/second. Our app, still in development, makes various calls to Parse. Some use only 1 request, and a few as many as 5 requests. We have tested and verified this in the Analytics/Events/API Requests tab.
However, when I go to the Analytics/Performance/Total Requests section, the requests/second rarely go above 0.2 and are often much lower. I assume this is because it is an average over a minute or more. So I have two questions:
1) Does anyone know what the number on this total requests/second screen represents? Is it an average over a certain time period? If so, how long?
2) When Parse denies a request due to the rate limit, does it deny based on the actual per-second rate, or is it based on an average over a certain time period?
Thanks!
I suppose you have your answer by now, but just in case:
You're allowed 30 reqs/sec on the free plan, but Parse actually counts it on a per-minute basis, i.e. 1,800 requests per minute.
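To illustrate the difference between the two accounting schemes (this is a generic sketch of per-minute accounting, not Parse's actual implementation), a fixed one-minute window with a budget of 1,800 requests will happily accept a short burst well above 30 req/s, as long as the minute's total stays under the budget:

```python
import time


class FixedWindowLimiter:
    """Allow up to `limit` requests per `window_s` seconds, e.g. 1800 per 60 s.
    Sketch of per-minute accounting, not Parse's actual implementation."""

    def __init__(self, limit=1800, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self.window_start = time.time()
        self.count = 0

    def allow(self):
        now = time.time()
        if now - self.window_start >= self.window_s:
            self.window_start = now  # start a new one-minute window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # more than `limit` requests this minute: denied


# A burst of 100 requests in a single second is accepted here, even though it
# is far above 30 req/s, because the minute's total is still below 1,800.
limiter = FixedWindowLimiter()
accepted = sum(limiter.allow() for _ in range(100))
```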

Spreading out data from bursts

I am trying to spread out data that is received in bursts. That is, some other application receives data in large bursts, and for each data entry I need to make some additional requests to a server whose traffic I should limit. Hence I try to spread the requests over the time I have until the next data burst arrives.
Currently I am using a token bucket to spread out the data. However, because the data I receive is already badly shaped, I am still either filling up the queue of pending requests or getting spikes whenever a burst comes in. So this algorithm does not seem to do the kind of shaping I need.
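For context, this is roughly the kind of token bucket meant here (a generic sketch, not the actual code). A bucket that has refilled to capacity between bursts passes the first `capacity` items of the next burst straight through, which is exactly the spike described:

```python
import time


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/s up to `capacity`.
    A full bucket lets the first `capacity` items of a burst pass immediately,
    which is where the spike at the start of each burst comes from."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def try_consume(self, n=1):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```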
What other algorithms are available to limit the requests? I know I have times of high load and times of low load, and both should be handled well by the application.
I am not sure if I was really able to explain the problem I am currently having. If you need any clarifications, just let me know.
EDIT:
I'll try to clarify the problem some more and explain why a simple rate limiter does not work.
The problem lies in the bursty nature of the traffic and the fact that bursts have different sizes at different times. What is mostly constant is the delay between bursts. So we get a bunch of data records for processing and need to spread them out as evenly as possible before the next bunch comes in. However, we are not 100% sure when the next bunch will arrive, just approximately, so simply dividing the available time by the number of records does not work as it should.
Rate limiting does not work because it does not spread the data sufficiently. If we are close to saturating the rate, everything is fine and we spread out evenly (although this should not happen too frequently). If we are below the threshold, though, the spreading gets much worse.
I'll make an example to make this problem more clear:
Let's say we limit our traffic to 10 requests per second and new data comes in about every 10 seconds.
When we get 100 records at the beginning of a time frame, we query 10 records each second and get a perfectly even spread. However, if we get only 15 records, we'll have one second where we query 10 records, one second where we query 5 and 8 seconds where we query 0, so the level of traffic is very unequal over time. It would be better to query just 1.5 records each second instead. However, setting that rate would also cause problems, since new data might arrive earlier, so we would not have the full 10 seconds and 1.5 queries per second would not be enough. If we use a token bucket, the problem actually gets even worse, because token buckets allow bursts to get through at the beginning of the time frame.
However, this example oversimplifies things, because in reality we cannot tell the exact number of pending requests at any given moment, only an upper limit. So we would have to throttle each time based on that number.
This sounds like a problem within the domain of control theory. Specifically, I'm thinking a PID controller might work.
A first crack at the problem might be dividing the number of records by the estimated time until the next batch. This would be like a P controller - proportional only. But then you run the risk of overestimating the time and building up unsent records. So try adding an I term - integral - to account for the built-up error.
I'm not sure you even need a derivative term if the variation in batch size is random. So try using a PI loop: you might build up some backlog between bursts, but the I term will handle it.
If it's unacceptable to have a backlog, then the solution might be more complicated...
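A rough sketch of that PI idea in Python; the gains, the one-second tick length and the interval estimate are placeholders to tune, not values from the answer:

```python
class PiPacer:
    """Decide how many records to send each second: the proportional term
    spreads the current backlog over the estimated time until the next burst,
    and the integral term catches up on amounts left unsent in past seconds."""

    def __init__(self, kp=1.0, ki=0.2):
        self.kp = kp          # proportional gain (placeholder)
        self.ki = ki          # integral gain (placeholder)
        self.error_sum = 0.0  # accumulated records planned but not yet sent

    def records_this_tick(self, backlog, seconds_until_next_burst):
        target_rate = backlog / max(seconds_until_next_burst, 1.0)  # P term
        rate = self.kp * target_rate + self.ki * self.error_sum     # PI output
        to_send = min(backlog, int(rate))
        self.error_sum += rate - to_send  # leftover feeds the integral term
        return to_send


# Example run: 15 records and roughly 10 seconds until the next burst spreads
# out to 1-2 sends per second instead of one 10-record spike.
pacer = PiPacer()
backlog = 15
for second in range(10):
    sent = pacer.records_this_tick(backlog, 10 - second)
    backlog -= sent
```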
If there are no other constraints, what you should do is figure out the maximum rate at which you are comfortable sending the additional requests, and limit your processing speed accordingly. Then monitor what happens. If that gets through all of your requests quickly, there is no harm. If its sustained level of processing is not fast enough, then you need more capacity.
