Correlation between requests per second and response time? - performance

Can someone please explain the correlation between requests per second and response time? Which should you try to improve first? If your competitor offers fewer 'requests per second' on their most used functionality than you do, is your application performing better in terms of end-user performance?

Can someone please explain the correlation between requests per second and response time?
Think of this situation as if it were a gas station. Cars arrive at various intervals and occupy a pump; they spend some time filling up, and then they leave.
Each car that arrives and occupies a pump is a request.
The time it takes to fill up is your response time.
You can improve things in two ways:
If you add more pumps, you can service additional cars at once because there will be more capacity.
If you make all your pumps faster, you can service more cars over time with the same number of pumps, because each car will finish sooner.
Which are you trying to improve at first?
That depends. Do you want to serve people faster (improving their experience while making some others wait) and thus more people overall, or do you want to serve more people at once (at the possible expense of request time)? Ideally, get both metrics as good as possible.
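The pump analogy maps directly onto Little's Law, which is the formal link between the two numbers. A minimal sketch (the pump count and fill time are made-up values for illustration):

    # Little's Law: concurrency = throughput * response_time
    # (cars at the pumps = cars served per second * time each car spends filling up)

    pumps = 8                # how many requests we can serve at once (illustrative)
    response_time_s = 0.25   # average time to serve one request, in seconds (illustrative)

    # Maximum sustainable throughput with every pump busy:
    throughput_rps = pumps / response_time_s
    print(f"{throughput_rps:.0f} requests/second")   # 32 requests/second

    # Adding pumps or making each pump faster both raise this number.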

It all depends on what sort of load your system will be under.
If you have millions of users, then you need to handle more requests per second, possibly at the expense of response time; otherwise users may not be able to connect when they want to.
However, if you are only going to have 30 users, then it's more important to them that your system responds quickly than that it can handle a thousand requests a second.

Requests per second may be high while offering an awful user experience. You might have a lot of users buying thousands of concert tickets per second but the response time for each user is over 30 seconds.
For a high-performing, enjoyable web site, you need both a high number of requests per second and a low maximum response time. As a user, I like 5 seconds or less.

If your competitor offers fewer 'requests per second' on their most used functionality than you do, is your application performing better in terms of end-user performance?
I wouldn't agree with that. Look at Google. They handle thousands of requests a second - I think it's something like 100 million per day, or 3 billion per month.
To answer your question, I think response time is more important than requests per second. Sure, you can optimize/minimize the number of requests made, but if your product scales to handle unlimited requests (just by throwing more hardware at the problem), then I think that is more valuable.

Related

Difference between processing 1,400,000 orders per second and 1,400,000 concurrent connections

You may have heard that some cryptocurrency exchange platforms claim to be able to process 1,400,000 orders per second. My question is: is this the same as having 1,400,000 concurrent connections?
Please advise.
Thank you.
Not necessarily; in the majority of cases the number of connections will be higher.
I would say that 1.4M orders per second can stand for 1.4M connections only if the system is being used by, for example, trading bots which are capable of placing an order in a single request.
If the system is being used by a "normal" human with a web browser, the number of connections will be approximately 5 times higher, mostly due to AJAX requests. With real browsers you also need to think about cookies, cache, embedded resources and so on.
So in order to come up with a well-behaved load test, you need to identify how exactly the system will be used and design your test plan to simulate this real usage as closely as possible; otherwise the load test will not make much sense.
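As a rough illustration of why the two numbers differ, here is a back-of-the-envelope sketch; the 5x browser multiplier comes from the answer above, while the order rate and response time are assumed values:

    # Rough estimate: concurrent connections implied by an order rate
    # (all figures are illustrative assumptions, not vendor numbers)

    orders_per_second = 1_400_000
    requests_per_order_bot = 1       # trading bot: one request per order
    requests_per_order_browser = 5   # browser: AJAX calls, resources, etc.
    avg_response_time_s = 0.05       # assumed average response time

    # Little's Law: concurrent connections ~= request rate * response time
    bot_connections = orders_per_second * requests_per_order_bot * avg_response_time_s
    browser_connections = orders_per_second * requests_per_order_browser * avg_response_time_s

    print(f"bots:     ~{bot_connections:,.0f} concurrent connections")
    print(f"browsers: ~{browser_connections:,.0f} concurrent connections")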

Parse pricing and requests per second

Parse now allows us to send 30 requests/second, but what that actually means is not straightforward to me.
Quoting some info gathered:
Here they say
At 30 requests/sec, an app can send us 77.76 million API requests in
a month before needing to pay a dime.
So I suppose they meant
send up to 77.76 million
Here, they suggest the rate of requests/second is calculated in a small window, generally a minute. This was answered about 2 years ago.
On their pricing faq page they give an example:
if an app is set to 30 requests/second, your app will hit its request
limit once it makes more than 1,800 requests over a 60 second period.
Suggesting that the window is one minute, even though they didn't clearly state it.
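For what it's worth, both quoted figures are consistent with a flat 30 requests/second average (a quick check, assuming a 30-day month):

    # Quick sanity check of the quoted figures at 30 requests/second
    rps = 30
    per_minute = rps * 60                  # 1,800 requests per 60-second window
    per_month = rps * 60 * 60 * 24 * 30    # 77,760,000 ~= 77.76 million (30-day month)
    print(per_minute, per_month)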
What intrigues me is that they say:
Pricing is pro-rated by the hour.
What does it mean? (sorry if it's obvious, English is not my first language)
Has anyone actually used Parse and kept track of those requests/second and burst limits?
The only resource I found was someone saying he had a web application with 10,000 users/day staying on the website around 4 minutes each, and he stayed under 12 req/s.
Moreover, if my app logs users' activities, would it be good practice to cache this info and then send it at random times, for example between 3am and 7am?
Any help is very appreciated. My company is deciding whether or not to go forward with Parse, so it's very important.
They could have worded it better but it basically means the same as "We'll charge you for a minimum of 1 hour based on the request limit you have set".
Here's an example. Assume you are using a 40 rps setting ($100/month, which is $100/720 hours). If you keep this for 1 minute, you'll be charged for 1 hour, roughly $0.14.
You can change the request limit as often as you want. So if your app/site receives peak traffic for only 12 hours/day, you can increase the limit just for those 12 hours and end up paying just for those 12 hours.
Check the third question (How frequently can I increase/decrease my request limit?) on the FAQ page at https://parse.com/plans/faq
How frequently can I increase/decrease my request limit?
You can increase/decrease your request limit as frequently as you would like within a given month. We will prorate your charges on an hourly basis.
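Putting the worked example and the FAQ quote together, here is a small sketch of how the hourly pro-rating could be computed (the 720-hour month and the 12-hours-a-day split are assumptions taken from the example above, not official pricing rules):

    # Pro-rated Parse bill (illustrative): you pay for each rate limit
    # by the number of hours you kept it, at (monthly price / 720 hours).

    HOURS_PER_MONTH = 720

    def prorated_cost(monthly_price, hours_used):
        return monthly_price / HOURS_PER_MONTH * hours_used

    # Example: a 40 rps setting ($100/month) kept for 12 peak hours a day
    # over 30 days, with the free tier the rest of the time.
    peak_hours = 12 * 30
    cost = prorated_cost(100, peak_hours)
    print(f"${cost:.2f}")   # $50.00 instead of $100 for the full month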
It's not really clear what the pro-rating means, as I understand the setting to be an explicit limit that you pay for. If your limit is exceeded, the requests fail. I don't think there's an option to allow payment on demand when the limit is exceeded, but pro-rating would do that.
The one-minute window is accurate; that is how the limit is currently managed.
The point of the pricing model is that your service should be making money before you reach any of the limits. If you have enough users to hit the limits and you aren't making money, then you need to reconsider your business plan. As such, you shouldn't need to upload at random times of day, as your users should naturally spread out a bit.
Here is something that can help you understand Parse's requests per second per user.
Parse estimates that the average app's active user will issue 10 requests. Thus, if you had a million users on a particular day, and their traffic was evenly spread throughout the day, you could estimate your app would need about 10,000,000 total API requests, or about 120 requests per second. Every app is different, therefore Parse encourages you to measure how many requests your users send.
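Written out as code, that estimate looks like this (the 10 requests per user is Parse's quoted average; you should substitute your own measurements):

    # Estimate required requests/second from daily active users
    daily_active_users = 1_000_000
    requests_per_user = 10          # Parse's quoted average; measure your own app
    seconds_per_day = 24 * 60 * 60

    total_requests = daily_active_users * requests_per_user
    rps = total_requests / seconds_per_day
    print(f"~{rps:.0f} requests/second")   # ~116, i.e. roughly 120 as quoted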
You can read more in this question, answered directly by Parse staff, here: Parse/F&A link.
Hope this helps

Parse API Limit/requests and reporting on Parse.com

We are building a new application on Parse and are trying to estimate our requests/second and optimize the application to keep it below the 30/second limit. Our app, still in development, makes various calls to Parse. Some actions only use 1 request, and a few use as many as 5 requests. We have tested and verified this in the analytics/events/api requests tab.
However, when I go to the analytics/performance/total requests section, the requests/second rarely go above 0.2 and are often much lower. I assume this is because the figure is an average over a minute or more. So I have two questions:
1) Does anyone know what the number on this total requests/second screen represents? Is it an average over a certain time period? If so, over how long?
2) When Parse denies a request due to the rate limit, does it deny based on the actual per-second rate, or based on an average over a certain time period?
Thanks!
I suppose you have your answer by now, but just in case:
You're allowed 30 req/s on the free plan, but Parse actually counts it on a per-minute basis, i.e. 1,800 requests per minute.
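So the check behaves more like a fixed one-minute window than a strict per-second gate. A minimal sketch of that kind of limiter (this is an assumption about how such a window could work, not Parse's actual implementation):

    import time

    # Fixed-window limiter: allow up to `limit` requests per 60-second window.
    # 30 req/s on the free plan ~= 1,800 requests per minute.
    # Illustrative only - not Parse's actual mechanism.
    class MinuteWindowLimiter:
        def __init__(self, limit=1800):
            self.limit = limit
            self.window_start = time.time()
            self.count = 0

        def allow(self):
            now = time.time()
            if now - self.window_start >= 60:   # new minute, reset the counter
                self.window_start = now
                self.count = 0
            if self.count < self.limit:
                self.count += 1
                return True
            return False                        # request would be rejected

    limiter = MinuteWindowLimiter()
    print(limiter.allow())   # True until 1,800 requests land in the same minute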

How to calculate the average case after doing an HTTP benchmark

If I do a benchmark and find, for example, the following:
With 1 concurrent user, the API gives 150 req/s (9,000 req/minute).
With more than 300 concurrent users, the API starts throwing exceptions.
An app makes 1 request every 30 minutes.
Is it correct if I say:
The best case is that the API could handle 30 * 9,000 = 270,000 users. That is, within 30 minutes there would be 270,000 sequential requests, each coming from a different user.
The worst case would be when 300 users post requests at the same time.
And if that's true, is there any way to calculate the average case?
Is it the same as calculating the worst-case and average-case complexity of an algorithm?
One theoretical tool for answering these questions is queueing theory (http://en.wikipedia.org/wiki/Queueing_theory). It says you are very unlikely to get the level of performance you are assuming, because the load applied to the system fluctuates: there are busy periods and quiet periods. If the system has nothing to do in quiet periods, it is forced into idleness you haven't accounted for. In busy periods, on the other hand, it will typically build up long queues of pending work, until the queues get so long that customers walk away, or the queues become longer than the system can support and it collapses, or both.
The graph at figure 1 page 3 of http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_12_mm1_queue.pdf shows a graph of response time vs applied load for what is probably the most optimistic even vaguely realistic situation. You can see that response time gets very large as you approach maximum load.
By far the most sensible thing to do is to run tests which apply a realistic load to your application - this is important enough that people build tools like http://jmeter.apache.org/. If you want a rule of thumb, I'd say don't plan to stress the system at more than 50% of the theoretical capacity you originally calculated.
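To see how sharply response time degrades near capacity, here is the standard M/M/1 mean response time formula evaluated for the numbers in the question (the 150 req/s service rate comes from the single-user benchmark; the load levels are illustrative):

    # M/M/1 queue: mean response time W = 1 / (mu - lambda)
    # mu     = service rate (requests/second the server can complete)
    # lambda = arrival rate (requests/second actually offered)

    mu = 150.0   # from the single-user benchmark: ~150 req/s

    for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
        lam = utilization * mu
        w = 1.0 / (mu - lam)            # mean time in system, seconds
        print(f"load {utilization:4.0%}: ~{w*1000:6.1f} ms per request")

    # Roughly 13 ms at 50% load but about 667 ms at 99% load:
    # response time grows without bound as you approach capacity,
    # which is why planning for ~50% of theoretical capacity is sensible.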

Spreading out data from bursts

I am trying to spread out data that is received in bursts. That is, some other application receives data in large bursts, and for each data entry I need to make additional requests to a server, on which I should limit the traffic. Hence I try to spread out the requests over the time I have until the next data burst arrives.
Currently I am using a token bucket to spread out the data. However, because the data I receive is already badly shaped, I am still either filling up the queue of pending requests, or getting spikes whenever a burst comes in. So this algorithm does not seem to do the kind of shaping I need.
What other algorithms are available to limit the requests? I know I have times of high load and times of low load, so both should be handled well by the application.
I am not sure if I was really able to explain the problem I am currently having. If you need any clarifications, just let me know.
EDIT:
I'll try to clarify the problem some more and explain why a simple rate limiter does not work.
The problem lies in the bursty nature of the traffic and the fact that bursts have different sizes at different times. What is mostly constant is the delay between bursts. So we get a bunch of data records for processing and need to spread them out as evenly as possible before the next bunch comes in. However, we are not 100% sure when the next bunch will come in, only approximately, so simply dividing the time by the number of records does not work as it should.
Rate limiting alone does not work, because the spread of the data is not sufficient that way. If we are close to saturating the rate, everything is fine and we spread out evenly (although this should not happen too frequently). If we are below the threshold, the spreading gets much worse.
I'll make an example to make this problem more clear:
Let's say we limit our traffic to 10 requests per second and new data comes in about every 10 seconds.
When we get 100 records at the beginning of a time frame, we will query 10 records each second and have a perfectly even spread. However, if we get only 15 records, we'll have one second where we query 10 records, one second where we query 5 records, and 8 seconds where we query 0 records, so the level of traffic is very unequal over time. It would be better to query 1.5 records each second instead. However, setting that rate would also cause problems, since new data might arrive earlier, so we would not have the full 10 seconds and 1.5 queries per second would not be enough. Using a token bucket actually makes the problem worse, because token buckets let bursts through at the beginning of the time frame.
However, this example oversimplifies, because in practice we cannot know the exact number of pending requests at any given moment, only an upper limit. So we would have to throttle each time based on that number.
This sounds like a problem within the domain of control theory. Specifically, I'm thinking a PID controller might work.
A first crack at the problem might be dividing the number of records by the estimated time until the next batch. This would be like a P controller - proportional only. But then you run the risk of overestimating the time and building up some unsent records. So try adding an I term - integral - to account for the built-up error.
I'm not sure you even need a derivative term if the variation in batch size is random. So try a PI loop - you might build up some backlog between bursts, but it will be handled by the I term.
If it's unacceptable to have a backlog, then the solution might be more complicated...
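A minimal sketch of that PI idea, using the 15-record burst from the question; the gains and the expected gap are made-up values that would need tuning against real traffic:

    # PI pacer (illustrative): track how far actual sending lags behind the
    # ideal even spread, and adjust the per-tick send amount accordingly.

    class PiPacer:
        def __init__(self, kp=0.8, ki=0.2, expected_gap_s=10.0):
            self.kp = kp                    # proportional gain (assumed value)
            self.ki = ki                    # integral gain (assumed value)
            self.expected_gap_s = expected_gap_s
            self.integral = 0.0

        def start_burst(self, records):
            self.total = float(records)     # records in this burst
            self.sent = 0.0
            self.elapsed = 0.0
            self.integral = 0.0

        def tick(self, dt=1.0):
            self.elapsed += dt
            # Setpoint: how much *should* have been sent by now for an even spread
            should_have_sent = self.total * min(self.elapsed / self.expected_gap_s, 1.0)
            error = should_have_sent - self.sent
            self.integral += error * dt
            send_now = max(0.0, self.kp * error + self.ki * self.integral)
            send_now = min(send_now, self.total - self.sent)
            self.sent += send_now
            return send_now

    pacer = PiPacer()
    pacer.start_burst(15)                   # the 15-record burst from the example
    for second in range(10):
        print(f"t={second + 1}s  send {pacer.tick():.2f}")

Instead of 10-then-5-then-nothing, this sends roughly 1.5-1.8 records per second across the whole gap, and any leftover error is absorbed by the integral term.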
If there are no other constraints, what you should do is figure out the maximum rate at which you are comfortable sending the additional requests, and limit your processing speed accordingly. Then monitor what happens. If that gets through all of your requests quickly, there is no harm. If its sustained processing rate is not fast enough, then you need more capacity.
