I have a program that makes many queries to the Google Search Analytics server. My program runs the queries one after another sequentially, so at any instant only one query is in progress.
Google advises a throughput limit of at most 2,000 queries per 100 seconds, so to make my system as efficient as possible, I have two ideas in mind:
Since 2,000 queries per 100 seconds works out to one query every 0.05 seconds, my first idea is to space out the queries by sleeping, but only when a query takes less than 0.05 seconds; in that case the process sleeps for the time remaining to complete the 0.05-second interval. If a query takes 0.05 s or more, I trigger the next one without waiting.
The second idea is easier to implement, but I think it will be less efficient: I trigger the queries while noting the time the process started, so if I reach 2,000 queries before 100 seconds have elapsed, I sleep for the remaining time.
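Roughly, the first idea looks like this (a Python sketch; run_query is just a placeholder for the real call):

    import time

    MIN_INTERVAL = 100.0 / 2000.0   # one query start every 0.05 s

    def run_paced(queries, run_query):
        # run_query is a hypothetical callable that sends one request
        for q in queries:
            start = time.monotonic()
            run_query(q)
            elapsed = time.monotonic() - start
            # Sleep only if the query finished faster than its 0.05 s slot
            if elapsed < MIN_INTERVAL:
                time.sleep(MIN_INTERVAL - elapsed)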
So far I don't know how to measure which one is better.
What is your opinion of the two options? Is either of them better, and why? Is there any additional option I haven't thought of (especially one that's better than mine)?
Actually, what you need to consider is that it's 2,000 requests per 100 seconds. You could do all 2,000 requests in 10 seconds and still be on the right side of the quota.
I am curious as to why you are worried about it, though. If you get one of the following errors:
403 userRateLimitExceeded
403 rateLimitExceeded
429 RESOURCE_EXHAUSTED
Google just recommends that you implement exponential backoff, which consists of making your request, getting the error, sleeping for a bit, and trying again (do this up to eight times). Google will not penalize you for getting these errors; they just ask that you wait a bit before trying again.
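A minimal sketch of that backoff loop in Python (send_request and RateLimitError are placeholders for whatever client and error type you actually use):

    import random
    import time

    class RateLimitError(Exception):
        """Stand-in for the 403/429 rate-limit errors listed above."""

    def request_with_backoff(send_request, max_retries=8):
        # send_request is a placeholder callable that performs one API call
        for attempt in range(max_retries):
            try:
                return send_request()
            except RateLimitError:
                # Exponential backoff with jitter: ~1 s, 2 s, 4 s, 8 s, ...
                time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"still rate limited after {max_retries} retries")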
If you want to go crazy, you can do something like what I did in my C# application: I created a request queue that I use to track how much time has passed since I made the last 100 requests. I call it the Google APIs Flood Buster.
Basically, I have a queue where I log each request as I make it; before I make a new request, I check how long it has been since I started. Yes, this requires moving items around the queue a bit. If more than 90 seconds have passed, I sleep for (100 - time elapsed). This has reduced my errors a great deal. It's not perfect, but that's because Google is not perfect with regard to tracking your quota; they are normally off by a little.
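My implementation is in C#, but a rough Python sketch of the same sliding-window idea would be something like this (a generic limiter, not the actual Flood Buster code):

    import collections
    import time

    class FloodBuster:
        """Sliding-window throttle: at most max_requests per window_seconds."""

        def __init__(self, max_requests, window_seconds):
            self.max_requests = max_requests
            self.window = window_seconds
            self.timestamps = collections.deque()

        def wait_if_needed(self):
            now = time.monotonic()
            # Drop requests that have fallen out of the window
            while self.timestamps and now - self.timestamps[0] > self.window:
                self.timestamps.popleft()
            if len(self.timestamps) >= self.max_requests:
                # The oldest logged request is still inside the window:
                # sleep until it expires before allowing the next one
                time.sleep(self.window - (now - self.timestamps[0]))
            self.timestamps.append(time.monotonic())

For the quota in the question you would create FloodBuster(max_requests=2000, window_seconds=100) and call wait_if_needed() before each request.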
My background is more from the Twitter side, where all stats are recorded per minute, so you might have 120 requests per minute. Inside Twitter, someone had the bright idea to divide by 60, so most graphs report requests per second (except for some teams, who realize that dividing by 60 is NOT the true requests per second at all, since the rate fluctuates within a minute). So instead of 120 requests per minute, many graphs report 2 requests per second. Google seems to be doing the same, EXCEPT the math doesn't show that. At Twitter, we could multiply by 60 and the answer was always a whole integer: the number of requests that occurred in that minute.
With Google, however, we see 0.02 requests/second, which multiplied by 60 is 1.2 requests per minute. IF the granularity is one minute, they are definitely counting it wrong, or something is wrong with their math.
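Spelling out the arithmetic behind that:

    # The conversion in question
    rate_per_second = 0.02
    requests_per_minute = rate_per_second * 60   # 1.2, not a whole number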
This is from the Cloud Run metrics, as seen when we click into the instance itself.
What am I missing here? AND BETTER yet, can we please report requests per minute? Requests per second is really the average req/second over that minute, and it can be really confusing when we have these discussions about how you can get 0.5 requests/second.
I AM assuming this is not the requests per second 'at' the minute boundary, because that would be VERY hard to calculate, but it would also be a whole number, i.e. 0 requests or 1, not 0.2, and that would be quite useless to be honest.
EVERY Cloud Run instance creates this chart, so I assume it's the same for everyone, but if I click 'view in metrics explorer' it gives this picture of how Google configured it....
As described in the Metrics section of the Cloud Run documentation, the Request Count metric is sampled every 60 seconds, and it excludes requests that never reach your container instances; the examples given are unauthorized requests or requests sent after the maximum number of instances is reached. These obviously don't apply in your case, but they are something to consider.
Assuming the calculation of the request count is indeed wrong, I did some digging in Google's IssueTracker system for the Monitoring and Cloud Run components to check whether any open bugs are related to this, but I could not find any. I would advise that you create a bug in their system so that Google can address it and you are notified once it is fixed.
While working on the SharePoint app, I noticed that it takes more than 10 seconds to load the app the first time. So I was thinking: how much ramp-up period will be idle for running 1000 users in JMeter?
Not sure about the "idle", do you mean "ideal" ramp-up?
I believe the slow initial load can be explained by how IIS internally works:
https://social.technet.microsoft.com/Forums/ie/en-US/297fb51b-b7b4-4b7b-a898-f6c91efd994e/sharepoint-2013-first-load-takes-long-time?forum=sharepointadmin
Taking all that information into account, we can predict that high response times due to the long initial load will occur for at most several minutes at the beginning of your test and will affect the users started during this period.
Finding out the "ideal" ramp-up time can be a little bit tricky.
I would address the problem like this:
Add the Response Times Over Time (https://jmeter-plugins.org/wiki/ResponseTimesOverTime/) graph.
Set the ramp-up time to 10 seconds per user and, using the graph added above, find out when the response times will settle down. This will mean that the initial load of the application is completed. Don't forget to exclude this period from the report.
Now you should observe that the response times are slowly growing with the addition of new users.
After all of the users have been added, if everything went well, you can try lowering your ramp-up time to, say, 3600 seconds for all of the users (see the quick arithmetic below).
If you see that the response times skyrocket and/or exceed the SLA, then either you are adding the users too fast, or the application is already saturated.
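As a quick sanity check on those numbers (back-of-the-envelope arithmetic, not a JMeter setting):

    users = 1000

    # Step with 10 seconds per user
    initial_ramp_up = users * 10              # 10,000 s of total ramp-up

    # Single 3600 s ramp-up for all users
    faster_ramp_up = 3600
    start_interval = faster_ramp_up / users   # one new user every 3.6 s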
I am analyzing a web application and want to predict the maximum number of users the application can support. I have the following numbers from my load test execution:
1. Response Time
2. Throughput
3. CPU
I have the following SLA for the application use case:
Response Time - 4 Secs
CPU - 65%
When I execute a load test of 10 concurrent users (without think time) for a particular use case, the average response time reaches 3.5 seconds and the CPU touches 50%. Next, I execute a load test of 20 concurrent users, and the response time reaches 6 seconds with the CPU at 70%, thus surpassing the SLA.
The application server configuration is 4 cores and 7 GB of RAM.
Going by the data, does this suggest that the web application can support only 10 users at a time? Is there any formula or procedure that can suggest the maximum number of users the application can support?
TIA
"Concurrent users" is not a meaningful measurement, unless you also model "think time" and a couple of other things.
Think about the case of people reading books on a Kindle. An average reader will turn the page every 60 seconds, sending a little ping to a central server. If the system can support 10,000 of those pings per second, how many "concurrent users" is that? About 10,000 * 60, or 600,000. Now imagine that people read faster, turning pages every 30 seconds. The same system will only be able to support half as many "concurrent users". Now imagine a game like Halo online. Each user will be emitting multiple transactions / requests per second. In other words, user behavior matters a lot, and you can't control it. You can only model it.
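The Kindle arithmetic from that example, written out:

    # Illustrative numbers from the Kindle example above
    pings_per_second = 10_000        # what the system can sustain
    seconds_between_pages = 60       # average "think time" per user

    concurrent_users = pings_per_second * seconds_between_pages   # 600,000

    # Faster readers (30 s per page) halve the supported "concurrent users"
    concurrent_users_fast = pings_per_second * 30                 # 300,000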
So, for your application, you have to make a reasonable guess at the "think time" between requests, and add that to your benchmark. Only then will you start to approach a reasonable simulation. Other things to think about are session time, variability, time of day, etc.
Chapter 4 of the "Mature Optimization Handbook" discusses a lot of these issues: http://carlos.bueno.org/optimization/mature-optimization.pdf
We are building a new application on Parse and are trying to estimate our requests/second and optimize the application to keep it below the 30/second limit. Our app, still in development, makes various calls to Parse. Some use only 1 request, and a few use as many as 5 requests. We have tested and verified this in the analytics/events/api requests tab.
However, when I go to the analytics/performance/total requests section, the requests/second rarely go above 0.2 and are often much lower. I assume this is because it is an average over a minute or more. So I have two questions:
1) Does anyone know what the number represents on the total requests/second screen? Is it an average over a certain time period? If so, over how long?
2) When Parse denies a request due to the rate limit, does it deny based on the actual per-second rate, or on an average over a certain time period?
Thanks!
I suppose you have your answer by now, but just in case:
You're allowed 30 reqs/sec on a free plan, but Parse actually counts it on a per-minute basis, i.e. 1,800 requests per minute.
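Spelled out (just the arithmetic; Parse's exact accounting isn't documented beyond that):

    requests_per_second_limit = 30
    requests_per_minute_budget = requests_per_second_limit * 60   # 1800
    # A short burst above 30/s is fine as long as the minute total stays under 1800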
If I do a benchmark and, for example, I find the following:
With 1 concurrent user, the API gives 150 req/s (9,000 req/minute).
With more than 300 concurrent users, the API starts throwing exceptions.
An app makes 1 request every 30 minutes.
Is it correct if I say:
The best case is that the API could handle 30 * 9000 = 270,000 users. That is, within 30 minutes there would be 270,000 sequential requests, each coming from a different user.
The worst case would be when there are 300 users posting requests at the same time.
And if that's true, is there any way to calculate the average case?
Is it the same as calculating the worst-case and average-case complexity of an algorithm?
One theoretical tool to answer these questions is http://en.wikipedia.org/wiki/Queueing_theory. It says that you are very unlikely to get the level of performance that you are assuming, because the load applied to the system fluctuates, so that there are busy periods and quiet periods. If the system has nothing to do in quiet periods it is forced into idleness that you haven't accounted for. In busy periods, on the other hand, it will typically build up long queues of pending work, until the queues get so long that customers walk away, or the queues become longer than the system can support and it collapses, or both.
Figure 1 on page 3 of http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_12_mm1_queue.pdf shows response time vs. applied load for what is probably the most optimistic even vaguely realistic situation. You can see that response time gets very large as you approach maximum load.
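That lecture covers the M/M/1 queue, whose mean response time is W = 1/(mu - lambda); a tiny illustration in Python (assuming exponential arrivals and service, which real traffic only approximates):

    def mm1_mean_response_time(arrival_rate, service_rate):
        """Mean time in an M/M/1 system: W = 1 / (mu - lambda), for lambda < mu."""
        if arrival_rate >= service_rate:
            raise ValueError("unstable queue: arrival rate >= service rate")
        return 1.0 / (service_rate - arrival_rate)

    # Example with a server that can do 150 req/s, as in the benchmark above
    for load in (50, 100, 120, 140, 149):
        print(load, mm1_mean_response_time(load, 150))
    # Response time grows slowly at first, then explodes near full capacity.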
By far the most sensible thing to do is to run tests which apply a realistic load to your application - this is important enough that people build things like http://jmeter.apache.org/. If you want a rule of thumb, I'd say don't plan to stress the system at more than 50% of the theoretical capacity you originally calculated.
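Applying that rule of thumb to the numbers in the question (a rough planning figure, not a guarantee):

    theoretical_users = 30 * 9000                 # 270,000 from the original calculation
    planning_capacity = theoretical_users // 2    # ~135,000 users at 50% of theoretical capacity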