We are seeing inconsistent performance on Heroku that is unrelated to the recent unicorn/intelligent routing issue.
This is an example of a request which normally takes ~150ms (and 19 out of 20 times that is how long it takes). You can see that on this request it took about 4 seconds, or between 1 and 2 orders of magnitude longer.
Some things to note:
the database was not the bottleneck: the request spent only 25ms doing db queries
we have more than sufficient dynos, so I don't think this was the bottleneck (20 double dynos running unicorn with 5 workers each; we get only 1000 requests per minute at an avg response time of 150ms, which means we should be able to serve (60 / 0.150) * 20 * 5 = 40,000 requests per minute). In other words we had 40x the needed dyno capacity when this measurement was taken.
So I'm wondering what could cause these occasional slow requests. As I mentioned, anecdotally it seems to happen in about 1 in 20 requests. The only things I can think of are a noisy-neighbor problem on the boxes, or inconsistent performance in the routing layer. If anyone has additional info or ideas I would be curious. Thank you.
I have been chasing a similar problem myself, with not much luck so far.
I suppose the first order of business would be to recommend NewRelic. It may have some more info for you on these cases.
Second, I suggest you look at queue times: how long your request was queued. Look at NewRelic for this, or do it yourself with the "start time" HTTP header that Heroku adds to your incoming request (just print now() minus "start time" as your queue time).
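In code this is just a subtraction. Here is a minimal sketch of the idea, shown as a Java servlet filter purely for illustration (the same two lines of arithmetic apply in Rack middleware or anything else); it assumes the header Heroku's router sets is X-Request-Start carrying a Unix epoch timestamp in milliseconds -- verify against the current Heroku docs for your stack:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

// Hypothetical sketch: log how long the request sat in Heroku's routing layer
// before reaching the app. Assumes X-Request-Start is epoch milliseconds.
public class QueueTimeFilter implements Filter {
    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String start = ((HttpServletRequest) req).getHeader("X-Request-Start");
        if (start != null && start.matches("\\d+")) {
            long queuedMs = System.currentTimeMillis() - Long.parseLong(start);
            System.out.println("queue_time_ms=" + queuedMs); // stdout ends up in your logs
        }
        chain.doFilter(req, res);
    }
}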
When those failed me in my case, I tried coming up with things that could go wrong, and here's an (unorthodox? weird?) list:
1) DNS -- are you making any DNS calls in your view? These can take a while -- even DNS lookups for resolving DB host names, Redis host names, external service providers, etc. (see the small timing sketch after this list).
2) Log performance -- Heroku collects all your stdout output via its "Logplex", which then drains it to your configured log drains (services such as Papertrail, etc.). There is no documentation on the performance of this, and writes to stdout from your process could, in theory, block while Heroku flushes whatever buffers it keeps there.
3) Getting a DB connection -- not sure which framework you are using, but maybe you have a connection pool that you are getting DB connections from, and that took time? It won't show up as query time, it'll be blocking time for your process.
4) Dyno performance -- Heroku has an add-on feature that will print, every few seconds, some server metrics (load avg, memory) to stdout. I used Graphite to graph those and look for correlation between the metrics and times where I saw increased instances of "sporadic slow requests". It didn't help me, but might help you :)
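For item 1, a quick sanity check is to time resolution of the hostnames you depend on. A minimal sketch in Java -- the hostnames are placeholders; substitute your DB, Redis, and external-service hosts:

import java.net.InetAddress;

// Hypothetical sketch: time DNS lookups for the hosts your app talks to.
// Note the JVM caches successful lookups (networkaddress.cache.ttl), so run a
// fresh JVM per measurement if you want uncached numbers.
public class DnsTimer {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"your-db-host.example.com", "your-redis-host.example.com"};
        for (String host : hosts) {
            long start = System.nanoTime();
            InetAddress addr = InetAddress.getByName(host); // blocking DNS lookup
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(host + " -> " + addr.getHostAddress() + " took " + elapsedMs + " ms");
        }
    }
}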
Do let us know what you come up with.
Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput / decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my Cloud Run app.
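For context, "experimenting with the connection pool size" would look roughly like this, assuming Spring Boot's default HikariCP pool (the JDBC URL, credentials, and numbers below are placeholders, not the actual values used here):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Hypothetical sketch: cap the pool per Cloud Run instance so that
// max-instances * maximumPoolSize stays below the Cloud SQL connection limit.
public class PoolConfig {
    public static HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql:///your-db"); // placeholder
        config.setUsername("app");                        // placeholder
        config.setPassword("secret");                     // placeholder
        config.setMaximumPoolSize(5);        // e.g. 5 connections per instance
        config.setMinimumIdle(1);            // keep the idle footprint small
        config.setConnectionTimeout(10_000); // fail fast (ms) rather than blocking a request
        return new HikariDataSource(config);
    }
}

The same knobs can also be set via Spring Boot properties such as spring.datasource.hikari.maximum-pool-size.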
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a Spring-Boot app to Cloud Run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the official docs on concurrency, and as happens when I set max-instances to 1.
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances is fluctuating wildly, once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, in the private Alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, ...
The trade-off is that you will see more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts -> in practice this is not an issue, because Cloud Run has a generous free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of Cloud Run logs into BigQuery and then querying them), even if there are more instances up than your max-instances, only max-instances of them are active at the same time. To put it clearly: with your parameters, if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
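(For reference, such a sink can be created along these lines -- the sink name, project, and dataset are placeholders:

gcloud logging sinks create cloud-run-logs bigquery.googleapis.com/projects/MY_PROJECT/datasets/cloud_run_logs --log-filter='resource.type="cloud_run_revision"'

You can then query the resulting tables in BigQuery.)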
This part is not documented, because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and will try to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response, in order to adapt to a possibly higher number of requests beyond what it is currently serving, taking into consideration the time it will take existing instances to finish loading their current requests. Per the documentation, it is possible for the number of container instances to go above the max-instances value when traffic spikes.
You mentioned that with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing 429s as well as instances spiking could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting that, because of the cold-start time you mention, when an instance is serving its first request(s), the number of concurrent requests is by design hard-set to 1. Only once the instance is fully ready is the concurrency setting you have chosen applied.
Was there some specific reason you chose 3 and 3 for the max-instances and concurrency settings? Also, how was the concurrency set when you had max-instances set to 1? Perhaps you could try raising the concurrency (max 80) and/or max-instances (high limit up to 1000) and see if that removes the 429s.
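If you do try that, those two settings can be changed on the existing service without redeploying the image, along these lines (service name and numbers are only examples):

gcloud beta run services update my-service --platform=managed --region=[...] --concurrency=80 --max-instances=10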
We are currently conducting performance tests on both web apps that we have; one is running within a private network and the other is accessible to all. For both apps, a single page load of the landing page or initial page only takes 2-3 seconds from a user's POV, but when we use BlazeMeter and JMeter, the results are between 15-20 seconds. Am I missing something? The 15-20 second result came from the Load time/Sample Time in JMeter, and from the Elapsed column if extracted to .csv. Please help as I'm stuck.
We have tried conducting tests on multiple PCs within the office premises, along with a PC remotely accessed on another site, and we still get the same results. The number of threads and the ramp-up period are both set to 1 to imitate a single user.
Where a delta exists, it is certain to mean that two different things are being timed. It would help to understand what you are timing on your front end -- a standard metric such as W3C domComplete, time to interactive, first contentful paint, or some other point -- and then compare where that comes into play in the drilldown on the Performance tab of Chrome. Odds are there is a lot occurring that is not visible to you but is being captured by JMeter.
You might also look for other threads on here about how JMeter operates as compared to a "real browser." There are differences which could come into play affecting your page comparisons, particularly if you have dozens or hundreds of elements that need to be downloaded to complete your page. Also, pay attention to third-party components where you do not have permission to test their servers.
I can think of 2 possible causes:
Clear your browser history, especially the browser cache. It might be the case that you're getting HTTP Status 304 for all requests in the browser because responses are being returned from the browser cache and no actual requests are being made, while JMeter always uses a "clean" session.
Pay attention to Connect Time and Latency metrics as it might be the case the server response time is low but the time for network packets to travel back and forth is very high.
Connect Time. JMeter measures the time it took to establish the connection, including SSL handshake. Note that connect time is not automatically subtracted from latency. In case of connection error, the metric will be equal to the time it took to face the error, for example in case of Timeout, it should be equal to connection timeout.
Latency. JMeter measures the latency from just before sending the request to just after the first response has been received. Thus the time includes all the processing needed to assemble the request as well as assembling the first part of the response, which in general will be longer than one byte. Protocol analysers (such as Wireshark) measure the time when bytes are actually sent/received over the interface. The JMeter time should be closer to that which is experienced by a browser or other application client.
So basically "Elapsed time = Connect Time + Latency + Server Processing Time"
In general given:
the same machine
clean browser session
and JMeter configured to behave like a real browser
you should get similar or equal timings for the same page
I'm load testing a system with 500 virtual users. I've set the "Ramp-Up period (in seconds)" option to zero. So, as I understand it, JMeter will hit the system with all 500 virtual users at the same time. Please correct me if I'm wrong here.
Now, the summary report shows that the average response time for the first page is ~100 seconds, which is more than a minute and a half of wait time! But while JMeter was running, I manually went to the same page/URL using a browser and didn't have to wait nearly that long; the page response was almost immediate for me.
My question is: is there any known issue for the average response time of the first page? Is it JMeter which is taking long to trigger that many users?
Thanks in advance.
--Ishtiaque
There is no issue in JMeter related to first-page response time.
The Summary Report shows all response-time details in milliseconds; for that value of "100" seconds, have you converted milliseconds to seconds?
Also, in order to make sure that all 500 users hit concurrently, use a Synchronizing Timer.
Hope this will help.
While the response times will be accurate, you need to consider the effect of starting so many threads at once on both your server and your client.
Starting 500 threads at once is not insignificant on the client. If your server has the connections available, it will start 500 threads as well.
Ramping over a period of time is more realistic loadwise, but still not really indicative of server capability until the threads have all started and settled in.
Databases can also require a settling in period which can affect response times.
Alternative to ramping is introducing a random wait at the start of each thread before firing the first sample. You can then choose not to ramp over time, but still expect resources on the client to suddenly come under load and change the settings if you hit limits. This will make the entire run much more realistic of typical behaviour. However, you need to determine if your use cases are typical.
Although the heap size was increased, I noticed the reported time was still longer than the actual response time. Later I realised it was the probe effect (the extra time the tool itself adds during test execution).
I am using JMeter to test my web server https://buyandbrag.in .
I have tested it with 100 users, but the main server does not show whether it is under load or not.
I want to know whether it is really putting pressure on the main server (a cloud server I am using), or just using the resources of the client machine where the tool is installed.
Yes, as mentioned, you should be monitoring both servers to see how they handle the load. The simplest way to do this is with top (if your server OS is *NIX). You should also be watching network activity, i.e. bandwidth and connection status (TIME_WAIT, CLOSE_WAIT and so on).
Also, if you're using Apache, keep an eye on the logs; you should see the requests being logged there.
Good luck with the tests
I want to know "how many users my website can handele ?",when I tested with 50 threads ,the cpu usage of my server increased but not the connections log(It showed just 2 connections).also the bandwidth usage is not that much
Firstly, what connections are you referring to? Apache, DB, etc.?
Secondly, if you want to see how many users your current setup can handle, you need to create a profile or traffic model of what an average user will do on your site.
For example:
Say 90% of the time they will search for something
5% of the time they will purchase x
5% of the time they log in.
Once you have your "Traffic Model" defined, implement it in JMeter, then start increasing your load in increments: run your load test for 10 minutes with x users, after 10 minutes increment that number, and so on until you find your breaking point.
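One way to script those increments, assuming your Thread Group reads its user count from a JMeter property such as ${__P(users,10)}, is to run JMeter in non-GUI mode with a different value each time (the test-plan and result file names here are placeholders):

jmeter -n -t traffic-model.jmx -Jusers=50 -l results-50.jtl
jmeter -n -t traffic-model.jmx -Jusers=100 -l results-100.jtl
jmeter -n -t traffic-model.jmx -Jusers=150 -l results-150.jtl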
If you graph your responses you should see two main things:
1) The optimum response time / number of users before the service degrades
2) The tipping point, i.e. at what point you start returning 503s etc.
Now you'll have enough data to scale your site or to start making performance improvements from a code point of view.
I have an application running on CF8 which often makes calls to external systems such as a search engine and LDAP servers. But at times some requests never get a response and remain in the active request list indefinitely.
Even though there is a request timeout set in the administration console, it is not being applied in these scenarios.
I have around 5 requests that have been pending for the last 20 hours!!!
My server settings are as below
Timeout Requests after ( seconds) : 300 sec
Max no of simultaneous requests : 20
Maximum number of running JRun threads : 50
Maximum number of running JRun threads : 1000
Timeout requests waiting in queue after 300 seconds
I read through some articles and found there are cases where threads never get a response and are never killed. But I don't have a solid solution for how to time these out or kill them automatically.
I'd really appreciate it if you guys have some ideas on this :)
The ColdFusion timeout does not apply to 'third party' connections.
A long-running LDAP query, for example, will take as long as it needs. When the calling template gets the result from the query your timeout will apply.
This often leads to confusion when interpreting errors: you will get an error pointing at whichever function runs after the long-running request as the cause of the timeout.
Further reading available here
You can (and probably should) set a timeout on the CFLDAP call itself. http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7f97.html
Thanks, Antony, for recommending my blog entry CF911: Lies, Damned Lies, and CF Request Timeouts...What You May Not Realize. This problem of requests not timing out when expected can be very troublesome and a surprise for most.
But Anooj, while that at least explains WHY they don't die (and you can't kill them within CF), one thing to consider is that you may be able to kill them in the REMOTE server being called, in your case, the LDAP server.
You may be able to go to the administrator of THAT server and, on showing them that CF has a long-running request, they may be able to spot and resolve the problem. And if they can, that may free the connection from CF, and your request will then stop.
I have just added a new section on this idea to the bottom of that blog entry, as "So is there really nothing I can do for the hung requests?"
Hope that helps.