I have an API that takes a JSON object and forwards it to Azure Event Hubs. The API runs on .NET Core 3.1 with EventHub SDK 3.0, and it has Application Insights configured to collect dependency telemetry, including Event Hubs calls.
Using the following Kusto query in Application Insights, I found some calls to Event Hub with really high latency (the highest is 60 seconds; most of them fall around 3-7 seconds).
dependencies
| where timestamp > now()-7d
| where type == "Azure Event Hubs" and duration > 3000
| order by duration desc
It is also worth noting that the query returns 890 results out of 4.6 million Azure Event Hubs dependency records.
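To put that ratio in perspective, the short sketch below computes the share of slow calls, how rare they are, and the equivalent percentile. The two counts are the ones from the query result; everything else is plain arithmetic.

```java
/** Share of Event Hubs dependency calls slower than the 3 s threshold used in the query. */
final class SlowCallShare {
    /** Fraction of calls that exceeded the threshold. */
    static double fraction(long slowCalls, long totalCalls) {
        return (double) slowCalls / totalCalls;
    }

    public static void main(String[] args) {
        long slow = 890;          // rows returned by the query (duration > 3000 ms)
        long total = 4_600_000L;  // all Azure Event Hubs dependency rows in the same 7 days
        double f = fraction(slow, total);
        System.out.printf("slow share: %.4f%%%n", f * 100);                      // percentage of slow calls
        System.out.printf("about 1 in %d calls%n", Math.round(1 / f));           // how rare they are
        System.out.printf("%.2f%% of calls finished within 3 s%n", (1 - f) * 100); // percentile view
    }
}
```

With these counts the slow share is roughly 0.0193%, about 1 call in 5,169, so calls under 3 s cover roughly the 99.98th percentile.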
I've checked the Event Hubs metrics blade in the Azure Portal. Averaged at 1-minute granularity, incoming/outgoing requests are far below the throughput unit limit (I have 2 event hubs in one Event Hubs namespace, 1 TU, autoscale up to 20 TUs max): around 50-100 messages per second and around 100 kB, both incoming and outgoing. There are 0 throttled requests and 1-2 server/user errors from time to time.
There are spikes, but they do not exceed the throughput limit, and the timestamps of the slow dependencies do not match these spikes.
I also manually increased the throughput units to 2, and it did not change anything.
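A rough utilization check supports the "far below the limit" observation. This sketch assumes the documented per-TU ingress limits (1 MB/s or 1,000 events per second, whichever comes first), approximates 1 MB as 1,000,000 bytes, and assumes that "around 100 kB" means 100 kB per second; verify both against your own metrics.

```java
/** Rough throughput-unit utilization check (assumed per-TU ingress limits). */
final class TuUtilization {
    static final double EVENTS_PER_SEC_PER_TU = 1_000;    // assumed ingress event limit per TU
    static final double BYTES_PER_SEC_PER_TU = 1_000_000; // assumed ingress byte limit per TU (~1 MB/s)

    /** Utilization is the larger of the two ratios, since whichever limit is hit first throttles. */
    static double utilization(double eventsPerSec, double bytesPerSec, int throughputUnits) {
        double byEvents = eventsPerSec / (EVENTS_PER_SEC_PER_TU * throughputUnits);
        double byBytes = bytesPerSec / (BYTES_PER_SEC_PER_TU * throughputUnits);
        return Math.max(byEvents, byBytes);
    }

    public static void main(String[] args) {
        // Upper end of the observed range: 100 events/s and ~100 kB/s (assumed per second).
        System.out.printf("1 TU: %.0f%%%n", utilization(100, 100_000, 1) * 100);
        System.out.printf("2 TU: %.0f%%%n", utilization(100, 100_000, 2) * 100);
    }
}
```

Under these assumptions the namespace sits around 10% of one TU, which is consistent with adding a second TU not changing the latency picture.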
My questions are:
Is it normal to occasionally see extremely high latency to Event Hub? Or is it acceptable as long as it only affects a small fraction of calls?
Code-wise, I use only one EventHubClient instance to send all requests. Is that bad practice, or should I use something else, such as a client pool?
A support engineer also told me that, at a timestamp where Application Insights shows high latency, the Event Hub logs do not show such high latency (322 ms max). Without going into details, is it possible for Application Insights to produce wrong performance telemetry?
Background:
I have a Spring Boot application, and I want to monitor the max and average number of requests per minute.
Since the server assigns a thread to each request, I can observe the number of threads. I use Micrometer to expose the metric and Prometheus to pull the metrics. I chose the Gauge type to track the thread count.
AtomicInteger concurrentNumber = meterRegistry.gauge("concurrent_thread_number", new AtomicInteger(0));
But in Prometheus, the gauge value stays at 0 even while I send lots of requests to the Spring Boot application.
I have found the reason.
My application handles requests quickly; it may finish processing many requests within 1 second. I set the Prometheus scrape_interval to 30s (because I don't want to put a heavy load on my machines), so between two scrapes the thread count goes from 0 to 1, 2, 3, 4, ... and finally back to 0. As a result, the scraped samples are just 0 and 0.
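The effect can be reproduced with a small simulation: a hypothetical burst of short requests runs entirely between two scrapes, so a gauge that is only read at scrape time reports 0 both times, even though several requests were in flight at once. The request timings are made-up values for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates reading an in-flight-request gauge only at scrape instants. */
final class ScrapeSimulation {
    /** A request that is in flight during [startMs, endMs). */
    static final class Request {
        final long startMs, endMs;
        Request(long startMs, long endMs) { this.startMs = startMs; this.endMs = endMs; }
    }

    /** Number of requests in flight at instant t, i.e. what the gauge would report. */
    static int inFlightAt(List<Request> requests, long t) {
        int n = 0;
        for (Request r : requests) {
            if (r.startMs <= t && t < r.endMs) n++;
        }
        return n;
    }

    /** True peak concurrency; with half-open intervals it is always reached at some request start. */
    static int peak(List<Request> requests) {
        int max = 0;
        for (Request r : requests) max = Math.max(max, inFlightAt(requests, r.startMs));
        return max;
    }

    /** Hypothetical burst: 40 requests of 50 ms each, one starting every 10 ms from t = 10 s. */
    static List<Request> burst() {
        List<Request> requests = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            long start = 10_000 + i * 10L;
            requests.add(new Request(start, start + 50));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<Request> requests = burst();
        for (long scrape = 0; scrape <= 30_000; scrape += 30_000) { // scrape_interval = 30 s
            System.out.println("scrape at " + scrape + " ms -> " + inFlightAt(requests, scrape));
        }
        System.out.println("actual peak -> " + peak(requests));
    }
}
```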
My Question:
I don't want to shorten the scrape_interval. Is there any trick to monitor the max and average number of requests per minute while the scrape_interval is 30s? Maybe choosing another metric type instead of a gauge? Any advice would be appreciated.
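One common workaround, sketched here with the JDK only (the class and method names are hypothetical, not a Micrometer API): track the in-flight count, remember the highest value seen since the last read, and reset that high-water mark on each read, so a 30 s scrape sees the peak of the whole interval instead of an instantaneous value. For the average side, a cumulative request counter read with Prometheus' rate() does not depend on when the scrape happens. Wiring `readAndResetPeak()` into your registry is left out because it depends on your setup.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Tracks in-flight requests plus the peak seen since the last read (hypothetical helper). */
final class PeakConcurrency {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakSinceRead = new AtomicInteger();

    /** Call when a request starts; raises the peak if the new level is higher. */
    void enter() {
        int now = inFlight.incrementAndGet();
        peakSinceRead.accumulateAndGet(now, Math::max);
    }

    /** Call when a request finishes (e.g. in a finally block). */
    void exit() {
        inFlight.decrementAndGet();
    }

    /** Returns the peak since the previous call and starts a new interval. */
    int readAndResetPeak() {
        int peak = peakSinceRead.getAndSet(0);
        // Seed the next interval with the current level so requests still in flight are not missed.
        peakSinceRead.accumulateAndGet(inFlight.get(), Math::max);
        return peak;
    }
}
```

Because reading resets the mark, this assumes a single scraper; if two Prometheus servers scrape the same endpoint, each would only see the peak since the other's last read.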
Context:
My Spring Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: it receives a constant stream of Pub/Sub messages via push and makes anywhere from 0 to 5 writes to an associated Cloud SQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response time varies between 50 ms and 60 s, probably due to some resource contention.
To increase throughput and decrease resource contention, I want to experiment with the connection pool size per app instance, as well as with the concurrency and max-instances parameters of my Cloud Run service.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a Spring Boot app to Cloud Run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, which leads me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the official docs on concurrency, and as happens when I set max-instances to 1.
What I expect instead is that 3 container instances are started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before with max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
The connection limit of the Cloud SQL instance has not been exceeded.
Requests are handled at fewer than 10 per second.
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, during the private alpha, I ran tests and observed the same behavior. After discussing it with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, traffic suddenly increases, and so on.
The trade-off is that you will see more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue, because Cloud Run has a generous free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink that exports Cloud Run logs to BigQuery and then querying them), I saw that even when more instances are up than your max-instances value, only max-instances of them are active at the same time. To put it concretely with your parameters: if 5 instances are up at the same time, only 3 serve traffic at any given moment.
This part is not documented because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
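Once the logs are exported, the "only max-instances are active at once" observation can be checked by counting how many distinct instances have at least one open request at the same time. The sketch assumes each request log entry has already been reduced to an instance ID plus start and end timestamps; that record shape is hypothetical and is not the Cloud Run log schema.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Max number of distinct instances serving simultaneously, from hypothetical request intervals. */
final class ActiveInstances {
    /** One served request: which instance handled it, and when (epoch ms, end exclusive). */
    static final class Served {
        final String instanceId;
        final long startMs, endMs;
        Served(String instanceId, long startMs, long endMs) {
            this.instanceId = instanceId; this.startMs = startMs; this.endMs = endMs;
        }
    }

    static int maxConcurrentInstances(List<Served> log) {
        // Sweep line: each request becomes a +1 event at its start and a -1 event at its end.
        List<long[]> events = new ArrayList<>(); // {time, delta, index into log}
        for (int i = 0; i < log.size(); i++) {
            Served s = log.get(i);
            if (s.endMs <= s.startMs) continue; // skip empty intervals
            events.add(new long[] {s.startMs, +1, i});
            events.add(new long[] {s.endMs, -1, i});
        }
        // Sort by time; at equal times, ends (-1) are processed before starts (+1).
        events.sort(Comparator.<long[]>comparingLong(e -> e[0]).thenComparingLong(e -> e[1]));

        Map<String, Integer> openPerInstance = new HashMap<>();
        int best = 0;
        for (long[] e : events) {
            String id = log.get((int) e[2]).instanceId;
            int open = openPerInstance.getOrDefault(id, 0) + (int) e[1];
            if (open == 0) openPerInstance.remove(id); else openPerInstance.put(id, open);
            best = Math.max(best, openPerInstance.size()); // instances with at least 1 open request
        }
        return best;
    }
}
```

For example, five instances where three serve during the first 10 ms and the other two during the next 10 ms yield a maximum of 3, even though 5 instances were started.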
@Steren @AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and tries to scale to that number. If a sudden burst of requests occurs, Cloud Run instantiates a larger number of instances in response. This is done to adapt to a possibly higher number of requests than it is currently serving, taking into account how long the existing instances will take to finish handling their requests. Per the documentation, the number of container instances can exceed the max-instances value during such a spike.
You mentioned that with max-instances set to 1 it was running fine, but later you mentioned that it was in fact producing 429s with that setting as well. 429s combined with spiking instance counts could indicate that the traffic is not being handled smoothly.
It is also worth noting, given the cold-start time you mention, that while an instance is serving its first request(s), the number of concurrent requests is by design hard-set to 1. Only once the instance is fully ready is the concurrency setting you have chosen applied.
Was there a specific reason you chose 3 for both max-instances and concurrency? Also, how was concurrency set when you had max-instances set to 1? You could try raising concurrency further (up to 80) and/or max-instances (up to 1000) and see whether that removes the 429s.
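The effect of these two settings can be estimated with Little's law (requests in flight = arrival rate × average latency). The sketch uses the 20 messages per second from the question and an assumed 1 s average latency; the real average is unknown, since the question only gives a 50 ms to 60 s range. It also ignores bursts and the concurrency-1 warm-up period, so real needs are higher.

```java
/** Capacity estimate for Cloud Run settings using Little's law (L = lambda * W). */
final class CloudRunCapacity {
    /** Concurrent request slots needed on average: arrival rate (req/s) times mean latency (s). */
    static long slotsNeeded(double requestsPerSecond, double meanLatencySeconds) {
        return (long) Math.ceil(requestsPerSecond * meanLatencySeconds);
    }

    /** Instances needed to provide those slots at a given per-instance concurrency (ceiling division). */
    static long instancesNeeded(long slots, int concurrency) {
        return (slots + concurrency - 1) / concurrency;
    }

    public static void main(String[] args) {
        double rate = 20;     // messages per second (lower end quoted in the question)
        double latency = 1.0; // ASSUMED mean latency in seconds
        long slots = slotsNeeded(rate, latency);
        System.out.println("slots needed: " + slots + ", available with 3 x 3: " + 3 * 3);
        for (int concurrency : new int[] {3, 80}) {
            System.out.println("concurrency " + concurrency + " -> instances needed: "
                    + instancesNeeded(slots, concurrency));
        }
    }
}
```

Under these assumptions, 3 instances at concurrency 3 offer 9 slots against roughly 20 needed, which would be consistent with 429s, while a higher concurrency covers the same load with far fewer instances, provided the app and its connection pool can handle it.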
We are currently running performance tests on both of our web apps: one runs within a private network and the other is publicly accessible. For both apps, a single load of the landing (initial) page takes only 2-3 seconds from a user's point of view, but when we use BlazeMeter and JMeter, the results are 15-20 seconds. Am I missing something? The 15-20 second figures come from the Load time/Sample time in JMeter and from the Elapsed column when the results are exported to .csv. Please help, as I'm stuck.
We have run the tests from multiple PCs on the office premises as well as from a remotely accessed PC at another site, and we still get the same results. The number of threads and the ramp-up period are both set to 1 to simulate a single user.
Where a delta exists, it almost certainly means that two different things are being timed. It would help to know what your front-end measurement is timed to: a standard metric such as W3C domComplete, time to interactive, first contentful paint, or some other point. Then compare where that point falls in the drill-down on the Performance tab of Chrome DevTools. Odds are that a lot is happening that is not visible to the user but is being captured by JMeter.
You might also look for other threads here on how JMeter operates compared to a "real browser". There are differences that could affect your page comparisons, particularly if your page needs dozens or hundreds of elements to be downloaded before it is complete. Also, pay attention to third-party components whose servers you do not have permission to test.
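To illustrate how much the fetching strategy alone can matter for a page with many embedded resources, the sketch compares fetching them one after another with fetching them over several parallel connections (6 is used because browsers commonly open about that many connections per host over HTTP/1.1). The resource count and per-resource time are hypothetical, and the greedy schedule ignores bandwidth limits and dependencies between resources; check how your own test plan retrieves embedded resources before drawing conclusions.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Wall-clock time to fetch resources over N connections (greedy list scheduling). */
final class PageFetchTime {
    /** Each resource goes to whichever connection frees up first; returns total elapsed ms. */
    static long makespanMs(long[] resourceMs, int connections) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>(); // time at which each connection is idle
        for (int i = 0; i < connections; i++) freeAt.add(0L);
        long end = 0;
        for (long duration : resourceMs) {
            long done = freeAt.poll() + duration; // start on the earliest idle connection
            freeAt.add(done);
            end = Math.max(end, done);
        }
        return end;
    }

    public static void main(String[] args) {
        long[] resources = new long[100]; // hypothetical: 100 embedded resources
        Arrays.fill(resources, 150);      // of 150 ms each
        System.out.println("1 connection:  " + makespanMs(resources, 1) + " ms");
        System.out.println("6 connections: " + makespanMs(resources, 6) + " ms");
    }
}
```

With these made-up numbers, serial fetching takes 15,000 ms and 6 parallel connections take 2,550 ms; the figures only show the size of the effect, not the cause in your case.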
I can think of 2 possible causes:
Clear your browser history, especially the browser cache. It might be that the browser is getting HTTP status 304 for all requests, or that responses are returned from the browser cache and no actual requests are made at all, while JMeter always uses a "clean" session.
Pay attention to the Connect Time and Latency metrics: it might be that the server response time is low but the time for network packets to travel back and forth is very high.
Connect Time. JMeter measures the time it took to establish the connection, including SSL handshake. Note that connect time is not automatically subtracted from latency. In case of connection error, the metric will be equal to the time it took to face the error, for example in case of Timeout, it should be equal to connection timeout.
Latency. JMeter measures the latency from just before sending the request to just after the first response has been received. Thus the time includes all the processing needed to assemble the request as well as assembling the first part of the response, which in general will be longer than one byte. Protocol analysers (such as Wireshark) measure the time when bytes are actually sent/received over the interface. The JMeter time should be closer to that which is experienced by a browser or other application client.
So basically "Elapsed time = Latency + time to receive the rest of the response", where Latency already includes Connect Time and the server processing time.
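Taking the definitions quoted above at face value (connect time is not subtracted from latency) together with JMeter's definition of elapsed time (from just before sending the request to just after the last response byte), a single sample can be split into three phases. The helper is a hypothetical sketch; its three inputs are the Elapsed, Latency and Connect Time values JMeter reports for a sample.

```java
/** Splits one JMeter sample into connect, wait-for-first-byte and download phases. */
final class SampleBreakdown {
    final long connectMs;   // TCP connect + SSL handshake
    final long firstByteMs; // after connecting, until the first response bytes (send + server work)
    final long downloadMs;  // receiving the rest of the response after the first bytes

    SampleBreakdown(long elapsedMs, long latencyMs, long connectMs) {
        if (!(0 <= connectMs && connectMs <= latencyMs && latencyMs <= elapsedMs)) {
            throw new IllegalArgumentException("expected 0 <= connect <= latency <= elapsed");
        }
        this.connectMs = connectMs;
        this.firstByteMs = latencyMs - connectMs; // latency includes connect time, so subtract it here
        this.downloadMs = elapsedMs - latencyMs;
    }

    @Override
    public String toString() {
        return "connect=" + connectMs + " ms, firstByte=" + firstByteMs + " ms, download=" + downloadMs + " ms";
    }
}
```

A large connect phase points at the network path, a large first-byte phase at the server, and a large download phase at response size or bandwidth.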
In general, given:
the same machine
clean browser session
and JMeter configured to behave like a real browser
you should get similar or equal timings for the same page
I want to stress test a site using JMeter. Right now I am using WAMP and Windows.
What would be the best stress-testing settings for this? This page will have a lot of users: 100k+ users per day.
100k users per day is not that many, in my opinion:
100,000 / 24 ≈ 4,166.67 users per hour
4,166.67 / 60 ≈ 69.44 users per minute
69.44 / 60 ≈ 1.16 users per second
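The same arithmetic as a small helper, with one extra, hypothetical knob: real traffic is rarely spread evenly over 24 hours, so a peak factor (an assumed value you would replace with numbers from your own analytics) scales the average up to a busy-hour rate.

```java
/** Converts a daily user count into average hourly, per-minute and per-second rates. */
final class DailyLoad {
    static double perHour(double usersPerDay)   { return usersPerDay / 24; }
    static double perMinute(double usersPerDay) { return perHour(usersPerDay) / 60; }
    static double perSecond(double usersPerDay) { return perMinute(usersPerDay) / 60; }

    public static void main(String[] args) {
        double daily = 100_000;
        double peakFactor = 3; // HYPOTHETICAL: busiest hour assumed to be 3x the daily average
        System.out.printf("per hour:   %.2f%n", perHour(daily));
        System.out.printf("per minute: %.2f%n", perMinute(daily));
        System.out.printf("per second: %.2f (assumed peak ~%.2f)%n",
                perSecond(daily), perSecond(daily) * peakFactor);
    }
}
```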
The usual way to test the application is to start with either 1 virtual user or the expected number of virtual users and gradually increase the load until one of the following conditions is met:
the response time goes above an acceptable level
the application starts consuming > 80-90% of the underlying hardware resources (CPU, RAM, disk and/or network IO)
the application starts returning errors and the number of errors exceeds a threshold
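The ramp-up procedure above can be written as a loop. This is a sketch only: `Measurement` and the `runStep` function are hypothetical stand-ins for a real test run (for example one JMeter run per step), the 90th-percentile response time is one common choice of "response time", and the thresholds are placeholders, not recommendations.

```java
import java.util.function.IntFunction;

/** Step-load search: add virtual users until one of the stop conditions is hit. */
final class StepLoad {
    /** Results of one load step (hypothetical shape). */
    static final class Measurement {
        final double p90ResponseMs, cpuUtilization, errorRate;
        Measurement(double p90ResponseMs, double cpuUtilization, double errorRate) {
            this.p90ResponseMs = p90ResponseMs; this.cpuUtilization = cpuUtilization; this.errorRate = errorRate;
        }
    }

    static final double MAX_RESPONSE_MS = 2_000; // placeholder "acceptable" response time
    static final double MAX_CPU = 0.85;          // within the 80-90% resource band
    static final double MAX_ERROR_RATE = 0.01;   // placeholder error threshold

    /** Returns the last user count that passed all checks, or 0 if even the first step failed. */
    static int findLimit(int startUsers, int step, int maxUsers, IntFunction<Measurement> runStep) {
        int lastGood = 0;
        for (int users = startUsers; users <= maxUsers; users += step) {
            Measurement m = runStep.apply(users);
            boolean failed = m.p90ResponseMs > MAX_RESPONSE_MS
                    || m.cpuUtilization > MAX_CPU
                    || m.errorRate > MAX_ERROR_RATE;
            if (failed) break; // stop condition met: the previous step is the limit
            lastGood = users;
        }
        return lastGood;
    }
}
```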
I would also recommend testing the following scenarios:
load test - put your application under the anticipated load for a short period to see what the performance metrics look like
stress test - determine the maximum number of concurrent users your application can handle while keeping response times reasonable. You can also determine the breaking point, i.e. when the application starts returning errors or stops responding, and whether it can recover
soak test - basically the same as the load test, but the load is sustained for a longer period, e.g. several hours or, if time allows, several days. It will help to identify memory leaks, if any
We are seeing inconsistent performance on Heroku that is unrelated to the recent unicorn/intelligent routing issue.
This is an example of a request which normally takes ~150ms (and 19 out of 20 times that is how long it takes). You can see that on this request it took about 4 seconds, or between 1 and 2 orders of magnitude longer.
Some things to note:
the database was not the bottleneck, and it spent only 25ms doing db queries
we have more than sufficient dynos, so I don't think this was the bottleneck: 20 double dynos running unicorn with 5 workers each, and we get only 1,000 requests per minute with an average response time of 150 ms, which means we should be able to serve (60 / 0.150) * 20 * 5 = 40,000 requests per minute. In other words, we had 40x the needed capacity on dynos when this measurement was taken.
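The capacity estimate in the last point, written out so its assumptions are explicit: every worker handles one request at a time and every request takes the 150 ms average. That makes it an idealized upper bound that ignores queueing and variation in response times.

```java
/** Idealized throughput of a Unicorn fleet: each worker serves one request at a time. */
final class DynoCapacity {
    static long requestsPerMinute(int dynos, int workersPerDyno, long avgResponseMs) {
        long perWorker = 60_000 / avgResponseMs; // requests one worker can finish per minute (floored)
        return perWorker * dynos * workersPerDyno;
    }

    public static void main(String[] args) {
        long capacity = requestsPerMinute(20, 5, 150); // 20 dynos x 5 workers, 150 ms average
        long observed = 1_000;                         // requests per minute actually received
        System.out.println("capacity: " + capacity + " req/min, headroom: " + capacity / observed + "x");
    }
}
```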
So I'm wondering what could cause these occasional slow requests. As I mentioned, anecdotally it seems to happen in about 1 in 20 requests. The only things I can think of are a noisy-neighbor problem on the boxes or inconsistent performance in the routing layer. If anyone has additional info or ideas, I would be curious to hear them. Thank you.
I have been chasing a similar problem myself, with not much luck so far.
I suppose the first order of business would be to recommend NewRelic. It may have more info for you on these cases.
Second, I suggest you look at queue times: how long your request was queued. Look at NewRelic for this, or do it yourself with the "start time" HTTP header that Heroku adds to your incoming request (just print now() minus "start time" as your queue time).
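Computing queue time yourself comes down to one subtraction. The sketch assumes the header value is a Unix timestamp in milliseconds (to my knowledge the header Heroku's router adds is `X-Request-Start`, but check the exact name and format for your stack) and that the router and dyno clocks are reasonably in sync, which limits the accuracy.

```java
/** Queue time = now minus the router's request-start timestamp (assumed epoch milliseconds). */
final class QueueTime {
    /** Returns queue time in ms, or -1 if the header is missing or not a plain number. */
    static long queueTimeMs(String requestStartHeader, long nowMs) {
        if (requestStartHeader == null) return -1;
        try {
            long startMs = Long.parseLong(requestStartHeader.trim());
            return Math.max(0, nowMs - startMs); // clamp small negative values caused by clock skew
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Hypothetical header value; in a real app, read it from the incoming request.
        String header = String.valueOf(System.currentTimeMillis() - 120);
        System.out.println("queue time: " + queueTimeMs(header, System.currentTimeMillis()) + " ms");
    }
}
```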
When those failed me in my case, I tried coming up with things that could go wrong, and here's a (unorthodox? weird?) list:
1) DNS -- are you making any DNS calls in your view? These can take a while. This includes DNS lookups for resolving DB host names, Redis host names, external service providers, etc.
2) Log performance -- Heroku collects all your stdout using their "Logplex", which then drains it to your own defined log drains (services such as Papertrail, etc.). There is no documentation on the performance of this, and writes to stdout from your process could, theoretically, block for periods while Heroku is flushing any buffers it might have there.
3) Getting a DB connection -- not sure which framework you are using, but maybe you are getting DB connections from a connection pool, and that took time? It won't show up as query time; it will be blocking time for your process.
4) Dyno performance -- Heroku has an add-on feature that will print, every few seconds, some server metrics (load avg, memory) to stdout. I used Graphite to graph those and look for correlation between the metrics and times where I saw increased instances of "sporadic slow requests". It didn't help me, but might help you :)
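For items 1 and 3 in particular, a cheap way to see whether they contribute is to time each step separately and log it. The helper is a generic, hypothetical sketch; the DNS example uses the JDK's `InetAddress.getByName`, and the same wrapper could go around checking a connection out of your pool.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;

/** Times individual steps (DNS lookup, pool checkout, ...) so slow ones show up in the logs. */
final class StepTimer {
    /** Runs the step, prints how long it took (even if it throws), and returns its result. */
    static <T> T timed(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        try {
            return step.call();
        } finally {
            long tookMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " took " + tookMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // "localhost" keeps the example offline; use your DB/Redis host names in practice.
        InetAddress addr = timed("dns lookup", () -> InetAddress.getByName("localhost"));
        System.out.println("resolved to " + addr.getHostAddress());
    }
}
```

Keep in mind that the JVM caches successful lookups for a while by default, so mostly the first resolution of a name after the cache expires would be slow.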
Do let us know what you come up with.