Azure Table Increased Latency - performance

I'm trying to create an app which can efficiently write data into Azure Table. In order to test storage performance, I created a simple console app, which sends hardcoded entities in a loop. Each entry is 0.1 kByte. Data is sent in batches (100 items in each batch, 10 kBytes each batch). For every batch, I prepare entries with the same partition key, which is generated by incrementing a global counter - so I never send more than one request to the same partition. Also, I control a degree of parallelism by increasing/decreasing the number of threads. Each thread sends batches synchronously (no request overlapping).
If I use 1 thread, I see 5 requests per second (5 batches, 500 entities). At that time Azure portal metrics shows table latency below 100ms - which is quite good.
If I increase the number of threads to 12, I see a 12x increase in outgoing requests. This rate stays stable for a few minutes. But then, for some reason, I start being throttled: latency increases and the request rate drops.
Below you can see account metrics - the highlighted point shows 2.31K transactions (batches) per minute, which is 3850 entries per second. If the number of threads is increased to 50, latency increases up to 4 seconds and the transaction rate drops to 700 requests per second.
According to the documentation, I should be able to send up to 20K transactions per second within one account (my test account is used only for my performance test). With 100 entries per batch, 20K batches mean 2M entries. So the question is: why am I being throttled at about 3K entries per second?
Test details:
Azure Datacenter: West US 2.
My location: Los Angeles.
The app is written in C# and uses the CosmosDB.Table NuGet package with the following configuration: ServicePointManager.DefaultConnectionLimit = 250, Nagle's algorithm is disabled.
Host machine is quite powerful with 1Gb internet link (i7, 8 cores, no high CPU, no high memory is observed during the test).
PS: I've read this in the docs:
"The system's ability to handle a sudden burst of traffic to a partition is limited by the scalability of a single partition server until the load balancing operation kicks-in and rebalances the partition key range."
I waited for 30 minutes, but the situation didn't change.
EDIT
I got a comment that E2E Latency doesn't reflect server problem.
So below is a new graph which shows not only E2E latency but also the server's one. As you can see they are almost identical and that makes me think that the source of the problem is not on the client side.
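The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the highlighted point is about 2,310 batches per minute with 100 entities per batch, and the function names are made up for illustration.

```python
def entities_per_second(batches_per_minute: float, batch_size: int) -> float:
    """Convert a per-minute batch rate into a per-second entity rate."""
    return batches_per_minute / 60.0 * batch_size


def round_trip_ms(batches_per_second_per_thread: float) -> float:
    """Average time one synchronous thread spends on a single batch."""
    return 1000.0 / batches_per_second_per_thread


if __name__ == "__main__":
    # Highlighted portal point: ~2,310 batches/min, 100 entities per batch.
    print(entities_per_second(2310, 100))
    # One synchronous thread doing 5 batches/s spends ~200 ms per batch
    # (network round trip plus server time).
    print(round_trip_ms(5))
    # 12 threads at the single-thread rate would give 12 * 5 * 100 entities/s.
    print(12 * 5 * 100)
```

So the stable 12-thread rate is around 6,000 entities per second, and the highlighted point is already below it.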

Related

JMeter sending less requests than expected

I'm using JMeter to build a performance test. To keep things short: I read the initial data from a JSON file in a single thread group, randomize certain values to prevent data duplication where needed, and pass the final data to the endpoint using variables. This ends up in a JSON body that is received by the endpoint, which generates a new transaction in the database. I also added a constant timer for a 7-second delay between requests, with a test duration of 10 minutes and no ramp-up. I calculated the expected number of requests like this:
One minute has 60 seconds and I have a delay of 7 seconds per request, so I'm sending approximately 8.5 requests per minute: (60/7) = 8.5. If the test lasts 10 minutes, then (8.5*10) = 85, giving a total of 85 transactions in 10 minutes, so I should see that same number of transactions created in the database after the test completes.
This holds when I'm running 10, 20, or 40 users: after the load test I query the DB and get exactly the expected number of transactions. However, as I increase the users in the thread group, this no longer happens. For example, with 1000 users I should be able to generate 8500 transactions in 10 minutes, but the DB only contains around 5.1k transactions.
What is happening? Why does it initially work as expected but not as I increase the users? I can provide more information if needed. Please help.
There could be 2 possible reasons for this:
1. You discovered your application bottleneck. When you add more users, the application response time increases and therefore throughput decreases. There is a term called saturation point, which stands for the maximum performance of the system; if you go beyond this point, the system will respond slower and you will get less TPS than initially. On the application-under-test side you should take a look into the following areas:
   - It might be the case that your application simply lacks resources (CPU, RAM, network, etc.); make sure that it has enough headroom to operate, for example by monitoring it with the JMeter PerfMon Plugin.
   - Your application middleware (application server, database, load balancer, etc.) is not properly set up for high loads. Identify your application infrastructure stack and make sure to follow the performance tuning guidelines for each component.
   - It is also possible that your application code needs optimization; you can detect the most time/resource consuming functions, largest objects, slowest DB queries, idle times, etc. using profiling tools.
2. JMeter is not sending requests fast enough.
   - Just like for the application under test, check that the JMeter machine(s) have enough resources (CPU, RAM, etc.).
   - Make sure to follow JMeter Best Practices.
   - Consider going for Distributed Testing.
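To see why a rising response time alone lowers the request count, here is a rough closed-loop model. It is a sketch, not a description of JMeter internals: it assumes each thread waits for the response, then sleeps for the timer delay, then sends again. The function names are hypothetical.

```python
def total_requests(users: int, duration_s: float, delay_s: float,
                   response_time_s: float) -> float:
    """Requests sent when every thread loops: send, wait for response, sleep."""
    cycle_s = delay_s + response_time_s
    return users * duration_s / cycle_s


def implied_response_time(expected: float, observed: float,
                          delay_s: float) -> float:
    """Response time that explains the shortfall, if 'expected' assumed 0 s."""
    return delay_s * expected / observed - delay_s


if __name__ == "__main__":
    # One user, 10 minutes, 7 s timer, instant responses: about 85 requests.
    print(total_requests(1, 600, 7, 0))
    # Getting 5.1k instead of 8.5k is consistent with ~4.7 s responses.
    print(implied_response_time(8500, 5100, 7))
```

Under this model, getting 60% of the expected requests is consistent with an average response time of roughly 4.7 seconds, which points at the first reason unless the JMeter machine itself is struggling.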
Can you check the CPU and memory utilization (RAM and Java heap) of the JMeter load generator while running JMeter with 1000 users? If it is high or reaching its maximum, it may reduce the requests per second. Also, to confirm the request rate on the JMeter side, can you add a listener to the JMeter script to track hits per second or TPS?
The expected figure (8.5K requests in a 10-minute test) will also hold if your API response time is 1 second and you have provided enough ramp-up time for those 1000 users.
So the possible reasons are:
You did not provide enough ramp-up time for 1000 users.
Your API average response time is more than 1 second while you are running the test with 1000 users.
Possible workarounds:
First, try to measure the API response time for 1 user.
Then calculate how many users you need to reach 8500 requests in 10 minutes. Use this formula:
users = TPS * max response time in seconds
Give proper ramp-up time for 1000 users. Check this thread to understand how you should calculate ramp-up time.
Check that your load generator is able to generate 1000 users without any memory or health (e.g. CPU usage) issues. If required, try to use a distributed architecture.
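The formula above is Little's Law. A small sketch of the calculation follows; the optional think-time argument is my addition to account for a timer delay, and the 1-second response time is only an assumed value.

```python
import math


def required_users(target_tps: float, response_time_s: float,
                   think_time_s: float = 0.0) -> int:
    """Little's Law: users = throughput * time each user spends per request."""
    return math.ceil(target_tps * (response_time_s + think_time_s))


if __name__ == "__main__":
    target_tps = 8500 / 600  # 8500 requests in 10 minutes
    # Formula exactly as given in the answer (no timer), 1 s response time.
    print(required_users(target_tps, 1.0))
    # Same target with a 7 s constant timer between requests.
    print(required_users(target_tps, 1.0, think_time_s=7.0))
```

With these assumed inputs the result is 15 users without a timer and 114 users with the 7-second timer.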

Jmeter tps adjustment

Do we need to adjust the throughput reported by JMeter to find the actual TPS of the system?
For example: I am getting 100 TPS for 250 concurrent users. The test ran for 10 hours. Can I conclude that my software can handle 100 transactions per second, or do I need to adjust the value? I am asking because when the load starts, the system takes some time to reach an adequate level of performance (warm-up time). If an adjustment is needed, how do I do it? Please help me understand this.
By default JMeter sends requests as fast as it can; the main factors affecting the TPS rate are:
number of threads (virtual users) - this you can define in Thread Group
your application response time - this is not something you can control
Ideally when you increase number of threads the number of TPS should increase by the same factor, i.e. if you have 250 users and getting 100 tps you should get 200 tps for 500 users. If this is not the case - these 500 users are beyond saturation point and your application bottleneck is somewhere between 250 and 500 users (if not earlier).
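One way to check this is to compare the measured TPS with the TPS that perfectly linear scaling would give. A minimal sketch; the 140 TPS measurement is a made-up number for illustration.

```python
def scaling_efficiency(base_users: int, base_tps: float,
                       users: int, tps: float) -> float:
    """Measured TPS divided by the TPS expected under linear scaling."""
    ideal_tps = base_tps * users / base_users
    return tps / ideal_tps


if __name__ == "__main__":
    # Baseline from the question: 250 users give 100 TPS.
    print(scaling_efficiency(250, 100, 500, 200))  # linear scaling
    print(scaling_efficiency(250, 100, 500, 140))  # hypothetical shortfall
```

A value close to 1.0 means the system still scales linearly; a value well below 1.0 suggests the saturation point lies somewhere between the two load levels.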
With regards to "warm up" time - the recommended approach is to apply the load gradually. This way you allow your application to get prepared for the increasing load, warm up caches, let the JIT compiler/optimizer do their work, etc. Moreover, this way you will be able to correlate the increasing load with increasing/decreasing throughput, response time, number of errors, etc., while having 250 users released at once doesn't tell the full story. See
The system warm-up period varies from one system to another. The warm-up period is where configurations are cached, different libraries are initialized (e.g. Builder.init()) and other initial functions run that usually don't happen for subsequent calls. If you study the results of the load test, there is a slow period at the very beginning. For most systems it could be as small as 5 to 10 minutes. Its effect could even be negligible if the test is as long as 10 hours. But then again, the average can be affected if the results give extremely low values at the start (it always depends on the jump from the initial warm-up period to normal operations).
As for the JMeter configuration, this thread may explain it: How to exclude warmup time from JMeter summary?
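If you post-process the results yourself, excluding the warm-up period is just a matter of dropping the first samples before averaging. A minimal sketch, assuming you already have one request count per second; the sample values are invented.

```python
def average_tps(per_second_counts: list, warmup_s: int = 0) -> float:
    """Mean requests per second, ignoring the first warmup_s seconds."""
    steady = per_second_counts[warmup_s:]
    if not steady:
        raise ValueError("warm-up period covers the whole test")
    return sum(steady) / len(steady)


if __name__ == "__main__":
    # Hypothetical run: three slow warm-up seconds, then a steady 100 TPS.
    counts = [10, 20, 40] + [100] * 7
    print(average_tps(counts))               # average over the whole run
    print(average_tps(counts, warmup_s=3))   # steady state only
```

On a 10-hour test the two numbers will be nearly identical; on a short test the difference can be large.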

How are server hits/second more than active thread count? | Jmeter

I'm running a load test to test the throughput of a server by making HTTP requests through JMeter.
I'm using the Thread Stepper plugin that allows me to increase the number of threads I'm using to make the requests after a particular time period.
The following graphs show the number of active threads with time and another one shows the corresponding hits per second I was able to make.
The third graph shows the latencies of the requests. The fourth one shows the response per second.
I'm not able to correlate the four graphs together.
In the server hits per second, I'm able to make a maximum of around 240 requests per second with only 50 active threads. However, the latency of the request is around 1 second.
My understanding is that a single thread would make a request, and then wait for the response to return before making the second request.
Since the minimum latency in my case is around 1 second, how is JMeter able to hit 240 requests per second with only 50 threads?
Server hits per second, max of 240 with only 50 threads. How?
Response latencies (minimum latency of 1 sec)
Active threads with time (50 threads when server hits are 240/sec)
Response per second (max of 300/sec, how?)
My expectation is that the reasons could be in:
Response time is less than 1 second therefore JMeter is able to send more than one request per second with every thread
It might also be connected with HTTP redirections and/or Embedded Resources processing, as per plugin's documentation:
Hits includes child samples from transactions and embedded resources hits.
For example this single HTTP Request with 1 single user results in 20 sub-samples which are being counted by the "Server Hits Per Second" plugin.
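Both explanations can be checked against the numbers in the question with simple arithmetic. This is only a sketch built on the assumption that each thread sends one request at a time; the function names are hypothetical.

```python
def implied_response_time(threads: int, hits_per_second: float) -> float:
    """Average response time if every hit is a top-level request."""
    return threads / hits_per_second


def implied_hits_per_request(threads: int, hits_per_second: float,
                             response_time_s: float) -> float:
    """Hits counted per top-level request if the latency figure is accurate."""
    top_level_rate = threads / response_time_s
    return hits_per_second / top_level_rate


if __name__ == "__main__":
    # 50 threads producing 240 hits/s need ~0.21 s per request ...
    print(implied_response_time(50, 240))
    # ... or, at a true 1 s latency, ~4.8 counted hits per request.
    print(implied_hits_per_request(50, 240, 1.0))
```

So either the real response time is closer to 0.2 seconds than to 1 second, or each request produces several sub-samples that the plugin counts as hits.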
I took some time to analyze the four graphs you provided, and the JMeter graphs appear to be plotted reasonably well (since you feel JMeter is plotting incorrectly, I will try to explain why the graphs look normal to me). Taking a cue from point 1 of the answer that @Dmitri T provided, here is the analysis:
1. As pointed out by @Dmitri T, responses are coming in faster than hits (requests) are sent to the server. This can be seen in the responses-per-second graph: the first batch of hits is sent at between 50 and 70 per second from 0 to the first five minutes, while the responses for this set of requests come in at a faster rate, 60 to 90 per second, over the same period. The same trend is observed for the hits fired from five to 10 minutes (100 to 150 responses compared to 85 to 130 hits). Given this continuing trend, the load generator is able to send more and more hits for the 50 active threads, which gives the upward slope, coupled with the Thread Stepper plugin's capability.
Hence the hits and responses graphs move in lockstep, with the responses graph having a steeper slope than the hits-per-second graph.
This upward trend continues until the queuing effect, caused by the entire processing capacity being in use, sets in at 23 minutes. At this point all the graphs seem to do the opposite of what they were doing up till then, i.e. for the first 22.59 minutes.
The response latency (i.e. the time taken to get the response) increases from the 23rd minute on. At the same time there is a drop in hits per second (maybe because the load generator does not have enough free threads to fire the next request, as the threads, aka users, are in the queue and have not finished their current request). This drop in requests lowers the rate of received responses, as seen in the responses graph. But you can still see the "service center" processing the requests efficiently, i.e. sending back responses faster than requests arrive. In queuing-theory terms, the service rate is higher than the arrival rate, which reinforces point 1 of our analysis.
At a load of 60 users, queuing happens. (Confirm this by checking whether the response time graph degrades at the same time as the throughput graph drops. If yes, then requests were piled up at the server, i.e. queued.) This is the point where all the service centers are busy, hence the degradation in response time, which keeps the user threads from generating new hits and lowers the hits per second.
The error codes observed in the responses-per-second graph, namely 400, 403, 500 and 504, appear among the response codes from the 10-user load onwards, which may indicate a time-bound or data issue (the first 10 users in your CSV have proper data in the database and the rest don't).
Or it could be related to the "credit" or "debit" transactions, since the two may conflict or deadlock on a bank account, etc.
The error codes are numerous where a larger volume of responses is received, i.e. up to the 23rd minute, and fewer afterwards, since fewer responses arrive due to queuing. Hence the error count is directly proportional to the response volume. The 504 (gateway timeout) error is a sure sign that processing takes a long time and the web server is timing out, which means the load is high. So we can consider the load up to 80 users, i.e. at the 40th minute, as a reasonable load-bearing capacity of the system. (Obviously, if more 504 errors are observed, we can fix that point as the unstressed load the system can handle.)
Important: check your Hits per Second graph configuration. Another observation is that the interval used to plot the graph may not match the expected scale, i.e. per second. You are expecting hits per second, but the graph may be configured to plot at a different interval, e.g. 500 ms (half a second), and this could make the plotted values higher than 50 hits for 50 users.

Estimating maximum users that an application can support

I am analyzing a web application and want to predict the maximum number of users it can support. I have the following numbers from my load test execution:
1. Response Time
2. Throughput
3. CPU
I have the application use case SLA
Response Time - 4 Secs
CPU - 65%
When I execute a load test with 10 concurrent users (without think time) for a particular use case, the average response time reaches 3.5 seconds and CPU touches 50%. Next, I execute a load test with 20 concurrent users; the response time reaches 6 seconds and CPU reaches 70%, thus surpassing the SLA.
The application server configuration is 4 core 7 GB RAM.
Going by the data, does this suggest that the web application can support only 10 users at a time? Is there any formula or procedure that can suggest the maximum number of users the application can support?
TIA
"Concurrent users" is not a meaningful measurement, unless you also model "think time" and a couple of other things.
Think about the case of people reading books on a Kindle. An average reader will turn the page every 60 seconds, sending a little ping to a central server. If the system can support 10,000 of those pings per second, how many "concurrent users" is that? About 10,000 * 60, or 600,000. Now imagine that people read faster, turning pages every 30 seconds. The same system will only be able to support half as many "concurrent users". Now imagine a game like Halo online. Each user will be emitting multiple transactions / requests per second. In other words, user behavior matters a lot, and you can't control it. You can only model it.
So, for your application, you have to make a reasonable guess at the "think time" between requests, and add that to your benchmark. Only then will you start to approach a reasonable simulation. Other things to think about are session time, variability, time of day, etc.
Chapter 4 of the "Mature Optimization Handbook" discusses a lot of these issues: http://carlos.bueno.org/optimization/mature-optimization.pdf
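The Kindle arithmetic, and the same reasoning applied to the numbers in the question, can be sketched as follows. The 10-second think time is purely an assumption for illustration; you would need to estimate the real value from user behavior.

```python
def supported_users(requests_per_second: float, think_time_s: float,
                    response_time_s: float = 0.0) -> float:
    """Users a given request rate can carry: rate * time per user cycle."""
    return requests_per_second * (think_time_s + response_time_s)


if __name__ == "__main__":
    # Kindle example: 10,000 pings/s, one page turn every 60 s or 30 s.
    print(supported_users(10_000, 60))
    print(supported_users(10_000, 30))
    # Question's test within SLA: 10 users, no think time, 3.5 s responses.
    measured_rps = 10 / 3.5
    # Same request rate with an assumed 10 s think time per user.
    print(supported_users(measured_rps, 10, response_time_s=3.5))
```

With that assumed think time, the request rate measured at 10 users without think time corresponds to roughly 38 real users, which shows how much the answer depends on the behavior model.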

Understanding RESTful Web Service stress test results

I'm trying to stress-test my Spring RESTful Web Service.
I run my Tomcat server on an Intel Core 2 Duo notebook with 4 GB of RAM. I know it's not a real server machine, but it's all I have and it's only for study purposes.
For the test, I run JMeter on a remote machine and communication is through a private WLAN with a central wireless router. I prefer to test this from wireless connection because it would be accessed from mobile clients. With JMeter i run a group of 50 threads, starting one thread per second, then after 50 seconds all threads are running. Each thread sends repeatedly an HTTP request to the server, containing a small JSON object to be processed, and sleeping on each iteration for an amount of time equals to the sum of a 100 milliseconds constant delay and a random value of gaussian distribution with standard deviation of 100 milliseconds. I use some JMeter plugins for graphs.
Here are the results:
I can't figure out why my hits per second don't pass the 100 threshold (in the graph they are multiplied by 10), because with this configuration it should have been higher than this value (50 threads sending at least three times per second would generate 150 hits/sec). I don't get any error message from the server, and all seems to work well. I've tried many other configurations, but I can't get more than 100 hits/sec.
Why?
[EDIT] Many times I notice a substantial performance degradation from some point on without any visible cause: no error response messages on the client, only OK HTTP response messages, and all seems to work well on the server too. But looking at the reports:
As you can see, something happens between 01:54 and 02:14: hits per second decrease and response time increases. It could be a server overload, but what about the CPU usage decreasing? This is not compatible with the congestion hypothesis.
I want to note that you've chosen very well which rows to display on the Composite Graph. It's enough to draw some conclusions:
Note that Hits Per Second perfectly correlates with CPU usage. This means you have a "CPU-bound" system, and the maximum performance is mostly limited by CPU. This is very important to remember: server resources are spent by hits, not by active users. You may disable your sleep timers altogether and will still get the same 80-90 hits/s.
The maximum level of CPU is somewhere at 80%, so I assume you run Windows OS (Win7?) on your machine. I used to see that it's impossible to achieve 100% CPU utilization on Windows machine, it just does not allow to spend the last 20%. And if you achieved the maximum, then you see your installation's capacity limit. It just has not enough CPU resources to serve more requests. To fight this bottleneck you should either give more CPU (use another server with higher level CPU hardware), or configure OS to let you use up to 100% (I don't know if it is applicable), or optimize your system (code, OS settings) to spend less CPU to serve single request.
For the second graph I'd suppose something is downloaded via the router, or something happens on JMeter machine. "Something happens" means some task is running. This may be your friend who just wanted to do some "grep error.log", or some scheduled task is running. To pin this down you should look at the router resources and jmeter machine resources at the degradation situation. There must be a process that swallows CPU/DISK/Network.
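If you export the graph data, the "hits correlate with CPU" observation can be quantified with a Pearson correlation coefficient. This is a sketch with invented sample values, not data from the graphs above.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples taken at the same timestamps.
    hits_per_second = [20, 40, 60, 80, 90]
    cpu_percent = [18, 35, 55, 72, 80]
    print(pearson(hits_per_second, cpu_percent))
```

A coefficient close to 1 supports the CPU-bound reading. In the degradation window, where hits and CPU fall together while response time rises, the same calculation against router or JMeter machine metrics may help locate the competing process.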
