I'm trying to create an app that can write data to Azure Table efficiently. To test storage performance, I created a simple console app that sends hardcoded entities in a loop. Each entity is 0.1 kByte. Data is sent in batches (100 items per batch, 10 kBytes per batch). For every batch, I prepare entities with the same partition key, which is generated by incrementing a global counter - so I never send more than one request to the same partition. I also control the degree of parallelism by increasing or decreasing the number of threads. Each thread sends batches synchronously (no request overlapping).
If I use 1 thread, I see 5 requests per second (5 batches, 500 entities). At that rate the Azure portal metrics show table latency below 100ms - which is quite good.
If I increase the number of threads to 12, I see a 12x increase in outgoing requests. This rate stays stable for a few minutes. But then, for some reason, I start being throttled - latency increases and the request rate drops.
Below you can see account metrics - the highlighted point shows 2.31K transactions (batches) per minute. That is 3850 entities per second. If the number of threads is increased to 50, latency rises to 4 seconds and the transaction rate drops to 700 requests per second.
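For reference, this is how I convert the portal's transactions-per-minute figure into entities per second. It is only a small Python sketch: the function name is made up for illustration, it reads the highlighted point as 2,310 batches per minute, and it assumes every transaction is a full batch of 100 entities.

```python
def entities_per_second(batches_per_minute, batch_size=100):
    """Convert a batches-per-minute rate into entities per second.

    Assumes every batch carries exactly `batch_size` entities.
    """
    return batches_per_minute * batch_size / 60


# Highlighted point read as 2,310 batches per minute, 100 entities each
print(entities_per_second(2310))  # 3850.0
```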
According to the documentation, I should be able to send up to 20K transactions per second within one account (my test account is used only for this performance test). With 100 entities per batch, 20K batches mean 2M entities. So the question is: why am I being throttled after about 3K entities per second?
Test details:
Azure Datacenter: West US 2.
My location: Los Angeles.
The app is written in C# and uses the CosmosDB.Table NuGet package with the following configuration: ServicePointManager.DefaultConnectionLimit = 250, Nagle's algorithm disabled.
The host machine is quite powerful, with a 1Gb internet link (i7, 8 cores; no high CPU or memory usage is observed during the test).
PS: I've read docs
The system's ability to handle a sudden burst of traffic to a partition is limited by the scalability of a single partition server until the load balancing operation kicks-in and rebalances the partition key range.
and waited for 30 mins, but the situation didn't change.
EDIT
I got a comment that E2E latency doesn't necessarily reflect a server-side problem.
So below is a new graph which shows not only E2E latency but also server latency. As you can see, they are almost identical, which makes me think that the source of the problem is not on the client side.
What is the maximum number of active HTTP sessions (not concurrent requests) that Tomcat 8 can handle? Tomcat 8 is hosted on Linux and has only one web app, which contains the REST services. I have observed around 50K active HTTP sessions during the day, using psi-probe to view them. The session timeout is 30 minutes. Each HTTP session holds 1 MB of data, so that is about 50 GB of session data. The heap is 450 GB, as required by the product, which holds a multidimensional cube in memory.
Does this lead to any performance problems? I ask because I have observed frequent GCs and many stop-the-world pauses (longer than 5 seconds) 10-15 times a day.
I'm getting the following results, where the throughput does not change even when I increase the number of threads.
Scenario#1:
Number of threads: 10
Ramp-up period: 60
Throughput: 5.8/s
Avg: 4025
Scenario#2:
Number of threads: 20
Ramp-up period: 60
Throughput: 7.8/s
Avg: 5098
Scenario#3:
Number of threads: 40
Ramp-up period: 60
Throughput: 6.8/s
Avg: 4098
My JMeter file consists of a single ThreadGroup that contains a single GET.
When I run the request against an endpoint with a faster response time (less than 300 ms), I can achieve a throughput greater than 50 requests per second.
Can you see the bottleneck of this?
Is there a relationship between response time and throughput?
It's simple, as the JMeter user manual states:
Throughput = (number of requests) / (total time)
Assuming your test contains only a single GET, throughput will correlate with the average response time of your requests.
Note that a ramp-up period of 60 means the threads are started gradually over 1 minute, which adds to the total execution time. You can try reducing it to 10, or setting it equal to the number of threads.
But you may have other samplers, controllers or components that affect the total time.
Also, in your case, especially in Scenario 3, some requests may have failed, in which case the reported throughput is not the throughput of successful transactions only.
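A rough Python sketch of that relationship, assuming a closed-loop test in which each thread sends its next request as soon as the previous one returns (no timers). The function names and the numbers are illustrative only and are not part of JMeter.

```python
def throughput(request_count, total_seconds):
    """JMeter's definition: number of requests divided by total time."""
    return request_count / total_seconds


def max_throughput(threads, avg_response_ms):
    """Upper bound for a closed loop: each thread can complete at most
    one request per average response time."""
    return threads / (avg_response_ms / 1000.0)


# Hypothetical numbers: 300 requests completed in a 60 second run
print(throughput(300, 60))       # 5.0 requests per second
# Hypothetical numbers: 10 threads, 2000 ms average response time
print(max_throughput(10, 2000))  # 5.0 requests per second
```

Under this model, adding threads only raises throughput while the server keeps the average response time steady; once response time grows with load, the extra threads mostly wait.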
In an ideal world, if you double the number of threads, throughput should double as well.
In reality the "ideal" scenario is hardly achievable, so it looks like there is a bottleneck in your application. The process of identifying the bottleneck normally looks as follows:
Amend your test configuration to increase the load gradually, e.g. start with 1 virtual user and ramp up to 100 virtual users over 5 minutes.
Run your test and look into the Active Threads Over Time, Response Times Over Time and Server Hits Per Second listeners. This way you will be able to correlate increasing load with increasing response time and identify the point where performance starts degrading. See What is the Relationship Between Users and Hits Per Second? for more information.
Once you have found the saturation point, you need to find out what prevents your application from serving more requests. The reasons could be:
The application simply lacks resources (CPU, RAM, network, disk, etc.). Make sure to monitor these resources, for example with the JMeter PerfMon Plugin.
The infrastructure configuration is not suitable for high loads (e.g. application or database thread pool settings are incorrect).
The problem is in your application code (inefficient algorithm, large objects, slow DB queries). These can be identified with a profiler.
Also make sure you're following JMeter Best Practices, as it might be the case that JMeter cannot send requests fast enough, due to either a lack of resources on the JMeter load generator side or incorrect JMeter configuration (too low heap, running the test in GUI mode, using listeners, etc.).
I want to support 7k requests per minute in my system. There are network calls and database calls which might take around 4-5 seconds to complete. How should I configure the task max threads and max connections to achieve that?
This is just math.
7k requests/minute is roughly 120 requests/second.
If each request is taking 5s then you will have roughly 5 x 120 = 600 inflight requests.
That's 600 HTTP connections, 600 threads and possibly 600 database connections.
These numbers are a little simplistic but I think you get the picture.
Note that the standard Linux stack size for each thread is 8MB, so 600 threads are going to want nearly 5GB of memory just for the stacks. This is configurable at the OS level - but how do you size it?
So you're in for some serious OS tuning if you're planning to run this on a single server instance.
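The same arithmetic as a short Python sketch. The function names are made up for illustration; the first one is Little's Law (in-flight requests = arrival rate x time per request), and the 8MB stack size is the default mentioned above, not a measured value.

```python
def inflight_requests(requests_per_minute, seconds_per_request):
    """Little's Law: concurrency = arrival rate x time in the system."""
    rate_per_second = requests_per_minute / 60
    return rate_per_second * seconds_per_request


def stack_memory_mb(threads, stack_mb=8):
    """Memory reserved for thread stacks, assuming one thread per request."""
    return threads * stack_mb


# 7k requests per minute, 5 seconds each
print(round(inflight_requests(7000, 5)))  # 583
# Rounded up to 600 threads with 8MB stacks
print(stack_memory_mb(600))               # 4800
```

The unrounded figure is about 583 in-flight requests; 600 is the same number with the request rate rounded up to 120 per second.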
I'm trying to figure out how to use ApacheBench to benchmark my website. I installed the default site project (it's ASP.NET MVC, but please don't stop reading if you're not a .NET person).
I didn't change anything. Add new project. Set configuration to RELEASE. Run without Debug (so it's in LIVE mode). Yes, this is with the built-in web server, not a production-grade IIS or Apache or whatever.
So here's the results :-
C:\Temp>ab -n 1000 -c 1 http://localhost:50035/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software: ASP.NET
Server Hostname: localhost
Server Port: 50035
Document Path: /
Document Length: 1204 bytes
Concurrency Level: 1
Time taken for tests: 2.371 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 1504000 bytes
HTML transferred: 1204000 bytes
Requests per second: 421.73 [#/sec] (mean)
Time per request: 2.371 [ms] (mean)
Time per request: 2.371 [ms] (mean, across all concurrent requests)
Transfer rate: 619.41 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.1 0 16
Processing: 0 2 5.5 0 16
Waiting: 0 2 5.1 0 16
Total: 0 2 5.6 0 16
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 16
95% 16
98% 16
99% 16
100% 16 (longest request)
C:\Temp>
Now, I'm not sure exactly what I should be looking at.
Firstly, I'm after the number of requests per second. So if we have a requirement to handle 300 reqs/sec, is this saying it handles an average of 421 reqs/sec?
Secondly, what is the reason for adding more concurrency? As in, if I have 1000 hits at a concurrency of 1, how does that differ from 500 at a concurrency of 2? Is it to test whether there's any code that blocks other requests?
Lastly, is there anything important I've missed from the results which I should take note of?
Thanks :)
what is the reason for adding more concurrent? As in, if i have 1000 hits on 1 concurrent, how does that differ to 500 on 2 concurrent? Is it to test if there's any code that blocks other requests?
It's a bit about that, yes: your application is probably doing things where concurrency can cause trouble.
A couple of examples:
a page is trying to access a file, locking it in the process; if another page has to access the same file, it will have to wait until the first page has finished working with it.
much the same goes for database access: if one page is writing to a database, there is some kind of locking mechanism (table-based, row-based, or whatever, depending on your DBMS).
Testing with a concurrency of one is OK... as long as your website will never have more than one user at the same time, which is not very realistic, I hope.
You have to think about how many users will be on the site at the same time once it's in production, and adjust the concurrency accordingly. Just remember that 5 users on your site at the same time doesn't mean you have to test with a concurrency of 5 in ab:
real users will wait a couple of seconds between each request (time to read the page, click on a link, ...)
ab doesn't wait at all : each time a page is loaded (ie, a request is finished), it launches another request !
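To make the difference concrete, here is a rough Python sketch under a simple assumed model: every real user waits for the response, pauses for a fixed think time, then sends the next request. The function names and all numbers are hypothetical.

```python
def request_rate(users, think_seconds, response_seconds):
    """Requests per second generated by users who pause between requests."""
    return users / (think_seconds + response_seconds)


def equivalent_ab_concurrency(users, think_seconds, response_seconds):
    """ab has no think time, so the matching -c value is the average
    number of requests actually in flight at any moment."""
    rate = request_rate(users, think_seconds, response_seconds)
    return rate * response_seconds


# Hypothetical: 50 users, 4.5 s think time, 0.5 s response time
print(request_rate(50, 4.5, 0.5))               # 10.0
print(equivalent_ab_concurrency(50, 4.5, 0.5))  # 5.0
```

With these made-up figures, 50 real users load the server about as much as ab running with a concurrency of 5.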
Also, two other things :
ab only tests one page, whereas real users navigate the whole website, which could cause concurrency problems you would not see while testing a single page
ab only loads the page itself: it doesn't request external resources (think CSS, images, JS, ...), which means you'll have lots of other requests, even if they are not really costly, when your site is in production.
As a sidenote: you might want to take a look at other tools that can do far more complete tests, like siege, JMeter, or OpenSTA. ab is really nice when you want to measure whether something you did is optimizing your page or not, but if you want to simulate "real" usage of your site, those are far better suited.
Yes, if you want to know how many requests per second your site is able to serve, look at the "Requests per second" line.
In your case it's really quite simple, since you ran ab with a concurrency of 1. Each request took only 2.371ms on average, so about 421 of those, one after the other, take 1 second.
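The two headline numbers can be recomputed from the report, as in this small Python sketch. The function names are made up for illustration, and the inputs are the rounded values printed by ab.

```python
def requests_per_second(complete_requests, seconds_taken):
    """Completed requests divided by the wall-clock time of the run."""
    return complete_requests / seconds_taken


def time_per_request_ms(complete_requests, seconds_taken, concurrency=1):
    """Mean time per request as seen by a single concurrent client."""
    return concurrency * seconds_taken * 1000 / complete_requests


# Values from the report above: 1000 requests in 2.371 seconds, -c 1
print(round(requests_per_second(1000, 2.371), 1))  # 421.8
print(round(time_per_request_ms(1000, 2.371), 3))  # 2.371
```

The small gap to the 421.73 that ab printed presumably comes from ab using the unrounded elapsed time.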
You really should play with the concurrency a little bit, to accurately gauge the capacity of your site.
Up to a certain degree of concurrency you'd expect the throughput to increase, as multiple requests get handled in parallel by IIS.
E.g. if your server has multiple CPUs/cores. Also if a page relies on external IO (middle tier service, or DB calls) the cpu can work on one request, while another is waiting for IO to complete.
At a certain point requests/sec will level off, with increasing concurrency, and you'll see latency increase. Increase concurrency even more and you'll see your throughput (req/sec) decrease, as the server has to devote more resources to juggling all these concurrent requests.
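If you repeat the run at several concurrency levels (ab's -c option), it helps to pull the throughput figure out of each report so you can see where it levels off. A minimal Python sketch; the function name is made up, and it assumes the report line looks like the one in the question.

```python
import re


def parse_requests_per_second(ab_output):
    """Return the 'Requests per second' value from ab's text report,
    or None if the line is not present."""
    match = re.search(r"Requests per second:\s+([\d.]+)", ab_output)
    return float(match.group(1)) if match else None


# Line copied from the report in the question
sample = "Requests per second: 421.73 [#/sec] (mean)"
print(parse_requests_per_second(sample))  # 421.73
```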
All that said, the majority of your requests return in about 2ms. That's pretty darn fast, so I am guessing there is not much going on in terms of DB or middle tier calls, and your system is probably maxed out on cpu when the test is running (or something is wrong, and failing really fast. Are you sure ab gets the response page you intend it to? I.e. is the page you think you are testing 1204 bytes large?).
Which brings up another point: ab itself consumes cpu too, especially once you up the concurrency. So you want to run ab on another machine.
Also, should your site make external calls to middle tier services or DBs, you want to adjust your machine.config to optimize the number of threads IIS allocates: http://support.microsoft.com/default.aspx?scid=kb;en-us;821268
And just a little trivia: the time statistics are reported in increments of ~16ms, as that appears to be the granularity of the timer used. I.e. 80% of your responses did not take 0ms; they took some time <16ms.