What's a great way to benchmark Apache locally on Linux? - performance

I've been developing a web-site that uses Django and MySQL; what I want to know is how many HTTP requests my server can handle serving certain pages.
I have been using siege but I'm not sure if that's a good benchmarking tool.

ab, the Apache HTTP server benchmarking tool. Many options. An example of use with ten concurrent requests:
% ab -n 20 -c 10 http://www.bortzmeyer.org/
...
Benchmarking www.bortzmeyer.org (be patient).....done
Server Software: Apache/2.2.9
Server Hostname: www.bortzmeyer.org
Server Port: 80
Document Path: /
Document Length: 208025 bytes
Concurrency Level: 10
Time taken for tests: 9.535 seconds
Complete requests: 20
Failed requests: 0
Write errors: 0
Total transferred: 4557691 bytes
HTML transferred: 4551113 bytes
Requests per second: 2.10 [#/sec] (mean)
Time per request: 4767.540 [ms] (mean)
Time per request: 476.754 [ms] (mean, across all concurrent requests)
Transfer rate: 466.79 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 22 107 254.6 24 854
Processing: 996 3301 1837.9 3236 8139
Waiting: 23 25 1.3 25 27
Total: 1018 3408 1795.9 3269 8164
Percentage of the requests served within a certain time (ms)
50% 3269
66% 4219
...
(In that case, network latency was the main slowness factor.)
ab reports itself in the User-Agent field so, in the log of the HTTP server, you'll see something like:
2001:660:3003:8::4:69 - - [28/Jul/2009:12:22:45 +0200] "GET / HTTP/1.0" 200 208025 "-" "ApacheBench/2.3" www.bortzmeyer.org

ab is a widely used benchmarking tool that comes with apache httpd

Grinder is pretty good. It lets you simulate coordinated load from several client machines, which is more meaningful than from a single machine.

There's also JMeter.

I've used httperf and it's quite easy to use. There's a peepcode screencast on how to use it as well.

Related

Samples failing on increasing number of users and decreasing Ramp-up Period

For Jmeter script, when I use
Number of Threads = 10
Ramp-up Period = 40
Loop Count = 1
Then 6 out of 40 samples failed.
When I increase the Ramp-up Period to 60 then all the samples pass.
For the failed requests, the response code returned is 522:
Sampler result
Thread Name: Liberty Insight 1-4
Sample Start: 2018-02-23 20:43:12 IST
Load time: 1
Connect Time: 0
Latency: 1
Size in bytes: 112
Sent bytes:584
Headers size in bytes: 112
Body size in bytes: 0
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""):
Response code: 522
Response message:
Response headers:
HTTP/1.1 522
Server: nginx
Date: Fri, 23 Feb 2018 15:13:12 GMT
Content-Length: 0
Connection: keep-alive
HTTPSampleResult fields:
ContentType:
DataEncoding: null
I am unable to figure out the reason for this type of behaviour. Any pointers what could be issue for such type of behaviour?
If you choose Ramp-up Period = 40 with 10 threads calls to server are about 4 transaction a second.
When you use cloudflare services one of its feature is to prevent overload the server
There are a few main causes of this:
The origin server was too overloaded to respond.
The origin web server has a firewall that is blocking our requests, or packets are being dropped within the host’s network.
Error 522:
(source: cloudflare.com)
If you need to load test your server use different route than cloudflare, Consult your IT for such option.
If not reduce transaction per second rate
Ensure that the origin server isn’t overloaded. If it is, it could be dropping requests.

Requests and Threads understanding in JMeter logs

I am still confused with some JMeter logs displayed here. Can someone please give me some light into this?
Below is a log generated by JMeter for my tests.
Waiting for possible Shutdown/StopTestNow/Heapdump message on port 4445
summary + 1 in 00:00:02 = 0.5/s Avg: 1631 Min: 1631 Max: 1631 Err: 0 (0.00%) Active: 2 Started: 2 Finished: 0
summary + 218 in 00:00:25 = 8.6/s Avg: 816 Min: 141 Max: 1882 Err: 1 (0.46%) Active: 10 Started: 27 Finished: 17
summary = 219 in 00:00:27 = 8.1/s Avg: 820 Min: 141 Max: 1882 Err: 1 (0.46%)
summary + 81 in 00:00:15 = 5.4/s Avg: 998 Min: 201 Max: 2096 Err: 1 (1.23%) Active: 0 Started: 30 Finished: 30
summary = 300 in 00:00:42 = 7.1/s Avg: 868 Min: 141 Max: 2096 Err: 2 (0.67%)
Tidying up ... # Fri Jun 09 04:19:15 IDT 2017 (1496971155116)
Does this log means [ last step ] 300 requests were fired, 00.00:42 secs took for the whole tests, 7.1 threads/sec or 7.1 requests/sec fired?
How can i make sure to increase the TPS? Same tests were done in a different site and they are getting 132TPS for the same tests and on the same server. Can someone put some light into this?
In here, total number of requests is 300. Throughput is 7 requests per second. These 300 requests generated by your given number of threads in Thread group configuration. You can also see the number of active threads in the log results. These threads become active depend on your ramp-up time.
Ramp-up time is the speed at which users or threads arrive on your application.
Check this for an example: How should I calculate Ramp-up time in Jmeter
You can give enough duration in your script and also check the loop count forever, so that all of the threads will be hitting those requests in your application server until the test finishes.
When all the threads become active on the server, then they will hit those requests in server.
To increase the TPS, you must have to increase the number of threads because those threads will hit your desired requests in server.
It also depends on the response time of your requests.
Suppose,
If you have 500 virtual users and application response time is 1 second - you will have 500 RPS
If you have 500 virtual users and application response time is 2 seconds - you will have 250 RPS
If you have 500 virtual users and application response time is 500 ms - you will have 1000 RPS.
First of all, a little of theory:
You have Sampler(s) which should mimic real user actions
You have Threads (virtual users) defined under Thread Group which mimic real users
JMeter starts threads which execute samplers as fast as they can and generate certain amount of requests per second. This "request per second" value depends on 2 factors:
number of virtual users
your application response time
JMeter Summarizer doesn't tell the full story, I would recommend generating the HTML Reporting Dashboard from the .jtl results file, it will provide more comprehensive load test result data which is much easier to analyze looking into tables and charts, it can be done as simple as:
jmeter -g /path/to/testresult.jtl -o /path/to/dashboard/output/folder
Looking into current results, you achieved maximum throughput of 7.1 requests second with average response time of 868 milliseconds.
So in order to have more "requests per second" you need to increase the number of "virtual users". If you increase the number of virtual users and "requests per second" is not increasing - it means that you identified so called saturation point and your application is not capable of handling more.

why is lift framework so slow?

I am learning Lift framework. I used project template from git://github.com/lift/lift_25_sbt.git and started server with container:start sbt command.
This template application displays just simple menu. If i use ab from apache to measure performance, its pretty bad. I am missing something fundamental to improve performance?
C:\Program Files\Apache Software Foundation\httpd-2.0.64\Apache2\bin>ab -n 30 -c
10 http://127.0.0.1:8080/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: Jetty(8.1.7.v20120910)
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 2877 bytes
Concurrency Level: 10
Time taken for tests: 8.15625 seconds
Complete requests: 30
Failed requests: 0
Write errors: 0
Total transferred: 96275 bytes
HTML transferred: 86310 bytes
Requests per second: 3.74 [#/sec] (mean)
Time per request: 2671.875 [ms] (mean)
Time per request: 267.188 [ms] (mean, across all concurrent requests)
Transfer rate: 11.73 [Kbytes/sec] received
Are you running it in production mode? I found i had like 30 rps in devel, but over 250 in production mode. ( https://www.assembla.com/spaces/liftweb/wiki/Run_Modes )
as mentioned earlier, you should run Lift in production mode. This is the main key to get good performance. All templates are cached this way, and other optimizations apply.
if you want to measure something not abstract and theoretical, then you should give the JVM time to "warm up", apply it's JIT optimizations. So, you should apply ~thousand requests first and totally ignore them (must be a couple of seconds). After that, measure the real performance of an already-started server
there are some slight JVM optimizations, altrough they seem more like a hack to me, and give a boost not more than around 20%
other small hacks include serving static content with nginx, starting the application in a dedicated server instead of Simple Build Tool and such.

Is performance of Grails 2.0 really that awfully low?

I'm somewhat newbie for WEB development based on JVM stack, but future project will require specifically some JVM-based WEB engine. So I started looking on some ground to make things quickly and turned to try Grails. Things looked good from the book, but beeing impressed by really long startup time (grails run-app) I decided to test how this works under load. Here it is:
test app: follow few instruction here to make it from ground (takes 2 mins assuming you already have Grails and Tomcat installed):
_http://grails.org/Quick+Start
test case (with Apache benchmark - comes with Apache httpd - _http://httpd.apache.org):
ab.exe -n 500 -c _http://localhost:8080/my-project/book/create
(Note: this is just displays 2 input fields within styled container)
hardware: Intel i5 650 (4Core*3.2GHz) 8GB Ram & Win Server 2003 x64
The result is ..
Grails: 32 Req/Sec
Total transferred: 1380500 bytes
HTML transferred: 1297500 bytes
Requests per second: 32.45 [#/sec] (mean)
Time per request: 308.129 [ms] (mean)
Time per request: 30.813 [ms] (mean, across all concurrent requests)
Transfer rate: 87.51 [Kbytes/sec] received
(Only 32 Req/Sec with 100% of CPU saturation, this is a way too below my expectations for such hardware)
... Next - i tried to compare it for example with similar dummy JSF application (i took one here: _http://www.ibm.com/developerworks/library/j-jsf2/ - look for "Source code with JAR files", there is \jsf-example2\target\jsf-example2-1.0.war inside),
test case: ab.exe -n 500 -c 10 _http://localhost:8080/jsf/backend/listing.jsp
The result is ..
JSF: 400 Req/Sec
Total transferred: 5178234 bytes
HTML transferred: 5065734 bytes
Requests per second: 405.06 [#/sec] (mean)
Time per request: 24.688 [ms] (mean)
Time per request: 2.469 [ms] (mean, across all concurrent requests)
Transfer rate: 4096.65 [Kbytes/sec] received
... And finally goes raw dummy JSP (just for reference)
Jsp: 8000 req/sec:
<html>
<body>
<% for( int i = 0; i < 100; i ++ ) { %>
Dummy Jsp <%= i %> </br>
<% } %>
</body>
</html>
Result:
Total transferred: 12365000 bytes
HTML transferred: 11120000 bytes
Requests per second: 7999.90 [#/sec] (mean)
Time per request: 1.250 [ms] (mean)
Time per request: 0.125 [ms] (mean, across all concurrent requests)
Transfer rate: 19320.07 [Kbytes/sec] received
...
Am i missing something? ... and Grails app can run much better?
PS: I tried profiling my running Grails app with VisualVM, but got endless loop of messages like ...
Profiler Agent: Redefining 100 classes at idx 0, out of total 413
...
Profiler Agent: Redefining 100 classes at idx 0, out of total 769
...
And finally app just stopped working after few mins - so, looks like profiling Grails is no the choice for good diagnose.
Update - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
First of all I have to admin, yes i need to RTFM - i.e. 'grails run-app' is not the correct way to run Grails for performance measurement. After compiling WAR and deploying it to Tomcat performance is not that awfully low - it is just low. The metrics below are for concurrency of 1 user (I just wanted to check what is MAX performance of the framework in one thread and with no heavy load) and while reading other related posts here i came to "http://stackoverflow.com/questions/819684/jsf-and-spring-performance-vs-poor-jsp-performance" and decided to check Apache Wicket mentioned there - its performance is also included.
Use case is:
- ab.exe -n 500 -c 1 _http://localhost:8080/...
- server is Tomcat7 in vFabric tcServer Dev edition with 'insight' running on background
---------------------- tcServer Plain Tomcat 7 -c 10
/Grails/book/create 77 req/sec 130 req/sec 410 req/sec
/jsf/backend/listing.jsp 133 req/sec 194 req/sec 395 req/sec
/wicket/library/ 870 req/sec 1400 req/sec 5300 req/sec
So ... anyway there is something wrong with Grails. I have made some profiling using tcServer (thanks Karthick) - it looks like it is able only to trace 'Spring-based' actions and internal stacktrace for Grails is like following (for 2 requests - note: metrics are not stable - i bet accuracy of tcServer far from beeing perfect, but can be used just for inforamtion)
Total (81ms)
Filter: urlMapping (32ms)
-> SimpleGrailsController#handleRequest (26ms)
-> Render view "/book/create" (4ms)
Render view "/layouts/main.gsp" (47ms)
Total (79ms)
Filter: urlMapping (56ms) ->
-> SimpleGrailsController#handleRequest (4ms)
-> Render view "/book/create" (38ms)
Render view "/layouts/main.gsp" (22ms)
PS: it may happen that the root cause for bad performance in Grails are underlying 'Spring' libs, will check this in more details.
Are you running it with run-app?
http://grails.org/Deployment states:
"Grails should never be deployed using the grails run-app command as this sets Grails up in "development" mode which has additional overheads. "
try deploying your sample app to tomcat. grails run-app is for development only.
in which environment did you start the app? prod? dev?
do you use scaffolding?
i've tried it on my machine (core i7-2600k). a login page with 4 input fields, dynamic layouts and some other things. i've got 525 requests per second in the slower dev environment.
Yeah this is a benchmark by someone who doesn't know alot about grails or his environment; first he is running on Windows know for being bad at resource management which is why most web serves/app serves run in Linux environments.
Second, if he is using 'ab' to benchmark, then he doesn't have his proxy cache setup because after the first hit, the remainder of the hits would be cached and the he is now benchmarking his cache from my understanding of his setup.
So this all just looks like the benchmarking of a bad setup and a poor understanding of Grails. No offense intended.

Varnish and ESI, how is the performance?

Im wondering how the performance of th ESI module is nowadays? I've read some posts on the web that ESI performance on varnish were actually slower than the real thing.
Say i had a page with over 3500 esi includes, how would this perform? is esi designed for such usage?
We're using Varnish and ESI to embed sub-documents into JSON documents. Basically a response from our app-server looks like this:
[
<esi:include src="/station/best_of_80s" />,
<esi:include src="/station/herrmerktradio" />,
<esi:include src="/station/bluesclub" />,
<esi:include src="/station/jazzloft" />,
<esi:include src="/station/jahfari" />,
<esi:include src="/station/maximix" />,
<esi:include src="/station/ondalatina" />,
<esi:include src="/station/deepgroove" />,
<esi:include src="/station/germanyfm" />,
<esi:include src="/station/alternativeworld" />
]
The included resources are complete and valid JSON responses on their own. The complete list of all stations is about 1070. So when the cache is cold and a complete station list is the first request varnish issues 1000 requests on our backend. When the cache is hot ab looks like this:
$ ab -c 100 -n 1000 http://127.0.0.1/stations
[...]
Document Path: /stations
Document Length: 2207910 bytes
Concurrency Level: 100
Time taken for tests: 10.075 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 2208412000 bytes
HTML transferred: 2207910000 bytes
Requests per second: 99.26 [#/sec] (mean)
Time per request: 1007.470 [ms] (mean)
Time per request: 10.075 [ms] (mean, across all concurrent requests)
Transfer rate: 214066.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 11 7.3 9 37
Processing: 466 971 97.4 951 1226
Waiting: 0 20 16.6 12 86
Total: 471 982 98.0 960 1230
Percentage of the requests served within a certain time (ms)
50% 960
66% 985
75% 986
80% 988
90% 1141
95% 1163
98% 1221
99% 1229
100% 1230 (longest request)
$
100 rec/sec doesn't look that good but consider the size of the document. 214066Kbytes/sec oversaturates a 1Gbit interface well.
A single request with warm cache ab (ab -c 1 -n 1 ...) shows 83ms/req.
The backend itself is redis based. We're measuring a mean response time of 0.9ms [sic] in NewRelic. After restarting Varnish the first request with a cold cache (ab -c 1 -n 1 ...) shows 3158ms/rec. This means it takes Varnish and our backend about 3ms per ESI include when generating the response. This is a standard core i7 pizza box with 8 cores. I measured this while being under full load. We're serving about 150mio req/month this way with a hitrate of 0.9. These numbers suggest indeed that the ESI-includes are resolved in serial.
What you have to consider when designing a system like this is 1) that your backend is able to take the load after a Varnish restart when the cache is cold and 2) that usually your resources don't expire all at once. In case of our stations they expire every full hour but we're adding a random value of up to 120 seconds to the expiration header.
Hope that helps.
This isn't first-hand, but I'm led to believe that Varnish's current ESI implementation serialises include requests; i.e., they're not concurrent.
If that's the case, it would indeed suck for performance in the case you mention.
I'll try to get someone with first-hand experience to comment.
Parallel ESI requests are available in the **commercial** version of varnish: https://www.varnish-software.com/plus/parallel-esi/. The parallel nature of the fragment requests apparently makes the assembly of a page comprised of multiple fragments faster.
(this would be a comment but I have insufficient reputation to do that)

Resources