I'm running a stress test on a websocket server to measure how many clients it can serve simultaneously and on what depends that number.
The server implementation I'm using is pywebsocket, the extension for apache server.
Apparently, this creates a new thread for every new client.
The problem is I can only go up to 378 clients, always the same number (and pretty low), and for the next one I receive the following trace:
[2013-08-22 07:47:09,454] [ERROR] __main__.WebSocketServer: Exception in processing request from: ('::ffff:10.36.154.147', 41509, 0, 0)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 594, in process_request
t.start()
File "/usr/lib/python2.7/threading.py", line 495, in start
_start_new_thread(self.__bootstrap, ())
**error: can't start new thread**
I really don't know where this limit might come from, it seems to low to be the number of maximum threads for the process, which I just set to unlimited, or the maximum number of processes for the user, also now set to unlimited.
I also checked the apache2 configuration files and this is what I have in apache2.conf, should be enough:
MaxKeepAliveRequests 0
KeepAliveTimeout 5
<IfModule mpm_prefork_module>
StartServers 50
ServerLimit 2000
MinSpareServers 50
MaxSpareServers 2000
MaxClients 2000
MaxRequestsPerChild 2000
</IfModule>
<IfModule mpm_worker_module>
StartServers 50
ServerLimit 2000
MinSpareThreads 50
MaxSpareThreads 2000
ThreadLimit 0
ThreadsPerChild 2000
MaxClients 2000
MaxRequestsPerChild 2000
</IfModule>
<IfModule mpm_event_module>
StartServers 50
ServerLimit 2000
MinSpareThreads 50
MaxSpareThreads 2000
ThreadLimit 0
ThreadsPerChild 2000
MaxClients 2000
MaxRequestsPerChild 2000
</IfModule>
The server is an Amazon EC2 t1.micro instance with ubuntu.
What else can be causing this limit?
Try reducing ulimit -s to a much lower value than unlimited/default for whatever piece of code will create many threads, and make sure /proc/sys/kernel/threads-max is not lower then six figures.
Related
When i run this command on my VPS:
netstat -n|grep :80|cut -c 45-|cut -f 1 -d ':'|sort|uniq -c|sort -nr|more
i get this result:
207 222.73.144.194
89 191.96.249.54
58 191.96.249.53
21 2400
15 51.255.64.23
6 143.137.103.251
3 103.27.72.36
1 89.180.150.168
1 66.102.7.137
1 5.189.170.167
1 191.181.39.208
1 183.2.246.218
I think this command is showing the number of connections per IP to port 80.
Is this a DDoS atack?
Have you check another aspect (eg. cpu load, network throughput to specific ip(s) using iftop or iptraf)? If there's normal, maybe it's just web/http scanner to your web.
If you're use nginx, you can use limit_conn module, to queue the rest of ip(s) if didn't obey your policy.
If we want tu run some benchmark, the following
/ab.exe -n 500 -c 300 -v 1000 -k http://server:port/test.html
lead to an "connection refused":
Test aborted after 10 failures
apr_socket_connect(): Es konnte keine Verbindung hergestellt werden, da der Zielcomputer die Verbindung verweigerte. (730061)
It is possible to deactive the "limit"? Which limit is applied? We see nothing in the Windows event log.
If we call less connections it works:
./ab.exe -n 500 -c 257 -k http://server:port/test.html
Does anyone knows ho to find the corresponding limit? And mabe how to deactivate, because we have not enough hosts to use different IPs for the request.
The solution is to uncomment the MPM config include in httpd.conf
# Server-pool management (MPM specific)
Include conf/extra/httpd-mpm.conf
Then locate the mpm_winnt_module and change the thread count from the current value to whatever you want.
<IfModule mpm_winnt_module>
ThreadsPerChild 1000
MaxRequestsPerChild 0
</IfModule>
Using gwan_linux64-bit.tar.bz2 under Ubuntu 12.04 LTS unpacking and running gwan
then pointing wrk at it (using a null file null.html)
wrk --timeout 10 -t 2 -c 100 -d20s http://127.0.0.1:8080/null.html
Running 20s test # http://127.0.0.1:8080/null.html
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.65s 5.10s 13.89s 83.91%
Req/Sec 3.33k 3.65k 12.33k 75.19%
125067 requests in 20.01s, 32.08MB read
Socket errors: connect 0, read 37, write 0, timeout 49
Requests/sec: 6251.46
Transfer/sec: 1.60MB
.. very poor performance, in fact there seems to be some kind of huge latency issue.
During the test gwan is 200% busy and wrk is 67% busy.
Pointing at nginx, wrk is 200% busy and nginx is 45% busy:
wrk --timeout 10 -t 2 -c 100 -d20s http://127.0.0.1/null.html
Thread Stats Avg Stdev Max +/- Stdev
Latency 371.81us 134.05us 24.04ms 91.26%
Req/Sec 72.75k 7.38k 109.22k 68.21%
2740883 requests in 20.00s, 540.95MB read
Requests/sec: 137046.70
Transfer/sec: 27.05MB
Pointing weighttpd at nginx gives even faster results:
/usr/local/bin/weighttp -k -n 2000000 -c 500 -t 3 http://127.0.0.1/null.html
weighttp - a lightweight and simple webserver benchmarking tool
starting benchmark...
spawning thread #1: 167 concurrent requests, 666667 total requests
spawning thread #2: 167 concurrent requests, 666667 total requests
spawning thread #3: 166 concurrent requests, 666666 total requests
progress: 9% done
progress: 19% done
progress: 29% done
progress: 39% done
progress: 49% done
progress: 59% done
progress: 69% done
progress: 79% done
progress: 89% done
progress: 99% done
finished in 7 sec, 13 millisec and 293 microsec, 285172 req/s, 57633 kbyte/s
requests: 2000000 total, 2000000 started, 2000000 done, 2000000 succeeded, 0 failed, 0 errored
status codes: 2000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 413901205 bytes total, 413901205 bytes http, 0 bytes data
The server is a virtual 8 core dedicated server (bare metal), under KVM
Where do I start looking to identify the problem gwan is having on this platform ?
I have tested lighttpd, nginx and node.js on this same OS, and the results are all as one would expect. The server has been tuned in the usual way with expanded ephemeral ports, increased ulimits, adjusted time wait recycling etc.
Nov. 7 UPDATE: We have fixed the empty-file issue in G-WAN v4.11.7 and G-WAN is now twice faster (with the www cache disabled) than Nginx at this game too.
Recent releases of G-WAN are faster than Nginx with small and large files, and the G-WAN caches are disabled by default in order to make it easier for people to compare G-WAN with other servers like Nginx.
Nginx has a few caching features (a fd cahe to skip stat() calls and a memcached-based module) but both are necessarily much slower than G-WAN's local cache.
Disabling caching was also desirable for certain applications like CDNs. Other applications like AJAX applications greatly benefit from G-WAN caching capabilities so caching can be re-enabled at will, even on a per-request basis.
Hope this clarifies this question.
"reproducing the performance claims"
First, the title is misleading as the poorly documented* test above does not use the same tools nor the HTTP resources fetched by G-WAN tests.
[*] where is your nginx.conf file? what are the HTTP response headers of the two servers? what is your "bare metal" 8-Core CPU?
G-WAN tests are based on ab.c, a wrapper written by the G-WAN Team for weighttp (a test tool made by the Lighttpd server Team) because the information disclosed by ab.c is much more informative.
Second, the tested file "null.html" is... an empty file.
We won't waste time to discuss the irrelevance of such a test (how many empty HTML files your Web site is serving?) but it is likely to be the reason of the observed "poor performance".
G-WAN was not created to serve empty files (and we never tried nor ewre ever asked to do this). But we will surely add this feature to avoid the confusion created by such a test.
As you want to "check the claims" I would encourage you to use weighttp (the fastest HTTP load tool in your test) with a 100.bin file (a 100-byte file with an uncompressible MIME type: no Gzip will be involved here).
With a non-null file Nginx is massively slower than G-WAN, even in independent tests.
We did not know about wrk so far but it seems to be a tool made by the Nginx team:
"wrk was written specifically to try and push nginx to it's limits,
and in it's first round of tests was pushed up to 0.5Mr/s."
UPDATE (a day later)
Since you did not bother to publish any more data, we did it:
wrk weighttp
----------------------- -----------------------
Web Server 0.html RPS 100.html RPS 0.html RPS 100.html RPS
---------- ---------- ------------ ---------- ------------
G-WAN 80,783.03 649,367.11 175,515 717,813
Nginx 198,800.93 179,939.40 184,046 199,075
Like in your test, we can see that wrk is slightly slower than weighttp.
We can also see that G-WAN is faster than Nginx with both HTTP load tools.
Here are the detailled results:
G-WAN
./wrk -c300 -d3 -t6 "http://127.0.0.1:8080/0.html"
Running 3s test # http://127.0.0.1:8080/0.html
6 threads and 300 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.87ms 5.30ms 80.97ms 99.53%
Req/Sec 14.73k 1.60k 16.33k 94.67%
248455 requests in 3.08s, 55.68MB read
Socket errors: connect 0, read 248448, write 0, timeout 0
Requests/sec: 80783.03
Transfer/sec: 18.10MB
./wrk -c300 -d3 -t6 "http://127.0.0.1:8080/100.html"
Running 3s test # http://127.0.0.1:8080/100.html
6 threads and 300 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 263.15us 381.82us 16.50ms 99.60%
Req/Sec 115.55k 14.38k 154.55k 82.70%
1946700 requests in 3.00s, 655.35MB read
Requests/sec: 649367.11
Transfer/sec: 218.61MB
weighttp -kn300000 -c300 -t6 "http://127.0.0.1:8080/0.html"
progress: 100% done
finished in 1 sec, 709 millisec and 252 microsec, 175515 req/s, 20159 kbyte/s
requests: 300000 total, 300000 started, 300000 done, 150147 succeeded, 149853 failed, 0 errored
status codes: 150147 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 35284545 bytes total, 35284545 bytes http, 0 bytes data
weighttp -kn300000 -c300 -t6 "http://127.0.0.1:8080/100.html"
progress: 100% done
finished in 0 sec, 417 millisec and 935 microsec, 717813 req/s, 247449 kbyte/s
requests: 300000 total, 300000 started, 300000 done, 300000 succeeded, 0 failed, 0 errored
status codes: 300000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 105900000 bytes total, 75900000 bytes http, 30000000 bytes data
Nginx
./wrk -c300 -d3 -t6 "http://127.0.0.1:8080/100.html"
Running 3s test # http://127.0.0.1:8080/100.html
6 threads and 300 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.54ms 1.16ms 11.67ms 72.91%
Req/Sec 34.47k 6.02k 56.31k 70.65%
539743 requests in 3.00s, 180.42MB read
Requests/sec: 179939.40
Transfer/sec: 60.15MB
./wrk -c300 -d3 -t6 "http://127.0.0.1:8080/0.html"
Running 3s test # http://127.0.0.1:8080/0.html
6 threads and 300 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.44ms 1.15ms 9.37ms 75.93%
Req/Sec 38.16k 8.57k 62.20k 69.98%
596070 requests in 3.00s, 140.69MB read
Requests/sec: 198800.93
Transfer/sec: 46.92MB
weighttp -kn300000 -c300 -t6 "http://127.0.0.1:8080/0.html"
progress: 100% done
finished in 1 sec, 630 millisec and 19 microsec, 184046 req/s, 44484 kbyte/s
requests: 300000 total, 300000 started, 300000 done, 300000 succeeded, 0 failed, 0 errored
status codes: 300000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 74250375 bytes total, 74250375 bytes http, 0 bytes data
weighttp -kn300000 -c300 -t6 "http://127.0.0.1:8080/100.html"
progress: 100% done
finished in 1 sec, 506 millisec and 968 microsec, 199075 req/s, 68140 kbyte/s
requests: 300000 total, 300000 started, 300000 done, 300000 succeeded, 0 failed, 0 errored
status codes: 300000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 105150400 bytes total, 75150400 bytes http, 30000000 bytes data
Nginx configuration file trying to match G-WAN's behavior
# ./configure --without-http_charset_module --without-http_ssi_module
# --without-http_userid_module --without-http_rewrite_module
# --without-http_limit_zone_module --without-http_limit_req_module
user www-data;
worker_processes 6;
worker_rlimit_nofile 500000;
pid /var/run/nginx.pid;
events {
# tried other values up to 100000 without better results
worker_connections 4096;
# multi_accept on; seems to be slower
multi_accept off;
use epoll;
}
http {
charset utf-8; # HTTP "Content-Type:" header
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 10;
keepalive_requests 10; # 1000+ slows-down nginx enormously...
types_hash_max_size 2048;
include /usr/local/nginx/conf/mime.types;
default_type application/octet-stream;
gzip off; # adjust for your tests
gzip_min_length 500;
gzip_vary on; # HTTP "Vary: Accept-Encoding" header
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
# cache metadata (file time, size, existence, etc) to prevent syscalls
# this does not cache file contents. It should helps in benchmarks where
# a limited number of files is accessed more often than others (this is
# our case as we serve one single file fetched repeatedly)
# THIS IS ACTUALY SLOWING-DOWN THE TEST...
#
# open_file_cache max=1000 inactive=20s;
# open_file_cache_errors on;
# open_file_cache_min_uses 2;
# open_file_cache_valid 300s;
server {
listen 127.0.0.1:8080;
access_log off;
# only log critical errors
#error_log /usr/local/nginx/logs/error.log crit;
error_log /dev/null crit;
location / {
root /usr/local/nginx/html;
index index.html;
}
location = /nop.gif {
empty_gif;
}
location /imgs {
autoindex on;
}
}
}
Comments are welcome - especially from Nginx experts - to have a discussion based on this fully-documented test.
I am running centos 5.5 with 768mb ram. i keep getting server reached MaxClients setting, consider raising the MaxClients setting in the logs also apache runs really slow. when i look at cacti graphs it shows the server is not even using all the resources.. here is the current configuration
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 10
ServerLimit 1024
MaxClients 768
MaxRequestsPerChild 4000
</IfModule>
<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
free -m
total used free shared buffers cached
Mem: 768 352 415 0 0 37
-/+ buffers/cache: 315 452
Swap: 0 0 0
top - 11:03:54 up 41 days, 11:53, 1 user, load average: 0.05, 0.03, 0.00
Tasks: 35 total, 1 running, 34 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 786432k total, 389744k used, 396688k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 38284k cached
I have tried the following but the server responds very slowly
<IfModule worker.c>
#StartServers 2
#MaxClients 150
#MinSpareThreads 25
#MaxSpareThreads 75
#ThreadsPerChild 25
#MaxRequestsPerChild 0
StartServers 20
MaxClients 1024
ServerLimit 1024
MinSpareThreads 128
MaxSpareThreads 768
ThreadsPerChild 64
MaxRequestsPerChild 0
</IfModule>
free -m
total used free shared buffers cached
Mem: 768 324 443 0 0 37
-/+ buffers/cache: 286 481
Swap: 0 0 0
#regilero
I have updated to
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
MaxClients 50
MaxRequestsPerChild 300
</IfModule>
using top i see
Tasks: 36 total, 1 running, 35 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 786432k total, 613180k used, 173252k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 76488k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 10364 92 60 S 0.0 0.0 1:09.53 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd/808
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper/808
124 root 16 -4 12620 8 4 S 0.0 0.0 0:00.00 udevd
533 root 20 0 95504 5692 228 S 0.0 0.7 4:02.94 memcached
546 root 20 0 5924 332 276 S 0.0 0.0 6:54.51 syslogd
557 root 20 0 101m 1456 868 S 0.0 0.2 13:18.64 snmpd
570 root 20 0 62640 316 208 S 0.0 0.0 2:39.56 sshd
579 root 20 0 21656 24 20 S 0.0 0.0 0:00.00 xinetd
589 root 20 0 12072 12 8 S 0.0 0.0 0:00.05 mysqld_safe
940 mysql 20 0 559m 164m 3832 S 0.3 21.5 209:33.88 mysqld
1015 root 20 0 20880 200 132 S 0.0 0.0 0:10.48 crond
1023 root 20 0 46748 4 0 S 0.0 0.0 0:00.00 saslauthd
1024 root 20 0 46748 4 0 S 0.0 0.0 0:00.00 saslauthd
3605 root 20 0 62832 2168 636 S 0.0 0.3 0:02.58 sendmail
3613 smmsp 20 0 57712 1648 504 S 0.0 0.2 0:00.01 sendmail
17610 root 20 0 85932 3312 2600 S 0.0 0.4 0:00.02 sshd
17612 mcmap 20 0 86072 1760 1012 S 0.0 0.2 0:00.17 sshd
17613 mcmap 20 0 12076 1656 1292 S 0.0 0.2 0:00.01 bash
17637 root 20 0 45052 1432 1120 S 0.0 0.2 0:00.00 su
17638 root 20 0 12180 1800 1324 S 0.0 0.2 0:00.08 bash
17740 root 20 0 246m 9264 4516 S 0.0 1.2 0:00.19 httpd
18264 apache 20 0 282m 43m 4940 S 0.0 5.7 0:00.56 httpd
18514 apache 20 0 279m 40m 4832 S 0.0 5.3 0:01.47 httpd
18518 apache 20 0 273m 36m 4396 S 0.0 4.7 0:00.45 httpd
18528 apache 20 0 251m 13m 3660 S 0.0 1.8 0:00.41 httpd
18529 apache 20 0 278m 40m 4340 S 0.0 5.3 0:00.99 httpd
18530 apache 20 0 278m 40m 4268 S 0.0 5.3 0:00.67 httpd
18548 apache 20 0 272m 33m 3516 S 0.0 4.4 0:00.28 httpd
18552 apache 20 0 280m 42m 3684 S 0.0 5.5 0:00.48 httpd
18553 apache 20 0 271m 33m 3768 S 0.0 4.3 0:00.45 httpd
18555 apache 20 0 274m 36m 3672 S 0.0 4.7 0:00.58 httpd
18572 apache 20 0 247m 9020 2856 S 0.0 1.1 0:00.01 httpd
18578 apache 20 0 280m 42m 3684 S 0.0 5.6 0:00.76 httpd
18589 apache 20 0 246m 5452 676 S 0.0 0.7 0:00.00 httpd
18588 root 20 0 12624 1216 932 R 0.0 0.2 0:00.06
free -m
total used free shared buffers cached
Mem: 768 578 189 0 0 74
-/+ buffers/cache: 504 263
Swap: 0 0 0
Just added current picture of cacti result last 4 hours. busy periods are monday tuesday. So i will wait till next week to see further results of the config change. but it looks like an improvement as before i only had max 10 threads available. Looking at this do you think i can make more improvment?
free -m
total used free shared buffers cached
Mem: 768 619 148 0 0 49
-/+ buffers/cache: 570 197
Swap: 0 0 0
NEW TEST
On a 2GB Ram VPS box i have now set prefork to
StartServers 20
MinSpareServers 20
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
today morning my memcache server died from
Nov 20 09:28:40 vps22899094 kernel: Out of memory: Kill process 12517 (memcached) score 81 or sacrifice child
Nov 20 09:28:40 vps22899094 kernel: Killed process 12517, UID 497, (memcached) total-vm:565252kB, anon-rss:42940kB, file-rss:44kB
What should the optimal values be to set in apache?
#/etc/sysconfig/memcached
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="1024"
OPTIONS="-l 127.0.0.1"
/etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
bind-address=127.0.0.1
#script
thread_concurrency=2
query_cache_size = 16M
query_cache_type=1
query_cache_limit=5M
# MyISAM #
#key-buffer-size = 32M
#myisam-recover = FORCE,BACKUP
# SAFETY #
#max-allowed-packet = 16M
#max-connect-errors = 1000000
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
#query-cache-type = 0
#query-cache-size = 0
max-connections = 50
thread-cache-size = 16
#open-files-limit = 65535
#table-definition-cache = 1024
#table-open-cache = 2048
# INNODB #
#innodb-flush-method = O_DIRECT
#innodb-log-files-in-group = 2
#innodb-log-file-size = 5M
#innodb-flush-log-at-trx-commit = 1
#innodb-file-per-table = 1
#innodb-buffer-pool-size = 921M
# LOGGING #
log-error = /var/log/mysqld.log
log-queries-not-using-indexes = 1
slow-query-log = 1
slow-query-log-file = /var/log/mysqld-slow.log
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
When you use Apache with mod_php apache is enforced in prefork mode, and not worker. As, even if php5 is known to support multi-thread, it is also known that some php5 libraries are not behaving very well in multithreaded environments (so you would have a locale call on one thread altering locale on other php threads, for example).
So, if php is not running in cgi way like with php-fpm you have mod_php inside apache and apache in prefork mode.
On your tests you have simply commented the prefork settings and increased the worker settings, what you now have is default values for prefork settings and some altered values for the shared ones :
StartServers 20
MinSpareServers 5
MaxSpareServers 10
MaxClients 1024
MaxRequestsPerChild 0
This means you ask apache to start with 20 process, but you tell it that, if there is more than 10 process doing nothing it should reduce this number of children, to stay between 5 and 10 process available. The increase/decrease speed of apache is 1 per minute. So soon you will fall back to the classical situation where you have a fairly low number of free available apache processes (average 2). The average is low because usually you have something like 5 available process, but as soon as the traffic grows they're all used, so there's no process available as apache is very slow in creating new forks. This is certainly increased by the fact your PHP requests seems to be quite long, they do not finish early and the apache forks are not released soon enough to treat another request.
See on the last graphic the small amount of green before the red peak? If you could graph this on a 1 minute basis instead of 5 minutes you would see that this green amount was not big enough to take the incoming traffic without any error message.
Now you set 1024 MaxClients. I guess the cacti graph are not taken after this configuration modification, because with such modification, when no more process are available, apache would continue to fork new children, with a limit of 1024 busy children. Take something like 20MB of RAM per child (or maybe you have a big memory_limit in PHP and allows something like 64MB or 256MB and theses PHP requests are really using more RAM), maybe a DB server... your server is now slowing down because you have only 768MB of RAM. Maybe when apache is trying to initiate the first 20 children you already reach the available RAM limit.
So. a classical way of handling that is to check the amount of memory used by an apache fork (make some top commands while it is running), then find how many parallel request you can handle with this amount of RAM (that mean parallel apache children in prefork mode). Let's say it's 12, for example. Put this number in apache mpm settings this way:
<IfModule prefork.c>
StartServers 12
MinSpareServers 12
MaxSpareServers 12
MaxClients 12
MaxRequestsPerChild 300
</IfModule>
That means you do not move the number of fork while traffic increase or decrease, because you always want to use all the RAM and be ready for traffic peaks. The 300 means you recyclate each fork after 300 requests, it's better than 0, it means you will not have potential memory leaks issues. MaxClients is set to 12 25 or 50 which is more than 12 to handle the ListenBacklog queue, which can enqueue some requests, you may take a bigger queue, but you would get some timeouts maybe (removed this strange sentende, I can't remember why I said that, if more than 12 requests are incoming the next one will be pushed in the Backlog queue, but you should set MaxClient to your targeted number of processes).
And yes, that means you cannot handle more than 12 parallel requests.
If you want to handle more requests:
buy some more RAM
try to use apache in worker mode, but remove mod_php and use php as a parallel daemon with his own pooler settings (this is called php-fpm), connect it with fastcgi. Note that you will certainly need to buy some RAM to allow a big number of parallel php-fpm process, but maybe less than with mod_php
Reduce the time spent in your php process. From your cacti graphs you have to potential problems: a real traffic peak around 11:25-11:30 or some php code getting very slow. Fast requests will reduce the number of parallel requests.
If your problem is really traffic peaks, solutions could be available with caches, like a proxy-cache server. If the problem is a random slowness in PHP then... it's an application problem, do you do some HTTP query to another site from PHP, for example?
And finally, as stated by #Jan Vlcinsky you could try nginx, where php will only be available as php-fpm. If you cannot buy RAM and must handle a big traffic that's definitively desserve a test.
Update: About internal dummy connections (if it's your problem, but maybe not).
Check this link and this previous answer. This is 'normal', but if you do not have a simple virtualhost theses requests are maybe hitting your main heavy application, generating slow http queries and preventing regular users to acces your apache processes. They are generated on graceful reload or children managment.
If you do not have a simple basic "It works" default Virtualhost prevent theses requests on your application by some rewrites:
RewriteCond %{HTTP_USER_AGENT} ^.*internal\ dummy\ connection.*$ [NC]
RewriteRule .* - [F,L]
Update:
Having only one Virtualhost does not protect you from internal dummy connections, it is worst, you are sure now that theses connections are made on your unique Virtualhost. So you should really avoid side effects on your application by using the rewrite rules.
Reading your cacti graphics, it seems your apache is not in prefork mode bug in worker mode. Run httpd -l or apache2 -l on debian, and check if you have worker.c or prefork.c. If you are in worker mode you may encounter some PHP problems in your application, but you should check the worker settings, here is an example:
<IfModule worker.c>
StartServers 3
MaxClients 500
MinSpareThreads 75
MaxSpareThreads 250
ThreadsPerChild 25
MaxRequestsPerChild 300
</IfModule>
You start 3 processes, each containing 25 threads (so 3*25=75 parallel requests available by default), you allow 75 threads doing nothing, as soon as one thread is used a new process is forked, adding 25 more threads. And when you have more than 250 threads doing nothing (10 processes) some process are killed. You must adjust theses settings with your memory. Here you allow 500 parallel process (that's 20 process of 25 threads). Your usage is maybe more:
<IfModule worker.c>
StartServers 2
MaxClients 250
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 25
MaxRequestsPerChild 300
</IfModule>
Did you consider using nginx (or other event based web server) instead of apache?
nginx shall allow higher number of connections and consume much less resources (as it is event based and does not create separate process per connection). Anyway, you will need some processes, doing real work (like WSGI servers or so) and if they stay on the same server as the front end web server, you only shift the performance problem to a bit different place.
Latest apache version shall allow similar solution (configure it in event based manner), but this is not my area of expertise.
Here's an approach that could resolve your problem, and if not would help with troubleshooting.
Create a second Apache virtual server identical to the current one
Send all "normal" user traffic to the original virtual server
Send special or long-running traffic to the new virtual server
Special or long-running traffic could be report-generation, maintenance ops or anything else you don't expect to complete in <<1 second. This can happen serving APIs, not just web pages.
If your resource utilization is low but you still exceed MaxClients, the most likely answer is you have new connections arriving faster than they can be serviced. Putting any slow operations on a second virtual server will help prove if this is the case. Use the Apache access logs to quantify the effect.
I recommend to use bellow formula suggested on Apache:
MaxClients = (total RAM - RAM for OS - RAM for external programs) / (RAM per httpd process)
Find my script here which is running on Rhel 6.7. you can made change according to your OS.
#!/bin/bash
echo "HostName=`hostname`"
#Formula
#MaxClients . (RAM - size_all_other_processes)/(size_apache_process)
total_httpd_processes_size=`ps -ylC httpd --sort:rss | awk '{ sum += $9 } END { print sum }'`
#echo "total_httpd_processes_size=$total_httpd_processes_size"
total_http_processes_count=`ps -ylC httpd --sort:rss | wc -l`
echo "total_http_processes_count=$total_http_processes_count"
AVG_httpd_process_size=$(expr $total_httpd_processes_size / $total_http_processes_count)
echo "AVG_httpd_process_size=$AVG_httpd_process_size"
total_httpd_process_size_MB=$(expr $AVG_httpd_process_size / 1024)
echo "total_httpd_process_size_MB=$total_httpd_process_size_MB"
total_pttpd_used_size=$(expr $total_httpd_processes_size / 1024)
echo "total_pttpd_used_size=$total_pttpd_used_size"
total_RAM_size=`free -m |grep Mem |awk '{print $2}'`
echo "total_RAM_size=$total_RAM_size"
total_used_size=`free -m |grep Mem |awk '{print $3}'`
echo "total_used_size=$total_used_size"
size_all_other_processes=$(expr $total_used_size - $total_pttpd_used_size)
echo "size_all_other_processes=$size_all_other_processes"
remaining_memory=$(($total_RAM_size - $size_all_other_processes))
echo "remaining_memory=$remaining_memory"
MaxClients=$((($total_RAM_size - $size_all_other_processes) / $total_httpd_process_size_MB))
echo "MaxClients=$MaxClients"
exit
I am playing around with concurrency in Ruby (1.9.3-p0), and have created a very simple, I/O-heavy proxy task. First, I tried the non-blocking approach:
require 'rack'
require 'rack/fiber_pool'
require 'em-http'
require 'em-synchrony'
require 'em-synchrony/em-http'
proxy = lambda {|*|
result = EM::Synchrony.sync EventMachine::HttpRequest.new('http://google.com').get
[200, {}, [result.response]]
}
use Rack::FiberPool, :size => 1000
run proxy
=begin
$ thin -p 3000 -e production -R rack-synchrony.ru start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.602 seconds
HTML transferred: 21900 bytes
Requests per second: 17.85 [#/sec] (mean)
Time per request: 5602.174 [ms] (mean)
=end
Hmm, I thought I must be doing something wrong. An average request time of 5.6s for a task where we are mostly waiting for I/O? I tried another one:
require 'sinatra'
require 'sinatra/synchrony'
require 'em-synchrony/em-http'
get '/' do
EM::HttpRequest.new("http://google.com").get.response
end
=begin
$ ruby sinatra-synchrony.rb -p 3000 -e production
== Sinatra/1.3.1 has taken the stage on 3000 for production with backup from Thin
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.476 seconds
HTML transferred: 21900 bytes
Requests per second: 18.26 [#/sec] (mean)
Time per request: 5475.756 [ms] (mean)
=end
Hmm, a little better, but not what I would call a success. Finally, I tried a threaded implementation:
require 'rack'
require 'excon'
proxy = lambda {|*|
result = Excon.get('http://google.com')
[200, {}, [result.body]]
}
run proxy
=begin
$ thin -p 3000 -e production -R rack-threaded.ru --threaded --no-epoll start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 2.014 seconds
HTML transferred: 21900 bytes
Requests per second: 49.65 [#/sec] (mean)
Time per request: 2014.005 [ms] (mean)
=end
That was really, really surprising. Am I missing something here? Why is EM performing so badly here? Is there some tuning I need to do? I tried various combinations (Unicorn, several Rainbows configurations, etc), but none of them came even close to the simple, old I/O-blocking threading.
Ideas, comments and - obviously - suggestions for better implementations are very welcome.
See how your "Time per request" exactly equals total "Time taken for tests"? This is a reporting arithmetic artifact due to your request count (-n) being equal to your concurrency level (-c). The mean-time is the total-time*concurrency/num-requests. So the reported mean when -n == -c will be the time of the longest request. You should conduct your ab runs with -n > -c by several factors to get reasonable measures.
You seem to be using an old version of ab as a relatively current one reports far more detailed results by default. Running directly against google I show similar total-time == mean time when -n == -c, and get more reasonable numbers when -n > -c. You really want to look at the req/sec, mean across all concurrent requests, and the final service level breakdown to get a better understanding.
$ ab -c50 -n50 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient).....done
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.023 seconds <<== note same as below
Complete requests: 50
Failed requests: 0
Write errors: 0
Non-2xx responses: 50
Total transferred: 27000 bytes
HTML transferred: 10950 bytes
Requests per second: 2220.05 [#/sec] (mean)
Time per request: 22.522 [ms] (mean) <<== note same as above
Time per request: 0.450 [ms] (mean, across all concurrent requests)
Transfer rate: 1170.73 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 2 0.6 3 3
Processing: 8 9 2.1 9 19
Waiting: 8 9 2.1 9 19
Total: 11 12 2.1 11 22
WARNING: The median and mean for the initial connection time are not within a normal deviation
These results are probably not that reliable.
Percentage of the requests served within a certain time (ms)
50% 11
66% 12
75% 12
80% 12
90% 12
95% 12
98% 22
99% 22
100% 22 (longest request) <<== note same as total and mean above
$ ab -c50 -n500 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.110 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Non-2xx responses: 500
Total transferred: 270000 bytes
HTML transferred: 109500 bytes
Requests per second: 4554.31 [#/sec] (mean)
Time per request: 10.979 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 2401.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 1 0.7 1 3
Processing: 8 9 0.7 9 13
Waiting: 8 9 0.7 9 13
Total: 9 10 1.3 10 16
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 12
90% 12
95% 13
98% 14
99% 15
100% 16 (longest request)