JBoss CPU utilization issue - performance

I am using JBoss AS 4.2.3 along with the seam framework. My CPU usage increases as the number of users increase and it hits 99% for just 80 users. We also use Hibernate, EJB3 and Apache with mod_jk for loadbalancing.
When I took the thread dump all the runnable threads are doing a same activity with the following trace:
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:1012)
at org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:1091)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:384)
at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:366)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446)
at java.lang.Thread.run(Thread.java:662)
I am not able to interpret this with the stack trace. Also I find that even when the users have logged out, the CPU utilization still continues to be the same with threads in the same state.

These threads are attempting to read from a Socket connection. In this case they are waiting for the next request to be sent to the server from mod_jk in Apache. This is quite normal and they probably are not the reason for your CPU usage.
At this point you really need to go and run your application through a profiler.
If you are unable to run a profiler on the system (i.e. it's a production box) the next best thing is to start to take many stack dumps each a couple of seconds apart and then go though them by hand matching up the thread IDs. You need to look for the threads that are running your code and don't seem to have changed between dumps.
It is a very tedious task and doesn't always get clear results, but without a profiler or some sort of instrumentation you won't be able to find where all that CPU is going.

Review your AJP configuration between Apache and Jboss, as described in https://developer.jboss.org/wiki/OptimalModjk12Configuration
The problem
JBoss Web's (Tomcat) server.xml AJP snippet:
<Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3"
emptySessionPath="true" enableLookups="false" redirectPort="8443" ></Connector> Apache's httpd.conf:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
</IfModule>
The above configuration, under load, may cause mod_jk to be very slow
and unresponsive, cause http errors, and cause half-closed
connections. These problems can arise because there are no
connection timeouts specified to take care of orphaned connections, no
error handling properties defined in workers.properties, and no
connection limits set in Apache and Tomcat.
But this high number of threads could be from another source. As described here:
the most common scenario for a hanging Socket.read() is a high
processing time or unhealthy state of your remote service provider.
This means that you will need to communicate with the service provider
support team right away in order to confirm if they are facing some
slowdown condition on their system.
Your applicaton server Threads should be released once the remote
service provider system problem is resolved but quite often you will
need to restart your server instances (Java VM) to clear all the
hanging Threads; especially if you are lacking proper timeout
implementation.
Other less common causes include:
Huge response data causing increased elapsed time to read / consume the Socket Inputstream e.g. such as very large XML data. This can be
proven easily by analysing the size of the response data
Network latency causing increased elapsed time in data transfer from the service provider to your Java EE production system. This can
be proven by running some network sniffer between your production
server and the service provider and determine any major lag/latency
problem
Whatever was your problem, the first thing to do is review your timeout configuration!
What you can do?
You need to do some configuration for Jboss and Apache.
JBoss side
The main concern with server.xml is setting the connectionTimeout
which sets the SO_TIMEOUT of the underlying socket. So when a connection in
Tomcat hasn't had a request in the amount of time specified by
connectionTimeout, then the connection dies off. This is necessary
because if the connection hasn't been used for a certain period of
time then there is the chance that it is half-close on the mod_jk end.
If the connection isn't closed there will be an inflation of threads
which can over time hit the maxThreads count in Tomcat then Tomcat will
not be able to accept any new connections. A connectionTimeout of
600000 (10 minutes) is a good number to start out with. There may be
a situation where the connections are not being recycled fast enough,
in this instance the connectionTimeout could be lowered to 60000 or 1
minute.
When setting connectionTimeout in Tomcat, mod_jk should also have
connect_timeout/prepost_timeout set, which allows detection that the
Tomcat connection has been closed and preventing a retry request.
The recommended value of maxThreads is 200 per CPU, so here we assume
the server is a single core machine. If it has been quad core, we
could push that value to 800, and more depending on RAM and other
machine specs.
<Connector port="8009"
address="${jboss.bind.address}"
emptySessionPath="true"
enableLookups="false"
redirectPort="8443"
protocol="AJP/1.3"
maxThreads="200"
connectionTimeout="600000"></Connector>
Apache side
worker.properties file
See comments inline.
worker.list=loadbalancer,status
worker.template.port=8009
worker.template.type=ajp13
worker.template.lbfactor=1
#ping_timeout was introduced in 1.2.27
worker.template.ping_timeout=1000
#ping_mode was introduced in 1.2.27, if not
#using 1.2.27 please specify connect_timeout=10000
#and prepost_timeout=10000 as an alternative
worker.template.ping_mode=A
worker.template.socket_timeout=10
#It is not necessary to specify connection_pool_timeout if you are running the worker mpm
worker.template.connection_pool_timeout=600
#Referencing the template worker properties makes the workers.properties shorter and more concise
worker.node1.reference=worker.template
worker.node1.host=192.168.1.2
worker.node2.reference=worker.template
worker.node2.host=192.168.1.3
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=node1,node2
worker.loadbalancer.sticky_session=True
worker.status.type=status
The key points in the above workers.properties is we've added limits
for the connections mod_jk makes. With the base configuration, socket
timeouts default to infinite. The other important properties are
ping_mode and ping_timeout which handle probing a connection for
errors and connection_pool_timeout which must be set to equal
server.xml's connectionTimeout when using the prefork mpm. When these
two values are the same, after a connection has been inactive for x
amount of time, the connection in mod_jk and Tomcat will be closed at
the same time, preventing a half-closed connection.
Apache configuration
Make note that maxThreads for the AJP connection should coincide with
the MaxClients set in Apache's httpd.conf. MaxClients needs to be set
in the correct module in Apache.
This can be determined by running httpd -V:
# httpd -V
Server version: Apache/2.2.3
Server built: Sep 11 2006 09:43:05
Server's Module Magic Number: 20051115:3
Server loaded: APR 1.2.7, APR-Util 1.2.8
Compiled using: APR 1.2.7, APR-Util 1.2.7
Architecture: 32-bit
Server MPM: Prefork
threaded: no
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT="/etc/httpd"
-D SUEXEC_BIN="/usr/sbin/suexec"
-D DEFAULT_PIDLOG="logs/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="logs/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"
Which tells me the Server MPM is Prefork. This is not always 100%
accurate so you should also view the output of /etc/sysconfig/httpd to
see if the following line is there: HTTPD=/usr/sbin/httpd.worker. If
it is commented out you are running prefork, otherwise if uncommented
worker.
httpd.conf:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
MaxClients 200
MaxRequestsPerChild 0
</IfModule>
Or if Apache is using worker, it is
<IfModule worker.c>
StartServers 2
MaxClients 200
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
MaxRequestsPerChild is 0, this is the recommended value when using
mod_jk as mod_jk keeps open persistent connections. The key values in
the above configuration are MaxClients and MaxRequestsPerChild, the
rest of the values are left as default. Note that MaxRequestsPerChild
is recommended to be 0 however the value may need to be greater than 0
depending on if Apache is used for other modules also, especially in
the case of resource leakage.
In the link you can find another configuration to optimize even more this scenario.

Related

apache configuration on windows server for high traffic

hi there i am trying to deploy laravel project on my deticated server which has two processors about 32 core and 128 giga ram
i am using apache longue as web server and using mpm winnt (this is the allowed mpm for windows
my problem is when i testing using jmeter i get about 90% of requests quicly but the rest requests take too long time to response and i can not understand why ??
it seems like those requests hold in queue but i really dont no why
my winnt conf is
<IfModule mpm_winnt_module>
ThreadsPerChild 1920
MaxConnectionsPerChild 1000000
</IfModule>
i am trying to get 2000 request per second

Jmeter throwing "socketexception: connection reset" during the tests. But no error can be seen on server log [duplicate]

The target for us to achieve is 500 concurrent users.
We have tried running a test for 100 users over 3 machines. And it ran fine without any errors.
When i tried running the test for 150 or More users with same number of machines, i started getting the following response code
Response code:Non Http Response code:java.net.socketException
Response message:Connection Reset
I have also tried increasing the number of machines to 8 machines. Still it is of no help. Response time is also very high (156 seconds) for some of the requests.
When we checked the server logs to find out what could be causing this issue, No error logs were found there during the time of the execution.
I'm having a hard time finding out what could be the issue. The server side is ruling out if there could be an issue from their end.
Tried the following fixes from Jmeter side:
Increasing the heap size
Changing the retry count in user.properties file
Changing Boolean=True in hc.parameters file
Used HTTP Request Defaulters to change the implementation to HTTPClient4
CPU Config:
Intel (R) Xeon(R) CPU E5-2690 v3 # 2.60 GHz (2 Processors)
5 GB Ram
64-bit Operating System
The Connection Reset error means failed attempt to write to the socket which has been closed already, on TCP protocol level it means receiving a TCP RST
It might be the case JMeter is closing the connection prematurely as JMeter 5.0 had httpclient4.time_to_live property set to 2000 and if you're seeing response times > 2 seconds (and you do) most probably JMeter is closing the connection before getting the full response.
You can try increasing this setting to 60000 matches modern browsers default settings or even more to match your application response time (if you think 3 minutes is acceptable) or consider upgrading to JMeter 5.3 which has better default value.
More information: Bug 64289

TCPListener in golang: error when number of connections is above 60 / 260

I am building TCP Proxy: client <-> proxy <-> Vertica
I have a net.TCPListener, which takes incoming requests by AcceptTCP() and creating connections, then, making connection to destination socket by net.DialTCP("tcp", nil, raddr). Looks like a bridge. Default proxy model.
Firstly, at first version, i have a trouble: if i have 59 parallel incoming request, everything is fine. But if i have one more (60), i have a trouble: 1-59 connections are OK, but 60 and newer are fault. I cant catch error properly. Looks like some socket unexpectedly closes
Secondly, i tried to set queue for listener. It helps me a lot: but if i have more than 258 requests, i get error again.
My question: is there any limit of connections in net package? May be it is system limitation?
For external info: Vertica running in docker container, hw/system: macbook, vertica limit connection pool: 5, but pool logic implemented into proxy.
I also tried set "raw" proxy without pool logic (thats why i set queue for listener: i must not exceed threshold of Vertica User's pool), result is 258 requests..
UPDATED: (05.04.2020)
Looks like it is system limitations fault. Did I mention anywhere that I trying to run the whole system on one PC?
So, what I had:
300 parallel processes as requests (making by multiprocessing.Pool
Python) (300 sockets)
Listener that creates 300 connections (once
more 300 sockets)
And series of rapidly creating/closing sockets in
deep of proxy (according to queue and Vertica pool)
What I have now:
300 python requests making from another PC in my local network (on Windows)
Proxy works fine
But I have several errors on Windows PC, which creating requests to my proxy. Errors like low memory in "swap file".
I still need to make some stress test for proxy. Adding less memory for swap file didn't solve my problem on Windows PC. I will be grateful for any suggestions and ideas. Thanks!
How does the proxy connect to Vertica?
There is by default a maximum of 50 ordinary mortal users to be connected to one Vertica node at any one time. The superuser "dbadmin" always has 5 connections in addition to that.
So if I try to connect 60 times as dbadmin, I get this on a default Vertica configuration:
Connection attempt failed: FATAL 4060: New session rejected due to limit, already 55 sessions active
You can increase the Vertica config item MaxClientSessions from its default of 50 per node.
Command is : ALTER NODE <_node_name_> SET MaxClientSessions = 100, for example.
I suppose you are always connecting to the same Vertica node, and that you have set ConnectionLoadBalancing to FALSE. So you always connect to the same node, and soon reach the default maximum of 50.
Hope that's the reason found ....

Why the SpringBoot website refuses clients' connection after several minutes of the Jmeter load test begins?

It is a SpringBoot website and deployed in one Linux server. We use Jmeter to do the load test.
We mock 500 users to visit the webiste index page simultaneously. The index page is very simple html, no database connection,so it is a quite short connection.
After about 2 minutes, Jmeter starts to throw timeout exception as bleow
I guess this is because of website reaching its capacity and running out of connection.
I get one quesiton here, why does website reach its capacity 2 minutes later after Jemter starts. If its TCP connection capacity for this website is 1000, I guess it will reach 1000 very soon after the Jmeter starts, not 2 minutes.
Besides, I see many TCP connections are in TIME_WAIT status in Linux server. I guess this may be related with the connection timeout?
Edit: Someone thinks it is running of port. Someone thinks it is running out of connection. And someone thinks it is running out of processing thread(eg. What does this messge java.net.ConnectException/Connection timed out mean in log.jtl file of Jmeter?). I don't know which one is the exact reason...
Most probably this is due to underlying Linux TCP/IP kernel stack configuration, as per Linux TCP/IP tuning for scalability article:
By default, a connection is supposed to stay in the TIME_WAIT state for twice the msl. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem (the full details of this are beyond the scope of this article, but ask me if you’d like details). The default msl is 60 seconds, which puts the default TIME_WAIT timeout value at 2 minutes. Which means you’ll run out of available ports if you receive more than about 400 requests a second, or if we look back to how nginx does proxies, this actually translates to 200 requests per second. Not good for scaling.
SO double check timeouts along with maximum number of ports/sockets/files on the Linux server - my expectation is that the aforementioned parameters need to be tuned for high loads.
It's also a good practice to have monitoring of baseline OS health metrics in place (CPU, RAM, Network, Disk, swap usage, etc.). You can use i.e. JMeter PerfMon Plugin or JMeter SSHMon Listener for this.

Slow Apache response

I have a high performance softlayer server. I am only running a (php-based. It's not an IRC server) chat room on this server. It works all fine. On average server response (for chat room) is 100MS with 100+ concurrent users. Some days ago a user threat to ddos our server. Now the server is so slow. On average ping time is 1500-2000MS with just 50-60 users. There is no high resource usage or bandwidth usage. I did following things to protect my server:
1 - DDOS protection (softlayer providers it)
2 - Install mod qos and evassive for appache
3 - Disabled ping of death and Syn packets
I performed following analysis:
1 - Analyzed apache logs. There isn't any frequent request from same IP or CLRF packets.
2 - Not many UDP packets
3 - Checked connections per IP and they are all normal.
However, nothing is working. That user threats and kills our time whenever he says/wants. Is there any other thing I should look into to protect my server? What kind of attack he could make to do this?
My guess is going to be they are exhausting your apache workers (usually a default of 150), you might want to check to see how many apache threads are currently running, and if its ~150 that might be why you have slow response times.
Some good reading on apache performance tuning.
http://httpd.apache.org/docs/2.2/misc/perf-tuning.html
http://www.monitis.com/blog/2011/07/05/25-apache-performance-tuning-tips/
https://www.devside.net/articles/apache-performance-tuning
The output from the following commands might also be useful in figuring out whats going on.
See whats running
ps auxf
See what apache is doing by turning on server-status (http://httpd.apache.org/docs/2.2/mod/mod_status.html)
apachectl fullstatus
See whats going on with network connections
netstat -npl
Anyway, I hope that helps point you in the right direction.

Resources