How to diagnose why a web server with idle applications becomes unresponsive? - performance

I have a Digital Ocean droplet (512MB RAM, 20GB SSD Disk, Ubuntu 13.10 x64) on which
a MongoDB instance and
a Tomcat 7 server
run.
On the Tomcat server, following applications are installed
Apache CXF-based application, which takes processes web service requests, interacts with the database and executes scheduled jobs,
Vaadin application,
JSF (Primefaces) application and
Psi Probe.
When I
restart Tomcat,
use the Vaadin and/or JSF application,
then for several weeks do nothing on that machine (it basically is idle during that time),
then try to open the JSF and/or Vaadin application,
I find the site unresponsive (nothing is displayed after I enter the URL in the browser).
When I restart Tomcat (sudo service tomcat7 restart), everything works again. I don't see any obvious problems in the Tomcat logs.
How can I find out,
whether the problem is on the Tomcat side (one of the applications consumes too many resources even if idle) or on the OS side (nothing happens on the machine and therefore the OS puts itself into a "hibernating" mode) and
if the problem is with Tomcat, exactly which of the application is causing it?

Please start from top to bottom.
then try to open the JSF and/or Vaadin application,
I find the site unresponsive (nothing is displayed after I enter the
URL in the browser).
Check if the service is still running before restarting sudo service tomcat7 status and/or ps -ef | grep tomcat
Check with netstat -patune | grep <portnumber, e.g. 443> if the server is listening on the configured ports
Check your httpd/Apache/Tomcat access logs if the request reaches the server and if yes, check if there are errors or timeouts related to the requests
Check if the DB connection is still possible
To force some error logs, try to change your maxIdle, maxActive and maxWait attributes of your Tomcat's Connection Pool configuration. maxWait default is -1, connections created sometimes during these weeks will wait forever.

#Patrick provided some excellent basic tests.
I notice that you only have 512 MB of RAM. With some fairly beefy software such as tomcat, plus MongoDB on top of that, your machine may simply be overloaded.
Based on that, I would propose a couple additional things to check:
sudo free
Should tell you how much memory is being used, and how much swap space you use.
sudo top
Will tell you which process is using the most memory. You may want to sort the output of top by memory (default is usually by CPU utilization).
Most importantly, check the log files in /var/log (in particular /var/log/messages). You may find indications that the kernel killed one of your processes (probably tomcat).

Related

Spring Boot application not responding after pushing a large number of requests

I have a problem with a server that called server A:
Server A: Red Hat Enterprise Linux Server release 7.2 (Maipo)
Server B: Red Hat Enterprise Linux Server release 7.7 (Maipo)
jdk-8u231 installed on all of servers.
I have an Spring Boot application running on 2 servers.
Whenever i use Jmeter to send 100 concurrency request to application running on each servers, the application running on Server B have no problem.
But in Server A, the application will be not responding, that mean the Process (PID) still running but I can't visit actuator endpoint, cannot visit Swagger page, cannot send new request ... log file show nothing since that time.
Thread dump and heap dump have no significant difference.
Could anyone show me how to analysis that problem?
I still have no idea why the problem occur.
Well, I can only speculate here, but here some ideas that can help:
There are two possible sources of issue here Java Application and Linux (+its network policies, firewalls and so forth).
Since You don't know for sure, what happens, try working by "elimination".
Create a script that will run 100 concurrent requests. Place the script at the Server A (the problematic one) and run The script will run against "localhost" (obviously). If you see that it works, then the issue is not in Java at all. Probably some network policies or linux setup, who knows.
Place a log message in the controller of the java application and examine the log. The log should print the request number among other things, so that you'll be able to understand whether you get stuck after a well defined number of requests or its always a different number.
Check the configurations of Spring Boot application. Maybe there is a difference in the number of threads allocated to serve the request by the embedded web server that runs inside the spring boot application (assuming you're not using a reactive stack) and this number differs. In this case you won't be able to call rest endpoints, actuator, etc.
If JMX connection is available to the setup, connect via the JMX and check the MBean of Tomcat (again, assuming there is a tomcat under the hood) to check pretty much the same information as in 4.
You've mentioned thread dumps. Try to take more than one thread dump but one before you're running JMeter test, one during the running (when everything still works), one when everything is stuck.
In the thread dumps check the actual stacktraces, maybe all the threads are stuck working with Database or something and can't serve requests like I've explained in "4"
Examine GC logs, maybe GC works so hard that you can't really interact with the application.

AWS: EC2 micro, not enough for a .NET MVC 3 application?

I used elastic beanstalk to manage/deploy my .NET MVC 3 application on an EC2 micro instance (has 613 MB memory). It's mainly a static site for now as it is in Beta with registration (including email confirmation) and some error logging (ELMAH).
It was doing fine until recently, I keep getting notifications of CPU Utilization greater than 95.00%.
Is the micro instance with 613MB memory not enough to run an MVC application for Production use?
Added info: Windows Server 2008 R2, running IIS7.5
Thanks!
I've tried running Jetbrains teamcity (which uses Tomcat I think) and was on a linux box using an ec2 micro instance and there wasn't enough memory available to support what it needed.
I did try running a server 2008/2012 box on a micro instance as well and it was pointless took minutes to open anything.
I think you're going to find that running windows on one of those boxes isn't really a viable option unless you start disabling services like crazy and get really creative with you're tweaking.
A micro instance is clearly not enough for Production.
The micro instances have a low I/O limit, and once this limit is reached (for the month I think), all later operations are throttled.
So, I advise you to use at least a small instance for production. And keep your micro for your dev/test/preprod environments!
Edit: I got those info from an Amazon guy.
Make sure your load balancer is pinging a blank html file. I got that message because it was pinging my home page which had db loads. When I set it to ping a blank html file it ran smoothly

Web application very slow in Tomcat 7

I implemented a web application to start the Tomcat service works very quickly, but spending hours and when more users are entering is getting slow (up to 15 users approx.).
Checking RAM usage statistics (20%), CPU (25%)
Server Features:
RAM 8GB
Processor i7
Windows Server 2008 64bit
Tomcat 7
MySql 5.0
Struts2
-Xms1024m
-Xmx1024m
PermGen = 1024
MaxPernGen = 1024
I do not use Web server, we publish directly on Tomcat.
Entering midnight slowness is still maintained (only 1 user online)
The solution I have is to restart the Tomcat service and response time is again excellent.
Is there anyone who has experienced this issue? Any clue would be appreciated.
Not enough details provided. Need more information :(
Use htop or top to find memory and CPU usage per process & per thread.
CPU
A constant 25% CPU usage in a 4 cores system can indicate that a single-core application/thread is running 100% CPU on the only core it is able to use.
Which application is eating the CPU ?
Memory
20% memory is ~1.6GB. It is a bit more than I expect for an idle server running only tomcat + mysql. The -Xms1024 tells tomcat to preallocate 1GB memory so that explains it.
Change tomcat settings to -Xms512 and -Xmx2048. Watch tomcat memory usage while you throw some users at it. If it keeps growing until it reaches 2GB... then freezes, that can indicate a memory leak.
Disk
Use df -h to check disk usage. A full partition can make the issues you are experiencing.
Filesystem Size Used Avail Usage% Mounted on
/cygdrive/c 149G 149G 414M 100% /
(If you just discovered in this example that my laptop is running out of space. You're doing it right :D)
Logs
Logs are awesome. Yet they have a bad habit to fill up the disk. Check logs disk usage. Are logs being written/erased/rotated properly when new users connect ? Does erasing logs fix the issue ? (copy them somewhere for future analysis before you erase them)
If not. Logs are STILL awesome. They have the good habit to help you track bugs. Check tomcat logs. You may want to set logging level to debug. What happens last when the website die ? Any useful error message ? Do user connections are still received and accepted by tomcat ?
Application
I suppose that the 25% CPU goes to tomcat (and not mysql). Tomcat doesn't fail by itself. The application running on it must be failing. Try removing the application from tomcat (you can eventually put an hello world instead). Can tomcat keep working overnight without your application ? It probably can, in which case the fault is on the application.
Enable full debug logging in your application and try to track the issue. Run it straight from eclipse in debug mode and throw users at it. Does it fail consistently in the same way ?
If yes, hit "pause" in the eclipse debugger and check what the application is doing. Look at the piece of code each thread is currently running + its call stack. Repeat that a few times. If there is a deadlock, an infinite loop, or similar, you can find it this way.
You will have found the issue by now if you are lucky. If not, you're unfortunate and it's a tricky bug that might be deep inside the application. That can get tricky to trace. Determination will lead to success. Good luck =)
For performance related issue, we need to follow the given rules:
You can equalize and emphasize the size of xms and xmx for effectiveness.
-Xms2048m
-Xmx2048m
You can also enable the PermGen to be garbage collected.
-XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
If the page changes too frequently to make this option logical, try temporarily caching the dynamic content, so that it doesn't need to be regenerated over and over again. Any techniques you can use to cache work that's already been done instead of doing it again should be used - this is the key to achieving the best Tomcat performance.
If there any database related issue, then can follow sql query perfomance tuning
rotating the Catalina.out log file, without restarting Tomcat.
In details,There are two ways.
The first, which is more direct, is that you can rotate Catalina.out by adding a simple pipe to the log rotation tool of your choice in Catalina's startup shell script. This will look something like:
"$CATALINA_BASE"/logs/catalina.out WeaponOfChoice 2>&1 &
Simply replace "WeaponOfChoice" with your favorite log rotation tool.
The second way is less direct, but ultimately better. The best way to handle the rotation of Catalina.out is to make sure it never needs to rotate. Simply set the "swallowOutput" property to true for all Contexts in "server.xml".
This will route System.err and System.out to whatever Logging implementation you have configured, or JULI, if you haven't configured.
See more at: Tomcat Catalina Out
I experienced a very slow stock Tomcat dashboard on a clean Centos7 install and found the following cause and solution:
Slow start up times for Tomcat are often related to Java's
SecureRandom implementation. By default, it uses /dev/random as an
entropy source. This can be slow as it uses system events to gather
entropy (e.g. disk reads, key presses, etc). As the urandom manpage
states:
When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.
Source: https://www.digitalocean.com/community/questions/tomcat-8-5-9-restart-is-really-slow-on-my-centos-7-2-droplet
Fix it by adding the following configuration option to your tomcat.conf or (preferred) a custom file into /tomcat/conf/conf.d/:
JAVA_OPTS="-Djava.security.egd=file:/dev/./urandom"
We encountered a similar problem, the cause was "catalina.out". It is the standard destination log file for "System.out" and "System.err". It's size kept on increasing thus slowing things down and ultimately tomcat crashed. This problem was solved by rotating "catalina.out". We were using redhat so we made a shell script to rotate "catalina.out".
Here are some links:-
Mulesoft article on catalina (also contains two methods of rotating):
Tomcat Catalina Introduction
If "catalina.out" is not the problem then try this instead:-
Mulesoft article on optimizing tomcat:
Tuning Tomcat Performance For Optimum Speed
We had a problem, which looks similar to yours. Tomcat was slow to respond, but access log showed just milliseconds for answer. The problem was streaming responses. One of our services returned real-time data that user could subscribe to. EPOLL were becoming bloated. Network requests couldn't get to the Tomcat. And whats more interesting, CPU was mostly idle (since no one could ask server to do anything) and acceptor/poller threads were sitting in WAIT, not RUNNING or IN_NATIVE.
At the time we just limited amount of such requests and everything became normal.

How to prevent WebSphere from starting before files from an application update have been unpacked

Using the WebSphere Integrated Solutions Console, a large (18,400 file) web application is updated by specifying a war file name and going through the update screens and finally saving the configuration. The Solutions Console web UI spins a while, then it returns, at which point the user is able to start the web application.
If the application is started after the "successful update", it fails because the files that are the web application have not been exploded out to the deployment directory yet.
Experimentation indicates that it takes on the order of 12 minutes for the files to appear!
One more bit of background that may be significant: There are 19 application servers on this one WebSphere instance. WebSphere insists that there be a lot of chatter between them, even though they don't need anything from each other. I wondered if this might be slowing things down when it comes to deployment. Or if there's some timer in the bowels of WebSphere that is just set wrong (usual disclaimers apply...I'm just showing up and finding this situation...I didn't configure this installation).
Additional Information:
This is a Network Deployment configuration, and it's all on one physical host.
* ND 6.1.0.23
Is this a standalone or a ND set up? I am guessing it is ND set up considering you have stated that they are 19 app servers. The nodes should be synchronized with the deployment manager so that the updated files are available to the respective nodes.
After you update and save the changes, try and synchronize the nodes with the dmgr (or alternatively as part of the update process, click on review and the check the box which says synchronize nodes) and this would distribute the changes to the various nodes.
The default interval, i believe is 1 minute.
12 minutes certainly sounds a lot. Is there any possibility of network being an issue here?
HTH
Manglu

Jboss failover testing

I have a peculiar situation here. I have installed JBoss 5.1.0 as a service in Wintel box.
The service will restart itself if the JBoss instance fails.
However I could not find a way to test this scenario. I killed the JVM that was running the JBoss, but it did not restart the service. I need to make the JBoss service end abnormally so that I can ensure it is restarts again.
In a nutsehll, I need a way to make JBoss end abnormally.
Please help.
Write a JSP that calls System.exit(1)? That might fall foul of the Security Manager, though, and JBoss might not permit it.
In my experience, JBoss nodes (and app servers in general) tend not to crash in such a way as the process exists. Instead, they're more likely to consume increasing resources (e.g. memory) until they stop responding, and need explicit restarting. That's certainly easier to reproduce, but it's harder to handle automatically.

Resources