What would cause Tomcat (v8) to CPU spike with periodic regularity - performance

On a Windows 2012 R2 (x64) TEST server we are running a Tomcat 8 installation, and the CPU usage is disconcerting in how regularly it hits peak usage.
The behavior is happening after an installation of our application but before anyone is accessing it. I have accessed a few pages and tested some features but nothing that could create this behavior that I know of.
There are 2 virtual processors on the server and every ~20 seconds, the CPU usage will spike (on the one processor that is running Tomcat) to 100% for 10 seconds (give or take). See below:
The regularity of the pattern indicates to me that something is incorrect in either the installation or the settings of Tomcat 8.
I have installed the YourKit Java Profiler (by SO recommendation), which I was hoping could shed some light on what is causing these spikes, but I haven't been able to see why the threads are starting -- at least in part due to my newness to YourKit. I did attach it to the Tomcat launch file and it seems to be tracking the behavior.
The catalina logs are silent during the spiking occurrences (as are my application logs), but when I stopped Tomcat there were some messages about ThreadLocals that were created but could not be removed, and then: "...Threads are going to be renewed over time to try and avoid a probable memory leak."
I left the server running over the weekend and the pattern has continued until today so I don't think my symptoms are going away. Whatever is starting up has now consumed all available RAM on the system just from starting up these threads (and/or YourKit) every 20 seconds.
What is a possible approach to isolate this aberrant Tomcat activity and hopefully stop or rectify it?
There are many graphs and tabs in YourKit so I hesitate to list everything that might be helpful. Thanks for helping me narrow down the problem with what YourKit (or other tools) could offer me.
Info from catalina log regarding start-up:
Apache Tomcat/8.0.23
Architecture: amd64
Java Home: C:\Program Files\Java\jre1.8.0_65
CATALINA_BASE: C:\Program Files\Apache Software Foundation\Tomcat 8.0
2015-12-08 Update
At Gergely's request: the application is a local installation of DSpace. It's a Java application with a PostgreSQL database backend. We are customizing an open-source version of it from here: http://www.dspace.org/introducing. I'm not exactly sure what else can be helpful and I think the stack trace is more revealing as to what is (and isn't) running -- see below.
By turning on Stack Telemetry in YourKit, "CPU Estimation" was made available by dragging the cursor across a period of profiler history. To me, it looks like all the CPU is doing is spinning idly. Are the Java files pictured below Tomcat routines? They don't strike me as DSpace related (although I'm not an expert) nor does it look like any work is being done while the CPU is peaking.
Of note: the stack trace is identical during the quiet periods -- the only difference being CPU Time (ms) is in the hundreds rather than thousands of milliseconds. For a more direct comparison than what is below, the hump represents ~8,000 ms in Thread.run() and the quiet periods consume ~125 ms of cpu time (although covering approximately the same amount of time).
Lastly, when pages of the application are being requested, a subsequent branch of code appears in the Call Tree. If it happened during the time of a spike it may only take 400 ms of CPU time to load a whole page. The code branch that appears is ApplicationFilterChain.java as a whole separate branch alongside PooledExecutor$Worker.run() -- both underneath java.lang.Thread.run() in the hierarchy.
When trying to interpret the stack trace: Is EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run() responsible?
Processor spikes with no known, associated activity
2015-12-08 Update #2
YourKit comes pre-configured to hide certain java class name patterns which obscured drilling down on java.lang.Thread. Clearing the filters enabled the following screenshots showing that the vast majority of processing time during a spike event is through calling the following 3 methods:
java.io.WinNTFileSystem.canonicalize0
java.io.WinNTFileSystem.getBooleanAttributes (inFile.exists())
StandardRoot.java
My apologies for not yet knowing enough about Tomcat or DSpace to know who is launching these tasks. (In case it matters the line directly above the first line is java.lang.Thread.run() and then <All threads>)

Thank you to those who have viewed and responded to this inquiry. As various individuals have surmised, the problem was related to our settings and use of Tomcat -- not a problem with Tomcat itself (most likely).
This is an attempt to answer the question without perfect knowledge of installing the DSpace application and Tomcat, but I think I know enough to be dangerous and potentially helpful to follow-up users.
When installing DSpace, there are some properties in Tomcat's configuration directories that determine whether changes to code files are picked up immediately without a Tomcat restart. For us, these settings were in the directory [tomcat]/conf/Catalina/localhost/, where each of three files was a small XML file like this one (e.g. oai.xml):
<?xml version='1.0'?>
<Context docBase="E:/dspace/webapps/oai"
reloadable="false"
cachingAllowed="true"/>
You can find documentation on these properties at the following link:
https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace
Within that documentation is a recommendation about the reloadable and cachingAllowed properties. Search for "Tomcat Context Settings in Production". Here is an excerpt (emphasis mine):
These settings are extremely useful to have when you are first getting started with DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and see your changes get automatically reloaded by Tomcat (without having to restart Tomcat). However, it is worth noting that the Apache Tomcat documentation recommends Production sites leave the default values in place (reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all changes may result in "significant runtime overhead".
It is entirely up to you whether to keep these Tomcat settings in place. We just recommend beginning with them, so that you can more easily customize your site without having to require a Tomcat restart. Smaller DSpace sites may not notice any performance issues with keeping these settings in place in Production. Larger DSpace sites may wish to ensure that Tomcat performance is more streamlined.
When I switched these boolean flags to reloadable="false" and cachingAllowed="true" the spiked CPU experience stopped immediately. I don't know if the warning about "Larger sites" applies to us or whether "streamlined performance" could refer to the negative activity I observed.
I presume there may be other problems with our installation that allowed this particular manifestation; one ominous clue is that our production server seems to be operating with these flags in the reloadable="true" configuration. Java, Tomcat, Windows, AND DSpace are ALL getting new versions at the same time so it is fairly difficult to pinpoint why similar Tomcat <context> settings produce such different results.
I am at least content for now to have new behavior and that the system has calmed down. I'll post more if I learn more but will be focusing next on other quandaries.
Update
FWIW, the attributes are settings that directly control Tomcat and they have changed between versions. E.g., cachingAllowed was removed in version 8 which means it can be removed from the Context elements. Compare:
https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Attributes
https://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Attributes
And for good measure, here is the help text for reloadable in the Tomcat 8 documentation:
Set to true if you want Catalina to monitor classes in /WEB-INF/classes/ and /WEB-INF/lib for changes, and automatically reload the web application if a change is detected. This feature is very useful during application development, but it requires significant runtime overhead and is not recommended for use on deployed production applications. That's why the default setting for this attribute is false. You can use the Manager web application, however, to trigger reloads of deployed applications on demand.
So it would seem that the ultimate answer is that Tomcat 8 on Windows 2012-R2 with the flag reloadable='true' polls for changes to WEB-INF/lib and WEB-INF/classes. The volume of the folders and files to peruse may very well be the cause of these intense, spiked CPU events. For now I will be relying on reloadable='false' which definitely removes the symptom for us.
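To get a rough feel for how expensive one polling pass over a webapp directory could be, the following minimal sketch walks a directory tree and times the same java.io.File calls that dominated the profile (exists() and getCanonicalPath()). It is not Tomcat's reload code: the class ReloadScanCost and its methods are made up for illustration, and the default path is simply the docBase from the oai.xml example with WEB-INF appended.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tomcat: measures one pass of file checks.
class ReloadScanCost {

    /** Result of one pass: how many entries were checked and how long it took. */
    static final class ScanResult {
        final int entries;
        final long elapsedMillis;

        ScanResult(int entries, long elapsedMillis) {
            this.entries = entries;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Walks root recursively (root included) and calls exists() and getCanonicalPath() on each entry. */
    static ScanResult scanOnce(Path root) {
        try {
            List<Path> entries;
            try (Stream<Path> walk = Files.walk(root)) {
                entries = walk.collect(Collectors.toList());
            }
            long start = System.nanoTime();
            int checked = 0;
            for (Path p : entries) {
                File f = p.toFile();
                if (f.exists()) {
                    f.getCanonicalPath();
                    checked++;
                }
            }
            return new ScanResult(checked, (System.nanoTime() - start) / 1_000_000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default path; pass the directory to scan as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "E:/dspace/webapps/oai/WEB-INF");
        ScanResult r = scanOnce(root);
        System.out.println(r.entries + " entries checked in " + r.elapsedMillis + " ms");
    }
}
```

Running it against each webapp's WEB-INF directory and comparing the timings between the test and production servers might help explain why the same setting behaves so differently on the two machines.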

Not an explicit answer, but way too long for a comment
After reviewing the update on this question and reading a bit, I suspect that this recurring issue is caused by a CuratorTask. Reasons being:
The stacktrace you acquired clearly shows that a WorkerThread managed by the DSpace library (so Tomcat is not to be blamed) is using the processor at those times.
After reading a bit about DSpace itself, it looks like it has a feature that allows users to define curation tasks that should be executed periodically.
On top of this, there is at least one task that is, according to the documentation, activated by default, so theoretically there can be any number of tasks activated by default.
Moreover, this conversation reveals at least 1 curation task that is activated every 10 seconds.
All these together point to the same direction. I would suggest using the UI of DSpace (probably in Admin mode) to look around and find the active curation tasks and verify if their scheduling corresponds to what you have observed.
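One way to check which thread is actually consuming the CPU is to rank the live threads by accumulated CPU time. The sketch below uses the standard java.lang.management API; the class name BusyThreads is hypothetical. It only inspects the JVM it runs in, so to examine Tomcat it would have to run inside the webapp, or the same MXBean would have to be read over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: ranks the live threads of the current JVM by CPU time.
class BusyThreads {

    /** Returns one "threadName: N ms" line per live thread, busiest thread first. */
    static List<String> rankByCpuTime() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return out; // this JVM cannot report per-thread CPU time
        }
        List<long[]> rows = new ArrayList<>(); // each row is {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (cpuNanos >= 0) {
                rows.add(new long[] {id, cpuNanos});
            }
        }
        rows.sort((a, b) -> Long.compare(b[1], a[1]));
        for (long[] row : rows) {
            ThreadInfo info = mx.getThreadInfo(row[0]); // null if the thread ended meanwhile
            if (info != null) {
                out.add(info.getThreadName() + ": " + row[1] / 1_000_000 + " ms");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        rankByCpuTime().forEach(System.out::println);
    }
}
```

Taking two such snapshots, one during a quiet period and one during a spike, and comparing which thread names gained the most CPU time should show whether a DSpace worker or a Tomcat background thread is responsible.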

Related

How to investigate a web performance issue which is accumulated

Our web application is running on AWS with Ubuntu. We developed it on top of the Play Framework. Right after the application is deployed, it is pretty quick. However, after a day or so, it slows down significantly. I checked the OS resource usage; it seems normal and the OS is responsive. Only the web service is slow to respond to requests. I suspect there is a memory, thread pool, or other resource leak. Any suggestions on how to investigate it? I used the 'top' and 'ps' commands to look at current resource usage, but everything seems normal.
You may want to create a core dump and then take it to your dev computer and examine it. This is not the easiest way, but if you have limited access to the box it may be required.
Create a core dump
Analyze Core Dump File?
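Before resorting to a core dump, a lighter first step for a suspected memory or thread leak is to log heap usage and thread count from inside the JVM at intervals and see whether either one climbs steadily. This is a minimal sketch using the standard java.lang.management API; the class name JvmTrend and the one-second interval are assumptions, and in a real service the sample() call would be scheduled from within the application itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints a compact snapshot of heap use and thread count.
class JvmTrend {

    /** Returns one line with the wall-clock time, used heap in MB and live thread count. */
    static String sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return String.format("%tT heapUsedMB=%d threads=%d",
                System.currentTimeMillis(), heap.getUsed() / (1024 * 1024), threads);
    }

    public static void main(String[] args) throws InterruptedException {
        // Five samples, one second apart; a long-running service would log far less often.
        for (int i = 0; i < 5; i++) {
            System.out.println(sample());
            Thread.sleep(1000);
        }
    }
}
```

A thread count that only ever grows points towards a thread pool leak, while a heap figure that keeps rising even after garbage collections points towards a memory leak.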

NetBeans goes very slow

I'm using NetBeans IDE 6.9.1. I have a web application in JSP using Spring version 3.0.2 and Hibernate Tools 3.2.1.GA. Slowly and gradually, it has been growing in size yet it's not a very big application though I have added many external class libraries as and when required like HibernateValidator.
The performance is degraded and it takes a considerable amount of time in building the application. When changes are saved, many a times, the application is deployed infinitely/endlessly with the auto-deploy feature of NetBeans. It never ends and I have to restart the IDE and the procedure begins all over again from scratch. Sometimes the application is stopped automatically and I have to restart the Tomcat server (6.0.26) because mostly an attempt to restart the application doesn't succeed.
Many a times (every half an hour or so), the application ends with following exception.
java.lang.OutOfMemoryError: PermGen Space
and I have to restart the system itself!
While working with JPA along with EJB and JSF as a front-end (GlassFish Server 3), it often wasn't the case even with heavily loaded applications with the same version of the NetBeans IDE and exactly the same platform, if I remember correctly.
Are there some ways to improve the performance?
Try overriding the JVM options for more memory if you can:
export JAVA_OPTS="-Xms64m -Xmx512m -XX:PermSize=128m -XX:MaxPermSize=756m"
Here http://www.unidata.ucar.edu/projects/THREDDS/tech/tds4.2/reference/JavaOptsSummary.html you can find a bit more about the JAVA_OPTS parameters.
I understand you are using NetBeans. A simple solution would be to go to Tools -> Servers -> (select the server, in your case Tomcat) -> Platform, then paste these settings into the VM Options field:
-Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m
that would solve your problem. good luck!
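To confirm that such options actually took effect, the memory pools of the running JVM can be listed with the standard java.lang.management API. The sketch below is only an illustration (the class name MemoryPools is made up), and it reports on the JVM it runs in, so it has to be executed inside the server whose options were changed. Pool names depend on the JVM and garbage collector; note that from Java 8 onwards PermGen no longer exists (Metaspace takes its place) and the PermSize flags are ignored.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists each memory pool with its current and maximum size.
class MemoryPools {

    private static final long MB = 1024 * 1024;

    /** Returns one line per memory pool: name, used MB and max MB (or "undefined"). */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when no maximum is defined
            String maxText = max < 0 ? "undefined" : (max / MB) + " MB";
            lines.add(pool.getName() + ": used " + (usage.getUsed() / MB) + " MB, max " + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```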

Deployment from NetBeans to Glassfish is very slow

I have a project with a few dozen EJBs and a web project that I'm attempting to deploy from NetBeans 7.0.1 on my laptop directly to Glassfish 3.0.1 on a Solaris 10 server. Ignoring the transfer time of copying the ear file, the deployments seem to take a very long time (3 minutes is the fastest I've seen it). The performance of deployments seems to degrade over time, to the point where eventually I have to restart my domain. I've seen a deployment take anywhere from 12-20 minutes after I've redeployed my application a few times.
I deploy by right-clicking my main project in NetBeans and picking "Deploy". What options do I have for making this more usable? What additional information can I provide to help track down the source of the problem?
UPDATE: Letting the most recent deployment run through to completion, it ended with the following error message in my log:
[#|2011-08-20T14:05:54.494-0400|SEVERE|glassfish3.1|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=2490;_ThreadName=Thread-1;|Exception while loading the app : EJB Container initialization error
java.lang.OutOfMemoryError: Java heap space
|#]
So this does appear to be memory related. The deployment itself ran for over 10 minutes before dying in this manner.
Because of my application's requirements, I had to increase the heap space from the default 512MB allocation to a min/max of 1GB/2GB. This seems to have improved deployment slightly. My typical deployment time is ~1 minute now. It's not stellar, but it's at least tolerable.
This is the result of a serious bug in the weld-integration module of Glassfish. Without this bug the deployment is more than 20!! times faster than before.
http://java.net/jira/browse/GLASSFISH-18875
Please vote to get this fixed as soon as possible!

Program runs slow on just a couple of computers

I have a program that I run on multiple network PCs. When I compiled the most recent version, it runs extremely slowly on 2 PCs on the network, but runs fine for everyone else.
This used to happen with my old dev PC when I had an additional 2gb RAM installed. When I would remove the additional 2gb and recompile, it would then work fine for everyone.
Now, I am on a completely new machine and am having the same issue. I've tried to rebuild the project after rebooting, but still have the same issue.
For all other PCs, the program loads in about 3-5 seconds. On these 2 PCs, it takes anywhere from 45 seconds to 1.5 mins to load...
One of the PCs is an older Dell Dimension 8200, but the other is a newer OptiPlex that is identical to several other PCs on the network, so this is what is really making it so confusing.
For now, I've had to revert to the old version so it will run correctly for everyone.
Does anyone have any idea of anything to try?
Thanks in advance!!!
Edit:
Ok, it was an exhausting day yesterday trying various things to solve this issue. Here is what I tried and where the problem begins:
Using the new program
Went back to old versions of all updated components, but still had the same issue
Using the old program
I decided to go back to the drawing board and start from the old version of the application and incrementally add the new features a small piece at a time.
Recompiled the old version using the old components - program works fine
Updated to new DevExpress components - program works fine
Updated to new ESBPCS components - program works fine
Updated to new DeepSoftware components - program works fine
Ok, so now we know there is nothing with the component sets I've updated...
Added 1 image to each of 2 image lists - program works fine
Added new database table - program works fine
Added code to open and close the new table - program works fine
Added new action to action list and added a menu item and toolbar button to new action (action does nothing at this point) - program works fine
Added a new BLANK form to the application and added code to open the new form - BAM!!!
So, adding just one form to the application is what's causing the issue! I removed all the code for the opening of the form, commented out the uses clauses and removed the uses entry from the project source and everything is back to normal!
Anybody have any idea about this?
Thanks!
Edit 2:
For #Warren P - here is my .DPR source:
program Scheduler;
uses
ExceptionLog,
Forms,
SchedulerMainUnit in 'SchedulerMainUnit.pas' {FrmMain},
SchedulerDBInfoUnit in 'SchedulerDBInfoUnit.pas' {FrmDBInfo},
SchedulerHistoryUnit in 'SchedulerHistoryUnit.pas' {FrmHistory},
SchedulerOptionsUnit in 'SchedulerOptionsUnit.pas' {FrmOptions},
SchedulerExtVersionUnit in 'SchedulerExtVersionUnit.pas' {FrmExtVersion},
SchedulerSplashUnit in 'SchedulerSplashUnit.pas' {FrmSplash},
SchedulerInfoUnit in 'SchedulerInfoUnit.pas' {FrmInfo},
SchedulerShippedUnit in 'SchedulerShippedUnit.pas' {FrmShipped}; {<-- This is the new form with the issue}
{$R *.res}
begin
Application.Initialize;
Application.Title := 'SmartWool WIP Scheduling Assistant';
Application.CreateForm(TFrmMain, FrmMain);
Application.CreateForm(TFrmDBInfo, FrmDBInfo);
Application.CreateForm(TFrmHistory, FrmHistory);
Application.CreateForm(TFrmOptions, FrmOptions);
Application.CreateForm(TFrmExtVersion, FrmExtVersion);
Application.Run;
end.
And here is the initialization section of the main form to create the splash:
initialization
FrmSplash:=TFrmSplash.Create(Application);
FrmSplash.Show;
FrmSplash.Refresh;
Edit 3:
Anybody??? Please?
It could be that the program is waiting for timeouts when trying to access resources that are not available on that machine such as network drives or Internet hosts.
Try running Process Monitor when starting up your program and look for file open calls. Filter the output so it only shows your process.
http://technet.microsoft.com/en-us/sysinternals/bb896645
Performance problems can seem very daunting at first.
I have been on many teams where people have tried to guess at a reason for performance problems. This sometimes works, but is far less effective than actually measuring the code.
When reproducible on a development machine, I would recommend a profiler.
There was a previous question that asked about Delphi Profiling tools, which lists several possible tools you could use.
When you can't reproduce the problem on your development machine, then it becomes a bit more difficult, but not impossible. Typically I have found that problems are related to an application dependency that is different, and not performing well. Understanding the external influences on your application can help pinpoint the problem.
Specifically, common external problems in some of my applications are:
Network
Database
Application Servers
Installation or Data File Location (i.e. Disk Performance)
Virus and Malware Scanners
Other applications interfering with yours, such as a virus.
To monitor for items related to the network (i.e. database, web services, etc.), I typically use Wireshark, which allows me to see whether resources are responding in the expected times. My most common problem is poorly performing DNS, which can be found using Wireshark.
You can use the AutoRuns program to determine everything that starts up when your computer does; it's useful in determining differences between machines.
But most of all I have logging that can be turned on in my applications and this allows me to isolate the problem to a specific area of code. This narrowing down to a specific section of code reduces the guessing, and allows you to focus on a few possible problems.
I created a log function for this that I call at specific places (in your case especially during startup). It adds a timestamp to each log text and stores them in a TMemo that is regularly saved. Not only very helpful when debugging, but may also shed some light on your problem.
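The timestamped logging described above is not specific to any language. As a minimal sketch of the idea, written in Java purely for illustration (the class StartupLog and the step labels are hypothetical; a Delphi version would follow the same structure), each mark records the time since start and since the previous mark, which makes a slow step stand out immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects timestamped marks during application startup.
class StartupLog {

    private final long startNanos = System.nanoTime();
    private long lastNanos = startNanos;
    private final List<String> lines = new ArrayList<>();

    /** Records a label with the time since start and since the previous mark. */
    void mark(String label) {
        long now = System.nanoTime();
        lines.add(String.format("+%d ms (step %d ms) %s",
                (now - startNanos) / 1_000_000, (now - lastNanos) / 1_000_000, label));
        lastNanos = now;
    }

    /** Returns the recorded lines in the order they were marked. */
    List<String> lines() {
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        StartupLog log = new StartupLog();
        log.mark("main form created");
        Thread.sleep(50); // stand-in for a slow initialization step
        log.mark("database opened");
        log.lines().forEach(System.out::println);
    }
}
```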
Are you using code signing - ie Microsoft Authenticode? If so, then outdated certificate authorities on the computers can cause significant delays to startup.
First, I would try to defragment the hard disk. If it is still slow, I would check the power supply. Maybe your hard disks are getting insufficient power.
Check if there is the same antivirus software on those 2 problematic computers. If so, then your Delphi application may match byte pattern used in some virus made in Delphi. Update virus definitions to solve it, or report false alarm to antivirus company, or change antivirus software.
Check whether there is any printer installed on those 2 problematic computers. If there is none, add any printer and try again.
Idea 1:
One reason I have seen for very slow application load time, is when printing or reporting system components like Developer Express Express Print, are in your application.
The problem I saw when using Developer Express Printing components, is that I had an offline or non-responsive network printer in my list of printers (check the control panel printer icon) that was not responding. Some of those Developer Express components seem to read some information from each printer you have installed, and the solution was to go to those clients, and delete old printers from their control panel, that were no longer being used. Each not-responding network printer added up to 60 seconds for a TCP Timeout, to the startup time of my application.
Update - Idea 2:
Download MS DebugView and install it on the machine that runs slowly. Now go back to your main development PC, open the IDE, and open your main project file (right-click on the project, then view project source in the project viewer); this will show you the contents of your main project source file (.dpr). Go to the main begin...end block. Set a breakpoint on the main begin statement and single-step INTO (not OVER), and you will see all the module initialization sections. In each one add this: OutputDebugString('ModuleName').
Now when you run this inside the Delphi Ide you will see messages, and see how far apart they come in, and understand what is taking a long time to initialize. Instead of installing the delphi ide onto the machine that runs slowly, Debug View (which is less than 400kb single executable) will be run, and it will show you these debug messages, along with a nice time display (##.# seconds) for each message.
MS Debug view is here.
Are you allowing the forms to be constructed on initialization within the DPR source? If so, you may do well to consider whether you want those forms sucking up memory the entire time, and moreover whether you want them wasting the application's time on load.
A rule of thumb: If the form is used a LOT during the application's execution, allow it to be constructed when the application loads (this will work out faster over-all than constructing the instance "on-demand").
If the form is not used very often at all (for example, a Dialog or an About Box), delete the "Application.CreateForm" line from the DPR source, and instead construct your instance on request...
var
  LForm: TfrmAbout;
begin
  // CreateForm takes the form class first, then the variable that receives the instance
  Application.CreateForm(TfrmAbout, LForm);
  try
    LForm.ShowModal;
  finally
    LForm.Free;
  end;
end;
Now that form (which may not even be displayed during the program's execution) is not sucking up system resources, and will not slow down the application's load time.
It may not solve your problem 100%, but it should certainly help!

Application error: fault address 0x00012afb (Expert)

I need some "light" to get a solution. Probably there are tons of things that cause this problem, but maybe somebody could help me.
Scenario: a Windows server running 24/7 with a PostgreSQL database and other server applications (for processing tasks on the database, etc.). There are different server setups (~30), with different hardware and Windows versions (XP SP3, Windows Server, etc., all NT-based). All applications were written in Delphi 7 and link to DLLs (also written in D7).
After some days (sometimes a week, sometimes a couple of months), Windows begins to act strangely: the Start menu does not open and some buttons are missing in dialogs. Soon some applications do not open, raising an event in the Event Viewer:
Faulting application x, version y, faulting module kernel32.dll, version 5.1.2600.5781, fault address 0x00012afb
Meanwhile, other applications such as notepad and iexplore open fine, but SOME of my applications don't, with only the event log entry described above. But if we do not restart the system, in a few days even cmd.exe (and every other application) stops opening, with the same error in the event log.
I've tried to find out what can cause this, but with no success. Any advice will be welcome.
Thanks in advance.
I think you are running out of resource handles (window handles). You can verify this by looking at the system properties in Sysinternals Process Explorer (a better task manager). I think even the default Task Manager can display a handle count. Then you can identify which application is causing the trouble.
Once you know which application is leaking, and if it is yours, you can use Rational Purify or BoundsChecker to drill down to the problem. If you do not have money for these tools, you will have to narrow down the problem manually, for example by deactivating some features and seeing whether the handle count still increases...
I'm not sure this is the problem you are experiencing; maybe it is completely unrelated. But it is easy to check. The clue is that some app seems to be exhausting a global resource, since you experience trouble with other applications too. Applications like notepad do not use many resources, so they appear to work fine; heavy apps are more likely to show the trouble.
Hope it helps.

Resources