JDK7 Application is getting slow after some Uptime - performance

We have a large JDK7 application deployed on JBoss using several libraries like Hibernate, Spring and so on. After initial startup of the server, the application runs as expected but after some uptime it becomes very slow.
Using a profiler, we have seen that every time certain aspects of out application are slowing down but not always the same aspects. While in one run it might be that hibernate flush slows to a crawl, in another run it might be some DI-code from Spring.
What's going on there?

There is a bug in JDK7 regarding the CodeCache memory area which hit us very, very hard.
Explanation
Basically Java starts up and uses just in time compilation (JIT) to compile just the required parts of the bytecode during runtime. This enables the JVM to de- and recompile certain code fragments during execution. This happend, if the JVM determins, that an initial compilation of a certain code fragment is suboptimal. Oracle introduced a feature named tiered compilation in JDK 7 which allows the VM to do just that.
Compiled code in the JVM is stored in the CodeCache memory area. Up to JDK6 the default was that this area would be filled up and once at a 100% the JIT would stop compiling and an error would be printed to the console, however the application would be running same as before: Everything already compiled would stay compiled, everything not yet compiled would be executed in interpretation mode (which is roughly 100x slower)
This option is named CodeCacheFlushing, it is enabled by default since JDK7u4. The idea is, that once CodeCache is full, the least used parts of compiled code are flushed from memory to make room for other code fragments. That would make the JDK6-default-behaviour (to stop compilation all in all) obsolete. It also allowed for a much smaller CodeCache area (in JDK7 CodeCache is 48M by default/96M if tiered compilation is enabled).
Here comes the bug. In JDK7 once the CodeCache gets full, the JIT is stopped. Next comes the flushing of the CodeCache area. That's it. JIT should be reenabled after flushing is completed but that doesn't happen. Also, there is no warning printed to the console. Worse: prior to disabling the JIT roughly half of the already compiled code is thrown out.
In contrast to JDK6 where everything that was fast will stay fast and only new code will be interpreted, in JDK7 you actually lose already compiled and optimized code! All of the sudden parts of your application that performed well will stop doing so. It is left to chance, which parts of the application slow down, which makes tracking that bugger by profiler nearly impossible: At times the hibernate code for flushing slows down, at other times, its the spring DI code or your own appcode.
Are you affected?
You can use a profiler (JProfiler/YourKit) or JConsole (JVisualVM won't do) to monitor the memory-consumption of CodeCache memory area. Typically the CodeCache amount committed will stay very close to the used amount (say, committed is 23mb, used is 22mb). While your application runs, committed and used go up until committed reaches max. At that point used will drop sharply to 1/2 - 2/3 of max. After that, used whill no longer grow. That's where the bug will hit you. In JConsole, it will look like this:
Why me and not all the others?
Chances are, you are using JBoss. Oracle quickly found out that there are things not like they should be and disabled tiered compilation by default - yet Red Hat in its infinite wisdom decided, it knew better and reenabled it. Basically our webapp runs fine on Weblogic and only JBoss is affected, because without the tiered compilation (not enabled in weblogic) the growth of CodeCache is so small, we never actually hit the 48mb threshold even after weeks of operation.
What can I do?
First of all, decide, whether this bug hits you. Second, make it harder for the bug to damage you. If you disable CodeCacheFlushing at least hitting the bug won't make things worse than they were before. Stopping tiered compilation will make it less probable the bug hits you, same as increasing the amount of CodeCache-Memory available.
You can always try to switch to JDK8, this seems unaffected and also you could implement monitoring in your software to warn you, if CodeCache is running full.
TL;DR
In JDK 7 never enable tiered compilation (disabled by default, enabled in JBoss)
in JBoss 7 always set PRESERVE_JAVA_OPTS=true in standalone.conf
always disable CodeCacheFlushing (-XX:-UseCodeCacheFlushing)
always pack a sufficient amount of memory into CodeCache (-XX:ReservedCodeCacheSize=xxM).

Related

Java7 vs java5 garbage collection

We are planning to migrate our enterprise application currently running on Java5 stack to Java7 stack. We are having issues with implicit gc calls (mainly major gc) causing system to be unstable for a short time(ranging from 5 mins - 30 mins). After analyzing the gc stats, we found that Compact phase is taking quite long time to complete when compared to Mark and sweep phase. I understand compaction is quite complex and time taking but its impacting the app server which is customer facing and few connections being dropped off during this phase.
Now, my question is as we are migrating to Java7, is there a better garbage collection process compared to Java5?
App servers are provided with decent system resources.
Each app server contains 32 cpu cores
contains 64 gb ram
App server is IBM webpshere server
Operating System - 64 bit IBM AIX
As said earlier, gc is happening because of implicit system calls. No explicit system calls invoking gc.
Now, my question is as we are migrating to Java7, is there a better garbage collection process compared to Java5?
Generally yes, though as #Pushkar, you should really be migrating to Java 8.
With respects to the specifics of your application(s), it sounds like you need to tune / retune the garbage collection on Java 5. If you are periodically experiencing 5 to 30 >>minutes<< of unstability due to GC, there is something rather wrong. The current behavior may be due your application or Websphere (e.g. memory leaks, excessive caching, etc), or it may be due to poor GC tuning.
In short, switching to Java 7 (or 8) might make things better "out of the box", but it is likely that you will need to put in more effort to address the underlying cause of your problems.
Finally, I'd advise the obvious things.
Implement the changes in small steps. Don't upgrade your app, websphere version, java version, etc all at the some time.
Do the upgrades of your servers one at a time. Have a roll-back plan in case you get unacceptable performance.
If possible test it all first ... including performance / load testing.
By default, java 7 uses parallelGC on server class machines. If you are using JDK 7 update 4 or later version, switch to G1 garbage collector which might give you better performance. But as #the8472 suggested,it will be good to know what settings you used in java 5 and now in your current environment.
Java 7 reached end of life around April 2015. Why not migrate to 1.8?
GC performance usually improves with java major releases (and in some cases with minor GCs).
You should take a look difference GC tuning flags, following link may help you
http://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html
http://stas-blogspot.blogspot.com/2011/07/most-complete-list-of-xx-options-for.html

Strange memory leak on Mac (Chrome, Firefox and Safari) with our GWT based web UI

We are experiencing a serious memory problem in our GWT based web application when running in Mac, for Chrome, Firefox and Safari.
For example, with Firefox, when looking at the Activity Monitor on Mac, the memory consumption is quickly increasing across time, even through frequent refreshes, and can reach 1 GB after a significant session. Similar phenomena happens for Chrome and Safari.
But, we cannot see a real reason using various profiling tools, including Java JProfiler (for GWT) and Chrome profiler and timeline looking at native JS, listeners and DOM elements.
Actually there are 2 related problems here:
The memory is increasing while using the UI for along time without refresh. In this case, we can see some uncollected garbage SVG elements (we are using SVG based canvas) that are unreachable, but the memory increase in the Activity Monitor is much higher than what we would expect with this garbage.
The memory remains high even after multiple refreshes, and even though the profiler shows that all the above garbage is completely gone.
We are chasing this leak for a while, with no results, so I would appreciate any help.
Thanks,
Yaron.
The problem occurs only on Mac?
What GWT version are you using? 2.5?
Did you saw this issue https://code.google.com/p/google-web-toolkit/issues/detail?id=6938?
I faced several issues with leaks using GWT before, but in 2.5 version it works fine, even in IE!
I've tracked down some leaks in my GWT application before, and it's certainly not easy to determine where they're coming from thanks to Java's garbage collector hiding what's going on. The most common reason for a leak would be cyclical references, so multiple objects can't be garbage collected because they reference each other. They're tough to spot on your own so I use a library called FindBugs - it also comes with a very convenient Eclipse plugin. FindBugs literally finds anything you could possibly consider and has worked wonders for me. BUT make sure to play with the settings first; cyclical reference checking is not enabled by default.
Bruno_Ferreira makes a good point, too - make sure you're up to date with your GWT version as they're always improving memory leaks.

What is the snapshot concept in dart?

I have read that with dart your application can start up to 10x faster because of snapshots. Can anyone explain what it really is and how it works? In what kind of application would i be using snapshots?
Dart's Snapshots are like Smalltalk images in the sense that they allow nearly instant application startup. However, unlike Smalltalk images, Snapshots don't store the program state.
This is especially helpful in slower mobile devices because they are inherently slower and also restricted by memory much more than a desktop system. That reason and the fact that battery usage begs us to close unnecessary programs makes startup speed important.
Dart addresses this issue of slow startup with the heap snapshot feature, which is similar to Smalltalk's image system. The heap of an application is traversed and all objects are written to a simple file. Note: at the moment, the Dart distribution ships with a tool that fires up a Dart VM, loads an application's code, and just before calling main, it takes a snapshot of the heap. The Dart VM can use such a snapshot file to quickly load an application.
The snapshot feature is also used to serialize object graphs that are being sent between Dart Isolates (serialized with SnapshotWriter).
Currently I do not know of any way to initiating a snapshot or dealing with them. In the future, I would expect it to be possible to serve a snapshot file from the web server and having that processed by the browser Dart VM instantaneously.
The snapshot format itself is cross-platform meaning that it works between 32-bit, 64-bit machines and so forth. The format has been made so that it's quick to read into memory with a emphasis on minimizing extra work like pointer fixups.
Here's the source code for snapshot.cc: http://code.google.com/p/dart/source/browse/trunk/dart/runtime/vm/snapshot.cc
and the tests: http://code.google.com/p/dart/source/browse/trunk/dart/runtime/vm/snapshot_test.cc
So the reason why it can speed up an application startup by a factor of 10 is because it's not a bunch of source code like JavaScript that is send as-is and slowly processed afterwards.
And where would you like to use it? Anywhere you possibly can. On the server side, it's basically already happening for you (and doesn't matter really). but on the client-side, that's not possible yet. As I understand it, it will be possible to serve these snapshots to the browser for instant startup, but you really have to wait since it's not available as of now.

Side Effects of running the JVM in debug mode

I'd like to realease a Java application in debug mode to allow for easier debugging when random or hard to reproduce problems occur on the customer side.
However, I want to get a heads up on potential side effects of doing this? From the Java HotSpot Documentation it seems that there should be no performance penalty.
From the link
Full Speed Debugging
The Java HotSpot VM now uses
full-speed debugging. In previous
version of the VM, when debugging was
enabled, the program executed using
only the interpreter. Now, the full
performance advantage of HotSpot
technology is available to programs,
even with compiled code. The improved
performance allows long-running
programs to be more easily debugged.
It also allows testing to proceed at
full speed. Once there is an
exception, the debugger launches with
full visibility to code sources.
Is this accurate or are there hidden caveats, what about memory footprint and are there any other hidden gotchas while using debug mode.
PS: I found this article from AMD which confirmed my initial suspiciion that the original article from oricale doesn't show the full story.
I can't speak for HotSpot, and won't officially for IBM, but I will say there are certainly legal kinds of optimization that aren't possible to undo fully should a decompilation be required in the middle of them, and thus aren't enabled when debug is being asked for in the production JVMs you are likely to use.
Imagine a situation where the optimizer discovers a part of the program is provably not required and by the various language rules (including JSR 133) is legal to remove, the JVM will want to get rid of it. The one wrinkle is debug: removing the code will look odd to the human stepping through it (variables not updating, possibly not stopping on lines when stepping) so the choice is to disable said optimizations in those cases. The same might also be true for opts like stack allocated objects, etc.. so while the JVM says it's "full speed" it's actually closer to "nearly full speed, with some of the funkier opts that can't quite be undone removed".
This question is old but came up while I was searching for any performance impact if you just leave -agentlib:jdwp... on but are not actively debugging.
Summary: Starting with debugging options but not connecting shouldn't impact the speed now (Java 7+).
Before java 6 (ish) you used -Xdebug and this had a definite impact, it shut off the JIT!
In java 6 they changed it to -agentlib and made it better. there were some bugs though that did cause a performance penalty. Here is one of the bugs that was filed against openjdk, My guess is that there were similar problems with The oracle/sun version: https://bugs.openjdk.java.net/browse/JDK-6902182
Note however that the stated goal is that simply enabling debugging by opening the port should not cause any performance penalty.
It looks like, at least in openjdk, the bugs were cleaned up by java 7. I didn't see anything about performance impacts after that.
If you research this further and find negative results, take note of the java version the testing was done under--everything I saw was referring to versions before 7.
I'd love to hear if anyone encountered performance problems in a recent VM just leaving the port enabled.
If you plan to run the app with remote debugging enabled, it can affect security also. Remote debugging leaves a port open on your machine, and by connecting to it, I can do all sorts of fun things with your application.
The program definitely does lot more than simply running when in debugging mode, so it is obvious that performance can not be same. However if you read the statement carefully, it says that new release can run fully optimized code even if in debugging mode which was not possible earlier. Thus the new jvm is much more faster than previous one which could only run in interpreted mode which no optimization.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) the difference seems to come from loading DLLs, when the DLLs' are in cached memory pages they load much faster. We tried using ClearMem to simulate rebooting (since its much less time consuming than actually rebooting) and got mixed results, on some machines it seemed to simulate a reboot very consistently and in some not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you delt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem which is storing pages of memory for the program, then you've got the OS caching the contents of files in memory, then you've got the on-disk buffer of sectors on the harddrive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then do a slerp of the whole file into memory in one read and parse their contents. Less disk seeks and time spend reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
#Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used, I really don't like that option (neither does Raymond).
One succesful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on them. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file lists all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the thing sthat PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment and deal cleaning up after bad builds and all that, but the payoff is typically 10+ or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to function order list, just group the code that will be called within the same sections:
#pragma code_seg(".startUp")
//...
#pragma code_seg
#pragma data_seg(".startUp")
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether function order list can specify global variables as well, but use this #pragma data_seg would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note, is that .NET 3.5SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN Cards) and that your app depends on certain other
services that require the network to come up. So profiling your application alone may not quite tell you this, but you should examine the dependencies for your application.
If your application is not very complicated, you can just copy all the executables to another directory, it should be similar to a reboot. (Cut and Paste seems not work, Windows is smart enough to know the files move to another folder is cached in the memory)

Resources