Why would the JVM suddenly not allocate up to the maximum heap setting, even when running low on memory and with plenty of free OS memory? - memory-management

My JVM is set to have a maximum heap size of 2GB. It is currently running slowly due to being low on memory, but it will not allocate beyond 1841MB (even though it has done so before on this run). I have over 16GB memory free.
Why would this suddenly happen to a running JVM? Could it be because it is "fenced in" - it cannot get a larger continuous range of physical memory?
This is for java 1.8.0_73 (64bit) on Windows 10. But I have seen this now and then for other java versions and on Windows 7 and XP too.

32 bit JVMs usually struggle to use more than about 1800Mb. Exactly how much they can allocate depends on your Operating System and how it lays out the 32 bit address space (which can vary between runs).
Use a 64 bit JVM to get more.

Start JVM with
java -Xmx2048m -Xms2048m
This will preallocate 2GB at JVM startup (even if not needed).

You cannot make a program run faster by just increasing the heap memory. A program may be slow due to various reasons.
In your case, may be it's not because of the memory usage, as the increase in the heap memory does not cause the program to use that memory to the fullest, or run faster. The heap is used up if you create a lot of objects which are in use and not garbage collected.
The other reason for the slowness could be due to some parts of the program using up the processing power (poorly performing algorithms ?).
It could also be due to slow I/O operations (file reads/writes ?).
These are only a few reasons. We can determine the slowness by getting to know more about your program.
You could look for slow running parts of your code by going through its logs (if any) or by using various profiling tools like jconsole (shipped with jdk), VisualVM, etc.
You could also tune your JVM by passing various parameters to customize the Garbage collection, various parts of the heap, thread stack size, etc.

Related

Java buildpack memory calculation

Java buildpack memory calculator with Spring Boot application inside of Docker container with 1GB memory calculates memory as it says in documentation, it takes entire available memory and this are calculated JVM options:
Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx747490K -XX:MaxMetaspaceSize=157725K -Xss1M (Total Memory: 1G, Thread Count: 50, Loaded Class Count: 25433, Headroom: 0%)
Question is why does it takes entire available memory and gives it to JVM? It should leave some memory for java process outside of JVM. This can lead to OOM because JVM thinks it has 1GB for itself (747490K for heap), and in reality it has less because some of it's memory is used by native memory, outside of JVM.
Should I not use this calculator and set JVM configuration by myself or I can reconfigure this somehow?
Question is why does it takes entire available memory and gives it to JVM?
The assumption is that the only thing running in your container is your Java application, thus it assigns all of the available memory to be used.
If you do things like shell out and run other processes or run other processes in the container, you need to tell memory calculator so it can take that into account.
This can lead to OOM because JVM thinks it has 1GB for itself (747490K for heap), and in reality it has less because some of it's memory is used by native memory, outside of JVM.
The memory calculator takes into consideration the major memory regions within a Java process. Not just heap. That said, it cannot 100% guarantee that you will never go over your memory limit. That's impossible with a Java app.
There are things you can do as an application developer, like create 10,000 threads or JNI, that cannot be restricted and could potentially consume a whole ton of memory. If you do that, your app will go over its container memory limit and crash.
The memory calculator attempts to give you a reasonable memory configuration for most common Java workloads. Running a web app, running a microservice, running some batch jobs, etc...
If you are doing something that doesn't fit within that pattern, then you can simply tell the memory calculator and it'll adjust things accordingly.
Should I not use this calculator and set JVM configuration by myself or I can reconfigure this somehow?
Even if you need to customize what the calculator is doing it can be helpful. It's additional toil to calculate these values manually, especially when it's so easy to change the memory limits. If your ops team increases the memory limit of the container, you want your application to automatically adjust to that configuration (as well as it can).
Beyond that, memory calculator is also good at detecting problems early. If you configure the JVM manually and you mess it up, let's say you over-allocate memory, the JVM won't necessarily care until it tries to get more memory and can't. At some point down the road, you're going to have a problem but it's not clear when (probably at 3am on a Sat, lol).
With memory calculator, it's doing the math when your container first starts to make sure that memory settings are sane. If there's something off with the configuration, it'll fail and let you know.
TIPS:
You can override a memory calculator-defined value by simply setting that JVM option in the JAVA_TOOL_OPTIONS env variable. For example, if I want to allow for more direct memory, I would set JAVA_TOOL_OPTIONS='-XX:MaxDirectMemorySize=50M'. Then when you restart the container, the memory calculator will shift memory around to accommodate that.
The one thing you don't want to set is -Xmx. The memory calculator should always set this because it will set it to whatever is left after other regions have been accounted for. You can think of it like HEAP = CONTAINER_MEMORY_LIMIT - (all static memory regions).
If you were to set -Xmx, you have to get it exactly right. If it's too low then you're wasting memory. If it's too high then you could exceed the container memory limit and get crashes.
In short, if you think you want to set -Xmx, you should either increase the container memory limit or decrease one of the static memory regions.
If you run other things in the container, you need to set the headroom. This is done with the BPL_JVM_HEAD_ROOM env variable. Give it a percent of the total container memory limit. Ex: BPL_JVM_HEAD_ROOM=20 would use 80% of the container's memory limit for Java and 20 for other stuff.
Setting some headroom can be useful in other cases as well, like if you're troubleshooting a container crash and you want a little extra room, or if you don't like operating at 100% the memory limit. You can leave 5 or 10% unused to match your comfort level.
If you have an application that uses a lot of threads, you'll need to adjust this as well. The default is 250 threads, which works well for many web/servlet-based applications (thread per request model). We do automatically lower to 50 threads if you're specifically using Spring Webflux which does not need so many threads.
For other cases, it's up to you to configure this. For example, if you have a batch application that only needs a thread pool of 10, then you could set this 40 or 50. 40-50 seems weird in this example, but the JVM creates a number of its own threads and you need to account for those in addition to application-specific threads when in doubt look at a thread dump.

How can too much Virtual Memory negatively impact performance? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Does anyone have experience with using very large heaps, 12 GB or higher in Java?
Does the GC make the program unusable?
What GC params do you use?
Which JVM, Sun or BEA would be better suited for this?
Which platform, Linux or Windows, performs better under such conditions?
In the case of Windows is there any performance difference to be had between 64 bit Vista and XP under such high memory loads?
If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
However, when you need to keep GC pauses low, things get really nasty:
Forget the default throughput, stop-the-world GC. It will pause you application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking phase and remarking phases. For very small heaps like < 4 GB this is usually not a problem, but for an application that creates a lot of garbage and a large heap, the remarking phase can take quite a long time - usually much less then full stop-the-world, but still can be a problem for very large heaps.
When the CMS garbage collector is not fast enough to finish operation before the tenured generation fills up, it falls back to standard stop-the-world GC. Expect ~30 or more second long pauses for heaps of size 16 GB. You can try to avoid this keeping the long-lived garbage production rate of you application as low as possible. Note that the higher the number of the cores running your application is, the bigger is getting this problem, because the CMS utilizes only one core. Obviously, beware there is no guarantee the CMS does not fall back to the STW collector. And when it does, it usually happens at the peak loads, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.
Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:
Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first, however it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle, when that code is supposed to be run. If they say "never", ask them, why the code is there. So IMHO G1 really doesn't make the huge heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have bucks for a big server with big memory, you have probably also bucks for a good, commercial hardware accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, 0-lines of stop-the-world code in the GC.
Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would be definitely easier to keep the pauses low.
I am CEO of Azul Systems so I am obviously biased in my opinion on this topic! :) That being said...
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his Understanding Java Garbage Collection and What You Can Do about It presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fall back. All compaction is performed concurrently with the running application. We have customers running very large (hundreds of GBytes) with worse case GC pause times of <10 msec, and depending on the application often times less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop-the-world/STW (i.e. pause the application) to perform compaction. So while CMS and G1 can push out STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses and that's why Zing has such low GC pauses even for gigantic heap sizes.
We have an application that we allocate 12-16 Gb for but it really only reaches 8-10 during normal operation. We use the Sun JVM (tried IBMs and it was a bit of a disaster but that just might have been ignorance on our part...I have friends that swear by it--that work at IBM). As long as you give your app breathing room, the JVM can handle large heap sizes with not too much GC. Plenty of 'extra' memory is key.
Linux is almost always more stable than Windows and when it is not stable it is a hell of a lot easier to figure out why. Solaris is rock solid as well and you get DTrace too :)
With these kind of loads, why on earth would you be using Vista or XP? You are just asking for trouble.
We don't do anything fancy with the GC params. We do set the minimum allocation to be equal to the maximum so it is not constantly trying to resize but that is it.
I have used over 60 GB heap sizes on two different applications under Linux and Solaris respectively using 64-bit versions (obviously) of the Sun 1.6 JVM.
I never encountered garbage collection problems with the Linux-based application except when pushing up near the heap size limit. To avoid the thrashing problems inherent to that scenario (too much time spent doing garbage collection), I simply optimized memory usage throughout the program so that peak usage was about 5-10% below a 64 GB heap size limit.
With a different application running under Solaris, however, I encountered significant garbage-collection problems which made it necessary to do a lot of tweaking. This consisted primarily of three steps:
Enabling/forcing use of the parallel garbage collector via the -XX:+UseParallelGC -XX:+UseParallelOldGC JVM options, as well as controlling the number of GC threads used via the -XX:ParallelGCThreads option. See "Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning" for more details.
Extensive and seemingly ridiculous setting of local variables to "null" after they are no longer needed. Most of these were variables that should have been eligible for garbage collection after going out of scope, and they were not memory leak situations since the references were not copied. However, this "hand-holding" strategy to aid garbage collection was inexplicably necessary for some reason for this application under the Solaris platform in question.
Selective use of the System.gc() method call in key code sections after extensive periods of temporary object allocation. I'm aware of the standard caveats against using these calls, and the argument that they should normally be unnecessary, but I found them to be critical in taming garbage collection when running this memory-intensive application.
The three above steps made it feasible to keep this application contained and running productively at around 60 GB heap usage instead of growing out of control up into the 128 GB heap size limit that was in place. The parallel garbage collector in particular was very helpful since major garbage-collection cycles are expensive when there are a lot of objects, i.e., the time required for major garbage collection is a function of the number of objects in the heap.
I cannot comment on other platform-specific issues at this scale, nor have I used non-Sun (Oracle) JVMs.
12Gb should be no problem with a decent JVM implementation such as Sun's Hotspot.
I would advice you to use the Concurrent Mark and Sweep colllector ( -XX:+UseConcMarkSweepGC) when using a SUN VM.Otherwies you may face long "stop the world" phases, were all threads are stopped during a GC.
The OS should not make a big difference for the GC performance.
You will need of course a 64 bit OS and a machine with enough physical RAM.
I recommend also considering taking a heap dump and see where memory usage can be improved in your app and analyzing the dump in something such as Eclipse's MAT . There are a few articles on the MAT page on getting started in looking for memory leaks. You can use jmap to obtain the dump with something such as ...
jmap -heap:format=b pid
As mentioned above, if you have a non-interactive program, the default (compacting) garbage collector (GC) should work well. If you have an interactive program, and you (1) don't allocate memory faster than the GC can keep up, and (2) don't create temporary objects (or collections of objects) that are too big (relative to the total maximum JVM memory) for the GC to work around, then CMS is for you.
You run into trouble if you have an interactive program where the GC doesn't have enough breathing room. That's true regardless of how much memory you have, but the more memory you have, the worse it gets. That's because when you get too low on memory, CMS will run out of memory, whereas the compacting GCs (including G1) will pause everything until all the memory has been checked for garbage. This stop-the-world pause gets bigger the more memory you have. Trust me, you don't want your servlets to pause for over a minute. I wrote a detailed StackOverflow answer about these pauses in G1.
Since then, my company has switched to Azul Zing. It still can't handle the case where your app really needs more memory than you've got, but up until that very moment it runs like a dream.
But, of course, Zing isn't free and its special sauce is patented. If you have far more time than money, try rewriting your app to use a cluster of JVMs.
On the horizon, Oracle is working on a high-performance GC for multi-gigabyte heaps. However, as of today that's not an option.
If you switch to 64-bit you will use more memory. Pointers become 8 bytes instead of 4. If you are creating lots of objects this can be noticeable seeing as every object is a reference (pointer).
I have recently allocated 15GB of memory in Java using the Sun 1.6 JVM with no problems. Though it is all only allocated once. Not much more memory is allocated or released after the initial amount. This was on a Linux but I imagine the Sun JVM will work just as well on 64-bit Windows.
You should try running visualgc against your app. It´s a heap visualization tool that´s part of the jvmstat download at http://java.sun.com/performance/jvmstat/
It is a lot easier than reading GC logs.
It quickly helps you understand how the parts (generations) of the heap are working. While your total heap may be 10GB, the various parts of the heap will be much smaller. GCs in the Eden portion of the heap are relatively cheap, while full GCs in the old generation are expensive. Sizing your heap so that that the Eden is large and the old generation is hardly ever touched is a good strategy. This may result in a very large overall heap, but what the heck, if the JVM never touches the page, it´s just a virtual page, and doesn´t have to take up RAM.
A couple of years ago, I compared JRockit and the Sun JVM for a 12G heap. JRockit won, and Linux hugepages support made our test run 20% faster. YMMV as our test was very processor/memory intensive and was primarily single-threaded.
here's an article on gc FROM one of Java Champions --
http://kirk.blog-city.com/is_your_concurrent_collector_failing_you.htm
Kirk, the author writes
"Send me your GC logs
I'm currently interested in studying Sun JVM produced GC logs. Since these logs contain no business relevent information it should be ease concerns about protecting proriatary information. All I ask that with the log you mention the OS, complete version information for the JRE, and any heap/gc related command line switches that you have set. I'd also like to know if you are running Grails/Groovey, JRuby, Scala or something other than or along side Java. The best setting is -Xloggc:. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return. "
An article from Sun on Java 6 can help you: https://www.oracle.com/java/technologies/javase/troubleshooting-javase.html
The max memory that XP can address is 4 gig(here). So you may not want to use XP for that(use a 64 bit os).
sun has had an itanium 64-bit jvm for a while although itanium is not a popular destination. The solaris and linux 64-bit JVMs should be what you should be after.
Some questions
1) is your application stable ?
2) have you already tested the app in a 32 bit JVM ?
3) is it OK to run multiple JVMs on the same box ?
I would expect the 64-bit OS from windows to get stable in about a year or so but until then, solaris/linux might be better bet.

Why Windows Server 2008 R2 x64 can't allocate more than 1.2GB for 32bit process

I have very strange behavior of the Windows OS regarding 32-bit processes regarding memory allocation. The problem is - it seems that 32-bit process can't allocate more than something around 1.2GB, the limit floats between 1.1GB and 1.3GB depending on conditions I have no idea about.
My environment is:
Windows Server 2008 R2
Total physical RAM - 16GB
RAM in use at the moment of experiment - 6GB (i.e. around 9+GB is free)
I have self-written c++ tool, which bounces memory allocation via malloc() in blocks of 32MB until it gets denial from OS (the upper limit right now is 1070MB, but as I mentioned already - limit floats, the max number I remember was around 1.3GB) and than releases all these blocks in reverse order. This is done in cycle.
So the questions I have are:
Why I can't get close to real limit of 2GB for 32bit process? 1.3GB is only 60% of theoretical limit. Inability to get at least 0.5 GB more seems to be very strange to me. I have 9GB of unused RAM.
What (at runtime) can influence on the top limit? It does change over time - I have no idea why. I'd like to control it somehow - probably there is some magic command to optimize OS address space :)
In reply to comments:
show the loop where you allocate the memory: here it is https://code.google.com/p/membounce/source/browse/
Note: after looking deeper into the code I've noticed the bug which incorrectly calculated the upper limit (when allocation page size is changed dynamically while tool is running :() - the actual limit I'm getting right now is around 1.6 GB - this is better than 1.2 but yet not close to 2GB.
Build to x64 instead: in that case I'm interested particularly in 32bit. The original cause of this topic is sometime I'm not able to run java (specifically Netbeans x32) with Xmx900M (says JVM creation failed), and sometime it DOES run with Xmx1200M (Xmx is JVM parameter of max RAM that can be used by JVM). The reason I want to use non-x64 netbeans is because java (at least Oracle's JVM) effectively consumes twice more memory when switching to x64, i.e I have to allocate 2GB of RAM to JVM to get same efficiency as with 1GB on 32bit JVM. I have number of java processes and want to fit more of them into my 16GB address space :) (this is some kind of offtopic background)

Why is there a huge performance difference between 32 and 64 bit JDK? [duplicate]

Recently I've been doing some benchmarking of the write performance of my company's database product, and I've found that simply switching to a 64bit JVM gives a consistent 20-30% performance increase.
I'm not allowed to go into much detail about our product, but basically it's a column-oriented DB, optimised for storing logs. The benchmark involves feeding it a few gigabytes of raw logs and timing how long it takes to analyse them and store them as structured data in the DB. The processing is very heavy on both CPU and I/O, although it's hard to say in what ratio.
A few notes about the setup:
Processor: Xeon E5640 2.66GHz (4 core) x 2
RAM: 24GB
Disk: 7200rpm, no RAID
OS: RHEL 6 64bit
Filesystem: Ext4
JVMs: 1.6.0_21 (32bit), 1.6.0_23 (64bit)
Max heap size (-Xmx): 512 MB (for both 32bit and 64bit JVMs)
Constants for both JVMs:
Same OS (64bit RHEL)
Same hardware (64bit CPU)
Max heap size fixed to 512 MB (so the speed increase is not due to the 64bit JVM using a larger heap)
For simplicity I've turned off all multithreading options in our product, so pretty much all processing is happening in a single-threaded manner. (When I turned on multi-threading, of course the system got faster, but the ratio between 32bit and 64bit performance stayed about the same.)
So, my question is... Why would I see a 20-30% speed improvement when using a 64bit JVM? Has anybody seen similar results before?
My intuition up until now has been as follows:
64bit pointers are bigger, so the L1 and L2 caches overflow more easily, so performance on the 64bit JVM is worse.
The JVM uses some fancy pointer compression tricks to alleviate the above problem as much as possible. Details on the Sun site here.
The JVM is allowed to use more registers when running in 64bit mode, which speeds things up slightly.
Given the above three points, I would expect 64bit performance to be slightly slower, or approximately equal to, the 32bit JVM.
Any ideas? Thanks in advance.
Edit: Clarified some points about the benchmark environment.
From: http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#64bit_performance
"Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM. This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4. The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program. The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences. These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.
The performance difference comparing an application running on a 64-bit platform versus a 32-bit platform on SPARC is on the order of 10-20% degradation when you move to a 64-bit VM. On AMD64 and EM64T platforms this difference ranges from 0-15% depending on the amount of pointer accessing your application performs."
Without knowing your hardware I'm just taking some wild stabs
Your specific CPU may be using microcode to 'emulate' some x86 instructions -- most notably the x87 ISA
x64 uses sse math instead of x87 math, I've noticed a %10-%20 speedup of some math-heavy C++ apps in this case. Math differences could be the real killer if you're using strictfp.
Memory. 64 bits gives you much more address space. Maybe the GC is a little less agressive on 64 bits mode because you have extra RAM.
Is your OS is in 64b mode and running a 32b jvm via some wrapper utility?
The 64-bit instruction set has 8 more registers, this should make the code faster overall.
But, since processsors nowaday mostly wait for memory or disk, i suppose that either the memory subsystem or the disk i/o might be more efficient in 64-bit mode.
My best guess, based on a quick google for 32- vs 64-bit performance charts,
is that 64 bit I/O is more efficient. I suppose you do a lot of I/O...
If memcpy is involved when moving the data, it's probably more efficient to copy longs than ints.
Realize that the 64-bit JVM is not magic pixie dust that makes Java apps
go faster. The 64-bit JVM allows heaps >> 4 GB and, as such, only makes sense
for applications which can take advantage of huge memory on systems which
have it.
Generally there is either a slight improvement (due to certain hardware
optimizations on certain platforms) or minor degradation (due to increased
pointer size). Generally speaking there will be a need for fewer GC's -- but
when they do occur they will likely be longer.
In memory databases or search engines that can use the increased memory
for caching objects and thus avoid IPC or disk accesses will see the biggest
application level improvements. In addition a 64-bit JVM will also
allow you to run many, many more threads than a 32-bit one, because
there's more address space for things like thread stacks, etc. The
maximum number of threads generally for a 32-bit JVM is ~1000but ~100000 threads with a 64-bit JVM.
Some drawbacks though:
Additional issues with the 64-bit JVM are that certain client
oriented features like Java Plug-in and Java Web Start
are not supported. Also any native code would also need
to be compatible (e.g. JNI for things like Type II JDBC drivers).
This is a bonus for pure-Java developers as pure apps should
just run out of the box.
More on this Thread at Java.net

Why is the JVM using more memory than I am allocating

I am starting the Java process with the following command:
java -Xmx32m -jar winstone-lite.jar --warfile=myWarFile.war
Instead of using the amount of memory I specified, it is still allocating 144m.
EDIT:
When I say allocate, I mean when I look at the "top" process I am seeing 144m as the amount of memory being used.
I am using http://www.oracle.com/technetwork/java/embedded/documentation/index.html current version.
I would figure that if my application required more memory than I am allocating the jvm would crash.
-Xmxjust tells the JVM how much memory it may use for its internal heap.
The JVM needs memory for other purposes (permanent generation, temporary space etc.), plus like every binary it needs space for its own binary code, plus any libraries/DLLs/.so it loads.
The 144 MiB you quote probably contains at least some of these other memory uses.
How did you measure the memory usage? On modern OS using virtual memory, measuring memory usage of a process is not quite trivial, and cannot be expressed as a single value.

Resources