Spring Boot 2 Actuator jvm.gc.memory.allocated Metric

I'm using Spring Boot 2.2.5 and monitoring my application metrics via Spring Boot Actuator and Grafana.
My application is packaged with Docker (OpenJDK 11) and configured with 4GB of memory.
I'm seeing long GC pauses of around 1-2 seconds, and they correlate with spikes in jvm.gc.memory.allocated.
The jvm.gc.memory.allocated metric sometimes reaches 30GB.
Can anyone explain the jvm.gc.memory.allocated metric? What does it mean?

This is a rather weird metric, if you ask me (in the sense that it is not trivial to grasp). Let's take it slow.
First of all, it is generated by Micrometer, and if you read its description:
Incremented for an increase in the size of the young generation memory pool after one GC to before the next
it probably makes little sense to you either. I had to look at the code that computes it to understand what it does.
If you know some basic things about how a GC works and look at this code, things are rather simple, actually.
A generational garbage collector (like G1) divides the heap into regions (young and old). Allocations of new objects happen in the young region (unless they are humongous, but I will not get into that), specifically in Eden space. Once a GC is triggered, Eden is cleared and allocations can happen again. This is rather simplified and not entirely correct (things are slightly different in the case of major/mixed collections).
Now that this theory is in place, you can look at the code that computes the metric (gated by an isYoungGenPool check):
if (youngGenPoolName != null) {
    // used bytes in the young (Eden) pool before this GC and after the previous one
    final long youngBefore = before.get(youngGenPoolName).getUsed();
    final long youngAfter = after.get(youngGenPoolName).getUsed();
    // delta = growth of Eden since the end of the previous collection,
    // i.e. what the application allocated between the two GCs
    final long delta = youngBefore - youngGenSizeAfter.get();
    youngGenSizeAfter.set(youngAfter);
    if (delta > 0L) {
        allocatedBytes.increment(delta);
    }
}
Specifically, isYoungGenPool boils down to a name check:
... endsWith("Eden Space")
So this code takes a snapshot of the used space in Eden before and after a GC cycle, computes the delta, and accumulates all of these deltas into a single value. That running total is jvm.gc.memory.allocated.
It roughly measures how much your application allocates over its lifetime, but only via the young space (a standalone sketch of the same computation follows the list below). IMHO, you should interpret it carefully, since:
it does not track humongous allocations (there is a different metric for this)
it only works for generational garbage collectors (Shenandoah, for example, is not such a collector)
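For illustration, here is a minimal standalone sketch of the same computation, built on the JMX GC notifications that Micrometer itself listens to. The class and field names are mine, and it assumes a HotSpot JVM (for com.sun.management) and Java 16+ (for the instanceof pattern):

import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class AllocationTracker {

    private final AtomicLong allocatedBytes = new AtomicLong();
    private final AtomicLong edenSizeAfterLastGc = new AtomicLong();

    public void install() {
        for (var gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gcBean instanceof NotificationEmitter emitter)) {
                continue;
            }
            emitter.addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                var info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                Map<String, MemoryUsage> before = info.getGcInfo().getMemoryUsageBeforeGc();
                Map<String, MemoryUsage> after = info.getGcInfo().getMemoryUsageAfterGc();
                before.forEach((pool, usage) -> {
                    if (pool.endsWith("Eden Space")) {
                        // growth of Eden since the end of the previous collection
                        long delta = usage.getUsed() - edenSizeAfterLastGc.get();
                        edenSizeAfterLastGc.set(after.get(pool).getUsed());
                        if (delta > 0) {
                            allocatedBytes.addAndGet(delta);
                        }
                    }
                });
            }, null, null);
        }
    }

    public long allocatedBytes() {
        return allocatedBytes.get();
    }
}

Each time a collection finishes, the listener adds the growth of Eden since the previous collection to the running total, exactly like the Micrometer code above.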

Related

Prometheus, how to get the actual Java Garbage Collector memory usage?

The Prometheus plugin in my Spring Boot app is sending tons of data, and I can't find any explanation of what the exporter output actually means:
1) What does "jvm_gc_memory_allocated_bytes_total" mean?
2) What does "jvm_gc_memory_promoted_bytes_total" mean?
What I need is the actual memory usage of the Java garbage collector, so I'm expecting a value that stays below 2GB (the max memory size), but at the moment it is at 8GB and still rising.
"jvm_gc_memory_allocated_bytes_total"
and
"jvm_gc_memory_promoted_bytes_total"
are the only two garbage-collector-related metrics delivered by the exporter.
To answer your questions: there is a help text provided with each exposed metric in the Prometheus exposition format:
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
These metrics accumulate, respectively, the bytes allocated in the young generation and the bytes that survived a garbage collection and were therefore promoted to the old generation (very simplified).
From your question, I think you are actually not looking for the "memory usage of the Java Garbage Collector" but for the managed memory usage of the JVM. This managed memory is divided into "heap" and "non-heap" (the area tag) at the first level and can be drilled down into further via the id tag.
Here's the metrics you are likely looking for:
jvm_memory_used_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_committed_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_max_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
So if you want to get ahold of the currently used heap, you need to sum together the heap area metrics with the following PromQL:
sum(jvm_memory_used_bytes{job="myjob", instance="myhost", area="heap"})
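If you want to cross-check these numbers from inside the JVM, they come from the standard java.lang.management memory pools; here is a minimal sketch (the output format is mine):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryPools {
    public static void main(String[] args) {
        // Each pool is one value of the id tag; its type is the area tag.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-8s %-25s used=%d committed=%d max=%d%n",
                    pool.getType(),            // HEAP or NON_HEAP -> area
                    pool.getName(),            // e.g. "G1 Eden Space" -> id
                    pool.getUsage().getUsed(),
                    pool.getUsage().getCommitted(),
                    pool.getUsage().getMax()); // -1 if undefined
        }
    }
}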

Spring Data JPA Meta JpaMetamodelMappingContext Memory Consumption

My Spring Data JPA/Hibernate application consumes over 2GB of memory at start, without a single user hitting it. I am using Hazelcast as the second-level cache, but I had the same issue when I used Ehcache, so that is probably not the cause.
I ran a profile with a heap dump in VisualVM, and I can see that the bulk of the memory is consumed by JpaMetamodelMappingContext and, secondarily, by a ton of Map objects. I need help deciphering what I am seeing and whether this is actually a problem. I do have around a hundred classes in the model, so this may be normal, but I have no point of reference; it just seems a bit excessive.
Once I get a load of 100 concurrent users, my memory consumption increases to 6-7 GB. That is quite normal for the amount of data I push around and cache, but I feel like if I could reduce the initial memory, I'd have a lot more room for growth.
I don't think you have a problem here.
Instead, I think you are misinterpreting the data you are looking at.
Note that the heap space diagram displays two numbers: Heap size and Used heap
Heap size (orange) is the amount of memory available to the JVM for the heap.
This means it is the amount that the JVM requested at some point from the OS.
Used heap is the part of the Heap size that is actually used.
Ignoring the startup phase, it grows linearly and then drops, repeatedly, over time.
This is typical behavior of an idling application.
Some part of the application generates a moderate amount of garbage (rising part of the curve) which from time to time gets collected.
The low points of that curve are the amount of memory you are actually really using.
It seems to be about 250MB, which doesn't sound like much to me, especially since you say that a total consumption of 6-7GB under real load sounds reasonable to you.
Some other observations:
Both CPU load and heap grows fast/fluctuates a lot at start time.
This is to be expected because the analysis of repositories and entities happen at that time.
JpaMetamodelMappingContext's retained size is about 23MB.
Again, a good chunk of memory, but not that huge.
This includes the stuff it references, which is almost exclusively metadata from the JPA implementation as you can easily see when you take a look at its source.

Why can Go lower GC pauses to sub-1ms while the JVM has not?

So there's that: https://groups.google.com/forum/?fromgroups#!topic/golang-dev/Ab1sFeoZg_8:
Today I submitted changes to the garbage collector that make typical worst-case stop-the-world times less than 100 microseconds. This should particularly improve pauses for applications with many active goroutines, which could previously inflate pause times significantly.
High GC pauses are one of the things JVM users have struggled with for a long time.
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to Go levels, but are not affecting Go?
2021 update: with OpenJDK 16, ZGC now has a max pause time of <1ms and average pause times of 50µs.
It achieves these goals while still performing compaction, unlike Go's collector.
Update: with OpenJDK 17, Shenandoah exploits the same techniques introduced by ZGC and achieves similar results.
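If you want to try them, both collectors ship with recent OpenJDK builds, and each is enabled with a single flag; the heap size and jar name below are placeholders:

# ZGC, production since JDK 15 (JEP 377); experimental in JDK 11-14 behind -XX:+UnlockExperimentalVMOptions
java -XX:+UseZGC -Xmx16g -jar app.jar

# Shenandoah, production since JDK 15 (JEP 379)
java -XX:+UseShenandoahGC -Xmx16g -jar app.jar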
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to golang levels
There aren't any fundamental ones as low-pause GCs have existed for a while (see below). So this may be more a difference of impressions either from historic experience or out-of-the-box configuration rather than what is possible.
High GC pauses are one of the things JVM users have struggled with for a long time.
A little googling shows that similar solutions are available for Java too:
Azul offers a pauseless collector that scales even to 100GB+
Red Hat is contributing Shenandoah to OpenJDK, and Oracle is contributing ZGC.
IBM offers Metronome, also aiming for microsecond pause times
various other realtime JVMs
The other collectors in OpenJDK are, unlike Go's, compacting generational collectors. That is to avoid fragmentation problems and to provide higher throughput on server-class machines with large heaps by enabling bump-pointer allocation and reducing the CPU time spent in GC. And at least under good conditions, CMS can achieve single-digit-millisecond pauses, despite being paired with a moving young-generation collector.
Go's collector is non-generational, non-compacting and requires write barriers (see this other SO question), which results in lower throughput/more CPU overhead for collections, higher memory footprint (due to fragmentation and needing more headroom) and less cache-efficient placement of objects on the heap (non-compact memory layout).
So Go's GC is mostly optimized for pause time while staying relatively simple (by GC standards), at the expense of several other performance and scalability goals.
JVM GCs make different tradeoffs. The older ones often focused on throughput. The more recent ones achieve low pause times and several other goals at the expense of higher complexity.
According to this presentation, Getting to Go: The Journey of Go's Garbage Collector, the Go collectors only utilize half of the heap for live data:
Heap 2X live heap
My impression is that Java GCs generally aim for higher heap utilization, so they make a very different trade-off here.

How fast is the go 1.5 gc with terabytes of RAM?

Java cannot use terabytes of RAM because the GC pause is way too long (minutes). With the recent update to the Go GC, I'm wondering if its GC pauses are short enough for use with huge amounts of RAM, such as a couple of terabytes.
Are there any benchmarks of this yet? Can we use a garbage-collected language with this much RAM now?
tl;dr:
You can't use TBs of RAM with a single Go process right now. The max is 512 GB on Linux, and the most I've seen tested is 240 GB.
With the current background GC, GC workload tends to be more important than GC pauses.
You can understand GC workload as pointers * allocation rate / spare RAM. Of apps using tons of RAM, only those with few pointers or little allocation will have a low GC workload.
I agree with inf's comment that huge heaps are worth asking other folks about (or testing). JimB notes that Go heaps have a hard limit of 512 GB right now, and 240 GB is the most I've seen tested.
Some things we know about huge heaps, from the design document and the GopherCon 2015 slides:
The 1.5 collector doesn't aim to cut GC work, just cut pauses by working in the background.
Your code is paused while the GC scans pointers on the stack and in globals.
The 1.5 GC has a short pause on a GC benchmark with a roughly 18GB heap, as shown by the rightmost yellow dot along the bottom of a graph from the GopherCon talk.
Folks running a couple production apps that initially had about 300ms pauses reported drops to ~4ms and ~20ms. Another app reported their 95th percentile GC time went from 279ms to ~10ms.
Go 1.6 added polish and pushed some of the remaining work to the background. As a result, tests with heaps up to a bit over 200GB still saw a max pause time of 20ms, as shown in a slide from an early 2016 State of Go talk.
The same application that had 20ms pause times under 1.5 had 3-4ms pauses under 1.6, with about an 8GB heap and 150M allocations/minute.
Twitch, who use Go for their chat service, reported that by Go 1.7 pause times had been reduced to 1ms with lots of running goroutines.
1.8 took stack scanning out of the stop-the-world phase, bringing most pauses well under 1ms, even on large heaps. Early numbers look good. Occasionally applications still have code patterns that make a goroutine hard to pause, effectively lengthening the pause for all other threads, but generally it's fair to say the GC's background work is now usually much more important than GC pauses.
Some general observations on garbage collection, not specific to Go:
The frequency of collections depends on how quickly you use up the RAM you're willing to give to the process.
The amount of work each collection does depends in part on how many pointers are in use.
(That includes the pointers within slices, interface values, strings, etc.)
Rephrased, an application accessing lots of memory might still not have a GC problem if it only has a few pointers (e.g., it handles relatively few large []byte buffers), and collections happen less often if the allocation rate is low (e.g., because you applied sync.Pool to reuse memory wherever you were chewing through RAM most quickly).
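That example is Go-specific (sync.Pool), but the principle carries over to any garbage-collected runtime. A hypothetical Java sketch of the same idea, reusing a per-thread buffer instead of allocating a fresh one per request:

public class BufferReuse {
    // One reusable buffer per thread instead of a fresh byte[] per request:
    // the allocation rate (and hence the collection frequency) drops, at the
    // cost of keeping the buffer alive between calls.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[64 * 1024]);

    static int handleRequest(int payloadSize) {
        byte[] buf = BUFFER.get(); // reused, not reallocated
        int n = Math.min(payloadSize, buf.length);
        for (int i = 0; i < n; i++) {
            buf[i] = (byte) i;     // stand-in for filling the buffer with real data
        }
        return n;
    }
}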
So if you're looking at something involving heaps of hundreds of GB that's not naturally GC-friendly, I'd suggest you consider any of
writing in C or such
moving the bulky data out of the object graph. For example, you could manage data in an embedded DB like bolt, put it in an outside DB service, or use something like groupcache or memcache if you want more of a cache than a DB
running a set of smaller-heap'd processes instead of one big one
just carefully prototyping, testing, and optimizing to avoid memory issues.
The new Java ZGC garbage collector can now use 16 terabytes of memory and garbage collect in under 10ms.

How do I record and graph the object lifetime for Java applications?

We want to tune the memory generation pool sizes for our Java application. In order to do this, we first need to understand how the heap is used. In essence, we need to know the number, size, and lifetime of every object in the JVM heap. After we have collected this data, we should be able to find better-suited sizes for our young and tenured generation pools.
We base our tuning efforts on information found in the "Tuning Garbage Collection with the 5.0 JVM" whitepaper from Sun/Oracle. In section 3 (http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.%20Generations%7Coutline) they discuss generation sizing and show an example of an object lifetime graph, which is pretty much what we are trying to achieve for our application.
So far we have been able to record the number of instances of a given class and their respective sizes in memory. However, I am unable to find a way to extract the average instance lifetime. Right now we are looking into JProfiler, but so far without success.
Has anybody been successful in graphing the average object lifetime for Java applications?
To tune the GC you typically don't need the lifetime of every object, but a good realtime overview of the pools. The first thing I usually do is to look at the various pools using visualgc which is part of jvmstat (http://java.sun.com/performance/jvmstat/). Then I move on to check for potential memory leaks. This is by far the best way I've come across.
A. In jconsole you can see if you are constantly overflowing into the old generation prematurely (meaning that Eden's contents were too big to fit in the survivor space even after its GC). Should this be the case, check your young-generation size and survivor ratio and try to adjust them so that it doesn't overflow.
B. Also, during "normal" operation it's a good idea to look at the histogram of survivor generations in visualgc and make sure the generations are empty well ahead of becoming too old.
If they do spill over this way, you could potentially have a memory leak. I would then dump the memory with jconsole and have a look at it with MAT (http://www.eclipse.org/mat/):
Launch jconsole.exe and invoke the dumpHeap() operation on the HotSpotDiagnostic MBean (make sure the filename ends with .hprof); a programmatic alternative is sketched after these steps.
Open the dump in MAT and see if you recognize any type of object occupying more space than you would expect.
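If you prefer to trigger the dump programmatically instead of through jconsole, here is a minimal sketch against the same HotSpotDiagnostic MBean (the output filename is a placeholder):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        // The same dumpHeap operation that jconsole invokes on the MBean.
        HotSpotDiagnosticMXBean diagnostics = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (triggers a full GC first)
        diagnostics.dumpHeap("heap.hprof", true);
    }
}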
Good luck.
