Prometheus, how to get the actual Java Garbage Collector memory usage? - spring-boot

The Prometheus plugin in my Spring Boot app is sending tons of data, and I can't find any explanation of the meaning of what I get from the exporter:
1) What does "jvm_gc_memory_allocated_bytes_total" mean?
2) What does "jvm_gc_memory_promoted_bytes_total" mean?
What I need is the actual memory usage of the Java Garbage Collector, so I'm expecting a value that is always below 2 GB (the max memory size), but at the moment it is 8 GB and still rising.
"jvm_gc_memory_allocated_bytes_total"
and
"jvm_gc_memory_promoted_bytes_total"
are the only two garbage-collector-related metrics delivered by the exporter.

To answer your questions: there is a help text provided with each exposed metric in the Prometheus exposition format:
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
These metrics accumulate the bytes allocated in the young generation and the bytes that survived a garbage collection and were therefore promoted to the old generation (very simplified).
From your question, I think you are not actually looking for the "memory usage of the Java Garbage Collector" but rather for the managed memory usage of the JVM. This managed memory is divided into "heap" and "non-heap" (the area tag) on the first level and can be drilled down into further via the id tag.
Here are the metrics you are likely looking for:
jvm_memory_used_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_committed_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_max_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
So if you want to get hold of the currently used heap, you need to sum the heap-area metrics with the following PromQL:
sum(jvm_memory_used_bytes{job="myjob", instance="myhost", area="heap"})
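If you want to cross-check these numbers from inside the JVM itself, here is a minimal sketch using the standard java.lang.management API (my own illustration, not part of the Micrometer exporter). The overall heap figures correspond to the area="heap" sums above, and the pool names are what ends up in the id tag:
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // Overall heap figures, roughly the sum over the area="heap" pools
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("heap used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());

        // Per-pool breakdown; the pool name is what the id tag contains
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%s (%s): used=%d committed=%d%n",
                    pool.getName(), pool.getType(),
                    pool.getUsage().getUsed(), pool.getUsage().getCommitted());
        }
    }
}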

Related

relationship between container_memory_working_set_bytes and process_resident_memory_bytes and total_rss

I'm trying to understand the relationship between
container_memory_working_set_bytes vs process_resident_memory_bytes vs total_rss (container_memory_rss) + file_mapped, so as to be better equipped for alerting on a possible OOM.
The numbers go against my understanding (which is puzzling me right now), given that the container/pod is running a single process executing a compiled program written in Go.
Why is the difference between container_memory_working_set_bytes and process_resident_memory_bytes so big (nearly 10 times more)?
Also, the relationship between container_memory_working_set_bytes and container_memory_rss + file_mapped is weird here, something I did not expect after reading this:
The total amount of anonymous and swap cache memory (it includes transparent hugepages), and it equals to the value of total_rss from memory.status file. This should not be confused with the true resident set size or the amount of physical memory used by the cgroup. rss + file_mapped will give you the resident set size of cgroup. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
So the cgroup's total resident set size is rss + file_mapped. How can this value be less than container_working_set_bytes for a container that is running in the given cgroup?
This makes me feel that I'm getting something about these stats wrong.
Following are the PromQL queries used to build the above graph:
process_resident_memory_bytes{container="sftp-downloader"}
container_memory_working_set_bytes{container="sftp-downloader"}
go_memstats_heap_alloc_bytes{container="sftp-downloader"}
container_memory_mapped_file{container="sftp-downloader"} + container_memory_rss{container="sftp-downloader"}
So the relationship seems to be like this:
container_working_set_in_bytes = container_memory_usage_bytes - total_inactive_file
container_memory_usage_bytes, as its name implies, is the total memory used by the container. But since it also includes the file cache, i.e. inactive_file, which the OS can release under memory pressure, subtracting inactive_file gives container_working_set_in_bytes.
The relationship between container_memory_rss and the container's working set can be summed up using the following expression:
container_memory_usage_bytes = container_memory_cache + container_memory_rss
cache reflects data stored on disk that is currently cached in memory; it contains active_file + inactive_file (mentioned above).
This explains why the container_working_set was higher.
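To make the arithmetic concrete (with made-up numbers, not taken from the graphs above): suppose container_memory_usage_bytes is 500 MiB, split into container_memory_rss = 80 MiB and container_memory_cache = 420 MiB, of which 300 MiB is inactive_file and 120 MiB is active_file. Then container_memory_working_set_bytes = 500 MiB - 300 MiB = 200 MiB, which sits well above the 80 MiB of RSS even though most of that working set is reclaimable page cache.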
Ref #1
Ref #2
Does this help to make sense of the chart?
Not really an answer, but still two assorted points.
Here at my $dayjob, we have faced various issues with how tools external to the Go runtime count and display the memory usage of a process executing a program written in Go.
Coupled with the fact that Go's GC on Linux does not actually release freed memory pages to the kernel but merely madvise(2)s that such pages are MADV_FREE, a GC cycle which has freed quite a hefty amount of memory does not result in any noticeable change in the "process RSS" readings taken by external tooling (usually cgroups stats).
Hence we're exporting our own metrics, obtained by periodically calling runtime.ReadMemStats (and runtime/debug.ReadGCStats) in any major service written in Go, with the help of a simple package written specifically for that. These readings reflect the Go runtime's true idea of the memory under its control.
By the way, the NextGC field of the memory stats is super useful to watch if you have memory limits set for your containers because once that reading reaches or surpasses your memory limit, the process in the container is surely doomed to be eventually shot down by the oom_killer.

spring boot 2 actuator jvm.gc.memory.allocated Metric

I'm using Spring Boot 2.2.5 and monitoring my application metrics via Spring Boot Actuator and Grafana.
My application is packaged with Docker (OpenJDK 11) and configured with 4 GB of memory.
I'm having long GC pauses of around 1-2 seconds, and they are correlated with high jvm.gc.memory.allocated.
The jvm.gc.memory.allocated metric sometimes gets to 30 GB.
Can anyone explain the jvm.gc.memory.allocated metric? What does it mean?
This is a rather weird metric, if you ask me (in the sense that it is not trivial to grasp). Let's take it slow.
First of all, it is generated by Micrometer here, and if you read its description:
Incremented for an increase in the size of the young generation memory pool after one GC to before the next
it probably would make little sense to you either. I had to look at the code that computes it to understand what it does.
If you know some basic things about how a GC works and look at this code, things are rather simple, actually.
A generational garbage collector (like G1) divides the heap into regions (young and old). Allocations of new objects happen in the young region (unless they are humongous, but I will not get into that), specifically in the Eden space. Once a GC is triggered, Eden is cleared and allocations can happen again. This is rather simplified and not entirely correct (things are slightly different in the case of major/mixed collections).
Now that this theory is in place, you can look at the code that updates the counter after each GC of the young-generation pool:
if (youngGenPoolName != null) {
    // Eden usage just before this GC and just after it
    final long youngBefore = before.get(youngGenPoolName).getUsed();
    final long youngAfter = after.get(youngGenPoolName).getUsed();
    // Bytes allocated in Eden since the end of the previous GC
    final long delta = youngBefore - youngGenSizeAfter.get();
    // Remember Eden usage after this GC for the next iteration
    youngGenSizeAfter.set(youngAfter);
    if (delta > 0L) {
        allocatedBytes.increment(delta);
    }
}
Specifically, isYoungGenPool (the check that selects the young-generation pool) is defined here:
... endsWith("Eden Space")
As such, this code takes a snapshot of what the used space in Eden was before and after a GC cycle, computes a delta, and adds all these deltas into a single value. This is what jvm.gc.memory.allocated is.
This roughly measures how much your application allocates during its lifetime, but only via the young space. IMHO, you should look at it carefully, since:
it does not track humongous allocations (there is a different metric for this)
it only works for generational garbage collectors (Shenandoah, for example, is not such a collector)
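If you want to see the same before/after numbers that the metric is built from, here is a small, self-contained sketch (my own illustration, not the actual Micrometer binder) that listens to GC notifications and prints the Eden delta per collection; the pool-name check is a simplified stand-in for isYoungGenPool:
import com.sun.management.GarbageCollectionNotificationInfo;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class EdenAllocationWatcher {

    // Eden usage right after the previous GC, same role as youngGenSizeAfter above
    private static final AtomicLong edenAfterLastGc = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gc instanceof NotificationEmitter)) {
                continue;
            }
            ((NotificationEmitter) gc).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                Map<String, MemoryUsage> before = info.getGcInfo().getMemoryUsageBeforeGc();
                Map<String, MemoryUsage> after = info.getGcInfo().getMemoryUsageAfterGc();
                // Simplified stand-in for Micrometer's isYoungGenPool check
                before.forEach((pool, usage) -> {
                    if (pool.endsWith("Eden Space")) {
                        long delta = usage.getUsed() - edenAfterLastGc.get();
                        edenAfterLastGc.set(after.get(pool).getUsed());
                        if (delta > 0L) {
                            System.out.printf("allocated in Eden since last GC: %d bytes%n", delta);
                        }
                    }
                });
            }, null, null);
        }

        // Allocate some short-lived garbage so young GCs actually happen
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
        }
        Thread.sleep(1000);
    }
}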

Spring Data JPA Meta JpaMetamodelMappingContext Memory Consumption

My Spring Data JPA/Hibernate application consumes over 2 GB of memory at start without a single user hitting it. I am using Hazelcast as the second-level cache, but I had the same issue when I used Ehcache as well, so that is probably not the cause of the issue.
I ran a profile with a heap dump in VisualVM, and I see that the bulk of the memory is being consumed by JpaMetamodelMappingContext and, secondarily, a ton of Map objects. I just need help deciphering what I am seeing and whether this is actually a problem. I do have a hundred classes in the model, so this may be normal, but I have no point of reference. It just seems a bit excessive.
Once I get a load of 100 concurrent users, my memory consumption increases to 6-7 GB. That is quite normal for the amount of data I push around and cache, but I feel like if I could reduce the initial memory, I'd have a lot more room for growth.
I don't think you have a problem here.
Instead, I think you are misinterpreting the data you are looking at.
Note that the heap space diagram displays two numbers: Heap size and Used heap
Heap size (orange) is the amount of memory available to the JVM for the heap.
This means it is the amount that the JVM requested at some point from the OS.
Used heap is the part of the Heap size that is actually used.
Ignoring the startup phase, it grows linearly and then drops, repeatedly, over time.
This is typical behavior of an idling application.
Some part of the application generates a moderate amount of garbage (rising part of the curve) which from time to time gets collected.
The low points of that curve are the amount of memory you are actually really using.
It seems to be about 250 MB, which doesn't sound like much to me, especially when you say that a total consumption of 6-7 GB under actual load sounds reasonable to you.
Some other observations:
Both CPU load and heap grow fast and fluctuate a lot at startup.
This is to be expected because the analysis of repositories and entities happens at that time.
JpaMetamodelMappingContext's retained size is about 23 MB.
Again, a good chunk of memory, but not that huge.
This includes the stuff it references, which is almost exclusively metadata from the JPA implementation as you can easily see when you take a look at its source.

Understanding garbage collections and generations in dynatrace

I have gone through various websites to understand the garbage collector, and I got some idea of it. Using Dynatrace, I'm monitoring the performance of a server under load. Can someone explain to me what these metrics in the Dynatrace GC graph mean, such as generations, large object heap, GC-caused suspension, heap, transactions, etc., as in the attachment?
Thanks in advance.
On the left side, you have information about the different memory spaces, how big they are and if there was a GC in that space.
Basically, if an object survives garbage collections in one space, it gets promoted to the next generation. You also have the large object heap for larger objects.
On the right side, you have different metrics for the CLR: some basics like the number of transactions it currently handles, the number of threads, and the CPU used.
The GC suspension time shows how much time is spent in GC, i.e. cleaning up memory rather than doing "actual work". If you have a GC suspension of e.g. 30s in a one-minute interval, it means the CLR is cleaning up memory half of the time. This value should not constantly be over 15%.
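As a quick sanity check on those numbers: 30 s of suspension in a 60 s interval is 30 / 60 = 50% suspension time, while staying at or below the 15% guideline over the same one-minute interval means at most 0.15 × 60 s = 9 s of suspension.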

How do I record and graph the object lifetime for Java applications?

We want to tune the memory generation pool sizes for our Java application. In order to do this, we need to first understand how the heap is used. In essence, we need to know the number, size, and lifetime of every object in the JVM heap. After we have collected this data, we should be able to find better-suited sizes for our young and tenured generation pools.
We base our tuning efforts on information found in the "Tuning Garbage Collection with the 5.0 JVM" whitepaper from Sun/Oracle. In section 3 (http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.%20Generations%7Coutline) they discuss generation sizing and show an example of an object lifetime graph, pretty much what we are trying to achieve for our application.
So far we have been able to record the number of instances of a given class and their respective sizes in memory. However, I am unable to find a way to extract the average instance lifetime. Right now we are looking into JProfiler, but so far without success.
Has anybody been successful in graphing the average object lifetime for Java applications?
To tune the GC you typically don't need the lifetime of every object, but a good real-time overview of the pools. The first thing I usually do is look at the various pools using visualgc, which is part of jvmstat (http://java.sun.com/performance/jvmstat/). Then I move on to check for potential memory leaks. This is by far the best way I've come across.
A. In jconsole you can see if you are constantly overflowing into the old gen prematurely (meaning that the Eden contents were too big to fit into the survivor space even after its GC). Should this be the case, check your young size and survivor ratio and try to adjust them so that it doesn't overflow.
B. Also, during "normal" operation it's a good idea to look at the histogram of survivor generations in visualgc and make sure the generations are empty well ahead of becoming too old.
If they do spill over this way, you could potentially have a memory leak. I would then dump the memory with jconsole and have a look at it with MAT (http://www.eclipse.org/mat/):
Launch jconsole.exe and invoke the dumpHeap() operation on the HotSpotDiagnostic MBean (make sure the filename ends with .hprof).
Open the dump in MAT and see if you recognize any type of object occupying more space than you would expect.
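If clicking through jconsole is inconvenient, the same heap dump can also be triggered programmatically; a minimal sketch using the com.sun.management.HotSpotDiagnosticMXBean API (the output path here is just an example):
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws IOException {
        HotSpotDiagnosticMXBean diagnostic =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live = true dumps only reachable objects; the file must not already exist
        // and, for MAT, should end with .hprof
        diagnostic.dumpHeap("/tmp/myapp.hprof", true);
    }
}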
Good luck.
