Understanding garbage collections and generations in dynatrace - performance

I have gone through various websites to understand garbage collector and i got some idea about it.Using dynatrace I'm monitoring the performance of a server under load. Can someone explain me what are these metrics we get in dynatrace GC graph.Such as generations,large object heap,GC caused suspension heap,transactions etc as in the attachement.
Thanks in advance.

On the left side, you have information about the different memory spaces, how big they are and if there was a GC in that space.
Basically if an object survives garbage collections in one space, it gets promoted to the next generation. You also have the large object heap for larger files.
On the left side you have different metrics for the CLR. Some basics like the number of transactions it currently handles, the number of threads and used CPU.
The GC Suspension time shows how much time is spent in GC, so with cleaning up memory and not "actual work". If you have GC suspension of e.g. 30s of a minute interval it means half of the time the CLR is cleaning up memory. This value should not be over 15% constantly.

Related

Spring Data JPA Meta JpaMetamodelMappingContext Memory Consumption

My Spring Data JPA/Hibernate Application consumes over 2GB of memory at start without a single user hitting it. I am using Hazelcast as the second level cache but I had the same issue when I used ehCache as well so that is probably not the cause of the issue.
I ran a profile with a Heap Dump in Visual VM and I see where the bulk of the memory is being consumed by JpaMetamodelMappingContext and secondary a ton of Map objects. I just need help in deciphering what I am seeing and if this is actually a problem. I do have a hundred classes in the model so this may be normal but I have no point of reference. It just seems a bit excessive.
Once I get a load of 100 concurrent users, my memory consumption increases to 6-7 GB. That is quite normal for the amount of data I push around and cache, but I feel like if I could reduce the initial memory, I'd have a lot more room for growth.
I don't think you have a problem here.
Instead, I think you are misinterpreting the data you are looking at.
Note that the heap space diagram displays two numbers: Heap size and Used heap
Heap size (orange) is the amount of memory available to the JVM for the heap.
This means it is the amount that the JVM requested at some point from the OS.
Used heap is the part of the Heap size that is actually used.
Ignoring the startup phase, it grows linear and then drops repeatedly over time.
This is typical behavior of an idling application.
Some part of the application generates a moderate amount of garbage (rising part of the curve) which from time to time gets collected.
The low points of that curve are the amount of memory you are actually really using.
It seems to be about 250MB which doesn't sound very much to me, especially when you say that the total consumption of 6-7GB when actually working sounds reasonable to you.
Some other observations:
Both CPU load and heap grows fast/fluctuates a lot at start time.
This is to be expected because the analysis of repositories and entities happen at that time.
JpaMetamodelMappingContext s retained size is about 23MB.
Again, a good chunk of memory, but not that huge.
This includes the stuff it references, which is almost exclusively metadata from the JPA implementation as you can easily see when you take a look at its source.

Why Go can lower GC pauses to sub 1ms and JVM has not?

So there's that: https://groups.google.com/forum/?fromgroups#!topic/golang-dev/Ab1sFeoZg_8:
Today I submitted changes to the garbage collector that make typical worst-case stop-the-world times less than 100 microseconds. This should particularly improve pauses for applications with many active goroutines, which could previously inflate pause times significantly.
High GC pauses are one of the things JVM users struggle with for a long time.
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to Go levels, but are not affecting Go?
2021 Update: With OpenJDK 16 ZGC now has a max pause time of <1ms and average pause times 50µs
It achieves these goals while still performing compaction, unlike Go's collector.
Update: With OpenJDK 17 Shenandoah exploits the same techniques introduced by ZGC and achieves similar results.
What are the (architectural?) constraints which prevent JVM from lowering GC pauses to golang levels
There aren't any fundamental ones as low-pause GCs have existed for a while (see below). So this may be more a difference of impressions either from historic experience or out-of-the-box configuration rather than what is possible.
High GC pauses are one if the things JVM users struggle with for a long time.
A little googling shows that similar solutions are available for java too
Azul offers a pauseless collector that scales even to 100GB+
Redhat is contributing shenandoah to openjdk and oracle zgc.
IBM offers metronome, also aiming for microsecond pause times
various other realtime JVMs
The other collectors in openjdk are, unlike Go's, compacting generational collectors. That is to avoid fragmentation problems and to provide higher throughput on server-class machines with large heaps by enabling bump pointer allocation and reducing the CPU time spent in GC. And at least under good conditions CMS can achieve single-digit millisecond pauses, despite being paired with a moving young-generation collector.
Go's collector is non-generational, non-compacting and requires write barriers (see this other SO question), which results in lower throughput/more CPU overhead for collections, higher memory footprint (due to fragmentation and needing more headroom) and less cache-efficient placement of objects on the heap (non-compact memory layout).
So GoGC is mostly optimized for pause time while staying relatively simple (by GC standards) at the expense of several other performance and scalability goals.
JVM GCs make different tradeoffs. The older ones often focused on throughput. The more recent ones achieve low pause times and several other goals at the expense of higher complexity.
According to this presentation, Getting to Go: The Journey of Go's Garbage Collector, the Go collectors only utilize half of the heap for live data:
Heap 2X live heap
My impression is that Java GCs generally aim for higher heap utilization, so they make a very different trade-off here.

How fast is the go 1.5 gc with terabytes of RAM?

Java cannot use terabytes of RAM because the GC pause is way too long (minutes). With the recent update to the Go GC, I'm wondering if its GC pauses are short enough for use with huge amounts of RAM, such as a couple of terabytes.
Are there any benchmarks of this yet? Can we use a garbage-collected language with this much RAM now?
tl;dr:
You can't use TBs of RAM with a single Go process right now. Max is 512 GB on Linux, and most that I've seen tested is 240 GB.
With the current background GC, GC workload tends to be more important than GC pauses.
You can understand GC workload as pointers * allocation rate / spare RAM. Of apps using tons of RAM, only those with few pointers or little allocation will have a low GC workload.
I agree with inf's comment that huge heaps are worth asking other folks about (or testing). JimB notes that Go heaps have a hard limit of 512 GB right now, and 18 240 GB is the most I've seen tested.
Some things we know about huge heaps, from the design document and the GopherCon 2015 slides:
The 1.5 collector doesn't aim to cut GC work, just cut pauses by working in the background.
Your code is paused while the GC scans pointers on the stack and in globals.
The 1.5 GC has a short pause on a GC benchmark with a roughly 18GB heap, as shown by the rightmost yellow dot along the bottom of this graph from the GopherCon talk:
Folks running a couple production apps that initially had about 300ms pauses reported drops to ~4ms and ~20ms. Another app reported their 95th percentile GC time went from 279ms to ~10ms.
Go 1.6 added polish and pushed some of the remaining work to the background. As a result, tests with heaps up to a bit over 200GB still saw a max pause time of 20ms, as shown in a slide in an early 2016 State of Go talk:
The same application that had 20ms pause times under 1.5 had 3-4ms pauses under 1.6, with about an 8GB heap and 150M allocations/minute.
Twitch, who use Go for their chat service, reported that by Go 1.7 pause times had been reduced to 1ms with lots of running goroutines.
1.8 took stack scanning out of the stop-the-world phase, bringing most pauses well under 1ms, even on large heaps. Early numbers look good. Occasionally applications still have code patterns that make a goroutine hard to pause, effectively lengthening the pause for all other threads, but generally it's fair to say the GC's background work is now usually much more important than GC pauses.
Some general observations on garbage collection, not specific to Go:
The frequency of collections depends on how quickly you use up the RAM you're willing to give to the process.
The amount of work each collection does depends in part on how many pointers are in use.
(That includes the pointers within slices, interface values, strings, etc.)
Rephrased, an application accessing lots of memory might still not have a GC problem if it only has a few pointers (e.g., it handles relatively few large []byte buffers), and collections happen less often if the allocation rate is low (e.g., because you applied sync.Pool to reuse memory wherever you were chewing through RAM most quickly).
So if you're looking at something involving heaps of hundreds of GB that's not naturally GC-friendly, I'd suggest you consider any of
writing in C or such
moving the bulky data out of the object graph. For example, you could manage data in an embedded DB like bolt, put it in an outside DB service, or use something like groupcache or memcache if you want more of a cache than a DB
running a set of smaller-heap'd processes instead of one big one
just carefully prototyping, testing, and optimizing to avoid memory issues.
The new Java ZGC garbage collector can now use 16 Terrabytes of memory and garbage collect in under 10ms.

How to gain control of a 5GB heap in Haskell?

Currently I'm experimenting with a little Haskell web-server written in Snap that loads and makes available to the client a lot of data. And I have a very, very hard time gaining control over the server process. At random moments the process uses a lot of CPU for seconds to minutes and becomes irresponsive to client requests. Sometimes memory usage spikes (and sometimes drops) hundreds of megabytes within seconds.
Hopefully someone has more experience with long running Haskell processes that use lots of memory and can give me some pointers to make the thing more stable. I've been debugging the thing for days now and I'm starting to get a bit desperate here.
A little overview of my setup:
On server startup I read about 5 gigabytes of data into a big (nested) Data.Map-alike structure in memory. The nested map is value strict and all values inside the map are of datatypes with all their field made strict as well. I've put a lot of time in ensuring no unevaluated thunks are left. The import (depending on my system load) takes around 5-30 minutes. The strange thing is the fluctuation in consecutive runs is way bigger than I would expect, but that's a different problem.
The big data structure lives inside a 'TVar' that is shared by all client threads spawned by the Snap server. Clients can request arbitrary parts of the data using a small query language. The amount of data request usually is small (upto 300kb or so) and only touches a small part of the data structure. All read-only request are done using a 'readTVarIO', so they don't require any STM transactions.
The server is started with the following flags: +RTS -N -I0 -qg -qb. This starts the server in multi-threaded mode, disable idle-time and parallel GC. This seems to speed up the process a lot.
The server mostly runs without any problem. However, every now and then a client request times out and the CPU spikes to 100% (or even over 100%) and keeps doing this for a long while. Meanwhile the server does not respond to request anymore.
There are few reasons I can think of that might cause the CPU usage:
The request just takes a lot of time because there is a lot of work to be done. This is somewhat unlikely because sometimes it happens for requests that have proven to be very fast in previous runs (with fast I mean 20-80ms or so).
There are still some unevaluated thunks that need to be computed before the data can be processed and sent to the client. This is also unlikely, with the same reason as the previous point.
Somehow garbage collection kicks in and start scanning my entire 5GB heap. I can imagine this can take up a lot of time.
The problem is that I have no clue how to figure out what is going on exactly and what to do about this. Because the import process takes such a long time profiling results don't show me anything useful. There seems to be no way to conditionally turn on and off the profiler from within code.
I personally suspect the GC is the problem here. I'm using GHC7 which seems to have a lot of options to tweak how GC works.
What GC settings do you recommend when using large heaps with generally very stable data?
Large memory usage and occasional CPU spikes is almost certainly the GC kicking in. You can see if this is indeed the case by using RTS options like -B, which causes GHC to beep whenever there is a major collection, -t which will tell you statistics after the fact (in particular, see if the GC times are really long) or -Dg, which turns on debugging info for GC calls (though you need to compile with -debug).
There are several things you can do to alleviate this problem:
On the initial import of the data, GHC is wasting a lot of time growing the heap. You can tell it to grab all of the memory you need at once by specifying a large -H.
A large heap with stable data will get promoted to an old generation. If you increase the number of generations with -G, you may be able to get the stable data to be in the oldest, very rarely GC'd generation, whereas you have the more traditional young and old heaps above it.
Depending the on the memory usage of the rest of the application, you can use -F to tweak how much GHC will let the old generation grow before collecting it again. You may be able to tweak this parameter to make this un-garbage collected.
If there are no writes, and you have a well-defined interface, it may be worthwhile making this memory un-managed by GHC (use the C FFI) so that there is no chance of a super-GC ever.
These are all speculation, so please test with your particular application.
I had a very similar issue with a 1.5GB heap of nested Maps. With the idle GC on by default I would get 3-4 secs of freeze on every GC, and with the idle GC off (+RTS -I0), I would get 17 secs of freeze after a few hundred queries, causing a client time-out.
My "solution" was first to increase the client time-out and asking that people tolerate that while 98% of queries were about 500ms, about 2% of the queries would be dead slow. However, wanting a better solution, I ended up running two load-balanced servers and taking them offline from the cluster for performGC every 200 queries, then back in action.
Adding insult to injury, this was a rewrite of an original Python program, which never had such problems. In fairness, we did get about 40% performance increase, dead-easy parallelization and a more stable codebase. But this pesky GC problem...

Java heap bottleneck - how to identify the cause?

I have a J2EE project running on JBoss, with a maximum heap size of 2048m, which is giving strange results under load testing. I've benchmarked the heap and cpu usage and received the following results (series 1 is heap usage, series 2 is cpu usage):
It seems as if the heap is being used properly and getting garbage collected properly around A. When it gets to B however, there appears to be some kind of a bottleneck as there is heap space available, but it never breaks that imaginary line. At the same time, at C, the cpu usage drops dramatically. During this period we also receive an "OutOfMemoryError (GC overhead limit exceeded)," which does not make much sense to me as there is heap space available.
My guess is that there is some kind of bottleneck, but what exactly I can't even imagine. How would you suggest going about finding the cause of the issue? I've profiled the memory usage and noticed that there are quite a few instances of the one class (around a million), but the total size of these instances is fairly small (around 50MB if I remember correctly).
Edit: The server is dedicated to to this application and the CPU usage given is only for the JVM (there should not be any significant CPU usage outside of the JVM). The memory usage is only for the heap, it does not include the permgen space. This problem is reproducible. My main concern is surrounding the limit encountered around B, for which I have not found a plausible explanation yet.
Conclusion: Turns out this was caused by a bunch of long running SQL queries being called concurrently. The returned ResultSets were also very large, possibly explaining the OOME. I still have no reasonable explanation for why there appears to be some limit at B.
From the error message it appears that the JVM is using the parallel scavenger algorithm for garbage collection. The message is dumped along with an OOME error when a lot of time is spent on GC, but not a lot of the heap is recovered.
The document from Sun does not specify if the 98% of the total time consumed is to be read as 98% of the CPU utilization of the process or that of the CPU itself. In either case, I have to draw the following inferences (with limited information):
The garbage collector or the JVM process does not have enough CPU utilization, most likely due to other processes consuming CPU at the same time.
The garbage collector does not have enough CPU utilization since it is a low priority thread, and another memory intensive (but not CPU intensive) thread in the JVM is doing work at the same time, which results in the failure to de-allocate memory.
Based on the above inferences (all, one or none of them could be true), it would be worthwhile to correlate the graph that you're obtained with the runtime behavior of the application as far as users are concerned. In other words, you might find it useful to determine if other processes are kicked off (when your problem occurs), or the part of the application that is in operation (again, when the problem occurs).
In any case, the page referenced above, does give an option to disable the GC overhead limit used by the GC algorithm.
EDIT: If the problem occurs periodically, and can be reproduced, it might turn out to be a memory leak, otherwise (i.e. it occurs sporadically), you are better off tuning the GC algorithm or even changing it.
If I want to know where the "bottlenecks" are, I just get a few stackshots. There's no need to wonder and guess and play detective. They will just tell you.
Usually memory problems and performance problems go hand in hand, so if you fix the performance problems, you will also fix the memory problems (not for certain, though).

Resources