Why is my Elasticsearch cluster's heap usage so high? - elasticsearch

Why is my Elasticsearch cluster's heap usage so high while there are only a few requests?

Elasticsearch uses memory for a few different purposes, all in order to provide faster search times and high indexing throughput.
The heap usage you pasted doesn't look high. Elasticsearch runs as a Java process in the JVM. The JVM manages the heap and reclaims memory through garbage collection. Looking at a heap usage graph you will usually see the usage increase steadily and then drop suddenly - those drops happen when the GC (garbage collector) releases unused memory.
Do you have a specific issue with memory? It looks OK - ranging between 40% and 70% is perfectly normal.
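If you want to watch that GC sawtooth yourself, the _cat/nodes API exposes the per-node heap figures. Here is a minimal Java sketch that polls it every 30 seconds; it assumes an unsecured node on localhost:9200, so adjust the URL and add authentication/TLS for your own cluster.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WatchEsHeap {
    public static void main(String[] args) throws Exception {
        // Assumes an unsecured Elasticsearch node on localhost:9200 (adjust as needed).
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max"))
                .GET()
                .build();
        for (int i = 0; i < 10; i++) {
            // Sampled over time, heap.percent climbs steadily and drops after each GC cycle.
            System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
            Thread.sleep(30_000);
        }
    }
}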

Related

Cassandra java process using more memory than its allocated max heap size (Xmx)

We have a Cassandra cluster running Apache Cassandra 3.11.4 on a set of 18 Unix hosts. Each of these hosts has 96G of RAM and we have configured the heap size with -Xms64G -Xmx64G, but the top command (top -M) on the hosts shows the actual memory utilization is ~85G on average, i.e. much higher than the allocated heap (64G).
The memory usage trend looks like this: during startup of the Cassandra daemon, top -M already shows the process occupying ~75G, which is (75G-64G)=11G more than the allocated heap size. This memory utilization increases over time, reaches a maximum of ~85G within just 3-4 hours, and remains at that level from then on, while the heap utilization (~40-50%) is normal and GC activity is usual: minor GCs kick in as expected.
We have confirmed that the total off-heap memory used by all the keyspaces is below 2G on each host.
We are unable to trace what else is consuming the RAM in addition to the allocated heap.
Besides the heap memory, Cassandra also uses off-heap memory, for example for keeping compression metadata, bloom filters, and some other things. From the documentation (1, 2):
Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
Bloom filters are stored in RAM, but are stored offheap, so operators should not consider bloom filters when selecting the maximum heap size.
You can monitor heap & off-heap memory usage using JMX, for example. (I've seen setups where the bloom filters alone occupied ~40GB of RAM, but that was heavily dependent on the number of unique partition keys.)
Too big a heap is usually not recommended because it can cause long GC pauses, etc. It of course depends on the workload, but you can try 31GB or lower (or just use the default settings). Plus, you need to leave memory for the Linux file buffers so that frequently used files are cached. That is the reason why, by default, Cassandra allocates only 1/4 of the system memory to the heap.
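To get started with the JMX route, here is a minimal Java sketch that connects to a node's JMX endpoint and reads the JVM's heap and non-heap figures; it assumes Cassandra's default JMX port 7199 with no authentication. Keep in mind that NonHeapMemoryUsage only covers JVM-managed areas (metaspace, code cache, etc.), so Cassandra's native allocations for bloom filters, compression metadata and friends have to be read from its own org.apache.cassandra.metrics MBeans (or nodetool info) instead.

import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxMemoryProbe {
    public static void main(String[] args) throws Exception {
        // Assumes Cassandra's default local JMX port 7199 and no JMX authentication.
        String url = "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi";
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            MemoryUsage heap = MemoryUsage.from(
                    (CompositeData) mbs.getAttribute(memory, "HeapMemoryUsage"));
            MemoryUsage nonHeap = MemoryUsage.from(
                    (CompositeData) mbs.getAttribute(memory, "NonHeapMemoryUsage"));
            System.out.printf("heap used:     %d MB%n", heap.getUsed() >> 20);
            System.out.printf("non-heap used: %d MB%n", nonHeap.getUsed() >> 20);
            // Note: this non-heap figure does NOT include Cassandra's native off-heap
            // structures (bloom filters, compression metadata); those are exposed under
            // the org.apache.cassandra.metrics MBeans.
        } finally {
            connector.close();
        }
    }
}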

Prometheus, how to get the actual Java Garbage Collector memory usage?

The Prometheus plugin in a Spring Boot app is sending tons of data, and I can't find any explanation of the meaning of what I get from the exporter:
1) What does "jvm_gc_memory_allocated_bytes_total" mean?
2) What does "jvm_gc_memory_promoted_bytes_total" mean?
What I need is the actual memory usage of the Java Garbage Collector, so I'm expecting a value which is always below 2GB (the max memory size), but at the moment it is 8GB and still rising.
"jvm_gc_memory_allocated_bytes_total"
and
"jvm_gc_memory_promoted_bytes_total"
are the only two Garbage Collector related variables delivered from the exporter.
To answer your questions, there is a help text provided with each exposed metric in the Prometheus exposition format:
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
These metrics accumulate the bytes allocated in the young generation and the bytes that survived a garbage collection and were therefore promoted to the old generation (very simplified).
From your question, I think you are actually not looking for the "memory usage of the Java Garbage Collector" but rather for the managed memory usage of the JVM. This managed memory is divided into "heap" and "non-heap" (the area tag) on the first level and can be drilled down into further via the id tag.
Here's the metrics you are likely looking for:
jvm_memory_used_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_committed_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
jvm_memory_max_bytes{area="heap|nonheap", id="<depends-on-gc-and-jvm>"}
So if you want to get ahold of the currently used heap, you need to sum together the heap area metrics with the following PromQL:
sum(jvm_memory_used_bytes{job="myjob", instance="myhost", area="heap"})
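For reference, the area and id tags map onto the JVM's memory pools (the exact pool names depend on the GC and JVM in use, hence the placeholder above). The following plain-Java sketch uses only java.lang.management, no Micrometer required, to list the pools and sum the heap ones, which is what the PromQL sum above computes on the server side.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class ListMemoryPools {
    public static void main(String[] args) {
        long heapUsed = 0;
        // Each pool corresponds to one id value of the jvm_memory_* metrics;
        // its type (HEAP / NON_HEAP) corresponds to the area tag.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-8s %-30s used=%d bytes%n",
                    pool.getType(), pool.getName(), pool.getUsage().getUsed());
            if (pool.getType() == MemoryType.HEAP) {
                heapUsed += pool.getUsage().getUsed();
            }
        }
        System.out.println("total heap used = " + heapUsed + " bytes");
    }
}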

ElasticSearch heap size: half of all RAM or half of available?

This may be a silly question, but I'm really confused. According to this document, Give less than half your memory to Lucene, it is recommended to give 50% of memory to ES and Lucene will take the rest. My question is: I have an 8GB machine with some daemons running, which leaves 4GB available. Should I set the ES heap size to half of 8 or half of 4?
If you have 8GB of physical memory and 4GB is already taken by other processes, no other process will be able to use that space.
This means that you effectively have 4GB of memory left to share between ES (for the heap) and Lucene (for the file system cache), i.e. 2GB for the ES heap, and Lucene will take what's left.

Are Solr's caches created in the JVM's heap memory or in off-heap (direct) memory?

I've looked for an answer to this but haven't found one. The official Solr documentation on caches doesn't say anything about this, and while the official docs on Performance Issues touch a bit on heap and memory consumption, they still don't specify how the caches are handled.
I think it's an important question, because when configuring large caches, they are clearly what will take the most memory compared to other Solr components/actions, so it would be nice to know whether I should adjust the JVM max heap parameter (-Xmx) or the direct memory parameter (-XX:MaxDirectMemorySize).
Solr caches are on the JVM Heap.
Heliosearch, a Solr fork that is currently a drop-in replacement, does store the filter caches off-heap.
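So for the stock Solr caches it is -Xmx that needs the headroom; -XX:MaxDirectMemorySize only caps direct (off-heap) buffers. If you want to see the two areas side by side in any running JVM, a sketch like the following (standard library only, no Solr-specific API assumed) shows where memory is actually landing.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapVsDirect {
    public static void main(String[] args) {
        // On-heap usage: this is where Solr's caches count, so size -Xmx for them.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("heap used/max: " + mem.getHeapMemoryUsage().getUsed()
                + " / " + mem.getHeapMemoryUsage().getMax());
        // Direct and mapped buffers: capped by -XX:MaxDirectMemorySize, not by -Xmx.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + " buffers used: " + pool.getMemoryUsed() + " bytes");
        }
    }
}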

Java heap bottleneck - how to identify the cause?

I have a J2EE project running on JBoss, with a maximum heap size of 2048m, which is giving strange results under load testing. I've benchmarked the heap and CPU usage and received the following results (series 1 is heap usage, series 2 is CPU usage):
It seems as if the heap is being used and garbage collected properly around A. When it gets to B, however, there appears to be some kind of bottleneck: there is heap space available, but it never breaks that imaginary line. At the same time, at C, the CPU usage drops dramatically. During this period we also receive an "OutOfMemoryError (GC overhead limit exceeded)", which does not make much sense to me as there is heap space available.
My guess is that there is some kind of bottleneck, but I can't even imagine what exactly. How would you suggest going about finding the cause of the issue? I've profiled the memory usage and noticed that there are quite a few instances of one class (around a million), but the total size of these instances is fairly small (around 50MB, if I remember correctly).
Edit: The server is dedicated to this application and the CPU usage given is only for the JVM (there should not be any significant CPU usage outside of the JVM). The memory usage is only for the heap; it does not include the permgen space. This problem is reproducible. My main concern is the limit encountered around B, for which I have not found a plausible explanation yet.
Conclusion: Turns out this was caused by a bunch of long running SQL queries being called concurrently. The returned ResultSets were also very large, possibly explaining the OOME. I still have no reasonable explanation for why there appears to be some limit at B.
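(A side note on that conclusion: if large ResultSets are the culprit, most JDBC drivers let you bound how much of a result set is held on the heap at once via the fetch size. A rough sketch follows; the connection URL and query are made up, and how streaming is actually enabled is driver-dependent, e.g. PostgreSQL only honours the fetch size with autocommit disabled.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamedQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the point is to avoid materializing the
        // whole result set on the heap at once.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost/app", "user", "secret")) {
            conn.setAutoCommit(false); // required by some drivers for fetch-size streaming
            try (PreparedStatement stmt = conn.prepareStatement("SELECT id, payload FROM big_table")) {
                stmt.setFetchSize(500); // fetch rows in batches of 500 instead of all at once
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getLong("id"), rs.getString("payload"));
                    }
                }
            }
        }
    }

    private static void process(long id, String payload) {
        // placeholder for per-row work
    }
}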
From the error message it appears that the JVM is using the parallel scavenger algorithm for garbage collection. The message is dumped along with an OOME error when a lot of time is spent on GC, but not a lot of the heap is recovered.
The document from Sun does not specify whether the 98% of total time consumed should be read as 98% of the CPU utilization of the process or of the CPU itself. In either case, I have to draw the following inferences (with limited information):
The garbage collector or the JVM process does not have enough CPU utilization, most likely due to other processes consuming CPU at the same time.
The garbage collector does not have enough CPU utilization since it is a low priority thread, and another memory intensive (but not CPU intensive) thread in the JVM is doing work at the same time, which results in the failure to de-allocate memory.
Based on the above inferences (all, one, or none of them could be true), it would be worthwhile to correlate the graph that you have obtained with the runtime behavior of the application as far as users are concerned. In other words, you might find it useful to determine whether other processes are kicked off when the problem occurs, or which part of the application is in operation at that point.
In any case, the page referenced above does give an option to disable the GC overhead limit used by the GC algorithm.
EDIT: If the problem occurs periodically and can be reproduced, it might turn out to be a memory leak; otherwise (i.e. if it occurs sporadically), you are better off tuning the GC algorithm or even changing it.
If I want to know where the "bottlenecks" are, I just get a few stackshots. There's no need to wonder and guess and play detective. They will just tell you.
Usually memory problems and performance problems go hand in hand, so if you fix the performance problems, you will also fix the memory problems (not for certain, though).
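To take those stackshots, jstack <pid> works from outside the process, or you can sample from inside the JVM with something along these lines (standard library only; the sample count and interval are arbitrary). Whatever keeps showing up near the top of the busy threads' stacks is where the time is going.

import java.util.Map;

public class StackShots {
    public static void main(String[] args) throws Exception {
        for (int shot = 0; shot < 5; shot++) {
            System.out.println("=== stackshot " + shot + " ===");
            // Dump only threads that are actually running; idle pool threads just add noise.
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                if (e.getKey().getState() == Thread.State.RUNNABLE) {
                    System.out.println(e.getKey().getName());
                    for (StackTraceElement frame : e.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
            Thread.sleep(5_000);
        }
    }
}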
