Using the G1GC algorithm in prod, do we have any option to force a major GC after a threshold level is reached? - g1gc

We are using the G1GC algorithm in PRODUCTION, enabled with -XX:+UseG1GC in the environment. My question: a few servers in PROD are hitting high heap usage, around 95%, so do we have any argument, option, or mechanism to make a major GC happen once a threshold level is reached?

From the Oracle docs:
-XX:G1ReservePercent=n : Sets the amount of heap that is reserved as a false ceiling to reduce the possibility of promotion failure. The default value is 10.
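For illustration, a hypothetical launch command that raises the reserve above its 10% default (the heap size and jar name are made up, not a recommendation):

    java -XX:+UseG1GC -XX:G1ReservePercent=20 -Xms8g -Xmx8g -jar myapp.jar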

Related

JVM Tuning options

Our application shows frequent Full GCs after about 24 hours under load, forcing restarts.
When we analyzed our GC logs, they indicated that out of the 20GB max heap, only 1GB was used for the young generation and 19GB was used by the old generation.
The command options were -Xms2g -Xmx20g.
Should -Xms2g be bumped to 10GB, or made equal to 20GB, so that the default NewRatio of 2 lets the young generation use a larger portion of the heap?
We monitored the JVM heap using jmap -histo and found a pattern in when the heap size spikes.
We also analyzed the GC logs using http://gceasy.io
It turned out some Hibernate entities were piling up in the transaction, which was an issue in the code.
Correcting that in the code helped keep the heap under control and stopped the spikes.
On another note, although the new-to-old ratio was not specified, so the JVM should have used a 1:2 ratio, that was not what we observed.
Hence we explicitly provided values for NewSize and MaxNewSize, which gave better results.
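For illustration, a sketch of the kind of options described above, with made-up sizes rather than recommendations (equal -Xms/-Xmx plus explicit young-generation bounds):

    java -Xms20g -Xmx20g -XX:NewSize=5g -XX:MaxNewSize=5g -Xloggc:gc.log -jar myapp.jar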

Is higher or lower ALS.checkpointInterval better?

When setting ALS.checkpointInterval, what considerations should be taken into account? What does a higher or lower interval mean?
The ALS.checkpointInterval value specifies after how many iterations the cached intermediate data gets checkpointed.
E.g. if the interval is set to 10, the cache gets checkpointed every 10 iterations.
Checkpointing helps with recovery (when nodes fail) and with StackOverflowError caused by long lineage chains. It also helps eliminate temporary shuffle files on disk, which can be important when there are many ALS iterations.
The default value is 10, so you can set a lower or higher value depending on your system's memory and stability.
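A minimal sketch of setting it through the Spark ML Java API; the checkpoint directory, input path, and column names here are assumptions, not part of the original question:

    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class AlsCheckpointDemo {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("als-checkpoint-demo").getOrCreate();
            // A checkpoint directory must be set, otherwise the interval has no effect.
            spark.sparkContext().setCheckpointDir("hdfs:///tmp/als-checkpoints"); // hypothetical path

            Dataset<Row> ratings = spark.read().parquet("hdfs:///data/ratings"); // hypothetical input

            ALS als = new ALS()
                    .setMaxIter(50)
                    .setRank(10)
                    .setCheckpointInterval(5)  // checkpoint every 5 iterations instead of the default 10
                    .setUserCol("userId")      // assumed column names
                    .setItemCol("itemId")
                    .setRatingCol("rating");

            ALSModel model = als.fit(ratings);
            spark.stop();
        }
    }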

Performance impact of using setStatsSampleRate/topology.stats.sample.rate

What is the performance impact of setting topology.stats.sample.rate: 1.0 in yaml?
How does this work?
topology.stats.sample.rate configures the rate at which Storm topology statistics are calculated.
The default value in defaults.yaml is 0.05. This means only five out of every 100 events are taken into account.
A value of 1 means every tuple's statistics are calculated.
Is this going to decrease performance? Most people will likely say yes, but since every environment is different, I would say it is better to measure it yourself: increase and decrease the value and measure the throughput of your topology.
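For example, a hedged sketch of setting the rate programmatically when submitting a topology (the topology name and wiring are assumptions; older Storm releases use the backtype.storm package instead of org.apache.storm):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class SampleRateDemo {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... spouts and bolts would be wired up here ...

            Config conf = new Config();
            // Equivalent to topology.stats.sample.rate: 1.0 in the YAML:
            // sample every tuple instead of the default 5%.
            conf.setStatsSampleRate(1.0d);

            StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
        }
    }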

Why does ElasticSearch Heap Size not return to normal?

I have set up Elasticsearch and it works great.
I've done a few bulk inserts and a bit of load testing. However, it has been idle for a while and I'm not sure why the heap size doesn't drop back to about 50MB, which is what it was at startup. I'm guessing GC hasn't happened?
Please note the nodes are running on different machines on AWS. They are all on small instances and each instance has 1.7GB of RAM.
Any ideas?
Probably. It's hard to say; the JVM manages the memory and does what it thinks is best. It may be avoiding GC cycles because they simply aren't necessary. In fact, it's recommended to set mlockall to true, so that the heap is fully allocated at startup and will never change.
It's not really a problem that ES is using memory for heap...memory is to be used, not saved. Unless you are having memory problems, I'd just ignore it and continue on.
Elasticsearch and Lucene maintain cache data to perform fast sorts and facets.
If your queries are doing sorts, this may increase the Lucene FieldCache size, which may not be released because the objects held there are not eligible for GC.
So the default threshold (CMSInitiatingOccupancyFraction) of 75% does not help here.
You can manage FieldCache duration as explained here : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
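As a sketch, the linked page covers settings along these lines in elasticsearch.yml to bound the fielddata cache (the 40% figure is only an example value, not a recommendation):

    indices.fielddata.cache.size: 40%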

How could I tell if my hadoop config parameter io.sort.factor is too small or too big?

After reading http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html we came to the conclusion that our 6-node Hadoop cluster could use some tuning, and io.sort.factor seems to be a good candidate, as it controls an important tradeoff. We're planning on tweaking and testing, but planning ahead and knowing what to expect and what to watch for seems reasonable.
It's currently at 10. How would we know that it's causing too many merges? When we raise it, how would we know it's causing too many files to be opened?
Note that we can't follow the blog's log extracts directly, as the blog targets CDH3b2 while we're working on CDH3u2, and things have changed...
There are a few tradeoffs to consider.
The number of seeks done when merging files: if you increase the merge factor too high, the seek cost on disk will exceed the savings from doing a parallel merge (note that the OS cache might mitigate this somewhat).
Increasing the sort factor decreases the amount of data in each partition; I believe each partition of sorted data ends up around io.sort.mb / io.sort.factor. The general rule of thumb is to have io.sort.mb = 10 * io.sort.factor (this is based on the seek latency of the disk relative to the transfer speed, I believe; it could surely be tuned better if it were your bottleneck). If you keep these in line with each other, the seek overhead from merging should be minimized.
If you increase io.sort.mb, then you increase memory pressure on the cluster, leaving less memory available for job tasks. Memory usage for sorting is the number of mapper tasks * io.sort.mb, so you could find yourself causing extra GCs if this is too high.
Essentially,
If you find yourself swapping heavily, then there's a good chance you have set the sort factor too high.
If the ratio between io.sort.mb and io.sort.factor isn't correct, then you may need to change io.sort.mb (if you have the memory) or lower the sort factor.
If you find that you are spending more time in your mappers than in your reducers, then you may want to increase the number of map tasks and decrease the sort factor (assuming there is memory pressure).
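For illustration, here's a sketch of trying a different value through the old (pre-YARN) API that CDH3 ships with; the numbers are placeholders to experiment with, not recommendations:

    import org.apache.hadoop.mapred.JobConf;

    public class SortTuning {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Merge up to 25 spill files at once instead of the default 10.
            conf.setInt("io.sort.factor", 25);
            // Keep io.sort.mb roughly 10x the sort factor, per the rule of thumb above.
            conf.setInt("io.sort.mb", 250);
            // ... the job would be configured and submitted from here ...
        }
    }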

Resources