Why does Elasticsearch Cluster JVM Memory Pressure keep increasing?

The JVM Memory Pressure of my AWS Elasticsearch cluster has been increasing consistently. The pattern I have seen for the last 3 days is that it increases by 1.1% every hour. This is for one of the 3 master nodes I have provisioned.
All other metrics seem to be in the normal range. CPU usage is under 10% and there are barely any indexing or search operations being performed.
I have tried clearing the fielddata cache for all indices as mentioned in this document, but that has not helped.
Can anyone help me understand what might be the reason for this?

I got this answer from AWS Support:
I checked the particular metric and can also see the JVM memory pressure increasing over the last few days. However, I do not think this is an issue, as JVM memory pressure is expected to increase over time. Also, garbage collection in ES runs once the JVM reaches 75% (currently it is around 69%), after which you will see a drop in the JVM metric of your cluster. JVM memory pressure that stays continuously above 75% and does not come down after GCs is a problem and should be investigated.
The other thing you mentioned, that clearing the fielddata cache for all indices did not help in reducing JVM memory pressure, is because dedicated master nodes do not hold any index data or the related caches. Clearing caches should help reduce JVM memory pressure on the data nodes.
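For completeness, the per-node heap usage behind this metric can also be read directly from the nodes stats API. Below is a minimal sketch, assuming the cluster is reachable at http://localhost:9200 (for an AWS domain you would point it at the domain endpoint and add the appropriate authentication); heap_used_percent is the value that corresponds to JVM Memory Pressure:

import requests

ES = "http://localhost:9200"  # assumption: replace with your own endpoint

# JVM stats for every node; heap_used_percent is the value behind the
# JVM Memory Pressure graph.
resp = requests.get(f"{ES}/_nodes/stats/jvm").json()

for node_id, node in resp["nodes"].items():
    roles = node.get("roles", [])  # distinguishes dedicated masters from data nodes
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    print(f"{node['name']:<20} roles={roles} heap_used_percent={heap_pct}%")

Watching this per node makes it easy to confirm the expected sawtooth (a slow climb towards ~75%, then a GC-driven drop); it only needs investigation when the drop stops happening.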

Related

Elasticsearch maximum index count limit

Is there any limit on how many indices we can create in Elasticsearch?
Can 100,000 indices be created in Elasticsearch?
I have read that a maximum of 600-1000 indices can be created. Can this be scaled?
E.g. I have a number of stores, and each store has items. Each store will have its own index where its items will be indexed.
There is no hard limit as such, but obviously you don't want to create too many indices (what counts as too many depends on your cluster, nodes, size of indices, etc.). In general it's not advisable, as it can have a severe impact on cluster functioning and performance.
Please check Loggly's blog; their first point is about proper provisioning, and below is the important relevant text from that blog.
ES makes it very easy to create a lot of indices and lots and lots of shards, but it's important to understand that each index and shard comes at a cost. If you have too many indices or shards, the management load alone can degrade your ES cluster performance, potentially to the point of making it unusable. We're focusing on management load here, but running too many indices/shards can also have pretty significant impacts on your indexing and search performance.
The biggest factor we've found to impact management overhead is the size of the Cluster State, which contains all of the mappings for every index in the cluster. At one point, we had a single cluster with a Cluster State size of over 900MB! The cluster was alive but not usable.
Edit: Thanks @Silas, who pointed out that from ES 2.X, cluster state updates are not that costly (as only the diff is sent in the update call). More info on this change can be found in this ES issue.
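If you want a rough sense of how large your own cluster state and index count are, one simple check is to measure the size of the serialized cluster state response and count your indices. A minimal sketch, assuming a cluster at http://localhost:9200:

import requests

ES = "http://localhost:9200"  # assumption: adjust for your cluster

# Rough proxy for cluster state size: the length of the serialized response.
state_bytes = len(requests.get(f"{ES}/_cluster/state").content)

# Number of indices currently in the cluster.
indices = requests.get(f"{ES}/_cat/indices?format=json").json()

print(f"cluster state ~{state_bytes / 1024 / 1024:.1f} MB across {len(indices)} indices")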

High CPU usage on Elasticsearch nodes

We have been using a 3-node Elasticsearch (v7.6) cluster running in Docker containers. I have been experiencing very high CPU usage on 2 nodes (97%) and moderate CPU load on the other node (55%). The hardware used is m5.xlarge servers.
There are 5 indices with 6 shards and 1 replica. Update operations take around 10 seconds even for updating a single field, and the same is true for deletes; querying, however, is quite fast. Is this because of the high CPU load?
2 out of the 5 indices continuously undergo update and write operations as they consume from a Kafka stream. The sizes of those indices are 15GB and 2GB; the rest are around 100MB.
You need to provide more information to find the root cause (a sketch for pulling some of this information follows the list):
Are all the ES nodes running in separate Docker containers on the same host, or on different hosts?
Do you have resource limits on your ES Docker containers?
How much heap is assigned to ES, and is it 50% of the host machine's RAM?
Does the node with high CPU hold the 2 write-heavy indices you mentioned?
What is the refresh interval of the indices which receive the high indexing load?
What is the segment count and size of your 15GB index? Use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info.
What have you debugged so far, and is there any interesting information you want to share to help find the issue?
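A minimal sketch for gathering the hot threads, refresh interval, and segment information mentioned above; the index name is an assumption and the endpoint should be adjusted for your cluster:

import requests

ES = "http://localhost:9200"  # assumption: adjust for your cluster
INDEX = "my-write-heavy-index"  # assumption: one of the two write-heavy indices

# What each node is actually busy with; hot threads usually point straight
# at the culprit (merges, refreshes, bulk handling, ...).
print(requests.get(f"{ES}/_nodes/hot_threads").text)

# Refresh interval of the index (absent means the 1s default).
settings = requests.get(f"{ES}/{INDEX}/_settings").json()
print(settings[INDEX]["settings"]["index"].get("refresh_interval", "1s (default)"))

# Segment count for the index (one entry per segment per shard copy).
segments = requests.get(f"{ES}/_cat/segments/{INDEX}?format=json").json()
print(f"{len(segments)} segments")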

Nodes crashing from sudden Out of Memory error

We are currently running ES 6.5 on 71 nodes using a hot-warm architecture. We have 3 master nodes, 34 warm nodes and 34 hot nodes. Hot nodes have 64GB of RAM, 30GB of which is for the heap. The warm nodes have 128GB of RAM, also with 30GB dedicated to the heap.
We've been suffering from sudden crashes on the hot nodes, and these crashes don't happen only when the ingestion rate is at its peak. Since the cluster is fine with a higher ingestion rate, I don't believe we are hitting any limit yet. I took a heap dump from the hot nodes when they crash and I see that 80% of the heap is being used by byte arrays, which means that 80% of the heap (24GB!) is byte arrays of documents we want to index.
I've also analyzed the tasks (GET _tasks?nodes=mynode&detailed) being executed on a hot node right before it crashes, and I saw that there are more than 1300 bulk indexing tasks active on the node at that time; 1300 bulk indexing tasks amount to about 20GB of data! Some of those tasks have been running for more than 40 seconds. A healthy node shows about 100 bulk tasks being executed.
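For reference, a minimal sketch of how the active bulk tasks per node can be counted from the task management API (the endpoint is an assumption):

import requests
from collections import Counter

ES = "http://localhost:9200"  # assumption: adjust for your cluster

# List all in-flight bulk tasks, grouped by the node executing them.
tasks = requests.get(f"{ES}/_tasks?detailed&actions=*bulk*").json()

per_node = Counter()
for node_id, node in tasks["nodes"].items():
    per_node[node["name"]] += len(node["tasks"])

for name, count in per_node.most_common():
    print(f"{name}: {count} active bulk tasks")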
Why does ES allow a node to have 1300 bulk indexing tasks if the bulk indexing queue size is only 10? Shouldn't it be rejecting bulk requests if it's already executing 1300? Is there a way to limit the number of tasks being executed on a node at a time, and to reject requests once we cross a certain limit?
I want to mention that there are no queries running on the hot nodes at all. I also want to mention that the cluster has been fine with higher ingestion rates; only sometimes do some of the nodes get stuck with many bulk indexing requests, they go into full GCs all the time, and that makes the node crash from Out Of Memory, followed by the rest of the nodes. When one or two of the hot nodes start to suffer from full GCs, the rest of the hot nodes are totally fine. The document id is generated by ES, so there shouldn't be any hotspotting as far as I know, and if there were, it should be happening all the time.
Honestly, I'm running out of ideas and I don't know what else I could check to find the root cause, so any help would be great!
Thanks in advance!

Changing AWS Elasticsearch properties (without elasticsearch.yml) like threadpool queue size

I would like to change my AWS Elasticsearch thread_pool.write.queue_size setting. I see that the recommended technique is to update the elasticsearch.yml file, as this setting can't be changed dynamically via the API in the newer versions.
However, since I am using AWS's Elasticsearch service, as far as I'm aware I don't have access to that file. Is there any way to make this change? I don't see it referenced for version 6.3 here, so I don't know how to do it on AWS.
You do not have a lot of flexibility with AWS ES. In your case, scale your data node instance type to a bigger instance, and that should give you a higher thread pool queue size. A note on increasing the number of shards: do not do it unless really required, as it may cause performance issues while searching, aggregating, etc. A shard can easily hold up to 50GB of data, so if you have a lot of shards with very little data, think about shrinking the shards. Each shard in itself consumes resources (CPU, memory, etc.), and the shard configuration should be proportional to the heap memory available on the node.
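Before scaling, it can also be worth checking whether you are actually saturating the current write queue. A minimal sketch, assuming the domain endpoint below is yours and that your client is already set up to authenticate against it:

import requests

ES = "https://my-domain.us-east-1.es.amazonaws.com"  # assumption: your AWS ES endpoint

# Per-node write thread pool stats: active threads, queue depth, cumulative rejections.
cols = "node_name,active,queue,rejected"
pools = requests.get(f"{ES}/_cat/thread_pool/write?format=json&h={cols}").json()

for p in pools:
    print(f"{p['node_name']}: active={p['active']} queue={p['queue']} rejected={p['rejected']}")

A steadily growing rejected count is the signal that the queue size (or, more likely, the indexing capacity of the node) is the bottleneck.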

OOM issue in Elasticsearch Cluster 1.1.1 Environment

I have an Elasticsearch 1.1.1 cluster with two nodes, with a configured heap of 18GB each (RAM on each node is 32GB).
In total we have 6 shards and one replica for each shard. ES runs on a 64-bit JVM on Ubuntu boxes.
There is only one index in our cluster. Cluster health looks green. The document count on each node is close to 200 million.
Data used on each cluster node is around 150GB. There are no unassigned shards.
The system is encountering an OOM issue (java.lang.OutOfMemoryError: Java heap space).
Content of elasticsearch.yml:
bootstrap.mlockall: true
transport.tcp.compress: true
indices.fielddata.cache.size: 35%
indices.cache.filter.size: 30%
indices.cache.filter.terms.size: 1024mb
indices.memory.index_buffer_size: 25%
indices.fielddata.breaker.limit: 20%
threadpool:
  search:
    type: cached
    size: 100
    queue_size: 1000
It has been noticed that instances of org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector are occupying most of the heap space (around 45%).
I am new to ES. Could someone comment on this OOM issue? What could be the cause, given that we have a lot of heap space allocated?
To be blunt: you are flogging a dead horse. 1.x is not maintained anymore, and there are good reasons for that. In the case of OOM: Elasticsearch replaced fielddata wherever possible with doc values and added more circuit breakers.
Further complicating the issue, there is no documentation for 1.1 on the official docs site anymore (only 0.90, 1.3, 1.4, ...), so at the very least you should upgrade to 1.7 (the latest 1.x release).
Turning to your OOM issue, here is what you could try:
Increase your heap size, decrease the amount of data you are querying, add more nodes, or use doc values (on 2.x and up).
Also, your indices.fielddata.breaker.limit looks fishy to me. I think this config parameter was renamed to indices.breaker.fielddata.limit in 1.4, and the Elasticsearch Guide states:
In Fielddata Size, we spoke about adding a limit to the size of fielddata, to ensure that old unused fielddata can be evicted. The relationship between indices.fielddata.cache.size and indices.breaker.fielddata.limit is an important one. If the circuit-breaker limit is lower than the cache size, no data will ever be evicted. In order for it to work properly, the circuit breaker limit must be higher than the cache size.
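To illustrate that relationship with this cluster's numbers (the cache size is set to 35% in the elasticsearch.yml above), the breaker can be inspected and raised above the cache size through the cluster settings API. A minimal sketch, using the post-1.4 setting name mentioned above; the endpoint is an assumption:

import requests

ES = "http://localhost:9200"  # assumption: adjust for your cluster

# Current fielddata memory usage per node and field.
print(requests.get(f"{ES}/_cat/fielddata?v").text)

# Raise the fielddata circuit breaker above the 35% cache size configured in
# elasticsearch.yml, so that old fielddata can actually be evicted rather than
# requests tripping the breaker (or the node OOMing) first.
requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"indices.breaker.fielddata.limit": "40%"}},
)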

Resources