OOM issue in Elasticsearch 1.1.1 cluster environment

I have an Elasticsearch 1.1.1 cluster with two nodes, each with a configured heap of 18G (each node has 32G of RAM).
In total we have 6 shards and one replica per shard. ES runs on a 64-bit JVM on Ubuntu.
There is only one index in our cluster and cluster health is green. The document count on each node is close to 200 million.
The data on each cluster node is around 150GB. There are no unassigned shards.
The system is encountering OOM errors (java.lang.OutOfMemoryError: Java heap space).
Content of elasticsearch.yml:
bootstrap.mlockall: true
transport.tcp.compress: true
indices.fielddata.cache.size: 35%
indices.cache.filter.size: 30%
indices.cache.filter.terms.size: 1024mb
indices.memory.index_buffer_size: 25%
indices.fielddata.breaker.limit: 20%
threadpool:
  search:
    type: cached
    size: 100
    queue_size: 1000
It has been noticed that instances of org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector occupy most of the heap space (around 45%).
I am new to ES. Could someone guide (or comment) on the OOM issue? What could be the cause, given that we have a lot of heap space allocated?

To be blunt: You are flogging a dead horse. 1.x is not maintained any more and there are good reasons for that. In the case of OOM: Elasticsearch replaced field data wherever possible with doc values and added more circuit breakers.
What further complicates the issue is that there is no documentation for 1.1 on the official docs any more; only 0.90, 1.3, 1.4, ... are covered. So at the very least you should upgrade to 1.7 (the latest 1.x release).
Turning to your OOM issue, here is what you could try:
Increase your heap size.
Decrease the amount of data you are querying.
Add more nodes.
Use doc values (on 2.x and up); a sketch follows below.
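For illustration only: doc values are enabled per field in the mapping (on 2.x they become the default for not_analyzed fields). The index, type, and field names below are hypothetical.
curl -XPUT 'http://localhost:9200/myindex/_mapping/mytype' -d '
{
  "properties": {
    "created_at": { "type": "date", "doc_values": true }
  }
}'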
Also, your indices.fielddata.breaker.limit looks fishy to me. I think this config parameter was renamed to indices.breaker.fielddata.limit in 1.4, and the Elasticsearch Guide states:
In Fielddata Size, we spoke about adding a limit to the size of fielddata, to ensure that old unused fielddata can be evicted. The relationship between indices.fielddata.cache.size and indices.breaker.fielddata.limit is an important one. If the circuit-breaker limit is lower than the cache size, no data will ever be evicted. In order for it to work properly, the circuit breaker limit must be higher than the cache size.
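In your config the breaker limit (20%) is below the cache size (35%), which is exactly the combination the guide warns about. A minimal corrected sketch, with illustrative values only (on 1.1 the setting still uses the old indices.fielddata.breaker.limit name):
indices.fielddata.cache.size: 30%
indices.fielddata.breaker.limit: 40%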

Related

memory management for elasticsearch

I am trying to work out a good balance of total memory in a three-node ES cluster.
If I have a three-node ES cluster, each node with 32G of memory and 8 vCPUs, which combination would be most suitable for balancing memory between all the components? I know there will be no fixed answer, but I am trying to get as accurate as I can.
The different Elasticsearch components that will be used are Beats (Filebeat, Metricbeat, Heartbeat), Logstash, Elasticsearch, and Kibana.
The main use case for this cluster will be indexing application logs and running queries on them through curl calls, e.g. fetching the average response time over 7 or 30 days, or counting the different status codes over the last 24 hours or 7 days, so aggregations will be used (a sketch of such a query follows below). The other use case is monitoring and viewing logs through Kibana, but no ML jobs or dashboard creation.
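As an illustration of the kind of aggregation query described above (the index pattern and field names are hypothetical):
curl -s -XGET 'http://localhost:9200/app-logs-*/_search?size=0' -H 'Content-Type: application/json' -d '
{
  "query": { "range": { "@timestamp": { "gte": "now-7d" } } },
  "aggs": {
    "avg_response_time": { "avg": { "field": "response_time" } },
    "status_codes": { "terms": { "field": "status_code" } }
  }
}'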
After going through the official docs below, the recommended heap sizes are as follows:
logstash -
https://www.elastic.co/guide/en/logstash/current/jvm-settings.html#heap-size
The recommended heap size for typical ingestion scenarios should be no less than 4GB and no more than 8GB.
elasticsearch -
https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-heap-size
Set Xms and Xmx to no more than 50% of your total memory. Elasticsearch requires memory for purposes other than the JVM heap
Kibana -
I haven't found a default or recommended memory size for Kibana, but in our test cluster with a single 8G node it is using about 1.4G in total (256 MB / 1.4 GB).
beats -
I haven't found a default or recommended memory size for Beats either, but they will also consume some memory.
Which of the combinations below would be ideal?
Option 1: 32G = 16G for the OS + 16G for the Elasticsearch heap.
From the 16G on the OS side: 4G for Logstash, say 4G for the three Beats, 2G for Kibana.
This leaves the OS with 6G, and if any new component has to be installed in the future (say APM or anything else OS-related), it will have to share that 6G with the OS.
This follows the official recommendation for all components (i.e. 50% for the OS and 50% for Elasticsearch).
Option 2: 32G = 8G for the Elasticsearch heap (25% for Elasticsearch).
4G for Logstash + 4G for Beats + 2G for Kibana.
This leaves 14G for the OS and for any future component.
Am I missing anything that could change this memory split?
Any suggestion changing the above combinations, or any new combination, is appreciated.
Thanks,
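For reference, heap sizes like those discussed above are typically pinned in each component's jvm.options; a minimal sketch assuming option 2 (8G Elasticsearch heap, 4G Logstash heap) and default package-install paths on a recent (7.x or later) release:
# /etc/elasticsearch/jvm.options.d/heap.options
-Xms8g
-Xmx8g
# /etc/logstash/jvm.options
-Xms4g
-Xmx4g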

Why does Elasticsearch Cluster JVM Memory Pressure keep increasing?

The JVM Memory Pressure of my AWS Elasticsearch cluster has been increasing consistently. The pattern I see for the last 3 days is that it adds 1.1% every 1 hour. This is for one of the 3 master nodes I have provisioned.
All other metrics seem to be in the normal range. The CPU is under 10% and there are barely any indexing or search operations being performed.
I have tried clearing the cache for fielddata for all indices as mentioned in this document but that has not helped.
Can anyone help me understand what might be the reason for this?
I got this answer from AWS Support:
I checked the particular metric and can also see the JVM memory pressure increasing over the last few days. However, I do not think this is an issue, as JVM usage is expected to increase over time. Also, garbage collection in ES runs once the JVM reaches 75% (currently it is around 69%), after which you would see a drop in the JVM metric of your cluster. If the JVM metric stays continuously above 75% and does not come down after GCs, that is a problem and should be investigated.
The other thing you mentioned, that clearing the fielddata cache for all indices did not help in reducing JVM usage, is because the dedicated master nodes do not hold any index data or the related caches. Clearing caches should help reduce JVM usage on the data nodes.
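For reference, the cache-clear call only has an effect on data nodes; a minimal example of the request (the domain endpoint below is a placeholder):
curl -XPOST 'https://your-domain-endpoint/_cache/clear?fielddata=true'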

Changing AWS Elasticsearch properties (without elasticsearch.yml) like threadpool queue size

I would like to change the thread_pool.write.queue_size setting of my AWS Elasticsearch domain. I see that the recommended technique is to update the elasticsearch.yml file, since this setting can no longer be changed dynamically through the API in newer versions.
However, since I am using AWS's Elasticsearch service, as far as I'm aware I don't have access to that file. Is there any way to make this change? I don't see it referenced for version 6.3 here, so I don't know how to do it on AWS.
You do not have a lot of flexibility with AWS ES. In your case, scale your data nodes to a bigger instance type and that should give you a higher thread pool queue size. A note on increasing the number of shards: do not do it unless really required, as it may cause performance issues while searching, aggregating, etc. A shard can easily hold up to 50 GB of data, so if you have a lot of shards with very little data, think about shrinking to fewer shards. Each shard in itself consumes resources (CPU, memory, etc.), and the shard configuration should be proportional to the heap memory available on the node.
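For comparison, on a self-managed 6.x cluster this is a static node setting in elasticsearch.yml, which is exactly the file the AWS service does not expose (the value below is illustrative):
thread_pool.write.queue_size: 1000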

Elasticsearch config tweaking with limited memory

I have following scenario:
A single machine with 32GB of RAM runs Elasticsearch 2.4; there is one index with 5 shards that is 25GB in size.
On that index we are constantly indexing new data, plus running full-text search queries that touch about 95% of the documents, with no aggregations. The instance generates a lot of CPU load, and there is no swapping.
My question is: how should I tweak Elasticsearch memory usage? (I don't have the option to add another machine at the moment.)
Should I assign more memory to the ES heap, say 25GB (going over the 50% of memory that the docs advise not to exceed), or should I assign a minimal heap of 1GB-2GB and assume Lucene will cache the whole index in memory, since these are full-text searches?
Right now, 50% of server memory (so 16GB in this case) seems to work best for us.
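For reference, on Elasticsearch 2.x the heap is usually set via the ES_HEAP_SIZE environment variable; a minimal sketch assuming a Debian/Ubuntu package install:
# /etc/default/elasticsearch
ES_HEAP_SIZE=16g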

Slow index speed of Elasticsearch

We deployed ES 2.0 on 3 EC2 c4.4xlarge nodes (16 cores, 32GB memory each), allocated 16G to ES, and attached a 500GB io1/4000 IOPS volume to each.
Problem: We were expecting great performance from this hardware config; however, we are observing very slow indexing.
Our documents are about 10-50k in size, and we use the Java transport client to insert them. The speed was alright for the first 50,000 documents at roughly 1000/second, then dramatically slowed down to 100-200/second.
Meanwhile, resource consumption is low:
CPU is only about 1-20% (16-core CPU)
IO write is only about 4-10MB/second
Memory consumption is only about 20-30%
Requirements: So I cannot understand why it is so slow while all the resources are so free. What can I do to improve the efficiency? Thanks.
Here is the config file we are using:
cluster.name: {{ env }}-{{ app }}
path.data: /data/es
path.logs: /data/es-logs
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["xxxx"]
bootstrap.mlockall: true
threadpool.search.queue_size: 300
threadpool.index.type: fixed
threadpool.index.size: 16
threadpool.index.queue_size: 250000
index.refresh_interval: 1s
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb
script.inline: on
script.indexed: on
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
Here are htop and iostat while running the job (screenshots not reproduced here).
Upgrade your ES to the latest version; recent releases have made it more production friendly, and the most stable release right now is the latest one, 2.3.
You can try the following things to make indexing go faster:
Add dedicated master nodes, separate from the data nodes, as this reduces load across the whole cluster.
Disable OS swapping (ES takes care of that) and check the heap size on all your machines: Heap Sizing
Check that your documents are of similar size. Make use of bulk indexing and tweak the settings there, such as chunk_size by number of records or by memory size; a bulk sketch follows after this list.
If you are using scripts, try to optimize them, as they slow down indexing; if possible, store the scripted value as a preprocessing step, since ES is not designed to handle heavy scripting.
Check the number of shards per node and try to balance them out across nodes using Routing
Read more on how the ES team suggests setting up a production-ready system: Elasticsearch in Production
One more blog on increasing Elasticsearch indexing performance: Performance Considerations for Elasticsearch Indexing
Check this answer for an optimal way to set up the ELK stack on three servers: Optimal way to set up ELK stack on three servers
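As a reference for the bulk point above, a minimal sketch of a bulk request over the REST API (index, type, and field names are hypothetical; BulkProcessor is the equivalent in the Java transport client). A common starting point is bulks of roughly 5-15 MB per request, adjusted by measurement. Contents of bulk.json (one action line plus one source line per document; the file must end with a newline):
{ "index": { "_index": "myindex", "_type": "mytype" } }
{ "field1": "value1" }
{ "index": { "_index": "myindex", "_type": "mytype" } }
{ "field1": "value2" }
Then send it with:
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json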

Resources