Elasticsearch 1.5.2 deployment issue - elasticsearch

I have ES 1.5.2 cluster with the following specs:
3 nodes with RAM: 32GB, CPU cores: 8 each
282 total indices
2,564 total shards
799,505,935 total docs
767.84GB total data
ES_HEAP_SIZE=16g
The problem is that when I query something from Kibana (very simple queries), a single query works fine, but if I keep querying a bit more, Elasticsearch gets very slow and eventually gets stuck because JVM heap usage (as reported by Marvel) reaches 87-95%. The same happens when I try to load some Kibana dashboards, and the only way out of this situation is to restart the service on all the nodes.
(This also happens on ES 2.2.0, 1 node, with Kibana 4.)
What is wrong, what am I missing?
Am I supposed to query less?
EDIT:
I should mention that I have a lot of empty indices (0 documents), but their shards are still counted. This is because I set a ttl of 4w on the documents, and the empty indices get deleted later with Curator.
Also, we have not disabled doc_values in either the 1.5.2 or the 2.2.0 cluster.
The accurate specs are as following (1.5.2):
3 nodes with RAM: 32GB, CPU cores: 8 each
282 total indices = 227 empty + 31 marvel + 1 kibana + 23 data
2,564 total shards = (1135 empty + 31 marvel + 1 kibana + 115 data)* 1 replica
799,505,935 total docs
767.84GB total data
ES_HEAP_SIZE=16g
curl _cat/fielddata?v result:
1.5.2:
total os.cpu.usage primaries.indexing.index_total total.fielddata.memory_size_in_bytes jvm.mem.heap_used_percent jvm.gc.collectors.young.collection_time_in_millis primaries.docs.count device.imei fs.total.available_in_bytes os.load_average.1m index.raw #timestamp node.ip_port.raw fs.total.disk_io_op node.name jvm.mem.heap_used_in_bytes jvm.gc.collectors.old.collection_time_in_millis total.merges.total_size_in_bytes jvm.gc.collectors.young.collection_count jvm.gc.collectors.old.collection_count total.search.query_total
2.1gb 1.2mb 3.5mb 3.4mb 1.1mb 0b 3.5mb 2.1gb 1.9mb 1.8mb 3.6mb 3.6mb 1.7mb 1.9mb 1.7mb 1.6mb 1.5mb 3.5mb 1.5mb 1.5mb 3.2mb
1.9gb 1.2mb 3.4mb 3.3mb 1.1mb 1.5mb 3.5mb 1.9gb 1.9mb 1.8mb 3.5mb 3.6mb 1.7mb 1.9mb 1.7mb 1.5mb 1.5mb 3.4mb 0b 1.5mb 3.2mb
2gb 0b 0b 0b 0b 0b 0b 2gb 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b 0b
2.2.0:
total index_stats.index node.id node_stats.node_id buildNum endTime location.timestamp userActivity.time startTime time shard.state shard.node indoorOutdoor.time shard.index dataThroughput.downloadSpeed
176.2mb 0b 0b 0b 232b 213.5kb 518.8kb 479.7kb 45.5mb 80.1mb 1.4kb 920b 348.7kb 2.5kb 49.1mb

Delete the empty indices.
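A minimal sketch of how to find and remove them, assuming the cluster answers on localhost:9200; the index name in the delete call is a placeholder, not one of your real indices:

# list indices with their doc counts and pick out the ones with docs.count = 0
curl -s 'localhost:9200/_cat/indices?v&h=index,docs.count,pri,rep'
# after double-checking an index really is empty and no longer needed:
curl -XDELETE 'localhost:9200/logstash-2016.01.01'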
For the 1.5 cluster, the major consumer of your heap is fielddata: around 9.5GB on each node, plus 1.2GB for the filter cache and around 1.7GB for the segment files' metadata.
Even if you have that snippet in your template making the strings not_analyzed, in 1.5 this does not automatically mean ES will use doc_values; you need to enable them explicitly.
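For illustration only, a 1.x-style index template that enables doc_values on dynamically mapped strings; the template name, index pattern and dynamic-template name are placeholders, adapt them to your own template:

curl -XPUT 'localhost:9200/_template/strings_with_doc_values' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        { "strings": {
            "match_mapping_type": "string",
            "match": "*",
            "mapping": { "type": "string", "index": "not_analyzed", "doc_values": true }
        } }
      ]
    }
  }
}'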
If you enable doc_values now in the 1.5.x cluster, the change will only take effect for newly created indices. For the old indices you need to reindex the data, or, if you have time-based indices (created daily, weekly, etc.), you can simply wait for the new indices to be created and the old ones to be deleted.
Until doc_values are predominant in your indices in the 1.5 cluster, what @Val suggested in the comments is the only option: limit the fielddata cache size, add more nodes to your cluster (and implicitly more memory), or increase the RAM on your nodes. Or manually clear the fielddata cache from time to time ;-).
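For reference, both options sketched below; the 30% value is only an example, tune it to your own heap. In elasticsearch.yml (requires a node restart), cap the cache so old entries get evicted:

indices.fielddata.cache.size: 30%

Or clear the fielddata cache on demand with the clear-cache API (it will be rebuilt by the next query that needs it):

curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'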
Not entirely related to the memory issue, but try to avoid using ttl. If you don't need some data anymore, simply delete the index; don't rely on ttl, it is much more costly than just deleting the index. Ttl-based expiry can cause issues at search time and affect the overall performance of the cluster, because it deletes documents from indices, which means a lot of updates and a lot of merging in those indices. Since you probably have time-based indices (meaning data from yesterday doesn't really change), using ttl brings unnecessary operations on data that should otherwise be static (and which could otherwise be optimized).
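To illustrate the difference (index name below is a placeholder): dropping an entire expired index is a single, cheap metadata operation, whereas ttl expiry deletes documents one by one and triggers extra merges:

curl -XDELETE 'localhost:9200/logstash-2016.01.01'

Since you already run Curator for the empty indices, the same scheduled run can delete indices older than your 4-week retention instead of relying on ttl.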

If your heap is climbing rapidly while querying, you're probably doing something heavy in your queries, for example aggregations. As Val and Andrei suggested, the problem is likely your fielddata growing unbounded. I'd suggest checking your mappings and using doc_values and not_analyzed wherever applicable to cut down the query cost.
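A quick way to verify this, with placeholder host and index names: the first call shows whether the strings Kibana aggregates on are not_analyzed and have doc_values, and the second shows which fields are actually holding fielddata on the heap per node:

curl -s 'localhost:9200/my-index/_mapping?pretty'
curl -s 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'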

Related

How much heap memory is required for Elasticsearch?

I have an 8-node cluster (3 master + 3 data + 2 coordinating). Each data node has a 10GB heap and 441GB of disk, about 1.2TB in total.
Each day I ingest 32.73GB of data, and each day 26 shards are created across 11 indices. Let's suppose the retention period is 30 days. On the 30th day the data on the cluster would be 982GB and the total shard count would be 780, so each data node gets 260 shards and the average shard size would be about 260MB (approx). I read in this documentation that a node with a 30GB heap can handle 600 shards. So the question is: can a 10GB heap handle 260 shards?
The article you read can be considered a good general recommendation, but there are various factors that can affect it, like the size of the indices, the size of the shards, the size of the documents, the type of disk, the current load on the system, and so on. In the same document you'll notice the recommended shard size is between 10 and 50 GB, while you have very small shards (260 MB, as you mentioned). Based on this, I'd say a 10GB heap can easily handle 260 shards in your case, although you should benchmark your cluster and read more about how ES internally stores and searches the data so that it is easy for you to fine-tune it.
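If you want to sanity-check this on the running cluster rather than on paper, these cat endpoints (available on current ES versions; host is a placeholder) show shards and disk per data node and heap usage per node:

curl -s 'localhost:9200/_cat/allocation?v'
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max'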

Elasticsearch shard sizes and JVM memory for elastic cloud configuration

I have a 2-node ES cluster (Elastic Cloud) with 60GB heap size.
Following are my indexes and number of shards allocated.
green open prod-master-account 6 0 6871735 99067 4.9gb 4.9gb
green open prod-master-categories 1 1 221 6 3.5mb 1.7mb
green open prod-v1-apac 4 1 10123830 1405510 11.4gb 5.6gb
green open prod-v1-emea 9 1 28608447 2405254 30.6gb 15gb
green open prod-v1-global 10 1 94955647 12548946 128.1gb 61.2gb
green open prod-v1-latam 2 1 4398361 471038 4.7gb 2.3gb
green open prod-v1-noram 9 1 51933712 6188480 60.1gb 29.2gb
The JVM memory is above 60%. I want to downgrade this cluster to a lower heap size.
But it fails each time and gives a circuit-breaker exception because the JVM memory is too high.
I want to know why the JVM memory is still so high. How can I keep it low? Am I doing something wrong with the sharding?
The guides say to keep at most 20 shards per GB of heap, and looking at my configuration I am under that value.
How can I downgrade this cluster to a lower heap size cluster?
Much appreciated!
A 60 GB heap is not recommended at all for the ES process (or any other JVM process): beyond 32 GB the JVM can no longer use compressed object pointers (compressed oops), so you won't get optimal performance.
Please refer to ES official doc on heap setting for more info.
You can try the following to optimize the ES heap size:
If you have machines with a lot of RAM, prefer mid-size machines where you allocate 50% of the RAM to the ES heap (without crossing the ~32 GB threshold).
Assign fewer primary shards and increase replica shards for better search performance.
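Two quick checks to go with that, assuming a reasonably recent ES version (the nodes info API reports compressed-oops usage from 2.2 onwards); the index pattern and replica value are examples only:

# confirm whether the JVM is actually running with compressed oops
curl -s 'localhost:9200/_nodes/jvm?pretty' | grep compressed

# replica count can be changed on the fly (primary count cannot; that needs a reindex or shrink)
curl -XPUT 'localhost:9200/prod-v1-*/_settings' -H 'Content-Type: application/json' -d '{
  "index": { "number_of_replicas": 1 }
}'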

Elasticsearch Memory Usage increases over time and is at 100%

I have noticed that indexing performance in Elasticsearch degraded over time, and that memory usage slowly increased until it reached 100%. At that point I cannot index any more data. I have the default shard settings: 5 primaries and 1 replica. My indices are time-based, with a new index created every hour to store Coral service logs from various teams. An index is about 3GB with 5 shards, and about 6GB with the replica; with a single shard and 0 replicas it comes to about 1.7 GB.
I am using EC2 i2.2xlarge hosts, which offer 1.6TB of storage, 61GB of RAM, and 8 cores.
I have set the heap size to 30GB.
Following is node statistics:
https://jpst.it/1eznd
Could you please help me fix this? My whole cluster came down, and I had to delete all the indices.
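For anyone diagnosing a similar situation, a quick way to see what is actually occupying the heap (fielddata, segment memory, query cache, and so on) before a node hits 100% is the node stats API; host is a placeholder:

curl -s 'localhost:9200/_nodes/stats/jvm,indices?human&pretty'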

How can I save disk space in Elasticsearch?

I have a three-node cluster, each node with 114 GB of disk capacity. I am pushing syslog events from around 190 devices to this cluster, and they arrive at a rate of 100k per 15 minutes. The index has 5 shards and uses the best_compression codec. In spite of this the disk fills up quickly, so I was forced to remove the replica for this index. The index size is 170 GB and each shard is about 34.1 GB. Now, if I get additional disk space and reindex this data into a new index with 3 shards and a replica, will it save disk space?
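If you do try the reindex route and the cluster is on ES 2.3 or later (where the _reindex API exists), a minimal sketch; the index names are placeholders:

# 1) create the target index with 3 primaries, no replica and best_compression
curl -XPUT 'localhost:9200/syslog-v2' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "index.codec": "best_compression"
  }
}'
# 2) copy the data over
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
  "source": { "index": "syslog-v1" },
  "dest": { "index": "syslog-v2" }
}'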

OOM issue in Elasticsearch Cluster 1.1.1 Environment

I have an Elasticsearch 1.1.1 cluster with two nodes, each with a configured heap of 18G (RAM on each node is 32G).
In total we have 6 shards and one replica for each shard. ES runs on a 64-bit JVM on an Ubuntu box.
There is only one index in our cluster. Cluster health looks Green. Document count on each node is close to 200Million.
Data used on each cluster node is around 150GB. There are no unassigned shards.
System is encountering the OOM issue (java.lang.OutOfMemoryError: Java heap space).
content of elasticsearch.yml
bootstrap.mlockall: true
transport.tcp.compress: true
indices.fielddata.cache.size: 35%
indices.cache.filter.size: 30%
indices.cache.filter.terms.size: 1024mb
indices.memory.index_buffer_size: 25%
indices.fielddata.breaker.limit: 20%
threadpool:
  search:
    type: cached
    size: 100
    queue_size: 1000
It has been noticed that instances of org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector are occupying most of the heap space (around 45%).
I am new to ES. Could someone comment on this OOM situation and what the cause could be, given that we have a lot of heap space allocated?
To be blunt: you are flogging a dead horse. 1.x is not maintained any more, and there are good reasons for that. Relevant to the OOM: Elasticsearch has since replaced fielddata with doc values wherever possible and added more circuit breakers.
What further complicates the issue is that there is no longer any documentation for 1.1 on the official docs site; only 0.90, 1.3, 1.4, and later are covered. So at the very least you should upgrade to 1.7 (the latest 1.x release).
Turning to your OOM issue, here is what you could try:
Increase your heap size, decrease the amount of data you are querying, add more nodes, use doc values (on 2.x and up).
Also, your indices.fielddata.breaker.limit looks fishy to me. I think this config parameter was renamed to indices.breaker.fielddata.limit in 1.4, and the Elasticsearch Guide states:
In Fielddata Size, we spoke about adding a limit to the size of
fielddata, to ensure that old unused fielddata can be evicted. The
relationship between indices.fielddata.cache.size and
indices.breaker.fielddata.limit is an important one. If the
circuit-breaker limit is lower than the cache size, no data will ever
be evicted. In order for it to work properly, the circuit breaker
limit must be higher than the cache size.
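An illustrative elasticsearch.yml fragment consistent with that advice, using the post-1.4 setting name; the percentages are examples only, the point being that the breaker limit stays above the cache size so entries can still be evicted before the breaker trips:

indices.fielddata.cache.size: 30%
indices.breaker.fielddata.limit: 60%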
