Elasticsearch Memory Usage increases over time and is at 100% - elasticsearch

I see that indexing performance degraded over a period of time in Elasticsearch. I see that the mem usage has slowly increased over a period of time until it became 100%. At this state I cannot index any more data. I have default shard settings - 5 primary and 1 replica. My index is time based with index created every hour to store coral service logs of various teams. An index size corresponds to about 3GB with 5 shards and with replica it is about 6GB. With a single shard and 0 replicas it comes to about 1.7 GB.
I am using ec2's i2.2x large hosts which offer 1.6TB space and 61GB RAM and 8 cores.
I have set the heap size to 30GB.
Following is node statistics:
https://jpst.it/1eznd
Could you please help in fixing this? My whole cluster came down that I had to delete all the indices.

Related

How much heap memory required for elasticsearch?

I have 8 node cluster(3Master+3Data+2Coordinate) in each data node the heap size is 10gb and disk space is 441gb each total 1.2TB
in each day i have 32.73GB data in each day 26 shards created for 11 indices.So lets suppose the retention period is 30 days.In 30th day the data on cluster would be 982GB and total shards would be 780 each node gets 260 shards.So the average shard size would be 260mb(approx).i read this documentation that a node of 30gb heap size can handle 600 shards.So the question is Can heap size of 10gb can handle 260 shards ?.
This article which you read can be considered a good general recommendations but there are various factors which can affect it, like size of indices, size of shards, size of documents, type of disk, current load on the system and so on, in the same document you can notice the shard size recommendation is between 10 to 50 GB, while you have very small shard size(260 MB as you mentioned), so based on this, I can say 10GB heap can easily handle 260 shards in your case, although you should benchmark your cluster and read more about how ES internally stores the data and searches them so that its easy for you to fine-tune it.

High CPU usage on elasticsearch nodes

we have been using a 3 node Elasticsearch(7.6v) cluster running in docker container. I have been experiencing very high cpu usage on 2 nodes(97%) and moderate CPU load on the other node(55%). Hardware used are m5 xlarge servers.
There are 5 indices with 6 shards and 1 replica. The update operations take around 10 seconds even for updating a single field. similar case is with delete. however querying is quite fast. Is this because of high CPU load?
2 out of 5 indices, continuously undergo a update and write operations as they listen from a kafka stream. size of the indices are 15GB, 2Gb and the rest are around 100MB.
You need to provide more information to find the root cause:
All the ES nodes are running on different docker containers on the same host or different host?
Do you have resource limit on your ES docker containers?
How much heap size of ES and is it 50% of host machine RAM?
Node which have high CPU, holds the 2 write heavy indices which you mentioned?
what is the refresh interval of your indices which receives high indexing requests.
what is the segment size of your 15 GB indices, use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info.
What all you have debugged so far and is there is any interesting info you want to share to find the issue?

how does elasticsearch decide how many shards can be restored in parallel

I made a snapshot of an index with 24 shards. The index is of size 700g.
When restoring, it restores 4 shards in parallel.
My cluster is a new cluster with only one machine w/o replica nodes.
The machine is AWS c3.8xlarge with 32 vCPUs and 60G memory.
I also followed How to speed up Elasticsearch recovery?.
When restoring, the memory usage is full. How does elastic search decide how many shards can be restored in parallel?
I was wondering how I can tune my machines' hardware config to make restore faster. If my cluster has more machines, can the restoring speed be improved linearly?
Basically, for ES 6.x there are two settings that decide how fast is the recovery for primary shards:
cluster.routing.allocation.node_initial_primaries_recoveries Sets the number of primary shards that are recovering in parallel on one node. Defaults is 4. So, for a cluster with N machines, the total number of recovering shards in parallel is N*node_initial_primaries_recoveries (See https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html#_shard_allocation_settings)
indices.recovery.max_bytes_per_sec Decides how much storage is loaded on recovery per single index. Default is 40mb. (See https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery.html)

Optimizing Bulk Indexing in elasticsearch

We have an elastic search cluster of 3 nodes of the following configurations
#Cpu Cores Memory(GB) Disk(GB) IO Performance
36 244.0 48000 very high
The machines are in 3 different zones namely eu-west-1c,eu-west-1a,eu-west-1b.
Each elastic search instance is being allocated 30GB of heap space.
we are using the above cluster for running aggregations only. The cluster has replication factor of 1 and all the string fields are not analyzed , doc_values is true for all the fields.
We are pumping data into this cluster running 6 instances of logstash in parallel ( having a batch size of 1000)
When more instances of logstash are started one by one the nodes of the ElasticSearch cluster starts throwing out of memory error.
What could be the possible optimizations to speed up bulk indexing rate on the cluster?= Will presence of nodes of cluster in the same zone increase bulk indexing? Will adding more nodes in the cluster help ?
Couple of steps taken so far
Increase the bulk queue size from 50 to 1000
Increase refresh interval from 1 seconds to 2 minutes
Changed segments merge throttling to none (
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing- performance.html)
We cannot set the replication factor to 0 due to inconsistency involved if one of the nodes goes down.

How to improve percolator performance in ElasticSearch?

Summary
We need to increase percolator performance (throughput).
Most likely approach is scaling out to multiple servers.
Questions
How to do scaling out right?
1) Would increasing number of shards in underlying index allow running more percolate requests in parallel?
2) How much memory does ElasticSearch server need if it does percolation only?
Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?
3) Would having SSD meaningfully help percolator's performance, or it is better to increase RAM and/or number of nodes?
Our current situation
We have 200,000 queries (job search alerts) in our job index.
We are able to run 4 parallel queues that call percolator.
Every query is able to percolate batch of 50 jobs in about 35 seconds, so we can percolate about:
4 queues * 50 jobs per batch / 35 seconds * 60 seconds in minute = 343
jobs per minute
We need more.
Our jobs index have 4 shards and we are using .percolator sitting on top of that jobs index.
Hardware: 2 processors server with 32 cores total. 32GB RAM.
We allocated 8GB RAM to ElasticSearch.
When percolator is working, 4 percolation queues I mentioned above consume about 50% of CPU.
When we tried to increase number of parallel percolation queues from 4 to 6, CPU utilization jumped to 75%+.
What is worse, percolator started to fail with NoShardAvailableActionException:
[2015-03-04 09:46:22,221][DEBUG][action.percolate ] [Cletus
Kasady] [jobs][3] Shard multi percolate failure
org.elasticsearch.action.NoShardAvailableActionException: [jobs][3]
null
That error seems to suggest that we should increase number of shards and eventually add dedicated ElasticSearch server (+ later increase number of nodes).
Related:
How to Optimize elasticsearch percolator index Memory Performance
Answers
How to do scaling out right?
Q: 1) Would increasing number of shards in underlying index allow running more percolate requests in parallel?
A: No. Sharding is only really useful when creating a cluster. Additional shards on a single instance may in fact worsen performance. In general the number of shards should equal the number of nodes for optimal performance.
Q: 2) How much memory does ElasticSearch server need if it does percolation only?
Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?
A: Percolator Indices reside entirely in memory so the answer is A LOT. It is entirely dependent on the size of your index. In my experience 200 000 searches would require a 50MB index. In memory this index would occupy around 500MB of heap memory. Therefore 4 GB RAM should be enough if this is all you're running. I would suggest more nodes in your case. However as the size of your index grows, you will need to add RAM.
Q: 3) Would having SSD meaningfully help percolator's performance, or it is better to increase RAM and/or number of nodes?
A: I doubt it. As I said before percolators reside in memory so disk performance isn't much of a bottleneck.
EDIT: Don't take my word on those memory estimates. Check out the site plugins on the main ES site. I found Big Desk particularly helpful for watching performance counters for scaling and planning purposes. This should give you more valuable info on estimating your specific requirements.
EDIT in response to comment from #DennisGorelik below:
I got those numbers purely from observation but on reflection they make sense.
200K Queries to 50MB on disk: This ratio means the average query occupies 250 bytes when serialized to disk.
50MB index to 500MB on heap: Rather than serialized objects on disk we are dealing with in memory Java objects. Think about deserializing XML (or any data format really) you generally get 10x larger in-memory objects.

Resources