How to prevent Elasticsearch from index throttling? - performance

I have a 40-node Elasticsearch cluster which is hammered by a high index request rate. Each of these nodes uses an SSD for best performance. As suggested by several sources, I have tried to prevent index throttling with the following configuration:
indices.store.throttle.type: none
Unfortunately, I'm still seeing performance issues as the cluster still periodically throttles indices. This is confirmed by the following logs:
[2015-03-13 00:03:12,803][INFO ][index.engine.internal ] [CO3SCH010160941] [siphonaudit_20150313][19] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:03:12,829][INFO ][index.engine.internal ] [CO3SCH010160941] [siphonaudit_20150313][19] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:03:13,804][INFO ][index.engine.internal ] [CO3SCH010160941] [siphonaudit_20150313][19] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:03:13,818][INFO ][index.engine.internal ] [CO3SCH010160941] [siphonaudit_20150313][19] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:05:00,791][INFO ][index.engine.internal ] [CO3SCH010160941] [siphon_20150313][6] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:05:00,808][INFO ][index.engine.internal ] [CO3SCH010160941] [siphon_20150313][6] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2015-03-13 00:06:00,861][INFO ][index.engine.internal ] [CO3SCH010160941] [siphon_20150313][6] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2015-03-13 00:06:00,879][INFO ][index.engine.internal ] [CO3SCH010160941] [siphon_20150313][6] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
The throttling occurs after one of the 40 nodes dies for various expected reasons. The cluster immediately enters a yellow state, in which a number of shards will begin initializing on the remaining nodes.
Any idea why the cluster continues to throttle after explicitly configuring it not to? Any other suggestions to have the cluster more quickly return to a green state after a node failure?

The setting that actually corresponds to the maxNumMerges in the log file is called index.merge.scheduler.max_merge_count. Increasing this along with index.merge.scheduler.max_thread_count (where max_thread_count <= max_merge_count) will increase the number of simultaneous merges which are allowed for segments within an individual index's shards.
If you have a very high indexing rate that results in many GBs in a single index, you probably want to raise some of the other assumptions that the Elasticsearch default settings make about segment size, too. Try raising floor_segment (the minimum size before a segment will be considered for merging), max_merged_segment (the maximum size of a single segment), and segments_per_tier (the number of segments of roughly equivalent size before they start getting merged into a new tier). On an application that has a high indexing rate and finished index sizes of roughly 120GB with 10 shards per index, we use the following settings:
curl -XPUT 'localhost:9200/index_name/_settings' -d '
{
  "settings": {
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.scheduler.max_thread_count": 10,
    "index.merge.scheduler.max_merge_count": 10,
    "index.merge.policy.floor_segment": "100mb",
    "index.merge.policy.segments_per_tier": 25,
    "index.merge.policy.max_merged_segment": "10gb"
  }
}'
Also, one important thing you can do to improve loss-of-node / node-restart recovery time on applications with high indexing rates is to take advantage of index recovery prioritization (in ES >= 1.7). Tune this setting so that the indices receiving the most indexing activity are recovered first. As you may know, the "normal" shard initialization process just copies the already-indexed segment files between nodes. However, if indexing activity occurs against a shard before or during initialization, the translog with the new documents can become very large. In the scenario where merging goes through the roof during recovery, it's the replay of this translog against the shard that is almost always the culprit. Thus, by using index recovery prioritization to recover those shards first and delay shards with less indexing activity, you can minimize the eventual size of the translog, which will dramatically improve recovery time.
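As a minimal sketch, assuming the busy siphon_20150313 index from the logs above and an arbitrary priority value (higher-priority indices are recovered first):
curl -XPUT 'localhost:9200/siphon_20150313/_settings' -d '
{
  "index.priority": 10
}'
Give your most actively indexed indices the largest priority values so they initialize before the quieter ones.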

We are using 1.7 and noticed a similar problem: indexing was getting throttled even though the IO was not saturated (Fusion IO in our case).
After increasing "index.merge.scheduler.max_thread_count" the problem seems to be gone; we have not seen any more throttling being logged so far.
I would try setting "index.merge.scheduler.max_thread_count" to at least the max reported numMergesInFlight (6 in the logs above).
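As a sketch, raising both merge scheduler settings above the observed numMergesInFlight=6 (the value 8 is arbitrary; if your version does not accept these as dynamic index settings, put them in elasticsearch.yml instead):
curl -XPUT 'localhost:9200/siphonaudit_20150313/_settings' -d '
{
  "index.merge.scheduler.max_thread_count": 8,
  "index.merge.scheduler.max_merge_count": 8
}'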
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-merge.html#scheduling
Hope this helps!

Have you looked into increasing the shard allocation delay to give the node time to recover before the master starts promoting replicas?
https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
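A minimal sketch of raising the delay on all indices (the 5m value is an example; the default is 1m):
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'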

Try setting index.merge.scheduler.max_thread_count to 1.
https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing

Related

optimization on old indexes collecting logs from my apps

I have an Elastic cluster with 3 nodes (each 6 CPUs, 31GB heap, 64GB RAM) collecting 25GB of logs per day, but after 3 months I realized my dashboards become very slow when checking stats from past weeks. Please advise if there is an option to improve the indexes' read performance so it becomes faster when calculating my dashboard stats.
Thanks!
I would suggest you try increasing the number of shards.
When you have more shards, Elasticsearch splits your data across them, so a search is executed as multiple parallel requests, each over a smaller slice of data.
For the number of shards, you could size it based on your heap memory:
No matter what actual JVM heap size you have, the upper bound on the maximum shard count should be 20 shards per 1 GB of heap configured on the server.
ElasticSearch - Optimal number of Shards per node
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
https://opster.com/elasticsearch-glossary/elasticsearch-choose-number-of-shards/
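As a rough sketch, this is how a new daily log index could be created with more primary shards (the index name and counts here are just examples; existing indices would need a reindex, or the _split API where available, to change their primary shard count):
curl -XPUT 'http://localhost:9200/app-logs-000001' \
  -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'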
It seems that the amount of data that you accumulated and use for your dashboard is causing performance problems.
A straightforward option is to increase your cluster's resources, but then you're bound to hit the same problem again, so you should rather rethink your data retention policy.
Chances are that you are really only interested in the most recent data. You need to decide what "recent" means in your use case and simply discard anything older than that.
Elasticsearch has tools to automate this, look into Index Lifecycle Management.
What you probably need is to create an index template and apply a lifecycle policy to it. Elasticsearch will then handle automatic rollover of indices, eviction of old data, even migration through data tiers in hot-warm-cold architecture if you really want very long retention periods.
All this will lead to a more predictable performance of your cluster.
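A minimal sketch of such a setup, assuming a logs-* pattern and purely illustrative retention values (the policy and template names are made up; composable index templates require ES 7.8+, older versions use the legacy _template API):
curl -XPUT 'http://localhost:9200/_ilm/policy/logs-retention' \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_age": "1d", "max_size": "50gb" } }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/_index_template/logs-template' \
  -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-retention",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}'
Indices are then created as logs-000001 behind a "logs" write alias, rolled over daily or at 50GB, and deleted after 30 days.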

Frequent circuit breaking exceptions occurring in ES cluster v7.3.1?

We are using ES v7.3.1, and for the past few days the shards in our ES cluster have been getting unassigned because of a circuit breaking exception. I am not able to understand the exact reason that leads to this exception; any help would be really appreciated.
This is the detailed info that I get using the command GET _cluster/allocation/explain:
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2021-02-12T08:05:40.154Z",
"failed_allocation_attempts" : 1,
"details" : "failed shard on node [WbHklo7iSf6jGj90cP9Y-A]: failed to perform indices:data/write/bulk[s] on replica [segment_index_573179789d2572f27bc73e6b][6], node[WbHklo7iSf6jGj90cP9Y-A], [R], s[STARTED], a[id=SNoWjhhYRXClfVqa6lsDAQ], failure RemoteTransportException[[ip-1-0-104-220][1.0.104.220:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [14311120802/13.3gb], which is larger than the limit of [14247644364/13.2gb], real usage: [14311004440/13.3gb], new bytes reserved: [116362/113.6kb], usages [request=0/0b, fielddata=808714408/771.2mb, in_flight_requests=15836072/15.1mb, accounting=683535018/651.8mb]]; ",
"last_allocation_status" : "no_attempt"
}
Your shard allocation explain API output provides all the details you need to resolve the issue:
indices:data/write/bulk is the operation tripping the circuit breaker.
WbHklo7iSf6jGj90cP9Y-A is the node on which the issue is happening.
It is happening on a replica shard of segment_index_573179789d2572f27bc73e6b.
Below are the actual limit and the estimated usage that trip the circuit breaker.
The parent circuit breaker is tripping on your node: as highlighted in the log line, the limit is 13.2 GB, and with your bulk request the estimated memory usage would reach 13.3 GB.
CircuitBreakingException[[parent] Data too large, data for
[<transport_request>] would be [14311120802/13.3gb], which is larger
than the limit of [14247644364/13.2gb], real usage:
[14311004440/13.3gb], new bytes reserved: [116362/113.6kb]
Solution
You can reduce the size of your bulk requests to avoid tripping this circuit breaker.
Can you provide the result of cluster health?
curl -XGET "http://localhost:9200/_cluster/health?pretty"
Can you provide the results of JVM pressure metric?
As there is not much information, below I explain what happened to me in a similar case of circuit breaking and what I did about it.
I am confident that the bulk size is small; the sum of several things (search + bulk + the assignment process of new shards) causes the heap to fill up.
In the cases that I know of, "Data too large" refers to the JVM heap. Perhaps you have reached the memory limit, tripping the circuit breaker. I dare to think (with a high probability of being wrong :/ ) that your heap per node is proportional to the % configured for the circuit breaker.
For example, if you have a circuit breaker set to 80% and it shows an error at 13.3 GB, you probably have about 16 GB of heap (100%) per node.
Depending on your circuit breaker configuration, as a temporary workaround you can increase the limit. If your circuit breaker is already high, it is not advisable to increase it further.
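If you do go down that road, a sketch of temporarily raising the parent breaker via a transient cluster setting (the 80% value is only an example, not a recommendation):
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "indices.breaker.total.limit": "80%"
  }
}'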
Maybe your bulk is creating multiple indices, and multiple indices create multiple shards. Too many shards fill the memory, because each shard consumes heap and file descriptors. There have been cases where the logic/pattern for creating new indices was incorrect and generated many new indices (and therefore many shards) inadvertently, instead of creating just one index.
Maybe you were already at the limit and you did not know it, until now that you created new indices (and therefore new shards) and the problem arose.
How many shards do you have? curl -XGET "http://localhost:9200/_cat/shards" | wc -l
How many shards does each of your nodes hold?
When you hit that limit because of shards, you can reduce the number of replica shards as a temporary workaround, in order to let the bulk complete.
An option that worked for me was temporarily disabling replica shards for less important indices.
Check your shards: curl -XGET "http://localhost:9200/_cat/shards"
curl -X "PUT" "http://localhost:9200/mylessimportant_indices/_settings" \
-H "Content-type: application/json" \
-d $'{
"index" : {
"number_of_replicas":0
}
}' | jq

Elasticsearch: QueueResizingEsThreadPoolExecutor exception

At some point during a 35k endurance load test of my Java web app, which fetches static data from Elasticsearch, I started getting the following Elasticsearch exception:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable#1a25fe82 on QueueResizingEsThreadPoolExecutor[name = search4/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 10.7ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor#6312a0bb[Running, pool size = 25, active threads = 25, queued tasks = 1000, completed tasks = 34575035]]
Elasticsearch details:
Elasticsearch version 6.2.4.
The cluster consists of 5 nodes. The JVM heap size setting for each node is Xms16g and Xmx16g. Each node machine has 16 processors.
NOTE: When I got this exception for the first time, I decided to increase the thread_pool.search.queue_size parameter in elasticsearch.yml and set it to 10000. Yes, I understand that this just postpones the problem.
Elasticsearch indices details:
Currently there are about 20 indices, and only 6 of them are in use. The unused ones are old indices that were not deleted after newer ones were created. The indices themselves are really small:
The index within the red rectangle is the one used by my web app. Its shard and replica settings are "number_of_shards": "5" and "number_of_replicas": "2" respectively.
Its shard details:
In this article I found that
Small shards result in small segments, which increases overhead. Aim
to keep the average shard size between at least a few GB and a few
tens of GB. For use-cases with time-based data, it is common to see
shards between 20GB and 40GB in size.
As you can see from the screenshot above, my shard size is much smaller than the mentioned size.
Hence, Q: what is the right number of shards in my case? Is it 1 or 2?
The index won't grow up much over the time.
ES queries issued during the test: The load test simulates a scenario where a user navigates to a page for searching products. The user can filter the products using corresponding filters (e.g. name, city, etc.). The unique filter values are fetched from the ES index using a composite query; this is the first query type. Another query fetches the products themselves from ES. It consists of must, must_not, filter and has_child queries, with the size attribute set to 100.
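For reference, the first query type looks roughly like this (the index and field names here are placeholders, not my real mapping):
curl -XGET 'http://localhost:9200/products/_search' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "unique_cities": {
      "composite": {
        "size": 100,
        "sources": [
          { "city": { "terms": { "field": "city.keyword" } } }
        ]
      }
    }
  }
}'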
I set up slow search logging, but nothing was logged:
"index": {
"search": {
"slowlog": {
"level": "info",
"threshold": {
"fetch": {
"debug": "500ms",
"info": "500ms"
},
"query": {
"debug": "2s",
"info": "1s"
}
}
}
}
I feel like I am missing something simple that would finally let the cluster handle my load. I would appreciate it if anyone could help me solve this issue.
For such a small size you are using 5 primary shards, which I suspect is because your ES version is 6.x (where the default was 5) and you never changed it. In short, having a high number of primary shards for a small index carries a severe performance penalty; please refer to a very similar use case (I also had 5 primary shards 😀) which I covered in my blog.
As you already mentioned that your index size will not grow significantly in the future, I would suggest having 1 primary shard and 4 replica shards.
With 1 primary shard, a single search creates only one shard-level request and one search thread in Elasticsearch, which provides better utilisation of resources.
As you have 5 data nodes, having 4 replicas means a shard copy sits on every data node, so your throughput and performance will be optimal.
After this change, measure the performance; I am sure you can then reduce the search queue size back to 1k, since, as you noted, a high queue size just delays the problem rather than addressing the issue at hand.
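A sketch of moving to that layout via the reindex API (the index names products and products_v2 are placeholders for your actual index):
curl -XPUT 'http://localhost:9200/products_v2' \
  -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 4
  }
}'

curl -XPOST 'http://localhost:9200/_reindex' \
  -H 'Content-Type: application/json' -d '
{
  "source": { "index": "products" },
  "dest": { "index": "products_v2" }
}'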
Coming to your search slow log, I feel the thresholds are very high; for the query phase, 1 second per query is a lot for a user-facing application. Try lowering it to ~100ms, note down the slow queries, and optimize them further.
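For example, lowering the query- and fetch-phase thresholds on the index (again, the index name is a placeholder):
curl -XPUT 'http://localhost:9200/products_v2/_settings' \
  -H 'Content-Type: application/json' -d '
{
  "index.search.slowlog.threshold.query.info": "100ms",
  "index.search.slowlog.threshold.fetch.info": "100ms"
}'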

Elastic 6.1 replication speed capped?

I'm playing with Elastic 6.1.1 and testing the limit of the software.
If I take an index of ~300GB with 0 replicas and 10 data nodes, and then decide to add a replica, all Elastic instances use the network massively (but not the CPU). This is normal behaviour :)
But it appears network usage is somewhat "capped" (judging from the network graphs) at 160Mbps (20MiB/sec). This limit is strange, as it was the default throttle limit in previous versions of Elastic (indices.store.throttle.max_bytes_per_sec), but that setting was removed starting with Elastic 2.x.
I wonder what this cap is and how I could remove it.
I tried raising index.merge.scheduler.max_thread_count with no effect ...
Do you see any other tuning that could be done on that front?
Any feedback welcome !
You have this - https://www.elastic.co/guide/en/elasticsearch/reference/6.1/recovery.html - which limits the transfer rate of anything related to copying shards from node to node. You can start playing with it by increasing it gradually and seeing what the impact is on cluster performance.
Also, you have https://www.elastic.co/guide/en/elasticsearch/reference/6.1/shards-allocation.html#_shard_allocation_settings which also affects the traffic between nodes when copying shards.
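A sketch of loosening both limits dynamically; the values are starting points to experiment with, not recommendations:
curl -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}'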

Elasticsearch: High CPU usage by Lucene Merge Thread

I have an ES 2.4.1 cluster with 3 master and 18 data nodes which collects log data, with a new index being created every day. In a day the index grows to about 2TB. Indices older than 7 days get deleted. Very few searches are performed on the cluster, so the main goal is to increase indexing throughput.
I see a lot of the following exceptions, which are another symptom of what I am going to describe next:
EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4#5a7d8a24 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#5f9ef44f[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 68888704]]];]];
The nodes in the cluster are constantly pegging CPU. I increased index refresh interval to 30s but that had little effect. When I check hot threads I see multiple "Lucene Merge Thread" per node using 100% CPU. I also noticed that segment count is constantly around 1000 per shard, which seems like a lot. The following is an example of a segment stat:
"_2zo5": {
"generation": 139541,
"num_docs": 5206661,
"deleted_docs": 123023,
"size_in_bytes": 5423948035,
"memory_in_bytes": 7393758,
"committed": true,
"search": true,
"version": "5.5.2",
"compound": false
}
Extremely high "generation" number worries me and I'd like to optimize segment creation and merge to reduce CPU load on the nodes.
Details about indexing and cluster configuration:
Each node is an i2.2xl AWS instance with 8 CPU cores and 1.6T SSD drives
Documents are indexed constantly by 6 client threads with bulk size 1000
Each index has 30 shards with 1 replica
It takes about 25 sec per batch of 1000 documents
/_cat/thread_pool?h=bulk*&v shows that bulk.completed are equally spread out across nodes
Index buffer size and transaction durability are left at default
_all is disabled, but dynamic mappings are enabled
The number of merge threads is left at default, which should be OK given that I am using SSDs
What's the best way to go about it?
Thanks!
Here are the optimizations I made to the cluster to increase indexing throughput:
Increased threadpool.bulk.queue_size to 500 because index requests were frequently overloading the queues
Increased disk watermarks, because default settings were too aggressive for the large SSDs that we were using. I set "cluster.routing.allocation.disk.watermark.low": "100gb" and "cluster.routing.allocation.disk.watermark.high": "10gb"
Deleted unused indexes to free up resources ES uses to manage their shards
Increased the number of primary shards to 175, with the goal of keeping shard size under 50GB and having approximately one shard per processor
Set client index batch size to 10MB, which seemed to work very well for us because the size of documents indexed varied drastically (from KBs to MBs)
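For reference, the watermark change can be applied as dynamic cluster settings (threadpool.bulk.queue_size we set per node in elasticsearch.yml); a sketch:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "10gb"
  }
}'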
Hope this helps others
I have run similar workloads and your best bet is to run hourly indices and run optimize on older indices to keep segments in check.
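On 2.4 that means the force merge API (the successor to _optimize); a sketch against a previous day's index, whose name here is made up:
curl -XPOST 'http://localhost:9200/logs-2016.10.01/_forcemerge?max_num_segments=1'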
