Elasticsearch: High CPU usage by Lucene Merge Thread

I have an ES 2.4.1 cluster with 3 master and 18 data nodes which collects log data, with a new index created every day. Over a day an index grows to about 2TB, and indexes older than 7 days are deleted. Very few searches are run against the cluster, so the main goal is to increase indexing throughput.
I see a lot of the following exceptions, which I suspect are a symptom of the problem I describe below:
EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4#5a7d8a24 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#5f9ef44f[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 68888704]]];]];
The nodes in the cluster are constantly pegging CPU. I increased index refresh interval to 30s but that had little effect. When I check hot threads I see multiple "Lucene Merge Thread" per node using 100% CPU. I also noticed that segment count is constantly around 1000 per shard, which seems like a lot. The following is an example of a segment stat:
"_2zo5": {
"generation": 139541,
"num_docs": 5206661,
"deleted_docs": 123023,
"size_in_bytes": 5423948035,
"memory_in_bytes": 7393758,
"committed": true,
"search": true,
"version": "5.5.2",
"compound": false
}
Extremely high "generation" number worries me and I'd like to optimize segment creation and merge to reduce CPU load on the nodes.
Details about indexing and cluster configuration:
Each node is an i2.2xl AWS instance with 8 CPU cores and 1.6T SSD drives
Documents are indexed constantly by 6 client threads with bulk size 1000
Each index has 30 shards with 1 replica
It takes about 25 sec per batch of 1000 documents
/_cat/thread_pool?h=bulk*&v shows that bulk.completed are equally spread out across nodes
Index buffer size and transaction durability are left at default
_all is disabled, but dynamic mappings are enabled
The number of merge threads is left at default, which should be OK given that I am using SSDs
What's the best way to go about it?
Thanks!

Here are the optimizations I made to the cluster to increase indexing throughput:
Increased threadpool.bulk.queue_size to 500 because index requests were frequently overloading the queues
Relaxed the disk watermarks, because the default percentage-based settings were too aggressive for the large SSDs we were using. I set "cluster.routing.allocation.disk.watermark.low": "100gb" and "cluster.routing.allocation.disk.watermark.high": "10gb" (see the sketch after this list)
Deleted unused indexes to free up resources ES uses to manage their shards
Increased the number of primary shards to 175, with the goal of keeping shard size under 50GB and having approximately one shard per processor
Set the client index batch size to 10MB, which worked very well for us because the size of the indexed documents varied drastically (from KBs to MBs)
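A rough sketch of how the first three changes can be applied; on ES 2.x the thread pool queue size was still a dynamic cluster setting, while on 5.x and later it becomes a static node setting in elasticsearch.yml:
# bump the bulk queue and relax the disk watermarks (values from the list above)
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "threadpool.bulk.queue_size": 500,
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "10gb"
  }
}'
# drop indices that are no longer needed (name is an example)
curl -XDELETE 'localhost:9200/logstash-2016.10.20'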
Hope this helps others

I have run similar workloads and your best bet is to use hourly indices and force-merge (optimize) the older ones to keep segment counts in check.
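A minimal sketch of that approach, assuming date-based index names and an ES version recent enough to have the _forcemerge endpoint (2.1+; on older releases the same call was named _optimize):
# merge an index that is no longer being written to down to a single segment
curl -XPOST 'localhost:9200/logstash-2016.11.01/_forcemerge?max_num_segments=1'
Only run this against indices that have stopped receiving writes, otherwise the merged segments will just fragment again.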

Related

Optimization on old indexes collecting logs from my apps

I have an Elastic cluster with 3 nodes (each with 6 CPUs, 31GB heap, 64GB RAM) collecting 25GB of logs per day, but after 3 months I realized my dashboards became very slow when checking stats over past weeks. Please advise whether there is an option to improve the indexes' read performance so that my dashboard stats are calculated faster.
Thanks!
I would suggest you try to increase the number of shards.
When you have more shards, Elasticsearch splits your data across them, so it can send multiple parallel requests that each search a smaller slice of data.
For the number of shards, you could size it based on your heap memory:
No matter what actual JVM heap size you have, the upper bound on the shard count should be about 20 shards per 1 GB of heap configured on the node; with your 31 GB heaps that is roughly 600 shards per node at most.
ElasticSearch - Optimal number of Shards per node
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
https://opster.com/elasticsearch-glossary/elasticsearch-choose-number-of-shards/
It seems that the amount of data that you accumulated and use for your dashboard is causing performance problems.
A straightforward option is to increase your cluster's resources but then you're bound to hit the same problem again. So you should rather rethink your data retention policy.
Chances are that you are really only interested in the most recent data. You need to decide what "recent" means in your use case and simply discard anything older than that.
Elasticsearch has tools to automate this, look into Index Lifecycle Management.
What you probably need is to create an index template and apply a lifecycle policy to it. Elasticsearch will then handle automatic rollover of indices, eviction of old data, even migration through data tiers in hot-warm-cold architecture if you really want very long retention periods.
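A minimal sketch of what that could look like; all names here are placeholders, and the APIs shown (_ilm and _index_template) assume a reasonably recent Elasticsearch (7.8+ for composable index templates):
# roll over daily or at 50GB, delete indices 30 days after rollover
curl -XPUT 'localhost:9200/_ilm/policy/logs-30d' -H 'Content-Type: application/json' -d '{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_age": "1d", "max_size": "50gb" } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}'
# attach the policy to every new index matching the pattern
curl -XPUT 'localhost:9200/_index_template/logs-template' -H 'Content-Type: application/json' -d '{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-30d",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}'
The first index in the series then has to be created with the "logs" write alias so that rollover has something to work with.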
All this will lead to a more predictable performance of your cluster.

Elasticsearch: QueueResizingEsThreadPoolExecutor exception

At some point during a 35k endurance load test of my Java web app, which fetches static data from Elasticsearch, I started getting the following Elasticsearch exception:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable#1a25fe82 on QueueResizingEsThreadPoolExecutor[name = search4/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 10.7ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor#6312a0bb[Running, pool size = 25, active threads = 25, queued tasks = 1000, completed tasks = 34575035]]
Elasticsearch details:
Elasticsearch version 6.2.4.
The cluster consists of 5 nodes. The JVM heap size setting for each node is Xms16g and Xmx16g. Each node machine has 16 processors.
NOTE: When I first got this exception, I decided to increase the thread_pool.search.queue_size parameter in elasticsearch.yml, setting it to 10000. Yes, I understand that this just postpones the problem.
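For reference, the change is a single line in elasticsearch.yml on every node (it needs a node restart to take effect):
thread_pool.search.queue_size: 10000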
Elasticsearch indices details:
Currently there are about 20 indices, and only 6 of them are being used. The unused ones are old indices that were not deleted after newer ones were created. The indexes themselves are really small:
The index within the red rectangle is the one used by my web app. Its shard and replica settings are "number_of_shards": "5" and "number_of_replicas": "2" respectively.
Its shard details:
In this article I found that:
"Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size."
As you can see from the screenshot above, my shard size is much less than mentioned size.
Hence, Q: what is the right number of shards in my case? Is it 1 or 2?
The index won't grow much over time.
ES queries issued during the test: the load test simulates a scenario where a user navigates to a page for searching products. The user can filter the products using corresponding filters (e.g. name, city, etc.). The unique filter values are fetched from the ES index using a composite query, so that is the first query type. The other query fetches the products from ES; it consists of must, must_not, filter and has_child clauses, with the size attribute set to 100.
I enabled slow search logging, but nothing was logged:
"index": {
"search": {
"slowlog": {
"level": "info",
"threshold": {
"fetch": {
"debug": "500ms",
"info": "500ms"
},
"query": {
"debug": "2s",
"info": "1s"
}
}
}
}
I feel like I am missing something simple that would finally let the cluster handle my load. I'd appreciate it if anyone can help me solve this issue.
For such a small index you are using 5 primary shards, which I suspect is just the default of your ES version 6.x (the default used to be 5) that you never changed. In short, having a high number of primary shards for a small index carries a severe performance penalty; please refer to a very similar use case (I also had 5 primary shards 😀) which I covered in my blog.
As you already mentioned that your index size will not grow significantly in the future, I would suggest having 1 primary shard and 4 replica shards.
1 primary shard means that for a single search only one request and one thread is created in Elasticsearch, which provides better utilisation of resources.
As you have 5 data nodes, having 4 replicas means the shards are properly distributed across each data node, so your throughput and performance will be optimal.
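Since the number of primary shards cannot be changed on an existing index, a rough sketch of one way to get there (index names here are placeholders; the _shrink API is another option):
# create the new single-shard index
curl -XPUT 'localhost:9200/products_v2' -H 'Content-Type: application/json' -d '{
  "settings": { "number_of_shards": 1, "number_of_replicas": 4 }
}'
# copy the documents across, then point your application (or an alias) at the new index
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
  "source": { "index": "products_v1" },
  "dest": { "index": "products_v2" }
}'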
After this change, measure the performance, and I am sure you can then reduce the search queue size back to 1k; as you know, a high queue size just delays the problem rather than addressing the issue at hand.
Coming to your search slow log: I feel your thresholds are very high. For the query phase, 1 second per query is really high for a user-facing application; try lowering it to ~100ms, note down the queries it catches, and optimize them further.
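A sketch of lowering the query-phase thresholds dynamically (the index name is a placeholder; pick values that reflect what "slow" means for your users):
curl -XPUT 'localhost:9200/products_v2/_settings' -H 'Content-Type: application/json' -d '{
  "index.search.slowlog.threshold.query.info": "100ms",
  "index.search.slowlog.threshold.query.debug": "50ms"
}'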

Creating high throughput Elasticsearch cluster

We are in the process of implementing Elasticsearch as a search solution in our organization. For the POC we implemented a 3-node cluster (each node with 16 vCores, 60 GB RAM and 6 * 375GB SSDs) with all the nodes acting as master, data and coordinating nodes. As it was a POC, indexing speed was not a consideration; we were just trying to see whether it would work or not.
Note : We did try to index 20 million documents on our POC cluster and it took about 23-24 hours to do that which is pushing us to take time and design the production cluster with proper sizing and settings.
Now we are trying to implement a production cluster (in Google Cloud Platform) with emphasis on both indexing speed and search speed.
Our use case is as follows :
We will bulk index 7 million to 20 million documents per index (we have 1 index per client and there will be only one cluster). This bulk index is a weekly process, i.e. we index all data once and query it for the whole week before refreshing it. We are aiming for an indexing throughput of 0.5 million documents per second.
We are also looking for a strategy to horizontally scale when we add more clients. I have mentioned the strategy in subsequent sections.
Our data model has nested document structure and lot of queries on nested documents which according to me are CPU, Memory and IO intensive. We are aiming for sub second query times for 95th percentile of queries.
I have done quite a bit of reading around this forum and other blogs where companies have high performing Elasticsearch clusters running successfully.
Following are my learnings :
Have dedicated master nodes (always odd number to avoid split-brain). These machines can be medium sized ( 16 vCores and 60 Gigs ram) .
Give 50% of RAM to the ES heap, while keeping the heap below ~31 GB so the JVM can still use compressed object pointers. We are planning to set it to 28GB on each node.
Data nodes are the workhorses of the cluster hence have to be high on CPUs, RAM and IO. We are planning to have (64 VCores, 240 Gb RAM and 6 * 375 GB SSDs).
Have co-ordination nodes as well to take bulk index and search requests.
Now we are planning to begin with following configuration:
3 Masters - 16Vcores, 60GB RAM and 1 X 375 GB SSD
3 Coordinators - 64 vCores, 60GB RAM and 1 X 375 GB SSD (compute-intensive machines)
6 Data Nodes - 64 VCores, 240 Gb RAM and 6 * 375 GB SSDs
We plan to add 1 data node for each incoming client.
Now that hardware is out of the way, let's focus on indexing strategy.
A few best practices that I've collated are as follows:
A lower number of shards per node is good in most scenarios, but make sure data is distributed well across all the nodes so the load stays balanced. Since we are planning to start with 6 data nodes, I'm inclined to have 6 shards for the first client to utilize the cluster fully.
Have 1 replica to survive the loss of a node.
Next is bulk indexing process. We have a full fledged spark installation and are going to use elasticsearch-hadoop connector to push data from Spark to our cluster.
During indexing we set the refresh_interval to 1m so that refreshes happen less frequently.
We are using 100 parallel Spark tasks, with each task sending 2MB of data per bulk request. So at any time there is 2 * 100 = 200 MB of bulk requests in flight, which I believe is well within what ES can handle. We can definitely alter these settings based on feedback or trial and error.
I've read more about the cache percentage, thread pool size and queue size settings, but we are planning to keep them at the smart defaults to begin with.
We are open to using either Concurrent CMS or G1GC for garbage collection but would need advice on this. I've read the pros and cons of both and am in a dilemma over which one to use.
Now to my actual questions :
Is sending bulk indexing requests to a coordinator node a good design choice, or should we send them directly to the data nodes?
We will be sending query requests via the coordinator nodes. Now my question is: since each of my data nodes has 64 cores, each node has a search thread pool size of 64 and a queue size of 200. Assume that during searching a data node's thread pool and queue are completely exhausted; will the coordinator nodes keep accepting and buffering search requests at their end until their own queues also fill up, or will one thread on the coordinator also be blocked per query request?
Say a search request arrives at a coordinator node: it blocks 1 thread there and sends requests to the data nodes, which in turn block threads on whichever data nodes hold the queried data. Is this assumption correct?
While bulk indexing is going on (assuming we do not run indexing for all clients in parallel but schedule them sequentially), how do we best design the cluster so that query times do not take much of a hit during the bulk index?
References
https://thoughts.t37.net/designing-the-perfect-elasticsearch-cluster-the-almost-definitive-guide-e614eabc1a87
https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
We did try to index 20 million documents on our POC cluster and it took about 23-24 hours
That is surprisingly little — like less than 250 docs/s. I think my 8GB RAM laptop can insert 13 million docs in 2h. Either you have very complex documents, some bad settings, or your bottleneck is on the ingestion side.
About your nodes: I think you could easily get away with less memory on the master nodes (like 32GB should be plenty). Also the memory on data nodes is pretty high; I'd normally expect heap in relation to the rest of the memory to be 1:1 or for lots of "hot" data maybe 1:3. Not sure you'll get the most out of that 1:7.5 ratio.
CMS vs G1GC: If you have a current Elasticsearch and Java version, both are an option, otherwise CMS. You're generally trading throughput for (GC) latency, so if you benchmark be sure to have a long enough timeframe to properly hit GC phases and run as close to production queries in parallel as possible.
Is sending bulk indexing requests to a coordinator node a good design choice, or should we send them directly to the data nodes?
I'd say the coordinator is fine. Unless you use a custom routing key and the bulk only contains data for that specific data node, 5/6th of the documents would need to be forwarded to other data nodes anyway (if you have 6 data nodes). And you can offload the bulk processing and coordination handling to non data nodes.
However, overall it might make more sense to have 3 additional data nodes and skip the dedicated coordinating node. Though this is something you can only say for certain by benchmarking your specific scenario.
Now my question is: since each of my data nodes has 64 cores, each node has a search thread pool size of 64 and a queue size of 200. Assume that during searching a data node's thread pool and queue are completely exhausted; will the coordinator nodes keep accepting and buffering search requests at their end until their own queues also fill up, or will one thread on the coordinator also be blocked per query request?
I'm not sure I understand the question. But have you looked into https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster, which might shed some more light on this topic?
While bulk indexing is going on (assuming we do not run indexing for all clients in parallel but schedule them sequentially), how do we best design the cluster so that query times do not take much of a hit during the bulk index?
While there are different queues for different operations, there is otherwise no clear separation of tasks (like "only use 20% of the resources for indexing"). Maybe go a little more conservative on the parallel bulk requests to avoid overloading the nodes.
If you are not reading from an index while it's being indexed (ideally you flip an alias once done): You might want to disable the refresh rate entirely and let Elasticsearch create segments as needed, but do a force refresh and change the setting once done. Also you could try running with 0 replicas while indexing, change replicas to 1 once done, and then wait for it to finish — though I'd benchmark if this is helping overall and if it's worth the added complexity.
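A rough sketch of that pattern, assuming weekly indices behind an alias (all names are placeholders):
# before the bulk load: no periodic refreshes, no replicas
curl -XPUT 'localhost:9200/clients_2020w10/_settings' -H 'Content-Type: application/json' -d '{
  "index": { "refresh_interval": "-1", "number_of_replicas": 0 }
}'
# ... run the Spark bulk job against clients_2020w10 ...
# after the load: refresh once, bring the replica back, then flip the alias
curl -XPOST 'localhost:9200/clients_2020w10/_refresh'
curl -XPUT 'localhost:9200/clients_2020w10/_settings' -H 'Content-Type: application/json' -d '{
  "index": { "refresh_interval": "30s", "number_of_replicas": 1 }
}'
curl -XPOST 'localhost:9200/_aliases' -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "clients_2020w09", "alias": "clients" } },
    { "add": { "index": "clients_2020w10", "alias": "clients" } }
  ]
}'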

Elasticsearch Indexing Speed Degrades over Time

I am doing some performance tuning of Elasticsearch for my project and I need some help improving the indexing speed. I am using ES 5.1.1 and I have a 2-node setup with 8 shards for the index. Each of the two servers has 16GB RAM and 12 CPUs at 2.2GHz. I need to index around 25,000,000 documents within 1.5 hours, which currently takes around 4 hours. I have made the following config changes to improve the indexing time.
Setting ‘indices.store.throttle.type’ to ‘none’
Setting ‘refresh_interval’ to ‘-1’
Increasing ‘translog.flush_threshold_size’ to 1GB
Setting ‘number_of_replicas’ to ‘0’
Using 8 shards for the index
Setting VM Options -Xms8g -Xmx8g (Half of the RAM size)
I am using the bulk processor to generate the documents in my java application and I’m using the following configurations to setup the bulk processor.
Bulk Actions Count : 10000
Bulk Size in MB : 100
Concurrent Requests : 100
Flush Interval : 30
Initially I can index around 356,167 documents in the first minute, but over time this decreases; after around 1 hour it is down to about 121,280 docs per minute.
How can I keep the indexing rate steady over time? Are there any other ways to improve the performance?
I highly encourage you not to change configuration parameters like the translog flush size or the throttling unless you know what you are doing (and that does not mean reading some blog post on the internet :-)
Try a single shard per server and, above all, reduce the bulk size to something like 10MB. 100MB * 100 concurrent requests means you need 10GB of heap just to hold those requests in flight (before doing anything else). I suspect not all of the documents get indexed, because of the rejected tasks in your thread pools.
Start small and grow from there, instead of starting big without any insight into your indexing.
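To get that insight, it is worth watching the bulk thread pool rejection counters while the job runs, roughly:
# a growing "rejected" count on the bulk pool means whole bulk requests are being thrown away
curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected,completed'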

Is there a limit on the number of indexes that can be created on Elastic Search?

I'm using AWS-provided Elastic Search.
I have a signup page on my website, and on each signup a new index gets created for the new user (to be used later by their work-group), which means the number of indexes is continuously growing (it has now reached around 4~5k).
My question is: is there a performance limit on the number of indexes? Is it safe (performance-wise) to keep creating new indexes dynamically with each new user?
Note: I haven't used AWS Elasticsearch, so this answer may vary because they have started using their Open Distro fork of Elasticsearch, but a lot of the principles should be the same. Also, this question doesn't have a definitive answer; it depends on various factors, but I hope this answer helps the thought process.
One factor is the number of shards and replicas per index, as that contributes to the total number of shards per node. Each shard consumes some memory, so you have to limit the number of shards per node so that the node stays within the recommended maximum of 30GB of heap space. As per this comment, 600 to 1000 shards per node should be reasonable, and you can scale your cluster accordingly.
Also, you have to monitor the number of file descriptors and make sure that doesn't create any bottleneck for nodes to operate.
HTH!
If I'm not mistaken, the only limit is the disk space of your server, but if your index is growing too fast you should think about having more replica servers. I recommend reading this page: Indexing Performance Tips
Indexes themselves have no limit; however, shards do. The recommended number of shards per GB of JVM heap is 20 (you can check your heap on the Kibana Stack Monitoring tab), which means that with 5GB of JVM heap the recommended maximum is 100 shards.
Remember that 1 index can take from 1 to x shards (1 primary and x replicas); normally people have 1 primary and 1 replica, and if that is your case then you would be able to create about 50 indexes with those 5GB of heap.
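A quick way to see where you stand against that rule of thumb, roughly:
# total shards currently allocated in the cluster
curl -s 'localhost:9200/_cluster/health?filter_path=active_primary_shards,active_shards'
# heap configured per node, to multiply by the 20-shards-per-GB guideline
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent'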

Resources