We have a cluster consisting of 3 master nodes (4 cores, 16 GB RAM each), 3 hot nodes (8 cores, 32 GB RAM, 300 GB SSD each), and 3 warm nodes (8 cores, 32 GB RAM, 1.5 TB HDD each).
We have one index for each month of the year, following the naming convention voucher_YYYY_MMM (e.g. voucher_2021_JAN). All these indexes share an alias, voucher, which acts as a read alias, and our search queries are directed at this read alias.
Each index resides on the hot nodes for 32 days, which is the period during which it receives 99% of its writes. We estimate approximately 480 million docs in this index; it has 1 replica and 16 shards (we chose 16 shards because our data will eventually grow; right now we are considering shrinking down to 8 shards, each holding about 30 GB of data, since per our mapping 2 million docs take roughly 1 GB of space).
After 32 days the index moves to the warm nodes. Currently we have 450 million docs in our hot index and 1.8 billion documents collectively in our warm indexes, for a total of about 2.25 billion docs.
Each doc contains a customer id and some fields we filter on, all mapped as keyword types. We use custom routing on the customer id to improve search speed. Our typical query looks like this:
GET voucher/_search?routing=1000636779&search_type=query_then_fetch
{
  "from": 0,
  "size": 20,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "term": {
                "uId": {
                  "value": "1000636779",
                  "boost": 1
                }
              }
            },
            {
              "terms": {
                "isGift": [
                  "false"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "version": true,
  "sort": [
    {
      "cdInf.crtdAt": {
        "order": "desc"
      }
    }
  ]
}
We are using a constant score query because we don't want to score our documents and want to increase search speed.
We have 13 search threads on each of our hot and warm nodes, and we send our indexing and search requests to the master nodes.
We are sending 100 search requests per second and seeing an average search response time of about 3.5 seconds, with a maximum of up to 9 seconds.
I don't understand what we are missing; why is our search performance so poor?
Thank you for the exhaustive explanations. Based on them, here are a few points of improvement (in no particular order):
Never direct your search and index requests to the master nodes; they should never handle traffic. Send them to the data nodes directly, or better yet, to dedicated coordinating nodes.
As a direct consequence, the master(-eligible) nodes don't need 16 GB of RAM; 2 GB is more than sufficient, because they will no longer act as coordinating nodes.
If you have time ranges in your queries, you could leverage index sorting on the cdInf.crtdAt field: faster searches at the cost of slower ingestion. It only makes sense if your queries have a time constraint, otherwise not.
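Index sorting has to be defined at index creation time. A minimal sketch, assuming Elasticsearch 7.x, a hypothetical next monthly index voucher_2021_FEB, and that cdInf.crtdAt is a date field:
PUT voucher_2021_FEB
{
  "settings": {
    "index.sort.field": "cdInf.crtdAt",
    "index.sort.order": "desc"
  },
  "mappings": {
    "properties": {
      "cdInf": {
        "properties": {
          "crtdAt": { "type": "date" }
        }
      }
    }
  }
}
With the index sorted on cdInf.crtdAt descending, queries that sort the same way can often terminate early instead of visiting every matching document.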
16 shards per index on 3 hot nodes is not a good sharding strategy; you should have a multiple of the number of nodes (3, 6, 9, etc.), otherwise one of the nodes will have more shards than the others and you might create hot spots. You could also add one more hot node so that each one has 4 shards. This is a typical example of oversharding. Since your indexes are rolled over each month, it's easy to modify the number of primary shards in the index template as you see data growing (see the sketch below).
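A minimal sketch of such a template, assuming Elasticsearch 7.8+ composable index templates and a hypothetical choice of 6 primary shards (a multiple of the 3 hot nodes):
PUT _index_template/voucher_template
{
  "index_patterns": ["voucher_*"],
  "template": {
    "settings": {
      "number_of_shards": 6,
      "number_of_replicas": 1
    }
  }
}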
It's a good idea to leverage routing in order to search fewer shards. It's not clear from the question how many indexes in total sit behind the voucher alias, but that would also be good information to have in order to assess whether the sharding and the number of search threads are appropriate. Based on the doc counts you provide, it seems you have 1 hot index and 5 warm ones, so 6 indexes in total, which means each search request with routing will only search 6 shards (one per index).
100 search requests per second with 13 search threads per node (the default for 8 cores) means that every second each node has to handle 7+ search requests, and since requests take approximately 3 seconds to return, you're building up a search queue; the nodes simply cannot keep up.
Another feature to leverage in order to benefit from filter caching is the preference query string parameter.
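A common pattern, sketched here on the query from the question, is to reuse the customer id as the preference value so that repeated searches for the same customer hit the same shard copies and therefore reuse their caches:
GET voucher/_search?routing=1000636779&preference=1000636779
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "uId": "1000636779" }
      }
    }
  }
}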
Also, part of the slowness comes from the fact that 80% of the data you're searching is located on warm nodes with spinning disks, so depending on your use case you might want to split your search in two: one super fast search on the hot data and another, slower search on the warm data.
Once your indexes get reallocated to the warm nodes (and if they don't get updated anymore), it might be a good idea to force merge them down to a few segments (3 to 5) so that your searches have fewer segments to go through, and also to decrease their size (i.e. remove deleted documents).
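For example, assuming the index voucher_2021_JAN has just moved to the warm tier:
POST voucher_2021_JAN/_forcemerge?max_num_segments=5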
We have a problem with scaling up Elasticsearch.
We have a simple structure with multiple indexes; each index has 2 shards, 1 primary and 1 replica.
Right now we have ~3000 indexes, which means 6000 shards, and we think we will hit shard limits soon. (We are currently running 2 nodes with 32 GB of RAM and 4 cores at a top usage of 65%; we are working on moving to 3 nodes so each index can have 1 primary shard and 2 replicas.)
Refresh interval is set to 60s.
Some indexes have 200 documents, others have 10 million; most have fewer than 200k.
The total number of documents is about 40 million (and it can increase fast).
Our search requests hit multiple indexes at the same time (we might be searching in 50 indexes or 500; in the future we might need to be able to search in all indexes).
Searching needs to be fast.
Currently we synchronize all documents daily via bulk requests in chunks of 5000 documents (~7 MB), because from our tests that works best: about 2.3 seconds per request of 5000 documents / ~7 MB, done by 10 async workers.
We sometimes hit the same index with several workers at the same time, and then a bulk request takes longer, up to ~12.5 seconds per request of 5000 documents / ~7 MB.
The current synchronization process takes about 1 hour for 40 million documents.
Documents are stored by UUIDs (we use them to get direct hits from Elasticsearch). Document values can be modified daily; sometimes we only change a synchronization_hash field, which determines which documents were changed. After the whole synchronization process, we run a delete on documents with an old synchronization_hash.
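That cleanup step is essentially a delete-by-query on the old hash; a minimal sketch, where the index name and the hash value are placeholders:
POST my_index/_delete_by_query
{
  "query": {
    "term": {
      "synchronization_hash": "old_hash_value"
    }
  }
}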
Another thing is that we think our data architecture is broken. We have x clients (~300; the number can increase), and each client is only allowed to search in y indexes (from 50 to 500). Indexes can overlap between clients (client x has 50 indexes, client y has 70 indexes, and clients x and y most often need access to the same documents in those indexes; the number of clients can increase). That's why we store data in separate indexes, so we don't have to update every index where a given document is stored.
To increase indexing speed we are even thinking of moving to 4 nodes (with 2 primaries and 2 replicas per index), or to 4 nodes with each index having only 2 shards (1 primary, 1 replica), but we need to test things to figure out what would work best for us. We might need to double the number of documents in the next few months.
What do you think can be changed to increase indexing speed without reducing search speed?
What can be changed in our data architecture?
Is there any other way our data should be organized to allow fast searching and faster indexing?
I tried many chunk sizes during synchronization, but didn't try to change the architecture.
We are trying to achieve increased indexing speed without reducing search speed.
At some point during a 35k endurance load test of my Java web app, which fetches static data from Elasticsearch, I start getting the following Elasticsearch exception:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable#1a25fe82 on QueueResizingEsThreadPoolExecutor[name = search4/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 10.7ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor#6312a0bb[Running, pool size = 25, active threads = 25, queued tasks = 1000, completed tasks = 34575035]]
Elasticsearch details:
Elasticsearch version 6.2.4.
The cluster consists of 5 nodes. The JVM heap size setting for each node is Xms16g and Xmx16g. Each node machine has 16 processors.
NOTE: Initially, when I got this exception for the first time, I decided to increase the thread_pool.search.queue_size parameter in elasticsearch.yml, setting it to 10000. Yes, I understand I just postponed the problem.
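For completeness, that was a single line in elasticsearch.yml on each node:
thread_pool.search.queue_size: 10000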
Elasticsearch indices details:
Currently there are about 20 indices, and only 6 of them are being used. The unused ones are old indices that were not deleted after newer ones were created. The indexes themselves are really small:
The index within the red rectangle (see screenshot) is the index used by my web app. Its shards and replicas settings are "number_of_shards": "5" and "number_of_replicas": "2" respectively.
Its shard details:
In this article I found that
Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.
As you can see from the screenshot above, my shard size is much smaller than the mentioned size.
Hence, Q: what is the right number of shards in my case? Is it 1 or 2?
The index won't grow much over time.
ES queries issued during the test: the load test simulates a scenario where a user navigates to a page for searching products. The user can filter the products using corresponding filters (e.g. name, city, etc.). The unique filter values are fetched from the ES index using a composite aggregation, so that is the first query type. The other query fetches the products themselves; it consists of must, must_not, filter and has_child queries, with the size attribute set to 100.
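For reference, a minimal sketch of such a composite aggregation; the index name products and the city field are just placeholders:
GET products/_search
{
  "size": 0,
  "aggs": {
    "unique_cities": {
      "composite": {
        "size": 100,
        "sources": [
          { "city": { "terms": { "field": "city" } } }
        ]
      }
    }
  }
}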
I set up slow search logging, but nothing has been logged:
"index": {
"search": {
"slowlog": {
"level": "info",
"threshold": {
"fetch": {
"debug": "500ms",
"info": "500ms"
},
"query": {
"debug": "2s",
"info": "1s"
}
}
}
}
I feel like I am missing something simple that would finally let me handle my load. I'd appreciate it if anyone could help me solve this issue.
For such a small size you are using 5 primary shards, which I suspect is because your ES version is 6.x (where the default was 5) and you never changed it. In short, having a high number of primary shards for a small index carries a severe performance penalty; please refer to a very similar use-case (I also had 5 primary shards 😀) which I covered in my blog.
As you already mentioned that your index size will not grow significantly in the future, I would suggest having 1 primary shard and 4 replica shards.
1 primary shard means that for a single search only one thread and one request will be created in Elasticsearch, which provides better utilisation of resources.
As you have 5 data nodes, having 4 replicas means a copy of the shard sits on every data node, so your throughput and performance will be optimal.
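A rough sketch of how this could be done, with products_index and data-node-1 as placeholder names: the replica count can be changed on the fly, but reducing primaries requires shrinking (or reindexing) into a new index, which first needs a copy of every shard on one node and a write block:
PUT products_index/_settings
{
  "index.routing.allocation.require._name": "data-node-1",
  "index.blocks.write": true
}

POST products_index/_shrink/products_index_1shard
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 4,
    "index.routing.allocation.require._name": null
  }
}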
After this change, measure the performance, and I am sure you can then reduce the search queue size back to 1k; as you know, a high queue size just delays the problem and doesn't address the underlying issue.
Coming to your search slow log, I feel you have a very high threshold: 1 second for the query phase is really high for a user-facing application. Try lowering it to ~100 ms, then note down the slow queries and optimize them further.
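A sketch of lowering the thresholds dynamically (the index name is a placeholder):
PUT my-index/_settings
{
  "index.search.slowlog.threshold.query.info": "100ms",
  "index.search.slowlog.threshold.fetch.info": "100ms"
}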
I'm indexing about 40M documents into Elasticsearch. It's usually a one-off data load, and then we run queries on top; there are no further updates to the index itself. However, the default settings of Elasticsearch aren't getting me the throughput I expected.
So, among the long list of things to tune and verify, I was wondering whether ordering by a business key would help improve search throughput. All our analysis queries use this key; it is already indexed as a keyword, and we filter on it like below:
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "type": "cross_fields",
          "query": "store related query",
          "minimum_should_match": "30%",
          "fields": [ "field1^5", "field2^5", "field3^3", "field4^3", "firstLine", "field5", "field6", "field7" ]
        }
      },
      "filter": {
        "term": {
          "businessKey": "storename"
        }
      }
    }
  }
}
This query is run in bulk fashion about 20M times over a few hours. Currently I cannot go past 21k/min, but that could be because of various factors. Any tips to improve performance for this sort of workflow (load once, search a lot) would be appreciated.
However, I'm particularly interested to know whether I could order the data by business key at indexing time, so that the data for a given businessKey lives within a single Lucene segment and the lookup would be quicker. Is that line of thought correct? Is this something ES already does, given that it's a keyword field?
It's a very good performance optimization use-case, and as you already mentioned, there is a list of performance optimizations you will need to work through.
I can see you are already building the query correctly, that is, filtering the records on businessKey and then searching the remaining docs; this way you are already utilizing the filter cache of Elasticsearch.
As you have a huge number of documents (~40M), it doesn't make sense to put all of them in a single segment. The default maximum segment size is 5 GB, and segments beyond that are excluded from further merging, hence it's almost impossible for you to end up with just 1 segment for your data.
I think a couple of things you can do are:
Disable the refresh interval once you are done ingesting your data and are preparing the index for search (see the settings sketch after this list).
As you are using filters, the request cache should be in use; monitor the cache usage while you are querying and check how often results come from the cache:
GET your-index/_stats/request_cache?human
Read throughput is higher when you have more replicas; if you have additional nodes in your Elasticsearch cluster, make sure those nodes hold replicas of your index.
Monitor the search queues on each node and make sure they are not getting exhausted, otherwise you will not be able to increase throughput; refer to the thread pools documentation in ES for more info.
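For the refresh-interval and replica points above, a minimal sketch of the settings change once ingestion is finished; your-index is a placeholder, and the replica count depends on how many data nodes you actually have:
PUT your-index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 2
  }
}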
Your main issue is throughput, and you want to go beyond the current limit of 21k/min, so this also requires a lot of index and cluster configuration optimization; I have written short tips to improve search performance, please refer to them and let me know how it goes.
Indexing is faster with more shards (and segments), and querying is faster with fewer shards (and segments).
If you are doing a one-time load, you can force merge the segments after indexing, and also think about adding more replicas after indexing to distribute searches and increase throughput.
If each query is going to be specific to a business key, segregating the data at indexing time and creating one index per business key might help. Similarly, at query time, direct each query to the specific index pertaining to the business key being queried.
Going through the links below might help:
Tune for Search
Tune for Indexing
Optimise Search
I have more than 4000 different fields in one of my indexes, and that number can grow larger with time.
Since Elasticsearch imposes a default limit of 1000 fields per index, there must be some reason for it.
Now, I am thinking that I should not increase the limit set by Elasticsearch.
So I should break my single large index into multiple small indexes.
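(For reference, the limit in question is the dynamic index setting index.mapping.total_fields.limit, which could be raised as below, though that is exactly what I'd rather avoid; the index name is a placeholder.)
PUT my_large_index/_settings
{
  "index.mapping.total_fields.limit": 5000
}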
Before moving to multiple indexes, I have a few questions:
The number of small indexes can grow up to 50. Would searching across all 50 indexes at a time slow down search compared to a search on the single large index?
Is there really a need to break my single large index into multiple indexes because of a large number of fields?
When I use multiple small indexes, the total number of shards would increase drastically (more than 250 shards), since each index would have 5 shards (the default number, which I don't want to change). A search across these indexes would hit all 250 shards at once. Will this affect my search performance? Note: these shard counts might increase over time as well.
When I use a single large index containing only 5 shards and a large number of documents, won't this overload those 5 shards?
It strongly depends on your infrastructure. If you run a single node with 50 Shards a query will run longer than it would with only 1 Shard. If you have 50 Nodes holding one shard each, it will most likely run faster than one node with 1 Shard (if you have a big dataset). In the end, you have to test with real data to be sure.
When there is a massive amount of fields, ES gets a performance problem and errors are more likely. The main problem is that every field has to be stored in the cluster state, which takes a toll on your master node(s). Also, in a lot of cases you have to work with lots of sparse data (90% of fields empty).
As a rule of thumb, one shard should contain between 30 GB and 50 GB of data. I would not worry too much about overloading shards in your use-case. The opposite is true.
I suggest testing your use-case with fewer shards: go down to 1 shard, 1 replica for your index. The overhead of searching multiple shards (5 primaries, multiplied by replicas) and then combining the results again is massive in comparison to your small dataset.
Keep in mind that document_type behaviour changed and will change further. Since 6.x you can only have one document_type per index, and starting with 7.x document_type is removed entirely. As the API listens at _doc, _doc is the suggested document_type to use in 6.x. Either move to one index per _type, or introduce a new field that stores your type if you need the data in one index.
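A minimal sketch of the second option, using Elasticsearch 7.x syntax and hypothetical names (my_index, a doc_kind keyword field, and an invoice value):
PUT my_index/_mapping
{
  "properties": {
    "doc_kind": { "type": "keyword" }
  }
}

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "doc_kind": "invoice" }
      }
    }
  }
}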
I have an ES 2.4.1 cluster with 3 master and 18 data nodes which collects log data, with a new index being created every day. In a day the index grows to about 2 TB. Indexes older than 7 days get deleted. Very few searches are performed on the cluster, so the main goal is to increase indexing throughput.
I see a lot of the following exceptions, which is another symptom of what I describe next:
EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4#5a7d8a24 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#5f9ef44f[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 68888704]]];]];
The nodes in the cluster are constantly pegging the CPU. I increased the index refresh interval to 30s, but that had little effect. When I check hot threads I see multiple "Lucene Merge Thread" per node using 100% CPU. I also noticed that the segment count is constantly around 1000 per shard, which seems like a lot. The following is an example of a segment stat:
"_2zo5": {
"generation": 139541,
"num_docs": 5206661,
"deleted_docs": 123023,
"size_in_bytes": 5423948035,
"memory_in_bytes": 7393758,
"committed": true,
"search": true,
"version": "5.5.2",
"compound": false
}
The extremely high "generation" number worries me, and I'd like to optimize segment creation and merging to reduce CPU load on the nodes.
Details about indexing and cluster configuration:
Each node is an i2.2xl AWS instance with 8 CPU cores and 1.6T SSD drives
Documents are indexed constantly by 6 client threads with bulk size 1000
Each index has 30 shards with 1 replica
It takes about 25 sec per batch of 1000 documents
/_cat/thread_pool?h=bulk*&v shows that bulk.completed are equally spread out across nodes
Index buffer size and transaction durability are left at default
_all is disabled, but dynamic mappings are enabled
The number of merge threads is left at default, which should be OK given that I am using SSDs
What's the best way to go about it?
Thanks!
Here are the optimizations I made to the cluster to increase indexing throughput:
Increased threadpool.bulk.queue_size to 500 because index requests were frequently overloading the queues (see the settings sketch after this list)
Increased disk watermarks, because default settings were too aggressive for the large SSDs that we were using. I set "cluster.routing.allocation.disk.watermark.low": "100gb" and "cluster.routing.allocation.disk.watermark.high": "10gb"
Deleted unused indexes to free up resources ES uses to manage their shards
Increased the number of primary shards to 175, with the goal of keeping shard size under 50GB and having approximately one shard per processor
Set client index batch size to 10MB, which seemed to work very well for us because the size of documents indexed varied drastically (from KBs to MBs)
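The first two changes can be sketched as a dynamic cluster settings update; whether the thread pool part can be applied this way depends on the Elasticsearch version (2.x allowed it, while 5.x and later require elasticsearch.yml), and the values below are the ones quoted above:
PUT _cluster/settings
{
  "persistent": {
    "threadpool.bulk.queue_size": 500,
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "10gb"
  }
}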
Hope this helps others
I have run similar workloads, and your best bet is to run hourly indices and run optimize on the older indices to keep the segment counts in check.