How to improve percolator performance in ElasticSearch?

Summary
We need to increase percolator performance (throughput).
Most likely approach is scaling out to multiple servers.
Questions
How to do scaling out right?
1) Would increasing number of shards in underlying index allow running more percolate requests in parallel?
2) How much memory does ElasticSearch server need if it does percolation only?
Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?
3) Would having SSD meaningfully help percolator's performance, or it is better to increase RAM and/or number of nodes?
Our current situation
We have 200,000 queries (job search alerts) in our job index.
We are able to run 4 parallel queues that call percolator.
Each queue is able to percolate a batch of 50 jobs in about 35 seconds, so we can percolate about:
4 queues * 50 jobs per batch / 35 seconds * 60 seconds per minute ≈ 343 jobs per minute
We need more.
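For reference, the arithmetic above sketched in Python (the queue count, batch size, and batch latency are the observed numbers from our setup):

```python
# Throughput of the percolation pipeline described above.
queues = 4            # parallel percolation queues
batch_size = 50       # jobs percolated per request
batch_seconds = 35    # observed latency per batch

jobs_per_minute = queues * batch_size / batch_seconds * 60
print(round(jobs_per_minute))  # → 343
```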
Our jobs index has 4 shards, and we are using the .percolator type sitting on top of that jobs index.
Hardware: a 2-processor server with 32 cores total and 32GB RAM.
We allocated 8GB RAM to ElasticSearch.
When percolator is working, 4 percolation queues I mentioned above consume about 50% of CPU.
When we tried to increase number of parallel percolation queues from 4 to 6, CPU utilization jumped to 75%+.
What is worse, percolator started to fail with NoShardAvailableActionException:
[2015-03-04 09:46:22,221][DEBUG][action.percolate ] [Cletus Kasady] [jobs][3] Shard multi percolate failure
org.elasticsearch.action.NoShardAvailableActionException: [jobs][3] null
That error seems to suggest that we should increase the number of shards and eventually add a dedicated ElasticSearch server (and later increase the number of nodes).
Related:
How to Optimize elasticsearch percolator index Memory Performance

Answers
How to do scaling out right?
Q: 1) Would increasing number of shards in underlying index allow running more percolate requests in parallel?
A: No. Sharding is only really useful when creating a cluster. Additional shards on a single instance may in fact worsen performance. In general the number of shards should equal the number of nodes for optimal performance.
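To make the "shards should equal nodes" advice concrete, here is a sketch of the create-index settings body for a 4-node cluster (the index name and the replica count are illustrative assumptions, not from the question):

```python
import json

nodes = 4
create_index_body = {
    "settings": {
        "number_of_shards": nodes,   # one primary shard per node
        "number_of_replicas": 1,
    }
}
# This body would go in a PUT /jobs request via curl or any HTTP client.
print(json.dumps(create_index_body))
```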
Q: 2) How much memory does ElasticSearch server need if it does percolation only?
Is it better to have 2 servers with 4GB RAM or one server with 16GB RAM?
A: Percolator indices reside entirely in memory, so the answer is: a lot. It is entirely dependent on the size of your index. In my experience, 200,000 searches would require a 50MB index. In memory, this index would occupy around 500MB of heap, so 4GB RAM should be enough if this is all you're running. I would suggest more nodes in your case. However, as the size of your index grows, you will need to add RAM.
Q: 3) Would having SSD meaningfully help percolator's performance, or it is better to increase RAM and/or number of nodes?
A: I doubt it. As I said before percolators reside in memory so disk performance isn't much of a bottleneck.
EDIT: Don't take my word on those memory estimates. Check out the site plugins on the main ES site. I found Big Desk particularly helpful for watching performance counters for scaling and planning purposes. This should give you more valuable info on estimating your specific requirements.
EDIT in response to comment from @DennisGorelik below:
I got those numbers purely from observation but on reflection they make sense.
200K Queries to 50MB on disk: This ratio means the average query occupies 250 bytes when serialized to disk.
50MB index to 500MB on heap: Rather than serialized objects on disk, we are dealing with in-memory Java objects. Think about deserializing XML (or any data format, really): you generally get roughly 10x larger in-memory objects.
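Those two ratios can be checked with back-of-the-envelope arithmetic (the 250-byte and 10x figures are the observational estimates above, not hard guarantees):

```python
queries = 200_000
bytes_per_query_on_disk = 250   # observed average size of a serialized query
inflation_factor = 10           # rough disk-to-heap blow-up for Java objects

disk_mb = queries * bytes_per_query_on_disk / 1e6   # decimal megabytes
heap_mb = disk_mb * inflation_factor
print(f"{disk_mb:.0f} MB on disk, ~{heap_mb:.0f} MB on heap")
```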

Related

High CPU usage on elasticsearch nodes

We have been using a 3-node Elasticsearch (v7.6) cluster running in Docker containers. I have been experiencing very high CPU usage on 2 nodes (97%) and moderate CPU load on the other node (55%). The hardware is m5.xlarge servers.
There are 5 indices with 6 shards and 1 replica each. Update operations take around 10 seconds, even for updating a single field; it is the same with deletes. Querying, however, is quite fast. Is this because of the high CPU load?
2 of the 5 indices continuously undergo update and write operations, as they consume from a Kafka stream. The indices are 15GB and 2GB in size; the rest are around 100MB.
You need to provide more information to find the root cause:
Are all the ES nodes running in different Docker containers on the same host, or on different hosts?
Do you have resource limits on your ES Docker containers?
What is the ES heap size, and is it 50% of the host machine's RAM?
Does the node with high CPU hold the 2 write-heavy indices you mentioned?
What is the refresh interval of the indices that receive heavy indexing?
What is the segment size of your 15GB index? Use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info.
What have you debugged so far, and is there any interesting info you want to share to help find the issue?
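On the refresh-interval point above: for write-heavy indices, raising refresh_interval from the default 1s reduces segment churn. A sketch of the settings-update body (the 30s value is an illustrative assumption):

```python
import json

# Body for a PUT /<index>/_settings request; 30s is just an example value.
settings_body = {"index": {"refresh_interval": "30s"}}
print(json.dumps(settings_body))
```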

Elasticsearch config tweaking with limited memory

I have following scenario:
A single machine with 32GB of RAM runs Elasticsearch 2.4; there is one index with 5 shards that is 25GB in size.
On that index we are constantly indexing new data, plus doing full-text search queries that check about 95% documents - no aggregations. The instance generates a lot of CPU load - there is no swapping.
My question is: how should I tweak elasticsearch memory usage? (I don't have an option to add another machine at this moment)
Should I assign more memory to the ES heap, like 25GB (going over the 50% of memory that the docs advise not to exceed), or should I assign a minimal heap like 1GB-2GB and assume Lucene will cache the whole index in memory, since it's all full-text searches?
Right now, 50% of server memory (so 16GB in this case) seems to work best for us.
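The usual sizing rule (50% of RAM, but staying below the compressed-oops threshold) can be written down as a helper; the 31 GB cutoff here is the commonly cited safe value, not an exact limit:

```python
def recommended_heap_gb(ram_gb: int) -> int:
    """Half of physical RAM, capped below the ~32 GB compressed-oops limit."""
    return min(ram_gb // 2, 31)

print(recommended_heap_gb(32))   # → 16
print(recommended_heap_gb(128))  # → 31
```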

Indexing multiple indexes in elastic search at the same time

I am using Logstash for ETL purposes and have 3 indexes in Elasticsearch. Can I insert documents into my 3 indexes through 3 different Logstash processes at the same time to improve parallelization, or should I insert documents into 1 index at a time?
My elastic search cluster configuration looks like:
3 data nodes
1 client node
3 data nodes - 64 GB RAM, SSD Disk
1 client node - 8 GB RAM
Shards - 20 Shards
Replica - 1
Thanks
As always it depends. The distribution concept of Elasticsearch is based on shards. Since the shards of an index live on different nodes, you are automatically spreading the load.
However, if Logstash is your bottleneck, you might gain performance from running multiple processes. Though whether running multiple LS processes on a single machine will make a positive impact is doubtful.
Short answer: Parallelising over 3 indexes won't make much sense, but if Logstash is your bottleneck, it might make sense to run those in parallel (on different machines).
PS: The biggest performance improvement generally is batching requests together, but Logstash does that by default.
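The batching point can be illustrated with the _bulk request format Logstash uses under the hood: action metadata and document source are paired on alternating NDJSON lines (the index name and document contents here are made up):

```python
import json

docs = [{"title": f"job {i}"} for i in range(3)]

# Build an NDJSON _bulk body: one action line + one source line per document.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "jobs"}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
print(bulk_body)
```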

ElasticSearch - Optimal number of Shards per node

I would appreciate if someone could suggest the optimal number of shards per ES node for optimal performance or provide any recommended way to arrive at the number of shards one should use, given the number of cores and memory foot print.
I'm late to the party, but I just wanted to point out a couple of things:
The optimal number of shards per index is always 1. However, that provides no possibility of horizontal scale.
The optimal number of shards per node is always 1. However, then you cannot scale horizontally more than your current number of nodes.
The main point is that shards have an inherent cost to both indexing and querying. Each shard is actually a separate Lucene index. When you run a query, Elasticsearch must run that query against each shard, and then compile the individual shard results together to come up with a final result to send back. The benefit to sharding is that the index can be distributed across the nodes in a cluster for higher availability. In other words, it's a trade-off.
Finally, it should be noted that any more than 1 shard per node will introduce I/O considerations. Since each shard must be indexed and queried individually, a node with 2 or more shards would require 2 or more separate I/O operations, which can't be run at the same time. If you have SSDs on your nodes then the actual cost of this can be reduced, since all the I/O happens much quicker. Still, it's something to be aware of.
That, then, begs the question of why would you want to have more than one shard per node? The answer to that is planned scalability. The number of shards in an index is fixed. The only way to add more shards later is to recreate the index and reindex all the data. Depending on the size of your index that may or may not be a big deal. At the time of writing, Stack Overflow's index is 203GB (see: https://stackexchange.com/performance). That's kind of a big deal to recreate all that data, so resharding would be a nightmare. If you have 3 nodes and a total of 6 shards, that means that you can scale out to up to 6 nodes at a later point easily without resharding.
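The headroom in that 3-node / 6-shard example can be sketched as a small helper (with replicas included, the node ceiling grows further, since replica copies can also occupy their own nodes):

```python
def max_nodes(primary_shards: int, replicas: int = 0) -> int:
    """Upper bound on nodes that can each hold at least one shard copy."""
    return primary_shards * (replicas + 1)

print(max_nodes(6))              # → 6 nodes with primaries only
print(max_nodes(6, replicas=1))  # → 12 counting replica shards
```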
There are three situations to consider before sharding:
Situation 1) You want to use Elasticsearch with failover and high availability. Then you go for sharding.
In this case, you need to select the number of shards according to the number of nodes (ES instances) you want to use in production.
Consider that you want 3 nodes in production. Then you should choose 1 primary shard and 2 replicas for every index. Don't choose more shards than you need.
Situation 2) Your current server can hold the current data, but due to dynamic data growth you may eventually run out of disk space, or your server may not be able to handle that much data. Then you need to configure more shards, such as 2 or 3 (it's up to your requirements), for each index, but there shouldn't be any replicas.
Situation 3) This is the combination of situations 1 and 2: your data grows dynamically and you also need high availability and failover. Then you configure an index with 2 shards and 1 replica, so you can share data among nodes and get optimal performance.
Note: the query will be processed in each shard, and a map-reduce is performed on the results from all shards to return the final result to us. This map-reduce process is expensive, so a minimal number of shards gives us optimal performance.
If you are using only one node in production, then one primary shard is the optimal number of shards for each index.
Hope it helps!
Just got back from configuring some log storage for 10 TB so let's talk sharding :D
Node limitations
Main source: The definitive guide to elasticsearch
HEAP: 32 GB at most:
If the heap is less than 32 GB, the JVM can use compressed pointers, which saves a lot of memory: 4 bytes per pointer instead of 8 bytes.
HEAP: 50% of the server memory at most. The rest is left to filesystem caches (thus 64 GB servers are a common sweet spot):
Lucene makes good use of the filesystem caches, which are managed by the kernel. Without enough filesystem cache space, performance will suffer. Furthermore, the more memory dedicated to the heap means less available for all your other fields using doc values.
[An index split in] N shards can spread the load over N servers:
1 shard can use all the processing power from 1 node (it's like an independent index). Operations on sharded indices are run concurrently on all shards and the result is aggregated.
Fewer shards are better (the ideal is 1 shard):
The overhead of sharding is significant. See this benchmark for numbers: https://blog.trifork.com/2014/01/07/elasticsearch-how-many-shards/
Fewer servers are better (the ideal is 1 server with 1 shard):
The load on an index can only be split across nodes by sharding (a shard is enough to use all resources on a node). More shards allow using more servers, but more servers bring more overhead for data aggregation... There is no free lunch.
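The "operations run concurrently on all shards and the result is aggregated" point above can be sketched: each shard returns its own top-k hits by score, and the coordinating node merges them. The shard results here are fabricated for illustration:

```python
import heapq

# Per-shard top-3 hits as (score, doc_id), each list already sorted descending.
shard_results = [
    [(9.1, "a"), (5.2, "b"), (1.0, "c")],
    [(8.7, "d"), (7.5, "e"), (0.4, "f")],
]

# Coordinating node: merge the sorted shard lists and keep the global top-3.
merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
top3 = list(merged)[:3]
print(top3)  # → [(9.1, 'a'), (8.7, 'd'), (7.5, 'e')]
```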
Configuration
Usage: A single big index
We put everything in a single big index and let Elasticsearch do all the hard work relating to sharding data. There is no logic whatsoever in the application, so it's easier to develop and maintain.
Let's suppose that we plan for the index to be at most 111 GB in the future and we've got 50 GB servers (25 GB heap) from our cloud provider.
That means we should have 5 shards.
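The arithmetic behind those 5 shards, as a sketch (the rule of thumb here is to keep each shard within what a single server's heap can serve comfortably):

```python
import math

index_size_gb = 111   # planned maximum index size
per_server_gb = 25    # heap per server (50 GB RAM, 50% of it as heap)

shards = math.ceil(index_size_gb / per_server_gb)
print(shards)  # → 5
```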
Note: Most people tend to overestimate their growth; try to be realistic. For instance, this 111GB example is already a BIG index. For comparison, the Stack Overflow index is 430 GB (2016) and it's a top-50 site worldwide, made entirely of text written by millions of people.
Usage: Index by time
When there's too much data for a single index, or it's getting too annoying to manage, the next step is to split the index by time period.
The most extreme example is logging applications (Logstash and Graylog), which use a new index every day.
The ideal configuration of 1-single-shard-per-index makes perfect sense in this scenario. The index rotation period can be adjusted, if necessary, to keep the index smaller than the heap.
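A minimal sketch of the daily index naming used by the logging tools mentioned above (the logs- prefix and the date format are illustrative assumptions):

```python
from datetime import date

def daily_index(day: date, prefix: str = "logs-") -> str:
    """Logstash-style daily index name, e.g. logs-2015.03.04."""
    return prefix + day.strftime("%Y.%m.%d")

print(daily_index(date(2015, 3, 4)))  # → logs-2015.03.04
```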
Special case: Let's imagine a popular internet forum with monthly indices. 99% of requests are hitting the last index. We have to set multiple shards (e.g. 3) to spread the load over multiple nodes. (Note: It's probably unnecessary optimization. A 99% hitrate is unlikely in the real world and the shard replica could distribute part of the read-only load anyway).
Usage: Going Exascale (just for the record)
ElasticSearch is magic. It's the easiest database to set up in a cluster, and it's one of the very few able to scale to many nodes (excluding Spanner).
It's possible to go exascale with hundreds of ElasticSearch nodes. There must be many indices and shards to spread the load over that many machines, and that takes an appropriate sharding configuration (possibly adjusted per index).
The final bit of magic is to tune elasticsearch routing to target specific nodes for specific operations.
It might also be a good idea to have more than one primary shard per node, depending on the use case. I found that bulk indexing was pretty slow and only one CPU core was used, so we had idle CPU power and very low I/O; hardware was definitely not the bottleneck. Thread pool stats showed that during indexing, only one bulk thread was active. We have a lot of analyzers and a complex tokenizer (decomposed analysis of German words). Increasing the number of shards per node resulted in more bulk threads being active (one per shard on the node), and it dramatically improved indexing speed.
The number of primary shards and replicas depends on the following parameters:
Number of data nodes: The replica shards for a given primary shard are meant to be on different data nodes. If there are 3 data nodes, DN1, DN2, and DN3, and the primary shard is on DN1, then the replica shards should be on DN2 and/or DN3. Hence the number of replicas should be less than the total number of data nodes.
Capacity of each data node: The size of a shard cannot exceed the size of the data node's hard disk, so the number of primary shards should be defined based on the expected size of the given index.
Recovery mechanism in case of failure: If the data in the given index has a quick recovery mechanism, then 1 replica should be enough.
Performance requirements of the given index: Sharding helps direct the client node to the appropriate shard to improve performance, so the query parameters and the size of the data belonging to those parameters should be considered when defining the number of primary shards.
These are the ideal and basic guidelines to follow; they should be optimized depending on the actual use cases.
I have not tested this yet, but AWS has a good article about ES best practices. Look at the Choosing Instance Types and Testing parts.
Elastic.co recommends to:
[…] keep the number of shards per node below 20 per GB heap it has configured
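That guideline as arithmetic (Elastic's recommended ceiling, not a target to aim for):

```python
def max_shards_per_node(heap_gb: int) -> int:
    """Elastic's guideline: stay below 20 shards per GB of configured heap."""
    return 20 * heap_gb

print(max_shards_per_node(30))  # → 600
```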

Do multiple Solr shards on a single machine improve performance?

Does running multiple Solr shards on a single machine improve performance? I would expect Lucene to be multi-threaded, but it doesn't seem to be using more than a single core on my server with 16 physical cores. I realize this is workload dependent, but any statistics or benchmarks would be very useful!
I ran some benchmarks of our search stack, and found that adding more Solr shards (on a single machine, with 16 physical cores) did improve performance up to about 8 shards (where I got a 6.5x speed up). This is on an index with ~1.5million documents, running complex range queries.
So, it seems that Solr doesn't take advantage of multiple physical cores, when running queries against a single index.
If you currently have a single box with a single shard, then splitting this shard into several shards:
is likely to worsen throughput,
may improve latency, by parallelizing query execution.
I can't provide you with statistics or benchmarks because it depends on whether query execution is CPU- or I/O-bound: if query execution is already I/O-bound on a single box, then splitting the shard into several shards will actually worsen throughput. You will need to test yourself; just take a production log and try to replay it in both scenarios.
