Resource usage with rolling indices in Elasticsearch - elasticsearch

My question is mostly based on the following article:
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
The article advises against having multiple shards per node for two reasons:
Each shard is essentially a Lucene index, it consumes file handles, memory, and CPU resources
Each search request will touch a copy of every shard in the index. Contention arises and performance decreases when the shards are competing for the same hardware resources
The article advocates the use of rolling indices for indices that see many writes and fewer reads.
Questions:
Do the problems of resource consumption by Lucene indices arise if the old indices are left open?
Do the problems of contention arise when searching over a large time range involving many indices and hence many shards?
How does searching many small indices compare to searching one large one?
I should mention that in our particular case, there is only one ES node though of course generally applicable answers will be more useful to SO readers.

It's very difficult to spit out general best practices and guidelines when it comes to cluster sizing as it depends on so many factors. If you ask five ES experts, you'll get ten different answers.
After several years of tinkering and fiddling around ES, I've found out that what works best for me is always to start small (one node, how many indices your app needs and one shard per index), load a representative data set (ideally your full data set) and load test to death. Your load testing scenarii should represent the real maximum load you're experiencing (or expecting) in your production environment during peak hours.
Increase the capacity of your cluster (add shard, add nodes, tune knobs, etc) until your load test pass and make sure to increase your capacity by a few more percent in order to allow for future growth. You don't want your production to be fine now, you want it to be fine in a year from now. Of course, it will depend on how fast your data will grow and it's very unlikely that you can predict with 100% certainty what will happen in a year from now. For that reason, as soon as my load test pass, if I expect a large exponential data growth, I usually increase the capacity by 50% more percent, knowing that I will have to revisit my cluster topology within a few month or a year.
So to answer your questions:
Yes, if old indices are left open, they will consume resources.
Yes, the more indices you search, the more resources you will need in order to go through every shard of every index. Be careful with aliases spanning many, many rolling indices (especially on a single node)
This is too broad to answer, as it again depends on the amount of data we're talking about and on what kind of query you're sending, whether it uses aggregation, sorting and/or scripting, etc

Do the problems of resource consumption by Lucene indices arise if the old indices are left open?
Yes.
Do the problems of contention arise when searching over a large time range involving many indices and hence many shards?
Yes.
How does searching many small indices compare to searching one large one?
When ES searches an index it will pick up one copy of each shard (be it replica or primary) and asks that copy to run the query on its own set of data. Searching a shard will use one thread from the search threadpool the node has (the threadpool is per node). One thread basically means one CPU core. If your node has 8 cores then at any given time the node can search concurrently 8 shards.
Imagine you have 100 shards on that node and your query will want to search all of them. ES will initiate the search and all 100 shards will compete for the 8 cores so some shards will have to wait some amount of time (microseconds, milliseconds etc) to get their share of those 8 cores. Having many shards means less documents on each and, thus, potentially a faster response time from each. But then the node that initiated the request needs to gather all the shards' responses and aggregate the final result. So, the response will be ready when the slowest shard finally responds with its set of results.
On the other hand, if you have a big index with very few shards, there is not so much contention for those CPU cores. But the shards having a lot of work to do individually, it can take more time to return back the individual result.
When choosing the number of shards many aspects need to be considered. But, for some rough guidelines yes, 30GB per shard is a good limit. But this won't work for everyone and for every use case and the article fails to mention that. If, for example, your index is using parent/child relationships those 30GB per shard might be too much and the response time of a single shard can be too slow.
You took this out of the context: "The article advises against having multiple shards per node". No, the article advises one to think about the aspects of structuring the indices shards before hand. One important step here is the testing one. Please, test your data before deciding how many shards you need.
You mentioned in the post "rolling indices", and I assume time-based indices. In this case, one question is about the retention period (for how long you need the data). Based on the answer to this question you can determine how many indices you'll have. Knowing how many indices you'll have gives you the total number of shards you'll have.
Also, with rolling indices, you need to take care of deleting the expired indices. Have a look at Curator for this.

Related

Elasticsearch total shard count impact on an index search speed

I know that the shard count and size has a significant impact on the search performance (speed) and cluster recovery.
Does the total number of shard count impact the search speed? Let me simplify it assume I have 5 indices with 5 primary shards each and I am searching in indice1 only and assume it return me the response in 500ms. Will this be same (500ms) if I add 5 more indices? I know the recovery time would increase but not sure about a specific indice search performance.
Any help would be highly appricated.
Common sense would imply that searching on more data takes longer, however,
it's impossible to answer without also knowing:
the number of nodes (more nodes can parallelize searches on several shards)
their hardware specs (ram and cpu play a role in how many concurrent searches can happen on a single node)
if any write operations also happen at the same time (taking resources away from search threads)
etc...
The best you can do is to actually create a test case (using e.g. Rally) and test this on your own infrastructure.

Elasticsearch - Sharding and Performance

I think I've finally gotten a grasp of the fundamental understanding of how to allocate shards for Elasticsearch. Please correct me if I'm wrong, this is what I've pieced together:
Ideally, there should only exist one shard per index, per node.
The only reason why we would ever want to configure more than
one shard IS to over-allocate for future growth (i.e. adding more
nodes to physically support the data).
Now, assuming what I have above is correct, I then wonder if there are any performance issues or differences if I only had one node with 1 shard versus one node with 5 shards. Can anyone enlighten me on this subject?
"The only reason why we would ever want to configure more than one shard IS to over-allocate for future growth (i.e. adding more nodes to physically support the data)."
Not necessarily so. Having more shards helps parallelise your queries and helps them finish faster, but after a bit it can be counterproductive as too many shards will mean overheads in merging the individual shard responses and time spent in queuing and such things.
"one node with 1 shard versus one node with 5 shards"
It depends on what your use case is but you should see some performance gain for bigger queries, with 5 shards.
I believe it depends on the size of the shards. For instance, on the elastic website, they say the following:
"Querying lots of small shards will make the processing per shard
faster, but as many more tasks need to be queued up and processed in
sequence, it is not necessarily going to be faster than querying a
smaller number of larger shards. Having lots of small shards can also
reduce the query throughput if there are multiple concurrent queries."
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
In practice I have found that using some exploratory testing with realistic queries helps me determine more definitively how I should move forward with my architecture. It really depends on the use case. As was stated previously however, there comes a point where you can sort of "over optimize" and it ends up cancelling out any noticible gains you may have otherwise obtained by doing the opposite solution.
To be succinct, one shard per index, per node is a fine practice. But if you find yourself needing more, then just assess your use case first and determine if additional shards are truly necessary.

Elasticsearch sharding basics with respect to query speed

Pardon my ignorance if this question is too broad or vague. I'm playing with Elasticsearch with out-of-the-box settings on my laptop and it works just great.
It's a 8 core Macbook and 6G of heap is given to Elasticsearch and it works pretty well for a large dataset (just over 7 Million documents).
I'm keen to set up a multinode cluster (2 machines) and before I assume a few things, I would like to get expert views on a few key points.
I understand "How many shards per node" is a very subjective question and one answer will not fit all situations.
I understand that sharding helps to distribute the indices to multiple nodes so that the storage footprint is optimal per node.
But mainly, I'd like to understand how the sharding effects on the query speed & effective CPU cores utilization.
When a single search query comes in, does ES fire internal subqueries to all the shards in parallel, and therefore it can keep all the cores busy (if the no of shards equals no of cores)?
Can I also be pointed to a few useful links that will help me? Thanks.
Your understanding is pretty much spot on.
The basic concept to understand is that one query on a single shard will use one thread. One thread boils down to one core CPU. If the query needs to touch multiple shards, then ES will make sure the shards involved are queried. This means each shard will do its part of the job using one thread.
The size of the shard and the complexity of the query translates to how much time is being spent in that thread. But the OS will not give one CPU core to that thread all the time, the OS it's scheduling jobs and other processes get a slice of the CPU core.
Ideally, yes, you would have number of shards = number of cores, but rarely clusters out there use this setup. Mainly those clusters that have a lot of concurrent requests per seconds and they demand a strict response time.
Thanks for the response.
Just a summary of my understanding to get it validated.
No of shards == No of cores
(-)
Bigger shards.
A thread could take more time to search a single shard
therefore other threads could be queued up and made to wait?
(+)
Optimal core utilization.
Less chances of context switching overhead as the no of threads are limited.
No of shards > No of cores
(-)
More threads will be spawned for queries and context switching overhead may apply.
More threads perhaps need more memory for thread stack etc.
More shards could potentially need more housekeeping (ie managing file handles etc) by elasticsearch.
(+)
A thread could take less time(relatively) to search a single shard
Could process more concurrent requests (as the no of threads are more).
Eventually, it depends on which one of these gives a best balance between:
Available Hardware, Query Speed and Concurrency factor and I think it requires quite a bit of experimenting. Or in otherwords, which hurts little.

Elasticsearch shard allocation for small indices

I have an elasticsearch setup with 192 active indices ranging from a few hundred mb to possibly 5gb each. I read that for a logstash use case with 1gb indices you should only use 1 shard. The difference with my setup is that I will be having more users (estimate of up to 100) expecting a quick response time. I intend to have 1 replica for reliability.
Will having 1 shard per index still be appropriate for my use case?
In a word: yes.
The need to create multiple primary shards derives from the need to isolate documents, extreme counts (e.g., when you're in the billions of documents volume), or to improve write throughput (write documents across more places, thereby reducing individual burden).
In practice, you want to shard based on your use case, unless you're one of those first two scenarios (isolation or extreme counts).
Are you read heavy?
Are you write heavy? (Less common, but it does happen)
If you're read heavy, as most use cases are, then having fewer shards will help you by limiting the request size (fewer places to look). Given that your shard sizes are also relatively small (I'd consider anything under 5 GB to be relatively small), you can easily get away with having a single primary shard and it should benefit your search performance by doing so.
Indexes that share the same mappings, but are also tiny ("few hundred MBs"), should likely be combined if you search across them. If they're independent, then it really makes no difference and the isolation sounds like good practice at the expense of slightly bloating your cluster state (with each index).
Have a look at this blog: https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index. He has a lot of good pointers to sharding and shard sizing.
However, the question you really should be asking yourself is: How easy is it to change? When it comes to sizing and scalability, the answer often is "it depends" - and the real question is: How quickly can you reconfigure?
This could e.g. mean that you design you application in a way, that allows quick re-spooling of data into a new index, that you use aliases so that you can in fact change these things, where your data lies (not just in Elastic, I hope) etc.
By building a system - from the start - so that you can quickly rebuild indicies enables you to experiment with sizes - and more importantly - change them as your need changes.

Performance issues when pushing data at a constant rate to Elasticsearch on multiple indexes at the same time

We are experiencing some performance issues or anomalies on a elasticsearch specifically on a system we are currently building.
The requirements:
We need to capture data for multiple of our customers, who will query and report on them on a near real time basis. All the documents received are the same format with the same properties and are in a flat structure (all fields are of primary type and no nested objects). We want to keep each customer’s information separate from each other.
Frequency of data received and queried:
We receive data for each customer at a fluctuating rate of 200 to 700 documents per second – with the peak being in the middle of the day.
Queries will be mostly aggregations over around 12 million documents per customer – histogram/percentiles to show patterns over time and the occasional raw document retrieval to find out what happened a particular point in time. We are aiming to serve 50 to 100 customer at varying rates of documents inserted – the smallest one could be 20 docs/sec to the largest one peaking at 1000 docs/sec for some minutes.
How are we storing the data:
Each customer has one index per day. For example, if we have 5 customers, there will be a total of 35 indexes for the whole week. The reason we break it per day is because it is mostly the latest two that get queried with occasionally the remaining others. We also do it that way so we can delete older indexes independently of customers (some may want to keep 7 days, some 14 days’ worth of data)
How we are inserting:
We are sending data in batches of 10 to 2000 – every second. One document is around 900bytes raw.
Environment
AWS C3-Large – 3 nodes
All indexes are created with 10 shards with 2 replica for the test purposes
Both Elasticsearch 1.3.2 and 1.4.1
What we have noticed:
If I push data to one index only, Response time starts at 80 to 100ms for each batch inserted when the rate of insert is around 100 documents per second. I ramp it up and I can reach 1600 before the rate of insert goes to close to 1sec per batch and when I increase it to close to 1700, it will hit a wall at some point because of concurrent insertions and the time will spiral to 4 or 5 seconds. Saying that, if I reduce the rate of inserts, Elasticsearch recovers nicely. CPU usage increases as rate increases.
If I push to 2 indexes concurrently, I can reach a total of 1100 and CPU goes up to 93% around 900 documents per second.
If I push to 3 indexes concurrently, I can reach a total of 150 and CPU goes up to 95 to 97%. I tried it many times. The interesting thing is that response time is around 109ms at the time. I can increase the load to 900 and response time will still be around 400 to 600 but CPU stays up.
Question:
Looking at our requirements and findings above, is the design convenient for what’s asked? Are there any tests that I can do to find out more? Is there any setting that I need to check (and change)?
I've been hosting thousands of Elasticsearch clusters on AWS over at https://bonsai.io for the last few years, and have had many a capacity planning conversation that sound like this.
First off, it sounds to me like you have a pretty good cluster design and test rig going here. My first intuition here is that you are legitimately approaching the limits of your c3.large instances, and will want to bump up to a c3.xlarge (or bigger) fairly soon.
An index per tenant per day could be reasonable, if you have relatively few tenants. You may consider an index per day for all tenants, using filters to focus your searches on specific tenants. And unless there are obvious cost savings to discarding old data, then filters should suffice to enforce data retention windows as well.
The primary benefit of segmenting your indices per tenant would be to move your tenants between different Elasticsearch clusters. This could help if you have some tenants with wildly larger usage than others. Or to reduce the potential for Elasticsearch's cluster state management to be a single point of failure for all tenants.
A few other things to keep in mind that may help explain the performance variance you're seeing.
Most importantly here, indexing is incredibly CPU bottlenecked. This makes sense, because Elasticsearch and Lucene are fundamentally just really fancy string parsers, and you're sending piles of strings. (Piles are a legitimate unit of measurement here, right?) Your primary bottleneck is going to be the number and speed of your CPU cores.
In order to take the best advantage of your CPU resources while indexing, you should consider the number of primary shards you're using. I'd recommend starting with three primary shards to distribute the CPU load evenly across the three nodes in your cluster.
For production, you'll almost certainly end up on larger servers. The goal is for your total CPU load for your peak indexing requirements ends up under 50%, so you have some additional overhead for processing your searches. Aggregations are also fairly CPU hungry. The extra performance overhead is also helpful for gracefully handling any other unforeseen circumstances.
You mention pushing to multiple indices concurrently. I would avoid concurrency when bulk updating into Elasticsearch, in favor of batch updating with the Bulk API. You can bulk load documents for multiple indices with the cluster-level /_bulk endpoint. Let Elasticsearch manage the concurrency internally without adding to the overhead of parsing more HTTP connections.
That's just a quick introduction to the subject of performance benchmarking. The Elasticsearch docs have a good article on Hardware which may also help you plan your cluster size.

Resources