In a setup with 2 replicas per shard and without insert_quorum enabled, if one replica goes down, it seems the other available replica can keep accepting inserts. But is there anything that can block inserts later? For example, once there are X unreplicated inserts for a given shard, does it stop accepting new inserts?
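If the goal is the opposite behavior, i.e. making inserts block or fail when replication cannot keep up, the relevant ClickHouse mechanism is quorum inserts rather than an unreplicated-insert threshold. A minimal sketch, assuming a hypothetical replicated table `logs` on a 2-replica shard and a local clickhouse-client; the table and column names are placeholders:

```shell
# Require the insert to be acknowledged by 2 replicas before returning.
# With one replica down, this INSERT fails after insert_quorum_timeout
# instead of silently accumulating unreplicated parts on the live replica.
clickhouse-client \
  --insert_quorum=2 \
  --insert_quorum_timeout=30000 \
  --query "INSERT INTO logs (ts, message) VALUES (now(), 'hello')"
```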
Related
I understand that replica shards are used for two main purposes in Elasticsearch:
Providing high availability (i.e., backup)
Improving throughput by running search queries in parallel on multi-core CPUs
Elasticsearch does not allow a replica shard on the same node that holds the primary shard; the rationale is that replicas are used for backup, which would be meaningless if they were stored on the same node as the primary. I get that.
But in my case, I have a cluster with a single node and would like to add a replica to improve throughput. I don't mind that I still have a single point of failure (the original data is stored somewhere else), and I only have a single machine to work with. Why can't I add replica shards for performance reasons only, disregarding the backup aspect?
Elasticsearch can execute concurrent requests on a shard. See this thread:
the processing of a query is single threaded against each shard. Multiple queries can however be run concurrently against the same shard, so assuming you have more than one concurrent query, you can still use multiple cores.
So adding a replica on the same node would just consume disk space. The throughput gain from replicas comes from distributing the data across multiple nodes, allowing the CPUs of all those nodes to be used to process your queries.
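Since an unassignable replica only leaves a single-node cluster in yellow health, the usual workaround is to drop replicas entirely. A sketch, assuming a hypothetical index `my-index` and a local cluster on port 9200:

```shell
# Remove replica shards on a single-node cluster; they could never be
# allocated anyway and would only keep the cluster health at yellow.
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'
```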
We operate an Elasticsearch stack which uses 3 nodes to store log data. Our current config is indices with 3 primaries and 1 replica. (We have just eyeballed this config and are happy with the performance, so we decided not to spend time on optimization yet.)
After a node outage (let's assume a full disk), I have observed that elasticsearch automatically redistributes its shards to the remaining instances - as advertised.
However, this increases disk usage on the remaining two instances, making them candidates for a cascading failure.
Durability of the log data is not paramount. I am therefore thinking about reconfiguring Elasticsearch not to create new replicas after a node outage and instead run on the primaries only. This means that after a single node outage we would run without redundancy, but that seems better than a cascading failure. (This is a one-time cost.)
An alternative would be to just increase disk size. (This is an ongoing cost.)
My question
(How) can I configure Elasticsearch to not create new replicas after the first node has failed? Or is this considered a bad idea, and the canonical way is to just increase disk capacity?
Rebalancing is expensive
When a node leaves the cluster, some additional load is generated on the remaining nodes:
Promoting a replica shard to primary to replace any primaries that were on the node.
Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
Rebalancing shards evenly across the remaining nodes.
This can lead to quite some data being moved around.
Sometimes a node is only missing for a short period of time, and a full rebalance is not justified in such a case. To account for that, when a node goes down, Elasticsearch immediately promotes a replica shard to primary for each primary that was on the missing node, but then waits for one minute before creating new replicas, to avoid unnecessary copying.
Only rebalance when required
The duration of this delay is a tradeoff and can therefore be configured. Waiting longer reduces the chance of useless copying but lengthens the window of reduced redundancy.
Increasing the delay to a few hours results in what I am looking for. It gives our engineers some time to react, before a cascading failure can be created from the additional rebalancing load.
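The delay is controlled by the `index.unassigned.node_left.delayed_timeout` index setting (default 1m). A sketch of raising it for all indices, assuming a local cluster; the 6h value is an example, not a recommendation:

```shell
# Wait 6 hours before re-creating replicas that were on a departed node,
# giving operators time to react before rebalancing load kicks in.
# Replica promotion still happens immediately; only new copies are delayed.
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.unassigned.node_left.delayed_timeout": "6h"}}'
```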
I learned that from the official elasticsearch documentation.
I have designed a Solr/Elasticsearch setup for searching, and I have a particular question. Suppose I receive 10K search requests/second. Will my searches be served by the primary shards or the replicas? I know a replica is a backup of a shard.
If searches go to the primary shards, how/why? And if they go to the replicas, how/why?
The primary shard holds the original copy of the data, while a replica shard is a copy of it.
Indexing always happens on the original copy, i.e. the primary shard, and is then copied to the replica shards, but a search can be served by any copy, original or not.
Hence replicas are created not only for fault tolerance (if you lose one copy, it can be recovered from another), but also to improve search performance: if one copy (primary or replica) is overloaded, the search is routed to the least loaded copy, i.e. another replica.
Please refer to Adaptive replica selection in ES for how/why replicas improve search latency.
Feel free to let me know if you need more information.
EDIT based on OP comment:
From ES 7, adaptive replica selection is on by default, so a search is sent to the least loaded replica. Even if all shards are underutilized, ES still won't send all search requests to the primary shards, to avoid overloading them. Also, before ARS (adaptive replica selection), ES sent search requests to the copies in round-robin fashion to avoid overloading any one shard.
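ARS can be toggled via the `cluster.routing.use_adaptive_replica_selection` cluster setting. A sketch, assuming a local cluster, of switching back to round-robin behavior (e.g. to compare latencies):

```shell
# Disable adaptive replica selection; search requests then fall back to
# round-robin across the eligible shard copies.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.use_adaptive_replica_selection": false}}'
```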
In ES we know that when a document is submitted for indexing, the data first goes to the primary shard and is then indexed on the replica shards. Is the write to the replica immediate, or is there latency? If there is latency, is there any parameter to measure it? And if the write to the replica fails for some reason, what happens to the data on the primary? Will the data be replicated from the primary to the replica again?
Thanks
At index time, a replica shard does the same amount of work as the primary shard. New documents are first indexed on the primary and then on any replicas; the write to a replica does not start until the data has been fully written on the primary.
If the write to a replica shard fails, you don't have to worry about the primary, and yes, the data will be replicated again once the replica comes back.
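There is no direct parameter that reports per-document replication latency, but write visibility can be controlled with the `wait_for_active_shards` parameter on index requests. A sketch, assuming a hypothetical index `my-index` configured with one replica:

```shell
# Require both the primary and its replica to be active before the
# indexing operation is attempted; otherwise ES waits up to the timeout
# and then rejects the request. Note this is a pre-flight check on shard
# availability, not a per-document replication acknowledgement.
curl -X PUT "localhost:9200/my-index/_doc/1?wait_for_active_shards=2&timeout=30s" \
  -H 'Content-Type: application/json' \
  -d '{"message": "hello"}'
```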
I have manually allocated 3 primary shards to a particular node in Elasticsearch. The replicas of these shards reside on different nodes. Now, let's say primary shard number 2 goes down (for example, due to an overflow of data) without the node it resides on going down. Is it possible to retrieve the data on that particular shard after I manually re-allocate it to a different node? If yes, how?
Yes.
Once the node with primary shard number 2 goes down, the replica shard on the other node will be promoted to primary, allowing you to retrieve the data. See here:
Coping with failure (ES Definitive Guide)
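To verify which copy of each shard is primary, where it lives, and why an unassigned copy is unassigned, two read-only APIs help. A sketch, assuming a local cluster; the index name and shard number in the second call are placeholders:

```shell
# List every shard with its index, shard number, primary/replica role,
# state, and the node it is allocated to.
curl "localhost:9200/_cat/shards?v"

# Explain the allocation decision for one specific shard copy.
curl -X GET "localhost:9200/_cluster/allocation/explain" \
  -H 'Content-Type: application/json' \
  -d '{"index": "my-index", "shard": 2, "primary": true}'
```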