Elasticsearch indices recovery - elasticsearch

I'm learning how Elasticsearch (version 5.3.0) works so that I can start using it. I've read the documentation, the Elasticsearch Reference and some ES blog posts, but I couldn't find out how index (shard?) recovery works.
Let's assume node A goes down and then becomes active again. If the cluster kept running and some documents were indexed in the meantime, how are those changes synchronized with node A? Does ES replace all the files, or is there a mechanism that sends only the changes to that node?
References and documentation are welcome.
Thank you in advance for your responses.

Currently, Elasticsearch diffs the segments (files) in the primary shard against the ones in the replica shard. Whatever differs is copied over from the primary.
In the future (ES 6), though, there will be sequence IDs: https://github.com/elastic/elasticsearch/issues/10708
The advantage of these is that ES will first compare the sequence IDs of the primary and the replica to see how "far" apart they are. If the translog of the primary shard still contains all the changes made since the replica went offline, ES will simply replay those operations from the primary's translog on the replica shard. If some of those operations are no longer there, it will fall back to segment diffing (the current approach).
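The decision described above can be sketched roughly as follows. This is only an illustration of the idea, not Elasticsearch's actual code: `Translog`, `plan_recovery`, and the dictionaries of segment names and checksums are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Translog:
    """Hypothetical stand-in for the primary's translog: seq_no -> operation."""
    ops: dict

    def has_all_since(self, seq_no: int, max_seq_no: int) -> bool:
        # True if every operation after seq_no, up to max_seq_no, is still retained.
        return all(n in self.ops for n in range(seq_no + 1, max_seq_no + 1))


def plan_recovery(replica_seq_no, primary_max_seq_no, translog,
                  primary_segments, replica_segments):
    """Choose between replaying operations and copying segment files.

    primary_segments / replica_segments map file name -> checksum (assumed layout).
    """
    if translog.has_all_since(replica_seq_no, primary_max_seq_no):
        # Cheap path: replay only the operations the replica missed.
        wanted = range(replica_seq_no + 1, primary_max_seq_no + 1)
        return ("replay_ops", [translog.ops[n] for n in wanted])
    # Fallback: copy every segment file that is missing or different on the replica.
    differing = sorted(name for name, checksum in primary_segments.items()
                       if replica_segments.get(name) != checksum)
    return ("copy_segments", differing)
```

With a translog that still holds operations 3 and 4, a replica that stopped at sequence number 2 only needs those two operations replayed; if operation 3 has already been trimmed, the sketch falls back to comparing files.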

Related

Elasticsearch - How node detects shard failure

I have basic knowledge of Elasticsearch. I came across the following passage, from https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html:
In the case that the primary itself fails, the node hosting the primary will send a message to the master about it. The indexing operation will wait (up to 1 minute, by default) for the master to promote one of the replicas to be a new primary.
The question: how does the node hosting the shard know about the failure of the shard? As I understand it, a shard is a Lucene instance that runs on a data node.
Most likely (with some improvements since Elasticsearch version 1.4), this is detected via checksums: if any segment file within the shard has an incorrect checksum, the shard is marked as corrupt.
This may happen during recovery (after the node starts up) or whenever an I/O operation is performed on the segment (i.e. when it is read by a search or by the merge policy).
This page for 7.8 (select the version you use for accurate docs) describes how to discard corrupt data; if the data is important, the best way is to restore from a snapshot:
https://www.elastic.co/guide/en/elasticsearch/reference/7.8/shard-tool.html#_description_7
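As a rough illustration of the checksum idea mentioned above, here is a minimal sketch that appends a CRC32 footer to a blob and verifies it on read. The file layout (payload followed by a 4-byte big-endian CRC32) and the function names are assumptions made for the example; the real Lucene file format is more involved.

```python
import struct
import zlib


def write_with_footer(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 of the payload (simplified, assumed layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def is_corrupt(blob: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored footer."""
    if len(blob) < 4:
        return True  # too short to even contain a footer
    payload, footer = blob[:-4], blob[-4:]
    (stored,) = struct.unpack(">I", footer)
    return zlib.crc32(payload) != stored
```

A node that finds `is_corrupt(...)` to be true for any file of a shard would, in this simplified picture, mark the shard as corrupt and report the failure to the master.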
I guess you are getting confused by this statement:
"How does the node hosting the shard know about the failure of the shard? As I understand it, a shard is a Lucene instance that runs on a data node."
While it's true that every shard is a Lucene instance (index), it's not a 1:1 mapping: one Elasticsearch data node can host multiple shards, not just one, and the failure of a Lucene shard doesn't always mean the failure of the data node.
The node holding the primary shard knows whether it is connected to the network, whether it is able to index the data, and whether the shard is corrupted, as mentioned by #julian. It can then send that information to the master node, which promotes another replica to primary; this is recorded in the cluster state, which all nodes hold.
In the case of a network failure, all the primary shards hosted on that node are replaced by other copies. This is easy to detect, as the master will not receive a heartbeat from that data node.
Hope this is what you were looking for; otherwise feel free to comment and I will try to explain further.
It's confusing at first sight, but if you look deeper it is still a valid scenario, and it is the one the documentation describes at a high level.
Let's say a coordinating node receives a request to index data. The master node maintains the list of in-sync shard copies. The coordinating node then forwards the request to the node that holds the primary shard. As you mentioned, a shard is a Lucene index. The node that receives the request has to index the data into the primary shard. If that is not possible, for example because part of the shard is corrupted, it informs the master so that another copy can be promoted to primary.
The master also keeps track of each shard and, when needed, tells another node to promote its copy to primary or demotes a shard from primary.
Elasticsearch maintains a list of shard copies that should receive the operation. This list is called the in-sync copies and is maintained by the master node
Once the replication group has been determined, the operation is forwarded internally to the current primary shard of the group

solr/elasticsearch search request is handled by shards or replica?

I have set up Solr/Elasticsearch for searching and I have a specific question. Suppose I receive 10K search requests per second. Where will the searches run: on the shards or on the replicas? I know a replica is a backup of a shard.
If they run on the shards, how and why? And if on the replicas, how and why?
A primary shard is the original copy of the data, while a replica shard is a copy of that original.
Indexing always happens on the original copy first, i.e. the primary shard, and is then copied to the replica shards, but a search can run on any copy, whether original or replica.
Hence replicas are created not only for fault tolerance (if you lose one copy, it can be recovered from another) but also to improve search performance: if one shard copy is overloaded (primary or replica), the search runs on a less loaded copy, i.e. another replica.
Please refer to adaptive replica selection in ES for how and why replicas improve search latency.
Feel free to let me know if you need more information.
EDIT based on OP comment:
From ES 7, adaptive replica selection is on by default, so a search is sent to the least loaded copy. Even if all copies are underutilized, ES does not send every search request to the primary shards, to avoid overloading them. Before ARS (adaptive replica selection), ES distributed search requests in round-robin fashion to avoid overloading any one shard.
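To make the difference between the two strategies concrete, here is a small sketch. It is deliberately simplified: real adaptive replica selection ranks copies using response time, service time, and queue size, whereas this example assumes a single hypothetical `queue_size` number per copy.

```python
from itertools import cycle


def round_robin(copies):
    """Pre-ARS behaviour (simplified): hand out shard copies in turn."""
    return cycle(copies)


def least_loaded(copies, queue_size):
    """ARS-like choice (simplified): pick the copy with the smallest queue.

    queue_size is a hypothetical mapping of copy name -> pending searches.
    """
    return min(copies, key=lambda copy: queue_size[copy])


copies = ["primary", "replica-1", "replica-2"]
rr = round_robin(copies)
first_four = [next(rr) for _ in range(4)]
choice = least_loaded(copies, {"primary": 5, "replica-1": 1, "replica-2": 3})
```

In both cases the primary is just one of the candidates for a search; it gets no special treatment on the read path.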

Do all shards (within index) have the same content?

Do all shards (within index) have the same content?
If yes, more shards = longer propagation (save) time?
If no, when one of shards failed = data is incomplete when merging?
First, you need to understand what sharding is and why it's important in distributed systems like Elasticsearch. There are several good resources on shards that are worth reading.
Now, coming to your question:
Do all shards (within index) have the same content?
The answer is no (assuming you are referring to primary shards; a replica shard is, of course, just a copy of its primary shard). Let's take an example.
Your index contains around 100 million docs and you have a cluster of 10 data nodes. You want to scale your index horizontally, so you start with 10 primary shards and 1 replica of each. In this case, Elasticsearch physically divides your data into 10 primary shards, and each primary shard is placed on a different node of the cluster, since there are 10 data nodes. Similarly, each primary shard has a copy, called a replica, which is placed on a different node from its primary.
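The way documents end up in different primary shards can be sketched like this. Elasticsearch's documented routing rule is essentially `hash(routing) % number_of_primary_shards`, with the document ID as the default routing value; the sketch below uses CRC32 as a stand-in hash (Elasticsearch itself uses Murmur3), so the shard numbers it produces will not match a real cluster.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number.

    CRC32 is an assumption made for this example, not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Distribute some example documents over 10 primary shards.
placement = {}
for i in range(1000):
    doc_id = f"doc-{i}"
    placement.setdefault(shard_for(doc_id, 10), []).append(doc_id)
```

Each document lives in exactly one primary shard, which is why the primary shards of an index hold different content, and why the number of primary shards cannot simply be changed without reindexing.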
Now coming to your follow-up question.
If yes, more shards = longer propagation (save) time? If no, when one
of shards failed = data is incomplete when merging?
Since Elasticsearch doesn't store the same data in all the primary shards, the assumption that more shards mean a longer propagation or save time does not hold. And when one of the shards fails, Elasticsearch recovers its data from its replica shard, which is physically present on a different data node.
Bonus tip: shards are used to split your data and make your application horizontally scalable, while replicas make your application highly available because they hold duplicated data, so the application can recover easily from the scenario in your follow-up question.
Let me know if you need any clarification or more details.
Short answer:
Q-1: No.
If no: if the index has no replicas, a failed shard leaves the index as a whole incomplete, but it does not affect the other shards of the index.
Please read this document:
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_basic_concepts.html

How does Elasticsearch recover from a quorum that is not unanimous

When using replication with a quorum, Elasticsearch allows writes to fail for some (a small number of) replica shards. Writing to a replica might fail only because it is temporarily unavailable (because of a temporary network partition, for example). When that shard becomes available again (the network is fixed, for example), what happens?
Does Elasticsearch automatically detect that the shard is out of date (stale, inconsistent with the primary shard) and update it in the background? Or must you perform a manual operation? When the shard returns from being unavailable, but is out of date, does Elasticsearch automatically refrain from querying that shard (and retrieving stale data) until it is brought up to date? Or must you provide special query parameters to ensure that out-of-date shards are not used?
Elasticsearch automatically handles replicas that are out of date. No manual operation or special query is necessary.
In case of node or network failures, you have to ensure that a quorum of the cluster remains online; otherwise you will run into the split-brain problem, in which you cannot know which replica is up to date and which is out of date.
Be careful: Quorum is generally associated with the election of the one master node out of all master eligible nodes. That master maintains the cluster state, which keeps track of the one primary shard (plus 0 or more replica shards) — there is no quorum involved for this.
The replication protocol has been improved a lot in 6.0 with sequence numbers and primary terms. A good overview is the blog post about it. Basically all operations are numbered (per shard), so missing operations can be detected and replayed; see the recovery part in the blog post in particular.
With failing primary shards it can get a little more interesting; one great post about more details is available on Elastic's discuss.
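The per-shard numbering of operations described above makes gaps easy to detect. The following sketch assumes sequence numbers start at 0 and uses -1 to mean "nothing processed yet"; the function names are made up for the example and do not correspond to Elasticsearch's internal API.

```python
def local_checkpoint(processed_seq_nos):
    """Highest sequence number such that it and all lower ones were processed."""
    seen = set(processed_seq_nos)
    checkpoint = -1  # assumed sentinel for "no operations processed"
    while checkpoint + 1 in seen:
        checkpoint += 1
    return checkpoint


def missing_ops(processed_seq_nos, primary_max_seq_no):
    """Sequence numbers the replica still needs, up to the primary's maximum."""
    seen = set(processed_seq_nos)
    return [n for n in range(primary_max_seq_no + 1) if n not in seen]
```

A replica that processed operations 0, 1, 2, and 4 while the primary is at 5 has a checkpoint of 2 and is missing operations 3 and 5, which are the ones that would need to be replayed.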

elasticsearch undefined index and how to get rid of it

I am seeing the following index shown as Unassigned, which is very annoying. How do I get rid of it?
Those unassigned shards are actually unassigned replicas of the primary shards held on your master node.
The main purpose of replicas is for failover: if the node holding a primary shard dies, then a replica is promoted to the role of primary.
At index time, a replica shard does the same amount of work as the primary shard. New documents are first indexed on the primary and then on any replicas. Increasing the number of replicas does not change the capacity of the index.
However, replica shards can serve read requests. If, as is often the case, your index is search-heavy, you can increase search performance by increasing the number of replicas, but only if you also add extra hardware.
To get these shards assigned, you need to run another instance of Elasticsearch to create a second node that can hold the replicas. (The node can be master-eligible or just a workhorse; you can set that in the Elasticsearch config files.)
For more details, you can refer to the official documentation and Elasticsearch: The Definitive Guide (work on it is still in progress, but you will find what you are looking for there).
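The reason a second node fixes this is that Elasticsearch never places two copies of the same shard on the same node. A back-of-the-envelope sketch of how many replica copies stay unassigned follows; the function name is invented for the example, and it assumes at least one data node and no other allocation restrictions.

```python
def unassigned_replicas(primaries: int, replicas: int, data_nodes: int) -> int:
    """Replica copies that cannot be allocated because there are too few nodes.

    Each shard needs (1 + replicas) distinct nodes; copies beyond the number of
    available data nodes remain unassigned. Assumes data_nodes >= 1.
    """
    copies_per_shard = 1 + replicas
    return primaries * max(0, copies_per_shard - data_nodes)


single_node = unassigned_replicas(primaries=5, replicas=1, data_nodes=1)
two_nodes = unassigned_replicas(primaries=5, replicas=1, data_nodes=2)
```

For an index with 5 primary shards and 1 replica on a single node, all 5 replicas stay unassigned; adding one more data node lets every replica be allocated.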
