Elasticsearch: Synchronization between replicas and primary shard

I was going through the details about how updates to a document are propagated through the primary shard and sent to replica shards as provided here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_partial_updates_to_a_document.html
It is written that updates to a document are propagated asynchronously to the replica shards, and that the update is retried up to retry_on_conflict times to make sure it executes successfully.
Why does it have to retry this many times? It could have returned an error on the first try. Please provide some examples where the update would fail on the first attempt but succeed after a few retries.
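One such scenario: two clients read the same document, both modify it, and both write back; the slower writer hits a version conflict and succeeds on retry against the fresh copy. Below is a minimal simulation of this optimistic-concurrency retry loop. All names here are illustrative, not Elasticsearch APIs; it only models the version check that retry_on_conflict works around.

```python
# Toy model of optimistic concurrency control: a document carries a version,
# and an update only applies if the version it read is still current.
# retry_on_conflict re-reads the document and retries on a mismatch.

class VersionConflict(Exception):
    pass

class Store:
    def __init__(self, doc):
        self.doc = dict(doc)
        self.version = 1

    def get(self):
        return dict(self.doc), self.version

    def update(self, new_doc, expected_version):
        if expected_version != self.version:
            raise VersionConflict()
        self.doc = dict(new_doc)
        self.version += 1

def update_with_retry(store, mutate, retry_on_conflict=3):
    """Read-modify-write, retried up to retry_on_conflict times on conflict."""
    for attempt in range(retry_on_conflict + 1):
        doc, version = store.get()
        mutate(doc)
        try:
            store.update(doc, version)
            return attempt  # number of conflicts hit before success
        except VersionConflict:
            continue
    raise VersionConflict("gave up after retries")

store = Store({"views": 0})

def bump(doc):
    doc["views"] += 1
    # Simulate a concurrent writer sneaking in between our read and our
    # write on the first attempt only:
    if not getattr(bump, "raced", False):
        bump.raced = True
        other_doc, other_version = store.get()
        other_doc["views"] += 10
        store.update(other_doc, other_version)

conflicts = update_with_retry(store, bump)
```

The first write-back fails because the concurrent writer bumped the version; the retry re-reads the fresh document (views = 10) and succeeds, so no update is lost.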

Related

Check that all shards of an index have been deleted from Elasticsearch

I have deleted an index from Elasticsearch with the DELETE API. But during the deletion, some of the shards may not have been connected to the cluster due to a node failure or network issue. So after deleting the index I have to check that all shards were deleted properly, so that I can act accordingly (including executing the DELETE API again). To check that all shards have been deleted, can I use GET /_cat/indices/indexname? The problem with this check is that a node holding a shard may not be connected to the cluster at the time of checking. I want to know whether some shard still exists somewhere (in which node, I am not interested).
GET /_cat/indices/indexname returns
Shard count
Document count
Deleted document count
Primary store size
Total store size of all shards, including shard replica
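As a sketch, such a check could parse the plain-text _cat output and look for the index name in the third column (the column order here is assumed from the default _cat/indices layout: health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size). Note that a 404 response for GET /_cat/indices/indexname would equally indicate the index is gone from the cluster state.

```python
# Illustrative helper: decide whether an index still appears in the
# plain-text output of GET /_cat/indices.

def index_present(cat_indices_output: str, index_name: str) -> bool:
    for line in cat_indices_output.splitlines():
        cols = line.split()
        # cols[2] is the index name in the default _cat/indices column layout
        if len(cols) >= 3 and cols[2] == index_name:
            return True
    return False

# Hypothetical sample output for two indices:
sample = (
    "green open logs-2024 aBcD 1 1 1000 0 1mb 512kb\n"
    "yellow open metrics eFgH 1 1 50 2 100kb 100kb\n"
)
```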

Elasticsearch delete_by_query version conflict

According to the ES documentation, document indexing/deletion happens as follows:
Request received at one of the nodes.
Request forwarded to the document's primary shard.
The operation is performed on the primary shard and parallel requests are sent to the replica nodes.
The primary shard node waits for a response from the replica nodes and then sends the response to the node where the request was originally received.
Send the response back to the client.
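The five steps above can be sketched as a toy model (illustrative code, not Elasticsearch internals): the primary applies the operation, fans it out to the replicas, waits for their acknowledgements, and only then does the client get its 200 OK.

```python
# Toy model of the write path: coordinating node -> primary -> replicas -> client.

class ShardCopy:
    def __init__(self):
        self.docs = {}

    def apply(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)
        return True  # acknowledgement

def index(primary, replicas, doc_id, doc):
    ok = primary.apply(doc_id, doc)                    # operation on the primary
    acks = [r.apply(doc_id, doc) for r in replicas]    # forwarded to each replica
    return ok and all(acks)                            # primary waits, then responds

p, rs = ShardCopy(), [ShardCopy(), ShardCopy()]
acked = index(p, rs, "1", {"title": "new"})
```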
Now in my case, I am sending a create document request to ES at time t and then sending a request to delete the same document (using delete_by_query) at approximately t+800 milliseconds. These requests are sent via a messaging system (internal implementation of kafka) which ensures that the delete request will be sent to ES only after receiving 200 OK response for the indexing operation from ES.
According to ES documentation, delete_by_query throws a 409 version conflict only when the documents present in the delete query have been updated during the time delete_by_query was still executing.
In my case, it is always guaranteed that the delete_by_query request will be sent to ES only when a 200 OK response has been received for all the documents that have to be deleted. Hence there is no possibility of an update/create of a document that has to be deleted during delete_by_query operation.
Please let me know if I am missing something or this is an issue with ES.
A possible reason is that when a document is created, it is not "committed" to the index immediately.
Elasticsearch indices operate on a refresh_interval, which defaults to 1 second.
This documentation around refresh cycles is old, but I cannot for the life of me find anything as descriptive in the more modern ES versions.
A few things you can try:
Send _refresh with your request
Add ?refresh=wait_for or ?refresh=true param
Note that refreshing the index on every indexing request is terrible for performance, which raises the question of why you are trying to delete a document immediately after indexing it.
Alternatively, in the Java client, add:
deleteByQueryRequest.setAbortOnVersionConflict(false);
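Both mitigations map to REST request parameters. Below is a sketch of the two requests as plain data; the refresh and conflicts parameter names are from the Elasticsearch REST API, while the index name, document, and query are made-up placeholders.

```python
# Illustrative request sketches for the two mitigations discussed above.

index_request = {
    "method": "PUT",
    "path": "/myindex/_doc/1",
    # refresh=wait_for blocks the indexing call until the next refresh cycle
    # makes the document searchable (and hence visible to delete_by_query):
    "params": {"refresh": "wait_for"},
    "body": {"title": "example"},
}

delete_request = {
    "method": "POST",
    "path": "/myindex/_delete_by_query",
    # conflicts=proceed is the REST equivalent of
    # setAbortOnVersionConflict(false) in the Java client: version conflicts
    # are counted in the response but do not abort the operation with a 409.
    "params": {"conflicts": "proceed"},
    "body": {"query": {"match": {"title": "example"}}},
}
```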

How does Elasticsearch bring back a node which is down

I was going through Elasticsearch and wanted to get consistent responses from ES clusters.
I read Elasticsearch read and write consistency
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html
and some other posts, and can conclude that ES returns success for a write operation after completing writes to all shards (primary + replicas), irrespective of the consistency param.
Let me know if my understanding is wrong.
I am wondering if anyone knows how Elasticsearch adds a node/shard back into a cluster after it was down transiently. Will it start serving read requests immediately after it is available, or does it ensure it has up-to-date data before serving read requests?
I looked for the answer to above question, but could not find any.
Thanks
Gopal
If a node is removed from the cluster and joins again, Elasticsearch checks whether its data is up to date. If it is not, the node's copy will not be made available for search until it is brought up to date again (which could mean the whole shard gets copied again).
The consistency parameter is just an additional pre-index check that the expected number of shards is available in the cluster (if the index is configured to have 4 replicas, then the primary shard plus two replicas need to be available when set to quorum). However, this parameter never changes the behaviour that a write needs to be written to all available shards before returning to the client.
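The quorum arithmetic in that answer can be checked directly. The formula below is the one given in the pre-5.0 consistency documentation, int((primary + number_of_replicas) / 2) + 1; the helper function itself is illustrative.

```python
# Sketch of the quorum computation for one primary plus N configured replicas.

def quorum(number_of_replicas: int) -> int:
    total_copies = 1 + number_of_replicas          # primary + replicas
    return total_copies // 2 + 1                   # majority of all copies

# With 4 replicas, 3 of the 5 copies must be available before the write is
# attempted: the primary plus two replicas, as the answer says.
```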

Elasticsearch: nondeterministic data corruption on replicas

I want to refer to this part of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_partial_updates_to_a_document.html
Namely the last blue box and its last sentence.
When a primary shard forwards changes to its replica shards, it
doesn’t forward the update request. Instead it forwards the new
version of the full document. Remember that these changes are
forwarded to the replica shards asynchronously, and there is no
guarantee that they will arrive in the same order that they were sent.
If Elasticsearch forwarded just the change, it is possible that
changes would be applied in the wrong order, resulting in a corrupt
document.
So what does it mean?
It says that if the random asynchronous ordering is unfortunate, the replica can end up with a corrupt document(?). That does not seem reliable for an industry solution.
If the above is true, how can one eliminate this problem of nondeterministic data corruption on replicas?
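The quoted box describes the problem Elasticsearch avoids, not one it suffers from: because the full document is forwarded (together with its version), a replica can simply discard a change that arrives late. A toy simulation of both strategies makes the difference concrete (illustrative code, not Elasticsearch internals):

```python
# Strategy 1: forward partial updates and apply them in arrival order.
# If they arrive out of order, the replica ends up corrupt.

def apply_ops(ops):
    doc = {"title": "v0"}
    for op in ops:                 # each op is a partial update: fields to set
        doc.update(op)
    return doc

op1, op2 = {"title": "v1"}, {"title": "v2"}
# The primary applied op1 then op2, so "title" should end as "v2",
# but the replica receives them in the wrong order:
corrupted = apply_ops([op2, op1])

# Strategy 2: forward the full document with its version.
# The replica keeps a snapshot only if it is newer than what it already has.

def apply_versioned(snapshots):
    doc, version = {"title": "v0"}, 0
    for snap_version, snap in snapshots:
        if snap_version > version:     # ignore anything older than current state
            doc, version = snap, snap_version
    return doc

# Same wrong arrival order, but now each snapshot carries a version:
safe = apply_versioned([(2, {"title": "v2"}), (1, {"title": "v1"})])
```

With partial updates the late-arriving older change overwrites the newer one; with versioned full documents the stale arrival is discarded and the replica converges to the primary's state.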

Retrieving a document at replica shard while it is not yet updated data

I have a question with Elasticsearch as below:
In mode replication is async, while a document is being indexed, the document will already be present on the primary shard but not yet copied to the replica shards. At this time, a GET request for this document is forwarded to a replica shard.
How does Elasticsearch handle this in the cases the document is not yet indexed on the replica shards or the document is not yet updated on the replica shards ?
A fail request will be returned in the case indexing a new document or a old document returned in the case updating a document? Or requesting node will re-forward to the primary shard to get data?
First of all, I wouldn't recommend using async mode. It doesn't really provide any benefits that you couldn't achieve in a safer way, but it creates issues when it comes to reliability. Because of this, the feature was deprecated in v1.5 and completely removed in v2.0 of Elasticsearch.
Having said that, if you still want to use async and care about getting the latest results, you have to use it with the _primary preference.
In case of update operation, you don't have to do anything. Update is always performed on the primary shard first and then the result of the operation is replicated to all replicas.
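A toy model of the stale-read case (illustrative only): a GET served by a lagging replica simply returns whatever the replica has, with no error and no re-forwarding, while a preference of "_primary" reads the up-to-date primary copy. The preference parameter name is from the Elasticsearch GET/search APIs; "_primary" was the value used in the pre-2.0 releases being discussed.

```python
# Toy model: one primary, one replica that has not yet applied the write.

class Shard:
    def __init__(self):
        self.docs = {}

primary, replica = Shard(), Shard()

# Index a document: applied on the primary, async replication still in flight.
primary.docs["1"] = {"title": "new"}

def get(doc_id, preference=None):
    # Round-robin shard selection elided; without a preference, assume the
    # request happens to land on the lagging replica.
    shard = primary if preference == "_primary" else replica
    return shard.docs.get(doc_id)   # None models "not found" on the replica

stale = get("1")                        # replica has no copy yet
fresh = get("1", preference="_primary") # primary always has the latest
```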
