Check that all shards of an index have been deleted from Elasticsearch - elasticsearch

I have deleted an index from Elasticsearch with the DELETE API. During the deletion, some of the shards may not have been reachable due to a node failure or network issue. So after deleting the index, I have to check that all shards were deleted properly, so that I can take action accordingly (including executing the DELETE API again). Can I use GET /_cat/indices/indexname for this check? The problem with checking is that a node holding a shard may not be connected to the cluster at the time of the check. I only want to know whether some shard still exists somewhere (I am not interested in which node).
GET /_cat/indices/indexname returns:
Shard count
Document count
Deleted document count
Primary store size
Total store size of all shards, including shard replica
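For reference, a minimal sketch of the check, assuming the index is named indexname (both are standard cat APIs; the h parameter just selects output columns):

GET /_cat/indices/indexname?v
GET /_cat/shards/indexname?v&h=index,shard,prirep,state,node

If the index is fully gone from the cluster state, both calls should return an index_not_found error; any shard still known to the cluster would show up in the second listing together with its state and node.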

Related

How to check that all shards are moved from a specific elasticsearch node?

I'm trying to move all the shards (primaries and replicas) from one specific Elasticsearch node to the others.
While doing some research, I came across cluster-level shard allocation filtering, where I can specify the name of a node to exclude when allocating shards.
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "data-node-1"
  }
}
My questions are:
If I dynamically update the setting, will the shards be moved from the nodes that I excluded to other nodes automatically?
How can I check and make sure that all shards are moved from a specific node?
Yes, your shards will be moved automatically, if it is possible to do so:
Shards are only relocated if it is possible to do so without breaking another routing constraint, such as never allocating a primary and replica shard on the same node.
More information here
You can use the shards API to see the location of all shards, as sketched below. Alternatively, if you have access to a Kibana dashboard, you can see the shard allocation in the monitoring tab for shards or indices at the very bottom.
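For illustration, the shards API can be called like this (data-node-1 is the node name from the question; the h parameter selects columns):

GET /_cat/shards?v&h=index,shard,prirep,state,node

Once no row of the output lists data-node-1 in the node column, all shards have been moved off that node.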

ElasticSearch: Synchronization between replicas and primary shard

I was going through the details about how updates to a document are propagated through the primary shard and sent to replica shards as provided here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_partial_updates_to_a_document.html
It is written that updates to the document are communicated asynchronously to the replica shards, and that it retries up to retry_on_conflict times to make sure the update is successfully executed.
Why does it have to retry that many times? It could have returned an error on the first try. Please provide some examples where the update would fail on the first attempt but succeed after a few retries.
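For context, retry_on_conflict is passed as a query parameter on the update API. A minimal sketch in the style of the linked guide (index, type, and document id are placeholders):

POST /website/pageviews/1/_update?retry_on_conflict=5
{
  "script": "ctx._source.views+=1",
  "upsert": { "views": 0 }
}

A counter increment like this is one case where a retry can succeed: if another process updates the document between the read and the write, the first attempt fails with a version conflict, but re-reading the latest version and re-applying the script still produces a correct result.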

Old Elasticsearch shards are not deleted after relocation

Our Elasticsearch cluster has two data directories. We recently restarted all the nodes in the cluster. After the successful restart, we observed increased disk space usage on a few nodes. When we examined the folders inside the data directories, we found orphaned shards.
For example, an orphaned shard "15" exists at data_dir0/cluster_name/nodes/0/indices/index_name/15, while one of the replicas of the same shard "15" exists on the same node inside the other data directory, at data_dir1/cluster_name/nodes/0/indices/index_name/15. The shard "15" from data_dir1 is also included in the cluster metadata, so we assume that shard "15" from data_dir0 is an orphaned shard that should be deleted by Elasticsearch. But Elasticsearch hasn't deleted the orphaned shard yet, even 6 days after the last restart.
We found the topic https://discuss.elastic.co/t/old-shards-on-re-joining-nodes-useful/182661 relating to our issue, but it did not help: Elasticsearch did not take care of the orphaned shard in our case. We also raised the question on the Elastic forum but are not getting quick replies, so I am asking here, as Stack Overflow has the larger community.
This also happened to our cluster; we run Elasticsearch 6.1.3. One specific node had 88% of its disk used, and it turned out there were shard leftovers from a previous relocation on our production index.
To fix this, I stopped Elasticsearch on the node (make sure you have plenty of disk space on your other data nodes) and let Elasticsearch's relocation do its work. Once it was done and re-balanced, I deleted the index folder and started Elasticsearch again. This went quite painlessly.
What version of Elasticsearch are you running?
Is your cluster green? If so, those shard files should be deleted by Elasticsearch during initialization. But if that shard had unallocated replicas at the time the node rejoined the cluster, Elasticsearch won't remove the pre-existing shard files on disk.
You can manually delete the directory if you don't need the shard. Or you can try restarting Elasticsearch on the node and let it delete the files for you.
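For reference, the cluster status mentioned above can be checked with the health API:

GET /_cluster/health?pretty

The status field of the response will be green, yellow, or red.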
We also got help from Elastic forum here https://discuss.elastic.co/t/old-shards-not-deleted-upon-relocation/71161/6
Restarting the node did not help and we do not want to manually delete the folders. So, we are going to replace the affected nodes one by one.
@chani It would be great if you could provide an official link for the manual-delete suggestion.

Retrieving a document from a replica shard while it does not yet have the updated data

I have a question about Elasticsearch, as below:
When replication is in async mode, a document that is being indexed will already be present on the primary shard but not yet copied to the replica shards. Suppose that at this moment a GET request for the document is forwarded to a replica shard.
How does Elasticsearch handle the case where the document is not yet indexed on the replica shards, or not yet updated on the replica shards?
Will a failed request be returned when indexing a new document, and an old document returned when updating an existing one? Or will the requesting node re-forward the request to the primary shard to get the data?
First of all, I wouldn't recommend using the async mode. It doesn't really provide any benefits that you couldn't achieve in a safer way, but it creates issues when it comes to reliability. Because of this, the feature was deprecated in v1.5 and completely removed in v2.0 of Elasticsearch.
That said, if you still want to use async and care about getting the latest results, you have to use it together with the primary preference.
In the case of an update operation, you don't have to do anything. An update is always performed on the primary shard first, and the result of the operation is then replicated to all replicas.
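For illustration, on the pre-2.0 versions where async replication still existed, the primary preference is a query parameter on the GET (index, type, and id here are placeholders):

GET /indexname/typename/1?preference=_primary

This forces the read to be served by the primary shard, so it sees the write even before the replicas have caught up.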

Elasticsearch replica automatically taking over a shard

Below I have 2 nodes on two servers, with 2 indexes each. The indexes are distributed across 2 shards with 1 replica each.
The "Thor" node had some downtime, so "Iron_man" took over. That's fine.
As you can see, events_v1 is an index created before the downtime and venue_v1 was created after the downtime. Shouldn't "Thor", after coming back alive, automatically take over one shard, the same way it handles the newly created venue index?
If yes, how should I configure the settings?
You don't need to configure anything for the above scenario, because Elasticsearch's default behavior is exactly what you are asking for. A replica of a shard is never allocated on the same node as its primary. If you add a new node, your replica shards will be allocated on the newly added node.
For more information, watch this video.
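If you want to inspect why a particular shard copy is or isn't allocated to a node, here is a sketch using the allocation explain API (available from Elasticsearch 5.0; events_v1 is the index from the question, the shard number is a placeholder):

GET /_cluster/allocation/explain
{
  "index": "events_v1",
  "shard": 0,
  "primary": false
}

The response explains, per node, why the shard copy can or cannot be allocated there.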
