Is there a factor available to measure replica shard latency in ES? - elasticsearch

In ES, we know that when a document is submitted for indexing, the data is first written to the primary shard and then indexed on the replica shards. Is the write to the replica immediate, or is there some latency? If there is latency, is there a parameter available to measure it? Also, if a write to a replica fails for some reason, what happens to the data on the primary, and will the data be replicated from the primary to the replica again?
Thanks

At index time, a replica shard does the same amount of work as the primary shard. New documents are first indexed on the primary and then on any replicas, so a write does not start on a replica shard until it has completed on the primary.
If a write to a replica shard fails, you don't have to worry about the primary: its data is unaffected. And yes, the data will be written to the replica again once the replica is back.
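The write path described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Elasticsearch code: the `Shard` class, `index_document`, `recover` and the `in_sync` set are hypothetical names invented for illustration, and recovery is modelled as a plain copy of the primary's documents.

```python
class Shard:
    """Toy in-memory shard copy (hypothetical, not an Elasticsearch API)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.online = True

    def index(self, doc_id, doc):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        self.docs[doc_id] = doc


def index_document(primary, replicas, in_sync, doc_id, doc):
    """Index on the primary first, then forward the write to in-sync replicas."""
    primary.index(doc_id, doc)  # the primary write always happens first
    for replica in replicas:
        if replica.name not in in_sync:
            continue  # a replica that already failed is skipped until it recovers
        try:
            replica.index(doc_id, doc)
        except ConnectionError:
            # The replica write failed: the primary keeps its data,
            # and the replica is marked as out of sync.
            in_sync.discard(replica.name)


def recover(primary, replica, in_sync):
    """Bring a replica back by copying the primary's documents to it."""
    replica.online = True
    replica.docs = dict(primary.docs)
    in_sync.add(replica.name)


primary = Shard("primary")
replica = Shard("replica")
in_sync = {"replica"}

index_document(primary, [replica], in_sync, "1", {"msg": "a"})
replica.online = False  # simulate the replica going away
index_document(primary, [replica], in_sync, "2", {"msg": "b"})
docs_before_recovery = dict(replica.docs)  # the replica missed document "2"
recover(primary, replica, in_sync)         # now it matches the primary again
```

In this toy run the primary holds both documents throughout, while the replica only receives document "2" once `recover` is called.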

Related

Allocate Shards and replica to specific node

I am a beginner with Elasticsearch.
I have an Elasticsearch cluster with 3 nodes on AWS EC2; 2 of the nodes are spot instances and one is an on-demand instance. Shards and replicas are distributed among all the nodes. I need to make the on-demand instance the primary node, meaning that a replica of every index in the cluster should be allocated to the on-demand instance. That way, even if the spot instances get terminated, I will not lose the data. Is there a way to configure this? Thanks in advance.
If I understand correctly, you want to allocate the primary shards to the on-demand instance?
Filtering at the primary/replica shard level is not currently possible. However, if your concern is data loss, you don't have to worry so much, because if a primary shard goes down, one of its replica shards gets promoted to primary.

Insert availability when one replica is down in a shard

In a setup with 2 replicas per shard and insert_quorum not enabled, if one replica goes down, it seems the other available replica can keep taking inserts. But is there anything that can block inserts later on? For example, once there are X unreplicated inserts, will a given shard stop accepting more inserts?

Elasticsearch shard and replica search performance

I'm trying to understand how search queries are divided between primary and replica shards.
I don't clearly understand why it is said that one of the benefits of replicas is increased search performance. As I understand it, a primary shard can serve search queries just as a replica does. If so, let's assume that we have an ES cluster with 2 nodes and the queries are read-only. Will an index with 10 shards perform the same as an index with 5 shards and 1 replica?
First you need to understand what primary and replica shards are.
A primary shard is where a write request goes first; the write is then replicated to its replica shards. Based on the replication factor, Elasticsearch creates n replica shards for each primary shard.
A document always belongs to a single primary shard, but copies of it are present in all of that shard's replica shards.
When you search, the query can be served either by the primary shard (the one main copy) or by any one of its replica shards. These copies can be on different nodes in the cluster (Elasticsearch is a distributed system). To improve performance, Elasticsearch can send the query to any shard that contains a copy, depending on load and various other factors. This explains why having replicas increases search performance and why you can search from the replicas.
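To make the comparison from the question concrete, here is a toy simulation of how shard-level search requests spread over 2 nodes. It is a sketch under loud assumptions: copies are placed and selected with a simple round-robin, whereas real Elasticsearch chooses copies based on load and other factors as described above. The helper names `build_copies` and `run_searches` are hypothetical.

```python
from collections import Counter
from itertools import cycle


def build_copies(num_primaries, num_replicas, nodes):
    """Toy placement: put each copy of a shard on a different node.

    Assumes num_replicas + 1 <= len(nodes) so copies never share a node.
    """
    layout = {}
    for shard in range(num_primaries):
        layout[shard] = [
            nodes[(shard + copy) % len(nodes)] for copy in range(num_replicas + 1)
        ]
    return layout


def run_searches(layout, num_searches):
    """Each search queries one copy of every shard; copies rotate round-robin."""
    pickers = {shard: cycle(copies) for shard, copies in layout.items()}
    load = Counter()
    for _ in range(num_searches):
        for shard in layout:
            load[next(pickers[shard])] += 1  # one shard-level request on that node
    return load


nodes = ["node-1", "node-2"]
load_10_shards = run_searches(build_copies(10, 0, nodes), 100)
load_5_plus_replica = run_searches(build_copies(5, 1, nodes), 100)
```

In this model both layouts keep the two nodes equally busy. The difference is that a search over 10 primaries fans out to 10 shards, while a search over 5 primaries with 1 replica fans out to only 5 shard copies, picked from either node.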

Do all shards (within index) have the same content?

Do all shards (within index) have the same content?
If yes, more shards = longer propagation (save) time?
If no, when one of shards failed = data is incomplete when merging?
First, you need to understand what sharding is and why it's important in distributed systems like Elasticsearch. You can read some good resources on shards here, here and here.
Now, coming to your question:
Do all shards (within index) have the same content?
The answer is no (assuming you are referring to primary shards here; a replica shard is, of course, just a copy of its primary shard). Let's take an example.
Your index contains around 100 million docs and you have a cluster of 10 data nodes. You want to scale your index horizontally, so you start with a setting of 10 primary shards and 1 replica. In this case, Elasticsearch will physically divide your data into 10 primary shards, and each primary shard will be on a different node of the cluster, as there are 10 data nodes. Similarly, every primary shard has a copy, called a replica, which is placed on a different node from its primary.
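The way documents are divided among primary shards can be sketched as follows. Elasticsearch routes a document by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The code below is a simplified illustration of that idea, with assumptions: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, and `route` is a hypothetical helper name.

```python
import zlib
from collections import Counter


def route(doc_id, num_primary_shards):
    """Pick the primary shard for a document: hash(routing) % number of shards.

    crc32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


NUM_PRIMARY_SHARDS = 10
doc_ids = [f"doc-{i}" for i in range(1000)]

# Every document maps to exactly one primary shard.
assignment = {doc_id: route(doc_id, NUM_PRIMARY_SHARDS) for doc_id in doc_ids}

# How many documents ended up on each shard.
docs_per_shard = Counter(assignment.values())
```

Because the shard is derived from the document's own routing value, each document lives in exactly one primary shard, and no primary shard holds the full data set.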
Now coming to your follow-up question.
If yes, more shards = longer propagation (save) time? If no, when one of shards failed = data is incomplete when merging?
Since Elasticsearch doesn't store the same data in all the primary shards, the premise that more shards mean a longer propagation or save time does not hold. And when one of the shards fails, Elasticsearch recovers its data from its replica shard, as that is physically present on a different data node.
Bonus tip: shards are used to split your data and make your application horizontally scalable, while replicas make your application highly available, as they contain duplicated data. That is how the application can recover easily from the scenario you just asked about in your follow-up question.
Let me know if you need any clarification or more details.
Short answer:
Q-1: No.
If no: if the index has no replica, a failed shard leaves the index as a whole incomplete, but the other shards of the index are not affected.
Please read this document:
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_basic_concepts.html

If a shard goes down, then, after re-allocating that shard, will data residing in that shard be retrievable

I have manually allocated 3 primary shards to a particular node in Elasticsearch. The replicas of these shards reside on different nodes. Now, let's say primary shard number 2 goes down (for example, due to an overflow of data) without the node on which it resides going down. Is it then possible to retrieve the data residing on that particular shard after I manually re-allocate it to a different node? If yes, how?
Yes.
Once the node holding primary shard number 2 goes down, the replica shard on the other node will be promoted to a primary shard, allowing you to retrieve the data. See here:
Coping with failure (ES Definitive Guide)
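The promotion step can be pictured with a toy model of the cluster's shard table. This is a sketch only: `fail_node` and the dictionary layout are hypothetical and do not reflect how Elasticsearch stores its cluster state.

```python
def fail_node(shard_copies, failed_node):
    """Drop copies on the failed node; promote a surviving replica if needed."""
    result = {}
    for shard, copies in shard_copies.items():
        # Keep only copies on surviving nodes (copied so the input is untouched).
        alive = [dict(copy) for copy in copies if copy["node"] != failed_node]
        # If the primary was lost, promote the first surviving replica.
        if alive and not any(copy["role"] == "primary" for copy in alive):
            alive[0]["role"] = "primary"
        result[shard] = alive
    return result


shard_copies = {
    2: [
        {"node": "node-a", "role": "primary"},
        {"node": "node-b", "role": "replica"},
    ],
}

after = fail_node(shard_copies, "node-a")
```

After `node-a` fails in this model, the copy of shard 2 on `node-b` becomes the primary, so the data on that shard stays retrievable.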
