Elasticsearch shard allocation - elasticsearch

I have 1 ES cluster with 3 nodes, 1 index, 3 shards and 2 replicas per shard.
For some reason, all my primary shards are located on the same node:
Node 1: replica 0, replica 1, replica 2
Node 2: replica 0, replica 1, replica 2
Node 3: primary 0, primary 1, primary 2.
What should I do to rebalance the shards? I want to have 1 primary shard per node, for example:
Node 1: primary 0, replica 1, replica 2
Node 2: replica 0, primary 1, replica 2
Node 3: replica 0, replica 1, primary 2.
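One possible way to get there (just a sketch, and the index/node names are placeholders, not from your cluster): cancelling the allocation of a started primary with the cluster reroute API promotes one of its in-sync replicas on another node to primary, and the copy on the original node is then rebuilt as a replica. Repeated per shard, this lets you spread the primaries out:
# hypothetical names; run once per primary you want moved off Node 3
POST _cluster/reroute
{
  "commands": [
    { "cancel": { "index": "my-index", "shard": 1, "node": "node-3", "allow_primary": true } }
  ]
}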

Related

How to draw the binary tree for this input: [1,2,2,3,null,null,3,4,null,null,4]?

This is what I have drawn
I am apparently missing something. How do I include the value at index 10 of the array?
I will be grateful for your help.
The array encoding is not such that you can assume that the children of the node at index i are at indexes i*2+1 and i*2+2. That would be true if the encoded tree were a complete binary tree, but when it is not, you cannot use this formula.
Instead, you should keep track of the bottom layer of the tree as you build it, registering only real nodes (not nulls). Then distribute the next values as children among the nodes in that (then) bottom layer, and so on. This is really a breadth-first traversal method.
This is the procedure:
Create a queue, and create a node for the first value in the input list (if the list is not empty), and enqueue it.
Then repeat for as long as there is more input to process:
Dequeue a node from the queue.
Read the next two values from the input and create nodes for them. If there are not enough values remaining in the input, use null instead.
Attach these two nodes as children to the node that you had taken from the queue.
Those children that are not null should be enqueued on the queue.
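Here is a minimal Python sketch of this procedure (the TreeNode class and the build_tree name are illustrative, not taken from the question):

from collections import deque

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def build_tree(values):
    # Build a binary tree from a level-order list in which null/None marks a missing child.
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue = deque([root])
    i = 1                                    # next position to read from the input
    while i < len(values):                   # repeat as long as there is more input
        node = queue.popleft()               # dequeue a node
        # read the next two values and attach them as left/right children
        if values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)          # only non-null children are enqueued
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root

# The example from the question: the value at index 10 (the last 4)
# ends up as the right child of the second 3.
root = build_tree([1, 2, 2, 3, None, None, 3, 4, None, None, 4])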
If you apply this algorithm to the example input [1,2,2,3,null,null,3,4,null,null,4], we first get the root, which is put on the queue. So just before the loop starts we have:
root:   1         queue = [1]
                  remaining input = [2,2,3,null,null,3,4,null,null,4]
I depict here the queue contents with numbers, but they really are node instances.
After the first iteration of the loop, in which we read 2 and 2 from the input, create nodes for them, attach them to the dequeued node, and enqueue those children, we get:
root:   1         queue = [2, 2]
       / \        remaining input = [3,null,null,3,4,null,null,4]
      2   2
After iteration #2 (note that no null is enqueued):
root:   1         queue = [2, 3]
       / \        remaining input = [null,3,4,null,null,4]
      2   2
     / *
    3
After iteration #3:
root:   1         queue = [3, 3]
       / \        remaining input = [4,null,null,4]
      2   2
     / * * \
    3       3
After iteration #4:
root:   1         queue = [3, 4]
       / \        remaining input = [null,4]
      2   2
     / * * \
    3       3
   / *
  4
After the final iteration:
root:   1         queue = [4, 4]
       / \        remaining input = []
      2   2
     / * * \
    3       3
   / *     * \
  4           4
The queue is not empty, but as there is no more input, those queued nodes represent leaves that need no further processing.

Replicas created on same node before being transferred

I have an Elasticsearch cluster made up of 3 nodes.
Every day, I have a batch that feeds a new index composed of 3 shards and then scales the number of replicas to 1. So at the end of the day I expect every node to carry 1 primary and 1 replica.
The figure below shows the disk space usage on each node during this operation.
On node 0 everything seems to be going smoothly during that operation.
However, node 2 is idle most of the time at the beginning, while node 1 seems to be taking care of its own replica plus node 2's replica before transferring it to node 2 (this is my own understanding, I might be wrong). This is putting a lot of pressure on node 1's disk usage, which almost reaches 100%.
Why this behaviour? Shouldn't every node take care of its own replica here to even out the load? Can I force it to do so somehow? This is worrying because when a disk reaches 100%, the entire node goes down, as has happened in the past.
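For reference, the replica-scaling step of such a batch typically boils down to an index settings update like the one below (the index name is a placeholder; your batch may do it differently):
PUT my-index_20210617/_settings
{
  "index": { "number_of_replicas": 1 }
}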
UPDATE to Val's answer:
You will find the outputs below
GET _cat/shards/xxxxxxxxxxxxxxxxxxxxxx_20210617?v
index shard prirep state docs store ip node
xxxxxxxxxxxxxxxxxxxxxx_20210617 1 p STARTED 8925915 13.4gb 172.23.13.255 es-master-0
xxxxxxxxxxxxxxxxxxxxxx_20210617 1 r STARTED 8925915 13.4gb 172.23.10.76 es-master-2
xxxxxxxxxxxxxxxxxxxxxx_20210617 2 r STARTED 8920172 13.4gb 172.23.24.221 es-master-1
xxxxxxxxxxxxxxxxxxxxxx_20210617 2 p STARTED 8920172 13.4gb 172.23.10.76 es-master-2
xxxxxxxxxxxxxxxxxxxxxx_20210617 0 p STARTED 8923889 13.4gb 172.23.24.221 es-master-1
xxxxxxxxxxxxxxxxxxxxxx_20210617 0 r STARTED 8923889 13.5gb 172.23.13.255 es-master-0
GET _cat/recovery/xxxxxxxxxxxxxxxxxxxxxx_20210617?v
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
xxxxxxxxxxxxxxxxxxxxxx_20210617 0 382ms empty_store done n/a n/a 172.23.24.221 es-master-1 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
xxxxxxxxxxxxxxxxxxxxxx_20210617 0 21.9m peer done 172.23.24.221 es-master-1 172.23.13.255 es-master-0 n/a n/a 188 188 100.0% 188 14467579393 14467579393 100.0% 14467579393 55835 55835 100.0%
xxxxxxxxxxxxxxxxxxxxxx_20210617 1 395ms empty_store done n/a n/a 172.23.13.255 es-master-0 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
xxxxxxxxxxxxxxxxxxxxxx_20210617 1 9m peer done 172.23.13.255 es-master-0 172.23.10.76 es-master-2 n/a n/a 188 188 100.0% 188 14486949488 14486949488 100.0% 14486949488 0 0 100.0%
xxxxxxxxxxxxxxxxxxxxxx_20210617 2 17.8m peer done 172.23.10.76 es-master-2 172.23.24.221 es-master-1 n/a n/a 134 134 100.0% 134 14470475298 14470475298 100.0% 14470475298 1894 1894 100.0%
xxxxxxxxxxxxxxxxxxxxxx_20210617 2 409ms empty_store done n/a n/a 172.23.10.76 es-master-2 n/a n/a 0 0 0.0% 0 0 0 0.0% 0 0 0 100.0%
First, if you have 3 nodes and your index has 3 primaries with each having 1 replica, there's absolutely no guarantee whatsoever that each node will hold one primary and one replica.
The only guarantees that you have are that:
the shard count will be balanced over the nodes and
a primary and its replica will never land on the same node.
That being said, it's perfectly possible for one node to get two primaries, another to get two replicas, and the third to get one primary and one replica.
Looking at the chart, what I think happens in your case is that
node 2 gets two primaries and
node 0 gets one primary
Then, when you add the replica:
node 0 (which has only one primary) gets one replica (the curve is less steep)
node 1 (which has nothing so far) gets two replicas (the curve grows steeper)
node 2 stays flat because it already has two primaries
A little later, when node 1's disk approaches saturation, one shard is relocated away from it to node 2 (at 23:16 node 2's curve starts to increase).
The end situation seems to be:
node 0 with one primary and one replica
node 1 with only one replica
node 2 with two primaries and one replica
I think it would be nice to confirm this with the following two commands:
# you can see where each shard is located now
GET _cat/shards/tax*?v
# you can see which shards went from which node to which node
GET _cat/recovery/indexname*?v
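If the relocation around 23:16 was indeed triggered by disk pressure, that is the disk-based allocation watermarks kicking in. A hedged sketch of how to inspect them (and, if needed, set them explicitly; 85%/90% are the usual defaults):
# show current settings, including defaults (look for cluster.routing.allocation.disk.watermark.*)
GET _cluster/settings?include_defaults=true&flat_settings=true
# example only: setting the watermarks explicitly
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}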

elasticsearch cluster green with only one node

I have an elasticsearch cluster that reports a green status but only one node. From my research, the cluster should be yellow and there should be two separate clusters. So could someone explain why the cluster below is reporting a green status?
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
The cluster is being configured for clustering in elasticsearch.yml, and before those changes it properly reported a yellow status with the same 2 shards per node.
You have two primary shards in your cluster with no replicas, and both shards are assigned to the single data node.
If you increase number_of_replicas to 1 or higher, you will see the cluster turn yellow. At that point you can do two things: 1) add another data node, or 2) change the cluster settings to force-assign both primary and replica shards to one node (not recommended).
The cluster is green because there are 0 unassigned shards - every shard that needs a home has one. This is likely because your indices have number_of_replicas set to 0, so the single active node in your cluster satisfies every shard requirement. This is generally a bad idea, as it doesn't provide any redundancy.
If you create indices with number_of_replicas set to 1 or higher, you will need at least that many additional nodes active in the cluster (replicas + 1 in total) for it to be eligible for green status.
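A quick way to see this in practice (the index name is a placeholder): add a replica requirement on the single-node cluster and watch the health flip to yellow.
# on a one-node cluster this turns the status yellow until a second node joins
PUT my-index/_settings
{
  "index": { "number_of_replicas": 1 }
}
GET _cluster/health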

How does Elasticsearch manage its shards?

I have three nodes working together. On those nodes I have five indices, each having 5 primary shards, and each primary shard has 2 replicas. It looks like this (I cut the view to show only two of the 5 indices):
(screenshot: http://i59.tinypic.com/2ez1wjt.png)
As you can see on the picture:
- node 1 has primary shards 0 and 3 (and replicas 1, 2 and 4)
- node 2 has primary shard 2 (and replicas 0, 1, 3 and 4)
- node 3 has primary shards 1 and 4 (and replicas 0, 2 and 3)
and this is the case for each index (the 5 of them).
I understand that if I restart my nodes this "organisation" will change, but the "look" of the index will still be the same as that of index2, 3, 4 and 5. For example, after restarting, I would have:
- node 1 has primary shards 1 and 2 (and replicas 0, 3 and 4)
- node 2 has primary shard 3 (and replicas 0, 1, 2 and 4)
- node 3 has primary shards 0 and 4 (and replicas 1, 2 and 3)
and this would be the case for each index (the 5 of them).
Is there a reason why I always find the same pattern for each of my indices?
Thanks
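To compare the layout across indices yourself, the cat shards API prints one line per shard copy with its p/r flag and the node it lives on (the index name below is a placeholder):
# all indices
GET _cat/shards?v
# or a single index
GET _cat/shards/index1?v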

UnavailableShardsException on ElasticSearch

I'm using Elasticsearch on my dedicated server (not Amazon). Recently it's been giving me errors like:
UnavailableShardsException[[tribune][4] Primary shard is not active or
isn't assigned is a known node. Timeout: [1m], request: delete
{[tribune][news][90755]}]
Whenever I run /_cat/shards?v the result is:
index shard prirep state docs store ip node
tribune 4 p UNASSIGNED
tribune 4 r UNASSIGNED
tribune 0 p STARTED 5971 34mb ***.**.***.** Benny Beckley
tribune 0 r UNASSIGNED
tribune 3 p STARTED 5875 33.9mb ***.**.***.** Benny Beckley
tribune 3 r UNASSIGNED
tribune 1 p INITIALIZING ***.**.***.** Benny Beckley
tribune 1 r UNASSIGNED
tribune 2 p STARTED 5875 33.6mb ***.**.***.** Benny Beckley
tribune 2 r UNASSIGNED
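To diagnose the UNASSIGNED and stuck INITIALIZING shards, the cluster allocation explain API (available from Elasticsearch 5.0 onwards; on older versions you would have to rely on the logs and _cat/allocation) reports why a specific shard cannot be assigned. A sketch for shard 4 of tribune:
GET _cluster/allocation/explain
{
  "index": "tribune",
  "shard": 4,
  "primary": true
}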
