Elasticsearch shard allocation - elasticsearch

What will happen when a node that was moved out of the cluster rejoins it after the cluster has rebalanced?
Suppose I have a cluster of 5 nodes and its status is green, and one of the 5 nodes leaves the cluster. I have configured delayed shard allocation. After the delayed allocation timeout expires, the master promotes one of the replicas to primary, allocates the unassigned shards, and rebalances the cluster. What will happen when the node that left the cluster rejoins after all this rebalancing has been done? What about the shards still present on the node that has rejoined?

Every node in your cluster computes a weight based on the number of shards it holds. You can tune the point at which rebalancing happens via the Cluster Level Shard Allocation settings. Note that the important figure is the number of shards, not their size. If your shards vary greatly in size, this may lead to balancing problems.
When a node rejoins the cluster, any recovery processes currently in progress (copying shards to other nodes to regain green status, with the number of concurrent recoveries limited by cluster.routing.allocation.node_concurrent_recoveries) will complete, and then the next shard will be recovered. When the cluster discovers an up-to-date copy of a missing shard on your rejoined node, it will record that information in the cluster state and not take any unnecessary action.
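For reference, a minimal sketch of how the two settings mentioned above can be applied; the host, the "5m" timeout, and the recovery count are illustrative values, not recommendations from the original answer (newer Elasticsearch versions also need -H 'Content-Type: application/json'):
curl -XPUT localhost:9200/_all/_settings -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}'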

Related

Does Elasticsearch stop indexing data when some nodes go down?

I have read that when a new indexing request is sent to an ES cluster, ES decides which shard the document should be stored in based on routing. The node hosting that primary shard (aka the coordinating node) then forwards the indexing request to each node containing a replica of that shard, and it responds to the client that the document has been indexed successfully once the primary shard and its replicas have stored/indexed the document.
Does that mean that ES supports high availability (node tolerance) for read requests but not for write requests, or is that just the default behavior and can it be changed?
The main purpose of replicas is failover: if the node holding a primary shard dies, a replica is promoted to primary. Replica shards can also serve read requests, thus improving search performance.
Write requests, however, will be affected if one of the nodes in your cluster (one holding a primary shard for a live index) suddenly runs out of disk space: once a node's disk usage hits the configured watermark levels, ES throws a cluster block exception that prevents any writes to that node. If ALL nodes are down or unreachable, indexing will stop; but if only one or some nodes go down, indexing shouldn't stop completely, since replica shards on other nodes are promoted to primary when the node holding the original primary is offline. Ideally, it goes without saying, some analysis and effort should go into right-sizing an ES cluster and putting monitoring in place to prevent such issues.
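The disk watermark levels mentioned above are dynamic cluster settings; a minimal sketch of adjusting them (the values shown are simply the documented defaults, and localhost is a placeholder for your cluster endpoint; newer versions additionally have a flood_stage watermark that applies the write block):
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'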

Why does no shard become primary on the second node when it is up again?

I have configured a two-node cluster and created an index with 4 shards and 1 replica.
Elasticsearch created 2 primary shards on each node; this is how it looks from the head plugin: shards 1 and 3 are primary on node 1 (stephen1c7), and shards 0 and 2 are primary on node 2 (stephen2c7).
Shut down one node
Now I have shut down node 2 (stephen2c7) to see if all the shards on node 1 (stephen1c7) become primary. Yes, all shards are now primary.
Bring the shut-down node back up
Now I have brought node 2 (stephen2c7) up again to see if any shards on this node become primary. But surprisingly, no shard on this node became primary. I waited a long time, but still no shard is primary on node 2.
Why is that?
Is there any configuration to set so that shards become primary again after a node comes back up?
Thanks in advance!
Given this post and this one (albeit slightly old), balancing primary and replica shards across the cluster does not seem to be a priority for an Elastic cluster. As you can see, Elastic sees a replica and a primary for each shard, and thus the status seems satisfactory for the cluster.
What I would suggest is to have a look at the shard balancing heuristics and play with those values until you obtain a satisfactory result (as is often the case with Elasticsearch, testing several parameters is what will yield the best configuration for your architectural design choices).
Note that if you start using the shard balancing heuristics, you might not get good results if you also use shard allocation filtering or forced awareness at the same time.
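The balancing heuristics referred to above are cluster-level settings; a minimal sketch of tuning them (the values shown are the documented defaults, not a recommendation, and localhost is a placeholder):
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.balance.shard": 0.45,
    "cluster.routing.allocation.balance.index": 0.55,
    "cluster.routing.allocation.balance.threshold": 1.0
  }
}'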

One node in Elasticsearch cluster permanently overloaded

I have an Elasticsearch cluster with 11 nodes. Five of these are data nodes, and the others are client nodes from which I add and retrieve documents.
I am using the standard Elasticsearch configuration. Each index has 5 shards and replicas. The cluster holds 55 indices and around 150 GB of data.
The cluster is very slow. With the Kopf plugin I can see the stats of each node. There I can see that one single data node (not the master) is permanently overloaded. Heap, disk, and CPU are OK, but load is almost always at 100%. I have noticed that every shard on it is a primary shard, whereas all other data nodes hold both primary shards and replicas. When I shut that node down and bring it back up, the same problem occurs on another data node.
I don't know why this happens or how to solve it. I thought the client nodes and the master node distributed the requests evenly? Why is one data node always overloaded?
Try the following settings:
cluster.routing.rebalance.enable:
Enable or disable rebalancing for specific kinds of shards:
all - (default) Allows shard balancing for all kinds of shards.
primaries - Allows shard balancing only for primary shards.
replicas - Allows shard balancing only for replica shards.
none - No shard balancing of any kind is allowed for any indices.
cluster.routing.allocation.allow_rebalance:
Specify when shard rebalancing is allowed:
always - Always allow rebalancing.
indices_primaries_active - Only when all primaries in the cluster are allocated.
indices_all_active - (default) Only when all shards (primaries and replicas) in the cluster are allocated.
cluster.routing.allocation.cluster_concurrent_rebalance:
Controls how many concurrent shard rebalances are allowed cluster-wide.
Defaults to 2.
Sample curl to apply the desired settings:
curl -XPUT <elasticsearchserver>:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}'
You can replace transient with persistent if you want your settings to persist across restarts.
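The other two settings from the list above can be applied in the same request; a hedged sketch combining all three (the values shown are simply the defaults):
curl -XPUT <elasticsearchserver>:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all",
    "cluster.routing.allocation.allow_rebalance" : "indices_all_active",
    "cluster.routing.allocation.cluster_concurrent_rebalance" : 2
  }
}'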

Unexpected ElasticSearch shard allocation for a single replica with allocation disabled

We have a two-node environment, and there is certain data that we only want to store on the master node (as the other node is not highly available).
To do this, I've set the number of replicas to 0 and also set the following properties on the indices for which we do not want shard allocation to occur:
"index.routing.allocation.enable": "none",
"index.routing.allocation.rebalance": "none"
My expectation is that doing so will keep all 5 shards on the master node. However, as soon as I connect the worker node to the environment, 2 or 3 of the shards from each index are moved over to the worker node! How can I stop this from happening and keep all of the shards for the specified index on the master node? Thank you!
I think you need to use shard allocation filtering to specify which nodes are allowed to host the shards of a particular index.
https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-filtering.html
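A minimal sketch of what that could look like; my-index and my-master-node are placeholders for your actual index and master node name, and localhost for your endpoint:
curl -XPUT localhost:9200/my-index/_settings -d '{
  "index.routing.allocation.require._name": "my-master-node"
}'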

GC problems in ES 1.4.4

Frequent full GCs occur even though memory utilization is only 30-40%. We have not changed the default JVM settings.
Cluster details:
A master node
Two data nodes
All the replicas are allocated on one node and all the primaries are on the other node. That should not be a problem, right? Also, frequent full GC is observed on one of the 2 data nodes.
Shards -
We have around 75 shards with a replication factor of 1. I guess the problem is caused by shard over-allocation.
Most of the queries we run are aggregation queries.
