Can a node have more than one shard in Elasticsearch? - elasticsearch

I am reading "Elasticsearch: The Definitive Guide" and I would like to confirm something.
When we create an index, it will be assigned to 5 shards by default (or we can use the "number_of_shards" setting).
But if I am using just one node (one server), will the index be spread into 5 shards in the same node? I guess what I am asking is - can a node have multiple shards?

Yes, a node can hold multiple shards of one or more indices. You can verify this yourself by running the GET _cat/shards?v command, which is described in the cat shards API documentation. The drawback of a single-node Elasticsearch cluster is that replica shards will not be allocated (primary shards will be), because it makes no sense to keep a primary and its replica on the same machine.
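To make the check concrete, here is a minimal sketch that counts shards per node from the text that GET _cat/shards?v returns. The sample output is made up for illustration (index name, sizes, IP and node name are all assumptions), and the parsing assumes the default column order with the node name as the last column.

```python
from collections import Counter

# Made-up sample of `GET _cat/shards?v` output for a single-node cluster.
# Index name, doc counts, sizes, IP and node name are illustrative only.
SAMPLE = """\
index shard prirep state      docs store ip        node
books 0     p      STARTED       3  10kb 127.0.0.1 node-1
books 0     r      UNASSIGNED
books 1     p      STARTED       2   8kb 127.0.0.1 node-1
books 1     r      UNASSIGNED
"""


def shards_per_node(cat_output: str) -> Counter:
    """Count STARTED shards per node, plus UNASSIGNED shards, from _cat/shards text."""
    counts = Counter()
    lines = cat_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        state = fields[3]
        if state == "STARTED":
            counts[fields[-1]] += 1  # node name is the last column
        elif state == "UNASSIGNED":
            counts["(unassigned)"] += 1
    return counts


if __name__ == "__main__":
    print(shards_per_node(SAMPLE))
```

In this sample the single node holds both primaries while both replicas stay unassigned, which is the single-node behaviour described above.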

Related

ElasticSearch - Restrict primary and replica shards count on nodes

I have an Elasticsearch 7.16.2 cluster running with three nodes (2 master, 1 voting node). An index has two primary shards and two replicas, and when a node restarts, both primary shards move to a single node. How can I restrict the index so that each node holds one primary shard and one replica?
You can use the index-level shard allocation settings to achieve that, but it is not straightforward: the settings are somewhat complex and can cause further imbalance when the nodes and indices in the cluster change.
To avoid the problem on node restart, disable shard allocation and shard rebalancing before restarting nodes in the cluster, then set both values back to "all" once the node has rejoined.
Command to disable allocation
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
Command to disable rebalance
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none"
  }
}
Apart from that, you can use the reroute API to manually move the shards to a node in Elasticsearch to fix your current shard allocation.
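As a rough sketch of the request bodies involved, the helpers below build the JSON for PUT /_cluster/settings (with "none" to disable and "all" to re-enable) and for a single move command sent to POST /_cluster/reroute. The function names, the index name and the node names are hypothetical; nothing here talks to a cluster, it only prints the bodies you would send.

```python
import json


def routing_settings(value: str) -> dict:
    """Body for PUT /_cluster/settings toggling allocation and rebalancing."""
    if value not in ("all", "none"):
        raise ValueError("this sketch only handles 'all' and 'none'")
    return {
        "persistent": {
            "cluster.routing.allocation.enable": value,
            "cluster.routing.rebalance.enable": value,
        }
    }


def move_command(index: str, shard: int, from_node: str, to_node: str) -> dict:
    """Body for POST /_cluster/reroute that moves one shard copy between nodes."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical index and node names, for illustration only.
    print(json.dumps(routing_settings("none"), indent=2))  # before the restart
    print(json.dumps(routing_settings("all"), indent=2))   # after the node is back
    print(json.dumps(move_command("my-index", 0, "node-1", "node-2"), indent=2))
```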
The setting is index.routing.allocation.total_shards_per_node, but there is a problem. First of all, I assume you have three data nodes (if you don't, add more data nodes).
The problem is that you have 4 shards in total (primaries plus replicas), so with three nodes one node must hold two of them. You therefore cannot set index.routing.allocation.total_shards_per_node to 1; it must be at least 2, which does not solve your problem.
The setting is dynamic: https://www.elastic.co/guide/en/elasticsearch/reference/master/increase-shard-limit.html
You can also set cluster.routing.allocation.total_shards_per_node for the whole cluster.
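The arithmetic behind "it must be at least 2" can be sketched as follows. The function name is made up for this example, and it assumes the shards are spread as evenly as possible over the data nodes.

```python
import math


def min_total_shards_per_node(primaries: int, replicas: int, data_nodes: int) -> int:
    """Smallest per-node cap that still lets every shard copy be assigned.

    `replicas` is the number of replica copies per primary.
    """
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / data_nodes)


if __name__ == "__main__":
    # 2 primaries with 1 replica each (4 shards in total) on 3 data nodes.
    print(min_total_shards_per_node(2, 1, 3))
    # The same index on 4 data nodes.
    print(min_total_shards_per_node(2, 1, 4))
```

With three data nodes the cap cannot go below 2, so one node can still end up with both primaries; a cap of 1 only becomes possible with four data nodes.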

Why does no shard become primary on the second node when it is up again?

I have configured a two-node cluster and created an index with 4 shards and 1 replica.
Elasticsearch created 2 primary shards on each node; this is how it looks in the head plugin. Shards 1 and 3 are primary on node 1 (stephen1c7), and shards 0 and 2 are primary on node 2 (stephen2c7).
Shut down one node
I then shut down node 2 (stephen2c7) to see whether all the shards on node 1 (stephen1c7) would become primary. Yes, all shards are now primary.
Bring the node back up
Next I brought node 2 (stephen2c7) up again to see whether any shards on it would become primary. Surprisingly, no shard on this node became primary. I waited a long time, but still no shard is primary on node 2.
Why is that?
Is there a setting that makes shards primary again after a node comes back up?
Thanks in advance!
Given this post and this one (albeit slightly old), balancing primary and replica shards across the cluster does not seem to be a priority in an Elasticsearch cluster. As you can see, every shard has both a primary and a replica assigned, so the cluster considers its state satisfactory.
What I would suggest is to look at the shard balancing heuristics and adjust those values until you get a satisfactory result (as is often the case with Elasticsearch, testing several parameters is what will yield the best configuration for your architectural design choices).
Note that if you start tuning the shard balancing heuristics, you might not get good results if you use shard filtering or forced awareness at the same time.

If a shard goes down, then, after re-allocating that shard, will data residing in that shard be retrievable

I have manually allocated 3 primary shards to a particular node in ElasticSearch. The replicas of these shards reside in different nodes. Now, let's say, primary shard number 2 goes down (for example, due to overflow of data) without the node on which it is residing going down. Then is it possible to retrieve the data residing on that particular shard, after I manually re-allocate it to a different node? If yes, how?
Yes.
Once the node with primary shard number 2 goes down then the replica shard on the other node will be upgraded to a primary shard - allowing you to retrieve the data. See here:
Coping with failure (ES Definitive Guide)

elasticsearch undefined index and how to get rid of it

I am seeing the following index shown as Unassigned, which is very annoying. How do I get rid of it?
Those unassigned shards are the replicas of the primary shards held on your master node.
The main purpose of replicas is for failover: if the node holding a primary shard dies, then a replica is promoted to the role of primary.
At index time, a replica shard does the same amount of work as the primary shard. New documents are first indexed on the primary and then on any replicas. Increasing the number of replicas does not change the capacity of the index.
However, replica shards can serve read requests. If, as is often the case, your index is search-heavy, you can increase search performance by increasing the number of replicas, but only if you also add extra hardware.
To get these shards assigned, you need to run another instance of Elasticsearch so that a second node can hold the replicas. (That node can be master-eligible or just a workhorse; you can set this in the Elasticsearch config files.)
For more details about it you can refer to the official documentation and the Elasticsearch Definitive Guide (the work on it is still in progress but you will find what you are looking for here)
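To see why a second node clears the unassigned replicas, here is a small calculation. It relies only on the rule that two copies of the same shard are never placed on the same node, and it ignores every other allocation constraint (disk watermarks, allocation filters and so on). The function name is made up for this sketch.

```python
def copy_counts(primaries: int, replicas: int, data_nodes: int) -> tuple[int, int]:
    """Return (assigned, unassigned) shard copies for one index.

    Assumes at least one data node and no constraint other than
    "no two copies of the same shard on one node".
    """
    copies_per_shard = 1 + replicas
    placeable = min(copies_per_shard, data_nodes)
    assigned = primaries * placeable
    unassigned = primaries * (copies_per_shard - placeable)
    return assigned, unassigned


if __name__ == "__main__":
    print(copy_counts(5, 1, 1))  # one node: the replicas have nowhere to go
    print(copy_counts(5, 1, 2))  # two nodes: every copy can be placed
```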

Elasticsearch with two nodes and the default 5 shards?

I have set up a cluster with two nodes, but I am confused about shards and replicas.
What I intend is a setup where there is a master(node A) handling write and a slave(node B) that helps with read and search operation. Ideally if the master is not functional I can recover the data from the slave.
I read that the default is 5 shards and 1 replica. Does that mean my primary data would be automatically split between node A and node B? Would that mean that if one node is down I would lose half the data?
Given the description of my need above, am I doing it right?
The only config I have changed at this point is the following
cluster:
  name: maincluster
node:
  name: masternode
  master: true
I am really new to elasticsearch and please kindly point out if I am missing anything.
5 shards and 1 replica means that your data will be split into 5 shards per index.
Each shard will have one replica (5 more backup shards) for a total of 10 shards spread across your set of nodes.
The replica shard will be placed onto a different node than the primary shard (so that if one node fails you have redundancy).
With 2 nodes and replication set to 1 or more, you still have access to all of your data after losing a node, since a primary shard and its replica are never placed on the same node.
I would install the Elasticsearch head plugin; it provides a graphical view of nodes and shards (primary and replica).
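The redundancy argument can be illustrated with a toy simulation. The placement rule below is a simple round-robin invented for this sketch; it is not Elasticsearch's real allocator and only respects the rule that copies of the same shard land on different nodes. The node names A and B follow the question.

```python
def place(primaries: int, replicas: int, nodes: list[str]) -> dict[str, set[int]]:
    """Toy placement: copy c of shard s goes to nodes[(s + c) % len(nodes)]."""
    layout = {node: set() for node in nodes}
    copies = min(1 + replicas, len(nodes))  # extra copies would stay unassigned
    for shard in range(primaries):
        for copy in range(copies):
            layout[nodes[(shard + copy) % len(nodes)]].add(shard)
    return layout


def shards_available(layout: dict[str, set[int]], failed_node: str) -> set[int]:
    """Shard numbers that still have at least one copy after a node fails."""
    survivors = set()
    for node, shards in layout.items():
        if node != failed_node:
            survivors |= shards
    return survivors


if __name__ == "__main__":
    with_replica = place(5, 1, ["A", "B"])
    print(shards_available(with_replica, "A"))  # all five shards survive

    without_replica = place(5, 0, ["A", "B"])
    print(shards_available(without_replica, "B"))  # only the shards on A remain
```

With 1 replica every shard survives the loss of either node; with 0 replicas the shards that lived on the failed node are gone, which is the "lose half the data" case from the question.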
