Relocating shards in Elasticsearch

Step 1) Create a node named "NODE1".
Step 2) Create a new index on NODE1 named "application", with the index type "testing".
Step 3) The index is created with 5 shards and no replicas.
Step 4) Insert 5 documents into the index; they get split among the 5 shards.
Step 5) Start a new node called "NODE2" in NODE1's cluster.
Step 6) As per my understanding, the shards get shared between the nodes, so 2 of my shards moved to the new node.
Question 1) Now I request from NODE1 a document that sits in a relocated shard (a shard that moved from NODE1 to NODE2).
Question 2) Will my search return the requested document or not?
Question 3) How do the two nodes communicate with each other?
Question 4) Can I read and write on NODE2? If yes, can I search from NODE1 the same data written through NODE2?
Thanks in advance!

All the answers are yes :)
The nodes communicate with each other through the transport port, by default port 9300 (or the first free one in the 9300-9400 range). They use a custom binary protocol based on serialization of objects (not standard Java serialization in most cases, though).
Any node in a cluster is cluster-aware: the nodes all share the so-called cluster state, so every node knows where the shards are. You can send requests (read and write) to any node, and the request will be rerouted to the relevant nodes and properly executed depending on the type of request.
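To illustrate, assuming default ports and the index and type names from the question, either node should return the same document regardless of which node holds the shard (a sketch, not run against a live cluster; hostnames are made up):

```shell
# Index a document through NODE1 (default HTTP port 9200 assumed)
curl -XPUT 'http://node1:9200/application/testing/1' -d '{"user": "test"}'

# Fetch the same document through NODE2; the request is rerouted
# internally over the transport port to the node holding the shard
curl -XGET 'http://node2:9200/application/testing/1'
```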

Related

Can you run an elasticsearch data node after deleting the data folder?

I am running a three-node Elasticsearch (ELK) cluster. All nodes have the same roles, e.g. data, master, etc. The disk on node 3 holding the data folder became corrupt, and that data is probably unrecoverable. The other nodes are running normally, and one of them has assumed the master role instead.
Will the cluster work normally if I replace the disk and make the empty directory available to Elastic again, or am I risking crashing the whole cluster?
EDIT: As this is not explicitly mentioned in the answer: yes, if you add your node with an empty data folder, the cluster will continue normally as if you had added a new node to the cluster, but you have to deal with the missing data. In my case I lost the data, as I do not have replicas.
Let me try to explain it in a simple way.
Your data got corrupted on node-3, so if you add that node again it will not have the older data, i.e. the shards stored on node-3 will remain unavailable to the cluster.
Did you have replica shards configured for the indexes?
What is the current status (yellow/red) of the cluster with node-3 removed?
If a primary shard isn't available, the master node promotes one of the active replicas to become the new primary. If there are currently no active replicas, the status of the cluster will remain red.
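You can check both of those things with the standard health and cat APIs (a sketch, assuming the default local port):

```shell
# Overall cluster status: green, yellow (missing replicas) or red (missing primaries)
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

# Per-shard view: shows which shards are STARTED and which are UNASSIGNED,
# and how many replicas each index has
curl -XGET 'http://localhost:9200/_cat/shards?v'
```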

How to add dedicated master node to existing elasticsearch cluster

We have 6 Elasticsearch 6.4 nodes; 3 of them are master-eligible and perform both master and data node operations.
We are thinking of adding 3 dedicated masters, as we sometimes see high resource utilization on the 3 master/data nodes and worry that one of them might crash during working hours some day.
We are looking for a procedure to add the 3 new dedicated master servers to the existing cluster and to turn the current 3 master/data nodes into data-only nodes.
We found our procedure in the link below.
https://discuss.elastic.co/t/introduction-of-dedicated-master-nodes/43601
We followed the steps below (except disabling the HTTP port) as mentioned in the post:
shut down the cluster
modify the current 5 nodes with master: false and data: true
set the 3 new nodes to master: true and data: false
modify all nodes to discover the cluster using the 3 new master nodes' addresses
optionally disable the HTTP port on the master nodes so they do not receive REST requests
start the cluster
We are still in the experimental stage, so a full cluster restart is not an issue for us; that said, the link also discusses how to add dedicated masters dynamically and avoid split-brain issues.
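For Elasticsearch 6.x, the steps above translate into per-node settings in elasticsearch.yml along these lines (a sketch; hostnames are assumptions, adjust to your environment):

```yaml
# elasticsearch.yml on the 3 new dedicated master nodes
node.master: true
node.data: false

# elasticsearch.yml on the existing nodes, now data-only:
#   node.master: false
#   node.data: true

# On all nodes: point discovery at the new masters
# discovery.zen.ping.unicast.hosts: ["master1", "master2", "master3"]

# With 3 master-eligible nodes, (3 / 2) + 1 = 2 avoids split brain
# discovery.zen.minimum_master_nodes: 2
```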

elasticsearch: Possible to change number of replicas after system is running?

elasticsearch 1.7.2 on CentOS
3-node cluster
This question is about how to manage the ES config via modifications to elasticsearch.yml plus a restart of the elasticsearch service (not via the API).
Out of the box, the config is:
index.number_of_replicas: 1
So on a 3-node cluster, any 2 nodes hold the whole package.
If I want any 1 node to be complete, I would set:
index.number_of_replicas: 2
a) Correct?
b) Can I just walk up to an existing setup and make this change?
c) And can I adjust it up to 2 and down to 1 whenever I like? (Up to make each node a possible standalone, down to save disk space.)
The number of replicas can be changed at any point in time; you can increase or decrease it dynamically. There is a good example shown here.
Also please note that you can't change the number of shards after index creation, but the number of replicas is open to change via the index settings API.
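Via the index settings API, the replica change is a one-liner (a sketch; the index name is made up, the endpoint is the standard update-settings API):

```shell
# Raise the replica count to 2 on a running cluster; takes effect immediately,
# and the cluster goes yellow until the new replicas are allocated
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": {"number_of_replicas": 2}
}'
```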
FWIW, another way to do this (which I have now proven out) is to update the yml file (elasticsearch.yml). Change the element:
index.number_of_replicas: 2
up or down as desired, and restart the elasticsearch service:
service elasticsearch restart
The cluster will go yellow while the replicas are being created/moved, and then go green.

how we decide in which node we should store document in elasticsearch?

Since I am new to ES, I need help.
I read that it is possible to specify the shard where a document is stored using 'routing'. But is it possible to restrict a document to a particular node?
Suppose I have two nodes, node1 and node2. My requirement is that if I add a document through node1 to the index 'attenadance', its primary shard should be stored on node1 and the replica may be on node2; likewise, if I add a document through node2 to the index 'attenadance', its primary shard should be on node2 and the replica may be on node1. Please advise me: is this possible in ES? If yes, please tell me how to achieve it.
Each node has a specific rack value set in its config.yml: node 1 has the setting node.rack: rack1, node 2 has node.rack: rack2, and so on.
We can then create an index that will only be allocated to nodes whose rack is set to rack1, by setting index.routing.allocation.include.rack to rack1. For example:
curl -XPUT localhost:9200/test/_settings -d '{
"index.routing.allocation.include.rack" : "rack1"
}'
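For completeness, the node-side half of this setup lives in each node's elasticsearch.yml (a sketch matching the attribute names above; the attribute name "rack" is arbitrary):

```yaml
# elasticsearch.yml on the first node
node.rack: rack1

# elasticsearch.yml on the second node:
#   node.rack: rack2
```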
Further reference:
the official Elasticsearch documentation
You don't control where shards/replicas go; Elasticsearch handles that. In general, it won't put a replica of a shard on the same node as the shard itself. There is a really good explanation of how it all works here: Shards and replicas in Elasticsearch
There is also good documentation on using shard routing on the Elasticsearch blog, if you need to group data together (but be careful, because it can generate hot spots).
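As a sketch of custom routing (index name and routing value are made up): documents indexed with the same routing value land on the same shard, and reads with that routing value only touch that one shard:

```shell
# Index a document with an explicit routing value
curl -XPUT 'http://localhost:9200/attendance/doc/1?routing=user1' -d '{"msg": "hi"}'

# A get with the same routing value is served from that single shard;
# omitting it would make ES fall back to routing by document id
curl -XGET 'http://localhost:9200/attendance/doc/1?routing=user1'
```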

Remove of data folder is not synced in Elasticsearch upon index delete

We have an ES cluster with 2 nodes. When we delete an index, not all of its folders in the cluster (on the filesystem) are deleted, which causes some problems when restarting one server.
Our deleted indices then get redistributed in some weird state, leaving the cluster health not green.
Example: we delete an index named someIndex, and after the deletion we check the filesystem and see this:
Node1
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\
Node2
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\someIndex (<-- still present)
Anyone know what's causing this?
ES-version: 0.90.5
There are two node directories on the filesystem of each of your machines (these are nodes\0 and nodes\1).
When you start Elasticsearch, you start up a node (in ES lingo). Your machine can host multiple nodes, which happens if you start Elasticsearch multiple times. The default setting for the HTTP port is the range 9200-9300; that means ES looks for a free port in that range and binds its node to it (the same is true for the transport module with 9300-9400).
So if you start an ES process while another is still running, i.e. still bound to a port, you start a second node, and ES will create a new data directory for it. Maybe this happened when you issued a restart but ES couldn't shut down in time before the new node started up.
But now you have an extra node in your cluster, and ES will assign shards to it. Then you do a cluster restart or something similar and start one node on each of your machines. ES cannot find the shards that were assigned to the extra node, because it isn't running, and it will show a red or yellow state depending on which shards live on that node. If you delete your index, you won't delete the data from this missing node.
If you don't care about the data, you can just shut down ES and delete those directories, or start two ES nodes on each of your machines and then delete the index again.
You could then change the port settings to one specific port; that would prevent second processes from starting up, since they wouldn't be able to bind to a free port.
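Pinning the ports would look like this in elasticsearch.yml (a sketch; with a single fixed port, a second process on the same machine fails to start instead of silently becoming an extra node):

```yaml
http.port: 9200
transport.tcp.port: 9300
```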