I am running a three-node Elasticsearch (ELK) cluster. All nodes have the same roles (data, master, etc.). The disk on node 3 where the data folder lives became corrupted, and that data is probably unrecoverable. The other nodes are running normally, and one of them has assumed the master role.
Will the cluster work normally if I replace the disk and make the empty directory available to elastic again, or am I risking crashing the whole cluster?
EDIT: Since this is not explicitly mentioned in the answer: yes, if you add your node with an empty data folder, the cluster will continue normally, as if you had added a new node to the cluster, but you have to deal with the missing data. In my case, I lost the data, as I do not have replicas.
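For anyone in the same situation, here is a minimal sketch (assuming the cluster answers on localhost:9200 and an index named my-index, both placeholders) of how I could have added a replica to each index beforehand, so a single lost data directory would not mean lost data:

import requests

ES = "http://localhost:9200"  # placeholder address, adjust to your cluster

# Add one replica per shard to an existing index (placeholder name "my-index").
# With at least one replica, losing a single node's data directory should not
# lose the data, since a replica on another node can be promoted to primary.
resp = requests.put(
    f"{ES}/my-index/_settings",
    json={"index": {"number_of_replicas": 1}},
)
print(resp.status_code, resp.json())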
Let me try to explain this in a simple way.
Your data got corrupted on node-3, so if you add that node again, it will not have the older data, i.e. the shards that were stored on node-3 will remain unavailable to the cluster.
Did you have the replica shards configured for the indexes?
What is the current status (yellow/red) of the cluster with node-3 removed?
If a primary shard isn't available, the master node promotes one of the active replicas to become the new primary. If there are currently no active replicas, the status of the cluster will remain red.
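To see how badly you are affected before (and after) re-adding node-3, here is a minimal sketch, assuming the cluster is reachable on localhost:9200 (a placeholder), that prints the cluster status and every shard that is not started:

import requests

ES = "http://localhost:9200"  # placeholder address

# Overall cluster status: green / yellow / red
health = requests.get(f"{ES}/_cluster/health").json()
print("status:", health["status"],
      "unassigned shards:", health["unassigned_shards"])

# List shards that are not started; with no replicas, primaries that lived
# only on node-3 will show up here as UNASSIGNED and the cluster stays red.
shards = requests.get(f"{ES}/_cat/shards", params={"format": "json"}).json()
for s in shards:
    if s["state"] != "STARTED":
        print(s["index"], s["shard"], s["prirep"], s["state"])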
Related
Druid cluster shows certain segments of a data source as unavailable after data ingestion.
Ex: 72.4% available (2352 segments, 647 segments unavailable)
We have a clustered deployment with 3 nodes:
Master node (coordinator and overlord)
Data node (historical and middlemanager)
Query node (broker and router)
Is there any specific reason why this is happening?
The issue was resolved after a clean restart of the master and data nodes. However, just restarting the nodes without cleaning the data did not work.
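For reference, here is a rough sketch of how one could list the unavailable segments before restarting, using Druid's SQL endpoint on the router (the host/port below is a placeholder); the sys.segments system table exposes an is_available flag:

import requests

DRUID_SQL = "http://localhost:8888/druid/v2/sql"  # placeholder router address

# Query the sys.segments system table for segments that are published
# but not currently served by any historical.
query = """
SELECT datasource, COUNT(*) AS unavailable_segments
FROM sys.segments
WHERE is_published = 1 AND is_available = 0
GROUP BY datasource
"""
resp = requests.post(DRUID_SQL, json={"query": query})
for row in resp.json():
    print(row["datasource"], row["unavailable_segments"])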
Our Elasticsearch cluster has two data directories. We recently restarted all the nodes in the cluster. After the successful restart, we observed increased disk space usage on a few nodes. When we examined the folders inside the data directories, we found orphaned shards.
For example, an orphaned shard "15" exists at data_dir0/cluster_name/nodes/0/indices/index_name/15, while one of the replicas of the same shard "15" exists on the same node inside the other data directory, at data_dir1/cluster_name/nodes/0/indices/index_name/15. The shard "15" from data_dir1 is also included in the cluster metadata, and thus we assume that the shard "15" from data_dir0 is an orphaned shard and should have been deleted by Elasticsearch. But Elasticsearch hasn't deleted the orphaned shard yet, even 6 days after the last restart.
We found this topic https://discuss.elastic.co/t/old-shards-on-re-joining-nodes-useful/182661 relating to our issue, but it did not help us, as Elasticsearch did not take care of that orphaned shard. We also raised the question on the Elastic forum, but we are not getting quick replies, so I am asking it here since Stack Overflow has a larger community.
This also happened to our cluster; we run Elasticsearch 6.1.3. One specific node had 88% of its disk used, and it seems there were some shard leftovers from a previous relocation of our production index.
To fix this, I stopped Elasticsearch on the node (make sure you have plenty of disk space on your other data nodes) and let Elasticsearch's relocation do its work. Once it was done and rebalanced, I deleted the index folder and started Elasticsearch again; this went quite painlessly.
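As a rough illustration of the waiting step, here is a sketch (localhost:9200 is a placeholder) that polls cluster health until relocation has finished and the cluster is back to green before you touch anything on disk:

import time
import requests

ES = "http://localhost:9200"  # placeholder address

# Poll cluster health until there are no relocating/initializing shards left
# and the status is green, i.e. rebalancing after stopping the node is done.
while True:
    health = requests.get(f"{ES}/_cluster/health").json()
    print(health["status"],
          "relocating:", health["relocating_shards"],
          "initializing:", health["initializing_shards"])
    if (health["status"] == "green"
            and health["relocating_shards"] == 0
            and health["initializing_shards"] == 0):
        break
    time.sleep(30)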
What version of Elasticsearch are you running?
Is your cluster green? If so, those shard files should be deleted by Elasticsearch during initialization. But if that shard had unallocated replicas at the time the node rejoined the cluster, Elasticsearch won't remove the pre-existing shard files on disk.
You can manually delete the directory if you don't need the shard. Or you can try restarting Elasticsearch on the node and let it delete the files for you.
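If you want to double-check before deleting anything, here is a minimal sketch, assuming a node name of node-1 and the data paths from the question (all placeholders), that compares the shards Elasticsearch has actually allocated to that node with the shard directories found on disk:

import os
import requests

ES = "http://localhost:9200"          # placeholder address
NODE_NAME = "node-1"                  # placeholder node name
DATA_DIRS = [                         # placeholder paths, as in the question
    "/data_dir0/cluster_name/nodes/0/indices",
    "/data_dir1/cluster_name/nodes/0/indices",
]

# Shards the cluster currently assigns to this node, keyed by (index, shard).
allocated = set()
for s in requests.get(f"{ES}/_cat/shards", params={"format": "json"}).json():
    if s["node"] == NODE_NAME:
        allocated.add((s["index"], s["shard"]))

# Shard directories present on disk; anything not in `allocated` is a
# candidate orphan (verify manually before removing it). Note that on newer
# Elasticsearch versions these directories are index UUIDs, not index names.
for data_dir in DATA_DIRS:
    for index in os.listdir(data_dir):
        index_path = os.path.join(data_dir, index)
        for shard in os.listdir(index_path):
            if shard.isdigit() and (index, shard) not in allocated:
                print("possible orphan:", os.path.join(index_path, shard))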
We also got help from the Elastic forum here: https://discuss.elastic.co/t/old-shards-not-deleted-upon-relocation/71161/6
Restarting the node did not help and we do not want to manually delete the folders. So, we are going to replace the affected nodes one by one.
@chani It would be great if you could provide an official link for the manual-delete suggestion.
When you have a one-node cluster and you create a table with 32 shards, and then you add, say, 7 more nodes to the cluster, will those shards automatically migrate to the rest of the cluster so that I have 4 shards per node?
Is manual intervention required for this?
How about the replicas created on the one node? Do those migrate to other nodes as well?
Nothing will be automatically redistributed. In current versions of RethinkDB, changing the number or distribution of replicas or changing shard boundaries will cause a loss of availability, so you have to explicitly ask for it to happen (either in the web UI or with the command-line administration tool).
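Here is a minimal sketch with the Python driver (the table name foo, the connection details, and the replica count are placeholders) showing the explicit reconfigure call this refers to:

import rethinkdb as r  # older driver import style; newer drivers use `from rethinkdb import r`

# Placeholder connection details; adjust host/port/db to your cluster.
conn = r.connect(host="localhost", port=28015, db="test")

# Nothing moves on its own after adding nodes: you explicitly ask RethinkDB
# to spread the existing 32 shards (and their replicas) over the new servers.
result = r.table("foo").reconfigure(shards=32, replicas=2).run(conn)
print(result)

# Check how the shard replicas ended up being distributed across the servers.
status = r.table("foo").status().run(conn)
for shard in status["shards"]:
    print([replica["server"] for replica in shard["replicas"]])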
Below I have 2 nodes on two servers, with 2 indexes each. The indexes are distributed across 2 shards with 1 replica.
The "Thor" node had some downtime, so "Iron_man" took over. That's fine.
As you can see, events_v1 is an index created before the downtime and venue_v1 was created after the downtime. Shouldn't "Thor", after coming back up, automatically take over one shard, the same way it handles the newly created venue index?
If yes how should I configure the settings?
You don't need to configure anything for the above scenario, because Elasticsearch's default behavior is exactly what you are asking for: a replica is never allocated on the same node as its primary, so on a single node it stays unassigned. When you add a new node, the replica shard will be allocated to the newly added node.
For more information, watch this video.
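To see this default behavior yourself, here is a small sketch (the index name events_v1 is taken from the question; localhost:9200 is a placeholder) that prints which node each primary and replica copy of the index ended up on:

import requests

ES = "http://localhost:9200"  # placeholder address

# Show where every shard copy of events_v1 lives; a replica is never placed
# on the same node as its primary, so it stays unassigned until a second
# node (e.g. "Thor" coming back) is available to hold it.
shards = requests.get(f"{ES}/_cat/shards/events_v1",
                      params={"format": "json"}).json()
for s in shards:
    print(f"shard {s['shard']} {s['prirep']}: {s['state']} on {s.get('node')}")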
We have an ES cluster with 2 nodes. When we delete an index, not all of its folders in the cluster (on the filesystem) are deleted, which causes some problems when restarting one server.
Then our deleted indices get distributed in some weird state, indicating that the cluster health is not green.
Example: we delete the index named someIndex, and after deletion, checking the filesystem, one can see this:
Node1
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\
Node2
ElasticSearch\data\clustername\nodes\0\indices\
ElasticSearch\data\clustername\nodes\1\indices\someIndex (<-- still present)
Does anyone know what's causing this?
ES-version: 0.90.5
There are two node directories on each of your servers' filesystems (these are nodes\0 and nodes\1).
When you start Elasticsearch, you start up a node (in ES lingo). Your machine can host multiple nodes, which happens if you start Elasticsearch multiple times. The default setting for the HTTP port is the range 9200-9300; that means ES looks for a free port in that range and binds its node to it (the same is true for the transport module with 9300-9400).
So, if you start an ES process while another is still running (that is, still bound to a port), you start a second node, and ES will create a new directory for it. Maybe this happened when you issued a restart but ES couldn't shut down in time before the new node started up.
But now you have a third node in your cluster, and ES will assign shards to it. Then you do a cluster restart or something similar and start one node on each of your machines. ES cannot find the shards that were assigned to the third node, because it isn't spun up, and it will show you a red or yellow state, depending on which shards live on the third node. If you delete your index data, you won't delete the data from this missing node.
If you don't care about the data, you can just shut down ES and delete these directories, or start two ES nodes on each of your machines and then delete the index again.
Then you could change the port settings to one specific port; that would prevent a second process from starting up, since it won't be able to bind to a free port.
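To confirm whether a stray second node process is (or was) part of the cluster, here is a minimal sketch, assuming the cluster answers on localhost:9200 (a placeholder); on very old versions such as 0.90 the same information is also available under /_cluster/nodes:

import requests
from collections import Counter

ES = "http://localhost:9200"  # placeholder address

# List all nodes the cluster currently knows about; more than one node per
# host usually means a second Elasticsearch process was started by accident.
nodes = requests.get(f"{ES}/_nodes").json()["nodes"]
hosts = Counter()
for node_id, info in nodes.items():
    print(node_id, info.get("name"), info.get("host"))
    hosts[info.get("host")] += 1

for host, count in hosts.items():
    if count > 1:
        print(f"WARNING: {count} nodes running on {host}")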