Decommissioning multiple Hadoop DataNodes in parallel

I'm replacing multiple machines in my Hadoop CDH 5.7 cluster.
I started by adding a few new machines and decommissioning the same number of existing DataNodes.
I noticed that blocks are marked as under-replicated when decommissioning a node.
Does it mean I'm at risk when decommissioning multiple nodes?
Can I decommission all nodes in parallel?
Is there a better way of replacing all machines?
Thanks!

It's obvious that when a node is down (or removed), its data becomes under-replicated.
When you add a new node and rebalance, this is automatically fixed.
What's actually happening?
Let's say the replication factor on your cluster is 3. When a node is decommissioned, all the data stored on it is gone, and the replication factor of that data is now 2 (hence under-replicated). When you add a new node and re-balance, the missing copy is made again, restoring the replication to the default.
Am I at risk?
Not if you are doing it one by one.
That is: replace a node, re-balance the cluster, and repeat. (I think this is the only way!)
If you just remove multiple nodes at once, there is a good chance of losing data, because you may lose all replicas of some blocks (those that resided on the removed nodes).
Don't decommission multiple nodes at once!
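For reference, a minimal sketch of that one-node-at-a-time cycle using the stock HDFS tooling; the excludes-file path and hostname are only examples, and on CDH the same steps are usually driven from Cloudera Manager:

# 1. Add the node to the excludes file referenced by dfs.hosts.exclude in hdfs-site.xml
echo "old-datanode-01.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Ask the NameNode to re-read the host lists; this starts the decommission
hdfs dfsadmin -refreshNodes

# 3. Wait until the node is reported as "Decommissioned" (not "Decommission in progress")
hdfs dfsadmin -report

# 4. After adding the replacement machine, spread the blocks evenly across the cluster
hdfs balancer -threshold 10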

Related

HDFS replication factor on single node cluster

Can I have more than one replica on a single node cluster? I have updated the replication factor to 2 in hdfs-site.xml and restarted all nodes, but still only one copy of each block is created for new files. Help me get clarity on this.
No, you can't have more than one replica of a block on a single node cluster. What makes you think that it is even possible?
Replication is the procedure for protecting your data so that you don't lose it in the worst conditions. If you set it to 2, that means you want your data copied on 2 nodes (machines), so that if one goes down you still have your data safe on another node.
Now, the default replication provided by Hadoop is 3, which means there will be 3 replicas (copies) of the data on 3 different nodes across different racks (that's another concept, called Hadoop's rack awareness).
So you won't be able to get more than one copy of your data on a single node cluster. I hope that clears your query!
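A quick way to see this on your own cluster is to ask HDFS where a file's blocks actually live (the paths below are only examples):

# Confirm the configured default replication
hdfs getconf -confKey dfs.replication

# Upload a file and inspect its block locations
hadoop fs -copyFromLocal file.txt /user/test/file.txt
hdfs fsck /user/test/file.txt -files -blocks -locations
# On a single node cluster fsck reports the blocks as under-replicated:
# only one live replica exists, because there is no second DataNode to hold the copy.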

What Hadoop will do after one of the DataNodes goes down

I have a Hadoop cluster with 10 DataNodes and 2 NameNodes, with replication configured to 3. I was wondering: if one of the DataNodes goes down, will Hadoop try to regenerate the lost replicas on the other alive nodes, or just do nothing (since there are still 2 replicas left)?
Also, what if the downed DataNode comes back after a while, can Hadoop recognize the data on that node? Thanks!
Will Hadoop try to regenerate the lost replicas on the other alive nodes, or just do nothing (since there are still 2 replicas left)?
Yes, Hadoop will recognize it and make copies of that data on some other nodes. When the NameNode stops receiving heartbeats from a DataNode, it assumes that DataNode is lost. To keep all the data at the defined replication factor, it will make copies on the other DataNodes.
Also, what if the downed DataNode comes back after a while, can Hadoop recognize the data on that node?
Yes, when a DataNode comes back with all its data, the NameNode will remove/delete the extra copies of that data. In its next heartbeat reply to the DataNode, the NameNode will send the instruction to remove the extra blocks and free up the space on disk.
Snippet from Apache HDFS documentation:
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
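If you want to watch this from the command line, a couple of stock HDFS commands show the NameNode's view of the failure (exact output wording varies by version):

# Which DataNodes the NameNode currently considers live, dead or decommissioned
hdfs dfsadmin -report

# How many blocks are currently below their target replication
hdfs fsck / | grep -i "under-replicated"
# Once the dead node rejoins, fsck briefly reports over-replicated blocks
# until the NameNode has told the DataNodes to delete the extra copies.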

What are the possible reasons behind the imbalance of files stored on HDFS?

Sometimes, data blocks are stored in an imbalanced way across the DataNodes. Based on the HDFS block placement policy, the first replica is favored to be stored on the writer node (i.e. the client node), the second replica is stored on a remote rack, and the third is stored on a different node on the same rack as the second. What are the use cases that make the data blocks unbalanced across the DataNodes under this placement policy? One possible reason that comes to mind is that if the writer nodes are few, then one replica of each data block will be stored on those nodes. Are there any other reasons?
Here are some potential reasons for data skew:
If some of the DataNodes are unavailable for some time (not accepting requests/writes), the cluster can end up unbalanced.
TaskTrackers are not collocated with DataNodes evenly across cluster nodes. If we write data through MapReduce in this situation, the cluster can be unbalanced because the nodes hosting both a TaskTracker and a DataNode would be preferred.
Same as above, but with the RegionServers of HBase.
Large deletion of data can result in an unbalanced cluster depending on the location of the deleted blocks.
Adding new DataNodes will not automatically rebalance existing blocks across the cluster.
The "hdfs balancer" command allows admins to rebalance the cluster. Also, https://issues.apache.org/jira/browse/HDFS-1804 added a new block storage policy that takes into account free space left on the volume.

What if I decommission one of the DataNodes in a cluster which only has two DataNodes?

I set up an HDFS cluster which has one master (NameNode) and two slaves (DataNodes),
and dfs.replication is set to "2",
so every block is replicated on the two slaves, and the files on the slaves are all the same.
My question is: if I want to decommission one of the two slaves, it always shows "Decommission In Progress", but no files are being copied (I used sar to monitor the network).
So I think that if the cluster only has two DataNodes and the replication is set to "2", I cannot decommission either DataNode, because decommissioning one would leave only one node, so files could no longer be replicated 2 times.
Do you think so?
I believe that with a replication factor of 2 in the cluster, if you decommission one DataNode, Hadoop will treat it like the crash of that DataNode and will continue working with the remaining DataNode. However, if you ever put that node back into the cluster, Hadoop will start replicating files to it again.
So you can have a replication factor of 2 with only one node in the cluster; it will not hamper the working of Hadoop in any way.
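As a side note that goes beyond the answer above: if the decommission hangs because the one remaining DataNode can never hold a second replica, one possible (redundancy-sacrificing) workaround is to lower the replication of the existing files first. A rough sketch:

# Drop every existing file to a single replica so one DataNode can satisfy the target
hadoop fs -setrep -w 1 /

# Then retry the decommission and watch its state
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -report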

HDFS replication factor

When I'm uploading a file to HDFS, if I set the replication factor to 1, will the file's splits reside on one single machine, or will the splits be distributed to multiple machines across the network?
hadoop fs -D dfs.replication=1 -copyFromLocal file.txt /user/ablimit
According to the Hadoop : Definitive Guide
Hadoop’s default strategy is to place the first replica on the same node as the client (for
clients running outside the cluster, a node is chosen at random, although the system
tries not to pick nodes that are too full or too busy). The second replica is placed on a
different rack from the first (off-rack), chosen at random. The third replica is placed on
the same rack as the second, but on a different node chosen at random. Further replicas
are placed on random nodes on the cluster, although the system tries to avoid placing
too many replicas on the same rack.
This logic makes sense as it decreases the network chatter between the different nodes. But the book was published in 2009, and there have been a lot of changes in the Hadoop framework since.
I think it depends on whether the client is also a Hadoop node or not. If the client is a Hadoop node, then all the splits will be on the same node. This doesn't provide any better read/write throughput in spite of having multiple nodes in the cluster. If the client is not a Hadoop node, then a node is chosen at random for each split, so the splits are spread across the nodes of the cluster, which does provide better read/write throughput.
One advantage of writing to multiple nodes is that even if one of the nodes goes down, only a couple of splits are lost, and at least some data can be recovered from the remaining splits.
If you set replication to 1, then the file will be present only on the client node, that is, the node from which you are uploading the file.
If your cluster is a single node, then when you upload a file it will be split according to the block size and will remain on that single machine.
If your cluster is multi-node, then when you upload a file it will be split according to the block size and distributed to different DataNodes in your cluster via the write pipeline; the NameNode decides where the data should be placed in the cluster.
The HDFS replication factor is used to make copies of the data, i.e. if your replication factor is 2 then all the data you upload to HDFS will have one extra copy.
If you set the replication factor to 1, that effectively corresponds to a single node cluster, where there is only the one (client) node onto which you upload files and then use them: http://commandstech.com/replication-factor-in-hadoop/
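One way to settle the question empirically is to upload the file with replication 1 and then ask the NameNode where each block actually landed (reusing the path from the question; the exact file name is illustrative):

# Upload with a single replica per block
hadoop fs -D dfs.replication=1 -copyFromLocal file.txt /user/ablimit

# List every block of the file together with the DataNode holding it
hdfs fsck /user/ablimit/file.txt -files -blocks -locations
# If the client runs on a DataNode, all block locations point at that node;
# if the client is outside the cluster, the blocks are spread across random nodes.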
