How often are blocks on HDFS replicated? - hadoop

I have a question regarding Hadoop HDFS block replication. Suppose a block is written to a DataNode and the DFS has a replication factor of 3: how long does it take for the NameNode to replicate this block to other DataNodes? Is it instantaneous? If not, and the disk on this DataNode fails irrecoverably right after the block is written, does that mean the block is lost forever? Also, how often does the NameNode check for missing/corrupt blocks?

You may want to review this article, which has a good description of HDFS writes. Replication should be effectively immediate, depending on how busy the cluster is:
https://data-flair.training/blogs/hdfs-data-write-operation/
What happens if DataNode fails while writing a file in the HDFS?
While writing data to the DataNode, if the DataNode fails, the following actions take place, transparently to the client writing the data.
The pipeline gets closed, and packets in the ack queue are added to the front of the data queue so that DataNodes downstream from the failed node do not miss any packets.
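If you want to see for yourself how quickly replicas show up after a write, a quick check (a rough sketch, assuming a stock HDFS installation and an example path /tmp/sample.txt) is:
# Write a small file and immediately inspect its block replicas
hadoop fs -put sample.txt /tmp/sample.txt
# fsck lists each block, how many replicas exist, and on which DataNodes they live
hdfs fsck /tmp/sample.txt -files -blocks -locations
With the default pipeline, all three replicas are normally written before the put returns, so fsck should already report the full replica count unless the cluster is overloaded.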

Related

In Cloudera Manager, how to migrate data off a removed DataNode

I have excluded the DataNode host "dn001" via "dfs_hosts_exclude.txt", and it works. How can I also migrate the data from "dn001" to the other DataNodes?
You shouldn't have to do anything. Hadoop's HDFS should re-replicate any data lost on your data node.
From HDFS Architecture - Data Disk Failure, Heartbeats and Re-Replication
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
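Outside of Cloudera Manager, the standard way to drain a DataNode so its blocks are copied elsewhere before you remove it is decommissioning rather than a bare exclude. A rough sketch, assuming your exclude file lives at /etc/hadoop/conf/dfs_hosts_exclude.txt and is referenced by dfs.hosts.exclude:
# Add the host to the exclude file referenced by dfs.hosts.exclude
echo "dn001" >> /etc/hadoop/conf/dfs_hosts_exclude.txt
# Tell the NameNode to re-read its include/exclude lists and begin decommissioning
hdfs dfsadmin -refreshNodes
# The node stays in "Decommission In Progress" until its blocks are re-replicated elsewhere
hdfs dfsadmin -report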

How does the NameNode recognize that a specific file has a replication factor different from the configured default of 3?

hdfs-site.xml:
dfs.replication is configured as 3.
Assuming that I set the replication of a specific file to 2:
./bin/hadoop dfs -setrep -w 2 /path/to/file.txt
When the NameNode receives a heartbeat from a DataNode, will it consider the file /path/to/file.txt to be under-replicated according to the configured default replication, or not? If not, how does it decide?
First, I would like to attempt to restate your question for clarity, to make sure I understand:
Will the NameNode consider a file that has been manually set to a replication factor lower than the default (dfs.replication) to be under-replicated?
No. The NameNode stores the replication factor of each file separately in its metadata, even if the replication factor was not set explicitly by calling -setrep. By default, the metadata for each file will copy the replication factor as specified in dfs.replication (3 in your example). It may be overridden, such as by calling -setrep. When the NameNode checks if a file is under-replicated, it checks the exact replication factor stored in the metadata for that individual file, not dfs.replication. If the file's replication factor is 2, and there are 2 replicas of each of its blocks, then this is fine, and the NameNode will not consider it to be under-replicated.
Your question also makes mention of heartbeating from the DataNodes, which I think means you're interested in how interactions between the DataNodes and NameNodes relate to replication. There is also another form of communication between DataNodes and NameNodes called block reports. The block reports are the means by which DataNodes tell the NameNodes which block replicas they store. The NameNode analyzes block reports from all DataNodes to determine if a block is either under-replicated or over-replicated. If a block is under-replicated (e.g. replication factor is 2, but there is only one replica), then the NameNode schedules re-replication work so that another DataNode makes a copy of the replica. If a block is over-replicated (e.g. replication factor is 3, but there are 4 replicas), then the NameNode schedules one of the replicas to be deleted, and eventually one of the DataNodes will delete it locally.
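To see that the NameNode tracks a per-file replication factor rather than dfs.replication, you can inspect the file's metadata directly; a small sketch using the path from the question:
# Set the file's replication factor to 2 and wait until it takes effect
hdfs dfs -setrep -w 2 /path/to/file.txt
# %r prints the replication factor stored in the file's metadata
hdfs dfs -stat "%r" /path/to/file.txt
# fsck shows the target replication and the actual replica count per block
hdfs fsck /path/to/file.txt -files -blocks
With two live replicas the file is reported as healthy, even though dfs.replication is 3.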

What will Hadoop do after one of the DataNodes goes down?

I have a Hadoop cluster with 10 DataNodes and 2 NameNodes, with replication configured as 3. I was wondering: if one of the DataNodes goes down, will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since there are still 2 replicas left)?
Also, what if the downed DataNode comes back after a while? Can Hadoop recognize the data on that node? Thanks!
Will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since there are still 2 replicas left)?
Yes, Hadoop will recognize the failure and make copies of that data on other nodes. When the NameNode stops receiving heartbeats from a DataNode, it assumes that DataNode is lost. To keep all data at the defined replication factor, it will make copies on other DataNodes.
What if the downed DataNode comes back after a while? Can Hadoop recognize the data on that node?
Yes. When a DataNode comes back with all its data, the NameNode will delete the now-excess replicas. In a response to a later heartbeat, the NameNode will send the DataNode an instruction to remove the extra data and free up the disk space.
Snippet from Apache HDFS documentation:
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
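The timings behind that snippet are configurable. By default the NameNode marks a DataNode dead after roughly 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, i.e. about 10.5 minutes. You can check the values in effect on your cluster with:
# Heartbeat interval in seconds (default 3)
hdfs getconf -confKey dfs.heartbeat.interval
# Recheck interval in milliseconds (default 300000, i.e. 5 minutes)
hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval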

Hadoop: HDFS File Writes & Reads

I have a basic question regarding file writes and reads in HDFS.
For example, if I am writing a file using the default configuration, Hadoop internally has to write each block to 3 DataNodes. My understanding is that, for each block, the client first writes the block to the first DataNode in the pipeline, which then forwards it to the second, and so on. Once the third DataNode successfully receives the block, it sends an acknowledgement back to DataNode 2 and finally to the client through DataNode 1. Only after the acknowledgement for the block is received is the write considered successful, and the client proceeds to write the next block.
If this is the case, then isn't the time taken to write each block greater than in a traditional file write, due to:
the replication factor (default is 3), and
the write process happening sequentially, block after block?
Please correct me if my understanding is wrong. Also, the following questions:
My understanding is that file reads/writes in Hadoop don't have any parallelism, and the best they can perform is the same as a traditional file read or write (i.e. with replication set to 1), plus some overhead from the distributed communication mechanism.
Parallelism is provided only during the data processing phase via MapReduce, not during a file read/write by a client.
Though your explanation of a file write above is correct, a DataNode can read and write data simultaneously. From the HDFS Architecture Guide:
a DataNode can be receiving data from the previous one in the pipeline
and at the same time forwarding data to the next one in the pipeline
A write operation takes more time than on a traditional file system (due to bandwidth issues and general overhead) but not as much as 3x (assuming a replication factor of 3).
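If you want a rough feel for the actual write overhead on your own cluster, timing the same upload at different replication factors is a simple experiment (the file name and paths are just examples):
# Write with a single replica
time hadoop fs -D dfs.replication=1 -put big.dat /tmp/big_r1.dat
# Write with three replicas through the pipeline
time hadoop fs -D dfs.replication=3 -put big.dat /tmp/big_r3.dat
In practice the gap is usually well below 3x, precisely because each DataNode forwards packets downstream while it is still receiving them.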
I think your understanding is correct.
One might expect that a simple HDFS client writes some data and, as soon as at least one block replica has been written, gets control back, while HDFS generates the other replicas asynchronously.
But HDFS is designed around the pattern "write once, read many times", so the focus wasn't on write performance.
On the other hand, you can find parallelism in Hadoop MapReduce (which can also be seen as an HDFS client), which is explicitly designed for it.
HDFS Write Operation:
There are two parameters
dfs.replication: Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at create time.
dfs.namenode.replication.min: Minimal block replication.
Even though dfs.replication is set to 3, the write operation is considered successful once dfs.namenode.replication.min (default value: 1) replicas have been written.
But replication up to dfs.replication happens in a sequential pipeline: the first DataNode writes the block and forwards it to the second DataNode, and the second DataNode writes the block and forwards it to the third DataNode.
DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the Datanodes in the pipeline.
Have a look at related SE question: Hadoop 2.0 data write operation acknowledgement
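To confirm which values your cluster actually enforces at write time, you can query the running configuration (property names as in Hadoop 2.x and later; very old releases used dfs.replication.min):
# Default replication applied to new files unless overridden at create time
hdfs getconf -confKey dfs.replication
# Minimum number of replicas a block needs before it is considered complete
hdfs getconf -confKey dfs.namenode.replication.min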
HDFS Read Operation:
HDFS read operations happen in parallel, rather than sequentially like write operations.

HDFS behavior when a file got corrupted

I found these sample questions for the Cloudera exam. I believe the answer is D. Do you agree?
Question 1
You use the hadoop fs -put command to add sales.txt to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes within your cluster. When and how will the cluster handle replication following the failure of one of these nodes?
A. The cluster will make no attempt to re-replicate this block.
B. This block will be immediately re-replicated and all other HDFS operations on the cluster will halt while this is in progress.
C. The block will remain under-replicated until the administrator manually deletes and recreates the file.
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes.
Yes, it's D. When the NameNode determines that a DataNode is no longer active, it will have one of the DataNodes that holds the given block replicate it to another node.
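If you want to watch this happen on a test cluster, the NameNode's view is visible through fsck; a quick sketch (the path to sales.txt is just an example):
# Show block locations and replica counts for the file
hdfs fsck /user/hadoop/sales.txt -files -blocks -locations
# Cluster-wide summary, including under-replicated and missing block counts
hdfs fsck / | grep -iE "under-replicated|missing"
After you stop one of the three DataNodes holding the block and the NameNode declares it dead, the under-replicated count rises briefly and then drops back once re-replication finishes.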
