We have 3 settings for Hadoop replication, namely:
dfs.replication.max = 10
dfs.replication.min = 1
dfs.replication = 2
So dfs.replication is the default replication factor for files in the Hadoop cluster, used unless a Hadoop client sets a different value explicitly (for example with "setrep"),
and a Hadoop client can raise a file's replication only up to dfs.replication.max.
dfs.replication.min is used in two cases:
During safe mode, the NameNode checks whether each block has at least dfs.replication.min replicas.
During a write, the first dfs.replication.min replicas are written synchronously; the remaining (dfs.replication - dfs.replication.min) replicas are created asynchronously.
So do we have to set these configurations on each node (NameNode + DataNodes), or only on the client node?
What happens if the values of the above three settings vary across different DataNodes?
The replication factor can't be set for a specific node in the cluster; you can set it for the entire cluster, a directory, or a file. dfs.replication can be updated in a running cluster in hdfs-site.xml.
Set the replication factor for a file: hadoop fs -setrep -w <rep-number> <file-path>
Or set it recursively for a directory or for the entire cluster: hadoop fs -setrep -R -w 1 /
Use of the min and max replication factors:
While writing data to the DataNodes it is possible that some DataNodes fail. If at least dfs.namenode.replication.min replicas are written, then the write operation succeeds. After the write operation, the blocks are replicated asynchronously until they reach the dfs.replication level.
The max replication factor, dfs.replication.max, sets the upper limit on block replication. A user can't set a file's replication higher than this limit when creating the file.
You can set a high replication factor for the blocks of a popular file to distribute the read load across the cluster.
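If you are unsure which values are actually in effect for a given client, you can query them from the command line. A minimal sketch, assuming the hdfs CLI is on the PATH and picks up your hdfs-site.xml (dfs.namenode.replication.min is the 2.x name for the minimum setting):

# Default replication factor as seen by this client's configuration
hdfs getconf -confKey dfs.replication

# Minimum and maximum limits (enforced by the NameNode from its own configuration)
hdfs getconf -confKey dfs.namenode.replication.min
hdfs getconf -confKey dfs.replication.max

Note that getconf only reports the local configuration; for new files it is the client-side dfs.replication that decides how many copies are requested, while the min/max limits are enforced using the NameNode's configuration.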
We are using Hortonworks HDP 2.1 (HDFS 2.4), with replication factor 3.
We have recently decommissioned a datanode and that left a lot of under replicated blocks in the cluster.
Cluster is now trying to satisfy the replication factor by distributing under replicated blocks among other nodes.
How do I stop that process? I am OK with some files being replicated only twice. If I change the replication factor to 2 on that directory, will that process be terminated?
What's the impact of changing the replication factor to 2 for a directory which has files with 3 copies? Will the cluster start another process to remove the excess copy for each file with 3 copies?
Appreciate your help on this. Kindly share the references too.
Thanks.
Sajeeva.
We have recently decommissioned a datanode and that left a lot of under replicated blocks in the cluster.
If the DataNode was gracefully decommissioned, then it should not have resulted in under-replicated blocks. As an edge case though, if decommissioning a node brings the total node count under the replication factor set on a file, then by definition that file's blocks will be under-replicated. (For example, consider an HDFS cluster with 3 DataNodes. Decommissioning a node results in 2 DataNodes remaining, so now files with a replication factor of 3 have under-replicated blocks.)
During decommissioning, HDFS re-replicates (copies) the blocks hosted on that DataNode over to other DataNodes in the cluster, so that the desired replication factor is maintained. More details on this are here:
How do I correctly remove nodes in Hadoop?
Decommission DataNodes
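For reference, a graceful decommission usually looks roughly like this (a sketch; the exclude-file path and hostname are placeholders, and dfs.hosts.exclude must already point at that file in the NameNode's hdfs-site.xml):

# Add the DataNode's hostname to the exclude file referenced by dfs.hosts.exclude
echo "datanode-to-remove.example.com" >> /etc/hadoop/conf/dfs.exclude

# Ask the NameNode to re-read the include/exclude lists and start decommissioning
hdfs dfsadmin -refreshNodes

# Watch the node go from "Decommission in progress" to "Decommissioned"
hdfs dfsadmin -report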
How do I stop that process? I am OK with some files being replicated only twice. If I change the replication factor to 2 on that directory, will that process be terminated?
There is no deterministic way to terminate this process as a whole. However, if you lower replication factor to 2 on some of the under-replicated files, then the NameNode will stop scheduling re-replication work for the blocks of those files. This means that for the blocks of those files, HDFS will stop copying new replicas across different DataNodes.
The typical replication factor of 3 is desirable from a fault tolerance perspective. You might consider setting replication factor on those files back to 3 later.
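A minimal sketch of that approach (the directory path is a placeholder):

# Lower the replication factor of the files under the directory to 2
# (omit -w here: waiting for excess replicas to be cleaned up can take a long time)
hdfs dfs -setrep -R 2 /data/archive

# Later, when capacity allows, raise it back to 3 and optionally wait for completion
hdfs dfs -setrep -R -w 3 /data/archive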
What's the impact of changing the replication factor to 2 for a directory which has files with 3 copies? Will the cluster start another process to remove the excess copy for each file with 3 copies?
Yes, the NameNode will flag these files as over-replicated. In response, it will schedule block deletions at DataNodes to restore the desired replication factor of 2. These block deletions are dispatched to the DataNodes asynchronously, in response to their heartbeats. Within the DataNode, the block deletion executes asynchronously to clean the underlying files from the disk.
More details on this are described in the Apache Hadoop Wiki.
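If you want to watch the cluster converge, the fsck summary counts both under- and over-replicated blocks. A sketch (the path is a placeholder):

# The summary includes "Under-replicated blocks" and "Over-replicated blocks" counts
hdfs fsck /data/archive

# Per-file detail, including where each block's replicas currently live
hdfs fsck /data/archive -files -blocks -locations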
hdfs-site.xml:
dfs.replication is configured with a value of 3.
Assuming that I set the replication of a specific file to 2:
./bin/hadoop dfs -setrep -w 2 /path/to/file.txt
When the NameNode receives a heartbeat from a DataNode, will the NameNode consider the specified file /path/to/file.txt to be under-replicated with respect to the configured replication factor or not?
If not, how will it be handled?
First, I would like to attempt to restate your question for clarity, to make sure I understand:
Will the NameNode consider a file that has been manually set to a replication factor lower than the default (dfs.replication) to be under-replicated?
No. The NameNode stores the replication factor of each file separately in its metadata, even if the replication factor was not set explicitly by calling -setrep. By default, the metadata for each file will copy the replication factor as specified in dfs.replication (3 in your example). It may be overridden, such as by calling -setrep. When the NameNode checks if a file is under-replicated, it checks the exact replication factor stored in the metadata for that individual file, not dfs.replication. If the file's replication factor is 2, and there are 2 replicas of each of its blocks, then this is fine, and the NameNode will not consider it to be under-replicated.
Your question also makes mention of heartbeating from the DataNodes, which I think means you're interested in how interactions between the DataNodes and NameNodes relate to replication. There is also another form of communication between DataNodes and NameNodes called block reports. The block reports are the means by which DataNodes tell the NameNodes which block replicas they store. The NameNode analyzes block reports from all DataNodes to determine if a block is either under-replicated or over-replicated. If a block is under-replicated (e.g. replication factor is 2, but there is only one replica), then the NameNode schedules re-replication work so that another DataNode makes a copy of the replica. If a block is over-replicated (e.g. replication factor is 3, but there are 4 replicas), then the NameNode schedules one of the replicas to be deleted, and eventually one of the DataNodes will delete it locally.
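To confirm the per-file replication factor stored in the NameNode metadata, a quick sketch using the path from your example:

# The second column of the listing is the file's replication factor
hdfs dfs -ls /path/to/file.txt

# Or print just the replication factor
hdfs dfs -stat "%r" /path/to/file.txt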
I tried changing the replication to 3, and I can see the replication is changed to 3 for the file I loaded into HDFS, but I cannot see the other 2 copies. Could someone explain what happens in this scenario?
You won't see any additional replicas because you don't have other nodes on which to create them; a replica can't be created on the same node. But in your NameNode you will see the Number of Under-Replicated Blocks metric at a value other than zero. If you later attach a new DataNode to your cluster, the under-replicated blocks should start replicating automatically (which obviously implies configuring a full cluster instead of the pseudo cluster).
You can see the Number of Under-Replicated Blocks metric in the NameNode web UI: http://localhost:50070/dfshealth.html#tab-overview (the default address in a pseudo-cluster configuration).
It is recommended to set dfs.replication to "1" in this case; otherwise, when running a single DataNode or in pseudo-distributed mode, HDFS can't replicate blocks to the specified number of DataNodes and will warn about blocks being under-replicated.
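Besides the web UI, the fsck summary shows the same information. A sketch, assuming a default pseudo-distributed setup:

# The summary reports the default replication factor and the number of
# under-replicated blocks for everything under /
hdfs fsck /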
I was attending a course on Hadoop and MapReduce on Udacity.com and the instructor mentioned that in HDFS, to reduce points of failure, each block is replicated 3 times. Is that really true? Does it mean that if I have 1 petabyte of logs, I will need 3 petabytes of storage? Because that will cost me more.
Yes, it is true: HDFS requires space for each redundant copy, and it needs those copies to achieve fault tolerance and data locality during processing.
But this is not necessarily true about MapReduce, which can run on other file systems like S3 or Azure blobs for instance. It is HDFS that requires the 3 copies.
By default, the HDFS configuration parameter dfs.replication is set to 3. That allows for fault tolerance, availability, etc. (All HDFS parameters are documented here.)
But at install time you can set the parameter to 1, and then HDFS doesn't make replicas of your data. With dfs.replication=1, 1 petabyte is stored in about the same amount of space.
Yes, that's true. Say you have 4 machines with DataNodes running on them; then by default, replication will also happen on two of the other machines, chosen at random. If you don't want that, you can switch it to 1 by setting the dfs.replication property in hdfs-site.xml.
This is because HDFS replicates data when you store it. The default replication factor for HDFS is 3, which you can find in the hdfs-site.xml file under the dfs.replication property. You can set this value to 1 or 5 as per your requirements.
Data replication is very useful: if a particular node goes down, you will have a copy of the data available on other nodes for processing.
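If storage cost is the main concern, you do not have to change the cluster-wide default; the replication factor can also be chosen per write or per existing file. A sketch (paths are placeholders):

# Write a file with a single copy, overriding dfs.replication for this command only
hadoop fs -D dfs.replication=1 -put access.log /logs/access.log

# Or reduce the number of copies of data that is already in HDFS
hadoop fs -setrep -w 1 /logs/access.log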
For instance, if a Hadoop cluster consists of 2 DataNodes and the HDFS replication factor is set at the default of 3, what is the default behavior for how the files are replicated?
From what I've read, it seems that HDFS bases it on rack awareness, but for cases like this, does anyone know how it is determined?
The NameNode will consider the blocks under-replicated, will keep complaining about that, and will permanently try to bring them up to the expected replication factor.
HDFS has a parameter (the replication factor, 3 by default) which tells the NameNode how replicated each block should be (in the default case, each block should be replicated 3 times across the cluster, according to the given replica placement strategy). Until the system manages to replicate each block as many times as specified by the replication factor, it will keep trying to do so.
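In a 2-DataNode cluster like the one described, the practical way to stop the complaints is to make the requested replication factor match what the cluster can actually provide. A sketch (applies to existing files; new files still follow the client's dfs.replication):

# Lower every existing file to 2 replicas, so the NameNode stops scheduling
# re-replication work it can never complete with only 2 DataNodes
hdfs dfs -setrep -R 2 /

# Check which default new files will be created with (set dfs.replication to 2
# in the client's hdfs-site.xml if it still reports 3)
hdfs getconf -confKey dfs.replication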