I recently created a cluster with five servers:
To have more "workers", I added the NameNode to the list of slaves in /etc/hadoop/slaves.
This works; the master performs some MapReduce jobs.
Today I want to remove this node from the workers list (the jobs are too CPU-intensive for it). I want to set dfs.hosts.exclude in my hdfs-site.xml, but I'm worried because this is also the master server.
If there is data stored on the master node (as there probably is, because it's a DataNode), simply removing the node means you essentially lose that copy of the data. But if your replication factor is more than 1 (3 is the default), it doesn't matter: Hadoop will notice that some blocks are under-replicated and will start replicating them again on other DataNodes to reach the replication factor.
So, if your replication factor is more than 1 (and the cluster is otherwise healthy), you can just remove the master's data (making it a plain NameNode again) and Hadoop will take care of the rest.
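To make the replication argument concrete, here is a minimal Python sketch of a toy cluster. It is not how HDFS actually places blocks (placement here is uniformly random, with no rack awareness), and the node names and helper functions are made up for illustration. It only shows that removing one node loses data when the replication factor is 1 and loses nothing when it is 3.

```python
import random


def place_blocks(num_blocks, nodes, replication, seed=0):
    """Assign each block to `replication` distinct nodes (random, not HDFS's real policy)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in range(num_blocks)}


def remove_node(placement, node):
    """Drop a node and return the blocks that lost a replica on it."""
    affected = []
    for block, holders in placement.items():
        if node in holders:
            holders.discard(node)
            affected.append(block)
    return affected


def re_replicate(placement, live_nodes, replication):
    """Copy under-replicated blocks to other live nodes; return blocks with no replica left."""
    lost = []
    for block, holders in placement.items():
        if not holders:
            lost.append(block)  # no surviving copy to replicate from
            continue
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop())
    return lost


# Hypothetical five-node cluster where the master also runs a DataNode.
nodes = ["master", "worker1", "worker2", "worker3", "worker4"]
live = [n for n in nodes if n != "master"]

for replication in (1, 3):
    placement = place_blocks(100, nodes, replication)
    affected = remove_node(placement, "master")
    lost = re_replicate(placement, live, replication)
    print(f"replication={replication}: {len(affected)} blocks affected, {len(lost)} lost")
```

With replication 3 the "lost" count is always 0 in this model, because every affected block still has two replicas to copy from.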


Decommissioning multiple Hadoop DataNodes in parallel

I'm replacing multiple machines in my Hadoop CDH 5.7 cluster.
I started by adding a few new machines and decommissioning the same number of existing DataNodes.
I noticed that blocks are marked as under-replicated when decommissioning a node.
Does it mean I'm at risk when decommissioning multiple nodes?
Can I decommission all nodes in parallel?
Is there a better way of replacing all machines?
It's obvious that when a node is down (or removed), the data is under-replicated.
When you add a new node and rebalance, this will automatically be fixed.
What's actually happening?
Let's say the replication factor on your cluster is 3. When a node is decommissioned, all the data stored on it is gone and the replication factor of that data is now 2 (hence under-replicated). When you add a new node and rebalance, the missing copy is made again, restoring the replication to the default.
Am I at risk?
Not if you are doing it one by one.
That is, replace a node and rebalance the cluster. Repeat. (I think this is the only way!)
If you just remove multiple nodes, there is a good chance of losing data, as you may lose all replicas of some data (those that resided only on those nodes).
Don't decommission multiple nodes at once!
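As a rough illustration of why removing several nodes at once is riskier, the Python sketch below counts the blocks whose every replica sits on the removed nodes. It models nodes being pulled abruptly with no time for re-replication, uses random placement rather than HDFS's rack-aware policy, and the node names (dn0 to dn9) and the `blocks_lost` helper are hypothetical.

```python
import random


def blocks_lost(placement, removed):
    """Return blocks whose replicas all live on removed nodes."""
    removed = set(removed)
    return [b for b, holders in placement.items() if holders <= removed]


# Toy cluster: 10 DataNodes, 1000 blocks, replication factor 3, random placement.
rng = random.Random(42)
nodes = [f"dn{i}" for i in range(10)]
placement = {b: set(rng.sample(nodes, 3)) for b in range(1000)}

for removed in (["dn0"], ["dn0", "dn1"], ["dn0", "dn1", "dn2"]):
    print(len(removed), "nodes removed ->", len(blocks_lost(placement, removed)), "blocks lost")
```

Removing fewer nodes than the replication factor cannot lose a block in this model. Once the number of removed nodes reaches the replication factor, whether anything is lost depends on where the replicas happened to land.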

What will Hadoop do after one of the DataNodes goes down?

I have a Hadoop cluster with 10 data nodes and 2 name nodes, with replication configured to 3. I was wondering: if one of the data nodes goes down, will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since there are still 2 replicas left)?
Also, what if the downed data node comes back after a while? Can Hadoop recognize the data on that node? Thanks!
will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since there are still 2 replicas left)?
Yes, Hadoop will recognize it and make copies of that data on other nodes. When the NameNode stops receiving heartbeats from a data node, it assumes that data node is lost. To keep the replication of all the data at the defined replication factor, it will make copies on other data nodes.
Also, what if the downed data node comes back after a while? Can Hadoop recognize the data on that node?
Yes. When a data node comes back with all its data, the NameNode will remove the extra copies of the data. In its reply to a subsequent heartbeat from a data node, the NameNode will send the instruction to remove the extra data and free up the space on disk.
Snippet from Apache HDFS documentation:
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
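The heartbeat logic described in the quote can be sketched in a few lines of Python. This is only a model, not NameNode code. The timeout rule (2 * recheck interval + 10 * heartbeat interval) and the default values below are what I believe HDFS uses, so treat them as assumptions and check them against your own hdfs-default.xml. The node names, block ids and timestamps are invented.

```python
# Assumed defaults (verify against your hdfs-default.xml):
HEARTBEAT_INTERVAL_S = 3     # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300     # dfs.namenode.heartbeat.recheck-interval (5 minutes)
# Assumed rule for declaring a DataNode dead: 2 * recheck + 10 * heartbeat = 630 s.
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


def dead_nodes(last_heartbeat, now, timeout=DEAD_TIMEOUT_S):
    """Return nodes whose last heartbeat is older than `timeout` seconds."""
    return sorted(n for n, seen in last_heartbeat.items() if now - seen > timeout)


def under_replicated(block_locations, dead, replication):
    """Map each block that fell below `replication` to its live replica count."""
    dead = set(dead)
    result = {}
    for block, nodes in block_locations.items():
        live = len(set(nodes) - dead)
        if live < replication:
            result[block] = live
    return result


# Hypothetical snapshot: time (in seconds) at which each node last reported.
last_heartbeat = {"dn1": 1000, "dn2": 995, "dn3": 200, "dn4": 998}
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1", "dn2", "dn4"],
}

dead = dead_nodes(last_heartbeat, now=1000)
print(dead)                                                # ['dn3']
print(under_replicated(block_locations, dead, replication=3))  # {'blk_1': 2}
```

In this snapshot only dn3 has been silent for longer than the timeout, so only blk_1 is queued for re-replication.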

Hadoop doesn't use one node for job

I've got a four-node YARN cluster up and running. I recently had to format the NameNode due to a minor problem.
Later I ran Hadoop's PI example to verify every node was still taking part in the calculation, which they all did. However, when I start my own job now, one of the nodes is not being used at all.
I figured this might be because this node doesn't have any data to work on. So I tried to balance the cluster using the balancer. This doesn't work and the balancer tells me the cluster is balanced.
What am I missing?
While processing, your ApplicationMaster negotiates with the ResourceManager for containers, and the scheduler tries to place them on NodeManagers close to the DataNodes holding the data. Since your replication factor is 3, HDFS would try to place one whole copy on a single DataNode and distribute the rest across all the DataNodes.
1) Change the replication factor to 1 (since you are only trying to benchmark, reducing replication should not be a big issue).
2) Make sure your client (the machine from which you run your -copyFromLocal command) does not have a DataNode running on it. Otherwise, HDFS will tend to place most of the data on this node, since writing locally has lower latency.
3) Control the file distribution using dfs.blocksize property.
4) Check the status of your datanodes using hdfs dfsadmin -report.
Make sure your node is joining the ResourceManager. Look into the NodeManager log on the problem node and see if there are errors. Look at the ResourceManager web UI (:8088 by default) and make sure the node is listed there.
Make sure the node is bringing enough resources to the pool to be able to run a job. Check yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb in yarn-site.xml on the node. The memory should be more than the minimum memory requested by a container (see yarn.scheduler.minimum-allocation-mb).
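The resource check above boils down to simple arithmetic. Here is a rough Python sketch of it, under a simplified model where a container request is never granted less than the scheduler minimum. Real schedulers also round requests and track other limits, and the `containers_that_fit` helper, its parameters and the sample numbers are hypothetical. The 1024 MB default stands in for yarn.scheduler.minimum-allocation-mb and is an assumption.

```python
def containers_that_fit(node_memory_mb, node_vcores, request_mb,
                        request_vcores=1, min_allocation_mb=1024):
    """Rough count of containers a NodeManager could host at the same time."""
    # A request is never granted less memory than the scheduler minimum.
    effective_mb = max(request_mb, min_allocation_mb)
    return min(node_memory_mb // effective_mb, node_vcores // request_vcores)


# node_memory_mb ~ yarn.nodemanager.resource.memory-mb
# node_vcores    ~ yarn.nodemanager.resource.cpu-vcores
print(containers_that_fit(8192, 8, request_mb=2048))  # 4
print(containers_that_fit(512, 4, request_mb=256))    # 0: node offers less than the minimum
```

A node that reports 0 here would join the cluster but never receive a container, which matches the symptom of a node that "is not being used at all".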

How to explicitly define datanodes to store a particular given file in HDFS?

I want to write a script or something like an .xml file that explicitly defines which DataNodes in a Hadoop cluster store a particular file's blocks.
For example:
Suppose there are 4 slave nodes and 1 master node (5 nodes in total in the Hadoop cluster).
There are two files, file01 (size = 120 MB) and file02 (size = 160 MB). Default block size = 64 MB.
Now I want to store one of the two blocks of file01 on slave node1 and the other one on slave node2.
Similarly, one of the three blocks of file02 on slave node1, the second one on slave node3 and the third one on slave node4.
So my question is: how can I do this?
Actually, there is one method: make changes in the conf/slaves file every time before storing a file.
But I don't want to do this.
So, is there another solution?
I hope I made my point clear.
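For reference, the block counts in the example (two blocks for the 120 MB file, three for the 160 MB file) follow from dividing the file size by the block size and rounding up. A small Python sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file of the given size would occupy."""
    if file_size_mb <= 0:
        return []
    n = math.ceil(file_size_mb / block_size_mb)
    # All blocks are full except possibly the last one.
    last = file_size_mb - block_size_mb * (n - 1)
    return [block_size_mb] * (n - 1) + [last]


print(split_into_blocks(120))  # [64, 56]
print(split_into_blocks(160))  # [64, 64, 32]
```

Note that the last block only takes as much space as the data it holds.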
There is no method to achieve what you are asking here. The NameNode will replicate blocks to data nodes based upon rack configuration, replication factor and node availability, so even if you do manage to get a block onto two particular data nodes, if one of those nodes goes down, the NameNode will replicate the block to another node.
Your requirement also assumes a replication factor of 1, which doesn't give you any data redundancy (which is a bad thing if you lose a data node).
Let the NameNode manage block assignments and use the balancer periodically if you want to keep your cluster evenly distributed.
The NameNode is the ultimate authority on block placement.
There is a Jira about the requirement to make this algorithm pluggable:
but unfortunately it is in the 0.21 version, which is not production-ready (although it works quite well).
I would suggest plugging your algorithm into 0.21 if you are at the research stage and then waiting for 0.23 to become production-ready, or backporting the code to 0.20 if you need it now.

Hadoop namenode : Single point of failure

The Namenode in the Hadoop architecture is a single point of failure.
How do people who have large Hadoop clusters cope with this problem?
Is there an industry-accepted solution that has worked well, wherein a secondary NameNode takes over in case the primary one fails?
Yahoo has certain recommendations for configuration settings at different cluster sizes to take NameNode failure into account. For example:
The single point of failure in a Hadoop cluster is the NameNode. While the loss of any other machine (intermittently or permanently) does not result in data loss, NameNode loss results in cluster unavailability. The permanent loss of NameNode data would render the cluster's HDFS inoperable.
Therefore, another step should be taken in this configuration to back up the NameNode metadata
Facebook uses a tweaked version of Hadoop for its data warehouses; it has some optimizations that focus on NameNode reliability. In addition to the patches available on GitHub, Facebook appears to use AvatarNode specifically for quickly switching between primary and secondary NameNodes. Dhruba Borthakur's blog contains several other entries offering further insights into the NameNode as a single point of failure.
Edit: Further info about Facebook's improvements to the NameNode.
High availability of the NameNode was introduced with the Hadoop 2.x release.
It can be achieved in two modes: with NFS and with QJM.
High availability with the Quorum Journal Manager (QJM) is the preferred option.
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
Have a look at the SE questions below, which explain the complete failover process.
Secondary NameNode usage and High availability in Hadoop 2.x
How does Hadoop Namenode failover process works?
Large Hadoop clusters have thousands of data nodes and one name node. The probability of failure goes up linearly with machine count (all else being equal). So if Hadoop didn't cope with data node failures it wouldn't scale. Since there's still only one name node the Single Point of Failure (SPOF) is there, but the probability of failure is still low.
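The scaling argument can be checked with a little arithmetic. If each machine fails independently with some small probability p over a given period (an assumption made purely for illustration, as is the value of p below), the chance that at least one of n machines fails is 1 - (1 - p)^n, which is roughly n * p while n * p is small. A Python sketch:

```python
def p_any_failure(n_machines, p_single):
    """Probability that at least one of n independent machines fails."""
    return 1 - (1 - p_single) ** n_machines


p = 0.001  # hypothetical per-machine failure probability for some period
for n in (1, 10, 100, 1000):
    exact = p_any_failure(n, p)
    linear = n * p  # linear approximation, only good while n * p is small
    print(f"n={n:5d}  exact={exact:.4f}  linear={linear:.4f}")
```

With thousands of data nodes some failure is close to certain, while the single name node's own failure probability stays at p.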
That said, Bkkbrad's answer about Facebook adding failover capability to the name node is right on.
