requirement of 3 journal nodes in HA hadoop setup - hadoop

I am quite new to Hadoop. As I am setting up Hadoop NameNode HA using the Quorum Journal Manager, I am a bit confused about the requirements. The official documentation on the Apache site says:
Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs.
What does this mean? Why do we need 3 JournalNodes instead of two?

In Hadoop 1 we can have only one NameNode per cluster. If that NameNode somehow becomes unavailable, the whole cluster becomes unavailable, making it a single point of failure.
To resolve this issue, the obvious solution was to add more than one NameNode per cluster.
In Hadoop 2 we can have two NameNodes per cluster. At any time only one NameNode is active and the other is in standby mode. To make the system HA, both NameNodes must be synchronised. To do so, they introduced the concept of journal nodes.
The purpose of this lightweight daemon is to sync every change on the active NameNode to the standby NameNode.
Now what if a journal node fails? This would again be the same issue: the journal node would become the single point of failure. To avoid that, they introduced a quorum concept, as in ZooKeeper.
What does quorum mean?
Quorum: the literal meaning of quorum is 'the minimum number of members of an assembly/society that must be present to make a meeting valid'.
On similar lines, more than half of the total journal nodes must always be healthy to keep everything running. E.g. if you have 2 journal nodes in the system, you have to keep 'more than half', i.e. more than 1, healthy, which means both journal nodes must stay up; you cannot tolerate a single journal node failure in that case. With 3 journal nodes, a majority is 2, so one of them can fail and the system still runs. That is why you should have an odd number of journal nodes (i.e. 3, 5, 7), with a minimum of 3 so that journal node failures can be tolerated.
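As a concrete sketch (hostnames and the nameservice ID here are hypothetical), the three JournalNodes are listed in dfs.namenode.shared.edits.dir in hdfs-site.xml; the NameNodes write edits to all of them and treat a write as successful once a majority (2 of 3) have acknowledged it:

    <!-- hdfs-site.xml: shared edits directory backed by a 3-node JournalNode quorum -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
    </property>
    <!-- local directory where each JournalNode stores its copy of the edit log -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/var/hadoop/journal</value>
    </property>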
I hope this helped

Related

How many NameNodes can there be in a single Hadoop cluster?

A Hadoop cluster is a collection of racks. Does each rack contain its own NameNode, or is only one NameNode present for the entire cluster?
It depends on the configuration of the racks as well as of the NameNode. You can have 1 NameNode for the entire cluster. If you are serious about performance, you can configure another NameNode for another set of racks, but 1 NameNode per rack is not advisable. In Hadoop 1.x you can have only one NameNode (only one namespace), but in Hadoop 2.x we have namespace federation, where multiple NameNodes can coexist, each usually serving particular metadata only.
In a typical Hadoop deployment, you would not have one NameNode per rack. Many smaller-scale deployments use one NameNode, with an optional Standby NameNode for automatic failover.
However, you can have more than one NameNode. Version 0.23 of Hadoop introduced federated NameNodes to allow for horizontal scaling. But, like I said, in many of the common use cases, you would have one NameNode per cluster (with optional Standby NameNode or Secondary NameNode).
See here for some more info.
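For what it's worth, a minimal federation sketch in hdfs-site.xml looks like this (hostnames and nameservice IDs are hypothetical); each nameservice is an independent NameNode managing its own slice of the namespace:

    <!-- hdfs-site.xml: two independent namespaces served by two NameNodes -->
    <property>
      <name>dfs.nameservices</name>
      <value>ns1,ns2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns1</name>
      <value>nn1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2</name>
      <value>nn2.example.com:8020</value>
    </property>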
One. You can have only a single name node in a cluster.
Detail -
In YARN / Hadoop 2.0 they came up with the concept of an active NameNode and a standby NameNode. (This is where most people get confused: they consider them to be 2 independent NameNodes in a cluster.) But in this YARN architecture too there will be a single NameNode receiving heartbeats and block reports from the DataNodes, which means there will be a single NameNode that remains active.
Meanwhile, the standby NameNode receives the metadata edits from the active NameNode via the journal nodes, so that in case of a NameNode failure it can take over.
Now, if you have a cluster with a large number of nodes, say 2000 nodes, you can still have only one active NameNode, or you can take the other approach of dividing the cluster into sub-clusters. Each sub-cluster will again have one active NameNode, but this improves processing speed because your NameNode-to-DataNode ratio is now better.
Conclusion: in any case you can have one active NameNode per cluster.
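If the cluster is configured for HA, you can check which of the two configured NameNodes is currently the active one with hdfs haadmin (nn1/nn2 are whatever IDs you set in dfs.ha.namenodes.<nameservice>):

    $ hdfs haadmin -getServiceState nn1
    active
    $ hdfs haadmin -getServiceState nn2
    standby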

What Hadoop will do after one of the DataNodes goes down

I have a Hadoop cluster with 10 DataNodes and 2 NameNodes, with the replication factor configured to 3. I was wondering: if one of the DataNodes goes down, will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since 2 replicas are still left)?
Also, what if the downed DataNode comes back after a while, can Hadoop recognize the data on that node? Thanks!
Will Hadoop try to regenerate the lost replicas on the other live nodes, or just do nothing (since 2 replicas are still left)?
Yes, Hadoop will recognize it and make copies of that data on some other nodes. When the NameNode stops receiving heartbeats from a DataNode, it assumes that DataNode is lost. To keep all the data at the defined replication factor, it will make copies on other DataNodes.
What if the downed DataNode comes back after a while, can Hadoop recognize the data on that node?
Yes, when a DataNode comes back with all its data, the NameNode will remove/delete the extra copies of that data. In the next heartbeat to that DataNode, the NameNode will send the instruction to remove the extra data and free up the space on disk.
Snippet from Apache HDFS documentation:
Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.
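To watch this happening on a live cluster, fsck reports under-replicated blocks and dfsadmin shows which DataNodes the NameNode currently considers live or dead:

    $ hdfs fsck / | grep -i 'under-replicated'
    $ hdfs dfsadmin -report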

Remove a Hadoop node which is the NameNode too

I recently created a cluster with five servers:
master
node01
node02
node03
node04
To have more "workers" I added the NameNode to the list of slaves in /etc/hadoop/slaves.
This works; the master performs some MapReduce jobs.
Today I want to remove this node from the workers list (this is too CPU-intensive for it). I want to set dfs.exclude in my hdfs-site.xml, but I am worried about the fact that this is also the master server.
Could someone confirm that there is no risk in performing this operation?
Thanks,
Romain.
If there is data stored in the master node (as there probably is because it's a DataNode), you will essentially lose that data. But if your replication factor is more than 1 (3 is the default), then it doesn't matter as Hadoop will notice that some data is missing (under-replicated) and will start replicating it again on other DataNodes to reach the replication factor.
So, if your replication factor is more than 1 (and the cluster is otherwise healthy), you can just remove the master's data (and make it again just a NameNode) and Hadoop will take care of the rest.
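A more graceful alternative to simply stopping the DataNode is to decommission it, so the NameNode copies its blocks elsewhere before the node goes away. A sketch, with hypothetical file paths (dfs.hosts.exclude must already point at the exclude file in hdfs-site.xml):

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/dfs.exclude</value>
    </property>

Then add the host to the exclude file and tell the NameNode to re-read it:

    $ echo master >> /etc/hadoop/dfs.exclude
    $ hdfs dfsadmin -refreshNodes

The NameNode web UI shows the node as decommissioning until all of its blocks have been re-replicated elsewhere.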

Hadoop doesn't use one node for job

I've got a four-node YARN cluster set up and running. I recently had to format the namenode due to a smaller problem.
Later I ran Hadoop's PI example to verify that every node was still taking part in the calculation, which they all did. However, when I start my own job now, one of the nodes is not being used at all.
I figured this might be because this node doesn't have any data to work on. So I tried to balance the cluster using the balancer. This doesn't work and the balancer tells me the cluster is balanced.
What am I missing?
While processing, your ApplicationMaster negotiates with the ResourceManager for containers, and the scheduler in turn tries to grant containers on the nodes nearest to the data. Since your replication factor is 3, HDFS tries to place one whole copy of the file on a single DataNode (the writing node, if it runs a DataNode) and distribute the rest across all the DataNodes.
1) Change the replication factor to 1 (since you are only trying to benchmark, reducing replication should not be a big issue); see the sketch after this list.
2) Make sure the client (the machine from which you issue the -copyFromLocal command) does not have a DataNode running on it. If it does, HDFS will tend to place most of the data on that node, since the local write has reduced latency.
3) Control the file distribution using the dfs.blocksize property.
4) Check the status of your DataNodes using hdfs dfsadmin -report.
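For point 1, a sketch (file and directory names are hypothetical): you can either set the replication factor at write time or lower it for data already in HDFS:

    # write with replication factor 1
    $ hdfs dfs -D dfs.replication=1 -copyFromLocal input.txt /benchmark/
    # or lower it for existing files (-w waits until re-replication finishes)
    $ hdfs dfs -setrep -w 1 /benchmark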
Make sure your node is joining the ResourceManager. Look into the NodeManager log on the problem node and see if there are errors. Look into the ResourceManager web UI (:8088 by default) and make sure the node is listed there.
Make sure the node is bringing enough resources to the pool to be able to run a job. Check yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb in yarn-site.xml on the node. The memory should be more than the minimum memory requested by a container (see yarn.scheduler.minimum-allocation-mb).
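For reference, these are the relevant knobs (the values below are just illustrative; tune them to the machine). The first two are per-node settings in yarn-site.xml on the worker; the minimum allocation is a scheduler setting read by the ResourceManager:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value>
    </property>
    <!-- a container request is rounded up to at least this much memory -->
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
    </property>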

Hadoop namenode : Single point of failure

The Namenode in the Hadoop architecture is a single point of failure.
How do people who have large Hadoop clusters cope with this problem?
Is there an industry-accepted solution that has worked well wherein a secondary Namenode takes over in case the primary one fails ?
Yahoo has certain recommendations for configuration settings at different cluster sizes to take NameNode failure into account. For example:
The single point of failure in a Hadoop cluster is the NameNode. While the loss of any other machine (intermittently or permanently) does not result in data loss, NameNode loss results in cluster unavailability. The permanent loss of NameNode data would render the cluster's HDFS inoperable.
Therefore, another step should be taken in this configuration to back up the NameNode metadata
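In the pre-HA era, the usual way to do that backup was to point the NameNode at several metadata directories, one of them on an NFS mount, so every fsimage/edits write lands on more than one disk. A sketch with hypothetical paths (in Hadoop 1.x the property is named dfs.name.dir):

    <!-- hdfs-site.xml: redundant NameNode metadata directories -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>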
Facebook uses a tweaked version of Hadoop for its data warehouses; it has some optimizations that focus on NameNode reliability. In addition to the patches available on GitHub, Facebook appears to use AvatarNode specifically for quickly switching between primary and secondary NameNodes. Dhruba Borthakur's blog contains several other entries offering further insight into the NameNode as a single point of failure.
Edit: Further info about Facebook's improvements to the NameNode.
High availability of the NameNode was introduced with the Hadoop 2.x release.
It can be achieved in two modes: with NFS and with QJM.
High availability with the Quorum Journal Manager (QJM) is the preferred option.
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
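A minimal sketch of that two-NameNode layout in hdfs-site.xml (IDs and hosts are hypothetical; the JournalNode quorum and fencing settings are omitted for brevity):

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>nn1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>nn2.example.com:8020</value>
    </property>
    <!-- let the ZKFC fail over automatically instead of requiring manual hdfs haadmin calls -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>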
Have a look at the SE questions below, which explain the complete failover process.
Secondary NameNode usage and High availability in Hadoop 2.x
How does Hadoop Namenode failover process works?
Large Hadoop clusters have thousands of data nodes and one name node. The probability of failure goes up linearly with machine count (all else being equal). So if Hadoop didn't cope with data node failures it wouldn't scale. Since there's still only one name node the Single Point of Failure (SPOF) is there, but the probability of failure is still low.
That said, Bkkbrad's answer about Facebook adding failover capability to the name node is right on.
