What is the difference between Hadoop Namenode HA and HDFS federation?

I am a bit confused by Hadoop Namenode HA using QJM and HDFS federation. Both use multiple namenodes and both provide High Availability. I am not able to decide which architecture to use for Namenode High Availability, since both look exactly the same except for the QJM part.
Please pardon me if this is not the type of question to be discussed here.

The main difference between HDFS High Availability and HDFS Federation would be that the namenodes in Federation aren't related to each other.
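In HDFS federation, the namenodes share a common pool of datanodes for storage, but each namenode manages its own namespace and its own block pool. This provides isolation, i.e. if one namenode in a federation fails, it doesn't affect the data of the other namenodes (the failed namenode's own namespace is unavailable until it comes back, because federation by itself does not provide failover).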
So, Federation = Multiple namenodes and no correlation.
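To make "no correlation" concrete, here is a minimal Python sketch of how a client-side mount table (similar in spirit to HDFS ViewFs) sends each path to the one namenode that owns it. The mount points, hostnames, and the `resolve` helper are hypothetical and are not the ViewFs API; the point is only that losing one namenode takes its own namespace offline and nothing else.

```python
from typing import Collection, Dict

# Hypothetical mount table: each top-level directory belongs to exactly one
# independent namenode, which owns that namespace and its block pool.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/tmp": "hdfs://nn3.example.com:8020",
}


def resolve(path: str, down: Collection[str] = ()) -> str:
    """Map a client path to a full URI on the namenode that owns it."""
    # Longest-prefix match, so a nested mount point would win over its parent.
    matches = [m for m in MOUNT_TABLE if path == m or path.startswith(m + "/")]
    if not matches:
        raise KeyError(f"no mount point for {path}")
    owner = MOUNT_TABLE[max(matches, key=len)]
    if owner in down:
        # Federation alone has no failover: only this namespace is unavailable.
        raise ConnectionError(f"{owner} is down, so {path} is unavailable")
    return owner + path


if __name__ == "__main__":
    failed = {"hdfs://nn2.example.com:8020"}
    # /user lives on nn1, so it is still served while nn2 is down.
    print(resolve("/user/alice/report.csv", down=failed))
```

If each namespace also needs to survive its namenode failing, each federated namenode has to be made HA on its own; the two features are complementary rather than alternatives.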
In HDFS HA, on the other hand, there are two namenodes: the Active NN and the Standby NN.
The Active NN works hard all the time, serving every client request, while the Standby NN just sits there and regularly applies the edits logged by the Active NN to its own metadata, which is what makes them related.
When the Active NN gets tired of this routine (i.e. it fails), the Standby NN takes over with the most recent metadata it has.
For an HA architecture, you need at least two separate machines configured as namenodes, of which only one should be in the Active state at any time.
More details here: HDFS High Availability


Namenode with high availability vs zookeeper based leader selection

I am reading two different things in the Apache Hadoop documentation and Cloudera's documentation.
Based on Cloudera's documentation, we should set up the namenode in high availability mode, i.e. by defining primary and secondary namenode, but based on the Hadoop documentation, this should automatically be taken care of by ZooKeeper, and it should decide the namenode among the available datanodes.
Can anyone explain the difference and which one to use?
by defining primary and secondary namenode
There is such a thing as a "secondary namenode", but it's a very different thing: it's not a standby, and it cannot become active.
There's no "vs". Namenode HA with automatic failover needs ZooKeeper.
If you read more of the Cloudera documentation, it does mention ZooKeeper:
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
Cloudera doesn't package many extras, if any, on top of the core Hadoop functionality.
Regarding your question...
this should automatically be taken care of by ZooKeeper
The failover is automatic if HDFS Zookeeper properties are (manually) configured, Zookeeper is running, and the Active Namenode goes down.
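As a sketch of the "(manually) configured" part, the Python below renders the two ZooKeeper-related properties from the Apache HDFS HA documentation in Hadoop's XML config layout: `dfs.ha.automatic-failover.enabled` in hdfs-site.xml and `ha.zookeeper.quorum` in core-site.xml. It assumes the rest of the HA setup (nameservice, NameNode IDs, shared edits directory, fencing) is already in place; the ZooKeeper hostnames are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict

HDFS_SITE: Dict[str, str] = {
    # hdfs-site.xml: let the ZKFC processes perform failover automatically.
    "dfs.ha.automatic-failover.enabled": "true",
}
CORE_SITE: Dict[str, str] = {
    # core-site.xml: the ZooKeeper ensemble the ZKFCs connect to (placeholder hosts).
    "ha.zookeeper.quorum": "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
}


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration>/<property> file layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(HDFS_SITE))
    print(to_hadoop_xml(CORE_SITE))
```

With the properties in place, the Apache docs have you initialize the HA state in ZooKeeper once with `hdfs zkfc -formatZK` and run a ZKFC process next to each NameNode.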
among the available datanodes
The failover has nothing to do with datanodes; ZooKeeper only helps decide which of the configured namenodes is active.

Differences between HDFS and ZooKeeper?

While reading ZooKeeper's documentation, it seems to me that HDFS relies on pretty much the same mechanisms of distribution/replication (broadly speaking) as ZooKeeper. I see echoes of one in the other, but I still can't distinguish them clearly and strictly.
I understand ZooKeeper is a Cluster Management / Sync tool, while HDFS is a Distributed File Management System, but could ZK be needed on an HDFS cluster for example?
Yes. A ZooKeeper quorum provides the coordination needed for distributed processing and high availability on a Hadoop cluster.
For example, take the Hadoop Namenode failover process.
Hadoop high availability is designed around an Active Namenode and a Standby Namenode for the failover process. At no point in time should you have two masters (active Namenodes).
ZooKeeper makes sure only one Namenode is active: the ZKFailoverController processes use a lock in ZooKeeper to elect the active Namenode.
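A minimal Python sketch of the idea behind that election, under loud assumptions: `ToyZooKeeper` is an in-memory stand-in (not the ZooKeeper client API), and the lock path and class names are hypothetical. It models the property the failover relies on: an ephemeral lock znode can have only one owner, and it disappears when that owner's session dies.

```python
from typing import Dict, Optional

LOCK_PATH = "/hadoop-ha/mycluster/lock"  # hypothetical znode path for this sketch


class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper: just ephemeral znodes owned by sessions."""

    def __init__(self) -> None:
        self.ephemeral: Dict[str, str] = {}  # znode path -> owning session id

    def create_ephemeral(self, path: str, session: str) -> bool:
        # Creation fails if the znode already exists, so at most one session owns the lock.
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session
        return True

    def expire_session(self, session: str) -> None:
        # A dead session's ephemeral znodes vanish, releasing any lock it held.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session}

    def owner(self, path: str) -> Optional[str]:
        return self.ephemeral.get(path)


class ToyFailoverController:
    """Toy ZKFC: its NameNode counts as active only while it holds the lock znode."""

    def __init__(self, zk: ToyZooKeeper, namenode: str) -> None:
        self.zk = zk
        self.namenode = namenode

    def try_become_active(self) -> bool:
        return self.zk.create_ephemeral(LOCK_PATH, session=self.namenode)


if __name__ == "__main__":
    zk = ToyZooKeeper()
    zkfc1, zkfc2 = ToyFailoverController(zk, "nn1"), ToyFailoverController(zk, "nn2")
    print(zkfc1.try_become_active(), zkfc2.try_become_active())  # True False
    zk.expire_session("nn1")          # nn1's machine dies; its session times out
    print(zkfc2.try_become_active())  # True: nn2 takes over
    print(zk.owner(LOCK_PATH))        # nn2
```

The toy leaves out fencing. Real HDFS also fences the old active NameNode (configured through `dfs.ha.fencing.methods`), so a node that merely lost its ZooKeeper session cannot keep acting as a second master.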

Secondary name node functionality

Can someone explain what exactly the words in bold, taken from a textbook, mean? What does "the state of the secondary namenode lags that of the primary" mean?
Secondary name node keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. **However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain.** The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.
Thanks in advance
Hadoop 1.x:
When you start a Hadoop cluster, it creates a file system image (fsimage) which keeps the metadata of your entire Hadoop cluster. When a new change comes into the Hadoop cluster, it goes to the edit log. The Secondary NameNode periodically reads the edits, retrieves the information, and merges it into the fsimage. If the NameNode fails, the Hadoop administrator can start the cluster again with the help of the fsimage and the edits (during startup the NameNode reads the fsimage and replays the edits, so there won't be data loss as long as both files survive).
The fsimage and the edit log already hold the up-to-date file system metadata, so in case of total failure of the primary, the Hadoop administrator can recover the cluster information with the help of the edit log and the fsimage, provided copies of them survive (for example on NFS).
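A toy Python model of the two points above may help: replaying the edits on top of the fsimage loses nothing, while the Secondary NameNode's checkpoint on its own lags behind. The namespace-as-dict representation and the `replay`/`checkpoint` names are hypothetical simplifications, not HDFS's real file formats or classes.

```python
from typing import Dict, List, Tuple

Namespace = Dict[str, str]  # toy fsimage: path -> "dir" or "file"
Edit = Tuple[str, str]      # toy edit-log record: (operation, path)


def replay(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Apply edit-log records on top of an fsimage, as a NameNode does at startup."""
    ns = dict(fsimage)  # leave the input image untouched
    for op, path in edits:
        if op == "mkdir":
            ns[path] = "dir"
        elif op == "create":
            ns[path] = "file"
        elif op == "delete":
            ns.pop(path, None)
        else:
            raise ValueError(f"unknown edit {op!r}")
    return ns


def checkpoint(fsimage: Namespace, edits: List[Edit]) -> Tuple[Namespace, List[Edit]]:
    """Secondary NameNode's job: merge edits into a new fsimage, then start an empty log."""
    return replay(fsimage, edits), []


if __name__ == "__main__":
    image, edits = checkpoint({"/": "dir"}, [("mkdir", "/logs"), ("create", "/logs/a.log")])
    secondary_copy = dict(image)             # what the Secondary NameNode holds
    edits.append(("create", "/logs/b.log"))  # a change made after the checkpoint
    print(sorted(replay(image, edits)))      # fsimage + edits: nothing lost
    print(sorted(secondary_copy))            # checkpoint alone lags: no b.log
```

This is exactly the "lag" in the textbook quote: whatever was logged after the last checkpoint exists only in the primary's edit log, hence the advice to keep the primary's metadata on NFS as well.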
Hadoop 2.x:
In Hadoop 1.x the NameNode was a single point of failure. A failure of the NameNode meant downtime for your entire Hadoop cluster, and planned maintenance events such as software or hardware upgrades on the NameNode machine would result in periods of cluster downtime. To overcome this issue, the Hadoop community added the High Availability feature. When setting up a Hadoop cluster, you can choose which type of cluster you want.
The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. Both NameNodes require the same type of hardware configuration.
In an HA configuration one NameNode is active and the other is in the standby state. The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of the NameNode. When the active NameNode goes down, the ZKFC makes the standby the active NameNode, and the former active NameNode becomes the standby when you start it again. You can read more about it on this website: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.8.0/bk_system-admin-guide/content/ch_hadoop-ha-5.html
In an HA Hadoop cluster, the Active NameNode writes its edit log to the JournalNodes (with Quorum-based Storage only). JournalNodes are separate daemons in an HA Hadoop cluster that store the shared edit log; the fsimage itself stays on the NameNodes.
The Standby NameNode is always kept synchronized with the Active NameNode; the two communicate through the JournalNodes. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby NameNode constantly monitors the edit logs on the JournalNodes and updates its namespace accordingly. In the event of a failover, the Standby NameNode ensures that its namespace is completely up to date with the edit logs before it changes to the active state. Once it is active, it starts writing the edit log to the JournalNodes.
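The "majority of these JNs" rule can be sketched in a few lines of Python. `ToyJournalNode` and `quorum_write` are hypothetical names for illustration, not the QJM classes; the sketch only shows why three JournalNodes keep accepting edits with one of them down but not with two.

```python
from typing import List


class ToyJournalNode:
    """Toy JournalNode: an append-only edit log that may be unreachable."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.reachable = True

    def append(self, record: str) -> bool:
        if not self.reachable:
            return False  # no acknowledgement from this JN
        self.log.append(record)
        return True


def quorum_write(journal_nodes: List[ToyJournalNode], record: str) -> bool:
    """Active NameNode side: send the edit to every JN; durable only with a majority of acks."""
    acks = sum(1 for jn in journal_nodes if jn.append(record))
    return acks > len(journal_nodes) // 2


if __name__ == "__main__":
    jns = [ToyJournalNode() for _ in range(3)]
    print(quorum_write(jns, "mkdir /logs"))         # True: 3 of 3 acks
    jns[0].reachable = False
    print(quorum_write(jns, "create /logs/a.log"))  # True: 2 of 3 is a majority
    jns[1].reachable = False
    print(quorum_write(jns, "create /logs/b.log"))  # False: 1 of 3 is not
```

The failed third write still leaves a record on the one reachable JN. Real QJM deals with such leftovers using epoch numbers and a recovery step, which this sketch deliberately omits.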
Hadoop doesn't keep any file data on the NameNode: all data resides on the DataNodes, so a NameNode failure does not lose the data blocks themselves (although without the NameNode's metadata they cannot be located).

Hadoop backup and recovery tool and guidance

I am new to Hadoop and need to learn the details of backup and recovery. I have studied Oracle backup and recovery; will it help with Hadoop? Where should I start?
There are a few options for backup and recovery. As s.singh points out, data replication is not DR.
HDFS supports snapshots. These can be used to protect against user errors, recover files, etc. That being said, this isn't DR in the event of a total failure of the Hadoop cluster. (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
Your best bet is keeping off-site backups. These can go to another Hadoop cluster, S3, etc., and can be made using distcp. (http://hadoop.apache.org/docs/stable1/distcp2.html), (https://wiki.apache.org/hadoop/AmazonS3)
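Combining the two suggestions, one common pattern is to snapshot a directory and then distcp the frozen snapshot, rather than the live directory, to the backup target. The Python sketch below only builds and optionally runs the standard `hdfs`/`hadoop` CLI commands; the directory, snapshot name, and backup cluster URI are placeholders, and it assumes the CLIs are on the PATH when `dry_run=False`. An S3 target would instead be an `s3a://` URI with the hadoop-aws connector configured.

```python
import shlex
import subprocess
from typing import List


def backup_commands(path: str, snapshot: str, target: str) -> List[List[str]]:
    """Snapshot an HDFS directory, then copy that frozen snapshot off-cluster with distcp."""
    return [
        # One-time setup: mark the directory as snapshottable (needs HDFS admin rights).
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        # Point-in-time, read-only view of the directory under <path>/.snapshot/<name>.
        ["hdfs", "dfs", "-createSnapshot", path, snapshot],
        # Copy the snapshot (not the live directory) to another cluster or object store.
        ["hadoop", "distcp", f"{path}/.snapshot/{snapshot}", target],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(backup_commands("/data", "nightly-001",
                        "hdfs://backup-nn.example.com:8020/backups/data"))
```

Copying from the snapshot path means files changing during the copy don't leave the backup in an inconsistent state.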
Here is a Slideshare by Cloudera discussing DR (http://www.slideshare.net/cloudera/hadoop-backup-and-disaster-recovery)
Hadoop is designed to work on big clusters with thousands of nodes, so data loss is less likely. You can increase the replication factor to replicate the data onto more nodes across the cluster.
Refer to Data Replication.
For Namenode metadata backup, you can use either the Secondary Namenode or Hadoop High Availability.
Secondary Namenode
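The Secondary Namenode keeps a checkpointed backup of the Namenode's metadata (the fsimage merged with the edit log). If the Namenode fails, you can recover that metadata, which maps files to their data blocks, from the Secondary Namenode, minus any changes made since the last checkpoint.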
High Availability
High Availability is a newer feature that runs more than one Namenode in the cluster. One Namenode is active and the other is in standby. Both Namenodes share the same edit log. If one Namenode fails, the other one becomes active and handles the operations.
But in most cases we also need to plan for backup and disaster recovery. Refer to brandon.bell's answer.
You can use the HDFS sync application on DataTorrent for DR use cases to back up high volumes of data from one HDFS cluster to another.
https://www.datatorrent.com/apphub/hdfs-sync/
It uses Apache Apex as a processing engine.
Start with the official documentation: HdfsUserGuide
Have a look at the SE posts below:
Hadoop 2.0 data write operation acknowledgement
Hadoop: HDFS File Writes & Reads
Hadoop 2.0 Name Node, Secondary Node and Checkpoint node for High Availability
How does Hadoop Namenode failover process works?
Documentation page regarding Recovery_Mode:
Typically, you will configure multiple metadata storage locations. Then, if one storage location is corrupt, you can read the metadata from one of the other storage locations.
However, what can you do if the only storage locations available are corrupt? In this case, there is a special NameNode startup mode called Recovery mode that may allow you to recover most of your data.
You can start the NameNode in recovery mode like so: namenode -recover

HBase HDFS zookeeper

I am learning about HBase now. I set up my HBase cluster and Hadoop cluster like this:
server1: Namenode, HMaster
server2: datanode1, RegionServer1, HQuorumPeer
server3: datanode2, RegionServer2, HQuorumPeer
server4: datanode3, RegionServer3, HQuorumPeer
I have several questions about the HBase cluster:
1: All RegionServers must be in the Hadoop cluster so they can use HDFS to store data, even though the data will be stored in the local file system, right?
2: What does a RegionServer do? Does the HMaster give jobs to all RegionServers and let them run in parallel, like a TaskTracker on a datanode?
3: What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and the master node?
4: This is related to #3. I know HBase uses ZooKeeper for recovery once a RegionServer is down. How does that work, specifically?
All RegionServers must be in the Hadoop cluster so they can use HDFS to store data, even though the data will be stored in the local file system, right?
Yes. RegionServers are the daemons that are responsible for storing data in an HBase cluster. You store data in HBase tables, which are spread over many regions on several RegionServers across the cluster. Although data goes into the RegionServers, it actually gets stored inside HDFS. But if you are on a standalone setup, HDFS is not used and the data gets stored directly in the local FS. It is analogous to any DB and FS; take MySQL and ext3, for example. And yes, in reality all the HDFS data is stored on your disks. You just can't see it directly as your original files, because the DataNodes store it as blocks.
What does a RegionServer do? Does the HMaster give jobs to all RegionServers and let them run in parallel, like a TaskTracker on a datanode?
As mentioned above, the RegionServer is the daemon that actually stores data in an HBase cluster. I'm sorry, I didn't quite get the second part of this question: what do you mean by "like a TaskTracker on a datanode"? In an HBase cluster, the HMaster is the daemon responsible for monitoring all RegionServer instances in the cluster, and it is the interface for all metadata changes. Its job is monitoring and management. RegionServers don't run jobs the way TaskTrackers do; they just store data and are responsible for things like serving and managing regions.
What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and the master node?
ZooKeeper is the guy who coordinates everything behind the curtains. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. A distributed HBase setup depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. By default, HBase manages a ZooKeeper cluster for you: it gets started and stopped as part of the HBase start/stop process. But you can also manage the ZooKeeper ensemble independently of HBase and just point HBase at the cluster it should use. You don't have to run ZooKeeper on all the nodes; just pick a number of servers that suits your cluster. One thing to note here is that you should always use an odd number of ZooKeeper servers.
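The reason for the odd number is majority arithmetic: a ZooKeeper ensemble keeps working only while a majority of its servers are up. The short Python calculation below (helper names are just for illustration) shows that adding a fourth server to a three-server ensemble raises the quorum size without letting the ensemble survive any more failures.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a ZooKeeper ensemble with `servers` members."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still up."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Three servers and four servers both survive one failure, and five and six both survive two, so the even-sized ensemble only adds hardware and more servers that must agree on every write.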
This is related to #3. I know HBase uses ZooKeeper for recovery once a RegionServer is down. How does that work, specifically?
Each RegionServer is connected to ZooKeeper, and the master watches these connections. ZooKeeper manages a heartbeat with a timeout, so on a timeout the HMaster declares the RegionServer dead and starts the recovery process. The following things happen during the recovery process:
Identifying that a node is down: a node can cease to respond simply because it is overloaded, or because it is dead.
Recovering the writes in progress: that is, reading the commit log and recovering the edits that were not flushed.
Reassigning the regions: the RegionServer was previously handling a set of regions. This set must be reallocated to other RegionServers, depending on their respective workloads.
The process is actually a bit more involved. You can find more on this here. I would also suggest going through the book HBase: The Definitive Guide by Lars George to get some grip on HBase.
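As a rough illustration of those three recovery steps, here is a toy Python model. Everything in it is hypothetical: in real HBase, detection is driven by the ZooKeeper session timeout (`zookeeper.session.timeout`) rather than explicit timestamps, and WAL splitting and region assignment are far more involved. The sketch only mirrors the logic of each step.

```python
from typing import Dict, List, Tuple

WalEntry = Tuple[int, str, str]  # toy commit-log entry: (sequence id, region, edit)


def find_dead(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Step 1: a RegionServer silent for longer than `timeout` is declared dead."""
    return sorted(rs for rs, seen in last_heartbeat.items() if now - seen > timeout)


def edits_to_replay(wal: List[WalEntry], flushed: Dict[str, int]) -> Dict[str, List[str]]:
    """Step 2: keep, per region, only the edits newer than that region's last flush."""
    pending: Dict[str, List[str]] = {}
    for seq, region, edit in wal:
        if seq > flushed.get(region, 0):
            pending.setdefault(region, []).append(edit)
    return pending


def reassign(orphans: List[str], live: Dict[str, List[str]]) -> Dict[str, str]:
    """Step 3: give each orphaned region to the least-loaded live RegionServer."""
    load = {rs: len(regions) for rs, regions in live.items()}
    plan: Dict[str, str] = {}
    for region in orphans:
        target = min(load, key=lambda rs: (load[rs], rs))  # ties broken by name
        plan[region] = target
        load[target] += 1
    return plan


if __name__ == "__main__":
    print(find_dead({"rs1": 100.0, "rs2": 98.0, "rs3": 40.0}, now=100.0, timeout=30.0))
    wal = [(1, "r1", "put a"), (2, "r2", "put b"), (3, "r1", "put c")]
    print(edits_to_replay(wal, {"r1": 1, "r2": 2}))
    print(reassign(["r1", "r2"], {"rs1": ["r4", "r5"], "rs2": ["r6"]}))
```

Step 2 is why the overloaded-versus-dead distinction matters: once a server is declared dead, its unflushed edits are replayed elsewhere, so a server that was merely slow must not keep serving those regions.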
HTH
