Can someone explain what the bold text below, taken from a textbook, means? What does "the state of the secondary namenode lags that of the primary" mean?
Secondary name node keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. **However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain.** The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.
Thanks in advance
Hadoop 1.x:
When you start a Hadoop cluster, the NameNode creates a filesystem image (fsimage) that holds the metadata for your entire cluster. Every new change to the namespace is recorded in the edit log. The Secondary NameNode periodically reads the edit log, merges those changes into the fsimage, and produces a new fsimage. If the NameNode fails, the Hadoop administrator can restart the cluster from the fsimage and the edit log (during startup the NameNode reads both the fsimage and the edits, so there won't be data loss).
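To make the fsimage/edit-log relationship concrete, here is a toy Python model. It is not Hadoop code: the class and method names are hypothetical, and the namespace is assumed to be just a dict of paths. Changes are appended to the edit log, a checkpoint (the Secondary NameNode's job) folds the edits into a new fsimage, and a restart replays whatever edits remain on top of the last fsimage.

```python
# Toy model of Hadoop 1.x NameNode metadata: an fsimage snapshot plus an
# append-only edit log. All class and method names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    fsimage: dict = field(default_factory=dict)  # path -> metadata at last checkpoint
    edits: list = field(default_factory=list)    # changes made since that checkpoint

    def apply_change(self, op, path, value=None):
        """Record a namespace change in the edit log (the fsimage is untouched)."""
        self.edits.append((op, path, value))

    @staticmethod
    def replay(image, edits):
        """Return a new namespace: `image` with every edit applied in order."""
        namespace = dict(image)
        for op, path, value in edits:
            if op == "create":
                namespace[path] = value
            elif op == "delete":
                namespace.pop(path, None)
        return namespace

    def checkpoint(self):
        """What the Secondary NameNode does: merge the edits into a new fsimage."""
        self.fsimage = self.replay(self.fsimage, self.edits)
        self.edits = []

    def restart(self):
        """On startup the NameNode loads the fsimage and replays the edits."""
        return self.replay(self.fsimage, self.edits)


meta = NameNodeMetadata()
meta.apply_change("create", "/logs/a.txt", {"blocks": ["blk_1"]})
meta.checkpoint()                                  # a.txt is now in the fsimage
meta.apply_change("create", "/logs/b.txt", {"blocks": ["blk_2"]})
meta.apply_change("delete", "/logs/a.txt")
print(sorted(meta.restart()))                      # ['/logs/b.txt']
```

Nothing is lost on restart here because both the fsimage and the edit log survive; the checkpoint only keeps the edit log short so the replay at startup stays fast.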
The fsimage and edit log already hold up-to-date filesystem metadata, so in case of a total failure of the primary, the Hadoop administrator can recover the cluster state with the help of the edit log and fsimage.
Hadoop 2.x:
In Hadoop 1.x the NameNode was a single point of failure: if the NameNode failed, the entire Hadoop cluster was down. Planned maintenance events such as software or hardware upgrades on the NameNode machine would also result in periods of cluster downtime. To overcome this issue, the Hadoop community added the High Availability feature. When setting up a Hadoop cluster, you can choose which type of cluster you want.
The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. Both NameNodes require the same hardware configuration.
In an HA configuration, one NameNode is active and the other is in standby state. The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of the NameNode. When the active NameNode goes down, the standby is made the active NameNode, and the old primary becomes the standby when you start it again. You can read more about it on this website: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.8.0/bk_system-admin-guide/content/ch_hadoop-ha-5.html
In an HA Hadoop cluster that uses Quorum-based Storage, the active NameNode writes its edit log to the JournalNodes (JNs). The JournalNodes are separate daemons in the HA cluster that store the shared edit log; the fsimage itself is not kept on them.
The standby NameNode stays synchronized with the active NameNode; the two communicate through the JournalNodes. When any namespace modification is performed by the active node, it durably logs a record of the modification to a majority of these JNs. The standby NameNode constantly monitors the edit logs on the JournalNodes and applies them to its own namespace. In the event of a failover, the standby NameNode ensures that its namespace is completely updated from the edit logs before it changes to the active state. Once it is active, it starts writing edits to the JournalNodes.
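The majority rule can be illustrated with a heavily simplified toy model. Everything here is hypothetical, not the Hadoop QJM API, and it omits the epochs that real QJM uses to fence an old writer and to recover partially written edits. An edit counts as durable only when more than half of the JournalNodes acknowledge it, and the standby catches up by reading back every edit newer than the last transaction ID it applied.

```python
# Toy model of quorum-journal edit logging. Names are hypothetical; real QJM
# also uses epochs and recovery of partial writes, which this sketch omits.


class JournalNode:
    def __init__(self, up=True):
        self.up = up
        self.edits = []                      # (txid, record) pairs

    def journal(self, txid, record):
        if not self.up:
            return False                     # an unreachable JN cannot acknowledge
        self.edits.append((txid, record))
        return True


def log_edit(journal_nodes, txid, record):
    """Active NameNode side: durable only if a strict majority of JNs acked."""
    acks = sum(jn.journal(txid, record) for jn in journal_nodes)
    return acks > len(journal_nodes) // 2


def tail_edits(journal_nodes, last_applied_txid):
    """Standby side: collect every edit newer than what it has already applied."""
    seen = {}
    for jn in journal_nodes:
        if jn.up:
            for txid, record in jn.edits:
                if txid > last_applied_txid:
                    seen[txid] = record
    return [seen[t] for t in sorted(seen)]


jns = [JournalNode(), JournalNode(), JournalNode(up=False)]  # one JN is down
print(log_edit(jns, 1, "mkdir /a"))   # True: 2 of 3 acknowledged
print(log_edit(jns, 2, "mkdir /b"))   # True
print(tail_edits(jns, 0))             # ['mkdir /a', 'mkdir /b'] -> standby catches up
jns[1].up = False                     # now only 1 of 3 JNs is reachable
print(log_edit(jns, 3, "mkdir /c"))   # False: no majority, the write is not durable
```

Because a write needs a majority, three JournalNodes tolerate one failure; in general 2f+1 JournalNodes tolerate f failures.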
Hadoop doesn't keep any file data on the NameNode; all data resides on the DataNodes, so in case of a NameNode failure there won't be any loss of data.
I am reading "Hadoop: The Definitive Guide". This is how the author explains fault tolerance before Hadoop 2.x:
Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode were obliterated, all the files on the filesystem would be lost since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this.
The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine because it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.
My understanding is that NFS is always in sync with the primary namenode. My question is: how does the metadata stored on NFS get synced with the primary namenode after the secondary namenode has updated the primary namenode's metadata? What happens if the primary fails completely before NFS gets synced?
That document doesn't say the "primary" or Secondary NameNode is necessarily in sync with NFS. It's saying that if you have configured NameNode backups to NFS (something you must do yourself, I believe, as it says this is a "configuration choice"), you can restore them to a new server and designate it as the new NameNode. Note that "despite its name (the secondary namenode) does not act as a namenode", and "the state of the secondary namenode lags that of the primary"; therefore it never gets data that didn't already arrive on the primary, it only checkpoints what's already there.
That quoted section is alluding to having a Standby NameNode, which serves a different purpose than the secondary, and the standby should be in sync with the active NameNode.
Quoted from that link:
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.
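A toy model may help with both "what does lag mean?" and the NFS question. It assumes, as the book says, that the NameNode writes each edit synchronously to every configured storage directory (here one local and one on NFS) before returning, while the secondary only copies state when it checkpoints. The class names and paths are hypothetical. Any edit logged after the last checkpoint exists on the NFS copy but not on the secondary, which is exactly the lag the book describes.

```python
# Toy illustration (hypothetical names and paths) of why the secondary "lags"
# the primary while a synchronously written NFS copy does not.


class PrimaryNameNode:
    def __init__(self, storage_dirs):
        # Each directory holds a full copy of the edit log, e.g. local disk + NFS.
        self.storage_dirs = {d: [] for d in storage_dirs}

    def log_edit(self, record):
        # The edit goes to every configured directory before the call returns,
        # mirroring the synchronous writes described in the book.
        for edits in self.storage_dirs.values():
            edits.append(record)


class SecondaryNameNode:
    def __init__(self):
        self.checkpoint = []       # state as of the last checkpoint only

    def do_checkpoint(self, primary, source_dir):
        # Copy (and, in real HDFS, merge) whatever the primary has logged so far.
        self.checkpoint = list(primary.storage_dirs[source_dir])


primary = PrimaryNameNode(["/data/dfs/name", "/mnt/nfs/dfs/name"])
secondary = SecondaryNameNode()

primary.log_edit("create /a")
primary.log_edit("create /b")
secondary.do_checkpoint(primary, "/data/dfs/name")
primary.log_edit("create /c")      # logged after the last checkpoint

# Suppose the primary machine is now destroyed.
nfs_copy = primary.storage_dirs["/mnt/nfs/dfs/name"]
print(secondary.checkpoint)        # ['create /a', 'create /b'] -> /c would be lost
print(nfs_copy)                    # ['create /a', 'create /b', 'create /c']
```

This is why the book recommends restoring from the NFS copy rather than relying on the secondary's checkpoint alone.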
I can't understand the difference between the secondary NameNode, the standby NameNode, and the backup NameNode. I am looking for an in-depth understanding of these terms. Kindly help me out with this.
The secondary NameNode is just a helper for the NameNode.
It fetches the edit logs from the NameNode at regular intervals and applies them to the fsimage.
Once it has a new fsimage, it copies it back to the NameNode.
The NameNode will use this fsimage on the next restart, which reduces the startup time.
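How often those intervals occur is configurable. In Hadoop 2.x the relevant properties are dfs.namenode.checkpoint.period (default 3600 seconds) and dfs.namenode.checkpoint.txns (default 1,000,000 uncheckpointed transactions), and a checkpoint is triggered when either limit is reached. The function below is a hypothetical sketch of that rule, not Hadoop's own code.

```python
# Sketch of the decision made on each periodic check: checkpoint when either
# enough time has passed or enough transactions have piled up. The defaults
# mirror dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns;
# the function itself is hypothetical.

CHECKPOINT_PERIOD_S = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(now_s, last_checkpoint_s, current_txid, last_checkpoint_txid,
                      period_s=CHECKPOINT_PERIOD_S, max_txns=CHECKPOINT_TXNS):
    """Return True if a new checkpoint is due."""
    elapsed = now_s - last_checkpoint_s
    uncheckpointed = current_txid - last_checkpoint_txid
    return elapsed >= period_s or uncheckpointed >= max_txns


print(should_checkpoint(1800, 0, 5_000, 0))      # False: 30 minutes, few transactions
print(should_checkpoint(3600, 0, 5_000, 0))      # True: an hour has passed
print(should_checkpoint(600, 0, 1_200_000, 0))   # True: too many transactions
```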
The Secondary NameNode's whole purpose is to create checkpoints in HDFS. It's just a helper node for the NameNode; that's why it is also known as the checkpoint node.
But it can't replace the NameNode if the NameNode fails.
So the NameNode is still a single point of failure.
To overcome this issue, the Standby NameNode comes into the picture.
It does three things:
1. It merges the fsimage and edit log files (the Secondary NameNode's work).
2. It receives online updates of the filesystem metadata, applies them to its in-memory state, and persists them to disk just like the NameNode does. Thus at any time the Backup node contains an up-to-date image of the namespace both in memory and on local disk(s).
3. It takes over if the active NameNode dies: the cluster switches over to this standby node.
The answer above is satisfactory, but I want to add some points to it.
About Standby-Namenode
Both the active and standby NameNodes use a shared directory, and the standby NameNode syncs through that directory from time to time, so there should be no delay in activating it if the active NameNode goes down.
But the main factor is the block reports. Block locations are not written to the edit log; they come from the blocks each DataNode stores on its local disks. So syncing through a shared directory is not enough.
To solve this, the DataNodes are configured with the addresses of both NameNodes and send their block reports to both of them, but they only follow block commands coming from the active NameNode.
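Here is a toy sketch of that behaviour, using hypothetical classes rather than Hadoop's DataNode code. It simply assumes the DataNode knows each NameNode's state; in a real cluster the DataNode learns this from the NameNodes themselves.

```python
# Toy model (hypothetical names) of a DataNode in an HA pair: it reports its
# blocks to both NameNodes but executes commands only from the active one.


class NameNodeView:
    def __init__(self, name, state):
        self.name = name
        self.state = state              # "active" or "standby"
        self.block_locations = {}       # block id -> set of DataNode ids

    def receive_block_report(self, datanode_id, blocks):
        for blk in blocks:
            self.block_locations.setdefault(blk, set()).add(datanode_id)


class DataNode:
    def __init__(self, dn_id, blocks, namenodes):
        self.dn_id = dn_id
        self.blocks = set(blocks)
        self.namenodes = namenodes      # configured with BOTH NameNodes

    def send_block_reports(self):
        for nn in self.namenodes:
            nn.receive_block_report(self.dn_id, sorted(self.blocks))

    def handle_command(self, sender, command, blk):
        if sender.state != "active":
            return False                # ignore commands from the standby
        if command == "delete":
            self.blocks.discard(blk)
        return True


nn1 = NameNodeView("nn1", "active")
nn2 = NameNodeView("nn2", "standby")
dn = DataNode("dn1", ["blk_1", "blk_2"], [nn1, nn2])
dn.send_block_reports()
print(nn2.block_locations)                        # standby also knows where blocks live
print(dn.handle_command(nn2, "delete", "blk_1"))  # False: standby's command ignored
print(dn.handle_command(nn1, "delete", "blk_1"))  # True: active's command executed
print(sorted(dn.blocks))                          # ['blk_2']
```

Because the standby already holds current block locations, it can serve requests right after a failover instead of waiting for every DataNode to report in.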
Hope this is helpful
Standby Node: In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode. Planned maintenance events such as software or hardware upgrades on the NameNode machine could result in whole-cluster downtime. So a Standby Node comes into action, which is nothing but a backup for the NameNode.
Secondary NameNode: It is one of the most poorly named parts of the Hadoop ecosystem; beginners usually get confused and think of it as a backup. The Secondary NameNode in Hadoop is a specially dedicated node in the HDFS cluster whose main function is to take checkpoints of the filesystem metadata present on the NameNode. It is not a backup NameNode. It just checkpoints the NameNode's filesystem namespace. The Secondary NameNode is a helper to the primary NameNode, not a replacement for it.
The Secondary NameNode periodically merges the fsimage with the edit log transactions and stores the result in a shared storage location in the case of an HA-enabled HDFS cluster.
On the other hand, the Standby node can transfer the latest fsimage it has built to the active NameNode via an HTTP GET call.
So the main difference between the Secondary and Standby NameNodes is that the Secondary NameNode does not upload the merged fsimage to the active NameNode, whereas the Standby node uploads the newly merged image back to the active NameNode.
So the NameNode needs to fetch the state from the Secondary NameNode.
I am a bit confused about Hadoop NameNode HA using QJM and HDFS Federation. Both use multiple NameNodes and both provide high availability. I am not able to decide which architecture to use for NameNode high availability, since both look exactly the same except for the QJM part.
Please pardon me if this is not the type of question to be discussed here.
The main difference between HDFS High Availability and HDFS Federation would be that the namenodes in Federation aren't related to each other.
In HDFS Federation, the NameNodes share the DataNodes as common storage, but each NameNode manages its own namespace and its own block pool. This provides fault isolation: if one NameNode in a federation fails, it doesn't affect the data of the other NameNodes.
So, Federation = Multiple namenodes and no correlation.
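One way to picture "multiple namenodes and no correlation" is a client-side mount table, which is the idea behind ViewFs: each path is owned by exactly one nameservice, so losing the NameNode that serves /data makes /data unavailable but leaves /user untouched. The table and nameservice names below are made up, and the function is a sketch of longest-prefix routing, not the ViewFs implementation.

```python
# Sketch of routing paths to independent federated NameNodes via a mount
# table. The table and nameservice names (ns1, ns2, ns3) are hypothetical.

MOUNT_TABLE = {
    "/user": "ns1",
    "/data": "ns2",
    "/data/archive": "ns3",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return the nameservice that owns `path` (longest matching mount point)."""
    best = None
    for mount_point in mount_table:
        # Match the mount point itself or anything strictly below it,
        # so "/database" does not match "/data".
        if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point for {path}")
    return mount_table[best]


print(resolve("/user/alice/notes.txt"))      # ns1
print(resolve("/data/archive/2015/x.csv"))   # ns3
print(resolve("/data/raw/y.csv"))            # ns2
```

Nothing in this routing makes one nameservice take over for another; that is the job of HA, described next.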
In the case of HDFS HA, on the other hand, there are two NameNodes: the primary NN and the standby NN.
The primary NN works hard all the time, while the standby NN just sits there, chills, and updates its metadata from the primary NameNode once in a while, which makes them related.
When the primary NN gets tired of this routine (i.e. it fails), the standby NameNode takes over with the most recent metadata it has.
For an HA architecture, you need at least two separate machines configured as NameNodes, of which only one should run in the active state.
More details here: HDFS High Availability
As far as I know, Hadoop 1.x had a secondary namenode, which was used to create an image of the primary namenode and to update the primary namenode when it failed and started up again. But what is the use of the secondary namenode in Hadoop 2.x, given that we already have a hot standby?
As far as I know, a Hadoop 2.x cluster can be set up in two ways:
1. With HA (High Availability cluster): if you are setting up an HA cluster, you may not need a Secondary NameNode, because the standby NameNode keeps its state synchronized with the active NameNode.
The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. Both NameNodes require the same hardware configuration. In an HA Hadoop cluster, the active NameNode writes its edit log to separate JournalNodes.
In the event of a failover, the standby NameNode ensures that its namespace is completely updated according to the edit logs before it changes to the active state. So there is no need for a Secondary NameNode in this cluster setup.
2. Without HA: you can have a Hadoop setup without a standby node. The Secondary NameNode then works as you already described for Hadoop 1.x.
When you configure HA for NameNodes, Secondary Namenode is not used. However you can still configure HDFS without HA (with NameNode and Secondary NameNode). This part didn't change much since hadoop 1.x.
I am new to Hadoop and need to learn the details of backup and recovery. I have studied Oracle backup and recovery; will that help with Hadoop? Where should I start?
There are a few options for backup and recovery. As s.singh points out, data replication is not DR.
HDFS supports snapshotting. This can be used to prevent user errors, recover files, etc. That being said, this isn't DR in the event of a total failure of the Hadoop cluster. (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
Your best bet is keeping off-site backups. This can be to another Hadoop cluster, S3, etc., and can be performed using distcp. (http://hadoop.apache.org/docs/stable1/distcp2.html), (https://wiki.apache.org/hadoop/AmazonS3)
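Snapshots and distcp can be combined so that an off-site backup copies a frozen, point-in-time snapshot instead of a live directory that may change mid-copy. The sketch below only assembles the command lines and runs nothing. The directory, snapshot name and destination URI are hypothetical placeholders; `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot`, the `.snapshot` path, and `distcp -update` are standard HDFS features.

```python
# Sketch that only *builds* the shell commands for a snapshot-then-copy
# off-site backup; it does not execute them. Path, snapshot name and
# destination are hypothetical placeholders.


def backup_commands(path, snapshot_name, destination):
    snapshot_path = f"{path.rstrip('/')}/.snapshot/{snapshot_name}"
    return [
        # One-time step: an administrator makes the directory snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        # Take a read-only, point-in-time snapshot of the directory.
        ["hdfs", "dfs", "-createSnapshot", path, snapshot_name],
        # Copy the frozen snapshot (not the live directory) to the backup target;
        # -update skips files that already match at the destination.
        ["hadoop", "distcp", "-update", snapshot_path, destination],
    ]


for cmd in backup_commands("/data/sales", "nightly",
                           "hdfs://backup-cluster:8020/backups/sales"):
    print(" ".join(cmd))
```

To actually run the commands on a host with a Hadoop client, each list could be passed to `subprocess.run(cmd, check=True)`; the `-allowSnapshot` step only needs to be done once per directory.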
Here is a Slideshare by Cloudera discussing DR (http://www.slideshare.net/cloudera/hadoop-backup-and-disaster-recovery)
Hadoop is designed to run on big clusters with thousands of nodes, and because every block is replicated, data loss is less likely. You can increase the replication factor to replicate the data to more nodes across the cluster.
Refer to Data Replication.
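Some back-of-the-envelope arithmetic shows the trade-off behind the replication factor (HDFS defaults to 3): raw disk usage grows linearly with the factor, while the chance of losing every replica of a block shrinks geometrically. The independence assumption below is a deliberate simplification (real failures are correlated, e.g. a whole rack going down), and the 10 TB and 1% figures are made-up inputs, so treat the numbers as illustrative only.

```python
# Back-of-the-envelope arithmetic for the replication factor. Assumes, purely
# for illustration, that each replica is lost independently with probability p;
# real failures are correlated, so results are rough at best.


def raw_storage_tb(logical_tb, replication):
    """Disk space consumed when every block is stored `replication` times."""
    return logical_tb * replication


def p_block_lost(p_replica_lost, replication):
    """Chance that all replicas of one block are lost, under independence."""
    return p_replica_lost ** replication


for r in (1, 2, 3, 5):
    print(f"replication={r}: raw={raw_storage_tb(10, r)} TB, "
          f"P(all replicas lost)={p_block_lost(0.01, r):.0e}")
```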
For NameNode metadata backup, you can use either the Secondary NameNode or Hadoop High Availability.
Secondary Namenode
The Secondary NameNode takes a backup of the NameNode's logs (fsimage and edit logs). If the NameNode fails, you can recover this metadata, which holds the file-to-block information, from the Secondary NameNode.
High Availability
High Availability is a newer feature that runs more than one NameNode in the cluster. One NameNode is active and the other is in standby. The edit log is shared by both NameNodes. If one NameNode fails, the other becomes active and handles the operations.
But in most cases we also need to plan for backup and disaster recovery; refer to brandon.bell's answer.
You can use the HDFS sync application on DataTorrent for DR use cases to backup high volumes of data from one HDFS cluster to another.
https://www.datatorrent.com/apphub/hdfs-sync/
It uses Apache Apex as a processing engine.
Start with the official documentation: HdfsUserGuide
Have a look at the SE posts below:
Hadoop 2.0 data write operation acknowledgement
Hadoop: HDFS File Writes & Reads
Hadoop 2.0 Name Node, Secondary Node and Checkpoint node for High Availability
How does Hadoop Namenode failover process works?
Documentation page regarding Recovery_Mode:
Typically, you will configure multiple metadata storage locations. Then, if one storage location is corrupt, you can read the metadata from one of the other storage locations.
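In Hadoop 2.x these locations are listed, comma-separated, in dfs.namenode.name.dir, and the NameNode writes its metadata to each of them. The fallback described here can be sketched as a toy loader with hypothetical names (real NameNodes have their own on-disk format and checks): each copy carries a checksum, and the first copy whose checksum verifies is used.

```python
# Toy version (hypothetical names) of "if one storage location is corrupt,
# read the metadata from another": each directory keeps an image plus an MD5
# checksum, and the loader uses the first copy whose checksum verifies.
import hashlib


def make_copy(image_bytes):
    return {"image": image_bytes, "md5": hashlib.md5(image_bytes).hexdigest()}


def load_first_valid(storage_dirs):
    """Return (dir, image) from the first directory whose checksum matches."""
    for name, copy in storage_dirs.items():
        if hashlib.md5(copy["image"]).hexdigest() == copy["md5"]:
            return name, copy["image"]
    # Every copy is corrupt: this is the situation Recovery mode is meant for.
    raise RuntimeError("all metadata copies are corrupt")


good = make_copy(b"fsimage: /a /b")
dirs = {
    "/disk1/name": {"image": b"fsimage: /a /\x00", "md5": good["md5"]},  # corrupted
    "/nfs/name": good,
}
print(load_first_valid(dirs))   # ('/nfs/name', b'fsimage: /a /b')
```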
However, what can you do if the only storage locations available are corrupt? In this case, there is a special NameNode startup mode called Recovery mode that may allow you to recover most of your data.
You can start the NameNode in recovery mode like so: namenode -recover