Do Hadoop namenode dirs synchronize after a failure? - hadoop

What happens in this scenario:
The namenode is writing to two directories on two different drives: one is local and the other is a mounted remote directory. Now the namenode fails, so we launch a namenode process on the remote machine (it has a copy of the original namenode data, so it's safe) and change the namenode's IP on all datanodes. After a while, we manage to bring the original namenode back to life with exactly the previous configuration, and we stop the namenode process on the remote machine. Will the local fs.name.dir now be synchronized with the mounted one (I mean the diff that accumulated on the mounted dir while the original namenode was down), or will there be a problem with namenode data consistency?

"Now, will the local fs.name.dir be synchronized with the mounted one (I mean the diff which was accumulated on the mounted dir, while the original namenode was down) or will there be a problem with namenode data consistency?"
The local NN data will be out of date for the period it was down. None of the changes made to the NN namespace on the remote machine will be present in the local copy.
Note that the NN only stores the filesystem namespace and the namespace-to-block-ID mapping. Block locations are not stored in the NN; when a DN starts, it sends a block report to the NN.
Check this and this on the HDFS NN HA.
"change namenode's ip on all datanodes."
"After a while, we manage to bring the original namenode to life with exact previous configuration. We stop the namenode process on the remote machine."
There will be downtime in both of the above-mentioned scenarios.
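To make the "out of date" point concrete, here is a minimal toy model in Python. It is not Hadoop code: the NameDir class, the paths, and the edit strings are all hypothetical, and it only illustrates how a directory that misses writes falls behind the one that keeps receiving them.

```python
from dataclasses import dataclass, field


@dataclass
class NameDir:
    """Toy stand-in for one fs.name.dir copy (hypothetical, not a Hadoop class)."""
    path: str
    edits: list = field(default_factory=list)

    @property
    def last_txid(self):
        # In this toy model the transaction id is simply the number of edits.
        return len(self.edits)


def write_edit(dirs, op):
    """A running namenode appends each edit to every directory it can reach."""
    for d in dirs:
        d.edits.append(op)


def missing_edits(stale, fresh):
    """Edits present in `fresh` that never reached `stale`."""
    return fresh.edits[stale.last_txid:]


local = NameDir("/data/name")            # hypothetical path
mounted = NameDir("/mnt/remote/name")    # hypothetical path

# Normal operation: both directories receive every edit.
write_edit([local, mounted], "mkdir /a")
write_edit([local, mounted], "create /a/f1")

# Original host is down: the replacement namenode only writes the mounted copy.
write_edit([mounted], "create /a/f2")
write_edit([mounted], "delete /a/f1")

gap = missing_edits(local, mounted)
print(local.last_txid, mounted.last_txid, gap)
```

In this sketch the local copy stops at transaction 2 while the mounted copy reaches 4; the two missing edits are exactly the namespace changes that the local directory does not have when the original machine comes back.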

Related

Does the secondary namenode also update metadata stored on NFS?

I am reading "Hadoop: The Definitive Guide". This is how the author explains fault tolerance before Hadoop 2.x:
Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode were obliterated, all the files on the filesystem would be lost since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this.
The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine because it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.
My understanding is that NFS is always in sync with the primary namenode. My question is: how does the metadata stored on NFS get synced with the primary namenode after the secondary namenode has updated the primary's metadata? What happens if the primary fails totally before NFS gets synced?
That passage doesn't say that the primary or the Secondary NameNode is necessarily in sync with NFS. It says that if you have configured NameNode backups to NFS (something you must do yourself, I believe, since it calls this a "configuration choice"), you can restore them to a new server and designate it as the new NameNode. Note that "despite its name" the secondary namenode "does not act as a namenode", and "the state of the secondary namenode lags that of the primary". It will therefore never receive data that didn't already arrive on the primary; it only checkpoints what is already there.
That quoted section is alluding to having a Standby NameNode, which serves a different purpose from the secondary, and the standby should be in sync.
Quoted from that link:
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.
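The checkpoint-and-lag behaviour described in the book passage can be sketched with a small toy model. This is an illustration only, assuming a namespace that is just a set of paths and an edit log of (operation, path) pairs; apply_edits and the variable names are hypothetical and do not correspond to Hadoop APIs.

```python
def apply_edits(image, edits):
    """Replay edit-log entries onto a namespace image and return a new image."""
    merged = set(image)
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


# State on the primary namenode: last checkpointed image plus edits since then.
fsimage = {"/a"}
edit_log = [("create", "/b"), ("delete", "/a")]

# Checkpoint: the secondary merges what it fetched and keeps a copy of the result.
secondary_image = apply_edits(fsimage, edit_log)
fsimage, edit_log = secondary_image, []

# The primary keeps accepting writes after the checkpoint.
edit_log.append(("create", "/c"))

primary_state = apply_edits(fsimage, edit_log)
missing_from_secondary = primary_state - secondary_image
print(sorted(secondary_image), sorted(primary_state), sorted(missing_from_secondary))
```

In this sketch anything written after the last checkpoint ("/c") exists only in the primary's edit log, which is why the book recommends restoring from the NFS copy of the primary's metadata rather than from the secondary's older image.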

Corrupted block in hdfs cluster

The screenshot added below shows the output of hdfs fsck /. It shows that the "/" directory is corrupted. This is the master node of my Hadoop cluster. What should I do?
If you are using Hadoop 2, you can run a Standby namenode to achieve High Availability. Without that, your cluster's master will be a Single Point of Failure.
You cannot retrieve the Namenode's data from anywhere else, since it is different from the usual data you store. If your namenode goes down, your blocks and files will still be there, but you won't be able to access them because the related metadata in the namenode is gone.

Difference between secondary name node and standby name node in Hadoop

I can't understand the difference between the secondary name node, the standby name node, and the backup name node. I am looking for an in-depth understanding of these terms. Kindly help me out with this.
The Secondary namenode is just a helper for the Namenode.
It gets the edit logs from the namenode at regular intervals and applies them to the fsimage.
Once it has the new fsimage, it copies it back to the namenode.
The namenode will use this fsimage at the next restart, which reduces the startup time.
The Secondary Namenode's whole purpose is to create checkpoints in HDFS. It's just a helper node for the namenode, which is why it is also known as the checkpoint node.
But it can't replace the namenode when the namenode fails.
So the namenode is still a single point of failure.
To overcome this issue, the standby namenode comes into the picture.
It does three things:
It merges the fsimage and edit-log files (the secondary namenode's work).
It receives online updates of the file system metadata, applies them to its in-memory state, and persists them on disk just like the namenode does. Thus at any time the standby (backup) node contains an up-to-date image of the namespace both in memory and on local disk(s).
It takes over as the new namenode: the cluster switches over to this standby node if the active namenode dies.
The answer above is satisfactory, but I want to add some points to it.
About the Standby Namenode
Both the active and the standby Namenode use a shared directory, and the standby Namenode syncs through that directory from time to time, so there should be little delay in activating it if the active Namenode goes down.
But the main factor is the block reports. Block reports are not written to the edit log; each Namenode builds its block-location map from the reports it receives, so syncing through a shared directory is not enough.
To avoid this problem, the datanodes have the addresses of both namenodes,
and they send block reports to both of them, but they only follow the block commands coming from the active Namenode.
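As a rough illustration of that last point, here is a toy Python model. The NameNode and DataNode classes, the block ids, and the "delete" command are hypothetical stand-ins rather than Hadoop classes; the sketch only shows reports going to both namenodes while commands from the standby are ignored.

```python
class NameNode:
    def __init__(self, name, active):
        self.name = name
        self.active = active
        self.block_map = {}  # block id -> set of datanode names, built from reports

    def receive_block_report(self, datanode, blocks):
        for block in blocks:
            self.block_map.setdefault(block, set()).add(datanode)


class DataNode:
    def __init__(self, name, blocks, namenodes):
        self.name = name
        self.blocks = set(blocks)
        self.namenodes = namenodes

    def send_block_reports(self):
        # Every configured namenode gets the same report, active or standby.
        for nn in self.namenodes:
            nn.receive_block_report(self.name, self.blocks)

    def handle_command(self, sender, command, block):
        """Act on a block command only if it comes from the active namenode."""
        if not sender.active:
            return False
        if command == "delete":
            self.blocks.discard(block)
        return True


nn1 = NameNode("nn1", active=True)
nn2 = NameNode("nn2", active=False)
dn = DataNode("dn1", ["blk_1", "blk_2"], [nn1, nn2])

dn.send_block_reports()
ignored = dn.handle_command(nn2, "delete", "blk_1")   # standby: not followed
obeyed = dn.handle_command(nn1, "delete", "blk_1")    # active: followed
print(nn1.block_map == nn2.block_map, ignored, obeyed, sorted(dn.blocks))
```

Because both namenodes hold the same block map in this model, the standby could start serving immediately after a failover without waiting for fresh block reports.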
Hope this is helpful
Standby Node: In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode. Planned maintenance events such as software or hardware upgrades on the NameNode machine could also result in whole-cluster downtime. This is where a Standby Node comes into action; it is essentially a backup for the NameNode.
Secondary NameNode: It is one of the most poorly named parts of the Hadoop ecosystem; beginners usually get confused and think of it as a backup. The Secondary NameNode is a dedicated node in an HDFS cluster whose main function is to take checkpoints of the file system metadata present on the namenode. It is not a backup namenode; it just checkpoints the namenode's file system namespace. The Secondary NameNode is a helper to the primary NameNode, not a replacement for it.
The Secondary namenode periodically merges the fsimage with the edit log transactions and, in an HA-enabled HDFS cluster, stores them in a shared storage location.
On the other hand, the Standby node can transfer the latest built fsimage to the Active NameNode via an HTTP GET call.
So the main difference between the Secondary and the Standby namenode is that the Secondary namenode does not upload the merged fsimage (with the edit logs applied) to the active namenode,
whereas the Standby node uploads the newly merged image back to the active Namenode.
So the NameNode needs to fetch the state from the Secondary NameNode.

How to delete datanode from hadoop clusters without losing data

I want to remove a datanode from my Hadoop cluster, but I don't want to lose my data. Is there any technique so that the data on the node I am going to remove gets replicated to the remaining datanodes?
What is the replication factor of your Hadoop cluster?
If it is the default, which is generally 3, you can delete the datanode directly, since the data automatically gets re-replicated. This process is controlled by the name node.
If you changed the replication factor of the cluster to 1, the data on the node will be lost when you delete it, because there is no other replica to copy it from.
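The reasoning about replication factors can be checked with a small sketch. This is plain Python over a made-up block-location table, not an HDFS API call; blocks_lost_if_removed and the block and node names are hypothetical.

```python
def blocks_lost_if_removed(block_locations, node):
    """Return the blocks whose only replica(s) live on `node`.

    A block with no recorded replica at all is also reported, since it has
    nothing to be copied from.
    """
    return sorted(
        block
        for block, nodes in block_locations.items()
        if set(nodes) <= {node}
    )


# Hypothetical layout: blk_1 has three replicas, blk_2 only one.
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn3"],
}

at_risk = blocks_lost_if_removed(block_locations, "dn3")
safe = blocks_lost_if_removed(block_locations, "dn1")
print(at_risk, safe)
```

Here removing dn3 abruptly would lose blk_2 because it has no other replica, while removing dn1 loses nothing.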
Check that all the current data nodes are healthy. For this you can go to the Hadoop master admin console, under the Data nodes tab; the address is normally something like http://server-hadoop-master:50070
Add the server you want to delete to the file /opt/hadoop/etc/hadoop/dfs.exclude, using the full domain name, on the Hadoop master and all the current datanodes (your config directory may be different, so please double-check this).
Refresh the cluster nodes configuration by running the command hdfs dfsadmin -refreshNodes from the Hadoop name node master.
Check the Hadoop master admin home page for the state of the server being removed, in the "Decommissioning" section. This may take from a couple of minutes to several hours or even days, depending on the volume of data you have.
Once the server is shown as decommissioned, you may delete the server.
NOTE: if you have other services like Yarn running on the same server, the process is relatively similar, but with the file /opt/hadoop/etc/hadoop/yarn.exclude and then running yarn rmadmin -refreshNodes from the Yarn master node.
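The exclude-file step above could be scripted roughly as follows. This is a sketch under assumptions: add_to_exclude and refresh_nodes are hypothetical helper names, the hostname is made up, and the demo writes to a temporary directory instead of the real config path. It also assumes the exclude file is the one your hdfs-site.xml points to through dfs.hosts.exclude.

```python
import subprocess
import tempfile
from pathlib import Path

# Command quoted in the steps above; run it on the name node master.
REFRESH_CMD = ["hdfs", "dfsadmin", "-refreshNodes"]


def add_to_exclude(exclude_file, hostname):
    """Append `hostname` to the exclude file unless it is already listed."""
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if hostname not in hosts:
        hosts.append(hostname)
        path.write_text("\n".join(hosts) + "\n")
    return hosts


def refresh_nodes():
    """Ask the namenode to re-read its include/exclude lists (needs a real cluster)."""
    subprocess.run(REFRESH_CMD, check=True)


# Demo against a temporary file so nothing on a real cluster is touched.
with tempfile.TemporaryDirectory() as tmp:
    exclude = Path(tmp) / "dfs.exclude"
    add_to_exclude(exclude, "dn3.example.com")
    hosts = add_to_exclude(exclude, "dn3.example.com")  # second call is a no-op
    contents = exclude.read_text()

print(hosts, repr(contents))
```

On a real cluster you would call add_to_exclude with the actual exclude path and then refresh_nodes(), and keep watching the "Decommissioning" section before shutting the server down.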

Hadoop Namenode without HDFS storage

I have installed a Hadoop cluster with a total of 3 machines, with 2 nodes acting as datanodes and 1 node acting as both Namenode and Datanode.
I wanted to clear up certain doubts regarding Hadoop cluster installation and architecture.
Here is a list of questions I am looking for answers to:
I uploaded a data file around 500mb size in the cluster and then checked the hdfs report.
I noticed that the namenode I set up is also holding 500 MB in HDFS, along with the datanodes, with a replication factor of 2.
The problem here is that I want the namenode not to store any data. In short, I don't want it to work as a datanode, because it is also storing the file I am uploading. So how can I make it act only as a Master Node and not as a datanode?
I tried running the command hadoop-daemon.sh stop on the Namenode to stop the datanode services on it, but it wasn't of any help.
How much metadata does a Namenode generate for a filesize typically of 1 GB? Any approximations?
Go to the conf directory inside your $HADOOP_HOME directory on your master. Edit the file named slaves and remove the entry corresponding to your name node from it. This way you are asking only the other two nodes to act as slaves, and the name node to act only as the master.
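If you prefer to script that edit, a minimal sketch could look like this. remove_host is a hypothetical helper that works on the text of the slaves file, and the hostnames are made up; reading and writing the real file under $HADOOP_HOME/conf is left to you.

```python
def remove_host(slaves_text, hostname):
    """Return the slaves file content without the line naming `hostname`."""
    kept = [
        line
        for line in slaves_text.splitlines()
        if line.strip() != hostname
    ]
    return "\n".join(kept) + "\n"


# Hypothetical slaves file: the master is currently listed as a slave too.
before = "master\nslave1\nslave2\n"
after = remove_host(before, "master")
print(repr(after))
```

After writing the result back, only slave1 and slave2 remain listed, so the start scripts no longer launch a datanode on the master.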
