Can I add a standby namenode to an existing Hadoop cluster (with Namenode and Secondary namenode)?

I have a Hadoop 2.7.2 setup where the NameNode and Secondary NameNode run together with a few DataNodes. After a NameNode failure (it was just a restart) I realized that the Secondary NameNode is not a redundant NameNode, as I had thought.
So the question is: can I make my cluster highly available by adding a Standby NameNode, without deleting the existing metadata from the NameNode?

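Yes, you can add a second NameNode to enable High Availability and keep the existing metadata. You need a ZooKeeper cluster if you want automatic failover, and the standby replaces the Secondary NameNode rather than running alongside it.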
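As a rough sketch of what the NameNode and ZooKeeper side of such a configuration might look like: the property names below are the standard HDFS HA ones, but the nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, and every hostname are placeholders assumed for illustration. A complete HA setup also needs shared edit log storage and a fencing method, which are left out of this fragment.

```xml
<!-- hdfs-site.xml (fragment; all IDs and hostnames are hypothetical) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- logical IDs of the two NameNodes in this nameservice -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- nn1: the existing NameNode -->
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1-host:8020</value>
</property>
<property>
  <!-- nn2: the new standby NameNode -->
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2-host:8020</value>
</property>
<property>
  <!-- lets clients find whichever NameNode is currently active -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml (fragment) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <!-- ZooKeeper ensemble used by the failover controllers -->
  <name>ha.zookeeper.quorum</name>
  <value>zk1-host:2181,zk2-host:2181,zk3-host:2181</value>
</property>
```

The Apache HA documentation describes converting an existing non-HA NameNode by running `hdfs namenode -initializeSharedEdits` on it and `hdfs namenode -bootstrapStandby` on the new, unformatted NameNode, which is how the existing metadata is carried over. With automatic failover, `hdfs zkfc -formatZK` initializes the required state in ZooKeeper.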

Related

How to add a Secondary NameNode in a HBase cluster setup?

I have an HBase cluster setup with 3 nodes: a NameNode and 2 DataNodes.
The NameNode is a server with 4GB memory and 20GB hard disk while each DataNode has 8GB memory and 100GB hard disk.
I'm using
Apache Hadoop version: 2.7.2 and
Apache Hbase version: 1.2.4
I've seen some people mention a Secondary NameNode.
My questions are:
What is the impact of not having a Secondary NameNode in my setup?
Is it possible to use one of the DataNodes as the Secondary NameNode?
If possible how can I do it? (I inserted only the NameNode in /etc/hadoop/masters file.)
What is the impact of not having a Secondary NameNode in my setup?
The Secondary NameNode periodically merges the namespace image with the edit log (this is called checkpointing). Your setup is not a High-Availability setup, so without a Secondary NameNode the edit log keeps growing, which eventually makes NameNode startup slow because the whole log has to be replayed.
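How often checkpointing happens is controlled in hdfs-site.xml. A minimal sketch, using the two standard properties with what I believe are their default values in Hadoop 2.x (adjust to your own needs):

```xml
<!-- hdfs-site.xml (fragment) -->
<property>
  <!-- maximum time between two checkpoints, in seconds -->
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <!-- checkpoint earlier if this many transactions have accumulated -->
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```

A checkpoint is triggered by whichever limit is reached first, so a busy cluster checkpoints on transaction count and a quiet one on elapsed time.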
Is it possible to use one of the DataNodes as the Secondary NameNode?
Running the SNN on a DataNode host is not recommended. A separate host is preferred for the Secondary NameNode process. The host chosen for the SNN should have about as much memory as the NN, because the SNN loads the whole namespace image in order to merge it.
If possible how can I do it? (I inserted only the NameNode in /etc/hadoop/masters file.)
The masters file is no longer used. Add this property to hdfs-site.xml:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>SNN_host:50090</value>
</property>
Also note that, if this property is not set, the Secondary NameNode process is started by default on the node where start-dfs.sh is executed.

Use of secondary namenode in Hadoop 2.x

As far as I know, Hadoop 1.x had a Secondary NameNode, but it was used to create an image of the primary NameNode, which the primary NameNode picks up when it fails and starts up again. What is the use of the Secondary NameNode in Hadoop 2.x, given that we already have a hot standby?
As far as I know, Hadoop 2.x can be set up in two ways:
1. With HA (High Availability cluster): if you are setting up an HA cluster, you do not need a Secondary NameNode, because the Standby NameNode keeps its state synchronized with the Active NameNode.
The HDFS NameNode High Availability feature enables you to run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. Both NameNodes require the same type of hardware configuration. In an HA Hadoop cluster, the Active NameNode writes its edit log entries to a separate group of JournalNodes, and the Standby NameNode reads them from there.
In the event of a failover, the Standby NameNode makes sure that its namespace is completely up to date with the edit logs before it changes to the active state. The Standby also performs the checkpointing, so there is no need for a Secondary NameNode in this cluster setup.
2. Without HA: you can have a Hadoop setup without a standby node. In that case the Secondary NameNode does the same checkpointing job it did in Hadoop 1.x.
When you configure HA for NameNodes, the Secondary NameNode is not used. However, you can still configure HDFS without HA (with a NameNode and a Secondary NameNode). This part didn't change much since Hadoop 1.x.
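For the HA case (option 1 above), the JournalNode side of the configuration might look like the following sketch. The property names are the standard ones for the Quorum Journal Manager. The JournalNode hostnames, the journal ID `mycluster`, and the local directory are placeholders assumed for illustration.

```xml
<!-- hdfs-site.xml (fragment; hostnames, journal ID and path are hypothetical) -->
<property>
  <!-- where the Active NameNode writes edits and the Standby reads them -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1-host:8485;jn2-host:8485;jn3-host:8485/mycluster</value>
</property>
<property>
  <!-- local directory on each JournalNode host that stores the edits -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journalnode</value>
</property>
<property>
  <!-- at least one fencing method must be configured; the JournalNodes
       already allow only one writer, so a no-op command can be enough -->
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
```

An odd number of JournalNodes (at least three) is the usual choice, since an edit has to be written to a majority of them.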

How to recover HDFS using the HA standby namenode?

The situation is as follows:
I accidentally reformatted the NameNode in Hadoop.
I still have a spare NameNode as well as all the DataNodes.
I lost the files of the master NameNode and of all the JournalNodes.
How can I restore the HDFS file system from the spare NameNode?
The easy thing would be to restart the cluster without the primary NameNode. The DataNodes and the spare NameNode should recover by themselves.

How is Namenode High Availability achieved in Hadoop 1.x?

Is there any possible solution to achieve NameNode HA in Hadoop 1.x?
Hadoop 1.x is known for its single point of failure: there is a single master node that runs the Hadoop NameNode and the Hadoop JobTracker. The NameNode keeps a lookup table of the location of every file (and of the file's blocks) on the cluster. The NameNode manages the Hadoop Distributed File System and acts as the HDFS master.
The Secondary NameNode keeps a periodically checkpointed copy of the NameNode metadata. It is not a hot standby and does not take over automatically, but its latest checkpoint can be used to restore the NameNode after a crash.

What is the impact on a Hadoop cluster when the Secondary Namenode fails?

What happens to a Hadoop cluster when the Secondary NameNode fails?
The NameNode is said to be a single point of failure for a Hadoop cluster, as all metadata is stored by it. What about the Secondary NameNode: if it fails, will the cluster fail or keep running?
Secondary NameNode is a slightly confusing name. The Hadoop cluster keeps running when it crashes. You can run a Hadoop cluster even without it, and it is not used for high availability. I am talking about Hadoop versions <2.
More info: http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F
