What happens if the namenode and the ZooKeeper fail together - hadoop

What happens if the NameNode and ZooKeeper fail together? Is this possible? Also, do the various QJM JournalNodes keep copies of each other's edit logs?

If the ZooKeeper servers are installed on other nodes (not on the NameNode), they bring the standby NameNode to the active state.
If you have installed more than one ZooKeeper server, for example three, and one of them fails, an election process takes place and a new ZooKeeper leader is made active.

A ZooKeeper quorum is used to survive a ZooKeeper failure. ZooKeeper replicates its data to the other nodes in the quorum. In case of a failure, an election occurs and a new node is appointed leader, which then directs clients to the standby NameNode.
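As a rough illustration (the host names and session timeout are placeholders, not values from the question), a ZooKeeper client tolerates the loss of a single server simply by listing every member of the quorum in its connect string:

```java
import org.apache.zookeeper.ZooKeeper;

public class QuorumClientSketch {
    public static void main(String[] args) throws Exception {
        // Listing all three quorum members lets the client fail over to a
        // surviving server if one ZooKeeper instance goes down.
        ZooKeeper zk = new ZooKeeper(
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                5000,          // session timeout in ms
                event -> { }); // watcher for connection-state events
        System.out.println("Connected with session id " + zk.getSessionId());
        zk.close();
    }
}
```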

Related

Namenode with high availability vs zookeeper based leader selection

I am reading two different things in the Apache Hadoop documentation and in Cloudera's documentation.
Based on Cloudera, we should set up the namenode in high-availability mode, i.e. by defining primary and secondary namenode, but based on the Hadoop documentation this should be automatically taken care of by ZooKeeper, and it should decide the namenode among the available datanodes.
Can anyone explain the difference and which one to use?
by defining primary and secondary namenode
There is such a thing as a "secondary namenode", but it's actually a very different thing: it is not a standby and cannot become active.
There's no "vs". Namenode HA needs ZooKeeper.
If you read more of the Cloudera documentation, it does not fail to mention ZooKeeper:
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
Cloudera doesn't package many extras, if any, on top of the core Hadoop functions.
Regarding your question...
this should be automatically taken care of by ZooKeeper
The failover is automatic if the HDFS ZooKeeper properties are (manually) configured, ZooKeeper is running, and the active NameNode goes down (a configuration sketch follows below).
among the available datanodes
The operation has nothing to do with datanodes.
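As a rough sketch of the properties involved (the nameservice name, host names, and ports are placeholders; in practice these settings live in hdfs-site.xml and core-site.xml, shown here via Hadoop's `Configuration` API purely for illustration):

```java
import org.apache.hadoop.conf.Configuration;

public class HaFailoverConfigSketch {
    public static Configuration haConfig() {
        Configuration conf = new Configuration();
        // One logical nameservice backed by two NameNodes (placeholder names).
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // These two settings are what make the failover automatic:
        conf.set("dfs.ha.automatic-failover.enabled", "true");
        conf.set("ha.zookeeper.quorum",
                 "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
        return conf;
    }
}
```

With these set and ZooKeeper running, the ZKFC process on each NameNode host performs the election described in the answers below.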

ZooKeeper Failover controller crashes when the Hadoop NameNode goes down

We are setting up a Hadoop Cluster in our development environment. While we were testing the fail over of the NameNode we noticed that the Zookeeper Failover controller would sometimes crash. In one case both ZooKeeper and the ZooKeeper Failover controller crashed.
At one point in our testing both NameNodes were in an active state. This would cause a split brain scenario in the Hadoop Cluster.
We are using the following versions:
- hadoop-2.7.3
- zookeeper-3.4.10
We have a four-server cluster. Two of the servers are dedicated to NameNodes and two of the servers are dedicated to DataNodes.
The components running on the NameNode servers are
- NameNode
- ZooKeeper
- ZooKeeper Failover controller
- JournalNode
The components running on the DataNode servers are
- DataNode
- ZooKeeper
- JournalNode
The following matrix contains the test scenarios

How does the Hadoop Namenode failover process work?

Hadoop: The Definitive Guide says:
Each Namenode runs a lightweight failover controller process whose
job it is to monitor its Namenode for failures (using a simple
heartbeat mechanism) and trigger a failover should a namenode
fail.
How come a namenode can run something to detect its own failure?
Who sends the heartbeat to whom?
Where does this process run?
How does it detect namenode failure?
Whom does it notify for the transition?
From Apache docs
The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:
Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active.
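To make the "lock znode" idea concrete, here is a minimal sketch of an ephemeral-node election using the ZooKeeper Java client. The znode path, timeout, and retry handling are illustrative assumptions; this is not the actual ZKFC implementation:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ElectionSketch implements Watcher {
    // Illustrative path; the real ZKFC manages its own znode layout.
    private static final String LOCK_PATH = "/active-namenode-lock";
    private final ZooKeeper zk;

    public ElectionSketch(String connectString) throws Exception {
        zk = new ZooKeeper(connectString, 5000, this);
    }

    /** Try to become active by grabbing the ephemeral lock znode. */
    public boolean tryBecomeActive(byte[] myAddress) throws Exception {
        try {
            // EPHEMERAL: the znode is deleted automatically when this client's
            // session expires, which is what releases the lock if the machine crashes.
            zk.create(LOCK_PATH, myAddress,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;   // "won the election" -> promote the local NameNode
        } catch (KeeperException.NodeExistsException alreadyHeld) {
            // Another node is active; watch the lock so we are notified on expiry.
            zk.exists(LOCK_PATH, true);
            return false;  // stay standby
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            // The previous active's session expired; a real controller would
            // re-run its health check and retry tryBecomeActive() here.
        }
    }
}
```

The key design point is the ephemeral node: no separate "release the lock" step is needed, because ZooKeeper itself deletes the lock when the holder's session dies.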
Have a look at this Apache PDF, which is part of the HDFS-2185 JIRA issue.
Slide 16 from http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum:
Automatic Namenode failover process in Hadoop:
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby Namenode to keep its state synchronized with the Active Namenode, both nodes communicate with a group of separate daemons called JournalNodes (JNs).
When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace.
In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
It is vital for an HA cluster that only one of the NameNodes is Active at a time. ZooKeeper is used to avoid a split-brain scenario, so that the NameNode state does not diverge during a failover.
Slide 8 from http://www.slideshare.net/cloudera/hdfs-futures-world2012-widescreen:
In summary: the NameNode is a daemon and the failover controller is a daemon. If the NameNode daemon fails, the failover controller daemon detects it and takes corrective action. Even if the entire machine crashes, the ZooKeeper server detects it, the lock expires, and the standby NameNode is elected as the active NameNode.

What are the differences between ZooKeeper, JournalNode tasks, and the Quorum Journal Manager in Hadoop?

After studying the material on a number of websites and in videos, I am confused about the functionalities and the differences in purpose of three Hadoop components: ZooKeeper, the JournalNode, and the Quorum Journal Manager.
Could anyone please explain the reason for inventing each of the above and the differences in their purposes and functionalities?
Thanks in advance.
Think of it like this: ZooKeeper is a group of people, each assigned to watch over a factory and coordinate it; a JournalNode is a place where all factory managers can check each other's status and coordinate. QJM is a combination of both, used in HA for better coordination in case of failover.
ZooKeeper coordinates HBase RegionServers and other Hadoop modules which require ZooKeeper.
JournalNodes coordinate the edit log between the Hadoop NameNodes (the active one writes it, the standby reads it).
QJM is the mechanism the NameNodes use to write to and read from the JournalNodes.
In a core Hadoop setup, only the JournalNodes are necessary, in the case of a distributed setup.
Firstly, quorum means that a majority is needed for decisions. So when you see the word "quorum" you should think of a clustered, multi-host configuration. You will hear this term for both ZooKeeper and JournalNodes.
Short description of their functionalities will help you distinguish their purpose.
Zookeeper: ZooKeeper is the central synchronisation application for information that applications need to check frequently. An application may need many kinds of information: naming structure, status information, configuration information (or simply configurations), etc. The most common case is application configuration. When you change a config that relates to, let's say, 80 servers, you need a synchronisation service to propagate this change to all nodes. The application itself may have this feature, but imagine you add another 12 applications to your environment: you would need to take care of each application's synchronisation service one by one. This is where ZooKeeper comes in. ZooKeeper can handle the management of all this information by itself. If you set it up as a cluster (you need an odd number of hosts. Why?) you will have high availability for ZooKeeper (failover cases) and have a ZooKeeper Quorum.
Journal Node: In a high-availability Hadoop cluster you have more than one NameNode, running in active/passive mode. The active NameNode informs the JournalNodes of changes; the standby NameNode asks the JournalNodes what has changed. As in the ZooKeeper case, if you set them up as a cluster (again with an odd number of hosts. Why?), you have high availability for the JournalNode features as well and have a Quorum Journal Manager.
Actually, I have never heard of them being set up as a single host or node except for lab purposes (a VM on a PC).
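To illustrate the "central place for configuration" idea from the answer above (the znode path and payload are made up for this example), one process can publish a setting and every other node can read the same znode instead of being updated one by one:

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ConfigStoreSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, e -> { });

        // One writer publishes the shared setting once...
        byte[] value = "batch.size=500".getBytes(StandardCharsets.UTF_8);
        if (zk.exists("/app-config", false) == null) {
            zk.create("/app-config", value,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData("/app-config", value, -1);  // -1 = any version
        }

        // ...and all 80 servers read the same znode rather than each other.
        byte[] current = zk.getData("/app-config", false, null);
        System.out.println(new String(current, StandardCharsets.UTF_8));
        zk.close();
    }
}
```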
1. Zookeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
Role of Zookeeper in Hadoop ecosystem:
During the Hadoop Namenode failover process, ZooKeeper is used to avoid a split-brain scenario, so that the NameNode state does not diverge during a failover.
Refer to this post for more details:
How does Hadoop Namenode failover process works?
2. JournalNode (used in the Namenode failover process)
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs).
JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.
Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This allows the system to tolerate the failure of a single machine.
3. Quorum Journal Manager (QJM) allows edit logs to be shared between the Active and Standby NameNodes.
Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata through a split-brain scenario.
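As a sketch of how these pieces meet in configuration (the property names are the standard HDFS HA ones, but the nameservice, hosts, and directory below are placeholders; in practice they live in hdfs-site.xml and core-site.xml):

```java
import org.apache.hadoop.conf.Configuration;

public class QjmConfigSketch {
    public static Configuration qjmConfig() {
        Configuration conf = new Configuration();
        // The QJM URI lists the JournalNode quorum the NameNodes write to and read from.
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
        // Where each JournalNode keeps its local copy of the edits.
        conf.set("dfs.journalnode.edits.dir", "/data/hadoop/journal");
        // ZooKeeper is a separate quorum, used only for the failover election.
        conf.set("ha.zookeeper.quorum",
                 "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
        return conf;
    }
}
```

The last property underlines the distinction made above: the JournalNode quorum carries the edit log, while the ZooKeeper quorum only decides which NameNode is active.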

HBase HDFS zookeeper

Now I am learning about HBase. I set up my HBase Cluster and Hadoop Cluster like this:
server1: Namenode HMaster
server2: datanode1 RegionServer1 HQuorumPeer
Server3: datanode2 RegionServer2 HQuorumPeer
Server4: datanode3 RegionServer3 HQuorumPeer
I have several questions about the HBase cluster:
1: All RegionServers must be in the Hadoop cluster so they can use HDFS to store data, even though they will store data into the local file system, right?
2: What does a RegionServer do? Does the HMaster give the job to all RegionServers and let them run in parallel, like the TaskTracker in a DataNode?
3: What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and the master node?
4: It is related to #3. I know HBase uses ZooKeeper for recovery once a RegionServer is down. How specifically does it work?
All RegionServers must be in the Hadoop cluster so they can use HDFS to store data, even though they will store data into the local file system, right?
Yes. RegionServers are the daemons that are responsible for storing data in an HBase cluster. You store data in HBase tables, which are spread over many regions on several RegionServers across the cluster. Although data goes into the RegionServers, it actually gets stored inside HDFS. But if you are on a standalone setup, HDFS is not used; the data gets stored directly in the local FS. It is analogous to any DB and FS; take MySQL and ext3, for example. And yes, all the HDFS data is stored on your disk in reality, although you cannot see it directly.
What does a RegionServer do? Does the HMaster give the job to all RegionServers and let them run in parallel, like the TaskTracker in a DataNode?
As specified in the comment above, the RegionServer is the daemon that actually stores data in an HBase cluster. I'm sorry, I didn't quite get the second part of this question: what do you mean by "like the TaskTracker in a DataNode"? In an HBase cluster, HMaster is the daemon responsible for monitoring all RegionServer instances in the cluster, and it is the interface for all metadata changes. Its job is monitoring and management. RegionServers don't run any jobs like TaskTrackers do. They just store data and are responsible for things like serving and managing regions.
What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and the master node?
Zookeeper is the guy who coordinates everything behind the curtains. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. A distributed HBase setup depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. HBase by default manages a ZooKeeper cluster. It gets started and stopped as part of the HBase start/stop process. But, you can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. You don't have to have Zookeepers running on all the nodes. Just decide some number which suits your cluster. One thing to note here is that you should always use an odd number of Zookeepers.
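For example (the host names match the layout in the question, but the table name and row key are made up; `hbase.zookeeper.quorum` is the standard client property), an HBase client only needs to know the ZooKeeper ensemble in order to find the HMaster and the RegionServers:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The client locates the cluster through ZooKeeper, not through HMaster directly.
        conf.set("hbase.zookeeper.quorum", "server2,server3,server4");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            Result row = table.get(new Get(Bytes.toBytes("row-key-1")));
            System.out.println(row.isEmpty() ? "row not found" : row);
        }
    }
}
```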
It is related to #3. I know HBase uses ZooKeeper for recovery once a RegionServer is down. How specifically does it work?
Each RegionServer is connected to ZooKeeper, and the master watches these connections. ZooKeeper manages a heartbeat with a timeout. So, on a timeout, the HMaster declares the RegionServer dead and starts the recovery process. The following things happen during the recovery process:
Identifying that a node is down: a node can cease to respond simply because it is overloaded, or because it is dead.
Recovering the writes in progress: that is, reading the commit log and recovering the edits that were not flushed.
Reassigning the regions: the RegionServer was previously handling a set of regions. This set must be reallocated to other RegionServers, depending on their respective workload.
The process is actually a bit more involved. You can find more on this here. I would also suggest you go through the book HBase: The Definitive Guide by Lars George in order to get a grip on HBase.
HTH
