ZooKeeper Failover controller crashes when the Hadoop NameNode goes down - hadoop

We are setting up a Hadoop cluster in our development environment. While testing failover of the NameNode we noticed that the ZooKeeper Failover Controller would sometimes crash. In one case both ZooKeeper and the ZooKeeper Failover Controller crashed.
At one point in our testing both NameNodes were in the active state, which is a split-brain scenario for the Hadoop cluster.
We are using the following versions:
- hadoop-2.7.3
- zookeeper-3.4.10
We have a four-server cluster. Two of the servers are dedicated to the NameNodes and two are dedicated to the DataNodes.
The components running on the NameNode servers are
- NameNode
- ZooKeeper
- ZooKeeper Failover controller
- JournalNode
The components running on the DataNode servers are
- DataNode
- ZooKeeper
- JournalNode
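When reproducing the split-brain case, it helps to check which NameNode, if any, currently holds the active lock in ZooKeeper. Below is a minimal sketch using the ZooKeeper Java client, assuming the default /hadoop-ha/<nameservice> znode layout, an illustrative nameservice id of mycluster, and illustrative quorum hostnames:

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CheckActiveLock {
    public static void main(String[] args) throws Exception {
        // Connect to the full ZooKeeper quorum; hostnames here are illustrative.
        ZooKeeper zk = new ZooKeeper("nn1:2181,nn2:2181,dn1:2181", 5000, event -> { });

        // The ZKFC's elector keeps an ephemeral lock znode under /hadoop-ha/<nameservice>.
        // "mycluster" is an assumed nameservice id; substitute your dfs.nameservices value.
        String lockPath = "/hadoop-ha/mycluster/ActiveStandbyElectorLock";

        Stat stat = zk.exists(lockPath, false);
        if (stat == null) {
            System.out.println("No active lock held - no NameNode is currently elected active.");
        } else {
            // The lock data is a serialized record; the active NameNode's hostname
            // is readable when printed as text.
            byte[] data = zk.getData(lockPath, false, stat);
            System.out.println("Active lock data: " + new String(data, "UTF-8"));
        }
        zk.close();
    }
}
```

If both NameNodes report themselves as active while only one of them holds this lock, the fencing configuration (dfs.ha.fencing.methods) is usually the thing to check.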
The following matrix contains the test scenarios

Related

Namenode with high availability vs zookeeper based leader selection

I am reading two different things in the Apache Hadoop documentation and Cloudera's documentation.
According to Cloudera, we should set up the NameNode in high-availability mode, i.e. by defining a primary and a secondary NameNode; but according to the Hadoop documentation, this should be taken care of automatically by ZooKeeper, which should choose the NameNode from among the available DataNodes.
Can anyone explain the difference and which one to use?
by defining a primary and a secondary NameNode
There is such a thing as a "Secondary NameNode", but it is actually a very different thing: it is not a standby and cannot become active.
There's no "vs": NameNode HA needs ZooKeeper.
If you read more of the Cloudera documentation, it does not fail to mention ZooKeeper:
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
Cloudera doesn't package many extras, if any, on top of the core Hadoop functionality.
Regarding your question...
this should be taken care of automatically by ZooKeeper
The failover is automatic if the HDFS ZooKeeper properties are (manually) configured, ZooKeeper is running, and the active NameNode goes down.
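For reference, the (manually configured) properties referred to here boil down to two settings: dfs.ha.automatic-failover.enabled in hdfs-site.xml and ha.zookeeper.quorum in core-site.xml. A small sketch using Hadoop's Configuration API instead of the XML files, with illustrative hostnames:

```java
import org.apache.hadoop.conf.Configuration;

public class HaFailoverProperties {
    public static void main(String[] args) {
        // Normally set in hdfs-site.xml: turns on automatic failover via ZKFC.
        Configuration hdfsSite = new Configuration(false);
        hdfsSite.set("dfs.ha.automatic-failover.enabled", "true");

        // Normally set in core-site.xml: the ZooKeeper quorum the ZKFC connects to.
        Configuration coreSite = new Configuration(false);
        coreSite.set("ha.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181");

        System.out.println("automatic failover: " + hdfsSite.get("dfs.ha.automatic-failover.enabled"));
        System.out.println("zookeeper quorum:   " + coreSite.get("ha.zookeeper.quorum"));
    }
}
```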
from among the available DataNodes
The operation has nothing to do with DataNodes.

What happens if the namenode and the ZooKeeper fail together

What happens if the NameNode and ZooKeeper fail together? Is this possible? Also, do the various JournalNodes (QJM) keep copies of each other's edit logs?
If the ZooKeeper servers are installed on other nodes (not on the NameNode), ZooKeeper brings the standby NameNode to the active state.
If you have installed more than one ZooKeeper server, say three, and one of them fails, a leader election takes place among the remaining servers and a new ZooKeeper leader is chosen.
A ZooKeeper quorum is used to avoid a ZooKeeper single point of failure: ZooKeeper replicates its data to the other nodes in the quorum. In case of a failure, an election occurs and a new node is appointed leader, which directs the failover controllers to make the standby NameNode active.
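The election only works while a strict majority of the ensemble is reachable, which is also why an odd number of ZooKeeper servers is recommended. A small worked example of that arithmetic:

```java
public class QuorumMath {
    public static void main(String[] args) {
        // A ZooKeeper ensemble stays available only while a strict majority is up.
        for (int servers : new int[] {1, 3, 4, 5}) {
            int majority = servers / 2 + 1;
            int tolerated = servers - majority;
            System.out.printf("%d servers -> majority of %d -> tolerates %d failure(s)%n",
                    servers, majority, tolerated);
        }
    }
}
```

Note that 4 servers tolerate no more failures than 3, which is why even-sized ensembles are not worth the extra node.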

In an HBase cluster, is it advisable to have a ZooKeeper quorum peer on a node that runs a RegionServer?

I have a 4 node hadoop cluster.
namenode.example.com
datanode1.example.com
datanode2.example.com
datanode3.example.com
To set up an HBase cluster on top of this, I intend to use
namenode.example.com as the HBase Master and the three DataNodes as the RegionServers.
A fully distributed HBase production setup needs
more than one ZooKeeper server, and an odd number is recommended.
Hence, what would be the disadvantages of having the ZooKeeper servers colocated with the RegionServers?
References
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hbase_cluster_deploy.html
http://hbase.apache.org/0.94/book/zookeeper.html
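For context, colocating the quorum peers with the three RegionServer hosts would boil down to HBase settings like the following. This is only a sketch of what the question describes (using Hadoop's Configuration API rather than hbase-site.xml), not a recommendation either way:

```java
import org.apache.hadoop.conf.Configuration;

public class HBaseQuorumSketch {
    public static void main(String[] args) {
        // Sketch of the hbase-site.xml values the layout in the question implies.
        Configuration hbaseSite = new Configuration(false);
        hbaseSite.set("hbase.cluster.distributed", "true");
        hbaseSite.set("hbase.zookeeper.quorum",
                "datanode1.example.com,datanode2.example.com,datanode3.example.com");
        hbaseSite.set("hbase.zookeeper.property.clientPort", "2181");

        System.out.println("quorum: " + hbaseSite.get("hbase.zookeeper.quorum"));
    }
}
```

The usual concern with such colocation is I/O contention, since ZooKeeper is sensitive to latency.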

Hadoop components to nodes mapping - what components should be installed where

I am considering the following hadoop services for setting up a cluster using HDP 2.1
- HDFS
- YARN
- MapReduce2
- Tez
- Hive
- WebHCat
- Ganglia
- Nagios
- ZooKeeper
There are three node types that I can think of:
- NameNodes (e.g. primary, secondary)
- Application nodes (from where I will access the Hive service most often, and where I will also keep code repositories and other code artifacts)
- Data nodes (the workhorses of the cluster)
Given the above, I know of these best practices and common denominators (sketched in code after the list):
- The ZooKeeper service should be running on at least 3 data nodes
- The DataNode service should be running on all data nodes
- The Ganglia monitor should be running on all data nodes
- The NameNode service should be running on the name nodes
- The NodeManager should be installed on all nodes that run the DataNode component
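Those points can already be written down as a tiny placement map; the sketch below only restates the list above, and the node-type labels are illustrative:

```java
import java.util.List;
import java.util.Map;

public class ComponentPlacementSketch {
    public static void main(String[] args) {
        // Restates only the best practices listed above; labels are illustrative.
        Map<String, List<String>> placement = Map.of(
                "name nodes", List.of("NameNode"),
                "all data nodes", List.of("DataNode", "NodeManager", "Ganglia monitor"),
                "at least 3 of the data nodes", List.of("ZooKeeper"));

        placement.forEach((nodeType, components) ->
                System.out.println(nodeType + " -> " + components));
    }
}
```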
This still leaves lots of open questions, e.g.:
- Which is the ideal node for the various servers needed, e.g. Hive Server, App Timeline Server, WebHCat Server, Nagios Server, Ganglia Server, MySQL server? Is it the application nodes? Should each get its own node? Should we have a separate 'utilities' node?
- Is there some criterion for choosing where ZooKeeper should be installed?
I think the more generic question is: is there a table with a 'Hadoop components to nodes' mapping, i.e. essentially what components should be installed where?
Seeking advice/insight/links or documents on this topic.

Hadoop Cluster setup (Fully distributed mode)

I am setting up Hadoop on a multi-node cluster, and I have a few questions:
Will it be OK to have the NameNode and the ResourceManager on the same machine?
Which would be the best role for the master machine: NameNode, ResourceManager, or DataNode/NodeManager?
I have a master and 3 slave machines. The slaves file on the master machine has the following entries:
master
slave1
slave2
slave3
Do I have to place this same slaves file on all of the slave machines? Or should I remove the first line (master) and then place it on the slave machines?
Best Regards.
Yes, at least in small clusters those two should be running on the master node.
See answer 1. The master node can also host, for example, the SecondaryNameNode and the JobHistoryServer.
No, the slaves file is only on the master node. If you have the master node in the slaves file, it means that the master node also acts as a DataNode; especially in small clusters that's totally fine. The slaves file essentially tells on which nodes the DataNode processes are started.
Slave nodes should only run the DataNode and NodeManager. This is all handled by Hadoop if the configuration is correct - you can just check which processes are running after starting the cluster from the master node. The master node basically takes care of everything, and you "never" need to manually connect to the slaves for any configuration.
My answer is meant for small clusters; in bigger "real" clusters the server responsibilities are probably separated even further.
To fully understand the multi-node cluster concept, follow this link: http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/
For a step-by-step implementation of a multi-node cluster, follow this link: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I hope these links help you.
