Apache Spark deployment on Hadoop Yarn Cluster with HA Capability - hadoop

I'm new to the big data environment and have just finished installing a 3-node Hadoop 2.6 cluster with HA capability using ZooKeeper.
Everything works well so far, and I have tested the failover scenario between NN1 and NN2 using ZooKeeper; it works fine.
Now I would like to install Apache Spark on my Hadoop YARN cluster, also with HA capability.
Can anyone guide me through the installation steps? I could only find instructions for setting up Spark in standalone mode, which I have done successfully. Now I want to install it on the YARN cluster along with HA capability.
I have a three-node cluster (NN1, NN2, DN1); the following daemons are currently running on each of these servers:
Daemons running on the active NameNode (NN1):
Jps
DataNode
DFSZKFailoverController
JournalNode
ResourceManager
NameNode
QuorumPeerMain
NodeManager
Daemons running on the standby NameNode (NN2):
Jps
DFSZKFailoverController
NameNode
QuorumPeerMain
NodeManager
JournalNode
DataNode
Daemons running on the DataNode (DN1):
QuorumPeerMain
Jps
DataNode
JournalNode
NodeManager

You should set up ResourceManager HA (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html). When Spark runs on YARN, it doesn't run its own daemon processes, so there is no Spark component that requires HA in YARN mode.
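A minimal ResourceManager HA sketch in yarn-site.xml might look like the following. The hostnames and the cluster id are placeholders, not values from the question; point the ZooKeeper address at the quorum you already run for NameNode HA:

```xml
<!-- yarn-site.xml: ResourceManager HA sketch (hostnames are placeholders) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>nn1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>nn2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>nn1.example.com:2181,nn2.example.com:2181,dn1.example.com:2181</value>
</property>
```

With this in place you would run one ResourceManager on each of NN1 and NN2; clients and NodeManagers fail over between rm1 and rm2 automatically.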

You can configure Spark to run in YARN mode. In YARN mode you size the driver and executors according to the cluster's capacity, e.g.:
spark.executor.memory <value>
The number of executors that can be allocated depends on the memory available to your YARN containers.
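As a sketch, the sizing could go in spark-defaults.conf; the memory values and instance count below are illustrative, not recommendations, and must fit inside the limits set by yarn.scheduler.maximum-allocation-mb:

```
# spark-defaults.conf (values illustrative; size to your cluster)
spark.master              yarn
spark.driver.memory       1g
spark.executor.memory     2g
spark.executor.instances  2
```

The same settings can instead be passed per job, e.g. `spark-submit --master yarn --executor-memory 2g --num-executors 2 ...`.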

Related

Slave's datanodes not starting in hadoop

I followed this tutorial and tried to set up a multi-node Hadoop cluster on CentOS. After doing all the configuration and running start-dfs.sh and start-yarn.sh, this is what jps outputs:
Master
26121 ResourceManager
25964 SecondaryNameNode
25759 NameNode
25738 Jps
Slave 1
19082 Jps
17826 NodeManager
Slave 2
17857 Jps
16650 NodeManager
The DataNode is not started on the slaves.
Can anyone suggest what is wrong with this setup?

Hadoop Datanode failed to start and is not running

I am trying to install Hadoop 2.7 on Ubuntu 14.04 in VMWare, but the DataNode always fails to start; when I run hadoop datanode, I get this error:
This issue occurs when the DataNode's clusterID does not match the NameNode's clusterID; the two must be identical, or the DataNode cannot register with the NameNode.
Try stopping the services, then formatting the NameNode and starting it again (note that formatting wipes the existing HDFS metadata).
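The mismatch can be confirmed by comparing the clusterID lines in the two VERSION files. The snippet below is a self-contained illustration using fake VERSION files in a temp directory; on a real cluster you would grep the VERSION files under your dfs.namenode.name.dir and dfs.datanode.data.dir instead:

```shell
# Illustration only: fake VERSION files standing in for the real ones under
# dfs.namenode.name.dir/current and dfs.datanode.data.dir/current.
dir=$(mktemp -d)
mkdir -p "$dir/name/current" "$dir/data/current"
echo 'clusterID=CID-11111111' > "$dir/name/current/VERSION"
echo 'clusterID=CID-22222222' > "$dir/data/current/VERSION"

# Extract the clusterID from each side and compare.
nn_id=$(grep -o 'CID-[0-9a-f-]*' "$dir/name/current/VERSION")
dn_id=$(grep -o 'CID-[0-9a-f-]*' "$dir/data/current/VERSION")
if [ "$nn_id" != "$dn_id" ]; then
  echo "clusterID mismatch: NameNode=$nn_id DataNode=$dn_id"
fi
rm -rf "$dir"
```

If the IDs differ, copying the NameNode's clusterID into the DataNode's VERSION file (and restarting the DataNode) avoids reformatting.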

Hadoop Multi Master Cluster Setup

We have a Hadoop setup with 2 master nodes and 1 slave node.
After configuring the cluster, running the jps command gives the following output on the first master node:
13405 NameNode
14614 Jps
13860 ResourceManager
13650 DataNode
14083 NodeManager
On the second master node, the output is:
9698 Jps
9234 DataNode
9022 NameNode
9450 NodeManager
On the data node, the output is:
21681 NodeManager
21461 DataNode
21878 Jps
It seems my second master node is not running correctly. Is this right or wrong? If it's wrong, what should the status of my nodes be?
You can check the HA state of each NameNode with the command below, where <serviceId> is one of the ids defined in dfs.ha.namenodes.<nameservice> (e.g. nn1 or nn2):
hdfs haadmin -getServiceState <serviceId>

Hadoop 2.2.0 jobtracker is not starting

It seems I have no JobTracker with Hadoop 2.2.0. jps does not show it, nothing is listening on port 50030, and there are no JobTracker logs in the logs folder. Is this because of YARN? How can I configure and start the JobTracker?
If you are using the YARN framework, there is no JobTracker. Its functionality has been split between the ResourceManager and the ApplicationMaster. Here is the expected jps output while running YARN:
$jps
18509 Jps
17107 NameNode
17170 DataNode
17252 ResourceManager
17309 NodeManager
17626 JobHistoryServer
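For reference, the switch from the old JobTracker/TaskTracker model to YARN is made by pointing MapReduce at the YARN framework; a minimal sketch:

```xml
<!-- mapred-site.xml: run MapReduce on YARN instead of the classic JobTracker -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

With this set, jobs are scheduled by the ResourceManager and tracked per-job by an ApplicationMaster, which is why no JobTracker process ever appears.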

Couldn't see RegionServer in Terminal-LINUX-HBASE

I installed Hadoop, and my HBase is running on top of it. All my daemons in Hadoop are up and running. After I started HBase, I could see HMaster running when I ran the jps command.
I'm running Hadoop in pseudo-distributed mode. When I checked localhost, it shows the RegionServer is running.
But why can't I see the HRegionServer process in my terminal in Linux?
It might be because hbase.cluster.distributed is not set, or is set to false, in hbase-site.xml.
According to http://hbase.apache.org/book/config.files.html :
hbase.cluster.distributed: The mode the cluster will be in. Possible values are false for standalone mode and true for distributed mode. If false, startup will run all HBase and ZooKeeper daemons together in the one JVM. Default: false
So if you set it to true, you'll see distinct master, region server, and ZooKeeper processes. E.g., a pseudo-distributed Hadoop/HBase process list would look like this:
jps
3991 HMaster
4209 HRegionServer
3140 DataNode
3464 TaskTracker
3246 JobTracker
2942 NameNode
3924 HQuorumPeer
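The switch to distributed mode described above goes in hbase-site.xml, roughly like this:

```xml
<!-- hbase-site.xml: run HMaster, HRegionServer and ZooKeeper as separate processes -->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
```

After restarting HBase with this setting, jps should list HRegionServer (and HQuorumPeer, if HBase manages ZooKeeper) as their own processes.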