Any option to make the Hadoop history server highly available?

Is there any option to make the Hadoop JobHistoryServer highly available? I am using Hadoop 2.7.
Also, ResourceManager high availability is not as mature as NameNode high availability: start-yarn.sh does not even start the standby ResourceManager. Is there an out-of-the-box solution for both?
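For the ResourceManager part, here is a minimal sketch that renders the yarn-site.xml properties Hadoop 2.x uses for RM high availability, built with Python's standard xml.etree module. The helper name hadoop_config_xml, the cluster id, and every host name are hypothetical placeholders. Check yarn-default.xml for your release before relying on the property names. Note that start-yarn.sh only starts a ResourceManager on the host where you run it, so the standby is usually started on the second master with yarn-daemon.sh start resourcemanager. This sketch does not address the JobHistoryServer.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props):
    """Render a dict of property name -> value as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical cluster id and hosts. Assumption: a three-node ZooKeeper
# ensemble on port 2181, which the embedded failover elector uses.
rm_ha_props = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.cluster-id": "my-yarn-cluster",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.hostname.rm1": "master1.example.com",
    "yarn.resourcemanager.hostname.rm2": "master2.example.com",
    "yarn.resourcemanager.zk-address": "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
}

if __name__ == "__main__":
    # Paste the printed <property> elements into yarn-site.xml on both masters.
    print(hadoop_config_xml(rm_ha_props))
```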

Related

Types of clusters in Hadoop

How can I differentiate Hadoop standalone mode from pseudo-distributed mode? Can anyone explain the difference between running all Hadoop daemons in a single Java process and running them as separate Java processes?
In standalone (local) mode, Hadoop commands run without starting any Hadoop daemons: everything executes in a single Java process against the local file system.
In pseudo-distributed mode, each Hadoop daemon (NameNode, DataNode, ResourceManager, and so on) runs as a separate Java process, all on a single machine.
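One way to see the difference is in the configuration. Here is a rough heuristic sketch; the function name guess_hadoop_mode is hypothetical. It assumes you have merged the core-site.xml and mapred-site.xml properties into a dict, and it falls back to the Hadoop 2.x defaults (fs.defaultFS=file:/// and mapreduce.framework.name=local), which together give standalone mode. Treat it as a guess only: a pseudo-distributed setup may use the machine's real host name instead of localhost.

```python
from urllib.parse import urlparse


def guess_hadoop_mode(conf):
    """Heuristic guess at the run mode from merged *-site.xml properties."""
    # fs.default.name is the older (Hadoop 1) name for fs.defaultFS.
    fs = conf.get("fs.defaultFS", conf.get("fs.default.name", "file:///"))
    framework = conf.get("mapreduce.framework.name", "local")
    if fs.startswith("file:") and framework == "local":
        # No daemons: one JVM, local file system.
        return "standalone"
    if urlparse(fs).hostname in ("localhost", "127.0.0.1"):
        # Separate daemon JVMs, all pointing at this machine.
        return "pseudo-distributed"
    return "distributed or mixed"
```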

Unable to see TaskTracker and JobTracker after a Hadoop 2.5.1 single-node installation

I am new to Hadoop 2.5.1. I had previously installed Hadoop 1.0.4, so I assumed the installation process would be the same and followed this tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Everything was fine. I put these settings in core-site.xml:
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
But on several sites I have seen 9000 used as the port instead.
I also made changes in yarn-site.xml.
Everything still works fine when I run a MapReduce job. My question is about the output I get when I run the jps command:
hduser@secondmaster:~$ jps
5178 ResourceManager
5038 SecondaryNameNode
4863 DataNode
5301 NodeManager
4719 NameNode
6683 Jps
I don't see TaskTracker or JobTracker in the jps output. Where are these daemons running?
And without these daemons, how am I able to run a MapReduce job?
Thanks,
Sreelatha K.
From Hadoop 2.0 onwards, the default processing framework has changed from classic MapReduce (MRv1) to YARN. You are using YARN, so you will not see a JobTracker or TaskTracker. In YARN, the TaskTracker is replaced by the NodeManager, and the JobTracker's duties are split between the ResourceManager (cluster resource management) and a per-job ApplicationMaster (job scheduling and monitoring).
You may still be able to use the classic MapReduce framework instead of YARN, but only if your distribution still ships the MRv1 daemons.
In Hadoop 2, MapReduce jobs can run on YARN instead of the traditional MapReduce framework. Since you have configured YARN, your MapReduce jobs run on YARN, which is most likely why TaskTracker and JobTracker are not listed when you run jps. ResourceManager and NodeManager are the YARN daemons.
YARN is a next-generation resource manager. Besides MapReduce, it can run other processing frameworks such as Apache Spark and Apache Storm.
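To make the point concrete, here is a minimal sketch that reads jps output (one "PID MainClass" pair per line) and reports which set of MapReduce daemons is running. The helper names parse_jps and mapreduce_runtime are hypothetical. It assumes the daemon names appear exactly as in the jps listings in this thread.

```python
YARN_DAEMONS = {"ResourceManager", "NodeManager"}
MRV1_DAEMONS = {"JobTracker", "TaskTracker"}


def parse_jps(output):
    """Return the set of main-class names from jps lines such as '5178 ResourceManager'."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        # Skip lines that are not a simple "PID Name" pair.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def mapreduce_runtime(output):
    """Report which MapReduce runtime the running daemons point to."""
    names = parse_jps(output)
    if MRV1_DAEMONS & names:
        return "MRv1 (JobTracker/TaskTracker)"
    if YARN_DAEMONS & names:
        return "YARN (ResourceManager/NodeManager)"
    return "no MapReduce daemons found"
```

Fed the jps output from the question above, it reports YARN, because only ResourceManager and NodeManager are present.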

JobTracker and TaskTracker in Hadoop 2.0

I installed Hadoop 2.4.x. As expected, there is no JobTracker or TaskTracker; it is YARN-based. Is there any way to make MapReduce use the old JobTracker and TaskTracker instead of YARN? In short, can I get the JT and TT daemons running on this installation?
By default, the 2.4.x installation has no MapReduce configuration file, only a template called mapred-site.xml.template. Rename it to mapred-site.xml and set the property mapreduce.framework.name to classic to use the JobTracker and TaskTracker. Also, you cannot use start-all.sh, because it only runs start-dfs.sh and start-yarn.sh; you need to run the script that starts the JobTracker and TaskTracker instead, which requires a distribution that still ships those daemons.
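Which runtime MapReduce uses is decided by that one property. Here is a minimal sketch that reads mapreduce.framework.name from a mapred-site.xml string. The function name framework_name is hypothetical. It assumes the values documented in Hadoop 2.x's mapred-default.xml (local, classic, yarn) and that the default is local when the property is absent. As far as I know, setting classic only helps if a JobTracker is actually running somewhere; Apache Hadoop 2.x itself does not ship one.

```python
import xml.etree.ElementTree as ET

VALID_FRAMEWORKS = {"local", "classic", "yarn"}


def framework_name(mapred_site_xml):
    """Return mapreduce.framework.name from a mapred-site.xml string (default 'local')."""
    root = ET.fromstring(mapred_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "mapreduce.framework.name":
            value = (prop.findtext("value") or "").strip()
            if value not in VALID_FRAMEWORKS:
                raise ValueError(f"unexpected mapreduce.framework.name: {value!r}")
            return value
    # Property absent: MapReduce falls back to the in-process local runner.
    return "local"
```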
As described above, there is no JobTracker or TaskTracker in Hadoop 2.0 (YARN). It is better to follow this tutorial (http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html) to get the idea. You will then see these processes:
25578 ResourceManager
25411 SecondaryNameNode
447 Jps
29464 NameNode
25222 DataNode
25905 NodeManager

Is ZooKeeper part of Hadoop or a separate installation?

From various tutorials, I read that ZooKeeper helps coordinate and synchronize the services in Hadoop clusters.
I have currently installed Hadoop 2.5.0. When I run jps, it displays:
4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager
There is no process for ZooKeeper.
Is ZooKeeper part of HDFS, or do we need to install it manually?
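If you use only Hadoop (HDFS and YARN), ZooKeeper is not required; it becomes necessary only if you enable automatic failover for NameNode or ResourceManager high availability. Other tools in the Hadoop ecosystem, such as HBase, do depend on ZooKeeper, but you don't need to install it separately for them: HBase bundles ZooKeeper and, by default, starts it at the same time as HBase.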
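Here is a minimal sketch for checking jps output for a running ZooKeeper. The helper name zookeeper_in_jps is hypothetical. It assumes that a standalone ZooKeeper server appears in jps as QuorumPeerMain and that an HBase-managed one appears as HQuorumPeer. In HBase standalone mode, ZooKeeper runs inside the HMaster JVM and will not show up as a separate process.

```python
# Main-class names as jps prints them (assumption, see the lead-in above).
ZOOKEEPER_PROCESSES = {"QuorumPeerMain", "HQuorumPeer"}


def zookeeper_in_jps(output):
    """Return the ZooKeeper-related process names found in jps output."""
    found = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ZOOKEEPER_PROCESSES:
            found.add(parts[1])
    return found
```

For the jps output in the question, the result is an empty set, which matches the observation that a plain Hadoop install runs no ZooKeeper.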

Where is the Hadoop task manager UI?

I installed Hadoop 2.2 on my Ubuntu box using this tutorial:
http://codesfusion.blogspot.com/2013/11/hadoop-2x-core-hdfs-and-yarn-components.html
Everything worked fine, and when I open
http://localhost:50070
I can see the management UI for HDFS. Very good!
But I am going through another tutorial which says there must be a task manager UI running at http://mymachine.com:50030 and http://mymachine.com:50060.
I cannot open these ports on my machine.
I have already run:
start-dfs.sh
start-yarn.sh
start-all.sh
Is something wrong? Why can't I see the task manager UI?
You have installed YARN (MRv2), which runs the ResourceManager. http://mymachine.com:50030 is the web address of the JobTracker daemon (and port 50060 that of the TaskTracker) from MRv1, so you cannot see them.
To see the ResourceManager UI, check your yarn-site.xml file for the following property:
yarn.resourcemanager.webapp.address
By default, it points to resource_manager_hostname:8088.
Assuming your ResourceManager runs on mymachine, you should see the ResourceManager UI at http://mymachine.com:8088/
Make sure all your daemons are up and running before you visit the ResourceManager URL.
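Here is a minimal sketch of that lookup in Python. The function name rm_web_url and its fallback_host parameter are hypothetical. It assumes the yarn-default.xml defaults: yarn.resourcemanager.webapp.address is ${yarn.resourcemanager.hostname}:8088, and the hostname defaults to 0.0.0.0, which you cannot browse, so the sketch substitutes a host you can open. Only that one ${...} variable is expanded.

```python
import xml.etree.ElementTree as ET


def rm_web_url(yarn_site_xml, fallback_host="localhost"):
    """Build the ResourceManager web UI URL from a yarn-site.xml string."""
    props = {}
    for prop in ET.fromstring(yarn_site_xml).findall("property"):
        name = (prop.findtext("name") or "").strip()
        props[name] = (prop.findtext("value") or "").strip()
    # Use the configured RM host if present, otherwise a browsable fallback.
    host = props.get("yarn.resourcemanager.hostname", fallback_host)
    # Default mirrors yarn-default.xml: ${yarn.resourcemanager.hostname}:8088.
    address = props.get("yarn.resourcemanager.webapp.address",
                        "${yarn.resourcemanager.hostname}:8088")
    # Expand only this one variable; other ${...} references are left untouched.
    address = address.replace("${yarn.resourcemanager.hostname}", host)
    return f"http://{address}/"
```

With yarn.resourcemanager.hostname set to mymachine.com and no explicit webapp address, this returns http://mymachine.com:8088/.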
For Hadoop 2 (aka YARN/MRv2), that is, any Hadoop installation at version 2.x or higher, it is at port 8088, e.g. localhost:8088.
For Hadoop 1, that is, any Hadoop installation at a version lower than 2.x (e.g. 1.x or 0.x), it is at port 50030, e.g. localhost:50030.
By default, the HDFS (NameNode) UI is at the location below (port 50070 in Hadoop 1.x and 2.x; Hadoop 3.x moved it to 9870):
http://mymachine.com:50070
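The default ports from these answers fit into a small lookup table. Here is a minimal sketch; the names DEFAULT_UI_PORTS and ui_url are hypothetical. The table holds only defaults: any of these ports can be overridden in the *-site.xml files. The Hadoop 3 row (NameNode on 9870) is an addition beyond what these answers cover.

```python
# Default web UI ports per Hadoop major version (defaults only; configurable).
DEFAULT_UI_PORTS = {
    1: {"NameNode": 50070, "JobTracker": 50030, "TaskTracker": 50060},
    2: {"NameNode": 50070, "ResourceManager": 8088},
    3: {"NameNode": 9870, "ResourceManager": 8088},
}


def ui_url(host, major_version, daemon):
    """Return the default web UI URL for a daemon, or raise KeyError if unknown."""
    ports = DEFAULT_UI_PORTS.get(major_version, {})
    if daemon not in ports:
        raise KeyError(f"no default UI port for {daemon} in Hadoop {major_version}.x")
    return f"http://{host}:{ports[daemon]}/"
```

For example, ui_url("mymachine.com", 2, "JobTracker") raises KeyError, which is the situation in the question: a Hadoop 2 install has no JobTracker UI to open.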
