Is ZooKeeper part of Hadoop or a separate configuration? - hadoop

From what I have read in various tutorials, ZooKeeper helps to coordinate and synchronize Hadoop clusters.
I have installed Hadoop 2.5.0. When I run jps, it displays
4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager
There is no process for ZooKeeper.
Is ZooKeeper part of HDFS, or do we need to install it manually?

If you use Hadoop only, ZooKeeper is not required. Other tools in the Hadoop ecosystem, such as HBase, do depend on ZooKeeper, but you don't need to install it separately for them: HBase bundles it, and when you start HBase, ZooKeeper starts at the same time.
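To check whether any ZooKeeper process is running, you can look at the class names in the jps output. This is only a minimal sketch: it assumes a standalone ZooKeeper appears as QuorumPeerMain and an HBase-managed one as HQuorumPeer, and the helper name zookeeper_processes is made up for illustration.

```python
# Class names jps is assumed to report for ZooKeeper:
# QuorumPeerMain for a standalone server, HQuorumPeer when HBase manages it.
ZOOKEEPER_CLASSES = {"QuorumPeerMain", "HQuorumPeer"}


def zookeeper_processes(jps_output):
    """Return (pid, name) pairs from jps output that look like ZooKeeper."""
    found = []
    for line in jps_output.splitlines():
        parts = line.split()
        # A regular jps line is "<pid> <main class>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit() and parts[1] in ZOOKEEPER_CLASSES:
            found.append((int(parts[0]), parts[1]))
    return found


if __name__ == "__main__":
    # The jps listing from the question: plain Hadoop, no ZooKeeper.
    sample = """4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager"""
    print(zookeeper_processes(sample))  # prints []
```

An empty list for the listing in the question matches what the asker observed: a plain Hadoop installation starts no ZooKeeper process.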

Related

Hadoop Multi-Cluster Installation: Unable to see the data nodes despite seeing daemons running on them

I am trying to set up a multi-node Hadoop cluster using Hadoop 3.0.0. There is no straightforward documentation on this, so I had to read a lot of blogs. I am at a point where, when I run start-all.sh, I see daemon processes appearing on the name node as well as on the data nodes. However, when I go to http://namenode:9870 I see 0 live nodes.
To be more specific when I run start-all.sh I see
and when I run jps I see that the NameNode, SecondaryNameNode and ResourceManager processes are running. On the data nodes, jps shows that DataNode and NodeManager are running.
What I get on the url is
Any guidance is greatly appreciated.
Thanks

Why are we configuring mapred.job.tracker in YARN?

As far as I know, YARN was introduced to replace the JobTracker and TaskTracker.
I have seen some Hadoop 2.6.0/2.7.0 installation tutorials that set mapreduce.framework.name to yarn and also set the mapred.job.tracker property to local or host:port.
The description for mapred.job.tracker property is
"The host and port that the MapReduce job tracker runs at. If "local",
then jobs are run in-process as a single map and reduce task."
My question is why we configure it if we are using YARN. The JobTracker shouldn't be running, right?
Forgive me if my question is dumb.
Edit: These are the tutorials I was talking about.
http://chaalpritam.blogspot.in/2015/01/hadoop-260-multi-node-cluster-setup-on.html
http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
https://chawlasumit.wordpress.com/2015/03/09/install-a-multi-node-hadoop-cluster-on-ubuntu-14-04/
This is just a guess, but either those tutorials talking about configuring the JobTracker in YARN are written by people who don't know what YARN is, or they set it in case you decide to stop working with YARN someday. You are right: the JobTracker and TaskTracker do not exist in YARN. You can add the properties if you want, but they will be ignored. New properties for each of the components replacing the JobTracker and the TaskTracker were added with YARN, such as yarn.resourcemanager.address to replace mapred.jobtracker.address.
If you list your Java processes when running Hadoop under YARN, you see no JobTracker or TaskTracker:
10561 Jps
20605 NameNode
17176 DataNode
18521 ResourceManager
19625 NodeManager
18424 JobHistoryServer
You can read more about how YARN works here.
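If you want to check what a given mapred-site.xml actually selects, and whether it still carries the old JobTracker property, a few lines of Python are enough. This is a sketch only: the function names and the sample XML are illustrative, and the "ignored under YARN" check simply restates the point made above.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def jobtracker_setting_ignored(props):
    """True if mapred.job.tracker is set although the framework is yarn."""
    return (props.get("mapreduce.framework.name") == "yarn"
            and "mapred.job.tracker" in props)


if __name__ == "__main__":
    # Hypothetical mapred-site.xml in the style of the tutorials.
    sample = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>"""
    props = read_properties(sample)
    print(props["mapreduce.framework.name"])
    print(jobtracker_setting_ignored(props))
```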

Unable to see TaskTracker and JobTracker after Hadoop 2.5.1 single-node installation

I am new to Hadoop 2.5.1. Since I had already installed Hadoop 1.0.4 previously, I thought the installation process would be the same, so I followed this tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Everything was fine. I even put these settings in core-site.xml:
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
But on several sites I have seen this port given as 9000.
I also made changes in yarn.xml.
Still, everything works fine when I run a MapReduce job. My question is this:
when I run the jps command, it gives me this output:
hduser#secondmaster:~$ jps
5178 ResourceManager
5038 SecondaryNameNode
4863 DataNode
5301 NodeManager
4719 NameNode
6683 Jps
I don't see the TaskTracker and JobTracker in jps. Where are these daemons running?
And without these daemons, how am I able to run a MapReduce job?
Thanks,
Sreelatha K.
From Hadoop 2.0 onwards, the default processing framework changed from classic MapReduce to YARN. You are using YARN, so you will not see the JobTracker and TaskTracker. In YARN they are replaced by the ResourceManager and NodeManager respectively.
You still have the option of using the classic MapReduce framework instead of YARN.
In Hadoop 2 there is an alternative method to run MapReduce jobs, called YARN. Since you have made changes in yarn.xml, MapReduce processing happens using YARN, not using the traditional MapReduce framework. That is probably the reason why you don't see TaskTracker and JobTracker listed after executing the jps command. Note that ResourceManager and NodeManager are the daemons for YARN.
YARN is the next-generation resource manager. It can integrate with Apache Spark, Storm and many other tools, in addition to running MapReduce jobs.
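The daemon names mentioned in these answers are enough to tell from a jps listing which framework is running. A rough sketch follows; the function names and the returned labels are assumptions made for illustration, not part of any Hadoop tool.

```python
# Daemon names per framework, as named in the answers above.
MRV1_DAEMONS = {"JobTracker", "TaskTracker"}
YARN_DAEMONS = {"ResourceManager", "NodeManager"}


def running_daemons(jps_output):
    """Collect the process names (second column) from jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Keep only "<pid> <name>" lines, which skips prompts and blanks.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def detect_framework(jps_output):
    """Return 'yarn', 'classic', 'both' or 'none' based on running daemons."""
    names = running_daemons(jps_output)
    has_yarn = bool(names & YARN_DAEMONS)
    has_mrv1 = bool(names & MRV1_DAEMONS)
    if has_yarn and has_mrv1:
        return "both"
    if has_yarn:
        return "yarn"
    if has_mrv1:
        return "classic"
    return "none"


if __name__ == "__main__":
    # The process lines from the question's jps output.
    sample = """5178 ResourceManager
5038 SecondaryNameNode
4863 DataNode
5301 NodeManager
4719 NameNode
6683 Jps"""
    print(detect_framework(sample))  # prints yarn
```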

JobTracker and TaskTracker in Hadoop 2.0

I installed Hadoop 2.4.x. As expected, there is no JobTracker or TaskTracker; it is YARN-based. Is there any way to make it use the old JobTracker and TaskTracker for MapReduce instead of YARN? In short, can I get the JT and TT daemons running on this?
By default there is no MapReduce configuration file in the 2.4.x installation, only a file called mapred-site.xml.template. Rename that file to mapred-site.xml and remember to set the property mapreduce.framework.name to classic to use the JobTracker and TaskTracker. Also, the start-all.sh script cannot be used, because it runs start-dfs.sh and start-yarn.sh. You need to run the script that starts the JobTracker and TaskTracker instead.
As described above, there is no JobTracker or TaskTracker in Hadoop 2.0 (YARN). It is better to follow these instructions (http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html) to get the idea, and you will find that the processes are as follows:
25578 ResourceManager
25411 SecondaryNameNode
447 Jps
29464 NameNode
25222 DataNode
25905 NodeManager

Hadoop: pseudo cluster, adding datanode

I am trying to set up multiple pseudo nodes for an experimental cluster. The reason is simple: I have only one machine in my office.
Therefore, I followed this guide, and especially Matt's answer:
http://search-hadoop.com/m/sApJY1zWgQV/
I created an additional folder conf2
1.1. In hadoop-env.sh, I edited HADOOP_IDENT_STRING to ${USER}_02
1.2. I changed the data.dir in hdfs-site.xml
1.3. In hdfs-site.xml I changed the ports of:
dfs.datanode.address (default 0.0.0.0:50010)
dfs.datanode.ipc.address (default 0.0.0.0:50020)
dfs.datanode.http.address (default 0.0.0.0:50075)
dfs.datanode.https.address (default 0.0.0.0:50475)
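For reference, the port changes in step 1.3 amount to shifting each default so that the second DataNode does not collide with the first. A small sketch of that idea, where the rule "add the node index to every port" and the helper name shifted_addresses are assumptions for illustration only:

```python
# Default DataNode addresses as listed above.
DEFAULT_ADDRESSES = {
    "dfs.datanode.address": "0.0.0.0:50010",
    "dfs.datanode.ipc.address": "0.0.0.0:50020",
    "dfs.datanode.http.address": "0.0.0.0:50075",
    "dfs.datanode.https.address": "0.0.0.0:50475",
}


def shifted_addresses(node_index, defaults=DEFAULT_ADDRESSES):
    """Return address properties for an extra DataNode on the same host.

    node_index 0 keeps the defaults; each further node adds its index to
    every port. This simple rule only stays collision-free for a handful
    of nodes (index 10 would move 50010 onto 50020).
    """
    shifted = {}
    for name, address in defaults.items():
        host, port = address.rsplit(":", 1)
        shifted[name] = f"{host}:{int(port) + node_index}"
    return shifted


if __name__ == "__main__":
    # Values one could put into conf2/hdfs-site.xml for the second DataNode.
    for name, address in shifted_addresses(1).items():
        print(name, address)
```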
I tried the command: "./hadoop-daemons.sh --config ../conf2 start datanode"
on my current single node hadoop system
The error is still: "localhost: datanode running as process 42855. Stop it first."
The jps command says:
:~/hadoop/bin$ jps
2255 Jps
43412 SecondaryNameNode
43853 TaskTracker
42855 DataNode
43544 JobTracker
42537 NameNode
Does anyone have an idea how I could trick my Hadoop system into accepting the additional DataNode now?
Thanks a lot
