Hadoop: pseudo cluster, adding datanode

I am trying to install multiple pseudo-distributed nodes for an experimental cluster. The reason is simple: I only have one machine in my office.
Therefore, I followed this guide, and especially Matt's answer:
http://search-hadoop.com/m/sApJY1zWgQV/
I created an additional folder conf2
1.1. In hadoop-env.sh, I edited HADOOP_IDENT_STRING to ${USER}_02
1.2. I changed dfs.data.dir in hdfs-site.xml
1.3. In hdfs-site.xml I changed the ports of:
dfs.datanode.address (default 0.0.0.0:50010)
dfs.datanode.ipc.address (default 0.0.0.0:50020)
dfs.datanode.http.address (default 0.0.0.0:50075)
dfs.datanode.https.address (default 0.0.0.0:50475)
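For illustration, a minimal sketch of what the overrides in conf2/hdfs-site.xml could look like; the port numbers and the data directory below are made-up placeholders (the data-directory property follows the Hadoop 1.x naming):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- hypothetical second data directory, separate from the first datanode's -->
    <value>/home/hadoop/hdfs/data_02</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50021</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50076</value>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50476</value>
  </property>
</configuration>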
I tried the command: "./hadoop-daemons.sh --config ../conf2 start datanode"
on my current single-node Hadoop system.
The error is still: "localhost: datanode running as process 42855. Stop it first."
The jps command says:
:~/hadoop/bin$ jps
2255 Jps
43412 SecondaryNameNode
43853 TaskTracker
42855 DataNode
43544 JobTracker
42537 NameNode
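For reference, the "running as process ... Stop it first" message comes from the pid file that hadoop-daemon.sh writes, whose name includes HADOOP_IDENT_STRING. Below is a sketch of the kind of overrides in conf2/hadoop-env.sh that would give the second datanode its own pid and log locations (the directories are made-up examples):

export HADOOP_IDENT_STRING=${USER}_02
# hypothetical separate pid directory, so the running-pid check does not see the first datanode's pid file
export HADOOP_PID_DIR=/tmp/hadoop_${USER}_02/pids
# hypothetical separate log directory, to keep the two datanodes' logs apart
export HADOOP_LOG_DIR=/tmp/hadoop_${USER}_02/logs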
Does anyone have an idea how I could trick my Hadoop system into accepting the additional datanode now?
Thanks a lot.

Related

Secondary name node is not displaying when I hit JPS command

I have Hadoop 3.1.3 and I can upload a file in Hadoop pseudo-distributed mode, and I can also display the contents of the file.
But when I run the jps command I get the following output:
10912 DataNode
13072 ResourceManager
4480 NodeManager
6584 Jps
664 Namenode
I am unable to find the secondary namenode. Is there a problem with my configuration or the Hadoop installation?
You're assuming that the secondary namenode is started in pseudo-distributed mode?
If the basic commands work, then it's fine.
You need to look at the log files to know if something is broken, before asking elsewhere.
In general, I always suggest you use Apache Ambari to provision a Hadoop cluster.
You can start the Secondary NameNode manually and watch the startup logs to see if there's anything wrong:
hdfs secondarynamenode
If there's no error, run jps again and hopefully you see SecondaryNameNode listed.
I'd suggest running hdfs --help and checking out all of the options; there's a lot of good stuff there.
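If you would rather run it in the background the way start-dfs.sh does, Hadoop 3.x also has a daemon mode; a sketch, where the log path shown is the usual default location, so adjust it to your install:

hdfs --daemon start secondarynamenode
# then check the log it writes, typically under $HADOOP_HOME/logs
tail -n 50 $HADOOP_HOME/logs/hadoop-*-secondarynamenode-*.log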

Hadoop Multi-Cluster Installation: Unable to see the data nodes despite seeing daemons running on them

I am trying to set up a multi-node Hadoop cluster using Hadoop 3.0.0. There is no straightforward documentation on this, so I had to read a lot of blogs. I am at the point where, when I run start-all.sh, I see daemon processes appearing on the name node as well as the data nodes. However, when I go to http://namenode:9870 I see 0 live nodes.
To be more specific when I run start-all.sh I see
and when I run jps I see that the NameNode, SecondaryNameNode and ResourceManager processes are running. On the data nodes, running jps shows that DataNode and NodeManager are running.
What I get at the URL is
Any guidance is greatly appreciated.
Thanks

JobTracker and TaskTracker in Hadoop 2.0

I installed Hadoop 2.4.x. As expected, there is no JobTracker and TaskTracker; it's YARN-based. Is there any way to make it use the old JobTracker and TaskTracker for MapReduce instead of YARN? In short, can I get the JT and TT daemons running on this?
By default there is no configuration file for MapReduce in the 2.4.x installation, even though there is a file called mapred-site.xml.template. Rename the file to mapred-site.xml and remember to set the property mapreduce.framework.name to classic to use the JobTracker and TaskTracker. Also, the start script start-all.sh cannot be used, as it just executes start-dfs.sh and start-yarn.sh; you need to execute the scripts that start the JobTracker and TaskTracker.
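A minimal sketch of that mapred-site.xml; whether a given 2.4.x build still accepts the classic value (and still ships the MRv1 daemons) is something to verify against its documentation:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- classic = old JobTracker/TaskTracker runtime, instead of yarn -->
    <value>classic</value>
  </property>
</configuration>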
As described above, there is no JobTracker and TaskTracker in Hadoop 2.0 (YARN). It's better to follow this tutorial (http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html) to get the idea, and you will find the running processes are:
25578 ResourceManager
25411 SecondaryNameNode
447 Jps
29464 NameNode
25222 DataNode
25905 NodeManager

Is ZooKeeper part of Hadoop or a separate installation?

As I read from various tutorials, ZooKeeper helps to coordinate and synchronize various Hadoop services.
Currently I have installed Hadoop 2.5.0. When I do jps it displays:
4494 SecondaryNameNode
8683 Jps
4679 ResourceManager
3921 NameNode
4174 DataNode
4943 NodeManager
There is no process for ZooKeeper.
I am in doubt whether ZooKeeper is part of HDFS or whether we need to install it manually.
If you use Hadoop only, ZooKeeper is not required! Other tools in the Hadoop ecosystem, e.g. HBase, do depend on ZooKeeper, but you don't need to install it separately: HBase includes it, and if you start HBase, ZooKeeper will start up at the same time.
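To illustrate that last point, HBase controls this with a flag in hbase-env.sh, and the ZooKeeper it manages shows up in jps under its own process name (the listing below is a typical example, not output from this question):

# hbase-env.sh: let HBase start and stop its own ZooKeeper (this is the default)
export HBASE_MANAGES_ZK=true

# after bin/start-hbase.sh, jps on that machine typically shows something like:
#   HMaster
#   HRegionServer
#   HQuorumPeer    <-- the ZooKeeper process managed by HBase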

cluster not working with cdh4 tarball installation

I am trying to install CDH4 using the tarball version, but I am facing issues. The steps taken by me are as below:
I downloaded the tarball from the link https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs
I first untarred the hadoop-0.20-mapreduce-0.20.2+1341 tar file
I made the configuration changes in hadoop-0.20-mapreduce-0.20.2+1341 since I wanted MRv1, not YARN.
The first thing, as per the CDH4 installation guide, was to configure HDFS.
I made the relevant changes in
core-site.xml
hdfs-site.xml
mapred-site.xml
masters --- which is my namenode
slaves ---- my datanodes
copied the Hadoop configuration to all the nodes in the cluster
did a namenode format.
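For orientation, these are the kind of minimal entries those files carry in an MRv1 setup; the hostname and ports below are placeholders, not values from this cluster:

core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:8020</value>
</property>

mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>master:8021</value>
</property>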
After the format I had to start the cluster, but in the bin folder I could not find the start-all.sh script, so in that case I started with the command
bin/start-mapred.sh
In the logs it shows the jobtracker started and the tasktrackers started on the slave nodes,
but when I do a jps
I can see only
jobtracker
jps
Going further, I did a datanode start on the datanode with the command below:
bin/hadoop-daemon.sh start datanode
It shows the datanode started.
The namenode is not getting started, and the tasktracker is not getting started.
When I checked my logs I could see
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: webapps/hdfs not found in CLASSPATH
Not sure what is stopping my cluster from working.
Earlier I had CDH3 running, so I stopped the CDH3 cluster and then started installing CDH4. Also, I changed all the directories in hdfs-site.xml, i.e. pointed them to new empty directories for the namenode and datanode, and did not use the ones defined in CDH3.
But still nothing seems to help.
Also, I turned off the firewall (I do have root access), but that did not work for me either.
Any help on the above will be greatly appreciated.
Thank you for the kind reply, but
I do not have a
start-dfs.sh file in the bin folder.
The only files in the /home/hadoop-2.0.0-mr1-cdh4.2.0/bin folder are:
start-mapred.sh
stop-mapred.sh
hadoop-daemon.sh
hadoop-daemons.sh
hadoop-config.sh
rcc
slaves.sh
hadoop
The commands I am now using are as below
for starting the datanode:
for x in /home/hadoop-2.0.0-mr1-cdh4.2.0/bin/hadoop-* ; do $x start datanode ; done ;
for starting the namenode:
bin/start-mapred.sh
I am still working on the same issue.
Hi, sorry for the misunderstanding above. The following commands can be run to start your datanodes and namenode.
To start the namenode:
hadoop-daemon.sh start namenode
To start the datanodes:
hadoop-daemons.sh start datanode
To start the secondary namenode:
hadoop-daemons.sh --hosts masters start secondarynamenode
The JobTracker daemon will be started on your master node and the TaskTracker daemons will be started on each of your datanodes after you run the command
bin/start-mapred.sh
In this Hadoop cluster setup, only the JobTracker daemon will be shown by the jps command on the master node, and on each of your datanodes you can see the TaskTracker daemons running by using the jps command.
Then you have to start HDFS by running the following command on your master node
bin/start-dfs.sh
This command will start the namenode daemon on your namenode machine (in this configuration your master node itself, I believe) and datanode daemons will be started on each of your slave nodes.
Now you can run jps on each of your datanodes and it will give this output:
TaskTracker
DataNode
Jps
I think this link will be useful:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
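As a quick check once both start-dfs.sh and start-mapred.sh have been run, the dfsadmin report lists the datanodes the namenode actually knows about (run it from the HDFS install's bin directory):

bin/hadoop dfsadmin -report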
