./start-all.sh is not starting all the daemons in a single-node hadoop cluster installation

./start-all.sh is only starting the ResourceManager; for the NameNode, DataNode and SecondaryNameNode I have to cd to where hadoop-daemon.sh lives and start each of them separately, and likewise the NodeManager has to be started with ./yarn-daemon.sh start nodemanager. I am not able to start all of these daemons with ./start-all.sh. How can I make them all start with the single command ./start-all.sh?
And I have set the path in .bashrc as: PATH=$PATH:/home/kirankathe/hadoop-3.3.0/sbin
Can anyone help me get past this? Thanks in advance.
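A minimal sketch of the environment a single-node Hadoop 3.x install usually needs before start-all.sh can launch every daemon. The install path below comes from the question; the JDK path and everything else are assumptions to adapt to your setup:
# ~/.bashrc -- put both bin and sbin on the PATH
export HADOOP_HOME=/home/kirankathe/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh -- the start scripts need JAVA_HOME set explicitly
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK location
# $HADOOP_HOME/etc/hadoop/workers -- hosts on which DataNodes and NodeManagers are started
localhost
# the start scripts use ssh even on a single machine, so passwordless ssh to localhost is needed
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
If start-all.sh still skips a daemon, the per-daemon log under $HADOOP_HOME/logs usually says why.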

Related

Datanode is not starting in hadoop-hbase start?

I am running the following script to start all the HBase and Hadoop processes in my HBase setup in a virtual machine.
#!/bin/sh
start-dfs.sh
start-yarn.sh
start-hbase.sh
#hbase-daemon.sh start rest
hbase-daemon.sh start thrift
Earlier all the processes used to run properly. But recently I force-shut-down my virtual machine without stopping the HBase and Hadoop related processes, and after that my datanode process stopped coming up. Later I formatted my namenode, following a suggestion I found online. Now my namenode comes up properly but the datanode process does not. When I check the running Java processes with jps, the datanode process is missing:
4672 NodeManager
5474 ThriftServer
4098 NameNode
4408 SecondaryNameNode
5723 Jps
4555 ResourceManager
5372 HRegionServer
5246 HMaster
5182 HQuorumPeer
But earlier the DataNode process used to come up properly. Is it because I formatted my namenode? Do I need to change any config data or something else as well?
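Formatting the namenode gives it a new clusterID while the datanode's storage directory still carries the old one, so the datanode refuses to register; its log typically shows an "Incompatible clusterIDs" error. A rough way to check and fix this, with placeholder paths; the real ones come from dfs.namenode.name.dir and dfs.datanode.data.dir in your hdfs-site.xml:
# compare the clusterID recorded by the namenode and by the datanode (paths are placeholders)
cat /path/to/dfs/name/current/VERSION
cat /path/to/dfs/data/current/VERSION
# option 1: copy the namenode's clusterID into the datanode's VERSION file, then restart the datanode
# option 2: if the HDFS data is disposable, wipe the datanode directory so it registers cleanly
rm -rf /path/to/dfs/data/*
hadoop-daemon.sh start datanode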

Hadoop Multi-Cluster Installation: Unable to see the data nodes despite seeing daemons running on them

I am trying to set up a multi-node hadoop cluster using Hadoop 3.0.0. There is no straightforward documentation on this so I had to read a lot of blogs. I am at a point where when I run start-all.sh I see daemon processes appearing in the name node as well as data nodes. However, when I go to http://namenode:9870 I see 0 live nodes.
To be more specific, when I run start-all.sh I see the daemons being launched on the name node and the data nodes, and when I run jps I see the NameNode, SecondaryNameNode and ResourceManager processes running. On the data nodes, running jps shows DataNode and NodeManager running.
What I get at the URL, however, is still 0 live nodes.
Any guidance is greatly appreciated.
Thanks
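When the DataNode and NodeManager processes are running but the NameNode UI shows 0 live nodes, the datanodes are usually failing to register with the NameNode; hostname resolution, a firewall, or fs.defaultFS pointing at localhost are the common culprits. A few checks to run on one of the data nodes, where port 9000 is only an assumption and should match fs.defaultFS in your core-site.xml:
# can this data node reach the NameNode's RPC port? (port is an assumption)
nc -zv namenode 9000
# the datanode log states why registration fails (log file name pattern may differ)
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
# on every node, fs.defaultFS must name the NameNode host, not localhost
grep -A1 fs.defaultFS $HADOOP_HOME/etc/hadoop/core-site.xml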

Hadoop - Restart datanode and tasktracker

I want to bring down a single datanode and tasktracker so that some new changes I've made in my mapred-site.xml, such as mapred.reduce.child.java.opts, take effect. How do I do that? However, I don't want to bring down the whole cluster since I have active jobs running.
Also, how can that be done while ensuring that the namenode does not copy the relevant data blocks of a "temporarily down" datanode onto another node?
To stop
You can stop the DataNodes and TaskTrackers from the NameNode's hadoop bin directory.
./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh stop datanode
This script reads the slaves file in Hadoop's conf directory to know which DataNodes to stop, and the same goes for the TaskTrackers.
To start
Again, the script reads the slaves file in Hadoop's conf directory to start the DataNodes and TaskTrackers.
./hadoop-daemon.sh start tasktracker
./hadoop-daemon.sh start datanode
In Hadoop 2.7.2 the tasktracker is long gone; to manually restart the services out on the slaves:
yarn-daemon.sh stop nodemanager
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager
SSH into the datanode/tasktracker machine and cd into Hadoop's bin directory.
Invoke
./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh stop datanode
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start tasktracker
I'm not sure if restarting the tasktracker is required for the changes in mapred-site.xml to take effect. Please leave a comment so that I can correct my answer if needed.
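As for the second part of the question, keeping the NameNode from re-replicating the blocks of a briefly stopped datanode: by default the NameNode only marks a datanode dead after roughly ten minutes without heartbeats (twice the recheck interval plus ten heartbeats), so a quick stop/start normally triggers no block copying. If the restart could take longer, the timeout can be stretched; a rough hdfs-site.xml sketch using the Hadoop 2.x property name and an assumed value:
<property>
  <!-- recheck interval in milliseconds; with the default 3 s heartbeat this gives
       a dead-node timeout of about 2 * 900000 ms + 30 s, i.e. roughly 30 minutes -->
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>900000</value>
</property>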

What is best way to start and stop hadoop ecosystem, with command line?

I see there are several ways we can start the Hadoop ecosystem:
start-all.sh & stop-all.sh
These say they are deprecated and to use start-dfs.sh & start-yarn.sh instead.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager
EDIT: I think there have to be some specific use cases for each command.
start-all.sh & stop-all.sh : Used to start and stop hadoop daemons all at once. Issuing it on the master machine will start/stop the daemons on all the nodes of a cluster. Deprecated as you have already noticed.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh : Same as above, but start/stop the HDFS and YARN daemons separately on all the nodes from the master machine. It is advisable to use these commands now over start-all.sh & stop-all.sh.
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager : To start individual daemons on an individual machine manually. You need to go to a particular node and issue these commands.
Use case: Suppose you have added a new DN to your cluster and you need to start the DN daemon only on this machine:
bin/hadoop-daemon.sh start datanode
Note: You should have SSH enabled if you want to start all the daemons on all the nodes from one machine.
Hope this answers your query.
From Hadoop page,
start-all.sh
This will start up a Namenode, Datanode, Jobtracker and a Tasktracker on your machine.
start-dfs.sh
This will bring up HDFS with the Namenode running on the machine you ran the command on. On such a machine you would need start-mapred.sh to separately start the job tracker
start-all.sh/stop-all.sh has to be run on the master node
You would use start-all.sh on a single-node cluster (i.e. where you have all the services on the same node; the namenode is also the datanode and is the master node).
In multi-node setup,
You would use start-all.sh on the master node, and it would start what is necessary on the slaves as well.
Alternatively,
Use start-dfs.sh on the node you want the Namenode to run on. This will bring up HDFS with the Namenode running on the machine you ran the command on and Datanodes on the machines listed in the slaves file.
Use start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring up the Map/Reduce cluster with Jobtracker running on the machine you ran the command on and Tasktrackers running on machines listed in the slaves file.
hadoop-daemon.sh, as stated by Tariq, is used on each individual node. The master node will not start the services on the slaves. In a single-node setup this acts the same as start-all.sh. In a multi-node setup you will have to access each node (master as well as slaves) and execute it on each of them.
Have a look at start-all.sh itself: it calls the config script followed by the dfs and mapred start scripts.
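A rough paraphrase of the 1.x-era start-all.sh, from memory rather than verbatim, just to show that structure:
#!/usr/bin/env bash
# resolve the script's bin directory and load the shared configuration
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
. "$bin"/hadoop-config.sh
# start the HDFS daemons, then the MapReduce daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR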
Starting
start-dfs.sh (starts the namenode and the datanode)
start-mapred.sh (starts the jobtracker and the tasktracker)
Stopping
stop-dfs.sh
stop-mapred.sh

cluster not working with cdh4 tarball installation

I am trying to install CDH4 using the tarball version, but I am facing issues. The steps I took are as below:
I downloaded the tarball from the link https://ccp.cloudera.com/display/SUPPORT/CDH4+Downloadable+Tarballs
I first untarred the hadoop-0.20-mapreduce-0.20.2+1341 tar file.
I made the configuration changes in hadoop-0.20-mapreduce-0.20.2+1341 since I wanted MRv1, not YARN.
The first thing, as mentioned in the CDH4 installation steps, was to configure HDFS.
I made the relevant changes in
core-site.xml
hdfs-site.xml
mapred-site.xml
masters --- which is my namenode
slaves ---- my datanodes
Copied the Hadoop configurations to all the nodes in the cluster.
Did a namenode format.
After the format I had to start the cluster, but I could not find the start-all.sh script in the bin folder, so in that case I started with the command
bin/start-mapred.sh
In the logs it shows the jobtracker started and the tasktrackers started on the slave nodes,
but when I do a jps
I can see only
jobtracker
jps
Going further, I did a datanode start on the datanode with the below command:
bin/hadoop-daemon.sh start datanode
It shows the datanode started.
The namenode is not getting started, and the tasktracker is not getting started.
When I checked my logs I could see
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.FileNotFoundException: webapps/hdfs not found in CLASSPATH
Not sure what is stopping my cluster from working.
Earlier I had a CDH3 cluster running, so I stopped it and then started installing CDH4. I also changed all the directories in hdfs-site.xml, i.e. pointed them to new empty directories for the namenode and datanode rather than the ones used by CDH3.
But still nothing seems to help.
I also turned off the firewall since I do have root access, but the same thing: it did not work for me.
Any help on the above would be a great help.
Thank you for the kind reply, but I do not have a start-dfs.sh file in the bin folder.
The only files in the /home/hadoop-2.0.0-mr1-cdh4.2.0/bin folder are:
start-mapred.sh
stop-mapred.sh
hadoop-daemon.sh
hadoop-daemons.sh
hadoop-config.sh
rcc
slaves.sh
hadoop
The commands I am now using are as below.
For starting the datanode:
for x in /home/hadoop-2.0.0-mr1-cdh4.2.0/bin/hadoop-* ; do $x start datanode ; done ;
For starting the namenode:
bin/start-mapred.sh
I am still working on the same issue.
Hi, sorry for the misunderstanding above. The following commands can be run to start your datanodes and namenode.
To start namenode:
hadoop-daemon.sh start namenode
To start datanode:
hadoop-daemons.sh start datanode
To start secondarynamenode:
hadoop-daemons.sh --hosts masters start secondarynamenode
The jobtracker daemon will get started on your master node and the tasktracker daemons will get started on each of your datanodes after you run the command
bin/start-mapred.sh
In a Hadoop cluster setup, only the jobtracker daemon will be shown by the jps command on the master node, and on each of your datanodes you can see the tasktracker daemons running by using the jps command.
Then you have to start HDFS by running the following command on your master node:
bin/start-dfs.sh
This command will start the namenode daemon on your namenode machine (in this configuration, your master node itself I believe), and datanode daemons are started on each of your slave nodes.
Now you can run jps on each of your datanodes and it will give the output:
tasktracker
datanode
jps
I think this link will be useful:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
