Job tracker and Task tracker don't show up when running the start-all.sh command on Ubuntu for Hadoop - hadoop

The Job tracker and Task tracker don't show up when I run the start-all.sh command on Ubuntu for Hadoop.
I do get the rest of the processes when I run the "jps" command in Unix.
I'm not sure why the job tracker and task tracker are not shown. I have been following a couple of links and couldn't get my problem sorted.
Steps done:
- Formatted the namenode multiple times.
- Deleted and recreated the tmp folder multiple times with the appropriate permissions.
What could be the issue?
Any suggestions would really help, as I am struggling to set up Hadoop on my laptop. I am new to it, though.

Try starting jobtracker and tasktracker separately.
From your Hadoop HOME directory, run
. bin/../libexec/hadoop-config.sh
Then from the Hadoop BIN directory, run
hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
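If they still don't appear, check jps and the daemon logs for the actual error. A minimal sketch, assuming a Hadoop 1.x layout with HADOOP_HOME set and logs in the default location:
jps
# If JobTracker/TaskTracker are missing from the jps output, look at their logs
# (default log names follow the pattern hadoop-<user>-<daemon>-<host>.log)
tail -n 50 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log
tail -n 50 $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log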

You must be using a Hadoop 2.x version, where the jobtracker is replaced by the YARN resource manager. Using jps (a JDK is needed) you can check whether the resource manager is running. If it is running, its default URL is (host-name):8088, where you can check your nodes, jobs and configuration. If it is not running, start it with sbin/start-yarn.sh.
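For example, a quick check along those lines (a sketch, assuming HADOOP_HOME points at a Hadoop 2.x install):
jps                               # look for ResourceManager and NodeManager
$HADOOP_HOME/sbin/start-yarn.sh   # start the YARN daemons if they are missing
# then browse to http://<host-name>:8088 for nodes, jobs and configuration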

Related

start-all.sh command not found

I have just installed the Cloudera VM setup for Hadoop. But when I open the command prompt and want to start all the Hadoop daemons using the command 'start-all.sh', I get an error stating "bash: start-all.sh: command not found".
I have tried 'start-dfs.sh' too, yet it still gives the same error. When I use the 'jps' command, I can see that none of the daemons have been started.
You can find the start-all.sh and start-dfs.sh scripts in the bin or sbin folders. You can use the following command to find them. Go to the Hadoop installation folder and run this command.
find . -name 'start-all.sh' # Finds files having name similar to start-all.sh
Then you can specify the path to start all the daemons using bash /path/to/start-all.sh
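Putting the two steps together, something like this should work (a sketch, run from the Hadoop installation folder; it simply picks the first match found):
SCRIPT=$(find . -name 'start-all.sh' | head -n 1)   # locate the script
bash "$SCRIPT"                                      # start all the daemons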
If you're using the QuickStart VM then the right way to start the cluster (as @cricket_007 hinted) is by restarting it in the Cloudera Manager UI. The start-all.sh scripts will not work, since those only apply to the Hadoop servers (Name Node, Data Node, Resource Manager, Node Manager ...) but not to all the services in the ecosystem (like Hive, Impala, Spark, Oozie, Hue ...).
You can refer to the YouTube video and the official documentation, Starting, Stopping, Refreshing, and Restarting a Cluster.

Do we need to put namenode in safe mode before restarting the job tracker?

I have a Hadoop cluster running Cloudera's CDH3, equivalent to Apache Hadoop 0.20.2. I want to restart the job tracker as there are some jobs which are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs are still in "Job Cleanup: Pending" status. Anyway, I want to restart the job tracker and see if that cleans up the jobs. I know the command to restart the job tracker, but I am not sure whether I need to put the namenode in safe mode before I restart the job tracker.
You can try to kill the unwanted jobs using hadoop job -kill <Job-ID> and check the command status with echo "$?". If that doesn't work, a restart is the only option.
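For example (the job ID below is just a placeholder):
hadoop job -kill job_201307091604_0001   # kill the unwanted job (hypothetical ID)
echo "$?"                                # 0 means the kill command itself succeeded
hadoop job -list                         # confirm the job is no longer running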
The Hadoop jobtracker and namenode are independent components, so there is no need to put the namenode in safe mode before a jobtracker restart. You can restart the jobtracker process alone (and the tasktracker if required).
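A minimal sketch of restarting only those daemons with the stock Hadoop 0.20 scripts (a CDH3 package install may use its own service scripts instead):
$HADOOP_HOME/bin/hadoop-daemon.sh stop jobtracker
$HADOOP_HOME/bin/hadoop-daemon.sh start jobtracker
# on a worker node, only if required:
$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker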

Job Tracker and TaskTracker in Hadoop 2.0

I installed Hadoop 2.4.x. As expected, there is no JobTracker and TaskTracker; it's YARN based. Is there any way to make it use the old JobTracker and TaskTracker for MapReduce instead of YARN? In short, can I make the JT and TT daemons run on this?
By default there is no configuration file for MapReduce in the 2.4.x installation, even though there is a file called mapred-site.xml.template. Rename the file to mapred-site.xml and remember to set the property mapreduce.framework.name to classic to use the jobtracker and tasktracker. Also, the start script start-all.sh cannot be used, as it only executes the scripts start-dfs.sh and start-yarn.sh; you need to execute the script that starts the jobtracker and tasktracker yourself.
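A sketch of those configuration steps, assuming the configs live under $HADOOP_HOME/etc/hadoop and your distribution still ships the MRv1 daemons:
cd $HADOOP_HOME/etc/hadoop
mv mapred-site.xml.template mapred-site.xml
# minimal contents for mapred-site.xml; "classic" selects the old JT/TT framework
cat > mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>classic</value>
  </property>
</configuration>
EOF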
As described above, there is no jobtracker and tasktracker in Hadoop 2.0 (YARN). It's better to follow these instructions (http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html) to get the idea, and you will find the processes are:
25578 ResourceManager
25411 SecondaryNameNode
447 Jps
29464 NameNode
25222 DataNode
25905 NodeManager

Hadoop Configuration

I have started configuring Hadoop 2.1.0-beta for a single node. I followed the steps mentioned in Michael Noll's tutorial (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#configuring-single-node-clusters-first). Everything I did and configured went well. As a result of jps, I see that the NameNode, DataNode and Secondary NameNode started fine. Then I found out that there is no start-mapred.sh script. So I tried starting the jobtracker using hadoop-daemon.sh (hadoop-daemon.sh --config /home/nayan/dev/hadoop/etc/hadoop/ start jobtracker) and it failed with the message "Sorry, the jobtracker command is no longer supported. You may find similar functionality with the "yarn" shell command.". I do not know what configuration changes (if any) I need to make. I made changes in the "yarn-site.xml" file, as suggested in Hadoop: The Definitive Guide, but could not proceed further. Where can I find out about YARN? I checked the Apache site, but could not figure it out.
You need to check your configuration XML files. Sometimes, if there is a problem in an XML file, some daemons won't start.
Then try to use ./start-all.sh and run jps.
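A quick way to catch a malformed config file first (a sketch, assuming xmllint is installed and the configs live under etc/hadoop):
# silent output means every file parses cleanly; errors are printed otherwise
xmllint --noout $HADOOP_HOME/etc/hadoop/*-site.xml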
You can use start-yarn.sh to start the ResourceManager and NodeManager daemons (which replace the jobtracker and tasktracker).
I usually start everything using these two commands
./start-dfs.sh
./start-yarn.sh
You should use start-dfs.sh for the HDFS daemons and start-yarn.sh for the ResourceManager and NodeManager daemons; both scripts are in the sbin directory of Hadoop.
./start-dfs.sh or start-dfs.sh will start only the HDFS components, while ./start-yarn.sh or start-yarn.sh will start the YARN components like the NodeManager, ResourceManager etc. If you don't want to start the two groups separately, try using this command:
./start-all.sh or start-all.sh (this is a deprecated command, though).
To answer your question, use ./start-yarn.sh
Cheers!
First, you have to start the YARN daemons in the YARN (Hadoop 2.x) environment.
So start with this:
at /hadoop_installed_path/sbin$ ./start-yarn.sh
Once the YARN daemons have started, you can start the DFS daemons:
at /hadoop_installed_path/sbin$ ./start-dfs.sh
1. You should check all the steps in Hadoop: The Definitive Guide. If everything is proper, then use start-all.sh and then run jps.
2. Sometimes you have to close the console for your changes to take effect, so close the console, reopen it, and then try jps again.
Hope this will help.
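On the second point, re-sourcing the shell profile has the same effect as reopening the console (assuming the Hadoop environment variables were added to ~/.bashrc):
source ~/.bashrc   # pick up HADOOP_HOME/PATH changes in the current shell
jps                # then check the running daemons again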

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial, so the version of Hadoop I am using is hadoop-0.18.0.
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is run it gave the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
The Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes that the datanode daemon is already started.
The plain solution is to set the HADOOP_PID_DIR env variable to something else (but not /tmp). Also, do not forget to update all the network port numbers in conf2.
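A sketch of that workaround (the conf2 path is a placeholder for wherever your second configuration directory actually lives; HADOOP_PID_DIR can also be set in conf2/hadoop-env.sh instead of exported):
export HADOOP_PID_DIR=$HOME/hadoop-pids-dn2   # keep the second datanode's PID file out of /tmp
mkdir -p "$HADOOP_PID_DIR"
./hadoop-daemon.sh --config /path/to/conf2 start datanode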
The smart solution is to start a second VM with a Hadoop environment and join the two into a single cluster. That's the way Hadoop is intended to be used.
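If you go that route, a rough sketch of the wiring on the master (hostnames are placeholders; fs.default.name in hadoop-site.xml must also point at the master):
# list every worker host in conf/slaves, one per line
cat > $HADOOP_HOME/conf/slaves <<'EOF'
master-vm
worker-vm
EOF
# conf/masters holds the secondary namenode host
echo master-vm > $HADOOP_HOME/conf/masters
# bin/start-all.sh on the master then starts the datanodes/tasktrackers over ssh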
