Job Tracker web interface - hadoop

I followed the tutorialshttp://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html and installed hadoop 2.4.1 as pseudo distributed cluster. I created a ubuntu VM using OracleVM and installed hadoop as mentioned in the link. It was setup fine and able to run the examples. However the job tracker URL is not working. :50030 gives page not found. I also tried netstat on the server and there is no process waiting on 50030 port . Do i need to start any other service ? What are the possible reasons ?

You need to execute this:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Or JobTracker won't start.
(In my case, $HADOOP_HOME is in /usr/local/hadoop)

Check the value of mapred.job.tracker.http.address in mapred-site.xml
If the port is different, use that.
Also check if jobtracker is running. Check the jobtracker logs.

You need to enter the following command
http://localhost:50030/
Job Tracker web UI.

Related

how to start and check job history on hadoop 2.5.2

in the mapreduce webconsole for each application there is a tracking ui link which points to xx:19888/jobhistory/, but how to start the service on 19888 (i have started 4 services: yarn-resource-manager, yarn-node-manager, hdfs-name-node, hdfs-data-node, what i have missed?)
is the jobtracker removed in 2.5.2
I want to check the job.xml generated for my job, where can i find it. I have specified "mapreduce.jobtracker.jobhistory.location" but nothing is there
Thank you.
To access the JobHistory server's web interface then you have start the hadoop-mapreduce-historyserver service, which will bind to 19888 by default.
If your are running YARN in the cluster then you don't need jobtracker anymore, the work done jobtracker is offloaded to ResourceManager, NodeManager's & ApplicationMaster's. But, you could still install just MRv1 in that case you will install JobTracker and TaskTracker's (which is not recommended).
You could check the job.xml from the ResourceManager's UI by navigating to http://RESOURCEMANAGER_HOST:8088/cluster -> select your application's Tracking UI -> Select your Job ID -> On the left tab you'll be able to see Configuration. Or simply if you already know you'r job's id then visit this link: http://JOBHISTORY_SERVER:19888/jobhistory/conf/YOUR_JOB_ID.
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_HOME/etc/hadoop start historyserver
then run jps to see if it is running.
Please try 'jps' command to verify all the services are running.
try //jhs_host:port/ Default HTTP port is 19888."
Make sure that your hadoop services are running.

Spark Standalone Mode: Worker not starting properly in cloudera

I am new to the spark, After installing the spark using parcels available in the cloudera manager.
I have configured the files as shown in the below link from cloudera enterprise:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I have started all the nodes in the spark by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. But I couldn't run the worker nodes as I got the specified error below.
[root#localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root#localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run jps command, I got:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. Actually I thought to install a standalone spark where the master and worker work on a single machine. In slaves file of spark directory, I given the address as "localhost.localdomin" which is my host name. I am not aware of this settings file. Please any one cloud help me out with this installation process. Actually I couldn't run the worker nodes. But I can start the master node.
Thanks & Regards,
bips
Please notice error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64 after making sure that libgcj.x86_64 and libgcj.i686 had been installed on my server, finally I solved it. Below is my solution, wish it can help you.
It seem as if your JAVA_HOME environment parameter didn't set correctly.
Maybe, your JAVA_HOME links to system embedded java, e.g. java version "1.5.0".
Spark needs java version >= 1.6.0. If you are using java 1.5.0 to start Spark, you will see this error info.
Try to export JAVA_HOME="your java home path", then start Spark again.

Hadoop 2.2.0 Web UI not showing Job Progress

I have installed Single node hadoop 2.2.0 from this link. When i run a job from terminal, it works fine with output. Web UI's i used
- Resource Manager : http://localhost:8088
- Namenode Daemon : http://localhost:50070
But from Resource Manager's web UI(shown above) i can't see job progress like Submitted Jobs, Running Jobs, etc..
MY /etc/hosts file is as follows:
127.0.0.1 localhost
127.0.1.1 meitpict
My System has IP: 192.168.2.96(I tried by removing this ip but still it didn't worked)
The only host:port i mentioned is in core-site.xml and that is:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
</property>
Even i get these problems while executing Map - Reduce job.
Best what i did that while executing Map-Reduce job you will get link on the box from which you executed MapR job for checking the job progress something like below:
http://node1:19888/jobhistory/job/job_1409462238335_0002/mapreduce/app
Here node1 is set to 192.168.56.101 in my hosts file entry and that is for NameNode box.
So at the time of your MapR job is running you can go to the UI link provided by MapR framework.
ANd when it gets opened then do not close and there you can find details about other jobs also, when they started and when they got finished etc.
So next time better to check your putty console output after submitting the MapR job, you will definitely see a link for the current job to check it status from the browser UI.
In Hadoop 2.x this problem could be related to memory issues, you can see it in MapReduce in Hadoop 2.2.0 not working

where is the hadoop task manager UI

I installed the hadoop 2.2 system on my ubuntu box using this tutorial
http://codesfusion.blogspot.com/2013/11/hadoop-2x-core-hdfs-and-yarn-components.html
Everything worked fine for me and now when I do
http://localhost:50070
I can see the management UI for HDFS. Very good!!
But the I am going through another tutorial which tells me that there must be a task manager UI running at http://mymachine.com:50030 and http://mymachine.com:50060
on my machine I cannot open these ports.
I have already done
start-dfs.sh
start-yarn.sh
start-all.sh
is something wrong? why can't I see the task manager UI?
You have installed YARN (MRv2) which runs the ResourceManager. The URL http://mymachine.com:50030 is the web address for the JobTracker daemon that comes with MRv1 and hence you are not able to see it.
To see the ResourceManager UI, check your yarn-site.xml file for the following property:
yarn.resourcemanager.webapp.address
By default, it should point to : resource_manager_hostname:8088
Assuming your ResourceManager runs on mymachine, you should see the ResourceManager UI at http://mymachine.com:8088/
Make sure all your deamons are up and running before you visit the URL for the ResourceManager.
For Hadoop 2[aka YARN/MRV2] - Any hadoop installation version-ed 2.x or higher its at port number 8088. eg. localhost:8088
For Hadoop 1 - Any hadoop installation version-ed lower than 2.x[eg 1.x or 0.x] its at port number 50030. eg localhost:50030
By default HadoopUI location is as below
http://mymachine.com:50070

Hadoop Configuration

I have started configuring Hadoop 2.1.0-beta version for single node. I followed steps mentioned in Michael Noll's Tutorial (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#configuring-single-node-clusters-first). Every thing I did and configured well. As a result of JPS, I got that NameNode, DataNode, Secondary NameNode started fine. Then I found out that there is no start-mapred.sh script. So I tried starting the jobtracker using hadoop-daemons.sh (hadoop-daemon.sh --config /home/nayan/dev/hadoop/etc/hadoop/ start jobtracker) and it resulted in failure with message "Sorry, the jobtracker command is no longer supported. You may find similar functionality with the "yarn" shell command.". I do not know what all configuration changes (if any) I need to make. I made changes in "yarn-site.xml" file, as suggested in Hadoop:The Definitive Guide. But could not proceed further. Where can I find out about Yarn. I checked Apache site, but could not figure it out.
You need to check your configuration xml files. Sometimes if you have any problrm in xml then some daemons wont start.
and try to use ./start-all.sh and then JPS
you can use start-yarn.sh to start the ResourceManger and Jobtracker daemons
I usually start everything using these two commands
./start-dfs.sh
./start-yarn.sh
You Should use start-dfs.sh for Hdfs Daemons and start-yarn.sh for Resource manager and nodemanager daemon both are in /bin of hadoop.
./start-dfs.sh or start-dfs.sh will start only HDFS components , while ./start-yarn.sh or start-yarn.sh will start Yarn component like NodeManager , Resource manager etc. If you don't want to start both the components separately , try using this command :
./start-all.sh or start-all.sh (This is deprecated command though).
To answer your question , use ./start-yarn.sh
Cheers!
First have to start the yarn daemons in the YARN( HADOOP 2.x) Environment.
So start with this
at /hadoop_installed_path/sbin$ ./start-yarn.sh
Once the yarn daemons started then we can start df daemons
at /hadoop_installed_path/sbin$ ./start-dfs.sh
1.You should check all the steps in Hadoop The definitive guide.
if it's all proper than use start-all.sh
than run jps.
2.some time You have to close console for reflecting your changes.so close the console and reopen it again and then try jps,
hope this will help.

Resources