JobTracker doesn't show completed tasks - hadoop

I am running hadoop-1.1.2 on my laptop in pseudo-distributed mode. I am able to run a simple WordCount program, reading from and writing back to HDFS. I am also able to see the JobTracker running at http://localhost:50030/jobtracker.jsp. However, when I run the WordCount job from Eclipse, there is no entry under either running or completed jobs.
Am I missing any additional property setting in one of the configuration files?
Thanks.

This is happening because when you run your job through Eclipse, it runs the job locally inside its own JVM rather than submitting it to the JobTracker, since it does not know where to find the JobTracker. You need to tell Eclipse where it is. Add the following lines to your code and it should work:
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:9000");  // NameNode address (core-site.xml)
conf.set("mapred.job.tracker", "localhost:9001");      // JobTracker address (mapred-site.xml)

Alternatively, copy your hdfs-site.xml and mapred-site.xml into your 'src' folder so that they are available on the classpath.
Eclipse will then pick up all of the configuration from those files.

Related

Why does the MR2 map task run as the 'yarn' user and not as the user who ran the hadoop job?

I'm trying to run a mapreduce job on MR2, Hadoop version 2.6.0-cdh5.8.0. The job uses a relative path to a directory containing many files to be compressed based on some criteria (not really relevant to this question). I'm running my job as follows:
sudo -u my_user hadoop jar my_jar.jar com.example.Main
There is a folder with files on HDFS under the path /user/my_user/. But when I run my job I get the following exception:
java.io.FileNotFoundException: File /user/yarn/<path_from_job> does not exist.
I'm migrating this job from MR1, where it works correctly. My guess is that this is happening because of YARN, since each container is started as the yarn user. In my job configuration I've tried to set mapreduce.job.user.name="my_user", but this didn't help.
I've found ${user.home} being used in my job configuration, but I don't know where it is set or whether it is possible to change it.
The only solution I have found so far is to provide an absolute path to the folder. Is there another way around this? I feel this is not the correct approach (a rough sketch of that workaround is below).
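For illustration only, the absolute-path workaround could be done by resolving the relative directory against an explicit HDFS base path instead of relying on the container's ${user.home}; the class name and directory name below are hypothetical placeholders:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathResolveSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Resolve the job's relative directory against an explicit base path
        // so the result does not depend on which user the container runs as.
        Path base = new Path("/user/my_user");                // explicit base on HDFS
        Path inputDir = new Path(base, "some_relative_dir");  // placeholder directory name
        Path qualified = fs.makeQualified(inputDir);          // absolute, scheme-qualified path
        System.out.println("Resolved input path: " + qualified);
    }
}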
Thank you

Job tracker and Task tracker don't show up when running the start-all.sh command in Ubuntu for hadoop

When I run the start-all.sh command on Ubuntu, the JobTracker and TaskTracker don't show up.
I do get the rest of the processes when I run the jps command.
I'm not sure why the JobTracker and TaskTracker are not shown. I have been following a couple of links but couldn't get my problem sorted.
Steps done:
- Formatted the namenode multiple times
- Deleted and recreated the tmp folder multiple times with appropriate permissions
What could be the issue?
Any suggestions would really help, as I am struggling to set up hadoop on my laptop. I am new to it.
Try starting jobtracker and tasktracker separately.
From your hadoop HOME directory run
. bin/../libexec/hadoop-config.sh
Then from hadoop BIN directory run
hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
You must be using a hadoop 2.x version, where the JobTracker is replaced by the YARN ResourceManager. Using jps (the JDK is needed) you can check whether the ResourceManager is running. If it is running, its default URL is (host-name):8088, where you can check your nodes, jobs and configuration. If it is not running, start it with sbin/start-yarn.sh.
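If you prefer to check programmatically rather than via jps or the web UI, a minimal sketch using the YARN client API (a separate approach from the one above; assumes Hadoop 2.x and the yarn-client jars on the classpath) could look like this:
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ResourceManagerCheck {
    public static void main(String[] args) throws Exception {
        // Connects to the ResourceManager configured in yarn-site.xml
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        System.out.println("NodeManagers in RUNNING state: " + nodes.size());
        yarnClient.stop();
    }
}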

mapreduce tasks only run on namenode

I have built a Hadoop cluster on three machines; these are the characteristics:
OS: Ubuntu 14.04 LTS
Hadoop: 2.6.0
NameNode and ResourceManager IP: namenode/192.168.0.100
DataNodes (also acting as NodeManagers) IP: data1/192.168.0.101, data2/192.168.0.102
I have configured all the xml files as in the official docs. When I execute the wordcount example program in Eclipse, I want to show which machine is running the map task or reduce task, so here is my code snippet:
//get localhost
InetAddress mLocalHost = InetAddress.getLocalHost();
System.out.println("Task on " + mLocalHost);
The snippet above was put into the map and reduce functions and the job was run on hadoop. Nevertheless, the console always shows:
Task on namenode/192.168.0.100
From my perspective, these tasks should run on data1 or data2. Can you explain this puzzle? What's wrong with my cluster?
What's more, the jobHistory (namenode:19888) records nothing, and the webAppProxy (namenode:8088) just shows the active nodes: 2, but no further information about the job.
Can you help me? Really appreciated.
Further info from the namenode is below; the jps command shows:
12647 Jps
11426 SecondaryNameNode
11217 NameNode
11585 ResourceManager
12033 JobHistoryServer
Where did you put that code? Is it in your driver class? You need to have it in your mapper or reducer so that you can see which node is doing the processing.
Instead of that, you can have a look at the ResourceManager web UI at rmipaddress:8088, which will give you more details on which node is executing the mappers, along with other logs.
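For example, a minimal sketch of a mapper that reports the host it runs on might look like this (the class name and type parameters are illustrative, not from the original question):
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HostReportingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Runs once per map task, on the node that actually executes the task.
        // When the job runs on the cluster, this output ends up in that task's
        // stdout log, not in the Eclipse console.
        InetAddress localHost = InetAddress.getLocalHost();
        System.out.println("Map task on " + localHost);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... your normal map logic ...
    }
}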
I have found what was wrong. "Run on hadoop" in Eclipse just starts the job locally, so I should modify the MyHadoopXML.xml file under the Eclipse plugin's sub-directory. Otherwise, I can just develop and debug the mapreduce job locally, export the project into a jar, and then run the jar with the "hadoop jar" command on the cluster to verify whether the job executes successfully.

How to set HADOOP_CLASSPATH via oozie while running HBase job

I'm using CDH5. I'm hitting an HBase bug while running a MapReduce job through Oozie in a fully distributed environment. The job connects to HBase and adds records programmatically. Please refer to the links below to understand the bug I'm hitting. Note that I cannot modify the MapReduce job code. The job runs fine from the command line after setting the HADOOP_CLASSPATH environment variable, but there seems to be no way to set/override this environment variable from Oozie, so the job fails when run from Oozie. Has anybody experienced this and found a workaround?
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_releasenotes_hdp_2.0/content/ch_relnotes-hdpch_relnotes-hdp-2.0.9.0-knownissues-hbase.html
https://issues.apache.org/jira/browse/HBASE-11118
You can set HADOOP_CLASSPATH on the system that runs the Oozie server, so sending it with every request is not required.
Otherwise, you can set it in XML. In oozie-site.xml set:
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/home/user/oozie/etc/hadoop</value>
</property>
where /home/user/oozie/etc/hadoop is the absolute path to the directory where the hadoop configuration files are located.

Issue with pseudo mode configuration of Hadoop

I am trying to do a pseudo mode configuration of Hadoop version 2.0.4. The start-dfs.sh script works fine. However, start-mapred.sh fails to start the JobTracker and TaskTracker. Below is the error I am getting. Looking at the error, it seems it is not able to pick up the right jar file. Please let me know if you have any idea about this issue. Thanks.
FATAL org.apache.hadoop.mapred.JobTracker: java.lang.NoSuchMethodError: org/apache/hadoop/mapred/JobACLsManager.<init>(Lorg/apache/hadoop/mapred/JobConf;)V
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2182)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1895)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1889)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:311)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:302)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:297)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4820)
It seems I was using incorrect jars, so first I replaced those. Then I created a new directory with the hadoop conf files and formatted the namenode. Finally it worked. :)
