ClassNotFoundException in MapReduce - Hadoop

When I run the command below:
[cloudera@localhost ~]$ hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount examples/output3;
15/11/05 10:13:04 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 10:13:18 INFO mapred.JobClient: Task Id : attempt_201511050944_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class word.Paras$Map not found.
Here word is the package name and Paras is the class name. I have checked that the output directory is created, as shown in the logs.
How can I fix this issue?
Thanks,
Anbu k
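
A common cause of a "Class word.Paras$Map not found" failure is that the driver never calls job.setJarByClass(...), or that Map is declared as a non-static inner class, which the task JVMs cannot instantiate by name. A minimal driver sketch under those assumptions; everything apart from word.Paras and its nested Map class is illustrative:

package word;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Paras {

    // Must be public static: the framework instantiates it by name on the
    // task nodes, and a non-static inner class cannot be created that way.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                outKey.set(itr.nextToken());
                context.write(outKey, ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        job.setJarByClass(Paras.class);   // ships the jar that contains word.Paras$Map
        job.setMapperClass(Map.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Reducer omitted for brevity; the identity reducer is used by default.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}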

Related

In Hadoop, how can I find which slave node is executing attempt N?

I'm using Hadoop 1.2.1, and my Hadoop application fails during the Reduce phase. The run produces messages like the following:
15/05/22 18:14:15 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:14:25 INFO mapred.JobClient: map 100% reduce 0%
15/05/22 18:24:25 INFO mapred.JobClient: map 0% reduce 0%
15/05/22 18:24:26 INFO mapred.JobClient: Task Id : attempt_201505221804_0013_m_000000_0, Status : FAILED
Task attempt_201505221804_0013_m_000000_0 failed to report status for 600 seconds. Killing!
15/05/22 18:24:35 INFO mapred.JobClient: map 100% reduce 0%
I'd like to see the log of attempt_201505221804_0013_m_000000_0, but it is too time-consuming to find which slave executed it.
Someone told me to use the Hadoop web UI to find it, but this cluster sits behind a firewall, and I can't change that because the cluster is not owned by our group.
Is there any way to find where this attempt was executed?
You should be able to find this information in the JobTracker logs, which by default are under HADOOP_HOME/logs. They contain entries looking similar to this:
INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201503262103_0001_m_000000_0' to tip task_201503262103_0001_m_000000, for tracker 'host'
You can search the file for the specific attempt id.
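For example, a search along these lines should turn up the assignment (the log location and file naming are assumptions based on the defaults):
grep 'attempt_201505221804_0013_m_000000_0' $HADOOP_HOME/logs/*jobtracker*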

Hadoop MapReduce Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

I am facing the error below when I try to run a MapReduce job with more than one input file, although I am able to run the job with only one input file.
I went through some posts, and almost every one says there is either a firewall issue or hostnames not set up properly in the /etc/hosts file.
But even if that were the case, my MapReduce job should fail whether the input is a single file or a directory (multiple files).
Below is the output from the console.
INFO input.FileInputFormat: Total input paths to process : 2
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN snappy.LoadSnappy: Snappy native library not loaded
INFO mapred.JobClient: Running job: job_201505201700_0005
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: map 50% reduce 0%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: map 100% reduce 16%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
INFO mapred.JobClient: map 100% reduce 16%
INFO mapred.JobClient: map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
INFO mapred.JobClient: map 100% reduce 16%
Note: EmbeddedQASrv.local (IP address 192.168.115.80) and AMR-DEV02.local (IP address 192.168.115.79) are my slave node hostnames.
My Hadoop cluster consists of 1 master and 2 slaves.
This is the command I am running from the console (emp_dept_data is a directory containing the empData and deptData files):
hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data output15
However, if I run this command (a single file as input), the MapReduce job succeeds:
hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data/empData output16
Here is my /etc/hosts file as set up on the master node; the same entries were also copied to my slave nodes:
127.0.0.1 amr-dev01.local amr-dev01 localhost
::1 localhost6.localdomain6 localhost6
#Hadoop Configurations
192.168.115.78 master
192.168.115.79 slave01
192.168.115.80 slave02
I am clueless about what is wrong and where to check for the exact root cause.
The actual problem was with the /etc/hosts file. I commented out my localhost configuration:
# 127.0.0.1 amr-dev01.local amr-dev01 localhost
and, instead of specifying different names like master, slave01, and slave02, I used the same hostnames everywhere:
192.168.115.78 amr-dev01
192.168.115.79 amr-dev02
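For reference, a minimal /etc/hosts in that spirit, replicated on every node, might look like this (the entry for the second slave is an assumption based on the IPs quoted in the question):
127.0.0.1 localhost
192.168.115.78 amr-dev01
192.168.115.79 amr-dev02
192.168.115.80 embeddedqasrv
The key point is that each machine's own hostname must resolve to its LAN address, not to 127.0.0.1; otherwise the node advertises itself as localhost and reducers on other nodes cannot fetch its map output, which is a classic source of MAX_FAILED_UNIQUE_FETCHES shuffle errors.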

Hadoop-2.5.1 + Nutch-2.2.1: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

Command: ./crawl /urls /mydir XXXXX 2
When I run this command in Hadoop 2.5.1 and Nutch 2.2.1, I get the following errors:
14/10/07 19:58:10 INFO mapreduce.Job: Running job: job_1411692996443_0016
14/10/07 19:58:17 INFO mapreduce.Job: Job job_1411692996443_0016 running in uber mode : false
14/10/07 19:58:17 INFO mapreduce.Job: map 0% reduce 0%
14/10/07 19:58:21 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:26 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:31 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:36 INFO mapreduce.Job: map 100% reduce 0%
14/10/07 19:58:36 INFO mapreduce.Job: Job job_1411692996443_0016 failed with state FAILED due to: Task failed task_1411692996443_0016_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/10/07 19:58:36 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11785
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11785
Total vcore-seconds taken by all map tasks=11785
Total megabyte-seconds taken by all map tasks=12067840
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
14/10/07 19:58:36 ERROR crawl.InjectorJob: InjectorJob: java.lang.RuntimeException: job failed: name=[/mydir]inject /urls, jobid=job_1411692996443_0016
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Probably you are using Gora (or something else) compiled against Hadoop 1 (from the Maven repo?). You can download Gora (0.5?) and build it against Hadoop 2.
Perhaps it is just the first trouble in a series of problems.
Please let us know about your next steps.
I had a similar error on Nutch 2.x with Hadoop 2.4.0.
Recompile Nutch with Hadoop 2.5.1 dependencies (Ivy) and exclude all Hadoop 1.x dependencies; you can find them in lib, probably hadoop-core.
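A sketch of what that can look like in Nutch's ivy/ivy.xml (the module list and revisions are assumptions; check which Hadoop 1.x artifacts your build actually pulls in):
<dependencies>
  <!-- Hadoop 2.x client artifacts instead of the monolithic hadoop-core -->
  <dependency org="org.apache.hadoop" name="hadoop-common" rev="2.5.1" conf="*->default"/>
  <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="2.5.1" conf="*->default"/>
  <!-- Keep the Hadoop 1.x jar out of the classpath entirely -->
  <exclude org="org.apache.hadoop" module="hadoop-core"/>
</dependencies>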

MapReduce working on single-node cluster but not on multinode cluster

I am running a MapReduce program that works fine on my CDH QuickStart VM, but when I try it on a multinode cluster, it gives the error below:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/12 00:23:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/02/12 00:23:06 INFO input.FileInputFormat: Total input paths to process : 1
14/02/12 00:23:07 INFO mapred.JobClient: Running job: job_201401221117_5777
14/02/12 00:23:08 INFO mapred.JobClient: map 0% reduce 0%
14/02/12 00:23:16 INFO mapred.JobClient: Task Id : attempt_201401221117_5777_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class Mappercsv not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class Mappercsv not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
... 8 more
Please help.
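Note the warning near the top of the log: "No job jar file set." On the QuickStart VM the class happens to be on the local classpath, but on a real cluster the job jar has to be shipped to the task nodes, so the usual fix is a single line in the driver (Mappercsv is the class name taken from the stack trace):
// Register the jar that contains Mappercsv so Hadoop distributes it
// to the task JVMs; without it the class is only visible locally.
job.setJarByClass(Mappercsv.class);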

Hadoop - MapReduce job fails to run in Windows (Cygwin)

I have installed Cygwin on Windows and configured the Hadoop 0.20.0 setup on it. I was able to run the word count project in Eclipse successfully, but when I run the wordcount example in hadoop-*-examples.jar, it throws the following error:
13/06/28 07:32:51 INFO input.FileInputFormat: Total input paths to process : 1
13/06/28 07:32:52 INFO mapred.JobClient: Running job: job_201306280622_0002
13/06/28 07:32:53 INFO mapred.JobClient: map 0% reduce 0%
13/06/28 07:32:57 INFO mapred.JobClient: Task Id : attempt_201306280622_0002_m_000002_0, Status : FAILED
Error initializing attempt_201306280622_0002_m_000002_0:
org.apache.hadoop.util.Shell$ExitCodeException: //job.jar: invalid mode: `jar'
Try `//job.jar --help' for more information.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:195)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
What could be the problem? Please assist.
Your command looks fine to me. Try giving the input path with the file name; I hope that will solve your problem:
bin/hadoop jar hadoop-0.20.0-examples.jar wordcount /user/input/input_file.txt /user/output
