MapReduce working on single-node cluster but not on multinode cluster - Hadoop

I am running a MapReduce program that works fine on my CDH QuickStart VM, but when I run it on a multinode cluster, it gives the error below:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/12 00:23:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/02/12 00:23:06 INFO input.FileInputFormat: Total input paths to process : 1
14/02/12 00:23:07 INFO mapred.JobClient: Running job: job_201401221117_5777
14/02/12 00:23:08 INFO mapred.JobClient: map 0% reduce 0%
14/02/12 00:23:16 INFO mapred.JobClient: Task Id : attempt_201401221117_5777_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class Mappercsv not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class Mappercsv not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
... 8 more
Please help.
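The two warnings at the top of the log are the likely clue: no job jar was set, so the task JVMs on the worker nodes have nowhere to load `Mappercsv` from. The usual remedy is for the driver to call `job.setJarByClass(...)` (or to use the `JobConf(Class)` constructor the warning mentions), ideally in a driver that implements `Tool` and is launched through `ToolRunner`, and then to submit with `hadoop jar`. Independently of that, it is worth confirming that the mapper is really packaged in the jar you submit. The helper below is a hypothetical, JDK-only sketch that performs just that check; the class name `JobJarCheck` is an assumption, not part of Hadoop.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: checks whether a class file is packaged inside a job jar.
 * It only inspects the jar; it does not submit anything to Hadoop.
 */
public class JobJarCheck {

    /** Returns true if the jar contains the .class entry for the given binary class name. */
    public static boolean containsClass(File jar, String className) throws IOException {
        // "com.example.Mappercsv" -> "com/example/Mappercsv.class"; nested classes keep their '$'.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JobJarCheck <job.jar> <fully.qualified.MapperClass>");
            System.exit(2);
        }
        File jar = new File(args[0]);
        String className = args[1];
        if (containsClass(jar, className)) {
            System.out.println(className + " is packaged in " + jar);
        } else {
            System.out.println(className + " is NOT in " + jar + "; task JVMs cannot load it from this jar");
            System.exit(1);
        }
    }
}
```

For example, `java JobJarCheck myjob.jar Mappercsv` (the jar name is hypothetical). A default-package class such as `Mappercsv` maps to the entry `Mappercsv.class`, while a packaged one such as `com.example.Mappercsv` maps to `com/example/Mappercsv.class`.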

Related

Reducer always fails and map succeeds

I am running a simple wordcount job on a 1 GB text file. My cluster has 8 DataNodes and 1 NameNode, each with a storage capacity of 3 GB.
When I run wordcount, the map phase always succeeds, but the reducer throws an error and fails. The error message is below:
14/10/05 15:42:02 INFO mapred.JobClient: map 100% reduce 31%
14/10/05 15:42:07 INFO mapred.JobClient: Task Id : attempt_201410051534_0002_m_000016_0, Status : FAILED
FSError: java.io.IOException: No space left on device
14/10/05 15:42:14 INFO mapred.JobClient: Task Id : attempt_201410051534_0002_r_000000_0, Status : FAILED
java.io.IOException: Task: attempt_201410051534_0002_r_000000_0 - The reduce copier failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201410051534_0002/attempt_201410051534_0002_r_000000_0/output/map_18.out
Could you please tell me how I can fix this problem?
Thanks
Navaz
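Both failures point at local disk rather than HDFS: the map attempt dies with `No space left on device`, and the reduce copier cannot find a local directory with room for `map_18.out`. Map output is spilled to the TaskTracker's local directories (`mapred.local.dir`, which here appears to live under `/app/hadoop/tmp`), and the single reducer copies every map output into its own local directory before merging, so 3 GB disks that may also hold HDFS blocks can plausibly fill up. As a hedged first check, the JDK-only sketch below reports which local directories have less usable space than a chosen threshold; the class name and the 1 GiB threshold are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check: reports local directories whose usable space
 * is below a threshold. Point it at the directories listed in mapred.local.dir
 * on every TaskTracker node.
 */
public class LocalDirSpaceCheck {

    /** Returns the directories whose usable space is below minFreeBytes. */
    public static List<String> lowSpaceDirs(List<String> dirs, long minFreeBytes) {
        List<String> low = new ArrayList<>();
        for (String dir : dirs) {
            // getUsableSpace() returns 0 when the path does not name a partition
            // (e.g. it does not exist), so missing directories are flagged too.
            long usable = new File(dir).getUsableSpace();
            if (usable < minFreeBytes) {
                low.add(dir);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Default path taken from the error message above; pass your own mapred.local.dir entries instead.
        List<String> dirs = args.length > 0 ? List.of(args) : List.of("/app/hadoop/tmp/mapred/local");
        long minFree = 1L << 30; // assumed threshold: 1 GiB
        for (String dir : lowSpaceDirs(dirs, minFree)) {
            System.out.println("Low on space (< 1 GiB usable): " + dir);
        }
    }
}
```

Run it on each TaskTracker node with the `mapred.local.dir` entries as arguments. If those directories are nearly full, freeing space or pointing `mapred.local.dir` at a larger disk is the likely fix.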

Hadoop Image Processing Interface

I have a question about HIPI, as I am new to this field. I am trying to run a simple example, and my command is:
$> bin/hadoop jar /opt/hipi-dev/examples/downloader.jar /user/hduser/hipiFile /user/hduser/outputhipi.hib 1
The hipiFile folder contains one text file with 4 image URLs. The path in my build.xml file is correct, yet it gives me the following error:
14/03/09 09:44:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Found host successfully: 0
Tried to get 1 nodes, got 1
14/03/09 09:44:01 INFO input.FileInputFormat: Total input paths to process : 2
First n-1 nodes responsible for 4 images
Last node responsible for 4 images
14/03/09 09:44:02 INFO mapred.JobClient: Running job: job_201403090903_0003
14/03/09 09:44:03 INFO mapred.JobClient: map 0% reduce 0%
14/03/09 09:44:13 INFO mapred.JobClient: Task Id : attempt_201403090903_0003_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at hipi.examples.downloader.Downloader$DownloaderMapper.map(Unknown Source)
at hipi.examples.downloader.Downloader$DownloaderMapper.map(Unknown Source)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Check whether the hadoop-common-*.jar files are on the classpath.
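The failing frame is inside the task JVM (`Child.main`), so the question is whether the task's classpath, not just the client's, contains the Hadoop jar that defines `org.apache.hadoop.fs.FSDataInputStream`; depending on the Hadoop version that is `hadoop-common-*.jar` or, on 1.x, `hadoop-core-*.jar`. The JDK-only sketch below (the class name `ClasspathCheck` is hypothetical) reports whether a class is visible on the classpath it is run with and which jar or directory it was loaded from.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: reports whether a class is visible on the current
 * classpath and, if so, which jar or directory it was loaded from.
 */
public class ClasspathCheck {

    /** Returns the load location, "bootstrap" for JDK core classes, or null if the class is not found. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null) {
                return "bootstrap"; // classes from the JDK itself have no code source
            }
            URL location = source.getLocation();
            return location == null ? "unknown" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.FSDataInputStream";
        String where = locate(name);
        System.out.println(where == null ? name + " is NOT on the classpath" : name + " loaded from " + where);
    }
}
```

Run it with the classpath you want to test, for example `java -cp <hadoop jars>:. ClasspathCheck org.apache.hadoop.fs.FSDataInputStream`, where `<hadoop jars>` stands for the jar list your tasks actually use.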

Hadoop giving SCDynamicStore on my JAR but not on hadoop-examples.jar

I'm very confused about building and executing my first job in Hadoop and would love help from anyone who can clarify the error I am seeing and provide guidance :)
I have a JAR file that I've compiled. When I try to execute an M/R job with it on OS X, I get the SCDynamicStore error that is often associated with the HADOOP_OPTS environment variable. However, this does not happen when I run examples from the examples JAR file. I have set the variable in hadoop-env.sh, and it appears to be recognized in the cluster.
Running a test from hadoop-examples.jar works:
$ hadoop jar /usr/local/Cellar/hadoop/1.1.2/libexec/hadoop-examples-1.1.2.jar wordcount /stock/data /stock/count.out
13/06/22 13:21:51 INFO input.FileInputFormat: Total input paths to process : 3
13/06/22 13:21:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/06/22 13:21:51 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/22 13:21:51 INFO mapred.JobClient: Running job: job_201306221315_0003
13/06/22 13:21:52 INFO mapred.JobClient: map 0% reduce 0%
13/06/22 13:21:56 INFO mapred.JobClient: map 66% reduce 0%
13/06/22 13:21:58 INFO mapred.JobClient: map 100% reduce 0%
13/06/22 13:22:04 INFO mapred.JobClient: map 100% reduce 33%
13/06/22 13:22:05 INFO mapred.JobClient: map 100% reduce 100%
13/06/22 13:22:05 INFO mapred.JobClient: Job complete: job_201306221315_0003
...
Running a job using my own class does not work:
$ hadoop jar test.jar mapreduce.X /data /output
13/06/22 13:38:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/06/22 13:38:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/06/22 13:38:36 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/22 13:38:36 INFO mapred.FileInputFormat: Total input paths to process : 3
13/06/22 13:38:36 INFO mapred.JobClient: Running job: job_201306221328_0002
13/06/22 13:38:37 INFO mapred.JobClient: map 0% reduce 0%
13/06/22 13:38:44 INFO mapred.JobClient: Task Id : attempt_201306221328_0002_m_000000_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.NoClassDefFoundError: com/google/gson/TypeAdapterFactory
at mapreduce.VerifyMarket$Map.<clinit>(VerifyMarket.java:26)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:802)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:847)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:873)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:947)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.ClassNotFoundException: com.google.gson.TypeAdapterFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 22 more
attempt_201306221328_0002_m_000000_0: 2013-06-22 13:38:39.314 java[60367:1203] Unable to load realm info from SCDynamicStore
13/06/22 13:38:44 INFO mapred.JobClient: Task Id : attempt_201306221328_0002_m_000001_0, Status : FAILED
... (This repeats a few times, but hopefully this is enough to see what I mean.)
Initially, I thought this was related to the aforementioned environment variable, but now I'm not so sure. Maybe I'm packaging my JAR incorrectly?
The easiest answer was to convert the project to Maven and include a gson dependency in the POM. Now mvn package, with the build configured to bundle dependencies (for example via the maven-shade-plugin or the maven-assembly-plugin's jar-with-dependencies descriptor), creates a single JAR file that contains everything necessary to complete the job in the cluster.
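The decisive line in the trace is `NoClassDefFoundError: com/google/gson/TypeAdapterFactory`, thrown while initializing `mapreduce.VerifyMarket$Map`: the job jar did not carry gson, and the examples jar presumably works because it has no such dependency. The `SCDynamicStore` message appears to be a harmless macOS Kerberos warning unrelated to the failure. Besides bundling the classes into one jar as described above, Hadoop 1.x also (to our understanding) adds jars placed under `lib/` inside the job jar to the task classpath. The hypothetical JDK-only sketch below checks a built jar for either form before you submit it; the class and method names are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical check: does a job jar carry a dependency, either unpacked
 * (as in a shaded / jar-with-dependencies build) or as a nested jar under lib/?
 */
public class DependencyCheck {

    public enum Result { UNPACKED, NESTED_LIB_JAR, MISSING }

    public static Result find(File jar, String classEntry, String libJarPrefix) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            // Case 1: the dependency's classes were copied straight into the jar.
            if (jarFile.getJarEntry(classEntry) != null) {
                return Result.UNPACKED;
            }
            // Case 2: the dependency jar itself sits under lib/ inside the job jar.
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith("lib/" + libJarPrefix) && name.endsWith(".jar")) {
                    return Result.NESTED_LIB_JAR;
                }
            }
            return Result.MISSING;
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "test.jar");
        // Entry name taken from the NoClassDefFoundError above.
        Result r = find(jar, "com/google/gson/TypeAdapterFactory.class", "gson");
        System.out.println("gson in " + jar + ": " + r);
    }
}
```

A result of `MISSING` for the jar you pass to `hadoop jar` reproduces the situation in the question.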

Can distcp be used to copy a directory of files from S3 to HDFS?

I am wondering if hadoop distcp can be used to copy multiple files at once from S3 to HDFS. It appears to only work for individual files with absolute paths. I would like to copy either an entire directory, or use a wildcard.
See: Hadoop DistCp using wildcards?
I am aware of s3distcp, but I would prefer to use distcp for simplicity's sake.
Here was my attempt at copying a directory from S3 to HDFS:
[root@ip-10-147-167-56 ~]# /root/ephemeral-hdfs/bin/hadoop distcp s3n://<key>:<secret>@mybucket/dir hdfs:///input/
13/05/23 19:58:27 INFO tools.DistCp: srcPaths=[s3n://<key>:<secret>@mybucket/dir]
13/05/23 19:58:27 INFO tools.DistCp: destPath=hdfs:/input
13/05/23 19:58:29 INFO tools.DistCp: sourcePathsCount=4
13/05/23 19:58:29 INFO tools.DistCp: filesToCopyCount=3
13/05/23 19:58:29 INFO tools.DistCp: bytesToCopyCount=87.0
13/05/23 19:58:29 INFO mapred.JobClient: Running job: job_201305231521_0005
13/05/23 19:58:30 INFO mapred.JobClient: map 0% reduce 0%
13/05/23 19:58:45 INFO mapred.JobClient: Task Id : attempt_201305231521_0005_m_000000_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
at java.io.BufferedInputStream.close(BufferedInputStream.java:468)
at java.io.FilterInputStream.close(FilterInputStream.java:172)
at org.apache.hadoop.tools.DistCp.checkAndClose(DistCp.java:1386)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:434)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/05/23 19:58:55 INFO mapred.JobClient: Task Id : attempt_201305231521_0005_m_000000_1, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
at java.io.BufferedInputStream.close(BufferedInputStream.java:468)
at java.io.FilterInputStream.close(FilterInputStream.java:172)
at org.apache.hadoop.tools.DistCp.checkAndClose(DistCp.java:1386)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:434)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/05/23 19:59:04 INFO mapred.JobClient: Task Id : attempt_201305231521_0005_m_000000_2, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
at java.io.BufferedInputStream.close(BufferedInputStream.java:468)
at java.io.FilterInputStream.close(FilterInputStream.java:172)
at org.apache.hadoop.tools.DistCp.checkAndClose(DistCp.java:1386)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:434)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/05/23 19:59:18 INFO mapred.JobClient: Job complete: job_201305231521_0005
13/05/23 19:59:18 INFO mapred.JobClient: Counters: 6
13/05/23 19:59:18 INFO mapred.JobClient: Job Counters
13/05/23 19:59:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=38319
13/05/23 19:59:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/05/23 19:59:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/05/23 19:59:18 INFO mapred.JobClient: Launched map tasks=4
13/05/23 19:59:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/05/23 19:59:18 INFO mapred.JobClient: Failed map tasks=1
13/05/23 19:59:18 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201305231521_0005_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
You cannot use wildcards in s3n:// addresses.
However, it is possible to copy an entire directory from S3 to HDFS. The reason for the null pointer exceptions in this case was that the HDFS destination folder already existed.
Fix: delete the HDFS destination folder: ./hadoop fs -rmr /input/
Note 1: I also tried passing -update and -overwrite, but I still got the NPE.
Note 2: https://hadoop.apache.org/docs/r1.2.1/distcp.html shows how to copy multiple explicit files.
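Since the `s3n://` source cannot contain a wildcard, one client-side workaround, in the spirit of Note 2, is to expand the pattern yourself and pass every matching file to `distcp` explicitly. The JDK-only sketch below filters a list of object keys with a `java.nio` glob and builds the resulting command line. The key listing is hypothetical (in practice it would come from listing `mybucket/dir`), the class name is an assumption, and credentials are left out of the URIs.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical workaround for the missing wildcard support: filter S3 object
 * keys with a glob, then build a distcp command that names each match explicitly.
 */
public class DistcpSources {

    /** Returns s3n:// URIs for the keys that match the glob (each key is matched as a relative path). */
    public static List<String> matchingUris(String bucket, List<String> keys, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> uris = new ArrayList<>();
        for (String key : keys) {
            if (matcher.matches(Paths.get(key))) {
                uris.add("s3n://" + bucket + "/" + key);
            }
        }
        return uris;
    }

    /** Builds "hadoop distcp src1 src2 ... dest" as a single command string. */
    public static String distcpCommand(List<String> sources, String dest) {
        return "hadoop distcp " + String.join(" ", sources) + " " + dest;
    }

    public static void main(String[] args) {
        // Hypothetical key listing for mybucket/dir.
        List<String> keys = List.of("dir/a.csv", "dir/b.csv", "dir/notes.txt");
        List<String> sources = matchingUris("mybucket", keys, "dir/*.csv");
        System.out.println(distcpCommand(sources, "hdfs:///input/"));
    }
}
```

A `*` in the glob does not cross `/`, so `dir/*.csv` skips files in subdirectories of `dir`.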

Hadoop on EC2 error: could only be replicated to 0 nodes, instead of 1

I'm running a very small Hadoop cluster on EC2.
I'm starting the cluster using Whirr (version whirr-0.2.0-incubating) with 1 node running the JobTracker and NameNode and 4 nodes each running a DataNode and TaskTracker:
whirr.hardware-id=c1.medium
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.provider=ec2
When I run my job, I get the following error, not right away but after a while:
.....
11/01/17 17:31:47 INFO mapred.JobClient: map 100% reduce 66%
11/01/17 17:31:49 INFO mapred.JobClient: map 100% reduce 68%
11/01/17 17:31:52 INFO mapred.JobClient: map 100% reduce 70%
11/01/17 17:31:56 INFO mapred.JobClient: map 100% reduce 73%
11/01/17 17:32:01 INFO mapred.JobClient: Task Id : attempt_201101172141_0002_r_000000_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/test/test2/test3/_temporary/_attempt_201101172141_0002_r_000000_0/parts/81/part-00000 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
....
I found the same error in the tasktracker log file:
....
2011-01-17 22:31:36,968 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201101172141_0002_m_000004_1' to tip task_201101172141_0002_m_000004, for tracker 'tracker_ip-11-222-333-444.ec2.internal:localhost/127.0.0.1:44840'
2011-01-17 22:31:39,972 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201101172141_0002_m_000004_1' from 'tracker_ip-11-222-333-444.ec2.internal:localhost/127.0.0.1:44840'
2011-01-17 22:31:57,985 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201101172141_0002_r_000000_0: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/dsp-test/test1/test2/_temporary/_attempt_201101172141_0002_r_000000_0/parts/81/part-00000 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
In the job's configuration I have:
dfs.replication=2
mapred.child.java.opts=-server -Xmx180m -XX:ErrorFile=/mnt/hadoop/logs/logs/java/java_error.log
Does anyone have an idea why I'm getting this error?
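Did you run out of disk space? When the DataNodes run out of disk space, HDFS can fail with exactly this error, because the NameNode cannot find any DataNode with enough room to accept the new block.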
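To make the disk-space explanation concrete: as we understand Hadoop 1.x's default block placement, a DataNode is chosen as a write target only if its remaining space, minus room for blocks already scheduled to it, covers several blocks (we assume 5 × the block size below). If no DataNode passes, the NameNode has 0 targets and the write fails with the message in the question. The sketch below is a simplified model under that assumption, not the NameNode's actual code; the class, the constant and the node figures are hypothetical.

```java
import java.util.List;

/**
 * Simplified model (an assumption, not the real NameNode code) of why a block
 * write can "only be replicated to 0 nodes": a DataNode is skipped as a target
 * unless it has room for several more blocks.
 */
public class ReplicationTargets {

    /** Assumed number of blocks' worth of free space a DataNode needs to accept a write. */
    static final int MIN_BLOCKS_FOR_WRITE = 5;

    /** Hypothetical free-space report for one DataNode. */
    public static final class DataNode {
        final String name;
        final long remainingBytes;
        final int blocksScheduled;

        public DataNode(String name, long remainingBytes, int blocksScheduled) {
            this.name = name;
            this.remainingBytes = remainingBytes;
            this.blocksScheduled = blocksScheduled;
        }
    }

    /** True if the node passes the modelled free-space check for one block of blockSize bytes. */
    public static boolean hasRoom(DataNode node, long blockSize) {
        // Space already promised to blocks being written to this node is not available.
        long effectiveRemaining = node.remainingBytes - node.blocksScheduled * blockSize;
        return effectiveRemaining >= MIN_BLOCKS_FOR_WRITE * blockSize;
    }

    /** Number of DataNodes that could be chosen as targets for a new block. */
    public static long usableTargets(List<DataNode> nodes, long blockSize) {
        return nodes.stream().filter(n -> hasRoom(n, blockSize)).count();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 64 * mb; // 64 MB, the Hadoop 1.x default dfs.block.size
        // Hypothetical figures: every DataNode has less than 5 x 64 MB = 320 MB effectively free.
        List<DataNode> nodes = List.of(
                new DataNode("dn1", 200 * mb, 0),
                new DataNode("dn2", 300 * mb, 0),
                new DataNode("dn3", 350 * mb, 1),
                new DataNode("dn4", 250 * mb, 0));
        System.out.println("DataNodes able to accept a new block: " + usableTargets(nodes, blockSize));
    }
}
```

On a real cluster, `hadoop dfsadmin -report` shows each DataNode's remaining DFS space and is a quick way to confirm whether this is what happened.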
