Hadoop Wordcount Program Compile Error - hadoop

I am new to Hadoop programming and I am using Eclipse for Hadoop development. I added all the jar files through the Java build path, but when I run my program it fails with the error below. Please help me: how do I solve this error?
14/05/31 23:33:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/31 23:33:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/05/31 23:33:10 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-deep/mapred/staging/deep689130586/.staging/job_local689130586_0001
14/05/31 23:33:10 ERROR security.UserGroupInformation: PriviledgedActionException as:deep cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at hadoop1.MyJob.run(MyJob.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at hadoop1.MyJob.main(MyJob.java:63)

The native libraries failed to load.
This can happen when you are on a 64-bit machine but the Hadoop distribution ships 32-bit native libraries. You can follow the steps at http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm to recompile Hadoop for 64-bit and then replace the native libs.
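Separately, the FileAlreadyExistsException at the bottom of the trace says the output directory resolved to file:/, i.e. the root of the local file system, which usually means the output path argument was empty or wrong. A minimal sketch of a guard you could add to the driver (names hypothetical; it assumes args[1] carries the output directory):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;

// inside run(), where conf is the job's JobConf:
Path out = new Path(args[1]);         // assuming args[1] is the output directory
FileSystem fs = FileSystem.get(conf);
if (fs.exists(out)) {
    fs.delete(out, true);             // recursive: clears stale output from a previous run
}
FileOutputFormat.setOutputPath(conf, out);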

Related

Debugging hadoop Wordcount program in eclipse in windows

Trying to run the WordCount program in Hadoop from Eclipse (Windows 7), passing these arguments in Eclipse only:
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created the input inside the project itself: an input folder with a word.txt file in it.
But it is throwing the exception below:
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)
I doubt Hadoop is installed correctly. Check whether all the daemons are running on your machine; if not, re-check or re-install whatever is missing. The key line is:
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable
Also note the doubled bin\bin in the path it searched, which suggests HADOOP_HOME already ends in \bin, so Hadoop appends a second bin when it looks for winutils.exe.
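On Windows you can also point Hadoop at winutils.exe from the driver itself, before any Hadoop class loads. A minimal sketch, assuming winutils.exe sits under E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin (path taken from the error message; adjust it to your install):

public static void main(String[] args) throws Exception {
    // Must run before the first Hadoop class (util.Shell) is initialized.
    // hadoop.home.dir must point at the directory that *contains* bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "E:\\hadoop\\hadoop-HADOOP_HOME\\hadoop-2.6.0");
    // ... build the Configuration/Job and continue as before ...
}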

ERROR security.UserGroupInformation: PriviledgedActionException in Hadoop 2.2

I am using Hadoop 2.2.0. hadoop-mapreduce-examples-2.2.0.jar runs fine on HDFS.
I have written a wordcount program in Eclipse, added the jars using Maven, and run the jar like this:
ubuntu#ubuntu-linux:~$ yarn jar Sample-0.0.1-SNAPSHOT.jar com.vij.Sample.WordCount /user/ubuntu/wordcount/input/vij.txt user/ubuntu/wordcount/output
It gives the following error:
15/02/17 13:09:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/17 13:09:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/17 13:09:11 ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.vij.Sample.WordCount.main(WordCount.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The jar is on my local system; both the input and output paths are on HDFS. No output directory exists at the output path on HDFS.
Please advise.
Thanks.
Actually the error is:
ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already exists
Delete the path the job reports as an already-existing output directory ("vij.txt"), or write the output to a different location. Note that the exception names your input file as the output directory, which suggests the driver is reading its arguments in the wrong order.
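A minimal sketch of the driver wiring to double-check (the argument positions are an assumption; the stack trace only shows that the input path ends up being treated as the output directory):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// inside main(), after Job job = Job.getInstance(conf):
FileInputFormat.addInputPath(job, new Path(args[0]));   // input file: /user/ubuntu/wordcount/input/vij.txt
FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir: must not exist yet
// If these two calls are swapped (or read the wrong index), the job reports the
// input file as an "already existing" output directory, exactly as in the trace.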
Or try the following steps:
Download and unzip the WordCount source code from this link under $HADOOP_HOME.
$ cd $HADOOP_HOME
$ wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
$ unzip Hadoop-WordCount.zip
Then upload the input files (any text-format files) into the Hadoop Distributed File System (HDFS):
$ bin/hadoop fs -put $HADOOP_HOME/Hadoop-WordCount/input/ input
$ bin/hadoop fs -ls input
Here, $HADOOP_HOME/Hadoop-WordCount/input/ is the local directory where the program inputs are stored. The second "input" represents the remote destination directory on the HDFS.
After uploading the inputs into HDFS, run the WordCount program with the following commands. We assume you have already compiled the word count program.
$ bin/hadoop jar $HADOOP_HOME/Hadoop-WordCount/wordcount.jar WordCount input output
If Hadoop is running correctly, it will print messages similar to the following:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
11/11/02 18:34:46 INFO input.FileInputFormat: Total input paths to process : 1
11/11/02 18:34:46 INFO mapred.JobClient: Running job: job_201111021738_0001
11/11/02 18:34:47 INFO mapred.JobClient: map 0% reduce 0%
11/11/02 18:35:01 INFO mapred.JobClient: map 100% reduce 0%
11/11/02 18:35:13 INFO mapred.JobClient: map 100% reduce 100%
11/11/02 18:35:18 INFO mapred.JobClient: Job complete: job_201111021738_0001
11/11/02 18:35:18 INFO mapred.JobClient: Counters: 25
...

error when reading from S3 using Spark/Hadoop

I am trying to read data from Amazon S3 using Spark, but I am getting
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
from inside a Hadoop call.
I have tried downloading jets3t and adding all the included jars to my classpath, but it did not help.
Here is the full transcript of what is happening:
scala> val zz = sc.textFile("s3n:/<bucket>/<path>")
13/08/30 19:50:21 INFO storage.MemoryStore: ensureFreeSpace(45979) called with curMem=46019, maxMem=8579469803
13/08/30 19:50:21 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 44.9 KB, free 8.0 GB)
zz: spark.RDD[String] = MappedRDD[3] at textFile at <console>:12
scala> zz.first
13/08/30 19:50:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/30 19:50:38 WARN snappy.LoadSnappy: Snappy native library not loaded
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:224)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:76)
at spark.RDD.partitions(RDD.scala:214)
at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
at spark.RDD.partitions(RDD.scala:214)
at spark.RDD.take(RDD.scala:764)
at spark.RDD.first(RDD.scala:778)
When you run a Hadoop job you have to set the HADOOP_CLASSPATH environment variable. Typically this is done in the Hadoop startup script:
export HADOOP_CLASSPATH=/path/to/yourlib:/path/to/your/other/lib
Replace the : with a ; if you are on Windows.

Hadoop compression : "Loaded native gpl library" but "Failed to load/initialize native-lzo library"

After several tries at installing LZO compression for Hadoop, I need help because I really have no idea why it doesn't work.
I'm using Hadoop 1.0.4 on CentOS 6. I tried http://opentsdb.net/setup-hbase.html, https://github.com/kevinweil/hadoop-lzo and some others, but I'm still getting this error:
13/07/03 19:52:23 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/07/03 19:52:23 WARN lzo.LzoCompressor: java.lang.NoSuchFieldError: workingMemoryBuf
13/07/03 19:52:23 ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library
even though the native GPL library is loaded. I've updated my mapred-site and core-site according to the links above, and I've copied the libs to the right paths (again according to the links).
The real puzzle is that the LZO test works on the namenode:
13/07/03 18:55:47 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/07/03 18:55:47 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ]
I've tried setting several paths in hadoop-env.sh, but there seems to be no right solution...
So if you have any idea or link, I'm really interested.
[edit] After a week, I'm still trying to make it functional. I tried sudhirvn.blogspot.fr/2010/08/hadoop-lzo-installation-errors-and.html, but removing all the LZO and gplcompression libraries and then doing a fresh install was no better.
Is this due to my Hadoop core version? Is it possible to have hadoop-core-0.20 and hadoop-core-1.0.4 at the same time? Should I compile LZO against a 0.20 Hadoop in order to use LZO?
By the way, I already tried compiling hadoop-lzo like this:
CLASSPATH=/usr/lib/hadoop/hadoop-core-1.0.4.jar CFLAGS=-m64 CXXFLAGS=-m64 ant compile-native tar
If it helps, the full error is:
INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
WARN lzo.LzoCompressor: java.lang.NoSuchFieldError: workingMemoryBuf
ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library
INFO lzo.LzoIndexer: [INDEX] LZO Indexing file test/table.lzo, size 0.00 GB...
WARN snappy.LoadSnappy: Snappy native library is available
INFO util.NativeCodeLoader: Loaded the native-hadoop library
INFO snappy.LoadSnappy: Snappy native library loaded
Exception in thread "main" java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.createDecompressor(LzopCodec.java:87)
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:229)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I really want to use LZO because I have to deal with very large files on a rather small cluster (5 nodes). Having splittable compressed files could make jobs run really fast.
Every remark or idea is welcome.
I was having the exact same issue and finally resolved it by picking a datanode at random and checking whether lzop was installed properly.
Where it wasn't, I ran:
sudo apt-get install lzop
(assuming you are on a Debian-based distribution).
I had the same issue on my OS X machine. The problem was solved when I removed hadoop-lzo.jar (0.4.16) from my classpath and put the hadoop-gpl-compression jar there instead.

wordcount from eclipse

I was using the Eclipse plugin for Hadoop. I can see all the files in HDFS after setting up a Hadoop server, but when I try to run the WordCount.java file from Eclipse it throws an exception, whereas from the terminal it runs smoothly. The exception is below.
12/11/14 04:09:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/14 04:09:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/11/14 04:09:06 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/14 04:09:06 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1728681403/.staging/job_local_0001
12/11/14 04:09:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:155)
I'd start with investigating this:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
It seems to be the cause of the problem. Are you sure this is the proper path? If so, you may not have the privilege to access it. After that, I would try to eliminate as many of the WARNs as I can.
Thanks Shujaat, that solved my problem. From Eclipse I was getting the same issue...
Use hdfs://localhost:54310/user/... instead of "/user/...".
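For reference, a minimal sketch of the relevant driver lines in the old mapred API that the stack trace shows. The input URI follows the fix above; the output path is hypothetical, and the JobConf(Class) constructor also sets the job jar, which removes the "No job jar file set" warning from the log:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// inside run()/main():
JobConf conf = new JobConf(WordCount.class);  // JobConf(Class) also sets the job jar
FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/hduser/gutenberg"));
FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:54310/user/hduser/gutenberg-output")); // output dir name is hypothetical
JobClient.runJob(conf);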
