wordcount from eclipse

wordcount from eclipse - hadoop

I was using the eclipse plugin for hadoop. I can see all the files in HDFS by making a hadoop server but when I try to run the wordcount.java file from the eclipse, it gives me exception whereas from the terminal it runs smoothly. The exception is below.
2/11/14 04:09:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/14 04:09:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/11/14 04:09:06 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/14 04:09:06 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1728681403/.staging/job_local_0001
12/11/14 04:09:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:155)

I'd start with investigating this:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
It seems it causes the problem. Are you sure this is the proper path? If so, you may not have the privilage to access to it. Later on I would try to eliminate as much WARN as I can.

Thanks Shujaat that solved my problem. From Eclipse I was getting the same issue...
use hdfs://localhost:54310/user/... instead of "/user/..."

Related

Debugging hadoop Wordcount program in eclipse in windows

Trying to run Wordcount program in hadoop in eclipse (windows 7). and passing these argument in eclipse only
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created input file in project only like input folder and inside it word.txt file
But it is throughing below excption
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)

I doubt if Hadoop is installed correctly. Check in your machine if all the daemons are running or not.If not, then consider re-checking or re-installing what you are missing.
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable

ERROR security.UserGroupInformation: PriviledgedActionException in Hadoop 2.2

I am using Hadoop 2.2.0. hadoop-mapreduce-examples-2.2.0.jar are running fine on hdfs.
I have made a wordcount program in eclipse and add the jars using maven and run this jar:
ubuntu#ubuntu-linux:~$ yarn jar Sample-0.0.1-SNAPSHOT.jar com.vij.Sample.WordCount /user/ubuntu/wordcount/input/vij.txt user/ubuntu/wordcount/output
it give following error: 15/02/17 13:09:09 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
15/02/17 13:09:10 INFO client.RMProxy: Connecting to ResourceManager
at /0.0.0.0:8032
15/02/17 13:09:11 ERROR security.UserGroupInformation:
PriviledgedActionException as:ubuntu (auth:SIMPLE)
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output
directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt
already exists
Exception in thread "main"
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt already
exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at com.vij.Sample.WordCount.main(WordCount.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
jar is on my local system. both input and output path is on hdfs. there is no output dir exist on output path on hdfs.
please advice.
Thanks.

Acctually the error is:
ERROR security.UserGroupInformation:PriviledgedActionException as:ubuntu (auth:SIMPLE)
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output
directory hdfs://localhost:54310/user/ubuntu/wordcount/input/vij.txt
already exists
Delete the output file that already exists "vij.txt", or output to a different file.
OR try to do the following steps:
Download and unzip the WordCount source code from this link under $HADOOP_HOME.
$ cd $HADOOP_HOME
$ wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
$ unzip Hadoop-WordCount.zip
then , upload the input files (any text format file) into Hadoop distributed file system (HDFS):
$bin/hadoop fs -put $HADOOP_HOME/Hadoop-WordCount/input/ input
$bin/hadoop fs -ls input
Here, $HADOOP_HOME/Hadoop-WordCount/input/ is the local directory where the program inputs are stored. The second "input" represents the remote destination directory on the HDFS.
After uploading the inputs into HDFS, run the WordCount program with the following commands. We assume you have already compiled the word count program.
$ bin/hadoop jar $HADOOP_HOME/Hadoop-WordCount/wordcount.jar WordCount input output
If Hadoop is running correctly, it will print hadoop running messages similar to the following:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
11/11/02 18:34:46 INFO input.FileInputFormat: Total input paths to process : 1
11/11/02 18:34:46 INFO mapred.JobClient: Running job: job_201111021738_0001 11/11/02 18:34:47 INFO mapred.JobClient: map 0% reduce 0%
11/11/02 18:35:01 INFO mapred.JobClient: map 100% reduce 0%
11/11/02 18:35:13 INFO mapred.JobClient: map 100% reduce 100%
11/11/02 18:35:18 INFO mapred.JobClient: Job complete: job_201111021738_0001 11/11/02 18:35:18 INFO mapred.JobClient: Counters: 25
...

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH
to point at /etc/hadoop where core-site.xml, hdfs-site.xml, and mapred-site.xml are defined. Hadoop cluster is functional and works fine with non-Pig jobs. Based on the error descriptions below, I understand that Pig cannot find the JobContextImpl class it is looking for.
Based on the Hadoop 1.1.2 API documentation, I don't believe "task" is a sub-package of the "mapreduce" package. I have tried adding hadoop-core-1.1.2.jar directly to $PIG_CLASSPATH
and that did not work. (After looking at the contents of hadoop-core-1.1.2.jar, and the Hadoop 1.1.2 API documentation, I don't believe JobContextImpl is defined in the package Pig is attempting to load it from). How do I get Pig 0.13 to work with Hadoop 1.1.2?
=======Error follows as below=======
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
2014-08-03 14:01:06,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.localdomain:8020/
2014-08-03 14:01:06,388 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.localdomain:8021
2014-08-03 14:01:06,440 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
===Contents of pig_1407088865958.log ===
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:79)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more

Though it is unclear how well this works for everyone, it appears that the asker mentioned how he solved the problem:
In my searching for help I saw posts stating that it needs to be
recompiled with a parameter that indicates version. The parameter
values I saw were 23,24. I did not know how that parameter mapped to
the version of hadoop that I am using 1.1.2. I hacked the bin/pig
script to point to hadoop-core-1.1.2.jar. The script requires
HADOOP_HOME be set (which is deprecated).

Hadoop Wordcount Program Compile Error

I am new to hadoop programming.I am using eclipse for hadoop development.I added all jar files through java buildpath when i run my program it is not running and giving this error,so please help me.how to solve error?
14/05/31 23:33:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/31 23:33:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/05/31 23:33:10 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-deep/mapred/staging/deep689130586/.staging/job_local689130586_0001
14/05/31 23:33:10 ERROR security.UserGroupInformation: PriviledgedActionException as:deep cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/ already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at hadoop1.MyJob.run(MyJob.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at hadoop1.MyJob.main(MyJob.java:63)

The native libraries failed to load.
This can be due to you are using a 64-bit machine but the hadoop distro is for 32-bit. You can follow the steps here http://www.ercoppa.org/Linux-Compile-Hadoop-220-fix-Unable-to-load-native-hadoop-library.htm to recompile hadoop for 64 bit, and then replace the native libs.

error when reading from S3 using Spark/Hadoop

I am trying to read data from Amazon S3 using the Spark.
but I am getting
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
from inside a Hadoop call.
I have tried dwnloading jets3t and adding all the included jars to my classpath,
but it did not help.
Here is the full transcript of what is happening:
scala> val zz = sc.textFile("s3n:/<bucket>/<path>")
13/08/30 19:50:21 INFO storage.MemoryStore: ensureFreeSpace(45979) called with curMem=46019, maxMem=8579469803
13/08/30 19:50:21 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 44.9 KB, free 8.0 GB)
zz: spark.RDD[String] = MappedRDD[3] at textFile at <console>:12
scala> zz.first
13/08/30 19:50:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/30 19:50:38 WARN snappy.LoadSnappy: Snappy native library not loaded
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:224)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:176)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:76)
at spark.RDD.partitions(RDD.scala:214)
at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
at spark.RDD.partitions(RDD.scala:214)
at spark.RDD.take(RDD.scala:764)
at spark.RDD.first(RDD.scala:778)

When you run a Hadoop job you have to set the Hadoop classpath Environmental Variable. Typically, this is done in the Hadoop startup script.
export HADOOP_CLASSPATH=/path/to/yourlib:/path/to/your/other/lib
replace the : with a ; if you are on windows.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

wordcount from eclipse - hadoop

Thanks Shujaat that solved my problem. From Eclipse I was getting the same issue... use hdfs://localhost:54310/user/... instead of "/user/..."

Related

Debugging hadoop Wordcount program in eclipse in windows

ERROR security.UserGroupInformation: PriviledgedActionException in Hadoop 2.2

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

Hadoop Wordcount Program Compile Error

error when reading from S3 using Spark/Hadoop

Categories

Resources