I'm trying to install Nutch 1.12 on a Windows 2012 server using Cygwin64 2.874. Due to my limited skills with Java and Linux I followed the step-by-step instructions at https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Seeding_the_crawldb_with_a_list_of_URLs. The command
bin/nutch inject crawl/crawldb urls
throws an error because winutils.exe couldn't be found. Here is the hadoop log:
2016-07-01 09:22:25,660 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: starting at 2016-07-01 09:22:25
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: urlDir: urls
2016-07-01 09:22:25,832 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2016-07-01 09:22:26,050 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-01 09:22:26,394 ERROR crawl.Injector - Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
I found several hints here and downloaded winutils.exe from https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master. I unpacked the archive on the server and set the environment variable NUTCH_OPTS=-Dhadoop.home.dir=[winutils_folder]. The winutils error is now gone, but the nutch call fails with a different error:
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: starting at 2016-07-01 13:24:09
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: urlDir: urls
2016-07-01 13:24:09,697 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2016-07-01 13:24:09,885 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-01 13:24:10,307 ERROR crawl.Injector - Injector: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1149)
at org.apache.nutch.util.LockUtil.createLockFile(LockUtil.java:58)
at org.apache.nutch.crawl.Injector.inject(Injector.java:357)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
After updating .bashrc with the following lines, the hadoop log only shows warnings:
export JAVA_HOME='/cygdrive/c/Program Files/Java/jre1.8.0_92'
export PATH=$PATH:$JAVA_HOME/bin
But nutch still throws an error:
Injector: starting at 2016-07-01 17:25:22
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:570)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:173)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:160)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:94)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.nutch.crawl.Injector.inject(Injector.java:376)
at org.apache.nutch.crawl.Injector.run(Injector.java:467)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:441)
I need hints about what might be misconfigured, or is it simply not possible to run Nutch with Windows/Cygwin?
I spent about a week finding out what causes the UnsatisfiedLinkError.
I think the main reason is that Java cannot find its dependencies.
I solved the problem by commenting out part of the nutch script
(located in {nutch_folder}\runtime\local\bin).
To be more specific, I commented out the lines that set -Djava.library.path.
This makes the Java program use the original java.library.path instead.
Use the command below to check the original java.library.path in your environment.
java -XshowSettings:properties -version
It will show many properties including java.library.path = ...
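For example, to print just that property (a quick check, assuming a Unix-like shell such as Cygwin's bash; -XshowSettings writes to stderr, hence the redirect):
java -XshowSettings:properties -version 2>&1 | grep 'java.library.path'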
Comment out the code below:
# setup 'java.library.path' for native-hadoop code if necessary
# used only in local mode
JAVA_LIBRARY_PATH=''
if [ -d "${NUTCH_HOME}/lib/native" ]; then
  JAVA_PLATFORM=`"${JAVA}" -classpath "$CLASSPATH" org.apache.hadoop.util.PlatformName | sed -e 's/ /_/g'`
  if [ -d "${NUTCH_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH="${JAVA_LIBRARY_PATH}:${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}"
    else
      JAVA_LIBRARY_PATH="${NUTCH_HOME}/lib/native/${JAVA_PLATFORM}"
    fi
  fi
fi
if [ $cygwin = true -a "X${JAVA_LIBRARY_PATH}" != "X" ]; then
  JAVA_LIBRARY_PATH="`cygpath -p -w "$JAVA_LIBRARY_PATH"`"
fi
Also comment out the code below:
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
NUTCH_OPTS=("${NUTCH_OPTS[#]}" -Djava.library.path="$JAVA_LIBRARY_PATH")
fi
By doing this, NUTCH_OPTS will no longer contain -Djava.library.path, so Java will use the original setting.
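To double-check (a sketch; note this actually runs the inject job), trace the script from the same directory where bin/nutch is run and confirm the final java invocation no longer contains -Djava.library.path:
bash -x bin/nutch inject crawl/crawldb urls 2>&1 | grep 'java.library.path'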
The nutch command now runs as if in (pseudo-)distributed mode.
I don't know why exactly.
Hope this helps.
There is no need to comment out anything in the nutch script.
If you have placed hadoop.dll and winutils.exe under the bin folder of your HADOOP_HOME, make sure the path to that bin folder is also appended to your Windows PATH environment variable.
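One sketch of how this can look from Cygwin's .bashrc, assuming winutils.exe and hadoop.dll were unpacked to C:\hadoop\bin (a placeholder path). Cygwin translates PATH into Windows form when it launches the JVM, but other variables such as HADOOP_HOME are passed through unchanged, so HADOOP_HOME is given in Windows form here:
export HADOOP_HOME='C:\hadoop'
export PATH="$PATH:/cygdrive/c/hadoop/bin"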
I am trying to add an index to my existing table using the following command (run inside the accumulo-master Docker image):
geomesa add-attribute-index -u root -p secret -i gis -z SERVER_IP -c posiciones -f posicion -a id_posicion --coverage join
But it does not work and produces this output:
INFO Running map reduce index job for attributes: [id_posicion] with coverage: join...
ERROR Error encountered running attribute index command. Check hadoop's job history logs for more information.
The hadoop job log is the following:
2017-09-17 20:39:48,253 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1505353025896_0020_000002
2017-09-17 20:39:48,706 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-17 20:39:48,757 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2017-09-17 20:39:49,079 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 20 cluster_timestamp: 1505353025896 } attemptId: 2 } keyId: -1893920016)
2017-09-17 20:39:49,094 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2017-09-17 20:39:49,095 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config org.apache.hadoop.mapred.DirectFileOutputCommitter
2017-09-17 20:39:49,173 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:519)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:499)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1594)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:499)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:284)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1552)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1549)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1482)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:223)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:516)
... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 13 more
Any idea?
This is probably a bug - the jars to load are defined in this file. The file likely needs to be updated for newer versions of Accumulo - the missing class now appears to be in the accumulo-core jar. You should be able to fix it by adding the line accumulo-core to that file, which ends up in lib/geomesa-accumulo-jobs-<version>.jar in the tools distribution.
Is $ACCUMULO_HOME set? And are other geomesa commands working?
Setting $ACCUMULO_HOME to point to a copy of the Accumulo distribution would likely help. If you are using the GeoMesa tools from a machine which is not part of the cluster, then you can use the install-hadoop-accumulo.sh script in the tools distribution to download a copy of the necessary dependencies to $GEOMESA_HOME/lib.
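For example (a rough sketch; the install locations and whether the script takes a target directory argument depend on your GeoMesa tools version, so treat these paths as assumptions):
export ACCUMULO_HOME=/opt/accumulo
export GEOMESA_HOME=/opt/geomesa-accumulo
$GEOMESA_HOME/bin/install-hadoop-accumulo.sh $GEOMESA_HOME/lib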
I'm stuck trying to set up Nutch 2.3 with Elasticsearch 5.4. The problem is in Nutch, as I cannot get it to inject my URLs. The hadoop log shows the following warning:
Console:
aurora apache-nutch-2.3.1 # runtime/local/bin/nutch inject urls/seed.txt
InjectorJob: starting at 2017-06-14 17:08:28
InjectorJob: Injecting urlDir: urls/seed.txt
**it hangs here**
and the Hadoop log:
aurora apache-nutch-2.3.1 # cat runtime/local/logs/hadoop.log
2017-06-14 17:08:28,339 INFO crawl.InjectorJob - InjectorJob: starting at 2017-06-14 17:08:28
2017-06-14 17:08:28,340 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: urls/seed.txt
2017-06-14 17:08:28,992 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I've tried setting my Hadoop environment variables following this thread (Hadoop "Unable to load native-hadoop library for your platform" warning) but I'm still getting the same error.
Any ideas?
Don't worry about the warning. I believe you are running on a Linux distribution.
Nutch 2.3 is not compatible with ES 5.x. I wrote a custom IndexWriter which sends documents to Logstash on a given port, which in turn feeds Elasticsearch. You may try this approach or something along those lines.
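Not the exact setup described above, but as a sketch of the idea: if Logstash runs a pipeline with an http input on some port (5044 here is just a placeholder) and an elasticsearch output, the custom writer only needs to POST each NutchDocument as JSON to that port, e.g.:
curl -XPOST 'http://localhost:5044' -H 'Content-Type: application/json' -d '{"url":"http://example.com/","title":"Example","content":"page text"}'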
I am trying to run the WordCount program in Hadoop from Eclipse (Windows 7), passing these arguments in Eclipse only:
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\input\word.txt
E:\hadoop\eclipse-hadoop-pro\workspace-hadoop\WordCountPro\output
I have created the input inside the project itself: an input folder with the word.txt file in it.
But it is throwing the exception below:
2015-04-08 15:30:09,947 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-08 15:30:10,238 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\hadoop\hadoop-HADOOP_HOME\hadoop-2.6.0\bin\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:144)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:206)
at com.WordCount.main(WordCount.java:52)
2015-04-08 15:30:11,039 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1049)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-08 15:30:11,041 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/E:/hadoop/eclipse-hadoop-pro/workspace-hadoop/WordCountPro/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.WordCount.main(WordCount.java:61)
I doubt that Hadoop is installed correctly. Check on your machine whether all the daemons are running. If not, consider re-checking or re-installing whatever is missing.
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable
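If the job is meant to run against a real cluster, a quick sanity check (assuming the JDK's jps tool is on the PATH) is to list the running Hadoop daemons; on a Hadoop 2.x single-node setup you would expect entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:
jps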
I configured Pig on my Hadoop system, but when I start it I get an error related to log4j. Am I missing something?
Thanks!
$ pig
log4j:ERROR Could not instantiate class [org.apache.hadoop.log.metrics.EventCounter].
java.lang.ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
...
log4j:ERROR Could not instantiate appender named "EventCounter".
2014-02-14 10:45:46,512 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2014-02-14 10:45:46,513 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/hadoop/pig_1392381946511.log
2014-02-14 10:45:46,541 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2014-02-14 10:45:46,695 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: maprfs:///
2014-02-14 10:45:46,767 [main] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-02-14 10:45:46,768 [main] INFO org.apache.hadoop.security.JniBasedUnixGroupsMapping - Using JniBasedUnixGroupsMapping for Group resolution
2014-02-14 10:45:46,853 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: maprfs:///
grunt>
First try running a Pig script locally with:
pig -x local <filename>.pig
If an error message also shows up when running it locally, try the setup video at www.youtube.com/watch?v=BSsVvZnGz0M; it is suitable if you are using Ubuntu 12.04 LTS.
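If you need a file to test with, here is a minimal local smoke test adapted from the standard Pig tutorial passwd example (file names are arbitrary):
cp /etc/passwd .
echo "A = LOAD 'passwd' USING PigStorage(':'); B = FOREACH A GENERATE \$0; DUMP B;" > test.pig
pig -x local test.pig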
I was using the Eclipse plugin for Hadoop. I can see all the files in HDFS after setting up a Hadoop server, but when I try to run the WordCount.java file from Eclipse, it gives me an exception, whereas from the terminal it runs smoothly. The exception is below.
12/11/14 04:09:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/11/14 04:09:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/11/14 04:09:06 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/14 04:09:06 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1728681403/.staging/job_local_0001
12/11/14 04:09:06 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:155)
I'd start with investigating this:
ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/hduser/gutenberg
It seems to be the cause of the problem. Are you sure this is the proper path? If so, you may not have the privilege to access it. After that, I would try to eliminate as many of the WARNs as I can.
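To narrow it down, two quick checks (assuming the hadoop client is on the PATH); note the exception shows a file:/ URI, meaning the path is being resolved against the local filesystem rather than HDFS:
hadoop fs -ls /user/hduser/gutenberg    # does the input exist in HDFS?
ls /user/hduser/gutenberg               # does it exist locally, which is where file:/ points?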
Thanks Shujaat, that solved my problem. From Eclipse I was getting the same issue...
Use hdfs://localhost:54310/user/... instead of "/user/...".
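For example, the equivalent fully qualified arguments when submitting from the command line (jar name, class name and paths below are placeholders):
hadoop jar wordcount.jar WordCount hdfs://localhost:54310/user/hduser/gutenberg hdfs://localhost:54310/user/hduser/gutenberg-output
The same two URIs can be pasted into the Eclipse run configuration's program arguments.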