Apache Nutch 2.3: throwing Error Failed with exit value 255 - hadoop

I'm using Apache Nutch 2.3. My Hadoop version is 2.6.0, and Hadoop is running on a single node.
When I run the following Nutch command:
./crawl --index ~/test/seed ~/test -1
I get the output below.
InjectorJob: starting at 2016-01-04 12:03:26
InjectorJob: Injecting urlDir: --index
InjectorJob: Using class org.apache.gora.memory.store.MemStore as the
Gora storage class.
InjectorJob:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: file:/usr/local/nutch/runtime/local/bin/--index
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus
(FileInputFormat.java:235)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits
(FileInputFormat.java:252)
at org.apache.hadoop.mapred.JobClient.writeNewSplits
(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs
(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal
(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Error running:
/usr/local/nutch/runtime/local/bin/nutch inject --index -crawlId
/home/jalaj/test/seed
Failed with exit value 255.
What is the problem with Nutch? Do I need to install Apache Gora?

The problem is here: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/usr/local/nutch/runtime/local/bin/--index
Nutch tries to read the seed directory but was handed --index as its path (note InjectorJob: Injecting urlDir: --index), so it cannot. Please make sure your command matches the bin/crawl usage.
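For illustration, a corrected invocation might look like the sketch below. It assumes the Nutch 2.3 bin/crawl usage of <seedDir> <crawlId> <numberOfRounds>, with a made-up crawl ID (testcrawl) and a positive round count; the seed path is the one from the question:
# seed directory first, then a crawl ID, then the number of rounds
./crawl ~/test/seed testcrawl 1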
Hope this helps,
Le Quoc Do

Related

nutch 1.14 deduplication failed

I have integrated Nutch 1.14 with Solr 6.6.0 on CentOS Linux release 7.3.1611. I put about 10 URLs in the seed list at /usr/local/apache-nutch-1.13/urls/seed.txt and followed the tutorial.
[root@localhost apache-nutch-1.14]# bin/nutch dedup http://ip:8983/solr/
DeduplicationJob: starting at 2018-01-09 15:07:52
DeduplicationJob: java.io.IOException: No FileSystem for scheme: http
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
at org.apache.nutch.crawl.DeduplicationJob.run(DeduplicationJob.java:326)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.DeduplicationJob.main(DeduplicationJob.java:369)
Everything up to the Solr-related commands works. Please help.
Where is the Hadoop element they are talking about in the Nutch tutorial? Do we have to install anything other than Java for Hadoop, Nutch, and Solr to work together to build a search engine?
The Solr URL is being treated as a filesystem path (hence No FileSystem for scheme: http), so pass it as a property instead. Try this:
bin/nutch dedup -Dsolr.server.url=http://ip:8983/solr/
I was reading the same guide and ran into the same problem. This might help:
(Step-by-Step: Deleting Duplicates)
$ bin/nutch dedup crawl/crawldb/ -Dsolr.server.url=http://localhost:8983/solr/nutch
DeduplicationJob: starting at 2018-02-23 14:27:34
Deduplication: 1 documents marked as duplicates
Deduplication: Updating status of duplicate urls into crawl db.
Deduplication finished at 2018-02-23 14:27:37, elapsed: 00:00:03

not able to run pig on windows 7 using cygwin

I configured Pig as directed in the documentation.
Environment: Windows 7, Hadoop 0.20.2, Pig 0.13.0, Cygwin
But when I type pig (mapreduce) at the command prompt, it just displays the output below. I am not sure whether Pig started or not, and I don't see the Grunt shell where I could execute a script.
By the way, Hadoop is running on the same node.
Can someone please help?
$ pig
Find hadoop at /hadoop-0.20.2/bin/hadoop
dry run:
HADOOP_CLASSPATH: C:\cygwin64\pig-0.13.0\conf;C;C:\Program Files\Java\jdk1.6.0_25\lib\tools.jar;C;C:\cygwin64\hadoop-0.20.2\conf;C:\cygwin64\pig-0.13.0\lib\accumulo-core-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-fate-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-server-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-start-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\accumulo-trace-1.5.0.jar;C:\cygwin64\pig-0.13.0\lib\avro-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-mapred-1.7.5.jar;C:\cygwin64\pig-0.13.0\lib\avro-tools-1.7.5-nodeps.jar;C:\cygwin64\pig-0.13.0\lib\groovy-all-1.8.6.jar;C:\cygwin64\pig-0.13.0\lib\hbase-0.94.1.jar;C:\cygwin64\pig-0.13.0\lib\jruby-complete-1.6.7.jar;C:\cygwin64\pig-0.13.0\lib\js-1.7R2.jar;C:\cygwin64\pig-0.13.0\lib\json-simple-1.1.jar;C:\cygwin64\pig-0.13.0\lib\jython-standalone-2.5.3.jar;C:\cygwin64\pig-0.13.0\lib\piggybank.jar;C:\cygwin64\pig-0.13.0\lib\protobuf-java-2.4.0a.jar;C:\cygwin64\pig-0.13.0\lib\zookeeper-3.4.5.jar:C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar:
HADOOP_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
HADOOP_CLIENT_OPTS: -Xmx1000m -Dpig.log.dir=C:\cygwin64\PIG-01~1.0\logs -Dpig.log.file=pig.log -Dpig.home.dir=C:\cygwin64\PIG-01~1.0\
/hadoop-0.20.2/bin/hadoop jar C:\cygwin64\PIG-01~1.0/pig-withouthadoop-h2.jar
When I run in debug mode, I see the exception below. This is because the Hadoop jar is not set.
localhost@mymachine ~
$ echo $PIG_INSTALL
C:\cygwin64\pig-0.13.0
localhost@mymachine ~
$ export PIG_INSTALL=/cygdrive/c/cygwin64/pig-0.13.0
localhost@mymachine ~
$ export HADOOP_INSTALL=/cygdrive/c/cygwin64/hadoop-0.20.2/
localhost@mymachine ~
$ export PATH=$PATH:$PIG_INSTALL/bin:$HADOOP_INSTALL/bin
$ pig
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/26 14:05:12 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-26 14:05:12,998 [main] INFO org.apache.pig.Main - Logging error messages to: C:\cygwin64\home\chparekh\pig_1409076312996.log
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
You can refer to the link below for the same; I hope this helps you.
http://abhijitsureshshingate.wordpress.com/2013/07/08/code-debug-test-apache-pig-scripts-using-eclipse-on-windows/

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

I just installed Pig 0.13 and am attempting to use it with Hadoop 1.1.2 (the Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH
to point at /etc/hadoop, where core-site.xml, hdfs-site.xml, and mapred-site.xml are defined. The Hadoop cluster is functional and works fine with non-Pig jobs. Based on the error descriptions below, I understand that Pig cannot find the JobContextImpl class it is looking for.
Based on the Hadoop 1.1.2 API documentation, I don't believe "task" is a sub-package of the "mapreduce" package. I have tried adding hadoop-core-1.1.2.jar directly to $PIG_CLASSPATH,
and that did not work. (After looking at the contents of hadoop-core-1.1.2.jar and the Hadoop 1.1.2 API documentation, I don't believe JobContextImpl is defined in the package Pig is attempting to load it from.) How do I get Pig 0.13 to work with Hadoop 1.1.2?
======= Error output follows =======
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/03 14:01:05 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-08-03 14:01:05,959 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
2014-08-03 14:01:06,112 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master.localdomain:8020/
2014-08-03 14:01:06,388 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master.localdomain:8021
2014-08-03 14:01:06,440 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /home/hadoop/pig-0.13.0/bin/pig_1407088865958.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:643)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
===Contents of pig_1407088865958.log ===
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:68)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:79)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
Though it is unclear how well this works for everyone, it appears that the asker mentioned how they solved the problem:
In my searching for help I saw posts stating that Pig needs to be
recompiled with a parameter that indicates the version. The parameter
values I saw were 23 and 24. I did not know how that parameter mapped to
the version of Hadoop I am using (1.1.2). I hacked the bin/pig
script to point to hadoop-core-1.1.2.jar. The script requires
HADOOP_HOME to be set (which is deprecated).
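For reference, here is a hedged sketch of the recompile route. The -Dhadoopversion switch is Pig's build-time selector for the target Hadoop API line; verify the accepted values against the build.xml in your Pig source tree before relying on this:
# From the Pig 0.13.0 source directory: build against the Hadoop 1.x API line.
# hadoopversion=20 targets Hadoop 0.20/1.x; hadoopversion=23 targets the
# Hadoop 2.x line, which is where mapreduce.task.JobContextImpl lives.
ant clean jar -Dhadoopversion=20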

Apache Pig - ERROR 2118: For input string: "4f8:0:a111::add:9898"

We recently upgraded the cluster to Hadoop 2.0.0-cdh4.4.0.
After the change, we needed to reinstall Pig, which used to work absolutely fine. After the installation as described here, even the simplest HBase job does not get created.
raw_protobuffer = LOAD 'hbase://data_table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'external_data:downloaded', '-limit=1 -gte=0 -lte=1')
    AS (data:bytearray);
Which fails with the magical:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A raw_protobuffer MAP_ONLY Message:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: For input string: "4f8:0:a111::add:9898"
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
at java.lang.Thread.run(Thread.java:744)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Caused by: java.lang.NumberFormatException: For input string: "4f8:0:a111::add:9898"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at com.sun.jndi.dns.DnsClient.<init>(DnsClient.java:122)
at com.sun.jndi.dns.Resolver.<init>(Resolver.java:61)
at com.sun.jndi.dns.DnsContext.getResolver(DnsContext.java:570)
at com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:430)
at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:231)
at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:139)
at com.sun.jndi.toolkit.url.GenericURLDirContext.getAttributes(GenericURLDirContext.java:103)
at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:142)
at org.apache.hadoop.net.DNS.reverseDns(DNS.java:85)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:219)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:184)
at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat.getSplits(HBaseTableInputFormat.java:87)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 16 more
We suspected permissions on the tmp folder, but they seem to be fine (i.e., the job directory gets created with the Pig runner (!) as its owner). Any suggestion about what we might have missed would be much appreciated.
That looks like an IPv6 address to me; I suggest you investigate disabling IPv6 functionality on your cluster.
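As a hedged sketch of that direction, you can force the JVMs to prefer the IPv4 stack so reverse DNS does not hand an IPv6 literal to the port parser, or disable IPv6 on the hosts. Both are common conventions rather than anything confirmed in this thread; adapt paths to your distribution:
# Prefer IPv4 in the Hadoop/HBase JVMs, e.g. in hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# Or disable IPv6 system-wide on Linux
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1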

Hadoop error stalling job reduce process

I have been running a Hadoop job (the word count example) a few times on my two-node cluster setup, and it's been working fine up until now. I keep getting a RuntimeException which stalls the reduce process at 19%:
2013-04-13 18:45:22,191 INFO org.apache.hadoop.mapred.Task: Task:attempt_201304131843_0001_m_000000_0 is done. And is in the process of commiting
2013-04-13 18:45:22,299 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201304131843_0001_m_000000_0' done.
2013-04-13 18:45:22,318 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-13 18:45:23,181 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error while running command to get file permissions : org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:710)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Does anyone have any idea what might be causing this?
Edit: solved it myself.
If anyone else runs into the same problem, this was caused by the /etc/hosts file on the master node. I hadn't entered the hostname and address of the slave node.
This is how my hosts file is structured on the master node:
127.0.0.1 MyUbuntuServer
192.xxx.x.xx2 master
192.xxx.x.xx3 MySecondUbuntuServer
192.xxx.x.xx3 slave
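As a quick sanity check (a sketch; the hostnames are the ones from the hosts file above), verify that each node resolves the other before rerunning the job:
# run on the master; repeat with "master" on the slave
getent hosts master slave
ping -c 1 slave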
A similar problem is described here:
http://comments.gmane.org/gmane.comp.apache.mahout.user/8898
The info there might relate to a different version of Hadoop. It says:
java.lang.RuntimeException: Error while running command to
get file permissions : java.io.IOException: Cannot run program
"/bin/ls": error=12, Not enough space
The solution there was to increase the heap size via mapred.child.java.opts (-Xmx1200M).
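For example, a minimal sketch of the per-job form of that change; it assumes the job's driver uses ToolRunner/GenericOptionsParser (as the bundled examples do), and the jar name and input/output paths are placeholders:
# MR1-era property name; raises each task JVM's heap to 1200 MB
hadoop jar hadoop-examples.jar wordcount \
  -Dmapred.child.java.opts=-Xmx1200m \
  input output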
See also: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/BHGYJDNKMGE
HTH,
Avner
