Unable to run Hadoop 1.2.1 examples on Mac OS X

I have installed Hadoop 1.2.1 on an iMac running OS X 10.8.5, and after running jps I can see that all the expected processes have started up fine. The issue I am having is that when I try to run a MapReduce job I get a repeated error: "Error: Can't connect to window server - not enough permissions".
These lines are in my hadoop-env.sh:
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
This is the output I am getting:
bash-3.2$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/02/03 13:11:20 INFO mapred.FileInputFormat: Total input paths to process : 10
14/02/03 13:11:21 INFO mapred.JobClient: Running job: job_201402031302_0002
14/02/03 13:11:22 INFO mapred.JobClient: map 0% reduce 0%
14/02/03 13:11:23 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_0, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_0: 2014-02-03 13:11:21.878 java[8245:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:24 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_1, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_1: 2014-02-03 13:11:22.627 java[8252:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:24 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_2, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_2: 2014-02-03 13:11:23.558 java[8269:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:26 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_0, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_0: 2014-02-03 13:11:25.353 java[8301:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:27 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_1, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_1: 2014-02-03 13:11:26.259 java[8309:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:28 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_2, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_2: 2014-02-03 13:11:27.179 java[8325:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:28 INFO mapred.JobClient: Job complete: job_201402031302_0002
14/02/03 13:11:28 INFO mapred.JobClient: Counters: 4
14/02/03 13:11:28 INFO mapred.JobClient: Job Counters
14/02/03 13:11:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5846
14/02/03 13:11:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/03 13:11:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/02/03 13:11:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/02/03 13:11:28 INFO mapred.JobClient: Job Failed: JobCleanup Task Failure, Task: task_201402031302_0002_m_000010
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
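One thing worth checking, offered as a hedged suggestion rather than a confirmed fix: HADOOP_OPTS in hadoop-env.sh only reaches the JVMs that the hadoop scripts launch directly (the client and the daemons). In Hadoop 1.x each map and reduce task runs in a child JVM spawned by the TaskTracker, and those child JVMs take their flags from mapred.child.java.opts instead, which would explain why the -Djava.awt.headless=true above never reaches the failing task attempts. A minimal mapred-site.xml sketch (the -Xmx200m is just the 1.x default heap, kept for illustration):
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Djava.awt.headless=true</value>
</property>
Restart the TaskTracker after the change so newly spawned child JVMs pick up the flag.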

Related

MapReduce job loses connection and then reconnects in Hadoop example "calculating pi 3 3"

Does anyone know why? The job always gets stuck while in progress (not at 0%); sometimes it disconnects and then reconnects, but it basically never finishes.
Could it be that too little memory is allocated to MapReduce? Looking forward to your help!
[debura@master mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 3 3
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
19/12/05 21:04:20 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.56.110:8032
19/12/05 21:04:21 INFO input.FileInputFormat: Total input paths to process : 3
19/12/05 21:04:22 INFO mapreduce.JobSubmitter: number of splits:3
19/12/05 21:04:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575550949758_0001
19/12/05 21:04:23 INFO impl.YarnClientImpl: Submitted application application_1575550949758_0001
19/12/05 21:04:23 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1575550949758_0001/
19/12/05 21:04:23 INFO mapreduce.Job: Running job: job_1575550949758_0001
19/12/05 21:04:30 INFO mapreduce.Job: Job job_1575550949758_0001 running in uber mode : false
19/12/05 21:04:30 INFO mapreduce.Job: map 0% reduce 0%
19/12/05 21:04:34 INFO mapreduce.Job: map 33% reduce 0%
19/12/05 21:04:45 INFO mapreduce.Job: map 33% reduce 11%
19/12/05 21:07:31 INFO mapreduce.Job: Task Id : attempt_1575550949758_0001_m_000001_0, Status : FAILED
Container launch failed for container_1575550949758_0001_01_000004 : java.net.ConnectException: Call From slave2/192.168.56.112 to localhost:42149 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
...
Then it reconnects again:
19/12/05 21:07:36 INFO mapreduce.Job: map 67% reduce 11%
19/12/05 21:07:37 INFO mapreduce.Job: map 67% reduce 22%
19/12/05 21:10:33 INFO mapreduce.Job: Task Id : attempt_1575550949758_0001_m_000000_1, Status : FAILED
Container launch failed for container_1575550949758_0001_01_000007 : java.net.ConnectException: Call From slave2/192.168.56.112 to localhost:42149 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
...
It appears that a DataNode is not running on slave2, or that hdfs-site.xml is misconfigured as to where clients should be reading from:
From slave2/192.168.56.112 to localhost:42149 failed
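To verify that diagnosis, two stock commands show which workers the masters actually see (hostnames come from the logs above; nothing else is assumed):
hdfs dfsadmin -report     # live/dead DataNodes as the NameNode sees them
yarn node -list -all      # NodeManagers registered with the ResourceManager
If slave2 is missing or dead in either report, its hdfs-site.xml or yarn-site.xml is probably pointing services at localhost instead of the node's real hostname, which matches the "Call From slave2/192.168.56.112 to localhost:42149" failure.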

Hadoop-2.5.1 + Nutch-2.2.1: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

Command: ./crawl /urls /mydir XXXXX 2
When I run this command with Hadoop 2.5.1 and Nutch 2.2.1, I get the following error output:
14/10/07 19:58:10 INFO mapreduce.Job: Running job: job_1411692996443_0016
14/10/07 19:58:17 INFO mapreduce.Job: Job job_1411692996443_0016 running in uber mode : false
14/10/07 19:58:17 INFO mapreduce.Job: map 0% reduce 0%
14/10/07 19:58:21 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:26 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:31 INFO mapreduce.Job: Task Id : attempt_1411692996443_0016_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/10/07 19:58:36 INFO mapreduce.Job: map 100% reduce 0%
14/10/07 19:58:36 INFO mapreduce.Job: Job job_1411692996443_0016 failed with state FAILED due to: Task failed task_1411692996443_0016_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/10/07 19:58:36 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11785
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11785
Total vcore-seconds taken by all map tasks=11785
Total megabyte-seconds taken by all map tasks=12067840
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
14/10/07 19:58:36 ERROR crawl.InjectorJob: InjectorJob: java.lang.RuntimeException: job failed: name=[/mydir]inject /urls, jobid=job_1411692996443_0016
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
You are probably using Gora (or something else) compiled against Hadoop 1 (from the Maven repo?). You can download Gora (0.5?) and build it against Hadoop 2.
Perhaps this is just the first trouble in a series of problems.
Please let us know about your next steps.
I had a similar error on Nutch 2.x with Hadoop 2.4.0.
Recompile Nutch with Hadoop 2.5.1 dependencies (via ivy) and exclude all Hadoop 1.x dependencies; you can find them in lib, most likely hadoop-core. A sketch of what that can look like follows.
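For illustration only, assuming the default Nutch ivy layout (the module names and revisions here are my guesses; check ivy/ivy.xml for the real coordinates), the swap might look like:
<dependency org="org.apache.hadoop" name="hadoop-common" rev="2.5.1" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="2.5.1" conf="*->default"/>
<!-- keep Hadoop 1.x, where TaskAttemptContext is a class rather than an interface, off the classpath -->
<exclude org="org.apache.hadoop" module="hadoop-core"/>
The "Found interface ... but class was expected" message is exactly the binary-incompatibility signature between those two Hadoop generations: code compiled against the Hadoop 1 class fails at runtime against the Hadoop 2 interface.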

Mahout - Exception: Java Heap space

I'm trying to convert some text files to Mahout sequence files using:
mahout seqdirectory -i Lastfm-ArtistTags2007 -o seqdirectory
But all I get is an OutOfMemoryError, as shown here:
Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /opt/mahout/mahout-examples-0.9-job.jar
14/04/07 16:44:34 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[Lastfm-ArtistTags2007], --keyPrefix=[], --method=[mapreduce], --output=[seqdirectoryjps], --startPhase=[0], --tempDir=[temp]}
14/04/07 16:44:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/07 16:44:35 INFO input.FileInputFormat: Total input paths to process : 4
14/04/07 16:44:35 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/07 16:44:35 INFO mapred.JobClient: Running job: job_local407267609_0001
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Starting task: attempt_local407267609_0001_m_000000_0
14/04/07 16:44:35 INFO util.ProcessTree: setsid exited with exit code 0
14/04/07 16:44:35 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ad3ad65
14/04/07 16:44:35 INFO mapred.MapTask: Processing split: Paths:/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/README.txt:0+2472,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/ArtistTags.dat:0+71652722,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/tags.txt:0+1739746,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/artists.txt:0+327051
14/04/07 16:44:35 INFO compress.CodecPool: Got brand-new compressor
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/07 16:44:35 WARN mapred.LocalJobRunner: job_local407267609_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:119)
at org.apache.mahout.text.WholeFileRecordReader.nextKeyValue(WholeFileRecordReader.java:118)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
14/04/07 16:44:36 INFO mapred.JobClient: map 0% reduce 0%
14/04/07 16:44:36 INFO mapred.JobClient: Job complete: job_local407267609_0001
14/04/07 16:44:36 INFO mapred.JobClient: Counters: 0
14/04/07 16:44:36 INFO driver.MahoutDriver: Program took 1749 ms (Minutes: 0.02915)
I am using Mahout 0.9, Hadoop 1.2.1, and OpenJDK Java 7u25.
Setting MAHOUT_HEAPSIZE to 4096 did not help, and the text files can be found here: http://static.echonest.com/Lastfm-ArtistTags2007.tar.gz
Currently the spawned job is executed by the local job runner, so execution happens only on the node from which you fired the job. Specify the JobTracker address by setting the property mapred.job.tracker in your mapred-site.xml in order to make the execution distributed.
Execution in distributed mode might solve your OutOfMemory issue.
If you look at the environment variable HADOOP_CONF_DIR, its value is empty. Set it using the following command: export HADOOP_CONF_DIR=/etc/hadoop/conf. Also make sure the property mapred.job.tracker in /etc/hadoop/conf/mapred-site.xml points to your JobTracker; a minimal sketch follows.
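A minimal mapred-site.xml sketch for the answers above (jobtracker-host:8021 is a placeholder; use whatever host and port your JobTracker actually binds to):
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:8021</value>
</property>
Once HADOOP_CONF_DIR is exported and this property is set, the job_local... IDs in the output above should turn into regular job_<timestamp>_<n> IDs, and each task runs in its own JVM with its own heap instead of inside the client process.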

Hadoop distcp command not working

Hi, I am trying to move my data from a cluster running CDH4.3 to a cluster running CDH4.5.
I am executing the following command:
hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy/
After executing it I am getting the following error:
14/01/28 19:42:43 INFO tools.DistCp: srcPaths=[hftp://server1:50070/hbase/test/x]
14/01/28 19:42:43 INFO tools.DistCp: destPath=hdfs://server2:8020/copy
14/01/28 19:42:45 INFO tools.DistCp: sourcePathsCount=1
14/01/28 19:42:45 INFO tools.DistCp: filesToCopyCount=1
14/01/28 19:42:45 INFO tools.DistCp: bytesToCopyCount=1
14/01/28 19:42:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/28 19:42:47 INFO mapred.JobClient: Running job: job_201401101918_0008
14/01/28 19:42:48 INFO mapred.JobClient: map 0% reduce 0%
14/01/28 19:43:05 INFO mapred.JobClient: map 100% reduce 0%
14/01/28 19:43:07 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_0, Status : FAILED
14/01/28 19:43:08 INFO mapred.JobClient: map 0% reduce 0%
14/01/28 19:43:19 INFO mapred.JobClient: map 100% reduce 0%
14/01/28 19:43:22 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
14/01/28 19:43:23 INFO mapred.JobClient: map 0% reduce 0%
14/01/28 19:43:33 INFO mapred.JobClient: map 100% reduce 0%
14/01/28 19:43:35 INFO mapred.JobClient: Task Id : attempt_201401101918_0008_m_000000_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
14/01/28 19:43:36 INFO mapred.JobClient: map 0% reduce 0%
14/01/28 19:43:46 INFO mapred.JobClient: map 100% reduce 0%
14/01/28 19:43:50 INFO mapred.JobClient: map 0% reduce 0%
14/01/28 19:43:53 INFO mapred.JobClient: Job complete: job_201401101918_0008
14/01/28 19:43:53 INFO mapred.JobClient: Counters: 6
14/01/28 19:43:53 INFO mapred.JobClient: Job Counters
14/01/28 19:43:53 INFO mapred.JobClient: Failed map tasks=1
14/01/28 19:43:53 INFO mapred.JobClient: Launched map tasks=4
14/01/28 19:43:53 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=64095
14/01/28 19:43:53 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/28 19:43:53 INFO mapred.JobClient: Job Failed: NA
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1388)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
[hdfs@sdl1039 root]$ hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy hadoop distcp -update hftp://server1:50070/hbase/test/x hdfs://server2:8020/copy
14/01/28 19:46:09 INFO tools.DistCp: srcPaths=[hftp://server1:50070/hbase/test/x, hdfs://server2:8020/copy, hadoop, distcp, hftp://server1:50070/hbase/test/x]
14/01/28 19:46:09 INFO tools.DistCp: destPath=hdfs://server2:8020/copy
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source hadoop does not exist.
Input source distcp does not exist.
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Please guide me as to where I am going wrong.
I have found a workaround for now:
hadoop distcp -update hdfs://server1:8020/hbase/test/x hdfs://server2:8020/copy/
But I would definitely like to know why hftp is not working for me.
I think you have the wrong port number for hftp. 50070 is the default port for the NameNode web UI.
Try:
hadoop distcp -update hftp://server1/hbase/test/x hdfs://server2:8020/copy/
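If the port is indeed the issue, it is easy to confirm what the source NameNode's HTTP address is actually set to, since hftp is served from the NameNode web UI port. A hedged check (the property is dfs.http.address in older configs and dfs.namenode.http-address in newer ones; use whichever your CDH4 release carries):
hdfs getconf -confKey dfs.namenode.http-address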

Hadoop on EC2 error: could only be replicated to 0 nodes, instead of 1

I'm running a very small Hadoop cluster on EC2.
I'm starting a cluster using Whirr (version: whirr-0.2.0-incubating) with 1 (jobtracker + namenode) node and 4 (datanode + tasktracker) nodes:
whirr.hardware-id=c1.medium
whirr.instance-templates=1 jt+nn,4 dn+tt
whirr.provider=ec2
When I run my job I get the following error, not right away but after a while:
.....
11/01/17 17:31:47 INFO mapred.JobClient: map 100% reduce 66%
11/01/17 17:31:49 INFO mapred.JobClient: map 100% reduce 68%
11/01/17 17:31:52 INFO mapred.JobClient: map 100% reduce 70%
11/01/17 17:31:56 INFO mapred.JobClient: map 100% reduce 73%
11/01/17 17:32:01 INFO mapred.JobClient: Task Id : attempt_201101172141_0002_r_000000_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/test/test2/test3/_temporary/_attempt_201101172141_0002_r_000000_0/parts/81/part-00000 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
....
I found the same error in the tasktracker log file:
....
2011-01-17 22:31:36,968 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201101172141_0002_m_000004_1' to tip task_201101172141_0002_m_000004, for tracker 'tracker_ip-11-222-333-444.ec2.internal:localhost/127.0.0.1:44840'
2011-01-17 22:31:39,972 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201101172141_0002_m_000004_1' from 'tracker_ip-11-222-333-444.ec2.internal:localhost/127.0.0.1:44840'
2011-01-17 22:31:57,985 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201101172141_0002_r_000000_0: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/dsp-test/test1/test2/_temporary/_attempt_201101172141_0002_r_000000_0/parts/81/part-00000 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
In the job's configuration I have:
dfs.replication=2
mapred.child.java.opts=-server -Xmx180m -XX:ErrorFile=/mnt/hadoop/logs/logs/java/java_error.log
Does anyone have an idea why I'm getting this error?
Did you run out of disk space? Sometimes when HDFS runs out of disk space it complains in exactly this way.
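A quick way to test that theory (the /mnt/hadoop path is borrowed from the job configuration above; your dfs.data.dir may live elsewhere):
hadoop dfsadmin -report    # capacity, DFS remaining, and live datanodes, in the Hadoop 0.20/1.x command syntax
df -h /mnt/hadoop          # local usage of the volume hosting dfs.data.dir, run on each datanode
"could only be replicated to 0 nodes, instead of 1" generally means the NameNode could not find a single DataNode with usable free space, so a full or dead set of datanodes is the usual culprit.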
