Hadoop attempt JVM hang up

Hadoop 0.20.2
There are a couple of jobs that need to be executed one after another, and some task attempts' JVMs can't be killed. Logs below. Judging by the line "JVM Not killed jvm_201208192339_6873_m_1286217329 but just removed", the tasktracker can't find the JVMId. I have read the source code, but I can't work out why the tasktracker can't find the JVMId. By the way, there are 13 tasktrackers, and only the 3 new ones have this problem; did I forget to configure something?
Can somebody help me find the reason? Thanks. ^O^
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201208192339_6873_m_1286217329 but just removed
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8
2012-09-20 13:52:56,655 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201208192339_6873_m_004334_0 not found in cache
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201208192339_6873_m_004334_0 task's state:KILLED_UNCLEAN
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208192339_6873_m_004334_0 which needs 1 slots
2012-09-20 13:52:56,962 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_201208192339_6873_m_004334_0 which needs 1 slots
2012-09-20 13:52:56,968 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201208192339_6873_m_677724590
2012-09-20 13:52:56,968 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201208192339_6873_m_677724590 spawned.
2012-09-20 13:52:56,974 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /disk10/hdfs/mapred/local/ttprivate/taskTracker/root/jobcache/job_201208192339_6873/attempt_201208192339_6873_m_004334_0.cleanup/taskjvm.sh
2012-09-20 13:52:58,017 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201208192339_6873_m_677724590 given task: attempt_201208192339_6873_m_004334_0
2012-09-20 13:52:58,557 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208192339_6873_m_004334_0 0.0%
2012-09-20 13:52:58,564 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208192339_6873_m_004334_0 0.0% cleanup
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201208192339_6873_m_004334_0 is done.
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201208192339_6873_m_004334_0 was -1
2012-09-20 13:52:58,566 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 8

Eventually it turned out that the problem node had a second issue: its operating system didn't match the hardware. After reinstalling the operating system and running jobs for a while, the problem didn't appear again. The old operating system turned out to perform poorly on the network; it lowered the network bandwidth.

Related

Unable to run hadoop 1.2.1 examples on Mac OS X

I have installed hadoop 1.2.1 on an iMac running OS X 10.8.5, and after running jps I can see that all the expected processes have started up fine. The issue I am having is that when I try to run a map-reduce job I get a repeated error: "Error: Can't connect to window server - not enough permissions".
These lines are in my hadoop-env.sh:
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
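One thing worth checking here: HADOOP_OPTS from hadoop-env.sh is applied to the Hadoop daemons, while the task JVMs that the tasktracker spawns take their flags from mapred.child.java.opts. If the AWT error is raised inside the task attempts, the same flags would have to reach the child JVMs too - a minimal sketch of doing that per job in Java (the job class and heap size are illustrative; the property can equally be set in mapred-site.xml):
import org.apache.hadoop.examples.PiEstimator;
import org.apache.hadoop.mapred.JobConf;

public class HeadlessConf {
    public static JobConf create() {
        JobConf conf = new JobConf(PiEstimator.class);
        // mapred.child.java.opts replaces the default "-Xmx200m", so a heap
        // size is kept explicitly alongside the headless/krb5 flags.
        conf.set("mapred.child.java.opts",
                 "-Xmx200m -Djava.awt.headless=true"
                 + " -Djava.security.krb5.realm= -Djava.security.krb5.kdc=");
        return conf;
    }
}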
This is the output I am getting:
bash-3.2$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/02/03 13:11:20 INFO mapred.FileInputFormat: Total input paths to process : 10
14/02/03 13:11:21 INFO mapred.JobClient: Running job: job_201402031302_0002
14/02/03 13:11:22 INFO mapred.JobClient: map 0% reduce 0%
14/02/03 13:11:23 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_0, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_0: 2014-02-03 13:11:21.878 java[8245:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:24 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_1, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_1: 2014-02-03 13:11:22.627 java[8252:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:24 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000011_2, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000011_2: 2014-02-03 13:11:23.558 java[8269:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:26 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_0, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_0: 2014-02-03 13:11:25.353 java[8301:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:27 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_1, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_1: 2014-02-03 13:11:26.259 java[8309:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:28 INFO mapred.JobClient: Task Id : attempt_201402031302_0002_m_000010_2, Status : FAILED
Error: Can't connect to window server - not enough permissions.
attempt_201402031302_0002_m_000010_2: 2014-02-03 13:11:27.179 java[8325:1203] Unable to load realm info from SCDynamicStore
14/02/03 13:11:28 INFO mapred.JobClient: Job complete: job_201402031302_0002
14/02/03 13:11:28 INFO mapred.JobClient: Counters: 4
14/02/03 13:11:28 INFO mapred.JobClient: Job Counters
14/02/03 13:11:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5846
14/02/03 13:11:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/03 13:11:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/02/03 13:11:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/02/03 13:11:28 INFO mapred.JobClient: Job Failed: JobCleanup Task Failure, Task: task_201402031302_0002_m_000010
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

mapred.JobClient: Error reading task output http:... when running hadoop from Cygwin on Windows OS

I was running the "Generating vectors from documents" sample from the book "Mahout in Action" from Cygwin on Windows.
Hadoop is started only on the local machine.
Below is my running command:
$ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow
But it throws the java.io.IOException below. Does anyone know what causes this problem? Thanks in advance!
Running on hadoop, using HADOOP_HOME=my_hadoop_path
HADOOP_CONF_DIR=my_hadoop_conf_path
13/05/13 18:38:03 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
13/05/13 18:38:03 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/05/13 18:38:03 INFO common.HadoopUtil: Deleting reuters-vectors
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/05/13 18:38:04 INFO input.FileInputFormat: Total input paths to process : 2
13/05/13 18:38:04 INFO mapred.JobClient: Running job: job_201305131836_0001
13/05/13 18:38:05 INFO mapred.JobClient: map 0% reduce 0%
13/05/13 18:38:15 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stdout
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stderr
13/05/13 18:38:21 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Below is the running log of the tasktracker:
INFO org.apache.hadoop.mapred.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
INFO org.apache.hadoop.mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.
INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201305141049_0001_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201305141049_0001_m_1036671648
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201305141049_0001_m_1036671648 spawned.
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201305141049_0001_m_1036671648 exited. Number of tasks it ran: 0
WARN org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 done; removing files.
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
Judging by the log you have posted, it seems you haven't actually set HADOOP_HOME and HADOOP_CONF_DIR (they still show the placeholder values my_hadoop_path and my_hadoop_conf_path).
You need to point them at the real directories, e.g. HADOOP_HOME=/usr/lib/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf.
If this is not the case, run bin/mahout on its own and check whether seq2sparse is present somewhere in the list. This line clearly states that it's not found: driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only.

Pig DUMP gets stuck in GROUP

I'm a Pig beginner (using Pig 0.10.0) and I have some simple JSON like the following:
test.json:
{
"from": "1234567890",
.....
"profile": {
"email": "me#domain.com"
.....
}
}
on which I perform some grouping/counting in Pig:
>pig -x local
with the following Pig script:
REGISTER /pig-udfs/oink.jar;
REGISTER /pig-udfs/json-simple-1.1.jar;
REGISTER /pig-udfs/guava-12.0.jar;
REGISTER /pig-udfs/elephant-bird-2.2.3.jar;
users = LOAD 'test.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true') as (json:map[]);
domain_user = FOREACH users GENERATE oink.EmailDomainFilter(json#'profile'#'email') as email, json#'from' as user_id;
DUMP domain_user; /* Outputs: (domain.com,1234567890) */
grouped_domain_user = GROUP domain_user BY email;
DUMP grouped_domain_user; /* Outputs: =stuck here= */
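(oink.EmailDomainFilter is a custom UDF whose source is not shown here; a hypothetical reconstruction using Pig's standard EvalFunc API, consistent with the (domain.com,1234567890) output above, might look like this:)
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical sketch of a UDF like oink.EmailDomainFilter: return the
// domain part of an email address, or null if the input is unusable.
public class EmailDomainFilter extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String email = input.get(0).toString();
        int at = email.indexOf('@');
        return at >= 0 ? email.substring(at + 1) : null;
    }
}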
Basically, when I try to dump grouped_domain_user, Pig gets stuck, seemingly waiting for a map output to complete:
2012-05-31 17:45:22,111 [Thread-15] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0002_m_000000_0' done.
2012-05-31 17:45:22,119 [Thread-15] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-05-31 17:45:22,123 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - ShuffleRamManager: MemoryLimit=724828160, MaxSingleShuffleLimit=181207040
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,125 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,126 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,126 [Thread-15] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
2012-05-31 17:45:22,128 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for merging on-disk files
2012-05-31 17:45:22,128 [Thread for merging in memory files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for merging in memory files
2012-05-31 17:45:22,128 [Thread for merging on-disk files] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread waiting: Thread for merging on-disk files
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2012-05-31 17:45:22,129 [Thread for polling Map Completion Events] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Thread started: Thread for polling Map Completion Events
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask - attempt_local_0002_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-05-31 17:45:28,118 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:31,122 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:37,123 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:43,124 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:46,124 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:52,126 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:45:58,127 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
2012-05-31 17:46:01,128 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy >
.... repeats ....
Suggestions would be welcome on why this is happening.
Thanks!
UPDATE
Chris solved this one for me. I was setting fs.default.name etc. to the correct values in pig.properties; however, I also had the HADOOP_CONF_DIR environment variable pointing to my local Hadoop installation, where these same values were set with <final>true</final>.
Great find and much appreciated.
To mark this question as answered, and to those stumbling across this in future:
When running in local mode (whether for Pig via pig -x local, or when submitting a map-reduce job to the local job runner), if you are seeing the reduce phase 'hang', especially if you see entries in the log similar to:
2012-05-31 17:45:22,129 [Thread-15] INFO org.apache.hadoop.mapred.ReduceTask -
attempt_local_0002_r_000000_0 Need another 1 map output(s) where 0 is already in progress
Then your job, although started in local mode, has probably switched to 'clustered' mode because the mapred.job.tracker property is marked as 'final' in your $HADOOP/conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9000</value>
<final>true</final>
</property>
You should also check the fs.default.name property in core-site.xml and ensure it is not marked as final.
Marking a property final means that you are unable to override its value at runtime, and you may even see error messages similar to:
12/05/22 14:28:29 WARN conf.Configuration:
file:/tmp/.../job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:29 WARN conf.Configuration:
file:/tmp/.../job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
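The behaviour is easy to reproduce directly with Hadoop's Configuration API (a minimal sketch; the file paths are illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class FinalParamDemo {
    public static void main(String[] args) {
        // Load only the two resources, without the built-in defaults.
        Configuration conf = new Configuration(false);
        // Suppose mapred-site.xml marks mapred.job.tracker as <final>true</final>:
        conf.addResource(new Path("/usr/lib/hadoop/conf/mapred-site.xml"));
        // A later resource (such as a serialized job_local_0001.xml) that tries
        // to set mapred.job.tracker=local is ignored, with the warning above:
        conf.addResource(new Path("/tmp/job_local_0001.xml"));
        System.out.println(conf.get("mapred.job.tracker")); // hdfs://localhost:9000
    }
}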

Hadoop - Reducer is waiting for Mapper inputs?

As explained in the title, when I execute my Hadoop program (and debug it in local mode), the following happens:
1. All 10 CSV lines in my test data are handled correctly in the Mapper, the Partitioner and the RawComparator (OutputKeyComparatorClass) that is called after the map step. But the OutputValueGroupingComparatorClass's and the ReduceClass's functions do NOT get executed afterwards.
2. My application looks like the following. (Due to space constraints I omit the implementation of the classes used as configuration parameters, until somebody has an idea that involves them):
public class RetweetApplication {
    public static int DEBUG = 1;
    static String INPUT = "/home/ema/INPUT-H";
    static String OUTPUT = "/home/ema/OUTPUT-H " + (new Date()).toString();

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(RetweetApplication.class);
        if (DEBUG > 0) {
            // Intended to force standalone mode: local job runner, local filesystem.
            conf.set("mapred.job.tracker", "local");
            conf.set("fs.default.name", "file:///");
            conf.set("dfs.replication", "1");
        }
        FileInputFormat.setInputPaths(conf, new Path(INPUT));
        FileOutputFormat.setOutputPath(conf, new Path(OUTPUT));
        //conf.setOutputKeyClass(Text.class);
        //conf.setOutputValueClass(Text.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setMapperClass(RetweetMapper.class);
        conf.setPartitionerClass(TweetPartitioner.class);
        conf.setOutputKeyComparatorClass(TwitterValueGroupingComparator.class);
        conf.setOutputValueGroupingComparator(TwitterKeyGroupingComparator.class);
        conf.setReducerClass(RetweetReducer.class);
        conf.setOutputFormat(TextOutputFormat.class);
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
3. I get the following console output:
12/05/22 03:51:05 INFO mapred.MapTask: io.sort.mb = 100
12/05/22 03:51:05 INFO mapred.MapTask: data buffer = 79691776/99614720
12/05/22 03:51:05 INFO mapred.MapTask: record buffer = 262144/327680
12/05/22 03:51:06 INFO mapred.JobClient: map 0% reduce 0%
12/05/22 03:51:11 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:12 INFO mapred.JobClient: map 39% reduce 0%
12/05/22 03:51:14 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:15 INFO mapred.MapTask: Starting flush of map output
12/05/22 03:51:15 INFO mapred.MapTask: Finished spill 0
12/05/22 03:51:15 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/05/22 03:51:15 INFO mapred.JobClient: map 79% reduce 0%
12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:17 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets:0+967
12/05/22 03:51:17 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/05/22 03:51:17 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35eed0
12/05/22 03:51:17 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is already in progress
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/05/22 03:51:18 INFO mapred.JobClient: map 100% reduce 0%
12/05/22 03:51:23 INFO mapred.LocalJobRunner: reduce > copy >
The "Need another 1 map output(s)" and "reduce > copy >" lines repeat endlessly from this point.
4. A lot of processes remain active after the mapper has seen every tuple:
RetweetApplication (1) [Remote Java Application]
OpenJDK Client VM[localhost:5002]
Thread [main] (Running)
Thread [Thread-2] (Running)
Daemon Thread [communication thread] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
Daemon Thread [Thread for merging on-disk files] (Running)
Daemon Thread [Thread for merging in memory files] (Running)
Daemon Thread [Thread for polling Map Completion Events] (Running)
Is there any reason why Hadoop expects more output from the mapper (see the repeating lines in the log) than I put into the input directory? As already mentioned, I verified in the debugger that ALL inputs are properly processed in the mapper/partitioner/etc.
UPDATE
With the help of Chris (see comments) I found out that my program was NOT started in local mode as I expected: the isLocal variable in the ReduceTask class is set to false, though it should be true.
To me it is absolutely unclear why this happens, since the three options that have to be set to enable standalone mode were set the right way. Surprisingly, although the local setting was ignored, the "read from normal disk" setting wasn't, which is very strange IMHO, because I thought local mode and the file:/// protocol were coupled.
While debugging ReduceTask I set the isLocal variable to true by evaluating isLocal=true in my debug view, and then tried to execute the rest of the program. It did not work out; this is the stack trace:
12/05/22 14:28:28 INFO mapred.LocalJobRunner:
12/05/22 14:28:28 INFO mapred.Merger: Merging 1 sorted segments
12/05/22 14:28:28 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1956 bytes
12/05/22 14:28:28 INFO mapred.LocalJobRunner:
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:30 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 0 time(s).
12/05/22 14:28:31 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 1 time(s).
12/05/22 14:28:32 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 2 time(s).
12/05/22 14:28:33 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 3 time(s).
12/05/22 14:28:34 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 4 time(s).
12/05/22 14:28:35 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 5 time(s).
12/05/22 14:28:36 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 6 time(s).
12/05/22 14:28:37 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 7 time(s).
12/05/22 14:28:38 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 8 time(s).
12/05/22 14:28:39 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 9 time(s).
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:39 WARN mapred.LocalJobRunner: job_local_0001
java.net.ConnectException: Call to master/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:446)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
at org.apache.hadoop.ipc.Client.call(Client.java:1046)
... 17 more
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/22 14:28:39 INFO mapred.JobClient: Job complete: job_local_0001
12/05/22 14:28:39 INFO mapred.JobClient: Counters: 20
12/05/22 14:28:39 INFO mapred.JobClient: File Input Format Counters
12/05/22 14:28:39 INFO mapred.JobClient: Bytes Read=967
12/05/22 14:28:39 INFO mapred.JobClient: FileSystemCounters
12/05/22 14:28:39 INFO mapred.JobClient: FILE_BYTES_READ=14093
12/05/22 14:28:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=47859
12/05/22 14:28:39 INFO mapred.JobClient: Map-Reduce Framework
12/05/22 14:28:39 INFO mapred.JobClient: Map output materialized bytes=1960
12/05/22 14:28:39 INFO mapred.JobClient: Map input records=10
12/05/22 14:28:39 INFO mapred.JobClient: Reduce shuffle bytes=0
12/05/22 14:28:39 INFO mapred.JobClient: Spilled Records=10
12/05/22 14:28:39 INFO mapred.JobClient: Map output bytes=1934
12/05/22 14:28:39 INFO mapred.JobClient: Total committed heap usage (bytes)=115937280
12/05/22 14:28:39 INFO mapred.JobClient: CPU time spent (ms)=0
12/05/22 14:28:39 INFO mapred.JobClient: Map input bytes=967
12/05/22 14:28:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=82
12/05/22 14:28:39 INFO mapred.JobClient: Combine input records=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce input records=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce input groups=0
12/05/22 14:28:39 INFO mapred.JobClient: Combine output records=0
12/05/22 14:28:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient: Reduce output records=0
12/05/22 14:28:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient: Map output records=10
12/05/22 14:28:39 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Since this stack trace shows that port 9001 is used during execution, I guess that the XML configuration files somehow overwrite the settings made in Java code (which I use for testing). That is strange, since I have read over and over on the internet that Java overwrites the XML configuration. If nobody knows how to correct this, I'll try simply removing all the configuration XMLs. Perhaps that solves the problem...
NEW UPDATE
Renaming Hadoop's conf folder solved the problem of the waiting copier, and the program now runs to the end. Sadly, the execution no longer waits for my debugger, although HADOOP_OPTS is set correctly.
RESUME: It's only a configuration issue: XML may (for some configuration parameters) overwrite Java. If somebody knew how I can get debugging to run again, that would be perfect, but for now I'm just glad I don't see this stack trace anymore! ;)
Thank you Chris for your time and efforts!
Sorry I didn't see this before, but you appear to have two important configuration properties set to final in your conf XML files, as denoted by the following log statements:
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
This means that your job is unable to actually run in local mode: it starts in local mode, but the reducer reads the serialized job configuration, determines it is not in local mode, and tries to fetch map outputs via the task tracker ports.
You said your fix was to rename the conf folder - this makes Hadoop fall back to the default configuration, where these two properties are not marked as 'final'.
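Roughly, the mechanics (a sketch of the behaviour, not code from the post): a programmatic set() wins inside the submitting JVM, but each task rebuilds its JobConf from the node's site files plus the serialized job.xml, and a final site value silently wins there:
import org.apache.hadoop.mapred.JobConf;

public class LocalModeCheck {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "local");
        // In this JVM the programmatic value is visible:
        System.out.println(conf.get("mapred.job.tracker")); // "local"
        // A task JVM, however, loads the node's mapred-site.xml on top of the
        // serialized job config; if the property is <final> there, the job's
        // value is ignored and the reducer contacts the real JobTracker port.
    }
}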

Hadoop Single Node installation on Windows 7

I am new to hadoop and trying to get a single node setup of Hadoop 0.20.2 on my Windows 7 machine.
My questions are two-fold - one with respect to the completeness of the installation itself and the other regarding the error in the reduce stage of a sample Word Count program.
My Installation steps are as follows:
I am following http://blog.benhall.me.uk/2011/01/installing-hadoop-0210-on-windows.html for the installation procedure.
I have installed cygwin and set up password-less ssh on my localhost
My java version is:
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
Contents of conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Contents of conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Contents of conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
I set the JAVA_HOME variable and the command "hadoop version" prints 0.20.2
hadoop namenode -format creates the DFS without any errors
start-all.sh prints that namenode, secondarynamenode, datanode, jobtracker and tasktracker have all started.
however, the command "jps" prints:
$ jps
4584 Jps
11008 JobTracker
2084 NameNode
I have noticed that at times jps printed the PIDs of the tasktracker and secondarynamenode as well.
I am able to view the output of
http://localhost:50030 for the jobtracker,
http://localhost:50060 for the tasktracker and
http://localhost:50070 for the namenode.
I tried both put and get commands to the hdfs and they were successful:
bin/hadoop fs -mkdir In
bin/hadoop fs -put *.txt In
mkdir temp
bin/hadoop fs -get In temp
ls -l temp/In
$ ls -l temp/In/
total 365
348624 Mar 24 23:59 CHANGES.txt
13366 Mar 24 23:59 LICENSE.txt
101 Mar 24 23:59 NOTICE.txt
1366 Mar 24 23:59 README.txt
I could also view these files by browsing the DFS via the http interface for namenode
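An equivalent programmatic sanity check (a minimal sketch using the standard FileSystem API; it picks up fs.default.name from the conf directory on the classpath):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSanityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // List the "In" directory created with bin/hadoop fs -mkdir In above.
        for (FileStatus stat : fs.listStatus(new Path("In"))) {
            System.out.println(stat.getLen() + "\t" + stat.getPath());
        }
    }
}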
Is my installation complete?
If yes, why does the jps command not show the pids of all five components?
If not, what steps do I need to take to complete the installation?
What are other sanity checks used to test the completeness of the installation?
I initially believed my installation to be complete and ran a sample WordCount map-reduce program along the lines of http://jayant7k.blogspot.com/2010/06/writing-your-first-map-reduce-program.html
I obtain the following output:
12/03/25 00:10:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 00:10:26 INFO input.FileInputFormat: Total input paths to process : 1
12/03/25 00:10:27 INFO mapred.JobClient: Running job: job_201203242348_0001
12/03/25 00:10:28 INFO mapred.JobClient: map 0% reduce 0%
12/03/25 00:10:35 INFO mapred.JobClient: map 100% reduce 0%
12/03/25 00:21:29 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:32:25 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:44:02 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:55:00 INFO mapred.JobClient: Job complete: job_201203242348_0001
12/03/25 00:55:00 INFO mapred.JobClient: Counters: 12
12/03/25 00:55:00 INFO mapred.JobClient: Job Counters
12/03/25 00:55:00 INFO mapred.JobClient: Launched reduce tasks=4
12/03/25 00:55:00 INFO mapred.JobClient: Launched map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Data-local map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Failed reduce tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: FileSystemCounters
12/03/25 00:55:00 INFO mapred.JobClient: HDFS_BYTES_READ=13366
12/03/25 00:55:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=23511
12/03/25 00:55:00 INFO mapred.JobClient: Map-Reduce Framework
12/03/25 00:55:00 INFO mapred.JobClient: Combine output records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map input records=244
12/03/25 00:55:00 INFO mapred.JobClient: Spilled Records=1887
12/03/25 00:55:00 INFO mapred.JobClient: Map output bytes=19699
12/03/25 00:55:00 INFO mapred.JobClient: Combine input records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map output records=1887
The map task seems complete, but the reduce task shows the following error in the logs:
2012-03-25 00:10:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0: Got 1 new map-outputs
2012-03-25 00:10:40,193 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201203242348_0001_m_000000_0, compressed len: 23479, decompressed len: 23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23475 bytes (23479 raw bytes) into RAM from attempt_201203242348_0001_m_000000_0
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:40,249 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_201203242348_0001_m_000000_0
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:239)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:680)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2959)
at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
The following are the contents of the task tracker logs:
2012-03-25 00:10:27,910 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000002_0 task's state:UNASSIGNED
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:28,453 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_625085452
2012-03-25 00:10:28,454 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_625085452 spawned.
2012-03-25 00:10:29,217 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_625085452 given task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:29,523 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000002_0 0.0% setup
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000002_0 is done.
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000002_0 was 0
2012-03-25 00:10:29,526 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:29,718 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_625085452 exited. Number of tasks it ran: 1
2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201203242348_0001_m_000002_0 done; removing files.
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201203242348_0001_m_000002_0 not found in cache
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_-1399302881
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_-1399302881 spawned.
2012-03-25 00:10:31,812 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_-1399302881 given task: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000000_0 1.0%
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000000_0 is done.
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000000_0 was 0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:32,822 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_-1399302881 exited. Number of tasks it ran: 1
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_r_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_r_625085452
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_r_625085452 spawned.
2012-03-25 00:10:34,852 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_r_625085452 given task: attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60790, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:41,153 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:10:44,158 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60864, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:16:05,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:08,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:21:25,251 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203242348_0001_r_000000_0 - Killed due to Shuffle Failure: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
I had opened ports 9000 and 9001 in the Windows firewall.
I checked the netstat output to verify that these ports were indeed open:
C:\Windows\system32>netstat -a -n | grep -e "500[367]0"
TCP 0.0.0.0:50030 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50060 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50070 0.0.0.0:0 LISTENING
TCP [::]:50030 [::]:0 LISTENING
TCP [::]:50060 [::]:0 LISTENING
TCP [::]:50070 [::]:0 LISTENING
C:\Windows\system32>netstat -a -n | grep -e "900[01]"
TCP 127.0.0.1:9000 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9000 127.0.0.1:60332 ESTABLISHED
TCP 127.0.0.1:9000 127.0.0.1:60987 ESTABLISHED
TCP 127.0.0.1:9001 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9001 127.0.0.1:60410 ESTABLISHED
TCP 127.0.0.1:60332 127.0.0.1:9000 ESTABLISHED
TCP 127.0.0.1:60410 127.0.0.1:9001 ESTABLISHED
TCP 127.0.0.1:60987 127.0.0.1:9000 ESTABLISHED
Could you help with both the issues of installation and getting the reduce task to work?
I looked at http://wiki.apache.org/hadoop/SocketTimeout and a few other links and tried the suggestions, but without any success.
I appreciate your patience in reading this post and would be happy to provide additional details.
Thanks in advance.
See this line in your logs:
2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories
I am guessing that you need to check hadoop.tmp.dir and mapred.local.dir. From the configs you mentioned, the values of these two params are the defaults. The default values of these params are given here. Set them to some relevant location and try again.
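A quick way to see what those two parameters currently resolve to (a trivial sketch; JobConf loads both the core-*.xml and mapred-*.xml resources):
import org.apache.hadoop.mapred.JobConf;

public class LocalDirCheck {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Defaults: hadoop.tmp.dir=/tmp/hadoop-${user.name},
        //           mapred.local.dir=${hadoop.tmp.dir}/mapred/local
        System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
    }
}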
NOTE: Before you make this change, you need to stop Hadoop, and start it again after you are done.
