Hadoop: Reduce-side join gets stuck at map 100% reduce 100% and never finishes

I'm a beginner with Hadoop. These days I'm trying to run a
reduce-side join example, but it gets stuck at map 100% and reduce 100%
and never finishes. The progress output, logs, code, sample data and
configuration files are below:
Progress:
12/10/02 15:48:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/02 15:48:06 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/02 15:48:06 INFO mapred.FileInputFormat: Total input paths to process : 2
12/10/02 15:48:07 INFO mapred.JobClient: Running job: job_201210021515_0007
12/10/02 15:48:08 INFO mapred.JobClient: map 0% reduce 0%
12/10/02 15:48:26 INFO mapred.JobClient: map 66% reduce 0%
12/10/02 15:48:35 INFO mapred.JobClient: map 100% reduce 0%
12/10/02 15:48:38 INFO mapred.JobClient: map 100% reduce 22%
12/10/02 15:48:47 INFO mapred.JobClient: map 100% reduce 100%
Logs from Reduce task:
2012-10-02 15:48:28,018 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#1f53935
2012-10-02 15:48:28,179 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=668126400, MaxSingleShuffleLimit=167031600
2012-10-02 15:48:28,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Thread started: Thread for merging on-disk files
2012-10-02 15:48:28,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Thread started: Thread for merging in memory files
2012-10-02 15:48:28,203 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Thread waiting: Thread for merging on-disk files
2012-10-02 15:48:28,207 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Thread started: Thread for polling Map Completion Events
2012-10-02 15:48:28,207 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Need another 3 map output(s) where 0 is already in progress
2012-10-02 15:48:28,208 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-10-02 15:48:33,209 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-10-02 15:48:33,596 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-10-02 15:48:38,606 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201210021515_0007_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-10-02 15:48:39,239 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2012-10-02 15:48:39,239 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2012-10-02 15:48:39,241 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2012-10-02 15:48:39,242 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2012-10-02 15:48:39,242 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 3 files left.
2012-10-02 15:48:39,285 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2012-10-02 15:48:39,285 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 10500 bytes
2012-10-02 15:48:39,314 INFO org.apache.hadoop.mapred.ReduceTask: Merged 3 segments, 10500 bytes to disk to satisfy reduce memory limit
2012-10-02 15:48:39,318 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 10500 bytes from disk
2012-10-02 15:48:39,319 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2012-10-02 15:48:39,320 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2012-10-02 15:48:39,322 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10496 bytes
Java Code:
public class DataJoin extends Configured implements Tool {
public static class MapClass extends DataJoinMapperBase {
protected Text generateInputTag(String inputFile) {//specify tag
String datasource = inputFile.split("-")[0];
return new Text(datasource);
}
protected Text generateGroupKey(TaggedMapOutput aRecord) {//takes a tagged record (of type TaggedMapOutput) and returns the group key for joining
String line = ((Text) aRecord.getData()).toString();
String[] tokens = line.split(",", 2);
String groupKey = tokens[0];
return new Text(groupKey);
}
protected TaggedMapOutput generateTaggedMapOutput(Object value) {//wraps the record value into a TaggedMapOutput type
TaggedWritable retv = new TaggedWritable((Text) value);
retv.setTag(this.inputTag);//inputTag: result of generateInputTag
return retv;
}
}
public static class Reduce extends DataJoinReducerBase {
protected TaggedMapOutput combine(Object[] tags, Object[] values) {//combination of the cross product of the tagged records with the same join (group) key
if (tags.length != 2) return null;
String joinedStr = "";
for (int i=0; i<tags.length; i++) {
if (i > 0) joinedStr += ",";
TaggedWritable tw = (TaggedWritable) values[i];
String line = ((Text) tw.getData()).toString();
if (line == null)
return null;
String[] tokens = line.split(",", 2);
joinedStr += tokens[1];
}
TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
retv.setTag((Text) tags[0]);
return retv;
}
}
public static class TaggedWritable extends TaggedMapOutput {//tagged record
private Writable data;
public TaggedWritable() {
this.tag = new Text("");
this.data = null;
}
public TaggedWritable(Writable data) {
this.tag = new Text("");
this.data = data;
}
public Writable getData() {
return data;
}
@Override
public void write(DataOutput out) throws IOException {
this.tag.write(out);
out.writeUTF(this.data.getClass().getName());
this.data.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
this.tag.readFields(in);
String dataClz = in.readUTF();
if ((this.data == null) || !this.data.getClass().getName().equals(dataClz)) {
try {
this.data = (Writable) ReflectionUtils.newInstance(Class.forName(dataClz), null);
System.out.printf(dataClz);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
this.data.readFields(in);
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, DataJoin.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("DataJoin");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(TaggedWritable.class);
job.set("mapred.textoutputformat.separator", ",");
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(),
new DataJoin(),
args);
System.exit(res);
}
}
Sample data:
file 1: apat.txt (1 line)
4373932,1983,8446,1981,"NL","",16025,2,65,436,1,19,108,49,1,0.5289,0.6516,9.8571,4.1481,0.0109,0.0093,0,0
file 2: cite.txt (100 lines)
4373932,3641235
4373932,3720760
4373932,3853987
4373932,3900558
4373932,3939350
4373932,3941876
4373932,3992631
4373932,3996345
4373932,3998943
4373932,3999948
4373932,4001400
4373932,4011219
4373932,4025310
4373932,4036946
4373932,4058732
4373932,4104029
4373932,4108972
4373932,4160016
4373932,4160018
4373932,4160019
4373932,4160818
4373932,4161515
4373932,4163779
4373932,4168146
4373932,4169137
4373932,4181650
4373932,4187075
4373932,4197361
4373932,4199599
4373932,4200436
4373932,4201763
4373932,4207075
4373932,4208479
4373932,4211766
4373932,4215102
4373932,4220450
4373932,4222744
4373932,4225783
4373932,4231750
4373932,4234563
4373932,4235869
4373932,4238195
4373932,4238395
4373932,4248854
4373932,4251514
4373932,4258130
4373932,4248965
4373932,4252783
4373932,4254097
4373932,4259313
4373932,4272505
4373932,4272506
4373932,4277437
4373932,4279992
4373932,4283382
4373932,4294817
4373932,4296201
4373932,4297273
4373932,4298687
4373932,4302534
4373932,4314026
4373932,4318707
4373932,4318846
4373932,3773625
4373932,3935074
4373932,3951748
4373932,3992516
4373932,3996344
4373932,3997657
4373932,4011308
4373932,4016250
4373932,4018884
4373932,4056724
4373932,4067959
4373932,4069352
4373932,4097586
4373932,4098876
4373932,4130462
4373932,4152411
4373932,4153675
4373932,4174384
4373932,4222743
4373932,4254096
4373932,4256834
4373932,4284412
4373932,4323647
4373932,3985867
4373932,4166105
4373932,4278653
4373932,4194877
4373932,4202815
4373932,4286959
4373932,4302536
4373932,4020151
4373932,4115535
4373932,4152412
4373932,4177253
4373932,4223002
4373932,4225485
4373932,4261968
Configurations:
core-site.xml
<!-- In: conf/core-site.xml -->
<property>
<name>hadoop.tmp.dir</name>
<value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
mapred-site.xml
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
hdfs-site.xml
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
I've googled for an answer and made some changes to the code and to the (mapred/core/hdfs)-site.xml configuration files, but I'm lost. I run this code in pseudo-distributed mode. The join key is the same in both files. If I trim cite.txt down to 99 lines or fewer, the job runs fine, while with 100 lines or more it gets stuck as the logs show. Please help me figure out the problem. I appreciate your explanation.
Best regards,
HaiLong

Please check your Reduce class.
I faced a similar problem, which turned out to be a very silly mistake. Maybe this will help you out and solve the issue:
while (values.hasNext()) {
String val = values.next().toString();
.....
}
You need to add the .next() call inside the loop; without it the iterator never advances, so the loop never finishes.
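For reference, here is a minimal sketch of a reduce loop in the old mapred API that advances the iterator on every pass. The class name and type parameters are illustrative, not taken from the DataJoin job above:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SafeReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        while (values.hasNext()) {
            Text val = values.next(); // without this call the loop never advances
            output.collect(key, val);
        }
    }
}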

Related

Map Reduce File Output Counter is zero

I am writing MapReduce code for inverted indexing of a file in which each line is "Doc_id Title Document Contents".
I am not able to figure out why the File Output Format counter is zero, although the MapReduce job completes successfully without any exception.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class InvertedIndex {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, Text> {
private Text word = new Text();
private Text docID_Title = new Text();
//RemoveStopWords is a different class
static RemoveStopWords rmvStpWrd = new RemoveStopWords();
//Stemmer is a different class
Stemmer stemmer = new Stemmer();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
rmvStpWrd.makeStopWordList();
StringTokenizer itr = new StringTokenizer(value.toString().replaceAll(" [^\\p{L}]", " "));
//fetching id of the document
String id = null;
String title = null;
if(itr.hasMoreTokens())
id = itr.nextToken();
//fetching title of the document
if(itr.hasMoreTokens())
title = itr.nextToken();
String ID_TITLE = id + title;
if(id!=null)
docID_Title.set(ID_TITLE);
while (itr.hasMoreTokens()) {
/*manipulation of tokens:
* First we remove stop words
* Then Stem the words
*/
String temp = itr.nextToken().toLowerCase();
if(RemoveStopWords.isStopWord(temp)) {
continue;
}
else {
//now the word is not a stop word
//we will stem it
char[] a;
stemmer.add((a = temp.toCharArray()), a.length);
stemmer.stem();
temp = stemmer.toString();
word.set(temp);
context.write(word, docID_Title);
}
}//end while
}//end map
}//end mapper
public static class IntSumReducer
extends Reducer<Text,Text,Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
//to iterate over the values
Iterator<Text> itr = values.iterator();
String old = itr.next().toString();
int freq = 1;
String next = null;
boolean isThere = true;
StringBuilder stringBuilder = new StringBuilder();
while(itr.hasNext()) {
//freq counts number of times a word comes in a document
freq = 1;
while((isThere = itr.hasNext())) {
next = itr.next().toString();
if(old == next)
freq++;
else {
//the loop break when we get different docID_Title for the word(key)
break;
}
//if more data is there
if(isThere) {
old = old +"_"+ freq;
stringBuilder.append(old);
stringBuilder.append(" | ");
old = next;
context.write(key, new Text(stringBuilder.toString()));
stringBuilder.setLength(0);
}
else {
//for the last key
freq++;
old = old +"_"+ freq;
stringBuilder.append(old);
stringBuilder.append(" | ");
old = next;
context.write(key, new Text(stringBuilder.toString()));
}//end else
}//end while
}//end while
}//end reduce
}//end reducer
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "InvertedIndex");
job.setJarByClass(InvertedIndex.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}//end main
}//end InvertexIndex
This is the output I am getting:
16/10/03 15:34:21 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/10/03 15:34:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/10/03 15:34:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/10/03 15:34:22 INFO input.FileInputFormat: Total input paths to process : 1
16/10/03 15:34:22 INFO mapreduce.JobSubmitter: number of splits:1
16/10/03 15:34:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local507694567_0001
16/10/03 15:34:22 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/10/03 15:34:22 INFO mapreduce.Job: Running job: job_local507694567_0001
16/10/03 15:34:22 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/10/03 15:34:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/10/03 15:34:22 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/10/03 15:34:22 INFO mapred.LocalJobRunner: Waiting for map tasks
16/10/03 15:34:22 INFO mapred.LocalJobRunner: Starting task: attempt_local507694567_0001_m_000000_0
16/10/03 15:34:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/10/03 15:34:22 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/10/03 15:34:22 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/sonu/ss.txt:0+1002072
16/10/03 15:34:23 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/10/03 15:34:23 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/10/03 15:34:23 INFO mapred.MapTask: soft limit at 83886080
16/10/03 15:34:23 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/10/03 15:34:23 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/10/03 15:34:23 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/10/03 15:34:23 INFO mapreduce.Job: Job job_local507694567_0001 running in uber mode : false
16/10/03 15:34:23 INFO mapreduce.Job: map 0% reduce 0%
16/10/03 15:34:24 INFO mapred.LocalJobRunner:
16/10/03 15:34:24 INFO mapred.MapTask: Starting flush of map output
16/10/03 15:34:24 INFO mapred.MapTask: Spilling map output
16/10/03 15:34:24 INFO mapred.MapTask: bufstart = 0; bufend = 2206696; bufvoid = 104857600
16/10/03 15:34:24 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25789248(103156992); length = 425149/6553600
16/10/03 15:34:24 INFO mapred.MapTask: Finished spill 0
16/10/03 15:34:24 INFO mapred.Task: Task:attempt_local507694567_0001_m_000000_0 is done. And is in the process of committing
16/10/03 15:34:24 INFO mapred.LocalJobRunner: map
16/10/03 15:34:24 INFO mapred.Task: Task 'attempt_local507694567_0001_m_000000_0' done.
16/10/03 15:34:24 INFO mapred.LocalJobRunner: Finishing task: attempt_local507694567_0001_m_000000_0
16/10/03 15:34:24 INFO mapred.LocalJobRunner: map task executor complete.
16/10/03 15:34:25 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/10/03 15:34:25 INFO mapred.LocalJobRunner: Starting task: attempt_local507694567_0001_r_000000_0
16/10/03 15:34:25 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/10/03 15:34:25 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/10/03 15:34:25 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle#5d0e7307
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/10/03 15:34:25 INFO reduce.EventFetcher: attempt_local507694567_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/10/03 15:34:25 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local507694567_0001_m_000000_0 decomp: 2 len: 6 to MEMORY
16/10/03 15:34:25 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local507694567_0001_m_000000_0
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->2
16/10/03 15:34:25 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/10/03 15:34:25 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/10/03 15:34:25 INFO mapred.Merger: Merging 1 sorted segments
16/10/03 15:34:25 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: Merged 1 segments, 2 bytes to disk to satisfy reduce memory limit
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: Merging 1 files, 6 bytes from disk
16/10/03 15:34:25 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/10/03 15:34:25 INFO mapred.Merger: Merging 1 sorted segments
16/10/03 15:34:25 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
16/10/03 15:34:25 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/10/03 15:34:25 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/10/03 15:34:25 INFO mapred.Task: Task:attempt_local507694567_0001_r_000000_0 is done. And is in the process of committing
16/10/03 15:34:25 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/10/03 15:34:25 INFO mapred.Task: Task attempt_local507694567_0001_r_000000_0 is allowed to commit now
16/10/03 15:34:25 INFO output.FileOutputCommitter: Saved output of task 'attempt_local507694567_0001_r_000000_0' to hdfs://localhost:9000/user/sonu/output/_temporary/0/task_local507694567_0001_r_000000
16/10/03 15:34:25 INFO mapred.LocalJobRunner: reduce > reduce
16/10/03 15:34:25 INFO mapred.Task: Task 'attempt_local507694567_0001_r_000000_0' done.
16/10/03 15:34:25 INFO mapred.LocalJobRunner: Finishing task: attempt_local507694567_0001_r_000000_0
16/10/03 15:34:25 INFO mapred.LocalJobRunner: reduce task executor complete.
16/10/03 15:34:25 INFO mapreduce.Job: map 100% reduce 100%
16/10/03 15:34:25 INFO mapreduce.Job: Job job_local507694567_0001 completed successfully
16/10/03 15:34:25 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=17342
FILE: Number of bytes written=571556
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004144
HDFS: Number of bytes written=0
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=53
Map output records=106288
Map output bytes=2206696
Map output materialized bytes=6
Input split bytes=103
Combine input records=106288
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=12
Total committed heap usage (bytes)=562036736
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1002072
File Output Format Counters
Bytes Written=0

hadoop mapper over consumption of memory(heap)

I wrote a simple hash-join program in Hadoop MapReduce. The idea is the following:
A small table is distributed to every mapper using the DistributedCache provided by the Hadoop framework. The large table is distributed over the mappers with a split size of 64M.
The setup code of the mapper builds a HashMap by reading every line of this small table. In the map code, every key is looked up (get) in the HashMap, and if the key exists in the HashMap it is written out. There is no need for a reducer at this point. This is the code we use:
public class Map extends Mapper<LongWritable, Text, Text, Text> {
private HashMap<String, String> joinData = new HashMap<String, String>();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String textvalue = value.toString();
String[] tokens;
tokens = textvalue.split(",");
if (tokens.length == 2) {
String joinValue = joinData.get(tokens[0]);
if (null != joinValue) {
context.write(new Text(tokens[0]), new Text(tokens[1] + ","
+ joinValue));
}
}
}
public void setup(Context context) {
try {
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context
.getConfiguration());
if (null != cacheFiles && cacheFiles.length > 0) {
String line;
String[] tokens;
BufferedReader br = new BufferedReader(new FileReader(
cacheFiles[0].toString()));
try {
while ((line = br.readLine()) != null) {
tokens = line.split(",");
if (tokens.length == 2) {
joinData.put(tokens[0], tokens[1]);
}
}
System.exit(0);
} finally {
br.close();
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
While testing this code, our small table was 32M and the large table was 128M, with one master and 2 slave nodes.
This code fails with the above inputs when I have a 256M heap. I use -Xmx256m in mapred.child.java.opts in the mapred-site.xml file. When I increase it to 300m it proceeds very slowly, and with 512m it reaches its max throughput.
I don't understand where my mapper is consuming so much memory. With the inputs given above
and with the mapper code, I don't expect my heap usage to ever reach 256M, yet it fails with a Java heap space error.
I will be thankful if you can give some insight into why the mapper is consuming so much memory.
EDIT:
13/03/11 09:37:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/03/11 09:37:33 INFO input.FileInputFormat: Total input paths to process : 1
13/03/11 09:37:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/11 09:37:33 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/11 09:37:34 INFO mapred.JobClient: Running job: job_201303110921_0004
13/03/11 09:37:35 INFO mapred.JobClient: map 0% reduce 0%
13/03/11 09:39:12 INFO mapred.JobClient: Task Id : attempt_201303110921_0004_m_000000_0, Status : FAILED
Error: GC overhead limit exceeded
13/03/11 09:40:43 INFO mapred.JobClient: Task Id : attempt_201303110921_0004_m_000001_0, Status : FAILED
org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: File /usr/home/hadoop/hadoop-1.0.3/libexec/../logs/userlogs/job_201303110921_0004/attempt_201303110921_0004_m_000001_0/log.tmp already exists
at org.apache.hadoop.io.SecureIOUtils.insecureCreateForWrite(SecureIOUtils.java:130)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:157)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201303110921_0004_m_000001_0: Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java heap space
attempt_201303110921_0004_m_000001_0: at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
attempt_201303110921_0004_m_000001_0: at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
attempt_201303110921_0004_m_000001_0: at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
attempt_201303110921_0004_m_000001_0: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
attempt_201303110921_0004_m_000001_0: at org.apache.hadoop.mapred.Child$3.run(Child.java:141)
attempt_201303110921_0004_m_000001_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201303110921_0004_m_000001_0: log4j:WARN Please initialize the log4j system properly.
13/03/11 09:42:18 INFO mapred.JobClient: Task Id : attempt_201303110921_0004_m_000001_1, Status : FAILED
Error: GC overhead limit exceeded
13/03/11 09:43:48 INFO mapred.JobClient: Task Id : attempt_201303110921_0004_m_000001_2, Status : FAILED
Error: GC overhead limit exceeded
13/03/11 09:45:09 INFO mapred.JobClient: Job complete: job_201303110921_0004
13/03/11 09:45:09 INFO mapred.JobClient: Counters: 7
13/03/11 09:45:09 INFO mapred.JobClient: Job Counters
13/03/11 09:45:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=468506
13/03/11 09:45:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/11 09:45:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/03/11 09:45:09 INFO mapred.JobClient: Launched map tasks=6
13/03/11 09:45:09 INFO mapred.JobClient: Data-local map tasks=6
13/03/11 09:45:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/03/11 09:45:09 INFO mapred.JobClient: Failed map tasks=1
It's hard to say for sure where the memory consumption is going, but here are a few pointers:
You're creating 2 Text objects for every line of your input. You should instead use 2 Text objects that are initialized once in your Mapper as class variables, and then for each line just call text.set(...). This is a common usage pattern in Map/Reduce jobs and can save quite a bit of memory overhead (see the sketch after these pointers).
You should consider using the SequenceFile format for your input, which would avoid the need to parse the lines with textValue.split; you would instead have this data directly available as an array. I've read several times that doing string splits like this can be quite intensive, so you should avoid it as much as possible if memory is really an issue. You can also think about using KeyValueTextInputFormat if, as in your example, you only care about key/value pairs.
If that isn't enough, I would advise looking at this link, especially part 7, which gives you a very simple method to profile your application and see what gets allocated where.
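As an illustration of the first pointer, a minimal sketch of the object-reuse pattern with the new mapreduce API might look like this (the class name and field names are made up, not taken from your job):
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReuseMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Allocated once per task and refilled for every record,
    // instead of creating two new Text objects per input line.
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] tokens = value.toString().split(",", 2);
        if (tokens.length == 2) {
            outKey.set(tokens[0]);
            outValue.set(tokens[1]);
            context.write(outKey, outValue);
        }
    }
}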

Getting error while implementing a simple sorting program in Mapreduce with zero reduce nodes

I tried implementing a sorting program in MapReduce such that I have just the sorted output after the map phase, where the sorting is done by the Hadoop framework internally. For that, I tried to set the number of reduce tasks to zero, as there wasn't any reduction required. Now when I try executing the program, I keep getting a checksum
error. I am not able to figure out what's to be done next. Surely it's possible to run the program on my netbook, as the sorting works fine when I set the number of reduce tasks to one. Please help!!
For your reference, here's the entire code that I have written to perform the sorting:
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author root
*/
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
import java.util.*;
import java.io.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;
public class word extends Configured implements Tool
{
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
private static IntWritable one=new IntWritable(1);
private Text word=new Text();
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter report) throws IOException
{
String line=value.toString();
StringTokenizer token=new StringTokenizer(line," .,?!");
String wordToken=null;
while(token.hasMoreTokens())
{
wordToken=token.nextToken();
output.collect(new Text(wordToken), one);
}
}
}
public int run(String args[])throws Exception
{
//Configuration conf=getConf();
JobConf job=new JobConf(word.class);
job.setInputFormat(TextInputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormat(TextOutputFormat.class);
job.setMapperClass(Map.class);
job.setNumReduceTasks(0);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
JobClient.runJob(job);
return 0;
}
public static void main(String args[])throws Exception
{
int exitCode=ToolRunner.run(new word(), args);
System.exit(exitCode);
}
}
Here is the checksum error I got on executing this program:
12/03/25 10:26:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
12/03/25 10:26:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/25 10:26:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 10:26:44 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.JobClient: Running job: job_local_0001
12/03/25 10:26:45 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.MapTask: numReduceTasks: 0
12/03/25 10:26:45 INFO fs.FSInputChecker: Found checksum error: b[0, 26]=610a630a620a640a650a740a790a780a730a670a7a0a680a730a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:45 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:46 INFO mapred.JobClient: map 0% reduce 0%
12/03/25 10:26:46 INFO mapred.JobClient: Job complete: job_local_0001
12/03/25 10:26:46 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at sortLog.run(sortLog.java:59)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at sortLog.main(sortLog.java:66)
Java Result: 1
BUILD SUCCESSFUL (total time: 4 seconds)
So have a look at org.apache.hadoop.mapred.MapTask around line 600 in 0.20.2.
// get an output object
if (job.getNumReduceTasks() == 0) {
output =
new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
} else {
output = new NewOutputCollector(taskContext, job, umbilical, reporter);
}
If you set the number of reduce tasks to zero, the map output is written directly to the output. The NewOutputCollector will use the so-called MapOutputBuffer, which does the spilling, sorting, combining and partitioning.
So when you set no reducer, no sort takes place, even if Tom White states this in the Definitive Guide.
I have faced the same problem (a checksum error concerning file part-00000 at 0). I solved it by renaming the file to any name other than -00000.
So if you need at least one reducer to make the internal sorting happen, then you can use the IdentityReducer.
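With the old mapred API used in the question, that would amount to something like the following sketch (untested against your job):
// Keep one reduce task so the MapOutputBuffer sort/spill path still runs,
// and let IdentityReducer pass every record through unchanged.
job.setNumReduceTasks(1);
job.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);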
You may also want to see this discussion:
hadoop: difference between 0 reducer and identity reducer?

why is my sequence file being read twice in my hadoop mapper class?

I have a SequenceFile with 1264 records; each key is unique to each record. My problem is that my mapper seems to be reading this file twice, or the file is being read twice. For sanity checking, I have written a little utility class to read the SequenceFile and indeed, there are only 1264 records (i.e. with SequenceFile.Reader).
In my reducer, I should only get 1 record per Iterable; however, when I iterate over the Iterable (Iterator), I get 2 records per key (always 2 per key, and not 1 or 3 or something else per key).
The logging output of my job is below. I am not sure why the "Total input paths to process" is 2. When I ran my job, I tried -Dmapred.input.dir=/data and also -Dmapred.input.dir=/data/part-r-00000, but still the total paths to process is 2.
Any ideas are appreciated.
12/03/01 05:28:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/01 05:28:30 INFO input.FileInputFormat: Total input paths to process : 2
12/03/01 05:28:31 INFO mapred.JobClient: Running job: job_local_0001
12/03/01 05:28:31 INFO input.FileInputFormat: Total input paths to process : 2
12/03/01 05:28:31 INFO mapred.MapTask: io.sort.mb = 100
12/03/01 05:28:31 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/01 05:28:31 INFO mapred.MapTask: record buffer = 262144/327680
12/03/01 05:28:31 INFO mapred.MapTask: Starting flush of map output
12/03/01 05:28:31 INFO mapred.MapTask: Finished spill 0
12/03/01 05:28:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/03/01 05:28:31 INFO mapred.MapTask: io.sort.mb = 100
12/03/01 05:28:31 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/01 05:28:31 INFO mapred.MapTask: record buffer = 262144/327680
12/03/01 05:28:31 INFO mapred.MapTask: Starting flush of map output
12/03/01 05:28:31 INFO mapred.MapTask: Finished spill 0
12/03/01 05:28:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.Merger: Merging 2 sorted segments
12/03/01 05:28:31 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 307310 bytes
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:32 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/03/01 05:28:32 INFO mapred.LocalJobRunner:
12/03/01 05:28:32 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/03/01 05:28:32 INFO mapred.JobClient: map 100% reduce 0%
12/03/01 05:28:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to results
12/03/01 05:28:32 INFO mapred.LocalJobRunner: reduce > reduce
12/03/01 05:28:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/03/01 05:28:33 INFO mapred.JobClient: map 100% reduce 100%
12/03/01 05:28:33 INFO mapred.JobClient: Job complete: job_local_0001
12/03/01 05:28:33 INFO mapred.JobClient: Counters: 12
12/03/01 05:28:33 INFO mapred.JobClient: FileSystemCounters
12/03/01 05:28:33 INFO mapred.JobClient: FILE_BYTES_READ=1320214
12/03/01 05:28:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1275041
12/03/01 05:28:33 INFO mapred.JobClient: Map-Reduce Framework
12/03/01 05:28:33 INFO mapred.JobClient: Reduce input groups=1264
12/03/01 05:28:33 INFO mapred.JobClient: Combine output records=0
12/03/01 05:28:33 INFO mapred.JobClient: Map input records=2528
12/03/01 05:28:33 INFO mapred.JobClient: Reduce shuffle bytes=0
12/03/01 05:28:33 INFO mapred.JobClient: Reduce output records=2528
12/03/01 05:28:33 INFO mapred.JobClient: Spilled Records=5056
12/03/01 05:28:33 INFO mapred.JobClient: Map output bytes=301472
12/03/01 05:28:33 INFO mapred.JobClient: Combine input records=0
12/03/01 05:28:33 INFO mapred.JobClient: Map output records=2528
12/03/01 05:28:33 INFO mapred.JobClient: Reduce input records=2528
My mapper class is very simple. It reads in a text file and appends "m" to each line.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
private static final Log _log = LogFactory.getLog(MyMapper.class);
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String s = (new StringBuilder()).append(value.toString()).append("m").toString();
context.write(key, new Text(s));
_log.debug(key.toString() + " => " + s);
}
}
My reducer class is also very simple. It simply appends "r" to the line.
public class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
private static final Log _log = LogFactory.getLog(MyReducer.class);
@Override
public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
for(Iterator<Text> it = values.iterator(); it.hasNext();) {
Text txt = it.next();
String s = (new StringBuilder()).append(txt.toString()).append("r").toString();
context.write(key, new Text(s));
_log.debug(key.toString() + " => " + s);
}
}
}
My Job class is as follows.
public class MyJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new Configuration(), new MyJob(), args);
}
@Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
Path input = new Path(conf.get("mapred.input.dir"));
Path output = new Path(conf.get("mapred.output.dir"));
System.out.println("input = " + input);
System.out.println("output = " + output);
Job job = new Job(conf, "dummy job");
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, output);
job.setJarByClass(MyJob.class);
return job.waitForCompletion(true) ? 0 : 1;
}
}
My input data looks like the following.
T, T
T, T
T, T
F, F
F, F
F, F
F, F
T, F
F, T
After running my Job, I get output like the following.
0 T, Tmr
0 T, Tmr
6 T, Tmr
6 T, Tmr
12 T, Tmr
12 T, Tmr
18 F, Fmr
18 F, Fmr
24 F, Fmr
24 F, Fmr
30 F, Fmr
30 F, Fmr
36 F, Fmr
36 F, Fmr
42 T, Fmr
42 T, Fmr
48 F, Tmr
48 F, Tmr
Did I do something wrong with setting up my Job? I tried the following way to run my Job, and with this approach the file only gets read once. Why is this? The System.out.println(inpath) and System.out.println(outpath) values are identical! Help?
public class MyJob2 {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: MyJob2 <in> <out>");
System.exit(2);
}
String sInput = args[0];
String sOutput = args[1];
Path input = new Path(sInput);
Path output = new Path(sOutput);
System.out.println("input = " + input);
System.out.println("output = " + output);
Job job = new Job(conf, "dummy job");
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, output);
job.setJarByClass(MyJob2.class);
int result = job.waitForCompletion(true) ? 0 : 1;
System.exit(result);
}
}
I got help from the Hadoop mailing list. My problem was with the line below.
FileInputFormat.addInputPath(job, input);
This line simply appends the input path back onto the configuration. After commenting this line out, the input file is read only once. In fact, I also commented out the other line,
FileOutputFormat.setOutputPath(job, output);
and everything still works.
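In other words, a minimal sketch of the corrected run() (a paraphrase, assuming the job is still launched with -Dmapred.input.dir and -Dmapred.output.dir as in the question) would be roughly:
@Override
public int run(String[] args) throws Exception {
    // mapred.input.dir and mapred.output.dir are already set in the
    // Configuration by the -D options, so we do not call
    // FileInputFormat.addInputPath (which would append the same input path
    // a second time) or FileOutputFormat.setOutputPath here.
    Job job = new Job(getConf(), "dummy job");
    job.setJarByClass(MyJob.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    return job.waitForCompletion(true) ? 0 : 1;
}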
I've had a similar problem, but for a different reason: Linux apparently created a hidden copy of my input file (~input.txt), so that's a second way of getting this error.

infinitely loop for org.apache.hadoop.mapred.TaskTracker

I am running one simple Hadoop application which collects information from a 64MB file. The map finishes quickly, roughly in about 2-5 minutes, then reduce also runs fast up to 16%, and then it just stops.
This is the program output:
11/12/20 17:53:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/20 17:53:46 INFO mapred.JobClient: Running job: job_201112201749_0001
11/12/20 17:53:47 INFO mapred.JobClient: map 0% reduce 0%
11/12/20 17:54:06 INFO mapred.JobClient: map 4% reduce 0%
11/12/20 17:54:09 INFO mapred.JobClient: map 15% reduce 0%
11/12/20 17:54:12 INFO mapred.JobClient: map 28% reduce 0%
11/12/20 17:54:15 INFO mapred.JobClient: map 40% reduce 0%
11/12/20 17:54:18 INFO mapred.JobClient: map 53% reduce 0%
11/12/20 17:54:21 INFO mapred.JobClient: map 64% reduce 0%
11/12/20 17:54:24 INFO mapred.JobClient: map 77% reduce 0%
11/12/20 17:54:27 INFO mapred.JobClient: map 89% reduce 0%
11/12/20 17:54:30 INFO mapred.JobClient: map 98% reduce 0%
11/12/20 17:54:33 INFO mapred.JobClient: map 100% reduce 0%
11/12/20 17:54:54 INFO mapred.JobClient: map 100% reduce 8%
11/12/20 17:54:57 INFO mapred.JobClient: map 100% reduce 16%
In the data node log, I see tons of the same message again and again; the following is where it starts:
2011-12-20 17:54:51,353 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.083333336% reduce > copy (1 of 4 at 9.01 MB/s) >
2011-12-20 17:54:51,507 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060, dest: 127.0.0.1:44367, bytes: 75623263, op: MAPRED_SHUFFLE, cliID: attempt_201112201749_0001_m_000000_0, duration: 2161793492
2011-12-20 17:54:54,389 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:54:57,433 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:55:40,359 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet#53d3cf JVM BUG(s) - injecting delay3 times
2011-12-20 17:55:40,359 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet#53d3cf JVM BUG(s) - recreating selector 3 times, canceled keys 72 times
2011-12-20 17:57:51,518 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:57:57,536 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
2011-12-20 17:58:03,554 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201112201749_0001_r_000000_0 0.16666667% reduce > copy (2 of 4 at 14.42 MB/s) >
...
Here is the code
package com.bluedolphin;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MyJob {
private final static LongWritable one = new LongWritable(1);
private final static Text word = new Text();
public static class MyMapClass extends Mapper<LongWritable, Text, Text, LongWritable> {
public void map(LongWritable key,
Text value,
Context context) throws IOException, InterruptedException {
String[] citation = value.toString().split(",");
word.set(citation[0]);
context.write(word, one);
}
}
public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
private LongWritable result = new LongWritable();
public void reduce(
Text key,
Iterator<LongWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: myjob <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "patent citation");
job.setJarByClass(MyJob.class);
job.setMapperClass(MyMapClass.class);
// job.setCombinerClass(MyReducer.class);
// job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(0);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
I don't know how to troubleshoot this further.
Thanks in advance.
I figured out the solution: in the reduce method signature, I should have been using Iterable, not Iterator. Because of that, the reduce method was actually never called. It runs fine now, but I don't know the internal reason why it hung before.
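For reference, a minimal sketch of the corrected reducer (a drop-in replacement for the inner class in the posted MyJob; the imports already in that file are enough) looks like this; an @Override annotation is what makes the compiler catch this kind of signature mismatch:
public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable result = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        // Iterable, not Iterator: with Iterator the method does not override
        // Reducer.reduce(), so the default identity reduce runs instead.
        for (LongWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
This assumes the commented-out job.setReducerClass(MyReducer.class) line is re-enabled and job.setNumReduceTasks(0) is removed.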
A couple of things I noticed in your code:
Since you are updating the Text object "word" in map and the LongWritable object "result" in reduce for each call to map and reduce respectively, you probably shouldn't declare them final (although I don't see this as a problem in this case, as the objects are only changing their state).
Your code looks similar to the trivial word count; the only difference is that you are emitting just one value per record in map. You could just eliminate the reduce (i.e., run a map-only job) and see if you get what you expect from map.
I also had this infinite loop during the reduce phase. After struggling for a day, the solution turned out to be adjusting the /etc/hosts file.
It seems that the existence of the entry "127.0.1.1 your_machine's_name" confused Hadoop. One proof of that was the inability to access slave:50060, the TaskTracker on the slave machine, from the master machine.
Once this "127.0.1.1 your_machine's_name" entry was deleted and "your_machine's_name" was added at the end of the "127.0.0.1 localhost" entry, my problem was gone.
I hope this observation can be helpful.
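Concretely, the /etc/hosts change described above amounts to something like this (the hostname is a placeholder, not your actual machine name):
# before (problematic): a separate 127.0.1.1 entry for the machine's hostname
127.0.0.1   localhost
127.0.1.1   your_machine_name

# after: drop the 127.0.1.1 line and append the hostname to the localhost entry
127.0.0.1   localhost   your_machine_name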
