I ran into a very strange problem. The reducers do run, but when I check the output files I only find the output from the mappers.
While debugging, I found the same problem with the word count sample after I changed the mapper's output value type from LongWritable to Text:
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
public class WordCount extends Configured implements Tool {

    public static class Map
            extends Mapper<LongWritable, Text, Text, Text> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text wtf, Context context)
                throws IOException, InterruptedException {
            String line = wtf.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, new Text("frommapper"));
            }
        }
    }

    public static class Reduce
            extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Text wtfs,
                Context context) throws IOException, InterruptedException {
            /*
            int sum = 0;
            for (IntWritable val : wtfs) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));*/
            context.write(key, new Text("can't output"));
        }
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}
Here are the results:
JobClient: Combine output records=0
12/06/13 17:37:46 INFO mapred.JobClient: Map input records=7
12/06/13 17:37:46 INFO mapred.JobClient: Reduce shuffle bytes=116
12/06/13 17:37:46 INFO mapred.JobClient: Reduce output records=7
12/06/13 17:37:46 INFO mapred.JobClient: Spilled Records=14
12/06/13 17:37:46 INFO mapred.JobClient: Map output bytes=96
12/06/13 17:37:46 INFO mapred.JobClient: Combine input records=0
12/06/13 17:37:46 INFO mapred.JobClient: Map output records=7
12/06/13 17:37:46 INFO mapred.JobClient: Reduce input records=7
Then I found the strange results in the output file. The problem appeared after I changed the map output value type and the reducer input value type to Text, whether or not I also changed the reducer's output value type. I was also forced to change the driver to job.setOutputValueClass(Text.class).
a frommapper
a frommapper
a frommapper
gg frommapper
h frommapper
sss frommapper
sss frommapper

Your reduce function arguments should be as follows:
public void reduce(Text key, Iterable <Text> wtfs,
Context context) throws IOException, InterruptedException {
With the arguments defined the way you have them, your method does not override Reducer.reduce(), so it never receives the list of values. Hadoop falls back to the default reduce implementation, which simply writes out every <key, value> pair it receives from the map function. That is why the output file looks like mapper output and your "can't output" string never appears.
Also, the mapper does not normally write to the output file; that only happens in a map-only job with zero reducers. In the usual case it is always the reducer that writes to the output file. Mapper output is intermediate data that Hadoop handles transparently. So if you see something in the output file, it has to be reducer output, not mapper output. If you want to verify this, you can go to the logs for the job you ran and check what is happening in each mapper and reducer individually.
Hope this clears things up for you.
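To make the signature point concrete, here is a minimal plain-Java sketch that needs no Hadoop on the classpath. BaseReducer is a hypothetical stand-in for Hadoop's Reducer (its default reduce just passes values through), and WrongReducer, RightReducer and OverrideDemo are names made up for this illustration. It only models the dispatch behaviour, not the real framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's Reducer: the default reduce() is an
// identity that emits every value unchanged.
class BaseReducer {
    final List<String> out = new ArrayList<>();

    void reduce(String key, Iterable<String> values) {
        for (String v : values) {
            out.add(key + "\t" + v);
        }
    }

    // Stand-in for the framework: it always calls reduce(key, Iterable).
    List<String> run(String key, List<String> values) {
        reduce(key, values);
        return out;
    }
}

// Wrong signature: this is an overload, not an override, so run() never calls it.
class WrongReducer extends BaseReducer {
    void reduce(String key, String value) {
        out.add(key + "\tcan't output");
    }
}

// Correct signature: @Override makes the compiler verify the match.
class RightReducer extends BaseReducer {
    @Override
    void reduce(String key, Iterable<String> values) {
        out.add(key + "\tfromreducer");
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        List<String> values = Arrays.asList("frommapper", "frommapper");
        // Falls through to the identity reduce: one line per mapper value.
        System.out.println(new WrongReducer().run("a", values));
        // Uses the overriding method: one line per key.
        System.out.println(new RightReducer().run("a", values));
    }
}
```

Putting @Override on your real reduce method is the cheap safeguard here: with the wrong parameter type the code stops compiling instead of silently running the default reducer.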


Hadoop mapreduce - mapping NullPointerException

I need to write a simple MapReduce program that, given as input a directed graph represented as a list of edges, produces the same graph where each edge (x,y) with x>y is replaced by (y,x) and there are no repeated edges in the output graph.
This is the code:
public class ExamGraph {

    // mapper class
    public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            value = new Text( value.toString());
            String[] campi = value.toString().split(";");
            if (Integer.getInteger(campi[0]) > Integer.getInteger(campi[1]))
                context.write(new Text(campi[1]+";"+campi[0]), NullWritable.get());
            else context.write(new Text(campi[0]+";"+campi[1]), NullWritable.get());
        }
    }

    // reducer class
    public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        protected void reduce(Text key, Iterable <NullWritable> values , Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // create new job
        Job job = Job.getInstance(new Configuration());
        // job is based on jar containing this class
        job.setJarByClass(ExamGraph.class);
        // for logging purposes (job name assumed; the original line is missing)
        job.setJobName("ExamGraph");
        // set input path in HDFS
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // set output path in HDFS (destination must not exist)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set mapper and reducer classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // An InputFormat for plain text files.
        // Files are broken into lines. Either linefeed or carriage-return are used
        // to signal end of line. Keys are the position in the file, and values
        // are the line of text.
        job.setInputFormatClass(TextInputFormat.class);
        // set type of output keys and values for both mappers and reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // start job
        job.waitForCompletion(true);
    }
}
When I run the jar file using:
hadoop jar path/jar JOBNAME /inputlocation /outputlocation
I get this error:
18/05/22 02:13:11 INFO mapreduce.Job: Task Id : attempt_1526979627085_0001_m_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at ExamGraph$MyMapper.map(ExamGraph.java:38)
at ExamGraph$MyMapper.map(ExamGraph.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
But I did not find the error in the code.
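Found the problem: I confused Integer.getInteger() with Integer.parseInt() in the mapper. getInteger() looks up a system property with the given name and returns null when no such property is set, so unboxing the result for the > comparison threw the NullPointerException.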
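For anyone hitting the same thing, here is a small standalone sketch of the difference, plus the edge normalisation done with parseInt. EdgeNormalizer and normalize are names invented for this example, and it assumes each input line has exactly two integer fields separated by ";".

```java
class EdgeNormalizer {
    // Returns the edge "x;y" with the smaller endpoint first.
    // Assumes the line holds exactly two integer fields separated by ';'.
    static String normalize(String edge) {
        String[] fields = edge.split(";");
        int x = Integer.parseInt(fields[0]);
        int y = Integer.parseInt(fields[1]);
        return x > y ? y + ";" + x : x + ";" + y;
    }

    public static void main(String[] args) {
        // getInteger("5") looks up a system property named "5"; unless one
        // happens to be set, the result is null rather than the number 5.
        System.out.println(Integer.getInteger("5"));
        // parseInt("5") parses the string itself.
        System.out.println(Integer.parseInt("5"));
        System.out.println(normalize("7;3"));
    }
}
```

In the mapper, the same change amounts to replacing both getInteger calls with parseInt; the rest of the job can stay as it is.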

yarn stderr no logger appender and no stdout

I'm running a simple MapReduce wordcount program against Apache Hadoop 2.6.0. Hadoop is running in distributed mode (several nodes). However, I'm not able to see any stderr or stdout output in the YARN job history (but I can see the syslog).
The wordcount program is really simple, just for demo purpose.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
    public static final Log LOG = LogFactory.getLog(WordCount.class);

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
            LOG.info("LOG - map function invoked");
            System.out.println("stdout - map function invoded");
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context
                ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/jsun/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/user/jsun/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Note in the map function of Mapper class, I added two statements:
LOG.info("LOG - map function invoked");
System.out.println("stdout - map function invoded");
These two statements are to test whether I can see logging from hadoop server. I can successfully run the program. But if I go to localhost:8088 to see the application history and then "logs", I see nothing in "stdout", and in "stderr":
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I think some configuration is needed to get that output, but I'm not sure which piece is missing. I searched online as well as on Stack Overflow. Some people mentioned container-log4j.properties, but they were not specific about how to configure that file or where to put it.
One thing to note is that I also tried the job with Hortonworks Data Platform 2.2 and Cloudera 5.4. The result is the same. I remember that with an earlier version of Hadoop (Hadoop 1.x) I could easily see the log output in the same place, so I guess this is something new in Hadoop 2.x.
As a comparison, if I run Apache Hadoop in local mode (meaning LocalJobRunner), I can see log output in the console like this:
[2015-09-08 15:57:25,992]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:998) INFO:kvstart = 26214396; length = 6553600
[2015-09-08 15:57:25,996]org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402) INFO:Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[2015-09-08 15:57:26,064]WordCount$TokenizerMapper.map(WordCount.java:28) INFO:LOG - map function invoked
stdout - map function invoded
[2015-09-08 15:57:26,075]org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:591) INFO:
[2015-09-08 15:57:26,077]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1457) INFO:Starting flush of map output
[2015-09-08 15:57:26,077]org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1475) INFO:Spilling map output
This kind of log output ("map function invoked") is what I expected to see in the Hadoop server logs.
Output written with System.out in a MapReduce program does not appear on the console. Map and reduce tasks run as multiple parallel copies across the cluster, so there is no single console that collects their output.
However, the System.out.println() output from the map and reduce phases can be seen in the job logs. An easy way to access the logs is:
Open the JobTracker web console - http://localhost:50030/jobtracker.jsp
Click on the completed job
Click on the map or reduce task
Click on the task number
Go to the task logs
Check the stdout logs
Please note that if you are not able to locate the URL, just look in the console log for the JobTracker URL.

Hadoop ClassCastException for default value of InputFormat

I'm having an issue getting started with my first MapReduce code on Hadoop. I copied the following code from "Hadoop: The Definitive Guide", but I'm not able to run it on my single-node Hadoop installation.
My Code snippet:
Job job = new Job();
job.setJobName("Max temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
public void map(LongWritable key, Text value, Context context)
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
The implementations of the map and reduce functions are also taken from the book. But when I try to execute this code, this is the error I get:
INFO mapred.JobClient: Task Id : attempt_201304021022_0016_m_000000_0, Status : FAILED
java.lang.ClassCastException: interface javax.xml.soap.Text
at java.lang.Class.asSubclass(Class.java:3027)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:774)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:959)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Answers to similar questions in the past (Hadoop type mismatch in key from map expected value Text received value LongWritable) helped me to figure out that InputFormatClass should match the input to the map function. So I also tried using job.setInputFormatClass(TextInputFormat.class); in my main method, but it also did not solve the issue. What could be the issue here?
Here is the implementation of the Mapper class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(45) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(46, 50));
        } else {
            airTemperature = Integer.parseInt(line.substring(45, 50));
        }
        String quality = line.substring(50, 51);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
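You auto-imported the wrong class. Instead of org.apache.hadoop.io.Text you imported javax.xml.soap.Text. The mapper you posted imports the right one, so check the imports in your other classes, most likely the driver where the output key class is set.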
You can find a sample wrong import in this blog.
Looks like you have the wrong Text class imported (javax.xml.soap.Text). You want org.apache.hadoop.io.Text
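The stack trace shows the failure coming from Class.asSubclass inside JobConf.getOutputKeyComparator, which suggests the configured key class is being checked against the key interface Hadoop expects (presumably WritableComparable). The sketch below reproduces that kind of check in plain Java. KeyContract, GoodText and UnrelatedText are hypothetical stand-ins invented for this example, not Hadoop types.

```java
class AsSubclassDemo {
    // Hypothetical stand-ins: KeyContract plays the role of the key interface,
    // GoodText implements it, UnrelatedText (like javax.xml.soap.Text) does not.
    interface KeyContract {}
    static class GoodText implements KeyContract {}
    interface UnrelatedText {}

    // Returns true when keyClass passes the same kind of check that fails
    // in the stack trace (Class.asSubclass).
    static boolean isValidKey(Class<?> keyClass) {
        try {
            keyClass.asSubclass(KeyContract.class);
            return true;
        } catch (ClassCastException e) {
            // Thrown when keyClass is not a subtype of KeyContract.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidKey(GoodText.class));
        System.out.println(isValidKey(UnrelatedText.class));
    }
}
```

A same-named class from another package compiles fine in the driver but fails this runtime check, which is why the error only shows up once the map task starts.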

Hadoop type mismatch in key from map expected value Text received value LongWritable

Does anyone have any idea why I would be getting this error? I have looked at a lot of similar posts, but most of them did not apply to me. I also tried the few posted solutions that did apply, but they did not work. I'm sure I'm just missing something stupid. Thanks for the help.
chris#chrisUHadoop:/usr/local/hadoop-1.0.3/build$ hadoop MaxTemperature 1901 output4
12/07/03 17:23:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/03 17:23:08 INFO input.FileInputFormat: Total input paths to process : 1
12/07/03 17:23:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/03 17:23:08 WARN snappy.LoadSnappy: Snappy native library not loaded
12/07/03 17:23:09 INFO mapred.JobClient: Running job: job_201207031642_0005
12/07/03 17:23:10 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 17:23:28 INFO mapred.JobClient: Task Id : attempt_201207031642_0005_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15,19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88,92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87,92));
        }
        String quality = line.substring(92,93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: MaxTemperature <input path> <output path>");
        }
        Job job = new Job();
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
You appear to be missing a number of configuration properties:
Mapper and Reducer classes? If these are not defined, the job defaults to the identity Mapper and Reducer.
Your specific error message occurs because the identity mapper outputs the same key and value types it was passed, in this case a key of type LongWritable and a value of type Text (you haven't defined an input format, so the default, TextInputFormat, is used). In your configuration you have defined the output key type as Text, but the mapper is outputting LongWritable, hence the error message.
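As a rough illustration of that explanation, here is a plain-Java model of the check, with no Hadoop involved. TypeCheckDemo, collect, identityMap and check are hypothetical names for this sketch, Long stands in for LongWritable and String for Text, and the message text only imitates the shape of the real one.

```java
import java.io.IOException;

class TypeCheckDemo {
    // Hypothetical stand-in for the map output collector: it rejects keys whose
    // class differs from the configured map output key class.
    static void collect(Object key, Class<?> expectedKeyClass) throws IOException {
        if (key.getClass() != expectedKeyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expectedKeyClass.getName() + ", received " + key.getClass().getName());
        }
    }

    // Identity mapper: passes the input key through unchanged.
    static Object identityMap(Object inputKey) {
        return inputKey;
    }

    // Runs the check and reports "ok" or the error message.
    static String check(Object mapOutputKey, Class<?> expectedKeyClass) {
        try {
            collect(mapOutputKey, expectedKeyClass);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Long offset = 0L; // line offset, the kind of key a text input format supplies
        // No mapper configured: the identity mapper emits the Long key, which is rejected.
        System.out.println(check(identityMap(offset), String.class));
        // Real mapper configured: it emits a String key (the year), which is accepted.
        System.out.println(check("1901", String.class));
    }
}
```

In the real driver, the fix corresponding to the second case is to register your classes on the job with setMapperClass and setReducerClass.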
You should set the following property in job.xml
<description>The full class name of the InputFormat class to be used for obtaining the input to the mapper.</description>

Not getting correct output when running standard "WordCount" program using Hadoop0.20.2

I'm new to Hadoop. I have been trying to run the famous "WordCount" program, which counts the total number of words in a list of files, using Hadoop 0.20.2.
I'm using a single-node cluster.
This is my program:
import java.io.File;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            while (values.hasNext()) {
                ++sum ;
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
Suppose input file is A.txt which has following contents
When I run this program using hadoop-0.20.2 (not showing the commands for the sake of brevity), the output is:
A 1
A 1
B 1
B 1
C 1
C 1
D 1
D 1
which is wrong. The expected output is:
A 2
B 2
C 2
D 2
This "WordCount" program is pretty standard program. I'm not sure what is wrong with this code.
I have written the contents of all configuration files like mapred-site.xml , core-site.xml etc correctly.
How can I fix this problem?
This code actually runs a local MapReduce job. If you want to submit it to a real cluster, you have to provide the fs.default.name and mapred.job.tracker configuration parameters. Each of these keys maps to a host:port pair on your machine, just like in your core-site.xml and mapred-site.xml.
Make sure your data is available in HDFS and not on local disk, and reduce the number of reducers. You have about 2 records per reducer, so you should set this to 1.
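The reduce signature is incorrect: the second parameter must be of type Iterable<IntWritable>, not Iterator<IntWritable>. With Iterator, the method does not override Reducer.reduce(), so the default identity reduce runs and each (word, 1) pair is written out unchanged.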
See also Using Hadoop for the First Time, MapReduce Job does not run Reduce Phase
