Error on map reduce example of Hadoop 2.2.0 - hadoop

I am new to hadoop and after installing Hadoop 2.2.0 I tried to follow example http://www.srccodes.com/p/article/45/run-hadoop-wordcount-mapreduce-example-windows to try a simple map reduce job.
However whenever I try to do the map reduce job over the txt file I created, I keep getting failures with this message
c:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.ja
r wordcount /input output
14/03/26 14:20:48 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0
:8032
14/03/26 14:20:50 INFO input.FileInputFormat: Total input paths to process : 1
14/03/26 14:20:51 INFO mapreduce.JobSubmitter: number of splits:1
14/03/26 14:20:51 INFO Configuration.deprecation: user.name is deprecated. Inste
ad, use mapreduce.job.user.name
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.jar is deprecated. Inst
ead, use mapreduce.job.jar
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.value.class is d
eprecated. Instead, use mapreduce.job.output.value.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.combine.class is dep
recated. Instead, use mapreduce.job.combine.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.map.class is depreca
ted. Instead, use mapreduce.job.map.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.job.name is deprecated.
Instead, use mapreduce.job.name
14/03/26 14:20:51 INFO Configuration.deprecation: mapreduce.reduce.class is depr
ecated. Instead, use mapreduce.job.reduce.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.input.dir is deprecated
. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.dir is deprecate
d. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated
. Instead, use mapreduce.job.maps
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.output.key.class is dep
recated. Instead, use mapreduce.job.output.key.class
14/03/26 14:20:51 INFO Configuration.deprecation: mapred.working.dir is deprecat
ed. Instead, use mapreduce.job.working.dir
14/03/26 14:20:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_13
95833928952_0004
14/03/26 14:20:52 INFO impl.YarnClientImpl: Submitted application application_13
95833928952_0004 to ResourceManager at /0.0.0.0:8032
14/03/26 14:20:52 INFO mapreduce.Job: The url to track the job: http://GoncaloPe
reira:8088/proxy/application_1395833928952_0004/
14/03/26 14:20:52 INFO mapreduce.Job: Running job: job_1395833928952_0004
14/03/26 14:21:08 INFO mapreduce.Job: Job job_1395833928952_0004 running in uber
mode : false
14/03/26 14:21:08 INFO mapreduce.Job: map 0% reduce 0%
14/03/26 14:21:20 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:21:33 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_1, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:21:48 INFO mapreduce.Job: Task Id : attempt_1395833928952_0004_m_000
000_2, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileS
plit cannot be cast to org.apache.hadoop.mapred.InputSplit
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/03/26 14:22:04 INFO mapreduce.Job: map 100% reduce 100%
14/03/26 14:22:10 INFO mapreduce.Job: Job job_1395833928952_0004 failed with sta
te FAILED due to: Task failed task_1395833928952_0004_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/03/26 14:22:10 INFO mapreduce.Job: Counters: 6
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=48786
Total time spent by all reduces in occupied slots (ms)=0
Since I followed everything with no issues step by step I have no idea why this might be, does anyone know?
Edit: Tried adopt 2.3.0 same issue happens with the example jar given, and the code bellow I tried compile, no idea what the issue is
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class teste {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(teste.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}

I had the same issue (java.lang.ClassCastException) and was able to solve it by running Hadoop with admin privileges. The problem seems to be the creation of symbolic links which by default is not possible for non-admin Windows users. Open a console as administrator and then proceed as described in the example from your link.

link you provided has input perameter as input NOT /input...try with this syntax...
C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
if this doesn't work than see this - Link and modify the mapper class.

Related

Hadoop - Mappers not emitting anything

I'm running the code below and no output is generated (well, the output folder and the reducer output file are created, but there is nothing wihtin the part-r-00000 file). From the logs, I suspect the mappers are not emitting anything.
The code:
package com.telefonica.iot.tidoop.mrlib;
import com.telefonica.iot.tidoop.mrlib.utils.Constants;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
public class Count extends Configured implements Tool {
private static final Logger LOGGER = Logger.getLogger(Count.class);
public static class UnitEmitter extends Mapper<Object, Text, Text, LongWritable> {
private final Text commonKey = new Text("common-key");
#Override
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
context.write(commonKey, new LongWritable(1));
} // map
} // UnitEmitter
public static class Adder extends Reducer<Text, LongWritable, Text, LongWritable> {
#Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(key, new LongWritable(sum));
} // reduce
} // Adder
public static class AdderWithTag extends Reducer<Text, LongWritable, Text, LongWritable> {
private String tag;
#Override
public void setup(Context context) throws IOException, InterruptedException {
tag = context.getConfiguration().get(Constants.PARAM_TAG, "");
} // setup
#Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
throws IOException, InterruptedException {
long sum = 0;
for (LongWritable value : values) {
sum += value.get();
} // for
context.write(new Text(tag), new LongWritable(sum));
} // reduce
} // AdderWithTag
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new Filter(), args);
System.exit(res);
} // main
#Override
public int run(String[] args) throws Exception {
// check the number of arguments, show the usage if it is wrong
if (args.length != 3) {
showUsage();
return -1;
} // if
// get the arguments
String input = args[0];
String output = args[1];
String tag = args[2];
// create and configure a MapReduce job
Configuration conf = this.getConf();
conf.set(Constants.PARAM_TAG, tag);
Job job = Job.getInstance(conf, "tidoop-mr-lib-count");
job.setNumReduceTasks(1);
job.setJarByClass(Count.class);
job.setMapperClass(UnitEmitter.class);
job.setCombinerClass(Adder.class);
job.setReducerClass(AdderWithTag.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));
// run the MapReduce job
return job.waitForCompletion(true) ? 0 : 1;
} // main
private void showUsage() {
System.out.println("...");
} // showUsage
} // Count
The command executed, and the output logs:
$ hadoop jar target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar com.telefonica.iot.tidoop.mrlib.Count -libjars target/tidoop-mr-lib-0.0.0-SNAPSHOT-jar-with-dependencies.jar tidoop/numbers tidoop/numbers_count onetag
15/11/05 17:24:52 INFO input.FileInputFormat: Total input paths to process : 1
15/11/05 17:24:52 WARN snappy.LoadSnappy: Snappy native library is available
15/11/05 17:24:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/11/05 17:24:53 INFO snappy.LoadSnappy: Snappy native library loaded
15/11/05 17:24:53 INFO mapred.JobClient: Running job: job_201507101501_23002
15/11/05 17:24:54 INFO mapred.JobClient: map 0% reduce 0%
15/11/05 17:25:00 INFO mapred.JobClient: map 100% reduce 0%
15/11/05 17:25:07 INFO mapred.JobClient: map 100% reduce 33%
15/11/05 17:25:08 INFO mapred.JobClient: map 100% reduce 100%
15/11/05 17:25:09 INFO mapred.JobClient: Job complete: job_201507101501_23002
15/11/05 17:25:09 INFO mapred.JobClient: Counters: 25
15/11/05 17:25:09 INFO mapred.JobClient: Job Counters
15/11/05 17:25:09 INFO mapred.JobClient: Launched reduce tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5350
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/11/05 17:25:09 INFO mapred.JobClient: Rack-local map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: Launched map tasks=1
15/11/05 17:25:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8702
15/11/05 17:25:09 INFO mapred.JobClient: FileSystemCounters
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_READ=6
15/11/05 17:25:09 INFO mapred.JobClient: HDFS_BYTES_READ=1968928
15/11/05 17:25:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=108226
15/11/05 17:25:09 INFO mapred.JobClient: Map-Reduce Framework
15/11/05 17:25:09 INFO mapred.JobClient: Map input records=598001
15/11/05 17:25:09 INFO mapred.JobClient: Reduce shuffle bytes=6
15/11/05 17:25:09 INFO mapred.JobClient: Spilled Records=0
15/11/05 17:25:09 INFO mapred.JobClient: Map output bytes=0
15/11/05 17:25:09 INFO mapred.JobClient: CPU time spent (ms)=2920
15/11/05 17:25:09 INFO mapred.JobClient: Total committed heap usage (bytes)=355663872
15/11/05 17:25:09 INFO mapred.JobClient: Combine input records=0
15/11/05 17:25:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=124
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input records=0
15/11/05 17:25:09 INFO mapred.JobClient: Reduce input groups=0
15/11/05 17:25:09 INFO mapred.JobClient: Combine output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=328683520
15/11/05 17:25:09 INFO mapred.JobClient: Reduce output records=0
15/11/05 17:25:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1466642432
15/11/05 17:25:09 INFO mapred.JobClient: Map output records=0
The content of the output file:
$ hadoop fs -cat /user/frb/tidoop/numbers_count/part-r-00000
[frb#cosmosmaster-gi tidoop-mr-lib]$ hadoop fs -ls /user/frb/tidoop/numbers_count/
Found 3 items
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/_SUCCESS
drwxr----- - frb frb 0 2015-11-05 17:24 /user/frb/tidoop/numbers_count/_logs
-rw-r--r-- 3 frb frb 0 2015-11-05 17:25 /user/frb/tidoop/numbers_count/part-r-00000
Any hints about what is happening?
Weird. I'd try using Mapper (identity mapper) with your job.
If the Mapper does not output anything there must be something weird with your hadoop installation, or job configuration.

Custom Partitioning gives ArrayIndexOuntOfBounds Error

When I run my code, I get the following exception:
hadoop#hadoop:~/testPrograms$ hadoop jar cp.jar CustomPartition /test/test.txt /test/output33
15/03/03 16:33:33 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/03/03 16:33:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/03/03 16:33:33 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/03 16:33:33 INFO input.FileInputFormat: Total input paths to process : 1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: number of splits:1
15/03/03 16:33:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/03/03 16:33:35 INFO mapreduce.Job: Running job: job_local1055584612_0001
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/03/03 16:33:35 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/03 16:33:35 INFO mapred.LocalJobRunner: Starting task: attempt_local1055584612_0001_m_000000_0
15/03/03 16:33:35 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/03/03 16:33:35 INFO mapred.MapTask: Processing split: hdfs://node1/test/test.txt:0+107
15/03/03 16:33:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/03/03 16:33:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/03/03 16:33:35 INFO mapred.MapTask: soft limit at 83886080
15/03/03 16:33:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/03/03 16:33:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/03/03 16:33:35 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/03/03 16:33:35 INFO mapred.MapTask: Starting flush of map output
15/03/03 16:33:35 INFO mapred.LocalJobRunner: map task executor complete.
15/03/03 16:33:35 WARN mapred.LocalJobRunner: job_local1055584612_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at CustomPartition$MapperClass.map(CustomPartition.java:27)
at CustomPartition$MapperClass.map(CustomPartition.java:17)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 running in uber mode : false
15/03/03 16:33:36 INFO mapreduce.Job: map 0% reduce 0%
15/03/03 16:33:36 INFO mapreduce.Job: Job job_local1055584612_0001 failed with state FAILED due to: NA
15/03/03 16:33:36 INFO mapreduce.Job: Counters: 0
I am trying to partition based on the game the persons play. Each word is separated by a tab. And after the three fields, I got the next line by pressing the return key.
My code:
public class CustomPartition {
public static class MapperClass extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String itr[] = value.toString().split("\t");
String game=itr[2].toString();
String nameGoals=itr[0]+"\t"+itr[1];
context.write(new Text(game), new Text(nameGoals));
}
}
public static class GoalPartition extends Partitioner<Text, Text> {
#Override
public int getPartition(Text key,Text value, int numReduceTasks){
if(key.toString()=="football")
{return 0;}
else if(key.toString()=="basketball")
{return 1;}
else// (key.toString()=="icehockey")
{return 2;}
}
}
public static class ReducerClass extends Reducer<Text,Text,Text,Text> {
#Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String name="";
String game="";
int maxGoals=0;
for (Text val : values)
{
String valTokens[]= val.toString().split("\t");
int goals = Integer.parseInt(valTokens[1]);
if(goals > maxGoals)
{
name = valTokens[0];
game = key.toString();
maxGoals = goals;
context.write(new Text(name), new Text ("game"+game+"score"+maxGoals));
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "custom partition");
job.setJarByClass(CustomPartition.class);
job.setMapperClass(MapperClass.class);
job.setCombinerClass(ReducerClass.class);
job.setPartitionerClass(GoalPartition.class);
job.setReducerClass(ReducerClass.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

hadoop yarn class not found in same jar but different package during run job

I'm following example of the book "Hadoop: The Definitive Guide 2/e".
I encounter a problem..:-(.
I used ubuntu 12.04, hadoop 2.2.0.
I made the job.jar using eclipse.
The class map_reduce.programming.v1.MaxTemperatureReducer is in jar but different package.
when i run job, i encounter class not found exception.
Below is mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
If change the value to local not yarn, it worked. but in case of yarn, not worked.
The HADOOP_CLASS_PATH include the path which including job.jar.
What is the root cause?
package map_reduce.programming.v3;
import map_reduce.programming.v1.MaxTemperatureReducer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* hadoop map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-local.xml /book/input/ncdc/micro max-temp
* hadoop jar job.jar map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml /book/input/ncdc/all max-temp
*
*
*/
public class MaxTemperatureDriver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s [generic optins] <input> <output>\n", getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
JobConf conf = new JobConf(getConf(), getClass());
conf.setJobName("Max temperature");
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(MaxTemperatureMapper.class);
conf.setCombinerClass(MaxTemperatureReducer.class);
conf.setReducerClass(MaxTemperatureReducer.class);
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
System.exit(exitCode);
}
}
Below is logs..
jar job.jar map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml /book/input/ncdc/all max-temp0991
14/06/05 18:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/05 18:10:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:20 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/06/05 18:10:20 INFO mapred.FileInputFormat: Total input paths to process : 2
14/06/05 18:10:21 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 18:10:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/06/05 18:10:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401958773644_0002
14/06/05 18:10:22 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/06/05 18:10:22 INFO impl.YarnClientImpl: Submitted application
application_1401958773644_0002 to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:22 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1401958773644_0002/
14/06/05 18:10:22 INFO mapreduce.Job: Running job: job_1401958773644_0002
14/06/05 18:10:27 INFO mapreduce.Job: Job job_1401958773644_0002 running in uber mode : false
14/06/05 18:10:27 INFO mapreduce.Job: map 0% reduce 0%
14/06/05 18:10:30 INFO mapreduce.Job: Task Id : attempt_1401958773644_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class map_reduce.programming.v1.MaxTemperatureReducer not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1139)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1517)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1010)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
I had the exact same problem, with the Job configuration not finding the Mapper and Reducer classes and throwing a ClassNotFoundException.
I'm using mapreduce2 so I had to add
job.setJarByClass(Avg.class);
while in your case, I think you should call
JobConf conf = new JobConf(getConf(), YouClassMain.class);
Best,
Edoardo

Accessing Hive Table Data with MapReduce

In single node installation of Hadoop 2.2, I am trying to run Cloudera example "Accessing Table Data with MapReduce" that copies data from one table to another:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_19_6.html
Example code compiles with numerous deprecation warnings (see below).
Before running this example from Eclipse, I create input table 'simple' in Hive default DB. I pass input 'simple' and output 'simpid' tables on a command line. Notwithstanding input table already exists in default DB, when I run this code I get exception:
java.io.IOException: NoSuchObjectException(message:default.simple table not found
Questions:
1) Why does "table not found" exception happen? How to solve this?
2) How does deprecated HCatRecord, HCatSchema, HCatBaseInputFormat in this example translate to a latest, stable API?
package com.bigdata;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hcatalog.mapreduce.*;
import org.apache.hcatalog.data.*;
import org.apache.hcatalog.data.schema.*;
public class UseHCat extends Configured implements Tool {
public static class Map extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
String groupname;
#Override
protected void map( WritableComparable key,
HCatRecord value,
org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord,
Text, IntWritable>.Context context)
throws IOException, InterruptedException {
// The group table from /etc/group has name, 'x', id
groupname = (String) value.get(0);
int id = (Integer) value.get(1);
// Just select and emit the name and ID
context.write(new Text(groupname), new IntWritable(id));
}
}
public static class Reduce extends Reducer<Text, IntWritable,
WritableComparable, HCatRecord> {
protected void reduce( Text key,
java.lang.Iterable<IntWritable> values,
org.apache.hadoop.mapreduce.Reducer<Text, IntWritable,
WritableComparable, HCatRecord>.Context context)
throws IOException, InterruptedException {
// Only expecting one ID per group name
Iterator<IntWritable> iter = values.iterator();
IntWritable iw = iter.next();
int id = iw.get();
// Emit the group name and ID as a record
HCatRecord record = new DefaultHCatRecord(2);
record.set(0, key.toString());
record.set(1, id);
context.write(null, record);
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
args = new GenericOptionsParser(conf, args).getRemainingArgs();
// Get the input and output table names as arguments
String inputTableName = args[0];
String outputTableName = args[1];
// Assume the default database
String dbName = null;
Job job = new Job(conf, "UseHCat");
HCatInputFormat.setInput(job, InputJobInfo.create(dbName,
inputTableName, null));
job.setJarByClass(UseHCat.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
// An HCatalog record as input
job.setInputFormatClass(HCatInputFormat.class);
// Mapper emits a string as key and an integer as value
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Ignore the key for the reducer output; emitting an HCatalog record as value
job.setOutputKeyClass(WritableComparable.class);
job.setOutputValueClass(DefaultHCatRecord.class);
job.setOutputFormatClass(HCatOutputFormat.class);
HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName,
outputTableName, null));
HCatSchema s = HCatOutputFormat.getTableSchema(job);
System.err.println("INFO: output schema explicitly set for writing:" + s);
HCatOutputFormat.setSchema(job, s);
return (job.waitForCompletion(true) ? 0 : 1);
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new UseHCat(), args);
System.exit(exitCode);
}
}
When I run this on a single-node Hadoop 2.2 I get the following exception:
14/03/05 15:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/05 15:17:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/03/05 15:17:22 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/03/05 15:17:22 INFO metastore.ObjectStore: ObjectStore, initialize called
14/03/05 15:17:23 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/03/05 15:17:24 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/03/05 15:17:25 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/03/05 15:17:25 INFO metastore.ObjectStore: Initialized ObjectStore
14/03/05 15:17:27 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_database: NonExistentDatabaseUsedForHealthCheck
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=get_database: NonExistentDatabaseUsedForHealthCheck
14/03/05 15:17:27 ERROR metastore.RetryingHMSHandler: NoSuchObjectException(message:There is no database named nonexistentdatabaseusedforhealthcheck)
at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at com.sun.proxy.$Proxy6.getDatabase(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
at com.sun.proxy.$Proxy7.get_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
at org.apache.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:277)
at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:147)
at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:547)
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48)
at com.bigdata.UseHCat.run(UseHCat.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.bigdata.UseHCat.main(UseHCat.java:91)
14/03/05 15:17:27 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=simple
14/03/05 15:17:27 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=get_table : db=default tbl=simple
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/03/05 15:17:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.io.IOException: NoSuchObjectException(message:default.simple table not found)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:89)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:56)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:48)
at com.bigdata.UseHCat.run(UseHCat.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.bigdata.UseHCat.main(UseHCat.java:91)
Caused by: NoSuchObjectException(message:default.simple table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1373)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
at com.sun.proxy.$Proxy7.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:854)
at org.apache.hcatalog.common.HCatUtil.getTable(HCatUtil.java:193)
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
... 6 more
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Shutting down the object store...
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=Shutting down the object store...
14/03/05 15:17:29 INFO metastore.HiveMetaStore: 1: Metastore shutdown complete.
14/03/05 15:17:29 INFO HiveMetaStore.audit: ugi=dk ip=unknown-ip-addr cmd=Metastore shutdown complete.
It looks like you don't have hive-site.xml in your eclipse classpath. Hive looks for the metastore server address in your configuration. When it can't find the address, it creates or loads an embedded metastore. This metastore, of course, does not have the table you created.
Edit (In answer to your comment):
Yes you can get values by key, but in order to do that you need the HCatSchema for the table. Do this in the map setup phase...
HCatSchema schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration);
And in the map phase...
value.get('field', schema);

Hadoop Word Count working but not summing words up

I'm using Hadoop 1.2.1 and for some reason my Word Count output looks strange:
input file:
this is sparta this was sparta hello world goodbye world
output in hdfs:
goodbye 1
hello 1
is 1
sparta 1
sparta 1
this 1
this 1
was 1
world 1
world 1
code:
public class WordCount {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
And here's some relevant console output:
14/01/04 16:17:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/04 16:17:37 INFO input.FileInputFormat: Total input paths to process : 1
14/01/04 16:17:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 16:17:37 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/04 16:17:38 INFO mapred.JobClient: Running job: job_201401041506_0013
14/01/04 16:17:39 INFO mapred.JobClient: map 0% reduce 0%
14/01/04 16:17:45 INFO mapred.JobClient: map 100% reduce 0%
14/01/04 16:17:52 INFO mapred.JobClient: map 100% reduce 33%
14/01/04 16:17:54 INFO mapred.JobClient: map 100% reduce 100%
14/01/04 16:17:55 INFO mapred.JobClient: Job complete: job_201401041506_0013
14/01/04 16:17:55 INFO mapred.JobClient: Counters: 26
14/01/04 16:17:55 INFO mapred.JobClient: Job Counters
14/01/04 16:17:55 INFO mapred.JobClient: Launched reduce tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6007
14/01/04 16:17:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/04 16:17:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/04 16:17:55 INFO mapred.JobClient: Launched map tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: Data-local map tasks=1
14/01/04 16:17:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9167
14/01/04 16:17:55 INFO mapred.JobClient: File Output Format Counters
14/01/04 16:17:55 INFO mapred.JobClient: Bytes Written=77
14/01/04 16:17:55 INFO mapred.JobClient: FileSystemCounters
14/01/04 16:17:55 INFO mapred.JobClient: FILE_BYTES_READ=123
14/01/04 16:17:55 INFO mapred.JobClient: HDFS_BYTES_READ=169
14/01/04 16:17:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=122037
14/01/04 16:17:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=77
14/01/04 16:17:55 INFO mapred.JobClient: File Input Format Counters
14/01/04 16:17:55 INFO mapred.JobClient: Bytes Read=57
14/01/04 16:17:55 INFO mapred.JobClient: Map-Reduce Framework
14/01/04 16:17:55 INFO mapred.JobClient: Map output materialized bytes=123
14/01/04 16:17:55 INFO mapred.JobClient: Map input records=10
14/01/04 16:17:55 INFO mapred.JobClient: Reduce shuffle bytes=123
14/01/04 16:17:55 INFO mapred.JobClient: Spilled Records=20
14/01/04 16:17:55 INFO mapred.JobClient: Map output bytes=97
14/01/04 16:17:55 INFO mapred.JobClient: Total committed heap usage (bytes)=269619200
14/01/04 16:17:55 INFO mapred.JobClient: Combine input records=0
14/01/04 16:17:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=112
14/01/04 16:17:55 INFO mapred.JobClient: Reduce input records=10
14/01/04 16:17:55 INFO mapred.JobClient: Reduce input groups=7
14/01/04 16:17:55 INFO mapred.JobClient: Combine output records=0
14/01/04 16:17:55 INFO mapred.JobClient: Reduce output records=10
14/01/04 16:17:55 INFO mapred.JobClient: Map output records=10
What can cause this? I'm very new to Hadoop, so i'm not sure where to look.
Thanks!
You're using an old API signature. In 1.x+ the reduce method changed to use iterables instead of iterator (which was what the old 0.x API used, so you will see iterator in many examples in books and on the web).
http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/Reducer.html#reduce%28KEYIN,%20java.lang.Iterable,%20org.apache.hadoop.mapreduce.Reducer.Context%29
Try
#Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
The #Override annotation tells your compiler to check that your reduce method is overriding the correct method signature in the parent class.

Resources