ArrayIndexOutOfBoundsException in Reducer function of MapReduce - Hadoop

I cannot understand what the bug is. When I remove
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
I get the output, but when I try to use it I get the exception below.
I'm trying to get the output from the reducer in decreasing order of the value, which is why I set the sort comparator class. Please help me out.
package topten.mostviewed.movies;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MostViewdReducer extends Reducer<Text, IntWritable, Text, LongWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable value : values)
        {
            sum = sum + 1;
        }
        context.write(key, new LongWritable(sum));
    }
}
package topten.mostviewed.movies;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MostViewdDriver
{
    // @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2)
        {
            System.err.println("Usage: movie <input> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Movie ");
        job.setJarByClass(MostViewdDriver.class);
        job.setMapperClass(MostviewdMapper.class);
        job.setReducerClass(MostViewdReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
        // job.setSortComparatorClass((Class<? extends RawComparator>) LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The exception I'm getting is below:
18/10/11 11:35:05 INFO mapreduce.Job: Task Id : attempt_1539236679371_0004_r_000000_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 7
at org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:212)
at org.apache.hadoop.io.WritableComparator.readLong(WritableComparator.java:226)
at org.apache.hadoop.io.LongWritable$Comparator.compare(LongWritable.java:91)
at org.apache.hadoop.io.LongWritable$DecreasingComparator.compare(LongWritable.java:106)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:307)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

The sort comparator is applied to your map output keys, and those are Text here, not LongWritable. LongWritable's comparator always reads eight bytes, so on a short serialized Text key it runs past the end of the buffer, which is exactly the ArrayIndexOutOfBoundsException you see. Either remove setSortComparatorClass, or, if you want the results ordered by count, sort in a second job whose map output key is the count (see the sketch below).
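A rough sketch of that second, sort-only pass (the class name SortByCountMapper is made up here, and it assumes the first job's tab-separated text output is read back with KeyValueTextInputFormat so that key = movie and value = count):

package topten.mostviewed.movies;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Swaps (movie, count) to (count, movie) so the shuffle sorts on the count.
public class SortByCountMapper extends Mapper<Text, Text, LongWritable, Text>
{
    @Override
    protected void map(Text movie, Text count, Context context)
            throws IOException, InterruptedException
    {
        context.write(new LongWritable(Long.parseLong(count.toString())), movie);
    }
}

In the second job's driver the comparator then matches the key type:

job2.setInputFormatClass(KeyValueTextInputFormat.class);
job2.setMapOutputKeyClass(LongWritable.class);
job2.setMapOutputValueClass(Text.class);
job2.setSortComparatorClass(LongWritable.DecreasingComparator.class);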

Related

Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable; how to correct?

I am getting the error in the title and I cannot understand how to fix this Map function so that I can build the corresponding Reduce function. Any clue? (The data being processed comes from the Covid-19 .csv file, which tracks every new case in each country, region and so on.)
The task assigned to me is: "Give back the total number of cases per continent", so I thought of creating a key-value pair of the form (key = continent, value = new_cases).
Here is the code, Map class first, then Driver class:
package ContTotCase;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapreduce.*;
public class ContTotCasesMap extends Mapper<Text, LongWritable, Text, LongWritable>
{
    public void Map(Text key, LongWritable value, OutputCollector<Text, LongWritable> output)
            throws IOException, InterruptedException
    {
        String[] query1 = (value.toString()).split(",");
        String parseBase = query1[5];
        int parsed = Integer.parseInt(parseBase.trim());
        if (parsed > 0)
        {
            output.collect(new Text(query1[1]), new LongWritable(parsed));
        }
    }
}
and
package ContTotCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class ContTotCaseDriver {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: ContTotCases <InPath> <OutPath>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "ContTotCases");
        job.setJarByClass(ContTotCasesMap.class);
        job.setMapperClass(ContTotCasesMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Please note that I'm a newbie in Java... Thanks in advance!
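For context: with the default TextInputFormat the framework hands the mapper (LongWritable byte offset, Text line) pairs, and the method has to be the lower-case map(...) taking a Context. Since the posted Map(...) method does not override Mapper.map, the identity mapper runs and the LongWritable offset reaches the output as the key, which is what the error in the title reports. A minimal sketch under those assumptions (hypothetical class name, same column indexes as the split above, not tested against the actual CSV):

package ContTotCase;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: assumes column 1 holds the continent and column 5 holds
// new_cases, matching the indexes used in the original split.
public class ContTotCasesMapSketch extends Mapper<LongWritable, Text, Text, LongWritable>
{
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        String[] query1 = value.toString().split(",");
        if (query1.length < 6)
        {
            return; // skip short or malformed lines
        }
        try
        {
            long parsed = Long.parseLong(query1[5].trim());
            if (parsed > 0)
            {
                context.write(new Text(query1[1]), new LongWritable(parsed));
            }
        }
        catch (NumberFormatException e)
        {
            // skip the header row and non-numeric values
        }
    }
}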

Hadoop, MapReduce Custom Java Counters Exception in thread "main" java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING

Error is:
Exception in thread "main" java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:762)
at com.aamend.hadoop.MapReduce.CountryIncomeConf.main(CountryIncomeConf.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The error shows that the problem lies in the line:
Counter counter =
    job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
I also do have an enum named COUNTERS.
Mapper :
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;
public class CountryIncomeMapper extends Mapper<Object, Text, Text, DoubleWritable> {
    private Logger logger = Logger.getLogger("FilterMapper");
    private final int incomeIndex = 54;
    private final int countryIndex = 0;
    private final int lenIndex = 58;
    String seperator = ",";

    public void map(Object key, Text line, Context context) throws IOException,
            InterruptedException {
        if (line == null) {
            logger.info("null found.");
            context.getCounter(COUNTERS.ERROR_COUNT).increment(1);
            return;
        }
        if (line.toString().contains(
                "Adjusted net national income per capita (current US$)")) {
            String[] recordSplits = line.toString().split(seperator);
            logger.info("The data has been splitted.");
            if (recordSplits.length == lenIndex) {
                String countryName = recordSplits[countryIndex];
                try {
                    double income = Double.parseDouble(recordSplits[incomeIndex]);
                    context.write(new Text(countryName), new DoubleWritable(income));
                } catch (NumberFormatException nfe) {
                    logger.info("The value of income is in wrong format." + countryName);
                    context.getCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT).increment(1);
                    return;
                }
            }
        }
    }
}
Driver Class :
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class CountryIncomeConf {
    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Path inputPath = new Path(args[0]);
        Path outputDir = new Path(args[1]);
        // Create configuration
        Configuration conf = new Configuration(true);
        // Create job
        Job job = new Job(conf, "CountryIncomeConf");
        job.setJarByClass(CountryIncomeConf.class);
        Counter counter =
                job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
        System.out.println("Error Counter = " + counter.getValue());
        // Setup MapReduce
        job.setMapperClass(CountryIncomeMapper.class);
        job.setNumReduceTasks(1);
        // Specify key / value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        // Input
        FileInputFormat.addInputPath(job, inputPath);
        job.setInputFormatClass(TextInputFormat.class);
        // Output
        FileOutputFormat.setOutputPath(job, outputDir);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Delete output if exists
        FileSystem hdfs = FileSystem.get(conf);
        if (hdfs.exists(outputDir))
            hdfs.delete(outputDir, true);
        // Execute job
        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);
    }
}
It looks like you're trying to read the counter before the job has been submitted. Job.getCounters() is only valid once the job is running or finished, so move that call after waitForCompletion().
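Concretely, the counter lookup can simply move below waitForCompletion(true), roughly like this (same COUNTERS enum and variables as in the driver above):

// Submit the job and wait for it to finish first...
int code = job.waitForCompletion(true) ? 0 : 1;

// ...then the counters exist and can be read.
Counter counter =
        job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
System.out.println("Error Counter = " + counter.getValue());

System.exit(code);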
I had the same error during a Sqoop export.
It was caused by the HDFS directory being empty.
Once I populated the directory (which corresponds to a Hive table), the Sqoop export ran without problems.

Reducer is getting skipped and yields only Mapper Result

I have seen some related posts, but in my case everything looks fine, yet when I run my MapReduce job the reducer part gets skipped. I am struggling to get rid of this problem. Can anyone spot the issue and help me get the job to run?
Map Task:
package org.netflix.rating;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class NetflixAvgRatingMap extends Mapper<LongWritable, Text, LongWritable, IntWritable> {
    public void map(LongWritable key, Text value, Context context) {
        String NetflixEntrypattern = "^(\\s*)([0-9]+)(\\s+)([0-9]+)(\\s+)(\\d{1})(\\s+)(.*)";
        Pattern p = Pattern.compile(NetflixEntrypattern);
        Matcher matcher = p.matcher(value.toString());
        if (!matcher.matches()) {
            return;
        }
        Long movie_id = Long.parseLong(matcher.group(4));
        int rating = Integer.parseInt(matcher.group(6));
        try {
            context.write(new LongWritable(movie_id), new IntWritable(rating));
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Reduce Task:
package org.netflix.rating;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;
public class NetflixAvgRatingReducer extends Reducer<LongWritable, IntWritable, LongWritable, FloatWritable> {
    public void reduce(LongWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0, count = 0;
        while (values.iterator().hasNext()) {
            sum += values.iterator().next().get();
            count++;
        }
        float avg = (float) sum / count;
        context.write(key, new FloatWritable(avg));
    }
}
Driver Class:
package org.netflix.rating;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
public class NetflixAvgRating {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Create job
        Job job = new Job(conf, "NetflixAvgRating");
        job.setJarByClass(NetflixAvgRating.class);
        // Setup MapReduce job
        job.setMapperClass(NetflixAvgRatingMap.class);
        job.setReducerClass(NetflixAvgRatingReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(FloatWritable.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Input
        FileInputFormat.addInputPath(job, new Path(args[0]));
        //job.setInputFormatClass(TextInputFormat.class);
        // Output
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        //job.setOutputFormatClass(TextOutputFormat.class);
        // Execute job
        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);
    }
}
I have set all the configuration and arguments correctly, but my reduce task is not getting executed. Any suggestions would be appreciated.

I am trying to add up all the numbers in a file which contains numbers separated by spaces across multiple lines, using MapReduce

My output is wrong. The input file is:
1 2 3 4
5 4 3 2
Output should be key: sum value: 24
Output produced by MapReduce: key: sum value: 34
I am using OpenJDK 7 on Ubuntu 14.04 to run the jar file, whereas the jar was built in Eclipse Juno and compiled with Oracle JDK 7.
NumberDriver.java
package numbersum;
import java.io.*;
//import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
//import org.apache.hadoop.mapreduce.Mapper;
//import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class NumberDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // TODO Auto-generated method stub
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Error");
            System.exit(2);
        }
        Job job = new Job(conf, "number sum");
        job.setJarByClass(NumberDriver.class);
        job.setMapperClass(NumberMapper.class);
        job.setReducerClass(NumberReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
NumberMapper.java
package numbersum;
import java.io.*;
import java.util.StringTokenizer;
//import org.apache.hadoop.conf.Configuration;
//import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
//import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
//import org.apache.hadoop.mapreduce.Reducer;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
//import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//import org.apache.hadoop.util.GenericOptionsParser;
//import org.hsqldb.Tokenizer;
public class NumberMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    int sum;

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens())
        {
            sum += Integer.parseInt(itr.nextToken());
        }
        context.write(new Text("sum"), new IntWritable(sum));
    }
}
NumberReducer.java
package numbersum;
import java.io.*;
//import java.util.StringTokenizer;
//import org.apache.hadoop.conf.Configuration;
//import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
//import org.apache.hadoop.mapreduce.Job;
//import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
//import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//import org.apache.hadoop.util.GenericOptionsParser;
public class NumberReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable value : values)
        {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
My best guess:
int sum; // <-- Why a class member?

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    int sum = 0; // Why not here?
    StringTokenizer itr = new StringTokenizer(value.toString());
Reasoning for the guess:
1st map: 1 + 2 + 3 + 4 = 10
2nd map: (10 +) 5 + 4 + 3 + 2 = 24, and the reducer then adds 10 + 24 = 34
..meaning the previous value is being retained between map calls.
I think you forgot to set sum to 0 at the beginning of the map function:
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    sum = 0;
    ...
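Either way, the point is that sum must start at 0 on every call. As a sketch, the full map method with a local variable looks like this (behavior otherwise unchanged from the NumberMapper above):

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    int sum = 0; // local, so each input line starts from zero
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens())
    {
        sum += Integer.parseInt(itr.nextToken());
    }
    context.write(new Text("sum"), new IntWritable(sum));
}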

Mapper not invoked while using multipleInputFormat

I have a driver class which uses the MultipleInputs class to invoke different mappers at runtime.
However, when I call MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class, CreatePureDeltaMapperOne.class) inside the first for loop, my first mapper (CreatePureDeltaMapperOne) is not invoked. When I comment out that block inside the first for loop and call it outside instead, the mapper class is invoked. Please help me find the issue.
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URISyntaxException;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
/***
 * Creates the pure delta file by matching the history records present in HDFS
 * @author Debajit
 *
 */
public class CreatePureDeltaDriver {
    /**
     * @param args
     * @throws URISyntaxException
     */
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {
        String historyFileInputPath = "";
        String deltaFileDirectoryPath = "";
        String pureDeltaFileOutPath = "";
        Configuration config = new Configuration();
        Job job = new Job(config, "Pure Delta File Creation");
        job.setJarByClass(CreatePureDeltaDriver.class);
        Path historyDirPath = new Path(historyFileInputPath);
        FileSystem fs = FileSystem.get(config);
        FileStatus[] statusHistory = fs.listStatus(historyDirPath);
        for (FileStatus fStatus : statusHistory) {
            String historyFileName = fStatus.getPath().getName();
            if (historyFileName.contains("part-r")) {
                MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class, CreatePureDeltaMapperOne.class);
            }
        }
        Path deltaDirPath = new Path(deltaFileDirectoryPath);
        FileStatus[] statusDelta = fs.listStatus(deltaDirPath);
        for (FileStatus fStatus : statusDelta) {
            String deltaFileName = fStatus.getPath().getName();
            if (deltaFileName.startsWith("part-r")) {
                MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class, CreatePureDeltaMapperTwo.class);
            }
        }
        job.setMapperClass(CreatePureDeltaMapperOne.class);
        job.setMapperClass(CreatePureDeltaMapperTwo.class);
        job.setReducerClass(CreatePureDeltaReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path hisInPath = new Path(historyFileInputPath);
        Path outPath = new Path(pureDeltaFileOutPath);
        //MultipleInputs.addInputPath(job, hisInPath, TextInputFormat.class, CreatePureDeltaMapperOne.class);
        //MultipleInputs.addInputPath(job, delPath, TextInputFormat.class, CreatePureDeltaMapperTwo.class);
        FileOutputFormat.setOutputPath(job, outPath);
        System.out.println(job.waitForCompletion(true));
    }
}
MY MAPPER CLASS
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
public class CreatePureDeltaMapperOne extends Mapper<LongWritable, Text, Text, Text> {
    private Text outKey = new Text();
    private Text outValue = new Text();
    int counter = 0;
    private String delimiter = "";
    private int primaryKeyIndicator = 0;
    private Integer numMapNodes = null;

    public void setup(Context context) throws IOException {
        System.out.println("SETUP--- Mapper 1");
        Configuration config = context.getConfiguration();
        Properties properties = new Properties();
        String propertyDirectory = config.get("propertyDirectory");
        String propertyFileName = config.get("propertyFileName");
        Path propertyDirPath = new Path(propertyDirectory);
        FileSystem fs = FileSystem.get(config);
        FileStatus[] status = fs.listStatus(propertyDirPath);
        for (FileStatus fStatus : status) {
            String propFileName = fStatus.getPath().getName().trim();
            if (propFileName.equals(propertyFileName)) {
                properties.load(new InputStreamReader(fs.open(fStatus.getPath())));
                this.setNumMapNodes(Integer.parseInt(properties.getProperty("num.of.nodes").trim()));
                this.setDelimiter(properties.getProperty("file.delimiter.type").trim());
                this.setPrimaryKeyIndicator(Integer.parseInt(properties.getProperty("file.primary.key.index.specifier").trim()));
            }
        }
    }

    public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException {
        String valueString = val.toString().trim();
        String[] tokens = valueString.split(this.getDelimiter());
        String temp = tokens[this.getPrimaryKeyIndicator()].toString();
        System.out.println(" MAPPER 1 invoked");
        this.setOutKey(new Text(tokens[this.getPrimaryKeyIndicator()].toString().trim())); // Account number
        this.setOutValue(new Text("h" + valueString.trim()));
        context.write(outKey, outValue);
    }
}
Do not use these two lines in your code:
job.setMapperClass(CreatePureDeltaMapperOne.class);
job.setMapperClass(CreatePureDeltaMapperTwo.class);
You already register the mapper for each path through MultipleInputs.addInputPath(...) inside the loops. MultipleInputs installs a DelegatingMapper for the job, and calling job.setMapperClass() afterwards replaces it, so only the last mapper you set runs and the per-path mapping is lost.
Hope it helps.
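For reference, a sketch of the relevant driver section with those overriding calls removed (same variables as in the posted driver; untested against your cluster):

// MultipleInputs registers one mapper per input path and installs its own
// delegating mapper/input format, so do not override them afterwards.
for (FileStatus fStatus : statusHistory) {
    if (fStatus.getPath().getName().contains("part-r")) {
        MultipleInputs.addInputPath(job, fStatus.getPath(),
                TextInputFormat.class, CreatePureDeltaMapperOne.class);
    }
}
for (FileStatus fStatus : statusDelta) {
    if (fStatus.getPath().getName().startsWith("part-r")) {
        MultipleInputs.addInputPath(job, fStatus.getPath(),
                TextInputFormat.class, CreatePureDeltaMapperTwo.class);
    }
}
// No job.setMapperClass(...) and no job.setInputFormatClass(...) here.
job.setReducerClass(CreatePureDeltaReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, outPath);
System.out.println(job.waitForCompletion(true));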
