I have seen some related posts, but in my case everything looks fine, yet when I run my MapReduce job the Reducer part is getting skipped. I have been struggling to get rid of this problem. Could someone please help me figure out the problem and get the job to run?
Map Task:
package org.netflix.rating;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class NetflixAvgRatingMap extends Mapper<LongWritable,Text,LongWritable,IntWritable>{
public void map(LongWritable key,Text value,Context context){
String NetflixEntrypattern="^(\\s*)([0-9]+)(\\s+)([0-9]+)(\\s+)(\\d{1})(\\s+)(.*)";
Pattern p = Pattern.compile(NetflixEntrypattern);
Matcher matcher = p.matcher(value.toString());
if (!matcher.matches()) {
return;
}
Long movie_id=Long.parseLong(matcher.group(4));
int rating=Integer.parseInt(matcher.group(6));
try {
context.write(new LongWritable(movie_id),new IntWritable(rating));
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Reduce Task:
package org.netflix.rating;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;
public class NetflixAvgRatingReducer extends Reducer<LongWritable,IntWritable,LongWritable,FloatWritable> {
public void reduce(LongWritable key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
int sum=0,count=0;
while(values.iterator().hasNext()){
sum+=values.iterator().next().get();
count++;
}
float avg=(float)sum/count;
context.write(key,new FloatWritable(avg));
}
}
Driver Class:
package org.netflix.rating;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
public class NetflixAvgRating {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// Create job
Job job = new Job(conf, "NetflixAvgRating");
job.setJarByClass(NetflixAvgRating.class);
// Setup MapReduce job
job.setMapperClass(NetflixAvgRatingMap.class);
job.setReducerClass(NetflixAvgRatingReducer.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(FloatWritable.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(IntWritable.class);
// Input
FileInputFormat.addInputPath(job, new Path(args[0]));
//job.setInputFormatClass(TextInputFormat.class);
// Output
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//job.setOutputFormatClass(TextOutputFormat.class);
// Execute job
int code = job.waitForCompletion(true) ? 0 : 1;
System.exit(code);
}
}
I have set all the configuration and arguments correctly, but my reduce task is not getting executed any more. Any suggestions would be appreciated.
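As a side note on the reduce method above, the grouped values are usually consumed with a for-each loop rather than repeated values.iterator() calls; a sketch of the equivalent averaging loop:

// Equivalent averaging loop written with the usual for-each idiom
int sum = 0, count = 0;
for (IntWritable value : values) {
    sum += value.get();
    count++;
}
float avg = (float) sum / count;
context.write(key, new FloatWritable(avg));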
Related
I am experiencing the error in the title and I cannot understand how to complete this Map function so that I can then write the corresponding Reduce function. Any clue? (The data to be processed comes from a Covid-19 .csv file that tracks every new case in each country, region and so on.)
The task assigned to me is the following: "Give back the total number of cases per continent", so I thought of creating a key-value pair of the form (key = continent, value = new_cases).
Here is the code, in the order "Map class", "Driver class":
package ContTotCase;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapreduce.*;
public class ContTotCasesMap extends Mapper<Text, LongWritable, Text, LongWritable>
{
public void Map(Text key, LongWritable value, OutputCollector<Text, LongWritable> output) throws IOException, InterruptedException
{
String[] query1 = (value.toString()).split(",");
String parseBase = query1[5];
int parsed = Integer.parseInt(parseBase.trim());
if(parsed > 0)
{
output.collect(new Text(query1[1]), new LongWritable(parsed));
}
}
}
and
package ContTotCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class ContTotCaseDriver {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: ContTotCases <InPath> <OutPath>");
System.exit(2);
}
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ContTotCases");
job.setJarByClass(ContTotCasesMap.class);
job.setMapperClass(ContTotCasesMap.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Please note that I'm a newbie in Java... Thanks in advance!
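For reference, a minimal sketch of a new-API mapper matching the (key = continent, value = new_cases) design described above. It assumes the default TextInputFormat (so the input key is the byte offset and the value is one CSV line) and keeps the column indices from the code above (1 for continent, 5 for new_cases); the class name is illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ContTotCasesMapSketch extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length > 5) {
            try {
                // column 5 assumed to hold new_cases, column 1 the continent, as in the post's code
                long newCases = Long.parseLong(fields[5].trim());
                if (newCases > 0) {
                    context.write(new Text(fields[1]), new LongWritable(newCases));
                }
            } catch (NumberFormatException e) {
                // header row or malformed number: skip the record
            }
        }
    }
}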
I cannot understand what the bug is. When I removed the line
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
I got the output, but when I put it back I get the exception shown below.
I'm trying to get the output from the reducer in decreasing order of the value, hence I have used setSortComparatorClass. Please help me out.
package topten.mostviewed.movies;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MostViewdReducer extends Reducer<Text,IntWritable,Text,LongWritable>
{
public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException
{
int sum = 0;
for(IntWritable value:values)
{
sum = sum+1;
}
context.write(key, new LongWritable(sum));
}
}
package topten.mostviewed.movies;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MostViewdDriver
{
// @SuppressWarnings("unchecked")
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2)
{
System.err.println("Usage: movie <input> <out>");
System.exit(2);
}
Job job = new Job(conf, "Movie ");
job.setJarByClass(MostViewdDriver.class);
job.setMapperClass(MostviewdMapper.class);
job.setReducerClass(MostViewdReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
// job.setSortComparatorClass((Class<? extends RawComparator>) LongWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The exception I'm getting is shown below:
18/10/11 11:35:05 INFO mapreduce.Job: Task Id : attempt_1539236679371_0004_r_000000_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 7
at org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:212)
at org.apache.hadoop.io.WritableComparator.readLong(WritableComparator.java:226)
at org.apache.hadoop.io.LongWritable$Comparator.compare(LongWritable.java:91)
at org.apache.hadoop.io.LongWritable$DecreasingComparator.compare(LongWritable.java:106)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:307)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Your map output keys are ints, but you tried to use a comparator intended for longs. Replace LongWritable.DecreasingComparator.class with IntWritable.DecreasingComparator.class.
Error is:
Exception in thread "main" java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:762)
at com.aamend.hadoop.MapReduce.CountryIncomeConf.main(CountryIncomeConf.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The error shows that the problem lies in the line:
Counter counter =
job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
I also do have an enum with the name COUNTERS.
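(The enum itself is not shown in the post; judging from the counters referenced in the mapper below, something of this shape is assumed:)

// Assumed shape of the COUNTERS enum; only the members referenced in the mapper are listed
public enum COUNTERS {
    ERROR_COUNT,
    MISSING_FIELDS_RECORD_COUNT
}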
Mapper:
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;
public class CountryIncomeMapper extends Mapper<Object, Text, Text, DoubleWritable> {
private Logger logger = Logger.getLogger("FilterMapper");
private final int incomeIndex = 54;
private final int countryIndex = 0;
private final int lenIndex = 58;
String seperator = ",";
public void map(Object key, Text line, Context context) throws IOException,
InterruptedException {
if (line == null) {
logger.info("null found.");
context.getCounter(COUNTERS.ERROR_COUNT).increment(1);
return;
}
if (line.toString().contains(
"Adjusted net national income per capita (current US$)")) {
String[] recordSplits = line.toString().split(seperator);
logger.info("The data has been splitted.");
if (recordSplits.length == lenIndex) {
String countryName = recordSplits[countryIndex];
try {
double income = Double.parseDouble(recordSplits[incomeIndex]);
context.write(new Text(countryName), new DoubleWritable(income));
} catch (NumberFormatException nfe) {
logger.info("The value of income is in wrong format." + countryName);
context.getCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT).increment(1);
return;
}
}
}
}
}
Driver Class:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class CountryIncomeConf {
public static void main(String[] args) throws IOException,
InterruptedException, ClassNotFoundException {
Path inputPath = new Path(args[0]);
Path outputDir = new Path(args[1]);
// Create configuration
Configuration conf = new Configuration(true);
// Create job
Job job = new Job(conf, "CountryIncomeConf");
job.setJarByClass(CountryIncomeConf.class);
Counter counter =
job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
System.out.println("Error Counter = " + counter.getValue());
// Setup MapReduce
job.setMapperClass(CountryIncomeMapper.class);
job.setNumReduceTasks(1);
// Specify key / value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
// Input
FileInputFormat.addInputPath(job, inputPath);
job.setInputFormatClass(TextInputFormat.class);
// Output
FileOutputFormat.setOutputPath(job, outputDir);
job.setOutputFormatClass(TextOutputFormat.class);
// Delete output if exists
FileSystem hdfs = FileSystem.get(conf);
if (hdfs.exists(outputDir))
hdfs.delete(outputDir, true);
// Execute job
int code = job.waitForCompletion(true) ? 0 : 1;
System.exit(code);
}
}
Looks like you're trying to read the counter before you submitted the job; counters are only available once the job has actually run.
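A minimal sketch of the driver's tail end with the counter read moved after job completion (same COUNTERS enum that the mapper uses):

// Run the job first, then read the counter from the completed job
int code = job.waitForCompletion(true) ? 0 : 1;
Counter counter =
    job.getCounters().findCounter(COUNTERS.MISSING_FIELDS_RECORD_COUNT);
System.out.println("Missing-field records = " + counter.getValue());
System.exit(code);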
I had the same error during a Sqoop export.
The error was generated because the HDFS directory was empty.
Once I populated the directory (corresponding to a Hive table), the Sqoop export ran without problems.
I have a driver class which uses the MultipleInputs class to invoke different mappers at runtime.
However, when I call MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class, CreatePureDeltaMapperOne.class) in the first for loop, my first mapper (CreatePureDeltaMapperOne) is not getting invoked. When I comment out the block of code that calls MultipleInputs from the first for loop and call it from outside instead, the mapper class is invoked. Please help me find the issue.
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URISyntaxException;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
/***
* Creates the pure delta file by matching the history records present in HDFS
* @author Debajit
*
*/
public class CreatePureDeltaDriver {
/**
* @param args
* @throws URISyntaxException
*/
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {
String historyFileInputPath="";
String deltaFileDirectoryPath="";
String pureDeltaFileOutPath="";
Configuration config= new Configuration();
Job job = new Job(config, "Pure Delta File Creation");
job.setJarByClass(CreatePureDeltaDriver.class);
Path historyDirPath= new Path(historyFileInputPath);
FileSystem fs = FileSystem.get(config);
FileStatus[] statusHistory = fs.listStatus(historyDirPath);
for (FileStatus fStatus : statusHistory) {
String historyFileName=fStatus.getPath().getName();
if(historyFileName.contains("part-r")){
MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class,CreatePureDeltaMapperOne.class);
}
}
Path deltaDirPath= new Path(deltaFileDirectoryPath);
FileStatus[] statusDelta = fs.listStatus(deltaDirPath);
for (FileStatus fStatus : statusDelta) {
String deltaFileName=fStatus.getPath().getName();
if(deltaFileName.startsWith("part-r")){
MultipleInputs.addInputPath(job, fStatus.getPath(), TextInputFormat.class, CreatePureDeltaMapperTwo.class);
}
}
job.setMapperClass(CreatePureDeltaMapperOne.class);
job.setMapperClass(CreatePureDeltaMapperTwo.class);
job.setReducerClass(CreatePureDeltaReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path hisInPath = new Path(historyFileInputPath);
Path outPath = new Path(pureDeltaFileOutPath);
//MultipleInputs.addInputPath(job, hisInPath, TextInputFormat.class, CreatePureDeltaMapperOne.class);
//MultipleInputs.addInputPath(job, delPath, TextInputFormat.class, CreatePureDeltaMapperTwo.class);
FileOutputFormat.setOutputPath(job, outPath);
System.out.println(job.waitForCompletion(true));
}
}
MY MAPPER CLASS
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
public class CreatePureDeltaMapperOne extends Mapper<LongWritable, Text, Text, Text> {
private Text outKey = new Text();
private Text outValue = new Text();
int counter=0;
private String delimiter="";
private int primaryKeyIndicator =0;
private Integer numMapNodes = null;
public void setup(Context context) throws IOException{
System.out.println("SETUP--- Mapper 1");
Configuration config = context.getConfiguration();
Properties properties = new Properties();
String propertyDirectory = config.get("propertyDirectory");
String propertyFileName =config.get("propertyFileName");
Path propertyDirPath= new Path(propertyDirectory);
FileSystem fs = FileSystem.get(config);
FileStatus[] status = fs.listStatus(propertyDirPath);
for (FileStatus fStatus : status) {
String propFileName=fStatus.getPath().getName().trim();
if(propFileName.equals(propertyFileName)){
properties.load(new InputStreamReader(fs.open(fStatus.getPath())));
this.setNumMapNodes(Integer.parseInt(properties.getProperty("num.of.nodes").trim()));
this.setDelimiter(properties.getProperty("file.delimiter.type").trim());
this.setPrimaryKeyIndicator(Integer.parseInt(properties.getProperty("file.primary.key.index.specifier").trim()));
}
}
}
public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException{
String valueString = val.toString().trim();
String[] tokens = valueString.split(this.getDelimiter());
String temp=tokens[this.getPrimaryKeyIndicator()].toString();
System.out.println(" MAPPER 1 invoked");
this.setOutKey(new Text(tokens[this.getPrimaryKeyIndicator()].toString().trim()));//Account number
this.setOutValue(new Text("h"+valueString.trim()));
context.write(outKey,outValue );
}
}
Do not use these two lines in your code:
job.setMapperClass(CreatePureDeltaMapperOne.class);
job.setMapperClass(CreatePureDeltaMapperTwo.class);
You are already passing the corresponding mapper class for each input path via MultipleInputs.addInputPath in the loops, so these calls are unnecessary (and the second call simply overrides the first).
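A minimal sketch of the relevant driver lines under that advice (the two path variables stand in for fStatus.getPath() from the loops above):

// Each input path is bound to its own mapper; no job-wide setMapperClass call
MultipleInputs.addInputPath(job, historyPartFilePath, TextInputFormat.class, CreatePureDeltaMapperOne.class);
MultipleInputs.addInputPath(job, deltaPartFilePath, TextInputFormat.class, CreatePureDeltaMapperTwo.class);
job.setReducerClass(CreatePureDeltaReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, outPath);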
Hope it helps.
I am trying to access the distributed cache file from the mapper and to set each record (string) from the cache file as the key, just to check whether I am getting the contents of the cache file (stop.txt). However, what I am getting as the key is the content of the actual input file (input.txt). Please guide me.
Both the cache file and the input file are in HDFS.
Below is my actual code:
package com.cache;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
public class DistributedCacheTest {
public static class MyMap extends MapReduceBase implements Mapper<LongWritable,Text , Text, IntWritable>{
public Path[] localArchives;
public Path[] localFiles;
BufferedReader cacheReader;
public void configure(JobConf job){
// Get the cached archives/files
try {
localArchives = DistributedCache.getLocalCacheArchives(job);
localFiles = DistributedCache.getLocalCacheFiles(job);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public void map(LongWritable key,Text value ,
OutputCollector<Text, IntWritable> output, Reporter report)
throws IOException {
if (localFiles != null && localFiles.length > 0) {
System.out.println("Inside setup(): "
+ localFiles[0].toString());
String line;
try{
cacheReader = new BufferedReader(new FileReader(localFiles[0].toString()));
while((line=cacheReader.readLine())!=null)
{
System.out.println("**********" + line);
output.collect(new Text(line), new IntWritable(1));
}
}
catch(Exception ex)
{
ex.printStackTrace();
}
finally
{
cacheReader.close();
}
}
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {
if (args.length != 2) {
System.err.println("Usage: MaxTemperature <input path> <output path>");
System.exit(-1);
}
JobConf job =new JobConf(DistributedCacheTest.class);
job.setJobName("DistriTestjob");
DistributedCache.addCacheFile(new URI("/user/hadoop/stop.txt"),job);
job.setMapperClass(MyMap.class);
job.setReducerClass(IdentityReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
JobClient.runJob(job);
}
}
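For comparison, the usual old-API pattern is to read the cache file once in configure() and keep its contents in memory for map() to consult. A rough sketch under the same JobConf setup (the stopWords field name is illustrative):

// Field on the mapper class; filled once per task from the cached stop.txt
private java.util.Set<String> stopWords = new java.util.HashSet<String>();

public void configure(JobConf job) {
    try {
        Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
        if (cacheFiles != null && cacheFiles.length > 0) {
            BufferedReader reader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
            String line;
            while ((line = reader.readLine()) != null) {
                stopWords.add(line.trim());
            }
            reader.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}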