Map-reduce JobConf - Error in adding FileInputFormat - hadoop

I have created a Mapper using the syntax:
public class xyz extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    -----
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
    --
}
In the job, I created a Job object:
Job job = new Job(getConf());
To this job, I am not able to add the Mapper class using:
job.setMapperClass(xyz.class);
error message:
The method setMapperClass(Class<? extends Mapper>) in the type Job is not applicable for the arguments (Class<InvertedIndMap1>)
I cannot use a mapper that extends Mapper, because I am using OutputCollector and Reporter in my mapper.
In the job, if I use JobConf instead of Job, like:
JobConf conf = new JobConf(getConf());
then conf.setMapperClass(xyz.class) works.
But then I am not able to set the input paths using:
FileInputFormat.addInputPaths(conf,new Path(args[0]));
Error message:
The method addInputPaths(Job, String) in the type FileInputFormat is not applicable for the arguments (JobConf, Path)
I tried setInputPaths, setInputPath, and addInputPath, but I get the same error each time. The same error occurs for addOutputPath/setOutputPath.
Please suggest a solution for this issue.

I think the problem is that you have imported the wrong FileInputFormat. I guess that you need to replace
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
with
import org.apache.hadoop.mapred.FileInputFormat;

You are basically mixing the two APIs: mapred (the older one) and mapreduce (the newer one). Stick to only one of them, and if you pick the new API, replace all of the old mapred classes with their mapreduce counterparts.
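For reference, here is a minimal sketch of a driver written purely against the new (mapreduce) API, assuming your mapper InvertedIndMap1 (the class named in the error message) has been rewritten to extend org.apache.hadoop.mapreduce.Mapper and to emit Text/Text pairs; the driver class name and the output path argument are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "inverted index");
        job.setJarByClass(InvertedIndDriver.class);

        // The mapper must extend org.apache.hadoop.mapreduce.Mapper (Context-based),
        // not implement the old org.apache.hadoop.mapred.Mapper interface.
        job.setMapperClass(InvertedIndMap1.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // The new-API FileInputFormat/FileOutputFormat take a Job, not a JobConf.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}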

Related

Could the reducer class not be launched by any chance? Can't see System.out.println statements in the reducer logs

I have a driver class, a mapper class and a reducer class. The MapReduce job runs fine, but the desired output is not produced. I have put System.out.println statements in the reducer. I looked at the logs of the mapper and the reducer: the System.out.println statements I put in the mapper can be seen in the logs, but the println statements in the reducer are not. Could it be that the reducer is not launched at all?
This is the log file from the reducer.
I assume this question is based on the code in your earlier question: mapreduce composite Key sample - doesn't show the desired output
public class CompositeKeyReducer extends Reducer<Country, IntWritable, Country, IntWritable> {
    public void reduce(Country key, Iterator<IntWritable> values, Context context) throws IOException, InterruptedException {
    }
}
The reduce isn't running because the reduce method signature is wrong. You have:
public void reduce(Country key, Iterator<IntWritable> values, Context context)
It should be:
public void reduce(Country key, Iterable<IntWritable> values, Context context)
To make sure this doesn't happen again, you should add the @Override annotation to the reduce method. The compiler will then tell you if you've got the signature wrong.
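Put together, the fixed reducer would look something like the sketch below. The body that sums the counts is an assumption about what the original reducer is meant to do; Country is the custom key class from the linked question.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class CompositeKeyReducer extends Reducer<Country, IntWritable, Country, IntWritable> {

    @Override   // compiles only if this really overrides Reducer.reduce
    public void reduce(Country key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {   // Iterable, not Iterator
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}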
No change in the code; it works now.
All I did was restart my Hadoop Cloudera image, and it works now. I can't believe this happened.

Explanation of mapper program for search in hadoop

I am new to Hadoop, so I am having some difficulty understanding the programs. Could someone help me understand this mapper program?
package SearchTxn;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String Txn = value.toString();
        String TxnParts[] = Txn.split(",");
        Double Amt = Double.parseDouble(TxnParts[3]);
        String Uid = TxnParts[2];
        if (Uid.equals("4000010") && Amt > 100) {
            context.write(null, value);
        }
    }
}
The code basically filters lines in which Uid (TxnParts[2], the third column in your csv) is "4000010" and Amt (I guess short for amount; TxnParts[3], the fourth column) is greater than 100.
To add to the answer from @Thomas Jungblut: the line below declares the Mapper class's overall input and output types. Here nothing (NullWritable) is emitted as a key, but Text is emitted as a value.
public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text>{
The parameters of the write method follow the same pattern.
context.write(null, value);
It is not always necessary to write a key out of the Mapper class. Depending on your use case, either the key, the value, or both can be passed to context.write.
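As a side note, the usual idiom for an absent key is NullWritable.get() rather than a bare null. A sketch of the same map method written that way (the class declaration and imports are unchanged from the question):

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Column positions are kept from the question: TxnParts[2] = Uid, TxnParts[3] = Amt.
        String[] txnParts = value.toString().split(",");
        String uid = txnParts[2];
        double amt = Double.parseDouble(txnParts[3]);
        if (uid.equals("4000010") && amt > 100) {
            context.write(NullWritable.get(), value);   // NullWritable.get() returns the shared singleton
        }
    }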

about context object in map-reduce

Can anyone explain why we write arguments in angle brackets in the statement below, and why we define the output key/value pairs in those arguments?
public static class Map extends Mapper <LongWritable, Text, Text, IntWritable>
What is the Context object, and why are we using it in the statement below?
public void map(LongWritable key, Text value, Context context ) throws IOException, InterruptedException
To add to what @Vasu answered...
Context stores references to RecordReader and RecordWriter.
Whenever context.getCurrentKey() and context.getCurrentValue() are used to retrieve the current key and value pair, the request is delegated to the RecordReader. And when context.write() is called, the call is delegated to the RecordWriter.
Here RecordReader and RecordWriter are actually abstract classes.
<> is used to indicate generics in Java.
Mapper<LongWritable, Text, Text, IntWritable> takes only <LongWritable, Text> as its input key/value pair and emits <Text, IntWritable> as its output key/value pair. If you try to provide any other writable types to your mapper, it will throw an error.
The Context object is used to write output key-value pairs, as well as to access the configuration, counters, cache files, etc. inside the Mapper.
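For illustration, here is the classic word-count mapper (a sketch, not code from the question), showing how the four generic types line up with the map() arguments and with what gets passed to context.write():

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key/value: <LongWritable, Text> = (byte offset of the line, the line itself).
// Output key/value: <Text, IntWritable> = (a word, the count 1).
public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);   // Context hands the pair on to the RecordWriter
        }
    }
}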

not able to access the reduce method in reduce class

public static class Reduce extends Reducer<Text, Text, Text, Text>
{
    private Text txtReduceOutputKey = new Text("");
    private Text txtReduceOutputValue = new Text("");

    public void reduce(Text key, Iterator<Text> values, Context context) throws IOException, InterruptedException
    {
        //some code;
    }
}
It is not giving any error. I am able to access the class, as the variables txtReduceOutputKey and txtReduceOutputValue are initialized, but the reduce method is skipped during execution, so the //some code part of the method above never runs. I am also using the imports below.
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
Any idea how can I fix this?
Please make it sure that you have set the Reducer class in the driver code.
For example:
job.setReducerClass(Base_Reducer.class);
It was because of Iterator: it should be replaced by Iterable.
Thank you.
Such issues usually come down to a couple of reasons, shown together in the sketch below:
Not overriding the reduce() method - it is better to annotate it with @Override so the compiler catches a wrong signature
As mentioned in an earlier answer, forgetting to set the reducer class in the driver code
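A minimal sketch of the fixed reducer for the Text/Text types in this question (the loop body is a placeholder, as in the original):

public static class Reduce extends Reducer<Text, Text, Text, Text> {

    private Text txtReduceOutputKey = new Text("");
    private Text txtReduceOutputValue = new Text("");

    @Override   // the compiler now complains if the signature does not match Reducer.reduce
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            //some code;
        }
    }
}

And in the driver, register it with job.setReducerClass(Reduce.class);.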

Error in using one MapReduce's output as another MapReduce's input

I have two Map/Reduce classes, named MyMapper1/MyReducer1 and MyMapper2/MyReducer2, and I want to use the output of MyReducer1 as the input of MyMapper2, by setting the input path of job2 to the output path of job1.
The types are as follows:
public class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, IntArrayWritable>
public class MyReducer1 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyMapper2 extends Mapper<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyReducer2 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntWritable>
public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
And the code for setting the input/output path is like:
Path temppath = new Path("temp-dir-" + temp_time);
FileOutputFormat.setOutputPath(job1, temppath);
...........
FileInputFormat.addInputPath(job2, temppath);
The code for setting Input/Output format is like:
job1.setOutputFormatClass(TextOutputFormat.class);
..........
job2.setInputFormatClass(KeyValueTextInputFormat.class);
However, I always get this exception when running job2:
11/04/16 12:34:09 WARN mapred.LocalJobRunner: job_local_0002
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
at ligon.MyMapper2.map(MyMapper2.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
I have tried changing the InputFormat and OutputFormat, but with no success; a similar (although not identical) exception still happens in job2.
My complete code package is at:
http://dl.dropbox.com/u/7361939/HW2_Q1.zip
The problem is that in job 2, KeyValueTextInputFormat produces key-value pairs of type <Text, Text>, and you're attempting to process them with a Mapper that accepts <IntWritable, IntArrayWritable>, resulting in a ClassCastException. Your best bet is to change your mapper to accept <Text, Text> and convert the text back to integers.
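If you keep KeyValueTextInputFormat, a hypothetical rework of MyMapper2 could look like the sketch below; how you turn the value text back into an IntArrayWritable depends entirely on how job 1 formatted its output, which is exactly the awkward part of this approach:

public class MyMapper2 extends Mapper<Text, Text, IntWritable, IntArrayWritable> {

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // KeyValueTextInputFormat delivers both key and value as Text, so parse the key back.
        IntWritable parsedKey = new IntWritable(Integer.parseInt(key.toString().trim()));
        // ... rebuild the IntArrayWritable from value.toString() and continue as before
    }
}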
I was facing the same problem and figured out the solution a few moments ago. Since you are using IntArrayWritable as the output of the reducer, it is easy to write and later read the data in binary form.
For the first job:
job1.setOutputFormatClass(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(IntWritable.class);
job1.setOutputValueClass(IntArrayWritable.class);
For the second job:
job2.setInputFormatClass(SequenceFileInputFormat.class);
This should work in your case: a SequenceFile preserves the key and value types, so MyMapper2 can keep its original <IntWritable, IntArrayWritable> input signature.
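For completeness, a sketch of how the two jobs could be wired together around the SequenceFile (a driver fragment only; the mapper/reducer and key/value class setup for each job is omitted, and job1, job2, temppath and temp_time are taken from the snippets above):

Path temppath = new Path("temp-dir-" + temp_time);

// Job 1 writes its <IntWritable, IntArrayWritable> pairs as a binary SequenceFile.
job1.setOutputFormatClass(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(IntWritable.class);
job1.setOutputValueClass(IntArrayWritable.class);
FileOutputFormat.setOutputPath(job1, temppath);
job1.waitForCompletion(true);

// Job 2 reads the same file back with the types intact.
job2.setInputFormatClass(SequenceFileInputFormat.class);
FileInputFormat.addInputPath(job2, temppath);
job2.waitForCompletion(true);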
