Not able to access the reduce method in the Reducer class - Hadoop

public static class Reduce extends Reducer<Text, Text, Text, Text>
{
    private Text txtReduceOutputKey = new Text("");
    private Text txtReduceOutputValue = new Text("");

    public void reduce(Text key, Iterator<Text> values, Context context) throws IOException, InterruptedException
    {
        //some code;
    }
}
It is not giving any error. I am able to access the class, as I am able to initialize the variables txtReduceOutputKey and txtReduceOutputValue, but the reduce method is ignored during execution, so I am not able to run the code (//some code) inside the method above. I am also using the packages below.
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
Any idea how I can fix this?

Please make sure that you have set the Reducer class in the driver code.
For example:
job.setReducerClass(Base_Reducer.class);
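For context, a minimal driver might look like the sketch below; the mapper/reducer class names (Base_Mapper, Base_Reducer), the driver class name, job name, and argument indices are assumptions for illustration, not taken from the question:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "example job");
job.setJarByClass(ExampleDriver.class);       // assumed driver class name
job.setMapperClass(Base_Mapper.class);        // assumed mapper class name
job.setReducerClass(Base_Reducer.class);      // without this, the default identity Reducer runs
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);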

It was because of Iterator: it should be replaced with Iterable. Thank you.
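For reference, the corrected method would look roughly like the sketch below; the loop body is only a placeholder. Adding @Override makes the compiler catch this kind of signature mismatch:

@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
{
    for (Text value : values)
    {
        // process each value here
    }
}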

Such issues usually come down to a couple of reasons:
Not overriding the reduce() method (it is best to annotate it with @Override so the compiler checks the signature)
As mentioned in an earlier answer, forgetting to set the reducer class in the driver code

Related

Instantiate a field level HashMap in JCodeModel

I want to declare and instantiate a HashMap in one go in JCodeModel.
I do:
jc.field(JMod.PRIVATE, HashMap.class, "initAttributes");
which declares it but doesn't instantiate it. How do I instantiate it?
Thanks
In the simplest case, you can just append the initialization directly to your creation of the field:
jc.field(JMod.PRIVATE, HashMap.class, "initAttributes")
.init(JExpr._new(codeModel.ref(HashMap.class)));
Some further hints:
Considering that you should usually program to an interface, it is a good practice to declare the variable using a type that is "as basic as possible". You should hardly ever declare a variable as
private HashMap map;
but basically always only as
private Map map;
because Map is the interface that is relevant here.
You can also add generics in JCodeModel. These usually involve some calls to narrow on certain types. It is a bit more effort, but it will generate code that can be compiled without causing warnings due to the raw types.
An example is shown here. (It uses String as the key type and Integer as the value type of the map. You may adjust this accordingly)
import java.util.HashMap;
import java.util.Map;

import com.sun.codemodel.CodeWriter;
import com.sun.codemodel.JClass;
import com.sun.codemodel.JCodeModel;
import com.sun.codemodel.JDefinedClass;
import com.sun.codemodel.JExpr;
import com.sun.codemodel.JMod;
import com.sun.codemodel.writer.SingleStreamCodeWriter;

public class InitializeFieldInCodeModel
{
    public static void main(String[] args) throws Exception
    {
        JCodeModel codeModel = new JCodeModel();
        JDefinedClass definedClass = codeModel._class("com.example.Example");

        JClass keyType = codeModel.ref(String.class);
        JClass valueType = codeModel.ref(Integer.class);

        JClass mapClass =
            codeModel.ref(Map.class).narrow(keyType, valueType);
        JClass hashMapClass =
            codeModel.ref(HashMap.class).narrow(keyType, valueType);

        definedClass.field(JMod.PRIVATE, mapClass, "initAttributes")
            .init(JExpr._new(hashMapClass));

        CodeWriter codeWriter = new SingleStreamCodeWriter(System.out);
        codeModel.build(codeWriter);
    }
}
The generated class looks as follows:
package com.example;

import java.util.HashMap;
import java.util.Map;

public class Example {
    private Map<String, Integer> initAttributes = new HashMap<String, Integer>();
}

Could the reducer class not be launched by any chance? Can't see System.out.println statements in the reducer logs

I have a driver class, a mapper class and a reducer class. The MapReduce job runs fine, but the desired output is not coming. I have put System.out.println statements in the reducer. I looked at the logs of the mapper and the reducer: the System.out.println statements that I put in the mapper can be seen in the logs, but the println statements in the reducer are not. Could it be possible that the reducer is not launched at all?
This is the log file from the reducer.
I assume this question is based on the code in your earlier question: mapreduce composite Key sample - doesn't show the desired output
public class CompositeKeyReducer extends Reducer<Country, IntWritable, Country, IntWritable> {

    public void reduce(Country key, Iterator<IntWritable> values, Context context) throws IOException, InterruptedException {
    }
}
The reduce isn't running because the reduce method signature is wrong. You have:
public void reduce(Country key, Iterator<IntWritable> values, Context context)
It should be:
public void reduce(Country key, Iterable<IntWritable> values, Context context)
To make sure this doesn't happen again, you should add the @Override annotation to the reduce method. This will tell you if you've got the signature wrong.
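A corrected version would look roughly like the sketch below. The @Override and Iterable parts follow directly from the fix above; the summing body is only an assumption, since the original method body isn't shown:

@Override
public void reduce(Country key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
        sum += value.get();   // assumed: accumulate the counts for this key
    }
    context.write(key, new IntWritable(sum));
}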
No change in the code. It works now.
All I did was restart my Hadoop Cloudera image, and it works now. I can't believe this happened.

Explanation of a mapper program for search in Hadoop

I am new to Hadoop, so I am having a little difficulty understanding the programs. Could someone help me understand this mapper program?
package SearchTxn;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String Txn = value.toString();
        String TxnParts[] = Txn.split(",");
        Double Amt = Double.parseDouble(TxnParts[3]);
        String Uid = TxnParts[2];
        if (Uid.equals("4000010") && Amt > 100) {
            context.write(null, value);
        }
    }
}
The code basically filters lines in which Uid (the field at index 2 of the comma-separated line) is "4000010" and Amt (presumably the amount, the field at index 3) is greater than 100.
Along with the answer from @Thomas Jungblut, the line below declares the Mapper class's overall input and output types. Here nothing (NullWritable) is emitted as the key, and Text is emitted as the value.
public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text>{
The same applies to the parameters of the write method.
context.write(null, value);
It is not always necessary to write a key from the Mapper class. Depending on your use case, either the key, the value, or both can be passed to context.write.
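As a side note (an addition for illustration, not part of the original answers), when the map output key type is NullWritable it is common to emit NullWritable.get() explicitly rather than a raw null, for example:

if (Uid.equals("4000010") && Amt > 100) {
    // same filter as above, but with an explicit NullWritable key
    context.write(NullWritable.get(), value);
}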

Map-reduce JobConf - Error in adding FileInputFormat

I have created a Mapper using the syntax:
public class xyz extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    -----
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
    --
}
In the driver, I created a Job object:
Job job = new Job(getConf());
To this job, I am not able to add the Mapper class using:
job.setMapper(xyz);
error message:
The method setMapperClass(Class<? extends Mapper>) in the type Job is not applicable for the arguments (Class<InvertedIndMap1>)
I cannot use a mapper that extends Mapper (the new API), because I am using OutputCollector and Reporter in the mapper.
In the driver, if I use JobConf instead of Job, like:
JobConf conf = new JobConf(getConf());
then conf.setMapper(xyz) is working.
But then I am not able to set the input paths using:
FileInputFormat.addInputPaths(conf,new Path(args[0]));
Error message:
The method addInputPaths(Job, String) in the type FileInputFormat is not applicable for the arguments (JobConf, Path)
I tried setInputPaths, setInputPath, and addInputPath, but I get the same error again.
The same error occurs for addOutputPath/setOutputPath.
Please suggest a solution for this issue.
I think the problem is that you have imported the wrong FileInputFormat. I guess that you need to replace
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
with
import org.apache.hadoop.mapred.FileInputFormat;
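With that old-API import in place, the JobConf-based driver lines would look roughly like this sketch (the class name xyz and the args indices come from the question; the job name and output types are assumptions):

JobConf conf = new JobConf(getConf());
conf.setJobName("example");                  // assumed job name
conf.setMapperClass(xyz.class);
conf.setOutputKeyClass(Text.class);          // assumed output types
conf.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);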
You are basically mixing the two APIs: mapred (the older one) and mapreduce (the newer one). Try to stick to just one of them, and preferably replace all the old mapred classes with the new mapreduce ones.
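If you go the other way and migrate fully to the new API, the mapper uses a Context instead of OutputCollector/Reporter, and everything comes from the mapreduce packages. A rough sketch, with the emitted key and the output types assumed:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class xyz extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // emit through the Context instead of an OutputCollector; the key here is only a placeholder
        context.write(new Text("someKey"), value);
    }
}

The matching driver lines would then be:

Job job = Job.getInstance(getConf());
job.setJarByClass(xyz.class);
job.setMapperClass(xyz.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));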

Method vs. class-level variables in Hadoop MapReduce

This is a question regarding the performance of Writable variables and object allocation within a MapReduce step. Here is a reducer:
static public class MyReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text val : values) {
            context.write(key, new Text(val));
        }
    }
}
Or is this better performance-wise:
static public class MyReducer extends Reducer<Text, Text, Text, Text> {
    private Text myText = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text val : values) {
            myText.set(val);
            context.write(key, myText);
        }
    }
}
In Hadoop: The Definitive Guide, all the examples use the first form, but I'm not sure whether that is to keep the code samples short or because it is the more idiomatic style.
The book may use the first form because it is more concise. However, it is less efficient: for large input files, that approach creates a large number of short-lived objects, and this excessive object creation slows down your job. Performance-wise, the second approach is preferable.
Some references that discuss this issue:
Tip 7 here,
On Hadoop object re-use, and
This JIRA.
Yes, the second approach is preferable if the reducer has a large amount of data to process. With the first approach, objects keep being created, and cleaning them up is left to the garbage collector.
