Custom Partitioner Error - hadoop

I am writing my own custom Partitioner (old API). Below is the code where I am extending the Partitioner class:
public static class WordPairPartitioner extends Partitioner<WordPair,IntWritable> {
@Override
public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
return wordPair.getWord().hashCode() % numPartitions;
}
}
Setting the JobConf:
conf.setPartitionerClass(WordPairPartitioner.class);
The WordPair class contains:
private Text word;
private Text neighbor;
Questions:
1. I am getting the error: "actual argument class (WordPairPartitioner) cannot convert to Class (? extends Partitioner)".
2. Is this the right way to write a custom partitioner, or do I need to override some other functionality as well?

I believe you are mixing up the old API (classes from org.apache.hadoop.mapred.*) and the new API (classes from org.apache.hadoop.mapreduce.*).
Using the old API, you may do as follows:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;
public static class WordPairPartitioner implements Partitioner<WordPair, IntWritable> {
@Override
public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
return wordPair.getWord().hashCode() % numPartitions;
}
// configure comes from JobConfigurable, which the old-API Partitioner extends; a no-op is fine here
@Override
public void configure(JobConf arg0) {
}
}
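With the old API, the registration shown in the question then compiles as-is, since JobConf.setPartitionerClass expects a Class<? extends org.apache.hadoop.mapred.Partitioner>:
conf.setPartitionerClass(WordPairPartitioner.class);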

In addition to Amar's answer, you should handle the eventuality of hashCode returning a negative number by bit-masking the hash before taking the modulo:
@Override
public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
// Clear the sign bit first so the modulo result is always in [0, numPartitions)
return (wordPair.getWord().hashCode() & 0x7FFFFFFF) % numPartitions;
}
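To see why the mask must be applied before the modulo: in Java the sign of a % b follows the dividend, so a hash of -7 with 5 partitions gives -7 % 5 == -2, an invalid partition number, and masking that result afterwards ((-2) & 0x7FFFFFFF == 2147483646) is still out of range. Masking first gives (-7 & 0x7FFFFFFF) == 2147483641, and 2147483641 % 5 == 1, a valid partition.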

Related

How to implement WritableComparable interface?

I need to use the method setGroupingComparatorClass in Job, and it takes an argument of type WritableComparable.
I am unable to implement the WritableComparable class.
Please help me to solve this. Regards, Bidyut
setGroupingComparatorClass(Class<? extends RawComparator> cls)
Define the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
job.setGroupingComparatorClass(CustomKey.GroupComparator.class);
In your Customkey class you can write a static nested comparator class.
Add the code below to your custom key class.
public class Customkey implements WritableComparable<Customkey> {
public static class GroupComparator extends WritableComparator
implements Serializable {
private static final long serialVersionUID = -3385728040072507941L;
public GroupComparator() {
super(Customkey.class, true);
}
@SuppressWarnings("rawtypes")
@Override
public int compare(WritableComparable a, WritableComparable b) {
Customkey w1 = (Customkey) a;
Customkey w2 = (Customkey) b;
return w1.compareGroup(w2);
}
}
}
Hope this could help you.
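Note that compareGroup is not a Hadoop method; it is assumed here to be a helper you write on Customkey that compares only the field(s) defining the group. A minimal sketch, assuming the grouping is driven by a single Text field with the hypothetical name naturalKey:
public int compareGroup(Customkey other) {
// Compare only the natural (grouping) key; any secondary sort fields
// are ignored so that all records sharing it reach one reduce() call
return this.naturalKey.compareTo(other.naturalKey);
}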

hadoop partitioner not working

public class Partitioner_2 implements Partitioner<Text,Text>{
@Override
public int getPartition(Text key, Text value, int numPartitions) {
int hashValue=0;
for(char c: key.toString().split("\\|\\|")[0].toCharArray()){
hashValue+=(int)c;
}
return Math.abs(hashValue * 127) % numPartitions;
}
}
That is my partitioner code, and the key is in the form "str1||str2". I would like to send all keys that have the same value for str1 to the same reducer.
My GroupComparator and KeyComparator are as follows:
public static class GroupComparator_2 extends WritableComparator {
protected GroupComparator_2() {
super(Text.class, true);
}
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
Text kw1 = (Text) w1;
Text kw2 = (Text) w2;
String k1=kw1.toString().split("||")[0].trim();
String k2=kw2.toString().split("||")[0].trim();
return k1.compareTo(k2);
}
}
public static class KeyComparator_2 extends WritableComparator {
protected KeyComparator_2() {
super(Text.class, true);
}
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
Text key1 = (Text) w1;
Text key2 = (Text) w2;
String kw1_key1=key1.toString().split("||")[0];
String kw1_key2=key2.toString().split("||")[0];
int cmp=kw1_key1.compareTo(kw1_key2);
if(cmp==0){
String kw2_key1=key1.toString().split("||")[1].trim();
String kw2_key2=key2.toString().split("||")[1].trim();
cmp=kw2_key1.compareTo(kw2_key2);
}
return cmp;
}
}
The error I am currently receiving is:
KeywordKeywordCoOccurrence_2.java:92: interface expected here
public class Partitioner_2 implements Partitioner<Text,Text>{
^
KeywordKeywordCoOccurrence_2.java:94: method does not override or implement a method from a supertype
@Override
^
KeywordKeywordCoOccurrence_2.java:147: setPartitionerClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Partitioner>) in org.apache.hadoop.mapreduce.Job cannot be applied to (java.lang.Class<KeywordKeywordCoOccurrence_2.Partitioner_2>)
job.setPartitionerClass(Partitioner_2.class);
But as far as I can tell, I have overridden the getPartition() method, which is the only method in the Partitioner interface. Any help in identifying what I am doing wrong and how to fix it would be much appreciated.
Thanks in advance!
Partitioner is an abstract class in the new mapreduce API (that you're apparently using).
So you should define it as:
public class Partitioner_2 extends Partitioner<Text, Text> {
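Put together, a corrected version (same logic, only the declaration and annotation change) might look like this:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class Partitioner_2 extends Partitioner<Text, Text> {
@Override
public int getPartition(Text key, Text value, int numPartitions) {
// Hash only the part before "||" so all keys sharing str1 go to one reducer
int hashValue = 0;
for (char c : key.toString().split("\\|\\|")[0].toCharArray()) {
hashValue += (int) c;
}
return Math.abs(hashValue * 127) % numPartitions;
}
}
As a side note, the comparators call split("||") without escaping. String.split takes a regular expression, and "||" matches the empty string, so the key gets split into single characters; they should use split("\\|\\|"), as the partitioner already does.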

hadoop compiling

I'm new to Hadoop. I got this code from the net:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class Gender {
private static String genderCheck = "female";
public static class Map extends MapReduceBase implements Mapper {
private final static IntWritable one = new IntWritable(1);
private Text locText = new Text();
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
String line = value.toString();
String location = line.split(",")[14] + "," + line.split(",")[15];
long male = 0L;
long female = 0L;
if (line.split(",")[17].matches("\d+") && line.split(",")[18].matches("\d+")) {
male = Long.parseLong(line.split(",")[17]);
female = Long.parseLong(line.split(",")[18]);
}
long diff = male - female;
locText.set(location);
if (Gender.genderCheck.toLowerCase().equals("female") && diff < 0) {
output.collect(locText, new LongWritable(diff * -1L));
}
else if (Gender.genderCheck.toLowerCase().equals("male") && diff > 0) {
output.collect(locText, new LongWritable(diff));
}
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(Gender.class);
conf.setJobName("gender");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(Map.class);
if (args.length != 3) {
System.out.println("Usage:");
System.out.println("[male/female] /path/to/2kh/files /path/to/output");
System.exit(1);
}
if (!args[0].equalsIgnoreCase("male") && !args[0].equalsIgnoreCase("female")) {
System.out.println("first argument must be male or female");
System.exit(1);
}
Gender.genderCheck = args[0];
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[1]));
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
JobClient.runJob(conf);
}
}
When I compile this code using "javac -cp /usr/local/hadoop/hadoop-core-1.0.3.jar Gender.java",
I get the following error:
"Gender.Map is not abstract and does not override abstract method
map(java.lang.Object,java.lang.Object,org.apache.hadoop.mapred.OutputCollector,org.apache.hadoop.mapred.Reporter)
in org.apache.hadoop.mapred.Mapper
public static class Map extends MapReduceBase implements Mapper "
How can I compile it correctly?
Change the Map class declaration as follows:
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable>
If you do not specify the generic type parameters, the raw Mapper interface expects a map function with plain Object parameters:
@Override
public void map(Object arg0, Object arg1, OutputCollector arg2, Reporter arg3) throws IOException {
// TODO Auto-generated method stub
}
The specific types denote the expected input key-value pair types and the output key-value pair types of the mapper.
In your case the input key-value pair is LongWritable-Text.
And, guessing from your output.collect method calls, your mapper's output key-value pair is Text-LongWritable.
Hence, your Map class should implement Mapper<LongWritable, Text, Text, LongWritable>.
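With those type parameters in place, a sketch of the matching declaration and map signature (the body stays as in the question):
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {
@Override
public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
// ... same body as in the question ...
}
}
Typing the collector as OutputCollector<Text, LongWritable> also lets the compiler check the output.collect(locText, new LongWritable(...)) calls.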
There was one more error in your code:
"\d+" will not compile, because \d is not a valid escape sequence in a Java string literal; the backslash itself must be escaped so the regex engine sees \d. The following should work:
line.split(",")[17].matches("\\d+")
Change the Map class as follows:
public static class Map extends MapReduceBase implements Mapper<InputKey, InputValue, OutputKey, OutputValue>
In your case the input key is LongWritable, the input value is Text, the output key is Text, and the output value is LongWritable:
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable>

Spring InitBinder: bind empty or null values of a float field as 0

I'm just wondering if it's possible to tell an @InitBinder that empty float values in a form should be converted to 0.
I know that float is a primitive data type but I'd still like to convert null or empty values to 0.
If that is possible, how can I achieve that?
Otherwise I'll just make a workaround using a String instead of a float
Define a subclass of CustomNumberEditor as follows:
import org.springframework.beans.propertyeditors.CustomNumberEditor;
import org.springframework.util.StringUtils;
public class MyCustomNumberEditor extends CustomNumberEditor {
public MyCustomNumberEditor(Class<? extends Number> numberClass) throws IllegalArgumentException {
super(numberClass, true);
}
@Override
public void setAsText(String text) throws IllegalArgumentException {
if (!StringUtils.hasText(text)) {
setValue(0);
}else {
super.setAsText(text.trim());
}
}
}
Then, in your controller class, register binders for the various primitive types. (I define this in a BaseController that all my application controllers extend, since I need this behavior for all the numeric primitive types in my application.)
Note that the constructor parameter of MyCustomNumberEditor must be a subclass of Number, not a primitive type.
@InitBinder
public void registerCustomerBinder(WebDataBinder binder) {
binder.registerCustomEditor(double.class, new MyCustomNumberEditor(Double.class));
binder.registerCustomEditor(float.class, new MyCustomNumberEditor(Float.class));
binder.registerCustomEditor(long.class, new MyCustomNumberEditor(Long.class));
binder.registerCustomEditor(int.class, new MyCustomNumberEditor(Integer.class));
....
}
Yes, you could always do that. Spring has CustomNumberEditor, a customizable property editor for any Number subclass such as Integer, Long, Float, or Double. It is registered by default by BeanWrapperImpl, but can be overridden by registering a custom instance of it as a custom editor. That means you could extend it like this:
public class MyCustomNumberEditor extends CustomNumberEditor{
public MyCustomNumberEditor(Class<? extends Number> numberClass, NumberFormat numberFormat, boolean allowEmpty) throws IllegalArgumentException {
super(numberClass, numberFormat, allowEmpty);
}
public MyCustomNumberEditor(Class<? extends Number> numberClass, boolean allowEmpty) throws IllegalArgumentException {
super(numberClass, allowEmpty);
}
@Override
public String getAsText() {
//return super.getAsText();
return "Your desired text";
}
@Override
public void setAsText(String text) throws IllegalArgumentException {
super.setAsText("set your desired text");
}
}
And then register it normally in your controller:
@InitBinder
public void initBinder(WebDataBinder binder) {
binder.registerCustomEditor(Float.class,new MyCustomNumberEditor(Float.class, true));
}
This should do the task.

Can Hadoop mapper produce multiple keys in output?

Can a single Mapper class produce multiple key-value pairs (of same type) in a single run?
We output the key-value pair in the mapper like this:
context.write(key, value);
Here's a trimmed down (and exemplified) version of the Key:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class MyKey extends ObjectWritable implements WritableComparable<MyKey> {
public enum KeyType {
KeyType1,
KeyType2
}
private KeyType keyTupe;
private Long field1;
private Integer field2 = -1;
private String field3 = "";
public KeyType getKeyType() {
return keyTupe;
}
public void settKeyType(KeyType keyType) {
this.keyTupe = keyType;
}
public Long getField1() {
return field1;
}
public void setField1(Long field1) {
this.field1 = field1;
}
public Integer getField2() {
return field2;
}
public void setField2(Integer field2) {
this.field2 = field2;
}
public String getField3() {
return field3;
}
public void setField3(String field3) {
this.field3 = field3;
}
@Override
public void readFields(DataInput datainput) throws IOException {
keyTupe = KeyType.valueOf(datainput.readUTF());
field1 = datainput.readLong();
field2 = datainput.readInt();
field3 = datainput.readUTF();
}
@Override
public void write(DataOutput dataoutput) throws IOException {
dataoutput.writeUTF(keyTupe.toString());
dataoutput.writeLong(field1);
dataoutput.writeInt(field2);
dataoutput.writeUTF(field3);
}
@Override
public int compareTo(MyKey other) {
if (getKeyType().compareTo(other.getKeyType()) != 0) {
return getKeyType().compareTo(other.getKeyType());
} else if (getField1().compareTo(other.getField1()) != 0) {
return getField1().compareTo(other.getField1());
} else if (getField2().compareTo(other.getField2()) != 0) {
return getField2().compareTo(other.getField2());
} else if (getField3().compareTo(other.getField3()) != 0) {
return getField3().compareTo(other.getField3());
} else {
return 0;
}
}
public static class MyKeyComparator extends WritableComparator {
public MyKeyComparator() {
super(MyKey.class);
}
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
return compareBytes(b1, s1, l1, b2, s2, l2);
}
}
static { // register this comparator
WritableComparator.define(MyKey.class, new MyKeyComparator());
}
}
And this is how we tried to output both keys in the Mapper:
MyKey key1 = new MyKey();
key1.settKeyType(KeyType.KeyType1);
key1.setField1(1L);
key1.setField2(23);
MyKey key2 = new MyKey();
key2.settKeyType(KeyType.KeyType2);
key2.setField1(1L);
key2.setField3("abc");
context.write(key1, value1);
context.write(key2, value2);
Our job's output format class is: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
I'm stating this because in other output format classes I've seen write method implementations that do not append the output but just commit it.
Also, we are using the following classes for Mapper and Context:
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Context
Writing to the context multiple times in one map task is perfectly fine.
However, you may have several problems with your key class. Whenever you implement WritableComparable for a key, you should also implement equals(Object) and hashCode() methods. These aren't part of the WritableComparable interface, since they are defined in Object, but you must provide implementations.
The default partitioner uses the hashCode() method to decide which reducer each key/value pair goes to. If you don't provide a sane implementation, you can get strange results.
As a rule of thumb, whenever you implement hashCode() or any sort of comparison method, you should provide an equals(Object) method as well. You will have to make sure it accepts an Object as the parameter, as this is how it is defined in the Object class (whose implementation you are probably overriding).
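For illustration, a minimal sketch of what those methods could look like for the MyKey class above, comparing the same fields as its compareTo (java.util.Objects assumes Java 7+):
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof MyKey)) return false;
MyKey other = (MyKey) o;
// Use the same fields as compareTo so equals and compareTo stay consistent
return keyTupe == other.keyTupe
&& java.util.Objects.equals(field1, other.field1)
&& java.util.Objects.equals(field2, other.field2)
&& java.util.Objects.equals(field3, other.field3);
}
@Override
public int hashCode() {
// The default HashPartitioner routes keys to reducers based on this value
return java.util.Objects.hash(keyTupe, field1, field2, field3);
}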
