hadoop partitioner not working - sorting

public class Partitioner_2 implements Partitioner<Text,Text>{
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        int hashValue = 0;
        for (char c : key.toString().split("\\|\\|")[0].toCharArray()) {
            hashValue += (int) c;
        }
        return Math.abs(hashValue * 127) % numPartitions;
    }
}
That is my partitioner code. The key is of the form:
"str1||str2", and I would like to send all keys that share the same value for str1 to the same reducer.
My GroupComparator and KeyComparator are as follows:
public static class GroupComparator_2 extends WritableComparator {
    protected GroupComparator_2() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        Text kw1 = (Text) w1;
        Text kw2 = (Text) w2;
        String k1 = kw1.toString().split("\\|\\|")[0].trim();
        String k2 = kw2.toString().split("\\|\\|")[0].trim();
        return k1.compareTo(k2);
    }
}
public static class KeyComparator_2 extends WritableComparator {
    protected KeyComparator_2() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        Text key1 = (Text) w1;
        Text key2 = (Text) w2;
        String kw1_key1 = key1.toString().split("\\|\\|")[0];
        String kw1_key2 = key2.toString().split("\\|\\|")[0];
        int cmp = kw1_key1.compareTo(kw1_key2);
        if (cmp == 0) {
            String kw2_key1 = key1.toString().split("\\|\\|")[1].trim();
            String kw2_key2 = key2.toString().split("\\|\\|")[1].trim();
            cmp = kw2_key1.compareTo(kw2_key2);
        }
        return cmp;
    }
}
The error I am currently receiving is:
KeywordKeywordCoOccurrence_2.java:92: interface expected here
public class Partitioner_2 implements Partitioner<Text,Text>{
                                      ^
KeywordKeywordCoOccurrence_2.java:94: method does not override or implement a method from a supertype
@Override
^
KeywordKeywordCoOccurrence_2.java:147: setPartitionerClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Partitioner>) in org.apache.hadoop.mapreduce.Job cannot be applied to (java.lang.Class<KeywordKeywordCoOccurrence_2.Partitioner_2>)
job.setPartitionerClass(Partitioner_2.class);
But as far as I can tell, I have overridden the getPartition() method, which is the only method in the Partitioner interface. Any help in identifying what I am doing wrong and how to fix it would be much appreciated.
Thanks in advance!

Partitioner is an abstract class in the new mapreduce API (which you're apparently using), not an interface.
So you should define your class as:
public class Partitioner_2 extends Partitioner<Text, Text> {
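For reference, here is a minimal sketch of the corrected class under the new API (the method body is copied unchanged from the question):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Partitioner_2 extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Hash only the part before "||" so all keys sharing str1
        // land on the same reducer
        int hashValue = 0;
        for (char c : key.toString().split("\\|\\|")[0].toCharArray()) {
            hashValue += (int) c;
        }
        return Math.abs(hashValue * 127) % numPartitions;
    }
}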

Related

Spring MVC Validation for list and reporting the invalid value

I have a list of strings that should match a specific format. I need to return an error message identifying the strings that do not match that format. How can I do this with Spring validation (I am using the Hibernate Validator)?
The annotation:
@Documented
@Retention(RUNTIME)
@Target({FIELD, METHOD})
@Constraint(validatedBy = HostsValidator.class)
public @interface HostsConstraint {
    String message();
    Class<?>[] groups() default {};
    Class<? extends Payload>[] payload() default {};
}
The implementation:
public class HostsValidator implements ConstraintValidator<HostsConstraint, List<String>> {
    @Override
    public void initialize(HostsConstraint constraintAnnotation) {
    }

    @Override
    public boolean isValid(List<String> strings, ConstraintValidatorContext context) {
        for (String s : strings) {
            if (!s.matches("[0-9]+")) {
                // How do I say: Invalid string <s> ?
                return false;
            }
        }
        return true;
    }
}
The usage:
public class Test {
    @HostsConstraint(message = "Invalid string ")
    private List<String> hosts;
}
Using validatedValue will give the entire list.
Use JSR 380 (Bean Validation 2.0); it allows container element constraints.
Here is a link to the container element constraints section in the Hibernate Validator 6.0.6.Final documentation.
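For illustration, a minimal sketch of a container element constraint (assuming Hibernate Validator 6+ / Bean Validation 2.0 on the classpath; the regexp mirrors the "[0-9]+" check from the question):

import java.util.List;
import javax.validation.constraints.Pattern;

public class Test {
    // Each element is validated individually; a violation's property path
    // includes the index of the offending element, e.g. hosts[2]
    private List<@Pattern(regexp = "[0-9]+", message = "Invalid string") String> hosts;
}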
I think I found a solution, but it is coupled to Hibernate Validator. It may even be a hacky implementation.
The usage:
public class Test {
    @HostsConstraint(message = "Invalid string : ${invalidStr}")
    private List<String> hosts;
}
The implementation:
public class HostsValidator implements ConstraintValidator<HostsConstraint, List<String>> {
    @Override
    public void initialize(HostsConstraint constraintAnnotation) {}

    @Override
    public boolean isValid(List<String> strings, ConstraintValidatorContext context) {
        for (String s : strings) {
            if (!s.matches("[0-9]+")) {
                // Expose the offending element to the message template as ${invalidStr}
                ConstraintValidatorContextImpl contextImpl =
                        (ConstraintValidatorContextImpl) context
                                .unwrap(HibernateConstraintValidatorContext.class);
                contextImpl.addExpressionVariable("invalidStr", s);
                return false;
            }
        }
        return true;
    }
}

How to implement WritableComparable interface?

I need to use the method setGroupingComparatorClass in Job, and it takes an argument of type WritableComparable.
I am unable to implement the WritableComparable interface.
Please help me solve this. Regards, Bidyut
setGroupingComparatorClass(Class<? extends RawComparator> cls)
defines the comparator that controls which keys are grouped together for a single call to Reducer.reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context):
job.setGroupingComparatorClass(Customkey.GroupComparator.class);
In your Customkey class you can write a static comparator class.
Add the code below to your custom key class:
public class Customkey implements WritableComparable<Customkey> {

    public static class GroupComparator extends WritableComparator
            implements Serializable {
        private static final long serialVersionUID = -3385728040072507941L;

        public GroupComparator() {
            super(Customkey.class, true);
        }

        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            Customkey w1 = (Customkey) a;
            Customkey w2 = (Customkey) b;
            return w1.compareGroup(w2);
        }
    }
}
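Note that compareGroup(...) is not defined by WritableComparator; it is your own method on the key. A possible sketch, assuming the key has a hypothetical Text field named word that identifies the group:

// Inside Customkey: compare only the field that defines the group,
// ignoring any secondary (sort-only) fields. The "word" field is hypothetical.
public int compareGroup(Customkey other) {
    return this.word.compareTo(other.word);
}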
Hope this helps.

Custom Partitioner Error

I am writing my own custom Partitioner (old API). Below is the code where I extend the Partitioner class:
public static class WordPairPartitioner extends Partitioner<WordPair,IntWritable> {
    @Override
    public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
        return wordPair.getWord().hashCode() % numPartitions;
    }
}
Setting the JobConf:
conf.setPartitionerClass(WordPairPartitioner.class);
The WordPair class contains:
private Text word;
private Text neighbor;
Questions:
1. I am getting the error: "actual argument class (WordPairPartitioner) cannot convert to Class (? extends Partitioner)".
2. Is this the right way to write a custom partitioner, or do I need to override some other functionality as well?
I believe you are mixing up the old API (classes from org.apache.hadoop.mapred.*) and the new API (classes from org.apache.hadoop.mapreduce.*).
Using the old API, you may do as follows:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public static class WordPairPartitioner implements Partitioner<WordPair,IntWritable> {
    @Override
    public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
        return wordPair.getWord().hashCode() % numPartitions;
    }

    @Override
    public void configure(JobConf arg0) {
    }
}
In addition to Amar's answer, you should handle the eventuality of hashCode() returning a negative number by bit masking. Note that the mask must be applied before the modulo, so the result always falls in [0, numPartitions):
@Override
public int getPartition(WordPair wordPair, IntWritable intWritable, int numPartitions) {
    return (wordPair.getWord().hashCode() & 0x7FFFFFFF) % numPartitions;
}

Spring -Mongodb storing/retrieving enums as int not string

My enums are stored as ints in MongoDB (from a C# app). Now in Java, when I try to retrieve them, an exception is thrown (it seems an enum can be converted from a string value only). Is there any way I can do this?
Also, when I save some collections into MongoDB (from Java), it converts enum values to strings (not their value/ordinal). Is there any override available?
This can be achieved by writing a MongoDB converter at the class level, but I don't want to write a converter for each class, as these enums appear in many different classes.
So is there something at the field level?
After a lot of digging in the spring-mongodb converter code, I finished and now it's working :) Here it is (if there is a simpler solution I would be happy to see it as well; this is what I've done).
First, define:
public interface IntEnumConvertable {
    public int getValue();
}
and a simple enum that implements it :
public enum tester implements IntEnumConvertable{
vali(0),secondvali(1),thirdvali(5);
private final int val;
private tester(int num)
{
val = num;
}
public int getValue(){
return val;
}
}
Ok, now you will need two converters: one simple, the other more complex. The simple one also handles the plain case and returns a string when the cast is not possible, which is great if you want some enums stored as strings while enums that are numbers are stored as integers:
public class IntegerEnumConverters {

    @WritingConverter
    public static class EnumToIntegerConverter implements Converter<Enum<?>, Object> {
        @Override
        public Object convert(Enum<?> source) {
            if (source instanceof IntEnumConvertable) {
                return ((IntEnumConvertable) source).getValue();
            } else {
                return source.name();
            }
        }
    }
}
The more complex one is actually a converter factory:
public class IntegerToEnumConverterFactory implements ConverterFactory<Integer, Enum> {

    @Override
    public <T extends Enum> Converter<Integer, T> getConverter(Class<T> targetType) {
        Class<?> enumType = targetType;
        while (enumType != null && !enumType.isEnum()) {
            enumType = enumType.getSuperclass();
        }
        if (enumType == null) {
            throw new IllegalArgumentException(
                    "The target type " + targetType.getName() + " does not refer to an enum");
        }
        return new IntegerToEnum(enumType);
    }

    @ReadingConverter
    public static class IntegerToEnum<T extends Enum> implements Converter<Integer, Enum> {
        private final Class<T> enumType;

        public IntegerToEnum(Class<T> enumType) {
            this.enumType = enumType;
        }

        @Override
        public Enum convert(Integer source) {
            for (T t : enumType.getEnumConstants()) {
                if (t instanceof IntEnumConvertable) {
                    if (((IntEnumConvertable) t).getValue() == source.intValue()) {
                        return t;
                    }
                }
            }
            return null;
        }
    }
}
And now for the hack part: I personally didn't find any programmatic way to register a converter factory within a MongoConverter, so I dug into the code and, with a little casting, here it is (put these two functions in your @Configuration class):
@Bean
public CustomConversions customConversions() {
    List<Converter<?, ?>> converters = new ArrayList<Converter<?, ?>>();
    converters.add(new IntegerEnumConverters.EnumToIntegerConverter());
    // This is a dummy registration; it's a work-around because spring-mongodb
    // has no option to register a converter factory, so we register the
    // converter that our factory uses.
    converters.add(new IntegerToEnumConverterFactory.IntegerToEnum(null));
    return new CustomConversions(converters);
}
@Bean
public MappingMongoConverter mappingMongoConverter() throws Exception {
    MongoMappingContext mappingContext = new MongoMappingContext();
    mappingContext.setApplicationContext(appContext);
    DbRefResolver dbRefResolver = new DefaultDbRefResolver(mongoDbFactory());
    MappingMongoConverter mongoConverter = new MappingMongoConverter(dbRefResolver, mappingContext);
    mongoConverter.setCustomConversions(customConversions());
    ConversionService convService = mongoConverter.getConversionService();
    ((GenericConversionService) convService).addConverterFactory(new IntegerToEnumConverterFactory());
    mongoConverter.afterPropertiesSet();
    return mongoConverter;
}
You will need to implement your custom converters and register them with Spring.
http://static.springsource.org/spring-data/data-mongo/docs/current/reference/html/#mongo.custom-converters
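For example, here is a minimal sketch of that registration using Java config. It assumes a Spring Data MongoDB 1.x-style AbstractMongoConfiguration; the MongoConfig class name and database name are hypothetical, and the converter is the EnumToIntegerConverter defined in the answer above:

import java.util.Arrays;
import com.mongodb.Mongo;
import com.mongodb.MongoClient;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.config.AbstractMongoConfiguration;
import org.springframework.data.mongodb.core.convert.CustomConversions;

@Configuration
public class MongoConfig extends AbstractMongoConfiguration {

    @Override
    protected String getDatabaseName() {
        return "mydb"; // hypothetical database name
    }

    @Override
    public Mongo mongo() throws Exception {
        return new MongoClient(); // connects to localhost:27017 by default
    }

    @Override
    public CustomConversions customConversions() {
        // Register the writing converter from the answer above; the
        // MappingMongoConverter built by this configuration picks it up
        return new CustomConversions(Arrays.asList(
                new IntegerEnumConverters.EnumToIntegerConverter()));
    }
}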
Isn't it easier to use plain constants rather than an enum?
int SOMETHING = 33;
int OTHER_THING = 55;
or
public class Role {
    public static final String ROLE_USER = "ROLE_USER",
            ROLE_LOOSER = "ROLE_LOOSER";
}
String yourRole = Role.ROLE_LOOSER;

Can Hadoop mapper produce multiple keys in output?

Can a single Mapper class produce multiple key-value pairs (of the same type) in a single run?
We output the key-value pair in the mapper like this:
context.write(key, value);
Here's a trimmed-down example version of the key:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class MyKey extends ObjectWritable implements WritableComparable<MyKey> {

    public enum KeyType {
        KeyType1,
        KeyType2
    }

    private KeyType keyType;
    private Long field1;
    private Integer field2 = -1;
    private String field3 = "";

    public KeyType getKeyType() {
        return keyType;
    }

    public void setKeyType(KeyType keyType) {
        this.keyType = keyType;
    }

    public Long getField1() {
        return field1;
    }

    public void setField1(Long field1) {
        this.field1 = field1;
    }

    public Integer getField2() {
        return field2;
    }

    public void setField2(Integer field2) {
        this.field2 = field2;
    }

    public String getField3() {
        return field3;
    }

    public void setField3(String field3) {
        this.field3 = field3;
    }

    @Override
    public void readFields(DataInput datainput) throws IOException {
        keyType = KeyType.valueOf(datainput.readUTF());
        field1 = datainput.readLong();
        field2 = datainput.readInt();
        field3 = datainput.readUTF();
    }

    @Override
    public void write(DataOutput dataoutput) throws IOException {
        dataoutput.writeUTF(keyType.toString());
        dataoutput.writeLong(field1);
        dataoutput.writeInt(field2);
        dataoutput.writeUTF(field3);
    }

    @Override
    public int compareTo(MyKey other) {
        if (getKeyType().compareTo(other.getKeyType()) != 0) {
            return getKeyType().compareTo(other.getKeyType());
        } else if (getField1().compareTo(other.getField1()) != 0) {
            return getField1().compareTo(other.getField1());
        } else if (getField2().compareTo(other.getField2()) != 0) {
            return getField2().compareTo(other.getField2());
        } else if (getField3().compareTo(other.getField3()) != 0) {
            return getField3().compareTo(other.getField3());
        } else {
            return 0;
        }
    }

    public static class MyKeyComparator extends WritableComparator {
        public MyKeyComparator() {
            super(MyKey.class);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }

    static { // register this comparator
        WritableComparator.define(MyKey.class, new MyKeyComparator());
    }
}
And this is how we tried to output both keys in the Mapper:
MyKey key1 = new MyKey();
key1.setKeyType(KeyType.KeyType1);
key1.setField1(1L);
key1.setField2(23);

MyKey key2 = new MyKey();
key2.setKeyType(KeyType.KeyType2);
key2.setField1(1L);
key2.setField3("abc");

context.write(key1, value1);
context.write(key2, value2);
Our job's output format class is: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
I'm mentioning this because in other output format classes I've seen the output not being appended, just committed, in their implementation of the write method.
Also, we are using the following classes for Mapper and Context:
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Context
Writing to the context multiple times in one map task is perfectly fine.
However, you may have several problems with your key class. Whenever you implement WritableComparable for a key, you should also implement equals(Object) and hashCode() methods. These aren't part of the WritableComparable interface, since they are defined in Object, but you must provide implementations.
The default partitioner uses the hashCode() method to decide which reducer each key/value pair goes to. If you don't provide a sane implementation, you can get strange results.
As a rule of thumb, whenever you implement hashCode() or any sort of comparison method, you should provide an equals(Object) method as well. You will have to make sure it accepts an Object as the parameter, as this is how it is defined in the Object class (whose implementation you are probably overriding).
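For illustration, a minimal sketch of matching equals(Object) and hashCode() for the MyKey class above (assuming all fields are non-null when keys are compared, as in the question's usage; the 31 multiplier is just the conventional prime):

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MyKey)) return false;
    MyKey other = (MyKey) o;
    // Compare the same fields that compareTo() uses
    return keyType == other.keyType
            && field1.equals(other.field1)
            && field2.equals(other.field2)
            && field3.equals(other.field3);
}

@Override
public int hashCode() {
    int result = keyType.hashCode();
    result = 31 * result + field1.hashCode();
    result = 31 * result + field2.hashCode();
    result = 31 * result + field3.hashCode();
    return result;
}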
