Can Hadoop mapper produce multiple keys in output?


Can a single Mapper class produce multiple key-value pairs (of the same type) in a single run?
We output the key-value pair in the mapper like this:
context.write(key, value);
Here's a trimmed-down version of the key class (with example field names):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class MyKey extends ObjectWritable implements WritableComparable<MyKey> {
public enum KeyType {
KeyType1,
KeyType2
}
private KeyType keyTupe;
private Long field1;
private Integer field2 = -1;
private String field3 = "";
public KeyType getKeyType() {
return keyTupe;
}
public void settKeyType(KeyType keyType) {
this.keyTupe = keyType;
}
public Long getField1() {
return field1;
}
public void setField1(Long field1) {
this.field1 = field1;
}
public Integer getField2() {
return field2;
}
public void setField2(Integer field2) {
this.field2 = field2;
}
public String getField3() {
return field3;
}
public void setField3(String field3) {
this.field3 = field3;
}
@Override
public void readFields(DataInput datainput) throws IOException {
keyTupe = KeyType.valueOf(datainput.readUTF());
field1 = datainput.readLong();
field2 = datainput.readInt();
field3 = datainput.readUTF();
}
@Override
public void write(DataOutput dataoutput) throws IOException {
dataoutput.writeUTF(keyTupe.toString());
dataoutput.writeLong(field1);
dataoutput.writeInt(field2);
dataoutput.writeUTF(field3);
}
@Override
public int compareTo(MyKey other) {
if (getKeyType().compareTo(other.getKeyType()) != 0) {
return getKeyType().compareTo(other.getKeyType());
} else if (getField1().compareTo(other.getField1()) != 0) {
return getField1().compareTo(other.getField1());
} else if (getField2().compareTo(other.getField2()) != 0) {
return getField2().compareTo(other.getField2());
} else if (getField3().compareTo(other.getField3()) != 0) {
return getField3().compareTo(other.getField3());
} else {
return 0;
}
}
public static class MyKeyComparator extends WritableComparator {
public MyKeyComparator() {
super(MyKey.class);
}
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
return compareBytes(b1, s1, l1, b2, s2, l2);
}
}
static { // register this comparator
WritableComparator.define(MyKey.class, new MyKeyComparator());
}
}
And this is how we tried to output both keys in the Mapper:
MyKey key1 = new MyKey();
key1.settKeyType(KeyType.KeyType1);
key1.setField1(1L);
key1.setField2(23);
MyKey key2 = new MyKey();
key2.settKeyType(KeyType.KeyType2);
key2.setField1(1L);
key2.setField3("abc");
context.write(key1, value1);
context.write(key2, value2);
Our job's output format class is: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
I'm mentioning this because, in some other output format classes, I've seen the write() implementation commit the output instead of appending to it.
Also, we are using the following classes for Mapper and Context:
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context

Writing to the context multiple times in one map task is perfectly fine.
However, you may have several problems with your key class. Whenever you implement WritableComparable for a key, you should also implement equals(Object) and hashCode() methods. These aren't part of the WritableComparable interface, since they are defined in Object, but you must provide implementations.
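The default partitioner (HashPartitioner) uses the key's hashCode() to decide which reducer each key/value pair goes to. Object's default hashCode() is identity-based, so two equal keys created in different map tasks (and therefore different JVMs) can hash differently and be sent to different reducers. If you don't provide a sane implementation, you can get strange results, such as the same logical key showing up in the output of several reducers.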
As a rule of thumb, whenever you implement hashCode() or any sort of comparison method, you should provide an equals(Object) method as well. You will have to make sure it accepts an Object as the parameter, as this is how it is defined in the Object class (whose implementation you are probably overriding).
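The advice above can be sketched in plain Java. To keep it runnable without Hadoop, MyKeySketch below is a hypothetical stand-in for MyKey rather than a drop-in Writable: its equals() and hashCode() agree with compareTo(), and hashCode() deliberately avoids Enum.hashCode(), which is identity-based and so may differ from one JVM to the next. The partition() helper reproduces the arithmetic of Hadoop's default HashPartitioner.

```java
// Hypothetical plain-Java stand-in for MyKey (no Hadoop types), showing
// equals()/hashCode() that are consistent with compareTo() and stable across JVMs.
final class MyKeySketch implements Comparable<MyKeySketch> {
    enum KeyType { KeyType1, KeyType2 }

    final KeyType keyType;
    final long field1;
    final int field2;
    final String field3; // assumed non-null, like the "" default in MyKey

    MyKeySketch(KeyType keyType, long field1, int field2, String field3) {
        this.keyType = keyType;
        this.field1 = field1;
        this.field2 = field2;
        this.field3 = field3;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyKeySketch)) return false;
        MyKeySketch k = (MyKeySketch) o;
        return keyType == k.keyType && field1 == k.field1
                && field2 == k.field2 && field3.equals(k.field3);
    }

    @Override
    public int hashCode() {
        // Enum.hashCode() is identity-based, so hash the ordinal instead;
        // Long, Integer and String hash codes are deterministic across JVMs.
        int h = keyType.ordinal();
        h = 31 * h + Long.hashCode(field1);
        h = 31 * h + field2;
        h = 31 * h + field3.hashCode();
        return h;
    }

    @Override
    public int compareTo(MyKeySketch o) {
        int c = keyType.compareTo(o.keyType);
        if (c != 0) return c;
        c = Long.compare(field1, o.field1);
        if (c != 0) return c;
        c = Integer.compare(field2, o.field2);
        if (c != 0) return c;
        return field3.compareTo(o.field3);
    }

    // Same arithmetic as Hadoop's HashPartitioner.getPartition().
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

In the real MyKey the same equals() and hashCode() bodies would sit next to write() and readFields(); the ordinal-based enum hash is the part that is easiest to miss.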

Related

Comparing dates in ascending and descending order in arrayList not working

I have an ArrayList of Email objects. I need to be able to sort them by either of two fields: by subject, which I have working, and by date, which I can't get to work. I tried a bunch of other people's solutions, but nothing seems to work. Here is an excerpt of my code:
import java.time.LocalDateTime;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Comparator;
import java.util.Date;
public class Email implements Comparable{
private String to;
private String cc;
private String bcc;
private String subject;
private String body;
private GregorianCalendar timestamp; //time that email was created
//constructors and getters and setters (besides timeStamp) omitted to save space
public Date getTimestamp() {
return timestamp.getTime();
}
public void setTimestamp(GregorianCalendar timestamp) {
this.timestamp = timestamp;
}
//This works
public static Comparator<Email> emailSubjectComparatorAscending = new Comparator<Email>(){
public int compare(Email e1, Email e2) {
String EmailSubject1 = e1.getSubject().toLowerCase();
String EmailSubject2 = e2.getSubject().toLowerCase();
return EmailSubject1.compareTo(EmailSubject2);
}
};
//This works
public static Comparator<Email> emailSubjectComparatorDescending = new Comparator<Email>(){
public int compare(Email e1, Email e2) {
String EmailSubject1 = e1.getSubject().toLowerCase();
String EmailSubject2 = e2.getSubject().toLowerCase();
return EmailSubject2.compareTo(EmailSubject1);
}
};
//This doesn't work
public static Comparator<Email> emailDateComparatorAscending = new Comparator<Email>(){
@Override
public int compare(Email o1, Email o2) {
if(o1.getTimestamp().equals(o2.getTimestamp()))
return 0;
if(o2.getTimestamp().before(o1.getTimestamp()))
return 1;
else
return -1;
}
};
//This doesn't work
public static Comparator<Email> emailDateComparatorDescending = new Comparator<Email>(){
@Override
public int compare(Email o1, Email o2) {
if(o1.getTimestamp().equals(o2.getTimestamp()))
return 0;
if(o1.getTimestamp().before(o2.getTimestamp()))
return 1;
else
return -1;
}
};
@Override
public String toString(){
return("To: " + to + "\nCC: " + cc + "\nBCC: " + bcc + "\nSubject: " + subject + "\n" + body);
}
@Override
public int compareTo(Object o) {
// TODO Auto-generated method stub
return 0;
}
}
Here is the part of my other class in which I call these sorting methods:
import java.util.*;
public class Folder{
private ArrayList<Email> emails; //initialize?
private String name;
private String currentSortingMethod; //default = descending
public Folder(String name) {
super();
this.name = name;
currentSortingMethod = "sortByDateDescending";
emails = new ArrayList<Email>(); //instantiate the arrayList
}
//omitted getters, setters and other irrelevant methods
//This works
public void sortBySubjectAscending(){
Collections.sort(emails, Email.emailSubjectComparatorAscending);
currentSortingMethod = "sortBySubjectAscending";
}
//This works
public void sortBySubjectDescending(){
Collections.sort(emails, Email.emailSubjectComparatorDescending);
currentSortingMethod = "sortBySubjectDescending";
}
//This doesn't work
public void sortByDateAscending(){
Collections.sort(emails, Email.emailDateComparatorAscending);
currentSortingMethod = "sortByDateAscending";
}
//This doesn't work
public void sortByDateDescending(){
Collections.sort(emails, Email.emailDateComparatorDescending);
currentSortingMethod = "sortByDateDescending";
}
}
I am really confused by this; I have tried many different approaches and none of them work. If anyone could explain how to make this work, I would greatly appreciate it.
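For reference, the same four orderings can be written with the Java 8 Comparator factory methods, which delegate to Date.compareTo() instead of hand-written before()/equals() branches. This is an alternative formulation, not a diagnosis: the hand-written date comparators above appear to order dates correctly on their own, so the cause may lie in code not shown. The sketch uses a hypothetical minimal Mail class in place of Email (only subject and timestamp, with getTimestamp() returning a Date as in the original).

```java
import java.util.Comparator;
import java.util.Date;
import java.util.GregorianCalendar;

final class EmailSortSketch {
    // Hypothetical stand-in for Email with only the two fields used for sorting.
    static final class Mail {
        private final String subject;
        private final GregorianCalendar timestamp;

        Mail(String subject, GregorianCalendar timestamp) {
            this.subject = subject;
            this.timestamp = timestamp;
        }

        String getSubject() { return subject; }

        // Same shape as Email.getTimestamp(): expose the calendar as a Date.
        Date getTimestamp() { return timestamp.getTime(); }
    }

    // Case-insensitive subject order (close to comparing toLowerCase() strings).
    static final Comparator<Mail> BY_SUBJECT_ASC =
            Comparator.comparing(Mail::getSubject, String.CASE_INSENSITIVE_ORDER);
    static final Comparator<Mail> BY_SUBJECT_DESC = BY_SUBJECT_ASC.reversed();

    // Date implements Comparable<Date>, so no manual before()/equals() logic is needed.
    static final Comparator<Mail> BY_DATE_ASC = Comparator.comparing(Mail::getTimestamp);
    static final Comparator<Mail> BY_DATE_DESC = BY_DATE_ASC.reversed();
}
```

These comparators plug into Collections.sort(list, comparator) exactly like the hand-written ones; reversed() keeps the descending variants in sync with the ascending ones.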

Does default sorting in MapReduce use the Comparator defined in the WritableComparable class or the compareTo() method?

How does sorting happen in MapReduce before the output is passed from the mapper to the reducer? If my mapper's output key is of type IntWritable, does it use the comparator defined in the IntWritable class or the compareTo() method in the class? If so, how is the call made? If not, how is the sort performed?

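Map outputs are first collected and sent to the Partitioner, which determines which Reducer each record will be sent to (records are not yet grouped by reduce() call at this point). The default Partitioner (HashPartitioner) computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, i.e. the key's hashCode() modulo the number of Reducers.
The map outputs are then sorted with the key comparator, by partition and then by key, as the in-memory buffer is spilled to the mapper's local disk (intermediate map output is not written to HDFS). The flow looks like this:
Collector --> Partitioner --> Sort (Comparator) --> Spill --> Local Disk <-- MapOutputServlet
Each Reducer then copies, from every mapper, the partition that the partitioner assigned to it, and passes the data through a Grouper that determines how records are grouped for a single reduce() call:
MapOutputServlet --> Copy to Local Disk --> Group --> Reduce
Before each reduce() call, the records also go through a sorting (merge) phase that determines the order in which they arrive at the reducer. The sort comparator is a RawComparator: unless the job sets one with job.setSortComparatorClass(), Hadoop uses the comparator registered for the key class with WritableComparator.define(). IntWritable registers IntWritable.Comparator, which compares the serialized bytes directly, so IntWritable.compareTo() is not called during the sort. For a key class with no registered comparator, the generic WritableComparator deserializes both keys and calls their compareTo() method (from the WritableComparable interface).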
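To make the IntWritable case concrete, here is a plain-Java sketch (no Hadoop dependency; RawIntComparisonSketch and its methods are hypothetical names) of what a registered raw comparator does for int keys. IntWritable serializes its value with DataOutput.writeInt(), so a raw comparator can decode the four bytes in place and compare the numbers without building IntWritable objects or calling compareTo(). The third method, a plain unsigned byte-by-byte comparison like WritableComparator.compareBytes() (which the MyKeyComparator in the first question relies on), is included for contrast: it puts -1 (0xFFFFFFFF) after 1, so it only agrees with compareTo() for encodings designed to sort bytewise.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class RawIntComparisonSketch {
    // Serializes an int the way IntWritable.write() does: DataOutput.writeInt(),
    // i.e. 4 bytes, big-endian, two's complement.
    static byte[] serialize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            // ByteArrayOutputStream does not really throw; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Raw comparison in the spirit of IntWritable.Comparator: decode the ints
    // straight from the byte arrays (no IntWritable objects are created).
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int a = ByteBuffer.wrap(b1, s1, 4).getInt();
        int b = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(a, b);
    }

    // Unsigned lexicographic byte comparison, shown for contrast only:
    // it does not order negative ints numerically.
    static int compareBytesLexicographic(byte[] b1, byte[] b2) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            int x = b1[i] & 0xFF;
            int y = b2[i] & 0xFF;
            if (x != y) return x - y;
        }
        return b1.length - b2.length;
    }
}
```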
To give you a better idea, here is how you would implement a basic compareTo(), grouper and sorter for a custom composite key:
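import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
public class CompositeKey implements WritableComparable<CompositeKey> {
private final IntWritable primaryField = new IntWritable();
private final IntWritable secondaryField = new IntWritable();
// Hadoop instantiates keys by reflection, so a no-arg constructor is required
public CompositeKey() {
}
public CompositeKey(IntWritable p, IntWritable s) {
// Copy the values rather than keeping references to the caller's (often reused) objects
this.primaryField.set(p.get());
this.secondaryField.set(s.get());
}
public IntWritable getPrimaryField() {
return primaryField;
}
public IntWritable getSecondaryField() {
return secondaryField;
}
@Override
public void write(DataOutput out) throws IOException {
this.primaryField.write(out);
this.secondaryField.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
this.primaryField.readFields(in);
this.secondaryField.readFields(in);
}
// Called by the partitioner to send map outputs with the same primary field to the same reducer instance
// If the hash source is simple (a primitive type or similar), a simple call to its hashCode() method is good enough
@Override
public int hashCode() {
return this.primaryField.hashCode();
}
@Override
public boolean equals(Object o) {
if (!(o instanceof CompositeKey)) {
return false;
}
CompositeKey other = (CompositeKey) o;
return primaryField.equals(other.primaryField) && secondaryField.equals(other.secondaryField);
}
@Override
public int compareTo(CompositeKey other) {
if (this.getPrimaryField().equals(other.getPrimaryField())) {
return this.getSecondaryField().compareTo(other.getSecondaryField());
} else {
return this.getPrimaryField().compareTo(other.getPrimaryField());
}
}
}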
public class CompositeGroupingComparator extends WritableComparator {
public CompositeGroupingComparator() {
super(CompositeKey.class, true);
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
CompositeKey first = (CompositeKey) a;
CompositeKey second = (CompositeKey) b;
return first.getPrimaryField().compareTo(second.getPrimaryField());
}
}
public class CompositeSortingComparator extends WritableComparator {
public CompositeSortingComparator() {
super (CompositeKey.class, true);
}
@Override
public int compare (WritableComparable a, WritableComparable b){
CompositeKey first = (CompositeKey) a;
CompositeKey second = (CompositeKey) b;
return first.compareTo(second);
}
}
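To see what the sort and grouping comparators do together, here is a plain-Java simulation of what a single reducer receives (no Hadoop classes; Key, SORT, GROUP and reduceGroups are hypothetical names). In a real job the Hadoop classes above would be registered with job.setSortComparatorClass(CompositeSortingComparator.class) and job.setGroupingComparatorClass(CompositeGroupingComparator.class), alongside a partitioner that hashes only the primary field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {
    // Hypothetical plain stand-in for CompositeKey: two int fields, no Hadoop types.
    static final class Key {
        final int primary;
        final int secondary;

        Key(int primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }
    }

    // Plays the role of CompositeSortingComparator / CompositeKey.compareTo():
    // order by primary field, then by secondary field.
    static final Comparator<Key> SORT =
            Comparator.<Key>comparingInt(k -> k.primary).thenComparingInt(k -> k.secondary);

    // Plays the role of CompositeGroupingComparator: only the primary field counts.
    static final Comparator<Key> GROUP = Comparator.comparingInt(k -> k.primary);

    // What one reducer sees: its keys fully sorted by SORT, then each run of
    // consecutive keys that GROUP treats as equal handed to a single reduce() call.
    static List<List<Key>> reduceGroups(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        List<Key> current = null;
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(k);
            previous = k;
        }
        return groups;
    }
}
```

Each inner list corresponds to one reduce() call, and within it the keys (and hence the values) arrive already ordered by the secondary field, which is the point of a secondary sort.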

After the map phase, the framework takes care of comparing keys for us for all the built-in types such as IntWritable, DoubleWritable, etc. But if you have a user-defined key type, you need to implement the WritableComparable interface.
WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.
Note that hashCode() is frequently used in Hadoop to partition keys. It's important that your implementation of hashCode() returns the same result across different instances of the JVM. Note also that the default hashCode() implementation in Object does not satisfy this property.
Example:
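public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
// Some data
private int counter;
private long timestamp;
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(counter);
out.writeLong(timestamp);
}
@Override
public void readFields(DataInput in) throws IOException {
counter = in.readInt();
timestamp = in.readLong();
}
@Override
public int compareTo(MyWritableComparable o) {
// Order by counter, then by timestamp, so compareTo() agrees with equals()
int cmp = Integer.compare(this.counter, o.counter);
return cmp != 0 ? cmp : Long.compare(this.timestamp, o.timestamp);
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof MyWritableComparable)) {
return false;
}
MyWritableComparable other = (MyWritableComparable) obj;
return counter == other.counter && timestamp == other.timestamp;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + counter;
result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
return result;
}
}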
Adapted from: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html

hadoop partitioner not working

public class Partitioner_2 implements Partitioner<Text,Text>{
@Override
public int getPartition(Text key, Text value, int numPartitions) {
int hashValue=0;
for(char c: key.toString().split("\\|\\|")[0].toCharArray()){
hashValue+=(int)c;
}
return Math.abs(hashValue * 127) % numPartitions;
}
}
That is my partitioner code and the key is in the form:
"str1||str2", and I would like to send all keys that have the same value for str1 to the same reducer.
My GroupComparator and KeyComparator are as follows:
public static class GroupComparator_2 extends WritableComparator {
protected GroupComparator_2() {
super(Text.class, true);
}
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
Text kw1 = (Text) w1;
Text kw2 = (Text) w2;
String k1=kw1.toString().split("||")[0].trim();
String k2=kw2.toString().split("||")[0].trim();
return k1.compareTo(k2);
}
}
public static class KeyComparator_2 extends WritableComparator {
protected KeyComparator_2() {
super(Text.class, true);
}
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
Text key1 = (Text) w1;
Text key2 = (Text) w2;
String kw1_key1=key1.toString().split("||")[0];
String kw1_key2=key2.toString().split("||")[0];
int cmp=kw1_key1.compareTo(kw1_key2);
if(cmp==0){
String kw2_key1=key1.toString().split("||")[1].trim();
String kw2_key2=key2.toString().split("||")[1].trim();
cmp=kw2_key1.compareTo(kw2_key2);
}
return cmp;
}
}
The error I am currently receiving is :
KeywordKeywordCoOccurrence_2.java:92: interface expected here
public class Partitioner_2 implements Partitioner<Text,Text>{
^
KeywordKeywordCoOccurrence_2.java:94: method does not override or implement a method from a supertype
@Override
^
KeywordKeywordCoOccurrence_2.java:147: setPartitionerClass(java.lang.Class<? extends org.apache.hadoop.mapreduce.Partitioner>) in org.apache.hadoop.mapreduce.Job cannot be applied to (java.lang.Class<KeywordKeywordCoOccurrence_2.Partitioner_2>)
job.setPartitionerClass(Partitioner_2.class);
But as far as I can tell I have overridden the getPartition() method which is the only method in the Partitioner interface? Any help in identifying what I am doing wrong and how to fix it would be much appreciated.
Thanks in advance!

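Partitioner is an abstract class in the new MapReduce API (org.apache.hadoop.mapreduce), which you're apparently using, since Job.setPartitionerClass() expects a subclass of org.apache.hadoop.mapreduce.Partitioner. The Partitioner interface belongs to the old org.apache.hadoop.mapred API.
So you should define it as follows (as a static nested class, because Hadoop instantiates the partitioner by reflection through a no-arg constructor):
public static class Partitioner_2 extends Partitioner<Text, Text> {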
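The key-handling logic itself can be sketched in plain Java (no Hadoop types; CoOccurrencePartitionSketch, naturalKey and partitionFor are hypothetical names, and String.hashCode() is swapped in for the hand-rolled character sum). It also illustrates a pitfall worth checking in the comparators: String.split() takes a regular expression, and an unescaped "||" is an empty alternation that matches between every character, so "apple||pie".split("||")[0] is "a" rather than "apple". Escaping ("\\|\\|", as the partitioner already does) or Pattern.quote("||") avoids that.

```java
import java.util.regex.Pattern;

final class CoOccurrencePartitionSketch {
    // Literal "||" separator; Pattern.quote() stops split() from treating it as a regex.
    private static final String SEP = Pattern.quote("||");

    // The "str1" part of a "str1||str2" key.
    static String naturalKey(String compositeKey) {
        return compositeKey.split(SEP)[0].trim();
    }

    // Hash only str1, so every key sharing str1 lands in the same partition.
    // The mask keeps the result non-negative, as in Hadoop's HashPartitioner.
    static int partitionFor(String compositeKey, int numPartitions) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Inside Partitioner_2.getPartition(Text key, Text value, int numPartitions) the body would then reduce to something like return partitionFor(key.toString(), numPartitions);, and the grouping comparator can reuse naturalKey() so both agree on what str1 is.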

Oracle char type issue in Hibernate HQL query

I have an Oracle table that contains CHAR columns. In my entity class I mapped the Oracle CHAR type to the Java String type.
Here is the code for my Entity class.
@Entity
@Table(name="ORG")
public class Organization {
private String serviceName;
private String orgAcct;
//Some other properties goes here...
@Column(name="ORG_ACCT", nullable=false, length=16)
public String getOrgAcct() {
return this.orgAcct;
}
public void setOrgAcct(String orgAcct) {
this.orgAcct = orgAcct;
}
@Column(name="SERVICE_NAME",nullable=true, length=16)
public String getServiceName() {
return this.serviceName;
}
public void setServiceName(String serviceName) {
this.serviceName = serviceName;
}
}
Here both serviceName and orgAcct are CHAR columns in Oracle.
In my DAO class I wrote an HQL query to fetch an Organization entity using the serviceName and orgAcct properties.
@Repository
@Scope("singleton") //By default scope is singleton
public class OrganizationDAOImpl implements OrganizationDAO {
public OrganizationDAOImpl(){
}
public Organization findOrganizationByOrgAcctAndServiceName(String orgAcct,String serviceName){
String hqlQuery = "SELECT org FROM Organization org WHERE org.serviceName = :serName AND org.orgAcct = :orgAct";
Query query = getCurrentSession().createQuery(hqlQuery)
.setString("serName", serviceName)
.setString("orgAct", orgAcct);
Organization org = findObject(query);
return org;
}
}
But when I call the findOrganizationByOrgAcctAndServiceName() method, I get null as the Organization object (i.e. the HQL query does not match rows on the CHAR columns).
Please help me fix this issue. I can't change the Oracle type from CHAR to VARCHAR2; I need to work with Oracle CHAR columns.
@EngineerDollery After going through the above post, I modified my entity class with the columnDefinition attribute of the @Column annotation.
@Column(name="SERVICE_NAME",nullable=true,length=16,columnDefinition="CHAR")
public String getServiceName() {
return this.serviceName;
}
But I'm still not able to retrieve the data for the corresponding columns.
I added the column size to the columnDefinition attribute as well:
@Column(name="SERVICE_NAME",nullable=true,length=16,columnDefinition="CHAR(16)")
But I'm still facing the same issue.
Am I doing anything wrong? Please help.

I resolved this problem using OraclePreparedStatement and the Hibernate UserType interface.
I created a new UserType class implementing the org.hibernate.usertype.UserType interface and provided implementations of the nullSafeSet() and nullSafeGet() methods.
In nullSafeSet(), whose first parameter is a PreparedStatement, I unwrap the DBCP DelegatingStatement, cast the innermost statement to an OraclePreparedStatement, and pass the String value using setFixedCHAR().
Here is the complete code of UserType impl class.
package nc3.jws.persistence.userType;
import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import org.apache.commons.lang.StringUtils;
import org.hibernate.type.StringType;
import org.hibernate.usertype.UserType;
/**
*
* based on www.hibernate.org/388.html
*/
public class OracleFixedLengthCharType implements UserType {
public OracleFixedLengthCharType() {
System.out.println("OracleFixedLengthCharType constructor");
}
public int[] sqlTypes() {
return new int[] { Types.CHAR };
}
public Class<String> returnedClass() {
return String.class;
}
public boolean equals(Object x, Object y) {
return (x == y) || (x != null && y != null && (x.equals(y)));
}
@SuppressWarnings("deprecation")
public Object nullSafeGet(ResultSet inResultSet, String[] names, Object o) throws SQLException {
//String val = (String) Hibernate.STRING.nullSafeGet(inResultSet, names[0]);
String val = StringType.INSTANCE.nullSafeGet(inResultSet, names[0]);
//System.out.println("From nullSafeGet method valu is "+val);
return val == null ? null : StringUtils.trim(val);
}
public void nullSafeSet(PreparedStatement inPreparedStatement, Object o,
int i)
throws SQLException {
String val = (String) o;
//Get the delegatingStmt object from DBCP connection pool PreparedStatement object.
org.apache.commons.dbcp.DelegatingStatement delgatingStmt = (org.apache.commons.dbcp.DelegatingStatement)inPreparedStatement;
//Get the OraclePreparedStatement object from the DelegatingStatement object.
oracle.jdbc.driver.OraclePreparedStatement oraclePreparedStmpt = (oracle.jdbc.driver.OraclePreparedStatement)delgatingStmt.getInnermostDelegate();
//Call setFixedCHAR(), passing the String value, so it is bound as a fixed-length CHAR.
oraclePreparedStmpt.setFixedCHAR(i, val);
}
public Object deepCopy(Object o) {
if (o == null) {
return null;
}
return new String(((String) o));
}
public boolean isMutable() {
return false;
}
public Object assemble(Serializable cached, Object owner) {
return cached;
}
public Serializable disassemble(Object value) {
return (Serializable) value;
}
public Object replace(Object original, Object target, Object owner) {
return original;
}
public int hashCode(Object obj) {
return obj.hashCode();
}
}
I registered this class with a @TypeDefs annotation on the entity class:
@TypeDefs({
@TypeDef(name = "fixedLengthChar", typeClass = nc3.jws.persistence.userType.OracleFixedLengthCharType.class)
})
and applied this type to the CHAR columns:
@Type(type="fixedLengthChar")
@Column(name="SERVICE_NAME",nullable=true,length=16)
public String getServiceName() {
return this.serviceName;
}

CHAR columns are padded with spaces in the table. This means that if you have
foo
in one of these columns, what you actually have is
foo<space><space><space>...
until the actual length of the string is 16.
Consequently, if you're looking for an organization having "foo" as its service name, you won't find any, because the actual value in the table is foo padded with 13 spaces.
You'll thus have to make sure all your query parameters are also padded with spaces.
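A minimal sketch of that padding in plain Java, assuming the column length is known (16 here, from the length=16 mapping); padToCharLength is a hypothetical helper name. Per Oracle's comparison rules, blank-padded comparison is used only when both sides are CHAR, and a String bound with setString() is compared with non-padded semantics, which is why the unpadded parameter matches nothing.

```java
final class CharPaddingSketch {
    // Right-pads value with spaces up to the CHAR(n) length, the way Oracle
    // blank-pads values stored in a CHAR column. Null and values already at
    // or beyond that length are returned unchanged.
    static String padToCharLength(String value, int length) {
        if (value == null || value.length() >= length) {
            return value;
        }
        StringBuilder padded = new StringBuilder(length).append(value);
        while (padded.length() < length) {
            padded.append(' ');
        }
        return padded.toString();
    }
}
```

In the DAO it would be applied to the bound parameters, e.g. .setString("serName", CharPaddingSketch.padToCharLength(serviceName, 16)), and likewise for orgAct.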

Spring-MongoDB: storing/retrieving enums as int, not string

My enums are stored as ints in MongoDB (written by a C# app). Now in Java, when I try to retrieve them, an exception is thrown (it seems an enum can only be converted from a string value). Is there any way to do this?
Also, when I save some collections into MongoDB (from Java), it converts enum values to strings (not their numeric value/ordinal). Is there any override available?
This can be achieved by writing a MongoDB converter at the class level, but I don't want to write one for each class, because these enums are used in many different classes.
So is there something at the field level?

After a long dig through the spring-data-mongodb converter code, I finished and now it's working. Here is what I've done (if there is a simpler solution, I'd be happy to see it as well):
First, define:
public interface IntEnumConvertable {
public int getValue();
}
and a simple enum that implements it:
public enum tester implements IntEnumConvertable{
vali(0),secondvali(1),thirdvali(5);
private final int val;
private tester(int num)
{
val = num;
}
public int getValue(){
return val;
}
}
Now you need two converters, one simple and one more complex. The simple one also handles the plain case: it returns the enum's name when the enum does not implement IntEnumConvertable, which is handy if you want ordinary enums stored as strings and numeric enums stored as integers:
public class IntegerEnumConverters {
@WritingConverter
public static class EnumToIntegerConverter implements Converter<Enum<?>, Object> {
@Override
public Object convert(Enum<?> source) {
if(source instanceof IntEnumConvertable)
{
return ((IntEnumConvertable)(source)).getValue();
}
else
{
return source.name();
}
}
}
}
The more complex one is actually a converter factory:
public class IntegerToEnumConverterFactory implements ConverterFactory<Integer, Enum> {
@Override
public <T extends Enum> Converter<Integer, T> getConverter(Class<T> targetType) {
Class<?> enumType = targetType;
while (enumType != null && !enumType.isEnum()) {
enumType = enumType.getSuperclass();
}
if (enumType == null) {
throw new IllegalArgumentException(
"The target type " + targetType.getName() + " does not refer to an enum");
}
return new IntegerToEnum(enumType);
}
@ReadingConverter
public static class IntegerToEnum<T extends Enum> implements Converter<Integer, Enum> {
private final Class<T> enumType;
public IntegerToEnum(Class<T> enumType) {
this.enumType = enumType;
}
@Override
public Enum convert(Integer source) {
for(T t : enumType.getEnumConstants()) {
if(t instanceof IntEnumConvertable)
{
if(((IntEnumConvertable)t).getValue() == source.intValue()) {
return t;
}
}
}
return null;
}
}
}
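The core of the read converter is a reverse lookup from the stored int to the enum constant. The plain-Java sketch below isolates that logic without Spring (IntEnumSketch, fromValue and toValue are hypothetical names; Tester mirrors the tester enum above). It also shows why ordinal() is not a substitute for getValue(): thirdvali has ordinal 2 but stores 5.

```java
final class IntEnumSketch {
    // Same contract as IntEnumConvertable above.
    interface IntEnumConvertable {
        int getValue();
    }

    // Mirrors the "tester" enum: the stored value is not the ordinal.
    enum Tester implements IntEnumConvertable {
        vali(0), secondvali(1), thirdvali(5);

        private final int val;

        Tester(int val) {
            this.val = val;
        }

        @Override
        public int getValue() {
            return val;
        }
    }

    // Reading side: linear scan over the constants, the same approach as IntegerToEnum.convert().
    static <E extends Enum<E> & IntEnumConvertable> E fromValue(Class<E> type, int value) {
        for (E e : type.getEnumConstants()) {
            if (e.getValue() == value) {
                return e;
            }
        }
        return null; // unknown code: the converter above also returns null
    }

    // Writing side: store the custom value, not ordinal() or name().
    static int toValue(IntEnumConvertable e) {
        return e.getValue();
    }
}
```

The generic bound lets one lookup serve every enum that implements the interface, which is what makes a single converter factory workable instead of one converter per class.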
And now for the hack part: I personally didn't find any programmatic way to register a converter factory with a MongoConverter, so I dug into the code and, with a little casting, here it is (put these two functions in your @Configuration class):
@Bean
public CustomConversions customConversions() {
List<Converter<?, ?>> converters = new ArrayList<Converter<?, ?>>();
converters.add(new IntegerEnumConverters.EnumToIntegerConverter());
// this is a dummy registration , actually it's a work-around because
// spring-mongodb doesnt has the option to reg converter factory.
// so we reg the converter that our factory uses.
converters.add(new IntegerToEnumConverterFactory.IntegerToEnum(null));
return new CustomConversions(converters);
}
@Bean
public MappingMongoConverter mappingMongoConverter() throws Exception {
MongoMappingContext mappingContext = new MongoMappingContext();
mappingContext.setApplicationContext(appContext);
DbRefResolver dbRefResolver = new DefaultDbRefResolver(mongoDbFactory());
MappingMongoConverter mongoConverter = new MappingMongoConverter(dbRefResolver, mappingContext);
mongoConverter.setCustomConversions(customConversions());
ConversionService convService = mongoConverter.getConversionService();
((GenericConversionService)convService).addConverterFactory(new IntegerToEnumConverterFactory());
mongoConverter.afterPropertiesSet();
return mongoConverter;
}

You will need to implement your custom converters and register them with Spring.
http://static.springsource.org/spring-data/data-mongo/docs/current/reference/html/#mongo.custom-converters

Isn't it easier to use plain constants rather than an enum?
int SOMETHING = 33;
int OTHER_THING = 55;
or
public class Role {
public static final String ROLE_USER = "ROLE_USER",
ROLE_LOOSER = "ROLE_LOOSER";
}
String yourRole = Role.ROLE_LOOSER;
