Sort order with Hadoop MapRed - sorting

I'd like to know how I can change the sort order of my simple WordCount program after the reduce task. I've already written another map to order by value instead of by key, but the output is still sorted in ascending order.
Is there an easy way to change the sort order?
Thanks
Vellozo

If you are using the older API (mapred.*), then set the OutputKeyComparatorClass in the job conf:
jobConf.setOutputKeyComparatorClass(ReverseComparator.class);
ReverseComparator can be something like this:
static class ReverseComparator extends WritableComparator {
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
    public ReverseComparator() {
        super(Text.class);
    }
    // Raw comparison on the serialized bytes, negated to get descending order.
    // Text.Comparator.compare() declares no checked exception, so no try/catch is needed.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return -TEXT_COMPARATOR.compare(b1, s1, l1, b2, s2, l2);
    }
    // Object comparison, used when the keys have already been deserialized
    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        if (a instanceof Text && b instanceof Text) {
            return -((Text) a).compareTo((Text) b);
        }
        return super.compare(a, b);
    }
}
In the new API (mapreduce.*), I think you need to use the Job.setSortComparatorClass() method.

This one is almost the same as above, it just looks a bit simpler:
class MyKeyComparator extends WritableComparator {
    protected MyKeyComparator() {
        super(Text.class, true);
    }
    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        Text key1 = (Text) w1;
        Text key2 = (Text) w2;
        return -1 * key1.compareTo(key2);
    }
}
Then add it to the job:
job.setSortComparatorClass(MyKeyComparator.class);
If your key is not a Text, change the type in the two casts (Text key1 = (Text) w1; Text key2 = (Text) w2;) and in the super() call to your own key type.
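To check what a descending comparator does to the order before running a job, the comparison can be tried out in plain Java first. The sketch below is only an illustration and does not use Hadoop: the class name DescendingCountSketch, its method, and the sample word counts are all made up for this example. It orders words by count, largest first, by swapping the two arguments of the comparison, which has the same effect as negating the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DescendingCountSketch {
    // Returns the words ordered by count, largest count first.
    static List<String> wordsByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Swapping a and b reverses the natural ascending order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            words.add(entry.getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("sort", 7);
        counts.put("map", 5);
        System.out.println(wordsByCountDescending(counts)); // prints [sort, map, hadoop]
    }
}
```

Swapping the arguments rather than multiplying by -1 also sidesteps the corner case where a comparison returns Integer.MIN_VALUE, which cannot be negated.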

Related

I want to write a Hadoop MapReduce Join in Java

I'm completely new to the Hadoop framework and I want to write a MapReduce program (HadoopJoin.java) that joins two tables R and S on the x attribute. The structure of the two tables is:
R (tag : char, x : int, y : varchar(30))
and
S (tag : char, x : int, z : varchar(30))
For example we have for R table :
r 10 r-10-0
r 11 r-11-0
r 12 r-12-0
r 21 r-21-0
And for S table :
s 11 s-11-0
s 21 s-41-0
s 21 s-41-1
s 12 s-31-0
s 11 s-31-1
The result should look like :
r 11 r-11-0 s 11 s-11-0
etc.
Can anyone help me please ?
Describing a join in MapReduce to someone who is new to the framework is difficult, but here is a working implementation for your situation. I also recommend reading section 9 of Hadoop: The Definitive Guide, 4th Edition, which describes how to implement a join in MapReduce very well.
First of all, you might consider using higher-level frameworks such as Pig, Hive, and Spark, because they provide a join operation as part of their core.
Secondly, there are several ways to implement a join in MapReduce, depending on the nature of your data. They include the map-side join and the reduce-side join. This answer implements the reduce-side join.
Implementation:
First, we need two mappers for the two datasets. In your case the same mapper could be used for both, but in many situations different datasets need different mappers, so I have defined two to keep the solution general.
The map output key is a TextPair with two attributes: the key that is used to join the data, and a tag that specifies which dataset the record belongs to. The tag is 0 for the first dataset and 1 for the second.
Because TextPair sorts by join key and then by tag, the record from the first dataset reaches the reducer before the records from the second dataset that share its key. TextPair.FirstComparator, set as the grouping comparator, compares only the join key, so all of those records are handed to a single reduce() call. This line of code does the trick:
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
So in the reducer, the first record we receive is the one from dataset 1, followed by the records from dataset 2. All that remains is to write those records out.
Mapper for dataset1:
public class JoinDataSet1Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "0"), value);
}
}
Mapper for DataSet2:
public class JoinDataSet2Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "1"), value);
}
}
Reducer:
public class JoinReducer extends Reducer<TextPair, Text, NullWritable, Text> {
public static class KeyPartitioner extends Partitioner<TextPair, Text> {
@Override
public int getPartition(TextPair key, Text value, int numPartitions) {
return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
@Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
Iterator<Text> iter = values.iterator();
Text stationName = new Text(iter.next());
while (iter.hasNext()) {
Text record = iter.next();
Text outValue = new Text(stationName.toString() + "\t" + record.toString());
context.write(NullWritable.get(), outValue);
}
}
}
Custom key:
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString() {
return first + "\t" + second;
}
@Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
}
JobDriver:
public class JoinJob extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf(), "Join two DataSet");
job.setJarByClass(getClass());
Path ncdcInputPath = new Path(getConf().get("job.input1.path"));
Path stationInputPath = new Path(getConf().get("job.input2.path"));
Path outputPath = new Path(getConf().get("job.output.path"));
MultipleInputs.addInputPath(job, ncdcInputPath,
TextInputFormat.class, JoinDataSet1Mapper.class);
MultipleInputs.addInputPath(job, stationInputPath,
TextInputFormat.class, JoinDataSet2Mapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
job.setPartitionerClass(JoinReducer.KeyPartitioner.class);
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
job.setMapOutputKeyClass(TextPair.class);
job.setReducerClass(JoinReducer.class);
// JoinReducer writes NullWritable keys, so declare that as the output key class
job.setOutputKeyClass(NullWritable.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new JoinJob(), args);
System.exit(exitCode);
}
}
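The data flow of the reduce-side join above can also be traced in plain Java, without a cluster. The following is a minimal sketch under stated assumptions, not the Hadoop implementation: the class ReduceSideJoinSketch and its method are hypothetical, the sort and grouping steps are simulated with an in-memory list, and, like the reducer above, it assumes at most one R record per join key. Unlike the reducer above, it also checks the tag of the first record so that a key with no R record produces no output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReduceSideJoinSketch {
    static List<String> join(List<String> r, List<String> s) {
        // "Map" step: emit {joinKey, tag, line}; tag 0 marks R, tag 1 marks S.
        List<String[]> tagged = new ArrayList<>();
        for (String line : r) tagged.add(new String[] {line.split(" ")[1], "0", line});
        for (String line : s) tagged.add(new String[] {line.split(" ")[1], "1", line});

        // "Shuffle" step: sort by join key, then by tag, so the R record comes first.
        tagged.sort(Comparator.comparing((String[] t) -> t[0]).thenComparing(t -> t[1]));

        // "Reduce" step: walk each run of equal join keys (what the grouping comparator defines).
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tagged.size()) {
            String[] first = tagged.get(i);
            int j = i + 1;
            while (j < tagged.size() && tagged.get(j)[0].equals(first[0])) {
                String[] next = tagged.get(j);
                // Only pair an R record (tag 0) with S records (tag 1).
                if (first[1].equals("0") && next[1].equals("1")) {
                    out.add(first[2] + "\t" + next[2]);
                }
                j++;
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> r = Arrays.asList("r 10 r-10-0", "r 11 r-11-0", "r 12 r-12-0", "r 21 r-21-0");
        List<String> s = Arrays.asList("s 11 s-11-0", "s 21 s-41-0", "s 21 s-41-1",
                "s 12 s-31-0", "s 11 s-31-1");
        for (String line : join(r, s)) System.out.println(line);
    }
}
```

With the sample tables from the question, key 10 has no matching S record and therefore produces no output line, which is the behavior of an inner join.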

what is the significance of RawComparator and at what scenarios we use this

What is RawComparator and its significance?
Is it mandatory to use RawComparator for every mapreduce program?
A RawComparator operates directly on the byte representations of objects.
It is not mandatory to use one in every MapReduce program.
MapReduce is fundamentally a batch processing system and is not suitable for interactive analysis. You can't run a query and get results back in a few seconds or less. Queries typically take minutes or more, so it's best for offline use, where there isn't a human sitting in the processing loop waiting for results.
If you still want to reduce the time taken by a MapReduce job, a RawComparator is one thing that can help.
Use of RawComparator:
Intermediate key-value pairs are passed from the Mapper to the Reducer. Before they reach the Reducer, the shuffle and sort steps are performed.
Sorting is faster because the RawComparator compares the keys on their bytes. Without a RawComparator, the intermediate keys would have to be completely deserialized to perform a comparison.
Example:
public class IndexPairComparator extends WritableComparator {
protected IndexPairComparator() {
super(IndexPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
int i1 = readInt(b1, s1);
int i2 = readInt(b2, s2);
int comp = (i1 < i2) ? -1 : (i1 == i2) ? 0 : 1;
if(0 != comp)
return comp;
int j1 = readInt(b1, s1+4);
int j2 = readInt(b2, s2+4);
comp = (j1 < j2) ? -1 : (j1 == j2) ? 0 : 1;
return comp;
}
}
In the above example, we did not implement RawComparator directly. Instead we extended WritableComparator, which itself implements RawComparator.
Refer to this RawComparator article for more details.
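The idea of comparing on bytes can be tried without Hadoop. The sketch below is an illustration under assumptions, not Hadoop code: RawIntPairSketch and its methods are hypothetical names, and the key is assumed to be two 4-byte big-endian ints written back to back, which is the layout that the readInt(b1, s1) and readInt(b1, s1+4) calls in IndexPairComparator imply.

```java
import java.nio.ByteBuffer;

final class RawIntPairSketch {
    // Serializes the pair as two big-endian ints (ByteBuffer is big-endian by default).
    static byte[] serialize(int i, int j) {
        return ByteBuffer.allocate(8).putInt(i).putInt(j).array();
    }

    // Reads one int straight out of the buffer at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes, start, 4).getInt();
    }

    // Compares two serialized pairs without building any key objects.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int cmp = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        if (cmp != 0) {
            return cmp;
        }
        // The second int starts 4 bytes after the first.
        return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(serialize(1, 5), 0, serialize(1, 7), 0)); // prints -1
    }
}
```

The comparison never allocates a key object, which is where the saving over deserializing both keys comes from.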
I know I am answering an old question.
Here is another example of writing a RawComparator for a WritableComparable:
public class CompositeWritable2 implements WritableComparable<CompositeWritable2> {
private Text textData1;
private LongWritable longData;
private Text textData2;
static {
WritableComparator.define(CompositeWritable2.class, new Comparator());
}
/**
* Empty constructor
*/
public CompositeWritable2() {
textData1 = new Text();
longData = new LongWritable();
textData2 = new Text();
}
/**
* Comparator
*
* @author CuriousCat
*/
public static class Comparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
private static final LongWritable.Comparator LONG_COMPARATOR = new LongWritable.Comparator();
public Comparator() {
super(CompositeWritable2.class);
}
/*
* (non-Javadoc)
*
* @see org.apache.hadoop.io.WritableComparator#compare(byte[], int, int, byte[], int, int)
*/
@Override
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
int cmp;
try {
// Find the length of the first text property
int textData11Len = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int textData12Len = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
// Compare the first text data as bytes
cmp = TEXT_COMPARATOR.compare(b1, s1, textData11Len, b2, s2, textData12Len);
if (cmp != 0) {
return cmp;
}
// Compare the 8 bytes that follow the first text property.
// 8 is hard-coded because the second property is a long.
// Offsets are relative to s1/s2, the start of each serialized key in its buffer.
cmp = LONG_COMPARATOR.compare(b1, s1 + textData11Len, 8, b2, s2 + textData12Len, 8);
if (cmp != 0) {
return cmp;
}
// Start offsets of the second text property, just past the long
int text2Start1 = s1 + textData11Len + 8;
int text2Start2 = s2 + textData12Len + 8;
// Find the length of the second text property
int textData21Len = WritableUtils.decodeVIntSize(b1[text2Start1]) + readVInt(b1, text2Start1);
int textData22Len = WritableUtils.decodeVIntSize(b2[text2Start2]) + readVInt(b2, text2Start2);
// Compare the second text data as bytes
return TEXT_COMPARATOR.compare(b1, text2Start1, textData21Len, b2, text2Start2, textData22Len);
} catch (IOException ex) {
throw new IllegalArgumentException("Failed in CompositeWritable's RawComparator!", ex);
}
}
}
/**
* @return the textData1
*/
public Text getTextData1() {
return textData1;
}
/**
* @return the longData
*/
public LongWritable getLongData() {
return longData;
}
/**
* @return the textData2
*/
public Text getTextData2() {
return textData2;
}
/**
* Setter method
*/
public void set(Text textData1, LongWritable longData, Text textData2) {
this.textData1 = textData1;
this.longData = longData;
this.textData2 = textData2;
}
/*
* (non-Javadoc)
*
* @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
*/
@Override
public void write(DataOutput out) throws IOException {
textData1.write(out);
longData.write(out);
textData2.write(out);
}
/*
* (non-Javadoc)
*
* @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
*/
@Override
public void readFields(DataInput in) throws IOException {
textData1.readFields(in);
longData.readFields(in);
textData2.readFields(in);
}
/*
* (non-Javadoc)
*
* @see java.lang.Comparable#compareTo(java.lang.Object)
*/
@Override
public int compareTo(CompositeWritable2 o) {
int cmp = textData1.compareTo(o.getTextData1());
if (cmp != 0) {
return cmp;
}
cmp = longData.compareTo(o.getLongData());
if (cmp != 0) {
return cmp;
}
return textData2.compareTo(o.getTextData2());
}
}

Part of key changes when iterating through values when using composite key - Hadoop

I have implemented Secondary sort on Hadoop and I don't really understand the behavior of the framework.
I have created a composite key which contains original key and part of value, that is used for sorting.
To achieve this I have implemented my own partitioner
public class CustomPartitioner extends Partitioner<CoupleAsKey, LongWritable> {
    @Override
    public int getPartition(CoupleAsKey couple, LongWritable value, int numPartitions) {
        // Mask the sign bit so a negative hash code cannot yield a negative partition number
        return (Long.hashCode(couple.getKey1()) & Integer.MAX_VALUE) % numPartitions;
    }
}
My own group comparator
public class GroupComparator extends WritableComparator {
protected GroupComparator()
{
super(CoupleAsKey.class, true);
}
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
CoupleAsKey c1 = (CoupleAsKey)w1;
CoupleAsKey c2 = (CoupleAsKey)w2;
return Long.compare(c1.getKey1(), c2.getKey1());
}
}
And defined the couple in the following way
public class CoupleAsKey implements WritableComparable<CoupleAsKey>{
private long key1;
private long key2;
public CoupleAsKey() {
}
public CoupleAsKey(long key1, long key2) {
this.key1 = key1;
this.key2 = key2;
}
public long getKey1() {
return key1;
}
public void setKey1(long key1) {
this.key1 = key1;
}
public long getKey2() {
return key2;
}
public void setKey2(long key2) {
this.key2 = key2;
}
@Override
public void write(DataOutput output) throws IOException {
output.writeLong(key1);
output.writeLong(key2);
}
@Override
public void readFields(DataInput input) throws IOException {
key1 = input.readLong();
key2 = input.readLong();
}
@Override
public int compareTo(CoupleAsKey o2) {
int cmp = Long.compare(key1, o2.getKey1());
if(cmp != 0)
return cmp;
return Long.compare(key2, o2.getKey2());
}
@Override
public String toString() {
return key1 + "," + key2 + ",";
}
}
And here is the driver
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(SSDriver.class);
job.setMapperClass(SSMapper.class);
job.setReducerClass(SSReducer.class);
job.setMapOutputKeyClass(CoupleAsKey.class);
job.setMapOutputValueClass(LongWritable.class);
job.setPartitionerClass(CustomPartitioner.class);
job.setGroupingComparatorClass(GroupComparator.class);
FileInputFormat.addInputPath(job, new Path("/home/marko/WORK/Whirlpool/input.csv"));
FileOutputFormat.setOutputPath(job, new Path("/home/marko/WORK/Whirlpool/output"));
job.waitForCompletion(true);
Now, this works, but what is really strange is that while iterating over the values for a key in the reducer, the second part of the key (the value part) changes in each iteration. Why and how?
@Override
protected void reduce(CoupleAsKey key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
for (LongWritable value : values) {
//key.key2 changes during iterations, why?
context.write(key, value);
}
}
The definition says that "if you want all your relevant rows within a partition of data sent to a single reducer you must implement a grouping comparator". This only ensures that that set of keys is sent to a single reduce call. It does not turn the composite key into one that contains only the part on which the grouping was done.
However, when you iterate over the values, the corresponding key changes as well. We normally do not notice this, because by default the values are grouped on the whole (non-composite) key, so even when the value changes, the value of the key stays the same.
You can verify this by printing the key inside the loop: its contents change with every iteration. Hadoop reuses a single key object and deserializes the key of each record into it as the value iterator advances, so it is the fields of the key that change rather than the object reference.
Alternatively, you can try applying a group comparator on an IntWritable in the following way (you will have to write your own logic to do so):
Group1:
1 a
1 b
2 c
Group2:
3 c
3 d
4 a
and you will see that with every value you iterate over, the key changes as well.
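The effect can be reproduced outside Hadoop with a key object that is refilled for every record. This is a simplified sketch, not the framework's code: KeyReuseSketch and MutableKey are hypothetical names, each record is assumed to be a {key1, key2, value} triple, and assigning the two fields stands in for the readFields() call that would refill the key.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyReuseSketch {
    // Hypothetical stand-in for a composite key such as CoupleAsKey.
    static final class MutableKey {
        long key1;
        long key2;
    }

    // All records are assumed to belong to one group (same key1).
    static List<String> iterateGroup(long[][] records) {
        MutableKey key = new MutableKey(); // created once and reused for every record
        List<String> observed = new ArrayList<>();
        for (long[] record : records) {
            key.key1 = record[0]; // stands in for readFields() refilling the key
            key.key2 = record[1];
            long value = record[2];
            observed.add(key.key1 + "," + key.key2 + " -> " + value);
        }
        return observed;
    }

    public static void main(String[] args) {
        long[][] group = { {1, 10, 100}, {1, 20, 200}, {1, 30, 300} };
        for (String line : iterateGroup(group)) {
            System.out.println(line);
        }
    }
}
```

If a key has to outlive the current iteration, for example to be stored in a list, copy its fields into a new object first, since the original will be overwritten on the next step.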

how to get the keys sorted by custom comparator in map-reduce job in Hadoop?

Consider this class (from Hadoop: The Definitive Guide, 3rd edition):
import java.io.*;
import org.apache.hadoop.io.*;
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString() {
return first + "\t" + second;
}
@Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
// ^^ TextPair
// vv TextPairComparator
public static class Comparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public Comparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
if (cmp != 0) {
return cmp;
}
return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
b2, s2 + firstL2, l2 - firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
}
static {
WritableComparator.define(TextPair.class, new Comparator());
}
// ^^ TextPairComparator
// vv TextPairFirstComparator
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
// ^^ TextPairFirstComparator
// vv TextPair
}
// ^^ TextPair
There are two kinds of comparators defined:
One sorts by first and then by second; this is the default comparator.
The other sorts by first ONLY; this is FirstComparator.
If I have to use FirstComparator for sorting my keys, how do I achieve that?
That is, how do I override the default comparator with the FirstComparator defined above?
Secondly, how would I unit test this, since the output of the map job is not sorted?
"If I have to use FirstComparator for sorting my keys, how do I achieve that? That is, how do I override the default comparator with the FirstComparator defined above?"
I assume you expect a method like setComparator(firstComparator). In the new API that method is Job.setSortComparatorClass() (JobConf.setOutputKeyComparatorClass() in the old API). If you do not set one, the keys are sorted on the mapper side using the comparator registered for the key class, which here orders keys the same way as the compareTo() of the Writable type representing the keys. In your case, compareTo() checks the first value and then the second one. In other words, the keys are sorted by the first value and then the keys in the same group (i.e. having the same first value) are sorted by their second value.
All in all, this means that your keys are always sorted by the first value, with the second value breaking ties. For sorting alone there is therefore usually no need for a different comparator (FirstComparator) that looks only at the first value, because that ordering is already achieved by the compareTo() method of your TextPair class.
On the other hand, if FirstComparator sorted the keys completely differently, you could either register it with setSortComparatorClass() or move its logic into the compareTo() method of the Writable class representing your key. If you already have FirstComparator and want to reuse it, you can instantiate it and invoke it in the compareTo() method of the TextPair Writable.
You might also want to take a look at the GroupingComparator which is used to decide which keys are used together in the same call of the reduce() method. Since you didn't describe exactly what you want to achieve, I can't say for sure if this will be helpful or not.
"Secondly, how would I unit test this, since the output of the map job is not sorted?"
Unit testing, as the name says, implies testing a single unit of code (most of the time a method/function/procedure). If you want to unit-test your reduce method you have to provide the interesting input cases and to check that the method under test outputs the expected result. More concretely, you have to create/mock a sorted Iterable over your keys and invoke your reduce function with it. Unit testing a reduce method shouldn't rely on the execution of the corresponding map method.
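One way to see how a full sort order and a first-only grouping order work together, and to check such comparators in isolation, is a small plain-Java simulation. The sketch below is an assumption-laden illustration rather than Hadoop code: SortThenGroupSketch is a hypothetical class, each key is modeled as a two-element String array {first, second}, and the two java.util.Comparator instances stand in for the default comparator and FirstComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortThenGroupSketch {
    // Full order: first, then second (what compareTo() in TextPair does).
    static final Comparator<String[]> FULL =
            Comparator.comparing((String[] p) -> p[0]).thenComparing(p -> p[1]);
    // Grouping order: first only (the role FirstComparator plays).
    static final Comparator<String[]> FIRST_ONLY =
            Comparator.comparing((String[] p) -> p[0]);

    // Sorts the keys with FULL, then starts a new group whenever FIRST_ONLY sees a change.
    static List<List<String>> sortAndGroup(List<String[]> keys) {
        List<String[]> sorted = new ArrayList<>(keys);
        sorted.sort(FULL);
        List<List<String>> groups = new ArrayList<>();
        String[] previous = null;
        for (String[] key : sorted) {
            if (previous == null || FIRST_ONLY.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key[0] + ":" + key[1]);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> keys = new ArrayList<>();
        keys.add(new String[] {"b", "2"});
        keys.add(new String[] {"a", "2"});
        keys.add(new String[] {"b", "1"});
        keys.add(new String[] {"a", "1"});
        System.out.println(sortAndGroup(keys)); // prints [[a:1, a:2], [b:1, b:2]]
    }
}
```

Each inner list corresponds to what one reduce() call would see: keys that are equal under the first-only comparison, still ordered by their second part.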

Hadoop Raw comparator

I am trying to implement the following in a RawComparator, but I'm not sure how to write it.
The timestamp field here is of type LongWritable.
if (this.getNaturalKey().compareTo(o.getNaturalKey()) != 0) {
return this.getNaturalKey().compareTo(o.getNaturalKey());
} else if (this.timeStamp != o.timeStamp) {
return timeStamp.compareTo(o.timeStamp);
} else {
return 0;
}
I found a hint here, but I'm not sure how to implement this when dealing with a LongWritable type.
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/serialization/id3548156
Thanks for your help
Let's say I have a CompositeKey that represents a pair (String stockSymbol, long timestamp).
We can do a primary grouping pass on the stockSymbol field to get all of the data of one type together, and then our "secondary sort" during the shuffle phase uses the timestamp long member to sort the time series points so that they arrive at the reducer partitioned and in sorted order.
public class CompositeKey implements WritableComparable<CompositeKey> {
// natural key is (stockSymbol)
// composite key is a pair (stockSymbol, timestamp)
private String stockSymbol;
private long timestamp;
// ... getters and setters omitted for clarity
@Override
public void readFields(DataInput in) throws IOException {
this.stockSymbol = in.readUTF();
this.timestamp = in.readLong();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(this.stockSymbol);
out.writeLong(this.timestamp);
}
@Override
public int compareTo(CompositeKey other) {
if (this.stockSymbol.compareTo(other.stockSymbol) != 0) {
return this.stockSymbol.compareTo(other.stockSymbol);
}
else if (this.timestamp != other.timestamp) {
return timestamp < other.timestamp ? -1 : 1;
}
else {
return 0;
}
}
}
Now the CompositeKey comparator would be:
public class CompositeKeyComparator extends WritableComparator {
protected CompositeKeyComparator() {
super(CompositeKey.class, true);
}
@Override
public int compare(WritableComparable wc1, WritableComparable wc2) {
CompositeKey ck1 = (CompositeKey) wc1;
CompositeKey ck2 = (CompositeKey) wc2;
int comparison = ck1.getStockSymbol().compareTo(ck2.getStockSymbol());
if (comparison == 0) {
// stock symbols are equal here
if (ck1.getTimestamp() == ck2.getTimestamp()) {
return 0;
}
else if (ck1.getTimestamp() < ck2.getTimestamp()) {
return -1;
}
else {
return 1;
}
}
else {
return comparison;
}
}
}
Are you asking how to compare the LongWritable type provided by Hadoop?
If so, the answer is to use the compare() method. For more details, scroll down here.
The best way to implement a RawComparator correctly is to extend WritableComparator and override the compare() method. WritableComparator is very well written, so it is easy to understand.
From what I see, this is already implemented in the LongWritable class:
/** A Comparator optimized for LongWritable. */
public static class Comparator extends WritableComparator {
public Comparator() {
super(LongWritable.class);
}
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
long thisValue = readLong(b1, s1);
long thatValue = readLong(b2, s2);
return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
}
}
That byte-level comparison is the override of the RawComparator compare() method.
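For the composite case in the question (natural key first, then a long timestamp), the same byte-level approach can be sketched in plain Java. Everything here is an assumption made for illustration: RawCompositeKeySketch is a hypothetical class, and the layout is taken to be a 2-byte length, the UTF-8 bytes of the symbol, then an 8-byte big-endian timestamp, which matches what writeUTF() followed by writeLong() produces for plain ASCII symbols. A real Hadoop comparator would extend WritableComparator and use its read helpers instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RawCompositeKeySketch {
    // Assumed layout: [2-byte length][symbol bytes][8-byte timestamp].
    static byte[] serialize(String symbol, long timestamp) {
        byte[] text = symbol.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 + text.length + 8)
                .putShort((short) text.length)
                .put(text)
                .putLong(timestamp)
                .array();
    }

    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        // Read the unsigned 2-byte length prefix of each symbol.
        int len1 = ((b1[s1] & 0xff) << 8) | (b1[s1 + 1] & 0xff);
        int len2 = ((b2[s2] & 0xff) << 8) | (b2[s2 + 1] & 0xff);
        // Natural key: compare the symbol bytes as unsigned values.
        int common = Math.min(len1, len2);
        for (int i = 0; i < common; i++) {
            int x = b1[s1 + 2 + i] & 0xff;
            int y = b2[s2 + 2 + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        if (len1 != len2) {
            return len1 < len2 ? -1 : 1;
        }
        // Symbols are equal: compare the timestamps that follow them.
        long t1 = ByteBuffer.wrap(b1, s1 + 2 + len1, 8).getLong();
        long t2 = ByteBuffer.wrap(b2, s2 + 2 + len2, 8).getLong();
        return Long.compare(t1, t2);
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(serialize("AAPL", 5L), 0, serialize("AAPL", 9L), 0)); // prints -1
    }
}
```

The timestamp is only read when the symbols are equal, mirroring the if/else chain in the question's compareTo() logic.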
