Hadoop RawComparator

I am trying to implement the following in a RawComparator but am not sure how to write it.
The timestamp field here is of type LongWritable.
if (this.getNaturalKey().compareTo(o.getNaturalKey()) != 0) {
return this.getNaturalKey().compareTo(o.getNaturalKey());
} else if (this.timeStamp != o.timeStamp) {
return timeStamp.compareTo(o.timeStamp);
} else {
return 0;
}
I found a hint here, but I am not sure how to implement this for a LongWritable type:
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/serialization/id3548156
Thanks for your help

Let's say I have a CompositeKey that represents a pair (String stockSymbol, long timestamp).
We can do a primary grouping pass on the stockSymbol field to get all of the data of one type together, and then our "secondary sort" during the shuffle phase uses the timestamp member to sort the time-series points so that they arrive at the reducer partitioned and in sorted order.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
public class CompositeKey implements WritableComparable<CompositeKey> {
// natural key is (stockSymbol)
// composite key is the pair (stockSymbol, timestamp)
private String stockSymbol;
private long timestamp;
// ... getters and setters omitted for clarity
@Override
public void readFields(DataInput in) throws IOException {
this.stockSymbol = in.readUTF();
this.timestamp = in.readLong();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(this.stockSymbol);
out.writeLong(this.timestamp);
}
@Override
public int compareTo(CompositeKey other) {
if (this.stockSymbol.compareTo(other.stockSymbol) != 0) {
return this.stockSymbol.compareTo(other.stockSymbol);
}
else if (this.timestamp != other.timestamp) {
return timestamp < other.timestamp ? -1 : 1;
}
else {
return 0;
}
}
}
Now the CompositeKey comparator would be:
public class CompositeKeyComparator extends WritableComparator {
protected CompositeKeyComparator() {
super(CompositeKey.class, true);
}
@Override
public int compare(WritableComparable wc1, WritableComparable wc2) {
CompositeKey ck1 = (CompositeKey) wc1;
CompositeKey ck2 = (CompositeKey) wc2;
int comparison = ck1.getStockSymbol().compareTo(ck2.getStockSymbol());
if (comparison == 0) {
// stock symbols are equal here
if (ck1.getTimestamp() == ck2.getTimestamp()) {
return 0;
}
else if (ck1.getTimestamp() < ck2.getTimestamp()) {
return -1;
}
else {
return 1;
}
}
else {
return comparison;
}
}
}
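To come back to the raw-bytes part of the original question (comparing the serialized key, including the long timestamp, without deserializing), a byte-level compare() can be layered on top of this. The following is a minimal sketch of my own, assuming the serialization above (writeUTF for stockSymbol followed by writeLong for timestamp) and ASCII stock symbols, so that comparing the UTF-8 bytes matches String.compareTo; the class name CompositeKeyRawComparator is illustrative:
import org.apache.hadoop.io.WritableComparator;

public class CompositeKeyRawComparator extends WritableComparator {

    protected CompositeKeyRawComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // writeUTF() stores a 2-byte length prefix followed by the symbol bytes
        int symLen1 = ((b1[s1] & 0xff) << 8) | (b1[s1 + 1] & 0xff);
        int symLen2 = ((b2[s2] & 0xff) << 8) | (b2[s2 + 1] & 0xff);

        // compare the stockSymbol bytes first (the natural key)
        int cmp = compareBytes(b1, s1 + 2, symLen1, b2, s2 + 2, symLen2);
        if (cmp != 0) {
            return cmp;
        }

        // the trailing 8 bytes are the timestamp written by writeLong()
        long ts1 = readLong(b1, s1 + 2 + symLen1);
        long ts2 = readLong(b2, s2 + 2 + symLen2);
        return Long.compare(ts1, ts2);
    }
}
compareBytes() and readLong() are static helpers provided by WritableComparator, so no deserialization into CompositeKey objects is needed.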

Are you asking about a way to compare the LongWritable type provided by Hadoop?
If yes, then the answer is to use the compare() method. For more details, see below.

The best way to correctly implement RawComparator is to extend WritableComparator and override its compare() method. WritableComparator is well written, so you can easily understand it.

From what I see, it is already implemented in the LongWritable class:
/** A Comparator optimized for LongWritable. */
public static class Comparator extends WritableComparator {
public Comparator() {
super(LongWritable.class);
}
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
long thisValue = readLong(b1, s1);
long thatValue = readLong(b2, s2);
return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
}
}
That byte comparison is the RawComparator override.
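As a usage note (my own addition, not from the original answers): once a comparator extending WritableComparator exists, it is typically hooked up in one of two ways, either registered for the key class or set explicitly on the job; CompositeKeyComparator here refers to the class from the answer above:
// registered globally for the key type (usually in a static block of the key class)
WritableComparator.define(CompositeKey.class, new CompositeKeyComparator());

// or set explicitly in the driver (org.apache.hadoop.mapreduce.Job)
Job job = Job.getInstance(new Configuration(), "secondary sort");
job.setSortComparatorClass(CompositeKeyComparator.class);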

Related

LeetCode 155. Min Stack

I am trying to solve the problem using extra space. In the pop() function, when I compare the tops of both stacks inside the if condition, the following test case fails:
["MinStack","push","push","push","push","pop","getMin","pop","getMin","pop","getMin"]\ [[],[512],[-1024],[-1024],[512],[],[],[],[],[],[]]
When I store the top of the first stack and then compare it with the top of the second stack, all the test cases pass.
Can someone please help me understand what is causing this?
The below code caused the test case to fail.
class MinStack {
Stack<Integer> s;
Stack<Integer> auxStack;
public MinStack() {
s = new Stack<Integer>();
auxStack = new Stack<Integer>();
}
public void push(int val) {
this.s.push(val);
if (this.auxStack.empty() || val <= this.auxStack.peek()) {
this.auxStack.push(val);
}
}
public void pop() {
if (this.s.peek() == this.auxStack.peek()) {
this.auxStack.pop();
}
this.s.pop();
}
public int top() {
return this.s.peek();
}
public int getMin() {
return this.auxStack.peek();
}
}
The below code worked for all the test cases.
class MinStack {
Stack<Integer> s;
Stack<Integer> auxStack;
public MinStack() {
s = new Stack<Integer>();
auxStack = new Stack<Integer>();
}
public void push(int val) {
this.s.push(val);
if (this.auxStack.empty() || val <= this.auxStack.peek()) {
this.auxStack.push(val);
}
}
public void pop() {
int ans = this.s.pop();
if (ans == this.auxStack.peek()) {
this.auxStack.pop();
}
}
public int top() {
return this.s.peek();
}
public int getMin() {
return this.auxStack.peek();
}
}
The problem is that you are comparing Integer objects, not int values. The data type stored on the stack is Integer, so peek() returns an Integer, not an int, which means the following comparison checks reference equality; for values outside the small-integer cache (-128 to 127, e.g. the 512 and -1024 in the failing test case), numerically equal values are boxed into distinct objects, so it is false even when the tops match:
this.s.peek() == this.auxStack.peek()
Fix this by explicitly converting at least one of those two Integer objects to an int:
this.s.peek().intValue() == this.auxStack.peek()
Or use the equals method:
this.s.peek().equals(this.auxStack.peek())
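A tiny standalone demonstration of that boxing behaviour (the -128 to 127 threshold comes from Java's Integer cache used by autoboxing):
public class BoxingDemo {
    public static void main(String[] args) {
        Integer a = 512, b = 512;
        System.out.println(a == b);        // false: distinct objects, reference comparison
        System.out.println(a.equals(b));   // true: value comparison

        Integer c = 100, d = 100;
        System.out.println(c == d);        // true: both come from the Integer cache
    }
}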

get average value from a tree of nodes

I have to implement this method:
public int GetAverage(Node root){
//TODO implement
}
This method should return the average value of all nodes of the tree rooted at root, where:
public interface Node {
int getValue();
List<Node> getNodes();
}
Do you have any ideas how to implement this method?
Thank you.
My attempt:
public static double value;
public static int count;
public static double getAverage(Node root) {
count++;
value += root.getValue();
for (Node node : root.getNodes()) {
getAverage(node);
}
return value / count;
}
But how can I do this without the static fields outside of the method?
Simply traverse all nodes, remembering the count and the overall sum of the values, then calculate the average. Here is an example written in Java.
public interface INode {
int getValue();
List<INode> getNodes();
}
public class Node implements INode {
private List<INode> children = new ArrayList<INode>();
private int value;
@Override
public int getValue() {
return value;
}
@Override
public List<INode> getNodes() {
return children;
}
public static int getAverage(INode root) {
if (root == null)
return 0;
Counter c = new Counter();
calculateAverage(root, c);
return c.sum / c.count;
}
static class Counter {
public int sum;
public int count;
}
private static void calculateAverage(INode root, Counter counter) {
if (root == null)
return;
counter.sum += root.getValue();
counter.count++;
// recursively through all children
for (INode child : root.getNodes())
calculateAverage(child, counter);
}
}
public static double getAverage(Node root) {
Pair p = new Pair(0,0);
algo(root, p);
return ((double) p.sum) / p.nbNodes;
}
private static void algo(Node root, Pair acc) {
for(Node child : root.getNodes()) {
algo(child, acc);
}
acc.sum += root.getValue();
acc.nbNodes++;
}
With Pair defined as follows:
public class Pair {
public int sum;
public int nbNodes;
public Pair(int elt1, int elt2) {
this.sum = elt1;
this.nbNodes = elt2;
}
}
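Another way to avoid both static fields and a mutable accumulator class is an explicit work list. A minimal sketch against the Node interface from the question (it also avoids deep recursion on large trees); the class name TreeAverage is my own:
import java.util.ArrayDeque;
import java.util.Deque;

public final class TreeAverage {

    public static double getAverage(Node root) {
        if (root == null) {
            return 0;
        }
        long sum = 0;
        long count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            sum += current.getValue();
            count++;
            for (Node child : current.getNodes()) {
                stack.push(child);
            }
        }
        return (double) sum / count;
    }
}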

How to get the keys sorted by a custom comparator in a map-reduce job in Hadoop?

Consider this class (from Hadoop: The Definitive Guide, 3rd edition):
import java.io.*;
import org.apache.hadoop.io.*;
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
@Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
@Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString() {
return first + "\t" + second;
}
@Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
// ^^ TextPair
// vv TextPairComparator
public static class Comparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public Comparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
if (cmp != 0) {
return cmp;
}
return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
b2, s2 + firstL2, l2 - firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
}
static {
WritableComparator.define(TextPair.class, new Comparator());
}
// ^^ TextPairComparator
// vv TextPairFirstComparator
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
@Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
@Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
// ^^ TextPairFirstComparator
// vv TextPair
}
// ^^ TextPair
There are two kinds of comparators defined:
one sorts by first followed by second, which is the default comparator;
the other sorts by first ONLY, which is the FirstComparator.
If I have to use FirstComparator for sorting my keys, how do I achieve that?
That is, how do I override my default comparator with the FirstComparator I defined above?
Secondly, how would I unit-test this, since the output of the map job is not sorted?
If I have to use FirstComparator for sorting my keys, how do I achieve that? That is, how do I override my default comparator with the FirstComparator I defined above?
I assume you expect a method something like setComparator(firstComparator). As far as I know there is no such method. The keys are sorted (on the mapper side) using the compareTo() of the Writable type representing the keys. In your case, the compareTo() method checks the first value and then the second one. In other words, the keys will be sorted by the first value and, then, the keys in the same group (i.e. having the same first value) will be sorted by their second value.
All in all, this means that your keys will always be sorted by the first value (plus by the second value if the first one can't decide). Which in turn means that there is no need for a different comparator (firstComparator) that looks only at the first value, because that is already achieved with the compareTo() method of your TextPair class.
On the other hand, if the firstComparator sorts the keys completely differently, the only solution is to move the logic in firstComparator into the compareTo() method of the Writable class representing your key. I don't see any reason why you wouldn't do that. If you already have the firstComparator and want to reuse it, you can instantiate it and invoke it in the compareTo() method of the TextPair Writable.
You might also want to take a look at the GroupingComparator which is used to decide which keys are used together in the same call of the reduce() method. Since you didn't describe exactly what you want to achieve, I can't say for sure if this will be helpful or not.
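If it helps, here is a sketch (my own, not from the question) of how the two hooks mentioned above are typically wired in the driver; to my knowledge the org.apache.hadoop.mapreduce.Job API exposes both a sort comparator and a grouping comparator, and TextPair.Comparator / TextPair.FirstComparator are the classes from the question:
Job job = Job.getInstance(new Configuration(), "textpair secondary sort");
job.setMapOutputKeyClass(TextPair.class);

// full ordering during the shuffle sort: first, then second
job.setSortComparatorClass(TextPair.Comparator.class);

// group values into a single reduce() call by the first field only
job.setGroupingComparatorClass(TextPair.FirstComparator.class);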
Secondly, how would I unit-test this, since the output of the map job is not sorted?
Unit testing, as the name says, implies testing a single unit of code (most of the time a method/function/procedure). If you want to unit-test your reduce method you have to provide the interesting input cases and to check that the method under test outputs the expected result. More concretely, you have to create/mock a sorted Iterable over your keys and invoke your reduce function with it. Unit testing a reduce method shouldn't rely on the execution of the corresponding map method.
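As an illustration only: the MRUnit library (an assumption on my part, it is not mentioned in the question) wraps exactly this "build a sorted input, invoke reduce(), check the output" idea; MyReducer below is a placeholder for your own Reducer<Text, IntWritable, Text, IntWritable>:
import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class MyReducerTest {

    @Test
    public void reducesOneGroup() throws Exception {
        ReduceDriver<Text, IntWritable, Text, IntWritable> driver =
                ReduceDriver.newReduceDriver(new MyReducer());

        // one key group with its values, and the single expected output record
        driver.withInput(new Text("key"),
                         Arrays.asList(new IntWritable(1), new IntWritable(2)))
              .withOutput(new Text("key"), new IntWritable(3))
              .runTest();
    }
}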

Storm Trident 'average' aggregator

I am a newbie to Trident and I'm looking to create an 'Average' aggregator similar to Sum(), but for averages. The following does not work:
public class Average implements CombinerAggregator<Long>.......{
public Long init(TridentTuple tuple)
{
(Long)tuple.getValue(0);
}
public Long Combine(long val1,long val2){
return val1+val2/2;
}
public Long zero(){
return 0L;
}
}
It may not be exactly syntactically correct, but that's the idea. Please help if you can. Given 2 tuples with values [2,4,1] and [2,2,5] and fields 'a', 'b' and 'c', doing an average on field 'b' should return 3. I'm not entirely sure how init() and zero() work.
Thank you so much for your help in advance.
Eli
public class Average implements CombinerAggregator<Number> {
int count = 0;
double sum = 0;
@Override
public Double init(final TridentTuple tuple) {
this.count++;
if (!(tuple.getValue(0) instanceof Double)) {
double d = ((Number) tuple.getValue(0)).doubleValue();
this.sum += d;
return d;
}
this.sum += (Double) tuple.getValue(0);
return (Double) tuple.getValue(0);
}
@Override
public Double combine(final Number val1, final Number val2) {
return this.sum / this.count;
}
@Override
public Double zero() {
this.sum = 0;
this.count = 0;
return 0D;
}
}
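One caveat worth noting: keeping sum and count as instance state inside a CombinerAggregator can give surprising results, because init() and combine() may run on different tasks. A common alternative is to carry a (sum, count) pair through combine() and divide only at the end. A minimal sketch of that idea, assuming the storm.trident.operation.CombinerAggregator interface (use the org.apache.storm prefix on Storm 1.x and later); the names Avg and SumCount are mine:
import java.io.Serializable;

import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

public class Avg implements CombinerAggregator<Avg.SumCount> {

    public static class SumCount implements Serializable {
        public final double sum;
        public final long count;

        public SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        public double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    @Override
    public SumCount init(TridentTuple tuple) {
        // one tuple contributes its value and a count of 1
        return new SumCount(((Number) tuple.getValue(0)).doubleValue(), 1);
    }

    @Override
    public SumCount combine(SumCount a, SumCount b) {
        // partial results merge by adding sums and counts
        return new SumCount(a.sum + b.sum, a.count + b.count);
    }

    @Override
    public SumCount zero() {
        return new SumCount(0.0, 0);
    }
}
The aggregator emits the SumCount pair; the final division (average()) is then done downstream, for example in an each() that projects the pair to a plain double.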
I am a complete newbie when it comes to Trident as well, so I'm not entirely sure if the following will work. But it might:
public class AvgAgg extends BaseAggregator<AvgState> {
static class AvgState {
long count = 0;
long total = 0;
double getAverage() {
return (double) total / count;
}
}
public AvgState init(Object batchId, TridentCollector collector) {
return new AvgState();
}
public void aggregate(AvgState state, TridentTuple tuple, TridentCollector collector) {
state.count++;
state.total += tuple.getLong(0);
}
public void complete(AvgState state, TridentCollector collector) {
collector.emit(new Values(state.getAverage()));
}
}

Using a custom Object as key emitted by mapper

I have a situation in which the mapper emits as its key an object of a custom type.
It has two fields: an IntWritable ID, and a data array IntArrayWritable.
The implementation is as follows.
import java.io.*;
import org.apache.hadoop.io.*;
public class PairDocIdPerm implements WritableComparable<PairDocIdPerm> {
public PairDocIdPerm(){
this.permId = new IntWritable(-1);
this.SignaturePerm = new IntArrayWritable();
}
public IntWritable getPermId() {
return permId;
}
public void setPermId(IntWritable permId) {
this.permId = permId;
}
public IntArrayWritable getSignaturePerm() {
return SignaturePerm;
}
public void setSignaturePerm(IntArrayWritable signaturePerm) {
SignaturePerm = signaturePerm;
}
private IntWritable permId;
private IntArrayWritable SignaturePerm;
public PairDocIdPerm(IntWritable permId,IntArrayWritable SignaturePerm) {
this.permId = permId;
this.SignaturePerm = SignaturePerm;
}
@Override
public void write(DataOutput out) throws IOException {
permId.write(out);
SignaturePerm.write(out);
}
@Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
@Override
public int hashCode() { // same permId must go to the same reducer, therefore hash on permId only
return permId.get();//.hashCode();
}
@Override
public boolean equals(Object o) {
if (o instanceof PairDocIdPerm) {
PairDocIdPerm tp = (PairDocIdPerm) o;
return permId.equals(tp.permId) && SignaturePerm.equals(tp.SignaturePerm);
}
return false;
}
@Override
public String toString() {
return permId + "\t" +SignaturePerm.toString();
}
@Override
public int compareTo(PairDocIdPerm tp) {
int cmp = permId.compareTo(tp.permId);
Writable[] ar, other;
ar = this.SignaturePerm.get();
other = tp.SignaturePerm.get();
if (cmp == 0) {
for (int i = 0; i < ar.length; i++) {
if (((IntWritable) ar[i]).get() == ((IntWritable) other[i]).get()) { cmp = 0; continue; }
else if (((IntWritable) ar[i]).get() < ((IntWritable) other[i]).get()) { return -1; }
else if (((IntWritable) ar[i]).get() > ((IntWritable) other[i]).get()) { return 1; }
}
}
return cmp;
//return 1;
}
}
I require keys with the same ID to go to the same reducer, with their sort order as coded in the compareTo method.
However, when I use this, my job execution status is always map 100% reduce 0%.
The reduce never runs to completion. Is there anything wrong in this implementation?
In general, what is the likely problem if the reducer status is always 0%?
I think this might be a possible NullPointerException in the read method:
@Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
permId is null in this case.
So what you have to do is this:
IntWritable permId = new IntWritable();
Either in the field initializer or before the read.
However, your code is horrible to read.
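A minimal sketch of that suggestion, initializing both Writable fields where they are declared so readFields() can never hit a null (field names copied from the question, everything else unchanged):
private IntWritable permId = new IntWritable();
private IntArrayWritable SignaturePerm = new IntArrayWritable();

@Override
public void readFields(DataInput in) throws IOException {
    // both fields are guaranteed to be non-null here
    permId.readFields(in);
    SignaturePerm.readFields(in);
}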
