I'm trying to understand/learn Consumer/BiConsumer in Java8.
test1 and test2 methods are working fine.
But if I tried to use the old fashion way by implementing BiConsumer in a class in test3 method.
And then override accept method in the class, str.substring method cannot resolve the method substring.
Can't I use the old fashion way in #FunctionalInterface or did I do something wrong in the code?
public class BiConsumerTest {
static void test1(String name, Integer count) {
method1(name, count, (str, i) -> {
System.out.println(str.substring(i));
});
}
static void test2(String name, Integer count) {
BiConsumer<String, Integer> consumer = (str, i) -> {
System.out.println(str.substring(i));
};
method1(name, count, consumer);
}
private static void method1(String name, Integer count, BiConsumer<String, Integer> consumer) {
consumer.accept(name, count);
}
private void test3(String name, Integer count) {
BiConsumer<String, Integer> consumer = new ConsumerImpl<String, Integer>();
consumer.accept(name, count);
}
class ConsumerImpl<String, Integer> implements BiConsumer<String, Integer> {
#Override
public void accept(String str, Integer count) {
str.substring(count); // str cannot find substring method !!!
}
}
public static void main(String[] args) {
String name = "aaa bbb ccc";
Integer count = 6;
test1(name, count);
test2(name, count);
}
}
You cannot define a class with known types as type-parameters (so this is incorrect - ConsumerImpl<String, Integer>). Plus there were few other syntactical mistakes. Below works -
import java.util.function.BiConsumer;
public class TestClass {
static void test1(String name, Integer count) {
method1(name, count, (str, i) -> {
System.out.println(str.substring(i));
});
}
static void test2(String name, Integer count) {
BiConsumer<String, Integer> consumer = (str, i) -> {
System.out.println(str.substring(i));
};
method1(name, count, consumer);
}
private static void method1(String name, Integer count, BiConsumer<String, Integer> consumer) {
consumer.accept(name, count);
}
static void test3(String name, Integer count) {
BiConsumer<String, Integer> consumer = new ConsumerImpl();
consumer.accept(name, count);
}
static class ConsumerImpl implements BiConsumer<String, Integer> {
#Override
public void accept(String str, Integer count) {
System.out.println(str.substring(count)); // str cannot find substring method !!!
}
}
public static void main(String[] args) {
String name = "aaa bbb ccc";
Integer count = 6;
test1(name, count);
test2(name, count);
test3(name, count);
}
}
Related
I'm completely new in Hadoop Framework and I want to write a "MapReduce" program (HadoopJoin.java) that joins on x attribute between two tables R and S. The structure of the two tables is :
R (tag : char, x : int, y : varchar(30))
and
S (tag : char, x : int, z : varchar(30))
For example we have for R table :
r 10 r-10-0
r 11 r-11-0
r 12 r-12-0
r 21 r-21-0
And for S table :
s 11 s-11-0
s 21 s-41-0
s 21 s-41-1
s 12 s-31-0
s 11 s-31-1
The result should look like :
r 11 r-11-0 s 11 s-11-0
etc.
Can anyone help me please ?
It will be very difficult to describe join in mapreduce for someone who is new to this Framework but here I provide a working implementation for your situation and I definitely recommend you to read section 9 of Hadoop The Definitive Guide 4th Eddition. It has described how to implement Join in mapreduce very well.
First of all you might consider using higher level frameworks such as Pig, Hive and Spark because they provide join operation in their core part of implementation.
Secondly There are many ways to implement mapreduce depending of the nature of your data. This ways include map-side join and reduce-side join. In this answer I have implemented the reduce-side join:
Implementation:
First of all we should have two different mapper for two different datset notice that in your case same mapper can be used for two dataset but in many situation you need different mappers for different dataset and because of that I have defined two mappers to make this solution more general:
I have used TextPair that have two attributes, one of them is the key that is used to join data and the other one is a tag that specify which dataset this record belongs to. If it belongs to the first dataset this tag will be 0. otherwise it will be 1.
I have implemented TextPair.FirstComparator to ensure that for each key(join by key) the record of the first dataset is the first key which is received by reducer. And all the other records in second dataset with that id are received after that. Actually this line of code will do the trick for us:
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
So in reducer the first record that we will receive is the record from dataset1 and after that we receive record from dataset2. The only thing that should be done is that we have to write those records.
Mapper for dataset1:
public class JoinDataSet1Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
#Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "0"), value);
}
}
Mapper for DataSet2:
public class JoinDataSet2Mapper
extends Mapper<LongWritable, Text, TextPair, Text> {
#Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] data = value.toString().split(" ");
context.write(new TextPair(data[1], "1"), value);
}
}
Reducer:
public class JoinReducer extends Reducer<TextPair, Text, NullWritable, Text> {
public static class KeyPartitioner extends Partitioner<TextPair, Text> {
#Override
public int getPartition(TextPair key, Text value, int numPartitions) {
return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
#Override
protected void reduce(TextPair key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
Iterator<Text> iter = values.iterator();
Text stationName = new Text(iter.next());
while (iter.hasNext()) {
Text record = iter.next();
Text outValue = new Text(stationName.toString() + "\t" + record.toString());
context.write(NullWritable.get(), outValue);
}
}
}
Custom key:
public class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair() {
set(new Text(), new Text());
}
public TextPair(String first, String second) {
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second) {
set(first, second);
}
public void set(Text first, Text second) {
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public Text getSecond() {
return second;
}
#Override
public void write(DataOutput out) throws IOException {
first.write(out);
second.write(out);
}
#Override
public void readFields(DataInput in) throws IOException {
first.readFields(in);
second.readFields(in);
}
#Override
public int hashCode() {
return first.hashCode() * 163 + second.hashCode();
}
#Override
public boolean equals(Object o) {
if (o instanceof TextPair) {
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
#Override
public String toString() {
return first + "\t" + second;
}
#Override
public int compareTo(TextPair tp) {
int cmp = first.compareTo(tp.first);
if (cmp != 0) {
return cmp;
}
return second.compareTo(tp.second);
}
public static class FirstComparator extends WritableComparator {
private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
public FirstComparator() {
super(TextPair.class);
}
#Override
public int compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2) {
try {
int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
#Override
public int compare(WritableComparable a, WritableComparable b) {
if (a instanceof TextPair && b instanceof TextPair) {
return ((TextPair) a).first.compareTo(((TextPair) b).first);
}
return super.compare(a, b);
}
}
}
JobDriver:
public class JoinJob extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(getConf(), "Join two DataSet");
job.setJarByClass(getClass());
Path ncdcInputPath = new Path(getConf().get("job.input1.path"));
Path stationInputPath = new Path(getConf().get("job.input2.path"));
Path outputPath = new Path(getConf().get("job.output.path"));
MultipleInputs.addInputPath(job, ncdcInputPath,
TextInputFormat.class, JoinDataSet1Mapper.class);
MultipleInputs.addInputPath(job, stationInputPath,
TextInputFormat.class, JoinDataSet2Mapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
job.setPartitionerClass(JoinReducer.KeyPartitioner.class);
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
job.setMapOutputKeyClass(TextPair.class);
job.setReducerClass(JoinReducer.class);
job.setOutputKeyClass(Text.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new JoinJob(), args);
System.exit(exitCode);
}
}
I am learning Flink and I started with a simple word count using DataStream. To enhance the processing I filtered the output to show only the results with 3 or more words found.
DataStream<Tuple2<String, Integer>> dataStream = env
.socketTextStream("localhost", 9000)
.flatMap(new Splitter())
.keyBy(0)
.timeWindow(Time.seconds(5))
.apply(new MyWindowFunction())
.sum(1)
.filter(word -> word.f1 >= 3);
I would like to create a WindowFunction to sort the output by the value of words found. The WindowFunction that I am trying to implement does not compile at all. I am struggling to define the apply method and the parameters of the WindowFunction interface.
public static class MyWindowFunction implements WindowFunction<
Tuple2<String, Integer>, // input type
Tuple2<String, Integer>, // output type
Tuple2<String, Integer>, // key type
TimeWindow> {
void apply(Tuple2<String, Integer> key, TimeWindow window, Iterable<Tuple2<String, Integer>> input, Collector<Tuple2<String, Integer>> out) {
String word = ((Tuple2<String, Integer>)key).f0;
Integer count = ((Tuple2<String, Integer>)key).f1;
.........
out.collect(new Tuple2<>(word, count));
}
}
I am updating this answer to use Flink 1.12.0. In order to sort the elements of a stream in I had to use a KeyedProcessFunction after counting the stream with a ReduceFunction. Then I had to set the parallelism of the very last transformation to 1 in order to not change the order of the elements that I sorted using KeyedProcessFunction. The sequence that I am using is socketTextStream -> flatMap -> keyBy -> reduce -> keyBy -> process -> print().setParallelism(1). Bellow it the example:
public class SocketWindowWordCountJava {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.socketTextStream("localhost", 9000)
.flatMap(new SplitterFlatMap())
.keyBy(new WordKeySelector())
.reduce(new SumReducer())
.keyBy(new WordKeySelector())
.process(new SortKeyedProcessFunction(3 * 1000))
.print().setParallelism(1);
String executionPlan = env.getExecutionPlan();
System.out.println("ExecutionPlan ........................ ");
System.out.println(executionPlan);
System.out.println("........................ ");
env.execute("Window WordCount sorted");
}
}
The UDF that I used to sort the stream is the SortKeyedProcessFunction which extends KeyedProcessFunction. I use a ValueState<List<Event>> listState of Event implements Comparable<Event> to have a sorted list as state. On the processElement method I register the time stamp that I added the event to the state context.timerService().registerProcessingTimeTimer(timeoutTime); and I collect the event at the onTimer method. I am also using a time window of 3 seconds here.
public class SortKeyedProcessFunction extends KeyedProcessFunction<String, Tuple2<String, Integer>, Event> {
private static final long serialVersionUID = 7289761960983988878L;
// delay after which an alert flag is thrown
private final long timeOut;
// state to remember the last timer set
private ValueState<List<Event>> listState = null;
private ValueState<Long> lastTime = null;
public SortKeyedProcessFunction(long timeOut) {
this.timeOut = timeOut;
}
#Override
public void open(Configuration conf) {
// setup timer and HLL state
ValueStateDescriptor<List<Event>> descriptor = new ValueStateDescriptor<>(
// state name
"sorted-events",
// type information of state
TypeInformation.of(new TypeHint<List<Event>>() {
}));
listState = getRuntimeContext().getState(descriptor);
ValueStateDescriptor<Long> descriptorLastTime = new ValueStateDescriptor<Long>(
"lastTime",
TypeInformation.of(new TypeHint<Long>() {
}));
lastTime = getRuntimeContext().getState(descriptorLastTime);
}
#Override
public void processElement(Tuple2<String, Integer> value, Context context, Collector<Event> collector) throws Exception {
// get current time and compute timeout time
long currentTime = context.timerService().currentProcessingTime();
long timeoutTime = currentTime + timeOut;
// register timer for timeout time
context.timerService().registerProcessingTimeTimer(timeoutTime);
List<Event> queue = listState.value();
if (queue == null) {
queue = new ArrayList<Event>();
}
Long current = lastTime.value();
queue.add(new Event(value.f0, value.f1));
lastTime.update(timeoutTime);
listState.update(queue);
}
#Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
// System.out.println("onTimer: " + timestamp);
// check if this was the last timer we registered
System.out.println("timestamp: " + timestamp);
List<Event> queue = listState.value();
Long current = lastTime.value();
if (timestamp == current.longValue()) {
Collections.sort(queue);
queue.forEach( e -> {
out.collect(e);
});
queue.clear();
listState.clear();
}
}
}
class Event implements Comparable<Event> {
String value;
Integer qtd;
public Event(String value, Integer qtd) {
this.value = value;
this.qtd = qtd;
}
public String getValue() { return value; }
public Integer getQtd() { return qtd; }
#Override
public String toString() {
return "Event{" +"value='" + value + '\'' +", qtd=" + qtd +'}';
}
#Override
public int compareTo(#NotNull Event event) {
return this.getValue().compareTo(event.getValue());
}
}
So when I use $ nc -lk 9000 and type the words on the console I see them in order on the output
...
Event{value='soccer', qtd=7}
Event{value='swim', qtd=5}
...
Event{value='basketball', qtd=9}
Event{value='soccer', qtd=8}
Event{value='swim', qtd=6}
The other UDFs are for the other transformations of the stream program and they are here for completeness.
public class SplitterFlatMap implements FlatMapFunction<String, Tuple2<String, Integer>> {
private static final long serialVersionUID = 3121588720675797629L;
#Override
public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
for (String word : sentence.split(" ")) {
out.collect(Tuple2.of(word, 1));
}
}
}
public class WordKeySelector implements KeySelector<Tuple2<String, Integer>, String> {
#Override
public String getKey(Tuple2<String, Integer> value) throws Exception {
return value.f0;
}
}
public class SumReducer implements ReduceFunction<Tuple2<String, Integer>> {
#Override
public Tuple2<String, Integer> reduce(Tuple2<String, Integer> event1, Tuple2<String, Integer> event2) throws Exception {
return Tuple2.of(event1.f0, event1.f1 + event2.f1);
}
}
The .sum(1) method will do everything you need (no need for using apply()), as long as the Splitter class (which should be a FlatMapFunction) is emitting Tuple2<String, Integer> records, where String is the word, and Integer is always 1.
So then .sum(1) will do the aggregation for you. If you needed something different than what sum() does, you would typically use .reduce(new MyCustomReduceFunction()), as that's going to be the most efficient and scalable approach, in terms of not needing to buffer lots in memory.
No matter how simple I make the compareTo of my complex key, I don't get expected results. With the exception of if I use one key that is the same for every record, it will appropriately reduce to one record. I've also witnessed that this happens only when I process the full load, if I break off a few of the records that didn't reduce and run it on a much smaller scale those records get combined.
The sum of the output records is correct, but there is duplication at the record level of items I would have expected to group together. So where I would expect say 500 records summing up to 5,000, I end up with 1232 records summing up to 5,000 with obvious records that should have been reduced into one.
I've read about the problems with object references and complex keys and values, but I don't see anywhere that I have potential for that left. To that end you will find places that I'm creating new objects that I probably don't need to, but I'm trying everything at this point and will dial it back once it is working.
I'm out of ideas on what to try or where and how to poke to figure this out. Please help!
public static class Map extends
Mapper<LongWritable, Text, IMSTranOut, IMSTranSums> {
//private SimpleDateFormat dtFormat = new SimpleDateFormat("yyyyddd");
#Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
SimpleDateFormat dtFormat = new SimpleDateFormat("yyyyddd");
IMSTranOut dbKey = new IMSTranOut();
IMSTranSums sumVals = new IMSTranSums();
String[] tokens = line.split(",", -1);
dbKey.setLoadKey(-99);
dbKey.setTranClassKey(-99);
dbKey.setTransactionCode(tokens[0]);
dbKey.setTransactionType(tokens[1]);
dbKey.setNpaNxx(getNPA(dbKey.getTransactionCode()));
try {
dbKey.setTranDate(new Date(dtFormat.parse(tokens[2]).getTime()));
} catch (ParseException e) {
}// 2
dbKey.setTranHour(getTranHour(tokens[3]));
try {
dbKey.setStartDate(new Date(dtFormat.parse(tokens[4]).getTime()));
} catch (ParseException e) {
}// 4
dbKey.setStartHour(getTranHour(tokens[5]));
try {
dbKey.setStopDate(new Date(dtFormat.parse(tokens[6]).getTime()));
} catch (ParseException e) {
}// 6
dbKey.setStopHour(getTranHour(tokens[7]));
sumVals.setTranCount(1);
sumVals.setInputQTime(Double.parseDouble(tokens[8]));
sumVals.setElapsedTime(Double.parseDouble(tokens[9]));
sumVals.setCpuTime(Double.parseDouble(tokens[10]));
context.write(dbKey, sumVals);
}
}
public static class Reduce extends
Reducer<IMSTranOut, IMSTranSums, IMSTranOut, IMSTranSums> {
#Override
public void reduce(IMSTranOut key, Iterable<IMSTranSums> values,
Context context) throws IOException, InterruptedException {
int tranCount = 0;
double inputQ = 0;
double elapsed = 0;
double cpu = 0;
for (IMSTranSums val : values) {
tranCount += val.getTranCount();
inputQ += val.getInputQTime();
elapsed += val.getElapsedTime();
cpu += val.getCpuTime();
}
IMSTranSums sumVals=new IMSTranSums();
IMSTranOut dbKey=new IMSTranOut();
sumVals.setCpuTime(inputQ);
sumVals.setElapsedTime(elapsed);
sumVals.setInputQTime(cpu);
sumVals.setTranCount(tranCount);
dbKey.setLoadKey(key.getLoadKey());
dbKey.setTranClassKey(key.getTranClassKey());
dbKey.setNpaNxx(key.getNpaNxx());
dbKey.setTransactionCode(key.getTransactionCode());
dbKey.setTransactionType(key.getTransactionType());
dbKey.setTranDate(key.getTranDate());
dbKey.setTranHour(key.getTranHour());
dbKey.setStartDate(key.getStartDate());
dbKey.setStartHour(key.getStartHour());
dbKey.setStopDate(key.getStopDate());
dbKey.setStopHour(key.getStopHour());
dbKey.setInputQTime(inputQ);
dbKey.setElapsedTime(elapsed);
dbKey.setCpuTime(cpu);
dbKey.setTranCount(tranCount);
context.write(dbKey, sumVals);
}
}
Here is the implementation of the DBWritable class:
public class IMSTranOut implements DBWritable,
WritableComparable<IMSTranOut> {
private int loadKey;
private int tranClassKey;
private String npaNxx;
private String transactionCode;
private String transactionType;
private Date tranDate;
private double tranHour;
private Date startDate;
private double startHour;
private Date stopDate;
private double stopHour;
private double inputQTime;
private double elapsedTime;
private double cpuTime;
private int tranCount;
public void readFields(ResultSet rs) throws SQLException {
setLoadKey(rs.getInt("LOAD_KEY"));
setTranClassKey(rs.getInt("TRAN_CLASS_KEY"));
setNpaNxx(rs.getString("NPA_NXX"));
setTransactionCode(rs.getString("TRANSACTION_CODE"));
setTransactionType(rs.getString("TRANSACTION_TYPE"));
setTranDate(rs.getDate("TRAN_DATE"));
setTranHour(rs.getInt("TRAN_HOUR"));
setStartDate(rs.getDate("START_DATE"));
setStartHour(rs.getInt("START_HOUR"));
setStopDate(rs.getDate("STOP_DATE"));
setStopHour(rs.getInt("STOP_HOUR"));
setInputQTime(rs.getInt("INPUT_Q_TIME"));
setElapsedTime(rs.getInt("ELAPSED_TIME"));
setCpuTime(rs.getInt("CPU_TIME"));
setTranCount(rs.getInt("TRAN_COUNT"));
}
public void write(PreparedStatement ps) throws SQLException {
ps.setInt(1, loadKey);
ps.setInt(2, tranClassKey);
ps.setString(3, npaNxx);
ps.setString(4, transactionCode);
ps.setString(5, transactionType);
ps.setDate(6, tranDate);
ps.setDouble(7, tranHour);
ps.setDate(8, startDate);
ps.setDouble(9, startHour);
ps.setDate(10, stopDate);
ps.setDouble(11, stopHour);
ps.setDouble(12, inputQTime);
ps.setDouble(13, elapsedTime);
ps.setDouble(14, cpuTime);
ps.setInt(15, tranCount);
}
public int getLoadKey() {
return loadKey;
}
public void setLoadKey(int loadKey) {
this.loadKey = loadKey;
}
public int getTranClassKey() {
return tranClassKey;
}
public void setTranClassKey(int tranClassKey) {
this.tranClassKey = tranClassKey;
}
public String getNpaNxx() {
return npaNxx;
}
public void setNpaNxx(String npaNxx) {
this.npaNxx = new String(npaNxx);
}
public String getTransactionCode() {
return transactionCode;
}
public void setTransactionCode(String transactionCode) {
this.transactionCode = new String(transactionCode);
}
public String getTransactionType() {
return transactionType;
}
public void setTransactionType(String transactionType) {
this.transactionType = new String(transactionType);
}
public Date getTranDate() {
return tranDate;
}
public void setTranDate(Date tranDate) {
this.tranDate = new Date(tranDate.getTime());
}
public double getTranHour() {
return tranHour;
}
public void setTranHour(double tranHour) {
this.tranHour = tranHour;
}
public Date getStartDate() {
return startDate;
}
public void setStartDate(Date startDate) {
this.startDate = new Date(startDate.getTime());
}
public double getStartHour() {
return startHour;
}
public void setStartHour(double startHour) {
this.startHour = startHour;
}
public Date getStopDate() {
return stopDate;
}
public void setStopDate(Date stopDate) {
this.stopDate = new Date(stopDate.getTime());
}
public double getStopHour() {
return stopHour;
}
public void setStopHour(double stopHour) {
this.stopHour = stopHour;
}
public double getInputQTime() {
return inputQTime;
}
public void setInputQTime(double inputQTime) {
this.inputQTime = inputQTime;
}
public double getElapsedTime() {
return elapsedTime;
}
public void setElapsedTime(double elapsedTime) {
this.elapsedTime = elapsedTime;
}
public double getCpuTime() {
return cpuTime;
}
public void setCpuTime(double cpuTime) {
this.cpuTime = cpuTime;
}
public int getTranCount() {
return tranCount;
}
public void setTranCount(int tranCount) {
this.tranCount = tranCount;
}
public void readFields(DataInput input) throws IOException {
setNpaNxx(input.readUTF());
setTransactionCode(input.readUTF());
setTransactionType(input.readUTF());
setTranDate(new Date(input.readLong()));
setStartDate(new Date(input.readLong()));
setStopDate(new Date(input.readLong()));
setLoadKey(input.readInt());
setTranClassKey(input.readInt());
setTranHour(input.readDouble());
setStartHour(input.readDouble());
setStopHour(input.readDouble());
setInputQTime(input.readDouble());
setElapsedTime(input.readDouble());
setCpuTime(input.readDouble());
setTranCount(input.readInt());
}
public void write(DataOutput output) throws IOException {
output.writeUTF(npaNxx);
output.writeUTF(transactionCode);
output.writeUTF(transactionType);
output.writeLong(tranDate.getTime());
output.writeLong(startDate.getTime());
output.writeLong(stopDate.getTime());
output.writeInt(loadKey);
output.writeInt(tranClassKey);
output.writeDouble(tranHour);
output.writeDouble(startHour);
output.writeDouble(stopHour);
output.writeDouble(inputQTime);
output.writeDouble(elapsedTime);
output.writeDouble(cpuTime);
output.writeInt(tranCount);
}
public int compareTo(IMSTranOut o) {
return (Integer.compare(loadKey, o.getLoadKey()) == 0
&& Integer.compare(tranClassKey, o.getTranClassKey()) == 0
&& npaNxx.compareTo(o.getNpaNxx()) == 0
&& transactionCode.compareTo(o.getTransactionCode()) == 0
&& (transactionType.compareTo(o.getTransactionType()) == 0)
&& tranDate.compareTo(o.getTranDate()) == 0
&& Double.compare(tranHour, o.getTranHour()) == 0
&& startDate.compareTo(o.getStartDate()) == 0
&& Double.compare(startHour, o.getStartHour()) == 0
&& stopDate.compareTo(o.getStopDate()) == 0
&& Double.compare(stopHour, o.getStopHour()) == 0) ? 0 : 1;
}
}
Implementation of the Writable class for the complex values:
public class IMSTranSums
implements Writable{
private double inputQTime;
private double elapsedTime;
private double cpuTime;
private int tranCount;
public double getInputQTime() {
return inputQTime;
}
public void setInputQTime(double inputQTime) {
this.inputQTime = inputQTime;
}
public double getElapsedTime() {
return elapsedTime;
}
public void setElapsedTime(double elapsedTime) {
this.elapsedTime = elapsedTime;
}
public double getCpuTime() {
return cpuTime;
}
public void setCpuTime(double cpuTime) {
this.cpuTime = cpuTime;
}
public int getTranCount() {
return tranCount;
}
public void setTranCount(int tranCount) {
this.tranCount = tranCount;
}
public void write(DataOutput output) throws IOException {
output.writeDouble(inputQTime);
output.writeDouble(elapsedTime);
output.writeDouble(cpuTime);
output.writeInt(tranCount);
}
public void readFields(DataInput input) throws IOException {
inputQTime=input.readDouble();
elapsedTime=input.readDouble();
cpuTime=input.readDouble();
tranCount=input.readInt();
}
}
Your compareTo is flawed, it will totally fail the sort algorithm, because you seem to break transivity in your ordering.
I would recommend you to use a CompareToBuilder from Apache Commons or a ComparisonChain from Guava to make your comparisons much more readable (and correct!).
I am a newbie to Trident and I'm looking to create an 'Average' aggregator similar to 'Sum(), but for 'Average'.The following does not work:
public class Average implements CombinerAggregator<Long>.......{
public Long init(TridentTuple tuple)
{
(Long)tuple.getValue(0);
}
public Long Combine(long val1,long val2){
return val1+val2/2;
}
public Long zero(){
return 0L;
}
}
It may not be exactly syntactically correct, but that's the idea. Please help if you can. Given 2 tuples with values [2,4,1] and [2,2,5] and fields 'a','b' and 'c' and doing an average on field 'b' should return '3'. I'm not entirely sure how init() and zero() work.
Thank you so much for your help in advance.
Eli
public class Average implements CombinerAggregator<Number> {
int count = 0;
double sum = 0;
#Override
public Double init(final TridentTuple tuple) {
this.count++;
if (!(tuple.getValue(0) instanceof Double)) {
double d = ((Number) tuple.getValue(0)).doubleValue();
this.sum += d;
return d;
}
this.sum += (Double) tuple.getValue(0);
return (Double) tuple.getValue(0);
}
#Override
public Double combine(final Number val1, final Number val2) {
return this.sum / this.count;
}
#Override
public Double zero() {
this.sum = 0;
this.count = 0;
return 0D;
}
}
I am a complete newbie when it comes to Trident as well, and so I'm not entirely if the following will work. But it might:
public class AvgAgg extends BaseAggregator<AvgState> {
static class AvgState {
long count = 0;
long total = 0;
double getAverage() {
return total/count;
}
}
public AvgState init(Object batchId, TridentCollector collector) {
return new AvgState();
}
public void aggregate(AvgState state, TridentTuple tuple, TridentCollector collector) {
state.count++;
state.total++;
}
public void complete(AvgState state, TridentCollector collector) {
collector.emit(new Values(state.getAverage()));
}
}
I have a situation in which mapper emits as key an object of custom type.
It has two fields an intWritable ID, and a data array IntArrayWritable.
The implementation is as follows.
`
import java.io.*;
import org.apache.hadoop.io.*;
public class PairDocIdPerm implements WritableComparable<PairDocIdPerm> {
public PairDocIdPerm(){
this.permId = new IntWritable(-1);
this.SignaturePerm = new IntArrayWritable();
}
public IntWritable getPermId() {
return permId;
}
public void setPermId(IntWritable permId) {
this.permId = permId;
}
public IntArrayWritable getSignaturePerm() {
return SignaturePerm;
}
public void setSignaturePerm(IntArrayWritable signaturePerm) {
SignaturePerm = signaturePerm;
}
private IntWritable permId;
private IntArrayWritable SignaturePerm;
public PairDocIdPerm(IntWritable permId,IntArrayWritable SignaturePerm) {
this.permId = permId;
this.SignaturePerm = SignaturePerm;
}
#Override
public void write(DataOutput out) throws IOException {
permId.write(out);
SignaturePerm.write(out);
}
#Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
#Override
public int hashCode() { // same permId must go to same reducer. there fore just permId
return permId.get();//.hashCode();
}
#Override
public boolean equals(Object o) {
if (o instanceof PairDocIdPerm) {
PairDocIdPerm tp = (PairDocIdPerm) o;
return permId.equals(tp.permId) && SignaturePerm.equals(tp.SignaturePerm);
}
return false;
}
#Override
public String toString() {
return permId + "\t" +SignaturePerm.toString();
}
#Override
public int compareTo(PairDocIdPerm tp) {
int cmp = permId.compareTo(tp.permId);
Writable[] ar, other;
ar = this.SignaturePerm.get();
other = tp.SignaturePerm.get();
if (cmp == 0) {
for(int i=0;i<ar.length;i++){
if(((IntWritable)ar[i]).get() == ((IntWritable)other[i]).get()){cmp= 0;continue;}
else if(((IntWritable)ar[i]).get() < ((IntWritable)other[i]).get()){ return -1;}
else if(((IntWritable)ar[i]).get() > ((IntWritable)other[i]).get()){return 1;}
}
}
return cmp;
//return 1;
}
}`
I require the keys with same Id to go to the same reducer with their sort order as coded in the compareTo method.
However when i use this, my job execution status is always map100% reduce 0%.
The reduce never runs to completion. Is there any thing wrong in this implementation?
In general what is the likely problem if reducer status is always 0%.
I think this might be a possible null pointer exception in the read method:
#Override
public void readFields(DataInput in) throws IOException {
permId.readFields(in);
SignaturePerm.readFields(in);
}
permId is null in this case.
So what you have to do is this:
IntWritable permId = new IntWritable();
Either in the field initializer or before the read.
However, your code is horrible to read.