Spliterator Java 8

I have the numbers from 1 to 10,000 stored in an array of long. When added sequentially they give a result of 50,005,000.
I have written a Spliterator where, if the size of an array is greater than 1,000, it is split into another array.
Here is my code. But when I run it, the result of the addition is far greater than 50,005,000. Can someone tell me what is wrong with my code?
Thank you so much.
import java.util.Arrays;
import java.util.Optional;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SumSpliterator implements Spliterator<Long> {

    private final long[] numbers;
    private int currentPosition = 0;

    public SumSpliterator(long[] numbers) {
        super();
        this.numbers = numbers;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        action.accept(numbers[currentPosition++]);
        return currentPosition < numbers.length;
    }

    @Override
    public long estimateSize() {
        return numbers.length - currentPosition;
    }

    @Override
    public int characteristics() {
        return SUBSIZED;
    }

    @Override
    public Spliterator<Long> trySplit() {
        int currentSize = numbers.length - currentPosition;
        if (currentSize <= 1_000) {
            return null;
        } else {
            currentPosition = currentPosition + 1_000;
            return new SumSpliterator(Arrays.copyOfRange(numbers, 1_000, numbers.length));
        }
    }

    public static void main(String[] args) {
        long[] twoThousandNumbers = LongStream.rangeClosed(1, 10_000).toArray();
        Spliterator<Long> spliterator = new SumSpliterator(twoThousandNumbers);
        Stream<Long> stream = StreamSupport.stream(spliterator, false);
        System.out.println( sumValues(stream) );
    }

    private static long sumValues(Stream<Long> stream) {
        Optional<Long> optional = stream.reduce((t, u) -> t + u);
        return optional.get() != null ? optional.get() : Long.valueOf(0);
    }
}

I have the strong feeling that you didn’t get the purpose of splitting right. It’s not meant to copy the underlying data but just provide access to a range of it. Keep in mind that spliterators provide read-only access. So you should pass the original array to the new spliterator and configure it with an appropriate position and length instead of copying the array.
But besides the inefficiency of copying, the logic is obviously wrong: you pass Arrays.copyOfRange(numbers, 1_000, numbers.length) to the new spliterator, so the new spliterator covers the elements from position 1000 to the end of the array. At the same time you advance the current spliterator's position by 1000, so the old spliterator covers the elements from currentPosition + 1_000 to the end of the array. Both spliterators will therefore cover elements at the end of the array while, depending on the previous value of currentPosition, elements at the beginning might not be covered at all. Since you want to advance the currentPosition by 1_000, the range to split off is Arrays.copyOfRange(numbers, currentPosition, currentPosition + 1_000) instead, referring to the currentPosition before advancing.
It should also be noted that a spliterator should attempt to split in a balanced way, that is, in the middle if the size is known. So splitting off a thousand elements is not the right strategy for an array.
Further, your tryAdvance method is wrong. It should not test after calling the consumer but before, returning false if there are no more elements, which also implies that the consumer has not been called.
Putting it all together, the implementation may look like:
public class MyArraySpliterator implements Spliterator<Long> {

    private final long[] numbers;
    private int currentPosition, endPosition;

    public MyArraySpliterator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }

    public MyArraySpliterator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        currentPosition = start;
        endPosition = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Long> action) {
        if (currentPosition < endPosition) {
            action.accept(numbers[currentPosition++]);
            return true;
        }
        return false;
    }

    @Override
    public long estimateSize() {
        return endPosition - currentPosition;
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL | SIZED | SUBSIZED;
    }

    @Override
    public Spliterator<Long> trySplit() {
        if (estimateSize() <= 1000) return null;
        int middle = (endPosition + currentPosition) >>> 1;
        MyArraySpliterator prefix
            = new MyArraySpliterator(numbers, currentPosition, middle);
        currentPosition = middle;
        return prefix;
    }
}
But of course, it’s recommended to provide a specialized forEachRemaining implementation, where possible:
@Override
public void forEachRemaining(Consumer<? super Long> action) {
    int pos = currentPosition, end = endPosition;
    currentPosition = end;
    for (; pos < end; pos++) action.accept(numbers[pos]);
}
As a final note: for the task of summing longs from an array, a Spliterator.OfLong and a LongStream are preferred, and that work has already been done; see Arrays.spliterator() and LongStream.sum(). This makes the whole task as simple as Arrays.stream(numbers).sum().
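For reference, a minimal self-contained sketch of that built-in approach (the class name is only for illustration):

import java.util.Arrays;
import java.util.stream.LongStream;

public class SumWithBuiltins {
    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10_000).toArray();
        // Arrays.stream(long[]) is backed by an optimized Spliterator.OfLong,
        // so there is no boxing and no custom spliterator code at all.
        System.out.println(Arrays.stream(numbers).sum()); // prints 50005000
    }
}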

Related

How to use generics that implement Comparable

I am very confused about something in Java. The project I was given is to write stacks in Java, and the program begins with public class Stack<T extends Comparable<? super T>>. However, when I try to run it in my testing program, no matter what kind of object I throw at it (Integer, String), it always returns the error [Ljava.lang.Object; cannot be cast to [Ljava.lang.Comparable. My question is how this sort of generics, which takes a generic T that also extends Comparable, works, and why String still triggers the error message (I thought String already implemented Comparable<String>?). Thanks in advance for taking the time to read the question! Here is the rough outline of the code:
public class Stack<T extends Comparable<? super T>> {
    private T[] arr;
    private int ptr;
    private int size;

    public Stack() {
        this.size = 20;
        arr = (T[]) new Object[size];
        ptr = -1;
    }

    public boolean push(T element) {
        try {
            arr[ptr + 1] = element;
            ptr++;
            return true;
        }
        catch (Exception e) {
            return false;
        }
    }

    public T[] toArray() {
        return arr;
    }
}
I got this error from creating a JUnit testing class, with an implementation something like:
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class StackTester {
    @Test
    public void testPush() {
        Stack<String> st = new Stack<String>();
        for (int i = 0; i < 5; i++) {
            String var = new Integer(i).toString();
            st.push(var);
        }
        assertEquals(new String[]{"3","4","5"}, st.toArray());
    }
}
Also, I have not added a compareTo method, as I don't know what it should compare to, and I don't think there's a particular use case for adding that method. I should add that the goal of this project is to use stacks to manipulate Strings (such as going from infix to postfix).
P.S.: I would also mention that I don't really need to compare stacks in my project, which is why the stack class itself does not implement the Comparable interface.
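For what it's worth, the error is most likely caused by the combination of (T[]) new Object[size] and toArray(). The cast to T[] is unchecked and erased at runtime, so the array's actual runtime type stays Object[]; since the erasure of T extends Comparable<? super T> is Comparable, the compiler inserts a checkcast to Comparable[] at the call site of toArray(), and that cast fails. A minimal sketch of the usual fix, keeping the rest of the class unchanged, is to allocate an array of the erased bound:

@SuppressWarnings("unchecked")
public Stack() {
    this.size = 20;
    // Allocate a Comparable[] so the runtime array type matches the
    // erasure of T and the checkcast at the toArray() call site succeeds.
    arr = (T[]) new Comparable[size];
    ptr = -1;
}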

How to skip even lines of a Stream<String> obtained from Files.lines

In this case just odd lines have meaningful data and there is no character that uniquely identifies those lines. My intention is to get something equivalent to the following example:
Stream<DomainObject> res = Files.lines(src)
    .filter(line -> isOddLine())
    .map(line -> toDomainObject(line));
Is there any “clean” way to do it, without sharing global state?
No, there's no way to do this conveniently with the API. (Basically for the same reason that there is no easy way of having a zipWithIndex; see Is there a concise way to iterate over a stream with indices in Java 8?).
You can still use Stream, but go for an iterator:
Iterator<String> iter = Files.lines(src).iterator();
while (iter.hasNext()) {
    toDomainObject(iter.next());     // use the odd line
    if (iter.hasNext()) iter.next(); // discard the even line
}
(You might want to use try-with-resources on that stream, though.)
A clean way is to go one level deeper and implement a Spliterator. On this level you can control the iteration over the stream elements and simply iterate over two items whenever the downstream requests one item:
public class OddLines<T> extends Spliterators.AbstractSpliterator<T>
                         implements Consumer<T> {

    public static <T> Stream<T> oddLines(Stream<T> source) {
        return StreamSupport.stream(new OddLines<>(source.spliterator()), false);
    }

    private static long odd(long l) { return l == Long.MAX_VALUE ? l : (l + 1) / 2; }

    Spliterator<T> originalLines;

    OddLines(Spliterator<T> source) {
        super(odd(source.estimateSize()), source.characteristics());
        originalLines = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (originalLines == null || !originalLines.tryAdvance(action))
            return false;
        if (!originalLines.tryAdvance(this)) originalLines = null;
        return true;
    }

    @Override
    public void accept(T t) {}
}
Then you can use it like:
Stream<DomainObject> res = OddLines.oddLines(Files.lines(src))
    .map(line -> toDomainObject(line));
This solution has no side effects and retains most advantages of the Stream API, like lazy evaluation. However, it should be clear that it doesn't have useful semantics for unordered stream processing (beware of subtle aspects like using forEachOrdered rather than forEach when performing a terminal action on all elements), and while it supports parallel processing in principle, it's unlikely to be very efficient.
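For instance, a usage sketch: a terminal action that must preserve the original line order on a parallel stream would use forEachOrdered:

// forEachOrdered preserves encounter order even on a parallel stream;
// plain forEach makes no such guarantee.
OddLines.oddLines(Files.lines(src))
        .parallel()
        .forEachOrdered(System.out::println);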
As aioobe said, there isn't a convenient way to do this, but there are several inconvenient ways. :-)
Here's another spliterator-based approach. Unlike Holger's, which wraps another spliterator, this one does the I/O itself. This gives greater control over things like ordering, but it also means that it has to deal with IOException and close handling. I also threw in a Predicate parameter that lets you get a crack at which lines get passed through.
static class LineSpliterator extends Spliterators.AbstractSpliterator<String>
                             implements AutoCloseable {
    final BufferedReader br;
    final LongPredicate pred;
    long count = 0L;

    public LineSpliterator(Path path, LongPredicate pred) throws IOException {
        super(Long.MAX_VALUE, Spliterator.ORDERED);
        br = Files.newBufferedReader(path);
        this.pred = pred;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String s;
            while ((s = br.readLine()) != null) {
                if (pred.test(++count)) {
                    action.accept(s);
                    return true;
                }
            }
            return false;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    @Override
    public void close() {
        try {
            br.close();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static Stream<String> lines(Path path, LongPredicate pred) throws IOException {
        LineSpliterator ls = new LineSpliterator(path, pred);
        return StreamSupport.stream(ls, false)
                            .onClose(() -> ls.close());
    }
}
You'd use it within a try-with-resources to ensure that the file is closed, even if an exception occurs:
static void printOddLines() throws IOException {
    try (Stream<String> lines = LineSpliterator.lines(PATH, x -> (x & 1L) == 1L)) {
        lines.forEach(System.out::println);
    }
}
You can do this with a custom spliterator:
public class EvenOdd {
    public static final class EvenSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> underlying;
        boolean even;

        public EvenSpliterator(Spliterator<T> underlying, boolean even) {
            this.underlying = underlying;
            this.even = even;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (even) {
                even = false;
                return underlying.tryAdvance(action);
            }
            if (!underlying.tryAdvance(t -> {})) {
                return false;
            }
            return underlying.tryAdvance(action);
        }

        @Override
        public Spliterator<T> trySplit() {
            if (!hasCharacteristics(SUBSIZED)) {
                return null;
            }
            final Spliterator<T> newUnderlying = underlying.trySplit();
            if (newUnderlying == null) {
                return null;
            }
            final boolean oldEven = even;
            if ((newUnderlying.estimateSize() & 1) == 1) {
                even = !even;
            }
            return new EvenSpliterator<>(newUnderlying, oldEven);
        }

        @Override
        public long estimateSize() {
            return underlying.estimateSize() >> 1;
        }

        @Override
        public int characteristics() {
            return underlying.characteristics();
        }
    }

    public static void main(String[] args) {
        final EvenSpliterator<Integer> spliterator = new EvenSpliterator<>(
            IntStream.range(1, 100000).parallel().mapToObj(Integer::valueOf).spliterator(), false);
        final List<Integer> result = StreamSupport.stream(spliterator, true).parallel().collect(Collectors.toList());
        final List<Integer> expected = IntStream.range(1, 100000 / 2).mapToObj(i -> i * 2).collect(Collectors.toList());
        if (result.equals(expected)) {
            System.out.println("Yay! Expected result.");
        }
    }
}
Following the @aioobe algorithm, here's another spliterator-based approach, as proposed by @Holger but more concise, even if less efficient.
public static <T> Stream<T> filterOdd(Stream<T> src) {
    Spliterator<T> iter = src.spliterator();
    AbstractSpliterator<T> res = new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (!iter.tryAdvance(action)) return false; // use the odd line
            iter.tryAdvance(item -> {});                // discard the even line
            return true;
        }
    };
    return StreamSupport.stream(res, false);
}
Then you can use it like:
Stream<DomainObject> res = filterOdd(Files.lines(src))
    .map(line -> toDomainObject(line));

How to get the keys sorted by a custom comparator in a map-reduce job in Hadoop?

Consider this class (from Hadoop: The Definitive Guide, 3rd edition):
import java.io.*;
import org.apache.hadoop.io.*;

public class TextPair implements WritableComparable<TextPair> {

    private Text first;
    private Text second;

    public TextPair() {
        set(new Text(), new Text());
    }

    public TextPair(String first, String second) {
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second) {
        set(first, second);
    }

    public void set(Text first, Text second) {
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public Text getSecond() {
        return second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof TextPair) {
            TextPair tp = (TextPair) o;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    @Override
    public int compareTo(TextPair tp) {
        int cmp = first.compareTo(tp.first);
        if (cmp != 0) {
            return cmp;
        }
        return second.compareTo(tp.second);
    }

    // ^^ TextPair
    // vv TextPairComparator
    public static class Comparator extends WritableComparator {

        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        public Comparator() {
            super(TextPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            try {
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
                if (cmp != 0) {
                    return cmp;
                }
                return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
                                               b2, s2 + firstL2, l2 - firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }
    }

    static {
        WritableComparator.define(TextPair.class, new Comparator());
    }
    // ^^ TextPairComparator

    // vv TextPairFirstComparator
    public static class FirstComparator extends WritableComparator {

        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        public FirstComparator() {
            super(TextPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            try {
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof TextPair && b instanceof TextPair) {
                return ((TextPair) a).first.compareTo(((TextPair) b).first);
            }
            return super.compare(a, b);
        }
    }
    // ^^ TextPairFirstComparator
    // vv TextPair
}
// ^^ TextPair
There are two kinds of comparators defined:
one sorts by first followed by second, which is the default comparator;
the other sorts by first ONLY, which is the FirstComparator.
If I have to use FirstComparator for sorting my keys, how do I achieve that?
That is, how do I override my default comparator with the FirstComparator I defined above?
Secondly, how would I unit-test this, since the output of the map job is not sorted?
If I have to use FirstComparator for sorting my keys, how do I achieve that? That is, how do I override my default comparator with the FirstComparator I defined above?
I assume you expect a method something like setComparator(firstComparator). As far as I know there is no such method. The keys are sorted (on the mapper side) using the compareTo() of the Writable type representing the keys. In your case, the compareTo() method checks the first value and then the second one. In other words, the keys will be sorted by the first value and, then, the keys in the same group (i.e. having the same first value) will be sorted by their second value.
All in all, this means that your keys will always be sorted by the first value (plus by the second value if the first one isn't able to take the decision). Which in turn means that there is no need for a different comparator (firstComparator) that looks only at the first value, because that is already achieved with the compareTo() method of your TextPair class.
On the other hand, if the firstComparator sorts the keys completely differently, the only solution is to move the logic of firstComparator into the compareTo() method of the Writable class representing your key. I don't see any reason why you wouldn't do that. If you already have the firstComparator and want to reuse it, you can instantiate it and invoke it in the compareTo() method of the TextPair Writable.
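A minimal sketch of that reuse, delegating to the FirstComparator defined in the question's TextPair class:

private static final FirstComparator FIRST = new FirstComparator();

@Override
public int compareTo(TextPair tp) {
    // Delegate the natural ordering to the existing comparator,
    // so keys are sorted by the first field only.
    return FIRST.compare(this, tp);
}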
You might also want to take a look at the GroupingComparator which is used to decide which keys are used together in the same call of the reduce() method. Since you didn't describe exactly what you want to achieve, I can't say for sure if this will be helpful or not.
Secondly, how would I unit-test this, since the output of the map job is not sorted?
Unit testing, as the name says, means testing a single unit of code (most of the time a method/function/procedure). If you want to unit-test your reduce method, you have to provide the interesting input cases and check that the method under test outputs the expected result. More concretely, you have to create/mock a sorted Iterable over your keys and invoke your reduce function with it. Unit testing a reduce method shouldn't rely on the execution of the corresponding map method.
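A sketch of that idea using MRUnit's ReduceDriver (MRUnit is an assumed test dependency here, and WordCountReducer is a hypothetical reducer that sums IntWritable values per key):

import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class ReducerTest {
    @Test
    public void reducerSumsValuesForOneKey() throws Exception {
        // The driver hands the reducer one key with an already-sorted
        // value list, exactly as the framework would after the shuffle.
        ReduceDriver.<Text, IntWritable, Text, IntWritable>newReduceDriver(new WordCountReducer())
            .withInput(new Text("word"), Arrays.asList(new IntWritable(1), new IntWritable(2)))
            .withOutput(new Text("word"), new IntWritable(3))
            .runTest();
    }
}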

Storm Trident 'Average' aggregator

I am a newbie to Trident and I'm looking to create an 'Average' aggregator similar to Sum(), but for 'Average'. The following does not work:
public class Average implements CombinerAggregator<Long>....... {
    public Long init(TridentTuple tuple) {
        (Long) tuple.getValue(0);
    }

    public Long Combine(long val1, long val2) {
        return val1 + val2 / 2;
    }

    public Long zero() {
        return 0L;
    }
}
It may not be exactly syntactically correct, but that's the idea. Please help if you can. Given 2 tuples with values [2,4,1] and [2,2,5] and fields 'a','b' and 'c' and doing an average on field 'b' should return '3'. I'm not entirely sure how init() and zero() work.
Thank you so much for your help in advance.
Eli
public class Average implements CombinerAggregator<Number> {

    int count = 0;
    double sum = 0;

    @Override
    public Double init(final TridentTuple tuple) {
        this.count++;
        if (!(tuple.getValue(0) instanceof Double)) {
            double d = ((Number) tuple.getValue(0)).doubleValue();
            this.sum += d;
            return d;
        }
        this.sum += (Double) tuple.getValue(0);
        return (Double) tuple.getValue(0);
    }

    @Override
    public Double combine(final Number val1, final Number val2) {
        return this.sum / this.count;
    }

    @Override
    public Double zero() {
        this.sum = 0;
        this.count = 0;
        return 0D;
    }
}
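A caveat about the approach above: Trident may serialize the aggregator and run init/combine on different partitions, so instance fields like count and sum are not reliable accumulators. A common alternative, sketched here assuming recent Storm package names (older releases use storm.trident.*), carries a (sum, count) pair through combine() and divides only at the end:

import java.io.Serializable;
import org.apache.storm.trident.operation.CombinerAggregator;
import org.apache.storm.trident.tuple.TridentTuple;

public class Average implements CombinerAggregator<Average.AvgState> {

    public static class AvgState implements Serializable {
        final double sum;
        final long count;
        AvgState(double sum, long count) { this.sum = sum; this.count = count; }
        public double average() { return count == 0 ? 0 : sum / count; }
    }

    @Override
    public AvgState init(TridentTuple tuple) {
        // One tuple contributes its field value and a count of 1.
        return new AvgState(((Number) tuple.getValue(0)).doubleValue(), 1);
    }

    @Override
    public AvgState combine(AvgState a, AvgState b) {
        // Partial results merge without losing information.
        return new AvgState(a.sum + b.sum, a.count + b.count);
    }

    @Override
    public AvgState zero() {
        return new AvgState(0, 0);
    }
}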
I am a complete newbie when it comes to Trident as well, so I'm not entirely sure whether the following will work. But it might:
public class AvgAgg extends BaseAggregator<AvgAgg.AvgState> {

    static class AvgState {
        long count = 0;
        double total = 0;

        double getAverage() {
            return count == 0 ? 0 : total / count;
        }
    }

    public AvgState init(Object batchId, TridentCollector collector) {
        return new AvgState();
    }

    public void aggregate(AvgState state, TridentTuple tuple, TridentCollector collector) {
        state.count++;
        // Add the tuple's field value, not 1, to the running total.
        state.total += ((Number) tuple.getValue(0)).doubleValue();
    }

    public void complete(AvgState state, TridentCollector collector) {
        collector.emit(new Values(state.getAverage()));
    }
}

Hadoop RawComparator

I am trying to implement the following in a RawComparator but am not sure how to write it.
The timestamp field here is of type LongWritable.
if (this.getNaturalKey().compareTo(o.getNaturalKey()) != 0) {
    return this.getNaturalKey().compareTo(o.getNaturalKey());
} else if (this.timeStamp != o.timeStamp) {
    return timeStamp.compareTo(o.timeStamp);
} else {
    return 0;
}
I found a hint here, but I'm not sure how to implement it when dealing with a LongWritable type:
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/serialization/id3548156
Thanks for your help.
Let's say I have a CompositeKey that represents a pair (String stockSymbol, long timestamp).
We can do a primary grouping pass on the stockSymbol field to get all of the data of one type together, and then our "secondary sort" during the shuffle phase uses the timestamp long member to sort the time-series points so that they arrive at the reducer partitioned and in sorted order.
public class CompositeKey implements WritableComparable<CompositeKey> {
    // natural key is (stockSymbol)
    // composite key is a pair (stockSymbol, timestamp)
    private String stockSymbol;
    private long timestamp;

    // ... getters and setters omitted for clarity

    @Override
    public void readFields(DataInput in) throws IOException {
        this.stockSymbol = in.readUTF();
        this.timestamp = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.stockSymbol);
        out.writeLong(this.timestamp);
    }

    @Override
    public int compareTo(CompositeKey other) {
        if (this.stockSymbol.compareTo(other.stockSymbol) != 0) {
            return this.stockSymbol.compareTo(other.stockSymbol);
        }
        else if (this.timestamp != other.timestamp) {
            return timestamp < other.timestamp ? -1 : 1;
        }
        else {
            return 0;
        }
    }
}
Now the CompositeKey comparator would be:
public class CompositeKeyComparator extends WritableComparator {
    protected CompositeKeyComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable wc1, WritableComparable wc2) {
        CompositeKey ck1 = (CompositeKey) wc1;
        CompositeKey ck2 = (CompositeKey) wc2;
        int comparison = ck1.getStockSymbol().compareTo(ck2.getStockSymbol());
        if (comparison == 0) {
            // stock symbols are equal here
            if (ck1.getTimestamp() == ck2.getTimestamp()) {
                return 0;
            }
            else if (ck1.getTimestamp() < ck2.getTimestamp()) {
                return -1;
            }
            else {
                return 1;
            }
        }
        else {
            return comparison;
        }
    }
}
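A sketch of how such comparators are typically wired into a job, using the standard Job setters (NaturalKeyGroupingComparator, a comparator looking only at stockSymbol, is assumed here and not shown):

Job job = Job.getInstance(new Configuration(), "secondary sort");
// Sort the shuffle output by the full composite key (symbol, timestamp).
job.setSortComparatorClass(CompositeKeyComparator.class);
// Group values at the reducer by the natural key (stockSymbol) only.
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);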
Are you asking about a way to compare the LongWritable type provided by Hadoop?
If yes, then the answer is to use the compare() method. For more details, scroll down here.
The best way to correctly implement RawComparator is to extend WritableComparator and override its compare() method. WritableComparator is very well written, so you can easily understand it.
It is already implemented, from what I can see, in the LongWritable class:
/** A Comparator optimized for LongWritable. */
public static class Comparator extends WritableComparator {
    public Comparator() {
        super(LongWritable.class);
    }

    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        long thisValue = readLong(b1, s1);
        long thatValue = readLong(b2, s2);
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
That byte comparison is the override of RawComparator.
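Putting the pieces together for the question's CompositeKey (stockSymbol written with writeUTF, timestamp with writeLong), a raw byte-level comparator might look like the sketch below. Note the assumption that comparing the modified-UTF-8 bytes lexicographically matches String.compareTo, which holds only for ASCII symbols:

public class CompositeKeyRawComparator extends WritableComparator {
    protected CompositeKeyRawComparator() {
        super(CompositeKey.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        // writeUTF stores a 2-byte length followed by the string bytes.
        int strL1 = readUnsignedShort(b1, s1);
        int strL2 = readUnsignedShort(b2, s2);
        int cmp = compareBytes(b1, s1 + 2, strL1, b2, s2 + 2, strL2);
        if (cmp != 0) {
            return cmp;
        }
        // The long timestamp follows immediately after the string bytes.
        long ts1 = readLong(b1, s1 + 2 + strL1);
        long ts2 = readLong(b2, s2 + 2 + strL2);
        return ts1 < ts2 ? -1 : (ts1 == ts2 ? 0 : 1);
    }
}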
