Java Collector.combiner getting called with supplier values always - java-8

Problem : Create a Collector implementation which would multiply stream of Integers in parallel and return Long.
Implementation:
public class ParallelMultiplier implements Collector<Integer, Long, Long> {
@Override
public BiConsumer<Long, Integer> accumulator() {
// TODO Auto-generated method stub
return (operand1, operand2) -> {
System.out.println("Accumulating Values (Accumulator, Element): (" + operand1 + ", " + operand2 + ")");
long Lval = operand1.longValue();
int Ival = operand2.intValue();
Lval *= Ival;
operand1 = Long.valueOf(Lval);
System.out.println("Acc Updated : " + operand1);
};
}
@Override
public Set<java.util.stream.Collector.Characteristics> characteristics() {
// TODO Auto-generated method stub
return Collections.unmodifiableSet(EnumSet.of(UNORDERED));
}
@Override
public BinaryOperator<Long> combiner() {
return (operand1, operand2) -> {
System.out.println("Combining Values : (" + operand1 + ", " + operand2 + ")");
long Lval1 = operand1.longValue();
long Lval2 = operand2.longValue();
Lval1 *= Lval2;
return Long.valueOf(Lval1);
};
}
@Override
public Function<Long, Long> finisher() {
// TODO Auto-generated method stub
return Function.identity();
}
@Override
public Supplier<Long> supplier() {
return () -> new Long(1L);
}
}
Observed Output:
Accumulating Values (Accumulator, Element): (1, 4)
Acc Updated : 4
Accumulating Values (Accumulator, Element): (1, 3)
Acc Updated : 3
Combining Values : (1, 1)
Accumulating Values (Accumulator, Element): (1, 8)
Accumulating Values (Accumulator, Element): (1, 6)
Accumulating Values (Accumulator, Element): (1, 2)
Acc Updated : 2
Acc Updated : 8
Accumulating Values (Accumulator, Element): (1, 5)
Accumulating Values (Accumulator, Element): (1, 1)
Acc Updated : 5
Acc Updated : 6
Combining Values : (1, 1)
Accumulating Values (Accumulator, Element): (1, 7)
Acc Updated : 7
Combining Values : (1, 1)
Combining Values : (1, 1)
Acc Updated : 1
Combining Values : (1, 1)
Combining Values : (1, 1)
Combining Values : (1, 1)
Invocation:
List<Integer> intList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
Collector<Integer, Long, Long> parallelMultiplier = new ParallelMultiplier();
result = intList.parallelStream().collect(parallelMultiplier);
That is, the multiplication result is 1, where it should have been 8 factorial. I am not using the CONCURRENT characteristic either.
Ideally, I should have gotten the multiplication results of the substreams fed into the combiner() operation, but that does not seem to be happening here.
Setting aside the inefficient boxing/unboxing, any clue where I might have made a mistake?

Your collector is slightly off. Here is a simplified version (for why yours does not work, see the end):
static class ParallelMultiplier implements Collector<Integer, long[], Long> {
@Override
public BiConsumer<long[], Integer> accumulator() {
return (left, right) -> left[0] *= right;
}
@Override
public BinaryOperator<long[]> combiner() {
return (left, right) -> {
left[0] = left[0] * right[0];
return left;
};
}
@Override
public Function<long[], Long> finisher() {
return arr -> arr[0];
}
@Override
public Supplier<long[]> supplier() {
return () -> new long[] { 1L };
}
@Override
public Set<java.util.stream.Collector.Characteristics> characteristics() {
return Collections.unmodifiableSet(EnumSet.noneOf(Characteristics.class));
}
}
Your problem can be illustrated like this:
static Long test(Long left, Long right) {
left = left * right;
return left;
}
long l = 12L;
long r = 13L;
test(l, r);
System.out.println(l); // still 12
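The same point can be made as a small runnable contrast. This is only a sketch: the class and method names (AccumulatorDemo, multiplyBoxed, multiplyHolder) are hypothetical and chosen for illustration. Reassigning a Long parameter changes only the local reference, while writing into a one-element long[] changes the object the caller also holds.

```java
// Hypothetical demo class; the names are illustrative only.
class AccumulatorDemo {
    // Reassigning the parameter only rebinds the local variable;
    // the caller's Long is untouched because Long is immutable.
    static void multiplyBoxed(Long acc, int element) {
        acc = acc * element;
    }

    // Writing into the array mutates the object the caller also sees.
    static void multiplyHolder(long[] acc, int element) {
        acc[0] *= element;
    }

    public static void main(String[] args) {
        Long boxed = 1L;
        multiplyBoxed(boxed, 4);
        System.out.println(boxed);      // prints 1

        long[] holder = { 1L };
        multiplyHolder(holder, 4);
        System.out.println(holder[0]);  // prints 4
    }
}
```

This mirrors what happens in the accumulator: the stream keeps the container returned by the supplier, so only in-place changes to that container survive.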

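As Flown stated, Java's primitive wrapper types are immutable and cannot be used as a mutable accumulator. One mutable replacement for Long is AtomicLong, which is also thread-safe. Strictly speaking, a collector without the CONCURRENT characteristic keeps each accumulation container confined to one thread at a time, so thread safety is a convenience here rather than a requirement.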
import java.util.*;
import java.util.concurrent.atomic.*;
import java.util.function.*;
import java.util.stream.*;
public class ParallelMultiplier implements Collector<Integer, AtomicLong, Long> {
@Override
public BiConsumer<AtomicLong, Integer> accumulator() {
return (operand1, operand2) -> operand1.set(operand1.longValue() * operand2.longValue());
}
@Override
public Set<java.util.stream.Collector.Characteristics> characteristics() {
return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
}
@Override
public BinaryOperator<AtomicLong> combiner() {
return (operand1, operand2) -> new AtomicLong(operand1.longValue() * operand2.longValue());
}
@Override
public Function<AtomicLong, Long> finisher() {
return l -> l.longValue();
}
@Override
public Supplier<AtomicLong> supplier() {
return () -> new AtomicLong(1);
}
}
Testing this with what you've provided results in the correct answer, 8! = 40320.
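If a custom Collector class is not a hard requirement, the product can also be sketched in two shorter ways: a plain reduction with 1 as the identity, and Collector.of with a one-element long[] as the mutable container. The class and method names below (ProductDemo, productByReduce, productByCollector) are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class; the names are illustrative only.
class ProductDemo {
    // Product via a reduction; 1L is the identity for multiplication.
    static long productByReduce(List<Integer> values) {
        return values.parallelStream()
                .mapToLong(Integer::longValue)
                .reduce(1L, (a, b) -> a * b);
    }

    // Product via Collector.of, using a one-element long[] as the container.
    static long productByCollector(List<Integer> values) {
        Collector<Integer, long[], Long> multiplier = Collector.of(
                () -> new long[] { 1L },                                  // supplier
                (acc, element) -> acc[0] *= element,                      // accumulator
                (left, right) -> { left[0] *= right[0]; return left; },  // combiner
                acc -> acc[0]);                                           // finisher
        return values.parallelStream().collect(multiplier);
    }

    public static void main(String[] args) {
        List<Integer> intList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(productByReduce(intList));     // prints 40320
        System.out.println(productByCollector(intList));  // prints 40320
    }
}
```

Both variants rely on multiplication being associative, which is what allows the partial products of the substreams to be combined in any grouping.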

Java8 Method chaining for Single object without Stream/Optional?

I felt it easiest to capture my question with the below example. I would like to apply multiple transformations on an object (in this case, they all return same class, Number, but not necessarily). With an Optional (Method 3) or Stream (Method 4), I can use the .map elegantly and legibly. However, when used with a single object, I have to either just make an Optional just to use the .map chaining (with a .get() in the end), or use Stream.of() with a findFirst in the end, which seems like unnecessary work.
[My Preference]: I prefer methods 3 & 4, as they seem better for readability than the pre-java8 options - methods 1 & 2.
[Question]: Is there a better/neater/more preferable/more elegant way of achieving the same than all the methods used here? If not, what method would you use?
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class Tester {
static class Number {
private final int value;
private Number(final int value) {
this.value = value;
}
public int getValue() {
return value;
}
@Override
public String toString() {
return String.valueOf(value);
}
}
private static Number add(final Number number, final int val) {
return new Number(number.getValue() + val);
}
private static Number multiply(final Number number, final int val) {
return new Number(number.getValue() * val);
}
private static Number subtract(final Number number, final int val) {
return new Number(number.getValue() - val);
}
public static void main(final String[] args) {
final Number input = new Number(1);
System.out.println("output1 = " + method1(input)); // 100
System.out.println("output2 = " + method2(input)); // 100
System.out.println("output3 = " + method3(input)); // 100
System.out.println("output4 = " + method4(input)); // 100
processAList();
}
// Processing an object - Method 1
private static Number method1(final Number input) {
return subtract(multiply(add(input, 10), 10), 10);
}
// Processing an object - Method 2
private static Number method2(final Number input) {
final Number added = add(input, 10);
final Number multiplied = multiply(added, 10);
return subtract(multiplied, 10);
}
// Processing an object - Method 3 (Contrived use of Optional)
private static Number method3(final Number input) {
return Optional.of(input)
.map(number -> add(number, 10))
.map(number -> multiply(number, 10))
.map(number -> subtract(number, 10)).get();
}
// Processing an object - Method 4 (Contrived use of Stream)
private static Number method4(final Number input) {
return Stream.of(input)
.map(number -> add(number, 10))
.map(number -> multiply(number, 10))
.map(number -> subtract(number, 10))
.findAny().get();
}
// Processing a list (naturally uses the Stream advantage)
private static void processAList() {
final List<Number> inputs = new ArrayList<>();
inputs.add(new Number(1));
inputs.add(new Number(2));
final List<Number> outputs = inputs.stream()
.map(number -> add(number, 10))
.map(number -> multiply(number, 10))
.map(number -> subtract(number, 10))
.collect(Collectors.toList());
System.out.println("outputs = " + outputs); // [100, 110]
}
}
The solution is to build your methods into your Number class. For example:
static class Number {
// instance variable, constructor and getter unchanged
public Number add(final int val) {
return new Number(getValue() + val);
}
// multiply() and subtract() in the same way
// toString() unchanged
}
Now your code becomes very simple and readable:
private static Number method5(final Number input) {
return input
.add(10)
.multiply(10)
.subtract(10);
}
You may even write the return statement on one line if you prefer:
return input.add(10).multiply(10).subtract(10);
Edit: If you can't change the Number class, my personal taste would be for method2. Using Optional or Stream would be misuse or at least misplaced and could easily confuse your reader. If you insist, write your own Mandatory class, like Optional except it always holds a value, which makes it simpler. For my part I wouldn't bother.
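For readers who do want to try the Mandatory idea, a minimal sketch could look like the following. Mandatory is a hypothetical class and not part of the JDK; its of, map and get methods are modelled loosely on Optional and are assumptions of this example.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical demo showing the chaining style with a never-empty wrapper.
class MandatoryDemo {
    public static void main(String[] args) {
        int result = Mandatory.of(1)
                .map(n -> n + 10)
                .map(n -> n * 10)
                .map(n -> n - 10)
                .get();
        System.out.println(result); // prints 100
    }
}

// Hypothetical class, not part of the JDK: like Optional, but it always holds a value.
final class Mandatory<T> {
    private final T value;

    private Mandatory(T value) {
        // Rejecting null is what lets get() skip any emptiness check.
        this.value = Objects.requireNonNull(value);
    }

    static <T> Mandatory<T> of(T value) {
        return new Mandatory<>(value);
    }

    // Applies the mapper and wraps the result again.
    <R> Mandatory<R> map(Function<? super T, ? extends R> mapper) {
        return new Mandatory<>(mapper.apply(value));
    }

    T get() {
        return value;
    }
}
```

Because a null result from the mapper throws immediately, get() never needs the presence check that makes Optional.get() questionable.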

Converting to lambda expression with ForEach for a breaking for loop

I have the following code, which breaks out of a for loop:
package test;
import java.util.Arrays;
import java.util.List;
public class Test {
private static List<Integer> integerList = Arrays.asList(1, 2, 3, 4);
public static void main(String[] args) {
countTo2(integerList);
}
public static void countTo2(List<Integer> integerList) {
for (Integer integer : integerList) {
System.out.println("counting " + integer);
if (integer >= 2) {
System.out.println("returning!");
return;
}
}
}
}
Trying to express it with a lambda using forEach() changes the behavior, as the loop no longer breaks:
public static void countTo2(List<Integer> integerList) {
integerList.forEach(integer -> {
System.out.println("counting " + integer);
if (integer >= 2) {
System.out.println("returning!");
return;
}
});
}
This actually makes sense as the return; statements are only enforced within the lambda expression itself (within the internal iteration) and not for the whole execution sequence, so is there a way to get the desired behavior (breaking the for loop) using the lambda expression?
The following code is logically equivalent to yours:
public static void countTo2(List<Integer> integerList) {
integerList.stream()
.peek(i -> System.out.println("counting " + i))
.filter(i -> i >= 2)
.findFirst()
.ifPresent(i -> System.out.println("returning!"));
}
If you're confused about anything, please let me know!
What you are looking for is a short-circuit terminal operation and while this is the way to do it:
integerList.stream()
.peek(x -> System.out.println("counting = " + x))
.filter(x -> x >= 2)
.findFirst()
.ifPresent(x -> System.out.println("returning!"));
This is equivalent only for a sequential stream. As soon as you make the stream parallel, peek may show elements you would not expect, because there is no defined processing order. There is still an encounter order, though, meaning that the elements are fed to the terminal operation correctly.
One way I could think of doing that would be using anyMatch and the inverse:
if (integerList.stream().noneMatch(val -> val >= 2)) {
integerList.forEach(val -> System.out.println("counting " + val));
}
if (integerList.stream().anyMatch(val -> val >= 2)) {
System.out.println("returning!");
}
but internally that would iterate over the list twice and wouldn't be very optimal I believe.
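Another way to get the early exit with a single traversal is to put the loop body into the predicate of anyMatch, which is a short-circuiting terminal operation. This is only a sketch under two assumptions: the stream is sequential, and side effects inside the predicate are acceptable (the Stream documentation discourages them). The class name CountTo2Demo is hypothetical, and the method returns the lines it would print so that the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class CountTo2Demo {
    // Collects the messages instead of printing them directly.
    static List<String> countTo2(List<Integer> integerList) {
        List<String> log = new ArrayList<>();
        // On a sequential stream, anyMatch stops at the first element
        // for which the predicate returns true.
        integerList.stream().anyMatch(integer -> {
            log.add("counting " + integer);
            if (integer >= 2) {
                log.add("returning!");
                return true;   // plays the role of "return" in the for loop
            }
            return false;      // continue with the next element
        });
        return log;
    }

    public static void main(String[] args) {
        countTo2(Arrays.asList(1, 2, 3, 4)).forEach(System.out::println);
        // prints:
        // counting 1
        // counting 2
        // returning!
    }
}
```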

Java Streaming: get max if no duplicates

I'm trying to write a function that takes in a Map and returns an Entry. If the entry with the max Integer value is unique, it should return that entry. However, if there are duplicate entries with the same max value, it should return a new Entry with a key of "MULTIPLE" and a value of 0. It's easy enough for me to get the max value ignoring duplicates:
public static Entry<String,Integer> getMax(Map<String,Integer> map1) {
return map1.entrySet().stream()
.max((a,b) -> a.getValue().compareTo(b.getValue()))
.get();
}
But in order for me to do what I said initially, I could only find a solution where I had to create an initial stream to do a boolean check if there were multiple max values and then do another stream if not to get the value. I'd like to find a solution where I can do both tasks with only one stream.
Here's my little test case:
@Test
public void test1() {
Map<String,Integer> map1 = new HashMap<>();
map1.put("A", 100);
map1.put("B", 100);
map1.put("C", 100);
map1.put("D", 105);
Assert.assertEquals("D", getMax(map1).getKey());
Map<String,Integer> map2 = new HashMap<>();
map2.put("A", 100);
map2.put("B", 105);
map2.put("C", 100);
map2.put("D", 105);
Assert.assertEquals("MULTIPLE", getMax(map2).getKey());
}
This is a simple case of reduction, and you don't need any external libraries.
Map.Entry<String, Integer> max(Map<String, Integer> map) {
return map.entrySet().stream()
.reduce((e1, e2) -> {
if (e1.getValue().equals(e2.getValue())) {
return new SimpleImmutableEntry<>("MULTIPLE", 0);
} else {
return Collections.max(asList(e1, e2), comparingInt(Map.Entry::getValue));
}
})
.orElse(new SimpleImmutableEntry<>("NOT_FOUND", 0));
}
Here is the solution by StreamEx
public Entry<String, Integer> getMax(Map<String, Integer> map) {
return StreamEx.of(map.entrySet()).collect(collectingAndThen(MoreCollectors.maxAll(Map.Entry.comparingByValue()),
l -> l.size() == 1 ? l.get(0) : new AbstractMap.SimpleImmutableEntry<>("MULTIPLE", 0)));
}
Another solution is to iterate over the map twice, with potentially better performance:
public Entry<String, Integer> getMax(Map<String, Integer> map) {
int max = map.entrySet().stream().mapToInt(e -> e.getValue()).max().getAsInt();
return StreamEx.of(map.entrySet()).filter(e -> e.getValue().intValue() == max).limit(2)
.toListAndThen(l -> l.size() == 1 ? l.get(0) : new AbstractMap.SimpleImmutableEntry<>("MULTIPLE", 0));
}
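For comparison, a single pass that tracks the current maximum together with a "tied" flag can be sketched with a plain loop and no extra library. It is not a stream pipeline, so it does not answer the one-stream requirement literally. The class name UniqueMaxDemo is hypothetical, and returning a "NOT_FOUND" entry for an empty map is an assumption borrowed from the reduce-based answer above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class UniqueMaxDemo {
    static Map.Entry<String, Integer> getMax(Map<String, Integer> map) {
        Map.Entry<String, Integer> best = null;
        boolean tied = false;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;        // strictly larger value: new unique maximum
                tied = false;
            } else if (e.getValue().equals(best.getValue())) {
                tied = true;     // same value as the current maximum
            }
        }
        if (best == null) {
            // Assumption: mirror the "NOT_FOUND" fallback for an empty map.
            return new SimpleImmutableEntry<>("NOT_FOUND", 0);
        }
        if (tied) {
            return new SimpleImmutableEntry<>("MULTIPLE", 0);
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("A", 100);
        map.put("B", 105);
        map.put("C", 100);
        map.put("D", 105);
        System.out.println(getMax(map).getKey()); // prints MULTIPLE
    }
}
```

The tied flag is reset whenever a strictly larger value appears, so earlier duplicates of smaller values do not affect the result.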

How to sort comma separated keys in Reducer ouput?

I am running an RFM Analysis program using MapReduce. The OutputKeyClass is Text.class and I am emitting comma separated R (Recency), F (Frequency), M (Monetary) as the key from Reducer where R=BigInteger, F=BigInteger, M=BigDecimal and the value is also a Text representing Customer_ID. I know that Hadoop sorts output based on keys but my final result is a bit weird. I want the output keys to be sorted by R first, then F and then M. But I am getting the following output sort order for unknown reasons:
545,1,7652 100000
545,23,390159.402343750 100001
452,13,132586 100002
452,4,32202 100004
452,1,9310 100007
452,1,4057 100018
452,3,18970 100021
But I want the following output:
545,23,390159.402343750 100001
545,1,7652 100000
452,13,132586 100002
452,4,32202 100004
452,3,18970 100021
452,1,9310 100007
452,1,4057 100018
NOTE: The customer_ID was the key in Map phase and all the RFM values belonging to a particular Customer_ID are brought together at the Reducer for aggregation.
So after a lot of searching I found some useful material the compilation of which I am posting now:
You have to start with your custom data type. Since I had three comma separated values which needed to be sorted descendingly, I had to create a TextQuadlet.java data type in Hadoop. The reason I am creating a quadlet is because the first part of the key will be the natural key and the rest of the three parts will be the R, F, M:
import java.io.*;
import org.apache.hadoop.io.*;
public class TextQuadlet implements WritableComparable<TextQuadlet> {
private String customer_id;
private long R;
private long F;
private double M;
public TextQuadlet() {
}
public TextQuadlet(String customer_id, long R, long F, double M) {
set(customer_id, R, F, M);
}
public void set(String customer_id2, long R2, long F2, double M2) {
this.customer_id = customer_id2;
this.R = R2;
this.F = F2;
this.M=M2;
}
public String getCustomer_id() {
return customer_id;
}
public long getR() {
return R;
}
public long getF() {
return F;
}
public double getM() {
return M;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(this.customer_id);
out.writeLong(this.R);
out.writeLong(this.F);
out.writeDouble(this.M);
}
@Override
public void readFields(DataInput in) throws IOException {
this.customer_id = in.readUTF();
this.R = in.readLong();
this.F = in.readLong();
this.M = in.readDouble();
}
// This hashcode function is important as it is used by the custom
// partitioner for this class.
@Override
public int hashCode() {
return (int) (customer_id.hashCode() * 163 + R + F + M);
}
@Override
public boolean equals(Object o) {
if (o instanceof TextQuadlet) {
TextQuadlet tp = (TextQuadlet) o;
return customer_id.equals(tp.customer_id) && R == (tp.R) && F==(tp.F) && M==(tp.M);
}
return false;
}
@Override
public String toString() {
return customer_id + "," + R + "," + F + "," + M;
}
// LHS in the conditional statement is the current key
// RHS in the conditional statement is the previous key
// When you return a negative value, it means that you are exchanging
// the positions of current and previous key-value pair
// Returning 0 or a positive value means that you are keeping the
// order as it is
@Override
public int compareTo(TextQuadlet tp) {
// Here my natural key is customer_id and I don't even take it into
// consideration.
// So as you might have concluded, I am sorting R, F, M in descending order.
if (this.R != tp.R) {
if(this.R < tp.R) {
return 1;
}
else{
return -1;
}
}
if (this.F != tp.F) {
if(this.F < tp.F) {
return 1;
}
else{
return -1;
}
}
if (this.M != tp.M){
if(this.M < tp.M) {
return 1;
}
else{
return -1;
}
}
return 0;
}
public static int compare(TextQuadlet tp1, TextQuadlet tp2) {
int cmp = tp1.compareTo(tp2);
return cmp;
}
public static int compare(Text customer_id1, Text customer_id2) {
int cmp = customer_id1.compareTo(customer_id2);
return cmp;
}
}
Next you'll need a custom partitioner so that all the values which have the same key end up at one reducer:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class FirstPartitioner_RFM extends Partitioner<TextQuadlet, Text> {
@Override
public int getPartition(TextQuadlet key, Text value, int numPartitions) {
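// Mask the sign bit so that a negative hash code cannot yield a negative partition.
return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;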
}
}
Thirdly, you'll need a custom group comparator so that all the values are grouped together by their natural key, which is customer_id, and not by the composite key, which is customer_id,R,F,M:
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class GroupComparator_RFM_N extends WritableComparator {
protected GroupComparator_RFM_N() {
super(TextQuadlet.class, true);
}
@SuppressWarnings("rawtypes")
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
TextQuadlet ip1 = (TextQuadlet) w1;
TextQuadlet ip2 = (TextQuadlet) w2;
// Here we tell hadoop to group the keys by their natural key.
return ip1.getCustomer_id().compareTo(ip2.getCustomer_id());
}
}
Fourthly, you'll need a key comparator which will again sort the keys by R, F, M in descending order and implement the same sort technique that is used in TextQuadlet.java. Since I got lost while coding, I slightly changed the way I compared data types in this function, but the underlying logic is the same as in TextQuadlet.java:
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class KeyComparator_RFM extends WritableComparator {
protected KeyComparator_RFM() {
super(TextQuadlet.class, true);
}
@SuppressWarnings("rawtypes")
@Override
public int compare(WritableComparable w1, WritableComparable w2) {
TextQuadlet ip1 = (TextQuadlet) w1;
TextQuadlet ip2 = (TextQuadlet) w2;
// LHS in the conditional statement is the current key-value pair
// RHS in the conditional statement is the previous key-value pair
// When you return a negative value, it means that you are exchanging
// the positions of current and previous key-value pair
// If you are comparing strings, the string which ends up as the argument
// for the `compareTo` method turns out to be the previous key and the
// string which is invoking the `compareTo` method turns out to be the
// current key.
if(ip1.getR() == ip2.getR()){
if(ip1.getF() == ip2.getF()){
if(ip1.getM() == ip2.getM()){
return 0;
}
else{
if(ip1.getM() < ip2.getM())
return 1;
else
return -1;
}
}
else{
if(ip1.getF() < ip2.getF())
return 1;
else
return -1;
}
}
else{
if(ip1.getR() < ip2.getR())
return 1;
else
return -1;
}
}
}
And finally, in your driver class, you'll have to include the custom classes. Here I have used TextQuadlet,Text as the k-v pair, but you can choose any other class depending on your needs:
job.setPartitionerClass(FirstPartitioner_RFM.class);
job.setSortComparatorClass(KeyComparator_RFM.class);
job.setGroupingComparatorClass(GroupComparator_RFM_N.class);
job.setMapOutputKeyClass(TextQuadlet.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(TextQuadlet.class);
job.setOutputValueClass(Text.class);
Do correct me if I am technically going wrong somewhere in the code or in the explanation as I have based this answer purely on my personal understanding from what I read on the internet and it works for me perfectly.
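The descending R, F, M ordering that compareTo and KeyComparator_RFM implement can also be checked outside Hadoop. The sketch below expresses it as a java.util.Comparator chain. Rfm is a hypothetical plain-Java stand-in for the composite key, with no Hadoop types involved, and RfmOrderDemo and DESCENDING are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class RfmOrderDemo {
    // Plain-Java stand-in for the R, F, M part of the composite key.
    static final class Rfm {
        final long r;
        final long f;
        final double m;

        Rfm(long r, long f, double m) {
            this.r = r;
            this.f = f;
            this.m = m;
        }

        @Override
        public String toString() {
            return r + "," + f + "," + m;
        }
    }

    // Compare by R, then F, then M; reversing the whole chain makes
    // every level descending.
    static final Comparator<Rfm> DESCENDING =
            Comparator.comparingLong((Rfm k) -> k.r)
                    .thenComparingLong(k -> k.f)
                    .thenComparingDouble(k -> k.m)
                    .reversed();

    public static void main(String[] args) {
        List<Rfm> keys = new ArrayList<>(Arrays.asList(
                new Rfm(545, 1, 7652),
                new Rfm(545, 23, 390159.40234375),
                new Rfm(452, 13, 132586),
                new Rfm(452, 4, 32202),
                new Rfm(452, 1, 9310),
                new Rfm(452, 1, 4057),
                new Rfm(452, 3, 18970)));
        keys.sort(DESCENDING);
        keys.forEach(System.out::println);
    }
}
```

Sorting the sample keys from the question this way yields the R, F, M order requested there: 545 before 452, then larger F first, then larger M first.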

Generating all the elements of a power set

The power set is the set of all subsets of a given set.
It includes every subset, including the empty set.
It is well known that this set has 2^N elements, where N is the number of elements in the original set.
To build the power set, the following approach can be used:
Create a loop that iterates over all integers from 0 to 2^N-1.
Take the binary representation of each integer.
Each binary representation is a set of N bits (for smaller numbers, add leading zeros).
Each bit indicates whether the corresponding set member is included in the current subset.
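As a minimal sketch of those steps, independent of the iterator class that follows, the bitmask loop can be written directly. The names PowerSetSketch and powerSet are hypothetical, and the sketch assumes fewer than 31 elements so that 1 << n fits in an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class PowerSetSketch {
    static <E> List<List<E>> powerSet(List<E> elements) {
        int n = elements.size(); // assumption: n < 31, so 1 << n fits in an int
        List<List<E>> subsets = new ArrayList<>();
        // Every integer from 0 to 2^n - 1 encodes one subset.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<E> subset = new ArrayList<>();
            for (int bit = 0; bit < n; bit++) {
                // Bit number "bit" decides whether element "bit" is included.
                if ((mask & (1 << bit)) != 0) {
                    subset.add(elements.get(bit));
                }
            }
            subsets.add(subset);
        }
        return subsets;
    }

    public static void main(String[] args) {
        System.out.println(powerSet(Arrays.asList(1, 2, 3)));
        // prints [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
    }
}
```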
import java.util.NoSuchElementException;
import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
public class PowerSet<E> implements Iterator<Set<E>>, Iterable<Set<E>> {
private final E[] ary;
private final int subsets;
private int i;
public PowerSet(Set<E> set) {
ary = (E[])set.toArray();
subsets = (int)Math.pow(2, ary.length) - 1;
}
public Iterator<Set<E>> iterator() {
return this;
}
@Override
public void remove() {
throw new UnsupportedOperationException("Cannot remove()!");
}
@Override
public boolean hasNext() {
return i++ < subsets;
}
@Override
public Set<E> next() {
if (!hasNext()) {
throw new NoSuchElementException();
}
Set<E> subset = new TreeSet<E>();
BitSet bitSet = BitSet.valueOf(new long[] { i });
if (bitSet.cardinality() == 0) {
return subset;
}
for (int e = bitSet.nextSetBit(0); e != -1; e = bitSet.nextSetBit(e + 1)) {
subset.add(ary[e]);
}
return subset;
}
// Unit Test
public static void main(String[] args) {
Set<Integer> numbers = new TreeSet<Integer>();
for (int i = 1; i < 4; i++) {
numbers.add(i);
}
PowerSet<Integer> pSet = new PowerSet<Integer>(numbers);
for (Set<Integer> subset : pSet) {
System.out.println(subset);
}
}
}
The output I am getting is:
[2]
[3]
[2, 3]
java.util.NoSuchElementException
at PowerSet.next(PowerSet.java:47)
at PowerSet.next(PowerSet.java:20)
at PowerSet.main(PowerSet.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at edu.rice.cs.drjava.model.compiler.JavacCompiler.runCommand(JavacCompiler.java:272)
So, the problems are:
I am not getting all the elements (debugging shows me that next is called only for even values of i).
The exception should not have been thrown.
The problem is in your hasNext. You have i++ < subsets there. What happens is that since hasNext is called once from next() and once more during the iteration for (Set<Integer> subset : pSet) you increment i by 2 each time. You can see this since
for (Set<Integer> subset : pSet) {
}
is actually equivalent to:
Iterator<Set<Integer>> it = pSet.iterator();
while (it.hasNext()) {
Set<Integer> subset = it.next();
}
Also note that
if (bitSet.cardinality() == 0) {
return subset;
}
is redundant. Try instead:
@Override
public boolean hasNext() {
return i <= subsets;
}
@Override
public Set<E> next() {
if (!hasNext()) {
throw new NoSuchElementException();
}
Set<E> subset = new TreeSet<E>();
BitSet bitSet = BitSet.valueOf(new long[] { i });
for (int e = bitSet.nextSetBit(0); e != -1; e = bitSet.nextSetBit(e + 1)) {
subset.add(ary[e]);
}
i++;
return subset;
}
