Adding parallel() to a stream causes NullPointerException

I'm trying to get my head around Java streams. My understanding was that they provide an easy way to parallelize behaviour, that not all operations benefit from parallelization, but that you always have the option to parallelize by just slapping .parallel() onto an existing stream. That might make the stream slower in some cases, or return the elements in a different order at the end, etc., but the option is always there. That's why I got confused when I changed this method:
public static List<Integer> primeSequence() {
    List<Integer> list = new LinkedList<Integer>();
    IntStream.range(1, 10)
             .filter(x -> isPrime(x))
             .forEach(list::add);
    return list;
}
//returns {2,3,5,7}
to this:
public static List<Integer> primeSequence() {
    List<Integer> list = new LinkedList<Integer>();
    IntStream.range(1, 10).parallel()
             .filter(x -> isPrime(x))
             .forEach(list::add);
    return list;
}
// throws NullPointerException
I thought all streams were serial unless otherwise stated, and that parallel() just made them execute in parallel. What am I missing here? Why does it throw an exception?

There is one significant issue with your primeSequence implementation: you mix stream iteration with modification of an outer list. You should avoid using streams that way, otherwise you will face a lot of problems, like the one you have described. If you take a look at how LinkedList's add(E element) method is implemented, you will see something like this:
public boolean add(E e) {
    this.linkLast(e);
    return true;
}

void linkLast(E e) {
    LinkedList.Node<E> l = this.last;
    LinkedList.Node<E> newNode = new LinkedList.Node(l, e, (LinkedList.Node) null);
    this.last = newNode;
    if (l == null) {
        this.first = newNode;
    } else {
        l.next = newNode;
    }
    ++this.size;
    ++this.modCount;
}
Nothing in linkLast is synchronized, so when several threads call it at once they race on the first, last, and size fields. The list's internal links get corrupted, and a thread can end up dereferencing a node reference that another thread has not finished wiring up, which is where the NullPointerException comes from. If you use a CopyOnWriteArrayList instead of a LinkedList in your example, no NullPointerException is thrown, only because CopyOnWriteArrayList synchronizes its writes with a lock:
public boolean add(E e) {
    ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = this.getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        this.setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}
But this is still not the best way to use a parallel stream.
The correct way to use the Stream API
Consider the following modification to your code:
public static List<Integer> primeSequence() {
    return IntStream.range(1, 10)
                    .parallel()
                    .filter(x -> isPrime(x))
                    .boxed()
                    .collect(Collectors.toList());
}
Instead of modifying some outer list (of any kind), we collect the result and return the final list. You can turn any list into a stream using the .stream() method without worrying about the original list: none of the operations you apply to the stream will modify its source, and the collected result is a new list.
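If you specifically need a LinkedList back (as the original code produced), here is a minimal variation: Collectors.toCollection lets you pick the concrete list type, and it stays correct under .parallel() because the collector, not your code, merges the per-thread containers.
public static List<Integer> primeSequence() {
    return IntStream.range(1, 10)
                    .parallel()
                    .filter(x -> isPrime(x))
                    .boxed()
                    // choose the concrete list type explicitly;
                    // the collector merges per-thread containers safely
                    .collect(Collectors.toCollection(LinkedList::new));
}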
I hope it helps.

Related

JDK java.util.concurrent.ConcurrentSkipListSet.equals(Object o) implementation efficiency

The equals implementation of java.util.concurrent.ConcurrentSkipListSet in the JDK is as follows:
public boolean equals(Object o) {
    // Override AbstractSet version to avoid calling size()
    if (o == this)
        return true;
    if (!(o instanceof Set))
        return false;
    Collection<?> c = (Collection<?>) o;
    try {
        return containsAll(c) && c.containsAll(this);
    } catch (ClassCastException unused) {
        return false;
    } catch (NullPointerException unused) {
        return false;
    }
}
But the code below seems more efficient to me:
public boolean myEquals(Object o) {
    if (o == this)
        return true;
    if (!(o instanceof Set))
        return false;
    Collection<?> c = (Collection<?>) o;
    if (c.size() != this.size()) {
        return false;
    }
    Iterator<?> ic = c.iterator();
    Iterator<?> id = iterator();
    while (ic.hasNext() && id.hasNext()) {
        if (!ic.next().equals(id.next())) {
            return false;
        }
    }
    return true;
}
And a simple test also seems to support the second equals:
public class Test {
    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set1 = new ConcurrentSkipListSet<Integer>();
        ConcurrentSkipListSet<Integer> set2 = new ConcurrentSkipListSet<Integer>();
        for (int i = 0; i < 10000000; i++) {
            set1.add(i);
            set2.add(i);
        }
        long ts = System.currentTimeMillis();
        System.out.println(set1.equals(set2));
        System.out.println(System.currentTimeMillis() - ts);
        ts = System.currentTimeMillis();
        System.out.println(set1.myEquals(set2)); // myEquals as defined above
        System.out.println(System.currentTimeMillis() - ts);
    }
}
Output result
true
2713
true
589
The JDK comment says, "This definition ensures that the equals method works properly across different implementations of the Set interface." Could anyone kindly explain this?
For reference, the OpenJDK thread resulted in creating JDK-8181146 ConcurrentSkipListSet.equals efficiency.
The JDK comment says, "This definition ensures that the equals method works properly across different implementations of the Set interface." Could anyone kindly explain this?
It comes from Set.equals(Object). Per the documentation:
Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.
It implies that Set.equals implementations should be defined by the behavior of Set.contains(Object), which then leads you to this verbiage from java.util.SortedSet:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
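As a hypothetical illustration (not from the original answer), consider an ordering that is inconsistent with equals; a one-directional containment check then becomes asymmetric, which is exactly what the bidirectional containsAll guards against:
// A TreeSet whose comparator is inconsistent with String.equals
Set<String> ci = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
ci.add("a");
Set<String> hs = new HashSet<>();
hs.add("A");

System.out.println(ci.equals(hs)); // true:  TreeSet's containsAll consults the comparator
System.out.println(hs.equals(ci)); // false: HashSet's containsAll uses hashCode/equals
With the same pair, ConcurrentSkipListSet's two-directional check returns false in both directions, keeping equals symmetric.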
So why the 'this contains that and that contains this' check in ConcurrentSkipListSet? First off, you want to avoid the call to ConcurrentSkipListSet.size(), because per its documentation:
Beware that, unlike in most collections, this method is NOT a constant-time operation. Because of the asynchronous nature of these sets, determining the current number of elements requires traversing them all to count them. Additionally, it is possible for the size to change during execution of this method, in which case the returned result will be inaccurate. Thus, this method is typically not very useful in concurrent applications.
The second reason is that you want to be 'consistent with equals'.
Let's make a cruel example based on your code:
private static boolean myEquals(Set o1, Set o2) {
    if (o1.size() == 1 && o2.size() == 1) {
        Iterator ic = o2.iterator();
        Iterator id = o1.iterator();
        while (ic.hasNext() && id.hasNext()) {
            if (!ic.next().equals(id.next())) {
                return false;
            }
        }
        return true;
    }
    return o1.equals(o2);
}

public static void main(String[] args) {
    print(skiplist(new BigDecimal("1.0")), tree(new BigDecimal("1.00")));
    print(skiplist(new BigDecimal("1.0")), hash(new BigDecimal("1.00")));
    print(skiplist(new BigDecimal("1.0")), identity(new BigDecimal("1.00")));
    print(skiplist(BigDecimal.ONE), identity(new BigDecimal(BigInteger.ONE, 0)));
}

private static Collection<BigDecimal> e() {
    return Arrays.asList(new BigDecimal("1.0"));
}

private static <E> Set<E> hash(E... e) {
    return new HashSet<>(Arrays.asList(e));
}

private static <E> Set<E> skiplist(E... e) {
    return new ConcurrentSkipListSet<>(Arrays.asList(e));
}

private static <E> Set<E> tree(E... e) {
    return new TreeSet<>(Arrays.asList(e));
}

private static <E> Set<E> identity(E... e) {
    Set<E> s = Collections.newSetFromMap(new IdentityHashMap<E, Boolean>());
    Collections.addAll(s, e);
    return s;
}

private static void print(Set o1, Set o2) {
    System.out.println(o1.getClass().getName()
            + "==" + o2.getClass().getName() + ": "
            + o1.equals(o2) + ": " + myEquals(o1, o2));
    System.out.println(o2.getClass().getName()
            + "==" + o1.getClass().getName() + ": " + o2.equals(o1)
            + ": " + myEquals(o2, o1));
}
Which outputs:
java.util.concurrent.ConcurrentSkipListSet==java.util.TreeSet: true: false
java.util.TreeSet==java.util.concurrent.ConcurrentSkipListSet: true: false
java.util.concurrent.ConcurrentSkipListSet==java.util.HashSet: false: false
java.util.HashSet==java.util.concurrent.ConcurrentSkipListSet: false: false
java.util.concurrent.ConcurrentSkipListSet==java.util.Collections$SetFromMap: false: false
java.util.Collections$SetFromMap==java.util.concurrent.ConcurrentSkipListSet: false: false
java.util.concurrent.ConcurrentSkipListSet==java.util.Collections$SetFromMap: false: true
java.util.Collections$SetFromMap==java.util.concurrent.ConcurrentSkipListSet: false: true
That output shows that the new implementation would not be consistent with equals:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
Now we could fix this by replacing the element check with ((Comparable) e1).compareTo((Comparable) e2) != 0 or comparator.compare(e1, e2) != 0, and by adding checks to try to determine that the two sets use the same ordering. But keep in mind that collections can be wrapped, and there is nothing stopping a caller from hiding the fact that a set is backed by a sorted set. So you are back to the 'this contains that and that contains this' implementation of equals, which can deal with collection wrappers.
Another nice property of the 'this contains that and that contains this' implementation is that it never creates an iterator over the given collection, which in the worst case could be implemented like Arrays.asList(s.toArray()).iterator() under the hood.
Without relaxing the spec, relaxing the existing behavior, or adding a collection method that returns a BiPredicate capturing the 'equivalence relationship' for a collection, I think it will be hard to add an optimization like this to the JDK.

Why does Java Map.merge not take a supplier?

I want a method in Java that allows me to modify a value if it exists, or insert one if it doesn't. Similar to merge, but:
I want to pass a value supplier and not a value, to avoid creating it when not needed
If the value exists, I don't want to reinsert or remove it, just access its methods with a consumer.
I had to write this myself. The problem with writing it myself is that a version for concurrent maps is not trivial:
public static <K, V> V putOrConsume(Map<K, V> map, K key, Supplier<V> ifAbsent, Consumer<V> ifPresent) {
    V val = map.get(key);
    if (val != null) {
        ifPresent.accept(val);
    } else {
        map.put(key, ifAbsent.get());
    }
    return val;
}
The best "standard" way of achieving it is to use compute():
Map<String, String> map = new HashMap<>();
BiFunction<String, String, String> convert = (k, v) -> v == null ? "new_" + k : "old_" + v;
map.compute("x", convert);
map.compute("x", convert);
System.out.println(map.get("x")); //prints old_new_x
Now say you have your Supplier and Consumer and would like to follow the DRY principle. Then you could use a simple function combinator:
Map<String, String> map = new HashMap<>();
Supplier<String> ifAbsent = () -> "new";
Consumer<String> ifPresent = System.out::println;
BiFunction<String, String, String> putOrConsume = (k, v) -> {
    if (v == null) return ifAbsent.get();
    ifPresent.accept(v);
    return v;
};
map.compute("x", putOrConsume); // nothing
map.compute("x", putOrConsume); // prints "new"
Obviously, you could write a combinator function that takes a supplier and a consumer and returns a BiFunction to make the code above even more generic.
The drawback of this approach is the extra put even when you merely consume the value, i.e. it will be slightly slower by the cost of one more key lookup. The good news is that map implementations will simply replace the value without creating a new node, i.e. no new objects are created or garbage collected. Most of the time such a trade-off is justified.
map.compute(...) and map.putIfAbsent(...) are much more powerful than the fairly specialized putOrConsume(...) proposed here. It is so asymmetrical that I would actually review the reasons why you need it in your code.
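As a side note, for the "supply the value lazily" half of the request on its own, Map.computeIfAbsent already fits: its mapping function runs only when the key has no value, so nothing is created unnecessarily. A minimal sketch:
Map<String, List<String>> map = new HashMap<>();
Supplier<List<String>> ifAbsent = ArrayList::new;

// the lambda (and therefore the supplier) only runs when "x" has no value yet
map.computeIfAbsent("x", k -> ifAbsent.get()).add("first");
map.computeIfAbsent("x", k -> ifAbsent.get()).add("second"); // supplier not called here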
You can achieve what you want with Map.compute, a trivial helper method, and a local class that records whether your ifAbsent supplier has been used:
public static <K, V> V putOrConsume(
        Map<K, V> map,
        K key,
        Supplier<V> ifAbsent,
        Consumer<V> ifPresent) {
    class AbsentSupplier implements Supplier<V> {
        boolean used = false;

        public V get() {
            used = true;
            return ifAbsent.get();
        }
    }

    AbsentSupplier absentSupplier = new AbsentSupplier();
    V computed = map.compute(
            key,
            (k, v) -> v == null
                    ? absentSupplier.get()
                    : consumeAndReturn(v, ifPresent));
    return absentSupplier.used ? null : computed;
}

private static <V> V consumeAndReturn(V v, Consumer<V> consumer) {
    consumer.accept(v);
    return v;
}
The tricky part is detecting whether the ifAbsent supplier was used, so the method can return either null or the existing, consumed value.
The helper method simply adapts the ifPresent consumer so that it behaves like a unary operator that consumes the given value and returns it.
Differently from the other answers, you can also use the Map.compute method and combine functions via interface default/static methods to make your code more readable. For example:
Usage
// only consume if the value is present
Consumer<V> action = ...;
map.compute(key, ValueMapping.ifPresent(action));

// create the value if it is absent
Supplier<V> supplier = ...;
map.compute(key, ValueMapping.ifPresent(action).orElse(supplier));

// map the value from the key if it is absent
Function<K, V> mapping = ...;
map.compute(key, ValueMapping.ifPresent(action).orElse(mapping));

// orElse supports short-circuiting
map.compute(key, ValueMapping.ifPresent(action)
        .orElse(supplier)
        .orElse(() -> fail("should not be called "
                + "if the previous orElse already computed a value")));

<T> T fail(String message) {
    throw new AssertionError(message);
}
ValueMapping
interface ValueMapping<T, R> extends BiFunction<T, R, R> {

    default ValueMapping<T, R> orElse(Supplier<R> other) {
        return orElse(k -> other.get());
    }

    default ValueMapping<T, R> orElse(Function<T, R> other) {
        return (k, v) -> {
            R result = this.apply(k, v);
            return result != null ? result : other.apply(k);
        };
    }

    static <T, R> ValueMapping<T, R> ifPresent(Consumer<R> action) {
        return (k, v) -> {
            if (v != null) {
                action.accept(v);
            }
            return v;
        };
    }
}
Note
I used Objects.isNull in a previous version of ValueMapping. @Holger pointed out that this is overuse of the method, and that it should be replaced with the simpler condition v != null.

How do I add a Stream operation parameter to my function?

I have a function that performs some stream operations (a filter, in particular) on a List.
public List<String> getAndFilterNames(List<Person> people, Predicate<Person> nameFilter) {
    List<String> allNames = people.stream()
            .map(person -> person.getName())
            .filter(nameFilter)
            .collect(Collectors.toList());
    return allNames;
}
I want to be able to pass an intermediate stream operation (filter, distinct, etc.) and have my function perform that operation before running the terminal operation. Something like:
public List<String> getAndProcessNames(List<Person> people, <intermediate stream operation>) {
    List<String> allNames = people.stream()
            .map(person -> person.getName())
            // perform <intermediate stream operation> here
            .collect(Collectors.toList());
    return allNames;
}
Though, with my current level of experience, it seems impossible. Some intermediate operations have a parameter (like filter), and others don't (like distinct), so I can't set up a single parameter type that will handle all cases...I suppose I could create a couple of signatures though, using Function and Supplier.
Even then, the functional interface that the functions with parameters require varies...filter takes a Predicate, map takes a Function, and from what I understand, there is no way to denote a generic functional interface. Is that correct? There is no actual common class or interface that they all draw from.
So, in the end, it seems that my best bet is to just map and collect, and then run my desired stream operation on a case-by-case basis, like:
public List<String> getNames(List<Person> people) {
    List<String> allNames = people.stream()
            .map(person -> person.getName())
            .collect(Collectors.toList());
    return allNames;
}

List<Person> employees = // a bunch of people
List<String> employeeNames = getNames(employees);
employeeNames = employeeNames.stream(). // desired operations
EDIT: Or, per @Holger:
public Stream<String> getNamesStream(List<Person> people) {
    Stream<String> namesStream = people.stream()
            .map(person -> person.getName());
    return namesStream;
}

List<Person> employees = // a bunch of people
Stream<String> employeeNamesStream = getNamesStream(employees);
employeeNamesStream. // desired operations
Or is there something I'm missing?
What you want is a function from Stream to Stream. For example:
public List<String> processNames(List<Person> people, Function<Stream<String>, Stream<String>> f) {
    return f.apply(people.stream().map(Person::getName))
            .collect(Collectors.toList());
}
Then invoke like:
List<String> filteredNames = processNames(employees, s -> s.filter(n -> n.startsWith("A")));
Yes, you are right. There's no common base interface that intermediate operations would extend, nor should there be, given the different operations they actually perform.
You could create some helper methods to chain those calls:
private static <T> Stream<T> applyPredicates(Stream<T> input, List<Predicate<T>> predicates) {
    Stream<T> result = input;
    for (Predicate<T> pred : predicates) {
        result = result.filter(pred);
    }
    return result;
}

/**
 * This could be modified to accept more than one parameter.
 */
private static <T, R> Stream<R> applyFunction(Stream<T> input, Function<T, R> function) {
    return input.map(function);
}
And then for example :
List<String> list = Arrays.asList("one", "two", null, "three");
Predicate<String> p1 = t -> t != null;
Predicate<String> p2 = t -> t.startsWith("t");
applyFunction(applyPredicates(list.stream(), Arrays.asList(p1, p2)), String::length)
.collect(Collectors.toList());
Notice that applyFunction would have to be overloaded to take multiple Functions as input parameters, since each map operation might change the stream from T to R, then to Y, and so on.
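Alternatively, a sketch building on the first answer's idea (not part of this answer): Stream-to-Stream functions compose with Function.andThen, so no overloads are needed even when the element type changes along the way.
Function<Stream<String>, Stream<String>> dropNulls = s -> s.filter(Objects::nonNull);
Function<Stream<String>, Stream<Integer>> toLengths = s -> s.map(String::length);

// composes into a single Stream<String> -> Stream<Integer> transformation
Function<Stream<String>, Stream<Integer>> pipeline = dropNulls.andThen(toLengths);

List<Integer> lengths = pipeline.apply(Stream.of("one", null, "three"))
        .collect(Collectors.toList()); // [3, 5]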

Java 8 stream short circuit manually [duplicate]

This question already has answers here:
Limit a stream by a predicate
(19 answers)
Closed 6 years ago.
Is there any way to manually short circuit a stream (like in findFirst)?
Example:
Imagine a huge dictionary ordered by word size and alphabet:
cat
... (many more)
lamp
mountain
... (many more)
Only read and process the file from the beginning, returning immediately when the line length exceeds 4:
read cat, compute cat
...
read lamp, compute lamp
read mountain, return
The following code is very concise, but it does not take advantage of the ordering of the stream: it has to read every line:
try (Stream<String> lines = Files.lines(Paths.get(DICTIONARY_PATH))) {
    return lines
            // filter for words with the correct size
            .filter(line -> line.length() == 4)
            // do stuff...
            .collect(Collectors.toList());
}
Answer based on Limit a stream by a predicate; processing correctly stops when the predicate returns false. Hopefully this method becomes available in Java 9:
private static List<String> getPossibleAnswers(int numberOfChars, char[][] possibleChars) throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(DICTIONARY_PATH))) {
        return takeWhile(lines, line -> line.length() <= numberOfChars)
                // filter length
                .filter(line -> line.length() == numberOfChars)
                // do stuff
                .collect(Collectors.toList());
    }
}
static <T> Spliterator<T> takeWhile(Spliterator<T> splitr, Predicate<? super T> predicate) {
    return new Spliterators.AbstractSpliterator<T>(splitr.estimateSize(), 0) {
        boolean stillGoing = true;

        @Override
        public boolean tryAdvance(Consumer<? super T> consumer) {
            if (stillGoing) {
                boolean hadNext = splitr.tryAdvance(elem -> {
                    if (predicate.test(elem)) {
                        consumer.accept(elem);
                    } else {
                        stillGoing = false;
                    }
                });
                return hadNext && stillGoing;
            }
            return false;
        }
    };
}

static <T> Stream<T> takeWhile(Stream<T> stream, Predicate<? super T> predicate) {
    return StreamSupport.stream(takeWhile(stream.spliterator(), predicate), false);
}
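For the record, this wish came true: Java 9 added Stream.takeWhile(Predicate), so on Java 9 and later the helper above is unnecessary and the body of getPossibleAnswers can be written as:
// Java 9+: the built-in takeWhile short-circuits once the predicate fails
try (Stream<String> lines = Files.lines(Paths.get(DICTIONARY_PATH))) {
    return lines.takeWhile(line -> line.length() <= numberOfChars)
            .filter(line -> line.length() == numberOfChars)
            .collect(Collectors.toList());
}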

Java 8 is not maintaining the order while grouping

I'm using Java 8 to group data, but the results are not in the order in which they were formed.
Map<GroupingKey, List<Object>> groupedResult = null;
if (!CollectionUtils.isEmpty(groupByColumns)) {
    Map<String, Object> mapArr[] = new LinkedHashMap[mapList.size()];
    if (!CollectionUtils.isEmpty(mapList)) {
        int count = 0;
        for (LinkedHashMap<String, Object> map : mapList) {
            mapArr[count++] = map;
        }
    }
    Stream<Map<String, Object>> people = Stream.of(mapArr);
    groupedResult = people
            .collect(Collectors.groupingBy(p -> new GroupingKey(p, groupByColumns),
                    Collectors.mapping((Map<String, Object> p) -> p, toList())));
public static class GroupingKey {
    private ArrayList<Object> keys;

    public GroupingKey(Map<String, Object> map, List<String> cols) {
        keys = new ArrayList<>();
        for (String col : cols) {
            keys.add(map.get(col));
        }
    }

    // Add appropriate equals() ... your IDE should generate this
    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final GroupingKey other = (GroupingKey) obj;
        if (!Objects.equals(this.keys, other.keys)) {
            return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 37 * hash + Objects.hashCode(this.keys);
        return hash;
    }

    @Override
    public String toString() {
        return keys + "";
    }

    public ArrayList<Object> getKeys() {
        return keys;
    }

    public void setKeys(ArrayList<Object> keys) {
        this.keys = keys;
    }
}
Here I am using my GroupingKey class, to which the columns are passed dynamically from the UI. How can I get the result sorted by groupByColumns?
Not maintaining the order is a property of the Map that stores the result. If you need a specific Map behavior, you need to request a particular Map implementation. E.g. LinkedHashMap maintains the insertion order:
groupedResult = people.collect(Collectors.groupingBy(
        p -> new GroupingKey(p, groupByColumns),
        LinkedHashMap::new,
        Collectors.mapping((Map<String, Object> p) -> p, toList())));
By the way, there is no reason to copy the contents of mapList into an array before creating the Stream. You may simply call mapList.stream() to get an appropriate Stream.
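For example, using the names from the question:
Stream<Map<String, Object>> people = mapList.stream(); // no intermediate array needed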
Further, Collectors.mapping((Map<String, Object> p) -> p, toList()) is obsolete. p->p is an identity mapping, so there’s no reason to request mapping at all:
groupedResult = mapList.stream().collect(Collectors.groupingBy(
p -> new GroupingKey(p, groupByColumns), LinkedHashMap::new, toList()));
But even the GroupingKey is obsolete. It basically wraps a List of values, so you could just use a List as key in the first place. Lists implement hashCode and equals appropriately (but you must not modify these key Lists afterwards).
Map<List<Object>, List<Object>> groupedResult =
mapList.stream().collect(Collectors.groupingBy(
p -> groupByColumns.stream().map(p::get).collect(toList()),
LinkedHashMap::new, toList()));
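As a quick illustration (with hypothetical values): List.equals and List.hashCode are defined element-wise and work across List implementations, which is what makes plain Lists safe as grouping keys.
List<Object> k1 = Arrays.asList("Smith", 30);
List<Object> k2 = new ArrayList<>(Arrays.asList("Smith", 30));
System.out.println(k1.equals(k2) && k1.hashCode() == k2.hashCode()); // true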
Based on @Holger's great answer. I post this to help those who want to keep the order after grouping as well as change the mapping.
Let's simplify and suppose we have a list of persons (int age, String name, String address, etc.) and we want the names grouped by age while keeping the ages in order:
final LinkedHashMap<Integer, List<String>> map = myList
        .stream()
        .sorted(Comparator.comparing(p -> p.getAge())) // sort the list by age
        .collect(Collectors.groupingBy(p -> p.getAge(),
                LinkedHashMap::new,                    // keeps the order
                Collectors.mapping(p -> p.getName(),   // map to the name
                        Collectors.toList())));
