What is the most elegant way to run a lambda for each element of a Java 8 stream and simultaneously count how many elements were processed? - java-8

What is the most elegant way to run a lambda for each element of a Java 8 stream and simultaneously count how many items were processed, assuming I want to process the stream only once and not mutate a variable outside the lambda?

It might be tempting to use
long count = stream.peek(action).count();
and it may appear to work. However, peek’s action will only be performed while the elements are actually being processed, and for some streams the count may be available without processing the elements. Java 9’s implementation takes that opportunity, which makes the code above fail to perform the action for some streams.
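For example, with a sized source the count can be computed without touching the elements, so on Java 9 and later the following may print nothing at all (a minimal sketch of the pitfall):
// On Java 9+, count() may be computed directly from the source's known size,
// so the peek action can be skipped entirely and nothing is printed,
// yet count is still 3.
long count = Arrays.asList("a", "b", "c").stream()
        .peek(System.out::println)
        .count();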
You can use a collect operation that doesn’t allow such short-cuts, e.g.
long count = stream.collect(
        Collectors.mapping(s -> { action.accept(s); return s; }, Collectors.counting()));
or
long count = stream.collect(Collectors.summingLong(s -> { action.accept(s); return 1; }));
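For illustration, here is the first variant with a concrete source and action (both made up for this sketch; the summingLong form is used the same way):
Consumer<String> action = s -> System.out.println("processing " + s);
long count = Stream.of("a", "b", "c").collect(
        Collectors.mapping(s -> { action.accept(s); return s; },
                           Collectors.counting()));
// prints "processing a", "processing b", "processing c"; count is 3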

I would go with a reduce operation of some sort, something like this:
int howMany = Stream.of("a", "vc", "ads", "ts", "ta").reduce(0, (i, string) -> {
    if (string.contains("a")) {
        // process the element in any other way here
        return i + 1;
    }
    return i;
}, (left, right) -> left + right); // combiner merges partial counts if the stream runs in parallel
System.out.println(howMany);

This can be done with the peek method, since it returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream. Note that peek is an intermediate operation, so it has to come before the terminal operation:
AtomicInteger counter = new AtomicInteger(0);
elements
    .stream()
    .peek(elem -> counter.incrementAndGet())
    .forEach(doSomething());
int elementsProcessed = counter.get();

Streams are lazily evaluated and therefore processed in a single pass, combining all intermediate operations when a terminal operation is called, no matter how many operations you chain.
This way, you don't have to worry: your stream will be processed in one go. Still, the best way to perform some operation on each stream element and count the number of elements processed depends on your goal.
In any case, the two examples below don't mutate a variable to perform that count.
Both examples create a Stream of Strings, trim() each String to remove surrounding whitespace, and then filter the Strings that have some content.
Example 1
Uses the peek method to perform some operation on each filtered String; in this case, it just prints each one. Finally, it uses count() to get how many Strings were processed.
Stream<String> stream =
        Stream.of(" java", "", " streams", " are", " lazily ", "evaluated");
long count = stream
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .peek(System.out::println)
        .count();
System.out.printf(
        "\nNumber of non-empty strings after a trim() operation: %d\n\n", count);
Example 2
Uses the collect method after mapping and filtering to gather all the processed Strings into a List. This way, the List can be printed separately and the number of elements obtained from list.size().
Stream<String> stream =
        Stream.of(" java", "", " streams", " are", " lazily ", "evaluated");
List<String> list = stream
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
list.forEach(System.out::println);
System.out.printf(
        "\nNumber of non-empty strings after a trim() operation: %d\n\n", list.size());

Related

Is there any performance benefit of using Arrays.stream() over iterating on an array?

I need to iterate on all the enum values, check if they were used to construct an int (called input) and if so, add them to a Set (called usefulEnums). I can either use streams API or iterate over all the enums to do this task. Is there any benefit of using Arrays.stream() over the traditional approach of iterating over the values() array?
enum TestEnum { VALUE1, VALUE2, VALUE3 };

Set<TestEnum> usefulEnums = new HashSet<>();
Arrays.stream(TestEnum.values())
      .filter(t -> (input & t.getValue()) != 0)
      .forEach(usefulEnums::add);

for (TestEnum t : TestEnum.values()) {
    if ((input & t.getValue()) != 0) {
        usefulEnums.add(t);
    }
}
If you care for efficiency, you should consider:
Set<TestEnum> usefulEnums = EnumSet.allOf(TestEnum.class);
usefulEnums.removeIf(t -> (input & t.getValue()) == 0);
Note that when you have to iterate over all enum constants of a type, using EnumSet.allOf(EnumType.class).stream() avoids the array creation of EnumType.values() entirely, however, most enum types don’t have enough constants for this to make a difference. Further, the JVM’s optimizer may remove the temporary array creation anyway.
But for this specific task, where the result is supposed to be a Set<TestEnum>, using an EnumSet instead of a HashSet may even improve subsequent operations working with the Set. Creating an EnumSet holding all constants and removing the unintended constants, as in the solution above, means just initializing a long with 0b111, followed by clearing the bits of the non-matching elements.
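Since the question’s snippet elides getValue(), here is a self-contained sketch of that EnumSet approach; the bit values and the input mask below are made up purely for illustration:
enum TestEnum {
    VALUE1(0b001), VALUE2(0b010), VALUE3(0b100);

    private final int value;
    TestEnum(int value) { this.value = value; }
    int getValue() { return value; }
}

int input = 0b101;                                         // hypothetical mask matching VALUE1 and VALUE3
Set<TestEnum> usefulEnums = EnumSet.allOf(TestEnum.class); // starts as the full 0b111
usefulEnums.removeIf(t -> (input & t.getValue()) == 0);    // clears the bit for VALUE2
System.out.println(usefulEnums);                           // [VALUE1, VALUE3]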
For this short operation the for loop is going to be faster (nanoseconds faster), but to me the stream version is more expressive: it says exactly what is being done, and you can read it at a glance.
Also you could collect directly to a HashSet:
Set<TestEnum> usefulEnums = Arrays.stream(TestEnum.values())
        .filter(t -> (input & t.getValue()) != 0)
        .collect(Collectors.toCollection(HashSet::new));
Valuable input from Holger as usual makes this even nicer:
EnumSet<TestEnum> filtered = EnumSet.allOf(TestEnum.class).stream()
        .filter(t -> (input & t.getValue()) != 0)
        .collect(Collectors.toCollection(() -> EnumSet.noneOf(TestEnum.class)));

Java8 style for comparing arrays? (Streams and Math3)

I'm just beginning to learn Java8 streams and Apache commons Math3 at the same time, and looking for missed opportunities to simplify my solution for comparing instances for equality. Consider this Math3 RealVector:
RealVector testArrayRealVector =
new ArrayRealVector(new double [] {1d, 2d, 3d});
and consider this member variable containing boxed doubles, plus this copy of it as an array list collection:
private final Double [] m_ADoubleArray = {13d, 14d, 15d};
private final Collection<Double> m_CollectionArrayList =
new ArrayList<>(Arrays.asList(m_ADoubleArray));
Here is my best shot at comparing these in a functional style in a JUnit class (full gist here), using protonpack from codepoetix because I couldn't find zip in the Streams library. This looks really baroque to my eyes and I wonder whether I've missed ways to make this shorter, faster, simpler, better because I'm just beginning to learn this stuff and don't know much.
// Make a stream out of the RealVector:
DoubleStream testArrayRealVectorStream =
        Arrays.stream(testArrayRealVector.toArray());
// Check the type of that Stream
assertTrue("java.util.stream.DoublePipeline$Head" ==
        testArrayRealVectorStream.getClass().getTypeName());
// Use up the stream:
assertEquals(3, testArrayRealVectorStream.count());
// Old one is used up; make another:
testArrayRealVectorStream = Arrays.stream(testArrayRealVector.toArray());
// Make a new stream from the member-var arrayList;
// do arithmetic on the copy, leaving the original unmodified:
Stream<Double> collectionStream = getFreshMemberVarStream();
// Use up the stream:
assertEquals(3, collectionStream.count());
// Stream is now used up; make new one:
collectionStream = getFreshMemberVarStream();
// Doesn't seem to be any way to use zip on the real array vector
// without boxing it.
Stream<Double> arrayRealVectorStreamBoxed =
        testArrayRealVectorStream.boxed();
assertTrue(zip(
        collectionStream,
        arrayRealVectorStreamBoxed,
        (l, r) -> Math.abs(l - r) < DELTA)
    .reduce(true, (a, b) -> a && b));
where
private Stream<Double> getFreshMemberVarStream() {
    return m_CollectionArrayList
            .stream()
            .map(x -> x - 12.0);
}
Again, here is a gist of my entire JUnit test class.
It seems you are trying to bring in Streams at all costs.
If I understand you correctly, you have
double[] array1=testArrayRealVector.toArray();
Double[] m_ADoubleArray = {13d, 14d, 15d};
as starting point. Then, the first thing you can do is to verify the lengths of these arrays:
assertTrue(array1.length==m_ADoubleArray.length);
assertEquals(3, array1.length);
There is no point in wrapping the arrays into a stream and calling count() and, of course, even less in wrapping an array into a collection to call stream().count() on it. Note that if your starting point is a Collection, calling size() will do as well.
Given that you already verified the length, you can simply do
IntStream.range(0, 3).forEach(ix->assertEquals(m_ADoubleArray[ix]-12, array1[ix], DELTA));
to compare the elements of the arrays.
or when you want to apply arithmetic as a function:
// keep the size check as above as the length won’t change
IntToDoubleFunction f=ix -> m_ADoubleArray[ix]-12;
IntStream.range(0, 3).forEach(ix -> assertEquals(f.applyAsDouble(ix), array1[ix], DELTA));
Note that you can also just create a new array using
double[] array2=Arrays.stream(m_ADoubleArray).mapToDouble(d -> d-12).toArray();
and compare the arrays similar to above:
IntStream.range(0, 3).forEach(ix -> assertEquals(array1[ix], array2[ix], DELTA));
or just using
assertArrayEquals(array1, array2, DELTA);
as now both arrays have the same type.
Don’t think about that temporary three element array holding the intermediate result. All other attempts consume far more memory…
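Putting the pieces together, a compact version of that comparison could look as follows; the plain array1 stands in for testArrayRealVector.toArray(), and the DELTA value is chosen arbitrarily for this sketch:
double[] array1 = {1d, 2d, 3d};           // stands in for testArrayRealVector.toArray()
Double[] m_ADoubleArray = {13d, 14d, 15d};
double DELTA = 1e-9;                      // tolerance picked for the sketch

// Apply the arithmetic once, yielding an unboxed double[]:
double[] array2 = Arrays.stream(m_ADoubleArray).mapToDouble(d -> d - 12).toArray();

assertEquals(array1.length, array2.length);
assertArrayEquals(array1, array2, DELTA); // JUnit's double[] overload with a tolerance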

How to retainAll of List of Lists using stream reduce

I faced the following problem: I have a list of lists which I simply want to retainAll. I'm trying to do it with streams:
private List<List<Long>> ids = new ArrayList<List<Long>>();
// some ids.add(otherLists);
List<Long> reduce = ids.stream().reduce(ids.get(0), (a, b) -> a.addAll(b));
unfortunately I got the error
Error:(72, 67) java: incompatible types: bad return type in lambda expression
boolean cannot be converted to java.util.List<java.lang.Long>
If you want to reduce (I think you mean flatten by that) the list of lists, you should do it like this:
import static java.util.stream.Collectors.toList;
...
List<Long> reduce = ids.stream().flatMap(List::stream).collect(toList());
Using reduce, the first value should be the identity value, which is not the case in your implementation, and your solution will produce unexpected results when running the stream in parallel (because addAll modifies the list in place, so the identity value would be the same list shared across partial results).
You'd need to copy the content of the partial result list and add the other list to it to make it work when the pipeline is run in parallel:
List<Long> reduce = ids.parallelStream().reduce(new ArrayList<>(), (a, b) -> {
    List<Long> list = new ArrayList<Long>(a);
    list.addAll(b);
    return list;
});
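Alternatively, assuming the same ids list from the question, the three-argument collect (a mutable reduction) gives each thread its own accumulator list, so it is parallel-safe without copying on every step; a sketch:
List<Long> flattened = ids.parallelStream()
        .collect(() -> new ArrayList<Long>(),          // each thread starts with its own list
                 (acc, list) -> acc.addAll(list),      // add one inner list to the accumulator
                 (left, right) -> left.addAll(right)); // merge partial results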
addAll returns a boolean, not the union of the two lists. You want
List<Long> reduce = ids.stream().reduce(ids.get(0), (a, b) -> {
    a.addAll(b);
    return a;
});

How do I sync RxJS updates so that intermediate values aren't passed through the stream?

In my system I have a source, two "steps" that map the source to a new value, and then a sum that combines those two steps to create a final value. The initial run through of this system works as I hoped, generating a single sum of 3.
var source = new Rx.BehaviorSubject(0);
var stepOne = source.map(function (value) {
    return value + 1;
});
var stepTwo = source.map(function (value) {
    return value + 2;
});
var sum = Rx.Observable.combineLatest(
    stepOne,
    stepTwo,
    function (s1, s2) {
        console.log('calc sum: ' + (s1 + s2));
        return s1 + s2;
    }).subscribe(function (sum) {
    });
Outputs:
> calc sum: 3
But if I then put in a new value for source I get two results like this:
source.onNext(1);
> calc sum: 4
> calc sum: 5
The first is an intermediate result… as the new source value passes through one part of the system, and then I get the final result when all values have finished propagating.
So my questions is, what's the recommended way to configure things so that a new value pushed into source will pass through the system atomically and only generate one sum result?
Thanks!
That's how combineLatest works; it is indeed confusing, since it allows these temporarily inconsistent states, as you pointed out. The key thing to know about combineLatest is that it emits a new item whenever any one of its sources emits a new item, even if the others haven't caught up yet; it has no "waiting" mechanism.
In diagrams, http://rxmarbles.com/#combineLatest.
What you probably want is the zip operator. Zip waits for its inputs to emit items that match with each other. In other words, zip's output emits its n-th item once all the n-th items from all inputs have been emitted. It is ideal for this diamond case where you have source generating stepOne and stepTwo and you want to combine stepOne and stepTwo.
In diagrams, http://rxmarbles.com/#zip.
Keep in mind that zip assumes the inputs have the same frequency of emissions. In other cases, you might want to combine items from stepOne with stepTwo when they have a different frequency of emissions. Then you need to use combineLatest.

java 8 stream interference versus non-interference

I understand why the following code is ok. Because the collection is being modified before calling the terminal operation.
List<String> wordList = ...;
Stream<String> words = wordList.stream();
wordList.add("END"); // Ok
long n = words.distinct().count();
But why is this code is not ok?
Stream<String> words = wordList.stream();
words.forEach(s -> { if (s.length() < 12) wordList.remove(s); }); // Error—interference
Stream.forEach() is a terminal operation, and the underlying wordList collection is modified after the terminal operation has started.
Joachim's answer is correct, +1.
You didn't ask specifically, but for the benefit of other readers, here are a couple of techniques for rewriting the program in a different way, avoiding stream interference problems.
If you want to mutate the list in-place, you can do so with a new default method on List instead of using streams:
wordList.removeIf(s -> s.length() < 12);
If you want to leave the original list intact but create a modified copy, you can use a stream and a collector to do that:
List<String> newList = wordList.stream()
        .filter(s -> s.length() >= 12)
        .collect(Collectors.toList());
Note that I had to invert the sense of the condition, since filter takes a predicate that keeps values in the stream if the condition is true.

Resources