Creating a stream of booleans from a boolean array? [duplicate] - java-8

There is no nice way to convert given boolean[] foo array into stream in Java-8 in one statement, or I am missing something?
(I will not ask why?, but it is really incomprehensible: why not add stream support for all primitive types?)
Hint: Arrays.stream(foo) will not work, there is no such method for boolean[] type.

Given boolean[] foo use
Stream<Boolean> stream = IntStream.range(0, foo.length)
.mapToObj(idx -> foo[idx]);
Note that every boolean value will be boxed, but it's usually not a big problem as boxing for boolean does not allocate additional memory (just uses one of predefined values - Boolean.TRUE or Boolean.FALSE).

You can use Guava's Booleans class:
Stream<Boolean> stream = Booleans.asList(foo).stream();
This is a pretty efficient way because Booleans.asList returns a wrapper for the array and does not make any copies.

of course you could create a stream directly
Stream.Builder<Boolean> builder = Stream.builder();
for (int i = 0; i < foo.length; i++)
builder.add(foo[i]);
Stream<Boolean> stream = builder.build();
…or by wrapping an AbstractList around foo
Stream<Boolean> stream = new AbstractList<Boolean>() {
public Boolean get(int index) {return (foo[index]);}
public int size() {return foo.length;}
}.stream();

Skimming through the early access JavaDoc (ie. java.base module) of the newest java-15, there is still no neat way to make the primitive boolean array work with Stream API together well. There is no new feature in the API with treating a primitive boolean array since java-8.
Note that there exist IntStream, DoubleStream and LongStream, but nothing like BooleanStream that would represent of a variation of a sequence of primitive booleans. Also the overloaded methods of Stream are Stream::mapToInt, Stream::mapToDouble and Stream::mapToLong, but not Stream::mapToBoolean returning such hypothetical BooleanStream.
Oracle seems to keep following this pattern, which could be found also in Collectors. There is also no such support for float primitives (there is for double primitives instead). In my opinion, unlike of float, the boolean support would make sense to implement.
Back to the code... if you have a boxed boolean array (ie. Boolean[] array), the things get easier:
Boolean[] array = ...
Stream<Boolean> streamOfBoxedBoolean1 = Arrays.stream(array);
Stream<Boolean> streamOfBoxedBoolean2 = Stream.of(array);
Otherwise you have to use more than one statement as said in this or this answer.
However, you asked (emphasizes mine):
way to convert given boolean[] foo array into stream in Java-8 in one statement.
... there is actually a way to achieve this through one statement using a Spliterator made from an Iterator. It is definetly not nice but :
boolean[] array = ...
Stream<Boolean> stream = StreamSupport.stream(
Spliterators.spliteratorUnknownSize(
new Iterator<>() {
int index = 0;
#Override public boolean hasNext() { return index < array.length; }
#Override public Boolean next() { return array[index++]; }
}, 0), false);

Related

How do I avoid returning a null value while avoiding mutation?

I am trying to create a method that will take a list of items with set weights and choose 1 at random. My solution was to use a Hashmap that will use Integer as a weight to randomly select 1 of the Keys from the Hashmap. The keys of the HashMap can be a mix of Object types and I want to return 1 of the selected keys.
However, I would like to avoid returning a null value on top of avoiding mutation. Yes, I know this is Java, but there are more elegant ways to write Java and hoping to solve this problem as it stands.
public <T> T getRandomValue(HashMap<?, Integer> VALUES) {
final int SIZE = VALUES.values().stream().reduce(0, (a, b) -> a + b);
final int RAND_SELECTION = ThreadLocalRandom.current().nextInt(SIZE) + 1;
int currentWeightSum = 0;
for (Map.Entry<?, Integer> entry : VALUES.entrySet()) {
if (RAND_SELECTION > currentWeightSum && RAND_SELECTION <= (currentWeightSum + entry.getValue())) {
return (T) entry.getKey();
} else {
currentWeightSum += entry.getValue();
}
}
return null;
}
Since the code after the loop should never be reached under normal circumstances, you should indeed not write something like return null at this point, but rather throw an exception, so that irregular conditions can be spotted right at this point, instead of forcing the caller to eventually debug a NullPointerException, perhaps occurring at an entirely different place.
public static <T> T getRandomValue(Map<T, Integer> values) {
if(values.isEmpty())
throw new NoSuchElementException();
final int totalSize = values.values().stream().mapToInt(Integer::intValue).sum();
if(totalSize<=0)
throw new IllegalArgumentException("sum of weights is "+totalSize);
final int threshold = ThreadLocalRandom.current().nextInt(totalSize) + 1;
int currentWeightSum = 0;
for (Map.Entry<T, Integer> entry : values.entrySet()) {
currentWeightSum += entry.getValue();
if(threshold <= currentWeightSum) {
return entry.getKey();
}
}
// if we reach this point, the map's content must have been changed in-between
throw new ConcurrentModificationException();
}
Note that the code fixes some other issues of your code. You should not promise to return an arbitrary T without knowing the actual type of the map. If the map contains objects of different type as key, i.e. is a Map<Object,Integer>, the caller can’t expect to get anything more specific than Object. Besides that, you should not insist of the parameter to be a HashMap when any Map is sufficient. Further, I changed the variable names to adhere to Java’s naming convention and simplified the loop’s body.
If you want to support empty maps as legal input, changing the return type to Optional<T> would be the best solution, returning an empty optional for empty maps and an optional containing the value otherwise (this would disallow null keys). Still, the supposed-to-be-unreachable code point after the loop should be flagged with an exception.

Method Ref in arrays

I have a stream like so. Is it possible to change from .map(i->arr[i]) to something like .map(arr) since both are i?
public String toString() {
return Arrays.toString(IntStream.range(0, position).map(i->arr[i]).toArray());
}
There's no way to express i -> arr[i] with a method reference. However, Arrays class contains methods that can streamline this code.
You can use 3-argument Arrays.stream to avoid streaming over indexes: Arrays.stream(arr, 0, position) is equivalent to IntStream.range(0, position).map(i -> arr[i])
Since all you are doing with the stream is making a new array out of it, you can use Arrays.copyOfRange(arr, 0, position) to avoid making a stream altogether.

Why filter with side effects performs better than a Spliterator based implementation?

Regarding the question How to skip even lines of a Stream obtained from the Files.lines I followed the accepted answer approach implementing my own filterEven() method based on Spliterator<T> interface, e.g.:
public static <T> Stream<T> filterEven(Stream<T> src) {
Spliterator<T> iter = src.spliterator();
AbstractSpliterator<T> res = new AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED)
{
#Override
public boolean tryAdvance(Consumer<? super T> action) {
iter.tryAdvance(item -> {}); // discard
return iter.tryAdvance(action); // use
}
};
return StreamSupport.stream(res, false);
}
which I can use in the following way:
Stream<DomainObject> res = Files.lines(src)
filterEven(res)
.map(line -> toDomainObject(line))
However measuring the performance of this approach against the next one which uses a filter() with side effects I noticed that the next one performs better:
final int[] counter = {0};
final Predicate<String> isEvenLine = item -> ++counter[0] % 2 == 0;
Stream<DomainObject> res = Files.lines(src)
.filter(line -> isEvenLine ())
.map(line -> toDomainObject(line))
I tested the performance with JMH and I am not including the file load in the benchmark. I previously load it into an array. Then each benchmark starts by creating a Stream<String> from previous array, then filtering even lines, then applying a mapToInt() to extract the value of an int field and finally a max() operation. Here it is one of the benchmarks (you can check the whole Program here and here you have the data file with about 186 lines):
#Benchmark
public int maxTempFilterEven(DataSource src){
Stream<String> content = Arrays.stream(src.data)
.filter(s-> s.charAt(0) != '#') // Filter comments
.skip(1); // Skip line: Not available
return filterEven(content) // Filter daily info and skip hourly
.mapToInt(line -> parseInt(line.substring(14, 16)))
.max()
.getAsInt();
}
I am not getting why the filter() approach has better performance (~80ops/ms) than the filterEven() (~50ops/ms)?
Intro
I think I know the reason but unfortunately I have no idea how to improve performance of Spliterator-based solution (at least without rewritting of the whole Streams API feature).
Sidenote 1: performance was not the most important design goal when Stream API was designed. If performance is critical, most probably re-writting the code without Stream API will make the code faster. (For example, Stream API unavoidably increases memory allocation and thus GC-pressure). On the other hand in most of the scenarios Stream API provides a nicer higher-level API at a cost of a relatively small performance degradation.
Part 1 or Short theoretical answer
Stream is designed to implement a kind of internal iteration as the main mean of consuming and external iteration (i.e. Spliterator-based) is an additional mean that is kind of "emulated". Thus external iteration involves some overhead. Laziness adds some limits to the efficiency of external iteration and a need to support flatMap makes it necessary to use some kind of dynamic buffer in this process.
Sidenote 2 In some cases Spliterator-based iteration might be as fast as the internal iteration (i.e. filter in this case). Particularly it is so in the cases when you create a Spliterator directly from that data-containing Stream. To see it, you can modify your tests to materialize your first filter into a Strings array:
String[] filteredData = Arrays.stream(src.data)
.filter(s-> s.charAt(0) != '#') // Filter comments
.skip(1)
.toArray(String[]::new);
and then compare preformance of maxTempFilter and maxTempFilterEven modified to accept that pre-filtered String[] filteredData. If you want to know why this is so, you probably should read the rest of this long answer or at least Part 2.
Part 2 or Longer theoretical answer:
Streams were designed to be mainly consumed as a whole by some terminal operation. Iterating elements one by one although supported is not designed as a main way to consume streams.
Note that using the "functional" Stream API such as map, flatMap, filter, reduce, and collect you can't say at some step "I have had enough data, stop iterating over the source and pushing values". You can discard some incoming data (as filter does) but can't stop iteration. (take and skip transformations are actually implemented using Spliterator inside; and anyMatch, allMatch, noneMatch, findFirst, findAny, etc. use non-public API j.u.s.Sink.cancellationRequested, also they are easier as there can't be several terminal operations). If all transformations in the pipeline are synchronous, you can combine them into a single aggregated function (Consumer) and call it in a simple loop (optionally splitting the loop execution over several thread). This is what my simplified version of the state based filter represents (see the code in the Show me some code section). It gets a bit more complicated if there is a flatMap in the pipeline but idea is still the same.
Spliterator-based transformation is fundamentally different because it adds an asynchronous consumer-driven step to the pipeline. Now the Spliterator rather than the source Stream drives the iteration process. If you ask for a Spliterator directly on the source Stream, it might be able to return you some implementation that just iterates over its internal data structure and this is why materializing pre-filtered data should remove performance difference. However, if you create a Spliterator for some non-empty pipeline, there is no other (simple) choice other than asking the source to push elements one by one through the pipeline until some element passes all the filters (see also second example in the Show me some code section). The fact that source elements are pushed one by one rather than in some batches is a consequence of the fundamental decision to make Streams lazy. The need for a buffer instead of just one element is the consequence of support for flatMap: pushing one element from the source can produce many elements for Spliterator.
Part 3 or Show me some code
This part tries to provide some backing with the code (both links to the real code and simulated code) of what was described in the "theoretical" parts.
First of all, you should know that current Streams API implementation accumulates non-terminal (intermediate) operations into a single lazy pipeline (see j.u.s.AbstractPipeline and its children such as j.u.s.ReferencePipeline. Then, when the terminal operation is applied, all the elements from the original Stream are "pushed" through the pipeline.
What you see is the result of two things:
the fact that streams pipelines are different for cases when you
have a Spliterator-based step inside.
the fact that your OddLines is not the first step in the pipeline
The code with a stateful filter is more or less similar to the following straightforward code:
static int similarToFilter(String[] data)
{
final int[] counter = {0};
final Predicate<String> isEvenLine = item -> ++counter[0] % 2 == 0;
int skip = 1;
boolean reduceEmpty = true;
int reduceState = 0;
for (String outerEl : data)
{
if (outerEl.charAt(0) != '#')
{
if (skip > 0)
skip--;
else
{
if (isEvenLine.test(outerEl))
{
int intEl = parseInt(outerEl.substring(14, 16));
if (reduceEmpty)
{
reduceState = intEl;
reduceEmpty = false;
}
else
{
reduceState = Math.max(reduceState, intEl);
}
}
}
}
}
return reduceState;
}
Note that this is effectively a single loop with some calculations (filtering/transformations) inside.
When you add a Spliterator into the pipeline on the other hand, things change significantly and even with simplifications code that is reasonably similar to what actually happens becomes much larger such as:
interface Sp<T>
{
public boolean tryAdvance(Consumer<? super T> action);
}
static class ArraySp<T> implements Sp<T>
{
private final T[] array;
private int pos;
public ArraySp(T[] array)
{
this.array = array;
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
if (pos < array.length)
{
action.accept(array[pos]);
pos++;
return true;
}
else
{
return false;
}
}
}
static class WrappingSp<T> implements Sp<T>, Consumer<T>
{
private final Sp<T> sourceSp;
private final Predicate<T> filter;
private final ArrayList<T> buffer = new ArrayList<T>();
private int pos;
public WrappingSp(Sp<T> sourceSp, Predicate<T> filter)
{
this.sourceSp = sourceSp;
this.filter = filter;
}
#Override
public void accept(T t)
{
buffer.add(t);
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
while (true)
{
if (pos >= buffer.size())
{
pos = 0;
buffer.clear();
sourceSp.tryAdvance(this);
}
// failed to fill buffer
if (buffer.size() == 0)
return false;
T nextElem = buffer.get(pos);
pos++;
if (filter.test(nextElem))
{
action.accept(nextElem);
return true;
}
}
}
}
static class OddLineSp<T> implements Sp<T>, Consumer<T>
{
private Sp<T> sourceSp;
public OddLineSp(Sp<T> sourceSp)
{
this.sourceSp = sourceSp;
}
#Override
public boolean tryAdvance(Consumer<? super T> action)
{
if (sourceSp == null)
return false;
sourceSp.tryAdvance(this);
if (!sourceSp.tryAdvance(action))
{
sourceSp = null;
}
return true;
}
#Override
public void accept(T t)
{
}
}
static class ReduceIntMax
{
boolean reduceEmpty = true;
int reduceState = 0;
public int getReduceState()
{
return reduceState;
}
public void accept(int t)
{
if (reduceEmpty)
{
reduceEmpty = false;
reduceState = t;
}
else
{
reduceState = Math.max(reduceState, t);
}
}
}
static int similarToSpliterator(String[] data)
{
ArraySp<String> src = new ArraySp<>(data);
int[] skip = new int[1];
skip[0] = 1;
WrappingSp<String> firstFilter = new WrappingSp<String>(src, (s) ->
{
if (s.charAt(0) == '#')
return false;
if (skip[0] != 0)
{
skip[0]--;
return false;
}
return true;
});
OddLineSp<String> oddLines = new OddLineSp<>(firstFilter);
final ReduceIntMax reduceIntMax = new ReduceIntMax();
while (oddLines.tryAdvance(s ->
{
int intValue = parseInt(s.substring(14, 16));
reduceIntMax.accept(intValue);
})) ; // do nothing in the loop body
return reduceIntMax.getReduceState();
}
This code is larger because the logic is impossible (or at least very hard) to represent without some non-trivial stateful callbacks inside the loop. Here interface Sp is a mix of j.u.s.Stream and j.u.Spliterator interfaces.
Class ArraySp represents a result of Arrays.stream.
Class WrappingSp is similar to j.u.s.StreamSpliterators.WrappingSpliterator which in the real code represents an implementation of Spliterator interface for any non-empty pipeline i.e. a Stream with at least one intermediate operation applied to it (see j.u.s.AbstractPipeline.spliterator method). In my code I merged it with a StatelessOp subclass and put there logic responsible for filter method implementation. Also for simplcity I implemented skip using filter.
OddLineSp corresponds to your OddLines and its resulting Stream
ReduceIntMax represents ReduceOps terminal operation for Math.max for int
So what's important in this example? The important thing here is that since you first filter you original stream, your OddLineSp is created from a non-empty pipeline i.e. from a WrappingSp. And if you take a closer look at WrappingSp, you'll notice that every time tryAdvance is called, it delegates the call to the sourceSp and accumulates that result(s) into a buffer. Moreover, since you have no flatMap in the pipeline, elements to the buffer will be copied one by one. I.e. every time WrappingSp.tryAdvance is called, it will call ArraySp.tryAdvance, get back exactly one element (via callback), and pass it further to the consumer provided by the caller (unless the element doesn't match the filter in which case ArraySp.tryAdvance will be called again and again but still the buffer is never filled with more than one element at a time).
Sidenote 3: If you want to look at the real code, the most intersting places are j.u.s.StreamSpliterators.WrappingSpliterator.tryAdvance which calls
j.u.s.StreamSpliterators.AbstractWrappingSpliterator.doAdvance which in turn calls j.u.s.StreamSpliterators.AbstractWrappingSpliterator.fillBuffer which in turn calls pusher that is initialized at j.u.s.StreamSpliterators.WrappingSpliterator.initPartialTraversalState
So the main thing that's hurting performance is this copying into the buffer.
Unfortunately for us, usual Java developers, current implementation of the Stream API is pretty much closed and you can't modify only some aspects of the internal behavior using inheritance or composition.
You may use some reflection-based hacking to make copying-to-buffer more efficient for your specific case and gain some performance (but sacrifice laziness of the Stream) but you can't avoid this copying altogether and thus Spliterator-based code will be slower anyway.
Going back to the example from the Sidenote #2, Spliterator-based test with materialized filteredData works faster because there is no WrappingSp in the pipeline before OddLineSp and thus there will be no copying into an intermediate buffer.

Need help overriding compareTo

I'm a Java novice that is having some trouble overriding the compareTo method in the Comparable interface. My code creates a HashMap that associates strings to an int. I would like to override compareTo so that the strings in the ArrayList keys are sorted based on their HashMap values, not alphabetically. Under this implementation, however, the strings are still sorted alphabetically.
Oh, and to clarify, nameWeight is the HashMap of String,Integer pairs.
Any ideas?
List<String> keys = new ArrayList<String>(nameWeight.keySet());
System.out.println(keys);
Collections.sort(keys);
public int compareTo(String that){
int gtr = 1;
int less = -1;
int eql = 0;
System.out.print(this);
System.out.print(that);
if(that=="JOHN")
return less;
int valThis = nameWeight.get(this);
int valThat = nameWeight.get(that);
if(valThis==valThat)
return eql;
if(valThis>valThat)
return gtr;
if(valThis<valThat)
return less;
return gtr;
}
You're sorting a list of strings so the compareTo method called is the one defined in class String (or its superclass). Since you can't modify String you would have to create a subclass of String, override compareTo in that class and use List<StringSubClass>. But since String is final you're not allowed to subclass from it (thanks #pst).
Alternatively you don't use String objects in the list but objects of the type you created and in which you override compareTo (don't forget to add implements Comparable to the class definition).
Or (shameless plug from #pst again), and that is probably the best solution, you pass a comparator to the sort function which will be used to sort the strings instead of the default implementation.

Boolean that can only be flipped to true

Is there a name for a data structure (read: boolean) that can only be moved from false to true, and not back to false? Imagine something encapsulated like so:
private var _value = false
def value = _value
def turnOnValue() = value = true
And out of curiosity, are there any platforms that support it natively? This seems like something somebody must have come across before...
You're describing a temporal property of a variable; rather than a data structure as such. The data type is a simple boolean, but it is how it is used that is interesting -- as a sort of 'latch' in time.
That kind of latch property on a boolean data type would make it an example of a linearly typed boolean. Linear types, and other kinds of uniqueness types are used to enforce temporal properties of variables -- e.g. that they can only be used once; or cannot be shared.
They're useful for enforcing at compile time that an action has happened (e.g. initialization) or having a compile-time proof that an object is not shared. Thus, they're most common in systems programming, where proofs of low level properties of this are key to correct software design.
In perl you have Tie Variables and you can build your on scalar value and make this kind of "type". But natively... maybe in Smalltalk can be possible to build something like this, or Prolog, but I don't know.
Make your own data type
public final class CustomBoolean {
private boolean value;
public void setValue(boolean value){
// Bitwise OR
this.value |= value;
}
public boolean getValue(){
return value;
}
}
Example ::
public static void main (String[] args)
{
CustomBoolean foo = new CustomBoolean();
foo.setValue(false);
System.out.println(foo.getValue());
foo.setValue(true);
System.out.println(foo.getValue());
foo.setValue(false);
System.out.println(foo.getValue());
}
The output would be ::
false
true
true
This means you'll have to call getValue() before doing any explicit boolean operations
ie
if(foo.getValue() && 1 == 1)
The example is written in Java.

Resources