Turning an array of Java8 streams into a stream of tuples - java-8

Let's say I have an array of Java 8 streams: Stream<T>[] streams, I'd like to make a Stream where each element of the new stream is an array composed by picking one element from each of the initial base streams (let's assume they're all sequential).
For instance if I have:
streams[0] returning ("A", "B", "C"),
streams[1] returning ("X", "Y", "Z"),
and streams[2] returning ("0", "1", "2"),
I'd like a stream that returns
( { "A", "X", "0" }, { "B", "Y", "1" }, { "C", "Z", "2" } )
Is there some code that already implements this? I have an idea of how to do it, it would be a generalisation of the pair case, but I'd like to know if something reusable is already around.
EDIT: sorry, I realised I need some clarification:
I don't want to create the whole matrix, I want a stream that dynamically returns one row at a time (first A/X/0, then B/Y/1, etc), without having to occupy memory with all the rows in advance. I'm fine with reasonable assumptions over the sizes of base streams (eg, taking the minimum, stopping as soon as there is a stream that has no more elements to return).
I know this can be implemented by first turning the base streams into iterators, then creating a new iterator whose next() picks one element from each of the underlying iterators and returns a new row. That is what the pair example I've linked above does, and I could implement it that way myself; here I'm trying to understand if it has already been done in some library (I know the JDK has no such function).
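For reference, the iterator-based approach described above can be sketched in plain Java like this. This is a minimal, sequential sketch; the class name and the List-per-row representation are my own choices, not from any library:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ZipSketch {
    // Zips an array of streams into a lazy stream of rows: each row takes
    // one element from each base stream, stopping as soon as any base
    // stream is exhausted (the "take the minimum" behaviour).
    static <T> Stream<List<T>> zip(Stream<T>[] streams) {
        List<Iterator<T>> iterators = new ArrayList<>();
        for (Stream<T> s : streams) iterators.add(s.iterator());
        Iterator<List<T>> rows = new Iterator<List<T>>() {
            @Override public boolean hasNext() {
                return iterators.stream().allMatch(Iterator::hasNext);
            }
            @Override public List<T> next() {
                List<T> row = new ArrayList<>();
                for (Iterator<T> it : iterators) row.add(it.next());
                return row;
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(rows, 0), false);
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Stream<String>[] streams = new Stream[] {
            Stream.of("A", "B", "C"),
            Stream.of("X", "Y", "Z"),
            Stream.of("0", "1", "2")
        };
        zip(streams).forEach(System.out::println);
        // [A, X, 0]
        // [B, Y, 1]
        // [C, Z, 2]
    }
}
```

Rows are produced one at a time, so no matrix is materialised in advance.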

First things first, it's a very bad idea to keep an array of streams, because streams can't be reused, and it complicates already complicated solutions.
No, it's not possible in the plain JDK: there is no zip functionality, nor do we have tuples, so I'm afraid this is the best you can come up with:
Stream[] streams = Stream.of(
Stream.of("A", "B", "C"),
Stream.of("X", "Y", "Z"),
Stream.of("0", "1", "2"))
.toArray(Stream[]::new);
String[][] arrays = Arrays.stream(streams)
.map(s -> s.toArray(String[]::new))
.toArray(String[][]::new);
int minSize = Arrays.stream(arrays)
.mapToInt(s -> s.length)
.min().orElse(0);
String[][] zipped = IntStream.range(0, minSize)
.mapToObj(i -> Arrays.stream(arrays)
.map(s -> s[i])
.toArray(String[]::new))
.toArray(String[][]::new);
First, we need to convert an array of streams into an array of arrays or anything else that we can traverse more than once.
Second, you did not specify what to do if streams inside the array have varying lengths, I assumed standard zip behaviour which joins elements as long as we can extract elements from each collection.
Third, I am creating here a stream of all possible indexes for zipping (IntStream.range(0, minSize)) and manually extracting element by element from each nested array.
Using orElse(0) on the Optional returned by min() covers the case of an empty array of streams, and computing minSize guarantees that every s[i] access stays in bounds.
Here is a more reasonable approach assuming that we are dealing with lists of lists:
List<List<String>> lists = Arrays.asList(
Arrays.asList("A", "B", "C"),
Arrays.asList("X", "Y", "Z"),
Arrays.asList("0", "1", "2"));
final int minSize = lists.stream()
.mapToInt(List::size)
.min().orElse(0);
List<List<String>> result = IntStream.range(0, minSize)
.mapToObj(i -> lists.stream()
.map(s -> s.get(i))
.collect(Collectors.toList()))
.collect(Collectors.toList());
Java 9's Stream API additions will probably allow us to drop the calculation of minSize.
If you want the generation of sequences to remain lazy, you can simply not collect the results:
IntStream.range(0, minSize)
.mapToObj(i -> lists.stream()
.map(s -> s.get(i))
.collect(Collectors.toList()));

If you really mean an arbitrary number of Streams as input: there's no TupleX that I can think of, but if you really know that the incoming streams are all the same size (no infinite Streams), then maybe this will fit your needs:
@SafeVarargs
static <T> Stream<Stream<T>> streamOfStreams(Stream<T>... streams) {
@SuppressWarnings("unchecked")
Iterator<T>[] iterators = new Iterator[streams.length];
for (int i = 0; i < streams.length; ++i) {
iterators[i] = streams[i].iterator();
}
Iterator<T> first = iterators[0];
Builder<Stream<T>> outer = Stream.builder();
Builder<T> inner = Stream.builder();
while (first.hasNext()) {
for (int i = 0; i < streams.length; ++i) {
inner.add(iterators[i].next());
}
outer.add(inner.build());
inner = Stream.builder();
}
return outer.build();
}

Since Guava version 21, you can use the Streams.zip utility method, which does what you want, except that it only works for two streams.
Now, if you turn your array of streams into a stream of streams, you could use this Streams.zip method to perform a reduction:
Stream<List<String>> zipped = Arrays.stream(streams)
.map(s -> s.map(e -> {
List<String> l = new ArrayList<>();
l.add(e);
return l;
}))
.reduce((s1, s2) -> Streams.zip(s1, s2, (l1, l2) -> {
l1.addAll(l2);
return l1;
}))
.orElse(Stream.empty());
List<List<String>> tuples = zipped.collect(Collectors.toList());
System.out.println(tuples); // [[A, X, 0], [B, Y, 1], [C, Z, 2]]
Note that before reducing, you need to map each Stream<T> to Stream<List<T>>, so that you can use List.addAll to zip the streams.
Edit: The code above works, but I have serious concerns regarding its performance and memory footprint, mainly due to the creation of multiple lists of one single element.
Maybe using the version of Stream.reduce that accepts an identity, an accumulator and a combiner works better:
Stream<List<String>> zipped = Arrays.stream(streams)
.reduce(
IntStream.range(0, streams.length).<List<String>>mapToObj(n -> new ArrayList<>()),
(z, s) -> Streams.zip(z, s, (l, e) -> {
l.add(e);
return l;
}),
(s1, s2) -> Streams.zip(s1, s2, (l1, l2) -> {
l1.addAll(l2);
return l1;
}));
List<List<String>> tuples = zipped.collect(Collectors.toList());
System.out.println(tuples); // [[A, X, 0], [B, Y, 1], [C, Z, 2]]
The identity needs to be a stream of n empty lists, with n being the length of the streams array, while the accumulator uses Streams.zip to zip a stream of lists with a stream of elements. The combiner remains the same as before: it uses Streams.zip to zip two streams of lists.

OK, it seems there isn't anything like that around, so I've written it myself:
TupleSpliterator, to build a tuple spliterator starting from an array of spliterators;
Tuple Stream Builder, which builds a tuple stream starting from an array of streams, exploiting a tuple iterator.
The Spliterator/Iterator-based versions allow for parallelism (under certain conditions); in case you want something simpler, but sequential, a TupleIterator is available as well.
Usage examples available in unit tests (here and here), the classes are part of this utility package.
EDIT: I've added the Spliterator implementation, after the comment from Federico, noticing that the Iterator-based version can't be parallel.
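For illustration only (the author's actual classes live in the linked utility package, and these names are hypothetical), a minimal sequential sketch of such a zipping spliterator might look like this:

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// A spliterator that advances all underlying spliterators in lock step,
// emitting one row per step and stopping when any source is exhausted.
// Sequential sketch only: trySplit() returns null, so no parallelism here.
class TupleSpliteratorSketch<T> implements Spliterator<Object[]> {
    private final Spliterator<T>[] sources;

    TupleSpliteratorSketch(Spliterator<T>[] sources) { this.sources = sources; }

    @Override public boolean tryAdvance(Consumer<? super Object[]> action) {
        Object[] row = new Object[sources.length];
        for (int i = 0; i < sources.length; i++) {
            int idx = i;
            // stop as soon as any source has no more elements
            if (!sources[i].tryAdvance(e -> row[idx] = e)) return false;
        }
        action.accept(row);
        return true;
    }

    @Override public Spliterator<Object[]> trySplit() { return null; }
    @Override public long estimateSize() { return Long.MAX_VALUE; }
    @Override public int characteristics() { return ORDERED; }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Spliterator<String>[] sources = new Spliterator[] {
            Stream.of("A", "B").spliterator(),
            Stream.of("X", "Y", "Z").spliterator()
        };
        StreamSupport.stream(new TupleSpliteratorSketch<>(sources), false)
            .map(java.util.Arrays::toString)
            .forEach(System.out::println);
        // [A, X]
        // [B, Y]
    }
}
```

A parallel-capable version would need a meaningful trySplit(), which is what makes the real implementation more involved than this sketch.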

Related

Saving the in-order visit of a binary tree in an array

The question is the following:
is there an algorithm which, given a binary tree T and an array, allows storing in the array the result of the corresponding in-order visit of the tree?
Pseudo-code of a "normal" in-order visit:
inOrder(x){ // x is a node of a binary tree
if(x != NIL){ // the node is not null
inOrder(x.left)
print(x.key)
inOrder(x.right)
}
}
// Function calling inOrder
printInOrder(T){ // T is a binary tree
inOrder(T.root) // T.root is the root of the tree
}
Example:
Given the following tree
5
/ \
3 8
/ \ /
2 7 1
the algorithm above should output 2 3 7 5 1 8.
I'm sure this can be achieved and it shouldn't be too hard but I'm currently struggling for this problem.
Writing to an array (instead of printing) means you need to keep track of which index to write at in the array. If you need to do this without any mutable state other than the array itself, you need to pass the current index as an argument, and return the new current index.
The code below is written in static single assignment form so even the local variables are not mutated. If that isn't required then the code can be simplified a bit. I'm assuming that the array length is known; if it needs to be computed, that is a separate problem.
inOrder(x, arr, i) {
if(x == NIL) {
return i
} else {
i2 = inOrder(x.left, arr, i)
arr[i2] = x.key
i3 = inOrder(x.right, arr, i2 + 1)
return i3
}
}
getArrayInOrder(T, n) {
arr = new array of length n
inOrder(T.root, arr, 0)
return arr
}
First, about arrays: in order to populate an array you need to know its length beforehand. If you don't need to specify the length when instantiating an array (depending on the language you use), then it's not really an array: it's a dynamic data structure whose size is increased automatically by the language implementation.
Now I assume you don't know the size of the tree beforehand. If you do know the size, you can instantiate an array of that size. Assuming you don't, you need to go with a dynamic data structure like ArrayList in Java.
So at each print(x.key) in your code, just append x.key to the list (like list.add(x.key)). After the traversal is complete you can turn your List into an array.
You could use an iterative version of the traversal, too.
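A sketch of this list-based approach in Java, again assuming a minimal Node class for the tree:

```java
import java.util.ArrayList;
import java.util.List;

class InOrderViaList {
    static final class Node {
        final int key; final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Same traversal as before, but print(x.key) becomes out.add(x.key).
    static void inOrder(Node x, List<Integer> out) {
        if (x != null) {
            inOrder(x.left, out);
            out.add(x.key);
            inOrder(x.right, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(5,
            new Node(3, new Node(2, null, null), new Node(7, null, null)),
            new Node(8, new Node(1, null, null), null));
        List<Integer> list = new ArrayList<>();
        inOrder(root, list);
        // Convert the list to an array once the size is finally known.
        int[] arr = list.stream().mapToInt(Integer::intValue).toArray();
        System.out.println(java.util.Arrays.toString(arr)); // [2, 3, 7, 5, 1, 8]
    }
}
```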
One simple solution for recursive approach is to use a single element array to track the index like:
void inOrder(x, int[] idx, int[] arr):
if x != NIL:
inOrder(x.left, idx, arr)
arr[idx[0]++] = x.key
inOrder(x.right, idx, arr)
although I'm sure there may be other ways that can become cumbersome (maybe). I prefer the iterative version anyway.
If your language / use-case allows putting ints into the array, you could store the index in the array itself. I'm going backwards because that's simpler in this case:
inOrder(x, arr){
if(x != NIL){
inOrder(x.right, arr)
arr[--arr[0]] = x.key
inOrder(x.left, arr)
}
}
saveInOrder(T, n){
arr = new int[n]
arr[0] = n
inOrder(T.root, arr)
return arr
}

Why isn't the stream limited, if there is a filter() before limit()? [duplicate]

List<Integer> integer = Stream.generate(new Supplier<Integer>() {
int i = 0 ;
@Override
public Integer get() {
return ++i;
}
}).filter(j -> j < 5)
.limit(10) // Note the call to limit here
.collect(Collectors.toList());
Counter to my expectation, the collect call never returns. Setting limit before filter produces the expected result. Why?
Since there are only 4 elements that pass the filter, limit(10) never reaches 10 elements, so the Stream pipeline keeps generating new elements and feeding them to the filter, trying to reach 10 elements that pass the filter, but since only the first 4 elements pass the filter, the processing never ends (at least until i overflows).
The Stream pipeline is not smart enough to know that no more elements can pass the filter, so it keeps processing new elements.
Flipping the limit and the filter clauses has different behaviors.
If you put the limit first, the stream will first generate 10 integers [1..10], and then filter them leaving only those smaller than 5.
In the original ordering, with the filter applied first, integers are generated and filtered until you reach 10 elements. This isn't actually infinite, as i in the supplier will eventually overflow, but it will take a while, especially on a slow computer, to reach Integer.MAX_VALUE.
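The limit-before-filter ordering described above can be sketched as follows (using an AtomicInteger-backed supplier instead of the anonymous class, purely for brevity):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitBeforeFilter {
    static List<Integer> firstFourUnderFive() {
        AtomicInteger i = new AtomicInteger();
        return Stream.generate(i::incrementAndGet)
            .limit(10)          // truncate the infinite stream first...
            .filter(j -> j < 5) // ...then filter the ten generated elements
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstFourUnderFive()); // [1, 2, 3, 4]
    }
}
```

Because limit() runs before filter(), only ten elements are ever generated, and collect() returns immediately.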
If you want to stop either when the number 5 is reached or when 10 elements are collected, there's the Stream.takeWhile() method added in Java 9:
List<Integer> integer = Stream.generate(new Supplier<Integer>() {
int i = 0 ;
@Override
public Integer get() {
return ++i;
}
}).takeWhile(j -> j < 5).limit(10).collect(Collectors.toList());
Note that the original filter/limit pipeline will actually finish, after the Supplier overflows and starts generating negative numbers (which also pass the j < 5 filter). The resulting list will contain:
[1, 2, 3, 4, -2147483648, -2147483647, -2147483646, -2147483645, -2147483644, -2147483643]
The reason for this is in other answers. On my i7 machine it took 40 seconds to complete.

Function that checks whether a pair is included in a whitelist?

So say I have a list of "whitelist pairs" like so:
a | b
a | c
f | g
And say I want to write a method like so:
function checkIfInWhitelist(itemOne, itemTwo) {
...
}
Here's the desired functionality:
checkIfInWhiteList(a, b) // true
checkIfInWhitelist(b, a) // true
checkIfInWhitelist(b, c) // false
checkIfInWhiteList(g, f) // true
(Basically I want to check if the pair exists in the whitelist)
What's the best and most efficient way to do this?
I was thinking a dictionary where the keys are anything that appears in the whitelist and the values are a list of things that are matched with the key?
For instance, the three whitelist pairs above would map to:
a: [b, c]
b: [a]
f: [g]
g: [f]
Then, checkIfInWhitelist would be implemented like so:
function checkIfInWhitelist(itemOne, itemTwo) {
return map.contains(itemOne) && map[itemOne].contains(itemTwo)
}
Is there a better way to do this?
If you have a reasonable implementation of hash which works on std::pair (such as the one in Boost), and the objects have a fast total order method, then you can do it with a single hash table without artificially doubling the size of the table. Just use a std::unordered_set and normalize each pair into non-decreasing order before inserting it. (That is, if a < b, insert std::make_pair(a, b); otherwise insert std::make_pair(b, a).)
Very rough code, missing lots of boilerplate. I should have used perfect forwarding. Not tested.
template<typename T> struct PairHash {
std::unordered_set<std::pair<T, T>> the_hash_;
using iterator = typename std::unordered_set<std::pair<T, T>>::iterator;
std::pair<iterator, bool> insert(const T& a, const T& b) {
return the_hash_.insert(a < b ? std::make_pair(a, b)
: std::make_pair(b, a));
}
bool check(const T& a, const T& b) {
return the_hash_.end() != the_hash_.find(
a < b ? std::make_pair(a, b)
: std::make_pair(b, a));
}
};
Minimal way to do this:
Have a hashmap with the data you want to check.
As you want to check an unordered pair, keep a hashmap of unordered pairs.
Possible solutions with unordered data:
Solution A: keep a set of the two elements.
Solution B: keep the two values in one fixed, but otherwise meaningless, order. For example, both (x,y) and (y,x) are stored as X/Y:
you simply choose ascending (or descending) order: X < Y
Solution A takes more space and more time (you have to compare sets).
Solution B needs a little processing (ordering a and b), but everything else is faster.
For checking, you have to pre-process, i.e. normalize your data: (a,b) => a,b if a < b, or b,a if b < a
Solution C, with sorted data:
It is longer to pre-process, but (a little) faster to check:
keep every (a,b) and (b,a) in your HashMap: a HashMap of Lists, for example (or some implementation of Pair).
Your check is then direct, but the pre-processing stores 2n entries instead of n, so it takes roughly twice the time and memory.
So it depends on how many checks you will run afterwards.
As it is very fast to compare and swap a and b, I would recommend solution B.
And if you know the type of your data (alphabetic, for example), you can store the couple as a String, like: a-b
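A minimal Java sketch of solution B, using a canonical string key in a HashSet. The "a-b" key format is illustrative and assumes the elements themselves contain no '-':

```java
import java.util.HashSet;
import java.util.Set;

class PairWhitelist {
    private final Set<String> keys = new HashSet<>();

    // Canonical key: smaller element first, so (a,b) and (b,a) collide.
    private static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "-" + y : y + "-" + x;
    }

    void add(String x, String y) { keys.add(key(x, y)); }

    boolean check(String x, String y) { return keys.contains(key(x, y)); }

    public static void main(String[] args) {
        PairWhitelist wl = new PairWhitelist();
        wl.add("a", "b"); wl.add("a", "c"); wl.add("f", "g");
        System.out.println(wl.check("b", "a")); // true
        System.out.println(wl.check("b", "c")); // false
        System.out.println(wl.check("g", "f")); // true
    }
}
```

Both add and check are O(1) on average, with one entry per pair rather than two.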
Your proposed solution is not optimal:
it combines the bad effects of my solutions C and A: two nested hash collections, and duplicated data.
And it forgets c: [a].
Hope it helps.
You can't do better than O(1), so just use a hash implementation that gives you O(1) lookup time on average (C++ STL unordered_map, for example). Assuming you are okay with the memory hit, this should be the most performant solution (performant in terms of execution time, not necessarily memory overhead).

Scala - sort based on Future result predicate

I have an array of objects I want to sort, where the predicate for sorting is asynchronous. Does Scala have either a standard or 3rd party library function for sorting based on a predicate with type signature of (T, T) -> Future[Bool] rather than just (T, T) -> Bool?
Alternatively, is there some other way I could structure this code? I've considered finding all the 2-pair permutations of list elements, running the predicate over each pair and storing the result in a Map((T, T), Bool) or some structure to that effect, and then sorting on it - but I suspect that will have many more comparisons executed than even a naive sorting algorithm would.
If your predicate is async, you may prefer to get an async result too and avoid blocking threads with Await.
If you want to sort a List[(T,T)] according to a future boolean predicate, the easiest way is to sort a List[(T,T,Boolean)].
So given a List[(T,T)] and a predicate (T, T) -> Future[Bool], how can you get a List[(T,T,Boolean)]? Or rather a Future[List[(T,T,Boolean)]], as you want to keep the async behavior.
val list: List[(T,T)] = ...
val predicate = ...
val listOfFutures: List[Future[(T,T,Boolean)]] = list.map { tuple2 =>
predicate(tuple2._1, tuple2._2).map(bool => (tuple2._1, tuple2._2, bool))
}
val futureList: Future[List[(T,T,Boolean)]] = Future.sequence(listOfFutures)
val futureSortedResult: Future[List[(T,T)]] = futureList.map { list =>
list.sortBy(_._3).map(tuple3 => (tuple3._1, tuple3._2))
}
This is pseudo-code; I didn't compile it and it may not compile, but you get the idea.
The key is Future.sequence, a very useful method which somehow permits transforming Monad1[Monad2[X]] into Monad2[Monad1[X]]. Notice that if any of your predicate futures fails, the global sort operation will also be a failure.
If you want better performance it may be a better solution to "batch" the call to the service returning the Future[Boolean].
For example, instead of (T, T) -> Future[Bool], maybe you can design a service (if you own it, obviously) like List[(T, T)] -> Future[List[(T,T,Bool)]] so that you can get everything you need in a single async call.
A not so satisfactory alternative would be to block each comparison until the future is evaluated. If evaluating your sorting predicate is expensive, sorting will take a long time. In fact, this just translates a possibly concurrent program into a sequential one; all benefits of using futures will be lost.
import scala.concurrent.duration._
implicit val executionContext = ExecutionContext.Implicits.global
val sortingPredicate: (Int, Int) => Future[Boolean] = (a, b) => Future{
Thread.sleep(20) // Assume this is a costly comparison
a < b
}
val unsorted = List(4, 2, 1, 5, 7, 3, 6, 8, 3, 12, 1, 3, 2, 1)
val sorted = unsorted.sortWith((a, b) =>
Await.result(sortingPredicate(a, b), 5000.millis) // careful: May throw an exception
)
println(sorted) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
I don't know if there is an out of the box solution that utilizes asynchronous comparison. However, you could try to implement your own sorting algorithm. If we consider Quicksort, which runs in O(n log(n)) on average, then we can actually utilize asynchronous comparison quite easy.
If you're not familiar with Quicksort, the algorithm basically does the following
Choose an element from the collection (called the Pivot)
Compare the pivot with all remaining elements. Create a collection with elements that are less than the pivot and one with elements that are greater than the pivot.
Sort the two new collections and concatenate them, putting the pivot in the middle.
Since step 2 performs a lot of independent comparisons we can evaluate the comparisons concurrently.
Here's an unoptimized implementation:
object ParallelSort {
val timeout = Duration.Inf
implicit class QuickSort[U](elements: Seq[U]) {
private def choosePivot: (U, Seq[U]) = elements.head -> elements.tail
def sortParallelWith(predicate: (U, U) => Future[Boolean]): Seq[U] =
if (elements.isEmpty || elements.size == 1) elements
else if (elements.size == 2) {
if (Await.result(predicate(elements.head, elements.tail.head), timeout)) elements else elements.reverse
}
else {
val (pivot, other) = choosePivot
val ordering: Seq[(Future[Boolean], U)] = other map { element => predicate(element, pivot) -> element }
// This is where we utilize asynchronous evaluation of the sorting predicate
val (left, right) = ordering.partition { case (lessThanPivot, _) => Await.result(lessThanPivot, timeout) }
val leftSorted = left.map(_._2).sortParallelWith(predicate)
val rightSorted = right.map(_._2).sortParallelWith(predicate)
leftSorted ++ (pivot +: rightSorted)
}
}
}
which can be used (same example as above) as follows:
import ParallelSort.QuickSort
val sorted2 = unsorted.sortParallelWith(sortingPredicate)
println(sorted2) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
Note that whether this implementation of Quicksort is faster or slower than the completely sequential built-in sorting algorithm highly depends on the cost of a comparison: the longer a comparison has to block, the worse the alternative solution mentioned above gets. On my machine, given a costly comparison (20 milliseconds) and the above list, the built-in sorting algorithm runs in ~1200 ms while this custom Quicksort runs in ~200 ms. If you're worried about performance, you'd probably want to come up with something smarter.
Edit: I just checked how many comparisons both the built-in sorting algorithm and the custom Quicksort algorithm perform: apparently, for the given list (and some other lists I randomly typed in), the built-in algorithm uses more comparisons, so the performance improvements thanks to parallel execution might not be that great. I don't know about bigger lists, but you'd have to profile on your specific data anyway.

Scala map sorting

How do I sort a map of this kind:
"01" -> List(34,12,14,23), "11" -> List(22,11,34)
by the beginning values?
One way is to use scala.collection.immutable.TreeMap, which is always sorted by keys:
val t = TreeMap("01" -> List(34,12,14,23), "11" -> List(22,11,34))
//If you have already a map...
val m = Map("01" -> List(34,12,14,23), "11" -> List(22,11,34))
//... use this
val t = TreeMap(m.toSeq:_*)
You can convert it to a Seq or List and sort it, too:
//by specifying an element for sorting
m.toSeq.sortBy(_._1) //sort by comparing keys
m.toSeq.sortBy(_._2) //sort by comparing values
//by providing a sort function
m.toSeq.sortWith(_._1 < _._1) //sort by comparing keys
There are plenty of possibilities, each more or less convenient in a certain context.
As stated, the default Map type is unsorted, but there's always SortedMap
import collection.immutable.SortedMap
SortedMap("01" -> List(34,12,14,23), "11" -> List(22,11,34))
Although I'm guessing you can't use that, because I recognise this as homework and suspect that YOUR map is the result of a groupBy operation. So you have to create an empty SortedMap and add the values:
val unsorted = Map("01" -> List(34,12,14,23), "11" -> List(22,11,34))
val sorted = SortedMap.empty[String, List[Int]] ++ unsorted
//or
val sorted = SortedMap(unsorted.toSeq:_*)
Or if you're not wedded to the Map interface, you can just convert it to a sequence of tuples. Note that this approach will only work if both the keys and values have a defined ordering. Lists don't have a default ordering defined, so this won't work with your example code - I therefore made up some other numbers instead.
val unsorted = Map("01" -> 56, "11" -> 34)
val sorted = unsorted.toSeq.sorted
This might be useful if you can first convert your lists to some other type (such as a String), which is best done using mapValues
update: See Landei's answer, which shows how you can provide a custom sort function that'll make this approach work.
