Scala - sort based on Future result predicate - algorithm

I have an array of objects I want to sort, where the predicate for sorting is asynchronous. Does Scala have either a standard or 3rd party library function for sorting based on a predicate with type signature of (T, T) -> Future[Bool] rather than just (T, T) -> Bool?
Alternatively, is there some other way I could structure this code? I've considered finding all the 2-pair permutations of list elements, running the predicate over each pair and storing the result in a Map((T, T), Bool) or some structure to that effect, and then sorting on it - but I suspect that will have many more comparisons executed than even a naive sorting algorithm would.

If your predicate is async you may prefer to get an async result too and avoid blocking threads using Await
If you want to sort a List[(T,T)] according to a future boolean predicate, the easiest it to sort a List[(T,T,Boolean)]
So given a you have a List[(T,T)] and a predicate (T, T) -> Future[Bool], how can you get a List[(T,T,Boolean)]? Or rather a Future[List[(T,T,Boolean)]] as you want to keep the async behavior.
val list: List[(T,T)] = ...
val predicate = ...
val listOfFutures: List[Future[(T,T,Boolean]] = list.map { tuple2 =>
predicate(tuple2).map( bool => (tuple2._1, tuple2._2, bool)
}
val futureList: Future[List[(T,T,Boolean)]] = Future.sequence(listOfFutures)
val futureSortedResult: Future[List[(T,T)]] = futureList.map { list =>
list.sort(_._3).map(tuple3 => (tuple3._1,tuple3._2))
}
This is pseudo-code, I didn't compile it and it may not, but you get the idea.
The key is Future.sequence, very useful, which somehow permits to transform Monad1[Monad2[X]] to Monad2[Monad1[X]] but notice that if any of your predicate future fail, the global sort operation will also be a failure.
If you want better performance it may be a better solution to "batch" the call to the service returning the Future[Boolean].
For example instead of (T, T) -> Future[Bool] maybe you can design a service (if you own it obviously) like List[(T, T)] -> Future[List[(T,T,Bool)] so that you can get everything you need in a async single call.

A not so satisfactory alternative would be to block each comparison until the future is evaluated. If evaluating your sorting predicate is expensive, sorting will take a long time. In fact, this just translates a possibly concurrent program into a sequential one; all benefits of using futures will be lost.
import scala.concurrent.duration._
implicit val executionContext = ExecutionContext.Implicits.global
val sortingPredicate: (Int, Int) => Future[Boolean] = (a, b) => Future{
Thread.sleep(20) // Assume this is a costly comparison
a < b
}
val unsorted = List(4, 2, 1, 5, 7, 3, 6, 8, 3, 12, 1, 3, 2, 1)
val sorted = unsorted.sortWith((a, b) =>
Await.result(sortingPredicate(a, b), 5000.millis) // careful: May throw an exception
)
println(sorted) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
I don't know if there is an out of the box solution that utilizes asynchronous comparison. However, you could try to implement your own sorting algorithm. If we consider Quicksort, which runs in O(n log(n)) on average, then we can actually utilize asynchronous comparison quite easy.
If you're not familiar with Quicksort, the algorithm basically does the following
Choose an element from the collection (called the Pivot)
Compare the pivot with all remaining elements. Create a collection with elements that are less than the pivot and one with elements that are greater than the pivot.
Sort the two new collections and concatenate them, putting the pivot in the middle.
Since step 2 performs a lot of independent comparisons we can evaluate the comparisons concurrently.
Here's an unoptimized implementation:
object ParallelSort {
val timeout = Duration.Inf
implicit class QuickSort[U](elements: Seq[U]) {
private def choosePivot: (U, Seq[U]) = elements.head -> elements.tail
def sortParallelWith(predicate: (U, U) => Future[Boolean]): Seq[U] =
if (elements.isEmpty || elements.size == 1) elements
else if (elements.size == 2) {
if (Await.result(predicate(elements.head, elements.tail.head), timeout)) elements else elements.reverse
}
else {
val (pivot, other) = choosePivot
val ordering: Seq[(Future[Boolean], U)] = other map { element => predicate(element, pivot) -> element }
// This is where we utilize asynchronous evaluation of the sorting predicate
val (left, right) = ordering.partition { case (lessThanPivot, _) => Await.result(lessThanPivot, timeout) }
val leftSorted = left.map(_._2).sortParallelWith(predicate)
val rightSorted = right.map(_._2).sortParallelWith(predicate)
leftSorted ++ (pivot +: rightSorted)
}
}
}
which can be used (same example as above) as follows:
import ParallelSort.QuickSort
val sorted2 = unsorted.sortParallelWith(sortingPredicate)
println(sorted2) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
Note that whether or not this implementation of Quicksort is faster or slower than the completely sequential built-in sorting algorithm highly depends on the cost of a comparison: The longer a comparison has to block, the worse is the alternative solution mentioned above. On my machine, given a costly comparison (20 milliseconds) and the above list, the built-in sorting algorithm runs in ~1200 ms while this custom Quicksort runs in ~200 ms. If you're worried about performance, you'd probably want to come up with something smarter. Edit: I just checked how many comparisons both, the built-in sorting algorithm and the custom Quicksort algorithm perform: Apparently, for the given list (and some other lists I randomly typed in) the built-in algorithm uses more comparisons, so the performance improvements thanks to parallel execution might not be that great. I don't know about bigger lists, but you'd have to profile them on your specific data anyways.

Related

convert to divide and conquer algorithm. Kotlin

convert method "FINAL" to divide and conquer algorithm
the task sounded like this: The buyer has n coins of
H1,...,Hn.
The seller has m
coins in denominations of
B1,...,Bm.
Can the buyer purchase the item
the cost S so that the seller has an exact change (if
necessary).
fun Final(H: ArrayList<Int>, B: ArrayList<Int>, S: Int): Boolean {
var Clon_Price = false;
var Temp: Int;
for (i in H) {
if (i == S)
return true;
}
for (i in H.withIndex()) {
Temp = i.value - S;
for (j in B) {
if (j == Temp)
Clon_Price = true;
}
}
return Clon_Price;
}
fun main(args: Array<String>) {
val H:ArrayList<Int> = ArrayList();
val B:ArrayList<Int> = ArrayList();
println("Enter the number of coins the buyer has:");
var n: Int = readln().toInt();
println("Enter their nominal value:")
while (n > 0){
H.add(readln().toInt());
n--
}
println("Enter the number of coins the seller has:");
var m: Int = readln().toInt();
println("Enter their nominal value:")
while (m > 0){
B.add(readln().toInt());
m--
}
println("Enter the product price:");
val S = readln().toInt();
if(Final(H,B,S)){
println("YES");
}
else{
println("No!");
}
Introduction
Since this is an assignment, I will only give you insights to solve this problem and you will need to do the coding yourself.
The algorithm
Receives two ArrayList<Int> and an Int parameter
if the searched (S) element can be found in H, then the result is true
Otherwise it loops H
Computes the difference between the current element and S
Searches for a match in B and if it's found, then true is being returned
If the method has not returned yet, then return false;
Divide et impera (Divide and conquer)
Divide and conquer is the process of breaking down a complicated task into similar, but simpler subtasks, repeating this breaking down until the subtasks become trivial (this was the divide part) and then, using the results of the trivial subtasks we can solve the slightly more complicated subtasks and go upwards in our layers of unsolved complexities until the problem is solved (this is the conquer part).
A very handy data-structure to use is the Stack. You can use the stack of your memory, which are fancy words for recursion, or, you can solve it iteratively, by managing such a stack yourself.
This specific problem
This algorithm does not seem to necessitate divide and conquer, given the fact that you only have two array lists that can be iterated, so, I guess, this is an early assignment.
To make sure this is divide and conquer, you can add two parameters to your method (which are 0 and length - 1 at the start) that reflect the current problem-space. And upon each call, check whether the starting and ending index (the two new parameters) are equal. If they are, you already have a trivial, simplified subtask and you just iterate the second ArrayList.
If they are not equal, then you still need to divide. You can simply
//... Some code here
return Final(H, B, S, start, end / 2) || Final(H, B, S, end / 2 + 1, end);
(there you go, I couldn't resist writing code, after all)
for your nontrivial cases. This automatically breaks down the problem into sub-problems.
Self-criticism
The idea above is a simplistic solution for you to get the gist. But, in reality, programmers dislike recursion, as it can lead to trouble. So, once you complete the implementation of the above, you are well-advised to convert your algorithm to make sure it's iterative, which should be fairly easy once you succeeded implementing the recursive version.

design a recursive algorithm that find the minimum of an array

I was thinking about a recursive algorithm (it's a theoretical question, so it's not important the programming language). It consists of finding the minimum of a set of numbers
I was thinking of this way: let "n" be the number of elements in the set. let's rearrange the set as:
(a, (b, c, ..., z) ).
the function moves from left to right, and the first element is assumed as minimum in the first phase (it's, of course, the 0-th element, a). next steps are defined as follows:
(a, min(b, c, ..., z) ), check if a is still minimum, or if b is to be assumed as minimum, then (a or b, min(c, d, ..., z) ), another check condition, (a or b or c, min(d, e, ..., z)), check condition, etc.
I think the theoretical pseudocode may be as follows:
f(x) {
// base case
if I've reached the last element, assume it's a possible minimum, and check if y < z. then return a value to stop recursive calls.
// inductive steps
if ( f(i-th element) < f(i+1, next element) ) {
/* just assume the current element is the current minimum */
}
}
I'm having trouble with the base case. I don't know how to formalize it. I think I've understood the basic idea about it: it's basically what I've written in the pseudocode, right?
does what I've written so far make sense? Sorry if it's not clear but I'm a beginner, and I'm studying recursion for the first time, and I personally find it confusing. So, I've tried my best to explain it. If it's not clear, let me know, and I'll try to explain it better with different words.
Recursive problems can be hard to visualize. Let's take an example : arr = [3,5,1,6]
This is a relatively small array but still it's not easy to visualize how recursion will work here from start to end.
Tip : Try to reduce the size of the input. This will make it easy to visualize and help you with finding the base case. First decide what our function should do. In our case it finds the minimum number from an array. If our function works for array of size n then it should also work for array of size n-1 (Recursive leap of faith). Now using this we can reduce the size of input until we cannot reduce it any further, which should give us our base case.
Let's use the above example: arr = [3,5,1,6]
Let create a function findMin(arr, start) which takes an array and a start index and returns the minimum number from start index to end of array.
1st Iteration : [3,5,1,6]
// arr[start] = 3, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [5,1,6]
2nd Iteration : [5,1,6]
// arr[start] = 5, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [1,6]
3rd Iteration : [1,6]
// arr[start] = 1, If we can somehow find minimum from the remaining array,
// then we can compare it with current element and return the minimum of the two.
// so our input is now reduced to the remaining array [6]
4th Iteration : [6]
// arr[start] = 6, Since it is the only element in the array, it is the minimum.
// This is our base case as we cannot reduce the input any further.
// We will simply return 6.
------------ Tracking Back ------------
3rd Iteration : [1,6]
// 1 will be compared with whatever the 4th Iteration returned (6 in this case).
// This iteration will return minimum(1, 4th Iteration) => minimum(1,6) => 1
2nd Iteration : [5,1,6]
// 5 will be compared with whatever the 3th Iteration returned (1 in this case).
// This iteration will return minimum(5, 3rd Iteration) => minimum(5,1) => 1
1st Iteration : [3,5,1,6]
// 3 will be compared with whatever the 2nd Iteration returned (1 in this case).
// This iteration will return minimum(3, 2nd Iteration) => minimum(3,1) => 1
Final answer = 1
function findMin(arr, start) {
if (start === arr.length - 1) return arr[start];
return Math.min(arr[start], findMin(arr, start + 1));
}
const arr = [3, 5, 1, 6];
const min = findMin(arr, 0);
console.log('Minimum element = ', min);
This is a good problem for practicing recursion for beginners. You can also try these problems for practice.
Reverse a string using recursion.
Reverse a stack using recursion.
Sort a stack using recursion.
To me, it's more like this:
int f(int[] x)
{
var minimum = head of X;
if (x has tail)
{
var remainder = f(tail of x);
if (remainder < minimum)
{
minimum = remainder;
}
}
return minimum;
}
You have the right idea.
You've correctly observed that
min_recursive(array) = min(array[0], min_recursive(array[1:]))
The function doesn't care about who's calling it or what's going on outside of it -- it just needs to return the minimum of the array passed in. The base case is when the array has a single value. That value is the minimum of the array so it should just return it. Otherwise find the minimum of the rest of the array by calling itself again and compare the result with the head of the array.
The other answers show some coding examples.
This is a recursive solution to the problem that you posed, using JavaScript:
a = [5,12,3,5,34,12]
const min = a => {
if (!a.length) { return 0 }
if (a.length === 1) { return a[0] }
return Math.min(a[0], min(a.slice(1)))
}
min(a)
Note the approach which is to first detect the simplest case (empty array), then a more complex case (single element array), then finally a recursive call which will reduce more complex cases to functions of simpler ones.
However, you don't need recursion to traverse a one dimensional array.

Generator for iterating over a random non-repeated sequence

I have p items (let's assume p=5, items={0,1,2,3,4}). I need to be able to iterate over them in a random order, but without repeating them (unless all were visited) while maintaining only as small seed-like metadata as possible between the iterations. The generator is otherwise stateless. It would be used like this:
Initialization (metadata is long in this example, but it could be anything "small"):
long metadata = randomLong()
Usage:
(metadata, result) = generator.generate(metadata)
return(result)
If it works properly, it should continuously return something like 3, 1, 0, 4, 2, 3, 1, 0, 4, 2, 3...
Is that possible?
I know I could easily pre-generate the sequence, then metadata would contain whole this sequence and an index, but that's not viable for me, as the sequence will have thousands of items and the metadata must be slim.
I also found this, which resembles what I am trying to achieve, but it's either too brief or too math-y for me.
Added: I am aware of the fact, that for p=1000, there are 1000! ways of ordering the sequence, which would definitely not fit into a long, but both "having metadata something bigger than long" and "generator may be unable to generate some sequences" is OK for me.
I would, as a base, use Fisher-Yates algorithm.
It is able to construct a random permutation of a given ordered list of elements in O(n).
Then the trick could be to construct an iterator that shuffles an internal list of elements and iterate through it, and when this internal iteration ends, shuffles again and iterate on the result...
Something like:
function next() -> element {
internal data:
i an integer;
d an array of elements;
code:
if i equals to d.length { shuffle(d); i <-- 0; }
return d[i++];
}

How can I modify my Akka streams Prime sieve to exclude modulo checks for known primes?

I wrote a sieve using akka streams to find prime members of an arbitrary source of Int:
object Sieve extends App {
implicit val system = ActorSystem()
implicit val mat = ActorMaterializer(ActorMaterializerSettings(system))
implicit val ctx = implicitly[ExecutionContext](system.dispatcher)
val NaturalNumbers = Source.fromIterator(() => Iterator.from(2))
val IsPrimeByEurithmethes: Flow[Int, Int, _] = Flow[Int].filter {
case n: Int =>
(2 to Math.floor(Math.sqrt(n)).toInt).par.forall(n % _ != 0)
}
NaturalNumbers.via(IsPrimeByEurithmethes).throttle(100000, 1 second, 100000, ThrottleMode.Shaping).to(Sink.foreach(println)).run()
}
Ok, so this appears to work decently well. However, there are at least a few potential areas of concern:
The modulo checks are run using par.forall, ie they are totally hidden within the Flow that filters, but I can see how it would be useful to have a Map from the candidate n to another Map of each n % _. Maybe.
I am checking way too many of the candidates needlessly - both in terms of checking n that I will already know are NOT prime based on previous results, and by checking n % _ that are redundant. In fact, even if I think the n is prime, it suffices to check only the known primes up until that point.
The second point is my more immediate concern.
I think I can prove rather easily that there is a more efficient way - by filtering out the source given each NEW prime.
So then....
2, 3, 4, 5, 6, 7, 8, 9, 10, 11... => (after finding p=2)
2, 3, 5, 7, 9, , 11... => (after finding p=3)
2, 3, 5, 7, , 11... => ...
Now after finding a p and filtering the source, we need to know whether the next candidate is a p. Well, we can say for sure it is prime if the largest known prime is greater than its root, which will Always happen I believe, so it suffices to just pick the next element...
2, 3, 4, 5, 6, 7, 8, 9, 10, 11... => (after finding p=2) PICK n(2) = 3
2, 3, 5, 7, 9, , 11... => (after finding p=3) PICK n(3) = 5
2, 3, 5, 7, , 11... => (after finding p=5) PICK n(5) = 7
This seems to me like a rewriting of the originally-provided sieve to do far fewer checks at the cost of introducing a strict sequential dependency.
Another idea - I could remove the constraint by working things out in terms of symbols, like the minimum set of modulo checks that necessitate primality, etc.
Am I barking up the wrong tree? IF not, how can I go about messing with my source in this manner?
I just started fiddling around with akka streams recently so there might be better solutions than this (especially since the code feels kind of clumsy to me) - but your second point seemed to be just the right challenge for me to try out building a feedback loop within akka streams.
Find my full solution here: https://gist.github.com/MartinHH/de62b3b081ccfee4ae7320298edd81ee
The main idea was to accumulate the primes that are already found and merge them with the stream of incoming natural numbers so the primes-check could be done based on the results up to N like this:
def isPrime(n: Int, primesSoFar: SortedSet[Int]): Boolean =
!primesSoFar.exists(n % _ == 0) &&
!(primesSoFar.lastOption.getOrElse(2) to Math.floor(Math.sqrt(n)).toInt).par.exists(n % _ == 0)

Scala: fastest `remove(i: Int)` in mutable sequence

Which implementation from scala.collection.mutable package should I take if I intend to do lots of by-index-deletions, like remove(i: Int), in a single-threaded environment? The most obvious choice, ListBuffer, says that it may take linear time depending on buffer size. Is there some collection with log(n) or even constant time for this operation?
Removal operators, including buf remove i, are not part of Seq, but it's actually part of Buffer trait under scala.mutable. (See Buffers)
See the first table on Performance Characteristics. I am guessing buf remove i has the same characteristic as insert, which are linear for both ArrayBuffer and ListBuffer.
As documented in Array Buffers, they use arrays internally, and Link Buffers use linked lists (that's still O(n) for remove).
As an alternative, immutable Vector may give you an effective constant time.
Vectors are represented as trees with a high branching factor. Every tree node contains up to 32 elements of the vector or contains up to 32 other tree nodes. [...] So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time".
scala> import scala.collection.immutable._
import scala.collection.immutable._
scala> def remove[A](xs: Vector[A], i: Int) = (xs take i) ++ (xs drop (i + 1))
remove: [A](xs: scala.collection.immutable.Vector[A],i: Int)scala.collection.immutable.Vector[A]
scala> val foo = Vector(1, 2, 3, 4, 5)
foo: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)
scala> remove(foo, 2)
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)
Note, however, a high constant time with lots of overhead may not win a quick linear access until the data size is significantly large.
Depending on your exact use case, you may be able to use LinkedHashMap from scala.collection.mutable.
Although you cannot remove by index, you can remove by a unique key in constant time, and it maintains a deterministic ordering when you iterate.
scala> val foo = new scala.collection.mutable.LinkedHashMap[String,String]
foo: scala.collection.mutable.LinkedHashMap[String,String] = Map()
scala> foo += "A" -> "A"
res0: foo.type = Map((A,A))
scala> foo += "B" -> "B"
res1: foo.type = Map((A,A), (B,B))
scala> foo += "C" -> "C"
res2: foo.type = Map((A,A), (B,B), (C,C))
scala> foo -= "B"
res3: foo.type = Map((A,A), (C,C))
Java's ArrayList effectively has constant time complexity if the last element is the one to be removed. Look at the following snippet copied from its source code,
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // clear to let GC do its work
As you can see, if numMoved is equal to 0, remove will not shift and copy the array at all. This in some scenarios can be quite useful. For example, if you do not care about the ordering that much, to remove an element, you can always swap it with the last element, and then delete the last element from the ArrayList, which effectively makes the remove operation all the way constant time. I was hoping ArrayBuffer would do the same, unfortunately that is not the case.

Resources