Scala: fastest `remove(i: Int)` in mutable sequence - performance

Which implementation from scala.collection.mutable package should I take if I intend to do lots of by-index-deletions, like remove(i: Int), in a single-threaded environment? The most obvious choice, ListBuffer, says that it may take linear time depending on buffer size. Is there some collection with log(n) or even constant time for this operation?

Removal operations, including buf remove i, are not part of Seq; remove belongs to the Buffer trait under scala.collection.mutable. (See Buffers.)
See the first table on Performance Characteristics. I would guess buf remove i has the same characteristics as insert, which is linear for both ArrayBuffer and ListBuffer.
As documented in Array Buffers, array buffers use arrays internally, and List Buffers use linked lists (which is still O(n) for remove).
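For concreteness, here is buf remove i on an ArrayBuffer; the linear cost comes from shifting the trailing elements left:
scala> import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ArrayBuffer
scala> val buf = ArrayBuffer(1, 2, 3, 4, 5)
buf: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 4, 5)
scala> buf remove 2
res0: Int = 3
scala> buf
res1: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 4, 5)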
As an alternative, immutable Vector may give you effectively constant time.
Vectors are represented as trees with a high branching factor. Every tree node contains up to 32 elements of the vector or contains up to 32 other tree nodes. [...] So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time".
scala> import scala.collection.immutable._
import scala.collection.immutable._
scala> def remove[A](xs: Vector[A], i: Int) = (xs take i) ++ (xs drop (i + 1))
remove: [A](xs: scala.collection.immutable.Vector[A],i: Int)scala.collection.immutable.Vector[A]
scala> val foo = Vector(1, 2, 3, 4, 5)
foo: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)
scala> remove(foo, 2)
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)
Note, however, that "effectively constant" time carries a large constant factor; it may not beat a plain linear operation until the data size gets quite large.

Depending on your exact use case, you may be able to use LinkedHashMap from scala.collection.mutable.
Although you cannot remove by index, you can remove by a unique key in constant time, and it maintains a deterministic ordering when you iterate.
scala> val foo = new scala.collection.mutable.LinkedHashMap[String,String]
foo: scala.collection.mutable.LinkedHashMap[String,String] = Map()
scala> foo += "A" -> "A"
res0: foo.type = Map((A,A))
scala> foo += "B" -> "B"
res1: foo.type = Map((A,A), (B,B))
scala> foo += "C" -> "C"
res2: foo.type = Map((A,A), (B,B), (C,C))
scala> foo -= "B"
res3: foo.type = Map((A,A), (C,C))

Java's ArrayList effectively has constant-time removal if the last element is the one being removed. Look at the following snippet copied from its source code:
int numMoved = size - index - 1;
if (numMoved > 0)
    System.arraycopy(elementData, index+1, elementData, index,
                     numMoved);
elementData[--size] = null; // clear to let GC do its work
As you can see, if numMoved is 0, remove will not shift and copy the array at all. This can be quite useful in some scenarios. For example, if you do not care much about ordering, you can remove an element by swapping it with the last element and then deleting the last element from the ArrayList, which makes the removal truly constant time. I was hoping ArrayBuffer would do the same; unfortunately, that is not the case.
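You can still get that behavior yourself with a small helper along these lines (a hypothetical sketch, assuming you don't care about element order):
import scala.collection.mutable.ArrayBuffer

// Hypothetical swap-remove helper: effectively constant time, because the
// only element ever moved is the last one. Element order is not preserved.
def swapRemove[A](buf: ArrayBuffer[A], i: Int): A = {
  val removed = buf(i)
  buf(i) = buf(buf.length - 1) // overwrite slot i with the last element
  buf.remove(buf.length - 1)   // removing the last slot shifts no interior elements
  removed
}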

Related

Preallocate or change size of vector

I have a situation where I have a process which needs to "burn-in". This means that I
Start with p values, p relatively small
For n>p, generate the nth value using the most recently generated p values (e.g. value p+1 is generated from values 1 to p, value p+2 from values 2 to p+1, etc.)
Repeat until n=N, where N large
Now, only the most recently generated p values will be useful to me, so there are two ways for me to implement this. I can either
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value or,
Preallocate a large array of length N, where first p elements are initial values. At iteration n, mutate nth value with most recently generated value
There are pros and cons to both approaches.
Pros of the first are that we only store the most relevant values. Cons of the first are that we change the length of the vector at each iteration.
Pros of the second are that we preallocate all the memory we need. Cons of the second is that we store much more than we need.
What is the best way to proceed? Does it depend on what aspect of performance I most need to care about? What will be the quickest?
Cheers in advance.
edit: approximately, p is usually in the order of low tens, N can be several thousand
The first solution has another huge con: removing the first item of an array takes O(n) time, since the remaining elements have to be moved in memory. This causes the algorithm to run in quadratic time, which is not reasonable. Shifting the items as proposed by @ForceBru causes the same quadratic run time (since many items are moved just to add one value each time).
The second solution should be pretty fast compared to the first, but it can indeed use a lot of memory, so it may be sub-optimal (it takes time to write values to RAM).
A faster solution is to use a data structure called a deque. Such a data structure lets you remove the first item in constant time and append a new value at the end, also in constant time. That said, it introduces some overhead to make this possible. Julia provides such a data structure (more specifically, queues).
Since the number of in-flight items appears to be bounded in your algorithm, you can use a rolling buffer. Fortunately, Julia also implements this: see CircularBuffer. This solution is quite simple and fast (since the operations you need are O(1) on it).
It is probably simplest to use CircularArrays.jl for your use case:
julia> using CircularArrays
julia> c = CircularArray([1,2,3,4])
4-element CircularVector(::Vector{Int64}):
1
2
3
4
julia> for i in 5:10
           c[i] = i
           @show c
       end
c = [5, 2, 3, 4]
c = [5, 6, 3, 4]
c = [5, 6, 7, 4]
c = [5, 6, 7, 8]
c = [9, 6, 7, 8]
c = [9, 10, 7, 8]
In this way, as you can see, you can keep using an increasing index, and the array will wrap around internally as needed (discarding old values that are no longer needed).
This way you always store the last p values in the array without having to copy anything or re-allocate memory at each step.
...only the most recently generated p values will be useful to me...
Start with a vector of p initial values. At each iteration, mutate the vector, removing the first element, and replacing the last element with the most recently generated value.
Cons of the first are that we are changing the length of the vector at each iteration.
There's no need to change the length of the vector. Simply shift its elements to the left (overwriting the first element) and write the new data to the_vector[end]:
the_vector = [1,2,3,4,5,6]

function shift_and_add!(vec::AbstractVector, value)
    vec[1:end-1] .= @view vec[2:end] # shift left by one
    vec[end] = value                 # replace the last value
    vec
end
@assert shift_and_add!(the_vector, 80) == [2,3,4,5,6,80]
# `the_vector` will be mutated
@assert the_vector == [2,3,4,5,6,80]

Can I check whether a bounded list contains duplicates, in linear time?

Suppose I have an Int list where elements are known to be bounded and the list is known to be no longer than their range, so that it is entirely possible for it not to contain duplicates. How can I test most quickly whether it is the case?
I know of nubOrd. It is quite fast. We can pass our list through and see if it becomes shorter. But the efficiency of nubOrd is still not linear.
My idea is that we can trade space for time efficiency. Imperatively, we would allocate a bit field as wide as our range, and then traverse the list, marking the entries corresponding to the list elements' values. As soon as we try to flip a bit that is already 1, we return False. It only takes (read + compare + write) * length of the list. No binary search trees, no nothing.
Is it reasonable to attempt a similar construction in Haskell?
The discrimination package has a linear time nub you can use. Or a linear time group that doesn't require the equivalent elements to be adjacent in order to group them, so you could see if any of the groups are not size 1.
The whole package is based on sidestepping the well known bounds on comparison-based sorts (and joins, and etc) by using algorithms based on "discrimination" rather than ones based on comparisons. As I understand it, the technique is somewhat like a radix sort, but generalised to ADTs.
For integers (and other Ix-like types), you could use a mutable array, for example with the array package.
We can use an STUArray here, for example:
import Control.Monad.ST
import Data.Array.ST

updateDups_ :: [Int] -> STUArray s Int Bool -> ST s Bool
updateDups_ [] _ = return False
updateDups_ (x:xs) arr = do
    contains <- readArray arr x
    if contains
        then return True
        else writeArray arr x True >> updateDups_ xs arr

withDups_ :: Int -> [Int] -> ST s Bool
withDups_ mx l = newArray (0, mx) False >>= updateDups_ l

withDups :: Int -> [Int] -> Bool
withDups mx ls = runST (withDups_ mx ls)
For example:
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,5]
False
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,1]
True
Prelude Control.Monad.ST Data.Array.ST> withDups 17 [1,4,2,16,2]
True
So here the first parameter is the maximum value that can occur in the list, and the second parameter is the list of values we want to check.
So you have a list of size N, and you know that the elements in the list are within the range min .. min+N-1.
There is a simple linear time algorithm that requires O(1) space.
First, scan the list to find the minimum and maximum elements.
If (max - min + 1) < N then you know there's a duplicate. Otherwise ...
Because the range is N, the minimum item can go at a[0], and the max item at a[n-1]. You can map any item to its position in the array simply by subtracting min. You can do an in-place sort in O(n) because you know exactly where every item should go.
Starting at the beginning of the list, take the first element and subtract min to determine where it should go. Go to that position, and replace the item that's there. With the new item, compute where it should go, and replace the item in that position, etc.
If you ever get to a point where you're trying to place an item at a[x], and the value already there is the value that's supposed to be there (i.e. a[x] == x+min), then you've found a duplicate.
The code to do all this is pretty simple:
min, max = findMinMax()
currentIndex = 0
while currentIndex < N
    temp = a[currentIndex]
    targetIndex = temp - min
    // Do this until we wrap around to the current index.
    // If the item is already in place, then targetIndex == currentIndex,
    // and we won't enter the loop.
    while targetIndex != currentIndex
        if (a[targetIndex] == temp)
            // The item at a[targetIndex] is the item that's supposed to be there.
            // The only way that can happen is if the item we have in temp is a duplicate.
            found a duplicate
        end if
        save = a[targetIndex]
        a[targetIndex] = temp
        temp = save
        targetIndex = temp - min
    end while
    // At this point, targetIndex == currentIndex.
    // We've wrapped around and need to place the last item.
    // There's no need to check here if a[targetIndex] == temp, because if it did,
    // we would not have entered the loop.
    a[targetIndex] = temp
    ++currentIndex
end while
That's the basic idea.
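For the curious, here is a direct translation of that pseudocode into runnable Scala (my sketch, not the answerer's code; per the problem statement it assumes every value lies in the range min .. min+N-1, where N is the array length):
def hasDuplicate(a: Array[Int]): Boolean = {
  if (a.isEmpty) return false
  val min = a.min
  val max = a.max
  // Pigeonhole check: fewer distinct possible values than items means a duplicate.
  if (max - min + 1 < a.length) return true
  var current = 0
  while (current < a.length) {
    var temp = a(current)
    var target = temp - min
    // Cycle items into their home positions until we wrap back to `current`.
    while (target != current) {
      if (a(target) == temp) return true // slot already holds this value: duplicate
      val save = a(target)
      a(target) = temp
      temp = save
      target = temp - min
    }
    a(target) = temp
    current += 1
  }
  false
}

hasDuplicate(Array(2, 3, 1, 4)) // false
hasDuplicate(Array(2, 2, 1, 4)) // true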

find keystrokes for On Screen Keyboard scala

I am trying to solve a recent interview question using Scala.
You have an on-screen keyboard which is a grid of 6 rows and 5 columns, with the letters A to Z and blank spaces arranged in the grid row-first.
You can use this on-screen keyboard to type words with your TV remote, by pressing the Left, Right, Up, Down, or OK keys to type each character.
Question: given an input string, find the sequence of keystrokes needed to be pressed on the remote to type the input.
The code implementation can be found at
https://github.com/mradityagoyal/scala/blob/master/OnScrKb/src/main/scala/OnScrKB.scala
I have tried to solve this using three different approaches..
Simple foldLeft.
def keystrokesByFL(input: String, startChar: Char = 'A'): String = {
  val zero = ("", startChar)
  // (acc, last) + next => (acc + path(last, next), next)
  def op(zero: (String, Char), next: Char): (String, Char) = zero match {
    case (acc, last) => (acc + path(last, next), next)
  }
  val result = input.foldLeft(zero)(op)
  result._1
}
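(All three snippets assume a path(last, next) helper from the linked repo that returns the keystrokes for moving between two keys. If you want to experiment locally, a rough stand-in could look like the following; the exact grid handling, and ignoring the blank keys, are my assumptions, not the repo's code.)
// Hypothetical stand-in for the repo's path helper: keystrokes between two
// letters on the 6x5, row-first A-Z grid (blank keys not handled).
def path(from: Char, to: Char): String = {
  def pos(c: Char): (Int, Int) = { val i = c - 'A'; (i / 5, i % 5) } // (row, col)
  val (r1, c1) = pos(from)
  val (r2, c2) = pos(to)
  val vertical = (if (r2 > r1) "D" else "U") * math.abs(r2 - r1)
  val horizontal = (if (c2 > c1) "R" else "L") * math.abs(c2 - c1)
  vertical + horizontal + "OK"
}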
Divide and conquer - uses a divide and conquer mechanism, similar to merge sort:
We split the input word in two if the length is > 3.
We recursively call the subroutine to get the paths of the left and right halves of the split.
In the end, we add the keystrokes for the first half, plus the keystrokes from the end of the first string to the start of the second string, plus the keystrokes for the second half.
Essentially we divide the input string into two smaller halves until the pieces are smaller than 4 characters; for those we use the foldLeft approach.
def keystrokesByDnQ(input: String, startChar: Char = 'A'): String = {
  def splitAndMerge(in: String, startChar: Char): String = {
    if (in.length() < 4) {
      // if length is < 4 then don't split, as you might end up with one side having only 1 char
      keystrokesByFL(in, startChar)
    } else {
      // split
      val (x, y) = in.splitAt(in.length() / 2)
      splitAndMerge(x, startChar) + splitAndMerge(y, x.last)
    }
  }
  splitAndMerge(input, startChar)
}
Fold - uses the property that the underlying operation is associative (but not commutative). For example: keystrokes("ABCDEFGHI", startChar = 'A') == keystrokes("ABC", startChar = 'A') + keystrokes("DEF", 'C') + keystrokes("GHI", 'F')
case class PathAcc(text: String, path: String) // assumed shape, inferred from the usage below

def keystrokesByF(input: String, startChar: Char = 'A'): String = {
  val mapped = input.map { x => PathAcc(text = "" + x, path = "") } // map each character in input to PathAcc("CharAsString", "")
  val z = PathAcc(text = "" + startChar, path = "") // the starting char
  def op(left: PathAcc, right: PathAcc): PathAcc = {
    PathAcc(text = left.text + right.text, path = left.path + path(left.text.last, right.text.head) + right.path)
  }
  val foldresult = mapped.fold(z)(op)
  foldresult.path
}
My questions:
1. Is the divide and conquer approach better than Fold?
2. Are Fold and divide and conquer better than foldLeft (for this specific problem)?
3. Is there a way I can represent the divide and conquer approach or the Fold approach as a monad? I can see the associative law being satisfied... but I am not able to figure out if a monoid is present here.. and if yes.. what does it achieve for me?
4. Is the divide and conquer approach the best one available for this particular problem?
5. Which approach is better suited for Spark?
Any suggestions are welcome..
Here's how I would do it:
def keystrokes(input: String, start: Char): String =
  ((start + input) zip input).par.map((path _).tupled).fold("")(_ ++ _)
The main point here is using the par method to parallelize the sequence of (Char, Char) pairs, so that the map runs in parallel and fold can pick an optimal implementation.
The algorithm simply takes the characters in the String two by two (representing the units of path to be walked), computes the path between them, and then concatenates the results. Note that fold("")(_ ++ _) is basically mkString (although mkString on a parallel collection is implemented via seq.mkString, so it is much less efficient).
What your implementations sorely miss is parallelization of tasks. Even in your divide-and-conquer approach, you never run code in parallel, so you wait for the first half to be finished before starting the second half (even though they are totally independent).
Assuming you use parallelization, the classical implementation of fold on parallel sequences is precisely the divide-and-conquer algorithm you described, but it may be better optimized (for instance, it may choose a chunk size other than 3; I tend to trust the scala-collection implementers on these matters).
Note that fold on String is probably implemented with foldLeft, so there is no added value over what you did with foldLeft, unless you use .par first.
Back to your questions (I'll mostly repeat what I just said):
1) Yes, the divide and conquer is better than fold... on String (but not on a parallelized String).
2) Fold can only be better than foldLeft with some kind of parallelization, in which case it will be as good as (or better than, if there is a better implementation for a particular parallelized collection) divide-and-conquer.
3) I don't see what monads have to do with anything here. The operator and zero for fold must indeed form a monoid (otherwise you'll have problems with operation ordering if the operator is not associative, and unwanted noise if zero is not a neutral element).
4) Yes, as far as I know, once parallelized.
5) Spark is inherently parallel, so the main issue would be joining all the pieces back together at the end. What I mean is that an RDD is not ordered, so you'll need to keep some information about which piece of input should go where in your cluster. Once you've done that correctly (using partitions and such, which would probably be a whole question in itself), map and fold still work like a charm (Spark was designed to have an API as close as possible to the scala-collections one, so that's really nice here).

Scala - sort based on Future result predicate

I have an array of objects I want to sort, where the predicate for sorting is asynchronous. Does Scala have either a standard or 3rd party library function for sorting based on a predicate with type signature of (T, T) -> Future[Bool] rather than just (T, T) -> Bool?
Alternatively, is there some other way I could structure this code? I've considered finding all the 2-pair permutations of list elements, running the predicate over each pair and storing the result in a Map((T, T), Bool) or some structure to that effect, and then sorting on it - but I suspect that will have many more comparisons executed than even a naive sorting algorithm would.
If your predicate is async, you may prefer to get an async result too and avoid blocking threads with Await.
If you want to sort a List[(T,T)] according to a future boolean predicate, the easiest way is to sort a List[(T,T,Boolean)].
So given a List[(T,T)] and a predicate (T, T) -> Future[Bool], how can you get a List[(T,T,Boolean)]? Or rather a Future[List[(T,T,Boolean)]], since you want to keep the async behavior.
val list: List[(T, T)] = ...
val predicate: (T, T) => Future[Boolean] = ...

val listOfFutures: List[Future[(T, T, Boolean)]] = list.map { tuple2 =>
  predicate(tuple2._1, tuple2._2).map(bool => (tuple2._1, tuple2._2, bool))
}
val futureList: Future[List[(T, T, Boolean)]] = Future.sequence(listOfFutures)
val futureSortedResult: Future[List[(T, T)]] = futureList.map { list =>
  list.sortBy(_._3).map(tuple3 => (tuple3._1, tuple3._2))
}
This is sketch code that I haven't compiled, so it may need tweaks, but you get the idea.
The key is Future.sequence, a very useful function that turns a List[Future[X]] into a Future[List[X]] (more generally, it flips Monad1[Monad2[X]] into Monad2[Monad1[X]]). Notice that if any of the predicate futures fails, the whole sort operation fails as well.
If you want better performance, it may be better to "batch" the calls to the service returning the Future[Boolean].
For example, instead of (T, T) -> Future[Bool], maybe you can design a service (if you own it, obviously) like List[(T, T)] -> Future[List[(T,T,Bool)]], so that you can get everything you need in a single async call.
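A rough sketch of that batched design (batchedSort and compareBatch are hypothetical names for illustration, not an existing API):
import scala.concurrent.{ExecutionContext, Future}

// Gather all pairwise comparisons, fire one batched async call, then sort
// synchronously against the precomputed results. The number of pairs is
// O(n^2), so this only pays off when batching is much cheaper than one
// round-trip per comparison.
def batchedSort[T](xs: List[T])(
    compareBatch: List[(T, T)] => Future[List[(T, T, Boolean)]]
)(implicit ec: ExecutionContext): Future[List[T]] = {
  val pairs = for (a <- xs; b <- xs if a != b) yield (a, b)
  compareBatch(pairs).map { triples =>
    val results = triples.map { case (a, b, lt) => ((a, b), lt) }.toMap
    xs.sortWith((a, b) => results.getOrElse((a, b), false))
  }
}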
A not so satisfactory alternative would be to block each comparison until the future is evaluated. If evaluating your sorting predicate is expensive, sorting will take a long time. In fact, this just translates a possibly concurrent program into a sequential one; all benefits of using futures will be lost.
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

implicit val executionContext = ExecutionContext.Implicits.global
val sortingPredicate: (Int, Int) => Future[Boolean] = (a, b) => Future {
  Thread.sleep(20) // Assume this is a costly comparison
  a < b
}
val unsorted = List(4, 2, 1, 5, 7, 3, 6, 8, 3, 12, 1, 3, 2, 1)
val sorted = unsorted.sortWith((a, b) =>
  Await.result(sortingPredicate(a, b), 5000.millis) // careful: may throw an exception
)
println(sorted) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
I don't know if there is an out-of-the-box solution that utilizes asynchronous comparison. However, you could try to implement your own sorting algorithm. If we consider Quicksort, which runs in O(n log(n)) on average, then we can actually utilize asynchronous comparison quite easily.
If you're not familiar with Quicksort, the algorithm basically does the following
Choose an element from the collection (called the Pivot)
Compare the pivot with all remaining elements. Create a collection with elements that are less than the pivot and one with elements that are greater than the pivot.
Sort the two new collections and concatenate them, putting the pivot in the middle.
Since step 2 performs a lot of independent comparisons we can evaluate the comparisons concurrently.
Here's an unoptimized implementation:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

object ParallelSort {
  val timeout = Duration.Inf

  implicit class QuickSort[U](elements: Seq[U]) {
    private def choosePivot: (U, Seq[U]) = elements.head -> elements.tail

    def sortParallelWith(predicate: (U, U) => Future[Boolean]): Seq[U] =
      if (elements.isEmpty || elements.size == 1) elements
      else if (elements.size == 2) {
        if (Await.result(predicate(elements.head, elements.tail.head), timeout)) elements else elements.reverse
      }
      else {
        val (pivot, other) = choosePivot
        val ordering: Seq[(Future[Boolean], U)] = other map { element => predicate(element, pivot) -> element }
        // This is where we utilize asynchronous evaluation of the sorting predicate:
        // all comparisons against the pivot are started before any result is awaited.
        val (left, right) = ordering.partition { case (lessThanPivot, _) => Await.result(lessThanPivot, timeout) }
        val leftSorted = left.map(_._2).sortParallelWith(predicate)
        val rightSorted = right.map(_._2).sortParallelWith(predicate)
        leftSorted ++ (pivot +: rightSorted)
      }
  }
}
which can be used (same example as above) as follows:
import ParallelSort.QuickSort
val sorted2 = unsorted.sortParallelWith(sortingPredicate)
println(sorted2) // List(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 8, 12)
Note that whether this implementation of Quicksort is faster or slower than the completely sequential built-in sorting algorithm depends highly on the cost of a comparison: the longer a comparison has to block, the worse the alternative solution mentioned above becomes. On my machine, given a costly comparison (20 milliseconds) and the above list, the built-in sorting algorithm runs in ~1200 ms while this custom Quicksort runs in ~200 ms. If you're worried about performance, you'd probably want to come up with something smarter.
Edit: I just checked how many comparisons both the built-in sorting algorithm and the custom Quicksort algorithm perform: apparently, for the given list (and some other lists I randomly typed in), the built-in algorithm uses more comparisons, so the performance improvements thanks to parallel execution might not be that great. I don't know about bigger lists, but you'd have to profile on your specific data anyway.

Performance of List.permute

I implemented a Fisher-Yates shuffle recently, which used List.permute to shuffle the list, and noted that as the size of the list increased, there was a significant performance decrease. I suspect this is due to the fact that while the algorithm assumes it is operating on an array, permute must be accessing the list elements by index, which is O(n).
To confirm this, I tried applying a permutation to a list to reverse its elements, comparing working directly on the list against transforming the list into an array and back to a list:
let permute i max = max - i - 1

let test = [ 0 .. 10000 ]

let rev1 list =
    let perm i = permute i (List.length list)
    List.permute perm list

let rev2 list =
    let array = List.toArray list
    let perm i = permute i (Array.length array)
    Array.permute perm array |> Array.toList
I get the following results, which tend to confirm my assumption:
rev1 test;;
Real: 00:00:00.283, CPU: 00:00:00.265, GC gen0: 0, gen1: 0, gen2: 0
rev2 test;;
Real: 00:00:00.003, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
My question is the following:
1) Should List.permute be avoided for performance reasons? And, relatedly, shouldn't the implementation of List.permute automatically do the transformation into an Array behind the scenes?
2) Besides using an Array, is there a more functional way / data structure suitable for this type of work, i.e. shuffling of elements? Or is this simply a problem for which an Array is the right data structure to use?
List.permute converts the list to an array, calls Array.permute, then converts it back to a list. Based on that, you can probably figure out what you need to do (hint: work with arrays!).
Should List.permute be avoided for performance reasons?
The only performance problem here is in your own code, specifically calling List.length.
Besides using an Array, is there a more functional way / data structure suitable for this type of work, i.e. shuffling of elements? Or is this simply a problem for which an Array is the right data structure to use?
You are assuming that arrays cannot be used functionally when, in fact, they can be, by not mutating their elements. Consider the permute function:
let permute f (xs: _ []) = Array.init xs.Length (fun i -> xs.[f i])
Although it acts upon an array and produces an array, it does not mutate anything, so it is using the array as a purely functional data structure.
