I would like to create an extension function for a collection that checks whether it contains any item of a given set.
I am considering two implementations:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean = any(values::contains)
or
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean = intersect(values).isNotEmpty()
The question is: which way is more efficient, and why? And is there a better solution?
The first way with any is O(n*m) unless the parameter Iterable is a Set, in which case it's O(n).
The second way with intersect is O(n).
So the second way is much faster unless the parameter is already a Set or both inputs are so tiny that it's worth iterating repeatedly to avoid copying the receiver Iterable to a MutableSet.
The O(n) way could be improved to allow the early exit behavior of any by doing this:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean =
    any(values.toSet()::contains)
and, further, to avoid an unnecessary copy when the parameter is already a Set:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean =
    any((values as? Set<T> ?: values.toSet())::contains)
And if the receiver Iterable is usually bigger than the parameter Iterable, you might want to swap which one is the set and which one is being iterated.
I know that most of the Vector methods are effectively O(1) (constant time) because of the tree they use, but I cannot find any information on the contains method. My first thought is that it would have to be O(n) to check all the elements but I am not sure.
To answer the question in the title: in the performance characteristics table (2.13 docs version), the basic operations head, tail, apply, update, prepend, append and insert are all listed as eC for Vector:
eC The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.
You are correct: contains is O(N), as there is no hashing or anything else that would avoid comparing with every item. Still, if you want to be sure, it is best to check the implementation.
As finding the correct implementation in the library sources can be difficult because of the many traits and overrides used to implement the containers, the best way to check this is with the debugger. Use code like:
val v = Vector(0, 1, 2)
v.contains(1)
Use the debugger to step into v.contains and the source you will see is:
def contains[A1 >: A](elem: A1): Boolean = exists (_ == elem)
If you are still not convinced at this point, some more "step into" will lead you to:
def exists(p: A => Boolean): Boolean = {
  var res = false
  while (!res && hasNext) res = p(next())
  res
}
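As a complement to stepping through with the debugger, the linear scan can also be observed by counting equals calls. The sketch below is a hypothetical test harness (`VectorContainsDemo`, `Probe` and `comparisonsFor` are made-up names) and assumes the Scala 2.13 standard library. Finding the first element should take one comparison, while the last element or a missing one should take as many comparisons as the vector has elements.

```scala
object VectorContainsDemo {
  // Hypothetical element type whose equals counts how often it is called.
  final class Probe(val id: Int) {
    override def equals(other: Any): Boolean = {
      Probe.comparisons += 1
      other match {
        case p: Probe => p.id == id
        case _        => false
      }
    }
    override def hashCode: Int = id
  }

  object Probe {
    var comparisons = 0
  }

  // Builds a Vector of n probes, searches for `target` and reports
  // whether it was found and how many equals calls the search made.
  def comparisonsFor(n: Int, target: Int): (Boolean, Int) = {
    val v = Vector.tabulate(n)(i => new Probe(i))
    Probe.comparisons = 0
    val found = v.contains(new Probe(target))
    (found, Probe.comparisons)
  }
}
```

For example, `comparisonsFor(1000, 0)` gives `(true, 1)` and `comparisonsFor(1000, -1)` gives `(false, 1000)`: the cost grows with the position of the element, which is what O(N) means here.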
How can I rewrite this
Comparator<Item> sort = (i1, i2) -> Boolean.compare(i2.isOpen(), i1.isOpen());
to something like this (code does not work):
Comparator<Item> sort = Comparator.comparing(Item::isOpen).reversed();
The comparing methods have nothing like Comparator.comparingBool(). Comparator.comparing returns int and not "Item".
Why can't you write it like this?
Comparator<Item> sort = Comparator.comparing(Item::isOpen);
Underneath, Boolean.compareTo is called, which in turn is equivalent to Boolean.compare:
public static int compare(boolean x, boolean y) {
    return (x == y) ? 0 : (x ? 1 : -1);
}
And this statement, "Comparator.comparing returns int and not Item", makes little sense: Comparator.comparing must return a Comparator<T>, and in your case it correctly returns a Comparator<Item>.
The overloads comparingInt, comparingLong, and comparingDouble exist for performance reasons only. They are semantically identical to the unspecialized comparing method, so using comparing instead of comparingXXX has the same outcome but may incur boxing overhead; the actual implications depend on the particular execution environment.
In case of boolean values, we can predict that the overhead will be negligible, as the method Boolean.valueOf will always return either Boolean.TRUE or Boolean.FALSE and never create new instances, so even if a particular JVM fails to inline the entire code, it does not depend on the presence of Escape Analysis in the optimizer.
As you already figured out, reversing a comparator is implemented by swapping the arguments internally, just like you did manually in your lambda expression.
Note that it is still possible to create a comparator fusing the reversal and an unboxed comparison without having to repeat the isOpen() expression:
Comparator<Item> sort = Comparator.comparingInt(i -> i.isOpen()? 0: 1);
but, as said, it is unlikely to perform significantly better than the Comparator.comparing(Item::isOpen).reversed() approach.
But note that if you have a boolean sort criterion and care about maximum performance, you may consider replacing the general-purpose sort algorithm with a bucket sort variant.
For example, if you have a Stream, replace
List<Item> result = /* stream of Item */
    .sorted(Comparator.comparing(Item::isOpen).reversed())
    .collect(Collectors.toList());
with
Map<Boolean,List<Item>> map = /* stream of Item */
    .collect(Collectors.partitioningBy(Item::isOpen,
        Collectors.toCollection(ArrayList::new)));
List<Item> result = map.get(true);
result.addAll(map.get(false));
or, if you have a List, replace
list.sort(Comparator.comparing(Item::isOpen).reversed());
with
ArrayList<Item> temp = new ArrayList<>(list.size());
list.removeIf(item -> !item.isOpen() && temp.add(item));
list.addAll(temp);
etc.
Use the comparing overload that takes both a key extractor and a key comparator:
Comparator<Item> comparator =
    Comparator.comparing(Item::isOpen, Boolean::compare).reversed();
When there is a collection and you must perform two or more operations on all of its elements, which is faster?
val f1: String => String = _.reverse
val f2: String => String = _.toUpperCase
val elements: Seq[String] = List("a", "b", "c")
Iterate multiple times, performing one operation per loop:
val result = elements.map(f1).map(f2)
This approach has the advantage that the intermediate result of applying the first function could be reused.
Iterate once, performing all operations on each element together:
val result = elements.map(element => f2(f1(element)))
or
val result = elements.map(f1.andThen(f2))
Is there any difference in performance between these two approaches? And if yes, which is faster?
Here's the thing: transforming a collection costs roughly O(N) times the runtime cost of all the functions applied. So I doubt the second set of choices you present above would make even the slightest difference in runtime. The first option you list is a different story: it creates a new intermediate collection for each step, and that overhead can be avoided. That's where "view" collections come in (see this good example I spotted):
In Scala, what does "view" do?
If you had to apply several mapping operations, you might do this:
val result = elements.view.map(f1).map(f2).force
(force at the end causes all the functions to be evaluated; in Scala 2.13 and later, force is deprecated and an explicit conversion such as toList is used instead.)
The second set of examples above might be a tiny bit faster, but the "view" option could make your code more readable if you had many of these, or complex anonymous functions in the mappings.
Composing functions to produce a single-pass transformation will probably gain you some performance, but will quickly become unreadable. Consider using views as an alternative. While this will create an intermediate collection:
val result = elements.map(f1).map(f2)
this will evaluate lazily and apply the functions to each element in turn, just like your composition does:
val result = elements.view.map(f1).map(f2)
Notice that the result type will be a SeqView, so you might want to convert it to a List later with toList.
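To make the difference concrete, here is a small sketch that records the order in which the two functions run. The logging versions of f1 and f2, and the names MapFusionDemo and trace, are hypothetical. The strict pipeline is expected to apply f1 to every element before f2 starts, building an intermediate List in between, whereas the view (and the composed function) applies both functions to one element before moving on to the next.

```scala
import scala.collection.mutable.ListBuffer

object MapFusionDemo {
  // Runs one pipeline over List("a", "b") with logging versions of f1 and f2
  // and returns the order in which the calls happened.
  def trace(pipeline: (Seq[String], String => String, String => String) => Seq[String]): List[String] = {
    val log = ListBuffer.empty[String]
    val f1: String => String = s => { log += s"f1($s)"; s.reverse }
    val f2: String => String = s => { log += s"f2($s)"; s.toUpperCase }
    val result = pipeline(List("a", "b"), f1, f2)
    require(result == List("A", "B"))
    log.toList
  }

  // Two passes: an intermediate collection is built after f1.
  val strict: List[String] = trace((xs, f1, f2) => xs.map(f1).map(f2))
  // Lazy view: each element goes through f1 and then f2 before the next one.
  val viewed: List[String] = trace((xs, f1, f2) => xs.view.map(f1).map(f2).toList)
  // Explicit composition: one pass, one combined function per element.
  val composed: List[String] = trace((xs, f1, f2) => xs.map(f1.andThen(f2)))
}
```

Here `strict` is `List("f1(a)", "f1(b)", "f2(a)", "f2(b)")`, while `viewed` and `composed` are both `List("f1(a)", "f2(a)", "f1(b)", "f2(b)")`. This only shows the evaluation order and the avoided intermediate collection; whether that turns into a measurable speedup still has to be benchmarked.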
I have a class with a few Int and Double fields. What is the fastest way to copy all data from one object to another?
class IntFields {
  private val data : Array[Int] = Array(0,0)
  def first : Int = data(0)
  def first_= (value: Int) = data(0) = value
  def second : Int = data(1)
  def second_= (value : Int) = data(1) = value
  def copyFrom(another : IntFields) =
    Array.copy(another.data,0,data,0,2)
}
This is the approach I came up with, but I doubt it is really efficient, since I have no clear understanding of Scala's internals.
update1:
In fact I'm searching for Scala's equivalent of C++ memcpy. I just need to take one simple object and copy its contents byte by byte.
Array copying is just a hack; I've searched for a normal, Scala-supported method and found none.
update2:
I've microbenchmarked two holders: a simple case class with 12 variables and one backed by an array. In all benchmarks (simple copying and complex calculations over a collection) the array-based solution is about 7% slower.
So I need other means of simulating memcpy.
Since both arrays used in Array.copy are arrays of primitive integers (i.e. it is not the case that one of them holds boxed integers, in which case a while loop with boxing/unboxing would have been used to copy the elements), it is as efficient as Java's System.arraycopy. That is, if this were a huge array, you would probably see System.arraycopy outperform a while loop that copies the elements one by one. Since the array only has 2 elements, it is probably more efficient to just do:
def copyFrom(another: IntFields): Unit = {
  data(0) = another.data(0)
  data(1) = another.data(1)
}
EDIT:
I'd say that the fastest thing is to just copy the fields one by one. If performance is really important, you could consider using Unsafe.getInt; some report that it is faster than System.arraycopy for small blocks: https://stackoverflow.com/questions/5574241/interesting-uses-of-sun-misc-unsafe
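A minimal sketch of the field-by-field approach, assuming the data can live in plain var fields rather than a backing array. The class name Holder and its fields are hypothetical stand-ins for the question's Int and Double fields; with a dozen fields, copyFrom simply gets one line per field.

```scala
// Hypothetical holder: plain var fields instead of a backing Array[Int].
final class Holder {
  var first: Int = 0
  var second: Int = 0
  var ratio: Double = 0.0

  // Copies every field directly from `other`; no intermediate array is involved,
  // and the Int and Double fields stay primitive (no boxing).
  def copyFrom(other: Holder): Unit = {
    first = other.first
    second = other.second
    ratio = other.ratio
  }
}
```

Unlike a case class `copy`, which builds a new instance, `copyFrom` overwrites the target in place, which is what the question asks for.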
I am searching for an efficient technique to find a sequence of Op occurrences in a Seq[Op]. Once an occurrence is found, I want to replace it with a defined replacement and run the same search again until the list stops changing.
Scenario:
I have three types of Op case classes: Pop() extends Op, Push() extends Op and Nop() extends Op. I want to replace each occurrence of Push(), Pop() with Nop(). Basically the code could look like seq.replace(Push() ~ Pop() ~> Nop()).
Problem:
When I call seq.replace(...), I have to search the sequence for an occurrence of Push(), Pop(). So far so good: I find the occurrence. But then I have to splice the occurrence out of the list and insert the replacement.
Now there are two options: my list could be mutable or immutable. If I use an immutable list, I am worried about performance, because those sequences are usually 500+ elements in size. If I replace a lot of occurrences like A ~ B ~ C ~> D ~ E, I will create a lot of new objects, if I am not mistaken. However, I could also use a mutable sequence like ListBuffer[Op].
Coming from a linked-list background, I would just do some pointer bending, and after a total of four operations the replacement would be done without creating new objects. That is why I am now concerned about performance, especially since this is a performance-critical operation for me.
Question:
How would you implement the replace() method in a Scala fashion and what kind of data structure would you use keeping in mind that this is a performance-critical operation?
I am happy with answers that point me in the right direction or pseudo code. No need to write a full replace method.
Thank you.
Ok, some considerations to be made. First, recall that, on lists, tail does not create objects, and prepending (::) only creates one object for each prepended element. That's pretty much as good as you can get, generally speaking.
One way of doing this would be this:
def myReplace(input: List[Op], pattern: List[Op], replacement: List[Op]) = {
  // This function should be part of a KMP algorithm instead, for performance
  def compare(pattern: List[Op], list: List[Op]): Boolean = (pattern, list) match {
    case (x :: xs, y :: ys) if x == y => compare(xs, ys)
    case (Nil, _) => true // the whole pattern matched a prefix of list
    case _ => false
  }
  var processed: List[Op] = Nil
  var unprocessed: List[Op] = input
  val patternLength = pattern.length
  val reversedReplacement = replacement.reverse
  // Do this until we finish processing the whole sequence
  while (unprocessed.nonEmpty) {
    // This inside algorithm would be better if replaced by KMP
    // Quickly process non-matching sequences
    while (unprocessed.nonEmpty && unprocessed.head != pattern.head) {
      processed ::= unprocessed.head
      unprocessed = unprocessed.tail
    }
    if (unprocessed.nonEmpty) {
      if (compare(pattern, unprocessed)) {
        processed :::= reversedReplacement
        unprocessed = unprocessed drop patternLength
      } else {
        processed ::= unprocessed.head
        unprocessed = unprocessed.tail
      }
    }
  }
  processed.reverse
}
You may gain speed by using KMP, particularly if the pattern searched for is long.
Now, what is the problem with this algorithm? The problem is that it won't test if the replaced pattern causes a match before that position. For instance, if I replace ACB with C, and I have an input AACBB, then the result of this algorithm will be ACB instead of C.
To avoid this problem, you should add backtracking. First, you check at which position in your pattern the replacement may happen:
val positionOfReplacement = pattern.indexOfSlice(replacement)
Then, you modify the replacement part of the algorithm like this:
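      if (compare(pattern, unprocessed)) {
        if (positionOfReplacement > 0) {
          // Drop the matched pattern, put the replacement back in front of the
          // unprocessed input and step back over the last positionOfReplacement
          // processed elements (processed is kept in reverse order, hence reverse)
          unprocessed = replacement ::: (unprocessed drop patternLength)
          unprocessed = (processed take positionOfReplacement).reverse ::: unprocessed
          processed = processed drop positionOfReplacement
        } else {
          processed :::= reversedReplacement
          unprocessed = unprocessed drop patternLength
        }
      } else {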
This will backtrack enough to solve the problem.
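A self-contained sketch of the whole loop, under stated assumptions: it is generic in the element type, it assumes a non-empty pattern and a replacement shorter than the pattern (as with Push() ~ Pop() ~> Nop()), so every rewrite shrinks the list and the loop terminates. As a variation on the indexOfSlice step above, it always steps back over patternLength - 1 processed elements, since any new match that overlaps the inserted replacement has to start within that window. The names SeqRewrite, startsWith and replaceAll are hypothetical.

```scala
object SeqRewrite {
  // True if `pattern` is a prefix of `list`.
  @annotation.tailrec
  private def startsWith[A](list: List[A], pattern: List[A]): Boolean = (pattern, list) match {
    case (Nil, _)                       => true
    case (p :: ps, x :: xs) if p == x   => startsWith(xs, ps)
    case _                              => false
  }

  // Rewrites occurrences of `pattern` into `replacement`, rescanning around
  // each rewrite so that matches created by the replacement are also found.
  def replaceAll[A](input: List[A], pattern: List[A], replacement: List[A]): List[A] = {
    require(pattern.nonEmpty && replacement.length < pattern.length)
    val back = pattern.length - 1
    var processed: List[A] = Nil // already scanned, kept in reverse order
    var unprocessed: List[A] = input
    while (unprocessed.nonEmpty) {
      if (startsWith(unprocessed, pattern)) {
        // Remove the match, put the replacement in front of the remaining input
        // and move up to `back` processed elements back so they are rescanned.
        val stepBack = processed.take(back)
        unprocessed = stepBack.reverse ::: replacement ::: unprocessed.drop(pattern.length)
        processed = processed.drop(back)
      } else {
        processed = unprocessed.head :: processed
        unprocessed = unprocessed.tail
      }
    }
    processed.reverse
  }
}
```

With the example above, `SeqRewrite.replaceAll("AACBB".toList, "ACB".toList, "C".toList)` yields `List('C')`, and `List("Push", "Pop", "Nop")` with pattern `List("Push", "Pop")` and replacement `List("Nop")` yields `List("Nop", "Nop")`. The worst case is still quadratic; KMP or a DFA, as discussed next, would be the way to improve on it.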
This algorithm won't deal efficiently, however, with multiple patterns at the same time, which I guess is where you are going. For that, you'll probably need some adaptation of KMP to do it efficiently, or otherwise use a DFA to control possible matchings. It gets even worse if you want to match both AB and ABC.
In practice, the full-blown problem is equivalent to regex match & replace, where the replacement is a function of the match. Which means, of course, you may want to start looking into regex algorithms.
EDIT
I was forgetting to complete my reasoning. If that technique doesn't work for some reason, then my advice is going with an immutable tree-based vector. Tree-based vectors enable replacement of partial sequences with low amount of copying.
And if that doesn't do, then the solution is doubly linked lists. And pick one from a library with slice replacement -- otherwise you may end up spending way too much time debugging a known but tricky algorithm.