IndexedSeq.last complexity

IndexedSeq.last complexity - performance

When working with indexed collections (most often immutable Vectors) I am often using coll.last as what I supposed to be a convenient short-cut to coll(coll.size-1). When randomly inspecting my sources, I have clicked to see the last implementation, and the IntelliJ IDE took me to TraversableLike.last implementation, which traverses all elements to eventually reach the last one.
This was a surprise to me, and I am not sure now what is the reason for this. Is last really implemented this way? Is there some reason preventing last to be implemented for IndexedSeq (or perhaps for IndexedSeqLike) efficiently?
(Scala SDK used is 2.11.4)

IndexedSeq does not override last (it only inherits it from TraversableLike) - the fact that a particular sequence supports indexed access does not necessarily make indexed lookups faster than traversals. However, such optimized implementations are given in IndexedSeqOptimized, which I would expect many implementations to inherit from. In the specific case of Vector, last is overridden explicitly in the class itself.

IndexedSeq has constant access time for the arbitrary element. LinearSeq has linear time. TraversableLike is just common interface and you may find that it's overriden inside IndexedSeqOptimized trait:
A template trait for indexed sequences of type IndexedSeq[A] which
optimizes the implementation of several methods under the
assumption of fast random access.
def last: A = if (length > 0) this(length - 1) else super.last
You may also find the quick random access implementation inside Vector.getElem - it uses a tree of arrays with high branching factor, so usually it's O(1) for apply. It doesn't use IndexedSeqOptimized, but it has its own overriden last:
override /*TraversableLike*/ def last: A = {
if (isEmpty) throw new UnsupportedOperationException("empty.last")
apply(length-1)
}
So it's a little mess inside Scala collections, which is very common for Scala internals. Anyway last on IndexedSeqs is O(1) de facto, regardless such tricky collections architecture.
The Scala collections intricacy is actually an active topic. A talk (and slides) with Scala's collection framework criticism may be found at Paul Phillips: Scala Collections: Why Not?, and Paul Phillips is developing his alternate version of std.

Related

Better or Faster: String.Contains vs List.Contains

I know this is a stupid question, but I feel like someone might want to know (or inform a <redacted> co-worker about) this. I am not attaching a specific programming language to this since I think it could apply to all of them. Correct me if I am wrong about that.
Main question: Is it faster and/or better to look for entries in a constant String or in a List<String>?
Details: Let's say that I want to see if a given extension is in a list of supported extensions. Which of the following is the best (in regards to programming style) and/or fastest:
const String SUPPORTED = ".exe.bin.sh.png.bmp" /*etc*/;
public static bool IsSupported(String ext){
ext = Normalize(ext);//put the extension in some expected state like lowercase
//String.Contains
return SUPPORTED.Contains(ext);
}
final static List<String> SUPPORTED = MakeAnImmutableList(".exe", ".bin", ".sh",
".png",".bmp" /*etc*/);
public static bool IsSupported(String ext){
ext = Normalize(ext);//put the extension in some expected state like lowercase
//List.Contains
return SUPPORTED.Contains(ext);
}

First, it is important to note that the solutions are not functionally equivalent. The substring search will return true for strings like x and exe.bin while the List<String>.contains() will not. In that sense the List<String> version is likely to be closer to the semantic you want. Any possible performance comparison should keep that in mind.
Now, on to performance.
Theoretical
From an asymptotic and algorithm complexity point of view, the List<String>.contains() approach will be faster than the other alternative as the length of the strings grows. Conceptually, the String.contains version needs to look for a match at each position in the SUPPORTED String, while the List.contains() version only needs to match starting at the start of each substring - as soon as finds a mismatch in the current candidate, it skips to the next. This is related to the above note that the options aren't functionally equivalent: the String.contains options can in theory match a much wider universe of inputs, so it has to do more work before rejecting candidates.
Complexity-wise, this difference could be something like O(N) for List.contains() versus O(N^2) for String.contains() if you take N to be the number of candidates and assume each candidate has a bounded length, and to the String.contains() method in the usual brute-force "look for a match starting at each position" algorithm. As it turns out, the Java String.contains() implementation isn't exactly doing the basic O(N^2) search, but it isn't doing Boyer-Moore either. In general you can expect that once the substrings get long enough, the List.String approach will be faster.
Close(r) to the Metal
At a closer to the metal perspective, both approaches have their advantages. The String.contains() solution avoids the overhead of iterating over the List elements: the entire call will be spent in the intrinsified String.contains implementation, and all the chars making up the entire SUPPORTED Strings are contiguous, so this is memory-friendly. The List.contains() approach will spend a lot of time doing the double-dereferencing needed to go from each List element to the contained String and then to the contained char[] array, and this is likely to dominate if the strings you are comparing against are very short.
On other hand, the List.contains solution ultimately calls into String.equals which is likely implemented in terms of Arrays.equal(char[], char[]) which is heavily optimized with SSE and AVX intrinsics on x86 platforms and likely to be blazing fast, even compared to the optimized version of String.contains(). So if the Strings become long, again expect List.contains() to pull ahead.
All that said, there is a simple, canonical way to do this quickly: a HashSet<String> with all the candidate strings. That's just a simple String.hash() (which is cached and so often "free") and a single lookup in the hash table.

Well, it can vary from implementation to implementation, but if want to look to this problem in a generalized way, lets see.
If you want to look for a specific sub-string inside a string, let say a file extension inside a immutable string containing different extensions, you just need to traverse the string with extensions once.
In the other hand, with a list of immutable strings, still need to traverse each one of the string in that list plus the overhead of iterate over that list.
As a conclusion, in a generalized way, you can see that using a list to store the strings need more processing.
But, you can look to both solutions by its readability, maintainability, etc. For example, if you want to add or remove new extensions or apply more complex operations, may worth the overhead using a list of string.

How does JIT optimize branching while processing elements of collections? (in Scala)

This is a question about performance of code written in Scala.
Consider the following two code snippets, assume that x is some collection containing ~50 million elements:
def process(x: Traversable[T]) = {
processFirst x.head
x reduce processPair
processLast x.last
}
Versus something like this (assume for now we have some way to determine if we're operating on the first element versus the last element):
def isFirstElement[T](x: T) = ???
def isLastElement[T](x: T) = ???
def process(x: Traversable[T]) = {
x reduce {
(left, right) =>
if (isFirstElement(left)
processFirst(left)
else if (isLastElement(right))
processLast(right)
processPair(left, right)
}
}
Which approach is faster? and for ~50 million elements, how much faster?
It seems to me that the first example would be faster because there are fewer conditional checks occurring for all but the first and last elements. However for the latter example there is some argument to suggest that the JIT might be clever enough to optimize away those additional head/last conditional checks that would otherwise occur for all but the first/last elements.
Is the JIT clever enough to perform such operations? The obvious advantage of the latter approach is that all business can be placed in the same function body while in the latter case business must be partitioned into three separate function bodies invoked separately.
** EDIT **
Thanks for all the great responses. While I am leaving the second code snippet above to illustrate its incorrectness, I want to revise the first approach slightly to reflect better the problem I am attempting to solve:
// x is some iterator
def process(x: Iterator[T]) = {
if (x.hasNext)
{
var previous = x.next
var current = null
processFirst previous
while(x.hasNext)
{
current = x.next
processPair(previous, current)
previous = current
}
processLast previous
}
}
While there are no additional checks occurring in the body, there is an additional reference assignment that appears to be unavoidable (previous = current). This is also a much more imperative approach that relies on nullable mutable variables. Implementing this in a functional yet high performance manner would be another exercise for another question.
How does this code snippet stack-up against the last of the two examples above? (the single-iteration block approach containing all the branches). The other thing I realize is that the latter of the two examples is also broken on collections containing fewer than two elements.

If your underlying collection has an inexpensive head and last method (not true for a generic Traversable), and the reduction operations are relatively inexpensive, then the second way takes about 10% longer (maybe a little less) than the first on my machine. (You can use a var to get first, and you can keep updating a second far with the right argument to obtain last, and then do the final operation outside of the loop.)
If you have an expensive last (i.e. you have to traverse the whole collection), then the first operation takes about 10% longer (maybe a little more).
Mostly you shouldn't worry too much about it and instead worry more about correctness. For instance, in a 2-element list your second code has a bug (because there is an else instead of a separate test). In a 1-element list, the second code never calls reduce's lambda at all, so again fails to work.
This argues that you should do it the first way unless you're sure last is really expensive in your case.
Edit: if you switch to a manual reduce-like-operation using an iterator, you might be able to shave off up to about 40% of your time compared to the expensive-last case (e.g. list). For inexpensive last, probably not so much (up to ~20%). (I get these values when operating on lengths of strings, for example.)

First of all, note that, depending on the concrete implementation of Traversable, doing something like x.last may be really expensive. Like, more expensive than all the rest of what's going on here.
Second, I doubt the cost of conditionals themselves is going to be noticeable, even on a 50 million collection, but actually figuring out whether a given element is the first or the last, might again, depending on implementation, get pricey.
Third, JIT will not be able to optimize the conditionals away: if there was a way to do that, you would have been able to write your implementation without conditionals to begin with.
Finally, if you are at a point where it starts looking like an extra if statement might affect performance, you might consider switching to java or even "C". Don't get me wrong, I love scala, it is a great language, with lots of power and useful features, but being super-fast just isn't one of them.

Efficiency of appending to vectors

Appending an element onto a million-element ArrayList has the cost of setting one reference now, and copying one reference in the future when the ArrayList must be resized.
As I understand it, appending an element onto a million-element PersistenVector must create a new path, which consists of 4 arrays of size 32. Which means more than 120 references have to be touched.
How does Clojure manage to keep the vector overhead to "2.5 times worse" or "4 times worse" (as opposed to "60 times worse"), which has been claimed in several Clojure videos I have seen recently? Has it something to do with caching or locality of reference or something I am not aware of?
Or is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?
I have tagged the question scala as well, since scala.collection.immutable.vector is basically the same thing, right?

Clojure's PersistentVector's have special tail buffer to enable efficient operation at the end of the vector. Only after this 32-element array is filled is it added to the rest of the tree. This keeps the amortized cost low. Here is one article on the implementation. The source is also worth a read.
Regarding, "is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?", yes! These are known as transients in Clojure, and are used for efficient batch changes.

Cannot tell about Clojure, but I can give some comments about Scala Vectors.
Persistent Scala vectors (scala.collection.immutable.Vectors) are much slower than an array buffer when it comes to appending. In fact, they are 10x slower than the List prepend operation. They are 2x slower than appending to Conc-trees, which we use in Parallel Collections.
But, Scala also has mutable vectors -- they're hidden in the class VectorBuilder. Appending to mutable vectors does not preserve the previous version of the vector, but mutates it in place by keeping the pointer to the rightmost leaf in the vector. So, yes -- keeping the vector mutable internally, and than returning an immutable reference is exactly what's done in Scala collections.
The VectorBuilder is slightly faster than the ArrayBuffer, because it needs to allocate its arrays only once, whereas ArrayBuffer needs to do it twice on average (because of growing). Conc.Buffers, which we use as parallel array combiners, are twice as fast compared to VectorBuilders.
Benchmarks are here. None of the benchmarks involve any boxing, they work with reference objects to avoid any bias:
comparison of Scala List, Vector and Conc
comparison of Scala ArrayBuffer, VectorBuilder and Conc.Buffer
More collections benchmarks here.
These tests were executed using ScalaMeter.

Scala performance question

In the article written by Daniel Korzekwa, he said that the performance of following code:
list.map(e => e*2).filter(e => e>10)
is much worse than the iterative solution written using Java.
Can anyone explain why? And what is the best solution for such code in Scala (I hope it's not a Java iterative version which is Scala-fied)?

The reason that particular code is slow is because it's working on primitives but it's using generic operations, so the primitives have to be boxed. (This could be improved if List and its ancestors were specialized.) This will probably slow things down by a factor of 5 or so.
Also, algorithmically, those operations are somewhat expensive, because you make a whole list, and then make a whole new list throwing a few components of the intermediate list away. If you did it in one swoop, then you'd be better off. You could do something like:
list collect (case e if (e*2>10) => e*2)
but what if the calculation e*2 is really expensive? Then you could
(List[Int]() /: list)((ls,e) => { val x = e*2; if (x>10) x :: ls else ls }
except that this would appear backwards. (You could reverse it if need be, but that requires creating a new list, which again isn't ideal algorithmically.)
Of course, you have the same sort of algorithmic problems in Java if you're using a singly linked list--your new list will end up backwards, or you have to create it twice, first in reverse and then forwards, or you have to build it with (non-tail) recursion (which is easy in Scala, but inadvisable for this sort of thing in either language since you'll exhaust the stack), or you have to create a mutable list and then pretend afterwards that it's not mutable. (Which, incidentally, you can do in Scala also--see mutable.LinkedList.)

Basically it's traversing a list twice. Once for multiplying every element with two. And then another time to filter the results.
Think of it as one while loop producing a LinkedList with the intermediate results. And then another loop applying the filter to produce the final results.
This should be faster:
list.view.map(e => e * 2).filter(e => e > 10).force

The solution lies mostly with JVM. Though Scala has a workaround in the figure of #specialization, that increases the size of any specialized class hugely, and only solves half the problem -- the other half being the creation of temporary objects.
The JVM actually does a good job optimizing a lot of it, or the performance would be even more terrible, but Java does not require the optimizations that Scala does, so JVM does not provide them. I expect that to change to some extent with the introduction of SAM not-real-closures in Java.
But, in the end, it comes down to balancing the needs. The same while loop that Java and Scala do so much faster than Scala's function equivalent can be done faster yet in C. Yet, despite what the microbenchmarks tell us, people use Java.

Scala approach is much more abstract and generic. Therefore it is hard to optimize every single case.
I could imagine that HotSpot JIT compiler might apply stream- and loop-fusion to the code in the future if it sees that the immediate results are not used.
Additionally the Java code just does much more.
If you really just want to mutate over a datastructure, consider transform.
It looks a bit like map but doesn't create a new collection, e. g.:
val array = Array(1,2,3,4,5,6,7,8,9,10).transform(_ * 2)
// array is now WrappedArray(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
I really hope some additional in-place operations will be added ion the future...

To avoid traversing the list twice, I think the for syntax is a nice option here:
val list2 = for(v <- list1; e = v * 2; if e > 10) yield e

Rex Kerr correctly states the major problem: Operating on immutable lists, the stated piece of code creates intermediate lists in memory. Note that this is not necessarily slower than equivalent Java code; you just never use immutable datastructures in Java.
Wilfried Springer has a nice, Scala idomatic solution. Using view, no (manipulated) copies of the whole list are created.
Note that using view might not always be ideal. For example, if your first call is filter that is expected to throw away most of the list, is might be worthwhile to create the shorter version explicitly and use view only after that in order to improve memory locality for later iterations.

list.filter(e => e*2>10).map(e => e*2)
This attempt reduces first the List. So the second traversing is on less elements.

Linq to Objects: filtering performance question

I was thinking about the way linq computes and it made me wonder:
If I write
var count = collection.Count(o => o.Category == 3);
Will that perform any differently than:
var count = collection.Where(o => o.Category == 3).Count();
After all, IEnumerable<T>.Where() will return IEnumerable<T> which doesn't implement Count property, so a subsequent Count() would actually have to walk through the items to determine the count which should cause extra time being spent on this.
I wrote some quick test code to get some metrics but they seem to beat each other at random. I won't put in the test code here initially, but if anyone requests, I'll get it in.
So, am I missing something?

There won't be a lot in it, really - both forms will iterate over the collection, check the predicate against each item, and count the matches. Both approaches will stream the data - it's not like Where is actually building an in-memory list of all matches, for example.
The first form has one fewer (thin) layer of indirection in, that's all. The main reason for using it (IMO) is for readability/simplicity, rather than performance.

As Jon Skeet says, both techniques will have to essentially do the same thing - enumerate the sequence while conditionally incrementing a counter when the predicate is matched. Any performance differences between the two should be slight: insignificant for almost all use-cases. If there is a token winner though, I would think it should be the first one, since from reflector it appears that the overload ofCountthat takes a predicate uses its ownforeachto enumerate rather than the more obvious way of offloading the work to a streaming aWhereinto a parameterlessCountas in your second example. This means technique #1 is likely to have two minor performance benefits:
Fewer argument validation (null-tests etc.) checks. Technique #2's Count will also check if its (piped) input is an ICollection or ICollection<T> , which it can't possibly be.
A single constructed enumerator vs two enumerators piped together (an additional state-machine has costs).
There is one minor in favour of technique #2 point though:Whereis slightly more sophisticated in constructing an enumerator for the source-sequence; it uses a different one for lists and arrays. This may make it more performant in certain scenarios.
Of course, I should reiterate that I might be plain wrong about my analysis - reasoning about performance through static code analysis, especially when the differences are likely to be slight, is not a good idea. There is only one way to find out - measuring the execution times for your specific setup.
FYI, the source I reflected was from .NET 3.5 SP1.

I know what you are thinking here. At least, I think I do; Count() will look to see if Count is available as a property, and will simply return that if so. Otherwise, it has to enumerate the items to get its return value.
The version of Count() which accepts the predicate, though, will always cause the collection to be iterated, since it has to do it to see which ones match.

Above answers make good points, consider also that if you break away into any Linq-To-X implementations that deferred execution (Linq to Sql being the primary), the Expression parameters used in these methods may cause different results.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio