Find bottleneck in Scala merge algorithm - performance

I am learning Scala, and as a starting point I am trying to write a mergeSort algorithm. I am having a problem with the performance of the merge part of it.
I know that there are other implementations on this site, but I would like to know why mine is not performing well.
This is my code:
import scala.annotation.tailrec

@tailrec
def merge(l1: List[Int], l2: List[Int], acc: List[Int]): List[Int] = {
  if (l1.isEmpty || l2.isEmpty) l1 ++ l2 ++ acc
  else if (l1.last > l2.last) merge(l1.init, l2, l1.last :: acc)
  else merge(l1, l2.init, l2.last :: acc)
}
val a1 = List(1,4,65,52151)
val a2 = List(2,52,124,5251,124125125)
println(merge(a1, a2, List()))
As you can see, the merge function is tail recursive and (if I am not wrong) the list methods that I am using should take constant time.
The code gets very slow with a list of 100000 elements.

last and init are terribly expensive on List: O(N). The efficient operations are head and tail: O(1). You need to work at the beginning of the list; if your algorithm wants to work at the end, reverse the lists up front (O(N), but done just once rather than at each iteration), or reverse your output at the end.

The best way to find bottlenecks is with a profiler. I understand NetBeans has a free one; if you can get JProfiler or YourKit they're very nice to use. In this specific case I'd point out that last and init are O(n), because List is a (singly) linked list.

Related

Writing Bubble Sort using Foldl or Foldr in SML

Is there a way to implement bubble sort using the Foldl or Foldr methods available in SML? Any guidance would be helpful.
I just wrote an implementation in OCaml to demonstrate the technique to my own satisfaction.
I broke the sort process into two parts. One is a compare-and-swap function that is called via fold_left (foldl). This function has the type (with the bool being whether a swap has occurred in this scan):
bool * 'a list -> 'a -> bool * 'a list
Each time it runs it does a swap if appropriate, building up a new list in its result that is composed in reverse order from the input. (This is necessary because of foldl's left-to-right, tail-recursive behavior.) It also keeps track of whether any swaps were made in this scan of the list (necessary so we know when to stop sorting).
The other function is recursive and simply keeps invoking the scan until no change is made. This function also has a boolean that it toggles on each call to keep track of whether the list is currently reversed. When it sees that no swap was made in the latest scan, then it returns the resulting list. If the list is currently reversed then it reverses it one last time before returning it.
This is the type of the second function (with the bool here being whether the list is currently reversed):
bool -> 'a list -> 'a list
It should be equally possible to write a bubble sort that uses foldr. It won't be tail-recursive (because foldr is not), and since it scans the list from right-to-left, you won't have to deal with the reversing issue that you have with foldl.
I know it is too late to answer your question, but hopefully this will help:
fun bsort [] = []
  | bsort [x] = [x]
  | bsort (x::y::xs) =
      if y < x then
        y :: bsort (x::xs)
      else
        x :: bsort (y::xs);

fun bubblesort [] = []
  | bubblesort (x::xs) = bsort (x :: bubblesort xs);
Remember, we have to do the bubble sort till the list is completely sorted.
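For example, bubblesort [3,1,2] first sorts the tail to [1,2], and then the outer bsort pass bubbles 3 past both elements, giving [1,2,3].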

Benefits of differential lists with lazy evaluation

I struggle to understand why ++ is considered O(n) while differential lists are considered "O(1)".
In case of ++ let's assume it's defined as:
(++) :: [a] -> [a] -> [a]
(a:as) ++ b = a:(as ++ b)
[] ++ b = b
Now if we need to access the first element of a ++ b we can do it in O(1) (assuming that a can be brought to HNF in one step), and similarly the second, etc. This changes when multiple lists are appended, becoming Ω(1)/O(m), where m is the number of unevaluated appends. Accessing the last element can be done in Θ(n + m), where n is the length of the list, unless I missed something. If we have a differential list we also have access to the first element in Θ(m), while the last element is in Θ(n + m).
What do I miss?
Performance in theory
The O(1) refers to the fact that append for DLists is just (.), which takes one reduction, whereas (++) is O(n).
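To make that concrete, here is a minimal sketch of the representation (the real Data.DList wraps roughly this in a newtype; the names below are illustrative):

-- A difference list is a function that prepends its contents to whatever
-- list it is eventually applied to.
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)

toList :: DList a -> [a]
toList f = f []      -- apply the whole function chain to [] once, at the end

append :: DList a -> DList a -> DList a
append = (.)         -- a single reduction, regardless of the lists' lengths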
Worst case
++ has quadratic performance when you use it to repeatedly add to the end of an existing string, because each time you add another list you iterate through the existing list, so
"Existing long ...... answer" ++ "newbit"
traverses "Existing long ....... answer" each time you append a new bit.
On the other hand,
("Existing long ..... answer" ++ ) . ("newbit"++)
is only going to actually traverse "Existing long ...... answer" once, when the function chain is applied to [] to convert to a list.
Experience says
Years ago, when I was a young Haskeller, I wrote a program that was searching for a counterexample to a conjecture, so it was outputting data to disk constantly until I stopped it. Except that once I took off the testing brakes, it output precisely nothing, because of my left-associative, tail-recursive build-up of a string: I realised my program was insufficiently lazy - it couldn't output anything until it had appended the final string, but there was no final string! I rolled my own DList (this was in the millennium preceding the one in which the DList library was written), and lo, my program ran beautifully and happily churned out reams and reams of non-counterexamples on the server for days until we gave up on the project.
If you mess with large enough examples, you can see the performance difference, but it doesn't matter for small finite output. It certainly taught me the benefits of laziness.
Toy example
Silly example to prove my point:
plenty f = f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f.f
alot f = plenty f.plenty f.plenty f
Let's do the two sorts of appending, first the DList way
compose f = f . ("..and some more.."++)
append xs = xs ++ "..and some more.."
insufficiently_lazy = alot append []
sufficiently_lazy = alot compose id []
gives:
ghci> head $ sufficiently_lazy
'.'
(0.02 secs, 0 bytes)
ghci> head $ insufficiently_lazy
'.'
(0.02 secs, 518652 bytes)
and
ghci> insufficiently_lazy
-- (much output skipped)
..and some more....and some more....and some more.."
(0.73 secs, 61171508 bytes)
ghci> sufficiently_lazy
-- (much output skipped)
..and some more....and some more....and some more.."
(0.31 secs, 4673640 bytes).
-- less than a tenth the space and half the time
so it's faster in practice as well as in theory.
DLists are often most useful if you're repeatedly appending list fragments. To wit,
foldl1 (++) [a,b,c,d,e] == (((a ++ b) ++ c) ++ d) ++ e
is really bad while
foldr1 (++) [a,b,c,d,e] == a ++ (b ++ (c ++ (d ++ e)))
still takes only n steps to reach the nth position. Unfortunately, you often build strings by traversing a structure and appending to the end of the accumulating string, so the left-fold scenario isn't uncommon. For this reason, DLists are most useful in situations where you're repeatedly building up a string, as in the Blaze/ByteString Builder libraries.
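As an illustration of that pattern (a made-up sketch, not part of the original answer), compare rendering a tree by appending with (++) against composing (String -> String) functions:

data Tree = Leaf | Node Tree Int Tree

-- Appending: for left-leaning trees the already-built prefix is
-- re-traversed at every Node, which is the quadratic left-fold scenario.
renderApp :: Tree -> String
renderApp Leaf         = ""
renderApp (Node l x r) = renderApp l ++ show x ++ " " ++ renderApp r

-- DList style: compose functions and apply the chain to "" once at the
-- end, so each character of the output is produced only once.
renderDL :: Tree -> String
renderDL t = go t ""
  where
    go Leaf         = id
    go (Node l x r) = go l . (show x ++) . (' ' :) . go r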
[After further thinking and reading other answers I believe I know what went wrong - but I don't think either explained it fully so I'm adding my own.]
Assume you had the lists a1:a2:[], b1:b2:[] and c1:c2:[]. Now you append them: (a ++ b) ++ c. That gives:
(a1:a2:[] ++ b1:b2:[]) ++ c1:c2:[]
Now to take the head you need O(m) steps, where m is the number of appends. This leaves thunks as follows:
a1:((a2:[] ++ b1:b2:[]) ++ c1:c2:[])
To get the next element you need to perform m or m-1 steps (I assumed this to be free in my reasoning). So after 2m or 2m-1 steps the view is as follows:
a1:a2:(([] ++ b1:b2:[]) ++ c1:c2:[])
And so on. In the worst case it gives m*n time to traverse the list, as the thunks are traversed each time.
EDIT - it looks like the answers to the duplicate have even better pictures.

Optimize "list" indexing in Haskell

Say you have a very deterministic algorithm that produces a list, like inits in Data.List. Is there any way that a Haskell compiler can optimally perform an "indexing" operation on this algorithm without actually generating all the intermediate results?
For example, inits [1..] !! 10000 is pretty slow. Could a compiler somehow deduce what inits would produce as its 10000th element without any recursion, etc.?
Edit: While inits [1..] !! 10000 is constant, I am wondering about any "index-like" operation on some algorithm. For example, could \i -> inits [1..] !! i be optimized such that no [or minimal] recursion is performed to reach the result for any i?
Yes and no. If you look at the definition for Data.List.inits:
inits :: [a] -> [[a]]
inits xs = [] : case xs of
                  []      -> []
                  x : xs' -> map (x :) (inits xs')
you'll see that it's defined recursively. That means that each element of the resulting list is built on the previous element of the list. So if you want any nth element, you have to build all n-1 previous elements.
Now you could define a new function
inits' xs = [] : [take n xs | (n, _) <- zip [1..] xs]
which has the same behavior. If you try to take inits' [1..] !! 10000, it finishes very quickly because the successive elements of the list do not depend on the previous ones. Of course, if you were actually trying to generate a list of inits instead of just a single element, this would be much slower.
The compiler would have to know a lot of information to be able to optimize away recursion from a function like inits. That said, if a function really is "very deterministic", it should be trivial to rewrite it in a non-recursive way.

Breaking lists at index

I have a performance question today.
I am making a (Haskell) program and, when profiling it, I saw that most of the time is spent in the function below. Its purpose is to take the nth element of a list and return that element together with the list without it. My current (slow) definition is as follows:
breakOn :: Int -> [a] -> (a,[a])
breakOn 1 (x:xs) = (x, xs)
breakOn n (x:xs) = (y, x:ys)
  where
    (y, ys) = breakOn (n-1) xs
The Int argument is known to be in the range 1..n, where n is the length of the (never null) list (x:xs), so the function never raises an error.
However, I get poor performance here. My first guess is that I should change lists for another structure. But before I start picking different structures and testing code (which would take me a lot of time), I wanted to ask here for a third-party opinion. Also, I'm pretty sure that I'm not doing it in the best way. Any pointers are welcome!
Please, note that the type a may not be an instance of Eq.
Solution
I adapted my code to use Sequences from the Data.Sequence module. The result is here:
import Data.Sequence (Seq)
import qualified Data.Sequence as S

breakOn :: Int -> Seq a -> (a, Seq a)
breakOn n xs = (S.index zs 0, ys <> S.drop 1 zs)
  where
    (ys, zs) = S.splitAt (n-1) xs
However, I am still open to further suggestions for improvement!
Yes, this is inefficient. You can do a bit better by using splitAt (which unboxes the number during the recursive bit), a lot better by using a data structure with efficient splitting, e.g. a fingertree, and best by massaging the context to avoid needing this operation. If you post a bit more context, it may be possible to give more targeted advice.
Prelude functions are generally pretty efficient. You could rewrite your function using splitAt, as so:
breakOn :: Int -> [a] -> (a,[a])
breakOn n xs = (z, ys ++ zs)
  where
    (ys, z:zs) = splitAt (n-1) xs
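For example, with this definition breakOn 3 "abcde" evaluates to ('c',"abde").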

is this implementation of merge sort good?

I just started learning Haskell last night and I've never used a functional programming language before.
I just want to know if my implementation of merge sort is good or bad, and what exactly is good or bad about it.
Maybe it's even wrong - well, it does sort, but maybe the algorithm is not what I think merge sort is.
Just tell me everything I could improve here. I myself think it's a pretty clear and simple implementation.
Thanks for your advice, here's the code :)
merge [] ys = ys
merge xs [] = xs
merge xs ys = sorted : merge left right
  where
    sorted = if head xs < head ys then head xs else head ys
    left   = if head xs <= head ys then tail xs else xs
    right  = if head xs > head ys then tail ys else ys

msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
  where
    left  = take (div (length xs) 2) xs
    right = drop (div (length xs) 2) xs
Well, first of all, we can rewrite merge to be a little more elegant using pattern matching
merge [] ys = ys
merge xs [] = xs
merge xs@(x:xs1) ys@(y:ys1)
  | x <= y    = x : merge xs1 ys
  | otherwise = y : merge xs ys1
In general you should avoid using head and tail since they are a bit unsafe (they raise an error for the empty list) and use pattern matching whenever possible.
The implementation of msort is pretty much spot on, except that we can split the list in a more efficient way. That's because length xs takes O(N) to complete. The compiler might save you and cache the result of the length call so that the second call to length won't traverse the list again, but the take and drop will pretty much cause another two traversals, thus splitting the list using three traversals, which may prove to be expensive. We can do better by splitting the list into two lists: the first containing the elements at the odd positions and the second the elements at the even positions, like so:
msort [] = []
msort [x] = [x]
msort xs = merge (msort first) (msort second)
  where
    (first, second) = splitInHalves xs

splitInHalves [] = ([], [])
splitInHalves [x] = ([x], [])
splitInHalves (x:y:xs) =
  let (xs1, ys1) = splitInHalves xs
  in  (x:xs1, y:ys1)
This gets you the same merge sort in O(N log N) time. It feels different because you would probably implement it in place (by modifying the original list) in an imperative language such as C. This version is slightly more costly in memory, but it has its advantages: it is easier to reason about, so it is more maintainable, and it is also very easy to parallelize without being concerned with anything except the algorithm itself, which is exactly what a good programming language should provide for the developers that use it.
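As a rough sketch of that last point (not part of the original answer): using par and pseq from the parallel package together with the merge and splitInHalves defined above, the two halves can be sorted in parallel. Note that par only sparks evaluation to weak head normal form, so this shows the structure of the idea rather than a tuned parallel sort.

import Control.Parallel (par, pseq)

msortPar :: Ord a => [a] -> [a]
msortPar []  = []
msortPar [x] = [x]
msortPar xs  = left `par` (right `pseq` merge left right)
  where
    (first, second) = splitInHalves xs
    left            = msortPar first
    right           = msortPar second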
EDIT 1 :
If the syntax is a bit much, here are some resources:
Pattern Matching - the bit with the @ symbol is called an as-pattern. You'll find it in there
let is a keyword used to declare a variable to be used in the expression that follows it (whereas where binds a variable in the expression that precedes it). More on Haskell syntax, including guards (the things with | condition = value) can be found here, in this chapter of Learn You a Haskell
EDIT 2 :
@is7s proposed a far more concise version of splitInHalves using the foldr function:
splitInHalves = foldr (\x (l,r) -> (x:r,l)) ([],[])
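For example, splitInHalves [1,2,3,4] evaluates to ([1,3],[2,4]).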
EDIT 3 :
Here is another answer which provides an alternative implementation of merge sort, which also has the property of being stable:
Lazy Evaluation and Time Complexity
Hope this helps and welcome to the wonderful world of Functional Programming !
