Haskell performance : Inversion count algorithm - performance

I have decided to solve the first programming assignment from the Stanford algorithms course https://class.coursera.org/algo-005 using Haskell. Despite being very new to the language, I implemented it much faster than I would have in C++. I have 6+ years of work experience in C++, so that impressed me a bit. But the performance is disappointing: 0.19 sec (C++) vs 9.88 sec (Haskell).
How can I improve the performance of the Haskell implementation so that it is comparable to the C++ one?
Here is my Haskell code:
data SortedList = SortedList {
    inversionCount :: Int,
    list :: [Int]
} deriving (Show)
-- first list accumulator
packm :: Int -> SortedList -> Int -> SortedList
packm x (SortedList count xs) add = SortedList (count + add) (x:xs)
merge2 :: [Int] -> [Int] -> SortedList
merge2 [] xs = SortedList 0 xs
merge2 xs [] = SortedList 0 xs
merge2 xlist@(x:xs) ylist@(y:ys)
    | x < y     = packm x (merge2 xs ylist) 0
    | otherwise = packm y (merge2 xlist ys) $ length xlist
countAndMerge :: SortedList -> SortedList -> SortedList
countAndMerge (SortedList lcount lxs) (SortedList rcount rxs) =
    let merged = merge2 lxs rxs
    in SortedList (lcount + rcount + inversionCount merged) $ list merged
mergesort :: [Int] -> SortedList
mergesort [] = SortedList 0 []
mergesort [x] = SortedList 0 [x]
mergesort xs =
    let leftsorted = mergesort $ take halfElements xs
        rightsorted = mergesort $ drop halfElements xs
    in countAndMerge leftsorted rightsorted
  where halfElements = length xs `div` 2
main = do
    contents <- getContents
    let intlist = [ read x :: Int | x <- lines contents ]
    print $ inversionCount $ mergesort intlist

The biggest problem is that the asymptotic performance isn't right to begin with; it's O(n^2 * log n) rather than the optimal O(n * log n). The culprit is merge2:
| otherwise = packm y (merge2 xlist ys) $ length xlist
length xlist is O(n). Supposing a random input list, we need to compute length xlist on about half of the merge2 calls, thus making one level of merging O(n^2).

| otherwise = packm y (merge2 xlist ys) $ length xlist
This computes length at every other step of the merge on the average. This makes the whole business quadratic.
If you track length of lists not by counting elements, but by passing the count down from the top level, you restore the O(N log N) behaviour. For a list of 100000 elements this means execution time goes down from 20 seconds to 0.45 second (on my machine with -O2).
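As a concrete illustration of that suggestion (my own sketch, with my own names merge2' and mergesortN, not the answerer's code; it reuses the SortedList, packm, inversionCount and list definitions from the question), merge2 can receive the length of its left argument as a parameter, so the count added when an element is taken from the right list is O(1) instead of an O(n) call to length:

merge2' :: Int -> [Int] -> [Int] -> SortedList
merge2' _ [] ys = SortedList 0 ys
merge2' _ xs [] = SortedList 0 xs
merge2' llen xlist@(x:xs) ylist@(y:ys)
    | x <= y    = packm x (merge2' (llen - 1) xs ylist) 0   -- ties go left: equal elements are not inversions
    | otherwise = packm y (merge2' llen xlist ys) llen       -- all llen remaining left elements invert with y

mergesortN :: Int -> [Int] -> SortedList
mergesortN _ []  = SortedList 0 []
mergesortN _ [x] = SortedList 0 [x]
mergesortN n xs  =
    let half              = n `div` 2
        SortedList lc lxs = mergesortN half (take half xs)
        SortedList rc rxs = mergesortN (n - half) (drop half xs)
        merged            = merge2' half lxs rxs
    in  SortedList (lc + rc + inversionCount merged) (list merged)

main would then call mergesortN (length intlist) intlist, so the length of the input is computed once rather than once per merge step.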
Scaling it up further without changing the algorithm is problematic, because it currently runs in linear stack space and cannot cope with 1 million elements under the default RTS options. If you change mergesort to a bottom-up merge-adjacent-pairs version, it is likely to run much better; a sketch follows.
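A hedged sketch of that merge-adjacent-pairs idea (my naming; it assumes the merge2' from the previous sketch and the question's SortedList machinery): start from singleton runs and repeatedly merge neighbouring pairs, tracking each run's size so the left length is always known.

mergesortBU :: [Int] -> SortedList
mergesortBU = snd . go . map (\x -> (1, SortedList 0 [x]))
  where
    go []  = (0, SortedList 0 [])
    go [s] = s
    go ss  = go (mergePairs ss)

    mergePairs ((n, a) : (m, b) : rest) = (n + m, countAndMerge' n a b) : mergePairs rest
    mergePairs rest                     = rest

    -- like countAndMerge, but passing the known size of the left run to merge2'
    countAndMerge' n (SortedList lc lxs) (SortedList rc rxs) =
        let merged = merge2' n lxs rxs
        in  SortedList (lc + rc + inversionCount merged) (list merged)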

Related

permutations acting on lists of length n

In math, when I want to rearrange a list of length n, I'll act on the list with a permutation. For example:
(1 2) * (x, y, z) = (y, x, z)
(1 n 2) * (v[1], v[2], ..., v[n]) = (v[n], v[1], ..., v[2])
perm * (v[1], v[2], ..., v[n]) = ( v[perm(1)], v[perm(2)], ..., v[perm(n)] )
How would I do this in Haskell?
I would use the input permutation to build a map from old indices to new indices.
import Prelude hiding ((*))
import qualified Data.Map as M
infixr 5 * -- right-associative so we can compose permutations conveniently
(*) :: [Int] -> [a] -> [a]
perm * xs = zipWith (\i _ -> xs !! M.findWithDefault i i ixMap) [0..] xs
    where ixMap = M.fromList (zip perm (drop 1 perm ++ take 1 perm))
You can see it in action at the ghci prompt (though, as usual in programming, it uses 0-based rather than 1-based indexing):
> [0,1] * "xyz"
"yxz"
> [0,4,1] * "abcde"
"eacdb"
This costs O(n^2 log m) where n is the length of xs and m is the length of perm. You can reduce this to O(n log(nm)) by switching from (!!) to M.lookup for the indexing into xs, too.
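For instance (a sketch of that refinement under my own name permuteFast, not code from the answer), you can build a second Map from position to element so that the indexing into xs also becomes logarithmic:

import qualified Data.Map as M

permuteFast :: [Int] -> [a] -> [a]
permuteFast perm xs =
    [ valMap M.! M.findWithDefault i i ixMap | i <- [0 .. length xs - 1] ]
  where
    ixMap  = M.fromList (zip perm (drop 1 perm ++ take 1 perm))  -- destination position -> source position
    valMap = M.fromList (zip [0..] xs)                           -- position -> element

This assumes, as the original (*) does, that every index appearing in perm is a valid position in xs.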

Provably correct permutation in less than O(n^2)

Written in Haskell, here is the data type that proves that one list is a permutation of another:
data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
    BelongsHere :: Belongs x xs (x ': xs)
    BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
    PermutationEmpty :: Permutation '[] '[]
    PermutationCons :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys
With a Permutation, we can now permute a record:
data Rec :: (u -> *) -> [u] -> * where
    RNil :: Rec f '[]
    (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)
insertRecord :: Belongs x ys zs -> f x -> Rec f ys -> Rec f zs
insertRecord BelongsHere v rs = v :& rs
insertRecord (BelongsThere b) v (r :& rs) = r :& insertRecord b v rs
permute :: Permutation xs ys -> Rec f xs -> Rec f ys
permute PermutationEmpty RNil = RNil
permute (PermutationCons b pnext) (r :& rs) = insertRecord b r (permute pnext rs)
This works fine. However, permute is O(n^2) where n is the length of the record. I'm wondering if there is a way to get it to be any faster by using a different data type to represent a permutation.
For comparison, in a mutable and untyped setting (which I know is a very different setting indeed), we could apply a permutation to a heterogeneous record like this in O(n) time. You represent the record as an array of values and the permutation as an array of new positions (no duplicates are allowed and all digits must be between 0 and n). Applying the permutation is just iterating that array and indexing into the record's array with those positions.
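In Haskell that O(n) array strategy can be written directly with vectors; a minimal sketch (my own name applyPerm, using Data.Vector.backpermute from the vector package) just to make the comparison concrete:

import qualified Data.Vector as V

-- index i of the result is drawn from index (positions V.! i) of the input
applyPerm :: V.Vector Int -> V.Vector a -> V.Vector a
applyPerm positions values = V.backpermute values positions

-- e.g. applyPerm (V.fromList [2,0,1]) (V.fromList [10,20,30]) yields the vector [30,10,20]

Of course this is untyped in the sense of the question: nothing proves that positions is a valid permutation.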
I don't expect that an O(n) permutation is possible in a more rigorously typed setting. But it seems like O(n*log(n)) might be possible. I appreciate any feedback, and let me know if I need to clarify anything. Also, answers to this can use Haskell, Agda, or Idris, depending on what feels easier to communicate with.
A simple, faster check is to compare the sorted versions of the permutations.
Given permutations A and B,
there exist the sorted permutations
As = sort(A)
Bs = sort(B)
As is a permutation of A and Bs is a permutation of B.
If As == Bs then A is a permutation of B.
Thus the order of this check is O(n log(n)) < O(n²).
And this leads to the optimal solution.
Using a different storage format for permutations yields O(n)
Using the statements from above, we change the storage format of each permutation to hold both
the sorted data
the original unsorted data
To determine whether one list is a permutation of another, a simple comparison of the sorted data is then enough -> O(n).
This answers the question, but the effort is hidden in creating the doubled data storage, so whether this is a real advantage depends on the use case.
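For plain lists, the check described here is a one-liner (a minimal sketch using the standard sort):

import Data.List (sort)

isPermutationOf :: Ord a => [a] -> [a] -> Bool
isPermutationOf a b = sort a == sort b   -- O(n log n); with pre-sorted copies stored, O(n)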

Haskell: head . mergeSort (for min element) in linear time?

In the HaskellWiki https://wiki.haskell.org/Performance/Laziness they introduce the merge-sort function as non-lazy
merge_sort [] = []
merge_sort [x] = [x]
merge_sort lst = let (e,o) = cleave lst
                 in merge (merge_sort e) (merge_sort o) where
    merge :: (Ord a) => [a] -> [a] -> [a]
    merge xs [] = xs
    merge [] ys = ys
    merge xs@(x:t) ys@(y:u)
        | x <= y    = x : merge t ys
        | otherwise = y : merge xs u
since you first have to recursively cleave the list
cleave = cleave' ([],[]) where
    cleave' (eacc,oacc) [] = (eacc,oacc)
    cleave' (eacc,oacc) [x] = (x:eacc,oacc)
    cleave' (eacc,oacc) (x:x':xs) = cleave' (x:eacc,x':oacc) xs
and then, going up the reduction layers, merge these. So a merge sort runs in O(n log n) time. But the composition
min xs = head . merge_sort $ xs
supposedly runs in linear time. I can't see why, as you still have to cleave every sublist until you arrive at the singleton/empty lists and then merge these to guarantee the first element of the returned list is the smallest of all. What am I missing?
But laziness still comes into play with definitions like min xs = head . merge_sort $ xs. In finding the minimal element this way, only the necessary number of comparisons between elements will be performed (O(n), as opposed to the O(n log n) comparisons needed to fully sort the whole list).
You are right that it will have a time complexity of O(n log(n)); however, if you read the above paragraph carefully you will see that it is talking about the number of comparisons. Only O(n) comparisons will be performed, because every merge application only has to produce one element, so it only has to compare the first two elements of its arguments. So you get n/2 comparisons at the leaves of the recursion, plus n/4 one level up, then n/8, ... all the way up to the top level of the recursion. If you work it out, you get n-1 comparisons.
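You can watch this happen by instrumenting the comparison; a small self-contained sketch (my own code, not the wiki's) that prints a line per comparison actually forced:

import Debug.Trace (trace)

msort :: (Show a, Ord a) => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort evens) (msort odds)
  where
    (evens, odds) = foldr (\x (es, os) -> (x:os, es)) ([], []) xs  -- split into alternating halves
    merge xs' []  = xs'
    merge []  ys' = ys'
    merge l@(x:t) r@(y:u)
        | trace ("compare " ++ show x ++ " " ++ show y) (x <= y) = x : merge t r
        | otherwise                                              = y : merge l u

-- > head (msort [4,2,7,1 :: Int])
-- prints exactly 3 (= n - 1) "compare" lines before returning 1, whereas
-- > length (msort [4,2,7,1 :: Int])
-- forces the complete sort and prints more.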

Haskell Quicksort efficiency [duplicate]

This question already has answers here:
is it possible to do quicksort of a list with only one passing?
Why is the minimalist, example Haskell quicksort not a "true" quicksort?
Here's an example from Learn you a Haskell:
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
    let smallerSorted = quicksort [a | a <- xs, a <= x]
        biggerSorted = quicksort [a | a <- xs, a > x]
    in smallerSorted ++ [x] ++ biggerSorted
It seems that the list is iterated twice for each recursion, once for each list comprehension. Is there some magic in the compiler that optimizes this? If not, how can this be fixed?
Edit: I don't care if this is a real quicksort. Ignore the quicksort. My question is about the efficiency of the two list comprehensions, and how you can modify this specific algorithm (quicksort or not) in order to avoid iterating xs twice per recursion.
No. As of now, GHC 7.8.2 is not smart enough to figure out the clever in-place quicksort algorithm from the above quicksort definition. You can do the same thing in a single pass by defining quicksort as
import Data.List (partition)
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = let (psx1, psx2, psx3) = partition3 x (x:xs) in
                   quicksort psx1 ++ psx2 ++ quicksort psx3

partition3 _ [] = ([], [], [])
partition3 a (x:xs)
    | a == x = (pxs1, x:pxs2, pxs3)
    | a < x  = (pxs1, pxs2, x:pxs3)
    | a > x  = (x:pxs1, pxs2, pxs3)
    where (pxs1, pxs2, pxs3) = partition3 a xs
But you should check "is it possible to do quicksort of a list with only one passing?", as the approach there is more efficient than the above version.
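For reference, a single-pass two-way split can also be written directly with Data.List.partition (a sketch, not taken from the linked answers):

import Data.List (partition)

quicksortP :: Ord a => [a] -> [a]
quicksortP [] = []
quicksortP (x:xs) =
    let (smaller, larger) = partition (<= x) xs   -- one traversal of xs
    in  quicksortP smaller ++ [x] ++ quicksortP larger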

Fast obtention of all the subsets of size N in Haskell

The following (suboptimal) code generates all the subsets of size N of a given set.
The code works but, as I said, it is highly suboptimal: using an intermediate list to avoid the O(log n) of Set.insert doesn't seem to help, due to the large cost of later converting the list back to a Set.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set
subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
    | Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
    | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
        | n == 0 = Set.singleton Set.empty
        | Set.size s == n = Set.singleton s
        | otherwise =
            case Set.minView s of
                Nothing -> Set.empty
                Just (firstS, restS) ->
                    let partialN n = doSubsetsOfSizeN n restS in
                    Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
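For example, at the ghci prompt:

> [1,2,3] `choose` 2
[[1,2],[1,3],[2,3]]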
This code works but, as I said, is highly unoptimal.
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem help due to the large cost of later reconverting the list to a Set.
Hmm, I found that using lists gives better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic that can reuse the spine of the set.
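Concretely, the last line of the question's doSubsetsOfSizeN would become (a sketch of just that change):

Just (firstS, restS) ->
    let partialN n = doSubsetsOfSizeN n restS in
    Set.mapMonotonic (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n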
And that principle is also what makes lists attractive, the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
    | n == 0 = Set.singleton Set.empty
    | Set.size s < n || n < 0 = error "onlyLists: out of range n"
    | Set.size s == n = Set.singleton s
    | otherwise = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                  go n (Set.size s) (Set.toList s)
  where
    go 1 _ xs = map return xs
    go k l (x:xs)
        | k == l = [x:xs]
        | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.
And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420's.
subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs
First, use a better algorithm.
Look at your final line:
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
Evaluating doSubsetsOfSizeN k (Set.fromList $ 1:2:xs) will involve evaluating doSubsetsOfSizeN (k-1) (Set.fromList xs) twice (once when inserting 1, and once when inserting 2). This duplication is wasteful.
Enter a better algorithm.
mine :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
mine n s | Set.size s < n || n < 0 = Set.empty
         | otherwise = Set.foldr cons nil s !! n
  where
    nil :: Ord a => [Set.Set (Set.Set a)]
    nil = Set.singleton Set.empty : repeat Set.empty
    cons :: Ord a => a -> [Set.Set (Set.Set a)] -> [Set.Set (Set.Set a)]
    cons x sets = zipWith Set.union sets
                          (Set.empty : map (Set.map $ Set.insert x) sets)
mine 9 (Data.Set.fromList [0..18]) `seq` () is faster than subsetsOfSizeN 9 (Data.Set.fromList [0..18]) `seq` () and should have better asymptotic performance.
I haven't tried optimising this any further. There may be a better algorithm still.
(If the cost of insert and fromList are issues, you should consider giving back a list of lists instead of a set of sets.)
I found this, maybe it can help you:
f [] = [[1]]
f l = (:) [u] l'
    where
        u = succ (head (head l))
        l' = (++) l (map (\x -> (:) u x) l)

fix f n = if n == 0 then [] else f (fix f (n-1))
To test it:
> length (fix f 10)
1023 -- the empty set is not included; counting it too would give 1024 = 2^10
