Breaking lists at index - performance

I have a performance question today.
I am making a (Haskell) program and, when profiling, I saw that most of the time is spent in the function you can find below. Its purpose is to take the nth element of a list and return the list without it besides the element itself. My current (slow) definition is as follows:
breakOn :: Int -> [a] -> (a,[a])
breakOn 1 (x:xs) = (x,xs)
breakOn n (x:xs) = (y,x:ys)
where
(y,ys) = breakOn (n-1) xs
The Int argument is known to be in the range 1..n where n is the length of the (never null) list (x:xs), so the function never arises an error.
However, I got a poor performance here. My first guess is that I should change lists for another structure. But, before start picking different structures and testing code (which will take me lot of time) I wanted to ask here for a third person opinion. Also, I'm pretty sure that I'm not doing it in the best way. Any pointers are welcome!
Please, note that the type a may not be an instance of Eq.
Solution
I adapted my code tu use Sequences from the Data.Sequence module. The result is here:
import qualified Data.Sequence as S
breakOn :: Int -> Seq a -> (a,Seq a)
breakOn n xs = (S.index zs 0, ys <> (S.drop 1 zs))
where
(ys,zs) = S.splitAt (n-1) xs
However, I still accept further suggestions of improvement!

Yes, this is inefficient. You can do a bit better by using splitAt (which unboxes the number during the recursive bit), a lot better by using a data structure with efficient splitting, e.g. a fingertree, and best by massaging the context to avoid needing this operation. If you post a bit more context, it may be possible to give more targeted advice.

Prelude functions are generally pretty efficient. You could rewrite your function using splitAt, as so:
breakOn :: Int -> [a] -> (a,[a])
breakOn n xs = (z,ys++zs)
where
(ys,z:zs) = splitAt (n-1) xs

Related

Optimize "list" indexing in Haskell

Say you have a very deterministic algorithm that produces a list, like inits in Data.List. Is there any way that a Haskell compiler can optimally perform an "indexing" operation on this algorithm without actually generating all the intermediate results?
For example, inits [1..] !! 10000 is pretty slow. Could a compiler somehow deduce what inits would produce on the 10000th element without any recursion, etc? Of course, this same idea could be generalized beyond lists.
Edit: While inits [1..] !! 10000 is constant, I am wondering about any "index-like" operation on some algorithm. For example, could \i -> inits [1..] !! i be optimized such that no [or minimal] recursion is performed to reach the result for any i?
Yes and no. If you look at the definition for Data.List.inits:
inits :: [a] -> [[a]]
inits xs = [] : case xs of
[] -> []
x : xs' -> map (x :) (inits xs')
you'll see that it's defined recursively. That means that each element of the resulting list is built on the previous element of the list. So if you want any nth element, you have to build all n-1 previous elements.
Now you could define a new function
inits' xs = [] : [take n xs | (n, _) <- zip [1..] xs]
which has the same behavior. If you try to take inits' [1..] !! 10000, it finishes very quickly because the successive elements of the list do not depend on the previous ones. Of course, if you were actually trying to generate a list of inits instead of just a single element, this would be much slower.
The compiler would have to know a lot of information to be able to optimize away recursion from a function like inits. That said, if a function really is "very deterministic", it should be trivial to rewrite it in a non recursive way.

is this implementation of merge sort good?

I've just started to learn Haskell last night and I've never used a functional programming language before.
I just want to know if my implemention of merge sort is good or bad and what exactly is good or bad.
Maybe it's even wrong - Well it does sort but maybe the Algorithm is not what I think what merge sort is.
Just tell me everything I could improve here. I by myself think its a pretty clear and simple implementation.
Thanks for your advice, here's the code :)
merge [] ys = ys
merge xs [] = xs
merge xs ys = sorted : merge left right
where
sorted = if head(xs) < head(ys) then head(xs) else head(ys)
left = if head(xs) <= head(ys) then tail(xs) else xs
right = if head(xs) > head(ys) then tail(ys) else ys
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
where
left = take (div (length xs) 2) xs
right = drop (div (length xs) 2) xs
Well, first of all, we can rewrite merge to be a little more elegant using pattern matching
merge [] ys = ys
merge xs [] = xs
merge xs#(x:xs1) ys#(y:ys1)
| x <= y = x : merge xs1 ys
| otherwise = y : merge xs ys1
In general you should avoid using head and tail since they are a bit unsafe (they raise an error for the empty list) and use pattern matching whenever possible.
The implementation of msort is pretty much spot on, except that we can split the list in a more efficient way. That's because length xs - takes O(N) to complete. The compiler might save you and cache the result of the length call so that the second call to length won't traverse the list again. But the take and drop will pretty much cause another two traversals thus splitting the list using 3 traversals which may prove to be expensive. We can do better by splitting the list in two lists - the first one containing the elements on the odd positions and the second list with the elements placed on the even positions, like so:
msort [] = []
msort [x] = [x]
msort xs = merge (msort first) (msort second)
where
(first, second) = splitInHalves xs
splitInHalves [] = ([], [])
splitInHalves [x] = ([x], [])
splitInHalves (x:y:xs) =
let (xs1, ys1) = splitInHalves xs
in (x:xs1, y:ys1)
This gets you the same Merge Sort in O(NlogN) time. It feels different because you would probably implement it in place (by modifying the original list) in an imperative language such as C. This version is slightly more costly on the memory, but it does have it's advantages - it is more easy to reason about, so it is more maintainable, and also it is very easy to parallelize without being concerned of anything else except the algorithm itself - which is exactly what a good programming language should provide for the developers that use it.
EDIT 1 :
If the syntax is a bit much, here are some resources:
Pattern Matching - the bit with the # symbol is called an as-pattern. You'll find it in there
let is a keyword used to declare a variable to be used in the expression that follows it (whereas where binds a variable in the expression that precedes it). More on Haskell syntax, including guards (the things with | condition = value) can be found here, in this chapter of Learn You a Haskell
EDIT 2 :
#is7s proposed a far more concise version of splitInHalves using the foldr function:
splitInHalves = foldr (\x (l,r) -> (x:r,l)) ([],[])
EDIT 3 :
Here is another answer which provides an alternative implementation of merge sort, which also has the property of being stable:
Lazy Evaluation and Time Complexity
Hope this helps and welcome to the wonderful world of Functional Programming !

Getting functional sieve of Eratosthenes fast

I read this other post about a F# version of this algorithm. I found it very elegant and tried to combine some ideas of the answers.
Although I optimized it to make fewer checks (check only numbers around 6) and leave out unnecessary caching, it is still painfully slow. Calculating the 10,000th prime already take more than 5 minutes. Using the imperative approach, I can test all 31-bit integers in not that much more time.
So my question is if I am missing something that makes all this so slow. For example in another post someone was speculating that LazyList may use locking. Does anyone have an idea?
As StackOverflow's rules say not to post new questions as answers, I feel I have to start a new topic for this.
Here's the code:
#r "FSharp.PowerPack.dll"
open Microsoft.FSharp.Collections
let squareLimit = System.Int32.MaxValue |> float32 |> sqrt |> int
let around6 = LazyList.unfold (fun (candidate, (plus, next)) ->
if candidate > System.Int32.MaxValue - plus then
None
else
Some(candidate, (candidate + plus, (next, plus)))
) (5, (2, 4))
let (|SeqCons|SeqNil|) s =
if Seq.isEmpty s then SeqNil
else SeqCons(Seq.head s, Seq.skip 1 s)
let rec lazyDifference l1 l2 =
if Seq.isEmpty l2 then l1 else
match l1, l2 with
| LazyList.Cons(x, xs), SeqCons(y, ys) ->
if x < y then
LazyList.consDelayed x (fun () -> lazyDifference xs l2)
elif x = y then
lazyDifference xs ys
else
lazyDifference l1 ys
| _ -> LazyList.empty
let lazyPrimes =
let rec loop = function
| LazyList.Cons(p, xs) as ll ->
if p > squareLimit then
ll
else
let increment = p <<< 1
let square = p * p
let remaining = lazyDifference xs {square..increment..System.Int32.MaxValue}
LazyList.consDelayed p (fun () -> loop remaining)
| _ -> LazyList.empty
loop (LazyList.cons 2 (LazyList.cons 3 around6))
If you are calling Seq.skip anywhere, then there's about a 99% chance that you have an O(N^2) algorithm. For nearly every elegant functional lazy Project Euler solution involving sequences, you want to use LazyList, not Seq. (See Juliet's comment link for more discussion.)
Even if you succeed in taming the strange quadratic F# sequences design issues, there is certain algorithmic improvements still ahead. You are working in (...((x-a)-b)-...) manner here. x, or around6, is getting deeper and deeper, but it's the most frequently-producing sequence. Transform it into (x-(a+b+...)) scheme -- or even use a tree structure there -- to gain an improvement in time complexity (sorry, that page is in Haskell). This gets actually very close to the complexity of imperative sieve, although still mush slower than the baseline C++ code.
Measuring local empirical orders of growth as O(n^a) <--> a = log(t_2/t_1) / log(n_2/n_1) (in n primes produced), the ideal n log(n) log(log(n)) translates into O(n^1.12) .. O(n^1.085) behaviour on n=10^5..10^7 range. A simple C++ baseline imperative code achieves O(n^1.45 .. 1.18 .. 1.14) while tree-merging code, as well as priority-queue based code, both exhibit steady O(n^1.20) behaviour, more or less. Of course C++ is ~5020..15 times faster, but that's mostly just a "constant factor". :)

Space leak in list program

I am solving some problems of Project Euler in Haskell. I wrote a program for a riddle in it and it did not work as I expected.
When I looked in the task manager when running the program I saw that it was using > 1 gigabyte of RAM on ghc. A friend of me wrote a program with the same meaning in Java and succeeded in 7 seconds.
import Data.List
opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) )
$ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]
vw x = hh^2 == x
where hh = (round.sqrt.fromIntegral) x
re = [0..9]
fromDigits x = foldl1 (\n m->10*n+m) x
I know this program would output the number I want given enough RAM and time, but there has to be a better-performing way.
The main problem here is that sequence has a space leak. It is defined like this:
sequence [] = [[]]
sequence (xs:xss) = [ y:ys | y <- xs, ys <- sequence xss ]
so the problem is that the list produced by the recursive call sequence xss is re-used for each of the elements of xs, so it can't be discarded until the end. A version without the space leak is
myseq :: [[a]] -> [[a]]
myseq xs = go (reverse xs) []
where
go [] acc = [acc]
go (xs:xss) acc = concat [ go xss (x:acc) | x <- xs ]
PS. the answer seems to be Just 1229314359627783009
Edit version avoiding the concat:
seqlists :: [[a]] -> [[a]]
seqlists xss = go (reverse xss) [] []
where
go [] acc rest = acc : rest
go (xs:xss) acc rest = foldr (\y r -> go xss (y:acc) r) rest xs
note that both of these versions generate the results in a different order from the standard sequence, so while they work for this problem we can't use one as a specialised version of sequence.
Following on from the answer given by Simon Marlow, here's a version of sequence that avoids the space leak while otherwise working just like the original, including preserving the order.
It still uses the nice, simple list comprehension of the original sequence - the only difference is that a fake data dependency is introduced that prevents the recursive call from being shared.
sequenceDummy d [] = d `seq` [[]]
sequenceDummy _ (xs:xss) = [ y:ys | y <- xs, ys <- sequenceDummy (Just y) xss ]
sequenceUnshared = sequenceDummy Nothing
I think this is a better way of avoiding the sharing that leads to the space leak.
I'd blame the excessive sharing on the "full laziness" transformation. Normally this does a great job of creating sharing that avoids recomputions, but sometimes recompution is very much more efficient than storing shared results.
It'd be nice if there was a more direct way to tell the compiler not to share a specific expression - the above dummy Maybe argument works and is efficient, but it's basically a hack that's just complicated enough that ghc can't tell that there's no real dependency. (In a strict language you don't have these issues because you only have sharing where you explicitly bind a variable to a value.)
EDIT: I think I'm wrong here - changing the type signature to :: Maybe Word64 (which would be enough bits for this problem I think) also takes forever / has a space leak, so it couldn't be the old Integer bug.
Your problem seems to be an old GHC bug (that I thought was fixed) with Integer causing a space leak. The below code finishes in about 150 ms when compiled with -O2.
import Data.List
import Data.Word
main = print opl
opl :: Maybe Word32
opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) ) $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]
vw x = hh^2 == x
where hh = (round.sqrt.fromIntegral) x
re = [0..9]
fromDigits x = foldl1 (\n m->10*n+m) x
Since you're looking for a nineteen-digit number with those characteristics found in vw, I'd try to simplify the construction in the mapped function just say fromDigits x*1000+9 for starters. Appending to a list is O(length-of-the-left-list), so throwing those last three digits on the end hurts the computation time a bunch.
As an aside (to you both), using the strict version of the fold (foldl1') will also help.

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell or a way how to simulate it inside a function. I.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
mc <- newSTRef Map.empty
return $ \k -> do
c <- readSTRef mc
case Map.lookup k c of
Just a -> return a
Nothing -> do a <- f k
writeSTRef mc (Map.insert k a c) >> return a
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
f' <- stToIO $ memoize $ return . f
return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on #Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to) use a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.

Resources