Compiler optimizations for infinite lists in Haskell? - performance

I've various "partial permutation" functions of type t -> Maybe t that either take me to a new location in a data structure by returning a Just or else return a Nothing if they cannot yet get there.
I routinely must applying these partial permutations in repeated specific patterns, building a list of all intermediate values, but truncating the list whenever I return to my starting position or a permutation fails.
scan_partial_perms :: Eq t => [t -> Maybe t] -> t -> [t]
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
where test (Just i) | i /= v = True
test _ = False
iterate_partial_perm = scan_partial_perm . iterate
cycle_partial_perms = scan_partial_perms perms . cycle
I'm fairly confident that scanl has the desirable strictness and tail recursion properties in this context. Any other tips on optimizing this code? In particular, what compiler options beyond -O3 -fllvm should I read about?
At worst, I could replace the scanl and infinite list with an accessor function defined like
perm l i = l !! i `rem` length l
I'd imagine this cannot improve performance with the right optimizations however.

I think you have a bug in scan_partial_perms,
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
scanl f s list always starts with s, so takeWhile test (scanl ...) is []. If that is intentional, it's quite obfuscated. Assuming what you want is
scan_partial_perms ps v = (v:) . map fromJust . takeWhile test . tail $ scanl (>>=) (Just v) ps
there's not much you can do. You can {-# SPECIALISE #-} it so the Eq dictionary is eliminated for the specialised-for types. That'll do you some good if the compiler doesn't do that on its own (which it may if it can see the use site). With ghc >= 7, you can instead make it {-# INLINABLE #-}, so that it can be specialised and perhaps inlined at each use site.
I don't know what happens down the llvm road, but at the core-level, map, fromJust and takeWhile are not yet inlined, so if you're desperate enough, you can get maybe a few tenths of a percent by inlining them manually if they aren't inlined later in the llvm backend:
scan_partial_perms ps v = v : go v ps
where
go w (q:qs) = case q w of
Just z
| z /= v -> z : go z qs
_ -> []
go _ _ = []
But those are very cheap functions, so the gains - if at all present - would be small.
So what you have is rather good already, if it's not good enough, you need a different route of attack.
The one with the list indexing,
perm l i = l !! (i `rem` length l)
-- parentheses necessary, I don't think (l !! i) `rem` length l was what you want
doesn't look good. length is expensive, (!!) is expensive too, so both should in general be avoided.

Related

Functional programming with OCAML

I'm new to functional programming and I'm trying to implement a basic algorithm using OCAML for course that I'm following currently.
I'm trying to implement the following algorithm :
Entries :
- E : a non-empty set of integers
- s : an integer
- d : a positive float different of 0
Output :
- T : a set of integers included into E
m <- min(E)
T <- {m}
FOR EACH e ∈ sort_ascending(E \ {m}) DO
IF e > (1+d)m AND e <= s THEN
T <- T U {e}
m <- e
RETURN T
let f = fun (l: int list) (s: int) (d: float) ->
List.fold_left (fun acc x -> if ... then (list_union acc [x]) else acc)
[(list_min l)] (list_sort_ascending l) ;;
So far, this is what I have, but I don't know how to handle the modification of the "m" variable mentioned in the algorithm... So I need help to understand what is the best way to implement the algorithm, maybe I'm not gone in the right direction.
Thanks by advance to anyone who will take time to help me !
The basic trick of functional programming is that although you can't modify the values of any variables, you can call a function with different arguments. In the initial stages of switching away from imperative ways of thinking, you can imagine making every variable you want to modify into the parameters of your function. To modify the variables, you call the function recursively with the desired new values.
This technique will work for "modifying" the variable m. Think of m as a function parameter instead.
You are already using this technique with acc. Each call inside the fold gets the old value of acc and returns the new value, which is then passed to the function again. You might imagine having both acc and m as parameters of this inner function.
Assuming list_min is defined you should think the problem methodically. Let's say you represent a set with a list. Your function takes this set and some arguments and returns a subset of the original set, given the elements meet certain conditions.
Now, when I read this for the first time, List.filter automatically came to my mind.
List.filter : ('a -> bool) -> 'a list -> 'a list
But you wanted to modify the m so this wouldn't be useful. It's important to know when you can use library functions and when you really need to create your own functions from scratch. You could clearly use filter while handling m as a reference but it wouldn't be the functional way.
First let's focus on your predicate:
fun s d m e -> (float e) > (1. +. d)*.(float m) && (e <= s)
Note that +. and *. are the plus and product functions for floats, and float is a function that casts an int to float.
Let's say the function predicate is that predicate I just mentioned.
Now, this is also a matter of opinion. In my experience I wouldn't use fold_left just because it's just complicated and not necessary.
So let's begin with my idea of the code:
let m = list_min l;;
So this is the initial m
Then I will define an auxiliary function that reads the m as an argument, with l as your original set, and s, d and m the variables you used in your original imperative code.
let rec f' l s d m =
match l with
| [] -> []
| x :: xs -> if (predicate s d m x) then begin
x :: (f' xs s d x)
end
else
f' xs s d m in
f' l s d m
Then for each element of your set, you check if it satisfies the predicate, and if it does, you call the function again but you replace the value of m with x.
Finally you could just call f' from a function f:
let f (l: int list) (s: int) (d: float) =
let m = list_min l in
f' l s d m
Be careful when creating a function like your list_min, what would happen if the list was empty? Normally you would use the Option type to handle those cases but you assumed you're dealing with a non-empty set so that's great.
When doing functional programming it's important to think functional. Pattern matching is super recommended, while pointers/references should be minimal. I hope this is useful. Contact me if you any other doubt or recommendation.

How to fix space leak caused by laziness when your algorithm relies on laziness

I have some algorithm which generates a search tree:
data SearchTree a = Solution a | Contradiction | Search [ SearchTree a ]
deriving (Show, Functor)
The algorithm generates this tree lazily. I also defined a simple evaluator, which is really just depth first search.
simpleEval :: MonadPlus m => SearchTree a -> m a
simpleEval (Solution a) = return a
simpleEval Contradiction = mzero
simpleEval (Search ps) = foldr mplus mzero $ map simpleEval ps
I noticed that many of the solutions that my algorithm produces look something like the following search tree:
nest :: Int -> SearchTree a -> SearchTree a
nest 0 = id
nest n = nest (n-1) . Search . (:[])
tree0 = Search ts where
ts = cycle $ t0 : replicate 100 t1 ++ [t2]
t0 = nest 100 $ Solution 'a'
t1 = nest 1000 $ Contradiction
t2 = nest 4 $ Solution 'b'
Namely, they have a lot of very deep branches with no solutions, a few deep branches with a solution, and very few shallow branches with a solution. On this basis, I decided I wanted another evaluator, one which will 'give up' on branches that are too deep. Call it cutoffEval. cutoffEval 5 tree0 should find only bs because it there are infinitely many branches of depth less than 5 to consider, and they only contain bs. I implemented it like so:
cutoff :: (MonadPlus m) => Int -> SearchTree a -> (m a, [SearchTree a])
cutoff cu = go cu where
plus ~(m0, l0) ~(m1, l1) = (mplus m0 m1, l0 ++ l1)
zero = (mzero, [])
go 0 x = (mzero, [x])
go _ Contradiction = zero
go _ (Solution a) = (return a, [])
go d (Search ps) = foldr plus zero $ map (go $ d-1) ps
cutoffEval :: MonadPlus m => Int -> SearchTree a -> m a
cutoffEval cu = go where
go t = case cutoff cu t of (r,ts) -> foldr mplus mzero $ r : map go ts
But this function produces a huge space leak, as compared to simpleEval:
putStrLn $ take 4000 $ simpleEval tree0 -- 2MB residency
putStrLn $ take 4000 $ cutoffEval 10 tree0 -- 600MB residency
Profiling reveals that almost all allocation occurs in cutoff.go; and the majority of allocation is due something mysterious called main:Tree.sat_s5jg and the (,) constructor. This seems to me due to the irrefutable patterns, the tuple constructors are being built up as thunks instead of being forced by plus; and normally the solution to a space leak is to make your function stricter, but here removing the irrefutable patterns causes cutoff to hang, so I cannot do it.
I tested this with GHC 7.6, 7.8 and 7.10. The problem was found in each one.
So my questions are: Can cutoffEval be written to run in constant space like simpleEval? And more generally, how do I fix a space leak, if I can't make my implementation stricter because the algorithm depends on it?
It seems to me that the reason for the memory leak is actually a bug in the implementation. Your cutoff function mixes together cutting off too deep branches with evaluating the upper part. And then in cutoffEval, you go deeper to the bottom, cut branches, and continue to explore them recursively. Which is essentially breadth-first search, going by cu levels in each pass. This means that the whole tree will eventually be constructed and retained in memory until the end! (Unlike the case of a depth-first search, where visited subtrees can be reclaimed by the GC.)
If you want to just cut off the branches that are too deep, getting the first part of the result of cutoff is what you want.
In any case I'd suggest to separate the evaluator and the cut-off part (see below). In such a case you can just use the original evaluator on the cut-off version of the tree.
One additional remark, from the MonadPlus constraint you use just the monoidal part - mzero and mplus. It'd be cleaner and more generic to use just Monoid. There are more monoids than monads (for example Sum to just count solutoins, or Last to find the last solution).
simpleEval :: (Monoid m) => (a -> m) -> SearchTree a -> m
simpleEval f = go
where
go (Solution a) = f a
go Contradiction = mempty
go (Search ps) = mconcat $ map go ps
cutoff :: Int -> SearchTree a -> SearchTree a
cutoff cu = go cu
where
go 0 _ = Contradiction -- too deep branches are just failures
go d (Search ps) = Search $ map (go (d - 1)) ps
go _ x = x
cutoffEval :: (Monoid m) => Int -> (a -> m) -> SearchTree a -> m
cutoffEval cu f = simpleEval f . cutoff cu

Repa 3 performance and correct usage of 'now'

There is a basic monad question in here, unrelated to Repa, plus several Repa-specific questions.
I am working on a library using Repa3. I am having trouble getting efficient parallel code. If I make my functions return delayed arrays, I get excruciatingly slow code that scales very well up to 8 cores. This code takes over 20GB of memory per the GHC profiler, and runs several orders of magnitude slower than the basic Haskell unboxed vectors.
Alternatively, if I make all of my functions return Unboxed manifest arrays (still attempting to use fusion within the functions, for example when I do a 'map'), I get MUCH faster code (still slower than using Haskell unboxed vectors) that doesn't scale at all, and in fact tends to get slightly slower with more cores.
Based on the FFT example code in Repa-Algorithms, it seems the correct approach is to always return manifest arrays. Is there ever a case where I should be returning delayed arrays?
The FFT code also makes plentiful use of the 'now' function. However, I get a type error when I try to use it in my code:
type Arr t r = Array t DIM1 r
data CycRingRepa m r = CRTBasis (Arr U r)
| PowBasis (Arr U r)
fromArray :: forall m r t. (BaseRing m r, Unbox r, Repr t r) => Arr t r -> CycRingRepa m r
fromArray =
let mval = reflectNum (Proxy::Proxy m)
in \x ->
let sh:.n = extent x
in assert (mval == 2*n) PowBasis $ now $ computeUnboxedP $ bitrev x
The code compiles fine without the 'now'. With the 'now', I get the following error:
Couldn't match type r' withArray U (Z :. Int) r'
`r' is a rigid type variable bound by
the type signature for
fromArray :: (BaseRing m r, Unbox r, Repr t r) =>
Arr t r -> CycRingRepa m r
at C:\Users\crockeea\Documents\Code\LatticeLib\CycRingRepa.hs:50:1
Expected type: CycRingRepa m r
Actual type: CycRingRepa m (Array U DIM1 r)
I don't think this is my problem. It would be helpful if someone could explain the how the Monad works in 'now'. By my best estimation, the monad seems to be creating a 'Arr U (Arr U r)'. I'm expecting a 'Arr U r', which would then match the data constructor pattern. What is going on and how do I fix this?
The type signatures are:
computeUnboxedP :: Fill r1 U sh e => Array r1 sh e -> Array U sh e
now :: (Shape sh, Repr r e, Monad m) => Array r sh e -> m (Array r sh e)
It would be helpful to have a better idea of when it is appropriate to use 'now'.
A couple other Repa questions:
Should I explicitly call computeUnboxedP (as in the FFT example code), or should I use the more general computeP (because the unbox part is inferred by my data type)?
Should I store delayed or manifest arrays in the data type CycRingRepa?
Eventually I would also like this code to work with Haskell Integers. Will this require me to write new code that uses something other than U arrays, or could I write polymorphic code that creates U arrays for unbox types and some other array for Integers/boxed types?
I realize there are a lot of questions in here, and I appreciate any/all answers!
Here's the source code for now:
now arr = do
arr `deepSeqArray` return ()
return arr
So it's really just a monadic version of deepSeqArray. You can use either of these to force evaluation, rather than hanging on to a thunk. This "evalulation" is different than the "computation" forced when computeP is called.
In your code, now doesn't apply, since you're not in a monad. But in this context deepSeqArray wouldn't help either. Consider this situation:
x :: Array U Int Double
x = ...
y :: Array U Int Double
y = computeUnboxedP $ map f x
Since y refers to x, we'd like to be sure x is computed before starting to compute y. If not, the available work won't be distributed correctly among the gang of threads. To get this to work out, it's better to write y as
y = deepSeqArray x . computeUnboxedP $ map f x
Now, for a delayed array, we have
deepSeqArray (ADelayed sh f) y = sh `deepSeq` f `seq` y
Rather than computing all the elements, this just makes sure the shape is computed, and reduces f to weak-head normal form.
As for manifest vs delayed arrays, there are certainly time delayed arrays are preferable.
multiplyMM arr brr
= [arr, brr] `deepSeqArrays`
A.sumP (A.zipWith (*) arrRepl brrRepl)
where trr = computeUnboxedP $ transpose2D brr
arrRepl = trr `deepSeqArray` A.extend (Z :. All :. colsB :. All) arr
brrRepl = trr `deepSeqArray` A.extend (Z :. rowsA :. All :. All) trr
(Z :. _ :. rowsA) = extent arr
(Z :. colsB :. _ ) = extent brr
Here "extend" generates a new array by copying the values across some set of new dimensions. In particular, this means that
arrRepl ! (Z :. i :. j :. k) == arrRepl ! (Z :. i :. j' :. k)
Thankfully, extend produces a delayed array, since it would be a waste to go through the trouble of all this copying.
Delayed arrays also allow the possiblity of fusion, which is impossible if the array is manifest.
Finally, computeUnboxedP is just computeP with a specialized type. Giving computeUnboxedP explicitly might allow GHC to optimize better, and makes the code a little clearer.
Repa 3.1 no longer requires the explict use of now. The parallel computation functions are all monadic, and automatically apply deepSeqArray to their results. The repa-examples package also contains a new implementation of matrix multiply that demonstrates their use.

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell or a way how to simulate it inside a function. I.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
mc <- newSTRef Map.empty
return $ \k -> do
c <- readSTRef mc
case Map.lookup k c of
Just a -> return a
Nothing -> do a <- f k
writeSTRef mc (Map.insert k a c) >> return a
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
f' <- stToIO $ memoize $ return . f
return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on #Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to) use a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.

Laziness and tail recursion in Haskell, why is this crashing?

I have this fairly simple function to compute the mean of elements of a big list, using two accumulators to hold the sum so far and the count so far:
mean = go 0 0
where
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs
main = do
putStrLn (show (mean [0..10000000]))
Now, in a strict language, this would be tail-recursive, and there would be no problem. However, as Haskell is lazy, my googling has led me to understand that (s+x) and (l+1) will be passed down the recursion as thunks. So this whole thing crashes and burns:
Stack space overflow: current size 8388608 bytes.
After further googling, I found seq and $!. Which it seems I don't understand because all my attempts at using them in this context proved futile, with error messages saying something about infinite types.
Finally I found -XBangPatterns, which solves it all by changing the recursive call:
go !s !l (x:xs) = go (s+x) (l+1) xs
But I'm not happy with this, as -XBangPatterns is currently an extension. I would like to know how to make the evaluation strict without the use of -XBangPatterns. (And maybe learn something too!)
Just so you understand my lack of understanding, here's what I tried (the only try that compiled, that is):
go s l (x:xs) = go (seq s (s+x)) (seq l (l+1)) xs
From what I could understand, seq should here force the evaluation of the s and l argument, thus avoiding the problem caused by thunks. But I still get a stack overflow.
I've written extensively on this:
Real World Haskell, ch 24: controlling evaluation
On recursion and strictness in Haskell
Firstly, yes, if you want to require strict evaluation of the accumulators use seq and stay in Haskell 98:
mean = go 0 0
where
go s l [] = s / fromIntegral l
go s l (x:xs) = s `seq` l `seq`
go (s+x) (l+1) xs
main = print $ mean [0..10000000]
*Main> main
5000000.0
Secondly: strictness analysis will kick in if you give some type annotations, and compile with -O2:
mean :: [Double] -> Double
mean = go 0 0
where
go :: Double -> Int -> [Double] -> Double
go s l [] = s / fromIntegral l
go s l (x:xs) = go (s+x) (l+1) xs
main = print $ mean [0..10000000]
$ ghc -O2 --make A.hs
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...
$ time ./A
5000000.0
./A 0.46s user 0.01s system 99% cpu 0.470 total
Because 'Double' is a wrapper over the strict atomic type Double#, with optimizations on, and a precise type, GHC runs strictness analysis and infers that the strict version will be ok.
import Data.Array.Vector
main = print (mean (enumFromToFracU 1 10000000))
data Pair = Pair !Int !Double
mean :: UArr Double -> Double
mean xs = s / fromIntegral n
where
Pair n s = foldlU k (Pair 0 0) xs
k (Pair n s) x = Pair (n+1) (s+x)
$ ghc -O2 --make A.hs -funbox-strict-fields
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...
$ time ./A
5000000.5
./A 0.03s user 0.00s system 96% cpu 0.038 total
As described in the RWH chapter above.
The seq function forces evaluation of the first parameter once the function is called. When you pass seq s (s+x) as a parameter the seq function is not called immediately, because there is no need to evaluate the value of that parameter. You want the call to seq to be evaluated before the recursive call, so that that in turn can force its parameter to be evaluated.
Usually this is done link this:
go s l (x:xs) = s `seq` l `seq` go (s+x) (l+1) xs
This is a syntactic variation of seq s (seq l (go (s+x) (l+1) xs)). Here the calls to seq are the outermost function calls in the expression. Because of Haskell's laziness this causes them to be evaluated first: seq is called with the still unevaluated parameters s and seq l (go (s+x) (l+1) xs), evaluating the parameters is deferred to the point where somebody actually tries to access their values.
Now seq can force its first parameter to be evaluated before returning the rest of the expression. Then the next step in the evaluation would be the second seq. If the calls to seq are buried somewhere in some parameter they might not be executed for a long time, defeating their purpose.
With the changed positions of the seqs the program executes fine, without using excessive amounts of memory.
Another solution to the problem would be to simply enable optimizations in GHC when the program is compiled (-O or -O2). The optimizer recognizes the dispensable laziness and produces code that doesn't allocate unnecessary memory.
You are right in your understanding that seq s (s+x) forces the evaluation of s. But it doesn't force s+x, therefore you're still building up thunks.
By using $! you can force the evaluation of the addition (two times, for both arguments). This achieves the same effect as using the bang patterns:
mean = go 0 0
where
go s l [] = s / fromIntegral l
go s l (x:xs) = ((go $! s+x) $! l+1) xs
The use of the $! function will translate the go $! (s+x) to the equivalent of:
let y = s+x
in seq y (go y)
Thus y is first forced into weak head normal form, which means that the outermost function is applied. In the case of y, the outermost function is +, thus y is fully evaluated to a number before being passed to go.
Oh, and you probably got the infinite type error message because you didn't have the parenthesis in the right place. I got the same error when I first wrote your program down :-)
Because the $! operator is right associative, without parenthesis go $! (s+x) $! (l+1) means the same as: go $! ((s+x) $! (l+1)), which is obviously wrong.

Resources