Any way to create the unmemo-monad? - performance

Suppose someone makes a program to play chess, or solve sudoku. In this kind of program it makes sense to have a tree structure representing game states.
This tree would be very large, "practically infinite", which isn't by itself a problem as Haskell supports infinite data structures.
A familiar example of an infinite data structure:
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Nodes are only allocated when first used, so the list takes finite memory. One may also iterate over an infinite list if they don't keep references to its head, allowing the garbage collector to collect its parts which are not needed anymore.
Back to the tree example: suppose one iterates over the tree. The nodes already visited cannot be freed while the root of the tree is still needed (for example, in an iterative deepening search the tree is traversed several times, so the root has to be kept).
One possible solution for this problem that I thought of is using an "unmemo-monad".
I'll try to demonstrate what this monad is supposed to do using monadic lists:
import Control.Monad.ListT (ListT) -- cabal install List
import Data.Copointed -- cabal install pointed
import Data.List.Class
import Prelude hiding (enumFromTo)
nums :: ListT Unmemo Int -- What is Unmemo?
nums = enumFromTo 0 1000000
main = print $ div (copoint (foldlL (+) 0 nums)) (copoint (lengthL nums))
Using nums :: [Int], the program would take a lot of memory, because lengthL nums still holds a reference to nums while foldlL (+) 0 nums is iterating over it.
The purpose of Unmemo is to make the runtime not keep the nodes iterated over.
I attempted using ((->) ()) as Unmemo, but it yields the same results as nums :: [Int] does: the program uses a lot of memory, as is evident from running it with +RTS -s.
Is there any way to implement Unmemo that does what I want?

Same trick as with a stream -- don't capture the remainder directly, but instead capture a value and a function which yields a remainder. You can add memoization on top of this as necessary.
data UTree a = Leaf a | Branch a (a -> [UTree a])
I'm not in the mood to figure it out precisely at the moment, but this structure arises, I'm sure, naturally as the cofree comonad over a fairly straightforward functor.
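A minimal sketch of how the UTree type above could be used, assuming a made-up toy game where a state is an Int and each move adds 1 or 2 (gameTree, countNodes and the game itself are hypothetical, not from the question). The children are only reachable through the function stored in each Branch, so every traversal rebuilds them instead of following a stored pointer. Note that GHC's optimizer (full laziness in particular) may still decide to share parts of the structure, so this is not a guarantee.

```haskell
-- The type from the answer: a branch stores its value and a function
-- that regenerates the children from that value on demand.
data UTree a = Leaf a | Branch a (a -> [UTree a])

-- Hypothetical toy game: states are Ints, a move adds 1 or 2,
-- and a state is terminal once it reaches the limit.
gameTree :: Int -> Int -> UTree Int
gameTree limit s
  | s >= limit = Leaf s
  | otherwise  = Branch s (\x -> [gameTree limit (x + 1), gameTree limit (x + 2)])

-- Count the nodes visible down to a given depth.
-- Children are recomputed by calling the stored function.
countNodes :: Int -> UTree a -> Int
countNodes _ (Leaf _) = 1
countNodes d (Branch x kids)
  | d <= 0    = 1
  | otherwise = 1 + sum (map (countNodes (d - 1)) (kids x))

main :: IO ()
main = do
  let root = gameTree 3 0
  -- An iterative-deepening style loop: the root is kept,
  -- but each pass regenerates the subtrees it visits.
  print (map (\d -> countNodes d root) [0 .. 4])  -- prints [1,3,6,9,9]
```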
Edit
Found it: http://hackage.haskell.org/packages/archive/comonad-transformers/1.6.3/doc/html/Control-Comonad-Trans-Stream.html
Or this is perhaps simpler to understand: http://hackage.haskell.org/packages/archive/streams/0.7.2/doc/html/Data-Stream-Branching.html
In either case, the trick is that your f can be chosen to be something like data N s a = N (s -> (s,[a])) for an appropriate s (s being the type of your state parameter of the stream -- the seed of your unfold, if you will). That might not be exactly correct, but something close should do...
But of course for real work, you can scrap all this and just write the datatype directly as above.
Edit 2
The below code illustrates how this can prevent sharing. Note that even in the version without sharing, there are humps in the profile indicating that the sum and length calls aren't running in constant space. I'd imagine that we'd need an explicit strict accumulation to knock those down.
{-# LANGUAGE DeriveFunctor #-}
import Data.Stream.Branching(Stream(..))
import qualified Data.Stream.Branching as S
import Control.Arrow
import Control.Applicative
import Data.List
data UM s a = UM (s -> Maybe a) deriving Functor
type UStream s a = Stream (UM s) a
runUM s (UM f) = f s
liftUM x = UM $ const (Just x)
nullUM = UM $ const Nothing
buildUStream :: Int -> Int -> Stream (UM ()) Int
buildUStream start end = S.unfold (\x -> (x, go x)) start
  where go x
          | x < end   = liftUM (x + 1)
          | otherwise = nullUM
sumUS :: Stream (UM ()) Int -> Int
sumUS x = S.head $ S.scanr (\x us -> maybe 0 id (runUM () us) + x) x
lengthUS :: Stream (UM ()) Int -> Int
lengthUS x = S.head $ S.scanr (\x us -> maybe 0 id (runUM () us) + 1) x
sumUS' :: Stream (UM ()) Int -> Int
sumUS' x = last $ usToList $ liftUM $ S.scanl (+) 0 x
lengthUS' :: Stream (UM ()) Int -> Int
lengthUS' x = last $ usToList $ liftUM $ S.scanl (\acc _ -> acc + 1) 0 x
usToList x = unfoldr (\um -> (S.head &&& S.tail) <$> runUM () um) x
maxNum = 1000000
nums = buildUStream 0 maxNum
numsL :: [Int]
numsL = [0..maxNum]
-- All these need to be run with increased stack to avoid an overflow.
-- This generates an hp file with two humps (i.e. the list is not shared)
main = print $ div (fromIntegral $ sumUS' nums) (fromIntegral $ lengthUS' nums)
-- This generates an hp file as above, and uses somewhat less memory, at the cost of
-- an increased number of GCs. -H helps a lot with that.
-- main = print $ div (fromIntegral $ sumUS nums) (fromIntegral $ lengthUS nums)
-- This generates an hp file with one hump (i.e. the list is shared)
-- main = print $ div (fromIntegral $ sum $ numsL) (fromIntegral $ length $ numsL)

Related

Why does my Haskell code not appear to run in Parallel

I am trying to solve a 2-sum algorithm problem for a Stanford University online course on Coursera. I need to find all distinct pairs x+y in a list that sum to a value t in a range [-10000 .. 10000]. I know there are more efficient implementations, but I thought it would be a good time to try and do some Haskell parallel programming.
I have tried to implement parallelisation just by looping through half of the range in two different threads (which I think are called sparks). My code is the following:
module Main where
import Data.List
import qualified Data.Map as M
import Debug.Trace
import Control.Parallel (par,pseq)
main :: IO ()
main = interact run
range :: [Int]
range = [negate 10000..10000]
emptyMap :: M.Map Int Bool
emptyMap = M.fromList $ zip [] []
run :: String -> String
run xs = let parsedInput = map (read :: String -> Int) $ words xs
             hashMap = M.fromList $ zip parsedInput (repeat True)
             pcalc r = map (\t -> trace (show t) (countVals hashMap parsedInput t)) r
             bot = pcalc (take (div (length range) 2) range)
             top = pcalc (drop (div (length range) 2) range)
             out = top `par` bot `pseq` (sum bot + sum top)
         in show out
countVals :: M.Map Int Bool -> [Int] -> Int -> Int
countVals m ks t = foldl' go 0 ks
  where go acum x = if M.lookup y m == Just True
                       && y /= x
                    then 1
                    else acum
          where y = t - x
You can see I have two variables top and bot which I am trying to calculate in parallel via
out = top `par` bot `pseq` (sum bot + sum top)
which is what I thought other Stack Overflow answers recommend. However, when I compile and run, I only seem to see the trace from the bot variable.
% stack ghc --package parallel -- -threaded Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...
% ./Main +RTS -N8 < input.txt
-10000
-9999
-9998
-9997
-9996
...
Whereas I was expecting something like:
% ./Main +RTS -N8 < input.txt
-10000
0
-9999
1
-9998
2
-9997
-9996
...
Can someone help point out what exactly I am doing wrong? Thanks
Let's focus on this part:
bot = pcalc (take (div (length range) 2) range)
top = pcalc (drop (div (length range) 2) range)
out = top `par` bot `pseq` (sum bot + sum top)
Here, bot and top are lists. seq and pseq evaluate a value, and par sparks its evaluation in parallel, but since Haskell is lazy, evaluation stops as soon as "weak head normal form" (WHNF) is reached, i.e. as soon as the outermost constructor appears in the result. For list values, this means that they are reduced to either [] or unevaluatedHead : unevaluatedTail.
Because of this, top `par` bot `pseq` ... only parallelizes the evaluation of the first cell of the lists, and not their full contents. The whole lists will only get evaluated after pseq when we sum them, but that is run on only one core.
To force the code to be parallel, we can parallelize the sums instead:
sumBot = sum bot
sumTop = sum top
out = sumBot `par` sumTop `pseq` sumBot + sumTop
Since evaluating the sums to WHNF requires evaluating the whole list, this should properly parallelize the computation.
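The difference between forcing a list and forcing its sum can be checked with base alone. The sketch below uses a list whose tail is an error; partial and forcedTail are names made up for this illustration. Evaluating the list to WHNF never touches the tail, while evaluating the sum has to walk the whole list and therefore hits the error.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A list whose first cell is defined but whose tail is bottom.
partial :: [Int]
partial = 1 : error "tail forced"

-- Evaluate the argument to WHNF and report whether that raised an exception.
forcedTail :: a -> IO Bool
forcedTail x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> const True (e :: SomeException)
    Right _ -> False

main :: IO ()
main = do
  forcedTail partial       >>= print  -- False: only the first (:) cell was evaluated
  forcedTail (sum partial) >>= print  -- True: the sum had to traverse the whole list
```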

Haskell explicit recursion vs `iterate`

While writing a function using iterate in Haskell, I found that an equivalent version with explicit recursion seemed noticeably faster - even though I believed that explicit recursion ought to be frowned upon in Haskell.
Similarly, I expected GHC to be able to inline/optimise list combinators appropriately so that the resulting machine code is at least similarly performing to the explicit recursion.
Here's a (different) example, which also displays the slowdown I observed.
steps m n and its variant steps' compute the number of Collatz steps n takes to reach 1, giving up after m attempts.
steps uses explicit recursion while steps' uses list functions.
import Data.List (elemIndex)
import Control.Exception (evaluate)
import Control.DeepSeq (rnf)
collatz :: Int -> Int
collatz n
  | even n    = n `quot` 2
  | otherwise = 3 * n + 1
steps :: Int -> Int -> Maybe Int
steps m = go 0
  where go k n
          | n == 1    = Just k
          | k == m    = Nothing
          | otherwise = go (k+1) (collatz n)
steps' :: Int -> Int -> Maybe Int
steps' m = elemIndex 1 . take m . iterate collatz
main :: IO ()
main = evaluate $ rnf $ map (steps 800) $ [1..10^7]
I tested these by evaluating for all values up to 10^7, each giving up after 800 steps. On my machine (compiled with ghc -O2), explicit recursion took just under 4 seconds (3.899s) but list combinators took about 5 times longer (19.922s).
Why is explicit recursion so much better in this case, and is there a way of writing this without explicit recursion while preserving performance?
Updated: I submitted Trac 15426 for this bug.
The problem disappears if you copy the definitions of elemIndex and findIndex into your module:
import Control.Exception (evaluate)
import Control.DeepSeq (rnf)
import Data.Maybe (listToMaybe)
import Data.List (findIndices)
elemIndex :: Eq a => a -> [a] -> Maybe Int
elemIndex x = findIndex (x==)
findIndex :: (a -> Bool) -> [a] -> Maybe Int
findIndex p = listToMaybe . findIndices p
collatz :: Int -> Int
collatz n
  | even n    = n `quot` 2
  | otherwise = 3 * n + 1
steps' :: Int -> Int -> Maybe Int
steps' m = elemIndex 1 . take m . iterate collatz
main :: IO ()
main = evaluate $ rnf $ map (steps' 800) $ [1..10^7]
The problem seems to be that these must be inlinable for GHC to get the fusion right. Unfortunately, neither of them is marked inlinable in Data.OldList.
The change to allow findIndex to participate in fusion is relatively recent (see Trac 14387) where listToMaybe was reimplemented as a foldr. So, it probably hasn't seen a lot of testing yet.

Speeding up a stream like data type

I've made a type which is supposed to emulate a "stream". This is basically a list without memory.
data Stream a = forall s. Stream (s -> Maybe (a, s)) s
Basically a stream has two elements. A state s, and a function that takes the state, and returns an element of type a and the new state.
I want to be able to perform operations on streams, so I've imported Data.Foldable and defined streams on it as such:
import Data.Foldable
instance Foldable Stream where
  foldr k z (Stream sf s) = go (sf s)
    where
      go Nothing = z
      go (Just (e, ns)) = e `k` go (sf ns)
To test the speed of my stream, I've defined the following function:
mysum = foldl' (+) 0
And now we can compare the speed of ordinary lists and my stream type:
x1 = [1..n]
x2 = Stream (\s -> if (s == n + 1) then Nothing else Just (s, s + 1)) 1
--main = print $ mysum x1
--main = print $ mysum x2
My streams are about half the speed of lists (full code here).
Furthermore, here's a best case situation, without a list or a stream:
bestcase :: Int
bestcase = go 1 0 where
  go i c = if i == n then c + i else go (i+1) (c+i)
This is a lot faster than both the list and stream versions.
So I've got two questions:
How do I get my stream version to be at least as fast as a list?
How do I get my stream version to be close to the speed of bestcase?
As it stands the foldl' you are getting from Foldable is defined in terms of the foldr you gave it. The default implementation is the brilliant and surprisingly good
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
where f' x k z = k $! f z x
But foldl' is the specialty of your type; fortunately the Foldable class includes foldl' as a method, so you can just add this to your instance.
  -- needs the BangPatterns extension
  foldl' op acc0 (Stream sf s0) = loop s0 acc0
    where
      loop !s !acc = case sf s of
        Nothing -> acc
        Just (a,s') -> loop s' (op acc a)
For me this seems to give about the same time as bestcase.
Note that this is a standard case where we need a strictness annotation on the accumulator. You might look in the vector package's treatment of a similar type https://hackage.haskell.org/package/vector-0.10.12.2/docs/src/Data-Vector-Fusion-Stream.html for some ideas; or in the hidden 'fusion' modules of the text library https://github.com/bos/text/blob/master/Data/Text/Internal/Fusion .
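Putting the pieces from the question and this answer together, a complete module might look like the sketch below. The helper upTo is a hypothetical name introduced here; the extensions listed are the ones the existential type and the bang pattern need. Data.Foldable is imported qualified so the example compiles whether or not Prelude exports foldl'.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}

import qualified Data.Foldable as F

-- A stream is a step function together with its current state.
data Stream a = forall s. Stream (s -> Maybe (a, s)) s

instance Foldable Stream where
  foldr k z (Stream sf s0) = go (sf s0)
    where
      go Nothing        = z
      go (Just (e, ns)) = e `k` go (sf ns)

  -- Hand-written strict left fold: a tight loop over the state,
  -- with the accumulator forced on every step.
  foldl' op acc0 (Stream sf s0) = loop s0 acc0
    where
      loop s !acc = case sf s of
        Nothing      -> acc
        Just (a, s') -> loop s' (op acc a)

-- Hypothetical helper: the stream 1, 2, ..., n.
upTo :: Int -> Stream Int
upTo n = Stream (\s -> if s > n then Nothing else Just (s, s + 1)) 1

main :: IO ()
main = print (F.foldl' (+) 0 (upTo 1000000))  -- prints 500000500000
```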

Batching actions for caching and performance while avoiding the dirty work

Say I have two pure but unsafe functions that do the same thing, but one of them works on batches and is asymptotically faster:
f :: Int -> Result -- takes O(1) time
f = unsafePerformIO ...
g :: [Int] -> [Result] -- takes O(log n) time
g = unsafePerformIO ...
A naive implementation:
getUntil :: Int -> [Result]
getUntil 0 = [f 0]
getUntil n = f n : getUntil (n-1)
switch is the n value where g gets cheaper than f.
getUntil will in practice be called with ever increasing n, but it might not start at 0. So since the Haskell runtime can memoize getUntil, performance will be optimal if getUntil is called with an interval lower than switch. But once the interval gets larger, this implementation is slow.
In an imperative program, I guess I would make a TreeMap (which could quickly be checked for gaps) for caching all calls. On cache misses, it would get filled with the results of g, if the gap was greater than switch in length, and f otherwise, respectively.
How can this be optimized in Haskell?
I think I am just looking for:
an ordered map filled on-demand using a fill function that would fill all values up to the requested index using one function if the missing range is small, another if it is large
a get operation on the map which returns a list of all lower values up to the requested index. This would result in a function similar to getUntil above.
I'll elaborate in my proposal for using map, after some tests I just ran.
import System.IO
import System.IO.Unsafe
import Control.Concurrent
import Control.Monad
switch :: Int
switch = 1000
f :: Int -> Int
f x = unsafePerformIO $ do
  threadDelay $ 500 * x
  putStrLn $ "Calculated from scratch: f(" ++ show x ++ ")"
  return $ 500*x
g :: Int -> Int
g x = unsafePerformIO $ do
  threadDelay $ x*x `div` 2
  putStrLn $ "Calculated from scratch: g(" ++ show x ++ ")"
  return $ x*x `div` 2
cachedFG :: [Int]
cachedFG = map g [0 .. switch] ++ map f [switch+1 ..]
main :: IO ()
main = forever $ getLine >>= print . (cachedFG !!) . read
… where f, g and switch have the same meaning indicated in the question.
The above program can be compiled as is using GHC. When executed, positive integers can be entered, followed by a newline, and the application will print some value based on the number entered by the user plus some extra indication on what values are being calculated from scratch.
A short session with this program is:
User: 10000
Program: Calculated from scratch: f(10000)
Program: 5000000
User: 10001
Program: Calculated from scratch: f(10001)
Program: 5000500
User: 10000
Program: 5000000
^C
The program has to be killed/terminated manually.
Notice that the last value entered doesn't show a "calculated from scratch" message. This indicates that the program has the value cached/memoized somewhere. You can try executing this program yourself, but take into account that threadDelay's lag is proportional to the value entered.
The getUntil function then could be implemented using:
getUntil :: Int -> [Int]
getUntil n = take n cachedFG
or:
getUntil :: Int -> [Int]
getUntil = flip take cachedFG
If you don't know the value for switch, you can try evaluating f and g in parallel and use the fastest result, but that's another show.

Haskell mutable map/tree

I am looking for a mutable (balanced) tree/map/hash table in Haskell or a way how to simulate it inside a function. I.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
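memoize f = do
  mc <- newSTRef Map.empty
  return $ \k -> do
    c <- readSTRef mc
    case Map.lookup k c of
      Just a -> return a
      Nothing -> do a <- f k
                    -- modify the current cache rather than writing back c:
                    -- f k may itself have added entries since c was read
                    modifySTRef mc (Map.insert k a) >> return a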
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
  fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
    if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
  mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
  f' <- stToIO $ memoize $ return . f
  return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then write your code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
  where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to use) a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
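PointCloud.hs itself is not reproduced here. As a rough sketch of the same idea, assuming a stand-in computation (expensive and table are hypothetical names, and the "distance" is just the absolute difference of two indices), a lazy Data.Map can hold one unevaluated thunk per key. Debug.Trace writes a message to stderr the first time each value is actually computed.

```haskell
import qualified Data.Map as Map
import Debug.Trace (trace)

-- Hypothetical expensive computation; the trace fires only when it really runs.
expensive :: (Int, Int) -> Double
expensive (i, j) = trace ("<calc " ++ show (i, j) ++ ">") (fromIntegral (abs (i - j)))

-- Data.Map is strict in its keys but lazy in its values,
-- so each value stays an unevaluated thunk until first demanded.
table :: Map.Map (Int, Int) Double
table = Map.fromList [((i, j), expensive (i, j)) | i <- [1 .. 100], j <- [1 .. 100]]

main :: IO ()
main = do
  print (Map.lookup (1, 2) table)  -- first fetch: computes, then prints Just 1.0
  print (Map.lookup (1, 2) table)  -- second fetch: reuses the stored result
  print (Map.lookup (1, 5) table)  -- a different key: computed on demand, Just 4.0
```

Keys that are never looked up never have their values computed, which is the point made above.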
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
