Precise flow control in Haskell - algorithm

The Idea
Hello! I'm trying to implement an image processing library in Haskell, based on a dataflow ideology. I've got a problem connected to how I want to handle the flow of control.
The main idea is to introduce a time. The time is a Float, which can be accessed anywhere in the code (you can think of it like a State monad, but a little funnier). The funny thing about it is that we can use a timeShift operation on results, affecting the time the corresponding operations see.
An example would explain this situation best. Let's use the following dataflow diagram:
--            -- timeShift(*2) --
--           /                   \
-- readImage                      addImages -> out
--           \                   /
--            -- blur -----------
and its pseudocode (which does not typecheck - it's not important whether we use do or let notation here, the idea should be clear):
test = do
    f     <- frame
    a     <- readImage $ "test" + show f + ".jpg"
    aBlur <- blur a
    a'    <- a.timeShift(*2)
    out   <- addImage aBlur a'

main = print =<< runStateT test 5
The 5 is the time we want to run the test function with. The timeShift function affects all the operations to the left of it (in the dataflow diagram) - in this case the function readImage would be run twice, once for each branch - the lower one would use frame 5 and the upper one frame 5*2 = 10.
The problem
I'm providing here a very simple implementation that works great, but has some caveats I want to solve. The problem is that I want to keep the order of all the IO operations. Look at the example at the bottom, which will clarify what I mean.
Sample implementation
Below is a sample implementation of the algorithm and code which constructs the following dataflow graph:
-- A --- blur --- timeShift(*2) --
--                                \
--                                 addImages -> out
--                                /
-- B --- blur --------------------
the code:
import Control.Monad.State
-- for simplicity, let's assume an Image is just a String
type Image = String
imagesStr = ["a0","b1","c2","d3","e4","f5","g6","h7","i8","j9","k10","l11","m12","n13","o14","p15","q16","r17","s18","t19","u20","v21","w22","x23","y24","z25"]
images = "abcdefghjiklmnoprstuwxyz"
--------------------------------
-- Ordinary Image processing functions
blurImg' :: Image -> Image
blurImg' img = "(blur " ++ img ++ ")"
addImage' :: Image -> Image -> Image
addImage' img1 img2 = "(add " ++ img1 ++ " " ++ img2 ++ ")"
--------------------------------
-- Functions processing Images in States
readImage1 :: StateT Int IO Image
readImage1 = do
    t <- get
    liftIO . putStrLn $ "[1] reading image with time: " ++ show t
    return $ imagesStr !! t

readImage2 :: StateT Int IO Image
readImage2 = do
    t <- get
    liftIO . putStrLn $ "[2] reading image with time: " ++ show t
    return $ imagesStr !! t

blurImg :: StateT Int IO Image -> StateT Int IO Image
blurImg img = do
    i <- img
    liftIO $ putStrLn "blurring"
    return $ blurImg' i

addImage :: StateT Int IO Image -> StateT Int IO Image -> StateT Int IO Image
addImage img1 img2 = do
    i1 <- img1
    i2 <- img2
    liftIO $ putStrLn "adding images"
    return $ addImage' i1 i2

timeShift :: StateT Int IO Image -> (Int -> Int) -> StateT Int IO Image
timeShift img f = do
    t <- get
    put (f t)   -- shift the time seen by everything img does
    i <- img
    put t       -- restore the original time afterwards
    return i

test = out where
    i1  = readImage1
    j1  = readImage2
    i2  = blurImg i1
    j2  = blurImg j1
    i3  = timeShift i2 (*2)
    out = addImage i3 j2

main = do
    print =<< runStateT test 5
    print "end"
The output is:
[1] reading image with time: 10
blurring
[2] reading image with time: 5
blurring
adding images
("(add (blur k10) (blur f5))",5)
"end"
and should be:
[1] reading image with time: 10
[2] reading image with time: 5
blurring
blurring
adding images
("(add (blur k10) (blur f5))",5)
"end"
Please note that the correct output is ("(add (blur k10) (blur f5))",5) - which means, that we added image k10 to f5 - from respectively 10th and 5th frame.
Further requirements
I'm looking for a solution that would allow users to write simple code (like in the test function - it could of course be in a Monad), without making them handle the time-shifting logic by hand.
Conclusions
The only difference is the order in which the IO actions execute. I would love to preserve the order of the IO actions just as they are written in the test function. I was trying to implement the idea using Continuations, Arrows and some funny states, but without success.

Dataflow and functional reactive programming libraries in Haskell are usually written in terms of Applicative or Arrow. These are abstractions for computations that are less general than Monads - the Applicative and Arrow typeclasses do not expose a way for the structure of computations to depend on the results of other computations. As a result, libraries exposing only these typeclasses can reason about the structure of computations in the library independently of performing those computations. We will solve your problem in terms of the Applicative typeclass:
class Functor f => Applicative f where
    -- | Lift a value.
    pure :: a -> f a
    -- | Sequential application.
    (<*>) :: f (a -> b) -> f a -> f b
Applicative allows a library user to make new computations with pure, operate on existing computations with fmap (from Functor) and compose computations together with <*>, using the result of one computation as an input for another. It does not allow a library user to make a computation that makes another computation and then use the result of that computation directly; there's no way a user can write join :: f (f a) -> f a. This restriction will keep our library from running into the problem I described in my other answer.
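To make the boundary concrete, here's a small sketch (my illustration, not part of the library we are about to build): combining computations is possible with Applicative alone, while choosing the next computation from a result requires Monad.

import Control.Applicative (liftA2)

type Image = String

addImage' :: Image -> Image -> Image
addImage' a b = "(add " ++ a ++ " " ++ b ++ ")"

-- Possible with Applicative: the structure (two inputs, one combining step)
-- is fixed before anything runs, so a library can inspect and reorder it.
addImagesA :: Applicative f => f Image -> f Image -> f Image
addImagesA = liftA2 addImage'

-- Not possible with Applicative alone: running a computation chosen by the
-- result of another computation is exactly what join/(>>=) provide.
chooseImage :: Monad f => f Bool -> f Image -> f Image -> f Image
chooseImage cond a b = cond >>= \c -> if c then a else b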
Transformers, free, and the ApT transformer
Your example problem is quite involved, so we are going to pull out a bunch of high level Haskell tricks, and make a few new ones of our own. The first two tricks we are going to pull out are transformers and free data types. Transformers are types that take types with a kind like that of Functors, Applicatives or Monads and produce new types with the same kind.
Transformers typically look like the following Double example. Double can take any Functor or Applicative or Monad and make a version of it that always holds two values instead of one:
newtype Double f a = Double {runDouble :: f (a, a)}
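For instance, Double inherits a Functor instance from the underlying f by mapping over both components (a quick sketch of my own, to show how transformers lift structure):

instance Functor f => Functor (Double f) where
    fmap g (Double faa) = Double (fmap (\(x, y) -> (g x, g y)) faa)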
Free data types are transformers that do two things. First, given some simpler property of the underlying type, they gain new exciting properties for the transformed type. The Free Monad provides a Monad given any Functor, and the free Applicative, Ap, makes an Applicative out of any Functor. The other thing "free" types do is "free" the implementation of the interpreter as much as possible. Here are the types for the free Applicative, Ap, the free Monad, Free, and the free monad transformer, FreeT. The free monad transformer provides a monad transformer for "free" given a Functor:
-- Free Applicative
data Ap f a where
    Pure :: a -> Ap f a
    Ap   :: f a -> Ap f (a -> b) -> Ap f b

-- Base functor of the free monad transformer
data FreeF f a b
    = Pure a
    | Free (f b)

-- Free monad transformer
newtype FreeT f m a = FreeT {runFreeT :: m (FreeF f a (FreeT f m a))}

-- The free monad is the free monad transformer applied to the Identity monad
type Free f = FreeT f Identity
Here's a sketch of our goal - we want to provide an Applicative interface for combining computations, which, at the bottom, allows Monadic computations. We want to "free" the interpreter as much as possible so that it can hopefully reorder computations. To do this, we will be combining both the free Applicative and the free monad transformer.
We want an Applicative interface, and the easiest one to make is the one we can get for "free", which aligns nicely with our goal of "freeing the interpreter" as much as possible. This suggests our type is going to look like
Ap f a
for some Functor f and any a. We'd like the underlying computation to be over some Monad, and Monads are functors, but we'd like to "free" the interpreter as much as possible. We'll grab the free monad transformer as the underlying functor for Ap, giving us
Ap (FreeT f m) a
for some Functor f, some Monad m, and any a. We know the Monad m is probably going to be IO, but we'll leave our code as generic as possible. We just need to provide the Functor for FreeT. All Applicatives are Functors, so Ap itself could be used for f; we'd write something like
type ApT m a = Ap (FreeT (ApT m) m) a
This gives the compiler fits, so instead we'll move the Ap inside and define
newtype ApT m a = ApT {unApT :: FreeT (Ap (ApT m)) m a}
We'll derive some instances for this and discuss its real motivation after an interlude.
Interlude
To run all of this code, you'll need the following. The Data.Map and Control.Concurrent imports are only needed for sharing computations; more on that much later.
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where
import Control.Monad.Trans.Class
import Control.Monad.IO.Class
import Control.Monad.Trans.Reader
import Control.Applicative
import Control.Applicative.Free hiding (Pure)
import qualified Control.Applicative.Free as Ap (Ap(Pure))
import Control.Monad.Trans.Free
import qualified Data.Map as Map
import Control.Concurrent
Stuffing it
I misled you in the previous section, and pretended to discover ApT by reasoning about the problem. I actually discovered ApT by trying anything and everything to stuff Monadic computations into an Applicative and be able to control their order when they came out. For a long time, I was trying to solve how to implement mapApM (below) in order to write flipImage (my replacement for your blur). Here's the ApT Monad transformer in all its glory. It's intended to be used as the Functor for an Ap and, by using Ap as its own Functor for FreeT, can magically stuff values into an Applicative in a way that shouldn't seem possible.
newtype ApT m a = ApT {unApT :: FreeT (Ap (ApT m)) m a}
    deriving (Functor, Applicative, Monad, MonadIO)
It could derive even more instances from FreeT; these are just the ones we need. It can't derive MonadTrans, but we can do that ourselves:
instance MonadTrans ApT where
    lift = ApT . lift
runApT :: ApT m a -> m (FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a))
runApT = runFreeT . unApT
The real beauty of ApT is we can write some seemingly impossible code like
stuffM :: (Functor m, Monad m) => m (ApT m a) -> ApT m a
stuffMAp :: (Functor m, Monad m) => m (ApT m a) -> Ap (ApT m) a
The m on the outside disappears, even into Ap, which is merely Applicative.
This works because of the following cycle of functions, each of which can stuff the output from the function above it into the input of the function below it. The first function starts with an ApT m a, and the last one ends with one. (These definitions aren't part of the program)
liftAp' :: ApT m a ->
           Ap (ApT m) a
liftAp' = liftAp

fmapReturn :: (Monad m) =>
              Ap (ApT m) a ->
              Ap (ApT m) (FreeT (Ap (ApT m)) m a)
fmapReturn = fmap return

free' :: Ap (ApT m) (FreeT (Ap (ApT m)) m a) ->
         FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a)
free' = Free

pure' :: a ->
         FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a)
pure' = Pure

return' :: (Monad m) =>
           FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a) ->
           m (FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a))
return' = return

freeT :: m (FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a)) ->
         FreeT (Ap (ApT m)) m a
freeT = FreeT

apT :: FreeT (Ap (ApT m)) m a ->
       ApT m a
apT = ApT
This lets us write
-- Get rid of an Ap by stuffing it into an ApT.
stuffAp :: (Monad m) => Ap (ApT m) a -> ApT m a
stuffAp = ApT . FreeT . return . Free . fmap return
-- Stuff ApT into Free
stuffApTFree :: (Monad m) => ApT m a -> FreeF (Ap (ApT m)) a (FreeT (Ap (ApT m)) m a)
stuffApTFree = Free . fmap return . liftAp
-- Get rid of an m by stuffing it into an ApT
stuffM :: (Functor m, Monad m) => m (ApT m a) -> ApT m a
stuffM = ApT . FreeT . fmap stuffApTFree
-- Get rid of an m by stuffing it into an Ap
stuffMAp :: (Functor m, Monad m) => m (ApT m a) -> Ap (ApT m) a
stuffMAp = liftAp . stuffM
And some utility functions for working on a transformer stack
mapFreeT :: (Functor f, Functor m, Monad m) => (m a -> m b) -> FreeT f m a -> FreeT f m b
mapFreeT f fa = do
    a <- fa
    FreeT . fmap Pure . f . return $ a
mapApT :: (Functor m, Monad m) => (m a -> m b) -> ApT m a -> ApT m b
mapApT f = ApT . mapFreeT f . unApT
mapApM :: (Functor m, Monad m) => (m a -> m b) -> Ap (ApT m) a -> Ap (ApT m) b
mapApM f = liftAp . mapApT f . stuffAp
We'd like to start writing our example image processors, but first we need to take another diversion to address a hard requirement.
A hard requirement - input sharing
Your first example shows
--            -- timeShift(*2) --
--           /                   \
-- readImage                      addImages -> out
--           \                   /
--            -- blur -----------
implying that the result of readImage should be shared between blur and timeShift(*2). I take this to mean that the results of readImage should only be computed once for each time.
Applicative isn't powerful enough to capture this. We'll make a new typeclass to represent computations whose output can be divided into multiple identical streams.
-- The class of things where input can be shared and divided among multiple parts
class Applicative f => Divisible f where
    (<\>) :: (f a -> f b) -> f a -> f b
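To see what this buys us, here is how usage will look (this is example1 from the end of the answer; readImage, addImages, timeShift and flipImage are defined in the image-processor section below). The single readImage 1 computation is fed to both uses of i:

shared = (\i -> addImages [timeShift (*2) i, flipImage i]) <\> readImage 1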
We'll make a transformer that adds this capability to existing Applicatives
-- A transformer that adds input sharing
data LetT f a where
    NoLet :: f a -> LetT f a
    Let   :: LetT f b -> (LetT f b -> LetT f a) -> LetT f a
And provide some utility functions and instances for it
liftLetT :: f a -> LetT f a
liftLetT = NoLet

mapLetT :: (f a -> f b) -> LetT f a -> LetT f b
mapLetT f = go
  where
    go (NoLet a) = NoLet (f a)
    go (Let b g) = Let b (go . g)

instance (Applicative f) => Functor (LetT f) where
    fmap f = mapLetT (fmap f)

-- I haven't checked that these obey the Applicative laws.
instance (Applicative f) => Applicative (LetT f) where
    pure = NoLet . pure
    NoLet f <*> a = mapLetT (f <*>) a
    Let c h <*> a = Let c ((<*> a) . h)

instance (Applicative f) => Divisible (LetT f) where
    (<\>) = flip Let
Image processors
With all of our transformers in place, we can start writing our image processors. At the bottom of our stack we have our ApT from an earlier section
Ap (ApT IO)
The computations need to be able to read the time from the environment, so we'll add a ReaderT for that
ReaderT Int (Ap (ApT IO))
Finally, we'd like to be able to share computations, so we'll add our LetT transformer on top, giving the entire type IP for our image processors
type Image = String
type IP = LetT (ReaderT Int (Ap (ApT IO)))
We'll read images from IO. getLine makes fun interactive examples.
readImage :: Int -> IP Image
readImage n = liftLetT $ ReaderT (\t -> liftAp . liftIO $ do
    putStrLn $ "[" ++ show n ++ "] reading image for time: " ++ show t
    --getLine
    return $ "|image [" ++ show n ++ "] for time: " ++ show t ++ "|"
    )
We can shift the time of inputs
timeShift :: (Int -> Int) -> IP a -> IP a
timeShift f = mapLetT shift
  where
    shift (ReaderT g) = ReaderT (g . f)
Add multiple images together
addImages :: Applicative f => [f Image] -> f Image
addImages = foldl (liftA2 (++)) (pure [])
And flip images pretending to use some library that's stuck in IO. I couldn't figure out how to blur a string...
inIO :: (IO a -> IO b) -> IP a -> IP b
inIO = mapLetT . mapReaderT . mapApM
flipImage :: IP [a] -> IP [a]
flipImage = inIO flip'
  where
    flip' ma = do
        a <- ma
        putStrLn "flipping"
        return . reverse $ a
Interpreting LetT
Our LetT for sharing results is at the top of our transformer stack. We'll need to interpret it to get at the computations underneath it. To interpret LetT we will need a way to share results in IO, which memoize provides, and an interpreter that removes the LetT transformer from the top of the stack.
To share computations we need to store them somewhere. The following memoizes an IO computation in IO, making sure it happens only once even across multiple threads:
memoize :: (Ord k) => (k -> IO a) -> IO (k -> IO a)
memoize definition = do
    cache <- newMVar Map.empty
    let populateCache k map = do
            case Map.lookup k map of
                Just a  -> return (map, a)
                Nothing -> do
                    a <- definition k
                    return (Map.insert k a map, a)
    let fromCache k = do
            map <- readMVar cache
            case Map.lookup k map of
                Just a  -> return a
                Nothing -> modifyMVar cache (populateCache k)
    return fromCache
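Here's a small usage sketch for memoize (my example; the printing definition stands in for an expensive computation). Each key's definition runs at most once:

memoDemo :: IO ()
memoDemo = do
    square <- memoize $ \k -> do
        putStrLn ("computing " ++ show k)
        return (k * k :: Int)
    a <- square 3   -- prints "computing 3"
    b <- square 3   -- answered from the cache; prints nothing
    print (a, b)    -- (9,9)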
In order to interpret a Let, we need an evaluator for the underlying ApT IO to incorporate into the definitions for the Let bindings. Since the result of computations depends on the environment read from the ReaderT, we will incorporate dealing with the ReaderT into this step. A more sophisticated approach would use transformer classes, but transformer classes for Applicative is a topic for a different question.
compileIP :: (forall x. ApT IO x -> IO x) -> IP a -> IO (Int -> ApT IO a)
compileIP eval (NoLet (ReaderT f)) = return (stuffAp . f)
compileIP eval (Let b lf) = do
    cb <- compileIP eval b
    mb <- memoize (eval . cb)
    compileIP eval . lf . NoLet $ ReaderT (liftAp . lift . mb)
Interpreting ApT
Our interpreter uses the following State to avoid needing to peek inside ApT, FreeT, and FreeF all the time.
data State m a where
    InPure :: a -> State m a
    InAp   :: State m b -> State m (b -> State m a) -> State m a
    InM    :: m a -> State m a

instance Functor m => Functor (State m) where
    fmap f (InPure a)  = InPure (f a)
    fmap f (InAp b sa) = InAp b (fmap (fmap (fmap f)) sa)
    fmap f (InM m)     = InM (fmap f m)
Interpreting Ap is harder than it looks. The goal is to take data that's in Ap.Pure and put it in InPure, and data that's in Ap and put it in InAp. interpretAp actually needs to call itself with a larger type each time it goes into a deeper Ap; the function keeps picking up another argument. The first argument t provides a way to simplify these otherwise exploding types.
interpretAp :: (Functor m) => (a -> State m b) -> Ap m a -> State m b
interpretAp t (Ap.Pure a) = t a
interpretAp t (Ap mb ap)  = InAp sb sf
  where
    sb = InM mb
    sf = interpretAp (InPure . (t .)) ap
interpretApT gets data out of ApT, FreeT, and FreeF and into State m:
interpretApT :: (Functor m, Monad m) => ApT m a -> m (State (ApT m) a)
interpretApT = fmap inAp . runApT
  where
    inAp (Pure a)  = InPure a
    inAp (Free ap) = interpretAp (InM . ApT) ap
With these simple interpreting pieces we can make strategies for interpreting results. Each strategy is a function from the interpreter's State to a new State, with possible side effects happening on the way. The order in which a strategy chooses to execute side effects determines the order of the side effects. We'll make two example strategies.
The first strategy performs only one step on everything that's ready to be computed, and combines results when they are ready. This is probably the strategy that you want.
stepFB :: (Functor m, Monad m) => State (ApT m) a -> m (State (ApT m) a)
stepFB (InM ma)   = interpretApT ma
stepFB (InPure a) = return (InPure a)
stepFB (InAp b f) = do
    sf <- stepFB f
    sb <- stepFB b
    case (sf, sb) of
        (InPure f', InPure b') -> return (f' b')
        _                      -> return (InAp sb sf)
This other strategy performs all the calculations as soon as it knows about them. It performs them all in a single pass.
allFB :: (Functor m, Monad m) => State (ApT m) a -> m (State (ApT m) a)
allFB (InM ma)   = interpretApT ma
allFB (InPure a) = return (InPure a)
allFB (InAp b f) = do
    sf <- allFB f
    sb <- allFB b
    case (sf, sb) of
        (InPure f', InPure b') -> return (f' b')
        _                      -> allFB (InAp sb sf)
Many, many other strategies are possible.
We can evaluate a strategy by running it until it produces a single result.
untilPure :: (Monad m) => (State f a -> m (State f a)) -> State f a -> m a
untilPure s = go
  where
    go state =
        case state of
            InPure a -> return a
            _        -> s state >>= go
Executing the interpreter
To execute the interpreter, we need some example data. Here are a few interesting examples.
example1 = (\i -> addImages [timeShift (*2) i, flipImage i]) <\> readImage 1
example1' = (\i -> addImages [timeShift (*2) i, flipImage i, flipImage . timeShift (*2) $ i]) <\> readImage 1
example1'' = (\i -> readImage 2) <\> readImage 1
example2 = addImages [timeShift (*2) . flipImage $ readImage 1, flipImage $ readImage 2]
The LetT interpreter needs to know what evaluator to use for bound values, so we'll define our evaluator only once. A single interpretApT kicks off the evaluation by finding the initial State of the interpreter.
evaluator :: ApT IO x -> IO x
evaluator = (>>= untilPure stepFB) . interpretApT
We'll compile example2, which is essentially your example, and run it for time 5.
main = do
    f <- compileIP evaluator example2
    a <- evaluator . f $ 5
    print a
Which produces almost the desired result, with all reads happening before any flips.
[2] reading image for time: 5
[1] reading image for time: 10
flipping
flipping
"|01 :emit rof ]1[ egami||5 :emit rof ]2[ egami|"

A Monad can not reorder the component steps that make up img1 and img2 in
addImage :: (Monad m) => m [i] -> m [i] -> m [i]
addImage img1 img2 = do
    i1 <- img1
    i2 <- img2
    return $ i1 ++ i2
if there exists any m [i] whose result depends on a side effect. Any MonadIO m has an m [i] whose result depends on a side effect; therefore you cannot reorder the component steps of img1 and img2.
The above desugars to
addImage :: (Monad m) => m [i] -> m [i] -> m [i]
addImage img1 img2 =
    img1 >>=
        (\i1 ->
            img2 >>=
                (\i2 ->
                    return (i1 ++ i2)
                )
        )
Let's focus on the first >>= (remembering that (>>=) :: forall a b. m a -> (a -> m b) -> m b). Specialized for our type, this is (>>=) :: m [i] -> ([i] -> m [i]) -> m [i]. If we are going to implement it, we'd have to write something like
(img1 :: m [i]) >>= (f :: [i] -> m [i]) = ...
In order to do anything with f, we need to pass it an [i]. The only correct [i] we have is stuck inside img1 :: m [i]. We need the result of img1 to do anything with f. There are now two possibilities. We either can or can not determine the result of img1 without executing its side effects. We will examine both cases, starting with when we can not.
can not
When we can not determine the result of img1 without executing its side effects, we have only one choice - we must execute img1 and all of its side effects. We now have an [i], but all of img1's side effects have already been executed. There's no way we can execute any of the side effects from img2 before some of the side effects of img1, because the side effects of img1 have already happened.
can
If we can determine the result of img1 without executing its side effects, we're in luck. We find the result of img1 and pass that to f, getting a new m [i] holding the result we want. We can now examine the side effects of both img1 and the new m [i] and reorder them (although there's a huge caveat here about the associative law for >>=).
the problem at hand
As this applies to our case, for any MonadIO, there exists the following, whose result can not be determined without executing its side effects, placing us firmly in the can not case where we can not re-order side effects.
counterExample :: (MonadIO m) => m String
counterExample = liftIO getLine
There are also many other counterexamples, such as anything like readImage1 or readImage2 that must actually read the image from IO.

Related

Is runInBoundThread the best tool for parallelism?

Say, I want to fold monoids in parallel. My computer has 8 cores. I have this function to split a list into equal-sized smaller lists (with bounded modulo-bias):
import Data.List
parallelize :: Int -> [a] -> [[a]]
parallelize 0 _  = []
parallelize n [] = replicate n []
parallelize n xs = let
        (us,vs) = splitAt (quot (length xs) n) xs
    in us : parallelize (n-1) vs
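For example, tracing the definition by hand (my check, not from the original post):

parallelize 3 [1..7]  -- [[1,2],[3,4],[5,6,7]]: quot 7 3 = 2, so the leftover elements accumulate in the final chunk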
The first version of parallel fold I made was:
import Control.Concurrent
import Control.Concurrent.QSemN
import Data.Foldable
import Data.IORef
foldP :: Monoid m => [m] -> IO m
foldP xs = do
    result <- newIORef mempty
    sem <- newQSemN 0
    n <- getNumCapabilities
    let yss = parallelize n xs
    for_ yss (\ys -> forkIO (modifyIORef result (fold ys <>) >> signalQSemN sem 1))
    waitQSemN sem n
    readIORef result
But usage of IORefs and semaphores seemed ugly to me. So I made another version:
import Data.Traversable
foldP :: Monoid m => [m] -> IO m
foldP xs = do
    n <- getNumCapabilities
    let yss = parallelize n xs
    rs <- for yss (\ys -> runInUnboundThread (return (fold ys)))
    return (fold rs)
The test code I used is:
import Data.Monoid
import System.CPUTime
main :: IO ()
main = do
    start <- getCPUTime
    Product result <- foldP (fmap Product [1 .. 100])
    end <- getCPUTime
    putStrLn ("Time took: " ++ show (end - start) ++ "ps.")
    putStrLn ("Result: " ++ show result)
The second version of foldP outperformed the first version. When I used runInBoundThread instead of runInUnboundThread, it became even faster.
What causes these performance differences?
TL;DR: Use the fold function from the massiv package and you will likely get the most efficient solution in Haskell.
I would like to start by saying that the first thing people forget when trying to implement concurrent patterns like this is exception handling. In the solution from the question the exception handling is non-existent, and therefore it is totally wrong. I'd recommend using existing implementations for common concurrency patterns instead. async is the go-to library for concurrency, but for such a use case it will not be the most efficient solution.
This particular example can easily be solved with the scheduler package; in fact, it is exactly the kind of thing it was designed for. Here is how you can use it to achieve folding of monoids:
import Control.Scheduler
import Control.Monad.IO.Unlift
import Data.Foldable (fold)

foldP :: (MonadUnliftIO m, Monoid n) => Comp -> [n] -> m n
foldP comp xs = do
    rs <-
        withScheduler comp $ \scheduler ->
            mapM_ (scheduleWork scheduler . pure . fold) (parallelize (numWorkers scheduler) xs)
    pure $ fold rs
See the Comp type for an explanation of the available parallelization strategies. From what I've found in practice, Par will usually work best, because it uses pinned threads created with forkOn.
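A usage sketch under the definitions above (my example; Sum is the ordinary monoid from Data.Monoid, and parallelize is the helper from the question):

import Data.Monoid (Sum (..))

main :: IO ()
main = do
    Sum total <- foldP Par (map Sum [1 .. 1000 :: Int])
    print total  -- 500500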
Note that the parallelize function is implemented inefficiently and dangerously as well; it is better to write it this way:
parallelize :: Int -> [a] -> [[a]]
parallelize n' xs' = go 0 id xs'
  where
    n = max 1 n'
    -- at least two elements make sense to get benefit of parallel fold
    k = max 2 $ quot (length xs') n
    go i acc xs
        | null xs = acc []
        | i < n =
            case splitAt k xs of
                (ls, rs) -> go (i + 1) (acc . (ls :)) rs
        | otherwise = acc . (xs :) $ []
One more bit of advice: a list is far from an ideal data structure for parallelization and efficiency in general. In order to split the list into chunks before parallelizing the computation you already have to traverse the whole structure with parallelize, which can be avoided if you use an array. What I am getting at is: use an array instead, as suggested at the beginning of this answer.

Why does performance drop when a function is moved to another module?

I am observing that the same function gives different performance depending on whether it's placed in the same module where it's used or in a module next to it. Any ideas what could be causing this?
Here's the function:
https://github.com/oshyshko/polymodperf/blob/master/src/Main.hs#L41-L55
test :: MArray a t m => (t -> t) -> a Int t -> m ()
test f a =
    mapM_ (\xy -> do
              v <- get a xy
              set a xy (f v))
          [ (x,y) | y <- [0..1000 - 1],
                    x <- [0..1000 - 1],
                    n <- [0..10]]
  where
    get :: MArray a e m => a Int e -> (Int, Int) -> m e
    get a (x,y) = readArray a (x + y * 1000)

    set :: MArray a e m => a Int e -> (Int, Int) -> e -> m ()
    set a (x,y) = writeArray a (x + y * 1000)
In my test pass I use Data.Array.IO.newArray to create an array, then pass it to test.
Here's how to observe the difference in performance (second value, ms):
$ ./scripts/build-exec.sh
...
Main.test
(11000000,2010)
(11000000,239)
(11000000,240)
(11000000,242)
(11000000,237)
SomeModule.test
(11000000,6376)
(11000000,4851)
(11000000,5455)
(11000000,5096)
(11000000,5206)
Main.test: newArray and test both live in Main => okay performance (the first 2010 ms run is probably bad due to warmup, but the rest look good)
SomeModule.test: newArray lives in Main, but test is imported from SomeModule => much worse performance
The code of test is identical in both modules:
https://github.com/oshyshko/polymodperf/blob/master/src/Main.hs#L41-L55
https://github.com/oshyshko/polymodperf/blob/master/src/SomeModule.hs#L9-L17
The MArray typeclass and the functions readArray and writeArray are imported from the same module in both cases:
import Data.Array.MArray (MArray, readArray, writeArray)
Any ideas what could be causing the difference in performance?
As leftaroundabout suggested in a comment, adding an INLINE pragma solved the problem:
test :: MArray a t m => (t -> t) -> a Int t -> m ()
{-# INLINE test #-}
test f a =
...
https://github.com/oshyshko/polymodperf/blob/master/src/SomeModule.hs#L10
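For context on why the pragma matters: test is polymorphic in its MArray dictionary, and GHC can only specialize it to the concrete array type when its unfolding is visible, which happens automatically inside the defining module but not across module boundaries. If full inlining is too heavy-handed, marking the function {-# INLINABLE test #-} or specializing it at the type it's actually used at should have a similar effect (a sketch, assuming the arrays are IOArray Int Int from Data.Array.IO):

{-# SPECIALIZE test :: (Int -> Int) -> IOArray Int Int -> IO () #-}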

How to optimize this Haskell code summing up the primes in sublinear time?

Problem 10 from Project Euler is to find the sum of all the primes below given n.
I solved it simply by summing up the primes generated by the sieve of Eratosthenes. Then I came across a much more efficient solution by Lucy_Hedgehog (sub-linear!).
For n = 2⋅10^9:
Python code (from the quote above) runs in 1.2 seconds in Python 2.7.3.
C++ code (mine) runs in about 0.3 seconds (compiled with g++ 4.8.4).
I re-implemented the same algorithm in Haskell, since I'm learning it:
import Data.List
import Data.Map (Map, (!))
import qualified Data.Map as Map

problem10 :: Integer -> Integer
problem10 n = (sieve (Map.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where vs = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
        r  = floor (sqrt (fromIntegral n))

sieve :: Map Integer Integer -> Integer -> Integer -> [Integer] -> Map Integer Integer
sieve m p r vs | p > r     = m
               | otherwise = sieve (if m ! p > m ! (p - 1) then update m vs p else m) (p + 1) r vs

update :: Map Integer Integer -> [Integer] -> Integer -> Map Integer Integer
update m vs p = foldl' decrease m (map (\v -> (v, sumOfSieved m v p)) (takeWhile (>= p*p) vs))

decrease :: Map Integer Integer -> (Integer, Integer) -> Map Integer Integer
decrease m (k, v) = Map.insertWith (flip (-)) k v m

sumOfSieved :: Map Integer Integer -> Integer -> Integer -> Integer
sumOfSieved m v p = p * (m ! (v `div` p) - m ! (p - 1))

main = print $ problem10 $ 2*10^9
I compiled it with ghc -O2 10.hs and run with time ./10.
It gives the correct answer, but takes about 7 seconds.
I compiled it with ghc -prof -fprof-auto -rtsopts 10 and run with ./10 +RTS -p -h.
10.prof shows that decrease takes 52.2% time and 67.5% allocations.
After running hp2ps 10.hp I got a heap profile (not reproduced here); again it looks like decrease takes most of the heap. GHC version 7.6.3.
How would you optimize run time of this Haskell code?
Update 13.06.17:
I tried replacing immutable Data.Map with mutable Data.HashTable.IO.BasicHashTable from the hashtables package, but I'm probably doing something bad, since for tiny n = 30 it already takes too long, about 10 seconds. What's wrong?
Update 18.06.17:
The discussion "Curious about HashTable performance issues" is a good read. I took Shersh's code using mutable Data.HashTable.ST.Linear, but dropped Data.Judy in instead. It runs in 1.1 seconds, still relatively slow.
I've done some small improvements so it runs in 3.4-3.5 seconds on my machine.
Using IntMap.Strict helped a lot. Other than that I just manually performed some GHC optimizations, just to be sure, and made the Haskell code closer to the Python code from your link. As a next step you could try to use some mutable HashMap. But I'm not sure... IntMap can't be much faster than some mutable container, because it's an immutable one. Though I'm still surprised by its efficiency. I hope this can be implemented faster.
Here is the code:
import Data.List (foldl')
import Data.IntMap.Strict (IntMap, (!))
import qualified Data.IntMap.Strict as IntMap

p :: Int -> Int
p n = (sieve (IntMap.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where vs = [n `div` i | i <- [1..r]] ++ [n', n' - 1 .. 1]
        r  = floor (sqrt (fromIntegral n) :: Double)
        n' = n `div` r - 1

sieve :: IntMap Int -> Int -> Int -> [Int] -> IntMap Int
sieve m' p' r vs = go m' p'
  where
    go m p | p > r               = m
           | m ! p > m ! (p - 1) = go (update m vs p) (p + 1)
           | otherwise           = go m (p + 1)

update :: IntMap Int -> [Int] -> Int -> IntMap Int
update s vs p = foldl' decrease s (takeWhile (>= p2) vs)
  where
    sp = s ! (p - 1)
    p2 = p * p
    sumOfSieved v = p * (s ! (v `div` p) - sp)
    decrease m v = IntMap.adjust (subtract $ sumOfSieved v) v m

main :: IO ()
main = print $ p $ 2*10^(9 :: Int)
UPDATE:
Using mutable hashtables I've managed to get the performance to ~5.5 sec in Haskell with this implementation.
Also, I used unboxed vectors instead of lists in several places. Linear hashing seems to be the fastest. I think this can be done even faster. I noticed an sse42 option in the hashtables package. I'm not sure I've managed to set it correctly, but even without it, it runs that fast.
UPDATE 2 (19.06.2017)
I've managed to make it 3x faster than the best solution from @Krom (using my code + his map) by dropping the Judy hashmap entirely. Instead, plain arrays are used. You can come up with the same idea if you notice that the keys for the S hashmap are either a sequence from 1 to n' or n div i for i from 1 to r. So we can represent such a HashMap as two arrays, choosing which array to look up in depending on the key.
My code + Judy HashMap
$ time ./judy
95673602693282040
real 0m0.590s
user 0m0.588s
sys 0m0.000s
My code + my sparse map
$ time ./sparse
95673602693282040
real 0m0.203s
user 0m0.196s
sys 0m0.004s
This can be done even faster if, instead of IOUArray, pre-generated vectors from the vector library are used and readArray is replaced by unsafeRead. But I don't think this should be done unless you're really interested in optimizing this as much as possible.
Comparison with this solution is cheating and is not fair. I expect the same ideas implemented in Python and C++ will be even faster. But @Krom's solution with a closed hashmap is already cheating, because it uses a custom data structure instead of a standard one. At least you can see that the standard and most popular hash maps in Haskell are not that fast. Using better algorithms and better ad-hoc data structures can be better for such problems.
Here's the resulting code.
First as a baseline, the timings of the existing approaches
on my machine:
Original program posted in the question:
time stack exec primorig
95673602693282040
real 0m4.601s
user 0m4.387s
sys 0m0.251s
Second the version using Data.IntMap.Strict from
here
time stack exec primIntMapStrict
95673602693282040
real 0m2.775s
user 0m2.753s
sys 0m0.052s
Shersh's code with Data.Judy dropped in here
time stack exec prim-hash2
95673602693282040
real 0m0.945s
user 0m0.955s
sys 0m0.028s
Your Python solution.
I compiled it with
python -O -m py_compile problem10.py
and the timing:
time python __pycache__/problem10.cpython-36.opt-1.pyc
95673602693282040
real 0m1.163s
user 0m1.160s
sys 0m0.003s
Your C++ version:
$ g++ -O2 --std=c++11 p10.cpp -o p10
$ time ./p10
sum(2000000000) = 95673602693282040
real 0m0.314s
user 0m0.310s
sys 0m0.003s
I didn't bother to provide a baseline for slow.hs, as I didn't
want to wait for it to complete when run with an argument of
2*10^9.
Subsecond performance
The following program runs in under a second on my machine. It uses a hand-rolled hashmap, which uses closed hashing with linear probing and some variant of Knuth's hash function; see here.
Certainly it is somewhat tailored to the case, as the lookup
function for example expects the searched keys to be present.
Timings:
time stack exec prim
95673602693282040
real 0m0.725s
user 0m0.714s
sys 0m0.047s
First I implemented my hand-rolled hashmap simply by hashing the keys with

key `mod` size

and selected a size several times larger than the expected input, but the program took 22s or more to complete. Finally it was a matter of choosing a hash function which was good for the workload.
Here is the program:
import Data.Maybe
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead)

type Number = Int

data Map = Map { keys   :: IOUArray Int Number
               , values :: IOUArray Int Number
               , size   :: !Int
               , factor :: !Int
               }

newMap :: Int -> Int -> IO Map
newMap s f = do
    k <- newArray (0, s-1) 0
    v <- newArray (0, s-1) 0
    return $ Map k v s f

storeKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
storeKey arr s f key = go ((key * f) `mod` s)
  where
    go :: Int -> IO Int
    go ind = do
        v <- readArray arr ind
        go2 v ind
    go2 v ind
        | v == 0    = do { writeArray arr ind key; return ind; }
        | v == key  = return ind
        | otherwise = go ((ind + 1) `mod` s)

loadKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
loadKey arr s f key = s `seq` key `seq` go ((key * f) `mod` s)
  where
    go :: Int -> IO Int
    go ix = do
        v <- unsafeRead arr ix
        if v == key then return ix else go ((ix + 1) `mod` s)

insertIntoMap :: Map -> (Number, Number) -> IO Map
insertIntoMap m@(Map ks vs s f) (k, v) = do
    ix <- storeKey ks s f k
    writeArray vs ix v
    return m

fromList :: Int -> Int -> [(Number, Number)] -> IO Map
fromList s f xs = do
    m <- newMap s f
    foldM insertIntoMap m xs

(!) :: Map -> Number -> IO Number
(!) (Map ks vs s f) k = do
    ix <- loadKey ks s f k
    readArray vs ix

mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate (Map ks vs s fac) i f = do
    ix <- loadKey ks s fac i
    old <- readArray vs ix
    let x' = f old
    x' `seq` writeArray vs ix x'

r' :: Number -> Number
r' = floor . sqrt . fromIntegral

vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]

vss' n r = r + n `div` r - 1

list' :: Int -> Int -> [Number] -> IO Map
list' s f vs = fromList s f [(i, i * (i + 1) `div` 2 - 1) | i <- vs]

problem10 :: Number -> IO Number
problem10 n = do
    m <- list' (19*vss) (19*vss+7) vs
    nm <- sieve m 2 r vs
    nm ! n
  where vs  = vs' n r
        vss = vss' n r
        r   = r' n

sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
               | otherwise = do
                   v1 <- m ! p
                   v2 <- m ! (p - 1)
                   nm <- if v1 > v2 then update m vs p else return m
                   sieve nm (p + 1) r vs

update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs

decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
    v <- sumOfSieved m k p
    mupdate m k (subtract v)
    return m

sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
    v1 <- m ! (v `div` p)
    v2 <- m ! (p - 1)
    return $ p * (v1 - v2)

main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
I am not a professional with hashing and that sort of stuff, so this can certainly be improved a lot. Maybe we Haskellers should improve the off-the-shelf hash maps or provide some simpler ones.
My hashmap, Shersh's code
If I plug my hashmap into Shersh's code (see his answer above), see here,
we are even down to
time stack exec prim-hash2
95673602693282040
real 0m0.601s
user 0m0.604s
sys 0m0.034s
Why is slow.hs slow?
If you read through the source for the function insert in Data.HashTable.ST.Basic, you will see that it deletes the old key-value pair and inserts a new one. It doesn't look up the "place" for the value and mutate it, as one might imagine, if one reads that it is a "mutable" hashtable. Here the hashtable itself is mutable, so you don't need to copy the whole hashtable for insertion of a new key-value pair, but the value places for the pairs are not. I don't know if that is the whole story of slow.hs being slow, but my guess is, it is a pretty big part of it.
A few minor improvements
So that's the idea I followed while trying to improve
your program the first time.
See, you don't need a mutable mapping from keys to values.
Your key set is fixed. You want a mapping from keys to mutable
places. (Which is, by the way, what you get from C++ by default.)
And so I tried to come up with that. I used IntMap IORef from
Data.IntMap.Strict and Data.IORef first and got a timing
of
time stack exec prim
95673602693282040
real 0m2.134s
user 0m2.141s
sys 0m0.028s
I thought maybe it would help to work with unboxed values
and to get that, I used IOUArray Int Int with 1 element
each instead of IORef and got those timings:
time stack exec prim
95673602693282040
real 0m2.015s
user 0m2.018s
sys 0m0.038s
Not much of a difference and so I tried to get rid of bounds
checking in the 1 element arrays by using unsafeRead and
unsafeWrite and got a timing of
time stack exec prim
95673602693282040
real 0m1.845s
user 0m1.850s
sys 0m0.030s
which was the best I got using Data.IntMap.Strict.
Of course I ran each program multiple times to see if
the times are stable and the differences in run time aren't
just noise.
It looks like these are all just micro-optimizations.
And here is the program that ran fastest for me without using a hand rolled data structure:
import qualified Data.IntMap.Strict as M
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead, unsafeWrite)

type Number = Int
type Place  = IOUArray Number Number
type Map    = M.IntMap Place

tupleToRef :: (Number, Number) -> IO (Number, Place)
tupleToRef = traverse (newArray (0,0))

insertRefs :: [(Number, Number)] -> IO [(Number, Place)]
insertRefs = traverse tupleToRef

fromList :: [(Number, Number)] -> IO Map
fromList xs = M.fromList <$> insertRefs xs

(!) :: Map -> Number -> IO Number
(!) m i = unsafeRead (m M.! i) 0

mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate m i f = do
    let place = m M.! i
    old <- unsafeRead place 0
    let x' = f old
    -- make the application of f strict
    x' `seq` unsafeWrite place 0 x'

r' :: Number -> Number
r' = floor . sqrt . fromIntegral

vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]

list' :: [Number] -> IO Map
list' vs = fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]

problem10 :: Number -> IO Number
problem10 n = do
    m <- list' vs
    nm <- sieve m 2 r vs
    nm ! n
  where vs = vs' n r
        r  = r' n

sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
               | otherwise = do
                   v1 <- m ! p
                   v2 <- m ! (p - 1)
                   nm <- if v1 > v2 then update m vs p else return m
                   sieve nm (p + 1) r vs

update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs

decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
    v <- sumOfSieved m k p
    mupdate m k (subtract v)
    return m

sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
    v1 <- m ! (v `div` p)
    v2 <- m ! (p - 1)
    return $ p * (v1 - v2)

main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
If you profile that, you see that it spends most of the time in the custom lookup function (!); I don't know how to improve that further. Trying to inline (!) with {-# INLINE (!) #-} didn't yield better results; maybe GHC already did this.
This code of mine evaluates the sum to 2⋅10^9 in 0.3 seconds and the sum to 10^12 (18435588552550705911377) in 19.6 seconds (if given sufficient RAM).
import Control.DeepSeq
import qualified Control.Monad as ControlMonad
import qualified Data.Array as Array
import qualified Data.Array.ST as ArrayST
import qualified Data.Array.Base as ArrayBase

primeLucy :: (Integer -> Integer) -> (Integer -> Integer) -> Integer -> (Integer -> Integer)
primeLucy f sf n = g
  where
    r  = fromIntegral $ integerSquareRoot n
    ni = fromIntegral n
    loop from to c = let go i = ControlMonad.when (to<=i) (c i >> go (i-1)) in go from
    k = ArrayST.runSTArray $ do
        k <- ArrayST.newListArray (-r,r) $ force $
            [sf (div n (toInteger i)) - sf 1 | i <- [r,r-1..1]] ++
            [0] ++
            [sf (toInteger i) - sf 1 | i <- [1..r]]
        ControlMonad.forM_ (takeWhile (<=r) primes) $ \p -> do
            l <- ArrayST.readArray k (p-1)
            let q = force $ f (toInteger p)
            let adjust i j = do
                    v <- ArrayBase.unsafeRead k (i+r)
                    w <- ArrayBase.unsafeRead k (j+r)
                    ArrayBase.unsafeWrite k (i+r) $!! v + q*(l-w)
            loop (-1) (-div r p) $ \i -> adjust i (i*p)
            loop (-div r p-1) (-min r (div ni (p*p))) $ \i -> adjust i (div (-ni) (i*p))
            loop r (p*p) $ \i -> adjust i (div i p)
        return k
    g :: Integer -> Integer
    g m
        | m >= 1 && m <= integerSquareRoot n = k Array.! (fromIntegral m)
        | m >= integerSquareRoot n && m <= n && div n (div n m)==m = k Array.! (fromIntegral (negate (div n m)))
        | otherwise = error $ "Function not precalculated for value " ++ show m

primeSum :: Integer -> Integer
primeSum n = (primeLucy id (\m -> div (m*m+m) 2) n) n
Note that the code assumes integerSquareRoot and a list primes of the primes are in scope (e.g. from the arithmoi package). If your integerSquareRoot function is buggy (as reportedly some are), you can replace it here with floor . sqrt . fromIntegral.
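A usage sketch (my addition, assuming the helpers above are in scope; the expected output matches the value quoted in the other answers):

main :: IO ()
main = print (primeSum (2 * 10 ^ 9))  -- 95673602693282040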
Explanation:
As the name suggests it is based upon a generalization of the famous method by "Lucy_Hedgehog" eventually discovered by the original poster.
It allows you to calculate many sums of the form sum [f p | p <- primes, p <= N] (with p prime) without enumerating all the primes up to N, and in time O(N^0.75).
Its inputs are the function f (i.e., id if you want the prime sum), its summatory function over all the integers (i.e., in that case the sum of the first m integers, or div (m*m+m) 2), and N.
primeLucy returns a lookup function for m -> sum [f p | p <- primes, p <= m], restricted to certain values of m: those up to the square root of N, and those of the form N `div` i.
Try this and let me know how fast it is:
-- sum of primes
import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

sieve :: Int -> UArray Int Bool
sieve n = runSTUArray $ do
    let m = (n-1) `div` 2
        r = floor . sqrt $ fromIntegral n
    bits <- newArray (0, m-1) True
    forM_ [0 .. r `div` 2 - 1] $ \i -> do
        isPrime <- readArray bits i
        when isPrime $ do
            let a = 2*i*i + 6*i + 3
                b = 2*i*i + 8*i + 6
            forM_ [a, b .. (m-1)] $ \j -> do
                writeArray bits j False
    return bits

primes :: Int -> [Int]
primes n = 2 : [2*i+3 | (i, True) <- assocs $ sieve n]

main = do
    print $ sum $ primes 1000000
You can run it on ideone. My algorithm is the Sieve of Eratosthenes, and it should be quite fast for small n. For n = 2,000,000,000, the array size may be a problem, in which case you will need to use a segmented sieve. See my blog for more information about the Sieve of Eratosthenes. See this answer for information about a segmented sieve (but not in Haskell, unfortunately).

Infinite/Lazy Reservoir Sampling in Haskell

I tried to implement a simple reservoir sampling in Haskell following http://jeremykun.com/2013/07/05/reservoir-sampling/ (note that the algorithm shown is possibly semantically incorrect)
According to this: Iterative or Lazy Reservoir Sampling lazy reservoir sampling is impossible unless you know the population size ahead of time.
Even so, I'm not understanding why (operationally speaking) the below sampleReservoir doesn't work on infinite lists. Just where exactly is laziness broken?
import System.Random (randomRIO)

-- equivalent to python's enumerate
enumerate :: (Num i, Enum i) => i -> [e] -> [(i, e)]
enumerate start = zip [start..]

sampleReservoir stream =
    foldr
        (\(i, e) reservoir -> do
            -- randomRIO gets confused about 0.0 and 1.0
            r <- randomRIO (0.0, 1.0) :: IO Double
            if r < (1.0 / fromIntegral i) then
                fmap (e:) reservoir
            else
                reservoir)
        (return [])
        (enumerate 1 stream)
The challenge and test is fmap (take 1) $ sampleReservoir [1..].
Furthermore, if reservoir sampling can't be lazy, what can take in a lazy list and produce a sampled lazy list?
I get the idea that there must be a way of making the above function lazy in the output as well, because I could change this:
if r < (1.0 / fromIntegral i) then
    fmap (e:) reservoir
else

To:

if r < (1.0 / fromIntegral i) then
    do
        print e
        fmap (e:) reservoir
else
This shows results as the function is iterating over the list. Using coroutine abstraction, perhaps instead of print e there can be a yield e, and the rest of the computation can be held as a continuation.
The problem is that the IO monad maintains a strict sequence between actions. Writing fmap (e:) reservoir will first execute all of the effects associated with reservoir, which will be infinite if the input list is infinite.
I was able to fix this with liberal use of unsafeInterleaveIO, which allows you to break the semantics of IO:
import System.IO.Unsafe (unsafeInterleaveIO)

sampleReservoir2 :: [e] -> IO [e]
sampleReservoir2 stream =
    foldr
        (\(i, e) reservoir -> do
            -- randomRIO gets confused about 0.0 and 1.0
            r <- unsafeInterleaveIO $ randomRIO (0.0, 1.0) :: IO Double
            if r < (1.0 / fromIntegral i) then unsafeInterleaveIO $ do
                rr <- reservoir
                return (e:rr)
            else
                reservoir)
        (return [])
        (enumerate 1 stream)
Obviously, this will allow the interleaving of IO actions, but since all you're doing is generating random numbers it shouldn't matter. However, this solution isn't very satisfactory; the correct solution is to refactor your code somewhat. You should generate an infinite list of random numbers, then consume that infinite list (lazily) with foldr:
import Control.Monad.Random (MonadRandom, getRandomRs)

sampleReservoir3 :: MonadRandom m => [a] -> m [a]
sampleReservoir3 stream = do
    ws <- getRandomRs (0, 1 :: Double)
    return $ foldr
        (\(w, (i, e)) reservoir ->
            (if w < (1 / fromIntegral i) then (e:) else id) reservoir
        )
        []
        (zip ws $ enumerate 1 stream)
This can also (equivalently) be written as
import System.Random (newStdGen, randomRs)

sampleReservoir4 :: [a] -> IO [a]
sampleReservoir4 stream = do
    seed <- newStdGen
    let ws = randomRs (0, 1 :: Double) seed
    return $ foldr
        (\(w, (i, e)) reservoir ->
            (if w < (1 / fromIntegral i) then (e:) else id) reservoir
        )
        []
        (zip ws $ enumerate 1 stream)
As an aside, I'm not sure as to the correctness of the algorithm, since it seems to always return the first element of the input list first. Not very random.
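As a quick check of the laziness (my test, reusing the challenge from the question), this now terminates instead of forcing the whole infinite list:

main :: IO ()
main = do
    xs <- fmap (take 5) (sampleReservoir4 [1 :: Int ..])
    print xs  -- a 5-element prefix; the exact contents depend on the random draws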

Speeding up a stream like data type

I've made a type which is supposed to emulate a "stream". This is basically a list without memory.
data Stream a = forall s. Stream (s -> Maybe (a, s)) s  -- requires {-# LANGUAGE ExistentialQuantification #-}
Basically a stream has two components: a state s, and a function that takes the state and returns an element of type a together with the new state.
I want to be able to perform operations on streams, so I've imported Data.Foldable and made Stream an instance of it:
import Data.Foldable

instance Foldable Stream where
    foldr k z (Stream sf s) = go (sf s)
      where
        go Nothing        = z
        go (Just (e, ns)) = e `k` go (sf ns)
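With that instance, the standard Foldable functions work on streams; a quick sanity check (my example, using toList from Data.Foldable):

checkStream :: Bool
checkStream = toList (Stream step 1) == [1, 2, 3]  -- True
  where step s = if s > 3 then Nothing else Just (s, s + 1)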
To test the speed of my stream, I've defined the following function:
mysum = foldl' (+) 0
And now we can compare the speed of ordinary lists and my stream type:
x1 = [1..n]
x2 = Stream (\s -> if (s == n + 1) then Nothing else Just (s, s + 1)) 1
--main = print $ mysum x1
--main = print $ mysum x2
My streams are about half the speed of lists (full code here).
Furthermore, here's a best case situation, without a list or a stream:
bestcase :: Int
bestcase = go 1 0 where
    go i c = if i == n then c + i else go (i+1) (c+i)
This is a lot faster than both the list and stream versions.
So I've got two questions:
How do I get my stream version to be at least as fast as a list?
How do I get my stream version to be close to the speed of bestcase?
As it stands, the foldl' you are getting from Foldable is defined in terms of the foldr you gave it. The default implementation is the brilliant and surprisingly good
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x
But foldl' is the specialty of your type; fortunately the Foldable class includes foldl' as a method, so you can just add this to your instance.
-- the bang patterns require {-# LANGUAGE BangPatterns #-}
foldl' op acc0 (Stream sf s0) = loop s0 acc0
  where
    loop !s !acc = case sf s of
        Nothing      -> acc
        Just (a, s') -> loop s' (op acc a)
For me this seems to give about the same time as bestcase.
Note that this is a standard case where we need a strictness annotation on the accumulator. You might look in the vector package's treatment of a similar type https://hackage.haskell.org/package/vector-0.10.12.2/docs/src/Data-Vector-Fusion-Stream.html for some ideas; or in the hidden 'fusion' modules of the text library https://github.com/bos/text/blob/master/Data/Text/Internal/Fusion .
