Very slow guards in my monadic random implementation (haskell) - performance

I tried to write a random number generator implementation based on a number class, and I also added Monad and MonadPlus instances to it.
Why the MonadPlus instance? Because I want to use guards, like this:
-- test.hs --
import RandomMonad
import Control.Monad
import System.Random
x = Rand (randomR (1 ::Integer, 3)) ::Rand StdGen Integer
y = do
  a <- x
  guard (a /= 2)
  guard (a /= 1)
  return a
Here are the contents of RandomMonad.hs:
-- RandomMonad.hs --
module RandomMonad where
import Control.Monad
import System.Random
import Data.List
data RandomGen g => Rand g a = Rand (g -> (a, g)) | RandZero
instance (Show g, RandomGen g) => Monad (Rand g)
  where
    return x = Rand (\g -> (x, g))
    (RandZero) >>= _ = RandZero
    (Rand argTransformer) >>= (parametricRandom) = Rand funTransformer
      where
        funTransformer g | isZero x  = funTransformer g1
                         | otherwise = (getRandom x g1, getGen x g1)
          where
            x = parametricRandom val
            (val, g1) = argTransformer g
            isZero RandZero = True
            isZero _ = False
instance (Show g, RandomGen g) => MonadPlus (Rand g)
  where
    mzero = RandZero
    RandZero `mplus` x = x
    x `mplus` RandZero = x
    x `mplus` y = x
getRandom :: RandomGen g => Rand g a -> g -> a
getRandom (Rand f) g = (fst (f g))
getGen :: RandomGen g => Rand g a -> g -> g
getGen (Rand f) g = snd (f g)
When I run the GHCi interpreter and enter the following command:
getRandom y (mkStdGen 2000000000)
memory usage overflows on my computer (1 GB). I didn't expect that, and if I delete one of the guards, it runs very fast. Why is it so slow in this case?
What am I doing wrong?

Your definition of (>>=) is certainly wrong, but I cannot point to where because it is so complicated! Instead I will explain why it cannot be defined correctly using an example. Consider:
Rand (\g -> (42,g)) >>= const mzero
We need to get that 42 out, so we need a g. The place to get the g is from the return value of the bind, so the answer is definitely:
Rand (\g -> ...)
For some ..., responsible for returning a (b,g) pair. Now that we have 42, we can evaluate const mzero 42 and find that we have RandZero. But where are we going to get that b? It is nowhere (in fact, so nowhere in this example that it can be any type whatsoever, since the type of the expression is forall b. Rand g b).
What is the purpose of RandZero for your monad? Are you just trying to make StateT g Maybe? My guess is that you are. In that case, you might have more luck trying to implement this type:
newtype Rand g a = Rand (g -> Maybe (a, g))
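A minimal sketch of how the suggested type could be given its instances, so that guard simply makes the whole computation fail instead of retrying. Everything beyond the newtype itself is an assumption made for illustration: the field name runRand, the instance bodies, and the toy generator draw, which uses an Int counter as a stand-in for StdGen so the example does not depend on System.Random.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus, guard)

-- The type suggested above; failure is a possible *result* of running the
-- generator function, not a separate constructor. `runRand` is a hypothetical
-- field name added for convenience.
newtype Rand g a = Rand { runRand :: g -> Maybe (a, g) }

instance Functor (Rand g) where
  fmap f (Rand h) = Rand $ \g -> case h g of
    Nothing      -> Nothing
    Just (a, g') -> Just (f a, g')

instance Applicative (Rand g) where
  pure x = Rand $ \g -> Just (x, g)
  Rand hf <*> Rand hx = Rand $ \g -> case hf g of
    Nothing      -> Nothing
    Just (f, g') -> case hx g' of
      Nothing       -> Nothing
      Just (x, g'') -> Just (f x, g'')

instance Monad (Rand g) where
  -- The generator is threaded through; a failure anywhere aborts the rest.
  Rand h >>= k = Rand $ \g -> case h g of
    Nothing      -> Nothing
    Just (a, g') -> runRand (k a) g'

instance Alternative (Rand g) where
  empty = Rand (const Nothing)
  Rand h <|> Rand k = Rand $ \g -> case h g of
    Nothing -> k g   -- left side failed: try the right side on the same state
    r       -> r

-- mzero and mplus default to empty and (<|>).
instance MonadPlus (Rand g)

-- Toy generator (an assumption, not System.Random): the state is a counter
-- and each draw yields a value in 1..3.
draw :: Rand Int Int
draw = Rand $ \n -> Just (n `mod` 3 + 1, n + 1)

-- Same shape as the question's y.
y :: Rand Int Int
y = do
  a <- draw
  guard (a /= 2)
  guard (a /= 1)
  return a
```

With these definitions, runRand y 2 evaluates to Just (3,3), while runRand y 0 draws a 1 and evaluates to Nothing. Retrying until the guards pass would have to be written as an explicit loop around runRand rather than hidden inside (>>=).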

If I understand your "monad" correctly, (>>=) fails to be associative. Try defining
y' = do a <- do a' <- x
                guard (a' /= 2)
                return a'
        guard (a /= 1)
        return a
to check whether this is the case. Effectively, your backtracking strategy can only undo the last step, not the entire computation.

Related

Why does performance drop when a function is moved to another module?

I am observing that the same function performs differently depending on whether it is placed in the module where it is used or in a neighbouring module. Any ideas what could be causing this?
Here's the function:
https://github.com/oshyshko/polymodperf/blob/master/src/Main.hs#L41-L55
test :: MArray a t m => (t -> t) -> a Int t -> m ()
test f a =
  mapM_ (\ xy -> do
            v <- get a xy
            set a xy (f v))
        [ (x,y) | y <- [0..1000 - 1],
                  x <- [0..1000 - 1],
                  n <- [0..10]]
  where
    get :: MArray a e m => a Int e -> (Int, Int) -> m e
    get a (x,y) = readArray a (x + y * 1000)
    set :: MArray a e m => a Int e -> (Int, Int) -> e -> m ()
    set a (x,y) = writeArray a (x + y * 1000)
In my test pass I use Data.Array.IO.newArray to create an array, then pass it to test.
Here's how to observe the difference in performance (second value, ms):
$ ./scripts/build-exec.sh
...
Main.test
(11000000,2010)
(11000000,239)
(11000000,240)
(11000000,242)
(11000000,237)
SomeModule.test
(11000000,6376)
(11000000,4851)
(11000000,5455)
(11000000,5096)
(11000000,5206)
Main.test: newArray and test both live in Main => okay performance (the first 2010 ms run is probably slow due to warmup, but the rest look good)
SomeModule.test: newArray lives in Main, but test is imported from SomeModule => much worse performance
The code of test is identical in both modules:
https://github.com/oshyshko/polymodperf/blob/master/src/Main.hs#L41-L55
https://github.com/oshyshko/polymodperf/blob/master/src/SomeModule.hs#L9-L17
The MArray typeclass and the functions readArray and writeArray are imported from the same module in both cases:
import Data.Array.MArray (MArray, readArray, writeArray)
Any ideas what could be causing the difference in performance?
As leftaroundabout suggested in a comment, adding an INLINE pragma solved the problem:
test :: MArray a t m => (t -> t) -> a Int t -> m ()
{-# INLINE test #-}
test f a =
...
https://github.com/oshyshko/polymodperf/blob/master/src/SomeModule.hs#L10
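A plausible reading of why the pragma helps (the thread only reports that it did): test is overloaded in MArray, so when it is compiled in a separate module without its definition being exposed, callers have to go through the class dictionary instead of getting code specialised to their array type. INLINE exposes the definition to callers. GHC also documents INLINABLE and SPECIALIZE for the same purpose. The sketch below shows those two on a simplified, hypothetical stand-in named bump; it is not the code from the repository, and it assumes the array package that ships with GHC.

```haskell
import Data.Array.IO (IOUArray, getElems, newArray)
import Data.Array.MArray (MArray, getBounds, readArray, writeArray)

-- Hypothetical, simplified stand-in for the question's `test`: apply f to
-- every element of a mutable array, in place.
bump :: MArray a t m => (t -> t) -> a Int t -> m ()
bump f arr = do
  (lo, hi) <- getBounds arr
  mapM_ (\i -> readArray arr i >>= writeArray arr i . f) [lo .. hi]
-- Keep the definition available in the interface file so that importing
-- modules can specialise it to their own array and monad types.
{-# INLINABLE bump #-}
-- Additionally ask for one concrete copy to be compiled in this module.
{-# SPECIALIZE bump :: (Int -> Int) -> IOUArray Int Int -> IO () #-}
```

A caller in another module would use it as usual, for example arr <- newArray (0, 3) 1 :: IO (IOUArray Int Int) followed by bump (+1) arr. Whether INLINABLE or SPECIALIZE recovers the full speedup seen with INLINE in this benchmark has not been measured here.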

Generic algorithm to enumerate sum and product types on Haskell?

Some time ago, I asked how to map back and forth between Gödel numbers and terms of a context-free language. While the answer solved that specific issue, I'm having trouble programming it generically. So this question is more general: given a recursive algebraic data type with terminals, sums and products, such as
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
what is an algorithm that will map a term of this type to its godel number, and its inverse?
Edit: for example:
data Foo = A | B Foo | C Foo deriving Show
to :: Foo -> Int
to A = 1
to (B x) = to x * 2
to (C x) = to x * 2 + 1
from :: Int -> Foo
from 1 = A
from n = case mod n 2 of
  0 -> B (from (div n 2))
  1 -> C (from (div n 2))
Here, to and from do what I want for Foo. I'm just asking for a systematic way to derive those functions for any datatype.
In order to avoid dealing with a particular Goedel numbering, let's define a class that'll abstract the necessary operations (with some imports we'll need later):
{-# LANGUAGE TypeOperators, DefaultSignatures, FlexibleContexts, DeriveGeneric #-}
import Control.Applicative
import GHC.Generics
import Test.QuickCheck
import Test.QuickCheck.Gen
class GodelNum a where
  fromInt :: Integer -> a
  toInt :: a -> Maybe Integer
  encode :: [a] -> a
  decode :: a -> [a]
So we can inject natural numbers and encode sequences. Let's further create a canonical instance of this class that we'll use throughout the code; it does no real Goedel encoding and just constructs a tree of terms.
data TermNum = Value Integer | Complex [TermNum]
  deriving (Show)
instance GodelNum TermNum where
  fromInt = Value
  toInt (Value x) = Just x
  toInt _ = Nothing
  encode = Complex
  decode (Complex xs) = xs
  decode _ = []
For real encoding we'd use another implementation that'd use just one Integer, something like newtype SomeGoedelNumbering = SGN Integer.
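As a rough illustration of what such a single-Integer instance might look like, here is a sketch under several assumptions of mine, none of which come from the answer: plain numbers are stored as even values, sequences as odd values, lists are folded with the Cantor pairing function, and all inputs are non-negative. The class is repeated so the block compiles on its own. The helper names isqrt, pair, unpair, encodeList and decodeList are hypothetical.

```haskell
-- Class from the answer, repeated so the sketch is self-contained.
class GodelNum a where
  fromInt :: Integer -> a
  toInt   :: a -> Maybe Integer
  encode  :: [a] -> a
  decode  :: a -> [a]

newtype SGN = SGN Integer deriving (Eq, Show)

-- Exact floor of the square root, by Newton iteration on Integers.
isqrt :: Integer -> Integer
isqrt n
  | n <= 0    = 0
  | otherwise = go n
  where
    go x
      | y >= x    = x
      | otherwise = go y
      where y = (x + n `div` x) `div` 2

-- Cantor pairing of two naturals and its inverse.
pair :: Integer -> Integer -> Integer
pair m n = (m + n) * (m + n + 1) `div` 2 + m

unpair :: Integer -> (Integer, Integer)
unpair p = (m, w - m)
  where
    w = (isqrt (8 * p + 1) - 1) `div` 2
    m = p - w * (w + 1) `div` 2

-- Lists of naturals: 0 is the empty list, k + 1 is a cons cell coded by k.
encodeList :: [Integer] -> Integer
encodeList []       = 0
encodeList (x : xs) = 1 + pair x (encodeList xs)

decodeList :: Integer -> [Integer]
decodeList 0 = []
decodeList k = let (x, rest) = unpair (k - 1) in x : decodeList rest

instance GodelNum SGN where
  fromInt n = SGN (2 * n)                        -- plain numbers are even
  toInt (SGN k)
    | even k    = Just (k `div` 2)
    | otherwise = Nothing
  encode xs = SGN (2 * encodeList [k | SGN k <- xs] + 1)  -- sequences are odd
  decode (SGN k)
    | odd k     = map SGN (decodeList (k `div` 2))
    | otherwise = []
```

For example, encode [fromInt 0, encode []] comes out as SGN 15, and decode (SGN 15) gives back [SGN 0, SGN 1].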
Let's further create a class for types that we can encode/decode:
class GNum a where
  gto :: (GodelNum g) => a -> g
  gfrom :: (GodelNum g) => g -> Maybe a
  default gto :: (Generic a, GodelNum g, GGNum (Rep a)) => a -> g
  gto = ggto . from
  default gfrom :: (Generic a, GodelNum g, GGNum (Rep a)) => g -> Maybe a
  gfrom = liftA to . ggfrom
The last four lines define a generic implementation of gto and gfrom using GHC Generics and DefaultSignatures. The class GGNum that they use is a helper class which we'll use to define encoding for the atomic ADT operations - products, sums, etc.:
class GGNum f where
  ggto :: (GodelNum g) => f a -> g
  ggfrom :: (GodelNum g) => g -> Maybe (f a)
-- no-arg constructors
instance GGNum U1 where
  ggto U1 = encode []
  ggfrom _ = Just U1
-- products
instance (GGNum a, GGNum b) => GGNum (a :*: b) where
  ggto (a :*: b) = encode [ggto a, ggto b]
  ggfrom e | [x, y] <- decode e = liftA2 (:*:) (ggfrom x) (ggfrom y)
           | otherwise = Nothing
-- sums
instance (GGNum a, GGNum b) => GGNum (a :+: b) where
  ggto (L1 x) = encode [fromInt 0, ggto x]
  ggto (R1 y) = encode [fromInt 1, ggto y]
  ggfrom e | [n, x] <- decode e = case toInt n of
               Just 0 -> L1 <$> ggfrom x
               Just 1 -> R1 <$> ggfrom x
               _ -> Nothing
           -- malformed input: fail with Nothing instead of a pattern-match error
           | otherwise = Nothing
-- metadata
instance (GGNum a) => GGNum (M1 i c a) where
  ggto (M1 x) = ggto x
  ggfrom e = M1 <$> ggfrom e
-- constants and recursion of kind *
instance (GNum a) => GGNum (K1 i a) where
  ggto (K1 x) = gto x
  ggfrom e = K1 <$> gfrom e
Having that, we can define a data type like yours and just declare its GNum instance; everything else will be derived automatically.
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
  deriving (Eq, Show, Generic)
instance GNum Term where
And just to be sure we've done everything right, let's use QuickCheck to verify that our gfrom is an inverse of gto:
instance Arbitrary Term where
  arbitrary = oneof [ return AtomA
                    , return AtomB
                    , SumL <$> arbitrary
                    , SumR <$> arbitrary
                    , Prod <$> arbitrary <*> arbitrary
                    ]
prop_enc_dec :: Term -> Property
prop_enc_dec x = Just x === gfrom (gto x :: TermNum)
main :: IO ()
main = quickCheck prop_enc_dec
Notes:
The same thing could be accomplished using Scrap Your Boilerplate, perhaps more efficiently, as it allows somewhat higher-level access - enumerating constructors and records, etc.
See also the paper Efficient Bijective Gödel Numberings for Term Algebras (I haven't read the paper yet, but it seems related).
For fun, I decided to try the approach in the link you posted, and didn't get stuck anywhere. So here's my code, with no commentary (the explanation is the same as the last time). First, code stolen from the other answer:
{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers
type Nat = Integer
class Godel a where
  to :: a -> Nat
  from :: Nat -> a
instance Godel Nat where to = id; from = id
instance (Godel a, Godel b) => Godel (a, b) where
  to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
    m = to m_
    n = to n_
  from p = (from m, from n) where
    isqrt = floor . sqrt . fromIntegral
    base = (isqrt (1 + 8 * p) - 1) `quot` 2
    triangle = base * (base + 1) `quot` 2
    m = p - triangle
    n = base - m
And the code specific to your new type:
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
  deriving (Eq, Ord, Read, Show)
ts = AtomA : AtomB : interleave [uncurry Prod <$> ts +*+ ts, SumL <$> ts, SumR <$> ts]
instance Godel Term where
  to AtomA = 0
  to AtomB = 1
  to (Prod t1 t2) = 2 + 0 + 3 * to (t1, t2)
  to (SumL t) = 2 + 1 + 3 * to t
  to (SumR t) = 2 + 2 + 3 * to t
  from 0 = AtomA
  from 1 = AtomB
  from n = case quotRem (n-2) 3 of
    (q, 0) -> uncurry Prod (from q)
    (q, 1) -> SumL (from q)
    (q, 2) -> SumR (from q)
The same ghci test as last time:
*Main> take 30 (map from [0..]) == take 30 ts
True

Better pattern for caching results

I keep running into a similar pattern which is error-prone (a typo can skip some caching) and simply doesn't look nice to me. Is there a better way of writing something like this?
sum_with_cache' result cache ((p1,p2,p3,p4):partitions) = let
    (cache_p1, sol1) = count_noncrossing' cache p1
    (cache_p2, sol2) = count_noncrossing' cache_p1 p2
    (cache_p3, sol3) = count_noncrossing' cache_p2 p3
    (cache_p4, sol4) = count_noncrossing' cache_p3 p4
  in sum_with_cache' (result+(sol1*sol2*sol3*sol4)) cache_p4 partitions
So basically there are N operations, each of which can update the cache.
I could also write something like:
process_with_cache' res cache _ [] = (cache, res)
process_with_cache' res cache f (x:xs) =
  let (new_cache, r) = f cache x
  in process_with_cache' (r:res) new_cache f xs
process_with_cache = process_with_cache' []
But that doesn't look really clean either. Is there a nicer way of writing this code?
Another similar pattern is when you request a series of named random numbers:
let (x, rng') = random rng''
    (y, rng) = random rng'
in (x^2 + y^2, rng)
This is exactly when using a state monad is the right way to go:
import Control.Monad.State
For all random number generators of type (RandomGen g) => g there is a state monad State g, which threads the state implicitly:
do x <- state random
   y <- state random
   return (x^2 + y^2)
The state function simply takes a function of type s -> (a, s) and turns it into a computation of type State s a, in this case:
state :: (RandomGen g) => (g -> (a, g)) -> State g a
You can run a State computation by using runState, evalState or execState:
runState (liftA2 (\x y -> x^2 + y^2) (state random) (state random))
         (mkStdGen 0)
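For the cache-threading code in the question specifically, a lighter-weight alternative to the state monad (a different technique from the one this answer recommends) is mapAccumL from Data.List. Its step function has the shape cache -> x -> (cache, result), which matches the question's f. The sketch below is illustrative only: the association-list Cache, the function countCached standing in for count_noncrossing', and the use of lists instead of 4-tuples are all assumptions.

```haskell
import Data.List (mapAccumL)

-- Hypothetical cache: an association list from argument to result.
type Cache = [(Int, Integer)]

-- Hypothetical stand-in for count_noncrossing': look the argument up and,
-- on a miss, compute the result and store it.
countCached :: Cache -> Int -> (Cache, Integer)
countCached cache n = case lookup n cache of
  Just r  -> (cache, r)
  Nothing -> let r = product [1 .. toInteger n]  -- placeholder "expensive" work
             in ((n, r) : cache, r)

-- One partition: thread the cache through every part and multiply the
-- results, instead of naming cache_p1 .. cache_p4 by hand.
productWithCache :: Cache -> [Int] -> (Cache, Integer)
productWithCache cache parts = (cache', product sols)
  where (cache', sols) = mapAccumL countCached cache parts

-- All partitions: the same pattern one level up, summing the products.
sumWithCache :: Cache -> [[Int]] -> (Cache, Integer)
sumWithCache cache partitions = (cache', sum results)
  where (cache', results) = mapAccumL productWithCache cache partitions
```

Here snd (sumWithCache [] [[1,2,3,4],[2,2,3,3]]) is 432, and the second partition is answered entirely from the cache filled by the first. Because the cache is passed along by mapAccumL, a typo can no longer skip an update.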

composing two comparison functions?

I'd like to sort by one property and then by another (if the first property is the same.)
What's the idiomatic way in Haskell of composing two comparison functions, i.e. a function used with sortBy?
Given
f :: Ord a => a -> a -> Ordering
g :: Ord a => a -> a -> Ordering
composing f and g would yield:
h x y = case v of
    EQ -> g x y
    otherwise -> v
  where v = f x y
vitus points out the very cool instance of Monoid for Ordering. If you combine it with the instance Monoid b => Monoid (a -> b), it turns out your composition function is just (get ready):
mappend
Check it out:
Prelude Data.Monoid> let f a b = EQ
Prelude Data.Monoid> let g a b = LT
Prelude Data.Monoid> :t f `mappend` g
f `mappend` g :: t -> t1 -> Ordering
Prelude Data.Monoid> (f `mappend` g) undefined undefined
LT
Prelude Data.Monoid> let f a b = GT
Prelude Data.Monoid> (f `mappend` g) undefined undefined
GT
+1 for powerful and simple abstractions
You can use the <> operator. In this example bigSort sorts strings by their numerical value, first comparing length and then comparing lexicographically.
import Data.List (sortBy)
import Data.Ord (compare, comparing)
bigSort :: [String] -> [String]
bigSort = sortBy $ (comparing length) <> compare
Example:
bigSort ["31415926535897932384626433832795","1","3","10","3","5"] =
["1","3","3","5","10","31415926535897932384626433832795"]
<> is an alias of mappend from the Data.Monoid module (see jberryman's answer).
The (free) book Learn You a Haskell for Great Good! explains how it works in Chapter 11:
instance Monoid Ordering where
  mempty = EQ
  LT `mappend` _ = LT
  EQ `mappend` y = y
  GT `mappend` _ = GT
The instance is set up like this: when we mappend two Ordering values, the one on the left is kept, unless the value on the left is EQ, in which case the right one is the result. The identity is EQ.
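To tie this back to the original question of sorting by one property and then by another, here is a small sketch that combines two comparing-based comparators with <>. The people list and the name byAgeThenName are made up for the example.

```haskell
import Data.List (sortBy)
import Data.Monoid ((<>))
import Data.Ord (comparing)

-- Hypothetical records: (name, age).
people :: [(String, Int)]
people = [("bob", 30), ("alice", 30), ("carol", 25)]

-- Sort by age first; when the ages compare EQ, the Ordering monoid falls
-- through to the second comparator and breaks the tie by name.
byAgeThenName :: [(String, Int)] -> [(String, Int)]
byAgeThenName = sortBy (comparing snd <> comparing fst)
```

byAgeThenName people yields [("carol",25),("alice",30),("bob",30)]. Further tie-breakers can be appended with more <> terms.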

Inconsistent behaviour with Haskell

I was reading about perceptrons and trying to implement one in Haskell. The algorithm seems to be working as far as I can test. I'm going to rewrite the code entirely at some point, but before doing so I thought of asking a few questions that have arisen while coding this.
The neuron can be trained when returning the complete neuron. let neuron = train set [1,1] works, but if I change the train function to return an incomplete neuron without the inputs, or try to pattern match and create only an incomplete neuron, the code falls into a never-ending loop.
tl;dr: when returning a complete neuron everything works, but when returning a curryable neuron, the code falls into a loop.
module Main where
import System.Random
type Inputs = [Float]
type Weights = [Float]
type Threshold = Float
type Output = Float
type Trainingset = [(Inputs, Output)]
data Neuron = Neuron Threshold Weights Inputs deriving Show
output :: Neuron -> Output
output (Neuron threshold weights inputs) =
  if total >= threshold then 1 else 0
  where total = sum $ zipWith (*) weights inputs
rate :: Float -> Float -> Float
rate t o = 0.1 * (t - o)
newweight :: Float -> Float -> Weights -> Inputs -> Weights
newweight t o weight input = zipWith nw weight input
  where nw w x = w + (rate t o) * x
learn :: Neuron -> Float -> Neuron
learn on@(Neuron tr w i) t =
  let o = output on
  in Neuron tr (newweight t o w i) i
converged :: (Inputs -> Neuron) -> Trainingset -> Bool
converged n set = not $ any (\(i,o) -> output (n i) /= o) set
train :: Weights -> Trainingset -> Neuron
train w s = train' s (Neuron 1 w)
train' :: Trainingset -> (Inputs -> Neuron) -> Neuron
train' s n | not $ converged n set
             = let (Neuron t w i) = train'' s n
               in train' s (Neuron t w)
           | otherwise = n $ fst $ head s
train'' :: Trainingset -> (Inputs -> Neuron) -> Neuron
train'' ((a,b):[]) n = learn (n a) b
train'' ((a,b):xs) n = let
    (Neuron t w i) = learn (n a) b
  in
    train'' xs (Neuron t w)
set :: Trainingset
set = [
  ([1,0], 0),
  ([1,1], 1),
  ([0,1], 0),
  ([0,0], 0)
  ]
randomWeights :: Int -> IO [Float]
randomWeights n =
  do
    g <- newStdGen
    return $ take n $ randomRs (-1, 1) g
main = do
  w <- randomWeights 2
  let (Neuron t w i) = train w set
  print $ output $ (Neuron t w [1,1])
  return ()
Edit: As per comments, specifying a little more.
Running with the code above, I get:
perceptron: <<loop>>
But by editing the main method to:
main = do
  w <- randomWeights 2
  let neuron = train w set
  print $ neuron
  return ()
(Notice the let neuron, and print rows), everything works and the output is:
Neuron 1.0 [0.71345896,0.33792675] [1.0,0.0]
Perhaps I am missing something, but I boiled your test case down to this program:
module Main where
data Foo a = Foo a
main = do
  x <- getLine
  let (Foo x) = Foo x
  putStrLn x
This further simplifies to:
main = do
  x <- getLine
  let x = x
  putStrLn x
The problem is that binding (Foo x) to something that depends on x
is a cyclic dependency. To evaluate x, we need to know the value of
x. OK, so we just need to calculate x. To calculate x, we need to
know the value of x. That's fine, we'll just calculate x. And so on.
This isn't C, remember: it's binding, not assignment, and the binding
is evaluated lazily.
Use better variable names, and it all works:
module Main where
data Foo a = Foo a
main = do
  line <- getLine
  let (Foo x) = Foo line
  putStrLn x
(The variable in question, in your case, is w.)
This is a common mistake in Haskell. You cannot say things like:
let x = 0
let x = x + 1
And have it mean what it would in a language with assignment, or even nonrecursive binding. The first line is irrelevant; it gets shadowed by the second line, which defines x as x+1. That is, it recursively defines x = ((((...)+1)+1)+1)+1, which will loop upon evaluation.
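A small sketch of the usual way around this: give the "updated" value a fresh name, so that the right-hand side refers to the old binding rather than to itself. The names counter, x0, x1 and ones are invented for the example. The last definition shows that a self-referential let is not an error in itself; it works when the recursion is productive.

```haskell
-- Each let introduces a new (possibly recursive) binding, so the value
-- derived from x0 gets its own name instead of reusing x0.
counter :: Int
counter =
  let x0 = 0
      x1 = x0 + 1   -- refers to x0, not to itself
  in x1

-- A recursive let is fine when each step produces output before recursing:
-- xs is an infinite list of ones, consumed lazily.
ones :: [Int]
ones = let xs = 1 : xs in xs
```

Here counter is 1 and take 3 ones is [1,1,1]. In the perceptron's main, the same idea means binding the trained weights to a name other than w.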

Resources