Consider the two following implementations of an infinite Fibonacci sequence:
fibsA :: Num a => [a]
fibsA = 0:1:(zipWith (+) fibsA (tail fibsA))
fibsB :: [Integer]
fibsB = 0:1:(zipWith (+) fibsB (tail fibsB))
In GHCI, executing fibsB !! k is much faster than executing fibsA !! k.
In particular, it seems that the values of fibsA are continuously recalculated (not memoized/stored).
Furthermore, when the type signature is omitted, GHCI's :t shows it to be [Integer], and the function performs accordingly.
This behavior also occurs in compiled code (ghc -O3 Fibs.hs).
Why is it the case that Integer is so much faster than Num a => a?
When you write fibsA :: Num a => [a], the compiler constructs what is essentially
fibsA :: NumDict a -> [a]
Where
data NumDict a = NumDict
{ (+) :: a -> a -> a
, (-) :: a -> a -> a
, (*) :: a -> a -> a
, negate :: a -> a
, abs :: a -> a
, signum :: a -> a
, fromInteger :: Integer -> a
}
Notice that Num a has moved from being a constraint to being an argument to the function. A typeclass is essentially just a lookup table for each type that implements the class. So for Num, you'd have by default
mkInteger_NumDict :: NumDict Integer
mkInteger_NumDict = NumDict
{ (+) = integer_plus
, (-) = integer_minus
, (*) = integer_mult
, ...
}
mkInt_NumDict :: NumDict Int
mkFloat_NumDict :: NumDict Float
mkDouble_NumDict :: NumDict Double
and these get automatically passed to a function using a typeclass when the instance is resolved. This means that our function fibsA essentially takes an argument. When you call it from GHCi, the defaulting rules kick in and pick Integer, but since it's being called this way it would look more like this internally:
{-# RecordWildCards #-} -- To reduce typing
fibsA :: NumDict a -> [a]
fibsA nd#(NumDict{..}) = fromInteger 0 : fromInteger 1 : zipWith (+) (fibsA nd) (tail $ fibsA nd)
Do you see the problem with this? It's still recursive, but now it has to make a function call each step of the way, reducing performance. If you wanted to make it really fast, a smart programmer would do
fibsA nd#(NumDict{..}) = fromInteger 0 : fromInteger 1 : zipWith (+) fibsA' (tail fibsA')
where fibsA' = fibsA nd
This at least allows memoization. However, a haskell binary can't really perform this optimization at runtime, that happens at compile time. So what you end up with is a slower recursive function. With fibsB, you're specifying the type concretely, there are no polymorphic constraints on it's type signature. The value fibsB has no implicit or explicit arguments, so when referred to it's a pointer to the same object in memory. fibsA is a pointer to a function, so when used recursively it returns new objects in memory, and has no memoization. This is why fibsB is faster than fibsA, only fibsB gets optimized because the compiler doesn't have to make it work for all Num, only Integer.
In addition to #bheklilr's thorough explanation: You can also make fibsA fast, if you perform the list sharing inside the function, making it non-recursive (hiding the recursion inside):
fibsA' :: Num a => [a]
fibsA' =
let f = 0:1:(zipWith (+) f (tail f))
in f
Related
With QuickCheck, one can write parametrically polymorphic properties, like this:
associativityLaw :: (Eq a, Show a, Semigroup a) => a -> a -> a -> Property
associativityLaw x y z = (x <> y) <> z === x <> (y <> z)
This is just an example, as my actual properties are more complex, but it illustrates the problem well enough. This property verifies that for a type a, the <> operator is associative.
Imagine that I'd like to exercise this property for more than one type. I could define my test list like this:
tests =
[
testGroup "Monoid laws" [
testProperty "Associativity law, [Int]" (associativityLaw :: [Int] -> [Int] -> [Int] -> Property),
testProperty "Associativity law, Sum Int" (associativityLaw :: Sum Int -> Sum Int -> Sum Int -> Property)
]
]
This works, but feels unnecessarily verbose. I'd like to be able to simply state that for a given property, a should be [Int], or a should be Sum Int.
Something like this hypothetical syntax:
testProperty "Associativity law, [Int]" (associativityLaw :: a = [Int]),
testProperty "Associativity law, Sum Int" (associativityLaw :: a = Sum Int)
Is there a way to do this, perhaps with a GHC language extension?
My actual problem involves higher-kinded types, and I'd like to be able to state that e.g. f a is [Int], or f a is Maybe String.
I'm aware of this answer, but both options (Proxy and Tagged) seem, as described there, at least, too awkward to really address the issue.
You can use TypeApplications to bind type variables like this:
{-# LANGUAGE TypeApplications #-}
associativityLaw #[Int]
In the case you mentioned where you have a higher kinded type and you want to bind f a to [Int], you have to bind the type variables f and a separately:
fmap #[] #Int
For functions with more than one type variable, you can apply the args in order:
f :: a -> b -> Int
-- bind both type vars
f #Int #String
-- bind just the first type var, and let GHC infer the second one
f #Int
-- bind just the second type var, and let GHC infer the first one
f #_ #String
Sometimes the "order" of the type variables may not be obvious, but you can use :type +v and ask GHCi for more info:
λ> :t +v traverse
traverse
:: Traversable t =>
forall (f :: * -> *) a b.
Applicative f =>
(a -> f b) -> t a -> f (t b)
In standard haskell, the "order" of the type variables doesn't matter, so GHC just makes one up for you. But in the presence of TypeApplications, the order does matter:
map :: forall b a. (a -> b) -> ([a] -> [b])
-- is not the same as
map :: forall a b. (a -> b) -> ([a] -> [b])
For this reason, when working with highly parametric code, or you expect your users are going to want to use TypeApplications on your functions, you might want to explicitly set the order of your type vars instead of letting GHC define an order for you, with ExplicitForAll:
{-# LANGUAGE ExplicitForAll #-}
map :: forall a b. (a -> b) -> ([a] -> [b])
Which feels a lot like <T1, T2> in java or c#
I am implementing some algorithm on haskell. This algorithm requires generating some data.
I have a function of an algorithm which takes generation function as a parameter. For example, algorithm is just multiplying input data by n:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
dgf is used to generate data. How to write function header correctly, as dgf can be any function with any number of parameters?
Another variant is accepting not the generation function but already generated data.
algo :: a -> [b] -> [a]
algo n d = (\x -> n*x) d
So, now let's imagine I'm generation data with stdGen, which uses IO. How can I make function more generic, so that it could accept both IO instance and plain values like just [1,2,3]. This also relates to variant with function, as it can also produce IO.
All in all, which solution is better - having a generation function or a pre-generated data?
Thanks in advance.
One option is to take a stream rather than a list. If generating the values involves performing IO, and there may be many many values, this is often the best approach. There are several packages that offer streams of some sort, but I'll use the streaming package in this example.
import qualified Streaming.Prelude as S
import Streaming
algo :: Monad m => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)
You can read Stream (Of a) m r as "a way to use operations in m to produce successive values of type a and finally a result of type r". This algo function doesn't commit to any particular way of generating the data; they can be created purely:
algo a (S.each [these, are, my, elements])
or within IO,
algo a $ S.takeWhile (> 3) (S.readLn :: Stream (Of Int) IO ())
or using a randomness monad, or whatever you like.
For contrast, I'm going to take the opposite approach as dfeuer's answer.
Just use lists.
Consider your first example:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
You ask "How to write function header correctly, as dgf can be any function with any number of parameters?"
Well, one way is to use uncurrying.
Normally, Haskell functions are curried. If we have a function like
add :: Int -> Int -> Int
add x y = x + y
And we want a function that adds two to its input we can just use add 2.
>>> map (add 2) [1..10]
[3,4,5,6,7,8,9,10,11,12]
Because add is not actually a function that takes two arguments,
it's a function of one argument that returns a function of one argument.
We could have added parentheses to the argument of add above to make this more clear:
add :: Int -> (Int -> Int)
In Haskell, all functions are functions of one argument.
However, we can also go the other way - uncurry a function
that returns a function to get a function that takes a pair:
>>> :t uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
>>> :t uncurry add
uncurry add :: (Int, Int) -> Int
This can also be useful, say if we want to find the sum of each pair in a list:
>>> map (uncurry add) [ (1,2), (3,4), (5,6), (7,8), (9,10) ]
[3,7,11,15,19]
In general, we can uncurry any function of type a0-> a1 -> ... -> aN -> b
into a function (a0, a1, ..., aN) -> b, though there might not be
a cute library function to do it for us.
With that in mind, we could implement algo by passing it an uncurried
function and a tuple of values:
algo :: Num a => a -> (t -> [a]) -> t -> [a]
algo n f t = map (\x -> x * n) $ f t
And then use anonymous functions to uncurry our argument functions:
>>> algo 2 (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo 3 (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
Now we could do it this way, but we don't need to. As implemented above,
algo is only using f and t once. So why not pass it the list directly?
algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns
It calculates the same results:
>>> algo' 2 $ (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo' 2 $ enumFromTo 5 10
[10,12,14,16,18,20]
>>> algo' 3 $ (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
>>> algo' 3 $ zipWith (+) [1..5] [10..14]
[33,39,45,51,57]
Furthermore, since haskell is non-strict, the argument to algo' isn't evaluated
until it's actually used, so we don't have to worry about "wasting" time computing
arguments that won't actually be used:
algo'' :: Num a => a -> [a] -> [a]
algo'' n ns = [n,n,n,n]
algo'' doesn't use the list passed to it, so it's never forced, so whatever
computation is used to calculate it never runs:
>>> let isPrime n = n > 2 && null [ i | i <- [2..n-1], n `rem` i == 0 ]
>>> :set +s
>>> isPrime 10000019
True
(6.18 secs, 2,000,067,648 bytes)
>>> algo'' 5 (filter isPrime [1..999999999999999])
[5,5,5,5]
(0.01 secs, 68,936 bytes)
Now to the second part of your question - what if your data is being generated within some monad?
Rather than convince algo to operate on monadic values, you could take the stream
based approach as dfeuer explains. Or you could just use a list.
Just because you're in a monad, doesn't mean that your values suddenly become strict.
For example, want a infinite list of random numbers? No problem.
newRandoms :: Num a -> IO [a]
newRandoms = unfoldr (\g -> Just (random g)) <$> newStdGen
Now I can just pass those to some algorithm:
>>> rints <- newRandoms :: IO [Int]
(0.00 secs, 60,624 bytes)
>>> algo'' 5 rints
[5,5,5,5]
(0.00 secs, 68,920 bytes)
For a small program which is just reading input from a file or two, there's no problem
with just using readFile and lazy I/O to get a list to operate on.
For example
>>> let grep pat lines = [ line | line <- lines, pat `isInfixOf` line ]
>>> :set +s
>>> dict <- lines <$> readFile "/usr/share/dict/words"
(0.01 secs, 81,504 bytes)
>>> grep "poop" dict
["apoop","epoophoron","nincompoop","nincompoopery","nincompoophood","nincompoopish","poop","pooped","poophyte","poophytic","whisterpoop"]
(0.72 secs, 423,650,152 bytes)
After learning few basics I wanted to try a "real world application" in Haskell, started with a Bittorrent client. Following through the explanation from this blog post, I did NOT use the Attoparsec parser combinator library. Instead following through Huttons book, I started writing the Parser Combinators. This is the code that I have so far (Still at the parsing stage, a long journey ahead):
module Main where
import System.Environment (getArgs)
import qualified Data.Map as Map
import Control.Monad (liftM, ap)
import Data.Char (isDigit, isAlpha, isAlphaNum, ord)
import Data.List(foldl')
main :: IO ()
main = do
[fileName] <- getArgs
contents <- readFile fileName
download . parse $ contents
parse :: String -> Maybe BenValue
parse s = case runParser value s of
[] -> Nothing
[(p, _)] -> Just p
download :: Maybe BenValue -> IO ()
download (Just p) = print p
download _ = print "Oh!! Man!!"
data BenValue = BenString String
| BenNumber Integer
| BenList [BenValue]
| BenDict (Map.Map String BenValue)
deriving(Show, Eq)
-- From Hutton, this follows: a Parser is a function
-- that takes a string and returns a list of results
-- each containing a pair : a result of type a and
-- an output string. (the string is the unconsumed part of the input).
newtype Parser a = Parser (String -> [(a, String)])
-- Unit takes a value and returns a Parser (a function)
unit :: a -> Parser a
unit v = Parser (\inp -> [(v, inp)])
failure :: Parser a
failure = Parser (\inp -> [])
one :: Parser Char
one = Parser $ \inp -> case inp of
[] -> []
(x: xs) -> [(x, xs)]
runParser :: Parser a -> String -> [(a, String)]
runParser (Parser p) inp = p inp
bind :: Parser a -> (a -> Parser b) -> Parser b
bind (Parser p) f = Parser $ \inp -> case p inp of
[] -> []
[(v, out)] -> runParser (f v) out
instance Monad Parser where
return = unit
p >>= f = bind p f
instance Applicative Parser where
pure = unit
(<*>) = ap
instance Functor Parser where
fmap = liftM
choice :: Parser a -> Parser a -> Parser a
choice p q = Parser $ \inp -> case runParser p inp of
[] -> runParser q inp
x -> x
satisfies :: (Char -> Bool) -> Parser Char
satisfies p = do
x <- one
if p x
then unit x
else failure
digit :: Parser Char
digit = satisfies isDigit
letter :: Parser Char
letter = satisfies isAlpha
alphanum :: Parser Char
alphanum = satisfies isAlphaNum
char :: Char -> Parser Char
char x = satisfies (== x)
many :: Parser a -> Parser [a]
many p = choice (many1 p) (unit [])
many1 :: Parser a -> Parser [a]
many1 p = do
v <- p
vs <- many p
unit (v:vs)
peek :: Parser Char
peek = Parser $ \inp -> case inp of
[] -> []
v#(x:xs) -> [(x, v)]
taken :: Int -> Parser [Char]
taken n = do
if n > 0
then do
v <- one
vs <- taken (n-1)
unit (v:vs)
else unit []
takeWhile1 :: (Char -> Bool) -> Parser [Char]
takeWhile1 pred = do
v <- peek
if pred v
then do
one
vs <- takeWhile1 pred
unit (v:vs)
else unit []
decimal :: Integral a => Parser a
decimal = foldl' step 0 `fmap` takeWhile1 isDigit
where step a c = a * 10 + fromIntegral (ord c - 48)
string :: Parser BenValue
string = do
n <- decimal
char ':'
BenString <$> taken n
signed :: Num a => Parser a -> Parser a
signed p = (negate <$> (char '-' *> p) )
`choice` (char '+' *> p)
`choice` p
number :: Parser BenValue
number = BenNumber <$> (char 'i' *> (signed decimal) <* char 'e')
list :: Parser BenValue
list = BenList <$> (char 'l' *> (many value) <* char 'e')
dict :: Parser BenValue
dict = do
char 'd'
pair <- many ((,) <$> string <*> value)
char 'e'
let pair' = (\(BenString s, v) -> (s,v)) <$> pair
let map' = Map.fromList pair'
unit $ BenDict map'
value = string `choice` number `choice` list `choice` dict
The above is a mix of code read/understood from the source code of the three sources the blog, the library, and the book. the download function just prints the "parse tree", obtained from the parser, Once I get the parser working will fill in the download function and test it out.
The parser is NOT working on few of the torrent files. :( There is definitely chance that I might have used code from the references incorrectly. And would like to know if there is anything obvious.
It works on "toy" examples and also on the test file picked from combinatorrent
When I pick a real world torrent like the Debian/Ubuntu etc, this fails.
I would like to debug and see what is happening, debugging with GHCI does not seem straight forward, I've tried :trace / :history style debugging mentioned in this document, but looks very primitive :-) .
My question to the experts out there is: "how to debug!!" :-)
Would really appreciate any hints on approaching how to debug this.
Thanks.
Because Haskell code is pure, "stepping" through it is less essential than in other languages. When I step through some Java code, I am often trying to see where a certain variable gets changed. That is obviously a non-issue in Haskell given that things are immutable.
That means we can also run snippets of code in GHCi to debug what is happening without worrying that what we run is going to change some global state, or what we run will work any differently than how it would if called deep inside our program. This mode of work benefits from iterating your design slowly building it to work on the full range of expected inputs.
Parsing is always a bit unpleasant - even in imperative languages. Nobody wants to run a parser just to get back Nothing - you want to know why you got back nothing. To that effect, most parsers libraries are helpful in giving you some information about what went wrong. That is a point for using a parser like attoparsec. Also, attoparsec works with ByteString by default - perfect for binary data. If you want to roll your own parser implementation, you'll have to debug it too.
Finally, based on your comments, it looks like you are having issues with character encodings. This is exactly the reason why we have ByteString - it represents a packed sequence of bytes - no encodings. The extension OverloadedStrings even makes it pretty easy to make ByteString literals that look just like regular strings.
I have that Haskell function, that's causing more than 50% of all the allocations of my program, causing 60% of my run time to be taken by the GC. I run with a small stack (-K10K) so there is no stack overflow, but can I make this function faster, with less allocation?
The goal here is to calculate the product of a matrix by a vector. I cannot use hmatrix for example because this is part of a bigger function using the ad Automatic Differentiation package, so I need to use lists of Num. At runtime I suppose the use of the Numeric.AD module means my types must be Scalar Double.
listMProd :: (Num a) => [a] -> [a] -> [a]
listMProd mdt vdt = go mdt vdt 0
where
go [] _ s = [s]
go ls [] s = s : go ls vdt 0
go (y:ys) (x:xs) ix = go ys xs (y*x+ix)
Basically we loop through the matrix, multiplying and adding an accumulator until we reach the end of the vector, storing the result, then continuing restarting the vector again. I have a quickcheck test verifying that I get the same result than the matrix/vector product in hmatrix.
I have tried with foldl, foldr, etc. Nothing I've tried makes the function faster (and some things like foldr cause a space leak).
Running with profiling tells me, on top of the fact that this function is where most of the time and allocation is spent, that there are loads of Cells being created, Cells being a data type from the ad package.
A simple test to run:
import Numeric.AD
main = do
let m :: [Double] = replicate 400 0.2
v :: [Double] = replicate 4 0.1
mycost v m = sum $ listMProd m v
mygrads = gradientDescent (mycost (map auto v)) (map auto m)
print $ mygrads !! 1000
This on my machine tells me GC is busy 47% of the time.
Any ideas?
A very simple optimization is to make the go function strict by its accumulator parameter, because it's small, can be unboxed if a is primitive and always needs to be fully evaluated:
{-# LANGUAGE BangPatterns #-}
listMProd :: (Num a) => [a] -> [a] -> [a]
listMProd mdt vdt = go mdt vdt 0
where
go [] _ !s = [s]
go ls [] !s = s : go ls vdt 0
go (y:ys) (x:xs) !ix = go ys xs (y*x+ix)
On my machine, it gives 3-4x speedup (compiled with -O2).
On the other hand, intermediate lists shouldn't be strict so they could be fused.
I am looking for a mutable (balanced) tree/map/hash table in Haskell or a way how to simulate it inside a function. I.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy but I was unable to make it work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
mc <- newSTRef Map.empty
return $ \k -> do
c <- readSTRef mc
case Map.lookup k c of
Just a -> return a
Nothing -> do a <- f k
writeSTRef mc (Map.insert k a c) >> return a
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
f' <- stToIO $ memoize $ return . f
return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on #Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot use (or I don't know how to) use a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.