Find the origin of "Ratio has zero denominator" Exception - debugging

As a personal exercise while learning Haskell, I'm trying to port this F# snippet for Random Art.
I haven't embedded the full source code, to avoid bloating the question, but it is available as a gist.
An important part of the program is this Expr type:
data Expr =
    VariableX
  | VariableY
  | Constant
  | Sum Expr Expr
  | Product Expr Expr
  | Mod Expr Expr
  | Well Expr
  | Tent Expr
  | Sin Expr
  | Level Expr Expr Expr
  | Mix Expr Expr Expr
  deriving Show
and two functions:
gen :: Int -> IO Expr randomly generates a tree-like structure, given a number of iterations;
eval :: Expr -> IO (Point -> Rgb Double) walks the tree and terminates, producing a drawing function.
The higher the number passed to gen, the higher the probability that the following exception is raised: Ratio has zero denominator.
I'm new to Haskell, so to try to solve the problem I compiled it as follows:
ghc RandomArt.hs -prof -auto-all -caf-all
obtaining only this additional (and, to me, not very helpful) information:
$ ./RandomArt +RTS -xc
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
GHC.Real.CAF
--> evaluated by: Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
--> evaluated by: Main.eval.\.r,
called from Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
Main.tga,
called from Main.save,
called from Main.main,
called from GHC.Real.CAF
RandomArt: Ratio has zero denominator
The code that persists the generated function to a TGA file works, because it was my previous exercise (a port from OCaml).
I've tried executing various Expr trees from GHCi, assembling the data by hand or applying functions as in the program, but I wasn't able to identify the bug.
The Haskell docs mention a package named loch that should be able to compile while preserving source-code line numbers, but I was not able to install it (even though I normally install every package I need with cabal install).
To be honest, the questions are two:
where is the bug (in this specific case)?
which tool do I need to master to find bugs like this (or bugs in general)?
Thanks in advance.

The exception
Let's focus on the exception first.
Finding the bug
where is the bug (in this specific case)?
In mod'. We can check this easily if we provide an alternative version instead of the one from Data.Fixed:
mod' :: RealFrac a => a -> a -> a
mod' _ 0 = error "Used mod' on 0"
mod' a b =
  let k = floor $ a / b
  in  a - (fromInteger k) * b
We now get Used mod' on 0.
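For context, here is roughly where mod' enters the picture. This is a hypothetical reconstruction with stand-in types, not the gist's actual code:

import Data.Fixed (mod')

-- Hypothetical stand-ins for the program's types, only to show where
-- mod' is applied.
type Point = (Double, Double)
data Rgb a = Rgb a a a deriving Show

-- The Mod branch presumably combines the two sub-drawings channel by
-- channel with mod'; whenever a channel of the second drawing is 0.0,
-- Data.Fixed.mod' ends up dividing by zero inside a Ratio.
evalMod :: (Point -> Rgb Double) -> (Point -> Rgb Double) -> (Point -> Rgb Double)
evalMod f1 f2 p =
  let Rgb r1 g1 b1 = f1 p
      Rgb r2 g2 b2 = f2 p
  in  Rgb (r1 `mod'` r2) (g1 `mod'` g2) (b1 `mod'` b2)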
Rationale
which tool do I need to master to find bugs like this (or bugs in general)?
In this case, the necessary hint was already in the exception's message:
Ratio has zero denominator
This means that there is a place where you divide by zero in the context of a Ratio. So you need to look at all the places where you divide something. Since you use only (/) and mod', it boils down to whether one of them can actually throw this exception:
(/) usually returns ±Infinity on division by zero if used on Double,
mod' uses toRational internally, which is a Ratio Integer.
So there's only one culprit left. Note that the other implementation yields the same results if b isn't zero.
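You can confirm this directly in GHCi. Assuming mod' is the one from Data.Fixed, a session should look roughly like this:

ghci> import Data.Fixed
ghci> (1.5 :: Double) `mod'` 2
1.5
ghci> (1.5 :: Double) `mod'` 0
*** Exception: Ratio has zero denominator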
The actual problem
Using mod or mod' with b == 0 isn't well-defined. After all, a modulo operation should satisfy the following property:
prop_mod :: Integral n => n -> n -> Bool
prop_mod a b =
  let m = a `mod` b
      d = a `div` b
  in  a == b * d + m    -- (1)
      && abs m < abs b  -- (2)
If b == 0, there doesn't exist any pair (d, m) such that (1) and (2) hold. If we relax this law and throw (2) away, the result of mod isn't necessarily unique anymore. This leads to the following definition:
mod' :: RealFrac a => a -> a -> a
mod' a 0 = a -- this is arbitrary
mod' a b =
  let k = floor $ a / b
  in  a - (fromInteger k) * b
However, this is an arbitrary definition. You have to ask yourself, "What do I actually want to do if I cannot use mod in a sane way?" Since F# apparently didn't complain about a % 0, have a look at its documentation.
Either way, you cannot use a library mod function, since they aren't defined for a zero denominator.
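A minimal sketch of how the relaxed definition behaves (my own check, assuming the mod' defined above is in scope):

main :: IO ()
main = do
  print (mod' 7.5 2 :: Double) -- 1.5, agrees with Data.Fixed.mod'
  print (mod' 7.5 0 :: Double) -- 7.5, by the arbitrary choice mod' a 0 = a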

Related

How do I check where my code gets stuck in Erlang?

I'm trying to write a function that receives a list, finds the highest value integer in the list, and then divides all the other integers in the list by that value.
Unfortunately, my code gets stuck somewhere. If this were Python, for example, I could easily add a couple of different prints and see where it gets stuck. But how do you do that in Erlang?
Here is the code.
highest_value([], N) ->
    if
        N =:= 0 ->
            'Error! No positive values.'
    end,
    N;
highest_value([H|T], N) when H > N, H > 0 ->
    highest_value([T], H);
highest_value([_|T], N) ->
    highest_value([T], N).

divide(_L) -> [X / highest_value(_L, 0) || X <- _L].
For prints you could just use io:format/2. Same thing.
highest_value([H|T], N) when H > N, H > 0 ->
    io:format(">>> when H bigger than N~n"),
    io:format(">>> H: ~p, T: ~p, N: ~p ~n", [H, T, N]),
    highest_value([T], H);

highest_value(List) ->
    highest_value(List, 0).
EDIT
One thing you are getting wrong is the [H|T] syntax. H, or head, is the first element of the list. T stands for tail, or "rest of the list". And as the name suggests, the tail is itself a list (it could be an empty list, but a list nonetheless). So in your recursion you don't need to wrap T in a new list:
highest_value([H|T], N) when H > N ->
    highest_value(T, H);
highest_value([_|T], N) ->
    highest_value(T, N).
In your old code you called:
highest_value([T], N).
which created a new list with one element, like [[2,3,4,5]]. If you head-tail this, you get the whole inner list as the head and an empty list as the tail.
Also, in your first function clause you have the atom 'Error! No positive values.' (single quotes mean this is just a long atom, not a string), which is never returned (you will always return N). If you would like to return either the atom or N, depending on the value of N, you can just extend your use of function clauses:
highest_value([], 0) ->
    'Error! No positive values.';
highest_value([], N) ->
    N;
[...]
And you have to initialize your function with 0, which could be considered a bad pattern. You could write and use highest_value/1, which does that for you:

highest_value(List) ->
    highest_value(List, 0).
Or even use a modification of this algorithm: since the largest number will be one of the numbers in the list, you could use the first element as the initial value.

highest_value(_List = [First|T]) when First > 0 ->
    highest_value(T, First).
This assumes that handling negative numbers is something you don't care about right now.
While debugging via print statements is common and even sometimes useful, and io:format can be used for this purpose in Erlang as already noted, Erlang provides powerful built-in tracing capabilities you should use instead.
Let's say your highest_value/2 and divide/1 functions reside in a module named hv. First, we compile hv in the Erlang shell:
1> c(hv).
{ok,hv}
Next, we use Erlang's dbg module to enable tracing on the hv functions:
2> dbg:tracer().
{ok,<0.41.0>}
3> dbg:p(self(),call).
{ok,[{matched,nonode@nohost,26}]}
4> dbg:tpl(hv,c).
{ok,[{matched,nonode@nohost,5},{saved,c}]}
In command 2 we enable debug tracing and in command 3 we indicate that we want to trace function calls in our current process (returned by self()). In command 4 we create a call trace, using the built-in c trace specification, on all functions in the hv module.
Once debug tracing is enabled, we call hv:divide/1 and the trace output begins:
5> hv:divide([4,8,12,16]).
(<0.34.0>) call hv:divide([4,8,12,16]) ({erl_eval,do_apply,6})
(<0.34.0>) call hv:'-divide/1-lc$^0/1-0-'([4,8,12,16],[4,8,12,16]) ({erl_eval,
do_apply,
6})
(<0.34.0>) call hv:highest_value([4,8,12,16],0) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([[8,12,16]],4) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([[]],[8,12,16]) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([[]],[8,12,16]) ({hv,'-divide/1-lc$^0/1-0-',2})
...
First, note that I abbreviated the trace output because at the ... point it's already in an infinite loop, and the remainder of the trace is identical to the two statements prior to the ....
What does the trace output tell us? The first line shows the invocation of the divide/1 function, and the second line shows the call to the list comprehension inside divide/1. We then see calls to highest_value/2, first with the full list and N set to 0. The next call is where it gets interesting: because your code passes [T] rather than T as the first argument in the recursive invocation of highest_value/2, H has the value [8,12,16], which Erlang treats as being greater than the current N value of 4, so the next recursive call is:
highest_value([T], [8,12,16]).
and because T is [], this turns into:
highest_value([[]], [8,12,16]).
Here, H is [], and T is also []. H is not greater than [8,12,16], so all remaining recursive invocations after this point are identical to this one, and the recursion is infinite.
To fix this, you need to pass T correctly as already noted:
highest_value([H|T], N) when H > N, H > 0 ->
highest_value(T, H);
highest_value([_|T], N) ->
highest_value(T, N).
Then recompile, which also reloads your module, and because of that you'll also need to set up your debug tracing again:
5> c(hv).
{ok,hv}
6> dbg:tpl(hv,c).
{ok,[{matched,nonode@nohost,5},{saved,c}]}
7> hv:divide([4,8,12,16]).
(<0.34.0>) call hv:divide([4,8,12,16]) ({erl_eval,do_apply,6})
(<0.34.0>) call hv:'-divide/1-lc$^0/1-0-'([4,8,12,16],[4,8,12,16]) ({erl_eval,
do_apply,
6})
(<0.34.0>) call hv:highest_value([4,8,12,16],0) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([8,12,16],4) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([12,16],8) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([16],12) ({hv,'-divide/1-lc$^0/1-0-',2})
(<0.34.0>) call hv:highest_value([],16) ({hv,'-divide/1-lc$^0/1-0-',2})
** exception error: no true branch found when evaluating an if expression
in function hv:highest_value/2 (/tmp/hv.erl, line 5)
in call from hv:'-divide/1-lc$^0/1-0-'/2 (/tmp/hv.erl, line 15)
Tracing now shows that highest_value/2 is working as expected, but we now hit a new problem with the if statement, and the fix for this is already explained in another answer so I won't repeat it here.
As you can see, Erlang's tracing is far more powerful than using "print debugging".
It can be turned on and off interactively in the Erlang shell as needed.
Unlike debugging in other languages, debug tracing requires no special compilation flags for your modules.
Unlike with debug print statements, you need not change your code and recompile repeatedly.
What I've shown here barely scratches the surface as far as Erlang's tracing capabilities go, but it was more than enough to find and fix the problems.
And finally, note that using the lists:max/1 standard library call you can more easily achieve what your module is trying to do:
divide(L) ->
    case lists:max(L) of
        N when N > 0 ->
            [V/N || V <- L];
        _ ->
            error(badarg, [L])
    end.

Improving performance on chunked lists

I have a simple problem: Given a list of integers, read the first line as N. Then, read the next N lines and return the sum of them. Repeat until N = 0.
My first approach was using this:
main = interact $ unlines . f . (map read) . lines

f :: [Int] -> [String]
f (n:ls)
  | n == 0    = []
  | otherwise = [show rr] ++ (f rest)
  where (xs, rest) = splitAt n ls
        rr = sum xs
f _ = []
But it's relatively slow. I've profiled it using
ghc -O2 --make test.hs -prof -auto-all -caf-all -fforce-recomp -rtsopts
time ./test +RTS -hc -p -i0.001 < input.in
where input.in is a test input whose first line is 100k, followed by 100k random numbers, followed by 0. The resulting heap profile (figure omitted here) shows that it uses O(N) memory.
EDITED: My original question was comparing 2 similarly slow approaches. I've updated it to compare with an optimized approach below
Now, if I do the sum iteratively instead of calling sum, I get a constant amount of memory:
{-# LANGUAGE BangPatterns #-}

main = interact $ unlines . g . (map read) . lines

g :: [Int] -> [String]
g (n:ls)
  | n == 0    = []
  | otherwise = g' n ls 0
g _ = []

g' n (l:ls) !cnt
  | n == 0    = [show cnt] ++ (g (l:ls))
  | otherwise = g' (n-1) ls (cnt + l)
I'm trying to understand what is causing the performance loss in the first example. I would guess everything there could be lazily evaluated?
I don't know precisely what is causing the difference. But I can show you this:
Data.Map> sum [1 .. 1e8]
Out of memory.
Data.Map> foldl' (+) 0 [1 .. 1e8]
5.00000005e15
For some reason, sum = foldl (+) 0, rather than foldl' (with the apostrophe). The difference is that the latter function is more strict, so it uses virtually no memory. The lazy version, by contrast, does this:
sum [1..100]
1 + sum [2..100]
1 + 2 + sum [3..100]
1 + 2 + 3 + sum [4..100]
...
In other words, it creates a giant expression that says 1 + 2 + 3 + ... And then, right at the end, it tries to evaluate it all. Well, obviously, that's going to eat a lot of RAM. By using foldl' instead of foldl, you make it do the additions immediately, rather than pointlessly storing them in RAM.
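Under that reading, a strict sum is a one-liner; this sketch is essentially what the iterative g' above does by hand:

import Data.List (foldl')

-- forces the accumulator at every step, so no chain of (+) thunks builds up
sum' :: Num a => [a] -> a
sum' = foldl' (+) 0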
You probably also want to do I/O using ByteString rather than String; but the laziness difference will probably give you a big speed boost on its own.
I think that laziness is what prevents your first and second version from being equivalent.
Consider the result created from the input "numbers"
1
garbage_here
2
3
5
0
The first version would give a result list [error "...some parse error", 8], whose second element you can safely look at, while the second version errors out almost immediately. It seems hard to achieve the first behaviour in a streaming way.
Even without laziness, though, getting from the first to the second version may be more than GHC can handle - it would need to have fusion rewriting rules combining foldl/foldl' on the first element of a tuple with splitAt. And GHC has only recently got to the point where it can fuse foldl/foldl' at all.

Accessing first element of a matrix in Isabelle

I want to write a proof about a trivial case of the determinant of a matrix, where the matrix consists of just a single element (i.e., the cardinality of 'n is one).
Thus the determinant (or det A) is the single element in the matrix.
However, it is not clear to me how to reference the single element of the matrix. I tried A $ zero $ zero, which did not work.
My current way to demonstrate the problem is to write ∀a∈(UNIV :: 'n set). det A = A $ a $ a. It assumes that the cardinality of the numeral type is one.
What is the correct way to write this trivial proof about determinants?
Here is my current code:
theory Notepad
imports
  Main
  "~~/src/HOL/Library/Polynomial"
  "~~/src/HOL/Algebra/Ring"
  "~~/src/HOL/Library/Numeral_Type"
  "~~/src/HOL/Library/Permutations"
  "~~/src/HOL/Multivariate_Analysis/Determinants"
  "~~/src/HOL/Multivariate_Analysis/L2_Norm"
  "~~/src/HOL/Library/Numeral_Type"
begin

lemma det_one_element_matrix:
  fixes A :: "('a::comm_ring_1)^'n∷finite^'n∷finite"
  assumes "card(UNIV :: 'n set) = 1"
  shows "∀a∈(UNIV :: 'n set). det A = A $ a $ a"
proof-
  (* sledgehammer proof of 1, 2 and ?thesis *)
  have 1: "∀a∈(UNIV :: 'n set). UNIV = {a}"
    by (metis (full_types) Set.set_insert UNIV_I assms card_1_exists ex_in_conv)
  have 2: "det A = (∏i∈UNIV. A $ i $ i)"
    by (metis (mono_tags, lifting) "1" UNIV_I det_diagonal singletonD)
  from 1 2 show ?thesis by (metis setprod_singleton)
qed
UPDATE:
Unfortunately, this is part of a larger theorem which is already proven for cardinality of 'n∷finite greater than 1. In this theorem the type of the matrix A is fixed as A :: "('a::comm_ring_1)^'n∷finite^'n∷finite", and the definition of the determinant is used.
Therefore, I don't think I can change the type of my matrix A to ('a::comm_ring_1)^1^1 in order to solve my larger theorem.
I feel that my previous answer is the better solution in general if it is possible to use, so I will leave it as-is. In your case where you are not able to use this approach, things get a little harder, unfortunately.
What you need to show is that:
There can only be a single element in your type 'n, and thus every element is equal;
Additionally, the definition of det also references permutations, so we need to show that there only exists a single function of type 'n ⇒ 'n, which happens to be equal to the function id.
With these in place, we can carry out the proof as follows:
lemma det_one_element_matrix:
  fixes A :: "('a::comm_ring_1)^'n∷finite^'n∷finite"
  assumes "card(UNIV :: 'n set) = 1"
  shows "det A = A $ x $ x"
proof-
  have 0: "⋀x y. (x :: 'n) = y"
    by (metis (full_types) UNIV_I assms card_1_exists)
  hence 1: "(UNIV :: 'n set) = {x}"
    by auto
  have 2: "(UNIV :: ('n ⇒ 'n) set) = {id}"
    by (auto intro!: ext simp: 0)
  thus ?thesis
    by (auto simp: det_def permutes_def 0 1 2 sign_id)
qed
Using A $ zero $ zero (or A $ 0 $ 0) wouldn't have worked, because the vectors are indexed from 1: A $ 0 $ 0 is undefined, which makes it hard to prove anything about.
Playing a little myself, I came up with the following lemma:
lemma det_one_element_matrix:
  "det (A :: ('a::comm_ring_1)^1^1) = A $ 1 $ 1"
  by (clarsimp simp: det_def sign_def)
Instead of using a type 'a :: finite and assuming it has cardinality 1, I used the standard Isabelle 1 type which encodes both these facts into the type itself. (Similar types exist for all numerals, so you can write things like 'a ^ 23 ^ 72)
Incidentally, after typing in the lemma above, auto solve_direct quickly informed me that something already exists in the library stating the same result, a lemma named det_1.

Haskell Fibonacci sequence performance depending on methodology

I was trying out different approaches to getting a number at a given index of the Fibonacci sequence and they could basically be divided into two categories:
building a list and querying an index
using variables (might be separate or tupled, without a list)
I picked an example of both:
fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where fibs1' = 0 : scanl (+) 1 fibs1'

fib2 :: Int -> Integer
fib2 n = fib2' 1 1 n where
  fib2' _ b 2 = b
  fib2' a b n = fib2' b (a + b) (n - 1)
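Both agree on the values; a quick GHCi check (session output as I'd expect it; note that fib2 as written needs n >= 2):

ghci> map fibs1 [0..10]
[0,1,1,2,3,5,8,13,21,34,55]
ghci> fib2 10
55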
fibs1:
real 0m2.356s
user 0m2.310s
sys 0m0.030s
fib2:
real 0m0.671s
user 0m0.667s
sys 0m0.000s
Both were compiled with 64bit GHC 7.6.1 and -O2 -fllvm. Their core dumps are very similar in length, but they differ in the parts that I'm not very proficient at interpreting.
I was not surprised that fibs1 failed for n = 350000 (Stack space overflow). However, I am not comfortable with the fact that it used that much memory.
I would like to clear some things up:
Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?
EDIT: Sorry, I mixed up the speed results; fixed. Two of my three doubts are still valid, though ;).
Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
fibs1 uses a lot of memory and is slow because scanl is lazy: it doesn't evaluate the list elements, so
fibs1' = 0 : scanl (+) 1 fibs1'
produces
0 : scanl (+) 1 (0 : more)
0 : 1 : let f2 = 1+0 in scanl (+) f2 (1 : more')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in scanl (+) f3 (f2 : more'')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in f3 : let f4 = f3+f2 in scanl (+) f4 (f3 : more''')
etc. So you rather quickly get a huge nested thunk. When that thunk is evaluated, it is pushed on the stack, and at some point between 250000 and 350000, it becomes too big for the default stack.
And since each list element holds a reference to the previous while it is not evaluated, the beginning of the list cannot be garbage-collected.
If you use a strict scan,
fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
where
fibs1' = 0 : scanl' (+) 1 fibs1'
scanl' f a (x:xs) = let x' = f a x in x' `seq` (a : scanl' f x' xs)
scanl' _ a [] = [a]
when the k-th list cell is produced, its value is already evaluated, so it doesn't refer to the previous one; hence the list can be garbage-collected (assuming nothing else holds a reference to it) as it is traversed.
With that implementation, the list version is about as fast and lean as fib2 (it needs to allocate list cells nevertheless, so it allocates a small bit more, and is possibly a tiny bit slower therefore, but the difference is minute, since the Fibonacci numbers become so large that the list construction overhead becomes negligible).
The idea of scanl is that its result is incrementally consumed, so that the consumption forces the elements and prevents the build-up of large thunks.
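A small sketch of that idea (my own example, not from the question): if the consumer forces each element as it goes, scanl stays lean:

import Data.List (foldl')

-- foldl' demands every partial sum produced by scanl as soon as it
-- appears, so the nested-thunk build-up described above never happens
main :: IO ()
main = print (foldl' (+) 0 (scanl (+) 0 [1 .. 1000000 :: Integer]))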
Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?
Its optimiser can't see through the algorithm to determine that. scanl is opaque to the compiler; it doesn't know what scanl does.
If we take the exact source code for scanl (renaming it or hiding scanl from the Prelude, I opted for renaming),
scans :: (b -> a -> b) -> b -> [a] -> [b]
scans f q ls = q : (case ls of
                      []   -> []
                      x:xs -> scans f (f q x) xs)
and compile the module exporting it (with -O2), and then look at the generated interface file with
ghc --show-iface Scan.hi
we get (for example, minor differences between compiler versions)
Magic: Wanted 33214052,
got 33214052
Version: Wanted [7, 0, 6, 1],
got [7, 0, 6, 1]
Way: Wanted [],
got []
interface main:Scan 7061
interface hash: ef57dac14815e2f1f897b42a007c0c81
ABI hash: 8cfc8dab79de6a51fcad666f1869574f
export-list hash: 57d6805e5f0b5f76f0dd8dfb228df988
orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
flag hash: 1e8135cb44ef6dd330f1f56943d1f463
used TH splices: False
where
exports:
Scan.scans
module dependencies:
package dependencies: base* ghc-prim integer-gmp
orphans: base:GHC.Base base:GHC.Float base:GHC.Real
family instance modules:
import -/ base:Prelude 1cb4b618cf45281dc97748b1831bf0cd
d79ca4e223c0de0a770a3b88a5e67687
scans :: forall b a. (b -> a -> b) -> b -> [a] -> [b]
{- Arity: 3, HasNoCafRefs, Strictness: LLL -}
vectorised variables:
vectorised tycons:
vectorised reused tycons:
scalar variables:
scalar tycons:
trusted: safe-inferred
require own pkg trusted: False
and see that the interface file doesn't expose the unfolding of the function, only its type, arity, strictness and that it doesn't refer to CAFs.
When a module importing that is compiled, all that the compiler has to go by is the information exposed by the interface file.
Here, there is no information exposed that would allow the compiler to do anything else but emit a call to the function.
If the unfolding were exposed, the compiler would have a chance to inline it and analyse the code, knowing the types and the combining function, to produce more eager code that doesn't build thunks.
The semantics of scanl, however, are maximally lazy, each element of the output is emitted before the input list is inspected. That has the consequence that GHC can't make the addition strict, since that would change the result if the list contained any undefined values:
scanl (+) 1 [undefined] = 1 : scanl (+) (1 + undefined) [] = 1 : (1 + undefined) : []
while
scanl' (+) 1 [undefined] = let x' = 1 + undefined in x' `seq` 1 : scanl' (+) x' []
= *** Exception: Prelude.undefined
One could make a variant
scanl'' f b (x:xs) = b `seq` b : scanl'' f (f b x) xs
that would produce 1 : *** Exception: Prelude.undefined for the above input, but any strictness would indeed change the result if the list contained undefined values, so even if the compiler knew the unfolding, it couldn't make the evaluation strict unless it could prove that there are no undefined values in the list. That fact is obvious to us, but not to the compiler (and I don't think it would be easy to teach a compiler to recognise it and prove the absence of undefined values).

How does one write efficient Dynamic Programming algorithms in Haskell?

I've been playing around with dynamic programming in Haskell. Practically every tutorial I've seen on the subject gives the same, very elegant algorithm based on memoization and the laziness of the Array type. Inspired by those examples, I wrote the following algorithm as a test:
import Data.Array

-- pascal n returns the nth entry on the main diagonal of Pascal's triangle
-- (mod a million for efficiency)
pascal :: Int -> Int
pascal n = p ! (n,n) where
  p = listArray ((0,0),(n,n)) [f (i,j) | i <- [0 .. n], j <- [0 .. n]]

  f :: (Int,Int) -> Int
  f (_,0) = 1
  f (0,_) = 1
  f (i,j) = (p ! (i, j-1) + p ! (i-1, j)) `mod` 1000000
My only problem is efficiency. Even using GHC's -O2, this program takes 1.6 seconds to compute pascal 1000, which is about 160 times slower than an equivalent unoptimized C++ program. And the gap only widens with larger inputs.
It seems like I've tried every possible permutation of the above code, along with suggested alternatives such as the data-memocombinators library, and they all had the same or worse performance. The one thing I haven't tried is the ST monad, which I'm sure could be made to run only slightly slower than the C version. But I'd really like to write it in idiomatic Haskell, and I don't understand why the idiomatic version is so inefficient. I have two questions:
Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.
Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?
Thanks a lot.
Edit: The array module used is the standard Data.Array
Well, the algorithm could be designed a little better. Using the vector package and being smart about only keeping one row in memory at a time, we can get something that's idiomatic in a different way:
{-# LANGUAGE BangPatterns #-}

import Data.Vector.Unboxed
import Prelude hiding (replicate, tail, scanl)

pascal :: Int -> Int
pascal !n = go 1 ((replicate (n+1) 1) :: Vector Int) where
  go !i !prevRow
    | i <= n    = go (i+1) (scanl f 1 (tail prevRow))
    | otherwise = prevRow ! n
  f x y = (x + y) `rem` 1000000
This optimizes down very tightly, especially because the vector package includes some rather ingenious tricks to transparently optimize array operations written in an idiomatic style.
1 Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.
The problem is that the code writes thunks to the array. Then, when entry (n,n) is read, the evaluation of the thunks jumps all over the array again, recursing until finally a value needing no further recursion is found. That causes a lot of unnecessary allocation and inefficiency.
The C++ code doesn't have that problem: the values are written and read directly, without requiring further evaluation, just as would happen with an STUArray. Does
p = runSTUArray $ do
      arr <- newArray ((0,0),(n,n)) 1
      forM_ [1 .. n] $ \i ->
        forM_ [1 .. n] $ \j -> do
          a <- readArray arr (i,j-1)
          b <- readArray arr (i-1,j)
          writeArray arr (i,j) $! (a+b) `rem` 1000000
      return arr
really look so bad?
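For reference, a self-contained version of that fragment might look like this (a sketch; the wrapper function, its name, and the imports are my additions):

import Control.Monad (forM_)
import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, (!))

-- same recurrence as before, but every cell is written fully evaluated,
-- so reading p ! (n,n) never has to chase thunks
pascalST :: Int -> Int
pascalST n = p ! (n,n)
  where
    p :: UArray (Int,Int) Int
    p = runSTUArray $ do
          arr <- newArray ((0,0),(n,n)) 1
          forM_ [1 .. n] $ \i ->
            forM_ [1 .. n] $ \j -> do
              a <- readArray arr (i,j-1)
              b <- readArray arr (i-1,j)
              writeArray arr (i,j) $! (a+b) `rem` 1000000
          return arr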
2 Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?
I don't know of one. But there might be.
Addendum:
Once one uses STUArrays or unboxed Vectors, there's still a significant difference from the equivalent C implementation. The reason is that gcc replaces the % by a combination of multiplications, shifts and subtractions (even without optimisations), since the modulus is known. Doing the same by hand in Haskell (since GHC doesn't [yet] do that):
import Data.Bits (shiftR)

-- fast modulo 1000000
-- for nonnegative Ints < 2^31
-- requires 64-bit Ints
fastMod :: Int -> Int
fastMod n = n - 1000000*((n*1125899907) `shiftR` 50)
gets the Haskell versions on par with C.
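To convince yourself that the magic constant is right, here is a quick sanity check one could run (my own helper, assuming the fastMod above is in scope):

-- compares fastMod against the real mod over a sample of the claimed range
checkFastMod :: Bool
checkFastMod = all (\n -> fastMod n == n `mod` 1000000) [0, 999 .. 2^31 - 1]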
The trick is to think about how to write the whole damn algorithm at once, and then use unboxed vectors as your backing data type. For example, the following runs about 20 times faster on my machine than your code:
import qualified Data.Vector.Unboxed as V

combine :: Int -> Int -> Int
combine x y = (x + y) `mod` 1000000

pascal n = V.last $ go n where
  go 0 = V.replicate (n+1) 1
  go m = V.scanl1 combine (go (m-1))
I then wrote two main functions that called out to yours and mine with an argument of 4000; these ran in 10.42s and 0.54s respectively. Of course, as I'm sure you know, they both get blown out of the water (0.00s) by the version that uses a better algorithm:
pascal' :: Integer -> Integer
pascal' n = product [n+1..n*2] `div` product [2..n]

pascal :: Int -> Int
pascal = fromIntegral . (`mod` 1000000) . pascal' . fromIntegral
