Haskell Fibonacci sequence performance depending on methodology

I was trying out different approaches to getting a number at a given index of the Fibonacci sequence and they could basically be divided into two categories:
building a list and querying an index
using variables (might be separate or tupled, without a list)
I picked an example of both:
fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where fibs1' = 0 : scanl (+) 1 fibs1'

fib2 :: Int -> Integer
fib2 n = fib2' 1 1 n where
  fib2' _ b 2 = b
  fib2' a b n = fib2' b (a + b) (n - 1)
fibs1:
real 0m2.356s
user 0m2.310s
sys 0m0.030s
fib2:
real 0m0.671s
user 0m0.667s
sys 0m0.000s
Both were compiled with 64bit GHC 7.6.1 and -O2 -fllvm. Their core dumps are very similar in length, but they differ in the parts that I'm not very proficient at interpreting.
I was not surprised that fibs1 failed for n = 350000 (Stack space overflow). However, I am not comfortable with the fact that it used that much memory.
I would like to clear some things up:
Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?
EDIT: Sorry, I mixed the speed results, fixed. Two of three of my doubts are still valid, though ;).

Why does the GC not take care of the beginning of the list throughout computation even though most of it quickly becomes useless?
fibs1 uses a lot of memory and is slow because scanl is lazy: it doesn't evaluate the list elements, so
fibs1' = 0 : scanl (+) 1 fibs1'
produces
0 : scanl (+) 1 (0 : more)
0 : 1 : let f2 = 1+0 in scanl (+) f2 (1 : more')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in scanl (+) f3 (f2 : more'')
0 : 1 : let f2 = 1+0 in f2 : let f3 = f2+1 in f3 : let f4 = f3+f2 in scanl (+) f4 (f3 : more''')
etc. So you rather quickly get a huge nested thunk. When that thunk is evaluated, it is pushed on the stack, and at some point between 250000 and 350000, it becomes too big for the default stack.
And since each list element holds a reference to the previous while it is not evaluated, the beginning of the list cannot be garbage-collected.
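(As an aside: the overflow can be postponed, though not cured, by enlarging the runtime stack, e.g. compiling with -rtsopts and running the program with +RTS -K512m.)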
If you use a strict scan,
fibs1 :: Int -> Integer
fibs1 n = fibs1' !! n
  where
    fibs1' = 0 : scanl' (+) 1 fibs1'
    scanl' f a (x:xs) = let x' = f a x in x' `seq` (a : scanl' f x' xs)
    scanl' _ a [] = [a]
when the k-th list cell is produced, its value is already evaluated, so it doesn't refer to the previous one; hence the list can be garbage-collected (assuming nothing else holds a reference to it) as it is traversed.
With that implementation, the list version is about as fast and lean as fib2. It still needs to allocate list cells, so it allocates a little more and is possibly a tiny bit slower, but the difference is minute, since the Fibonacci numbers become so large that the list-construction overhead is negligible.
The idea of scanl is that its result is incrementally consumed, so that the consumption forces the elements and prevents the build-up of large thunks.
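To make that concrete, here is a small sketch (the helper forceIndex is hypothetical, not part of the answer above): an indexing function that forces every element it walks past, giving the lazy scanl the incremental consumer it is designed for, so no thunk chain builds up.

-- Hypothetical helper: index into a list, forcing each element passed
-- over so no chain of unevaluated (+) thunks can accumulate.
forceIndex :: [Integer] -> Int -> Integer
forceIndex (x:_)  0 = x
forceIndex (x:xs) n = x `seq` forceIndex xs (n - 1)
forceIndex []     _ = error "forceIndex: index out of range"

-- Even with the lazy scanl, this consumer keeps every thunk shallow:
fibsForced :: Int -> Integer
fibsForced n = forceIndex fibs' n
  where fibs' = 0 : scanl (+) 1 fibs'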
Why does GHC not optimize the list version to a variable version since only two of its elements are required at once?
Its optimiser can't see through the algorithm to determine that. scanl is opaque to the compiler: it doesn't know what scanl does.
If we take the exact source code for scanl (renaming it or hiding scanl from the Prelude, I opted for renaming),
scans :: (b -> a -> b) -> b -> [a] -> [b]
scans f q ls = q : (case ls of
                      [] -> []
                      x:xs -> scans f (f q x) xs)
and compile the module exporting it (with -O2), and then look at the generated interface file with
ghc --show-iface Scan.hi
we get, for example (the details differ slightly between compiler versions):
Magic: Wanted 33214052,
got 33214052
Version: Wanted [7, 0, 6, 1],
got [7, 0, 6, 1]
Way: Wanted [],
got []
interface main:Scan 7061
interface hash: ef57dac14815e2f1f897b42a007c0c81
ABI hash: 8cfc8dab79de6a51fcad666f1869574f
export-list hash: 57d6805e5f0b5f76f0dd8dfb228df988
orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
flag hash: 1e8135cb44ef6dd330f1f56943d1f463
used TH splices: False
where
exports:
Scan.scans
module dependencies:
package dependencies: base* ghc-prim integer-gmp
orphans: base:GHC.Base base:GHC.Float base:GHC.Real
family instance modules:
import -/ base:Prelude 1cb4b618cf45281dc97748b1831bf0cd
d79ca4e223c0de0a770a3b88a5e67687
scans :: forall b a. (b -> a -> b) -> b -> [a] -> [b]
{- Arity: 3, HasNoCafRefs, Strictness: LLL -}
vectorised variables:
vectorised tycons:
vectorised reused tycons:
scalar variables:
scalar tycons:
trusted: safe-inferred
require own pkg trusted: False
and see that the interface file doesn't expose the unfolding of the function, only its type, arity, strictness, and the fact that it doesn't refer to CAFs.
When a module importing that is compiled, all that the compiler has to go by is the information exposed by the interface file.
Here, there is no information exposed that would allow the compiler to do anything else but emit a call to the function.
If the unfolding were exposed, the compiler would have a chance to inline it and analyse the code, knowing the types and the combining function, to produce more eager code that doesn't build thunks.
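(An aside, not from the original answer: one way to expose the unfolding across module boundaries is an INLINABLE pragma next to the definition, which stores it in the interface file so that importing modules can inline or specialise the function.)

-- Added in the defining module, next to scans:
{-# INLINABLE scans #-}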
The semantics of scanl, however, are maximally lazy, each element of the output is emitted before the input list is inspected. That has the consequence that GHC can't make the addition strict, since that would change the result if the list contained any undefined values:
scanl (+) 1 [undefined] = 1 : scanl (+) (1 + undefined) [] = 1 : (1 + undefined) : []
while
scanl' (+) 1 [undefined] = let x' = 1 + undefined in x' `seq` 1 : scanl' (+) x' []
= *** Exception: Prelude.undefined
One could make a variant
scanl'' f b (x:xs) = b `seq` b : scanl'' f (f b x) xs
that would produce 1 : *** Exception: Prelude.undefined for the above input. But any strictness would indeed change the result if the list contained undefined values, so even if the compiler knew the unfolding, it couldn't make the evaluation strict, unless it could prove that there are no undefined values in the list: a fact that is obvious to us, but not to the compiler (and I don't think it would be easy to teach a compiler to recognize that and to prove the absence of undefined values).

Related

Difference between two kinds of recursive function

In OCaml, there are two ways I have seen to write a map function, for example:
let rec map f xs =
  match xs with
  | [] -> []
  | x::rest -> f x :: map f rest
and
let map f xs =
  let rec go xs =
    match xs with
    | [] -> []
    | x::rest -> f x :: go rest
  in go xs
The second one looks more optimized, because it resembles loop-invariant elimination, but in functional programming it may involve allocating a new closure. Can anyone explain the difference between the two styles of recursive function, in particular in terms of performance? Thanks for your help!
I couldn't find similar questions on SO, and I'm expecting there is a term like "recursive invariant elimination" to describe the kind of transformation from the first program to the second one.
I've always wondered the exact same thing: does the compiler optimize invariant arguments in recursive functions?
Since your question motivated me to benchmark it, let me share here my results.
Protocol
I have not tried it with map, since it would require big lists, which would result in a stack overflow. I could try it with rev_map, but I don't see the point of allocating huge lists when it's easier to test equivalent behavior on integers (plus, I'm afraid that the allocations would ultimately trigger a round of GC, which would mess with my time measures).
The following code reproduces your use-case with a dummy recursive function with an invariant argument, as in map:
let rec g f x = if x = 0 then 0 else g f (f x)

let g2 f x =
  let rec aux x = if x = 0 then 0 else aux (f x) in
  aux x
let time title f x =
  let t = Sys.time () in
  let fx = f x in
  Printf.printf "%s: %fs\n%!" title (Sys.time () -. t) ;
  fx
let main =
  let nb = int_of_string Sys.argv.(1) in
  ignore (time "normal" (g pred) nb) ;
  ignore (time "invariant elimination" (g2 pred) nb)
You can compile it (ocamlopt a.ml for example) and run it by doing
./a.out 10000000000. You can obviously change the integer parameter to tune the number of recursive calls.
Results
On my computer, for an input number of 10000000000, it outputs:
normal: 11.813643s
invariant elimination: 11.646377s
On bigger values:
20000000000
normal: 23.353022s
invariant elimination: 22.977813s
30000000000
normal: 35.586871s
invariant elimination: 35.421313s
I didn't bother going higher.
To me this seems to indicate that both versions are equivalent: maybe the compiler does optimize invariant arguments in recursive functions and it's just not measurable, or maybe it doesn't.
Bytecode comparison
I have also tried to see whether the generated bytecode is the same or not (ocamlc -dinstr a.ml), and it does differ slightly, as you can see in the following snippets.
normal
compiling a file with only this in it:
let g f x =
  let rec aux f x = if x = 0 then 0 else aux f (f x) in
  aux f x
gives
branch L2
restart
L3: grab 1
acc 1
push
const 0
eqint
branchifnot L4
const 0
return 2
L4: acc 1
push
acc 1
apply 1
push
acc 1
push
offsetclosure 0
appterm 2, 4
restart
L1: grab 1
closurerec 3, 0
acc 2
push
acc 2
push
acc 2
appterm 2, 5
L2: closure L1, 0
push
acc 0
makeblock 1, 0
pop 1
setglobal E!
invariant elimination
compiling a file with only this in it:
let g2 f x =
  let rec aux x = if x = 0 then 0 else aux (f x) in
  aux x
gives:
branch L2
L3: acc 0
push
const 0
eqint
branchifnot L4
const 0
return 1
L4: acc 0
push
envacc 1
apply 1
push
offsetclosure 0
appterm 1, 2
restart
L1: grab 1
acc 0
closurerec 3, 1
acc 2
push
acc 1
appterm 1, 4
L2: closure L1, 0
push
acc 0
makeblock 1, 0
pop 1
setglobal E2!
But I'm not expert enough to draw any conclusions, as I don't speak bytecode.
That's also around the point where I decided that the answer is not that important for now, and that it's easier anyway to ask @gasche next time I see him.
The use of go suggests a Haskell background. Both OCaml and Haskell are functional programming languages, but there are substantial differences and what one knows about Haskell should not be used to make assumptions about OCaml.
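(A hedged Haskell aside, since that is where the idiom comes from: GHC even has a pass for hoisting invariant arguments, the static argument transformation, which is roughly the term the question was looking for. The second style makes its benefit explicit there: the wrapper is non-recursive, so it can be inlined at a call site and the inner loop specialised for the known f. A minimal sketch:)

-- The two styles in Haskell. mapGo's wrapper is non-recursive, so GHC
-- can inline it at a use site and specialise go for the concrete f.
mapRec :: (a -> b) -> [a] -> [b]
mapRec _ []     = []
mapRec f (x:xs) = f x : mapRec f xs

mapGo :: (a -> b) -> [a] -> [b]
mapGo f = go
  where
    go []     = []
    go (x:xs) = f x : go xs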
I see no particular reason to write map the second way. If you're using OCaml 4.14.0 or later, you might want to use tail_mod_cons to make map tail-recursive without an explicit accumulator as in Stef's comment.
let[@tail_mod_cons] rec map f =
  function
  | [] -> []
  | x::xs -> f x :: map f xs
And of course, the real solution is:
let map = List.map
Like others, I have never seen the second form, and it's hard for me to imagine what kind of optimization it can provide. What I do know, however, is that (as @Stef and @Chris pointed out) this function can be written in a tail-recursive way. So, just for the sake of completeness:
let map f xs =
  let rec go xs ys =
    match xs with
    | [] -> ys
    | x::rest -> go rest ((f x)::ys)
  in List.rev (go xs [])
This version is kinder to the stack than the two forms from your post: each recursive call reuses the same stack frame, so long lists no longer risk a stack overflow (at the price of building and then reversing an intermediate list).

Efficient summation in OCaml

Please note I am almost a complete newbie in OCaml. In order to learn a bit, and test its performance, I tried to implement a module that approximates Pi using the Leibniz series.
My first attempt led to a stack overflow (the actual error, not this site). Knowing from Haskell that this may come from too many "thunks", or promises to compute something, while recursing over the addends, I looked for some way of keeping just the last result while summing with the next. I found the following tail-recursive implementations of sum and map in the notes of an OCaml course, here and here, and expected the compiler to produce an efficient result.
However, the resulting executable, compiled with ocamlopt, is much slower than a C++ version compiled with clang++. Is this code as efficient as possible? Is there some optimization flag I am missing?
My complete code is:
let (--) i j =
  let rec aux n acc =
    if n < i then acc else aux (n-1) (n :: acc)
  in aux j [];;

let sum_list_tr l =
  let rec helper a l = match l with
    | [] -> a
    | h :: t -> helper (a +. h) t
  in helper 0. l

let rec tailmap f l a = match l with
  | [] -> a
  | h :: t -> tailmap f t (f h :: a);;

let rev l =
  let rec helper l a = match l with
    | [] -> a
    | h :: t -> helper t (h :: a)
  in helper l [];;

let efficient_map f l = rev (tailmap f l []);;

let summand n =
  let m = float_of_int n
  in (-1.) ** m /. (2. *. m +. 1.);;

let pi_approx n =
  4. *. sum_list_tr (efficient_map summand (0 -- n));;

let n = int_of_string Sys.argv.(1);;
Printf.printf "%F\n" (pi_approx n);;
Just for reference, here are the measured times on my machine:
❯❯❯ time ocaml/main 10000000
3.14159275359
ocaml/main 10000000 3,33s user 0,30s system 99% cpu 3,625 total
❯❯❯ time cpp/main 10000000
3.14159
cpp/main 10000000 0,17s user 0,00s system 99% cpu 0,174 total
For completeness, let me state that the first helper function, an equivalent to Python's range, comes from this SO thread, and that this is run using OCaml version 4.01.0, installed via MacPorts on a Darwin 13.1.0.
As I noted in a comment, OCaml's floats are boxed, which puts OCaml at a disadvantage compared to Clang.
However, I may be noticing another typical rough edge trying OCaml after Haskell:
If I read your program correctly, you are creating a list, then mapping a function over that list, and finally folding it into a result.
In Haskell, you could more or less expect such a program to be automatically “deforested” at compile-time, so that the resulting generated code was an efficient implementation of the task at hand.
In OCaml, the fact that functions can have side effects, and in particular functions passed to higher-order functions such as map and fold, means that it would be much harder for the compiler to deforest automatically. The programmer has to do it by hand.
In other words: stop building huge short-lived data structures such as 0 -- n and (efficient_map summand (0 -- n)). When your program decides to tackle a new summand, make it do all it wants to do with that summand in a single pass. You can see this as an exercise in applying the principles in Wadler's article (again, by hand, because for various reasons the compiler will not do it for you despite your program being pure).
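(For comparison, a hedged Haskell sketch of the deforestation-friendly style; with -O2, GHC's build/foldr list fusion should collapse the enumeration, the map, and the sum into one accumulating loop with no intermediate list.)

-- List-style Leibniz sum; GHC's list fusion is expected to deforest
-- this pipeline into a single loop when compiled with -O2.
piApprox :: Int -> Double
piApprox n = 4 * sum (map summand [0 .. n])
  where
    summand k = let m = fromIntegral k
                in (-1) ** m / (2 * m + 1)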
Here are some results:
$ ocamlopt v2.ml
$ time ./a.out 1000000
3.14159165359
real 0m0.020s
user 0m0.013s
sys 0m0.003s
$ ocamlopt v1.ml
$ time ./a.out 1000000
3.14159365359
real 0m0.238s
user 0m0.204s
sys 0m0.029s
v1.ml is your version. v2.ml is what you might consider an idiomatic OCaml version:
let rec q_pi_approx p n acc =
  if n = p
  then acc
  else q_pi_approx (succ p) n (acc +. (summand p))

let n = int_of_string Sys.argv.(1);;
Printf.printf "%F\n" (4. *. (q_pi_approx 0 n 0.));;
(reusing summand from your code)
It might be more accurate to sum from the last terms to the first, instead of from the first to the last. This is orthogonal to your question, but you may consider it as an exercise in modifying a function that has been forcefully made tail-recursive. Besides, the (-1.) ** m expression in summand is mapped by the compiler to a call to the pow() function on the host, and that's a bag of hurt you may want to avoid.
I've also tried several variants; here are my conclusions:
Using arrays
Using recursion
Using imperative loop
The recursive function is about 30% more efficient than the array implementation. The imperative loop is approximately as efficient as the recursion (maybe even a little slower).
Here are my implementations:
Array:
open Core.Std

let pi_approx n =
  let f m = (-1.) ** m /. (2. *. m +. 1.) in
  let qpi = Array.init n ~f:Float.of_int |>
            Array.map ~f |>
            Array.reduce_exn ~f:(+.) in
  qpi *. 4.0
Recursion:
let pi_approx n =
  let rec loop n acc m =
    if m = n
    then acc *. 4.0
    else
      let acc = acc +. (-1.) ** m /. (2. *. m +. 1.) in
      loop n acc (m +. 1.0) in
  let n = float_of_int n in
  loop n 0.0 0.0
This can be further optimized by moving the local function loop outside, so that the compiler can inline it.
Imperative loop:
let pi_approx n =
  let sum = ref 0. in
  for m = 0 to n - 1 do
    let m = float_of_int m in
    sum := !sum +. (-1.) ** m /. (2. *. m +. 1.)
  done;
  4.0 *. !sum
But in the code above, creating a ref for the sum incurs boxing/unboxing on each step, so we can further optimize this code by using the float_ref trick:
type float_ref = { mutable value : float }

let pi_approx n =
  let sum = {value = 0.} in
  for m = 0 to n - 1 do
    let m = float_of_int m in
    sum.value <- sum.value +. (-1.) ** m /. (2. *. m +. 1.)
  done;
  4.0 *. sum.value
Scoreboard (relative speed, 1.0 = fastest):
for-loop (with float_ref) : 1.0
non-local recursion : 0.89
local recursion : 0.86
Pascal's version : 0.77
for-loop (with float ref) : 0.62
array : 0.47
original : 0.08
Update
I've updated the answer, as I've found a way to gain a 40% speedup (or 33% in comparison with @Pascal's answer).
I would like to add that although floats are boxed in OCaml, float arrays are unboxed. Here is a program that builds a float array corresponding to the Leibniz sequence and uses it to approximate π:
open Array

let q_pi_approx n =
  let summand n =
    let m = float_of_int n
    in (-1.) ** m /. (2. *. m +. 1.) in
  let a = Array.init n summand in
  Array.fold_left (+.) 0. a

let n = int_of_string Sys.argv.(1);;
Printf.printf "%F\n" (4. *. (q_pi_approx n));;
Obviously, it is still slower than a code that doesn't build any data structure at all. Execution times (the version with array is the last one):
time ./v1 10000000
3.14159275359
real 0m2.479s
user 0m2.380s
sys 0m0.104s
time ./v2 10000000
3.14159255359
real 0m0.402s
user 0m0.400s
sys 0m0.000s
time ./a 10000000
3.14159255359
real 0m0.453s
user 0m0.432s
sys 0m0.020s

Haskell Recursion - finding largest difference between numbers in list

Here's the problem at hand: I need to find the largest difference between adjacent numbers in a list using recursion. Take the following list for example: [1,2,5,6,7,9]. The largest difference between two adjacent numbers is 3 (between 2 and 5).
I know that recursion may not be the best solution, but I'm trying to improve my ability to use recursion in Haskell.
Here's the code I currently have:
largestDiff (x:y:xs) = if (length (y:xs) > 1) then max((x-y), largestDiff (y:xs)) else 0
Basically - the list will keep getting shorter until it reaches 1 (i.e. no more numbers can be compared, then it returns 0). As 0 passes up the call stack, the max function is then used to implement a 'King of the Hill' type algorithm. Finally - at the end of the call stack, the largest number should be returned.
Trouble is, I'm getting an error in my code that I can't work around:
Occurs check: cannot construct the infinite type:
t1 = (t0, t1) -> (t0, t1)
In the return type of a call of `largestDiff'
Probable cause: `largestDiff' is applied to too few arguments
In the expression: largestDiff (y : xs)
In the first argument of `max', namely
`((x - y), largestDiff (y : xs))'
Anyone have some words of wisdom to share?
Thanks for your time!
EDIT: Thanks everyone for your time - I ended up independently discovering a much simpler way after much trial and error.
largestDiff [] = error "List too small"
largestDiff [x] = error "List too small"
largestDiff [x,y] = abs (x - y)
largestDiff (x:y:xs) = max (abs (x - y)) (largestDiff (y:xs))
Thanks again, all!
So the reason your code is throwing an error is this expression:
max((x-y), largestDiff (y:xs))
In Haskell, you do not wrap the arguments in parentheses and separate them with commas; the correct syntax is
max (x - y) (largestDiff (y:xs))
The syntax you used is getting parsed as
max ((x - y), largestDiff (y:xs))
Which looks like you're passing a tuple to max!
However, this does not solve the problem; I always got 0 back. Instead, I would recommend breaking the problem into two functions. You want to calculate the maximum of the differences, so first write a function to calculate the differences, and then a function to calculate the maximum of those:
diffs :: Num a => [a] -> [a]
diffs [] = [] -- No elements case
diffs [x] = [] -- One element case
diffs (x:y:xs) = y - x : diffs (y:xs) -- Two or more elements case
largestDiff :: (Ord a, Num a) => [a] -> a
largestDiff xs = maximum $ map abs $ diffs xs
Notice how I've pulled the recursion out into the simplest possible case. We didn't need to calculate the maximum as we traversed the list; that's possible, just more complex. Since Haskell has a handy built-in function for calculating the maximum of a list, we can leverage that. Our recursive function is clean and simple, and combining it with maximum implements the desired largestDiff. As an FYI, diffs is really just a function to compute the discrete derivative of a list of numbers; it can be very useful for data processing.
EDIT: Needed Ord constraint on largestDiff and added in map abs before calculating maximum.
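A quick sanity check (not part of the original answer) against the question's example list:

-- diffs [1,2,5,6,7,9] == [1,3,1,1,2], so:
main :: IO ()
main = print (largestDiff [1,2,5,6,7,9])  -- prints 3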
Here's my take at it.
First some helpers:
diff a b = abs (a - b)
pick a b = if a > b then a else b
Then the solution:
mdiff :: [Int] -> Int
mdiff [] = 0
mdiff [_] = 0
mdiff (a:b:xs) = pick (diff a b) (mdiff (b:xs))
You have to provide two base clauses, one for the empty list and one for a single-element list: the recursion peels off one element at a time, so it always bottoms out at a one-element list, while the [] clause handles an empty input.
Another solution to this problem, which circumvents your error, can be obtained
by just transforming lists and folding/reducing them.
import Data.List (foldl')
diffs :: (Num a) => [a] -> [a]
diffs x = zipWith (-) x (drop 1 x)
absMax :: (Ord a, Num a) => [a] -> a
absMax x = foldl' max (fromInteger 0) (map abs x)
Now I admit this is a bit dense for a beginner, so I will explain the above.
The function zipWith transforms two given lists by using a binary function,
which is (-) in this case.
The second list we pass to zipWith is drop 1 x, which is just another way of
describing the tail of a list, but where tail [] results in an error,
drop 1 [] just yields the empty list. So drop 1 is the "safer" choice.
So the first function calculates the adjacent differences.
The name of the second function suggests that it calculates the maximum absolute value of a given list, which is only partly true: it results in 0 if passed an empty list.
But how does this happen? Reading from right to left, we see that map abs transforms every list element to its absolute value, which is made possible by the Num a constraint. Then the foldl' function traverses the list and
accumulates the maximum of the previous accumulator and the current element of
the list traversal. Moreover, I'd like to mention that foldl' is the "strict" sibling of foldl; the latter is rarely of use, because it tends to build up a bunch of unevaluated expressions called thunks.
So let's quit all this blah blah and see it in action ;-)
> let a = diffs [1..3] :: [Int]
>>> zipWith (-) [1,2,3] (drop 1 [1,2,3])
<=> zipWith (-) [1,2,3] [2,3]
<=> [1-2,2-3] -- zipWith stops at the end of the SHORTER list
<=> [-1,-1]
> b = absMax a
>>> foldl' max (fromInteger 0) (map abs [-1,-1])
-- fromInteger 0 is in this case just 0; interesting stuff only happens
-- for other numerical types
<=> foldl' max 0 (map abs [-1,-1])
<=> foldl' max 0 [1,1]
<=> foldl' max (max 0 1) [1]
<=> foldl' max 1 [1]
<=> foldl' max (max 1 1) []
<=> foldl' max 1 [] -- foldl' _ acc [] returns just the accumulator
<=> 1

Compiler optimizations for infinite lists in Haskell?

I've various "partial permutation" functions of type t -> Maybe t that either take me to a new location in a data structure by returning a Just or else return a Nothing if they cannot yet get there.
I routinely must apply these partial permutations in repeated, specific patterns, building a list of all intermediate values, but truncating the list whenever I return to my starting position or a permutation fails.
import Data.Maybe (fromJust)

scan_partial_perms :: Eq t => [t -> Maybe t] -> t -> [t]
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
  where test (Just i) | i /= v = True
        test _ = False

-- repeat (not iterate) gives the list of one permutation applied at
-- every step; the original had typos on these two lines
iterate_partial_perm = scan_partial_perms . repeat
cycle_partial_perms = scan_partial_perms . cycle
I'm fairly confident that scanl has the desirable strictness and tail recursion properties in this context. Any other tips on optimizing this code? In particular, what compiler options beyond -O3 -fllvm should I read about?
At worst, I could replace the scanl and infinite list with an accessor function defined like
perm l i = l !! i `rem` length l
I'd imagine this cannot improve performance with the right optimizations, however.
I think you have a bug in scan_partial_perms,
scan_partial_perms ps v = map fromJust . takeWhile test $ scanl (>>=) (Just v) ps
scanl f s list always starts with s, so takeWhile test (scanl ...) is []. If that is intentional, it's quite obfuscated. Assuming what you want is
scan_partial_perms ps v = (v:) . map fromJust . takeWhile test . tail $ scanl (>>=) (Just v) ps
there's not much you can do. You can {-# SPECIALISE #-} it so the Eq dictionary is eliminated for the specialised-for types. That'll do you some good if the compiler doesn't do that on its own (which it may if it can see the use site). With ghc >= 7, you can instead make it {-# INLINABLE #-}, so that it can be specialised and perhaps inlined at each use site.
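Concretely, the pragmas would look something like this (a sketch; the Int instantiation is just an example type):

-- Specialise for a concrete element type, eliminating the Eq dictionary:
{-# SPECIALISE scan_partial_perms :: [Int -> Maybe Int] -> Int -> [Int] #-}
-- Or, with GHC >= 7, keep the unfolding available for use-site specialisation:
{-# INLINABLE scan_partial_perms #-}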
I don't know what happens down the llvm road, but at the core-level, map, fromJust and takeWhile are not yet inlined, so if you're desperate enough, you can get maybe a few tenths of a percent by inlining them manually if they aren't inlined later in the llvm backend:
scan_partial_perms ps v = v : go v ps
  where
    go w (q:qs) = case q w of
                    Just z
                      | z /= v -> z : go z qs
                    _ -> []
    go _ _ = []
But those are very cheap functions, so the gains - if at all present - would be small.
So what you have is rather good already, if it's not good enough, you need a different route of attack.
The one with the list indexing,
perm l i = l !! (i `rem` length l)
-- parentheses necessary, I don't think (l !! i) `rem` length l was what you want
doesn't look good. length is expensive, (!!) is expensive too, so both should in general be avoided.

variant of pascal's triangle in haskell - problem with lazy evaluation

To solve some problem I need to compute a variant of the pascal's triangle which is defined like this:
f(1,1) = 1,
f(n,k) = f(n-1,k-1) + f(n-1,k) + 1 for 1 <= k < n,
f(n,0) = 0,
f(n,n) = 2*f(n-1,n-1) + 1.
For n given I want to efficiently get the n-th line (f(n,1) .. f(n,n)). One further restriction: f(n,k) should be -1 if it would be >= 2^32.
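(For concreteness, computed directly from the recurrence, the first few rows f(n,1) .. f(n,n) are: 1; then 2, 3; then 3, 6, 7; then 4, 10, 14, 15.)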
My implementation:
import Data.Int (Int64)

next :: [Int64] -> [Int64]
next list@(x:_) = x+1 : takeWhile (/= -1) (nextRec list)

nextRec (a:rest@(b:_)) = boundAdd a b : nextRec rest
nextRec [a] = [boundAdd a a]

boundAdd x y
  | x < 0 || y < 0 = -1
  | x + y + 1 >= limit = -1
  | otherwise = x + y + 1

-- cap from the problem statement: entries >= 2^32 become -1
limit :: Int64
limit = 2^32

-- start should be [1]
fLine d start = until ((== d) . head) next start
The problem: for very large numbers I get a stack overflow. Is there a way to force Haskell to evaluate the whole list? It's clear that each line can't contain more elements than some upper bound, because the entries eventually become -1 and don't get stored, and each line only depends on the previous one. Due to lazy evaluation, only the head of each line is computed until the last line needs its second element, and all the thunks along the way are retained...
I have a very efficient implementation in C++, but I am really wondering if there is a way to get it done in Haskell, too.
Works for me: What Haskell implementation are you using? A naive program to calculate this triangle works fine for me in GHC 6.10.4. I can print the 1000th row just fine:
nextRow :: [Integer] -> [Integer]
nextRow row = 0 : [a + b + 1 | (a, b) <- zip row (tail row ++ [last row])]
tri = iterate nextRow [0]
main = putStrLn $ show $ tri !! 1000 -- print 1000th row
I can even print the first 10 numbers in row 100000 without overflowing the stack. I'm not sure what's going wrong for you. The global name tri might be keeping the whole triangle of results alive, but even if it is, that seems relatively harmless.
How to force order of evaluation: You can force thunks to be evaluated in a certain order using the Prelude function seq (which is a magic function that can't be implemented in terms of Haskell's other basic features). If you tell Haskell to print a `seq` b, it first evaluates the thunk for a, then evaluates and prints b.
Note that seq is shallow: it only does enough evaluation to force a to no longer be a thunk. If a is of a tuple type, the result might still be a tuple of thunks. If it's a list, the result might be a cons cell having thunks for both the head and the tail.
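To make that shallowness concrete, a small sketch (not from the original answer):

-- seq only evaluates to the outermost constructor (weak head normal form):
pairOk :: String
pairOk = (undefined, undefined) `seq` "ok"  -- "ok": the pair itself is WHNF

bottom :: String
bottom = undefined `seq` "ok"               -- throws Prelude.undefined when forced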
It seems like you shouldn't need to do this for such a simple problem; a few thousand thunks shouldn't be too much for any reasonable implementation. But it would go like this:
-- Evaluate a whole list of thunks before calculating `result`.
-- This returns `result`.
seqList :: [b] -> a -> a
seqList lst result = foldr seq result lst
-- Exactly the same as `nextRow`, but compute every element of `row`
-- before calculating any element of the next row.
nextRow' :: [Integer] -> [Integer]
nextRow' row = row `seqList` nextRow row
tri = iterate nextRow' [0]
The fold in seqList basically expands to lst!!0 `seq` lst!!1 `seq` lst!!2 `seq` ... `seq` result.
This is much slower for me when printing just the first 10 elements of row 100,000. I think that's because it requires computing 99,999 complete rows of the triangle.
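(A closing aside, not from the original answer: nowadays one would likely reach for the deepseq package instead of a hand-rolled seqList; assuming its standard API, the same forcing looks like this.)

import Control.DeepSeq (force)

-- Deeply evaluate the previous row before producing the next one:
nextRow'' :: [Integer] -> [Integer]
nextRow'' row = force row `seq` nextRow row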
