Haskell State monad vs state as parameter performance test

Haskell State monad vs state as parameter performance test - performance

I start to learn a State Monad and one idea bother me. Instead of passing accumulator as parameter, we can wrap everything to the state monad.
So I wanted to compare performance between using State monad vs passing it as parameter.
So I created two functions:
sum1 :: Int -> [Int] -> Int
sum1 x [] = x
sum1 x (y:xs) = sum1 (x + y) xs
and
sumState:: [Int] -> Int
sumState xs = execState (traverse f xs) 0
where f n = modify (n+)
I compared them on the input array [1..1000000000].
sumState running time was around 15s
sum1 around 5s
We can see clear winner, but the I realised that sumState can be optimised as:
We can use strict version of modify
We do not need necessary the map list output, so we can use traverse_ instead
So the new optimised state function is:
sumState:: [Int] -> Int
sumState xs = execState (traverse_ f xs) 0
where f n = modify' (n+)
which has running time around 350ms. This is a huge improvement. It was shocking.
Why the modified sumState has better performance then sum1? Can sum1 be optimised to match or even be better then sumState?
I also tried other different implementation of sum as
using built in sum function, which gives me around 240ms ((sum [1..x] ::Int))
using strict foldl', which gives me the same result around 240ms (with implicit [Int] -> Int)
Does it actually mean that it is better to use foldl function or State monad to pass accumulator instead of passing it as argument to the function?
Thank you for help.
EDIT:
Each function was in separate file with own main function and compiled with "-O2" flag.
main = do
x <- (read . head ) <$> getArgs
print $ <particular sum function> [1..x]
Runtime was measured via time command on linux.

To give a bit more explanation as to why traverse is slower: traverse f xs has has type State [()] and that [()] (list of unit tuples) is built up during the summation. This prevents further optimizations and would cause a memory leak if you were not using lazy state.
Update: I think GHC should have been able to notice that that list of unit tuples is never used, so I opened a GHC issue.
In both cases, To get the best performance we want to combine (or fuse) the summation with the enumeration [1..x] into a tight recursive loop which simply increments and adds until it reaches x. The resulting code would look something like this:
sumFromTo :: Int -> Int -> Int -> Int
sumFromTo s x y
| x == y = s + x
| otherwise = sumFromTo (s + x) (x + 1) y
This avoids allocations for the list [1..x].
The base library achieves this optimization using foldr/build fusion, also known as short cut fusion. The sum, foldl' and traverse (for lists) functions are implemented using the foldr function and [1..x] is implemented using the build function. The foldr and build function have special optimization rules so that they can be fused. Your custom sum1 function doesn't use foldr and so it can never be fused with [1..x] in this way.

Ironically, the same problem that plagued your implementation of sumState is also the problem with sum1. You don't have strict accumulation, so you build up thunks like so:
sum 0 [1, 2, 3]
sum (0 + 1) [2, 3]
sum ((0 + 1) + 2) [3]
sum (((0 + 1) + 2) + 3) []
(((0 + 1) + 2) + 3)
((1 + 2) + 3)
(3 + 3)
6
If you add strictness to sum1, you should see a dramatic improvement in efficiency because you eliminate the non-tail-recursive evaluation of the thunk (((0 + 1) + 2) + 3), which is the costly part of sum1. Using strict accumulation makes this much more efficient:
sum1 x [] = []
sum1 x (y : xs) = x `seq` sum1 (x + y) xs
should give you comparable performance to sum (although as noted in another answer, GHC may not be able to use fusion properly to give you the truly magical performance of sum on the list [1..x]).

Related

Haskell Recursion - finding largest difference between numbers in list

Here's the problem at hand: I need to find the largest difference between adjacent numbers in a list using recursion. Take the following list for example: [1,2,5,6,7,9]. The largest difference between two adjacent numbers is 3 (between 2 and 5).
I know that recursion may not be the best solution, but I'm trying to improve my ability to use recursion in Haskell.
Here's the current code I currently have:
largestDiff (x:y:xs) = if (length (y:xs) > 1) then max((x-y), largestDiff (y:xs)) else 0
Basically - the list will keep getting shorter until it reaches 1 (i.e. no more numbers can be compared, then it returns 0). As 0 passes up the call stack, the max function is then used to implement a 'King of the Hill' type algorithm. Finally - at the end of the call stack, the largest number should be returned.
Trouble is, I'm getting an error in my code that I can't work around:
Occurs check: cannot construct the infinite type:
t1 = (t0, t1) -> (t0, t1)
In the return type of a call of `largestDiff'
Probable cause: `largestDiff' is applied to too few arguments
In the expression: largestDiff (y : xs)
In the first argument of `max', namely
`((x - y), largestDiff (y : xs))'
Anyone have some words of wisdom to share?
Thanks for your time!
EDIT: Thanks everyone for your time - I ended up independently discovering a much simpler way after much trial and error.
largestDiff [] = error "List too small"
largestDiff [x] = error "List too small"
largestDiff [x,y] = abs(x-y)
largestDiff (x:y:xs) = max(abs(x-y)) (largestDiff (y:xs))
Thanks again, all!

So the reason why your code is throwing an error is because
max((x-y), largestDiff (y:xs))
In Haskell, you do not use parentheses around parameters and separate them by commas, the correct syntax is
max (x - y) (largestDiff (y:xs))
The syntax you used is getting parsed as
max ((x - y), largestDiff (y:xs))
Which looks like you're passing a tuple to max!
However, this does not solve the problem. I always got 0 back. Instead, I would recommend breaking up the problem into two functions. You want to calculate the maximum of the difference, so first write a function to calculate the differences and then a function to calculate the maximum of those:
diffs :: Num a => [a] -> [a]
diffs [] = [] -- No elements case
diffs [x] = [] -- One element case
diffs (x:y:xs) = y - x : diffs (y:xs) -- Two or more elements case
largestDiff :: (Ord a, Num a) => [a] -> a
largestDiff xs = maximum $ map abs $ diffs xs
Notice how I've pulled the recursion out into the simplest possible case. We didn't need to calculate the maximum as we traversed the list; it's possible, just more complex. Since Haskell has a handy built-in function for calculating the maximum of a list for us, we can also leverage that. Our recursive function is clean and simple, and it is then combined with maximum to implement the desired largestDiff. As an FYI, diffs is really just a function to compute the derivative of a list of numbers, it can be a very useful function for data processing.
EDIT: Needed Ord constraint on largestDiff and added in map abs before calculating maximum.

Here's my take at it.
First some helpers:
diff a b = abs(a-b)
pick a b = if a > b then a else b
Then the solution:
mdiff :: [Int] -> Int
mdiff [] = 0
mdiff [_] = 0
mdiff (a:b:xs) = pick (diff a b) (mdiff (b:xs))
You have to provide two closing clauses, because the sequence might have either even or odd number of elements.

Another solution to this problem, which circumvents your error, can be obtained
by just transforming lists and folding/reducing them.
import Data.List (foldl')
diffs :: (Num a) => [a] -> [a]
diffs x = zipWith (-) x (drop 1 x)
absMax :: (Ord a, Num a) => [a] -> a
absMax x = foldl' max (fromInteger 0) (map abs x)
Now I admit this is a bit dense for a beginner, so I will explain the above.
The function zipWith transforms two given lists by using a binary function,
which is (-) in this case.
The second list we pass to zipWith is drop 1 x, which is just another way of
describing the tail of a list, but where tail [] results in an error,
drop 1 [] just yields the empty list. So drop 1 is the "safer" choice.
So the first function calculates the adjacent differences.
The name of the second function suggests that it calculates the maximum absolute
value of a given list, which is only partly true, it results in "0" if passed an
empty list.
But how does this happen, reading from right to left, we see that map abs
transforms every list element to its absolute value, which is asserted by
the Num a constraint. Then the foldl'-function traverses the list and
accumulates the maximum of the previous accumulator and the current element of
the list traversal. Moreover I'd like to mention that foldl' is the "strict"
sister/brother of the foldl-function, where the latter is rarely of use,
because it tends to build up a bunch of unevaluated expressions called thunks.
So let's quit all this blah blah and see it in action ;-)
> let a = diffs [1..3] :: [Int]
>>> zipWith (-) [1,2,3] (drop 1 [1,2,3])
<=> zipWith (-) [1,2,3] [2,3]
<=> [1-2,2-3] -- zipWith stops at the end of the SHORTER list
<=> [-1,-1]
> b = absMax a
>>> foldl' max (fromInteger 0) (map abs [-1,-1])
-- fromInteger 0 is in this case is just 0 - interesting stuff only happens
-- for other numerical types
<=> foldl' max 0 (map abs [-1,-1])
<=> foldl' max 0 [1,1]
<=> foldl' max (max 0 1) [1]
<=> foldl' max 1 [1]
<=> foldl' max (max 1 1) []
<=> foldl' max 1 [] -- foldl' _ acc [] returns just the accumulator
<=> 1

Performance comparison of two implementations of a primes filter

I have two programs to find prime numbers (just an exercise, I'm learning Haskell). "primes" is about 10X faster than "primes2", once compiled with ghc (with flag -O). However, in "primes2", I thought it would consider only prime numbers for the divisor test, which should be faster than considering odd numbers in "isPrime", right? What am I missing?
isqrt :: Integral a => a -> a
isqrt = floor . sqrt . fromIntegral
isPrime :: Integral a => a -> Bool
isPrime n = length [i | i <- [1,3..(isqrt n)], mod n i == 0] == 1
primes :: Integral a => a -> [a]
primes n = [2,3,5,7,11,13] ++ (filter (isPrime) [15,17..n])
primes2 :: Integral a => a -> [a]
primes2 n = 2 : [i | i <- [3,5..n], all ((/= 0) . mod i) (primes2 (isqrt i))]

I think what's happening here is that isPrime is a simple loop, whereas primes2 is calling itself recursively — and its recursion pattern looks exponential to me.
Searching through my old source code, I found this code:
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
isPrime :: Integer -> Bool
isPrime x = all (\n -> x `mod` n /= 0) $
takeWhile (\n -> n * n <= x) primes
This tests each possible prime x only against the primes below sqrt(x), using the already generated list of primes. So it probably doesn't test any given prime more than once.
Memoization in Haskell:
Memoization in Haskell is generally explicit, not implicit. The compiler won't "do the right thing" but it will only do what you tell it to. When you call primes2,
*Main> primes2 5
[2,3,5]
*Main> primes2 10
[2,3,5,7]
Each time you call the function it calculates all of its results all over again. It has to. Why? Because 1) You didn't make it save its results, and 2) the answer is different each time you call it.
In the sample code I gave above, primes is a constant (i.e. it has arity zero) so there's only one copy of it in memory, and its parts only get evaluated once.
If you want memoization, you need to have a value with arity zero somewhere in your code.

I like what Dietrich has done with the memoization, but I think theres a data structure issue here too. Lists are just not the ideal data structure for this. They are, by necessity, lisp style cons cells with no random access. Set seems better suited to me.
import qualified Data.Set as S
sieve :: (Integral a) => a -> S.Set a
sieve top = let l = S.fromList (2:3:([5,11..top]++[7,13..top]))
iter s c
| cur > (div (S.findMax s) 2) = s
| otherwise = iter (s S.\\ (S.fromList [2*cur,3*cur..top])) (S.deleteMin c)
where cur = S.findMin c
in iter l (l S.\\ (S.fromList [2,3]))
I know its kind of ugly, and not too declarative, but it runs rather quickly. Im looking into a way to make this nicer looking using Set.fold and Set.union over the composites. Any other ideas for neatening this up would be appreciated.
PS - see how (2:3:([5,11..top]++[7,13..top])) avoids unnecessary multiples of 3 such as the 15 in your primes. Unfortunately, this ruins your ordering if you work with lists and you sign up for a sorting, but for sets thats not an issue.

What's the way to determine if an Int is a perfect square in Haskell?

I need a simple function
is_square :: Int -> Bool
which determines if an Int N a perfect square (is there an integer x such that x*x = N).
Of course I can just write something like
is_square n = sq * sq == n
where sq = floor $ sqrt $ (fromIntegral n::Double)
but it looks terrible! Maybe there is a common simple way to implement such a predicate?

Think of it this way, if you have a positive int n, then you're basically doing a binary search on the range of numbers from 1 .. n to find the first number n' where n' * n' = n.
I don't know Haskell, but this F# should be easy to convert:
let is_perfect_square n =
let rec binary_search low high =
let mid = (high + low) / 2
let midSquare = mid * mid
if low > high then false
elif n = midSquare then true
else if n < midSquare then binary_search low (mid - 1)
else binary_search (mid + 1) high
binary_search 1 n
Guaranteed to be O(log n). Easy to modify perfect cubes and higher powers.

There is a wonderful library for most number theory related problems in Haskell included in the arithmoi package.
Use the Math.NumberTheory.Powers.Squares library.
Specifically the isSquare' function.
is_square :: Int -> Bool
is_square = isSquare' . fromIntegral
The library is optimized and well vetted by people much more dedicated to efficiency then you or I. While it currently doesn't have this kind of shenanigans going on under the hood, it could in the future as the library evolves and gets more optimized. View the source code to understand how it works!
Don't reinvent the wheel, always use a library when available.

I think the code you provided is the fastest that you are going to get:
is_square n = sq * sq == n
where sq = floor $ sqrt $ (fromIntegral n::Double)
The complexity of this code is: one sqrt, one double multiplication, one cast (dbl->int), and one comparison. You could try to use other computation methods to replace the sqrt and the multiplication with just integer arithmetic and shifts, but chances are it is not going to be faster than one sqrt and one multiplication.
The only place where it might be worth using another method is if the CPU on which you are running does not support floating point arithmetic. In this case the compiler will probably have to generate sqrt and double multiplication in software, and you could get advantage in optimizing for your specific application.
As pointed out by other answer, there is still a limitation of big integers, but unless you are going to run into those numbers, it is probably better to take advantage of the floating point hardware support than writing your own algorithm.

In a comment on another answer to this question, you discussed memoization. Keep in mind that this technique helps when your probe patterns exhibit good density. In this case, that would mean testing the same integers over and over. How likely is your code to repeat the same work and thus benefit from caching answers?
You didn't give us an idea of the distribution of your inputs, so consider a quick benchmark that uses the excellent criterion package:
module Main
where
import Criterion.Main
import Random
is_square n = sq * sq == n
where sq = floor $ sqrt $ (fromIntegral n::Double)
is_square_mem =
let check n = sq * sq == n
where sq = floor $ sqrt $ (fromIntegral n :: Double)
in (map check [0..] !!)
main = do
g <- newStdGen
let rs = take 10000 $ randomRs (0,1000::Int) g
direct = map is_square
memo = map is_square_mem
defaultMain [ bench "direct" $ whnf direct rs
, bench "memo" $ whnf memo rs
]
This workload may or may not be a fair representative of what you're doing, but as written, the cache miss rate appears too high:

Wikipedia's article on Integer Square Roots has algorithms can be adapted to suit your needs. Newton's method is nice because it converges quadratically, i.e., you get twice as many correct digits each step.
I would advise you to stay away from Double if the input might be bigger than 2^53, after which not all integers can be exactly represented as Double.

Oh, today I needed to determine if a number is perfect cube, and similar solution was VERY slow.
So, I came up with a pretty clever alternative
cubes = map (\x -> x*x*x) [1..]
is_cube n = n == (head $ dropWhile (<n) cubes)
Very simple. I think, I need to use a tree for faster lookups, but now I'll try this solution, maybe it will be fast enough for my task. If not, I'll edit the answer with proper datastructure

Sometimes you shouldn't divide problems into too small parts (like checks is_square):
intersectSorted [] _ = []
intersectSorted _ [] = []
intersectSorted xs (y:ys) | head xs > y = intersectSorted xs ys
intersectSorted (x:xs) ys | head ys > x = intersectSorted xs ys
intersectSorted (x:xs) (y:ys) | x == y = x : intersectSorted xs ys
squares = [x*x | x <- [ 1..]]
weird = [2*x+1 | x <- [ 1..]]
perfectSquareWeird = intersectSorted squares weird

There's a very simple way to test for a perfect square - quite literally, you check if the square root of the number has anything other than zero in the fractional part of it.
I'm assuming a square root function that returns a floating point, in which case you can do (Psuedocode):
func IsSquare(N)
sq = sqrt(N)
return (sq modulus 1.0) equals 0.0

It's not particularly pretty or fast, but here's a cast-free, FPA-free version based on Newton's method that works (slowly) for arbitrarily large integers:
import Control.Applicative ((<*>))
import Control.Monad (join)
import Data.Ratio ((%))
isSquare = (==) =<< (^2) . floor . (join g <*> join f) . (%1)
where
f n x = (x + n / x) / 2
g n x y | abs (x - y) > 1 = g n y $ f n y
| otherwise = y
It could probably be sped up with some additional number theory trickery.

variant of pascal's triangle in haskell - problem with lazy evaluation

To solve some problem I need to compute a variant of the pascal's triangle which is defined like this:
f(1,1) = 1,
f(n,k) = f(n-1,k-1) + f(n-1,k) + 1 for 1 <= k < n,
f(n,0) = 0,
f(n,n) = 2*f(n-1,n-1) + 1.
For n given I want to efficiently get the n-th line (f(n,1) .. f(n,n)). One further restriction: f(n,k) should be -1 if it would be >= 2^32.
My implementation:
next :: [Int64] -> [Int64]
next list#(x:_) = x+1 : takeWhile (/= -1) (nextRec list)
nextRec (a:rest#(b:_)) = boundAdd a b : nextRec rest
nextRec [a] = [boundAdd a a]
boundAdd x y
| x < 0 || y < 0 = -1
| x + y + 1 >= limit = -1
| otherwise = (x+y+1)
-- start shoud be [1]
fLine d start = until ((== d) . head) next start
The problem: for very large numbers I get a stack overflow. Is there a way to force haskell to evaluate the whole list? It's clear that each line can't contain more elements than an upper bound, because they eventually become -1 and don't get stored and each line only depends on the previous one. Due to the lazy evaluation only the head of each line is computed until the last line needs it's second element and all the trunks along the way are stored...
I have a very efficient implementation in c++ but I am really wondering if there is a way to get it done in haskell, too.

Works for me: What Haskell implementation are you using? A naive program to calculate this triangle works fine for me in GHC 6.10.4. I can print the 1000th row just fine:
nextRow :: [Integer] -> [Integer]
nextRow row = 0 : [a + b + 1 | (a, b) <- zip row (tail row ++ [last row])]
tri = iterate nextRow [0]
main = putStrLn $ show $ tri !! 1000 -- print 1000th row
I can even print the first 10 numbers in row 100000 without overflowing the stack. I'm not sure what's going wrong for you. The global name tri might be keeping the whole triangle of results alive, but even if it is, that seems relatively harmless.
How to force order of evaluation: You can force thunks to be evaluated in a certain order using the Prelude function seq (which is a magic function that can't be implemented in terms of Haskell's other basic features). If you tell Haskell to print a `seq` b, it first evaluates the thunk for a, then evaluates and prints b.
Note that seq is shallow: it only does enough evaluation to force a to no longer be a thunk. If a is of a tuple type, the result might still be a tuple of thunks. If it's a list, the result might be a cons cell having thunks for both the head and the tail.
It seems like you shouldn't need to do this for such a simple problem; a few thousand thunks shouldn't be too much for any reasonable implementation. But it would go like this:
-- Evaluate a whole list of thunks before calculating `result`.
-- This returns `result`.
seqList :: [b] -> a -> a
seqList lst result = foldr seq result lst
-- Exactly the same as `nextRow`, but compute every element of `row`
-- before calculating any element of the next row.
nextRow' :: [Integer] -> [Integer]
nextRow' row = row `seqList` nextRow row
tri = iterate nextRow' [0]
The fold in seqList basically expands to lst!!0 `seq` lst!!1 `seq` lst!!2 `seq` ... `seq` result.
This is much slower for me when printing just the first 10 elements of row 100,000. I think that's because it requires computing 99,999 complete rows of the triangle.

Functional learning woes

I'm a beginner to functional languages, and I'm trying to get the whole thing down in Haskell. Here's a quick-and-dirty function that finds all the factors of a number:
factors :: (Integral a) => a -> [a]
factors x = filter (\z -> x `mod` z == 0) [2..x `div` 2]
Works fine, but I found it to be unbearably slow for large numbers. So I made myself a better one:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
| y `elem` z = sort z
| x `mod` y == 0 = factorcalc x (y+1) (z ++ [y] ++ [(x `div` y)])
| otherwise = factorcalc x (y+1) z
But here's my problem: Even though the code works, and can cut literally hours off the execution time of my programs, it's hideous!
It reeks of ugly imperative thinking: It constantly updates a counter and a data structure in a loop until it finishes. Since you can't change state in purely functional programming, I cheated by holding the data in the parameters, which the function simply passes to itself over and over again.
I may be wrong, but there simply must be a better way of doing the same thing...

Note that the original question asked for all the factors, not for only the prime factors. There being many fewer prime factors, they can probably be found more quickly. Perhaps that's what the OQ wanted. Perhaps not. But let's solve the original problem and put the "fun" back in "functional"!
Some observations:
The two functions don't produce the same output---if x is a perfect square, the second function includes the square root twice.
The first function enumerates checks a number of potential factors proportional to the size of x; the second function checks only proportional to the square root of x, then stops (with the bug noted above).
The first function (factors) allocates a list of all integers from 2 to n div 2, where the second function never allocates a list but instead visits fewer integers one at a time in a parameter. I ran the optimizer with -O and looked at the output with -ddump-simpl, and GHC just isn't smart enough to optimize away those allocations.
factorcalc is tail-recursive, which means it compiles into a tight machine-code loop; filter is not and does not.
Some experiments show that the square root is the killer:
Here's a sample function that produces the factors of x from z down to 2:
factors_from x 1 = []
factors_from x z
| x `mod` z == 0 = z : factors_from x (z-1)
| otherwise = factors_from x (z-1)
factors'' x = factors_from x (x `div` 2)
It's a bit faster because it doesn't allocate, but it's still not tail-recursive.
Here's a tail-recursive version that is more faithful to the original:
factors_from' x 1 l = l
factors_from' x z l
| x `mod` z == 0 = factors_from' x (z-1) (z:l)
| otherwise = factors_from' x (z-1) l
factors''' x = factors_from x (x `div` 2)
This is still slower than factorcalc because it enumerates all the integers from 2 to x div 2, whereas factorcalc stops at the square root.
Armed with this knowledge, we can now create a more functional version of factorcalc which replicates both its speed and its bug:
factors'''' x = sort $ uncurry (++) $ unzip $ takeWhile (uncurry (<=)) $
[ (z, x `div` z) | z <- [2..x], x `mod` z == 0 ]
I didn't time it exactly, but given 100 million as an input, both it and factorcalc terminate instantaneously, where the others all take a number of seconds.
How and why the function works is left as an exercise for the reader :-)
ADDENDUM: OK, to mitigate the eyeball bleeding, here's a slightly saner version (and without the bug):
saneFactors x = sort $ concat $ takeWhile small $
[ pair z | z <- [2..], x `mod` z == 0 ]
where pair z = if z * z == x then [z] else [z, x `div` z]
small [z, z'] = z < z'
small [z] = True

Okay, take a deep breath. It'll be all right.
First of all, why is your first attempt slow? How is it spending its time?
Can you think of a recursive definition for the prime factorization that doesn't have that property?
(Hint.)

Firstly, although factorcalc is "ugly", you could add a wrapper function factors' x = factorscalc x 2 [], add a comment, and move on.
If you want to make a 'beautiful' factors fast, you need to find out why it is slow. Looking at your two functions, factors walks the list about n/2 elements long, but factorcalc stops after around sqrt n iterations.
Here is another factors that also stops after about sqrt n iterations, but uses a fold instead of explicit iteration. It also breaks the problem into three parts: finding the factors (factor); stopping at the square root of x (small) and then computing pairs of factors (factorize):
factors' :: (Integral a) => a -> [a]
factors' x = sort (foldl factorize [] (takeWhile small (filter factor [2..])))
where
factor z = x `mod` z == 0
small z = z <= (x `div` z)
factorize acc z = z : (if z == y then acc else y : acc)
where y = x `div` z
This is marginally faster than factorscalc on my machine. You can fuse factor and factorize and it is about twice as fast as factorscalc.
The Profiling and Optimization chapter of Real World Haskell is a good guide to the GHC suite's performance tools for tackling tougher performance problems.
By the way, I have a minor style nitpick with factorscalc: it is much more efficient to prepend single elements to the front of a list O(1) than it is to append to the end of a list of length n O(n). The lists of factors are typically small, so it is not such a big deal, but factorcalc should probably be something like:
factorcalc :: (Integral a) => a -> a -> [a] -> [a]
factorcalc x y z
| y `elem` z = sort z
| x `mod` y == 0 = factorcalc x (y+1) (y : (x `div` y) : z)
| otherwise = factorcalc x (y+1) z

Since you can't change state in purely
functional programming, I cheated by
holding the data in the parameters,
which the function simply passes to
itself over and over again.
Actually, this is not cheating; this is a—no, make that the—standard technique! That sort of parameter is usually known as an "accumulator," and it's generally hidden within a helper function that does the actual recursion after being set up by the function you're calling.
A common case is when you're doing list operations that depend on the previous data in the list. The two problems you need to solve are, where do you get the data about previous iterations, and how do you deal with the fact that your "working area of interest" for any particular iteration is actually at the tail of the result list you're building. For both of these, the accumulator comes to the rescue. For example, to generate a list where each element is the sum of all of the elements of the input list up to that point:
sums :: Num a => [a] -> [a]
sums inp = helper inp []
where
helper [] acc = reverse acc
helper (x:xs) [] = helper xs [x]
helper (x:xs) acc#(h:_) = helper xs (x+h : acc)
Note that we flip the direction of the accumulator, so we can operate on the head of that, which is much more efficient (as Dominic mentions), and then we just reverse the final output.
By the way, I found reading The Little Schemer to be a useful introduction and offer good practice in thinking recursively.

This seemed like an interesting problem, and I hadn't coded any real Haskell in a while, so I gave it a crack. I've run both it and Norman's factors'''' against the same values, and it feels like mine's faster, though they're both so close that it's hard to tell.
factors :: Int -> [Int]
factors n = firstFactors ++ reverse [ n `div` i | i <- firstFactors ]
where
firstFactors = filter (\i -> n `mod` i == 0) (takeWhile ( \i -> i * i <= n ) [2..n])
Factors can be paired up into those that are greater than sqrt n, and those that are less than or equal to (for simplicity's sake, the exact square root, if n is a perfect square, falls into this category. So if we just take the ones that are less than or equal to, we can calculate the others later by doing div n i. They'll be in reverse order, so we can either reverse firstFactors first or reverse the result later. It doesn't really matter.

This is my "functional" approach to the problem. ("Functional" in quotes, because I'd approach this problem the same way even in non-functional languages, but maybe that's because I've been tainted by Haskell.)
{-# LANGUAGE PatternGuards #-}
factors :: (Integral a) => a -> [a]
factors = multiplyFactors . primeFactors primes 0 [] . abs where
multiplyFactors [] = [1]
multiplyFactors ((p, n) : factors) =
[ pn * x
| pn <- take (succ n) $ iterate (* p) 1
, x <- multiplyFactors factors ]
primeFactors _ _ _ 0 = error "Can't factor 0"
primeFactors (p:primes) n list x
| (x', 0) <- x `divMod` p
= primeFactors (p:primes) (succ n) list x'
primeFactors _ 0 list 1 = list
primeFactors (_:primes) 0 list x = primeFactors primes 0 list x
primeFactors (p:primes) n list x
= primeFactors primes 0 ((p, n) : list) x
primes = sieve [2..]
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
primes is the naive Sieve of Eratothenes. There's better, but this is the shortest method.
sieve [2..]
=> 2 : sieve [x | x <- [3..], x `mod` 2 /= 0]
=> 2 : 3 : sieve [x | x <- [4..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : sieve [x | x <- [5..], x `mod` 2 /= 0, x `mod` 3 /= 0]
=> 2 : 3 : 5 : ...
primeFactors is the simple repeated trial-division algorithm: it walks through the list of primes, and tries dividing the given number by each, recording the factors as it goes.
primeFactors (2:_) 0 [] 50
=> primeFactors (2:_) 1 [] 25
=> primeFactors (3:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 0 [(2, 1)] 25
=> primeFactors (5:_) 1 [(2, 1)] 5
=> primeFactors (5:_) 2 [(2, 1)] 1
=> primeFactors _ 0 [(5, 2), (2, 1)] 1
=> [(5, 2), (2, 1)]
multiplyPrimes takes a list of primes and powers, and explodes it back out to a full list of factors.
multiplyPrimes [(5, 2), (2, 1)]
=> [ pn * x
| pn <- take (succ 2) $ iterate (* 5) 1
, x <- multiplyPrimes [(2, 1)] ]
=> [ pn * x | pn <- [1, 5, 25], x <- [1, 2] ]
=> [1, 2, 5, 10, 25, 50]
factors just strings these two functions together, along with an abs to prevent infinite recursion in case the input is negative.

I don't know much about Haskell, but somehow I think this link is appropriate:
http://www.willamette.edu/~fruehr/haskell/evolution.html
Edit: I'm not entirely sure why people are so aggressive about the downvoting on this. The original poster's real problem was that the code was ugly; while it's funny, the point of the linked article is, to some extent, that advanced Haskell code is, in fact, ugly; the more you learn, the uglier your code gets, to some extent. The point of this answer was to point out to the OP that apparently, the ugliness of the code that he was lamenting is not uncommon.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Haskell State monad vs state as parameter performance test - performance

Related

Haskell Recursion - finding largest difference between numbers in list

Performance comparison of two implementations of a primes filter

What's the way to determine if an Int is a perfect square in Haskell?

variant of pascal's triangle in haskell - problem with lazy evaluation

Functional learning woes

Categories

Resources