Strange space behavior of Haskell program - performance

I thought that the Cont monad is just equivalent to CPS Transformation, so if I have
a monadic sum, if I run in the Identity monad, it will fail due to stack overflow, and if
I run it in the Cont Monad, it will be okay due to tail recursion.
So I've written a simple program to verify my idea. But to my surprise, the result is unreasonable due to my limited knowledge.
All programs are compiled using ghc --make Test.hs -o test && ./test
sum0 n = if n==0 then 0 else n + sum0 (n-1)
sum1 n = if n==0 then return 0 else sum1 (n-1) >>= \ v -> seq v (return (n+v))
sum2 n k = if n == 0 then k 0 else sum2 n (\v -> k (n + v))
sum3 n k = if n == 0 then k 0 else sum3 n (\ !v -> k (n + v))
sum4 n k = if n == 0 then k 0 else sum4 n (\ v -> seq v ( k (n + v)))
sum5 n = if n==0 then return 0 else sum5 (n-1) >>= \ v -> (return (n+v))
main = print (sum0 3000000)
Stack overflow. This is reasonable.
main = print (flip runCont id (sum1 3000000))
Uses 180M memory, which is reasonable, but I am not clear why seq needed here, since its continuation is not applied until n goes to 0.
main = print (flip runCont id (sum5 3000000))
Stack overflow. Why?
main = print (flip runCont (const 0) (sum1 3000000))
Uses 130M memory. This is reasonable.
main = print (flip runCont (const 0) (sum5 3000000))
Uses 118M memory. This is reasonable.
main = print (sum2 3000000 (const 0))
Uses a lot of memory (more than 1G). I thought sum2 is equivalent to sum5 (when sum5 is in Cont monad). Why?
main = print (sum3 3000000 (const 0))
Uses a lot of memory. I thought sum3 is equivalent to sum1 (Cont monad). Why?
main = print (runIdentity (sum1 3000000))
Stack overflow, exactly what I want.
main = print (sum3 3000000 id)
Uses a lot of memory. Equivalent to sum1, why?
main = print (sum4 3000000 id)
Uses a lot of memory. Equivalent to sum1, why?
main = print (sum [1 .. 3000000])
Stack overflow. The source of sum = foldl (+) 0, so this is reasonable.
main = print (foldl' (+) 0 [1 .. 3000000])
Uses 1.5M.

First of all, it looks to me like sum2, sum3, and sum4 never actually decrement n. So they're using lots of memory because they're going into an infinite loop that does allocation.
After correcting that, I've run each of your tests again with the following results, where "allocation" refers to approximate peak memory use:
main = print (sum0 3000000) : Stack overflow, after allocating very little memory
main = print (flip runCont id (sum1 3000000)) : Success, allocating similar amounts to what you saw
main = print (flip runCont id (sum5 3000000)) : Stack overflow, after allocating similar amounts of memory as sum1.
main = print (flip runCont (const 0) (sum1 3000000)) : Success, similar allocation as the above
main = print (flip runCont (const 0) (sum5 3000000)) : Same
main = print (sum2 3000000 (const 0)) : Success, about 70% as much allocation as sum1
main = print (sum3 3000000 (const 0)) : Success, about 50% as much allocation as sum1
main = print (runIdentity (sum1 3000000)) : Stack overflow, with little allocation
main = print (sum3 3000000 id) : Success, about 50% as much allocation as sum1
main = print (sum4 3000000 id) : Success, about 50% as much allocation as sum1
main = print (sum [1 .. 3000000]) : Stack overflow, with about 80% as much allocation as sum1
main = print (foldl' (+) 0 [1 .. 3000000]) : Success, with almost no allocation
So that's mostly what you expected, with the exception of why seq makes such a difference between sum1 vs. sum5.

Related

Haskell Program Low Performance Compared with Perl

I am doing a Facebook Hackercup 2015 problem with Haskell and got stuck on this problem.
Input: Begins with an integer T, the number of questions. For each question, there is one line containing 3 space-separated integers:A, B, and K.
Output: For the ith question, print a line containing "Case #i: " followed by the number of integers in the inclusive range [A, B] with a primacity of K.
Primacity of a number X is the number of its prime factors. For example, the primacity of 12 is 2 (as it's divisible by primes 2 and 3), the primacity of 550 is 3 (as it's divisible by primes 2, 5, and 11), and the primacity of 7 is 1 (as the only prime it's divisible by is 7).
1 ≤ T ≤ 100 
2 ≤ A ≤ B ≤ 10^7 
1 ≤ K ≤ 10^9 
Here is my Haskell solution:
import System.IO
import Data.List
import Control.Monad
incEvery :: Int -> [(Int -> Int)]
incEvery n = (cycle ((replicate (n-1) id) ++ [(+ 1)]))
primes2 :: [Int]
primes2 = sieve 2 (replicate (10^7) 0)
where
sieve _ [] = []
sieve n (a:xs) = (a + (if a == 0 then 1 else 0))
: if a == 0 then
sieve (n+1) (zipWith ($) (incEvery n) xs)
else
sieve (n+1) xs
process :: (Int, Int, Int) -> Int
process (lo, hi, k) =
length . filter (\(a, b) -> a >= lo && a <= hi && b == k) . zip [2,3..] $ primes2
readIn :: [String] -> (Int, Int, Int)
readIn =
(\[x, y, z] -> (x, y, z)) . fmap (read::String->Int) . take 3
lib :: String -> String
lib xs = unlines . fmap (\(i, x) -> "Case #" ++ (show i) ++ ": " ++ x) . zip [1,2..]
. fmap parse . tail . lines $ xs
where
parse = (show . process . readIn . words)
main :: IO ()
main = interact lib
Here is my Perl solution:
use strict;
use warnings;
my $max = 10000010;
my #f = (0) x $max;
for my $i (2 .. $max) {
if($f[$i] == 0) {
$f[$i] = 1;
# print $i . "\n";
for my $j (2 .. ($max / $i)) {
$f[$i * $j] ++;
}
}
}
my $k = <STDIN>;
for my $i (1 .. $k) {
my $line = <STDIN>;
if($line) {
chomp $line;
my ($a, $b, $t) = split(' ', $line);
my $ans = 0;
for my $j ($a .. $b) {
if($f[$j] == $t) {
$ans ++;
}
}
print "Case #$i: " . $ans . "\n";
}
}
Though I am using the same sieving algorithm for both languages, the Haskell version is significantly slower than Perl version on 10^7 scale of data.
Basically the following Haskell function is slower than its Perl counterpart:
incEvery :: Int -> [(Int -> Int)]
incEvery n = (cycle ((replicate (n-1) id) ++ [(+ 1)]))
primes2 :: [Int]
primes2 = sieve 2 (replicate (10^7) 0)
where
sieve _ [] = []
sieve n (a:xs) = (a + (if a == 0 then 1 else 0))
: if a == 0 then
sieve (n+1) (zipWith ($) (incEvery n) xs)
else
sieve (n+1) xs
I think both recursion and (zipWith ($) (incEvery n) xs) are causing the problem. Any ideas?
There is absolutely no reason why you need to resort to imperative programming to gain performance. The unique thing about Haskell is you have to learn to think differently if you want to program in a purely functional manner. Exploit laziness to speed things up a bit:
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Applicative ( pure, (<$>) )
import Data.List ( nub )
import Data.Monoid ( (<>) )
isPrime :: (Integral i) => i -> Bool
isPrime n = isPrime_ n primes
where isPrime_ n (p:ps)
| p * p > n = True
| n `mod` p == 0 = False
| otherwise = isPrime_ n ps
primes :: (Integral i) => [i]
primes = 2 : filter isPrime [3,5..]
primeFactors :: (Integral i) => i -> [i]
primeFactors n = factors n primes
where factors n (x:xs)
| x * x > n = [n]
| n `mod` x == 0 = x : factors (n `div` x) (x:xs)
| otherwise = factors n xs
primacity :: (Integral i) => i -> Int
primacity = length . nub . primeFactors
user :: IO Int
user = do
xs <- getLine
let a :: Int = read . takeWhile (/=' ') . dropN 0 $ xs
let b :: Int = read . takeWhile (/=' ') . dropN 1 $ xs
let k :: Int = read . takeWhile (/=' ') . dropN 2 $ xs
let n = length . filter (== k) . fmap primacity $ [a..b]
pure n
where
dropN 0 = id
dropN n = dropN (pred n) . drop 1 . dropWhile (/= ' ')
printNTimes :: Int -> Int -> IO ()
printNTimes 0 _ = pure ()
printNTimes n total = do
ans <- user
putStr $ "Case #" <> show (total - n + 1) <> ": "
putStrLn $ show ans
printNTimes (pred n) total
main :: IO ()
main = do
n :: Int <- read <$> getLine
printNTimes n n
This is basically mutual recursion mixed with laziness. Might take a while to understand it, but I can guarantee that it's fast.
Yes, of course. You're effectively using two different algorithms. Your Haskell zipWith ($) (incEvery n) xs has to process every entry of your list, while your Perl for my $j (2 .. ($max / $i)) { $f[$i * $j] ++; } only has to process the entries it actually increments, which is a factor of $i faster. This is a prototypical example of a problem where mutable arrays are helpful: in Haskell you can use STUArray, for example.

Haskell performance tuning

I'm quite new to Haskell, and to learn it better I started solving problems here and there and I ended up with this (project Euler 34).
145 is a curious number, as 1! + 4! + 5! = 1 + 24 + 120 = 145.
Find the sum of all numbers which are equal to the sum of the factorial >of their digits.
Note: as 1! = 1 and 2! = 2 are not sums they are not included.
I wrote a C and an Haskell brute force solution.
Could someone explain me the Haskell version is ~15x (~0.450 s vs ~6.5s )slower than the C implementation and how to possibly tune and speedup the Haskell solution?
unsigned int solve(){
unsigned int result = 0;
unsigned int i=10;
while(i<2540161){
unsigned int sumOfFacts = 0;
unsigned int number = i;
while (number > 0) {
unsigned int d = number % 10;
number /= 10;
sumOfFacts += factorial(d);
}
if (sumOfFacts == i)
result += i;
i++;
}
return result;
}
here the haskell solution
--BRUTE FORCE SOLUTION
solve:: Int
solve = sum (filter (\x-> sfc x 0 == x) [10..2540160])
--sum factorial of digits
sfc :: Int -> Int -> Int
sfc 0 acc = acc
sfc n acc = sfc n' (acc+fc r)
where
n' = div n 10
r = mod n 10 --n-(10*n')
fc 0 =1
fc 1 =1
fc 2 =2
fc 3 =6
fc 4 =24
fc 5 =120
fc 6 =720
fc 7 =5040
fc 8 =40320
fc 9 =362880
First, compile with optimizations. With ghc-7.10.1 -O2 -fllvm, the Haskell version runs in 0.54 secs for me. This is already pretty good.
If we want to do even better, we should first replace div with quot and mod with rem. div and mod do some extra work, because they handle the rounding of negative numbers differently. Since we only have positive numbers here, we should switch to the faster functions.
Second, we should replace the pattern matching in fc with an array lookup. GHC uses a branching construct for Int patterns, and uses binary search when the number of cases is large enough. We can do better here with a lookup.
The new code looks like this:
import qualified Data.Vector.Unboxed as V
facs :: V.Vector Int
facs =
V.fromList [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
--BRUTE FORCE SOLUTION
solve:: Int
solve = sum (filter (\x-> sfc x 0 == x) [10..2540160])
--sum factorial of digits
sfc :: Int -> Int -> Int
sfc 0 acc = acc
sfc n acc = sfc n' (acc + V.unsafeIndex facs r)
where
(n', r) = quotRem n 10
main = print solve
It runs in 0.095 seconds on my computer.

An efficient algorithm to calculate the integer square root (isqrt) of arbitrarily large integers

Notice
For a solution in Erlang or C / C++, go to Trial 4 below.
Wikipedia Articles
Integer square root
The definition of "integer square root" could be found here
Methods of computing square roots
An algorithm that does "bit magic" could be found here
[ Trial 1 : Using Library Function ]
Code
isqrt(N) when erlang:is_integer(N), N >= 0 ->
erlang:trunc(math:sqrt(N)).
Problem
This implementation uses the sqrt() function from the C library, so it does not work with arbitrarily large integers (Note that the returned result does not match the input. The correct answer should be 12345678901234567890):
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.4 (abort with ^G)
1> erlang:trunc(math:sqrt(12345678901234567890 * 12345678901234567890)).
12345678901234567168
2>
[ Trial 2 : Using Bigint + Only ]
Code
isqrt2(N) when erlang:is_integer(N), N >= 0 ->
isqrt2(N, 0, 3, 0).
isqrt2(N, I, _, Result) when I >= N ->
Result;
isqrt2(N, I, Times, Result) ->
isqrt2(N, I + Times, Times + 2, Result + 1).
Description
This implementation is based on the following observation:
isqrt(0) = 0 # <--- One 0
isqrt(1) = 1 # <-+
isqrt(2) = 1 # |- Three 1's
isqrt(3) = 1 # <-+
isqrt(4) = 2 # <-+
isqrt(5) = 2 # |
isqrt(6) = 2 # |- Five 2's
isqrt(7) = 2 # |
isqrt(8) = 2 # <-+
isqrt(9) = 3 # <-+
isqrt(10) = 3 # |
isqrt(11) = 3 # |
isqrt(12) = 3 # |- Seven 3's
isqrt(13) = 3 # |
isqrt(14) = 3 # |
isqrt(15) = 3 # <-+
isqrt(16) = 4 # <--- Nine 4's
...
Problem
This implementation involves only bigint additions so I expected it to run fast. However, when I fed it with 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111, it seems to run forever on my (very fast) machine.
[ Trial 3 : Using Binary Search with Bigint +1, -1 and div 2 Only ]
Code
Variant 1 (My original implementation)
isqrt3(N) when erlang:is_integer(N), N >= 0 ->
isqrt3(N, 1, N).
isqrt3(_N, Low, High) when High =:= Low + 1 ->
Low;
isqrt3(N, Low, High) ->
Mid = (Low + High) div 2,
MidSqr = Mid * Mid,
if
%% This also catches N = 0 or 1
MidSqr =:= N ->
Mid;
MidSqr < N ->
isqrt3(N, Mid, High);
MidSqr > N ->
isqrt3(N, Low, Mid)
end.
Variant 2 (modified above code so that the boundaries go with Mid+1 or Mid-1 instead, with reference to the answer by Vikram Bhat)
isqrt3a(N) when erlang:is_integer(N), N >= 0 ->
isqrt3a(N, 1, N).
isqrt3a(N, Low, High) when Low >= High ->
HighSqr = High * High,
if
HighSqr > N ->
High - 1;
HighSqr =< N ->
High
end;
isqrt3a(N, Low, High) ->
Mid = (Low + High) div 2,
MidSqr = Mid * Mid,
if
%% This also catches N = 0 or 1
MidSqr =:= N ->
Mid;
MidSqr < N ->
isqrt3a(N, Mid + 1, High);
MidSqr > N ->
isqrt3a(N, Low, Mid - 1)
end.
Problem
Now it solves the 79-digit number (namely 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111) in lightening speed, the result is shown immediately. However, it takes 60 seconds (+- 2 seconds) on my machine to solve one million (1,000,000) 61-digit numbers (namely, from 1000000000000000000000000000000000000000000000000000000000000 to 1000000000000000000000000000000000000000000000000000001000000). I would like to do it even faster.
[ Trial 4 : Using Newton's Method with Bigint + and div Only ]
Code
isqrt4(0) -> 0;
isqrt4(N) when erlang:is_integer(N), N >= 0 ->
isqrt4(N, N).
isqrt4(N, Xk) ->
Xk1 = (Xk + N div Xk) div 2,
if
Xk1 >= Xk ->
Xk;
Xk1 < Xk ->
isqrt4(N, Xk1)
end.
Code in C / C++ (for your interest)
Recursive variant
#include <stdint.h>
uint32_t isqrt_impl(
uint64_t const n,
uint64_t const xk)
{
uint64_t const xk1 = (xk + n / xk) / 2;
return (xk1 >= xk) ? xk : isqrt_impl(n, xk1);
}
uint32_t isqrt(uint64_t const n)
{
if (n == 0) return 0;
if (n == 18446744073709551615ULL) return 4294967295U;
return isqrt_impl(n, n);
}
Iterative variant
#include <stdint.h>
uint32_t isqrt_iterative(uint64_t const n)
{
uint64_t xk = n;
if (n == 0) return 0;
if (n == 18446744073709551615ULL) return 4294967295U;
do
{
uint64_t const xk1 = (xk + n / xk) / 2;
if (xk1 >= xk)
{
return xk;
}
else
{
xk = xk1;
}
} while (1);
}
Problem
The Erlang code solves one million (1,000,000) 61-digit numbers in 40 seconds (+- 1 second) on my machine, so this is faster than Trial 3. Can it go even faster?
About My Machine
Processor : 3.4 GHz Intel Core i7
Memory : 32 GB 1600 MHz DDR3
OS : Mac OS X Version 10.9.1
Related Questions
Integer square root in python
The answer by user448810 uses "Newton's Method". I'm not sure whether doing the division using "integer division" is okay or not. I'll try this later as an update. [UPDATE (2015-01-11): It is okay to do so]
The answer by math involves using a 3rd party Python package gmpy, which is not very favourable to me, since I'm primarily interested in solving it in Erlang with only builtin facilities.
The answer by DSM seems interesting. I don't really understand what is going on, but it seems that "bit magic" is involved there, and so it's not quite suitable for me too.
Infinite Recursion in Meta Integer Square Root
This question is for C++, and the algorithm by AraK (the questioner) looks like it's from the same idea as Trial 2 above.
How about binary search like following doesn't need floating divisions only integer multiplications (Slower than newtons method) :-
low = 1;
/* More efficient bound
high = pow(10,log10(target)/2+1);
*/
high = target
while(low<high) {
mid = (low+high)/2;
currsq = mid*mid;
if(currsq==target) {
return(mid);
}
if(currsq<target) {
if((mid+1)*(mid+1)>target) {
return(mid);
}
low = mid+1;
}
else {
high = mid-1;
}
}
This works for O(logN) iterations so should not run forever for even very large numbers
Log10(target) Computation if needed :-
acc = target
log10 = 0;
while(acc>0) {
log10 = log10 + 1;
acc = acc/10;
}
Note : acc/10 is integer division
Edit :-
Efficient bound :- The sqrt(n) has about half the number of digits as n so you can pass high = 10^(log10(N)/2+1) && low = 10^(log10(N)/2-1) to get tighter bound and it should provide 2 times speed up.
Evaluate bound:-
bound = 1;
acc = N;
count = 0;
while(acc>0) {
acc = acc/10;
if(count%2==0) {
bound = bound*10;
}
count++;
}
high = bound*10;
low = bound/10;
isqrt(N,low,high);

Why is my Scala tail-recursion faster than the while loop?

Here are two solutions to exercise 4.9 in Cay Horstmann's Scala for the Impatient: "Write a function lteqgt(values: Array[Int], v: Int) that returns a triple containing the counts of values less than v, equal to v, and greater than v." One uses tail recursion, the other uses a while loop. I thought that both would compile to similar bytecode but the while loop is slower than the tail recursion by a factor of almost 2. This suggests to me that my while method is badly written.
import scala.annotation.tailrec
import scala.util.Random
object PerformanceTest {
def main(args: Array[String]): Unit = {
val bigArray:Array[Int] = fillArray(new Array[Int](100000000))
println(time(lteqgt(bigArray, 25)))
println(time(lteqgt2(bigArray, 25)))
}
def time[T](block : => T):T = {
val start = System.nanoTime : Double
val result = block
val end = System.nanoTime : Double
println("Time = " + (end - start) / 1000000.0 + " millis")
result
}
#tailrec def fillArray(a:Array[Int], pos:Int=0):Array[Int] = {
if (pos == a.length)
a
else {
a(pos) = Random.nextInt(50)
fillArray(a, pos+1)
}
}
#tailrec def lteqgt(values: Array[Int], v:Int, lt:Int=0, eq:Int=0, gt:Int=0, pos:Int=0):(Int, Int, Int) = {
if (pos == values.length)
(lt, eq, gt)
else
lteqgt(values, v, lt + (if (values(pos) < v) 1 else 0), eq + (if (values(pos) == v) 1 else 0), gt + (if (values(pos) > v) 1 else 0), pos+1)
}
def lteqgt2(values:Array[Int], v:Int):(Int, Int, Int) = {
var lt = 0
var eq = 0
var gt = 0
var pos = 0
val limit = values.length
while (pos < limit) {
if (values(pos) > v)
gt += 1
else if (values(pos) < v)
lt += 1
else
eq += 1
pos += 1
}
(lt, eq, gt)
}
}
Adjust the size of bigArray according to your heap size. Here is some sample output:
Time = 245.110899 millis
(50004367,2003090,47992543)
Time = 465.836894 millis
(50004367,2003090,47992543)
Why is the while method so much slower than the tailrec? Naively the tailrec version looks to be at a slight disadvantage, as it must always perform 3 "if" checks for every iteration, whereas the while version will often only perform 1 or 2 tests due to the else construct. (NB reversing the order I perform the two methods does not affect the outcome).
Test results (after reducing array size to 20000000)
Under Java 1.6.22 I get 151 and 122 ms for tail-recursion and while-loop respectively.
Under Java 1.7.0 I get 55 and 101 ms
So under Java 6 your while-loop is actually faster; both have improved in performance under Java 7, but the tail-recursive version has overtaken the loop.
Explanation
The performance difference is due to the fact that in your loop, you conditionally add 1 to the totals, while for recursion you always add either 1 or 0. So they are not equivalent. The equivalent while-loop to your recursive method is:
def lteqgt2(values:Array[Int], v:Int):(Int, Int, Int) = {
var lt = 0
var eq = 0
var gt = 0
var pos = 0
val limit = values.length
while (pos < limit) {
gt += (if (values(pos) > v) 1 else 0)
lt += (if (values(pos) < v) 1 else 0)
eq += (if (values(pos) == v) 1 else 0)
pos += 1
}
(lt, eq, gt)
}
and this gives exactly the same execution time as the recursive method (regardless of Java version).
Discussion
I'm not an expert on why the Java 7 VM (HotSpot) can optimize this better than your first version, but I'd guess it's because it's taking the same path through the code each time (rather than branching along the if / else if paths), so the bytecode can be inlined more efficiently.
But remember that this is not the case in Java 6. Why one while-loop outperforms the other is a question of JVM internals. Happily for the Scala programmer, the version produced from idiomatic tail-recursion is the faster one in the latest version of the JVM.
The difference could also be occurring at the processor level. See this question, which explains how code slows down if it contains unpredictable branching.
The two constructs are not identical. In particular, in the first case you don't need any jumps (on x86, you can use cmp and setle and add, instead of having to use cmp and jb and (if you don't jump) add. Not jumping is faster than jumping on pretty much every modern architecture.
So, if you have code that looks like
if (a < b) x += 1
where you may add or you may jump instead, vs.
x += (a < b)
(which only makes sense in C/C++ where 1 = true and 0 = false), the latter tends to be faster as it can be turned into more compact assembly code. In Scala/Java, you can't do this, but you can do
x += if (a < b) 1 else 0
which a smart JVM should recognize is the same as x += (a < b), which has a jump-free machine code translation, which is usually faster than jumping. An even smarter JVM would recognize that
if (a < b) x += 1
is the same yet again (because adding zero doesn't do anything).
C/C++ compilers routinely perform optimizations like this. Being unable to apply any of these optimizations was not a mark in the JIT compiler's favor; apparently it can as of 1.7, but only partially (i.e. it doesn't recognize that adding zero is the same as a conditional adding one, but it does at least convert x += if (a<b) 1 else 0 into fast machine code).
Now, none of this has anything to do with tail recursion or while loops per se. With tail recursion it's more natural to write the if (a < b) 1 else 0 form, but you can do either; and with while loops you can also do either. It just so happened that you picked one form for tail recursion and the other for the while loop, making it look like recursion vs. looping was the change instead of the two different ways to do the conditionals.

How do I find a factorial? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
How can I write a program to find the factorial of any natural number?
This will work for the factorial (although a very small subset) of positive integers:
unsigned long factorial(unsigned long f)
{
if ( f == 0 )
return 1;
return(f * factorial(f - 1));
}
printf("%i", factorial(5));
Due to the nature of your problem (and level that you have admitted), this solution is based more in the concept of solving this rather than a function that will be used in the next "Permutation Engine".
This calculates factorials of non-negative integers[*] up to ULONG_MAX, which will have so many digits that it's unlikely your machine can store a whole lot more, even if it has time to calculate them. Uses the GNU multiple precision library, which you need to link against.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
void factorial(mpz_t result, unsigned long input) {
mpz_set_ui(result, 1);
while (input > 1) {
mpz_mul_ui(result, result, input--);
}
}
int main() {
mpz_t fact;
unsigned long input = 0;
char *buf;
mpz_init(fact);
scanf("%lu", &input);
factorial(fact, input);
buf = malloc(mpz_sizeinbase(fact, 10) + 1);
assert(buf);
mpz_get_str(buf, 10, fact);
printf("%s\n", buf);
free(buf);
mpz_clear(fact);
}
Example output:
$ make factorial CFLAGS="-L/bin/ -lcyggmp-3 -pedantic" -B && ./factorial
cc -L/bin/ -lcyggmp-3 -pedantic factorial.c -o factorial
100
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[*] If you mean something else by "number" then you'll have to be more specific. I'm not aware of any other numbers for which the factorial is defined, despite valiant efforts by Pascal to extend the domain by use of the Gamma function.
Why do it in C when you can do it in Haskell:
Freshman Haskell programmer
fac n = if n == 0
then 1
else n * fac (n-1)
Sophomore Haskell programmer, at MIT (studied Scheme as a freshman)
fac = (\(n) ->
(if ((==) n 0)
then 1
else ((*) n (fac ((-) n 1)))))
Junior Haskell programmer (beginning Peano player)
fac 0 = 1
fac (n+1) = (n+1) * fac n
Another junior Haskell programmer (read that n+k patterns are “a
disgusting part of Haskell” 1 and joined the “Ban n+k
patterns”-movement [2])
fac 0 = 1
fac n = n * fac (n-1)
Senior Haskell programmer (voted for Nixon Buchanan Bush —
“leans right”)
fac n = foldr (*) 1 [1..n]
Another senior Haskell programmer (voted for McGovern Biafra
Nader — “leans left”)
fac n = foldl (*) 1 [1..n]
Yet another senior Haskell programmer (leaned so far right he came
back left again!)
-- using foldr to simulate foldl
fac n = foldr (\x g n -> g (x*n)) id [1..n] 1
Memoizing Haskell programmer (takes Ginkgo Biloba daily)
facs = scanl (*) 1 [1..]
fac n = facs !! n
Pointless (ahem) “Points-free” Haskell programmer (studied at
Oxford)
fac = foldr (*) 1 . enumFromTo 1
Iterative Haskell programmer (former Pascal programmer)
fac n = result (for init next done)
where init = (0,1)
next (i,m) = (i+1, m * (i+1))
done (i,_) = i==n
result (_,m) = m
for i n d = until d n i
Iterative one-liner Haskell programmer (former APL and C programmer)
fac n = snd (until ((>n) . fst) (\(i,m) -> (i+1, i*m)) (1,1))
Accumulating Haskell programmer (building up to a quick climax)
facAcc a 0 = a
facAcc a n = facAcc (n*a) (n-1)
fac = facAcc 1
Continuation-passing Haskell programmer (raised RABBITS in early
years, then moved to New Jersey)
facCps k 0 = k 1
facCps k n = facCps (k . (n *)) (n-1)
fac = facCps id
Boy Scout Haskell programmer (likes tying knots; always “reverent,”
he belongs to the Church of the Least Fixed-Point [8])
y f = f (y f)
fac = y (\f n -> if (n==0) then 1 else n * f (n-1))
Combinatory Haskell programmer (eschews variables, if not
obfuscation; all this currying’s just a phase, though it seldom
hinders)
s f g x = f x (g x)
k x y = x
b f g x = f (g x)
c f g x = f x g
y f = f (y f)
cond p f g x = if p x then f x else g x
fac = y (b (cond ((==) 0) (k 1)) (b (s (*)) (c b pred)))
List-encoding Haskell programmer (prefers to count in unary)
arb = () -- "undefined" is also a good RHS, as is "arb" :)
listenc n = replicate n arb
listprj f = length . f . listenc
listprod xs ys = [ i (x,y) | x<-xs, y<-ys ]
where i _ = arb
facl [] = listenc 1
facl n#(_:pred) = listprod n (facl pred)
fac = listprj facl
Interpretive Haskell programmer (never “met a language” he didn't
like)
-- a dynamically-typed term language
data Term = Occ Var
| Use Prim
| Lit Integer
| App Term Term
| Abs Var Term
| Rec Var Term
type Var = String
type Prim = String
-- a domain of values, including functions
data Value = Num Integer
| Bool Bool
| Fun (Value -> Value)
instance Show Value where
show (Num n) = show n
show (Bool b) = show b
show (Fun _) = ""
prjFun (Fun f) = f
prjFun _ = error "bad function value"
prjNum (Num n) = n
prjNum _ = error "bad numeric value"
prjBool (Bool b) = b
prjBool _ = error "bad boolean value"
binOp inj f = Fun (\i -> (Fun (\j -> inj (f (prjNum i) (prjNum j)))))
-- environments mapping variables to values
type Env = [(Var, Value)]
getval x env = case lookup x env of
Just v -> v
Nothing -> error ("no value for " ++ x)
-- an environment-based evaluation function
eval env (Occ x) = getval x env
eval env (Use c) = getval c prims
eval env (Lit k) = Num k
eval env (App m n) = prjFun (eval env m) (eval env n)
eval env (Abs x m) = Fun (\v -> eval ((x,v) : env) m)
eval env (Rec x m) = f where f = eval ((x,f) : env) m
-- a (fixed) "environment" of language primitives
times = binOp Num (*)
minus = binOp Num (-)
equal = binOp Bool (==)
cond = Fun (\b -> Fun (\x -> Fun (\y -> if (prjBool b) then x else y)))
prims = [ ("*", times), ("-", minus), ("==", equal), ("if", cond) ]
-- a term representing factorial and a "wrapper" for evaluation
facTerm = Rec "f" (Abs "n"
(App (App (App (Use "if")
(App (App (Use "==") (Occ "n")) (Lit 0))) (Lit 1))
(App (App (Use "*") (Occ "n"))
(App (Occ "f")
(App (App (Use "-") (Occ "n")) (Lit 1))))))
fac n = prjNum (eval [] (App facTerm (Lit n)))
Static Haskell programmer (he does it with class, he’s got that
fundep Jones! After Thomas Hallgren’s “Fun with Functional
Dependencies” [7])
-- static Peano constructors and numerals
data Zero
data Succ n
type One = Succ Zero
type Two = Succ One
type Three = Succ Two
type Four = Succ Three
-- dynamic representatives for static Peanos
zero = undefined :: Zero
one = undefined :: One
two = undefined :: Two
three = undefined :: Three
four = undefined :: Four
-- addition, a la Prolog
class Add a b c | a b -> c where
add :: a -> b -> c
instance Add Zero b b
instance Add a b c => Add (Succ a) b (Succ c)
-- multiplication, a la Prolog
class Mul a b c | a b -> c where
mul :: a -> b -> c
instance Mul Zero b Zero
instance (Mul a b c, Add b c d) => Mul (Succ a) b d
-- factorial, a la Prolog
class Fac a b | a -> b where
fac :: a -> b
instance Fac Zero One
instance (Fac n k, Mul (Succ n) k m) => Fac (Succ n) m
-- try, for "instance" (sorry):
--
-- :t fac four
Beginning graduate Haskell programmer (graduate education tends to
liberate one from petty concerns about, e.g., the efficiency of
hardware-based integers)
-- the natural numbers, a la Peano
data Nat = Zero | Succ Nat
-- iteration and some applications
iter z s Zero = z
iter z s (Succ n) = s (iter z s n)
plus n = iter n Succ
mult n = iter Zero (plus n)
-- primitive recursion
primrec z s Zero = z
primrec z s (Succ n) = s n (primrec z s n)
-- two versions of factorial
fac = snd . iter (one, one) (\(a,b) -> (Succ a, mult a b))
fac' = primrec one (mult . Succ)
-- for convenience and testing (try e.g. "fac five")
int = iter 0 (1+)
instance Show Nat where
show = show . int
(zero : one : two : three : four : five : _) = iterate Succ Zero
Origamist Haskell programmer
(always starts out with the “basic Bird fold”)
-- (curried, list) fold and an application
fold c n [] = n
fold c n (x:xs) = c x (fold c n xs)
prod = fold (*) 1
-- (curried, boolean-based, list) unfold and an application
unfold p f g x =
if p x
then []
else f x : unfold p f g (g x)
downfrom = unfold (==0) id pred
-- hylomorphisms, as-is or "unfolded" (ouch! sorry ...)
refold c n p f g = fold c n . unfold p f g
refold' c n p f g x =
if p x
then n
else c (f x) (refold' c n p f g (g x))
-- several versions of factorial, all (extensionally) equivalent
fac = prod . downfrom
fac' = refold (*) 1 (==0) id pred
fac'' = refold' (*) 1 (==0) id pred
Cartesianally-inclined Haskell programmer (prefers Greek food,
avoids the spicy Indian stuff; inspired by Lex Augusteijn’s “Sorting
Morphisms” [3])
-- (product-based, list) catamorphisms and an application
cata (n,c) [] = n
cata (n,c) (x:xs) = c (x, cata (n,c) xs)
mult = uncurry (*)
prod = cata (1, mult)
-- (co-product-based, list) anamorphisms and an application
ana f = either (const []) (cons . pair (id, ana f)) . f
cons = uncurry (:)
downfrom = ana uncount
uncount 0 = Left ()
uncount n = Right (n, n-1)
-- two variations on list hylomorphisms
hylo f g = cata g . ana f
hylo' f (n,c) = either (const n) (c . pair (id, hylo' f (c,n))) . f
pair (f,g) (x,y) = (f x, g y)
-- several versions of factorial, all (extensionally) equivalent
fac = prod . downfrom
fac' = hylo uncount (1, mult)
fac'' = hylo' uncount (1, mult)
Ph.D. Haskell programmer (ate so many bananas that his eyes bugged
out, now he needs new lenses!)
-- explicit type recursion based on functors
newtype Mu f = Mu (f (Mu f)) deriving Show
in x = Mu x
out (Mu x) = x
-- cata- and ana-morphisms, now for *arbitrary* (regular) base functors
cata phi = phi . fmap (cata phi) . out
ana psi = in . fmap (ana psi) . psi
-- base functor and data type for natural numbers,
-- using a curried elimination operator
data N b = Zero | Succ b deriving Show
instance Functor N where
fmap f = nelim Zero (Succ . f)
nelim z s Zero = z
nelim z s (Succ n) = s n
type Nat = Mu N
-- conversion to internal numbers, conveniences and applications
int = cata (nelim 0 (1+))
instance Show Nat where
show = show . int
zero = in Zero
suck = in . Succ -- pardon my "French" (Prelude conflict)
plus n = cata (nelim n suck )
mult n = cata (nelim zero (plus n))
-- base functor and data type for lists
data L a b = Nil | Cons a b deriving Show
instance Functor (L a) where
fmap f = lelim Nil (\a b -> Cons a (f b))
lelim n c Nil = n
lelim n c (Cons a b) = c a b
type List a = Mu (L a)
-- conversion to internal lists, conveniences and applications
list = cata (lelim [] (:))
instance Show a => Show (List a) where
show = show . list
prod = cata (lelim (suck zero) mult)
upto = ana (nelim Nil (diag (Cons . suck)) . out)
diag f x = f x x
fac = prod . upto
Post-doc Haskell programmer
(from Uustalu, Vene and Pardo’s “Recursion Schemes from Comonads” [4])
-- explicit type recursion with functors and catamorphisms
newtype Mu f = In (f (Mu f))
unIn (In x) = x
cata phi = phi . fmap (cata phi) . unIn
-- base functor and data type for natural numbers,
-- using locally-defined "eliminators"
data N c = Z | S c
instance Functor N where
fmap g Z = Z
fmap g (S x) = S (g x)
type Nat = Mu N
zero = In Z
suck n = In (S n)
add m = cata phi where
phi Z = m
phi (S f) = suck f
mult m = cata phi where
phi Z = zero
phi (S f) = add m f
-- explicit products and their functorial action
data Prod e c = Pair c e
outl (Pair x y) = x
outr (Pair x y) = y
fork f g x = Pair (f x) (g x)
instance Functor (Prod e) where
fmap g = fork (g . outl) outr
-- comonads, the categorical "opposite" of monads
class Functor n => Comonad n where
extr :: n a -> a
dupl :: n a -> n (n a)
instance Comonad (Prod e) where
extr = outl
dupl = fork id outr
-- generalized catamorphisms, zygomorphisms and paramorphisms
gcata :: (Functor f, Comonad n) =>
(forall a. f (n a) -> n (f a))
-> (f (n c) -> c) -> Mu f -> c
gcata dist phi = extr . cata (fmap phi . dist . fmap dupl)
zygo chi = gcata (fork (fmap outl) (chi . fmap outr))
para :: Functor f => (f (Prod (Mu f) c) -> c) -> Mu f -> c
para = zygo In
-- factorial, the *hard* way!
fac = para phi where
phi Z = suck zero
phi (S (Pair f n)) = mult f (suck n)
-- for convenience and testing
int = cata phi where
phi Z = 0
phi (S f) = 1 + f
instance Show (Mu N) where
show = show . int
Tenured professor (teaching Haskell to freshmen)
fac n = product [1..n]
Content from The Evolution of a Haskell Programmer by Fritz Ruehr, Willamette University - 11 July 01
Thanks to Christoph, a C99 solution that works for quite a few "numbers":
#include <math.h>
#include <stdio.h>
double fact(double x)
{
return tgamma(x+1.);
}
int main()
{
printf("%f %f\n", fact(3.0), fact(5.0));
return 0;
}
produces 6.000000 120.000000
For large n you may run into some issues and you may want to use Stirling's approximation:
Which is:
If your main objective is an interesting looking function:
int facorial(int a) {
int b = 1, c, d, e;
a--;
for (c = a; c > 0; c--)
for (d = b; d > 0; d--)
for (e = c; e > 0; e--)
b++;
return b;
}
(Not recommended as an algorithm for real use.)
a tail-recursive version:
long factorial(long n)
{
return tr_fact(n, 1);
}
static long tr_fact(long n, long result)
{
if(n==1)
return result;
else
return tr_fact(n-1, n*result);
}
In C99 (or Java) I would write the factorial function iteratively like this:
int factorial(int n)
{
int result = 1;
for (int i = 2; i <= n; i++)
{
result *= i;
}
return result;
}
C is not a functional language and you can't rely on tail-call optimization. So don't use recursion in C (or Java) unless you need to.
Just because factorial is often used as the first example for recursion it doesn't mean you need recursion to compute it.
This will overflow silently if n is too big, as is the custom in C (and Java).
If the numbers int can represent are too small for the factorials you want to compute then choose another number type. long long if it needs be just a little bit bigger, float or double if n isn't too big and you don't mind some imprecision, or big integers if you want the exact values of really big factorials.
Here's a C program that uses OPENSSL's BIGNUM implementation, and therefore is not particularly useful for students. (Of course accepting a BIGNUM as the input parameter is crazy, but helpful for demonstrating interaction between BIGNUMs).
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <openssl/crypto.h>
#include <openssl/bn.h>
BIGNUM *factorial(const BIGNUM *num)
{
BIGNUM *count = BN_new();
BIGNUM *fact = NULL;
BN_CTX *ctx = NULL;
BN_one(count);
if( BN_cmp(num, BN_value_one()) <= 0 )
{
return count;
}
ctx = BN_CTX_new();
fact = BN_dup(num);
BN_sub(count, fact, BN_value_one());
while( BN_cmp(count, BN_value_one()) > 0 )
{
BN_mul(fact, count, fact, ctx);
BN_sub(count, count, BN_value_one());
}
BN_CTX_free(ctx);
BN_free(count);
return fact;
}
This test program shows how to create a number for input and what to do with the return value:
int main(int argc, char *argv[])
{
const char *test_cases[] =
{
"0", "1",
"1", "1",
"4", "24",
"15", "1307674368000",
"30", "265252859812191058636308480000000",
"56", "710998587804863451854045647463724949736497978881168458687447040000000000000",
NULL, NULL
};
int index = 0;
BIGNUM *bn = NULL;
BIGNUM *fact = NULL;
char *result_str = NULL;
for( index = 0; test_cases[index] != NULL; index += 2 )
{
BN_dec2bn(&bn, test_cases[index]);
fact = factorial(bn);
result_str = BN_bn2dec(fact);
printf("%3s: %s\n", test_cases[index], result_str);
assert(strcmp(result_str, test_cases[index + 1]) == 0);
OPENSSL_free(result_str);
BN_free(fact);
BN_free(bn);
bn = NULL;
}
return 0;
}
Compiled with gcc:
gcc factorial.c -o factorial -g -lcrypto
int factorial(int n){
return n <= 1 ? 1 : n * factorial(n-1);
}
You use the following code to do it.
#include <stdio.h>
#include <stdlib.h>
int main()
{
int x, number, fac;
fac = 1;
printf("Enter a number:\n");
scanf("%d",&number);
if(number<0)
{
printf("Factorial not defined for negative numbers.\n");
exit(0);
}
for(x = 1; x <= number; x++)
{
if (number >= 0)
fac = fac * x;
else
fac=1;
}
printf("%d! = %d\n", number, fac);
}
For large numbers you probably can get away with an approximate solution, which tgamma gives you (n! = Gamma(n+1)) from math.h. If you want even larger numbers, they won't fit in a double, so you should use lgamma (natural log of the gamma function) instead.
If you're working somewhere without a full C99 math.h, you can easily do this type of thing yourself:
double logfactorial(int n) {
double fac = 0.0;
for ( ; n>1 ; n--) fac += log(fac);
return fac;
}
I don't think I'd use this in most cases, but one well-known practice which is becoming less widely used is to have a look-up table. If we're only working with built-in types, the memory hit is tiny.
Just another approach, to make the poster aware of a different technique. Many recursive solutions also can be memoized whereby a lookup table is filled in when the algorithm runs, drastically reducing the cost on future calls (kind of like the principle behind .NET JIT compilation I guess).
We have to start from 1 to the limit specfied say n.Start from 1*2*3...*n.
In c, i am writing it as a function.
main()
{
int n;
scanf("%d",&n);
printf("%ld",fact(n));
}
long int fact(int n)
{
long int facto=1;
int i;
for(i=1;i<=n;i++)
{
facto=facto*i;
}
return facto;
}
Simple solution:
unsigned int factorial(unsigned int n)
{
return (n == 1 || n == 0) ? 1 : factorial(n - 1) * n;
}
Simplest and most efficient is to sum up logarithms. If you use Log10 you get power and exponent.
Pseudocode
r=0
for i from 1 to n
r= r + log(i)/log(10)
print "result is:", 10^(r-floor(r)) ,"*10^" , floor(r)
You might need to add the code so the integer part does not increase too much and thus decrease accuracy, but result should be ok for even very large factorials.
Example in C using recursion
unsigned long factorial(unsigned long f)
{
if (f) return(f * factorial(f - 1));
return 1;
}
printf("%lu", factorial(5));
I used this code for Factorial:
#include<stdio.h>
int main(){
int i=1,f=1,n;
printf("\n\nEnter a number: ");
scanf("%d",&n);
while(i<=n){
f=f*i;
i++;
}
printf("Factorial of is: %d",f);
getch();
}
I would do this with a pre-calculated lookup table as suggested by Mr. Boy. This would be faster to calculate than an iterative or recursive solution. It relies on how fast n! grows, because the largest n! you can calculate without overflowing an unsigned long long (max value of 18,446,744,073,709,551,615) is only 20!, so you only need an array with 21 elements. Here's how it would look in c:
long long factorial (int n) {
long long f[22] = {1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800, 87178291200, 1307674368000, 20922789888000, 355687428096000, 6402373705728000, 121645100408832000, 2432902008176640000, 51090942171709440000};
return f[n];
}
See for yourself!

Resources