Haskell: Caches, memoization, and referential transparency

I can't figure out why m1 is apparently memoized while m2 is not in the following:
m1 = ((filter odd [1..]) !!)
m2 n = ((filter odd [1..]) !! n)
m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.

GHC does not memoize functions.
It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:
m1' = (!!) (filter odd [1..]) -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n
(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
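If you do want the explicitly parameterized form to share the list, you can float the list out to the top level yourself. A minimal sketch (oddsList is a name introduced here, not from the question):
oddsList :: [Integer]
oddsList = filter odd [1..]   -- a top-level value (a CAF): computed at most once per run

m2 :: Int -> Integer
m2 n = oddsList !! n          -- every call now reuses the same shared list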
Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function
f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])
GHC might notice that y does not depend on x and rewrite the function to
f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])
In this case, the new version is much less efficient because it will have to read about 1 GB from the memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations than when compiled with -O2. (The transformation responsible is known as "full laziness"; it can be disabled with -fno-full-laziness.)

m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
See the Haskell wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form

There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:
m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a
Most Haskell compilers and interpreters (all of them that I know of actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, where m1's is not.
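You can observe this by supplying the signatures yourself. A hedged sketch of what to expect (exact behaviour can vary with GHC version and flags):
-- Pinned to a monomorphic type, the section's list is an ordinary shareable
-- top-level value, so repeated calls reuse it:
m1 :: Int -> Integer
m1 = (filter odd [1..] !!)

-- With a polymorphic signature, the definition is really a function of the
-- Integral dictionary, so the list is generally rebuilt at each call:
m1poly :: Integral a => Int -> a
m1poly = (filter odd [1..] !!)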

I'm not sure, because I'm quite new to Haskell myself, but it appears to be because the second function is parametrized and the first one is not. The nature of a function is that its result depends on its input, and in the functional paradigm especially it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value, no matter what.
Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself when I wrote the following function:
primes = filter isPrime [2..]
  where isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
          where f `divides` n = (n `mod` f) == 0
Then to test it, I entered GHCi and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly, I got the result of take 1000 primes in an instant, because Haskell had already computed the whole thousand-element list in order to return the 1001st element.
Thus if you can write your function such that it takes no parameters, you probably want to. ;)

Related

Why does refactoring data to newtype speed up my Haskell program?

I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
Why?
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)

main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount

fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()

walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)

data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show

class Monad m => MonadP m where uniform :: m Integer

instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform

instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
Why?
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing is the result of something truly fundamental, or of a GHC optimization triggering or not, by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that it's not the data representation that is inherently the problem, but rather that there's an optimization that's triggered by the newtype version and not by the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
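As a toy illustration of the mechanism (not taken from the question's program; build really does live in GHC.Exts):
import GHC.Exts (build)

-- A list producer written in build form, making it eligible for the
-- foldr/build rewrite rule:
upto :: Int -> [Int]
upto n = build (\cons nil ->
  let go i = if i > n then nil else i `cons` go (i + 1)
  in  go 1)
{-# INLINE upto #-}

-- With -O1 and up, GHC can rewrite this into a single loop that never
-- allocates the intermediate list:
total :: Int -> Int
total n = foldr (+) 0 (upto n)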
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
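To make the shape of that claim concrete, here is an unverified sketch. mapD is a hypothetical helper (not from the question's code or any library), and getting such rules to fire reliably takes care with inlining phases:
type Measure a = [(a, Rational)]
data Distribution a = Distribution (Measure a)

-- Keep mapD from inlining too early, so the rule below has a chance to match.
{-# NOINLINE [1] mapD #-}
mapD :: (a -> b) -> Distribution a -> Distribution b
mapD f (Distribution xs) = Distribution [ (f x, p) | (x, p) <- xs ]

-- A fusion-style rule: collapse two traversals of a Distribution into one,
-- avoiding the intermediate allocation.
{-# RULES
"mapD/mapD" forall f g d. mapD f (mapD g d) = mapD (f . g) d
  #-}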

Do Haskell’s strict folds really use linear space?

I thought I understood the basics of fold performance in Haskell, as described in foldr, foldl, foldl' on the Haskell Wiki and many other places. In particular, I learned that for accumulating functions, one should use foldl', to avoid space leaks, and that the standard library functions are written to respect this. So I presumed that simple accumulators like length, applied to simple lists like replicate n 1, should require constant space (or at least sub-linear) in the length of the list. My intuition was that on sufficiently simple lists, they would behave roughly like a for loop in an imperative language.
But today I found that this seems not to hold in practice. For instance, length $ replicate n 1 seems to use space linear in n. In ghci:
ghci> :set +s
ghci> length $ replicate (10^6) 1
1000000
(0.02 secs, 56,077,464 bytes)
ghci> length $ replicate (10^7) 1
10000000
(0.08 secs, 560,078,360 bytes)
ghci> length $ replicate (10^8) 1
100000000
(0.61 secs, 5,600,079,312 bytes)
ghci> length $ replicate (10^9) 1
1000000000
(5.88 secs, 56,000,080,192 bytes)
Briefly, my question is: Do length and other strict folds really use linear space? If so, why? And is it inevitable? Below are more details of how I’ve played around trying to understand this, but they’re probably not worth reading — the tl;dr is that the linear-space usage seems to persist whatever variations I try.
(I originally used sum as the example function. As Willem Van Onsem points out, that was a badly-chosen example as default instances aren’t actually strict. However, the main question remains, since as noted below, this occurs with plenty of other functions that really are based on strict folds.)
Replacing length with foldl' (\n _ -> n+1) 0 appears to make performance worse by a constant factor; space usage still seems to be linear.
Versions defined with foldl and foldr had worse memory usage (as expected), but only by a small constant factor, not asymptotically worse (as most discussions seem to suggest).
Replacing length with sum, last, or other simple accumulators, or with the obvious definitions of these using foldl', also doesn’t seem to change the linear space usage.
Using [1..n] as the test list, and other similar variations, also seems to make no significant difference.
Switching between the general versions of sum, foldl', etc from Data.Foldable, the specialised ones in Data.List, and local versions defined directly by pattern-matching, also seems to make no difference.
Compiling instead of working in ghci also only seemed to improve space usage by a constant factor.
Switching between several recent versions of GHC — 8.8.4, 8.10.5, and 9.0.1 — also seemed to make no significant difference.
"Do they use linear space" is a slightly unclear question. Usually when we talk about the space an algorithm uses, we're talking about its working set: the maximum amount of memory it needs all at once. "If my computer only had X bytes of memory, could I run this program?" But that's not what GHCI's :set +s measures. It measures the sum of all memory allocations made, including those that were cleaned up partway through. And what is the biggest use of memory in your experiment? The list itself, of course.
So you've really just measured the number of bytes that a list of size N takes up. You can confirm this by using last instead of length, which I hope you'll agree allocates no intermediate results and is strict. It takes the same amount of memory using your metric as length does - length does no extra allocation beyond the list itself.
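For intuition, a strict length is essentially the following sketch (not GHC's actual definition):
{-# LANGUAGE BangPatterns #-}

-- The counter is forced at every step, so no chain of thunks builds up, and
-- nothing is allocated per element beyond the list cells the producer creates.
lengthStrict :: [a] -> Int
lengthStrict = go 0
  where
    go !n []       = n
    go !n (_ : xs) = go (n + 1) xs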
But a bigger problem is that GHCi is not an optimizing compiler. If you care about performance characteristics at all, GHCi is the wrong tool. Instead, use GHC with -O2, and turn on GHC's profiler.
import System.Environment (getArgs)

main = do
  n <- read . head <$> getArgs
  print $ length (replicate (10^n) 1)
And running it:
$ ghc -O2 -prof -fprof-auto stackoverflow.hs
$ ./stackoverflow 6 +RTS -p
1000000
$ grep "total alloc" stackoverflow.prof
total alloc = 54,856 bytes (excludes profiling overheads)
$ ./stackoverflow 9 +RTS -p
1000000000
$ grep "total alloc" stackoverflow.prof
total alloc = 55,008 bytes (excludes profiling overheads)
we can see that space usage is roughly constant despite a thousand-fold increase in input size.
Will Ness correctly points out in a comment that -s would be a better measuring tool than -p.
Replacing sum with foldl' (+) 0 here, performance improves noticeably in both time and space (which is itself a surprise; shouldn't the standard sum be at least as efficient?) - but only by a constant factor; space usage still seems to be linear.
sum is implemented as [src]:
sum :: Num a => t a -> a
sum = getSum #. foldMap Sum
It thus makes use of the Sum data type and its Monoid instance, for which mappend = (+) and mempty = 0. foldMap works right-associatively; indeed, the documentation says:
Map each element of the structure into a monoid, and combine the results with (<>). This fold is right-associative and lazy in the accumulator. For strict left-associative folds consider foldMap' instead.
foldMap is thus implemented with foldr [src]:
foldMap :: Monoid m => (a -> m) -> t a -> m
{-# INLINE foldMap #-}
-- This INLINE allows more list functions to fuse. See #9848.
foldMap f = foldr (mappend . f) mempty
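Spelled out for lists, that composition amounts to something like this sketch (sumViaFoldMap is an illustrative name):
import Data.Monoid (Sum(..))

-- A lazy right fold building x1 + (x2 + (x3 + ...)), rather than forcing
-- an accumulator as foldl' would:
sumViaFoldMap :: Num a => [a] -> a
sumViaFoldMap = getSum . foldr (\x acc -> Sum x <> acc) mempty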
While foldl' will indeed have a (much) smaller memory footprint and will likely be more efficient, a reason to work with foldr is that, for Peano numbers for example, one can make use of laziness: the head normal form will look like S (…), where … might not be evaluated (yet).
foldr can also terminate early. If, for example, you compute a sum over a certain algebraic structure, it is possible to stop the looping early.
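A small self-contained example of that early termination (anyOver is just an illustrative name):
-- Because foldr is lazy in the accumulator, (||) can short-circuit: the fold
-- stops at the first match, even on an infinite list.
anyOver :: (a -> Bool) -> [a] -> Bool
anyOver p = foldr (\x rest -> p x || rest) False

-- ghci> anyOver (> 10) [1..]
-- True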

OCaml: Stack_overflow exception in pervasives.ml

I got a Stack_overflow error in my OCaml program lately. If I turn on backtracing, I see the exception is raised by a "primitive operation" in "pervasives.ml", line 270. I went into the OCaml source code and saw that line 270 defines the function @ (i.e. list append). I don't get any other information from the backtrace, not even where the exception gets thrown in my program. I switched to bytecode and tried ocamldebug, and it doesn't help (no backtrace generated).
I thought this was an extremely weird situation. The only places in my program where I use a list are (a) building a list containing integers 1 to 1000000, (b) in-order traversing an RBT and putting the result into a list, and (c) printing a list of integers containing ostensibly 1000000 numbers. I've tested all the functions, and none of them could contain an infinite loop, and I thought 1000000 isn't even a huge number. Moreover, I've tried the equivalent of my program in Haskell (GHC), Scala and SML (MLton), and all of those versions worked perfectly and in a reasonably short amount of time. So, the question is, what could be going on? Can I debug it?
The @ operator is not tail-recursive in the OCaml standard library:
let rec ( @ ) l1 l2 =
  match l1 with
    [] -> l2
  | hd :: tl -> hd :: (tl @ l2)
Thus calling it with large lists (as the left argument) will overflow your stack.
It could be that you're building your list by appending a new element to the end of the already generated list, e.g.,
let rec init n x = if n > 0 then init (n-1) x @ [x] else []
This has time complexity O(n^2) and will consume n slots in the stack space.
Concerning the general question - how to debug such stack overflows, my usual recipe is to reduce the stack size, so that the problem is triggered as soon as possible before the trace is bloated, e.g.,
OCAMLRUNPARAM=b,l=1024 ocaml ./test.ml
If you're compiling your OCaml code to the native code, then you need to pass the -g option to the compiler, so that it can produce backtraces. Also, in the native execution, the size of the stack is controlled by the operating system and should be set using the corresponding mechanism of your OS, for example with ulimit in GNU/Linux, e.g., ulimit -s 1024.
As a bonus track, the following init function is tail recursive and will have O(N) time complexity and will take O(1) stack space:
let init n x =
  let rec loop n xs =
    if n = 0 then xs else loop (n-1) (x :: xs) in
  loop n []
The idea is to use an accumulator list and build the list in the heap space.
If you don't like thinking about tail-recursiveness, then you can use the Jane Street Base library (or Core), or the Batteries library. They both provide tail-recursive versions of the init function, as well as guarantees that all the other functions are tail-recursive.
List functions in the standard library are optimised for small lists and are not necessarily tail-recursive, with the partial justification that lists are not an efficient data structure for storing large amounts of data (note that Haskell lists are lazy and thus quite different from OCaml's eager lists).
In particular, if you get a stack overflow error using @, you are quite probably implementing an algorithm with quadratic time complexity, due to the fact that @'s complexity is linear in the size of its left argument.
There are probably far better data structures than lists for your problem; if you want iteration, the sequence library or any other form of iterator would be far more efficient, for instance.
With all the caveats stated before, it is relatively straightforward to define a tail-recursive (but inefficient) version of a standard library function, e.g.:
let ( @! ) x y = List.rev_append (List.rev x) y
Another option is to use the containers library or any of the extended standard libraries (Batteries or Base, essentially): all of those libraries reimplement tail-recursive versions of the list functions.

Mutable, (possibly parallel) Haskell code and performance tuning

I have now implemented another SHA3 candidate, namely Grøstl. This is still work in progress (very much so), but at the moment a 224-bit version passes all KATs. So now I'm wondering about performance (again :->). The difference this time is that I chose to mirror the (optimized) C implementation more closely, i.e. I made a port from C to Haskell. The optimized C version uses table lookups to implement the algorithm. Furthermore, the code is heavily based on updating an array containing 64-bit words. Thus I chose to use mutable unboxed vectors in Haskell.
My Grøstl code can be found here: https://github.com/hakoja/SHA3/blob/master/Data/Digest/GroestlMutable.hs
Short description of the algorithm: it's a Merkle-Damgård construction, iterating a compression function (f512M in my code) as long as there are 512-bit blocks of message left. The compression function is very simple: it simply runs two different independent 512-bit permutations P and Q (permP and permQ in my code) and combines their output. It's these permutations which are implemented by lookup tables.
Q1) The first thing that bothers me is that the use of mutable vectors makes my code look really fugly. This is my first time writing any major mutable code in Haskell, so I don't really know how to improve this. Any tips on how I might better structure the monadic code would be welcome.
Q2) The second is performance. Actually, it's not too bad, because at the moment the Haskell code is only 3 times slower. Using GHC-7.2.1 and compiling as such:
ghc -O2 -Odph -fllvm -optlo-O3 -optlo-loop-reduce -optlo-loop-deletion
the Haskell code takes 60 s on an input of ~1 GB, while the C version takes 21-22 s. But there are some things I find odd:
(1) If I try to inline rnd512QM, the code takes 4 times longer, but if I inline rnd512PM nothing happens! Why is this happening? These two functions are virtually identical!
(2) This is maybe more difficult. I've been experimenting with executing the two permutations in parallel. But currently to no avail. This is one example of what I tried:
f512 h m = V.force outP `par` (V.force outQ `pseq` (V.zipWith3 xor3 h outP outQ))
  where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
        inP  = V.zipWith xor h m
        outP = permP inP
        outQ = permQ m
When checking the run-time statistics and using ThreadScope, I noticed that the correct number of SPARKS was created, but almost none were actually converted to useful parallel work. Thus I gained nothing in speedup. My question then becomes:
Are the P and Q functions just too small for the runtime to bother to run in parallel?
If not, is my use of par and pseq (and possibly Vector.Unboxed.force) wrong?
Would I gain anything by switching to strategies? And how would I go about doing that?
Thank you so much for your time.
EDIT:
Sorry for not providing any real benchmark tests. The testing code in the repo was intended just for my own use. For those wanting to test the code out, you will need to compile main.hs, and then run it as:
./main "algorithm" "testvariant" "byte aligned"
For instance:
./main groestl short224 False
or
./main groestl e False
(e stands for "Extreme". It's the very long message provided with the NIST KATs.)
I checked out the repo, but there's no simple benchmark to just run and play with, so my ideas are just from eyeballing the code. Numbering is unrelated to your questions.
1) I'm pretty sure force doesn't do what you want -- it actually forces a copy of the underlying vector.
2) I think the use of unsafeThaw and unsafeFreeze is sort of odd. I'd just put f512M in the ST monad and be done with it. Then run it something like so:
otherwise = \msg -> truncate G224 . outputTransformation . runST $ foldM f512M h0_224 (parseMessage dataBitLen 512 msg)
3) V.foldM' is sort of silly -- you can just use a normal (strict) foldM over a list -- folding over the vector in the second argument doesn't seem to buy anything.
4) I'm dubious about the bangs in columnM and about the unsafeReads.
Also...
a) I suspect that xoring unboxed vectors can probably be implemented at a lower level than zipWith, making use of Data.Vector internals.
b) However, it may be better not to do this as it could interfere with vector fusion.
c) On inspection, extractByte looks slightly inefficient? Rather than using fromIntegral to truncate, maybe use mod or quot and then a single fromIntegral to take you directly to an Int.
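I haven't checked the repo's actual extractByte, so its signature here is assumed; and this sketch uses a shift-and-mask rather than quot/mod, but the point is the same: a single fromIntegral straight to Int at the end:
import Data.Bits (shiftR, (.&.))
import Data.Word (Word64)

-- Shift the wanted byte down, mask it off, then truncate exactly once.
extractByte :: Word64 -> Int -> Int
extractByte w i = fromIntegral ((w `shiftR` (8 * i)) .&. 0xff)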
Be sure to compile with -threaded -rtsopts and execute with +RTS -N2. Without that, you won't have more than one OS thread to perform computations.
Try to spark computations that are referred to elsewhere, otherwise they might be collected:
f512 h m = outP `par` (outQ `pseq` (V.zipWith3 xor3 h outP outQ))
  where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
        inP  = V.zipWith xor h m
        outP = V.force $ permP inP
        outQ = V.force $ permQ m
3) If you switch things up so parseBlock accepts strict bytestrings (or chunks and packs lazy ones when needed) then you can use Data.Vector.Storable and potentially avoid some copying.
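On the question about Strategies: a hedged sketch of the rpar/rseq pattern from Control.Parallel.Strategies. The permP/permQ stubs below only stand in for the question's real permutations so the sketch is self-contained:
import Control.Parallel.Strategies (runEval, rpar, rseq)
import Data.Bits (xor)
import Data.Word (Word64)
import qualified Data.Vector.Unboxed as V

-- Placeholder stand-ins; the real permutations live in the linked repo.
permP, permQ :: V.Vector Word64 -> V.Vector Word64
permP = V.map (`xor` 0x1)
permQ = V.map (`xor` 0x2)

f512 :: V.Vector Word64 -> V.Vector Word64 -> V.Vector Word64
f512 h m = runEval $ do
    outP <- rpar (V.force (permP inP))  -- sparked: may run in parallel
    outQ <- rseq (V.force (permQ m))    -- evaluated on this thread
    _    <- rseq outP                   -- wait for the spark before combining
    return (V.zipWith3 xor3 h outP outQ)
  where
    xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
    inP = V.zipWith xor h m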

Help with debugging unexpected takeWhile behaviour with large numbers in Haskell

Firstly, apologies for the vague title, but I'm not sure exactly what I'm asking here(!).
After encountering Haskell at university, I've recently started using it in anger and so am working through the Project Euler problems as an extended Hello World, really. I've encountered a bug in one of my answers that seems to suggest a misunderstanding of a fundamental part of the language, and it's not something I could work out from the tutorials, nor something I know enough about to start Googling for.
A brief description of the issue itself - the solution relates to primes, so I wanted an infinite list of prime numbers which I implemented (without optimisation yet!) thusly:
isPrime :: Int -> Bool
isPrime n = isPrime' 2 n
  where isPrime' p n | p >= n         = True
                     | n `mod` p == 0 = False
                     | otherwise      = isPrime' (p+1) n

primes :: [Int]
primes = filter isPrime [2..]
Since infinite lists can be a little tedious to evaluate in full, I'll of course be using lazy evaluation to ensure that just the bits I want get evaluated. So, for example, I can ask GHCi for the prime numbers less than 100:
*Main> takeWhile (< 100) primes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
Now here's the part that I don't understand at all - when the upper limit gets large enough, I get no answers back at all. In particular:
*Main> takeWhile (< 4000000000) primes
[]
This isn't a problem with takeWhile itself, or the partially-applied function, as takeWhile (< 4000000000) [2..] works as I would expect. It's not a problem with my use of filter (within the definition of primes), since takeWhile (< 4000000000) (filter even [2..]) also returns the expected result.
Through binary search I found that the greatest upper limit that works is 2^31 - 1, so this would certainly seem to imply some kind of space-based constraint (i.e. the largest positive signed 32-bit integer). However:
I was of the impression that Haskell had no language limits on the size of integers, and they were bounded only by the amount of free memory.
This number only appears in the less-than predicate, which I know works as expected in at least some cases. Surely when it's applied to elements of a list, it shouldn't care where they come from? Looking solely at the first element, I know that the predicate returns true for the input 2 when it comes from filter even [2..]; I know that primes returns 2 as its first element. So how can my list be empty, how is this predicate failing "for some values of 2"?
Any thoughts would be grateful as I don't have enough experience to know where to start with this one. Thanks for taking the time to take a look.
There are 2 built-in integral types in Haskell: Int and Integer. Integer is the default and is unbounded. Int, however, is bounded. Since you're explicitly using Int in the type of isPrime, 4000000000 is used as an Int and overflows. If you change the type of isPrime to Integer -> Bool, or even better Integral a => a -> Bool (read: a function that can take any kind of Integral value and return a Bool), it will work as expected.
The important thing to take away here (other than the difference between Int and Integer) is that the type of 4000000000 depends on how it is used. If it is used as an argument to a function that takes an Int, it will be an Int (and on 32-bit systems it will overflow). If it is used as an argument to a function that takes an Integer, it will be an Integer (and never overflow). If it is used as an argument to a function that takes any kind of Integral, it will also be an Integer because Integer is the default instance of Integral.
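A minimal sketch of that fix, applied to the code from the question:
-- Generalising the signature lets the literal 4000000000 default to Integer
-- (or be used at any Integral type) instead of a possibly 32-bit Int.
isPrime :: Integral a => a -> Bool
isPrime n = isPrime' 2 n
  where isPrime' p m | p >= m         = True
                     | m `mod` p == 0 = False
                     | otherwise      = isPrime' (p+1) m

primes :: [Integer]
primes = filter isPrime [2..]

-- ghci> takeWhile (< 4000000000) primes   -- no longer truncates to []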
That's an easy answer (...which I see has already been partly answered) - "premature specialization".
The first part of your definition, the type signature, specifies:
isPrime :: Int -> Bool
An Int is not just a "shortcut" way to say Integer - they are different types! To be a nit-picker (which in turn invites everyone else to tear apart the many places here where I am not accurate), there are never "different values of 2": your 2 has to be of type Int, because that's how you specified the function (you compare 2 to the function's argument n, and you're only allowed to compare values of the same type, so your 2 is "pinned down" to the Int type).
Oh, and just as a warning, the Int type is rife with corner-case potential. If your system is built in a 64-bit environment, then your Int will also be based on a 64-bit representation, and your example will work up to 2^63 - 1 instead of 2^31 - 1 as it did for you. Note my phrasing: I have a 64-bit computer with an MS Windows OS, which means that there is not yet an official 64-bit MinGW toolchain - my OS is 64-bit, but the GHC version I have was compiled with 32-bit libraries, so it has 32-bit-based Ints. When I use Linux, even in a VM, it has a 64-bit toolchain, so Ints are 64 bits. If you had used one of those, you might not even have noticed the behaviour!
So, I guess that's just one more reason to be careful when reasoning about your types. (Especially in Haskell, anyway....)
