Help with debugging unexpected takeWhile behaviour with large numbers in Haskell - debugging

Firstly, apologies for the vague title, but I'm not sure exactly what I'm asking here(!).
After encountering Haskell at university, I've recently started using it in anger and so am working through the Project Euler problems as an extended Hello World, really. I've encountered a bug in one of my answers that seems to suggest a misunderstanding of a fundamental part of the language, and it's not something I could work out from the tutorials, nor something I know enough about to start Googling for.
A brief description of the issue itself - the solution relates to primes, so I wanted an infinite list of prime numbers which I implemented (without optimisation yet!) thusly:
isPrime :: Int -> Bool
isPrime n = isPrime' 2 n
  where isPrime' p n | p >= n = True
                     | n `mod` p == 0 = False
                     | otherwise = isPrime' (p+1) n
primes :: [Int]
primes = filter isPrime [2..]
Since infinite lists can be a little tedious to evaluate, I'll of course be using lazy evaluation to ensure that just the bits I want get evaluated. So, for example, I can ask GHCi for the prime numbers less than 100:
*Main> takeWhile (< 100) primes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
Now here's the part that I don't understand at all - when the upper limit gets large enough, I get no answers back at all. In particular:
*Main> takeWhile (< 4000000000) primes
[]
This isn't a problem with takeWhile itself, or the partially-applied function, as takeWhile (< 4000000000) [2..] works as I would expect. It's not a problem with my use of filter (within the definition of primes), since takeWhile (< 4000000000) (filter even [2..]) also returns the expected result.
Through binary search I found that the greatest upper limit that works is 2^31 - 1, so this would certainly seem to imply some kind of space-based constraint (i.e. largest positive signed integer). However:
I was of the impression that Haskell had no language limits on the size of integers, and they were bounded only by the amount of free memory.
This number only appears in the less-than predicate, which I know works as expected in at least some cases. Surely when it's applied to elements of a list, it shouldn't care where they come from? Looking solely at the first element, I know that the predicate returns true for the input 2 when it comes from filter even [2..]; I know that primes returns 2 as its first element. So how can my list be empty, how is this predicate failing "for some values of 2"?
Any thoughts would be grateful as I don't have enough experience to know where to start with this one. Thanks for taking the time to take a look.

There are two built-in integral types in Haskell: Int and Integer. Integer is the default and is unbounded. Int, however, is bounded. Since you're explicitly using Int in the type of isPrime, 4000000000 is used as an Int and overflows. If you change the type of isPrime to Integer -> Bool or, even better, Integral a => a -> Bool (read: a function that can take any kind of Integral value and returns a Bool), it will work as expected.
The important thing to take away here (other than the difference between Int and Integer) is that the type of 4000000000 depends on how it is used. If it is used as an argument to a function that takes an Int, it will be an Int (and on 32-bit systems it will overflow). If it is used as an argument to a function that takes an Integer, it will be an Integer (and never overflow). If it is used as an argument to a function that takes any kind of Integral, it will also be an Integer because Integer is the default instance of Integral.
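To see the wrap-around without needing a 32-bit GHC, here is a minimal sketch that uses Int32 from Data.Int as a stand-in for a 32-bit Int. The names limit32 and go are my own, and isPrime is generalised to Integral as suggested above; treat it as an illustration rather than the asker's exact setup.

```haskell
import Data.Int (Int32)

-- Same trial division as in the question, but for any Integral type.
isPrime :: Integral a => a -> Bool
isPrime n = go 2
  where
    go p
      | p >= n = True
      | n `mod` p == 0 = False
      | otherwise = go (p + 1)

primes :: Integral a => [a]
primes = filter isPrime [2 ..]

-- Assumption: Int32 plays the role of Int on a 32-bit GHC.
-- 4000000000 does not fit, so it wraps to 4000000000 - 2^32.
limit32 :: Int32
limit32 = fromInteger 4000000000

main :: IO ()
main = do
  print limit32                                          -- -294967296
  print (takeWhile (< limit32) (primes :: [Int32]))      -- []
  print (length (takeWhile (< 100) (primes :: [Integer]))) -- 25
```

The predicate is effectively (< -294967296), so it already fails on the first prime, which is why the result is an empty list rather than an error.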

That's an easy answer (...which I see has already been partly answered) - "premature specialization".
The first part of your definition, the type signature, specifies:
isPrime :: Int -> Bool
An Int is not just a "shortcut" way to say Integer - they are different types! To be a nit-picker (which in turn invites everyone else to tear apart the many places here where I am not accurate), there are never "different values of 2" - it has to be of type Int, because that's how you specified the function (you compare 2 to the function's argument n, and you're only allowed to compare values of the same type, so your 2 is "pinned down" to the Int type).
Oh, and just as a warning, the Int type is a type just rife with corner case potential. If your system is built in a 64-bit environment, then your Int will also be based on a 64-bit representation, and your example will work up to 2^63-1, instead of 2^31-1 as yours did. Note my phrasing: I have a 64-bit computer with an MS Windows OS, which means that there is not yet an official 64-bit MinGW toolchain - my OS is 64-bit, but the GHC version I have was compiled with 32-bit libraries, so it has 32-bit-based Ints. When I use Linux, even in a VM, it has a 64-bit toolchain, so Ints are 64 bits. If you had used one of those, you may not have even noticed the behavior!
So, I guess that's just one more reason to be careful when reasoning about your types. (Especially in Haskell, anyway....)
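If you are unsure which flavour of Int your GHC build gives you, a quick check along these lines should tell you. This is only a sketch; the output of the first two lines depends on your platform, so I am not listing it.

```haskell
import Data.Bits (finiteBitSize)

main :: IO ()
main = do
  -- Platform dependent: typically 32 or 64.
  putStrLn ("Int is " ++ show (finiteBitSize (0 :: Int)) ++ " bits wide")
  putStrLn ("maxBound :: Int = " ++ show (maxBound :: Int))
  -- Integer has no maxBound; it grows as needed.
  putStrLn ("2^100 :: Integer = " ++ show (2 ^ (100 :: Int) :: Integer))
```

The last line prints 1267650600228229401496703205376 regardless of the platform, since Integer is unbounded.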

Related

Why is math.Pow performance worse than bitshifting?

When solving this exercise on Exercism website, I have used the standard math.Pow package function to get the raising powers of two.
return uint64(math.Pow(2, float64(n-1)))
After checking the community solutions, I found a solution using bit shifting to achieve the same thing:
return uint64(1 << uint(n-1)), nil
What surprised me is that there is a big performance difference between the two:
[Benchmark results for the bit-shifting and math-pow versions; the figures are not preserved here]
I thought that the Go compiler would recognize that math.Pow uses a constant 2 as the base and just utilize bit shifting on its own, without me explicitly doing it so. The only other difference I can see is the conversion of the float64 and that math.Pow is operating on floats and not on integers.
Why doesn't the compiler optimize the power operation to achieve performance similar to bit shifting?
First, note that uint64(1) << (n-1) is a better version of the expression uint64(1 << uint(n-1)) that appears in your question. The expression 1<<n is an int, so valid shift values are between 0 and either 30 or 62 depending on the size of int. uint64(1) << n allows n between 0 and 63.
In general, the optimization you suggest is incorrect. The compiler would have to be able to deduce that n is within a particular range.
See this example (on playground)
package main

import (
    "fmt"
    "math"
)

func main() {
    n := 65
    fmt.Println(uint64(math.Pow(2, float64(n-1))))
    fmt.Println(uint64(1) << uint(n-1))
}
The output demonstrates that the two methods are different:
9223372036854775808
0
math.Pow() is implemented to operate on float64 numbers. Bit shifting to calculate powers of 2 can only be applied on integers, and only on a tiny subset where the result fits into int64 (or uint64).
If you have such special case, you're more than welcome to use bit shifting.
Any other case where the result is bigger than math.MaxInt64 or where the base is not 2 (or a power of 2) requires floating point arithmetic.
Also note that even if detection of the above possible tiny subset would be implemented, the result is in 2's complement format, which also would have to be converted to IEEE 754 format because the return value of math.Pow() is float64 (although these numbers could be cached), which again you'd most likely convert back to int64.
Again: if you need performance, use explicit bit shifting.
Because such optimization was never implemented.
The Go compiler aims for fast compile times. Thus, some optimizations are judged not to be worth it. This saves compilation time at the expense of some run time.

How can this function in Haskell be optimised

As part of an advent of code challenge, I've written the following functions in Haskell:
simulateUntilRepeat_int a b i = if (a /= b) then (simulateUntilRepeat_int a (updateCycle b) (i+1)) else i
simulateUntilRepeat a = simulateUntilRepeat_int a (updateCycle a) 1
The purpose of this is to take a list of moons and simulate their movement until they resume their original position, returning the number of cycles it took for them to get there. (the function updateCycle does one iteration of the simulation). However, when I attempt to run this it uses all available memory and then gets killed by the operating system. The question does admit that this may take a very large number of cycles.
Googling around about this problem I find the usual fix is to make some of the parameters strict, but I think I've experimented with all possible permutations of strictness on the parameters to no avail. By the looks of this function, I'd have anticipated the compiler would be able to use the tail recursion optimisation and turn it into a loop, but this seems to not be happening somehow.
A friend of mine, who is knowledgeable in Haskell, suggested changing the form of the function to the following:
f a b0 = length (takeWhile (/= a) (iterate updateCycle b0))
But doing this didn't fix it either, leaving me out of ideas.
The comments are undoubtedly correct that your approach is not the intended solution method.
However, the functions you've posted would not, in and of themselves, cause a memory leak, fail to tail recurse, or lead to poor performance. Given your code above plus the definitions:
updateCycle 4686774942 = 0
updateCycle n = n+1
main = do
  print $ simulateUntilRepeat (0 :: Int)
and compiling with -O2, the program runs in constant memory on my laptop in about 30 seconds. Adding explicit type signatures to use Int in place of Integer for the iteration count:
simulateUntilRepeat_int :: Int -> Int -> Int -> Int
simulateUntilRepeat :: Int -> Int
it runs in about 2.4 seconds.
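For reference, here is a self-contained sketch of that experiment. It assumes a much shorter cycle (the constant cycleLength is hypothetical, chosen so the program finishes quickly), folds the helper into a local worker named go, and adds bang patterns so the loop stays strict even without optimisation.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical stand-in for the real simulation: the state is a
-- counter that wraps back to 0 after cycleLength.
cycleLength :: Int
cycleLength = 1000000

updateCycle :: Int -> Int
updateCycle n
  | n == cycleLength = 0
  | otherwise = n + 1

-- Counts the steps until the state equals the starting state again.
simulateUntilRepeat :: Int -> Int
simulateUntilRepeat a = go (updateCycle a) 1
  where
    go :: Int -> Int -> Int
    go !b !i
      | a /= b = go (updateCycle b) (i + 1)
      | otherwise = i

main :: IO ()
main = print (simulateUntilRepeat 0) -- 1000001
```

If a loop of this shape runs in constant memory for you but the real program does not, the leak is more likely to be inside updateCycle or the state it builds than in the recursion itself.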
So, to understand why your program is gobbling all available memory or why your strictness annotations failed to make a difference, it would probably be necessary to see the whole working program (or preferably a minimal example that illustrates the performance problem). If the program is short, and the question is "why is the performance of this program totally unreasonable?" instead of "how can I optimize my program to run as fast as possible?", it might still be a good SO question. Otherwise, the Code Review site might be better -- you can post a larger program there and ask for general performance advice, and that's considered on-topic for that site.

Overflow in a random number generator and 4-byte vs. 8-byte integers

The famous linear congruential random number generator, also known as the minimal standard, uses the formula
x(i+1)=16807*x(i) mod (2^31-1)
I want to implement this using Fortran.
However, as pointed out in "Numerical Recipes", directly implementing the formula with the default Integer type (32-bit) will cause 16807*x(i) to overflow.
So the book recommends Schrage's algorithm, which is based on an approximate factorization of m. This method can still be implemented with the default integer type.
However, Fortran actually has an Integer(8) type whose range is -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, which is much bigger than 16807*x(i) could ever be.
But the book even says the following:
It is not possible to implement equations (7.1.2) and (7.1.3) directly
in a high-level language, since the product of a and m − 1 exceeds the
maximum value for a 32-bit integer.
So why can't we just use Integer(8) type to implement the formula directly?
Whether or not you can have 8-byte integers depends on your compiler and your system. What's worse is that the actual value to pass to kind to get a specific precision is not standardized. While most Fortran compilers I know use the number of bytes (so 8 would be 64 bit), this is not guaranteed.
You can use the selected_int_kind function to get an integer kind that covers a certain range. This code compiles on my 64-bit computer and works fine:
program ran
    implicit none
    integer, parameter :: i8 = selected_int_kind(R=18)
    integer(kind=i8) :: x
    integer :: i
    x = 100
    do i = 1, 100
        x = my_rand(x)
        write(*, *) x
    end do
contains
    function my_rand(x)
        implicit none
        integer(kind=i8), intent(in) :: x
        integer(kind=i8) :: my_rand
        my_rand = mod(16807_i8 * x, 2_i8**31 - 1)
    end function my_rand
end program ran
Update and explanation of @VladimirF's comment below
Modern Fortran delivers an intrinsic module called iso_fortran_env that supplies constants that reference the standard variable types. In your case, one would use this:
program ran
use, intrinsic :: iso_fortran_env, only: int64
implicit none
integer(kind=int64) :: x
and then as above. This code is easier to read than the old selected_int_kind. (Why did R have to be 18 again?)
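The effect of the integer width is easy to reproduce outside Fortran as well. The following is a sketch in Haskell, not a translation of the program above: step64 and step32 are names I made up, with Int64 and Int32 standing in for the 8-byte and 4-byte integer kinds.

```haskell
import Data.Int (Int32, Int64)

modulus :: Int64
modulus = 2 ^ (31 :: Int) - 1

-- One step of the generator, computed in 64 bits: the product
-- 16807 * x always fits, so the formula can be used directly.
step64 :: Int64 -> Int64
step64 x = (16807 * x) `mod` modulus

-- The same formula in 32 bits: the product can wrap around
-- before mod is applied, which silently corrupts the sequence.
step32 :: Int32 -> Int32
step32 x = (16807 * x) `mod` 2147483647

main :: IO ()
main = do
  print (take 3 (iterate step64 1)) -- [1,16807,282475249]
  print (step64 127774)             -- 13971
  print (step32 127774)             -- differs from the line above
```

127774 is roughly the smallest state for which 16807 * x no longer fits in 31 bits, which is where the two versions start to disagree.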
Yes. The simplest thing is to append _8 to the integer constants to make them 8 bytes. I know it is "old style" Fortran but it is portable and unambiguous.
By the way, when you write:
16807*x mod (2^31-1)
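this is close to taking the result of 16807*x and applying a bitwise and with a 32-bit mask in which all the bits are set to one except the sign bit. Note that such a mask yields the remainder modulo 2^31 rather than modulo 2^31-1, so the two are not exactly equivalent.
A cheaper way to write the masked version, avoiding the expensive mod function, is: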
iand(16807_8*x, Z'7FFFFFFF')
Update after comment :
or
iand(16807_8*x, 2147483647_8)
if your super modern compiler does not have backwards compatibility.
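Before relying on the mask, it may be worth comparing it against mod on a few inputs. Here is a small sketch in Haskell (viaMod and viaMask are my own names, and Int64 stands in for the 8-byte integer kind). It suggests that the two agree while the product stays below 2^31-1, but differ afterwards, because the mask computes the remainder modulo 2^31.

```haskell
import Data.Bits ((.&.))
import Data.Int (Int64)

-- The generator step as given in the question.
viaMod :: Int64 -> Int64
viaMod x = (16807 * x) `mod` 2147483647

-- The masked variant: keeps the low 31 bits of the product.
viaMask :: Int64 -> Int64
viaMask x = (16807 * x) .&. 0x7FFFFFFF

main :: IO ()
main = do
  print (viaMod 100, viaMask 100)       -- (1680700,1680700)
  print (viaMod 127774, viaMask 127774) -- (13971,13970)
```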

Haskell: Caches, memoization, and referential transparency [duplicate]

I can't figure out why m1 is apparently memoized while m2 is not in the following:
m1 = ((filter odd [1..]) !!)
m2 n = ((filter odd [1..]) !! n)
m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.
GHC does not memoize functions.
It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:
m1' = (!!) (filter odd [1..]) -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n
(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function
f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])
GHC might notice that y does not depend on x and rewrite the function to
f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])
In this case, the new version is much less efficient because it will have to read about 1 GB from memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations than when it is compiled with -O2.
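One way to watch this happen is to wrap the list in Debug.Trace.trace, which prints a message to stderr whenever the wrapped expression is evaluated. The sketch below is only meant for GHCi or an unoptimised build, and the trace messages are my own. With -O, GHC may float the list out of m2 as described above, so I am deliberately not promising how many times each message appears.

```haskell
import Debug.Trace (trace)

-- The list is outside any lambda: expected to be built once.
m1 :: Int -> Integer
m1 = (trace "m1: building list" (filter odd [1 ..]) !!)

-- The list is inside the lambda: rebuilt per call, unless the
-- optimiser decides to share it.
m2 :: Int -> Integer
m2 n = trace "m2: building list" (filter odd [1 ..]) !! n

main :: IO ()
main = do
  print (m1 1000) -- 2001
  print (m1 1001) -- 2003
  print (m2 1000) -- 2001
  print (m2 1001) -- 2003
```

Both functions carry explicit monomorphic signatures here, so any difference you observe comes from where the list sits relative to the lambda, not from type defaulting.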
m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
See the GHC wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form
There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:
m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a
Most Haskell compilers and interpreters (all of them that I know of actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, where m1's is not.
I'm not sure, because I'm quite new to Haskell myself, but it appears that it's because the second function is parametrized and the first one is not. The nature of a function is that its result depends on the input value, and in the functional paradigm especially it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value, over and over, no matter what.
Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself, when I wrote the following function:
primes = filter isPrime [2..]
  where isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
          where f `divides` n = (n `mod` f) == 0
Then to test it, I entered GHCI and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly in an instant I got the result for take 1000 primes, because Haskell had to compute the whole thousand-element list to return 1001st element before.
Thus if you can write your function such that it takes no parameters, you probably want it. ;)

Analyzing slow performance of a Haskell program

I was trying to solve ITA Software's "Word Numbers" puzzle using a brute force approach. It looks like my Haskell version is more than 10 times slower than a C#/C++ version.
The answer
Thanks to Bryan O'Sullivan's answer, I was able to "correct" my program to acceptable performance. You can read his code which is much cleaner than mine. I am going to outline the key points here.
Int is Int64 on Linux GHC x64. Unless you unsafeCoerce, you should just use Int. This saves you from having to fromIntegral. Doing Int64 on Windows 32-bit GHC is just darn slow, avoid it. (This is in fact not GHC's fault. As mentioned in my blog post below, 64-bit integers in 32-bit programs are slow in general (at least on Windows).)
-fllvm or -fvia-C for performance.
Prefer quotRem to divMod, quotRem already suffices. That gave me 20% speed up.
In general, prefer Data.Vector to Data.Array as an "array"
Use the wrapper-worker pattern liberally.
The above points were enough to give me about 100% boost over my original version.
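To make two of these points concrete, here is a small sketch of the worker/wrapper shape combined with quotRem. The function digitSum is a made-up example and is not taken from my program or from Bryan's.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Wrapper: normalises the argument and supplies the initial accumulator.
digitSum :: Int -> Int
digitSum n = go 0 (abs n)
  where
    -- Worker: a tight loop with a strict accumulator.
    go :: Int -> Int -> Int
    go !acc 0 = acc
    go !acc m =
      let (q, r) = m `quotRem` 10
       in go (acc + r) q

main :: IO ()
main = do
  print (digitSum 12345)   -- 15
  -- quotRem truncates towards zero, divMod rounds towards negative
  -- infinity, so they only differ when the signs differ:
  print ((-7) `quotRem` 2) -- (-3,-1)
  print ((-7) `divMod` 2)  -- (-4,1)
```

For non-negative operands the two give the same answer, which is why quotRem is enough here; it skips the sign adjustment that divMod has to perform.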
In my blog post, I have detailed a step-by-step illustrated example of how I turned the original program to match Bryan's program. There are other points mentioned there as well.
The original question
(This may sound like a "could you do the work for me" post, but I argue that such a concrete example would be very instructive since profiling Haskell performance is often seen as a myth)
(As noted in the comments, I think I have misinterpreted the problem. But who cares, we can focus on performance in a different problem)
Here's my version of a quick recap of the problem:
A wordNumber is defined as
wordNumber 1 = "one"
wordNumber 2 = "onetwo"
wordNumber 3 = "onethree"
wordNumber 15 = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteenfifteen"
...
Problem: Find the 51-billion-th letter of (wordNumber Infinity); assume that letter is found at 'wordNumber x', also find 'sum [1..x]'
From an imperative perspective, a naive algorithm would be to have 2 counters, one for sum of numbers and one for sum of lengths. Keep counting the length of each wordNumber and "break" to return the result.
The imperative brute-force approach is implemented in C# here: http://ideone.com/JjCb3. It takes about 1.5 minutes to find the answer on my computer. There is also a C++ implementation that runs in 45 seconds on my computer.
Then I implemented a brute-force Haskell version: http://ideone.com/ngfFq. It cannot finish the calculation in 5 minutes on my machine. (Irony: it has more lines than the C# version)
Here is the -p profile of the Haskell program: http://hpaste.org/49934
Question: How can I make it perform comparably to the C# version? Are there obvious mistakes I am making?
(Note: I am fully aware that brute-forcing it is not the correct solution to this problem. I am mainly interested in making the Haskell version perform comparably to the C# version. Right now it is at least 5x slower, so obviously I am missing something obvious)
(Note 2: It does not seem to be space leaking. The program runs with constant memory (about 2MB) on my computer)
(Note 3: I am compiling with ghc -O2 WordNumber.hs)
To make the question more reader friendly, I include the "gist" of the two versions.
// C#
long sumNum = 0;
long sumLen = 0;
long target = 51000000000;
long i = 1;
for (; i < 999999999; i++)
{
    // WordiLength(1) = 3 "one"
    // WordiLength(101) = 13 "onehundredone"
    long newLength = sumLen + WordiLength(i);
    if (newLength >= target)
        break;
    sumNum += i;
    sumLen = newLength;
}
Console.WriteLine(Wordify(i)[Convert.ToInt32(target - sumLen - 1)]);
-- Haskell
-- This has become totally ugly during my squeeze for
-- performance
-- Tail recursive
-- n-th number (51000000000 in our problem) -> accumulated result -> list of 'zipped' left to try
-- accumulated has the format (sum of numbers, current lengths of the whole chain, the current number)
solve :: Int64 -> (Int64, Int64, Int64) -> [(Int64, Int64)] -> (Int64, Int64, Int64)
solve !n !acc@(!sumNum, !sumLen, !curr) ((!num, !len):xs)
    | sumLen' >= n = (sumNum', sumLen, num)
    | otherwise = solve n (sumNum', sumLen', num) xs
    where
        sumNum' = sumNum + num
        sumLen' = sumLen + len
-- wordLength 1 = 3 "one"
-- wordLength 101 = 13 "onehundredone"
wordLength :: Int64 -> Int64
-- wordLength = ...
solution :: Int64 -> (Int64, Char)
solution !x =
    let (sumNum, sumLen, n) = solve x (0,0,1) (map (\n -> (n, wordLength n)) [1..])
    in (sumNum, (wordify n) !! (fromIntegral $ x - sumLen - 1))
I've written a gist that contains both a C++ version (a copy of yours from a Haskell-cafe message, with a bug fixed) and a Haskell translation.
Notice that the two are structurally almost identical. When compiled with -fllvm, the Haskell code runs at about half the speed of the C++ code, which is pretty good.
Now let's compare my Haskell wordLength code to yours. You're passing around an extra parameter, which is unnecessary (you apparently figured that out when writing the C++ code that I translated). Also, the large number of bang patterns suggests panic; they're almost all useless.
Your solve function is also very confused.
You're passing parameters in three different ways: a regular Int, a 3-tuple, and a list! Whoa.
This function is necessarily not very regular in its behaviour, so while you gain nothing stylistically by using a list to supply your counter, you probably force GHC to allocate memory. In other words, this both obfuscates the code and makes it slower.
By using a tuple for three parameters (for no obvious reason), you're again working hard to force GHC to allocate memory for every step through the loop, when it could avoid doing so if you passed the parameters directly.
Only your n parameter is dealt with in a sensible way, but you don't need a bang pattern on it.
The only parameter that needs a bang pattern is sumNum, because you never inspect its value until after the loop has finished. GHC's strictness analyser will deal with the others. All of your other bang patterns are unnecessary at best, misdirections at worst.
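To illustrate the shape I have in mind, here is a sketch of the loop with the three counters passed as plain arguments and a single bang pattern. It mirrors the structure of the C# loop in the question. The wordLength below is only a stand-in (it counts decimal digits) so that the example is self-contained; it is not the real function from either program.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Stand-in for the real wordLength: the number of decimal digits.
wordLength :: Int -> Int
wordLength = length . show

-- Returns (sum of the numbers fully consumed, their total length,
-- the number in which the target position falls).
solve :: Int -> (Int, Int, Int)
solve target = go 0 0 1
  where
    go :: Int -> Int -> Int -> (Int, Int, Int)
    go !sumNum sumLen num
      | sumLen' >= target = (sumNum, sumLen, num)
      | otherwise = go (sumNum + num) sumLen' (num + 1)
      where
        sumLen' = sumLen + wordLength num

main :: IO ()
main = print (solve 15) -- (66,13,12)
```

sumLen and num are inspected on every iteration (the guard forces sumLen', and wordLength forces num), so only sumNum needs the bang to avoid building up a chain of unevaluated additions.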
Here are two pointers I could come up with in a quick investigation:
Note that using Int64 is really slow when you are using a 32 bit build of GHC, as is the default for Haskell Platform, currently. This also turned out to be the main villain in a previous performance problem (there I give a few more details).
For reasons I don't quite understand the divMod function does not seem to get inlined. As a result, the numbers are returned on the heap. When using div and mod separately, wordLength' executes purely on the stack as it should be.
Sadly I currently have no 64-bit GHC around to test whether this is enough to solve the problem.

Resources