Whitespace problems in OCaml

I'm trying to learn OCaml and am reading the Introduction to Objective Caml.
I use OCamlWinPlus v1.9RC4 as my toploop.
When trying to solve exercise 3.4, which is about programming Euclid's original GCD algorithm, I ran into a weird problem: it seems whitespace is significant when typing in the program.
My first attempt was this:
let rec (%%) n m =
  if m = 0 then
    n
  else
    if n > m then
      (n-m) %% m
    else
      n %% (m-n);;
which gave the type:
val ( %% ) : int -> int -> 'a = <fun>
Hmm... not really what I expected, and sure enough 54 %% 24 gave an infinite loop.
After much mucking about, I tried putting the whole thing on one line like this:
let rec (%%) n m = if m = 0 then n else if n > m then (n-m) %% m else n %% (m-n);;
Which gave the type:
val ( %% ) : int -> int -> int = <fun>
Much better, and this one-liner also seems to work correctly.
I would like to know if anybody can explain this behavior.
I've tried putting parentheses in various places, but nothing seems to work.
Could it be a problem with the toploop?
I hope somebody can help me with this, since I'm uncomfortable continuing to learn this language until I know what's going on.
EDIT:
I tried copy-pasting the code snippets shown above back into OCamlWinPlus, and I got the exact same problematic result.
Details of my system:
Windows XP, Home Edition, Version 2002, Service Pack 3
OCaml version 3.11.0.
OCamlWinPlus v1.9RC4

The code you tested is not the code you're showing. Most probably, you tested a version without the if m = 0 test, or a version whose result in every branch is a recursive call to %%. That would explain both the 'a return type and the non-termination: 'a inferred here means "abnormal computation".
For the link between 'a and non-termination, see Andrew Koenig's article An anecdote about ML type inference.

Related

How can I emulate the results of this if then then statement while using correct syntax?

I'm working on an exercise for a university class and can't seem to represent what I am trying to do with correct syntax in OCaml. I want the function sum_positive to sum all the positive integers in the list into a single int value and return that value.
let int x = 0 in
let rec sum_positive (ls: int list) = function
|h::[] -> x (*sum of positive ints in list*)
|[] -> 0
|h::t -> if (h >= 0) then x + h then sum_positive t else sum_positive t (*trying to ensure that sum_positive t will still run after the addition of x + h*)
On compiling I am met with this error,
File "functions.ml", line 26, characters 34-38:
Error: Syntax error
This points to the then then statement I have in there. I know it cannot work, but I can't think of any other representation that would.
You have if ... then ... then which is not syntactically valid.
It seems what you're asking is how to write what you have in mind in a way that is syntactically valid. But it's not clear what you have in mind.
You can evaluate two expressions in OCaml sequentially (one after the other) by separating them with ;. Possibly that is what you have in mind.
However it seems to me your code has bigger problems than just syntax. It appears you're trying to use x as an accumulated sum for the calculation. You should be aware that OCaml variables like x are immutable. Once you say let x = 0, the value can't be changed later. x will always be 0. The expression x + h doesn't change the value of x. It just evaluates to a new value.
The usual way to make this work is to pass x as a function parameter.
I was getting an issue that involved the parameter; I believe it was because I was trying to add an int value to a function of type int list. This is what I ended up with.
let rec sum_positive = function
  | [] -> 0
  | h::t -> if h > 0 then h + (sum_positive t) else sum_positive t
A lot simpler than I thought it would be.

Is there a performance difference between head$filter and head$dropWhile with Haskell Strings?

I'm working on lists of "People" objects in Haskell, and I was wondering whether there is any difference in performance between head$dropWhile and head$filter when finding the first person with a given name. The two options and a snippet of the datatype would be:
data Person = Person { name :: String
                     , otherStuff :: StuffTypesAboutPerson }
findPerson :: String -> [Person] -> Person
findPerson n = head $ dropWhile (\p -> name p /= n)
findPerson n = head $ filter (\p -> name p == n)
My thought was, filter would have to compare the full length of n to the full length of every name until it finds the first one. I would think dropWhile would only need to compare the strings until the first non-matching Char. However, I know there is a ton of magic in Haskell, especially in GHC. I would prefer to use the filter version, because I think it's more straightforward to read. However, I was wondering whether there actually is any performance difference. Even if it's negligible, I'm also interested from a curiosity standpoint at this point.
Edit: I know I also need to protect from errors with Maybe, etc, but I left that out to simplify the code example.
There are several approaches to the problem
findPerson n = head $ dropWhile (\p -> name p /= n)
findPerson n = head $ filter (\p -> name p == n)
findPerson n = fromJust $ find (\p -> name p == n)
The question also points out two facts:
when x,y are equal strings, == needs to compare all the characters
when x,y are different strings, /= only needs to compare until the first different character
This is correct, but does not consider the other cases
when x,y are equal strings, /= needs to compare all the characters
when x,y are different strings, == only needs to compare until the first different character
So, between == and /= there is no performance winner. At most, one of them performs a single additional not compared to the other.
Also, all three implementations of findPerson mentioned above essentially perform the same steps. Given xs :: [Person], they all scan xs until a matching name is found, and no further. For every person before the match, the name is compared against n, and this comparison stops at the first differing character (whichever comparison we use above). The matching person's name is compared completely with n (again, in all cases).
Hence, the approaches are expected to run in the same time. There might be a very small difference between them, but it could be so small that it would be hard to detect. You can try to experiment with criterion and see what happens, if you wish.
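A minimal criterion sketch along those lines might look like the following. The Person list, the placeholder age field and the findDrop/findFilter/findFind names are made-up test fixtures, not part of the question:

import Criterion.Main
import Data.List (find)
import Data.Maybe (fromJust)

data Person = Person { name :: String, age :: Int }

-- 10000 made-up people; the target sits near the end of the list
people :: [Person]
people = [ Person ("person" ++ show i) i | i <- [1 .. 10000 :: Int] ]

findDrop, findFilter, findFind :: String -> [Person] -> Person
findDrop   n = head . dropWhile ((/= n) . name)
findFilter n = head . filter ((== n) . name)
findFind   n = fromJust . find ((== n) . name)

main :: IO ()
main = defaultMain
  [ bench "dropWhile" $ nf (name . findDrop   "person9000") people
  , bench "filter"    $ nf (name . findFilter "person9000") people
  , bench "find"      $ nf (name . findFind   "person9000") people
  ]

On a list like this, all three benchmarks are expected to report essentially indistinguishable timings, for the reasons given above.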

Find the origin of "Ratio has zero denominator" Exception

As a personal exercise in the process of learning Haskell, I'm trying to port this F# snippet for Random Art.
I haven't embedded the full source code, to avoid bloating the question, but it is available as a gist.
An important part of the program is this Expr type:
data Expr =
    VariableX
  | VariableY
  | Constant
  | Sum Expr Expr
  | Product Expr Expr
  | Mod Expr Expr
  | Well Expr
  | Tent Expr
  | Sin Expr
  | Level Expr Expr Expr
  | Mix Expr Expr Expr
  deriving Show
and two functions:
gen :: Int -> IO Expr randomly generates a tree-like structure, given a number of iterations
eval :: Expr -> IO (Point -> Rgb Double) walks the tree and produces a drawing function.
The higher the number passed to gen, the higher the probability that the following exception is raised: Ratio has zero denominator.
I'm new to Haskell, so to track down the problem I tried compiling it with profiling enabled:
ghc RandomArt.hs -prof -auto-all -caf-all
This produced only the following (to me, not very useful) information:
$ ./RandomArt +RTS -xc
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
GHC.Real.CAF
--> evaluated by: Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
--> evaluated by: Main.eval.\.r,
called from Main.eval.\,
called from Main.eval,
called from Main.tga.pxs',
called from Main.tga,
called from Main.save,
called from Main.main,
called from :Main.CAF:main
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace:
Main.tga,
called from Main.save,
called from Main.main,
called from GHC.Real.CAF
RandomArt: Ratio has zero denominator
The code that persists the generated function to a TGA file works, because it was my previous exercise (a port from OCaml).
I've tried evaluating various Expr trees from GHCi, assembling the data by hand or applying functions as in the program, but I wasn't able to identify the bug.
The Haskell docs mention a package named loch that should be able to compile while preserving source code line numbers, but I was not able to install it (whereas I normally install every package I need with cabal install).
To be honest, there are really two questions:
where is the bug (in this specific case)?
which tool do I need to master to find bugs like this (or bugs in general)?
Thanks in advance.
The exception
Let's focus on the exception first.
Finding the bug
where is the bug (in this specific case)?
In mod'. We can check this easily if we provide an alternative version instead of the one from Data.Fixed:
mod' :: RealFrac a => a -> a -> a
mod' _ 0 = error "Used mod' on 0"
mod' a b =
  let k = floor $ a / b
  in a - (fromInteger k) * b
We now get Used mod' on 0.
Rationale
which tool do I need to master to find bugs like this (or bugs in general)?
In this case, the necessary hint was already in the exception's message:
Ratio has zero denominator
This means that there's a place where you divide by zero in the context of a Ratio. So you need to look at all the places where you divide something. Since you use only (/) and mod', it boils down to whether one of them can actually throw this exception:
(/) usually returns ±Infinity on division by zero if used on Double,
mod' uses toRational internally, which is a Ratio Integer.
So there's only one culprit left. Note that the other implementation yields the same results if b isn't zero.
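A quick way to confirm that split is to poke at both operations directly; a small sketch (mod' here is the one from Data.Fixed):

import Data.Fixed (mod')

main :: IO ()
main = do
  print (1 / 0 :: Double)     -- prints Infinity: Double division by zero doesn't throw
  print (mod' 1 0 :: Double)  -- throws "Ratio has zero denominator" via toRational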
The actual problem
Using mod or mod' with b == 0 isn't well-defined. After all, a modulo operation should satisfy the following property:
prop_mod :: Integral n => n -> n -> Bool
prop_mod a b =
  let m = a `mod` b
      d = a `div` b
  in a == b * d + m    -- (1)
     && abs m < abs b  -- (2)
If b == 0, there doesn't exist any pair (d, m) such that (1) and (2) hold. If we relax this law and throw (2) away, the result of mod isn't necessarily unique anymore. This leads to the following definition:
mod' :: RealFrac a => a -> a -> a
mod' a 0 = a -- this is arbitrary
mod' a b =
  let k = floor $ a / b
  in a - (fromInteger k) * b
However, this is an arbitrary definition. You have to ask yourself, "What do I actually want to do if I cannot use mod in a sane way". Since F# apparently didn't complain about a % 0, have a look at their documentation.
Either way, you cannot use a library mod function, since they aren't defined for a zero denominator.

How does one write efficient Dynamic Programming algorithms in Haskell?

I've been playing around with dynamic programming in Haskell. Practically every tutorial I've seen on the subject gives the same, very elegant algorithm based on memoization and the laziness of the Array type. Inspired by those examples, I wrote the following algorithm as a test:
-- pascal n returns the nth entry on the main diagonal of pascal's triangle
-- (mod a million for efficiency)
pascal :: Int -> Int
pascal n = p ! (n,n) where
  p = listArray ((0,0),(n,n)) [f (i,j) | i <- [0 .. n], j <- [0 .. n]]
  f :: (Int,Int) -> Int
  f (_,0) = 1
  f (0,_) = 1
  f (i,j) = (p ! (i, j-1) + p ! (i-1, j)) `mod` 1000000
My only problem is efficiency. Even using GHC's -O2, this program takes 1.6 seconds to compute pascal 1000, which is about 160 times slower than an equivalent unoptimized C++ program. And the gap only widens with larger inputs.
It seems like I've tried every possible permutation of the above code, along with suggested alternatives like the data-memocombinators library, and they all had the same or worse performance. The one thing I haven't tried is the ST Monad, which I'm sure could be made to run only slightly slower than the C version. But I'd really like to write it in idiomatic Haskell, and I don't understand why the idiomatic version is so inefficient. I have two questions:
Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.
Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?
Thanks a lot.
Edit: The array module used is the standard Data.Array
Well, the algorithm could be designed a little better. Using the vector package and being smart about only keeping one row in memory at a time, we can get something that's idiomatic in a different way:
{-# LANGUAGE BangPatterns #-}
import Data.Vector.Unboxed
import Prelude hiding (replicate, tail, scanl)
pascal :: Int -> Int
pascal !n = go 1 ((replicate (n+1) 1) :: Vector Int) where
  go !i !prevRow
    | i <= n    = go (i+1) (scanl f 1 (tail prevRow))
    | otherwise = prevRow ! n
  f x y = (x + y) `rem` 1000000
This optimizes down very tightly, especially because the vector package includes some rather ingenious tricks to transparently optimize array operations written in an idiomatic style.
1 Why is the above code so inefficient? It seems like a straightforward iteration through a matrix, with an arithmetic operation at each entry. Clearly Haskell is doing something behind the scenes I don't understand.
The problem is that the code writes thunks to the array. Then, when entry (n,n) is read, evaluating those thunks jumps all over the array again, recursing until a value needing no further recursion is finally found. That causes a lot of unnecessary allocation and inefficiency.
The C++ code doesn't have that problem: the values are written and read directly, without requiring further evaluation. The same would happen with an STUArray. Does
p = runSTUArray $ do
  arr <- newArray ((0,0),(n,n)) 1
  forM_ [1 .. n] $ \i ->
    forM_ [1 .. n] $ \j -> do
      a <- readArray arr (i,j-1)
      b <- readArray arr (i-1,j)
      writeArray arr (i,j) $! (a+b) `rem` 1000000
  return arr
really look so bad?
2 Is there a way to make it much more efficient (at most 10-15 times the runtime of a C program) without sacrificing its stateless, recursive formulation (vis-a-vis an implementation using mutable arrays in the ST Monad)?
I don't know of one. But there might be.
Addendum:
Once one uses STUArrays or unboxed Vectors, there's still a significant difference from the equivalent C implementation. The reason is that gcc replaces the % by a combination of multiplications, shifts and subtractions (even without optimisations), since the modulus is known. Doing the same by hand in Haskell (since GHC doesn't [yet] do that),
-- fast modulo 1000000
-- for nonnegative Ints < 2^31
-- requires 64-bit Ints
fastMod :: Int -> Int
fastMod n = n - 1000000*((n*1125899907) `shiftR` 50)
gets the Haskell versions on par with C.
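As an illustration, here is a sketch of the STUArray version from above with fastMod dropped in; the wrapper name pascalST and the explicit imports are mine, added so the snippet stands alone:

import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Bits (shiftR)

-- fast modulo 1000000, for nonnegative Ints < 2^31; requires 64-bit Ints
fastMod :: Int -> Int
fastMod n = n - 1000000 * ((n * 1125899907) `shiftR` 50)

pascalST :: Int -> Int
pascalST n = p ! (n, n)
  where
    p :: UArray (Int, Int) Int
    p = runSTUArray $ do
      arr <- newArray ((0, 0), (n, n)) 1
      forM_ [1 .. n] $ \i ->
        forM_ [1 .. n] $ \j -> do
          a <- readArray arr (i, j - 1)
          b <- readArray arr (i - 1, j)
          writeArray arr (i, j) $! fastMod (a + b)  -- entries stay in [0, 999999]
      return arr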
The trick is to think about how to write the whole damn algorithm at once, and then use unboxed vectors as your backing data type. For example, the following runs about 20 times faster on my machine than your code:
import qualified Data.Vector.Unboxed as V
combine :: Int -> Int -> Int
combine x y = (x+y) `mod` 1000000
pascal n = V.last $ go n where
  go 0 = V.replicate (n+1) 1
  go m = V.scanl1 combine (go (m-1))
I then wrote two main functions that called out to yours and mine with an argument of 4000; these ran in 10.42s and 0.54s respectively. Of course, as I'm sure you know, they both get blown out of the water (0.00s) by the version that uses a better algorithm:
pascal' :: Integer -> Integer
pascal' n = product [n+1..n*2] `div` product [2..n]
pascal :: Int -> Int
pascal = fromIntegral . (`mod` 1000000) . pascal' . fromIntegral
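As a quick sanity check (a sketch; it assumes the closed-form pascal just above is in scope), the first few diagonal entries come out as the expected central binomial coefficients:

main :: IO ()
main = print (map pascal [0 .. 5])  -- expected output: [1,2,6,20,70,252]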

Space leak in list program

I am solving some Project Euler problems in Haskell. I wrote a program for one of the riddles, but it did not work as I expected.
When I looked at the task manager while running the program, I saw that it was using more than 1 gigabyte of RAM under ghc. A friend of mine wrote a program with the same meaning in Java and succeeded in 7 seconds.
import Data.List
opl = find vw $ map (\x -> fromDigits (x ++ [0,0,9]))
      $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]
vw x = hh^2 == x
  where hh = (round.sqrt.fromIntegral) x
re = [0..9]
fromDigits x = foldl1 (\n m -> 10*n + m) x
I know this program would output the number I want given enough RAM and time, but there has to be a better-performing way.
The main problem here is that sequence has a space leak. It is defined like this:
sequence [] = [[]]
sequence (xs:xss) = [ y:ys | y <- xs, ys <- sequence xss ]
The problem is that the list produced by the recursive call sequence xss is re-used for each of the elements of xs, so it can't be discarded until the end. A version without the space leak is
myseq :: [[a]] -> [[a]]
myseq xs = go (reverse xs) []
  where
    go [] acc = [acc]
    go (xs:xss) acc = concat [ go xss (x:acc) | x <- xs ]
PS. the answer seems to be Just 1229314359627783009
Edit: a version avoiding the concat:
seqlists :: [[a]] -> [[a]]
seqlists xss = go (reverse xss) [] []
  where
    go [] acc rest = acc : rest
    go (xs:xss) acc rest = foldr (\y r -> go xss (y:acc) r) rest xs
Note that both of these versions generate the results in a different order from the standard sequence, so while they work for this problem, we can't use either of them as a specialised version of sequence.
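A tiny example of that ordering difference (a sketch; it assumes the myseq above is in scope and uses the Prelude's sequence for comparison):

main :: IO ()
main = do
  print (sequence [[1,2],[3,4]] :: [[Int]])  -- [[1,3],[1,4],[2,3],[2,4]]
  print (myseq    [[1,2],[3,4]] :: [[Int]])  -- [[1,3],[2,3],[1,4],[2,4]]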
Following on from the answer given by Simon Marlow, here's a version of sequence that avoids the space leak while otherwise working just like the original, including preserving the order.
It still uses the nice, simple list comprehension of the original sequence - the only difference is that a fake data dependency is introduced that prevents the recursive call from being shared.
sequenceDummy d [] = d `seq` [[]]
sequenceDummy _ (xs:xss) = [ y:ys | y <- xs, ys <- sequenceDummy (Just y) xss ]
sequenceUnshared = sequenceDummy Nothing
I think this is a better way of avoiding the sharing that leads to the space leak.
I'd blame the excessive sharing on the "full laziness" transformation. Normally this does a great job of creating sharing that avoids recomputations, but sometimes recomputation is very much more efficient than storing shared results.
It'd be nice if there were a more direct way to tell the compiler not to share a specific expression - the above dummy Maybe argument works and is efficient, but it's basically a hack that's just complicated enough that ghc can't tell that there's no real dependency. (In a strict language you don't have these issues, because you only have sharing where you explicitly bind a variable to a value.)
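A blunter, module-wide alternative is to switch the transformation off entirely for the affected module; a sketch (the module name is hypothetical, and this disables full laziness for everything in the file, not just one expression):

{-# OPTIONS_GHC -fno-full-laziness #-}
-- With full laziness disabled, the recursive call inside the comprehension is
-- no longer floated out of the lambda, so it is recomputed instead of shared.
module Euler where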
EDIT: I think I'm wrong here - changing the type signature to :: Maybe Word64 (which would be enough bits for this problem I think) also takes forever / has a space leak, so it couldn't be the old Integer bug.
Your problem seems to be an old GHC bug (that I thought was fixed) with Integer causing a space leak. The below code finishes in about 150 ms when compiled with -O2.
import Data.List
import Data.Word
main = print opl
opl :: Maybe Word32
opl = find vw $ map (\x-> fromDigits (x++[0,0,9]) ) $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]
vw x = hh^2 == x
  where hh = (round.sqrt.fromIntegral) x
re = [0..9]
fromDigits x = foldl1 (\n m->10*n+m) x
Since you're looking for a nineteen-digit number with the characteristics tested in vw, I'd try to simplify the construction in the mapped function: just say fromDigits x * 1000 + 9 for starters. Appending to a list is O(length of the left list), so tacking those last three digits onto the end hurts the computation time a fair bit.
As an aside (to you both), using the strict version of the fold (foldl1') will also help.
