Is it possible to debug pattern matching in a Haskell function?

I have defined a type
data Expr =
      Const Double
    | Add Expr Expr
    | Sub Expr Expr
and declared it as an instance of the Eq typeclass:
instance Eq Expr where
    (Add (Const a1) (Const a2)) == Const b = a1+a2 == b
    (Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = a1+a2 == b1 + b2
Of course, the evaluation of the expression Sub (Const 1) (Const 1) == Const 0 will fail. How can I debug at runtime the pattern matching process to spot that it's failing? I would like to see how Haskell takes the arguments of == and walks through the patterns. Is it possible at all?

edit: providing a real answer to the question...
I find the easiest way to see what patterns are matching is to add trace statements, like so:
import Debug.Trace
instance Eq Expr where
    (Add (Const a1) (Const a2)) == Const b = trace "Expr Eq pat 1" $ a1+a2 == b
    (Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = trace "Expr Eq pat 2" $ a1+a2 == b1 + b2
    -- catch any unmatched patterns
    l == r = error $ "Expr Eq failed pattern match. \n l: " ++ show l ++ "\n r: " ++ show r
If you don't include a final statement to catch any otherwise unmatched patterns, you'll get a runtime exception, but I find it's more useful to see what data you're getting. Then it's usually simple to see why it doesn't match the previous patterns.
Of course you don't want to leave this in production code. I only insert traces as necessary then remove them when I've finished. You could also use CPP to leave them out of production builds.
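One way the CPP idea could look, as a rough sketch: the helper name dbg and the macro name DEBUG are assumptions made up for illustration, and the catch-all clause returns False here instead of calling error, purely to keep the example total.

```haskell
{-# LANGUAGE CPP #-}

#ifdef DEBUG
import Debug.Trace (trace)
#endif

data Expr
    = Const Double
    | Add Expr Expr
    | Sub Expr Expr
    deriving (Show)

-- Hypothetical helper: behaves like trace when compiled with -DDEBUG,
-- and ignores its message otherwise.
dbg :: String -> a -> a
#ifdef DEBUG
dbg = trace
#else
dbg _ x = x
#endif

instance Eq Expr where
    Add (Const a1) (Const a2) == Const b =
        dbg "Expr Eq pat 1" (a1 + a2 == b)
    Add (Const a1) (Const a2) == Add (Const b1) (Const b2) =
        dbg "Expr Eq pat 2" (a1 + a2 == b1 + b2)
    -- Assumption for this sketch: unmatched shapes compare as unequal.
    l == r = dbg ("Expr Eq fallthrough: " ++ show (l, r)) False
```

Building with ghc -DDEBUG would turn the messages on; a plain build compiles dbg down to a function that discards the message, so the traces can stay in the source.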
I also want to say that I think pattern matching is the wrong way to go about this. You'll end up with a combinatorial explosion in the number of patterns, which quickly grows unmanageable. If you want to make a Float instance for Expr for example, you'll need several more primitive constructors.
Instead, you presumably have an interpreter function interpret :: Expr -> Double, or at least could write one. Then you can define
instance Eq Expr where
    l == r = interpret l == interpret r
By pattern matching, you're essentially re-writing your interpret function in the Eq instance. If you want to make an Ord instance, you'll end up re-writing the interpret function yet again.
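To make this concrete, here is a minimal sketch of what such an interpreter and the instances built on it might look like. The body of interpret is an assumption (the question never shows one); it simply gives Add and Sub their usual arithmetic meaning.

```haskell
import Data.Function (on)

data Expr
    = Const Double
    | Add Expr Expr
    | Sub Expr Expr
    deriving (Show)

-- Assumed interpreter: evaluate an expression to a number.
interpret :: Expr -> Double
interpret (Const x) = x
interpret (Add l r) = interpret l + interpret r
interpret (Sub l r) = interpret l - interpret r

-- Equality and ordering both reuse the interpreter, so neither needs
-- its own set of patterns.
instance Eq Expr where
    (==) = (==) `on` interpret

instance Ord Expr where
    compare = compare `on` interpret
```

With these definitions the expression from the question, Sub (Const 1) (Const 1) == Const 0, evaluates to True, and adding a new constructor only requires one new clause in interpret.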

If you want examples of how the matching can fail, have a look at QuickCheck. There's an example in the manual (in the section on the size of test data) about generating and testing recursive data types that seems to suit your needs perfectly.
While the -Wall flag gives you a list of unmatched patterns, a run of QuickCheck gives you examples of input data that make a given proposition fail.
For example, if I write a generator for your Expr and pass quickCheck a proposition prop_expr_eq :: Expr -> Bool that checks whether an Expr is equal to itself, I very quickly obtain Const 0.0 as a first example of non-matching input.
import Test.QuickCheck
import Control.Monad

data Expr =
      Const Double
    | Add Expr Expr
    | Sub Expr Expr
    deriving (Show)

instance Eq Expr where
    (Add (Const a1) (Const a2)) == Const b = a1+a2 == b
    (Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = a1+a2 == b1 + b2

instance Arbitrary Expr where
    arbitrary = sized expr'
      where
        expr' 0 = liftM Const arbitrary
        expr' n | n > 0 =
            let subexpr = expr' (n `div` 2)
            in oneof [liftM Const arbitrary,
                      liftM2 Add subexpr subexpr,
                      liftM2 Sub subexpr subexpr]

prop_expr_eq :: Expr -> Bool
prop_expr_eq e = e == e
As you can see, running the test gives you a counterexample proving that your equality test is wrong. This may be a bit of overkill, but the advantage, if you write things well, is that you also get unit tests for your code that check arbitrary properties, not only pattern-matching exhaustiveness.
*Main> quickCheck prop_expr_eq
*** Failed! Exception: 'test.hs:(11,5)-(12,81): Non-exhaustive patterns in function ==' (after 1 test):
Const 0.0
PS: Another good read about unit testing with QuickCheck is in the free book Real World Haskell.

You can break your complex pattern into simpler patterns and use trace to see what's going on. Something like this:
instance Eq Expr where
    x1 == x2 | trace ("Top level: " ++ show (x1, x2)) True,
               Add x11 x12 <- x1,
               trace ("First argument Add: " ++ show (x11, x12)) True,
               Const a1 <- x11,
               trace ("Matched first Const: " ++ show a1) True,
               Const a2 <- x12,
               trace ("Matched second Const: " ++ show a2) True,
               Const b <- x2,
               trace ("Second argument Const: " ++ show b) True
             = a1+a2 == b
It's a bit desperate, but desperate times call for desperate measures. :)
As you get used to Haskell you rarely, if ever, need to do this.

Related

How to perform several operations under the same condition?

I want to negate the chosen element of the matrix along with its adjacent elements.
My question is how do I make these multiple expressions happen without '&&'. I don't know the syntax very well.
I am getting-
Error: This expression has type unit but an expression was expected of type bool
let matrix2 =[|[|true;true;false;false|];
[|false;false;true;true|];
[|true;false;true;false|];
[|true;false;false;true|]|];;
let flip_matrix matrix a b=
let n=Array.length matrix in
for i=1 to n do
let n1=Array.length matrix in
for j=1 to n1 do
if i=a && j=b
then
matrix.(i).(j)<- not matrix.(i).(j)&&matrix.(i+1).(j+1)<- not matrix.(i+1).(j+1)&&matrix.(i-1).(j-1)<- not matrix.(i-1).(j-1)
done;
done;
matrix;;
flip_matrix matrix2 1 2;;
The sequencing operator ; is used to chain together several expressions,
<exp1>; <exp2>
means evaluate <exp1> first, then evaluate <exp2>, example:
print_endline "Hello!"; print_endline "World."
Note that ; works only for expressions that return a value of type unit, i.e., that are evaluated only for their side-effects and do not produce any useful values.
When you need to chain several expressions that produce useful values, you need to bind those values to some variables, using let <v> = <exp1> in <exp2>. This expression evaluates <exp1> and binds its value to the variable <v>, which becomes available to <exp2>, which is evaluated after that. Example,
let message = "hello" ^ ", world" in
print_endline message
As you can see, the <exp1>; <exp2> is just a short-hand notation for,
let () = <exp1> in <exp2>
Also, note that <exp2> could be a let .. in .. expression itself, so that you can chain an arbitrary number of expressions in OCaml,
let x1 = f1 y1 in
let x2 = f2 y2 in
...
let xN = fN yN in
final_result
Now, we're ready for conditional expressions such as if. It would be natural to assume that
if x > 0 then print_endline "Hello"; print_endline "World"
Would print
Hello
World
If x is greater than zero. But that is wrong! As I described recently in this answer, the if expression has higher precedence (priority) than ;, so in fact the OCaml parser splits this into two expressions:
(if x > 0 then print_endline "Hello"); print_endline "World"
So that at the end only one of the expressions is under condition. As always in such precedence problems the solution is to use parentheses (or begin/end, which is the same), e.g.,
if x > 0 then (print_endline "Hello"; print_endline "World")
You can also use the more generic let .. in .. if you would like, it works without any extra parentheses, e.g.,
if x > 0 then
let () = print_endline "Hello" in
print_endline "World"
albeit a little bit ugly :)
Mutating assignment of an array evaluates to the unit:
utop # let arr = Array.make 10 0;;
val arr : int array = [|0; 0; 0; 0; 0; 0; 0; 0; 0; 0|]
utop # arr.(0) <- 1;;
- : unit = ()
In this line:
matrix.(i).(j)<- not matrix.(i).(j)&&matrix.(i+1).(j+1)<- not matrix.(i+1).(j+1)&&matrix.(i-1).(j-1)<- not matrix.(i-1).(j-1)
You are using && to conjoin a boolean value with the result of evaluating matrix.(i+1).(j+1) <- ..., and that latter expression will be the unit. Of course && only works to conjoin two boolean values.
I think this should do it:
let matrix33 = [|[|true;true;false;false|];
                 [|false;false;true;true|];
                 [|true;false;true;false|];
                 [|true;false;false;true|]|];;
let flip_matrix matrix a b=
  let n=Array.length matrix in
  for i=1 to n do
    let n1=Array.length matrix in
    for j=1 to n1 do
      if i=a && j=b
      then begin
        matrix.(i).(j)<- not matrix.(i).(j);
        matrix.(i+1).(j)<- not matrix.(i+1).(j);
        matrix.(i).(j+1)<- not matrix.(i).(j+1);
        matrix.(i).(j-1)<- not matrix.(i).(j-1);
        matrix.(i-1).(j)<- not matrix.(i-1).(j);
      end;
    done;
  done;
  matrix;;
flip_matrix matrix33 1 1 ;;

a haskell function to test if an integer appears after another integer

I'm writing a function called after which takes a list of integers and two integers as parameters. after list num1 num2 should return True if num1 occurs in the list and num2 occurs in the list after num1 (not necessarily immediately after).
after::[Int]->Int->Int->Bool
after [] _ _=False
after [x:xs] b c
|x==b && c `elem` xs =True
|x/=b && b `elem` xs && b `elem` xs=True
This is what I have so far,my biggest problem is that I don't know how to force num2 to be after num1.
There are a few different ways to approach this one; while it's tempting to go straight for recursion on this, it's nice to avoid using recursion explicitly if there's another option.
Here's a simple version using some list utilities. Note that it's a Haskell idiom that the object we're operating over is usually the last argument. In this case switching the arguments lets us write it as a pipeline with its third argument (the list) passed implicitly:
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)
Hopefully this is pretty easy to understand: we drop elements of the list until we hit an a; assuming we find one, we check whether there's a b in the remaining list. If there was no a, the remaining list is [] and obviously there's no b there, so it returns False as expected.
You haven't specified what happens if 'a' and 'b' are equal, so I'll leave it up to you to adapt it for that case. HINT: add a tail somewhere ;)
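For readers who want to see the difference spelled out, here is one possible reading of that hint, as a sketch only. The name afterStrict is made up for illustration, and it uses drop 1 where the hint says tail so that the empty list is handled safely.

```haskell
-- The pipeline version from above: the first a itself is still in the
-- remaining list, so when a == b a single occurrence is enough.
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)

-- Hypothetical variant: skip the first a before searching, so b has to
-- occur strictly later in the list.
afterStrict :: Int -> Int -> [Int] -> Bool
afterStrict a b = elem b . drop 1 . dropWhile (/= a)
```

With these definitions, after 2 2 [1,2,3] is True while afterStrict 2 2 [1,2,3] is False; afterStrict 2 2 [2,1,2] is True because a second 2 follows the first. For distinct a and b the two functions agree.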
Here are a couple of other approaches if you're interested:
This is pretty easily handled using a fold. We have three states to model: either we're looking for the first elem, or we're looking for the second elem, or we've found them (in the right order).
data State =
    FindA | FindB | Found
    deriving Eq
Then we can 'fold' (aka reduce) the list down to the result of whether it matches or not.
after :: Int -> Int -> [Int] -> Bool
after a b xs = foldl go FindA xs == Found
  where
    go FindA x = if x == a then FindB else FindA
    go FindB x = if x == b then Found else FindB
    go Found _ = Found
You can also do it recursively if you like:
after :: Int -> Int -> [Int] -> Bool
after _ _ [] = False
after a b (x:xs)
    | x == a = b `elem` xs
    | otherwise = after a b xs
Cheers!
You can split it into two parts: the first one will find the first occurrence of num1. After that, you just need to drop all elements before it and just check that num2 is in the remaining part of the list.
There's a standard function elemIndex for the first part. The second one is just elem.
import Data.List (elemIndex)
after xs x y =
    case x `elemIndex` xs of
        Just i -> y `elem` (drop (i + 1) xs)
        Nothing -> False
If you'd like to implement it without elem or elemIndex, you could include a subroutine. Something like:
after xs b c = go xs False
  where
    -- an empty list can contain neither b nor c
    go [] _ = False
    go (x:xs) bFound
        | x == b && not (null xs) = go xs True
        | bFound && x == c = True
        | null xs = False
        | otherwise = go xs bFound

Suggestions for reducing allocations (and work) in this Haskell function

I have the following function in my (much larger) Haskell code (with some supporting code to make it clear what's what):
import qualified Data.Set as S
import qualified Data.IntMap.Strict as M
import Data.Ord
import Data.Monoid
data Atom = Neg { index :: Int }
          | Pos { index :: Int }
          deriving (Eq, Ord, Show, Read)
newtype Clause = Clause { atoms :: S.Set Atom }
    deriving (Eq, Show, Read)
instance Ord Clause where
    compare = comparing (Down . S.size . atoms) <> comparing atoms
newtype Form = Form { clauses :: S.Set Clause }
    deriving (Eq, Ord, Show, Read)
type Interpretation = M.IntMap Bool
-- the function of interest
interpret :: Interpretation -> Form -> Maybe Bool
interpret interp = evalForm
    where evalAtom x@(Pos _) = M.lookup (index x) interp
          evalAtom x@(Neg _) = not <$> M.lookup (index x) interp
          evalClause (Clause x)
              | S.member (Just False) evaluated = Just False
              | evaluated == S.singleton (Just True) = Just True
              | otherwise = Nothing
              where evaluated = S.map evalAtom x
          evalForm (Form x)
              | S.member (Just True) evaluated = Just True
              | evaluated == S.singleton (Just False) = Just False
              | otherwise = Nothing
              where evaluated = S.map evalClause x
Having profiled my Haskell program, I've found that this interpret function's allocations comprise nearly 40% of all allocations in my program (as well as about 40% of the CPU work).
Is there any way I can reduce either the amount of work interpret does, or the amount it allocates? This could potentially win me big performance gains (which I really need, as I have to run this code many times for experiments).
I would experiment with S.foldr.
From your code, it looks as if these are AND-clauses, so I will assume an empty clause is false.
evalClause (Clause x) = S.foldr f (Just False) $ S.map evalAtom x
    where f b@(Just False) _ = b
          f (Just True) y = y
          f Nothing y@(Just False) = y
          f Nothing y = Nothing
and analogously for evalForm.
It might also be beneficial to use lists rather than sets. Sets, as implemented, are strict, and (I think) will not trigger some optimizations like fusion/deforestation/etc. Lists are lazily produced, and should behave better in this sort of code.
evalClause (Clause x) = foldr f (Just False) . map evalAtom $ S.toList x
...
An observation:
A Maybe Bool can only have three possible values - Nothing, Just False and Just True.
evaluated in both evalClause and evalForm has type Set (Maybe Bool), which can be represented with three bits, which fit in an Int.
I would define:
data MaybeBool = Nuthin | JustFalse | JustTrue
deriving (Eq, Ord, Enum, Bounded, Show, Read)
and change the signature of interpret to return a MaybeBool.
Then define evaluated as a bitset like this:
import Data.Bits
import Data.List (foldl')
evaluated = foldl' combine 0 (map evalAtom (S.toList x))
    where combine s a = s .|. (1 `shiftL` fromEnum a)
evaluated will be an Int between 0 and 7 with bit 0 set if Nuthin is in the set, bit 1 set if JustFalse is in the set and bit 2 set if JustTrue is in the set. This will eliminate allocation of Sets from your computation.
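A small self-contained sketch of how the bitset might then be decoded, assuming the same decision rule as evalClause in the question. The names toBits and clauseResult are made up for illustration and are not part of the original code.

```haskell
import Data.Bits (shiftL, testBit, (.|.))
import Data.List (foldl')

data MaybeBool = Nuthin | JustFalse | JustTrue
    deriving (Eq, Ord, Enum, Bounded, Show, Read)

-- Fold the results into a 3-bit set: bit (fromEnum v) is set once the
-- value v has been seen.
toBits :: [MaybeBool] -> Int
toBits = foldl' combine 0
  where
    combine s a = s .|. (1 `shiftL` fromEnum a)

-- Hypothetical decoder mirroring evalClause: any JustFalse wins, a set
-- containing only JustTrue is JustTrue, everything else is Nuthin.
clauseResult :: Int -> MaybeBool
clauseResult bits
    | testBit bits (fromEnum JustFalse)    = JustFalse
    | bits == 1 `shiftL` fromEnum JustTrue = JustTrue
    | otherwise                            = Nuthin
```

For instance, toBits [JustTrue, JustFalse] is 6 and decodes to JustFalse, while toBits [JustTrue, JustTrue] is 4 and decodes to JustTrue. Whether this actually beats the Set version would need to be confirmed by profiling.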

SML syntax: `val rec` and `fun` compared to each other

What are the known things possible with one and not with the other? What are the known idioms to work around the limitations of either of the two?
What I know of it
In another question, Andreas Rossberg pointed to a restriction applying to val rec in SML: it must be of the form of an fn‑match, even when other expressions would make sense.
The fun syntax does not have such a restriction, but can't be used to introduce a simple binding (I mean, simply a name with an optional type annotation and nothing else), as it requires arguments to be exposed.
In an older question I have lost track of, there were discreet comments in favour of fun over val / val rec.
I personally use val / val rec more, because it exposes the distinction between self‑recursive and non‑self‑recursive bindings (while what's exposed as self‑recursive may not actually be, the converse always holds: what's exposed as not self‑recursive is never self‑recursive), and also because it uses the same syntax as anonymous lambda expressions (more consistency).
The (all related) questions
These are the things I know. Are there others? I know less about workaround idioms. Are there any?
The limitations of both seem to me to be syntactical only, with no real semantic or soundness background. Is this so, or is there a semantic or soundness background to these limitations?
A sample case (you can skip it)
If it's not abusing, I'm posting below a snippet, a variation of the one posted in the question linked above. This snippet exposes a case where I have an issue with both (I'm not happy with either one). The comments tell where the two issues are and why they are issues to my eyes. This sample can't really be simplified, as the issues are syntactical, and so the real use case matters.
(* ======================================================================== *)
(* A process chain. *)
datatype 'a process = Chain of ('a -> 'a process)
(* ------------------------------------------------------------------------ *)
(* An example controlling iterator using a process chain. it ends up to be
* a kind of co‑iteration (if that's not misusing the word). *)
val rec iter =
fn process: int -> int process =>
fn first: int =>
fn last: int =>
let
val rec step =
fn (i, Chain process) =>
if i < first then ()
else if i = last then (process i; ())
else if i > last then ()
else
let val Chain process = process i
in step (i + 1, Chain process)
end
in step (first, Chain process)
end
(* ------------------------------------------------------------------------ *)
(* A tiny test use case. *)
val rec process: int -> int process =
fn a: int =>
(print (Int.toString a);
Chain (fn a => (print "-";
Chain (fn a => (print (Int.toString a);
Chain (fn a => (print "|";
Chain process)))))))
(* Note the above is recursive: fn x => (a x; Chain (fn x => …)). We can't
* easily extract separated `fn`, which would be nice to help composition.
* This is solved in the next section. *)
val () = iter process 0 20
val () = print "\n"
(* ======================================================================== *)
(* This section attempts to set‑up functions and operators to help write
* `process` in more pleasant way or with a more pleasant look (helps
* readability).
*)
(* ------------------------------------------------------------------------ *)
(* Make nested functions, parameters, with an helper function. *)
val chain: ('a -> unit) -> ('a -> 'a process) -> ('a -> 'a process) =
fn e =>
fn p =>
fn a => (e a; Chain p)
(* Now that we can extract the nested functions, we can rewrite: *)
val rec process: int -> int process =
fn a =>
let
val e1 = fn a => print (Int.toString a)
val e2 = fn a => print "-"
val e3 = fn a => print (Int.toString a)
val e4 = fn a => print "|"
in
(chain e1 (chain e2 (chain e3 (chain e4 process)))) a
end
(* Using this:
* val e1 = fn a => print (Int.toString a)
* val e2 = fn a => print "-"
* …
*
* Due to an SML syntactical restriction, we can't write this:
* val rec process = chain e1 (chain e2 ( … process))
*
* This requires to add a parameter on both side, but this, is OK:
* fun process a = (chain e1 (chain e2 ( … process))) a
*)
val e1 = fn a => print (Int.toString a)
val e2 = fn a => print "-"
val e3 = fn a => print (Int.toString a)
val e4 = fn a => print "|"
(* An unfortunate consequence of the need to use `fun`: the parameter added
* for `fun`, syntactically appears at the end of the expression, while it
* will be the parameter passed to `e1`. This syntactical distance acts
* against readability.
*)
fun process a = (chain e1 (chain e2 (chain e3 (chain e4 process)))) a
(* Or else, this, not better, with a useless `fn` wrapper: *)
val rec process = fn a =>
(chain e1 (chain e2 (chain e3 (chain e4 process)))) a
(* A purely syntactical function, to move the last argument to the front. *)
val start: 'a -> ('a -> 'b) -> 'b = fn a => fn f => f a
(* Now that we can write `start a f` instead of `f a`, we can write: *)
fun process a = start a (chain e1 (chain e2 (chain e3 (chain e4 process))))
infixr 0 THEN
val op THEN = fn (e, p) => (chain e p)
fun process a = start a (e1 THEN e2 THEN e3 THEN e4 THEN process)
(* This is already more pleasant (while still not perfect). Let's test it: *)
val () = iter process 0 20
val () = print "\n"
The val rec form computes a smallest fixpoint. Such a fixpoint isn't always well-defined or unique in the general case (at least not in a strict language). In particular, what should the meaning of a recursive binding be if the right-hand side(s) contain expressions that require non-trivial computation, and these computations already depend on what's being defined?
No useful answer exists, so SML (like many other languages) restricts recursion to (syntactic) functions. This way, it has a clear semantic explanation in terms of well-known fixpoint operators like Y, and can be given simple enough evaluation rules.
The same applies to fun, of course. More specifically,
fun f x y = e
is merely defined as syntactic sugar for
val rec f = fn x => fn y => e
So there has to be at least one parameter to fun to satisfy the syntactic requirement for val rec.
I will attempt to start to answer my own question.
For the case of the forced use of a wrapper fn due to syntactic restrictions (maybe an issue to consider addressing with sML?), I could find, not really a workaround, but an idiom which helps to make these cases less noisy.
I reused the start function from the sample (see question), and renamed it n_equiv, for the reason given in the comment. This would just require a few words beforehand to explain what η-equivalence is and also to tell about the syntactical restrictions which justify the definition and use of this function (which is always good for learning material anyway, and I'm planning to post some SML samples on a French forum).
(* A purely syntactical function, to try to make forced use of `fn` wrappers
* a bit more acceptable. The function is named `n_equiv`, which refers to
* the η-equivalence axiom. It explicitly tells the construction has no
* effect. The function syntactically swap the function expression and its
* argument, so that both occurrences of the arguments appears close
* to each other in text, which helps avoid disturbance.
*)
val n_equiv: 'a -> ('a -> 'b) -> 'b = fn a => fn f => f a
Use case from the sample in the question, now looks like this:
fun process a = n_equiv a (chain e1 (chain e2 (chain e3 (chain e4 process))))
…
fun process a = n_equiv a (e1 THEN e2 THEN e3 THEN e4 THEN process)
That's already better, as now one is clearly told the surrounding construct is neutral.
To answer another part of the question, this case at least is more easily handled with fun than with val rec, as with val rec, the n_equiv self‑documenting idiom cannot be applied. That's a point in favour of fun over val rec … = fn …
Update #1
A page which mentions the compared verbosity of fun vs that of val: TipsForWritingConciseSML (mlton.org). See “Clausal Function Definitions” around the middle of the page. For non‑self‑recursive functions, val … fn is less verbose than fun; it may vary for self‑recursive functions.

Extending Immutable types (or: fast cache for immutable types) in OCaml

I have a recursive immutable data structure in ocaml which can be simplified to something like this:
type expr =
{
eexpr : expr_expr;
some_other_complex_field : a_complex_type;
}
and expr_expr =
| TInt of int
| TSum of (expr * expr)
| TMul of (expr * expr)
It's an AST, and sometimes it gets pretty complex (it's very deep).
There is a recursive function that evaluates an expression. For example, let's say,
let rec result expr =
match expr.eexpr with
| TInt i -> i
| TSum (e1, e2) -> result e1 + result e2
| TMul (e1, e2) -> result e1 * result e2
Now suppose I am mapping an expression to another expression, and I need to constantly check the result of an expr, sometimes more than once for the same expr, and sometimes for expressions that were recently mapped by using the pattern
{ someExpr with eexpr = TSum(someExpr, otherExpr) }
Now, the result function is very lightweight, but running it many times for a deep AST will not be very optimized. I know I could cache the value using a Hashtbl, but AFAIK the Hashtbl will only do structural equality, so it will need to traverse my long AST anyway.
I know the best option would be to include a probably immutable "result" field in the expr type. But I can't.
So is there any way in OCaml to cache a value for an immutable type, so I don't have to calculate it eagerly every time I need it?
Thanks!
Hash-cons the values of expr_expr. By doing this structurally equal values in your program will share exactly the same memory representation and you can substitute structural equality (=) by physical equality (==).
This paper should get you quickly started on hash-consing in OCaml.
You can use the functorial interface to control the kind of equality used by the hash table. I believe the semantics of (==) are legitimate for your purposes; i.e., if A == B then f A = f B for any pure function f. So you can cache the results of f A. Then if you find a B that's physically equal to A, the cached value is correct for B.
The downside of using (==) for hashing is that the hash function will send all structurally equal objects to the same hash bucket, where they will be treated as distinct objects. If you have a lot of structurally equal objects in the table, you get no benefit from the hashing. The behavior degenerates to a linear search.
You can't define the hash function to work with physical addresses, because the physical addresses can be changed at any time by the garbage collector.
However, if you know your table will only contain relatively few large-ish values, using physical equality might work for you.
I think you can merge the two ideas above : use hash-consing-like techniques to get the hash of the "pure expression" part of your data, and use this hash as key in the memoization table for the eval function.
Of course this only works when your eval function indeed only depends on the "pure expression" part of the function, as in the example you gave. I believe that is a relatively general case, at least if you restrict yourself to storing the successful evaluations (that won't, for example, return an error including some location information).
Edit: a small proof of concept:
type 'a _expr =
| Int of int
| Add of 'a * 'a
(* a constructor to avoid needing -rectypes *)
type pure_expr = Pure of pure_expr _expr
type loc = int
type loc_expr = {
loc : loc;
expr : loc_expr _expr;
pure : pure_expr (* or any hash_consing of it for efficiency *)
}
(* this is where you could hash-cons *)
let pure x = Pure x
let int loc n =
{ loc; expr = Int n; pure = pure (Int n) }
let add loc a b =
{ loc; expr = Add (a, b); pure = pure (Add(a.pure, b.pure)) }
let eval =
let cache = Hashtbl.create 251 in
let rec eval term =
(* for debug and checking memoization *)
Printf.printf "log: %d\n" term.loc;
try Hashtbl.find cache term.pure with Not_found ->
let result =
match term.expr with
| Int n -> n
| Add(a, b) -> eval a + eval b in
Hashtbl.add cache term.pure result;
result
in eval
let test = add 3 (int 1 1) (int 2 2)
# eval test;;
log: 3
log: 2
log: 1
- : int = 3
# eval test;;
log: 3
- : int = 3
