Scheme syntax-rules pattern matching algorithm - algorithm

I'm writing a macro expansion algorithm for a programming project, and I am attempting to add an r7rs-small compliant macro expansion pass. One part of this expansion algorithm requires matching patterns.
However, I'm having difficulty coming up with a pattern-matching algorithm that deals with Scheme's repetition patterns. The given example is from the r7rs small spec, which expands let* into nested lets:
(define-syntax let*
(syntax-rules ()
((let* () body1 body2 ...)
(let () body1 body2 ...)
((let* ((name1 val1) (name2 val2) ....)
body1 body2 ...)
(let ((name1 val1))
(let* ((name2 val2) ...)
body1 body2 ...)))))
As you can see, the p ... syntax needs to be able to repeatedly match 0 or more repetitions of the pattern p.
My first attempt was:
-- This datatype is a given
data SExp = SAtom String | SPair SExp SExp | SEmpty
data Pat = PatEmpty -- ()
| PatPair Pat Pat -- (p1 . p2)
| PatWild -- _
| PatVar String -- x
| PatRepeated Pat -- p ...
| PatAtom String -- a
type MatchResult = Map String SExp
matchPat :: Pat -> SExp -> Maybe MatchResult
matchPat p e =
case (p, e) of
(PatEmpty, SEmpty) -> Just Map.empty
(PatWild, _) -> Just Map.empty
(PatVar x, _) -> Just (Map.singleton x e)
(PatAtom a, SAtom a') | a == a' -> Just Map.empty
(PatPair p1 p2, SPair e1 e2) -> do
-- This is the problem case
-- This implementation cannot handle repetitions,
-- since p1 should be able to consume parts of e2
res1 <- matchPat p1 e1
res2 <- matchPat p2 e2
Just (Map.union res1 res2)
_ -> Nothing
However, I'm skeptical that this pattern representation is good enough for implementing this kind of matching algorithm. Any help would be great.

Related

Ocaml - writing a function who's number of arguments is determined at runtime

I want to write a function f, that takes n arguments, where n is determined at runtime, and might vary at every call to the function, for example
let's say our function f takes an integer n which is the number of args, and n args of the same type and turns them into a list:
# f 3 'a' 'b' 'c';;
- : char list = ['a'; 'b'; 'c']
# f 2 1 2;;
- : int list = [1; 2]
I thaught of something like
let f acc n x =
if n = 0
then List.rev (x::acc)
else f [x] (x - 1)
but in this case it won't work because of the type difference.
Using currying, you can do something that resembles variadic functions, but you’ll have to convince the type checker. You will not be able to conveniently provide the arity of your function as a bare integer; instead, you can unary-encode the arity as a value of a GADT:
type (_, 'r) arity =
| O : ('r, 'r) arity
| I : ('f, 'r) arity -> (int->'f, 'r) arity
The encoding works as follows:
O : ('r, 'r) arity represents the arity of a “function that takes no argument” and returns an 'r;
I O : (int -> 'r, 'r) arity represents the arity of a function that takes an int and then returns an 'r;
I (I O) : (int -> int -> 'r, 'r) arity represents the arity of a function that takes two ints and then returns an 'r;
I (I (I O)) : (int -> int -> int -> 'r, 'r) arity is the arity of a function that takes three ints and then returns an 'r;
etc.
Instead of passing 3 as a first argument to your hypothetical variadic function, you would pass I (I (I O)). This value describes the sequence of arguments that the function is supposed to take (one int, then one int, then one int, then return). The function would then proceed recursively, destructing (inspecting) this description to decide what to do next You can implement your example function that builds the list of all its arguments, like so:
let rec f_aux : type f. int list -> (f, int list) arity -> f =
fun acc arity ->
begin match arity with
| O -> List.rev acc
| I a -> fun x -> f_aux (x :: acc) a
end
let f arity = f_aux [] arity
# f (C(C(C O))) ;;
- : int -> int -> int -> int list = <fun>
# f (C(C(C O))) 111 222 333 ;;
- : int list = [111; 222; 333]
As is common with GADTs, type inference is not enough and you have to annotate your definition with the intended type, including an explicit universal quantification (type f. … where f is the type variable being quantified).
The GADT defined above can only describe variadic functions that deal with ints, but notice that you can easily extend it to allow more types of arguments (then of course, you should adapt your variadic functions so that they deal with these added possibilities):
type (_, 'r) arity =
| O : ('r, 'r) arity
| I : ('f, 'r) arity -> (int->'f, 'r) arity
| B : ('f, 'r) arity -> (bool->'f, 'r) arity
| C : ('f, 'r) arity -> (char->'f, 'r) arity
| S : ('f, 'r) arity -> (string->'f, 'r) arity
(* etc. *)
let rec g_aux : type f. string -> (f, string) arity -> f =
fun acc arity ->
begin match arity with
| O -> acc
| I a -> fun x -> g_aux (acc ^ string_of_int x) a
| B a -> fun x -> g_aux (acc ^ if x then "true" else "false") a
| C a -> fun x -> g_aux (acc ^ String.make 1 x) a
| S a -> fun x -> g_aux (acc ^ x) a
(* etc. *)
end
let g arity = g_aux "" arity
# g (S(I(S(B(C O))))) ;;
- : string -> int -> string -> bool -> char -> string = <fun>
# g (S(I(S(B(C O))))) "Number " 42 " is prime. I swear, it’s " true '!' ;;
- : string = "Number 42 is prime. I swear, it’s true!"
As a matter of fact, this is essentially how pretty-printing is implemented in OCaml: when you write Printf.printf "%s%b" …, the format string is not actually a string, it is syntactic sugar kindly supplied by the compiler for a value of some very complicated GADT type such as (_,_,_,_,_,_) format6 (6 type parameters!). You might just as well build the GADT value by hand (don’t). This syntactic sugar is the only magic that the compiler does for pretty-printing, everything else works with standard language features.
Well, we have a system that works, at least it typechecks. Syntax is not pretty unless the compiler gives you sugar. More importantly, arities are encoded and checked within the static type system, which means, they are known at compile-time. You cannot (or at least it’s hard to do safely) read an arity as input of your program, dynamically, at run-time.
The actual question is: why would you actually need to do that, instead of just using a list? It brings nothing except syntactic convenience perhaps.
Your requirement doesn't make sense, since there is no way to dynamically change the number of parameters of a function at runtime. The number of parameters in any function call is directly visible by examining the text of the source code:
f a b (* Two parameters *)
f a b c (* Three parameters *)
There is no dynamic evaluation mechanism in OCaml (like the eval mechanism of other languages). This is part of what it means to be statically typed.
You can get the effect you want just by passing a list to the function.

SML syntax: `val rec` and `fun` compared to each other

What are the know things possible with one and not with the other? What are the known idioms to work around limitations of any one of the two?
What I know of it
In another question, Andreas Rossberg pointed to a restriction applying to val rec in SML: it must be of the form of an fn‑match, even when other expressions would make sense.
The fun syntax does not have such a restriction, but can't be used to introduce a simple binding (I mean, simply a name with an optional type annotation and nothing else), as it requires arguments to be exposed.
In an older question I lose track of, there was discrete comments in favour or fun over val / val rec.
I personally more use val / val rec, because it expose the distinction between self‑recursive and non‑self‑recursive binding (while what's exposed as self‑recursive may not actually be, the other way always hold, what's exposed as not self‑recursive is never self‑recursive), and also because it use the same syntax as anonymous lambda expressions (more consistency).
The (all related) questions
These are the things I know. Are there others? I less know about any workaround idioms. Are they some?
Limitations of both seems to me to be syntactical only, and not have real semantic or soundness background. Is this indeed or are there semantic and soundness background for these limitations?
A sample case (you can skip it)
If it's not abusing, I'm posting below a snippet, a variation of the one posted in the question linked above. This snippet expose a case where I'm having an issue with both (I could not be happy of neither one). The comments tells where are the two issues and why it's issues to my eyes. This sample can't really be simplified, as the issue are syntactical, and so the real use case matters.
(* ======================================================================== *)
(* A process chain. *)
datatype 'a process = Chain of ('a -> 'a process)
(* ------------------------------------------------------------------------ *)
(* An example controlling iterator using a process chain. it ends up to be
* a kind of co‑iteration (if that's not misusing the word). *)
val rec iter =
fn process: int -> int process =>
fn first: int =>
fn last: int =>
let
val rec step =
fn (i, Chain process) =>
if i < first then ()
else if i = last then (process i; ())
else if i > last then ()
else
let val Chain process = process i
in step (i + 1, Chain process)
end
in step (first, Chain process)
end
(* ------------------------------------------------------------------------ *)
(* A tiny test use case. *)
val rec process: int -> int process =
fn a: int =>
(print (Int.toString a);
Chain (fn a => (print "-";
Chain (fn a => (print (Int.toString a);
Chain (fn a => (print "|";
Chain process)))))))
(* Note the above is recursive: fn x => (a x; Chain (fn x => …)). We can't
* easily extract seperated `fn`, which would be nice to help composition.
* This is solved in the next section. *)
val () = iter process 0 20
val () = print "\n"
(* ======================================================================== *)
(* This section attempts to set‑up functions and operators to help write
* `process` in more pleasant way or with a more pleasant look (helps
* readability).
*)
(* ------------------------------------------------------------------------ *)
(* Make nested functions, parameters, with an helper function. *)
val chain: ('a -> unit) -> ('a -> 'a process) -> ('a -> 'a process) =
fn e =>
fn p =>
fn a => (e a; Chain p)
(* Now that we can extract the nested functions, we can rewrite: *)
val rec process: int -> int process =
fn a =>
let
val e1 = fn a => print (Int.toString a)
val e2 = fn a => print "-"
val e3 = fn a => print (Int.toString a)
val e4 = fn a => print "|"
in
(chain e1 (chain e2 (chain e3 (chain e4 process)))) a
end
(* Using this:
* val e1 = fn a => print (Int.toString a)
* val e2 = fn a => print "-"
* …
*
* Due to an SML syntactical restriction, we can't write this:
* val rec process = chain e1 (chain e2 ( … process))
*
* This requires to add a parameter on both side, but this, is OK:
* fun process a = (chain e1 (chain e2 ( … process))) a
*)
val e1 = fn a => print (Int.toString a)
val e2 = fn a => print "-"
val e3 = fn a => print (Int.toString a)
val e4 = fn a => print "|"
(* An unfortunate consequence of the need to use `fun`: the parameter added
* for `fun`, syntactically appears at the end of the expression, while it
* will be the parameter passed to `e1`. This syntactical distance acts
* against readability.
*)
fun process a = (chain e1 (chain e2 (chain e3 (chain e4 process)))) a
(* Or else, this, not better, with a useless `fn` wrapper: *)
val rec process = fn a =>
(chain e1 (chain e2 (chain e3 (chain e4 process)))) a
(* A purely syntactical function, to move the last argument to the front. *)
val start: 'a -> ('a -> 'b) -> 'b = fn a => fn f => f a
(* Now that we can write `start a f` instead of `f a`, we can write: *)
fun process a = start a (chain e1 (chain e2 (chain e3 (chain e4 process))))
infixr 0 THEN
val op THEN = fn (e, p) => (chain e p)
fun process a = start a (e1 THEN e2 THEN e3 THEN e4 THEN process)
(* This is already more pleasant (while still not perfect). Let's test it: *)
val () = iter process 0 20
val () = print "\n"
The val rec form computes a smallest fixpoint. Such a fixpoint isn't always well-defined or unique in the general case (at least not in a strict language). In particular, what should the meaning of a recursive binding be if the right-hand side(s) contain expressions that require non-trivial computation, and these computations already depend on what's being defined?
No useful answer exists, so SML (like many other languages) restricts recursion to (syntactic) functions. This way, it has a clear semantic explanation in terms of well-known fixpoint operators like Y, and can be given simple enough evaluation rules.
The same applies to fun, of course. More specifically,
fun f x y = e
is merely defined as syntactic sugar for
val rec f = fn x => fn y => e
So there has to be at least one parameter to fun to satisfy the syntactic requirement for val rec.
I will attempt to start to answer my own question.
For the case of the forced use of a wrapper fn due to syntactic restrictions (may be an issue to consider adressing with sML ?), I could find, not really a workaround, but an idiom which helps to make these cases less noisy.
I reused the start function from the sample (see question), and renamed it as n_equiv, for the reason given in comment. This would just require a few prior wording to explain what the η-equivalence is and also to tell about the syntactical restrictions which justify the definition and use of this function (which is always good for learning material anyway, and I'm planning to post some SML samples on a French forum).
(* A purely syntactical function, to try to make forced use of `fn` wrappers
* a bit more acceptable. The function is named `n_equiv`, which refers to
* the η-equivalence axiom. It explicitly tells the construction has no
* effect. The function syntactically swap the function expression and its
* argument, so that both occurrences of the arguments appears close
* to each other in text, which helps avoid disturbance.
*)
val n_equiv: 'a -> ('a -> 'b) -> 'b = fn a => fn f => f a
Use case from the sample in the question, now looks like this:
fun process a = n_equiv a (chain e1 (chain e2 (chain e3 (chain e4 process))))
…
fun process a = n_equiv a (e1 THEN e2 THEN e3 THEN e4 THEN process)
That's already better, as now one is clearly told the surrounding construct is neutral.
To answer another part of the question, this case at least is more easily handled with fun than with val rec, as with val rec, the n_equiv self‑documenting idiom cannot be applied. That's a point in favour of fun over val rec … = fn …
Update #1
A page which mentions the compared verbosity of fun vs that of val: TipsForWritingConciseSML (mlton.org). See “Clausal Function Definitions” around the middle of the page. For non‑self‑recursive function, val … fn is less verbose than fun, it may vary for self‑recursive functions.

Efficient functional algorithm for computing closure under an operator

I'm interested in efficient functional algorithms (preferably in Haskell, and even more preferably already implemented as part of a library!) for computing the closure of a container under a unary operator.
A basic and inefficient example of what I have in mind, for lists, is:
closure :: Ord a => (a -> a) -> [a] -> [a]
closure f xs = first_dup (iterate (\xs -> nub $ sort $ xs ++ map f xs) xs) where
first_dup (xs:ys:rest) = if xs == ys then xs else first_dup (ys:rest)
A more efficient implementation keeps tracks of the new elements generated at each stage (the "fringe") and doesn't apply the function to elements to which it has already been applied:
closure' :: Ord a => (a -> a) -> [a] -> [a]
closure' f xs = stable (iterate close (xs, [])) where
-- return list when it stabilizes, i.e., when fringe is empty
stable ((fringe,xs):iterates) = if null fringe then xs else stable iterates
-- one iteration of closure on (fringe, rest); key invariants:
-- (1) fringe and rest are disjoint; (2) (map f rest) subset (fringe ++ rest)
close (fringe, xs) = (fringe', xs') where
xs' = sort (fringe ++ xs)
fringe' = filter (`notElem` xs') (map f fringe)
As an example, if xs is a nonempty sublist of [0..19], then closure' (\x->(x+3)`mod`20) xs is [0..19], and the iteration stabilizes in 20 steps for [0], 13 steps for [0,1], and 4 steps for [0,4,8,12,16].
Even more efficiency could be gotten using a tree-based ordered-set implementation.
Has this been done already? What about the related but harder question of closure under binary (or higher-arity) operators?
How about something like this which uses the Hash Array Mapped Trie data structures in unordered-containers. For unordered-containers member and insert are O(min(n,W)) where W is the length of the hash.
module Closed where
import Data.HashSet (HashSet)
import Data.Hashable
import qualified Data.HashSet as Set
data Closed a = Closed { seen :: HashSet a, iter :: a -> a }
insert :: (Hashable a, Eq a) => a -> Closed a -> Closed a
insert a c#(Closed set iter)
| Set.member a set = c
| otherwise = insert (iter a) $ Closed (Set.insert a set) iter
empty :: (a -> a) -> Closed a
empty = Closed Set.empty
close :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed a
close iter = foldr insert (empty iter)
Here's a variation on the above that generates the solution set more lazily, in a breadth-first manner.
data Closed' a = Unchanging | Closed' (a -> a) (HashSet a) (Closed' a)
close' :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed' a
close' iter = build Set.empty where
inserter :: (Hashable a, Eq a) => a -> (HashSet a, [a]) -> (HashSet a, [a])
inserter a (set, fresh) | Set.member a set = (set, fresh)
| otherwise = (Set.insert a set, a:fresh)
build curr [] = Unchanging
build curr as =
Closed' iter curr $ step (foldr inserter (curr, []) as)
step (set, added) = build set (map iter added)
-- Only computes enough iterations of the closure to
-- determine whether a particular element has been generated yet
--
-- Returns both a boolean and a new 'Closed'' value which will
-- will be more precisely defined and thus be faster to query
member :: (Hashable a, Eq a) => a -> Closed' a -> (Bool, Closed' a)
member _ Unchanging = False
member a c#(Closed' _ set next) | Set.member a set = (True, c)
| otherwise = member a next
improve :: Closed' a -> Maybe ([a], Closed' a)
improve Unchanging = Nothing
improve (Closed' _ set next) = Just (Set.toList set, next)
seen' :: Closed' a -> HashSet a
seen' Unchanging = Set.empty
seen' (Closed' _ set Unchanging) = set
seen' (Closed' _ set next) = seen' next
And to check
>>> member 6 $ close (+1) [0]
...
>>> fst . member 6 $ close' (+1) [0]
True

Algorithm for type checking ML-like pattern matching?

How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages?
Suppose you have patterns like:
match lst with
x :: y :: [] -> ...
[] -> ...
or:
match lst with
x :: xs -> ...
x :: [] -> ...
[] -> ...
A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?
Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases.
In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p1, .., pn}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each pi is σiq for some substitution σi mapping variables to patterns.
How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ1 is an injective renaming of variables, then p1 catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p2 can ever be needed. Otherwise, we have that for some variable x, σ1x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor cj of x's type. That is, split the original q into multiple ideal patterns qj = [x:=cj y1 .. yarity(cj)]q, and refine the patterns accordingly for each qj to maintain the invariant, dropping those that don't match.
Let's take the example with {[], x :: y :: zs} (using :: for cons). We start with
xs covering {[], x :: y :: zs}
and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting
[] covering {[]}
x :: ys covering {x :: y :: zs}
The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting.
x :: [] covering {}
x :: y :: zs covering {x :: y :: zs}
and we can see from the first subproblem that we're banjaxed.
The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with
xs covering {[], ys}
with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant.
[] covering {[], []}
x :: xs covering {y :: ys}
Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the :: case here).
So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns.
The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations.
Anyhow, that's the basic picture.
Here is some code from a non-expert. It shows what the problem looks like if you restrict your patterns to list constructors. In other words, the patterns can only be used with lists that contain lists. Here are some lists like that: [], [[]], [[];[]].
If you enable -rectypes in your OCaml interpreter, this set of lists has a single type: ('a list) as 'a.
type reclist = ('a list) as 'a
Here's a type for representing patterns that match against the reclist type:
type p = Nil | Any | Cons of p * p
To translate an OCaml pattern into this form, first rewrite using (::). Then replace []
with Nil, _ with Any, and (::) with Cons. So the pattern [] :: _ translates to
Cons (Nil, Any)
Here is a function that matches a pattern against a reclist:
let rec pmatch (p: p) (l: reclist) =
match p, l with
| Any, _ -> true
| Nil, [] -> true
| Cons (p', q'), h :: t -> pmatch p' h && pmatch q' t
| _ -> false
Here's how it looks in use. Note the use of -rectypes:
$ ocaml312 -rectypes
Objective Caml version 3.12.0
# #use "pat.ml";;
type p = Nil | Any | Cons of p * p
type reclist = 'a list as 'a
val pmatch : p -> reclist -> bool = <fun>
# pmatch (Cons(Any, Nil)) [];;
- : bool = false
# pmatch (Cons(Any, Nil)) [[]];;
- : bool = true
# pmatch (Cons(Any, Nil)) [[]; []];;
- : bool = false
# pmatch (Cons (Any, Nil)) [ [[]; []] ];;
- : bool = true
#
The pattern Cons (Any, Nil) should match any list of length 1, and it definitely seems to be working.
So then it seems fairly straightforward to write a function intersect that takes two patterns and returns a pattern that matches the intersection of what is matched by the two patterns. Since the patterns might not intersect at all, it returns None when there's no intersection and Some p otherwise.
let rec inter_exc pa pb =
match pa, pb with
| Nil, Nil -> Nil
| Cons (a, b), Cons (c, d) -> Cons (inter_exc a c, inter_exc b d)
| Any, b -> b
| a, Any -> a
| _ -> raise Not_found
let intersect pa pb =
try Some (inter_exc pa pb) with Not_found -> None
let intersectn ps =
(* Intersect a list of patterns.
*)
match ps with
| [] -> None
| head :: tail ->
List.fold_left
(fun a b -> match a with None -> None | Some x -> intersect x b)
(Some head) tail
As a simple test, intersect the pattern [_, []] against the pattern [[], _].
The former is the same as _ :: [] :: [], and so is Cons (Any, Cons (Nil, Nil)).
The latter is the same as [] :: _ :: [], and so is Cons (Nil, (Cons (Any, Nil)).
# intersect (Cons (Any, Cons (Nil, Nil))) (Cons (Nil, Cons (Any, Nil)));;
- : p option = Some (Cons (Nil, Cons (Nil, Nil)))
The result looks pretty right: [[], []].
It seems like this is enough to answer the question about overlapping patterns. Two patterns overlap if their intersection is not None.
For exhaustiveness you need to work with a list of patterns. Here is a function
exhaust that tests whether a given list of patterns is exhaustive:
let twoparts l =
(* All ways of partitioning l into two sets.
*)
List.fold_left
(fun accum x ->
let absent = List.map (fun (a, b) -> (a, x :: b)) accum
in
List.fold_left (fun accum (a, b) -> (x :: a, b) :: accum)
absent accum)
[([], [])] l
let unique l =
(* Eliminate duplicates from the list. Makes things
* faster.
*)
let rec u sl=
match sl with
| [] -> []
| [_] -> sl
| h1 :: ((h2 :: _) as tail) ->
if h1 = h2 then u tail else h1 :: u tail
in
u (List.sort compare l)
let mkpairs ps =
List.fold_right
(fun p a -> match p with Cons (x, y) -> (x, y) :: a | _ -> a) ps []
let rec submatches pairs =
(* For each matchable subset of fsts, return a list of the
* associated snds. A matchable subset has a non-empty
* intersection, and the intersection is not covered by the rest of
* the patterns. I.e., there is at least one thing that matches the
* intersection without matching any of the other patterns.
*)
let noncovint (prs, rest) =
let prs_firsts = List.map fst prs in
let rest_firsts = unique (List.map fst rest) in
match intersectn prs_firsts with
| None -> false
| Some i -> not (cover i rest_firsts)
in let pairparts = List.filter noncovint (twoparts pairs)
in
unique (List.map (fun (a, b) -> List.map snd a) pairparts)
and cover_pairs basepr pairs =
cover (fst basepr) (unique (List.map fst pairs)) &&
List.for_all (cover (snd basepr)) (submatches pairs)
and cover_cons basepr ps =
let pairs = mkpairs ps
in let revpair (a, b) = (b, a)
in
pairs <> [] &&
cover_pairs basepr pairs &&
cover_pairs (revpair basepr) (List.map revpair pairs)
and cover basep ps =
List.mem Any ps ||
match basep with
| Nil -> List.mem Nil ps
| Any -> List.mem Nil ps && cover_cons (Any, Any) ps
| Cons (a, b) -> cover_cons (a, b) ps
let exhaust ps =
cover Any ps
A pattern is like a tree with Cons in the internal nodes and Nil or Any at the leaves. The basic idea is that a set of patterns is exhaustive if you always reach Any in at least one of the patterns (no matter what the input looks like). And along the way, you need to see both Nil and Cons at each point. If you reach Nil at the same spot in all the patterns, it means there's a longer input that won't be matched by any of them. On the other hand, if you see just Cons at the same spot in all the patterns, there's an input that ends at that point that won't be matched.
The difficult part is checking for exhaustiveness of the two subpatterns of a Cons. This code works the way I do when I check by hand: it finds all the different subsets that could match at the left, then makes sure that the corresponding right subpatterns are exhaustive in each case. Then the same with left and right reversed. Since I'm a nonexpert (more obvious to me all the time), there are probably better ways to do this.
Here is a session with this function:
# exhaust [Nil];;
- : bool = false
# exhaust [Any];;
- : bool = true
# exhaust [Nil; Cons (Nil, Any); Cons (Any, Nil)];;
- : bool = false
# exhaust [Nil; Cons (Any, Any)];;
- : bool = true
# exhaust [Nil; Cons (Any, Nil); Cons (Any, (Cons (Any, Any)))];;
- : bool = true
I checked this code against 30,000 randomly generated patterns, and so I have some confidence that it's right. I hope these humble observations may prove to be of some use.
I believe the pattern sub-language is simple enough that it's easy to analyze. This is the reason for requiring patterns to be "linear" (each variable can appear only once), and so on. With these restrictions, every pattern is a projection from a kind of nested tuple space to a restricted set of tuples. I don't think it's too difficult to check for exhaustiveness and overlap in this model.

Ocaml Syntax Error

I'm using an implementation of lazy lists where the type can be either Nil or Cons (value, thunk), where thunk is a function from unit to the rest of the list.
I'm trying to write a function cross, which would function as List.combine does. Unfortunately, I'm having syntax errors.
open Sequence;;
let rec (cross : 'a Sequence.t -> 'b Sequence.t -> ('a * 'b) Sequence.t) = match seq1 with
Nil -> match seq2 with
Cons (value2, thunk2) -> Cons ((Nil, value2), function () -> (cross Nil (thunk2 ())))
| Cons (value1, thunk1) -> match seq2 with
Nil -> Cons ((value1, Nil), function() -> (cross Nil (thunk1 ())))
| Cons (value2, thunk2) -> Cons ((value1, value2), function() -> (cross (thunk1 ()) (thunk2 ())))
This produces the error:
Error: Unbound value seq1
what am I doing wrong?
UPDATE:
This type checks, but is not of the type I'm looking for.
let rec cross (seq1 : 'a Sequence.t) (seq2 : 'b Sequence.t) : ('a * 'b) Sequence.t = match seq1 with
Nil -> match seq2 with
Cons (value2, thunk2) -> Cons ((Nil, value2), function () -> (cross Nil (thunk2 ())))
| Cons (value1, thunk1) -> match seq2 with
Nil -> Cons ((value1, Nil), function() -> (cross Nil (thunk1 ())))
| Cons (value2, thunk2) -> Cons ((value1, value2), function() -> (cross (thunk1 ()) (thunk2 ())))
val cross :
'a Sequence.t Sequence.t ->
'a Sequence.t Sequence.t -> ('a Sequence.t * 'a Sequence.t) Sequence.t =
<fun>
This is not the type of cross that I want. I'm looking for:
'a Sequence.t -> 'b Sequence.t -> ('a * 'b) Sequence.t
you are going to kick yourself... where is seq1 defined?
let rec (cross : 'a Sequence.t -> 'b Sequence.t -> ('a * 'b) Sequence.t) =
You define the type of cross, but you don't bind the variables to anything (I guess, you can say that).
let rec cross (seq1:'a Sequence.t) (seq2:'a Sequence.t) :('a * 'b) Sequence.t =
EDIT:
I think your matching is well, mis-matched. Use begin ... end blocks around the cases, I think what is happening (and since I don't have Sequence, I cannot verify) is that the match cases you intend for the outer match are being applied to the inner one, matching seq2. for example,
match x with
| 0 -> match y with
| 1 -> "x:0, y:1"
| 2 -> match y with
| 0 -> "y:0, x:2"
Although, spatially, it looks fine, the second match, match y with is bound with the | 2 -> ... match case. Here is a version with the being ... end keywords surrounding the match cases. The second begin ... end isn't needed, but it's probably a good idea to do it anyway for clarity.
match x with
| 0 -> begin match y with
| 1 -> "x:0, y:1" end
| 2 -> begin match y with
| 0 -> "y:0, x:2" end
in the first line, you're trying to match against seq1, but this value is unbound meaning that it's nowhere to be found.
That's exactly the same as :
# let t =
match t1 with
_ -> ();;
Error: Unbound value t1
You have to name your arguments.
For your updated question, the reason the types are 'a Sequence.t Sequence.t is because of the line
Cons ((Nil, value2), ...)
Recall that Nil is a sequence itself, so by putting that there it forces all the elements of the input sequences to be sequences too.
This error occurs when you have mentioned a name which has not been defined (technically “bound to a
value”). This might happen if you have mistyped the name.

Resources