Very simple sexp parser - ruby

For an assignment, we had to implement something like a very basic sexp parser, such that for input like:
"((a b) ((c d) e) f)"
It would return:
[["a", "b"], [["c", "d"], "e"], "f"]
Since this was part of a larger assignment, the parser is only given valid input (matching parens &c). I came up with the following solution in Ruby:
def parse s, start, stop
tokens = s.scan(/#{Regexp.escape(start)}|#{Regexp.escape(stop)}|\w+/)
stack = [[]]
tokens.each do |tok|
case tok
when start
stack << []
when stop
stack[-2] << stack.pop
else
stack[-1] << tok
end
end
return stack[-1][-1]
end
Which may not be the best solution, but it does the job.
Now, I'm interested in an idiomatic Haskell solution for the core functionality (i.e. I don't care about the lexing or choice of delimiters, taking already lexed input would be fine), if possible using only "core" haskell, without extensions or libs like parsec.
Note that this is NOT part of the assignment, I'm just interested in the Haskell way of doing things.

[["a", "b"], [["c", "d"], "e"], "f"]
Does not have a valid type in haskell (because all elements of a list need to be of the same type in haskell), so you'll need to define your own datastructure for nested lists like this:
data NestedList = Value String | Nesting [NestedList]
Now if you have a list of Tokens where Token is defined as data Token = LPar | RPar | Symbol String, you can parse that into a NestedList like this:
parse = fst . parse'
parse' (LPar : tokens) =
let (inner, rest) = parse' tokens
(next, outer) = parse' rest
in
(Nesting inner : next, outer)
parse' (RPar : tokens) = ([], tokens)
parse' ((Symbol str) : tokens) =
let (next, outer) = parse' tokens in
(Value str : next, outer)
parse' [] = ([],[])

The idiomatic way in Haskell would be to use parsec, for combinator parsing.
There are lots of examples online, including,
This nice answer on SO.
Or here's another one.

While fancier parsers like Parsec are nice, you don't really need all that power
for this simple case. The classic way to parse is using the ReadS
type from the Prelude. That is also the way you would give your Sexp type a
Read instance.
It's good to be at least a little familiar with this style of
parsing, because there are quite a few examples of it in
the standard libraries.
Here's one simple solution, in the classic style:
import Data.Char (isSpace)
data Sexp = Atom String | List [Sexp]
deriving (Eq, Ord)
instance Show Sexp where
show (Atom a ) = a
show (List es) = '(' : unwords (map show es) ++ ")"
instance Read Sexp where
readsPrec n (c:cs) | isSpace c = readsPrec n cs
readsPrec n ('(':cs) = [(List es, cs') |
(es, cs') <- readMany n cs]
readsPrec _ (')':_) = error "Sexp: unmatched parens"
readsPrec _ cs = let (a, cs') = span isAtomChar cs
in [(Atom a, cs')]
readMany :: Int -> ReadS [Sexp]
readMany _ (')':cs) = [([], cs)]
readMany n cs = [(e : es, cs'') | (e, cs') <- readsPrec n cs,
(es, cs'') <- readMany n cs']
isAtomChar :: Char -> Bool
isAtomChar '(' = False
isAtomChar ')' = False
isAtomChar c = not $ isSpace c
Note that the Int parameter to readsPrec,
which usually indicates operator precedence, is not
used here.

Related

Find first unique character in a string in Elixir [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Improve this question
Given a string that contains only lowercase English letters, I am writing an Elixir function that finds the first non-repeating character in it and returns its index or else -1.
Examples:
s = "leetcode"
should return 0 because "l" is the first character that does not repeat and the zero-based index is 0
s = "loveleetcode"
should return 2 because v is the first character that does not repeat and the zero-based index is 2
The following is my solution so far, can you make it better or fix it?
defmodule Algos do
def first_unique_char_index(str) do
arr = String.split(str, "", trim: true)
indexes = Enum.with_index(arr)
first = Enum.frequencies(arr)
|> Map.to_list
|> Enum.sort(fn ({a,_b}, {c,_d}) ->
{_char1, i1} = Enum.find(indexes, (fn {x,_i} -> x == a end))
{_char2, i2} = Enum.find(indexes, (fn {y,_j} -> y == c end))
i1 <= i2
end)
|> Enum.find(fn {_char, num} -> num == 1 end)
case first do
{char, _num} ->
result = Enum.find(indexes, fn {x, _i} -> char == x end)
{_letter, index} = result
index
nil ->
-1
end
end
end
Algos.first_unique_char_index("aabcc") # returns 2
Algos.first_unique_char_index("picadillo") # returns 0
Algos.first_unique_char_index("dood") # returns -1
As a sindenote, the problem is from the "first unique character in a string" LeetCode puzzle.
The below is probably the most performant solution; I decided to put it here because it reveals several interesting tricks.
"leetcode"
|> to_charlist()
|> Enum.with_index() # we need index to compare by
|> Enum.reduce(%{}, fn {e, i}, acc ->
# trick for the future: `:many > idx` for any integer `idx` :)
Map.update(acc, e, {e, i}, &{elem(&1, 0), :many})
end)
|> Enum.sort_by(&elem(elem(&1, 1), 1)) # sort to get a head
|> case do
[{_, {_, :many}} | _] -> "All dups"
[{_, {result, index}} | _] -> {<<result>>, index}
_ -> "Empty input"
end
#⇒ {"l", 0}
This is a good little puzzle, and one that could be solved via a couple accumulators. Instead of splitting the string, you could work with the internal binary representation, or (in order to skip the extra complexity involved with encoding) you could convert the string to a character list and focus on the integer components.
Here's a possible solution (not thoroughly tested):
defmodule FirstUniq do
def char(string) do
[first_char | rest] = to_charlist(string)
eval_char(first_char, 0, rest, rest)
end
# Case where we hit the end of the string without a duplicate!
defp eval_char(_char, index, [], _), do: index
# Case where a character repeats... increment the index and eval next char
defp eval_char(char, index, [x | _], [next_char | rest]) when char == x do
eval_char(next_char, index + 1, rest, rest)
end
# Case where the character does not repeat: keep looking
defp eval_char(char, index, [x | rest], acc2) when char != x do
eval_char(char, index, rest, acc2)
end
end
# should be 0 (because "l" does not occur more than once)
IO.puts(FirstUniq.char("leetcode"))
# should be 2 (because "v" is the first char that does not repeat)
IO.puts(FirstUniq.char("loveleetcode"))
The hard work is done by the eval_char/4 function, whose multiple clauses act something like a case statement. The trick is we have to keep two accumulators, which is analogous to having a nested loop.
I would recommend Exercism's Elixir Track for presenting many of the common patterns that you'll encounter in the language.

debugging a Haskell application

After learning few basics I wanted to try a "real world application" in Haskell, started with a Bittorrent client. Following through the explanation from this blog post, I did NOT use the Attoparsec parser combinator library. Instead following through Huttons book, I started writing the Parser Combinators. This is the code that I have so far (Still at the parsing stage, a long journey ahead):
module Main where
import System.Environment (getArgs)
import qualified Data.Map as Map
import Control.Monad (liftM, ap)
import Data.Char (isDigit, isAlpha, isAlphaNum, ord)
import Data.List(foldl')
main :: IO ()
main = do
[fileName] <- getArgs
contents <- readFile fileName
download . parse $ contents
parse :: String -> Maybe BenValue
parse s = case runParser value s of
[] -> Nothing
[(p, _)] -> Just p
download :: Maybe BenValue -> IO ()
download (Just p) = print p
download _ = print "Oh!! Man!!"
data BenValue = BenString String
| BenNumber Integer
| BenList [BenValue]
| BenDict (Map.Map String BenValue)
deriving(Show, Eq)
-- From Hutton, this follows: a Parser is a function
-- that takes a string and returns a list of results
-- each containing a pair : a result of type a and
-- an output string. (the string is the unconsumed part of the input).
newtype Parser a = Parser (String -> [(a, String)])
-- Unit takes a value and returns a Parser (a function)
unit :: a -> Parser a
unit v = Parser (\inp -> [(v, inp)])
failure :: Parser a
failure = Parser (\inp -> [])
one :: Parser Char
one = Parser $ \inp -> case inp of
[] -> []
(x: xs) -> [(x, xs)]
runParser :: Parser a -> String -> [(a, String)]
runParser (Parser p) inp = p inp
bind :: Parser a -> (a -> Parser b) -> Parser b
bind (Parser p) f = Parser $ \inp -> case p inp of
[] -> []
[(v, out)] -> runParser (f v) out
instance Monad Parser where
return = unit
p >>= f = bind p f
instance Applicative Parser where
pure = unit
(<*>) = ap
instance Functor Parser where
fmap = liftM
choice :: Parser a -> Parser a -> Parser a
choice p q = Parser $ \inp -> case runParser p inp of
[] -> runParser q inp
x -> x
satisfies :: (Char -> Bool) -> Parser Char
satisfies p = do
x <- one
if p x
then unit x
else failure
digit :: Parser Char
digit = satisfies isDigit
letter :: Parser Char
letter = satisfies isAlpha
alphanum :: Parser Char
alphanum = satisfies isAlphaNum
char :: Char -> Parser Char
char x = satisfies (== x)
many :: Parser a -> Parser [a]
many p = choice (many1 p) (unit [])
many1 :: Parser a -> Parser [a]
many1 p = do
v <- p
vs <- many p
unit (v:vs)
peek :: Parser Char
peek = Parser $ \inp -> case inp of
[] -> []
v#(x:xs) -> [(x, v)]
taken :: Int -> Parser [Char]
taken n = do
if n > 0
then do
v <- one
vs <- taken (n-1)
unit (v:vs)
else unit []
takeWhile1 :: (Char -> Bool) -> Parser [Char]
takeWhile1 pred = do
v <- peek
if pred v
then do
one
vs <- takeWhile1 pred
unit (v:vs)
else unit []
decimal :: Integral a => Parser a
decimal = foldl' step 0 `fmap` takeWhile1 isDigit
where step a c = a * 10 + fromIntegral (ord c - 48)
string :: Parser BenValue
string = do
n <- decimal
char ':'
BenString <$> taken n
signed :: Num a => Parser a -> Parser a
signed p = (negate <$> (char '-' *> p) )
`choice` (char '+' *> p)
`choice` p
number :: Parser BenValue
number = BenNumber <$> (char 'i' *> (signed decimal) <* char 'e')
list :: Parser BenValue
list = BenList <$> (char 'l' *> (many value) <* char 'e')
dict :: Parser BenValue
dict = do
char 'd'
pair <- many ((,) <$> string <*> value)
char 'e'
let pair' = (\(BenString s, v) -> (s,v)) <$> pair
let map' = Map.fromList pair'
unit $ BenDict map'
value = string `choice` number `choice` list `choice` dict
The above is a mix of code read/understood from the source code of the three sources the blog, the library, and the book. the download function just prints the "parse tree", obtained from the parser, Once I get the parser working will fill in the download function and test it out.
The parser is NOT working on few of the torrent files. :( There is definitely chance that I might have used code from the references incorrectly. And would like to know if there is anything obvious.
It works on "toy" examples and also on the test file picked from combinatorrent
When I pick a real world torrent like the Debian/Ubuntu etc, this fails.
I would like to debug and see what is happening, debugging with GHCI does not seem straight forward, I've tried :trace / :history style debugging mentioned in this document, but looks very primitive :-) .
My question to the experts out there is: "how to debug!!" :-)
Would really appreciate any hints on approaching how to debug this.
Thanks.
Because Haskell code is pure, "stepping" through it is less essential than in other languages. When I step through some Java code, I am often trying to see where a certain variable gets changed. That is obviously a non-issue in Haskell given that things are immutable.
That means we can also run snippets of code in GHCi to debug what is happening without worrying that what we run is going to change some global state, or what we run will work any differently than how it would if called deep inside our program. This mode of work benefits from iterating your design slowly building it to work on the full range of expected inputs.
Parsing is always a bit unpleasant - even in imperative languages. Nobody wants to run a parser just to get back Nothing - you want to know why you got back nothing. To that effect, most parsers libraries are helpful in giving you some information about what went wrong. That is a point for using a parser like attoparsec. Also, attoparsec works with ByteString by default - perfect for binary data. If you want to roll your own parser implementation, you'll have to debug it too.
Finally, based on your comments, it looks like you are having issues with character encodings. This is exactly the reason why we have ByteString - it represents a packed sequence of bytes - no encodings. The extension OverloadedStrings even makes it pretty easy to make ByteString literals that look just like regular strings.

Sort a String list by String length

I want to sort a list of String first by the length of the strings, and if the length is the same then it should sort lexically. I thought I could use the Data.List library and write my own compare function that does that. So the compare function should take a list of String as the argument and compare all the the elements (which are Strings). A compare function for Strings would look like this
comp a b
| length a > length b = GT
| length a < length b = LT
How could I address all the list elements with such a function?
First of all, your cmp function does not handle the case where the lengths are equal: you need to add that. Otherwise you'll get an runtime pattern match error:
comp a b
| length a > length b = GT
| length a < length b = LT
| otherwise = undefined -- TODO
also, note that this implementation sometimes computes the length twice, but it's likely that GHC optimizes this one away on its own, and we'll get to solving this later on more fundamentally anyway.
Then, once you've fixed your comp, all you need to do is pass it to Data.List.sortBy together with the list of strings you want to sort. An ipmplementation like that is provided below (<$> is the operator alias of fmap which works the same as map does on lists).
However, there's a better solution where you first compute the length of all elements in the list, by mapping each of the elements into a pair where the first member is the original string and the second one is its length. You then use a modified comp function that takes 2 pairs instead of just 2 strings, but otherwise behaves the same as your original comp. However, you then need to map the intermediate list back to just containing the strings (which is what the fst <$> is for, which is equivalent to map fst but, again, uses the, IMO nicer looking, <$> opetator).
So the somewhat naive solution would be:
sortByLenOrLex :: [String] -> [String]
sortByLenOrLex as = sortBy cmp as where
cmp a b | n > m = GT
| n < m = LT
| otherwise = compare a b
where n = length a
m = length b
and the more efficient one, as leftaroundabout points out, would be:
sortByLenOrLex' :: [String] -> [String]
sortByLenOrLex' as = fst <$> sortBy cmp (addLen <$> as) where
cmp (a,n) (b,m) | n > m = GT
| n < m = LT
| otherwise = compare a b
addLen x = (x, length x)
where the list is first amended with the lengths of each of its elements, so as to avoid duplicate, expensive length calls.
EDIT: please see chi's answer for a much nicer implementation of this algorithm!
Furthermore:
You can make your functions generic by making them operate on lists of lists of Ord:
sortByLenOrLex'' :: Ord a => [[a]] -> [[a]]
sortByLenOrLex'' as = fst <$> sortBy cmp (addLen <$> as) where
cmp (a,n) (b,m) | n > m = GT
| n < m = LT
| otherwise = compare a b
addLen x = (x, length x)
this gives you:
*Main> sortByLenOrLex'' [[1,2], [1,3], [1,2,3]]
[[1,2],[1,3],[1,2,3]]
...and if you want to make it as generic as possible, you can sort lists of Foldable of Ord:
sortByLenOrLex''' :: (Foldable f, Ord a) => [f a] -> [f a]
sortByLenOrLex''' as = unamend <$> sortBy cmp (amend <$> as) where
cmp (a,n,a') (b,m,b') | n > m = GT
| n < m = LT
| otherwise = compare a' b'
amend x = (x, length x, toList x)
unamend (x,_,_) = x
this gives you:
*Main> sortByLenOrLex''' [Just 3, Just 4, Just 3, Nothing]
[Nothing,Just 3,Just 3,Just 4]
*Main> sortByLenOrLex''' [(4,1),(1,1),(1,2),(1,1),(3,1)]
[(4,1),(1,1),(1,1),(3,1),(1,2)]
*Main> sortByLenOrLex''' [Left "bla", Right "foo", Right "foo", Right "baz"]
[Left "bla",Right "baz",Right "foo",Right "foo"]
*Main> sortByLenOrLex''' [(3,"hello"),(2,"goodbye"),(1,"hello")]
[(2,"goodbye"),(3,"hello"),(1,"hello")]
A variant of #Erik's solution, using some combinators from the library:
import Data.List
import Control.Arrow
sortByLen = map snd . sort . map (length &&& id)
This is essentially a Schwartzian transform.

Is it possible to debug pattern matching in a Haskell function?

I have defined a type
data Expr =
Const Double
| Add Expr Expr
| Sub Expr Expr
and declared it as an instance of Eq typeclass:
instance Eq Expr where
(Add (Const a1) (Const a2)) == Const b = a1+a2 == b
(Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = a1+a2 == b1 + b2
Of course, the evaluation of the expression Sub (Const 1) (Const 1) == Const 0 will fail. How can I debug at runtime the pattern matching process to spot that it's failing? I would like to see how Haskell takes the arguments of == and walks through the patterns. Is it possible at all?
edit: providing a real answer to the question...
I find the easiest way to see what patterns are matching is to add trace statements, like so:
import Debug.Trace
instance Eq Expr where
(Add (Const a1) (Const a2)) == Const b = trace "Expr Eq pat 1" $ a1+a2 == b
(Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = trace "Expr Eq pat 2" $ a1+a2 == b1 + b2
-- catch any unmatched patterns
l == r = error $ "Expr Eq failed pattern match. \n l: " ++ show l ++ "\n r: " ++ show r
If you don't include a final statement to catch any otherwise unmatched patterns, you'll get a runtime exception, but I find it's more useful to see what data you're getting. Then it's usually simple to see why it doesn't match the previous patterns.
Of course you don't want to leave this in production code. I only insert traces as necessary then remove them when I've finished. You could also use CPP to leave them out of production builds.
I also want to say that I think pattern matching is the wrong way to go about this. You'll end up with a combinatorial explosion in the number of patterns, which quickly grows unmanageable. If you want to make a Float instance for Expr for example, you'll need several more primitive constructors.
Instead, you presumably have an interpreter function interpret :: Expr -> Double, or at least could write one. Then you can define
instance Eq Expr where
l == r = interpret l == interpret r
By pattern matching, you're essentially re-writing your interpret function in the Eq instance. If you want to make an Ord instance, you'll end up re-writing the interpret function yet again.
If you wish to get some examples on how the matching may fail, you could have a look at QuickCheck. There's an example on the manual (the size of test data) about generating and testing recursive data types that seems to perfectly suit your needs.
While the -Wall flag gives you a list of patterns non matched, a run of QuickCheck gives you examples of input data that lead your given proposition to failure.
For example, if I write a generator for your Expr and I give in input to quickCheck a proposition prop_expr_eq :: Expr -> Bool that checks if an Expr is equal to itself, I obtain very quickly Const 0.0 as a first example of non-matching input.
import Test.QuickCheck
import Control.Monad
data Expr =
Const Double
| Add Expr Expr
| Sub Expr Expr
deriving (Show)
instance Eq Expr where
(Add (Const a1) (Const a2)) == Const b = a1+a2 == b
(Add (Const a1) (Const a2)) == (Add (Const b1) (Const b2)) = a1+a2 == b1 + b2
instance Arbitrary Expr where
arbitrary = sized expr'
where
expr' 0 = liftM Const arbitrary
expr' n | n > 0 =
let subexpr = expr' (n `div` 2)
in oneof [liftM Const arbitrary,
liftM2 Add subexpr subexpr,
liftM2 Sub subexpr subexpr]
prop_expr_eq :: Expr -> Bool
prop_expr_eq e = e == e
As you see, running the test gives you a counterexample to prove that your equality test is wrong. I know this may be a little bit an overkill, but the advantage if you write things good is that you also get unit tests for your code that look at arbitrary properties, not only pattern matching exhaustiveness.
*Main> quickCheck prop_expr_eq
*** Failed! Exception: 'test.hs:(11,5)-(12,81): Non-exhaustive patterns in function ==' (after 1 test):
Const 0.0
PS: Another good reading about unit testing with QuickCheck is in the free book real world haskell.
You can break your complex pattern into simpler patterns and use trace to see what's going on. Something like this:
instance Eq Expr where
x1 == x2 | trace ("Top level: " ++ show (x, y1)) True,
Add x11 x12 <- x1,
trace ("First argument Add: " ++ show (x11, x12)) True,
Const a1 <- x11,
trace ("Matched first Const: " ++ show a1) True,
Const a2 <- x12,
trace ("Matched second Const: " ++ show a2) True,
Const b <- x2
trace ("Second argument Const: " ++ show b) True
= a1+a2 == b
It's a bit desperate, but desperate times calls for desperate measures. :)
As you get used to Haskell you rarely, if ever, need to do this.

Algorithm for type checking ML-like pattern matching?

How do you determine whether a given pattern is "good", specifically whether it is exhaustive and non-overlapping, for ML-style programming languages?
Suppose you have patterns like:
match lst with
x :: y :: [] -> ...
[] -> ...
or:
match lst with
x :: xs -> ...
x :: [] -> ...
[] -> ...
A good type checker would warn that the first is not exhaustive and the second is overlapping. How would the type checker make those kinds of decisions in general, for arbitrary data types?
Here's a sketch of an algorithm. It's also the basis of Lennart Augustsson's celebrated technique for compiling pattern matching efficiently. (The paper is in that incredible FPCA proceedings (LNCS 201) with oh so many hits.) The idea is to reconstruct an exhaustive, non-redundant analysis by repeatedly splitting the most general pattern into constructor cases.
In general, the problem is that your program has a possibly empty bunch of ‘actual’ patterns {p1, .., pn}, and you want to know if they cover a given ‘ideal’ pattern q. To kick off, take q to be a variable x. The invariant, initially satisfied and subsequently maintained, is that each pi is σiq for some substitution σi mapping variables to patterns.
How to proceed. If n=0, the bunch is empty, so you have a possible case q that isn't covered by a pattern. Complain that the ps are not exhaustive. If σ1 is an injective renaming of variables, then p1 catches every case that matches q, so we're warm: if n=1, we win; if n>1 then oops, there's no way p2 can ever be needed. Otherwise, we have that for some variable x, σ1x is a constructor pattern. In that case split the problem into multiple subproblems, one for each constructor cj of x's type. That is, split the original q into multiple ideal patterns qj = [x:=cj y1 .. yarity(cj)]q, and refine the patterns accordingly for each qj to maintain the invariant, dropping those that don't match.
Let's take the example with {[], x :: y :: zs} (using :: for cons). We start with
xs covering {[], x :: y :: zs}
and we have [xs := []] making the first pattern an instance of the ideal. So we split xs, getting
[] covering {[]}
x :: ys covering {x :: y :: zs}
The first of these is justified by the empty injective renaming, so is ok. The second takes [x := x, ys := y :: zs], so we're away again, splitting ys, getting.
x :: [] covering {}
x :: y :: zs covering {x :: y :: zs}
and we can see from the first subproblem that we're banjaxed.
The overlap case is more subtle and allows for variations, depending on whether you want to flag up any overlap, or just patterns which are completely redundant in a top-to-bottom priority order. Your basic rock'n'roll is the same. E.g., start with
xs covering {[], ys}
with [xs := []] justifying the first of those, so split. Note that we have to refine ys with constructor cases to maintain the invariant.
[] covering {[], []}
x :: xs covering {y :: ys}
Clearly, the first case is strictly an overlap. On the other hand, when we notice that refining an actual program pattern is needed to maintain the invariant, we can filter out those strict refinements that become redundant and check that at least one survives (as happens in the :: case here).
So, the algorithm builds a set of ideal exhaustive overlapping patterns q in a way that's motivated by the actual program patterns p. You split the ideal patterns into constructor cases whenever the actual patterns demand more detail of a particular variable. If you're lucky, each actual pattern is covered by disjoint nonempty sets of ideal patterns and each ideal pattern is covered by just one actual pattern. The tree of case splits which yield the ideal patterns gives you the efficient jump-table driven compilation of the actual patterns.
The algorithm I've presented is clearly terminating, but if there are datatypes with no constructors, it can fail to accept that the empty set of patterns is exhaustive. This is a serious issue in dependently typed languages, where exhaustiveness of conventional patterns is undecidable: the sensible approach is to allow "refutations" as well as equations. In Agda, you can write (), pronounced "my Aunt Fanny", in any place where no constructor refinement is possible, and that absolves you from the requirement to complete the equation with a return value. Every exhaustive set of patterns can be made recognizably exhaustive by adding in enough refutations.
Anyhow, that's the basic picture.
Here is some code from a non-expert. It shows what the problem looks like if you restrict your patterns to list constructors. In other words, the patterns can only be used with lists that contain lists. Here are some lists like that: [], [[]], [[];[]].
If you enable -rectypes in your OCaml interpreter, this set of lists has a single type: ('a list) as 'a.
type reclist = ('a list) as 'a
Here's a type for representing patterns that match against the reclist type:
type p = Nil | Any | Cons of p * p
To translate an OCaml pattern into this form, first rewrite using (::). Then replace []
with Nil, _ with Any, and (::) with Cons. So the pattern [] :: _ translates to
Cons (Nil, Any)
Here is a function that matches a pattern against a reclist:
let rec pmatch (p: p) (l: reclist) =
match p, l with
| Any, _ -> true
| Nil, [] -> true
| Cons (p', q'), h :: t -> pmatch p' h && pmatch q' t
| _ -> false
Here's how it looks in use. Note the use of -rectypes:
$ ocaml312 -rectypes
Objective Caml version 3.12.0
# #use "pat.ml";;
type p = Nil | Any | Cons of p * p
type reclist = 'a list as 'a
val pmatch : p -> reclist -> bool = <fun>
# pmatch (Cons(Any, Nil)) [];;
- : bool = false
# pmatch (Cons(Any, Nil)) [[]];;
- : bool = true
# pmatch (Cons(Any, Nil)) [[]; []];;
- : bool = false
# pmatch (Cons (Any, Nil)) [ [[]; []] ];;
- : bool = true
#
The pattern Cons (Any, Nil) should match any list of length 1, and it definitely seems to be working.
So then it seems fairly straightforward to write a function intersect that takes two patterns and returns a pattern that matches the intersection of what is matched by the two patterns. Since the patterns might not intersect at all, it returns None when there's no intersection and Some p otherwise.
let rec inter_exc pa pb =
match pa, pb with
| Nil, Nil -> Nil
| Cons (a, b), Cons (c, d) -> Cons (inter_exc a c, inter_exc b d)
| Any, b -> b
| a, Any -> a
| _ -> raise Not_found
let intersect pa pb =
try Some (inter_exc pa pb) with Not_found -> None
let intersectn ps =
(* Intersect a list of patterns.
*)
match ps with
| [] -> None
| head :: tail ->
List.fold_left
(fun a b -> match a with None -> None | Some x -> intersect x b)
(Some head) tail
As a simple test, intersect the pattern [_, []] against the pattern [[], _].
The former is the same as _ :: [] :: [], and so is Cons (Any, Cons (Nil, Nil)).
The latter is the same as [] :: _ :: [], and so is Cons (Nil, (Cons (Any, Nil)).
# intersect (Cons (Any, Cons (Nil, Nil))) (Cons (Nil, Cons (Any, Nil)));;
- : p option = Some (Cons (Nil, Cons (Nil, Nil)))
The result looks pretty right: [[], []].
It seems like this is enough to answer the question about overlapping patterns. Two patterns overlap if their intersection is not None.
For exhaustiveness you need to work with a list of patterns. Here is a function
exhaust that tests whether a given list of patterns is exhaustive:
let twoparts l =
(* All ways of partitioning l into two sets.
*)
List.fold_left
(fun accum x ->
let absent = List.map (fun (a, b) -> (a, x :: b)) accum
in
List.fold_left (fun accum (a, b) -> (x :: a, b) :: accum)
absent accum)
[([], [])] l
let unique l =
(* Eliminate duplicates from the list. Makes things
* faster.
*)
let rec u sl=
match sl with
| [] -> []
| [_] -> sl
| h1 :: ((h2 :: _) as tail) ->
if h1 = h2 then u tail else h1 :: u tail
in
u (List.sort compare l)
let mkpairs ps =
List.fold_right
(fun p a -> match p with Cons (x, y) -> (x, y) :: a | _ -> a) ps []
let rec submatches pairs =
(* For each matchable subset of fsts, return a list of the
* associated snds. A matchable subset has a non-empty
* intersection, and the intersection is not covered by the rest of
* the patterns. I.e., there is at least one thing that matches the
* intersection without matching any of the other patterns.
*)
let noncovint (prs, rest) =
let prs_firsts = List.map fst prs in
let rest_firsts = unique (List.map fst rest) in
match intersectn prs_firsts with
| None -> false
| Some i -> not (cover i rest_firsts)
in let pairparts = List.filter noncovint (twoparts pairs)
in
unique (List.map (fun (a, b) -> List.map snd a) pairparts)
and cover_pairs basepr pairs =
cover (fst basepr) (unique (List.map fst pairs)) &&
List.for_all (cover (snd basepr)) (submatches pairs)
and cover_cons basepr ps =
let pairs = mkpairs ps
in let revpair (a, b) = (b, a)
in
pairs <> [] &&
cover_pairs basepr pairs &&
cover_pairs (revpair basepr) (List.map revpair pairs)
and cover basep ps =
List.mem Any ps ||
match basep with
| Nil -> List.mem Nil ps
| Any -> List.mem Nil ps && cover_cons (Any, Any) ps
| Cons (a, b) -> cover_cons (a, b) ps
let exhaust ps =
cover Any ps
A pattern is like a tree with Cons in the internal nodes and Nil or Any at the leaves. The basic idea is that a set of patterns is exhaustive if you always reach Any in at least one of the patterns (no matter what the input looks like). And along the way, you need to see both Nil and Cons at each point. If you reach Nil at the same spot in all the patterns, it means there's a longer input that won't be matched by any of them. On the other hand, if you see just Cons at the same spot in all the patterns, there's an input that ends at that point that won't be matched.
The difficult part is checking for exhaustiveness of the two subpatterns of a Cons. This code works the way I do when I check by hand: it finds all the different subsets that could match at the left, then makes sure that the corresponding right subpatterns are exhaustive in each case. Then the same with left and right reversed. Since I'm a nonexpert (more obvious to me all the time), there are probably better ways to do this.
Here is a session with this function:
# exhaust [Nil];;
- : bool = false
# exhaust [Any];;
- : bool = true
# exhaust [Nil; Cons (Nil, Any); Cons (Any, Nil)];;
- : bool = false
# exhaust [Nil; Cons (Any, Any)];;
- : bool = true
# exhaust [Nil; Cons (Any, Nil); Cons (Any, (Cons (Any, Any)))];;
- : bool = true
I checked this code against 30,000 randomly generated patterns, and so I have some confidence that it's right. I hope these humble observations may prove to be of some use.
I believe the pattern sub-language is simple enough that it's easy to analyze. This is the reason for requiring patterns to be "linear" (each variable can appear only once), and so on. With these restrictions, every pattern is a projection from a kind of nested tuple space to a restricted set of tuples. I don't think it's too difficult to check for exhaustiveness and overlap in this model.

Resources