Quicksort proof using Coq - sorting

I am writing a thesis on program verification of the quicksort algorithm using the Coq system. I have defined a quicksort in Coq but my supervisor and myself arn't very comfortable writing the actual proof using tactics. Is there anyone that can help with that section of the coq proof? The following is what we have come up with so far:
Inductive nat : Type :=
| O : nat
| S : nat -> nat.
Check (S (S (S (S O)))).
Definition isZero (n:nat) : bool :=
match n with
O => true
|S p => false
end.
Inductive List: Set :=
| nil: List
| cons: nat -> List -> List.
Fixpoint Concat (L R: List) : List :=
match L with
| nil => R
| cons l ls => cons l (Concat ls R)
end.
Fixpoint Less (n m:nat) :=
match m with
O => false
|S q => match n with
O => true
|S p => Less p q
end
end.
Fixpoint Lesseq (n m:nat) :=
match n with
O => true
|S p => match m with
O => false
|S q => Lesseq p q
end
end.
Fixpoint Greatereq (n m:nat) :=
match n with
O => true
|S p => match m with
O => true
|S q => Greatereq p q
end
end.
Fixpoint Allless (l:List) (n:nat) : List :=
match l with
nil => nil
|cons m ls => match Less n m with
false => Allless ls n
|true => cons m (Allless ls n)
end
end.
Fixpoint Allgeq (l:List) (n:nat) : List :=
match l with
nil => nil
|cons m ls => match Greatereq n m with
false => Allgeq ls n
|true => cons m (Allgeq ls n)
end
end.
Fixpoint qaux (n:nat) (l:List) : List :=
match n with
O => nil
|S p => match l with
nil => nil
|cons m ls => let low := Allless ls m in
(let high := Allgeq ls m in
Concat (qaux p low) (cons m (qaux p high)))
end
end.
Fixpoint length (l:List) : nat :=
match l with
nil => O
|cons m ls => S (length ls)
end.
Fixpoint Quicksort (l:List) : List := qaux (length l) l.
I know for a proof to work we need a lemma or a theorem but then I am not sure where to start after that. Thanks for the help :)

There are many nice theorems you can prove about your code.
Define a function pos that maps a number and a list to the index of the number in the list.
Th 1: For all lists S, and a, b in S, (a <= b) <-> (pos a (sort S)) <= (pos b (sort S)). This would be a correctness theorem for the sort function.
Th 2: (sort (sort S)) = sort S
Define the functions min and max to find the minimum and maximum elements of the list S.
Th 3: The pos of the minimal element of the sorted list is 0.
Th 4: The pos of the maximal element of the reverse of the sorted list is 0.
Abstract a predicate out of your sort routine, so that you can pass it as an argument.
Th 5: Show that (sort <= S) = (reverse (sort >= S))
etc. You can continue this ad infinitum. It's lots of fun. :-)

View your problem as a problem of "symbolic testing". Write a function that tests that your output is correct, and then show that all combinations of your initial code and your testing function work as intended.
Here is my favorite testing function for a sorting algorithm on your datatype.
Fixpoint sorted (l : List) : bool :=
match l with cons a l' =>
match l' with cons b l'' =>
if Lesseq a b then sorted l' else false
| nil => true
end
| nil => true
end.
then you can start a proof in the following way:
Lemma Quicksort_sorted : forall l, sorted (Quicksort l) = true.
But you will have to prove many intermediate lemmas before getting to proving the main one. So formal proof is really like testing, except that you ensure full coverage of the test.

Related

Coq: defining more than two mutually recursive functions on inductive type

I am defining three mutually recursive functions on inductive type event, using two different ways: using with and fix keywords, however, Coq complains about principal argument and The reference ... was not found in ..., respectively.
Two implementations of the functions are the following:
Require Import List.
Parameter max: list nat -> nat.
Inductive event : Type := node: list event -> event.
Parameter eventbool : forall (P:event->Prop) (e:event), {P e} + {~ P e}.
Definition event_sumbool_b (e: event) (p: event -> Prop) : bool :=
if eventbool p e then true else false.
Fixpoint parentround (e: event) : nat :=
match e with
| node l => max (rounds l)
end
with rounds l :=
match l with
| nil => 0::nil
| h::tl => round h:: rounds tl
end
with round e :=
if event_sumbool_b e roundinc then parentround e + 1 else parentround e
with roundinc e :=
exists S:list event, (forall y, List.In y S /\ round y = parentround e).
Coq complains Recursive call to round has principal argument equal to "h" instead of "tl". Even though, h in calling round is a sub-term of e. Following the idea in Recursive definitions over an inductive type, I replaced parentround with the following:
Fixpoint parentround (e: event) : nat :=
let round :=
( fix round (e: event) : nat :=
if event_sumbool_b e roundinc then parentround e + 1 else parentround e
) in
let roundinc :=
( fix roundinc (e: event) : Prop :=
exists S:list event, (forall y, List.In y S /\ round y = parentround e)
) in
match e with
| node l => max (rounds l)
end
with rounds l :=
match l with
| nil => 0::nil
| h::tl => round h:: rounds tl
end.
It now complains about missing definition of roundinc.
Please help me defining these three mutually recursive functions parentround, round and roundinc.
EDIT There is a fourth function rounds in the definition, however, it is not problematic so far.
It was a bit difficult for me to figure out what you would like to compute, I'd start with something like:
From mathcomp Require Import ssreflect ssrbool eqtype ssrnat seq choice fintype.
From mathcomp Require Import bigop.
Definition max_seq l := \max_(x <- l) x.
Inductive event : Type :=
node : seq event -> event.
Fixpoint round_lvl (e : event) : nat :=
(* XXX: Missing condition *)
let cond_inc x := round_lvl x in
match e with
| node l => max_seq (map cond_inc l)
end.
But I had trouble parsing what the existence condition means. I suggest you try to express the problem without Coq code first, and maybe from that a solution pops.
From my experience, it is really hard to define mutual function on a type T and on list T. I recall a post on Coq-club about this subjet, I have to find it again.
The "easy" solution is to define a mutual inductive type where you define event and event_list at the same time. However, you won't be able to use the list library...

In Idris, how to add 1 to a Fin until a "max" is Reached

I have a data type which takes a number in constructor, and this number MUST be between 1 and 5 (represented as 0..4):
import Data.Fin
data Stars = MkStars (Fin 5)
I want to create a function that add one to an existing star, and if it's already a 5 stars, doesn't do anything
I tried
addOneStar: Stars -> Stars
addOneStar (MkStars FZ) = MkStars (FS FZ)
addOneStar (MkStars (FS x)) = if x < 3
then MkStars (FS (FS x))
else MkStars (FS x)
But it doesn't compile with the error :
Type mismatch between
Fin 4 (Type of x)
and
Fin 3 (Expected type)
Specifically:
Type mismatch between
1
and
0
Can someone explain to me why is there this error ?
And How to fix it ?
Cactus' answer explains the problem and gives one possible solution. Since we have dependent types in Idris, though, I would at least advertise the possibility of a solution that makes use of that and could thus be seen as more idiomatic:
nextFin : (m : Fin (S n)) -> {auto p : (toNat m) `LT` n} -> Fin (S n)
nextFin {n = Z} FZ {p} = absurd $ succNotLTEzero p
nextFin {n = Z} (FS FZ) impossible
nextFin {n = S k} FZ = FS FZ
nextFin {n = S k} (FS l) {p} = let p = fromLteSucc p in
FS $ nextFin l
This function takes as argument a proof that the given Fin is not the last one. This way you can be sure on the level of the type checker that the program never even tries to give you a nextFin for that.
If you don't want that, but rather something akin to the cited answer, you can also use strengthen together with with rules to avoid most of the cases:
nextFin : Fin (S n) -> Maybe $ Fin (S n)
nextFin m with (strengthen m)
| Left k = Nothing
| Right k = Just $ FS k
The type of the constructor FS is FS : Fin n -> Fin (S n), so if you have x : Fin 5, even if you know it is less than 3 : Fin 5, its type is still Fin 5, so you can't pass it to FS and get another Fin 5; you'll get a Fin 6 instead.
You can write a function nextFin : Fin n -> Maybe (Fin n) that returns Nothing for the largest Fin; but that function has to rebuild the new Fin, it can't just apply FS at the topmost level. The idea is to use the fact that FZ : Fin n either has or doesn't have a successor depending on whether n is 1 or greater; and the successor of FS k is the successor of k wrapped in FS:
import Data.Fin
total nextFin : Fin n -> Maybe (Fin n)
nextFin {n = Z} k = absurd k
nextFin {n = (S Z)} _ = Nothing
nextFin {n = (S (S n))} FZ = Just (FS FZ)
nextFin {n = (S (S n))} (FS k) = map FS $ nextFin k
Fin implements the Enum interface, providing the successor function succ with exactly the desired semantics, which makes the implementation of addOneStar trivial:
addOneStar: Stars -> Stars
addOneStar (MkStars s) = MkStars $ succ s
The other answers have addressed your question directly. I'm here to offer an alternative design.
You mention that your number must be between 1 and 5. (It looks like you're building some sort of movie rating system?) Rather than indirectly represent this as a bounded natural number between 0 and 4, why not just enumerate the small number of allowed cases directly? You don't need dependent types for this; the following is valid Haskell 98.
data Stars = OneStar
| TwoStars
| ThreeStars
| FourStars
| FiveStars
deriving (Eq, Ord, Show, Read, Bounded, Enum)
addOneStar FiveStars = FiveStars
addOneStar s = succ s

Generality of `foldr` or other higher order function

Here's a simple function that takes a list and a number and works out if the length of the list is greater than that number.
e.g.
compareLengthTo [1,2,3] 3 == EQ
compareLengthTo [1,2] 3 == LT
compareLengthTo [1,2,3,4] 3 == GT
compareLengthTo [1..] 3 == GT
Note that it has two properties:
It works for infinite lists.
It is tail recursive and uses constant space.
import Data.Ord
compareLengthTo :: [a] -> Int -> Ordering
compareLengthTo l n = f 0 l
where
f c [] = c `compare` n
f c (l:ls) | c > n = GT
| otherwise = f (c + 1) ls
Is there a way to write compareLengthTo using foldr only?
Note, here's a version of compareLengthTo using drop:
compareLengthToDrop :: [a] -> Int -> Ordering
compareLengthToDrop l n = f (drop n (undefined:l))
where
f [] = LT
f [_] = EQ
f _ = GT
I guess another question is then, can you implement drop in terms of foldr?
Here ya go (note: I just changed one comparison, which makes it lazier):
compareLengthTo :: [a] -> Int -> Ordering
compareLengthTo l n = foldr f (`compare` n) l 0
where
f l cont c | c >= n = GT
| otherwise = cont $! c + 1
This uses exactly the same sort of technique used to implement foldl in terms of foldr. There's a classic article about the general technique called A tutorial on the universality and expressiveness of fold. You can also see a step-by-step explanation I wrote on the Haskell Wiki.
To get you started, note that foldr is being applied to four arguments here, rather than the usual three. This works out because the function being folded takes three arguments, and the "base case" is a function, (`compare` n).
Edit
If you want to use lazy Peano numerals as J. Abrahamson does, you can count down instead of counting up.
compareLengthTo :: [a] -> Nat -> Ordering
compareLengthTo l n = foldr f final l n
where
f _ _ Zero = GT
f _ cont (Succ p) = cont p
final Zero = EQ
final _ = LT
By it's very definition, foldr is not tail-recursive:
-- slightly simplified
foldr :: (a -> r -> r) -> r -> ([a] -> r)
foldr cons nil [] = nil
foldr cons nil (a:as) = cons a (foldr cons nil as)
so you cannot achieve that end. That said, there are some attractive components of foldr's semantics. In particular, it is "productive" which allows folds written with foldr to behave nicely with laziness.
We can see foldr as saying how to break down (catalyze) a list one "layer" at a time. If the cons argument can return without caring about any further layers of the list then it can terminate early and we avoid ever having to examine any more tails of the list---this is how foldr can act non-strictly at times.
Your function, to work on infinite lists, does something similar to the numeric argument. We'd like to operate on that argument "layer by layer". To make this more clear, let's define the naturals as follows
data Nat = Zero | Succ Nat
Now "layer by layer" more clearly means "counting down to zero". We can formalize that notion like so:
foldNat :: (r -> r) -> r -> (Nat -> r)
foldNat succ zero Zero = zero
foldNat succ zero (Succ n) = succ (foldNat succ zero n)
and now we can define something a bit like what we're looking for
compareLengthTo :: Nat -> [a] -> Ordering
compareLengthTo = foldNat succ zero where
zero :: [a] -> Ordering
zero [] = EQ -- we emptied the list and the nat at the same time
zero _ = GT -- we're done with the nat, but more list remains
succ :: ([a] -> Ordering) -> ([a] -> Ordering)
succ continue [] = LT -- we ran out of list, but had more nat
succ continue (_:as) = continue as -- keep going, both nat and list remain
It can take some time to study the above to see how it works. In particular, note that I instantiated r as a function, [a] -> Ordering. The form of the function above is "recursion on the natural numbers" and it allows it to accept infinite lists so long as the Nat argument isn't...
infinity :: Nat
infinity = Succ infinity
Now, the above function works on this strange type, Nat, which models the non-negative integers. We can translate the same concept to Int by replacing foldNat with foldInt, written similarly:
foldInt :: (r -> r) -> r -> (Int -> r)
foldInt succ zero 0 = zero
foldInt succ zero n = succ (foldInt succ zero (n - 1))
which you can verify embodies the exact same pattern as foldNat but avoids the use of the awkward Succ and Zero constructors. You can also verify that foldInt behaves pathologically if we give it negative integers... which is about what we'd expect.
Have to participate into this coding competion:
"Prelude":
import Test.QuickCheck
import Control.Applicative
compareLengthTo :: [a] -> Int -> Ordering
compareLengthTo l n = f 0 l
where
f c [] = c `compare` n
f c (l:ls) | c > n = GT
| otherwise = f (c + 1) ls
My first attempt was to write this
compareLengthTo1 :: [a] -> Int -> Ordering
compareLengthTo1 l n = g $ foldr f (Just n) l
where
-- we go below zero
f _ (Just 0) = Nothing
f _ (Just n) = Just (n - 1)
f _ Nothing = Nothing
g (Just 0) = EQ
g (Just _) = LT
g Nothing = GT
And it works for finite arguments:
prop1 :: [()] -> NonNegative Int -> Property
prop1 l (NonNegative n) = compareLengthTo l n === compareLengthTo1 l n
-- >>> quickCheck prop1
-- +++ OK, passed 100 tests.
But it fails for infinite lists. Why?
Let's define a variant using peano naturals:
data Nat = Zero | Succ Nat
foldNat :: (r -> r) -> r -> (Nat -> r)
foldNat succ zero Zero = zero
foldNat succ zero (Succ n) = succ (foldNat succ zero n)
natFromInteger :: Integer -> Nat
natFromInteger 0 = Zero
natFromInteger n = Succ (natFromInteger (n - 1))
natToIntegral :: Integral a => Nat -> a
natToIntegral = foldNat (1+) 0
instance Arbitrary Nat where
arbitrary = natFromInteger . getNonNegative <$> arbitrary
instance Show Nat where
show = show . (natToIntegral :: Nat -> Integer)
infinity :: Nat
infinity = Succ infinity
compareLengthTo2 :: [a] -> Nat -> Ordering
compareLengthTo2 l n = g $ foldr f (Just n) l
where
f _ (Just Zero) = Nothing
f _ (Just (Succ n)) = Just n
f _ Nothing = Nothing
g (Just Zero) = EQ
g (Just _) = LT
g Nothing = GT
prop2 :: [()] -> Nat -> Property
prop2 l n = compareLengthTo l (natToIntegral n) === compareLengthTo2 l n
-- >>> compareLengthTo2 [] infinity
-- LT
After staring long enough we see that it works for infinite numbers, not infinite lists.
That's why J. Abrahamson used foldNat in his definition.
So if we fold the number argument, we will get function which works on infinite lists, but finite numbers:
compareLengthTo3 :: [a] -> Nat -> Ordering
compareLengthTo3 l n = g $ foldNat f (Just l) n
where
f (Just []) = Nothing
f (Just (x:xs)) = Just xs
f Nothing = Nothing
g (Just []) = EQ
g (Just _) = GT
g Nothing = LT
prop3 :: [()] -> Nat -> Property
prop3 l n = compareLengthTo l (natToIntegral n) === compareLengthTo3 l n
nats :: [Nat]
nats = iterate Succ Zero
-- >>> compareLengthTo3 nats (natFromInteger 10)
-- GT
foldr and foldNat are kind of functions which generalise structural recursion on the argument (catamorphisms). They have nice property that given finite inputs and total functions as arguments, they are also total i.e. always terminate.
That's why we foldNat in the last example. We assume that Nat argument is finite, so compareLengthTo3 works on all [a] - even infinite.

Defining Unlambda-style tree notation in Coq

Here is a definition of polymorphic binary trees I am using in a Coq project.
Inductive tree { X : Type } : Type :=
| t_a : X -> tree
| t_m : tree -> tree -> tree.
A binary tree of natural numbers ( 1 ( ( 2 3 ) 4 ) ), declared using this definition, would be:
t_m ( t_a 1 ) ( t_m ( t_m ( t_a 2 ) ( t_a 3 ) ) ( t_a 4 ) )
As you can see, the definition becomes unusable very quickly with increasing number of leaves. What I want to do is define an Unlambda-style notation for trees so that I can replace the above with
' 1 ' ' 2 3 4
Is this possible?
I tried to get a solution that used only Coq notations, but couldn't get it to work. I suspect that Coq's extensible parser is not powerful enough to understand the notation you want. There is, however, a poor man's solution that involves dependent types. The idea is to write a parser for that notation and use that parser's type to encode the parser state. The type says that the parser "reads" some token (actually, takes that token as an argument to a function call), and goes into some next state that depends on the token it just read.
There's a small subtlety, though, which is that one cannot write that type using just regular Coq function types, because the number of arguments that function would take would depend on all the arguments it is being applied to. One solution is to use a coinductive type to encode this behavior, declaring a coercion to make it look like a function:
Inductive tree (X : Type) : Type :=
| t_a : X -> tree X
| t_m : tree X -> tree X -> tree X.
Arguments t_a {X} _.
Arguments t_m {X} _ _.
CoInductive tree_builder X : nat -> Type :=
| TbDone : tree X -> tree_builder X 0
| TbRead : forall n, (forall o : option X, tree_builder X match o with
| Some x => n
| None => S (S n)
end) ->
tree_builder X (S n).
Arguments TbDone {X} _.
Arguments TbRead {X} _ _.
(* Destructors for tree_builder *)
Definition case0 {X} (x : tree_builder X 0) : tree X :=
match x with
| TbDone t => t
end.
Definition caseS {X n} (x : tree_builder X (S n)) :
forall o : option X, tree_builder X match o with
| Some x => n
| None => S (S n)
end :=
match x with
| TbRead _ f => f
end.
Definition tb X n := tree_builder X (S n).
(* force is what does the magic here: it takes a tb and coerces it to a
function that may produce another tb, depending on what it is applied to. *)
Definition force X n (x : tb X n) : forall o : option X,
match o with
| Some x =>
match n with
| 0 => tree X
| S n' => tb X n'
end
| None =>
tb X (S n)
end :=
fun o =>
match o return tree_builder X match o with
| Some x => n
| None => S (S n)
end ->
match o with
| Some x => match n with
| 0 => tree X
| S n' => tb X n'
end
| None => tb X (S n)
end
with
| Some x => match n return tree_builder X n -> match n with
| 0 => tree X
| S n' => tb X n'
end
with
| 0 => fun t => case0 t
| S _ => fun t => t
end
| None => fun t => t
end (caseS x o).
Coercion force : tb >-> Funclass.
Our parser, then, is just a term of type tb X 0. As it is usually done, it has to be written in continuation-passing style because of the variable number of arguments.
Fixpoint parser_cont_type X (n : nat) : Type :=
match n with
| 0 => tree X
| S n' => tree X -> parser_cont_type X n'
end.
CoFixpoint parser X n : parser_cont_type X n -> tree_builder X n :=
match n with
| 0 => fun k => TbDone k
| S n' => fun k : tree X -> parser_cont_type X n' =>
TbRead n' (fun o => match o return tree_builder X match o with
| Some _ => n'
| None => S (S n')
end
with
| Some x => parser X n' (k (t_a x))
| None => parser X (S (S n')) (fun (t1 t2 : tree X) => k (t_m t1 t2))
end)
end.
Definition parser' X : tb X 0 :=
parser X 1 (fun t => t).
Next, we can define some extra notation to make this easier to use:
Notation "[ x ]" := (Some x) (at level 0).
Notation "''" := None (at level 0).
Notation "!" := (parser' _) (at level 20).
Here's how one could write your example tree, for instance:
Definition my_tree : tree nat := Eval hnf in ! '' [1] '' '' [2] [3] [4].
Notice the initial ! to start a call to the parser, and the [] that are needed to mark the leaves. I also couldn't get Coq's parser to accept ' as a token on its own. Besides these minor details, however, it is fairly close to what you had.
One problem is that, because the parser is defined using Coq functions, one needs to do a little bit of simplification to get a term that is exactly like the one you had originally. This is why I added the Eval call on the definition. This is probably not as practical as a real notation, and the definition is admittedly a bit tricky, but it could be pretty useful for some cases.
Here's a gist with the entire .v file.
UPDATE: I've written a post with a much simplified version of this technique made more generic.

Is there a fast algorithm to determine the godel number of a term of a context free language?

Suppose we have a simple grammar specification. There is a way to enumerate terms of that grammar that guarantees that any finite term will have a finite position, by iterating it diagonally. For example, for the following grammar:
S ::= add
add ::= mul | add + mul
mul ::= term | mul * term
term ::= number | ( S )
number ::= digit | digit number
digit ::= 0 | 1 | ... | 9
You can enumerate terms like that:
0
1
0+0
0*0
0+1
(0)
1+0
0*1
0+0*0
00
... etc
My question is: is there a way to do the opposite? That is, to take a valid term of that grammar, say, 0+0*0, and find its position on such enumeration - in that case, 9?
For this specific problem, we can cook up something fairly simple, if we allow ourselves to choose a different enumeration ordering. The idea is basically the one in Every Bit Counts, which I also mentioned in the comments. First, some preliminaries: some imports/extensions, a data type representing the grammar, and a pretty-printer. For the sake of simplicity, my digits only go up to 2 (big enough to not be binary any more, but small enough not to wear out my fingers and your eyes).
{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers
type S = Add
data Add = Mul Mul | Add :+ Mul deriving (Eq, Ord, Show, Read)
data Mul = Term Term | Mul :* Term deriving (Eq, Ord, Show, Read)
data Term = Number Number | Parentheses S deriving (Eq, Ord, Show, Read)
data Number = Digit Digit | Digit ::: Number deriving (Eq, Ord, Show, Read)
data Digit = D0 | D1 | D2 deriving (Eq, Ord, Show, Read, Bounded, Enum)
class PP a where pp :: a -> String
instance PP Add where
pp (Mul m) = pp m
pp (a :+ m) = pp a ++ "+" ++ pp m
instance PP Mul where
pp (Term t) = pp t
pp (m :* t) = pp m ++ "*" ++ pp t
instance PP Term where
pp (Number n) = pp n
pp (Parentheses s) = "(" ++ pp s ++ ")"
instance PP Number where
pp (Digit d) = pp d
pp (d ::: n) = pp d ++ pp n
instance PP Digit where pp = show . fromEnum
Now let's define the enumeration order. We'll use two basic combinators, +++ for interleaving two lists (mnemonic: the middle character is a sum, so we're taking elements from either the first argument or the second) and +*+ for the diagonalization (mnemonic: the middle character is a product, so we're taking elements from both the first and second arguments). More information on these in the universe documentation. One invariant we'll maintain is that our lists -- with the exception of digits -- are always infinite. This will be important later.
ss = adds
adds = (Mul <$> muls ) +++ (uncurry (:+) <$> adds +*+ muls)
muls = (Term <$> terms ) +++ (uncurry (:*) <$> muls +*+ terms)
terms = (Number <$> numbers) +++ (Parentheses <$> ss)
numbers = (Digit <$> digits) ++ interleave [[d ::: n | n <- numbers] | d <- digits]
digits = [D0, D1, D2]
Let's see a few terms:
*Main> mapM_ (putStrLn . pp) (take 15 ss)
0
0+0
0*0
0+0*0
(0)
0+0+0
0*(0)
0+(0)
1
0+0+0*0
0*0*0
0*0+0
(0+0)
0+0*(0)
0*1
Okay, now let's get to the good bit. Let's assume we have two infinite lists a and b. There's two things to notice. First, in a +++ b, all the even indices come from a, and all the odd indices come from b. So we can look at the last bit of an index to see which list to look in, and the remaining bits to pick an index in that list. Second, in a +*+ b, we can use the standard bijection between pairs of numbers and single numbers to translate between indices in the big list and pairs of indices in the a and b lists. Nice! Let's get to it. We'll define a class for Godel-able things that can be translated back and forth between numbers -- indices into the infinite list of inhabitants. Later we'll check that this translation matches the enumeration we defined above.
type Nat = Integer -- bear with me here
class Godel a where
to :: a -> Nat
from :: Nat -> a
instance Godel Nat where to = id; from = id
instance (Godel a, Godel b) => Godel (a, b) where
to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
m = to m_
n = to n_
from p = (from m, from n) where
isqrt = floor . sqrt . fromIntegral
base = (isqrt (1 + 8 * p) - 1) `quot` 2
triangle = base * (base + 1) `quot` 2
m = p - triangle
n = base - m
The instance for pairs here is the standard Cantor diagonal. It's just a bit of algebra: use the triangle numbers to figure out where you're going/coming from. Now building up instances for this class is a breeze. Numbers are just represented in base 3:
-- this instance is a lie! there aren't infinitely many Digits
-- but we'll be careful about how we use it
instance Godel Digit where
to = fromIntegral . fromEnum
from = toEnum . fromIntegral
instance Godel Number where
to (Digit d) = to d
to (d ::: n) = 3 + to d + 3 * to n
from n
| n < 3 = Digit (from n)
| otherwise = let (q, r) = quotRem (n-3) 3 in from r ::: from q
For the remaining three types, we will, as suggested above, check the tag bit to decide which constructor to emit, and use the remaining bits as indices into a diagonalized list. All three instances necessarily look very similar.
instance Godel Term where
to (Number n) = 2 * to n
to (Parentheses s) = 1 + 2 * to s
from n = case quotRem n 2 of
(q, 0) -> Number (from q)
(q, 1) -> Parentheses (from q)
instance Godel Mul where
to (Term t) = 2 * to t
to (m :* t) = 1 + 2 * to (m, t)
from n = case quotRem n 2 of
(q, 0) -> Term (from q)
(q, 1) -> uncurry (:*) (from q)
instance Godel Add where
to (Mul m) = 2 * to m
to (m :+ t) = 1 + 2 * to (m, t)
from n = case quotRem n 2 of
(q, 0) -> Mul (from q)
(q, 1) -> uncurry (:+) (from q)
And that's it! We can now "efficiently" translate back and forth between parse trees and their Godel numbering for this grammar. Moreover, this translation matches the above enumeration, as you can verify:
*Main> map from [0..29] == take 30 ss
True
We did abuse many nice properties of this particular grammar -- non-ambiguity, the fact that almost all the nonterminals had infinitely many derivations -- but variations on this technique can get you quite far, especially if you are not too strict on requiring every number to be associated with something unique.
Also, by the way, you might notice that, except for the instance for (Nat, Nat), these Godel numberings are particularly nice in that they look at/produce one bit (or trit) at a time. So you could imagine doing some streaming. But the (Nat, Nat) one is pretty nasty: you have to know the whole number ahead of time to compute the sqrt. You actually can turn this into a streaming guy, too, without losing the property of being dense (every Nat being associated with a unique (Nat, Nat)), but that's a topic for another answer...

Resources