How to define level-order traversal of a binary tree in Isabelle/HOL

I have written level-order traversal of a binary tree in other languages, but I don't know how to represent it in Isabelle/HOL. Has anyone defined it, or how can it be defined?

In principle, you can do it exactly the same way as in Haskell. The problematic bit is that you have to prove termination of the recursive auxiliary function (what is called tbf in the Haskell code you linked). The easiest way to show this is by finding some sort of measure on the input (a list of trees) that decreases with every recursive call.
I propose the following measure: sum the sizes of all the trees in the list, where the size is the number of all the nodes in the tree (including leaf nodes).
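For reference, here is a minimal Haskell sketch of such a tbf-style traversal (an illustration of the shape of the algorithm, with names chosen here; not necessarily the exact code linked in the question):
data Tree a = Leaf | Node (Tree a) a (Tree a)
tbf :: [Tree a] -> [a]
tbf [] = []
tbf ts = concatMap values ts ++ tbf (concatMap children ts)
  where
    values Leaf = []
    values (Node _ x _) = [x]
    children Leaf = []
    children (Node l _ r) = [l, r]
bfs :: Tree a -> [a]
bfs t = tbf [t]
In Haskell the recursion needs no justification, but in Isabelle/HOL the corresponding function is only accepted once termination is proved, which is exactly what the measure above is for.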
We can use the binary trees from HOL-Library (HOL-Library.Tree). First, we define some auxiliary functions on trees, including our size functions, and prove some facts about them:
primrec tree_values :: "'a tree ⇒ 'a list" where
"tree_values Leaf = []"
| "tree_values (Node l x r) = [x]"
primrec tree_children :: "'a tree ⇒ 'a tree list" where
"tree_children Leaf = []"
| "tree_children (Node l x r) = [l, r]"
primrec tree_size :: "'a tree ⇒ nat" where
"tree_size Leaf = 1"
| "tree_size (Node l x r) = tree_size l + tree_size r + 1"
definition tree_list_size :: "'a tree list ⇒ nat"
where "tree_list_size = sum_list ∘ map tree_size"
lemma tree_size_pos: "tree_size t > 0"
by (induction t) auto
lemma tree_size_nonzero [simp]: "tree_size t ≠ 0"
by (simp add: tree_size_pos)
lemma tree_list_size_children [simp]:
"tree_list_size (tree_children t) = tree_size t - 1"
by (cases t) (auto simp: tree_list_size_def)
Next, we will need another simple lemma on sum_list and concat:
lemma sum_list_concat: "sum_list (concat xs) = sum_list (map sum_list xs)"
by (induction xs) auto
Finally, we can define BFS and prove its termination:
function bfs_aux :: "'a tree list ⇒ 'a list" where
"bfs_aux ts =
(if ts = [] then [] else concat (map tree_values ts) @ bfs_aux (concat (map tree_children ts)))"
by auto
termination
proof (relation "measure tree_list_size")
fix ts :: "'a tree list"
assume ts: "ts ≠ []"
have "tree_list_size (concat (map tree_children ts)) =
sum_list (map (tree_list_size ∘ tree_children) ts)"
by (simp add: map_concat sum_list_concat tree_list_size_def o_assoc)
also from ‹ts ≠ []› have "… < sum_list (map tree_size ts)"
by (intro sum_list_strict_mono) (auto simp: tree_size_pos)
also have "… = tree_list_size ts"
by (simp add: tree_list_size_def)
finally show "(concat (map tree_children ts), ts) ∈ measure tree_list_size"
by simp
qed auto
definition bfs :: "'a tree ⇒ 'a list"
where "bfs t = bfs_aux [t]‹›
And we can test it:
value "bfs (⟨⟨⟨Leaf, ''d'', Leaf⟩, ''b'', ⟨Leaf, ''e'', Leaf⟩⟩, ''a'',
⟨⟨Leaf, ''f'', Leaf⟩, ''c'', ⟨Leaf, ''g'', Leaf⟩⟩⟩)"
> "[''a'', ''b'', ''c'', ''d'', ''e'', ''f'', ''g'']"
:: "char list list"
For more on defining functions with non-trivial recursion patterns like this and proving their termination, see the documentation of the function package (Section 4 in particular).

Related

Isabelle/HOL: proof by 'simp' is slow while 'value' is instantaneous

I am new to Isabelle/HOL and still working through the prog-prove exercises. In the meantime, I am practicing these proof techniques on questions about combinatorial words. I observe very different behavior (in terms of efficiency) between 'value' and 'lemma'.
Can one explain the different evaluation/search strategies between the two commands?
Is there a way to have the speed of 'value' used inside a proof of a 'lemma'?
Of course, I am asking because I have not found the answer in the documentation (so far). What is the manual where this difference of efficiency would be documented and explained?
Here is a minimal piece of source to reproduce the problem.
theory SlowLemma
imports Main
begin
(* Alphabet for Motzkin words. *)
datatype alphabet = up | lv | dn
(* Keep the [...] notation for lists. *)
no_notation Cons (infixr "#" 65) and append (infixr "@" 65)
primrec count :: "'a ⇒ 'a list ⇒ nat" where
"count _ Nil = 0" |
"count s (Cons h q) = (if h = s then Suc (count s q) else count s q)"
(* prefix l n simply returns undefined if n > length l. *)
fun prefix :: "'a list ⇒ nat ⇒ 'a list" where
"prefix _ 0 = []" |
"prefix (Cons h q) (Suc n) = Cons h (prefix q n)"
definition M_ex_7 :: "alphabet list" where
"M_ex_7 ≡ [lv, lv, up, up, lv, dn, dn]"
definition M_ex_19 :: "alphabet list" where
"M_ex_19 ≡ [lv, lv, up, up, lv, up, lv, dn, lv, dn, lv, up, dn, dn, lv, up, dn, lv, lv]"
fun height :: "alphabet list ⇒ int" where
"height w = (int (count up w + count up w)) - (int (count dn w + count dn w))"
primrec is_pre_M :: "alphabet list ⇒ nat ⇒ bool" where
"is_pre_M _ (0 :: nat) = True"
| "is_pre_M w (Suc n) = (let w' = prefix w (Suc n) in is_pre_M w' n ∧ height w' ≥ 0)"
fun is_M :: "alphabet list ⇒ bool" where
"is_M w = (is_pre_M w (length w) ∧ height w = 0)"
(* These two calls to value are fast. *)
value "is_M M_ex_7"
value "is_M M_ex_19"
(* This first lemma goes fast. *)
lemma is_M_M_ex_7: "is_M M_ex_7"
by (simp add: M_ex_7_def)
(* This second lemma takes five minutes. *)
lemma is_M_M_ex_19: "is_M M_ex_19"
by (simp add: M_ex_19_def)
end
simp is a proof method that goes through the proof kernel, i.e., every step has to be justified. For long rewriting chains, this may be quite expensive.
On the other hand, value uses the code generator where possible. All used constants are translated into ML code, which is then executed. You have to trust the result, i.e., it didn't go through the kernel and may be wrong.
The equivalent of value as a proof method is eval. Thus, an easy way to speed up your proofs is to use this:
lemma is_M_M_ex_19: "is_M M_ex_19"
by eval
Opinions in the Isabelle community about whether or not this should be used differ. Some say it's similar to axiomatization (because you have to trust it), others consider it a reasonable way if going through the kernel is prohibitively slow. Everyone agrees though that you have to be really careful about custom setup of the code generator (which you haven't done, so it should be fine).
There's middle ground: the code_simp method will set up simp to use only the equations that would otherwise be used by eval. That means: a much smaller set of rules for simp, while still going through the kernel. In your case, it is actually the same speed as by eval, so I would highly recommend doing that:
lemma is_M_M_ex_19: "is_M M_ex_19"
by code_simp
In your case, the reason code_simp is much faster than simp is a simproc that has exponential runtime in the number of nested let expressions. Hence, another solution would be to use simp add: Let_def to unfold the let expressions up front.

Breaking after finding the kth element of an inorder traversal using a higher order traversal function

I have the following code to do an inorder traversal of a Binary Tree:
data BinaryTree a =
Node a (BinaryTree a) (BinaryTree a)
| Leaf
deriving (Show)
inorder :: (a -> b -> b) -> b -> BinaryTree a -> b
inorder f acc tree = go tree acc
where go Leaf z = z
go (Node v l r) z = (go r . f v . go l) z
Using the inorder function above I'd like to get the kth element without having to traverse the entire list.
The traversal is a little like a fold given that you pass it a function and a starting value. I was thinking that I could solve it by passing k as the starting value, and a function that'll decrement k until it reaches 0 and at that point returns the value inside the current node.
The problem I have is that I'm not quite sure how to break out of the recursion of inorder traversal short of modifying the whole function, but I feel like having to modify the higher order function ruins the point of using a higher order function in the first place.
Is there a way to break after k iterations?
I observe that the results of the recursive call to go on the left and right subtrees are not available to f; hence no matter what f does, it cannot choose to ignore the results of recursive calls. Therefore I believe that inorder as written will always walk over the entire tree. (edit: On review, this statement may be a bit strong; it seems f may have a chance to ignore left subtrees. But the point basically stands; there is no reason to elevate left subtrees over right subtrees in this way.)
A better choice is to give the recursive calls to f. For example:
anyOldOrder :: (a -> b -> b -> b) -> b -> BinaryTree a -> b
anyOldOrder f z = go where
go Leaf = z
go (Node v l r) = f v (go l) (go r)
Now when we write
flatten = anyOldOrder (\v ls rs -> ls ++ [v] ++ rs) []
we will find that flatten is sufficiently lazy:
> take 3 (flatten (Node 'c' (Node 'b' (Node 'a' Leaf Leaf) Leaf) undefined))
"abc"
(The undefined is used to provide evidence that this part of the tree is never inspected during the traversal.) Hence we may write
findK k = take 1 . reverse . take k . flatten
which will correctly short-circuit. You can make flatten slightly more efficient with the standard difference list technique:
flatten' t = anyOldOrder (\v l r -> l . (v:) . r) id t []
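For instance, reusing the lazy tree from above (a usage sketch; the result assumes the definitions given here):
> findK 3 (Node 'c' (Node 'b' (Node 'a' Leaf Leaf) Leaf) undefined)
"c"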
Just for fun, I also want to show how to implement this function without using an accumulator list. Instead, we will produce a stateful computation which walks over the "interesting" part of the tree, stopping when it reaches the kth element. The stateful computation looks like this:
import Control.Applicative
import Control.Monad.State
import Control.Monad.Trans.Maybe
kthElem k v l r = l <|> do
i <- get
if i == k
then return v
else put (i+1) >> r
Looks pretty simple, hey? Now our findK' function will farm out to kthElem, then do some newtype unwrapping:
findK' k = (`evalState` 1) . runMaybeT . anyOldOrder (kthElem k) empty
We can verify that it is still as lazy as desired:
> findK' 3 $ Node 'c' (Node 'b' (Node 'a' Leaf Leaf) Leaf) undefined
Just 'c'
There are (at least?) two important generalizations of the notion of folding a list. The first, more powerful, notion is that of a catamorphism. The anyOldOrder of Daniel Wagner's answer follows this pattern.
But for your particular problem, the catamorphism notion offers a bit more power than you need. The second, weaker, notion is that of a Foldable container. Foldable expresses the idea of a container whose elements can all be mashed together using the operation of an arbitrary Monoid. Here's a cute trick:
{-# LANGUAGE DeriveFoldable #-}
-- Note that for this trick only I've
-- switched the order of the Node fields.
data BinaryTree a =
Node (BinaryTree a) a (BinaryTree a)
| Leaf
deriving (Show, Foldable)
index :: [a] -> Int -> Maybe a
[] `index` _ = Nothing
(x : _) `index` 0 = Just x
(_ : xs) `index` i = xs `index` (i - 1)
(!?) :: Foldable f => f a -> Int -> Maybe a
xs !? i = toList xs `index` i
Then you can just use !? to index into your tree!
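For example, with the field order of this Foldable variant (a hypothetical session):
> Node (Node Leaf 'a' Leaf) 'b' (Node Leaf 'c' Leaf) !? 1
Just 'b'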
That trick is cute, and in fact deriving Foldable is a tremendous convenience, but it won't help you understand anything. I'll start by showing how you can define treeToList fairly directly and efficiently, without using Foldable.
treeToList :: BinaryTree a -> [a]
treeToList t = treeToListThen t []
The magic is in the treeToListThen function. treeToListThen t more converts t to a list and appends the list more to the end of the result. This slight generalization turns out to be all that's required to make conversion to a list efficient.
treeToListThen :: BinaryTree a -> [a] -> [a]
treeToListThen Leaf more = more
treeToListThen (Node v l r) more =
treeToListThen l $ v : treeToListThen r more
Instead of producing an inorder traversal of the left subtree and then appending everything else, we tell the left traversal what to stick on the end when it's done! This avoids the potentially serious inefficiency of repeated list concatenation that can turn things O(n^2) in bad cases.
Getting back to the Foldable notion, turning things into lists is a special case of foldr:
toList = foldr (:) []
So how can we implement foldr for trees? It ends up being somewhat similar to what we did with toList:
foldrTree :: (a -> b -> b) -> b -> BinaryTree a -> b
foldrTree _ n Leaf = n
foldrTree c n (Node v l r) = foldrTree c rest l
where
rest = v `c` foldrTree c n r
That is, when we go down the left side, we tell it that when it's done, it should deal with the current node and its right child.
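As a quick sanity check that foldrTree agrees with the list conversion (a sketch, assuming the original Node v l r field order used above):
> foldrTree (:) [] (Node 'b' (Node 'a' Leaf Leaf) (Node 'c' Leaf Leaf))
"abc"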
Now foldr isn't quite the most fundamental operation of Foldable; that is actually
foldMap :: (Foldable f, Monoid m)
=> (a -> m) -> f a -> m
It is possible to implement foldr using foldMap, in a somewhat tricky fashion using a peculiar Monoid. I don't want to overload you with details of that right now, unless you ask (but you should look at the default definition of foldr in Data.Foldable). Instead, I'll show how foldMap can be defined using Daniel Wagner's anyOldOrder:
instance Foldable BinaryTree where
foldMap f = anyOldOrder bin mempty where
bin v lres rres = lres <> f v <> rres

Performance of Foldable's default methods

I've been exploring the Foldable class and also the Monoid class.
Firstly, let's say I want to fold over a list of the Monoid First, like so:
x :: [First a]
fold? mappend mempty x
Then I assume in this case the most appropriate fold would be foldr, as mappend for First is lazy in its second argument.
Conversely, for Last we'd want foldl' (or just foldl, I'm not sure).
Now moving away from lists, I've defined a simple binary tree like so:
{-# LANGUAGE GADTs #-}
data BinaryTree a where
BinaryTree :: BinaryTree a -> BinaryTree a -> BinaryTree a
Leaf :: a -> BinaryTree a
And I've made it Foldable with the most straightforward definition:
instance Foldable BinaryTree where
foldMap f (BinaryTree left right) =
(foldMap f left) `mappend` (foldMap f right)
foldMap f (Leaf x) = f x
As Foldable defines fold as simply foldMap id we can now do:
x1 :: BinaryTree (First a)
fold x1
x2 :: BinaryTree (Last a)
fold x2
Assuming our BinaryTree is balanced and there are not many Nothing values, these operations should take O(log(n)) time, I believe.
But Foldable also defines a whole lot of default methods like foldl, foldl', foldr and foldr' based on foldMap.
These default definitions seem to be implemented by building one function per element of the collection, each wrapped in a Monoid called Endo, and then composing them all.
For the purpose of this discussion I am not modifying these default definitions.
So let's now consider:
x1 :: BinaryTree (First a)
foldr mappend mempty x1
x2 :: BinaryTree (Last a)
foldl mappend mempty x2
Does running these retain O(log(n)) performance of the ordinary fold? (I'm not worried about constant factors for the moment). Does laziness result in the tree not needing to be fully traversed? Or will the default definitions of foldl and foldr require an entire traversal of the tree?
I tried to go through the algorithm step by step (much like the Foldr Foldl Foldl' article does), but I ended up completely confusing myself, as this is a bit more complex: it involves an interaction between Foldable, Monoid and Endo.
So what I'm looking for is an explanation of why (or why not) the default definition of, say, foldr would take only O(log(n)) time on a balanced binary tree like the above. A step-by-step example like the ones in the Foldr Foldl Foldl' article would be really helpful, but I understand if that's too difficult, as I totally confused myself attempting it.
Yes, it has O(log(n)) best case performance.
Endo is a newtype wrapper around functions of type a -> a, whose Monoid instance is:
instance Monoid (Endo a) where
mempty = Endo id
Endo f `mappend` Endo g = Endo (f . g)
And the default implementation of foldr in Data.Foldable:
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo #. f) t) z
For reference, the definition of . (function composition):
(.) f g = \x -> f (g x)
Endo is defined via a newtype constructor, so the wrapper exists only at compile time, not at run time.
The #. operator changes the type of its second operand and discards its first; it is effectively a zero-cost coercion.
The newtype constructor and the #. operator guarantee that you can ignore the wrapper when considering performance.
So the default implementation of foldr can be reduced to:
-- mappend = (.), mempty = id from instance Monoid (Endo a)
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = foldMap f t z
For your Foldable BinaryTree:
foldr f z t
= foldMap f t z
= case t of
Leaf a -> f a z
-- the case we care about
BinaryTree l r -> ((foldMap f l) . (foldMap f r)) z
The default lazy evaluation in Haskell is ultimately simple; there are just two rules:
function application first
evaluate the arguments from left to right if the values matter
That makes it easy to trace the evaluation of the last line of the code above:
((foldMap f l) . (foldMap f r)) z
= (\z -> foldMap f l (foldMap f r z)) z
= foldMap f l (foldMap f r z)
-- let z' = foldMap f r z
= foldMap f l z' -- height 1
-- if the branch l is still not a Leaf node
= ((foldMap f ll) . (foldMap f lr)) z'
= (\z -> foldMap f ll (foldMap f lr z)) z'
= foldMap f ll (foldMap f lr z')
-- let z'' = foldMap f lr z'
= foldMap f ll z'' -- height 2
The right branch of the tree is never expanded before the left branch has been fully expanded, and each O(1) step of function expansion and application takes the evaluation one level deeper. So when it reaches the left-most Leaf node (writing z_d for the accumulator after descending d levels, so z' = z_1, z'' = z_2, and d is the depth of that leaf):
= foldMap f (Leaf a) z_d
= f a z_d
Then f looks at the value a and decides to ignore its second argument (as mappend does for First values), the evaluation short-circuits, and the cost is O(depth of the left-most leaf), i.e. O(log(n)) when the tree is balanced.
foldl is all the same: it's just foldr with mappend flipped, i.e. O(log(n)) best-case performance with Last.
foldl' and foldr' are different.
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
where f' x k z = k $! f z x
At every step of the reduction the argument is evaluated before the function is applied (note the $!), so the whole tree is traversed: O(n) performance even in the best case.
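To see the short-circuiting concretely, here is a minimal self-contained sketch (plain data syntax instead of the question's GADT; demo is a name chosen here). The undefined right subtree is never forced:
import Data.Monoid (First(..))
data BinaryTree a = BinaryTree (BinaryTree a) (BinaryTree a) | Leaf a
instance Foldable BinaryTree where
  foldMap f (BinaryTree left right) = foldMap f left `mappend` foldMap f right
  foldMap f (Leaf x) = f x
-- The left-most leaf already determines the First result,
-- so the rest of the tree is never evaluated.
demo :: Maybe Int
demo = getFirst (foldr mappend mempty (BinaryTree (Leaf (First (Just 1))) undefined))
-- demo == Just 1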

Functional O(1) append and O(n) iteration from first element list data structure

I'm looking for a functional data structure that supports the following operations:
Append, O(1)
In order iteration, O(n)
A normal functional linked list only supports O(n) append. I could cons in O(1) and reverse the list before iterating, but that reverse is also O(n), which (partially) negates the benefit of the O(1) cons.
You can use John Hughes's constant-time append lists, which seem nowadays to be called DList. The representation is a function from lists to lists: the empty list is the identity function; append is composition, and singleton is cons (partially applied). In this representation every enumeration will cost you n allocations, so that may not be so good.
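A minimal Haskell sketch of that representation (the names here are chosen for illustration):
type DList a = [a] -> [a]
empty :: DList a
empty = id                    -- the empty list is the identity function
singleton :: a -> DList a
singleton = (:)               -- singleton is cons, partially applied
append :: DList a -> DList a -> DList a
append = (.)                  -- append is function composition
toList :: DList a -> [a]
toList f = f []               -- each enumeration rebuilds the list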
The alternative is to make the same algebra as a data structure:
type 'a seq = Empty | Single of 'a | Append of 'a seq * 'a seq
Enumeration is a tree walk, which will either cost some stack space or will require some kind of zipper representation. Here's a tree walk that converts to list but uses stack space:
let to_list t =
let rec walk t xs = match t with
| Empty -> xs
| Single x -> x :: xs
| Append (t1, t2) -> walk t1 (walk t2 xs) in
walk t []
Here's the same, but using constant stack space:
let to_list' t =
let rec walk lefts t xs = match t with
| Empty -> finish lefts xs
| Single x -> finish lefts (x :: xs)
| Append (t1, t2) -> walk (t1 :: lefts) t2 xs
and finish lefts xs = match lefts with
| [] -> xs
| t::ts -> walk ts t xs in
walk [] t []
You can write a fold function that visits the same elements but doesn't actually reify the list; just replace cons and nil with something more general:
val fold : ('a * 'b -> 'b) -> 'b -> 'a seq -> 'b
let fold f z t =
let rec walk lefts t xs = match t with
| Empty -> finish lefts xs
| Single x -> finish lefts (f (x, xs))
| Append (t1, t2) -> walk (t1 :: lefts) t2 xs
and finish lefts xs = match lefts with
| [] -> xs
| t::ts -> walk ts t xs in
walk [] t z
That's your linear-time, constant-stack enumeration. Have fun!
I believe you can just use standard functional linked list:
To append an element, you can use cons (which is O(1))
To iterate over the elements in the order in which they were inserted, you can first reverse the list (which is O(N)) and then traverse it, which is also O(N) (and 2×O(N) is still just O(N)).
How about a difference list?
type 'a DList = DList of ('a list -> 'a list)
module DList =
let append (DList f) (DList g) = (DList (f << g))
let cons x (DList f) = (DList (fun l -> x::(f l)))
let snoc (DList f) x = (DList (fun l -> f(x::l)))
let empty = DList id
let ofList = List.fold snoc empty
let toList (DList f) = f []
You could create a functional Deque, which provides O(1) adding to either end and O(N) iteration in either direction. Eric Lippert wrote about an interesting version of an immutable Deque on his blog; if you look around you will find the other parts of the series, but that post explains the final product. Note also that with a bit of tweaking it can be modified to use F# discriminated unions and pattern matching (although that is up to you).
Another interesting property of this version is O(1) peek, removal, and add from either end (i.e. alternating dequeueLeft, dequeueRight, dequeueLeft, dequeueRight, etc. is still O(N) overall, versus O(N*N) with a double-list method).
What about a circularly-linked list? It supports O(1) appends and O(n) iteration.

Monads and custom traversal functions in Haskell

Given the following simple BST definition:
data Tree x = Empty | Leaf x | Node x (Tree x) (Tree x)
deriving (Show, Eq)
inOrder :: Tree x -> [x]
inOrder Empty = []
inOrder (Leaf x) = [x]
inOrder (Node root left right) = inOrder left ++ [root] ++ inOrder right
I'd like to write an in-order function that can have side effects. I achieved that with:
inOrderM :: (Show x, Monad m) => (x -> m a) -> Tree x -> m ()
inOrderM f (Empty) = return ()
inOrderM f (Leaf y) = f y >> return ()
inOrderM f (Node root left right) = inOrderM f left >> f root >> inOrderM f right
-- print tree in order to stdout
inOrderM print tree
This works fine, but it seems repetitive - the same logic is already present in inOrder and my experience with Haskell leads me to believe that I'm probably doing something wrong if I'm writing a similar thing twice.
Is there any way that I can write a single function inOrder that can take either pure or monadic functions?
In inOrder you are mapping a Tree x to a [x], i.e. you sequentialize your tree. Why not just use mapM or mapM_ on the resulting list?
mapM_ print $ inOrder tree
Just as a reminder, here are the types of the functions I've mentioned:
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()
You might want to look at implementing the Data.Traversable class or Data.Foldable class for your tree structure. Each only requires the definition of a single method.
In particular, if you implement the Data.Foldable class, you get the following two functions for free:
mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
toList :: Foldable t => t a -> [a]
It will also give you the rich set of functions (foldr, concatMap, any, ...) that you are used to using with the list type.
You only have to implement one of the following functions to create an instance of Data.Foldable:
foldMap :: Monoid m => (a -> m) -> t a -> m
foldr :: (a -> b -> b) -> b -> t a -> b
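For the Tree type from the question, a minimal sketch of such an instance via foldMap might look like this (assuming the definitions above):
instance Foldable Tree where
  foldMap _ Empty = mempty
  foldMap f (Leaf x) = f x
  foldMap f (Node root left right) = foldMap f left <> f root <> foldMap f right
With that instance, mapM_ print tree does the job of inOrderM print tree, and inOrder itself is just Data.Foldable.toList.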
