Magic code for level binary tree traversal - what is going on? - algorithm

We have a definition of binary tree:
type 'a tree =
  | Node of 'a tree * 'a * 'a tree
  | Null;;
And also a helpful function for traversing the tree:
let rec fold_tree f a t =
  match t with
  | Null -> a
  | Node (l, x, r) -> f x (fold_tree f a l) (fold_tree f a r);;
And here is a "magic" function which, when given a binary tree, returns a list in which we have lists of elements on particular levels, for example, when given a tree:
(tree diagram not reproduced; image source: ernet.in)
the function returns [[1];[2;3];[4;5;6;7];[8;9]].
let levels tree =
  let aux x fl fp =
    fun l ->
      match l with
      | [] -> [x] :: (fl (fp []))
      | h :: t -> (x :: h) :: (fl (fp t))
  in fold_tree aux (fun x -> x) tree [];;
And apparently it works, but I can't wrap my mind around it. Could anyone explain in simple terms what is going on? Why does this function work?

How do you combine two layer lists of two subtrees and get a layer list of a bigger tree? Suppose you have this tree
a
/ \
x y
where x and y are arbitrary trees, and they have their layer lists as [[x00,x01,...],[x10,x11,...],...] and [[y00,y01,...],[y10,y11,...],...] respectively.
The layer list of the new tree will be [[a],[x00,x01,...]++[y00,y01,...],[x10,x11,...]++[y10,y11,...],...]. How does this function build it?
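For comparison, the combination rule just described can be written directly, without the function-returning fold. This is only a sketch: merge_levels and levels_direct are hypothetical names, not part of the original code, and the small example tree is made up for illustration.

```ocaml
type 'a tree =
  | Node of 'a tree * 'a * 'a tree
  | Null

(* Append the layer lists of two subtrees layer by layer.
   If one list is shorter, the remaining layers of the other are kept as-is. *)
let rec merge_levels xs ys =
  match xs, ys with
  | [], rest | rest, [] -> rest
  | x :: xs', y :: ys' -> (x @ y) :: merge_levels xs' ys'

(* The root forms its own top layer; the layers below it are the merged
   layers of the two subtrees. *)
let rec levels_direct t =
  match t with
  | Null -> []
  | Node (l, x, r) -> [x] :: merge_levels (levels_direct l) (levels_direct r)

(* Hypothetical example tree: 1 at the root, 2 and 3 below it, then 4 and 5. *)
let leaf x = Node (Null, x, Null)
let example = Node (Node (leaf 4, 2, Null), 1, Node (Null, 3, leaf 5))
let example_levels = levels_direct example
```

This version uses list append (@) at every layer, which the "magic" version avoids by threading the list of layers through the subtrees and only ever consing onto the front.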
Let's look at this definition
let rec fold_tree f a t = ...
and see what kind of arguments we are passing to fold_tree in our definition of levels.
... in fold_tree aux (fun x -> x) tree []
So the first argument, aux, is some kind of long and complicated function. We will return to it later.
The second argument is also a function — the identity function. This means that fold_tree will also return a function, because fold_tree always returns the same type of value as its second argument. We will argue that the function fold_tree applied to this set of arguments takes a list of layers, and adds layers of a given tree to it.
The third argument is our tree.
Wait, what's the fourth argument? Isn't fold_tree only supposed to get three? Yes, but since it returns a function (see above), that function gets applied to that fourth argument, the empty list.
So let's return to aux. This aux function accepts three arguments. One is the element of the tree, and two others are the results of the folds of the subtrees, that is, whatever fold_tree returns. In our case, these two things are functions again.
So aux gets a tree element and two functions, and returns yet another function. Which function is that? It takes a list of layers, and adds layers of a given tree to it. How does it do that? It prepends the root of the tree to the first element (which is the top layer) of the list, and then adds the layers of the right subtree to the tail of the list (which is all the layers below the top) by calling the right function on it, and then adds the layers of the left subtree to the result by calling the left function on it. Or, if the incoming list is empty, it just builds the layers list afresh by applying the above step to the empty list.
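To make the types in this explanation visible, here is the same function with renamed arguments and explicit annotations. It is a sketch for illustration: the type abbreviation adder and the names add_left, add_right, layers, top and below are not in the original, and the example tree is made up.

```ocaml
type 'a tree =
  | Node of 'a tree * 'a * 'a tree
  | Null

let rec fold_tree f a t =
  match t with
  | Null -> a
  | Node (l, x, r) -> f x (fold_tree f a l) (fold_tree f a r)

(* A "layer adder": takes the layers collected so far and returns them
   with the layers of one subtree merged in. *)
type 'a adder = 'a list list -> 'a list list

let levels tree =
  let aux (x : 'a) (add_left : 'a adder) (add_right : 'a adder) : 'a adder =
    fun layers ->
      match layers with
      (* Nothing collected yet: x starts a new top layer. *)
      | [] -> [x] :: add_left (add_right [])
      (* x joins the existing top layer; the subtrees extend the layers below. *)
      | top :: below -> (x :: top) :: add_left (add_right below)
  in
  (* For Null the adder is the identity: an empty tree adds nothing. *)
  fold_tree aux (fun layers -> layers) tree []

(* Hypothetical example tree: 1 at the root, 2 and 3 below it, then 4 and 5. *)
let leaf x = Node (Null, x, Null)
let example = Node (Node (leaf 4, 2, Null), 1, Node (Null, 3, leaf 5))
let example_levels = levels example   (* [[1]; [2; 3]; [4; 5]] *)
```

The right adder runs first and the left adder runs on its result, so in each layer the elements of the left subtree end up in front of those of the right subtree.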

Related

SML Syntax Breakdown

I am trying to study SML (for full transparency this is in preparation for an exam (exam has not started)) and one area that I have been struggling with is higher level functions such as map and foldl/r. I understand that they are used in situations where you would use a for loop in oop languages (I think). What I am struggling with though is what each part in a fold or map function is doing. Here are some examples that if someone could break them down I would be very appreciative
fun cubiclist L = map (fn x=> x*x*x) L;
fun min (x::xs) = foldr (fn (a,b) => if (a < b) then a else b) x xs;
So if I could break down the parts I see and high light the parts I'm struggling with I believe that would be helpful.
Obviously right off the bat you have the name of the functions and the parameters that are being passed in but one question I have on that part is why are we just passing in a variable to cubiclist but for min we pass in (x::xs)? Is it because the map function is automatically applying the function to each part in the map? Also along with that will the fold functions typically take the x::xs parameters while map will just take a variable?
Then we have the higher order function along with the anonymous functions with the logic/operations that we want to apply to each element in the list. But the parameters being passed in for the foldr anonymous function I'm not quite sure about. I understand we are trying to capture the lowest element in the list and the then a else b is returning either a or b to be compared with the other elements in the list. I'm pretty sure that they are returned and treated as a in future comparisons but where do we get the following b's from? Where do we say b is the next element in the list?
Then the part that I really don't understand and have no clue is the L; and x xs; at the end of the respective functions. Why are they there? What are they doing? what is their purpose? is it just syntax or is there actually a purpose for them being there, not saying that syntax isn't a purpose or a valid reason, but does they actually do something? Are those variables that can be changed out with something else that would provide a different answer?
Any help/explanation is much appreciated.
In addition to what @molbdnilo has already stated, it can be helpful for a newcomer to functional programming to think about what we're actually doing when we create a loop: we're specifying a piece of code to run repeatedly. We need an initial state, a condition for the loop to terminate, and an update between each iteration.
Let's look at simple implementation of map.
fun map f [] = []
  | map f (x :: xs) = f x :: map f xs
The initial state is the contents of the list.
The termination condition is the list is empty.
The update is that we tack f x onto the front of the result of mapping f to the rest of the list.
The usefulness of map is that we abstract away f. It can be anything, and we don't have to worry about writing the loop boilerplate.
Fold functions are both more complex and more instructive when comparing to loops in procedural languages.
A simple implementation of fold.
fun foldl f init [] = init
  | foldl f init (x :: xs) = foldl f (f (x, init)) xs
We explicitly provide an initial value, and a list to operate on.
The termination condition is the list being empty. If it is, we return the initial value provided.
The update is to call the function again. This time the initial value is updated, and the list is the tail of the original.
Consider summing a list of integers.
foldl op+ 0 [1,2,3,4]
foldl op+ 1 [2,3,4]
foldl op+ 3 [3,4]
foldl op+ 6 [4]
foldl op+ 10 []
10
Folds are important to understand because so many fundamental functions can be implemented in terms of foldl or foldr. Think of folding as a means of reducing (many programming languages refer to these functions as "reduce") a list to another value of some type.
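As an illustration of that last point, here is a sketch in OCaml (the language of the main question on this page) rather than SML. OCaml's List.fold_left and List.fold_right play the roles of foldl and foldr, but they take curried functions and a different argument order. The names map_via_fold, sum and min_of are hypothetical.

```ocaml
(* map as a right fold: the accumulator is the already-mapped tail. *)
let map_via_fold f xs =
  List.fold_right (fun x acc -> f x :: acc) xs []

(* Summing as a left fold: the accumulator is the running total. *)
let sum xs = List.fold_left ( + ) 0 xs

(* Minimum as a left fold, with the head as the initial value.
   Returning an option avoids failing on the empty list. *)
let min_of xs =
  match xs with
  | [] -> None
  | x :: rest ->
    Some (List.fold_left (fun best y -> if y < best then y else best) x rest)

let cubes = map_via_fold (fun x -> x * x * x) [1; 2; 3]   (* [1; 8; 27] *)
let total = sum [1; 2; 3; 4]                              (* 10 *)
let smallest = min_of [3; 1; 2]                           (* Some 1 *)
```

In each case the fold supplies the "next element" and the accumulated result to the small function; the function itself never mentions the list.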
map takes a function and a list and produces a new list.
In map (fn x=> x*x*x) L, the function is fn x=> x*x*x, and L is the list.
This list is the same list as cubiclist's parameter.
foldr takes a function, an initial value, and a list and produces some kind of value.
In foldr (fn (a,b) => if (a < b) then a else b) x xs, the function is fn (a,b) => if (a < b) then a else b, the initial value is x, and the list is xs.
x and xs are given to the function by pattern-matching; x is the argument's head and xs is its tail.
(It follows from this that min will fail if it is given an empty list.)

SML: Counting nodes

My assignment is to write a function that will compute the size of a binary tree. This is the implementation of the tree structure:
datatype 'a bin_tree =
    Leaf of 'a
  | Node of 'a bin_tree  (* left tree *)
          * int          (* size of left tree *)
          * int          (* size of right tree *)
          * 'a bin_tree  (* right tree *)
I was given this template from my professor:
fun getSize Empty = 0
  | getSize (Leaf _) = 1
  | getSize (Node(t1,_,t2)) = getSize t1 + getSize t2;
I was wondering if I need to manipulate this to agree with my tree structure in order to get it to work?
The 'a bin_tree type memoizes the size of each sub-tree. So if you're allowed to assume that the size that is stored is correct, you can return the size of a tree without recursion.
The template given by your professor is not for this type, but for another tree type that does not memoize the size. It demonstrates how you can calculate the size for such a tree by pattern matching and recursion, both of which are language features you will also need to use.
So the task is for you to write an entirely different function for the 'a bin_tree type. You have to figure out what the right way to pattern match is. First off, the template for getSize does not add up: There are three cases with three constructors, Empty, Leaf x and Node (L, x, R). But the 'a bin_tree type only has two constructors, Leaf x and Node (L, sizeL, sizeR, R).
So you want to read up on how to perform pattern matching on data types.
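To show the shape of such a pattern match, here is a sketch in OCaml (the language of the main question on this page), not SML. It assumes the stored sizes are correct and that "size" counts leaves, as the professor's template does; the function names size and node are hypothetical.

```ocaml
type 'a bin_tree =
  | Leaf of 'a
  | Node of 'a bin_tree * int * int * 'a bin_tree
  (* left tree, size of left tree, size of right tree, right tree *)

(* No recursion needed: the sizes of both subtrees are stored in the node.
   Assumption: the stored sizes are correct and count leaves only. *)
let size t =
  match t with
  | Leaf _ -> 1
  | Node (_, size_left, size_right, _) -> size_left + size_right

(* Hypothetical smart constructor that keeps the stored sizes consistent. *)
let node l r = Node (l, size l, size r, r)

let example = node (Leaf 'a') (node (Leaf 'b') (Leaf 'c'))
let example_size = size example   (* 3 *)
```

The pattern for Node has four components because the constructor carries four values; the two subtrees are ignored with wildcards.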

Generate infinite list from function results

I have a function that takes an integer and returns a list of integers.
How do I efficiently map this function to an initial integer, then apply the same function to each item of the resulting list that has not been previously mapped, and so essentially generate an infinite list?
E.g.
f :: Int -> [Int]
f 0 = [1,2]++(f 1)++(f 2)
Additionally, I need to be able to index the resulting list up to 10E10. How would this be optimised? memoization?
You want a breadth-first search. The basic idiom goes like this:
bfs :: (a -> [a]) -> [a] -> [a]
bfs f xs = xs ++ bfs f (concatMap f xs)
Notice how we keep the current "state" in the argument xs, output it and then recursively call with a new state which is f applied to each element of the input state.
If you want to filter out elements you have already seen, you need to also pass along some extra state keeping track of which elements you've seen, e.g. a Data.Set, and adjust the algorithm accordingly. I'll leave that bit to you because I'm an irritating pedagogue.
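One possible shape for that adjustment, sketched in OCaml (the language of the main question on this page) rather than Haskell. Laziness is made explicit with the standard Seq type, and a Set of integers records what has been emitted. The names bfs_unique, take and step are hypothetical, and the queue is a plain list for brevity, so this says nothing about reaching very large indices efficiently.

```ocaml
module IntSet = Set.Make (Int)

(* Breadth-first enumeration that emits each value at most once.
   [queue] holds values still to visit; [seen] holds values already emitted. *)
let bfs_unique (f : int -> int list) (start : int list) : int Seq.t =
  let rec go seen queue () =
    match queue with
    | [] -> Seq.Nil
    | x :: rest when IntSet.mem x seen -> go seen rest ()   (* skip repeats *)
    | x :: rest -> Seq.Cons (x, go (IntSet.add x seen) (rest @ f x))
  in
  go IntSet.empty start

(* Force the first n elements of a sequence into a list. *)
let rec take n s =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, tl) -> x :: take (n - 1) tl

(* Made-up successor function for the demonstration. *)
let step n = [n + 1; n + 2]
let first_six = take 6 (bfs_unique step [0])   (* [0; 1; 2; 3; 4; 5] *)
```

Appending to the end of the list queue costs time proportional to its length; a real implementation would likely use a proper queue.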

Find the deepest element of a Binary Tree in SML

This is a homework question.
My question is simple: Write a function btree_deepest of type 'a btree -> 'a list that returns the list of the deepest elements of the tree. If the tree is empty, then deepest should return []. If there are multiple elements of the input tree at the same maximal depth, then deepest should return a list containing those deepest elements, ordered according to a preorder traversal. Your function must use the provided btree_reduce function and must not be recursive.
Here is my code:
(* Binary tree datatype. *)
datatype 'a btree = Leaf | Node of 'a btree * 'a * 'a btree
(* A reduction function. *)
(* btree_reduce : ('b * 'a * 'b -> 'b) -> 'b -> 'a btree -> 'b *)
fun btree_reduce f b bt =
  case bt of
    Leaf => b
  | Node (l, x, r) => f (btree_reduce f b l, x, btree_reduce f b r)
(* btree_size : 'a btree -> int *)
fun btree_size bt =
  btree_reduce (fn (l, _, r) => l + 1 + r) 0 bt
(* btree_height : 'a btree -> int *)
fun btree_height bt =
  btree_reduce (fn (l, n, r) => Int.max (l, r) + 1) 0 bt
I know that I have to create a function to pass to btree_reduce to build the list of deepest elements and that is where I am faltering.
If I were allowed to use recursion then I would just compare the heights of the left and right node then recurse on whichever branch was higher (or recurse on both if they were the same height) then return the current element when the height is zero and throw these elements into a list.
I think I just need a push in the right direction to get started...
Thanks!
Update:
Here is an attempt at a solution that doesn't compile:
fun btree_deepest bt =
  let
    val (returnMe, height) = btree_reduce (fn((left_ele, left_dep),n,(right_ele, right_dep)) =>
        if left_dep = right_dep
        then
          if left_dep = 0
          then ([n], 1)
          else ([left_ele::right_ele], left_dep + 1)
        else
          if left_dep > right_dep
          then (left_ele, left_dep+1)
          else (right_ele, right_dep+1)
      )
      ([], 0) bt
  in
    returnMe
  end
In order to get the elements of maximum depth, you will need to keep track of two things simultaneously for every subtree visited by btree_reduce: The maximum depth of that subtree, and the elements found at that depth. Wrap this information up in some data structure, and you have your type 'b (according to btree_reduce's signature).
Now, when you need to combine two subtree results in the function you provide to btree_reduce, you have three possible cases: "Left" sub-result is "deeper", "less deep", or "of equal depth" to the "right" sub-result. Remember that the sub-result represent the depths and node values of the deepest nodes in each subtree, and think about how to combine them to gain the depth and the values of the deepest nodes for the current tree.
If you need more pointers, I have an implementation of btree_deepest ready which I'm just itching to share; I've not posted it yet since you specifically (and honorably) asked for hints, not the solution.
Took a look at your code; it looks like there is some confusion about whether the X_ele values are single elements or lists, which causes the type error. Try using the "@" operator (list append) in your first 'else' branch above:
if left_dep = 0
then ([n], 1)
else (left_ele @ right_ele, left_dep + 1)
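Putting the hints together, one way the whole function could look is sketched below in OCaml (the language of the main question on this page) rather than SML. The accumulator is a pair of the deepest elements and their depth, as suggested above; the helper name combine and the example trees are made up for illustration.

```ocaml
type 'a btree = Leaf | Node of 'a btree * 'a * 'a btree

let rec btree_reduce f b bt =
  match bt with
  | Leaf -> b
  | Node (l, x, r) -> f (btree_reduce f b l, x, btree_reduce f b r)

(* Each sub-result is (deepest elements, depth of the subtree).
   btree_deepest itself is not recursive; all recursion is in btree_reduce. *)
let btree_deepest bt =
  let combine ((left_ele, left_dep), x, (right_ele, right_dep)) =
    if left_dep = 0 && right_dep = 0 then ([x], 1)      (* x has no children *)
    else if left_dep = right_dep then (left_ele @ right_ele, left_dep + 1)
    else if left_dep > right_dep then (left_ele, left_dep + 1)
    else (right_ele, right_dep + 1)
  in
  fst (btree_reduce combine ([], 0) bt)

let leaf x = Node (Leaf, x, Leaf)
let balanced = Node (leaf 2, 1, leaf 3)
let lopsided = Node (leaf 2, 1, Node (leaf 4, 3, Leaf))
let deepest_balanced = btree_deepest balanced   (* [2; 3] *)
let deepest_lopsided = btree_deepest lopsided   (* [4] *)
```

Appending the left elements before the right ones keeps the result in preorder, since within one depth a preorder traversal visits the left subtree first.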

binary search tree for finding more than one object

I've just read about binary search trees from the "Learn You a Haskell" book, and I'm wondering whether it is effective to search more than one element using this tree? For example, suppose I have a bunch of objects where every object has some index, and
5
/ \
3 7
/ \ / \
1 4 6 8
if I need to find an element by index 8, I need to do only three steps 5 -> 7 -> 8, instead of iterating over the whole list until the end. But what if I need to find several objects, say 1, 4, 6, 8? It seems like I'd need to repeat the same action for each element 5-> 3 -> 1 5 -> 3 -> 4, 5 -> 7 -> 6 and 5 -> 7 -> 8.
So my question is: does it still make sense to use binary search tree for finding more than one element? Could it be better than checking each element for condition (which leads only to O(n) in the worst case)?
Also, what kind of data structure is better to use if I need to check more than one attribute. E.g. in the example above, I was looking only for the id attribute, but what if I also need to search by name, or color, etc?
You can share some of the work. See members, which takes in a list of values and outputs a list of exactly those values of the input list that are in the tree. Note: The order of the input list is not preserved in the output list.
EDIT: I'm actually not sure if you can get better performance (from a theoretical standpoint) with members over doing map member. I think you could if the input list is sorted, since splitting the list in three (lts, eqs, gts) could then be done easily.
data BinTree a
  = Branch (BinTree a) a (BinTree a)
  | Leaf
  deriving (Show, Eq, Ord)

empty :: BinTree a
empty = Leaf

singleton :: a -> BinTree a
singleton x = Branch Leaf x Leaf

add :: (Ord a) => a -> BinTree a -> BinTree a
add x Leaf = singleton x
add x tree@(Branch left y right) = case compare x y of
  EQ -> tree
  LT -> Branch (add x left) y right
  GT -> Branch left y (add x right)

member :: (Ord a) => a -> BinTree a -> Bool
member x Leaf = False
member x (Branch left y right) = case compare x y of
  EQ -> True
  LT -> member x left
  GT -> member x right

members :: (Ord a) => [a] -> BinTree a -> [a]
members xs Leaf = []
members xs (Branch left y right) = eqs ++ members lts left ++ members gts right
  where
    comps = map (\x -> (compare x y, x)) xs
    grab ordering = map snd . filter ((ordering ==) . fst)
    eqs = grab EQ comps
    lts = grab LT comps
    gts = grab GT comps
A quite acceptable solution when searching for multiple elements is to search for them one at a time with the most efficient algorithm (which is O(log n) in your case). However, it can be quite advantageous to step through the entire tree and pool all the elements that match a certain condition, it really depends on where and how often you search inside your code. If you only search at one point in your code it would make sense to collect all the elements in the tree in one shot instead of searching for them one by one. If you decide to opt for that solution then you could feasibly use other data structures such as a list.
If you need to check for multiple attributes I suggest replacing "id" with a tuple containing all the different possible identifiers (id, color, ...). You can then unpack the tuple and compare whichever identifiers you want.
Assuming your binary tree is balanced, if you have a constant number k of search items, then k searches with a total time of O(k * log(n)) is still better than a single O(n) scan, where at each element you still have to do k comparisons, making it O(k*n). Even if the list of search items is sorted, and you can binary search in O(log(k)) time to see if your current item is a match, you're still at O(n * log(k)), which is worse than the tree unless k is Theta(n).
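To put rough numbers on this, take a hypothetical balanced tree with n = 1,000,000 elements and k = 4 search items (figures chosen only for illustration, counting comparisons and ignoring constant factors):

```latex
k \log_2 n = 4 \cdot \log_2 10^{6} \approx 4 \cdot 19.9 \approx 80
\qquad \text{versus} \qquad
n = 10^{6} \ \text{elements visited by one full scan}
```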
No.
A single search is O(log n). 4 searches is O(4 log n). A linear search, which would pick up all items, is O(n). The tree structure of a btree means finding more than one datum requires a walk (which is actually worse than a list walk).

Resources