The right way to use a data structure in OCaml - data-structures

Ok, I have written a binary search tree in OCaml.
type 'a bstree =
|Node of 'a * 'a bstree * 'a bstree
|Leaf
let rec insert x = function
|Leaf -> Node (x, Leaf, Leaf)
|Node (y, left, right) as node ->
if x < y then
Node (y, insert x left, right)
else if x > y then
Node (y, left, insert x right)
else
node
I guess the above code does not have problems.
When using it, I write
let root = insert 4 Leaf
let root = insert 5 root
...
Is this the correct way to use/insert to the tree?
I mean, I guess I shouldn't declare the root and every time I again change the variable root's value, right?
If so, how can I always keep a root and can insert a value into the tree at any time?

This looks like good functional code for inserting into a tree. It doesn't mutate the tree during insertion, but instead it creates a new tree containing the value. The basic idea of immutable data is that you don't "keep" things. You calculate values and pass them along to new functions. For example, here's a function that creates a tree from a list:
let tree_of_list l = List.fold_right insert l Leaf
It works by passing the current tree along to each new call to insert.
It's worth learning to think this way, as many of the benefits of FP derive from the use of immutable data. However, OCaml is a mixed-paradigm language. If you want to, you can use a reference (or mutable record field) to "keep" a tree as it changes value, just as in ordinary imperative programming.
Edit:
You might think the following session shows a modification of a variable x:
# let x = 2;;
val x : int = 2
# let x = 3;;
val x : int = 3
#
However, the way to look at this is that these are two different values that happen to both be named x. Because the names are the same, the old value of x is hidden. But if you had another way to access the old value, it would still be there. Maybe the following will show how things work:
# let x = 2;;
val x : int = 2
# let f () = x + 5;;
val f : unit -> int = <fun>
# f ();;
- : int = 7
# let x = 8;;
val x : int = 8
# f ();;
- : int = 7
#
Creating a new thing named x with the value 8 doesn't affect what f does. It's still using the same old x that existed when it was defined.
Edit 2:
Removing a value from a tree immutably is analogous to adding a value. I.e., you don't actually modify an existing tree. You create a new tree without the value that you don't want. Just as inserting doesn't copy the whole tree (it re-uses large parts of the previous tree), so deleting won't copy the whole tree either. Any parts of the tree that aren't changed can be re-used in the new tree.
Edit 3
Here's some code to remove a value from a tree. It uses a helper function that adjoins two trees that are known to be disjoint (furthermore all values in a are less than all values in b):
let rec adjoin a b =
match a, b with
| Leaf, _ -> b
| _, Leaf -> a
| Node (v, al, ar), _ -> Node (v, al, adjoin ar b)
let rec delete x = function
| Leaf -> Leaf
| Node (v, l, r) ->
if x = v then adjoin l r
else if x < v then Node (v, delete x l, r)
else Node (v, l, delete x r)
(Hope I didn't just spoil your homework!)

Related

Insert children from a list to the the haskell rose tree leaves

I have the following rose tree in haskell:
data R = B [R] -- B for Branch
I am trying to create the following function :
append :: R -> [R] -> R
This functions is supposed to take as input a rose tree (R) and a list of a rose trees [R] (assuming that number of leaves of R is equal to the number of elements from the list so each leaf gets a new branch) and add each element from [R] to the R and then return the final rose trees.
An example would be
append B[B[B[],B[]],B[],B[]] [ B[B[],B[]], B[], B[B[]], B[] ] =
B[B[B[B[B[],B[]]],B[B[]]],B[B[B[]]],B[B[]]]
The list in this case consist of 4 elements specifically :
B[B[],B[]]
B[]
B[B[]]
B[]
And each one got inserted accordingly
I managed to partially implement the solution:
append (B []) (l:list) = l
append (B [x]) list = append x list
However these steps just add the first element from the list to each leaf! So how could I implement this functionality so I can move around the list?

Having troubles implementing Dijkstra algorithm in OCaml

I'm trying to implement Dijkstra's algorithm in OCaml, this is the pseudo-code I'm following :
So far, I've written those functions, putting everything in it would be too long and maybe not necessary at this point. But I'll give their type.
(* init_dist build the distances (distance Map) between the nodes given a nodes list. *)
let init_dist nodes source
val init_dist : node list -> node -> float DistMap.t = <fun>
(* min_elt_and_key return a (node,float) tuple, by finding the minimum key and returning its key with it. *)
let min_elt_and_key map ~compare_element
val min_elt_and_key : 'a DistMap.t -> compare_element:('a -> 'a -> int) -> node * 'a = <fun>
(* update_distances updates dMap and prevMap based on n1 and n2, returns a tuple of Map *)
let update_distances n1 n2 dMap prevMap
(* find_minimum returns the minimum node based on the distance map. *)
let find_minimum nodes_Q first_node map
val find_minimum : node list -> node -> float DistMap.t -> node = <fun>
And here's where I am at.
let dijkstra graph source target =
let nodes = Graph.nodes in
let distanceMap = init_dist nodes source in
let prevMap = DistMap.empty in
let nodes_Q = Graph.nodes in
let rec dijkstra_aux dMap pMap nodes_Q target =
(Graph.nodes has type node list)
My trouble here is that I don't know how I would be able to look for the minimum by going through nodes_Q, remove it, and keep going. Loops have unit type in OCaml so doing imperative programming here would be quit hard, plus the end goal is to return a tuple of map (dist and prev). Plus another problem would be raised, how could I code something for the for loop in the while ? I basically wrote every function necessary to write the algorithm, but assembling them isn't an easy task.
I've seen this link : http://rosettacode.org/wiki/Dijkstra%27s_algorithm#OCaml
However, I've only started coding in OCaml a few months back and my data structures seems fairly different than the ones I have.
What would you recommend me to achieve that?
Thanks.
The way to think about an imperative loop in a functional language is to re-imagine the body of the loop as a function. Any values manipulated in the loop are parameters of the function. To do the next iteration of the loop, the function calls itself recursively with new values of the parameters.
The values manipulated in your case are Q, dist, and prev. So you can imagine the final loop in your pseudo-code as something like this:
let rec dijkstra_aux q dist prev =
if empty q then
(dist, prev)
else
let min_elt = smallest_element_of q in
let q' = remove_element q min_elt in
let (dist', prev') = update_neighbors q dist prev in
dijkstra_aux q' dist' prev'

Leftist heap two version create implementation

Recently, I am reading the book Purely-functional-data-structures
when I came to “Exercise 3.2 Define insert directly rather than via a call to merge” for Leftist_tree。I implement a my version insert.
let rec insert x t =
try
match t with
| E -> T (1, x, E, E)
| T (_, y, left, right ) ->
match (Elem.compare x y) with
| n when n < 0 -> makeT x left (insert y right)
| 0 -> raise Same_elem
| _ -> makeT y left (insert x right)
with
Same_elem -> t
And for verifying if it works, I test it and the merge function offered by the book.
let rec merge m n = match (m, n) with
| (h, E) -> h
| (E, h) -> h
| (T (_, x, a1, b1) as h1, (T (_, y, a2, b2) as h2)) ->
if (Elem.compare x y) < 0
then makeT x a1 (merge b1 h2)
else makeT y a2 (merge b2 h1)
Then I found an interesting thing.
I used a list ["a";"b";"d";"g";"z";"e";"c"] as input to create this tree. And the two results are different.
For merge method I got a tree like this:
and insert method I implemented give me a tree like this :
I think there's some details between the two methods even though I follow the implementation of 'merge' to design the 'insert' version. But then I tried a list inverse ["c";"e";"z";"g";"d";"b";"a"] which gave me two leftist-tree-by-insert tree. That really confused me so much that I don't know if my insert method is wrong or right. So now I have two questions:
if my insert method is wrong?
are leftist-tree-by-merge and leftist-tree-by-insert the same structure? I mean this result give me an illusion like they are equal in one sense.
the whole code
module type Comparable = sig
type t
val compare : t -> t -> int
end
module LeftistHeap(Elem:Comparable) = struct
exception Empty
exception Same_elem
type heap = E | T of int * Elem.t * heap * heap
let rank = function
| E -> 0
| T (r ,_ ,_ ,_ ) -> r
let makeT x a b =
if rank a >= rank b
then T(rank b + 1, x, a, b)
else T(rank a + 1, x, b, a)
let rec merge m n = match (m, n) with
| (h, E) -> h
| (E, h) -> h
| (T (_, x, a1, b1) as h1, (T (_, y, a2, b2) as h2)) ->
if (Elem.compare x y) < 0
then makeT x a1 (merge b1 h2)
else makeT y a2 (merge b2 h1)
let insert_merge x h = merge (T (1, x, E, E)) h
let rec insert x t =
try
match t with
| E -> T (1, x, E, E)
| T (_, y, left, right ) ->
match (Elem.compare x y) with
| n when n < 0 -> makeT x left (insert y right)
| 0 -> raise Same_elem
| _ -> makeT y left (insert x right)
with
Same_elem -> t
let rec creat_l_heap f = function
| [] -> E
| h::t -> (f h (creat_l_heap f t))
let create_merge l = creat_l_heap insert_merge l
let create_insert l = creat_l_heap insert l
end;;
module IntLeftTree = LeftistHeap(String);;
open IntLeftTree;;
let l = ["a";"b";"d";"g";"z";"e";"c"];;
let lh = create_merge `enter code here`l;;
let li = create_insert l;;
let h = ["c";"e";"z";"g";"d";"b";"a"];;
let hh = create_merge h;;
let hi = create_insert h;;
16. Oct. 2015 update
by observing the two implementation more precisely, it is easy to find that the difference consisted in merge a base tree T (1, x, E, E) or insert an element x I used graph which can express more clearly.
So i found that my insert version will always use more complexity to finish his work and doesn't utilize the leftist tree's advantage or it always works in the worse situation, even though this tree structure is exactly “leftist”.
and if I changed a little part , the two code will obtain the same result.
let rec insert x t =
try
match t with
| E -> T (1, x, E, E)
| T (_, y, left, right ) ->
match (Elem.compare x y) with
| n when n < 0 -> makeT x E t
| 0 -> raise Same_elem
| _ -> makeT y left (insert x right)
with
Same_elem -> t
So for my first question: I think the answer is not exact. it can truly construct a leftist tree but always work in the bad situation.
and the second question is a little meaningless (I'm not sure). But it is still interesting for this condition. for instance, even though the merge version works more efficiently but for construct a tree from a list without the need for insert order like I mentioned (["a";"b";"d";"g";"z";"e";"c"], ["c";"e";"z";"g";"d";"b";"a"] , if the order isn't important, for me I think they are the same set.) The merge function can't choose the better solution. (I think the the tree's structure of ["a";"b";"d";"g";"z";"e";"c"] is better than ["c";"e";"z";"g";"d";"b";"a"]'s )
so now my question is :
is the tree structure that each sub-right spine is Empty is a good structure?
if yes, can we always construct it in any input order?
A tree with each sub-right spine empty is just a list. As such a simple list is a better structure for a list. The runtime properties will be the same as a list, meaning inserting for example will take O(n) time instead of the desired O(log n) time.
For a tree you usually want a balanced tree, one where all children of a node are ideally the same size. In your code each node has a rank and the goal would be to have the same rank for the left and right side of each node. If you don't have exactly 2^n - 1 entries in the tree this isn't possible and you have to allow some imbalance in the tree. Usually a difference in rank of 1 or 2 is allowed. Insertion should insert the element on the side with smaller rank and removal has to rebalance any node that exceeds the allowed rank difference. This keeps the tree reasonably balanced, ensuring the desired runtime properties are preserved.
Check your text book what difference in rank is allowed in your case.

Should I avoid constructing in Haskell?

While reading a snipped from Haskell for Great Good I found the following situation:
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
| x == a = Node x left right
| x < a = Node a (treeInsert x left) right
| x > a = Node a left (treeInsert x right)
Wouldn't it be better for performance if we just reused the given Tree when x == a?
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x all#(Node a left right)
| x == a = all
| x < a = Node a (treeInsert x left) right
| otherwise = Node a left (treeInsert x right)
In real life coding, what should I do? Are there any drawbacks when returning the same thing?
Let's look at the core! (Without optimisations here)
$ ghc-7.8.2 -ddump-simpl wtmpf-file13495.hs
The relevant difference is that the first version (without all#(...)) has
case GHC.Classes.> # a_aUH $dOrd_aUV eta_B2 a1_aBQ
of _ [Occ=Dead] {
GHC.Types.False ->
Control.Exception.Base.patError
# (TreeInsert.Tree a_aUH)
"wtmpf-file13495.hs:(9,1)-(13,45)|function treeInsert"#;
GHC.Types.True ->
TreeInsert.Node
# a_aUH
a1_aBQ
left_aBR
(TreeInsert.treeInsert # a_aUH $dOrd_aUV eta_B2 right_aBS)
where reusing the node with that as-pattern does just
TreeInsert.Node
# a_aUI
a1_aBR
left_aBS
(TreeInsert.treeInsert # a_aUI $dOrd_aUW eta_B2 right_aBT);
This is an avoided check that may well make a significant performance difference.
However, this difference has actually nothing to do with the as-pattern. It's just because your first snippet uses a x > a guard, which is not trivial. The second uses otherwise, which is optimised away.
If you change the first snippet to
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
| x == a = Node x left right
| x < a = Node a (treeInsert x left) right
| otherwise = Node a left (treeInsert x right)
then the difference boils down to
GHC.Types.True -> TreeInsert.Node # a_aUH a1_aBQ left_aBR right_aBS
vs
GHC.Types.True -> wild_Xa
Which is indeed just the difference of Node x left right vs all.
...without optimisations, that is. The versions diverge further when I turn on -O2. But I can't really make out how the performance would differ, there.
In real life coding, what should I do? Are there any drawbacks when returning the same thing?
a == b does not guarantee that f a == f b for all functions f. So, you may have to return new object even if they compare equal.
In other words, it may not be feasible to change Node x left right to Node a left right or all when a == x regardless of performance gains.
For example you may have types which carry meta data. When you compare them for equality, you may only care about the values and ignore the meta data. But if you replace them just because they compare equal then you will loose the meta data.
newtype ValMeta a b = ValMeta (a, b) -- value, along with meta data
deriving (Show)
instance Eq a => Eq (ValMeta a b) where
-- equality only compares values, ignores meta data
ValMeta (a, b) == ValMeta (a', b') = a == a'
The point is Eq type-class only says that you may compare values for equality. It does not guarantee anything beyond that.
A real-world example where a == b does not guarantee that f a == f b is when you maintain a Set of unique values within a self-balancing tree. A self-balancing tree (such as Red-Black tree) has some guarantees about the structure of tree but the actual depth and structure depends on the order that the data were added to or removed from the set.
Now when you compare 2 sets for equality, you want to compare that values within the set are equal, not that the underlying trees have the same exact structure. But if you have a function such as depth which exposes the depth of underlying tree maintaining the set then you cannot guarantee that the depths are equal even if the sets compare equal.
Here is a video of great Philip Wadler realizing live and on-stage that many useful relations do not preserve equality (starting at 42min).
Edit: Example from ghc where a == b does not imply f a == f b:
\> import Data.Set
\> let a = fromList [1, 2, 3, 4, 5, 10, 9, 8, 7, 6]
\> let b = fromList [1..10]
\> let f = showTree
\> a == b
True
\> f a == f b
False
Another real-world example is hash-table. Two hash-tables are equal if and only if their key-value pairs tie out. However, the capacity of a hash-table, i.e. the number of keys you may add before having to re-allocate and rehash, depends on the order of inserts/deletes.
So if you have a function which returns the capacity of hash table, it may return different values for hash-tables a and b even though a == b.
My two cents... perhaps not even about the original question:
Instead of writing guards with x < a and x == a, I would match compare a b against LT, EQ and GT, e.g.:
treeInsert x all#(Node a left right) =
case compare x a of
EQ -> ...
LT -> ...
GT -> ...
I would do this especially if x and a can be complex data structures, since a test like x < a could be expensive.
answer seems to be wrong. I just leave it here, for reference...
With your second function you avoid creating a new node, because the compiler cannot really understand equality (== is just some function.) If you change the first version to
-- version C
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
| x == a = Node a left right -- Difference here! Changed x to a.
| x < a = Node a (treeInsert x left) right
| x > a = Node a left (treeInsert x right)
the compiler will probably be able to do common subexpression elimination, because the optimizer will be able to see that Node a left right is the same as Node a left right.
On the other hand, I doubt that the compiler can deduce from a == x that Node a left right is the same as Node x left right.
So, I'm pretty sure that under -O2, version B and version C are the same, but version A is probably slower because it does an extra instantiation in the a == x case.
Well, if the first case had used a instead of x as follows, then there's at least the chance that GHC would eliminate the allocation of a new node through common subexpression elimination.
treeInsert x (Node a left right)
| x == a = Node a left right
However, this is all but irrelevant in any non-trivial use case, because the path down the tree to the node is going to be duplicated even when the element already exists. And this path is going to be significantly longer than a single node unless your use case is trivial.
In the world of ML, the fairly idiomatic way to avoid this is to throw a KeyAlreadyExists exception, and then catch that exception at the top-level insertion function and return the original tree. This would cause the stack to be unwound instead of allocating any of the Nodes on the heap.
A direct implementation of the ML idiom is basically a no-no in Haskell, for good reasons. If avoiding this duplication matters, the simplest and possibly best thing to do is to check if the tree contains the key before you insert it.
The downside of this approach, compared to a direct Haskell insert or the ML idiom, is that it involves two traversals of the path instead of one. Now, here is a non-duplicating, single-pass insert you can implement in Haskell:
treeInsert :: Ord a => a -> Tree a -> Tree a
treeInsert x original_tree = result_tree
where
(result_tree, new_tree) = loop x original_tree
loop x EmptyTree = (new_tree, singleton x)
loop x (Node a left right) =
case compare x a of
LT -> let (res, new_left) = loop x left
in (res, Node a new_left right)
EQ -> (original_tree, error "unreachable")
GT -> let (res, new_right) = loop x right
in (res, Node a left new_right)
However, older versions of GHC (roughly 7-10 years ago) don't handle this sort of recursion through lazy pairs of results very efficiently, and in my experience check-before-insert is likely to perform better. I'd be slightly surprised if this observation has really changed in the context of more recent GHC versions.
One can certainly imagine a function that directly constructs (but does not return) a new path for the tree, and decides to return the new path or the original path once it's known whether the element exists already. (The new path would immediately become garbage if it is not returned.) This conforms to the basic principles of the GHC runtime, but isn't really expressible in the source language.
Of course, any completely non-duplicating insertion function on a lazy data structure is going to have different strictness properties than a simple, duplicating insert. So no matter the implementation technique, they are different functions if laziness matters.
But of course, whether or not the path is duplicated may not matter that much. The cases where it would matter the most would be when you are using the tree persistently, because in linear use cases the old path would become garbage immediately after each insertion. And of course, this only matters when you are inserting a significant number of duplicates.

Can I insert data unsorted in Red-black tree?

While I'm still struggling to find a solution for this question, i have another one which maybe is easier. The following is the insert function of Okasaki red-black tree implementation. What I want to do is to keep the data unsorted as i insert into the tree. So the data always go to the leftmost/bottom-most leaf everytime i insert. There is no need to compare for x < y, x > y or x == y. It seems pretty straightforward at first by just removing these guards and only do: ins s#(T color a y b) = balance color (ins a) y b. The behaviour seems to be that the tree is kept balanced but the coloring becomes a bit messed up. And eventually that affects future inserts.. Any idea how this can be achieved? I think this could possibility my first step to my previous question. I just started playing with Haskell, so I am not getting it right straightforward. Thanks a lot.
insertSet x s = T B a y b
where ins E = T R E x E
ins s#(T color a y b) =
if x < y then balance color (ins a) y b
else if x > y then balance color a y (ins b)
else s
['d','a','s','f'] s
/\
a f
/
d (unsorted tree)
you can use my RBTree implementation in haskellDB,
http://hackage.haskell.org/package/RBTree
using the insert function:
insert :: (a -> a -> Ordering) -> RBTree a -> a -> RBTree a
feed it a (\_ _ -> LT) function, then you can always put new element into left-most place.

Resources