What is the Zipper data structure and should I be using it? - data-structures

The question is simple: I cannot understand the Zipper data structure.
My question is about using it with a tree.
I want to understand how I can change a tree node using a zipper, and how to do so without copying the whole tree (or most of it).
Please tell me if I have misunderstood the zipper. Maybe it cannot help with updating a tree?
Or maybe it is possible to update the tree and I just cannot see how?

Let's start with the zipper analog for lists. If you'd like to modify the nth element of an immutable list, it takes O(n) because you have to copy the first n-1 elements. Instead, you can keep the list as a structure ((first n-1 elements reversed) nth element (remaining elements)). For example, the list (1 2 3 4 5 6) modifiable at 3 would be represented as ((2 1) 3 (4 5 6)). Now you can easily change the 3 to something else. You can also easily move the focus left ((1) 2 (3 4 5 6)) and right ((3 2 1) 4 (5 6)).
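As a rough illustration of the list case, here is a minimal Python sketch. It assumes the list is stored as immutable cons cells, written as (head, tail) tuples with None for the empty list; the function names (from_list, set_focus, move_left, move_right, to_list) are made up for this example and do not come from any library.

```python
# Illustrative list zipper over immutable cons cells: (head, tail) or None.
# All names below are hypothetical, chosen only for this sketch.

def to_cons(items):
    """Turn a Python sequence into a cons list, preserving order."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell

def from_list(items, n):
    """Build the zipper (reversed prefix, focus, suffix) focused on index n."""
    return (to_cons(list(reversed(items[:n]))), items[n], to_cons(items[n + 1:]))

def set_focus(zipper, value):
    """Replace the focused element; both sides are shared, so this is O(1)."""
    left, _, right = zipper
    return (left, value, right)

def move_left(zipper):
    """Shift the focus one step left, or return None at the left end."""
    left, focus, right = zipper
    if left is None:
        return None
    head, rest = left
    return (rest, head, (focus, right))

def move_right(zipper):
    """Shift the focus one step right, or return None at the right end."""
    left, focus, right = zipper
    if right is None:
        return None
    head, rest = right
    return ((focus, left), head, rest)

def to_list(zipper):
    """Rebuild an ordinary Python list from the zipper."""
    left, focus, right = zipper
    out = [focus]
    while left is not None:      # collects focus, then the prefix nearest-first
        out.append(left[0])
        left = left[1]
    out.reverse()                # prefix in original order, focus last
    while right is not None:
        out.append(right[0])
        right = right[1]
    return out
```

With from_list([1, 2, 3, 4, 5, 6], 2) the focus is 3, and set_focus returns a new zipper whose left and right parts are the very same objects as before, which is the point of the representation.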
A zipper is the same idea applied to trees. You represent a certain focus in the tree plus a context (up to parents, down to children) which gives you the whole tree in a form where it's easily modifiable at the focus and it's easy to move the focus up and down.
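To make the tree case concrete, here is a small Python sketch for binary trees. It assumes a tree is either None or a tuple (left, value, right), and that the context is a cons list of "crumbs" recording which way we went and what was left behind. The names down_left, down_right, up, replace and root are hypothetical. Only the nodes on the path from the focus to the root are rebuilt; every other subtree is shared with the original tree.

```python
# Illustrative binary-tree zipper. A tree is None or (left, value, right).
# A zipper is (focused_subtree, path); path is None at the root, otherwise
# ((side, parent_value, other_subtree), rest_of_path).
# These helpers assume the focus is a node (not None) when moving down.

def down_left(zipper):
    """Focus on the left child, remembering the parent value and right sibling."""
    (left, value, right), path = zipper
    return (left, (("L", value, right), path))

def down_right(zipper):
    """Focus on the right child, remembering the parent value and left sibling."""
    (left, value, right), path = zipper
    return (right, (("R", value, left), path))

def up(zipper):
    """Rebuild the parent node from the focus and the most recent crumb."""
    subtree, path = zipper
    (side, value, other), rest = path
    if side == "L":
        return ((subtree, value, other), rest)
    return ((other, value, subtree), rest)

def replace(zipper, subtree):
    """Swap the focused subtree for another one in O(1)."""
    return (subtree, zipper[1])

def root(zipper):
    """Walk back up to the root and return the whole (new) tree."""
    while zipper[1] is not None:
        zipper = up(zipper)
    return zipper[0]
```

For example, replacing the node 3 in ((None, 1, None), 2, ((None, 3, None), 4, None)) rebuilds only the nodes 4 and 2; the subtree holding 1 is reused as is, and the original tree stays untouched.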

Here is a very nice article explaining using the zipper for a tiling window manager in Haskell. The Wikipedia article is not a good reference.
In short, the zipper is a pointer or handle to a particular node in a tree or list structure. The zipper gives a natural way of taking a tree structure and treating it as though the tree was "picked up" by the focused node - in effect, you get a second tree without requiring additional copies made of the original tree or affecting other users of the tree.
The example given shows how you have the windows originally sorted by location on the screen, and then to model focus you use a zipper pointed at the focus window. You get a nice set of O(1) operations such as insert and delete without having to special case the focus window or write additional code.

Learn You a Haskell also has a great chapter about zippers.

The code focuses on a cell, as the picture shows. There are areas above, below, to the left, and to the right of it. We move over this grid; the focus is the green square.
Basic OCaml
    type cell = { alive : bool ; column : int ; row : int }
    ;;
    type grid = { gamegrid : cell list list }
    ;;
    type gridzipper =
      { above : grid
      ; below : grid
      ; left : cell list
      ; right : cell list
      ; focus : cell }
    ;;
    (* Move the focus one cell to the left; None if there is nothing there. *)
    let left g =
      match g.left with
      | [] -> None
      | hd :: tl ->
        (* The old focus is appended to the right-hand list with '@'. *)
        let newgridzipper =
          { g with focus = hd; left = tl; right = g.right @ [g.focus] } in
        Some newgridzipper
    ;;
The left function moves the focus to the left. Similarly the other functions shift the focus to other grid cells.

Related

How to determine if two binary trees are equal or different

The picture below shows the different binary trees that can be made with 3 nodes.
But why is the following case not included among them?
Is it the same as the third case from the left in the picture above? If so, I think the parent-child relationship would be different.
I'd appreciate it if you could tell me how to determine whether two binary trees are equal or different.
You mean 3 nodes. Your case is not included because the cases all use A as the root node, so it is easier to demonstrate the different possible combinations using always the same elements in the same order, i.e. A as root, then B -> C on top and symmetrically C -> B at the bottom.
Using B or C as the root, you could achieve the same number of variations. This number can be computed using the following recurrence, where F(n) is the number of binary tree shapes with n nodes:
F(n) = F(0)*F(n-1) + F(1)*F(n-2) + ... + F(n-1)*F(0)
In this case, n=3, so F(3) = F(0)*F(2) + F(1)*F(1) + F(2)*F(0) = 2 + 1 + 2 = 5, using:
F(0) = 1 (empty is considered one variation)
F(1) = 1
F(2) = 2
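The arithmetic above can be checked with a few lines of Python. This is only a sketch of the sum F(0)*F(n-1) + ... + F(n-1)*F(0) with F(0) = 1; the function name count_binary_trees is chosen for illustration.

```python
def count_binary_trees(n):
    """Number of binary tree shapes with n nodes, via F(k) = sum F(i) * F(k-1-i)."""
    f = [1] + [0] * n            # F(0) = 1: the empty tree counts as one shape
    for k in range(1, n + 1):
        # i nodes go to the left subtree, k - 1 - i to the right, one is the root
        f[k] = sum(f[i] * f[k - 1 - i] for i in range(k))
    return f[n]
```

For n = 3 this evaluates 1*2 + 1*1 + 2*1, the same sum as in the text.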
So in your picture, only one row is actually relevant if that is supposed to be a binary search tree. Also note that there is a difference between a binary tree and a binary search tree. From first article below:
As we know, the BST is an ordered data structure that allows no
duplicate values. However, Binary Tree allows values to be repeated
twice or more. Furthermore, Binary Tree is unordered.
References
https://www.baeldung.com/cs/calculate-number-different-bst
https://en.wikipedia.org/wiki/Binary_tree#Using_graph_theory_concepts
https://encyclopediaofmath.org/wiki/Binary_tree

Binary tree without pointers

Below is a representation of a binary tree that I use in my project. In the bottom are the leaf nodes (orange boxes), and every level is the sum of the children below.
So, 3 on the leftmost node is the sum of 1 and 2 (its left and right children), and 10 is the sum of 3 and 7 (again its left and right children).
What I am trying to do is store this tree in a flat array without using any pointers. So this array is basically an integer array, holding 2n-1 nodes (n is the number of leaf nodes).
So the index of the root element is 0 (let's call it p), the index of its left child is 2p+1, and the index of its right child is 2p+2. Please see Binary Tree (Array implementation)
Everything works nicely if I know the number of leaf values beforehand, but I can't seem to find a way to store this tree in a dynamically expanding array.
If I need to add 9, for example, as the 9th leaf, the structure needs to change and I need to recalculate all the indices again, which I want to avoid because there may be hundreds of thousands of elements in the array at any time.
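For reference, the fixed-size layout described above could be sketched in Python as follows. This assumes the number of leaves is a power of two, so that the leaves occupy the last n slots of the array; build_sum_tree is a hypothetical name used only here.

```python
def build_sum_tree(leaves):
    """Store a complete binary sum tree in a flat list of 2n-1 integers.

    Assumes len(leaves) is a power of two. The root is at index 0 and the
    children of index p are at 2p+1 and 2p+2, so the leaves end up at
    indices n-1 .. 2n-2.
    """
    n = len(leaves)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch only handles a power-of-two leaf count")
    tree = [0] * (2 * n - 1)
    tree[n - 1:] = leaves
    # Fill the internal nodes bottom-up: each is the sum of its two children.
    for p in range(n - 2, -1, -1):
        tree[p] = tree[2 * p + 1] + tree[2 * p + 2]
    return tree
```

With the leaves 1 to 8, index 3 holds 3 (1 + 2), index 1 holds 10 (3 + 7) and index 0 holds the total.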
Does anyone know of an implementation that handles dynamic arrays with this implementation?
EDIT:
Below is the demonstration of what happens when I add new elements to the array. 36 was the root before, now it's a second level element and the new root array[0] is 114, which triggers a new layout.

Which element is the 'middle' in a B-Tree of even order?

If I have a B-Tree of order 4 with the following data in it...
and I need to add 2 to the tree; do I...
add the 2 to the node (making it invalid, as it now has 4 keys), then split the node, taking the value 2 as the middle value and propagating it up
OR
do I not add the 2, take 3 as the middle value, propagate 3 up, then add 2 into the correct node?
Excuse the poor diagram.
You perform the first option. For a B-tree of any order you always add the key first, then perform splits that propagate upwards. For a great interactive demonstration of a variety of basic operations (insert, delete, search) on data structures, there is a useful algorithm visualization page I go to, located here. Find the B-tree page and you will see that it performs option 1.
How to find which element to push upward:
1) Insert the element at its proper position in the B-tree and check whether overflow occurs.
If it does, follow steps 2 and 3 given below.
2) Find CEILING((order of B-tree + 1) / 2).
3) Move the element at that position (counting from 1) upward, giving it two pointers to the left and right subtrees.
Note: first insert the element, then follow steps 2 and 3 only if overflow occurs.
In this example, first insert 2.
The affected leaf of the tree becomes | 1 | 2 | 3 | 5 |.
Overflow occurs because a node in a B-tree of order 4 can hold at most 3 keys.
Find ceiling((4+1)/2) = ceiling(5/2) = 3 (the position).
The value at the 3rd position, 3, is the middle element, so propagate it up. 3's left pointer points to 1 | 2 and its right pointer points to 5.
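The steps above can be sketched in Python, assuming a node's keys are kept in a sorted list and that overflow means the node holds as many keys as the order. The function name insert_and_split is hypothetical, and the sketch handles a single node only, not the propagation into the parent.

```python
import bisect
import math

def insert_and_split(keys, key, order):
    """Insert key into a sorted key list, then split if the node overflows.

    Returns (left_keys, promoted_key, right_keys) when a split happens,
    or (keys, None, None) when the node still fits.
    """
    keys = list(keys)
    bisect.insort(keys, key)                 # step 1: insert first
    if len(keys) < order:                    # at most order - 1 keys fit
        return keys, None, None
    position = math.ceil((order + 1) / 2)    # step 2: 1-based position
    promoted = keys[position - 1]            # step 3: this key moves up
    return keys[:position - 1], promoted, keys[position:]
```

For the example in the text, inserting 2 into the keys 1, 3, 5 with order 4 promotes 3 and leaves 1, 2 on the left and 5 on the right.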

How I can iterate over a binary tree without using pointers or references?

One can iterate over a list without using pointers or references.
In some cases this removes the need for actually having the list.
Consider the following code,
    int i;
    for (i = 0; i < 10; ++i)
        printf("%i ", i % 2);
It directly outputs the list 0 1 0 1 0 1 0 1 0 1 without actually storing the list in memory.
How can one do a similar thing with binary trees?
Please show a way to implement tree_iterator, new_root_tree_iterator, and traverse_tree_iterator for the following code, where upper_boundary_of_tree is something analogous to the number 10 in the list example, and which I don't know how to define.
People have been having trouble with what I mean by upper_boundary_of_tree. The upper boundary of the tree would not be represented by the number 10. I don't know how to represent some upper boundary of a tree. This is part of the question. The upper boundary of the tree is similar to how the number 10 is used in the list code above, as in it does the same function, marking where to stop iterating but, it is very definitely not the same thing.
If you need to you can have a free_tree_iterator function as well.
    tree_iterator i = new_root_tree_iterator();
    while (traverse_tree_iterator(&i, upper_boundary_of_tree))
        foobar(i);
This has been bothering me for a while.
You cannot iterate over non-existing data. What is shown in the first example is a loop that prints either 0 or 1 ten times. The idea of iterating through an array (or any other data structure) is to handle each element in some way, e.g., to calculate the sum of the array elements.
In other words, the first example emulates iteration over an array of 10 elements, which are alternating zeros and ones. So, on each iteration you are predicting the value based on its index.
If you want to calculate the indices of the child nodes in a binary tree, you can use the heap layout formula: 2n+1 and 2n+2 give the indices of the left and right children respectively, where n is the 0-based index of the parent. Based on the calculated index, you can emulate the node's value.
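One possible reading of this suggestion, sketched in Python: treat the upper boundary as a node count, and walk the implicit tree by index alone, using 2n+1 and 2n+2. The names preorder_indices and node_count are assumptions for this example, and the explicit stack holds plain integers rather than references to nodes.

```python
def preorder_indices(node_count, root=0):
    """Yield the indices of an implicit heap-shaped binary tree in pre-order.

    No tree is stored: a node exists exactly when its index is below
    node_count, and the children of n are 2n+1 and 2n+2.
    """
    stack = [root]
    while stack:
        n = stack.pop()
        if n >= node_count:          # past the upper boundary: no such node
            continue
        yield n
        stack.append(2 * n + 2)      # pushed first so the left child pops first
        stack.append(2 * n + 1)
```

A value can then be derived from each index, for instance n % 2, in the same way the loop in the question derives values from i.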
To print the notation for a tree structure without constructing a tree, analogously to the way you print a sequence instead of constructing a list, you first have to decide what your notation will look like.
Can you write down an example?
For instance, a Lisp-like notation?
((1 2) (3 4)) ;; that's a tree
Also, given an upper bound of 10, meaning there will be ten numbered leaf nodes, which of the many possible binary trees are you supposed to print?
A straight list is a degenerate binary tree, so actually your simple for loop basically satisfies the homework problem.

Are duplicate keys allowed in the definition of binary search trees?

I'm trying to find the definition of a binary search tree and I keep finding different definitions everywhere.
Some say that for any given subtree the left child key is less than or equal to the root.
Some say that for any given subtree the right child key is greater than or equal to the root.
And my old college data structures book says "every element has a key and no two elements have the same key."
Is there a universal definition of a bst? Particularly in regards to what to do with trees with multiple instances of the same key.
EDIT: Maybe I was unclear, the definitions I'm seeing are
1) left <= root < right
2) left < root <= right
3) left < root < right, such that no duplicate keys exist.
Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)
Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.
It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.
You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.
So go with what your data structures book said!
Edit:
A universal definition of a binary search tree involves storing and searching for a key by traversing a data structure in one of two directions. In the pragmatic sense, that means that if the value is not equal (<>) to the current key, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.
This is different from BSP, or binary space partitioning, but not all that different. The search algorithm has one of two directions for 'travel', or it is done (successfully or not). So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search).
If your binary search tree is a red-black tree, or you intend to do any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:
left < root <= right
Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.
I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!
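The rotation problem described above can be reproduced in a few lines of Python. This sketch assumes a tree is None or a tuple (left, key, right); rotate_left and holds are hypothetical helper names, and holds checks the rule left < root <= right at every node.

```python
def rotate_left(node):
    """Standard left rotation: the right child becomes the new root."""
    left, key, right = node
    r_left, r_key, r_right = right
    return ((left, key, r_left), r_key, r_right)

def holds(node, lo=float("-inf"), hi=float("inf")):
    """Check left < root <= right, i.e. every key satisfies lo <= key < hi."""
    if node is None:
        return True
    left, key, right = node
    return (lo <= key < hi
            and holds(left, lo, key)      # left subtree: strictly smaller
            and holds(right, key, hi))    # right subtree: greater or equal

before = (None, 5, (None, 5, None))       # root 5 with a duplicate on the right
after = rotate_left(before)               # the duplicate's twin is now on the left
```

Here holds(before) is true, while holds(after) is false, because the rotation moved a key equal to the root into the left subtree.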
All three definitions are acceptable and correct. They define different variations of a BST.
Your college data structures book failed to clarify that its definition was not the only possible one.
Certainly, allowing duplicates adds complexity. If you use the definition "left <= root < right" and you have a tree like:
      3
     / \
    2   4
then adding a duplicate "3" key to this tree will result in:
      3
     / \
    2   4
     \
      3
Note that the duplicates are not in contiguous levels.
This is a big issue when allowing duplicates in a BST representation like the one above: duplicates may be separated by any number of levels, so checking for a duplicate's existence is not as simple as just checking the immediate children of a node.
An option to avoid this issue is to not represent duplicates structurally (as separate nodes) but instead use a counter that counts the number of occurrences of the key. The previous example would then have a tree like:
        3(1)
       /    \
    2(1)    4(1)
and after insertion of the duplicate "3" key it will become:
        3(2)
       /    \
    2(1)    4(1)
This simplifies lookup, removal and insertion operations, at the expense of some extra bytes and counter operations.
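A minimal Python sketch of the counter idea follows. It assumes an unbalanced BST and uses the hypothetical names CountingNode, insert and count; a real implementation would also need removal and, usually, rebalancing.

```python
class CountingNode:
    """BST node that stores a key once, plus how many times it was inserted."""
    def __init__(self, key):
        self.key = key
        self.count = 1
        self.left = None
        self.right = None

def insert(node, key):
    """Insert key; a duplicate only bumps the counter of the existing node."""
    if node is None:
        return CountingNode(key)
    if key == node.key:
        node.count += 1
    elif key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def count(node, key):
    """Return how many times key occurs (0 if it is absent)."""
    while node is not None:
        if key == node.key:
            return node.count
        node = node.left if key < node.key else node.right
    return 0
```

Inserting 3, 2, 4 and then 3 again leaves the tree with three nodes, and the root's counter goes from 1 to 2, as in the diagrams above.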
In a BST, all values descending on the left side of a node are less than (or equal to, see later) the node itself. Similarly, all values descending on the right side of a node are greater than (or equal to) that node value(a).
Some BSTs may choose to allow duplicate values, hence the "or equal to" qualifiers above. The following example may clarify:
         14
        /  \
      13    22
     /     /  \
    1    16    29
              /  \
            28    29
This shows a BST that allows duplicates(b) - you can see that to find a value, you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value.
This can be done recursively with something like:
    def hasVal(node, srchval):
        # An empty subtree cannot contain the value.
        if node is None:
            return False
        if node.val == srchval:
            return True
        # Smaller values are on the left, everything else on the right.
        if node.val > srchval:
            return hasVal(node.left, srchval)
        return hasVal(node.right, srchval)
and calling it with:
    foundIt = hasVal(rootNode, valToLookFor)
Duplicates add a little complexity since you may need to keep searching once you've found your value, for other nodes of the same value. Obviously that doesn't matter for hasVal since it doesn't matter how many there are, just whether at least one exists. It will however matter for things like countVal, since it needs to know how many there are.
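As a sketch of what countVal might look like in Python, assuming nodes with val, left and right attributes and a tree where equal values may sit on either side of a match (the "or equal to" case described above): after a match, both subtrees still have to be searched. The Node class here is a stand-in defined only for this example.

```python
class Node:
    """Minimal stand-in node type for this sketch."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def countVal(node, srchval):
    """Count the nodes whose value equals srchval."""
    if node is None:
        return 0
    if node.val == srchval:
        # Unlike hasVal, keep going: duplicates may be on either side.
        return 1 + countVal(node.left, srchval) + countVal(node.right, srchval)
    if node.val > srchval:
        return countVal(node.left, srchval)
    return countVal(node.right, srchval)

# The example tree from the text: 14 / (13, 22), with two 29s on the right.
rootNode = Node(14,
                Node(13, Node(1)),
                Node(22, Node(16), Node(29, Node(28), Node(29))))
```

Calling countVal(rootNode, 29) visits both 29 nodes, whereas hasVal could have stopped at the first one.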
(a) You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key. A BST need only maintain some sorted order, whether that's ascending or descending (or even some weird multi-layer-sort method like all odd numbers ascending, then all even numbers descending) is not relevant.
(b) Interestingly, if your sorting key uses the entire value stored at a node (so that nodes containing the same key have no other extra information to distinguish them), there can be performance gains from adding a count to each node, rather than allowing duplicate nodes.
The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node (an action that may require re-balancing the tree).
So, to add an item, you first check if it already exists. If so, just increment the count and exit. If not, you need to insert a new node with a count of one then rebalance.
To remove an item, you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance.
Searches are also quicker given there are fewer nodes but that may not be a large impact.
For example, the following two trees (non-counting on the left, and counting on the right) would be equivalent (in the counting tree, i.c means c copies of item i):
         __14__                    ___22.2___
        /      \                  /          \
      14        22             7.1            29.1
     /  \      /  \           /   \          /    \
    1    14  22    29      1.1     14.3  28.1      30.1
     \            /  \
      7         28    30
Removing the leaf-node 22 from the left tree would involve rebalancing (since it now has a height differential of two) the resulting 22-29-28-30 subtree such as below (this is one option, there are others that also satisfy the "height differential must be zero or one" rule):
    \                      \
     22                     29
       \                   /  \
        29      -->      28    30
       /  \             /
     28    30         22
Doing the same operation on the right tree is a simple modification of the root node from 22.2 to 22.1 (with no rebalancing required).
In the book "Introduction to Algorithms", third edition, by Cormen, Leiserson, Rivest and Stein, a binary search tree (BST) is explicitly defined as allowing duplicates. This can be seen in figure 12.1 and the following (page 287):
"The keys in a binary search tree are always stored in such a way as to satisfy the binary-search-tree property: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y.key <= x.key. If y is a node in the right subtree of x, then y.key >= x.key."
In addition, a red-black tree is then defined on page 308 as:
"A red-black tree is a binary search tree with one extra bit of storage per node: its color"
Therefore, red-black trees defined in this book support duplicates.
Any definition is valid. As long as you are consistent in your implementation (always put equal nodes to the right, always put them to the left, or never allow them) then you're fine. I think it is most common to not allow them, but it is still a BST if they are allowed and placed either left or right.
I just want to add some more information to what @Robert Paulson answered.
Let's assume that a node contains a key and data, so nodes with the same key might contain different data.
(So the search must find all nodes with the same key.)
1. left <= cur < right
2. left < cur <= right
3. left <= cur <= right
4. left < cur < right && cur contains sibling nodes with the same key.
5. left < cur < right, such that no duplicate keys exist.
1 & 2 work fine if the tree does not have any rotation-related functions to prevent skewness.
But this form doesn't work with an AVL tree or a red-black tree, because rotation will break the principle.
And even if search() finds the node with the key, it must traverse down to a leaf node to find the other nodes with the duplicate key,
making the time complexity of search Θ(log N).
3 will work well with any form of BST with rotation-related functions.
But the search will take O(n), ruining the purpose of using a BST.
Say we have the tree below, following principle 3.
             12
           /    \
         10      20
        /  \    /
       9   11  12
          /      \
        10        12
If we do search(12) on this tree, even though we found 12 at the root, we must keep searching both the left and right children to look for the duplicate key.
This takes O(n) time, as I said.
4 is my personal favorite. Let's say a sibling means a node with the same key.
We can change the tree above into the one below.
             12 - 12 - 12
           /    \
    10 - 10      20
        /  \
       9   11
Now any search will take O(log N) because we don't have to traverse the children for the duplicate key.
And this principle also works well with an AVL or RB tree.
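The sibling idea can be sketched in Python by letting each tree node hold a list of all data items that share its key. The names KeyNode, insert, find_all and node_count are made up for this example, and the tree is left unbalanced for brevity.

```python
class KeyNode:
    """One tree node per distinct key; items holds every entry with that key."""
    def __init__(self, key, data):
        self.key = key
        self.items = [data]
        self.left = None
        self.right = None

def insert(node, key, data):
    """Insert (key, data); an existing key just gains another sibling item."""
    if node is None:
        return KeyNode(key, data)
    if key == node.key:
        node.items.append(data)
    elif key < node.key:
        node.left = insert(node.left, key, data)
    else:
        node.right = insert(node.right, key, data)
    return node

def find_all(node, key):
    """Return every item stored under key, following a single root-to-node path."""
    while node is not None:
        if key == node.key:
            return list(node.items)
        node = node.left if key < node.key else node.right
    return []

def node_count(node):
    """Number of tree nodes, i.e. number of distinct keys."""
    if node is None:
        return 0
    return 1 + node_count(node.left) + node_count(node.right)
```

Inserting the keys 12, 10, 20, 9, 11, 12, 10, 12 produces five tree nodes, matching the shape of the diagram above, with three items under 12 and two under 10.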
Working on a red-black tree implementation I was getting problems validating the tree with multiple keys until I realized that with the red-black insert rotation, you have to loosen the constraint to
left <= root <= right
Since none of the documentation I was looking at allowed for duplicate keys and I didn't want to rewrite the rotation methods to account for it, I just decided to modify my nodes to allow for multiple values within the node, and no duplicate keys in the tree.
Those three things you said are all true.
Keys are unique
To the left are keys less than this one
To the right are keys greater than this one
I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.
1.) left <= root < right
2.) left < root <= right
3.) left < root < right, such that no duplicate keys exist.
I might have to go and dig out my algorithm books, but off the top of my head (3) is the canonical form.
(1) or (2) only come about when you start to allow duplicate nodes and you put the duplicates in the tree itself (rather than having the node contain a list).
Duplicate Keys
• What happens if there's more than one data item with the same key?
  – This presents a slight problem in red-black trees.
  – It's important that nodes with the same key are distributed on both sides of other nodes with the same key.
  – That is, if keys arrive in the order 50, 50, 50,
    • you want the second 50 to go to the right of the first one, and the third 50 to go to the left of the first one.
    • Otherwise, the tree becomes unbalanced.
• This could be handled by some kind of randomizing process in the insertion algorithm.
  – However, the search process then becomes more complicated if all items with the same key must be found.
• It's simpler to outlaw items with the same key.
  – In this discussion we'll assume duplicates aren't allowed
One can create a linked list for each node of the tree that contains duplicate keys and store data in the list.
The element ordering relation <= is a total order, so the relation must be reflexive, but commonly a binary search tree (aka BST) is a tree without duplicates.
Otherwise, if there are duplicates, you need to run the same deletion function two or more times to remove a key completely!
