Can you guys help me with some homework question I'm stuck in?
A local minimum in a full binary tree is defined as a node which is smaller than all of its neighbors (neighbors = parent, left child, right child).
I need to find a local minimum in a given full binary tree, which each of its nodes has a different number, in O(logn) complixity time.
Well, since the requirement is O(logn) then I tried to think of a way to go only through 1 path through the tree down to a leaf.
Or maybe I could look only at a half of the tree each time in a recursion and this way it will do the logn.
So say I have this in the tree:
70
/ \
77 60
There are 3 cases:
1) the root is smaller than both the left and the right child //then i'm done
2) the root is smaller only than the left
3) the root is smaller only than the right
The above tree is case 2.
So let's "throw away" the left subtree because there's no way the 77 can be a "local minimum" since it's greater than its parent.
So we're left with the right subtree. And so on, until we find the local minimum.
Problem here is, when we throw that left subtree, we could miss another local minimum down below. Here's an example:
70
/ \
77 60
/ \ / \
1 8 9 14
/ \ / \ / \ / \
3 4 5 6 2 7 15 13
So in this case, the only local minimum is "1", but we missed it because at the beginning we decided to search through the right subtree of the root.
By definition, a local min is a node whose value is smaller than that of any other nodes that are joined to it. Thus in your example, '1', '5', '6', '2', '7', '13' are all local minimums.
Once that's clear, the problem is simple.
First we check the root and see if it's smaller than both children. If yes, then we are done. If not, then we pick up its smaller child and recursively apply the check.
We terminate either 1) we found a node that is smaller than both of its children, or 2) we reach the bottom level (i.e. the leaves).
In case 1), the node at which we stop is the local min, because i) it's smaller than both of its children, and ii) it's smaller than its parent, which is the precondition of our deciding to check this node.
In case 2), we are left with two leaves (that are siblings), and at least one of them is smaller than the parent (otherwise the parent will be returned). Then, either (or both) of them are local min, as long as it's smaller than its parent.
Following this approach, at most 2 nodes per level are being looked at, thus requiring only O(log n) checks.
Hope this is helpful and clear enough.
"2" is considered as a local minimum if it has no children, if you want to find one within O(logn). Not possible to find "1" within O(logn)
I believe this can be done in O(N).
Start from the root. Check if it has a child that is smaller than it. If it has, go down to that child. If it doesn't have one, we are done.
Then, repeat this process by continuously choosing the child that is smaller than the current node. Check if it is the local min. If not, repeat.
By doing this, we guarantee that the current node's parent is always larger than itself. If the algorithm stops, there can be only two cases:
The current node is a leaf.
The two children of the current node are both larger than it.
In either cases, the current node is a local min.
EDIT:
Miss the "full" binary tree. Should be log(n).
Related
I understand the idea when deleting a node that has two subtrees: I "erase" the node's value and replace it with either its predecessor from the left subtree's value or its successor from the right subtree's value, and then delete that node.
However, does it matter if I choose the successor from the right subtree or the predecessor from the left subtree? Or is either way valid as long as I still have a binary search tree after performing the deletion?
Both ways to perform a delete operation are valid if the node has two children.
Remember that when you get either the in-order predecessor node or the in-order successor node, you must call the delete operation on that node.
It doesn't matter which one you choose to replace. In fact you may need both.
Look at the following BST.
7
/ \
4 10
/ \ /
1 5 8
\
3
To delete 1, you need to replace 1 with right node 3.
And to delete 10, you need to replace 10 with left node 8.
For exa, this is the tree.
10
12 -1
5 1 1 -2
2 3 10 -9
How to find the node with maximum value?
Given the problem as stated, you need to traverse the entire tree. See proof below.
Traversing the entire tree should be a fairly trivial process.
Proof that we need to traverse the entire tree:
Assume we're able to identify which side of a tree the maximum is on without traversing the entire tree.
Given any tree with the maximum node on the left. Call this maximum x.
Pick one of the leaf nodes on the right. Add 2 children to it: x+1 and -x-1.
Since x+1-x-1 = 0, adding these won't change the sum at the leaf we added it to, thus nor the sums at any other nodes in the tree.
Since this can be added to any leaf in the tree, and it doesn't affect the sums, we'd need to traverse the entire tree to find out if this occurs anywhere.
Thus our assumption that we can identify which side of a tree the maximum is on without traversing the entire tree is incorrect.
Thus we need to traverse the entire tree.
In the general case, you need to traverse the entire tree. If the values in the tree are not constrained (e.g. all non-negative, but in your example there are negative values), then the value in a node tells you nothing about the individual values below it.
I am implementing the disjoint-set datastructure to do union find. I came across the following statement in Wikipedia:
... whenever two trees of the same rank r are united, the rank of the result is r+1.
Why should the rank of the joined tree be increased by only one when the trees are of the same rank? What happens if I simply add the two ranks (i.e. 2*r)?
First, what is rank? It is almost the same as the height of a tree. In fact, for now, pretend that it is the same as the height.
We want to keep trees short, so keeping track of the height of every tree helps us do that. When unioning two trees of different height, we make the root of the shorter tree a child of the root of the taller tree. Importantly, this does not change the height of the taller tree. That is, the rank of the taller tree does not change.
However, when unioning two trees of the same height, we make one root the child of the other, and this increases the height of that overall tree by one, so we increase the rank of that root by one.
Now, I said that rank was almost the same as the height of the tree. Why almost? Because of path compression, a second technique used by the union-find data structure to keep trees short. Path compression can alter an existing tree to make it shorter than indicated by its rank. In principle, it might be better to make decisions based on the actual height than using rank as a proxy for height, but in practice, it is too hard/too slow to keep track of the true height information, whereas it is very easy/fast to keep track of rank.
You also asked "What happens if I simply add the two ranks (i.e. 2*r)?" This is an interesting question. The answer is probably nothing, meaning everything will still work just fine, with the same efficiency as before. (Well, assuming that you use 1 as your starting rank rather than 0.) Why? Because the way rank is used, what matters is the relative ordering of ranks, not their absolute magnitudes. If you add them, then your ranks will be 1,2,4,8 instead of 1,2,3,4 (or more likely 0,1,2,3), but they will still have exactly the same relative ordering so all is well. Your rank is simply 2^(the old rank). The biggest danger is that you run a larger risk of overflowing the integer used to represent the rank when dealing with very large sets (or, put another way, that you will need to use more space to store your ranks).
On the other hand, notice that by adding the two ranks, you are approximating the size of the trees rather than the heights of the trees. By always adding the two ranks, whether they are equal or not, then you are exactly tracking the sizes of the trees. Again, everything works just fine, with the same caveats about the possibility of overflowing integers if your trees are very large.
In fact, union-by-size is widely recognized as a legitimate alternative to union-by-rank. For some applications, you actually want to know the sizes of the sets, and for those applications union-by-size is actually preferabe to union-by-rank.
Because in this case - you add one tree is a "sub tree" of the other - which makes the original subtree increase its size.
Have a look at the following example:
1 3
| |
2 4
In the above, the "rank" of each tree is 2.
Now, let's say 1 is going to be the new unified root, you will get the following tree:
1
/ \
/ \
3 2
|
4
after the join the rank of "1" is 3, rank_old(1) + 1 - as expected.1
As for your second question, because it will yield false height for the trees.
If we take the above example, and merge the trees to get the tree of rank 3. What would happen if we then want to merge it with this tree2:
9
/ \
10 11
|
13
|
14
We'll find out both ranks are 4, and try to merge them the same way we did before, without favoring the 'shorter' tree - which will result in trees with higher height, and ultimately - worse time complexity.
(1) Disclaimer: The first part of this answer is taken from my answer to a similar question (though not identical due to your last part of the question)
(2) Note that the above tree is syntatically made, it cannot be created in an optimized disjoint forests algorithms, but it still demonstrates the issues needed for the answer.
If you read that paragraph in a little more depth, you'll realize that rank is more like depth, not size:
Since it is the depth of the tree that affects the running time, the tree with smaller depth gets added under the root of the deeper tree, which only increases the depth if the depths were equal. In the context of this algorithm, the term "rank" is used instead of "depth" ...
and a merge of equal depth trees only increases the depth of the tree by one since the root of the one is added to the root of the other.
Consider:
A D
/ \ merged with / \
B C E F
is:
A
/|\
B C D
/ \
E F
The depth was 2 for both, and it's 3 for the merged one.
Rank represents the depth of the tree, not the number of nodes in it. When you join a tree with a smaller rank with a tree with a larger rank, the overall rank remains the same.
Consider adding a tree with rank 4 to the root of the tree of rank 6: since we added a node above the root of the depth-4 tree, that subtree now has a rank of 5. The subtree to which we've added our depth-4 tree, however, is 6, so the rank does not change.
Now consider adding a tree with rank 6 to the root of a second tree of rank 6: since the root of the first depth-6 tree now has an extra node above it, the rank of that subtree (and the tree overall) changes to 7.
Since the rank of the tree determines the processing speed, the algorithm tries to keep the rank as low as possible by always attaching a shorter tree to the taller one, keeping the overall rank unchanged. The rank changes only when the trees have identical ranks, in which case one of them gets attached to the root of the other, bumping up the rank by one.
Actually Here two important properties should be known very well to us ....
1) What is Rank ?
2) Why Rank is Used ???
Rank is nothing but the depth of a tree .U can say rank as depth (level) of a tree . When we make union nodes then these (graph nodes ) will be formed as a tree with an ultimate root node.Rank is expressed only for those root nodes .
A merged with D
Initially A has rank (level) 0 and D has rank(level) 0 . So u can merge them making anyone of them as a root . Because if u make A as root the rank(level) will be 1
and if u make D as a root then the rank will also be 1
A
`D
Here rank ( level ) is 1 when root is A .
Now think for another ,
A merge B -----> A
`D `C / \
D B
\
C
So the level will be increased by 1 , see exactly without root (A) there is at most height / depth / rank is 2 . rank[ 1] -> {D,B} and rank [2] -> {C} ................
Now our main objective is to make tree with minimum rank(depth) as possible while merging ..
Now when two differnt rank tree merge ,then
A(rank 0) merge B(rank 1)---> B Here merged tree rank is 1 same as high rank (1)
`C / \
A C
When small rank goes under over high rank . Then the merged tree's rank(height/depth) will be the same rank associated with higher rank tree .That means the rank will not increase , the merged tree rank will be same as higher rank before ...
But if we will do the reverse work means high rank tree goes under over low rank tree then see ,
A ( rank 0 ) merge B (rank 1 ) --> A ( merged tree rank 2 greater than both )
`C `B
`C
So , whatever is seen from following observation is that if we try to keep rank (height) of merged tree as minimum possible then , we have to choose the first process. i think this part is clear !!
Now u have to understand what is our objective to keep tree's height minimum as possible ..........
when we use disjoint set union then for path compression ( finding ultimate root with whom a node is connected ) when we traverse from a node to it's root node then if it's height (rank) is long then time processing will be slow .That's why when we try to merge two trees then we try to keep heigh/depth/rank as minimum as possible
In CLRS, third Edition, on page 155, it is given that in MAX-HEAPIFY,
"the worst case occurs when the bottom level of the tree is exactly half full"
I guess the reason is that in this case, Max-Heapify has to "float down" through the left subtree.
But the thing I couldn't get is "why half full" ?
Max-Heapify can also float down if left subtree has only one leaf. So why not consider this as the worst case ?
Read the entire context:
The children's subtrees each have size at most 2n/3 - the worst case occurs when the last row of the tree is exactly half full
Since the running time T(n) is analysed by the number of elements in the tree (n), and the recursion steps into one of the subtrees, we need to find an upper bound on the number of nodes in a subtree, relative to n, and that will yield that T(n) = T(max num. nodes in subtree) + O(1)
The worst case of number of nodes in a subtree is when the final row is as full as possible on one side, and as empty as possible on the other. This is called half full. And the left subtree size will be bounded by 2n/3.
If you're proposing a case with only a few nodes, then that's irrelevant, since all base cases can be considered O(1) and ignored.
Already there's an accepted answer but this answer is for those people who are still a bit confused (as I was), or something still doesn't click. So here's a little bit longer and more detailed explanation.
Though it might sound redundant, we have to be very clear about the exact definitions because through our attention to the details... chances are when you do that proving things becomes much easier.
From CLRS (section 6.1), a Binary Heap data structure is an array object that can be viewed as a nearly complete binary tree
From Wikipedia, In a complete binary tree, every level (except possibly the last level) is completely filled, and all the nodes in the last level are as far left as possible.
Again, from Wikipedia, A balanced binary tree is a binary tree structure in which the left and right sub-trees of every node differ in height by no more than 1.
Now that we are armed, let's dive in.
So, in comparison to the root, the height of the left and right sub-tree can differ by 1 at most.
Let's consider a tree T and let the height of the left sub-tree = h+1 and the height of the right sub-tree = h
What can be the worst-case in MAX_HEAPIFY? The worst-case happens when we end up doing maximum number of comparisons and swaps while trying to maintain the heap property.
When the MAX_HEAPIFY algorithm runs and if it recursively goes through the longest path then we can consider a possible worst-case because it will end up doing the maximum number of comparisons and swaps in the longest path.
Well, it seems all of the longest paths happen to be in the left sub-tree (as its height is h+1). But someone might as well ask: Why not the right sub-tree? Remember the above definition, all the nodes in the last level have to be as far left as possible.
Now because we have to cover every possibility that can lead to a worst-case, we need to get more number of longer paths, if any exist, and because of that, we ought to make the left sub-tree FULL (But Why? So that we can get more paths to choose from and opt for the path that gives the worst-case time among all).
Since the left subtree has a height h+1, it will have 2^(h+1) no. of leaf nodes, and, therefore, 2^(h+1) number of paths from the root. This is the maximum number of possible paths in a tree T of h+1 height.
Note: Please hold on to it if you are still reading, maybe just for the sake of crystal clarity.
Here's the image of the tree structure in the worst-case situation.
In the above image, as you can see, consider that the left (in yellow) sub-tree and the right (in pink) sub-tree each has x nodes. The pink portion is a complete right sub-tree and the yellow portion is the left sub-tree excluding the last level.
Notice that both the left (yellow) and the right (pink) sub-trees have a height of h.
Now, from the start, we have considered the left subtree to be of height h+1 as a whole (i.e. including the yellow portion and the last level).
Now, if I may ask, how many nodes do we have to add in the last level i.e. below the yellow portion to make the left sub-tree completely FULL?
Well, the bottom-most layer of the yellow portion has ⌈x/2⌉ nodes (i.e. Total number of leaves in a tree/subtree having n nodes = ⌈n/2⌉; for proof visit this link), and now if we add 2 children to each of these nodes or leaves => total x (≈x) nodes have been added (How? ⌈x/2⌉ leaves * 2 ≈ x nodes).
With this addition, we make the left sub-tree of height h+1 (i.e. the yellow portion with height h and the one last level added) FULL, hence meeting the worst-case criteria.
Since the left sub-tree is FULL, the whole Tree is HALF FULL.
Now someone might as well ask: What if we add more nodes, or, specifically, what if we add nodes in the right sub-tree? Well, we don't. And that's because now if we happen to add more nodes, the nodes will be added in the right sub-tree (as the left sub-tree is FULL), which, in turn, will tend to balance out the tree more. Now as the tree is starting to get more balanced, we are tending to move towards the best-case scenario and not the worst-case.
Final question : How many nodes do we have in total?
Total nodes in the tree, n = x (from the yellow portion) + x (from the pink portion) + x (addition of the last level below the yellow portion) = 3x
Can you notice something? As a by-product, the left sub-tree in total contains at most 2x nodes i.e. 2n/3 nodes (bcoz x = n/3).
Consider the deletion procedure on a BST, when the node to delete has two children. Let's say i always replace it with the node holding the minimum key in its right subtree.
The question is: is this procedure commutative? That is, deleting x and then y has the same result than deleting first y and then x?
I think the answer is no, but i can't find a counterexample, nor figure out any valid reasoning.
EDIT:
Maybe i've got to be clearer.
Consider the transplant(node x, node y) procedure: it replace x with y (and its subtree).
So, if i want to delete a node (say x) which has two children i replace it with the node holding the minimum key in its right subtree:
y = minimum(x.right)
transplant(y, y.right) // extracts the minimum (it doesn't have left child)
y.right = x.right
y.left = x.left
transplant(x,y)
The question was how to prove the procedure above is not commutative.
Deletion (in general) is not commutative. Here is a counterexample:
4
/ \
3 7
/
6
What if we delete 4 and then 3?
When we delete 4, we get 6 as the new root:
6
/ \
3 7
Deleting 3 doesn't change the tree, but gives us this:
6
\
7
What if we delete 3 and then 4?
When we delete 3 the tree doesn't change:
4
\
7
/
6
However, when we now delete 4, the new root becomes 7:
7
/
6
The two resulting trees are not the same, therefore deletion is not commutative.
UPDATE
I didn't read the restriction that this is when you always delete a node with 2 children. My solution is for the general case. I'll update it if/when I can find a counter-example.
ANOTHER UPDATE
I don't have concrete proof, but I'm going to hazard a guess:
In the general case, you handle deletions differently based on whether you have two children, one child, or no children. In the counter-example I provided, I first delete a node with two children and then a node with one child. After that, I delete a node with no children and then another node with one child.
In the special case of only deleting nodes with two children, you want to consider the case where both nodes are in the same sub-tree (since it wouldn't matter if they are in different sub-trees; you can be sure that the overall structure won't change based on the order of deletion). What you really need to prove is whether the order of deletion of nodes in the same sub-tree, where each node has two children, matters.
Consider two nodes A and B where A is an ancestor of B. Then you can further refine the question to be:
Is deletion commutative when you are considering the deletion of two nodes from a Binary Search Tree which have a ancestor-descendant relationship to each other (this would imply that they are in the same sub-tree)?
When you delete a node (let's say A), you traverse the right sub-tree to find the minimum element. This node will be a leaf node and can never be equal to B (because B has two children and cannot be a leaf node). You would then replace the value of A with the value of this leaf-node. What this means is that the only structural change to the tree is the replacement of A's value with the value of the leaf-node, and the loss of the leaf-node.
The same process is involved for B. That is, you replace the value of the node and replace a leaf-node. So in general, when you delete a node with two children, the only structural change is the change in value of the node you are deleting, and the deletion of the leaf node who's value you are using as replacement.
So the question is further refined:
Can you guarantee that you will always get the same replacement node regardless of the order of deletion (when you are always deleting a node with two children)?
The answer (I think) is yes. Why? Here are a few observations:
Let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is not in the left sub-tree of the ancestor node's right child. This means that this sub-tree remains unaffected. What this also means is regardless of the order of deletion, two different sub-trees are modified and therefore the operation is commutative.
Again, let's say you delete the descendant node first and the ancestor node second. The sub-tree that was modified when you deleted the descendant node is in the left sub-tree of the ancestor node's right child. But even here, there is no overlap. The reason is when you delete the descendant node first, you look at the left sub-tree of the descendant node's right child. When you then delete the ancestor node, you will never go down that sub-tree since you will always be going towards the left after you enter the ancestor node's right-child's left sub-tree. So again, regardless of what you delete first you are modifying different sub-trees and so it appears order doesn't matter.
Another case is if you delete the ancestor node first and you find that the minimum node is a child of the descendant node. This means that the descendant node will end up with one child, and deleting the one child is trivial. Now consider the case where in this scenario, you deleted the descendant node first. Then you would replace the value of the descendant node with its right child and then delete the right child. Then when you delete the ancestor node, you end up finding the same minimum node (the old deleted node's left child, which is also the replaced node's left child). Either way, you end up with the same structure.
This is not a rigorous proof; these are just some observations I've made. By all means, feel free to poke holes!
It seems to me that the counterexample shown in Vivin's answer is the sole case of non-commutativity, and that it is indeed eliminated by the restriction that only nodes with two children can be deleted.
But it can also be eliminated if we discard what appears to be one of Vivin's premises, which is that we should traverse the right subtree as little as possible to find any acceptable successor. If, instead, we always promote the smallest node in the right subtree as the successor, regardless of how far away it turns out to be located, then even if we relax the restriction on deleting nodes with fewer than two children, Vivin's result
7
/
6
is never reached if we start at
4
/ \
3 7
/
6
Instead, we would first delete 3 (without successor) and then delete 4 (with 6 as successor), yielding
6
\
7
which is the same as if the order of deletion were reversed.
Deletion would then be commutative, and I think it is always commutative, given the premise I have named (successor is always smallest node in right subtree of deleted node).
I do not have a formal proof to offer, merely an enumeration of cases:
If the two nodes to be deleted are in different subtrees, then deletion of one does not affect the other. Only when they are in the same path can the order of deletion possibly affect the outcome.
So any effect on commutativity can come only when an ancestor node and one of its descendants are both deleted. Now, how does their vertical relationship affect commutativity?
Descendant in the left subtree of the ancestor. This situation will not affect commutativity because the successor comes from the right subtree and cannot affect the left subtree at all.
Descendant in the right subtree of the ancestor. If the ancestor's successor is always the smallest node in the right subtree, then order of deletion cannot change the choice of successor, no matter what descendant is deleted before or after the ancestor. Even if the successor to the ancestor turns out to be the descendant node that is also to be deleted, that descendant too is replaced with the the next-largest node to it, and that descendant cannot have its own left subtree remaining to be dealt with. So deletion of an ancestor and any right-subtree descendant will always be commutative.
I think there are two equally viable ways to delete a node, when it has 2 children: SKIP TO CASE 4...
Case 1: delete 3 (Leaf node)
2 3
/ \ --> / \
1 3 1
Case 2: delete 2 (Left child node)
2 3
/ \ --> / \
1 3 1
Case 3: delete 2 (Right child node)
2 2
/ \ --> / \
1 3 3
______________________________________________________________________
Case 4: delete 2 (Left & Right child nodes)
2 2 3
/ \ --> / \ or / \
1 3 1 3
BOTH WORK and have different resulting trees :)
______________________________________________________________________
As algorithm explained here: http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Trees/AVL-delete.html
Deleting a node with 2 children nodes:
1) Replace the (to-delete) node with its in-order predecessor or in-order successor
2) Then delete the in-order predecessor or in-order successor
I respond here to Vivin's second update.
I think this is a good recast of the question:
Is deletion commutative when you are
considering the deletion of two nodes
from a Binary Search Tree which have a
ancestor-descendant relationship to
each other (this would imply that they
are in the same sub-tree)?
but this bold sentence below is not true:
When you delete a node (let's say A),
you traverse the right sub-tree to
find the minimum element. This node
will be a leaf node and can never be equal to B
since the minimum element in A's right subtree can have a right child. So, it is not a leaf.
Let's call the minimum element in A's right subtree successor(A).
Now, it is true that B cannot be successor(A), but it can be in its right subtree. So, it is a mess.
I try to summarize.
Hypothesis:
A and B have two children each.
A and B are in the same subtree.
Other stuff we can deduce from hypothesis:
B is not successor(A), neither A is successor(B).
Now, given that, i think there are 4 different cases (as usual, let be A an ancestor of B):
B is in A's left subtree
B is an ancestor of successor(A)
successor(A) is an ancestor of B
B and successor(A) don't have any relationship. (they are in different A's subtrees)
I think (but of course i cannot prove it) that cases 1, 2 and 4 don't matter.
So, only in the case successor(A) is an ancestor of B deletion procedure could not be commutative. Or could it?
I pass the ball : )
Regards.