I am reading through Okasaki's Purely Functional Data Structures and am trying to do some of the exercises. One of them is to prove that binomial heap merge takes O(log n) time where n is the number of nodes in the heap.
functor BinomialHeap (Element:ORDERED):HEAP=
struct
structure Elem=Element
datatype Tree = Node of int*Elem.T*Tree list
type Heap = Tree list
fun rank (Node (r,_,_)) = r
fun link (t1 as Node (r,x1,c1), t2 as Node (_,x2,c2))=
if Elem.leq(x1,x2)
then Node (r+1,x1,t2::c1)
else Node (r+1,x2,t1::c2)
fun insTree (t,[])=[t]
|insTree (t,ts as t'::ts')=
if rank t < rank t' then t::ts else insTree(link(t,t'),ts')
fun insert (x,ts)=insTree(Node(0,x,[]),ts) (*just for reference*)
fun merge (ts1,[])=ts1
|merge ([],ts2)=ts2
|merge (ts1 as t1::ts1', ts2 as t2::ts2')=
if rank t1 < rank t2 then t1::merge(ts1',ts2)
else if rank t2 < rank t1 then t2::merge(ts1,ts2')
else insTree (link(t1,t2), merge (ts1',ts2'))
end
It is clear that merge will call itself at most (length ts1 + length ts2) times, but since insTree is O(log n) worst case, can you explain how merge is O(log n)?
First note that merge will be called at most (length ts1 + length ts2) times, and this is O(log n) times. (Just to be clear, ts1 and ts2 are lists of binomial trees, where such a tree of rank r contains exactly 2^r nodes, and each rank occurs at most once. Therefore, there are O(log n1) trees in ts1 and O(log n2) in ts2, where n1 and n2 are the numbers of nodes in the two heaps and n = n1 + n2.)
The key point to notice is that insTree is called at most once for each rank (either through merge or recursively), and the largest possible rank is log_2(n). The reason is the following:
If insTree is called from merge, then say r = rank t1 = rank t2, and link(t1,t2) will have rank r+1; so insTree is called for rank r+1. Now think about what happens with merge(ts1', ts2'). Let r' >= r+1 be the smallest rank that occurs as a tree in both ts1' and ts2'. Then insTree will be called again from merge for rank r'+1, since the two trees of rank r' will get linked to form a tree of rank r'+1. But the merged heap merge(ts1', ts2') therefore cannot contain a tree of rank r', so the previous call to insTree cannot recurse further than r'.
So putting things together:
insTree is called at most O(log n) times, with each call being constant time (since we count the recursion as a separate call)
merge is called at most O(log n) times, with each call being constant time (since we count the calls to insTree separately and link is constant time)
=> The entire merge operation is O(log n).
EDIT: By the way, merging binomial heaps is very much like adding binary numbers. A heap of size n will have a tree of rank r if and only if the binary number n has a '1' at the 2^r position. When merging such heaps, you proceed from the lowest rank to highest rank -- or least significant to most significant position. Trees of the same rank need to be linked (the 'ones' added), and inserted / "carried" into the higher rank positions.
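To make the binary-addition analogy concrete, here is a rough Python transcription of the ML code above (the tuple representation and function names are mine):

```python
# A tree is (rank, root_value, children); a heap is a list of trees in
# increasing rank order, mirroring the binary representation of its size.

def link(t1, t2):
    # Link two trees of equal rank; the smaller root becomes the new root.
    (r, x1, c1), (_, x2, c2) = t1, t2
    if x1 <= x2:
        return (r + 1, x1, [t2] + c1)
    return (r + 1, x2, [t1] + c2)

def ins_tree(t, ts):
    # Insert a tree, linking on rank collisions -- like propagating a carry.
    if not ts:
        return [t]
    if t[0] < ts[0][0]:
        return [t] + ts
    return ins_tree(link(t, ts[0]), ts[1:])

def merge(ts1, ts2):
    # Merge two heaps rank by rank, like adding two binary numbers
    # from least significant position to most significant.
    if not ts1:
        return ts2
    if not ts2:
        return ts1
    if ts1[0][0] < ts2[0][0]:
        return [ts1[0]] + merge(ts1[1:], ts2)
    if ts2[0][0] < ts1[0][0]:
        return [ts2[0]] + merge(ts1, ts2[1:])
    return ins_tree(link(ts1[0], ts2[0]), merge(ts1[1:], ts2[1:]))

def insert(x, ts):
    return ins_tree((0, x, []), ts)
```

Merging a 3-element heap (trees of ranks 0 and 1, i.e. binary 11) with a 5-element heap (ranks 0 and 2, i.e. binary 101) yields a single rank-3 tree, just as 011 + 101 = 1000.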
The problem statement is as follows:
Imagine you are reading in a stream of integers. Periodically, you wish to be able to look up the rank of a number x (the number of values less than or equal to x). Implement the data structures and algorithms to support these operations. That is, implement the method track(int x), which is called when each number is generated, and the method getRankOfNumber(int x), which returns the number of values less than or equal to x (not including x itself).
EXAMPLE: Stream (in order of appearance): 5, 1, 4, 4, 5, 9, 7, 13, 3
getRankOfNumber(1) = 0
getRankOfNumber(3) = 1
getRankOfNumber(4) = 3
The suggested solution uses a modified binary search tree, where each node stores the number of nodes to its left. The time complexity of both methods is O(log N) for a balanced tree and O(N) for an unbalanced tree, where N is the number of nodes.
But how can we construct a balanced BST from a stream of random integers? Won't the tree become unbalanced over time if we keep adding to the same tree and the root is not the median? Shouldn't the worst-case complexity of this solution be O(N)? (In that case, a HashMap, with O(1) track() and O(N) getRankOfNumber(), would be better.)
You just need to build an AVL or red-black tree to get the O(log n) complexities you desire.
As for the rank, it's fairly simple. Let count(T) be the number of elements in the tree rooted at T.
The rank of a node N will be:
firstly, there will be count(N's left subtree) nodes before N (elements smaller than N)
let A = N's parent. If N is the right child of A, then there will be another 1 + count(A's left subtree) nodes before N
if A is the right child of some B, then there will be another 1 + count(B's left subtree) nodes before N
recursively, run all the way up until you reach the root or until the current node isn't anyone's right child.
As the height of a balanced tree is at most log n, this method takes O(log n) to compute a node's rank (O(log n) to find the node plus O(log n) to walk back up accumulating counts), assuming every node stores the sizes of its left and right subtrees.
hope that helps :)
Building a binary search tree (BST) from the stream of numbers is easy to picture: all values less than a node go to its left, and all values greater go to its right.
Then the rank of any x is the number of nodes in the left subtree of the node with value x.
Operations needed: find the node with value x, O(log N), plus count the nodes in its left subtree, O(log N), for a total of O(log N + log N) = O(log N).
To optimize the left-subtree count from O(log N) down to O(1), you can keep an extra field 'leftSubTreeSize' in the Node class and maintain it during insertion.
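A minimal Python sketch of the approach in the answers above (class and field names are mine, not from the book): each node stores the size of its left subtree, track(x) updates those sizes on the way down, and the rank query sums them on the way to x.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.left_size = 0  # number of nodes in the left subtree

class RankTracker:
    def __init__(self):
        self.root = None

    def track(self, x):
        # Insert x, bumping left_size on every node we pass to the left of.
        if self.root is None:
            self.root = Node(x)
            return
        node = self.root
        while True:
            if x <= node.value:
                node.left_size += 1  # x lands somewhere in the left subtree
                if node.left is None:
                    node.left = Node(x)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(x)
                    return
                node = node.right

    def get_rank(self, x):
        # Count values <= x seen so far, excluding one occurrence of x itself.
        rank, node = 0, self.root
        while node is not None:
            if x == node.value:
                return rank + node.left_size
            if x < node.value:
                node = node.left
            else:
                rank += node.left_size + 1
                node = node.right
        return -1  # x was never tracked
```

On the example stream 5, 1, 4, 4, 5, 9, 7, 13, 3 this returns rank 0 for 1, rank 1 for 3, and rank 3 for 4, matching the problem statement. Without rebalancing, though, both operations degrade to O(N) on adversarial input, which is exactly the question's concern.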
We implement the disjoint-set data structure with trees. In this data structure, makeset() creates a set with one element, and merge(i, j) merges the trees of sets i and j in such a way that the tree with lower height becomes a child of the root of the other tree. If we perform n makeset() operations and n-1 merge() operations in some arbitrary order, and then do one find operation, what is the cost of this find operation in the worst case?
I) O(n)
II) O(1)
III) O(n log n)
IV) O(log n)
Answer: IV.
Could anyone give a good hint as to how the author arrived at this answer?
The O(log n) find is only true when you use union by rank (also known as weighted union). With this optimisation, we always place the tree with lower rank under the root of the tree with higher rank. If both have the same rank, we choose arbitrarily but increase the rank of the resulting tree by one. This gives an O(log n) bound on the depth of the tree. We can prove this by showing that a node that is i levels below the root (equivalent to being in a tree of rank >= i) is in a tree of at least 2^i nodes (which is the same as showing that a tree of n nodes has depth at most log n). This is easily done by induction.
Induction hypothesis: the tree size is >= 2^j for all depths j < i.
Case i = 0: the node is the root; the size is 1 = 2^0.
Case i + 1: the length of the path is i + 1 if it was i and the tree was then placed underneath another tree. By the induction hypothesis, the node was in a tree of size >= 2^i at that time. By our merge rule, the tree it was placed under had rank >= i as well, and therefore also had >= 2^i nodes. The new tree therefore has >= 2^i + 2^i = 2^(i+1) nodes.
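For concreteness, here is a minimal Python union-find with union by rank (names are mine; no path compression, so a find walks exactly the path whose length the argument above bounds by log2 n):

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # makeset for each of the n elements
        self.rank = [0] * n

    def find(self, x):
        # Walk parent pointers to the root; cost = depth of x.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        # Put the lower-rank root under the higher-rank root.
        if self.rank[ri] < self.rank[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        if self.rank[ri] == self.rank[rj]:
            self.rank[ri] += 1
```

After the n - 1 unions, every element sits at depth at most log2 n below its root, so the single find costs O(log n).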
I am reading Binomial Heap in Purely Functional Data Structures.
The implementation of the insTree function confused me quite a bit. Here is the code:
datatype Tree = Node of int * Elem.T * Tree list
fun link (t1 as Node (r, x1, c1), t2 as Node (_, x2, c2)) =
if Elem.leq (x1, x2) then Node (r+1, x1, t2::c1)
else Node (r+1, x2, t1::c2)
fun rank (Node (r, x, c)) = r
fun insTree (t, []) = [t]
| insTree (t, ts as t' :: ts') =
if rank t < rank t' then t::ts else insTree (link (t, t'), ts')
My confusion is about the insTree code: why does it not consider the situation rank t > rank t'?
In if rank t < rank t' then t::ts else insTree (link (t, t'), ts'):
if t's rank is less than t''s rank, then t is put at the front of the heap, no question asked.
The else covers two cases: equal and greater.
For equal, yes, we can link the two trees (we only link two trees of the same rank) and then try to insert the new linked tree into the heap, no question asked.
But the greater case is handled exactly like the equal case. Why? Even if rank t > rank t', we still link them?
Edit
I thought the process of inserting a binomial tree into a binomial heap should be like this:
We get the tree t and the heap
In the heap (actually a list), we compare the rank of the tree t with each tree in the heap, in increasing rank order
If we find a missing rank matching the rank of t, we put t in that slot
If we find a tree in the heap with the same rank as t, we link the two trees, producing a new tree of rank+1, and try again to insert the new tree into the heap
So, I think the correct fun insTree could be like this:
fun insTree (t, []) = [t]
| insTree (t, ts as t' :: ts') =
if rank t < rank t' then t::ts
else if rank t = rank t' then insTree (link (t, t'), ts')
else t'::(insTree (t, ts'))
insTree is a helper function that is not visible to the user. The user calls insert, which in turn calls insTree with a tree of rank 0, and a list of trees of increasing rank. insTree has an invariant that the rank of t is <= the rank of the first tree in the list. So if it's not <, then it must be =.
You're right that if insTree was a general-purpose public function, rather than a special-purpose private function, then it would have to deal with the missing case.
An important detail behind this is that a binomial tree is not just any tree that happens to have k children. It's rigorously defined:
The binomial tree of order 0 is a single node, and
The binomial tree of order n is a single node with binomial trees of order 0, 1, 2, ..., n - 1 as children.
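This definition is easy to check mechanically. A tiny Python sketch (the representation is mine: a tree is just the list of its children) confirms that the order-r binomial tree has exactly 2^r nodes:

```python
def binomial_tree(n):
    # Order-n tree: a root whose children are trees of orders 0, 1, ..., n-1.
    return [binomial_tree(k) for k in range(n)]

def count(tree):
    # Total number of nodes: the root plus all nodes in the children.
    return 1 + sum(count(child) for child in tree)
```

For example, count(binomial_tree(5)) evaluates to 32 = 2^5, which is why a heap of n nodes needs only O(log n) trees.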
This answers why the insert function, which is the one used to construct a binomial heap, does not handle those cases (in theory they cannot happen). The cases you propose might make sense for a merge operation (but the underlying implementation would differ).
Okasaki's implementation in Purely Functional Data Structures (page 22) does it in two passes: one to merge the forest, and one to propagate the carries. This strikes me as harder to analyze, and also probably slower, than a one-pass version. Am I missing something?
Okasaki's implementation:
functor BinomialHeap (Element:ORDERED):HEAP=
struct
structure Elem=Element
datatype Tree = Node of int*Elem.T*Tree list
type Heap = Tree list
fun rank (Node (r,_,_)) = r
fun link (t1 as Node (r,x1,c1), t2 as Node (_,x2,c2))=
if Elem.leq(x1,x2)
then Node (r+1,x1,t2::c1)
else Node (r+1,x2,t1::c2)
fun insTree (t,[])=[t]
|insTree (t,ts as t'::ts')=
if rank t < rank t' then t::ts else insTree(link(t,t'),ts')
fun insert (x,ts)=insTree(Node(0,x,[]),ts) (*just for reference*)
fun merge (ts1,[])=ts1
|merge ([],ts2)=ts2
|merge (ts1 as t1::ts1', ts2 as t2::ts2')=
if rank t1 < rank t2 then t1::merge(ts1',ts2)
else if rank t2 < rank t1 then t2::merge(ts1,ts2')
else insTree (link(t1,t2), merge (ts1',ts2'))
end
This strikes me as hard to analyze because you have to prove an upper bound on the cost of propagating all the carries (see below). The top-down merge implementation I came up with is much more obviously O(log n) where n is the size of the larger heap:
functor BinomialHeap (Element:ORDERED):HEAP=
struct
structure Elem=Element
datatype Tree = Node of int*Elem.T*Tree list
type Heap = Tree list
fun rank (Node(r,_,_))=r
fun link (t1 as Node (r,x1,c1), t2 as Node (_,x2,c2))=
if Elem.leq(x1,x2)
then Node (r+1,x1,t2::c1)
else Node (r+1,x2,t1::c2)
fun insTree (t,[])=[t]
|insTree (t,ts as t'::ts')=
if rank t < rank t' then t::ts else insTree(link(t,t'),ts')
fun insert (x,ts)=insTree(Node(0,x,[]),ts)
fun merge(ts1,[])=ts1
|merge([],ts2)=ts2
|merge (ts1 as t1::ts1', ts2 as t2::ts2')=
if rank t1 < rank t2 then t1::merge(ts1',ts2)
else if rank t2 < rank t1 then t2::merge(ts1,ts2')
else mwc(link(t1,t2),ts1',ts2')
(*mwc=merge with carry*)
and mwc (c,ts1,[])=insTree(c,ts1)
|mwc (c,[],ts2)=insTree(c,ts2)
|mwc (c,ts1 as t1::ts1', ts2 as t2::ts2')=
if rank c < rank t1
then if rank c < rank t2 then c::merge(ts1,ts2)
else mwc(link(c,t2),ts1,ts2')
else mwc(link(c,t1),ts1',ts2)
end
Proof that Okasaki's implementation is O(log n): if a carry is "expensive" (requires one or more links), then it leaves a zero at the position where it stops, so the next expensive carry will terminate when it reaches that point. Hence the total number of links required to propagate all the carries is at most about the total length of the binary representations before propagation, which is bounded above by ceil(log n), where n is the size of the larger heap.
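A toy Python model of that argument (names are mine): view the merge as binary addition of the two heap sizes, with one link per carry; the total number of carries in one addition never exceeds the bit length of the result:

```python
def carries(a, b):
    # Count the carries produced when adding a and b in binary.
    # Each carry corresponds to one link while merging binomial heaps
    # of sizes a and b.
    count, carry = 0, 0
    while a or b or carry:
        bit_sum = (a & 1) + (b & 1) + carry
        carry = bit_sum >> 1
        count += carry  # one link per carry produced at this position
        a >>= 1
        b >>= 1
    return count
```

For instance, carries(7, 1) is 3: the carry out of the lowest bit ripples through two more 1s before the sum settles at 1000.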
I'm currently implementing a red-black tree data structure to perform some optimizations for an application.
In my application, at a given point I need to remove all elements less than or equal to a given value (you can assume that the elements are integers) from the tree.
I could delete the elements one by one, but I would like to have something faster. Therefore, my question is: if I delete a whole subtree of a red-black tree, how could I fix the tree to recover the height and color invariants?
When you delete one element from a red-black tree it takes O(log n) time, where n is the number of elements currently in the tree.
If you remove only few of the elements, then it's best just to remove them one by one, ending up with O(k log n) operations (k = removed elements, n = elements in the tree before removals).
But if you know that you are going to remove a large number of nodes (e.g. 50% or more of the tree), then it's better to iterate through the elements you want to keep (an O(k') operation, where k' = number of elements that will be kept), then scrap the tree (O(1) or O(n) depending on your memory-management scheme) and rebuild it (an O(k' log k') operation). The total complexity is O(k') + O(k' log k') = O(k' log k'), which is less than the O(k log n) of one-by-one deletion when you keep fewer than 50% of the tree (k' < k).
In any case, the point is that when you are going to remove most of the elements, it's better in practice to enumerate the ones you want to keep and then rebuild the tree.
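The rebuild step can be sketched in a few lines of Python (names are mine): an in-order walk of the old tree yields the kept keys in sorted order, and taking the median as the root at each level gives a perfectly balanced BST.

```python
def build_balanced(sorted_keys):
    # Build a balanced BST, represented as (value, left, right) tuples,
    # by making the median of the sorted keys the root at every level.
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (sorted_keys[mid],
            build_balanced(sorted_keys[:mid]),
            build_balanced(sorted_keys[mid + 1:]))

def height(node):
    # Height in nodes: an empty tree has height 0.
    if node is None:
        return 0
    return 1 + max(height(node[1]), height(node[2]))
```

Building from 15 sorted keys gives a tree of height 4 = ceil(log2(16)), so subsequent operations are O(log k'). (The list slicing makes this sketch O(k' log k'); an index-based version avoids the copies.)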
EDIT: The below is for a generic sub-tree delete. What you need is just a single Split operation (based on your actual question contents).
It is possible to delete a whole subtree of a Red-Black tree in worst case O(log n) time.
It is known that Split and Join operations on a red-black tree can be done in O(log n) time.
Split : Given a value k and a red-black Tree T, Split T into two red-black trees T1 and T2 such that all values in T1 < k and all values in T2 >= k.
Join : Combine two red-black trees T1 and T2 into a single red-black tree T. T1 and T2 satisfy max in T1 <= min in T2 (or T1 <= T2 in short).
What you need is two Splits and one Join.
In your case, the subtree you need to delete will correspond to a range of values L <= v <= U.
So you first Split on L, obtaining T1 and T2 with T1 <= T2. Then Split T2 just above U (on the successor of U, so that U itself falls in the part to be discarded), obtaining T3 and T4 with T3 <= T4. Now Join T1 and T4.
In pseudocode, your code will look something like this:
Tree DeleteSubTree( Tree tree, Tree subTree) {
Key L = subTree.Min();
Key U = subTree.Max();
Pair<Tree> splitOnL = tree.Split(L);
// Split just above U so that U itself lands in the discarded middle part.
Pair<Tree> splitOnU = splitOnL.Right.Split(Successor(U));
Tree newTree = splitOnL.Left.Join(splitOnU.Right);
return newTree;
}
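To make the control flow concrete, here is the same two-Splits-one-Join recipe in Python on a plain unbalanced BST (tuple representation and names are mine; a real red-black version must also restore the color and black-height invariants inside Split and Join, which this sketch ignores):

```python
def split(node, k):
    # Returns (left, right): the subtrees holding keys < k and keys >= k.
    if node is None:
        return None, None
    val, l, r = node
    if val < k:
        rl, rr = split(r, k)
        return (val, l, rl), rr
    ll, lr = split(l, k)
    return ll, (val, lr, r)

def join(t1, t2):
    # Precondition: every key in t1 is less than every key in t2.
    # Splice t2 under the rightmost node of t1.
    if t1 is None:
        return t2
    val, l, r = t1
    return (val, l, join(r, t2))

def delete_range(tree, lo, hi):
    # Remove all keys with lo <= key <= hi via two splits and one join.
    left, rest = split(tree, lo)
    _, right = split(rest, hi + 1)  # hi + 1 plays the role of Successor(U)
    return join(left, right)
```

delete_range assumes integer keys so that hi + 1 is the successor of hi; for general keys you would split exclusively on U instead.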
See this for more information: https://cstheory.stackexchange.com/questions/1045/subrange-of-a-red-and-black-tree
Bulk deletion from a red-black tree is hard because the black-height invariant gets messed up pretty badly. Assuming you're not doing (soft) real-time, I would either delete one-by-one (since you had to insert them one by one, we're talking about a smaller constant factor here) or switch to a splay tree.