Time complexity of BST inorder traversal if implemented this way

Well normally if using depth-first traversal, we get O(n) time. However, if we find the minimum element first then call the successor() method n times, what time complexity will it be?
I think it may be O(n log n) because successor is O(log n) but that doesn't seem right. Can anyone offer any in-depth analysis here (probably involving some limit analysis)?

If parent pointers are present at each node, calling the successor method n times takes O(n) time. To see this observe that each edge in the tree gets visited at most twice (once from parent to child and once from child to the parent) by all the successor calls combined. Thus the total number of edges visited by all the successor calls is at most 2n. So the running time is O(n).
Now if parent pointers are not present, in every call we have to start from the root and search for the successor element by travelling through O(log n) nodes (if the tree is balanced). So the complexity becomes O(n log n).
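For the case without parent pointers, here is a minimal sketch of a successor search that restarts from the root on every call. The `Node` class and the function names are illustrative assumptions, not taken from the question, and the keys are assumed to be distinct.

```python
class Node:
    """Minimal BST node without a parent pointer (illustrative)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def successor_from_root(root, key):
    """Return the smallest key greater than `key`, or None if there is none.

    Each call walks one root-to-leaf path, so it costs O(h), where h is
    the height of the tree (O(log n) only if the tree is balanced).
    """
    best = None
    node = root
    while node is not None:
        if key < node.key:
            best = node.key      # candidate successor; look for a smaller one
            node = node.left
        else:
            node = node.right    # everything on the left is <= key; go right
    return best


def inorder_by_successor(root):
    """List all keys in sorted order using repeated root-based successor calls."""
    if root is None:
        return []
    node = root
    while node.left is not None:  # find the minimum
        node = node.left
    keys = [node.key]
    while True:
        nxt = successor_from_root(root, keys[-1])
        if nxt is None:
            return keys
        keys.append(nxt)
```

With n calls of O(h) each, this version costs O(n h) in total, which is the O(n log n) figure above when the tree is balanced.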

Not quite a formal argument, but a fairly convincing one for O(n):
The successor function always takes the shortest path from the starting node to its successor. It either goes down or it goes up, but once it's started doing one it can't change to the other. Therefore it has to take the shortest path.
The successor function has to produce the same output as the depth-first method, so it has to visit the same nodes in the same order (i.e. the outputted ones, it doesn't have to go past the same ones, although it does).
The depth-first method also always takes the shortest path between each outputted node (in each step it goes either down or up, not both).
Therefore the two methods take exactly the same path and are in fact equivalent.

Related

Why is time complexity of binary tree traversal (like preorder) not exponential?

For example, in common implementation of Fibonacci sequence, it is exponential because for every instance, you call the Fibonacci function twice. So, how come it is O(n) for preorder traversal (where also the recursive function gets called twice)
[I know it is O(n) as every nodes gets traversed, so please don't answer in terms of why it is O(n). Answer in comparison with the Fibonacci recursive implementation, as I want to see the difference].
I'll assume you are referring to this recursive Fibonacci algorithm, which takes as input a number 𝑖, and returns the 𝑖th number from the Fibonacci sequence:
def fibonacci(number):
    if number < 2:
        return number
    else:
        return fibonacci(number - 1) + fibonacci(number - 2)
If we consider each distinct value of number (that is used for calls of this function) as a "node", then notice an important difference with the binary-tree-traversal problem:
This Fibonacci algorithm will visit the same node multiple times, and this gets worse as recursive calls are made on already visited "nodes". The recursion tree is not really a tree, but a directed acyclic graph. Here is the recursion tree for when calling Fibonacci(5):
So we see that Fibonacci(3) is calculated twice, each time doing the whole work of deeper recursion. So Fibonacci(2) and Fibonacci(0) are each called 3 times, and Fibonacci(1) 5 times. In total there are 15 "visits", including the duplicates.
This is not happening with the recursive tree-traversal algorithm, where the recursion tree really is a tree (that is equivalent to the tree being traversed).
This explains the difference in time complexity.
It also explains why this naive Fibonacci algorithm can be improved by avoiding the duplicate "node visits", so that it becomes O(n) too.
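The visit counts can be checked with a small experiment. This is only a sketch: the counter and the function names are made up for illustration, and `functools.lru_cache` is used as one possible way to avoid the duplicate visits.

```python
from functools import lru_cache


def count_naive_calls(n):
    """Return (fib(n), number of calls made by the naive recursion)."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        if k < 2:
            return k
        return fib(k - 1) + fib(k - 2)

    return fib(n), calls


def count_memoized_calls(n):
    """Return (fib(n), number of times the function body actually runs)."""
    calls = 0

    @lru_cache(maxsize=None)
    def fib(k):
        nonlocal calls
        calls += 1          # only reached on a cache miss
        if k < 2:
            return k
        return fib(k - 1) + fib(k - 2)

    return fib(n), calls
```

For n = 5 the naive version makes the 15 calls counted above, while the memoized version runs its body once per distinct argument (6 times, for 0 through 5).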
There are two ways of looking at it:
For each node, you are making the recursive call exactly twice. This already tells you that it's a constant factor for each additional node, regardless of where in the tree this node is situated.
But just in case this is not enough for you there is the other viewpoint too:
In each recursive call the sub-problem shrinks (it is roughly halved if the tree is balanced). So you are doubling the number of calls but halving the amount of work to be done in each.
Both of these cases differ from the "flawed fibonacci implementation" in that you are never visiting a node more than once. In the fibonacci example you are doing the same work over and over again.

Confusion related to the time complexity of an algorithm

I was going through this algorithm https://codereview.stackexchange.com/questions/63921/print-all-nodes-from-root-to-leaves
In one of the comments it is mentioned that printing the paths from the root to leaf itself has average time complexity of O(nlogn). I am not quite sure how he came up with that. Any clarification will be much appreciated.
I think this is what they mean:
in the best case, the tree is perfectly balanced, and it contains N nodes, where log(N)+1 is the number of levels. The tree has N/2 leaves.
Every time we move to a lower level, we duplicate the currently accumulated path. If you assume copying an array of length k as an O(k) operation, then when we move from the second to last level to a leaf we do an O(log(N)) operation. As there are N/2 leaves, and for each we do an O(log(N)) operation, you get O(N*log(N)).
Instead of duplicating arrays, the function could pass recursively the same array, and the current level number, making sure that the path is printed only up to the level of the leaf.
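A sketch of that idea, assuming a simple `Node` class (hypothetical, not the code from the linked review): one shared list is extended on the way down and shrunk on the way back up, so nothing is copied until a leaf is reached.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def root_to_leaf_paths(root):
    """Yield each root-to-leaf path as a list of keys.

    One shared `path` list is pushed to and popped from during the walk,
    so no path is copied on the way down. A copy is made only at a leaf,
    where the path has to be emitted anyway.
    """
    path = []

    def walk(node):
        if node is None:
            return
        path.append(node.key)
        if node.left is None and node.right is None:
            yield list(path)
        else:
            yield from walk(node.left)
            yield from walk(node.right)
        path.pop()               # backtrack before returning to the parent

    yield from walk(root)
```

Emitting the paths still costs time proportional to their total length; the sketch only avoids the extra copies made at every internal level.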

BST tree- running time

I have a pseudo- code:
function func(BST t):
    x = MIN(t)
    for i=1..n do:
        print x.key
        x = SUCCESSOR(x)
Now, I need to prove its running time is THETA(n).
BUT, I know SUCCESSOR's running time is O(logn), and therefore the running time is O(nlogn).
Where is my mistake here?
Thanks in advance...
There are two possibilities:
This is not true, and the run time is O(nlogn).
You know the exact implementation of SUCCESSOR, which has a logarithmic upper bound (as stated, O(logn)), but you can deduce that when it is called repeatedly, one call after another, its cost averages out to theta(1) per call. In fact, a good implementation of SUCCESSOR in a BST should have amortized theta(1) complexity, as each edge will be traversed at most twice during the whole func execution.
It really depends on the implementation of your BST, but if your BST holds a 'father' pointer in each node, and is using it to find the successor, it will need to traverse each edge at most twice: once when you go "down", the first time to the node, and once when you go "up", back from it.
Since a tree has n-1 edges, you get at most 2*(n-1) edge traversals, and this is O(n).
Note that indeed the worst case of the SUCCESSOR() function is O(logn), but the average case is O(1), if it is implemented the way I described.
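To make the edge-counting argument concrete, here is a sketch that runs the loop from the question on a BST with parent pointers and counts every edge followed by the successor steps. The `Node` class, `insert`, and the counter are assumptions made for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, key):
    """Plain (unbalanced) BST insert that maintains parent pointers."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left, new.parent = new, node
                return root
            node = node.left
        else:
            if node.right is None:
                node.right, new.parent = new, node
                return root
            node = node.right


def traverse_with_successor(root):
    """Return (keys in order, edges followed by all successor steps)."""
    keys, steps = [], 0
    node = root
    while node is not None and node.left is not None:  # MIN(t), not counted
        node = node.left
    while node is not None:
        keys.append(node.key)
        if node.right is not None:           # go down: right, then leftmost
            node, steps = node.right, steps + 1
            while node.left is not None:
                node, steps = node.left, steps + 1
        else:                                # go up until we leave a left child
            while node.parent is not None and node is node.parent.right:
                node, steps = node.parent, steps + 1
            if node.parent is not None:
                steps += 1
            node = node.parent
    return keys, steps
```

Whatever the shape of the tree, the returned step count should never exceed 2*(n-1), even though a single successor step may follow many edges.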

What is the intuition behind the Fibonacci heap data structure?

I've read the Wikipedia article on Fibonacci heaps and read CLRS's description of the data structure, but they provide little intuition for why this data structure works. Why are Fibonacci heaps designed the way they are? How do they work?
Thanks!
This answer is going to be pretty long, but I hope it helps provide some insight as to where the Fibonacci heap comes from. I'm going to assume that you're already familiar with binomial heaps and amortized analysis.
Motivation: Why Fibonacci Heaps?
Before jumping into Fibonacci heaps, it's probably good to explore why we even need them in the first place. There are plenty of other types of heaps (binary heaps and binomial heaps, for example), so why do we need another one?
The main reason comes up in Dijkstra's algorithm and Prim's algorithm. Both of these graph algorithms work by maintaining a priority queue holding nodes with associated priorities. Interestingly, these algorithms rely on a heap operation called decrease-key that takes an entry already in the priority queue and then decreases its key (i.e. increases its priority). In fact, a lot of the runtime of these algorithms is explained by the number of times you have to call decrease-key. If we could build a data structure that optimized decrease-key, we could optimize the performance of these algorithms. In the case of the binary heap and binomial heap, decrease-key takes time O(log n), where n is the number of nodes in the priority queue. If we could drop that to O(1), then the time complexities of Dijkstra's algorithm and Prim's algorithm would drop from O(m log n) to O(m + n log n), which is asymptotically faster than before. Therefore, it makes sense to try to build a data structure that supports decrease-key efficiently.
There is another reason to consider designing a better heap structure. When adding elements to an empty binary heap, each insertion takes time O(log n). It's possible to build a binary heap in time O(n) if we know all n elements in advance, but if the elements arrive in a stream this isn't possible. In the case of the binomial heap, inserting n consecutive elements takes amortized time O(1) each, but if insertions are interlaced with deletions, the insertions may end up taking Ω(log n) time each. Therefore, we might want to search for a priority queue implementation that optimizes insertions to take time O(1) each.
Step One: Lazy Binomial Heaps
To start off building the Fibonacci heap, we're going to begin with a binomial heap and modify it to try to make insertions take time O(1). It's not all that unreasonable to try this out - after all, if we're going to do a lot of insertions and not as many dequeues, it makes sense to optimize insertions.
If you'll recall, binomial heaps work by storing all of the elements in the heap in a collection of binomial trees. A binomial tree of order n has 2^n nodes in it, and the heap is structured as a collection of binomial trees that all obey the heap property. Typically, the insertion algorithm in a binomial heap works as follows:
Create a new singleton node (this is a tree of order 0).
If there is a tree of order 0:
    Merge the two trees of order 0 together into a tree of order 1.
    If there is a tree of order 1:
        Merge the two trees of order 1 together into a tree of order 2.
        If there is a tree of order 2:
            ...
This process ensures that at each point in time, there is at most one tree of each order. Since each tree holds exponentially more nodes than its order, this guarantees that the total number of trees is small, which lets dequeues run quickly (because we don't have to look at too many different trees after doing a dequeue-min step).
However, this also means that the worst-case runtime of inserting a node into a binomial heap is Θ(log n), because we might have Θ(log n) trees that need to get merged together. Those trees need to be merged together only because we need to keep the number of trees low when doing a dequeue step, and there's absolutely no benefit in future insertions to keeping the number of trees low.
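The eager insertion procedure behaves like adding 1 to a binary number, with merges playing the role of carries. Here is a minimal sketch, assuming the heap is kept as a dict from order to tree (the `Tree` class, `link`, and `eager_insert` are names invented for this illustration):

```python
class Tree:
    """Binomial tree node; its order equals the number of children."""
    def __init__(self, key):
        self.key = key
        self.children = []

    @property
    def order(self):
        return len(self.children)


def link(a, b):
    """Merge two trees of equal order into one tree of the next order."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)     # the smaller key stays on top (min-heap property)
    return a


def eager_insert(trees, key):
    """Insert `key` into `trees`, a dict mapping order -> the tree of that order.

    Works like adding 1 to a binary number: a collision at order k is
    resolved by linking and carrying the result into order k + 1.
    """
    carry = Tree(key)
    while carry.order in trees:
        carry = link(carry, trees.pop(carry.order))
    trees[carry.order] = carry
```

After inserting 13 keys, for example, the orders present match the binary representation of 13 (1101), that is orders 0, 2 and 3.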
This introduces the first departure from binomial heaps:
Modification 1: When inserting a node into the heap, just create a tree of order 0 and add it to the existing collection of trees. Do not consolidate trees together.
There is another change we can make. Normally, when we merge together two binomial heaps, we do a merge step to combine them together in a way that ensures that there is at most one tree of each order in the resulting tree. Again, we do this compression so that dequeues are fast, and there's no real reason why the merge operation ought to have to pay for this. Therefore, we'll make a second change:
Modification 2: When merging two heaps together, just combine all their trees together without doing any merging. Do not consolidate any trees together.
If we make this change, we pretty easily get O(1) performance on our enqueue operations, since all we're doing is creating a new node and adding it to the collection of trees. However, if we just make this change and don't do anything else, we completely break the performance of the dequeue-min operation. Recall that dequeue-min needs to scan across the roots of all the trees in the heap after removing the minimum value so that it can find the smallest value. If we add in Θ(n) nodes by inserting them in this way, our dequeue operation will then have to spend Θ(n) time looking over all of these trees. That's a huge performance hit... can we avoid it?
If our insertions really just add more trees, then the first dequeue we do will certainly take Ω(n) time. However, that doesn't mean that every dequeue has to be expensive. What happens if, after doing a dequeue, we coalesce all the trees in the heap together such that we end up with only one tree of each order? This will take a long time initially, but if we start doing multiple dequeues in succession, those future dequeues will be significantly faster because there are fewer trees lying around.
There's a slight problem with this setup, though. In a normal binomial heap, the trees are always stored in order. If we just keep throwing new trees into our collection of trees, coalescing them at random times, and adding even more trees after that, there's no guarantee that the trees will be in any order. Therefore, we're going to need a new algorithm to merge those trees together.
The intuition behind this algorithm is the following. Suppose we create a hash table that maps from tree orders to trees. We could then do the following operation for each tree in the data structure:
Look up and see if there's already a tree of that order.
If not, insert the current tree into the hash table.
Otherwise:
    Merge the current tree with the tree of that order, removing the old tree from the hash table.
    Recursively repeat this process.
This operation ensures that when we're done, there's at most one tree of each order. It's also relatively efficient. Suppose that we start with T total trees and end up with t total trees. Each merge reduces the number of trees by exactly one, so the number of total merges we'll end up doing will be T - t, and each time we do a merge it will take time O(1) to do it. Therefore, the runtime for this operation will be linear in the number of trees (each tree is visited at least once) plus the number of merges done.
If the number of trees is small (say, Θ(log n)), then this operation will only take time O(log n). If the number of trees is large (say, Θ(n)), then this operation will take Θ(n) time, but will leave only Θ(log n) trees remaining, making future dequeues much faster.
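The hash-table procedure above can be sketched as follows. The names are assumptions for illustration; a Python dict stands in for the hash table, and a loop replaces the recursive repetition.

```python
class Tree:
    def __init__(self, key):
        self.key = key
        self.children = []   # order of the tree == len(children)


def link(a, b):
    """Merge two trees of equal order; the smaller key becomes the root."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a


def consolidate(roots):
    """Return (trees, merges) with at most one tree per order.

    `roots` may be in any order. `by_order` plays the role of the hash
    table mapping tree orders to trees.
    """
    by_order = {}
    merges = 0
    for tree in roots:
        # Keep merging while a tree of the same order is already stored.
        while len(tree.children) in by_order:
            tree = link(tree, by_order.pop(len(tree.children)))
            merges += 1
        by_order[len(tree.children)] = tree
    return list(by_order.values()), merges
```

Starting from 13 singleton trees, this leaves 3 trees behind, and the merge counter makes it easy to compare the work done with the number of trees that disappeared.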
We can quantify just how much better things will get by doing an amortized analysis and using a potential function. Let Φ be our potential function, defined as the number of trees in the data structure. This means that the costs of the operations are as follows:
Insert: Does O(1) work and increases the potential by one. Amortized cost is O(1).
Merge: Does O(1) work. The potential of one heap is dropped to 0 and the other heap's potential is increased by a corresponding amount, so there is no net change in potential. The amortized cost is thus O(1).
Dequeue-Min: Does O(#trees + #merges) work and decreases the potential down to Θ(log n), the number of trees we'd have in the binomial heap if we were eagerly merging the trees together. We can account for this in a different way. Let's have the number of trees be written as Θ(log n) + E, where E is the "excess" number of trees. In that case, the total work done is Θ(log n + E + #merges). Notice that we'll do one merge per excess tree, and so the total work done is Θ(log n + E). Since our potential drops the number of trees from Θ(log n) + E down to Θ(log n), the change in potential is -E. Therefore, the amortized cost of a dequeue-min is Θ(log n).
Another intuitive way to see why the amortized cost of a dequeue-min is Θ(log n) is by looking at why we have surplus trees. These extra trees are there because those darned greedy inserts are making all these extra trees and not paying for them! We can therefore "backcharge" the cost associated with doing all the merges back to the individual insertions that took up all that time, leaving behind the Θ(log n) "core" operation and a bunch of other operations that we'll blame on the insertions.
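Written out with an explicit constant, the dequeue-min accounting looks roughly like this. The constant c is an assumption introduced here for illustration: it is chosen large enough to cover the per-tree and per-merge work, and the potential is measured in units of c.

```latex
\begin{aligned}
\text{actual cost} &\le c\,(\log n + E),\\
\Delta\Phi &= -E,\\
\text{amortized cost} &= \text{actual cost} + c\,\Delta\Phi
  \le c\,(\log n + E) - c\,E = c \log n = O(\log n).
\end{aligned}
```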
Therefore:
Modification 3: On a dequeue-min operation, consolidate all trees to ensure there's at most one tree of each order.
At this point, we have insert and merge running in time O(1) and dequeues running in amortized time O(log n). That's pretty nifty! However, we still don't have decrease-key working yet. That's going to be the challenging part.
Step Two: Implementing Decrease-Key
Right now, we have a "lazy binomial heap" rather than a Fibonacci heap. The real change between a binomial heap and a Fibonacci heap is how we implement decrease-key.
Recall that the decrease-key operation should take an entry already in the heap (usually, we'd have a pointer to it) and a new priority that's lower than the existing priority. It then changes the priority of that element to the new, lower priority.
We can implement this operation very quickly (in time O(log n)) using a straightforward algorithm. Take the element whose key should be decreased (which can be located in O(1) time; remember, we're assuming we have a pointer to it) and lower its priority. Then, repeatedly swap it with its parent node as long as its priority is lower than its parent, stopping when the node comes to rest or when it reaches the root of the tree. This operation takes time O(log n) because each tree has height at most O(log n) and each comparison takes time O(1).
Remember, though, that we're trying to do even better than this - we want the runtime to be O(1)! That's a very tough bound to match. We can't use any process that will move the node up or down the tree, since those trees have heights that can be Ω(log n). We'll have to try something more drastic.
Suppose that we want to decrease the key of a node. The only way that the heap property will be violated is if the node's new priority is lower than that of its parent. If we look at the subtree rooted at that particular node, it will still obey the heap property. So here's a totally crazy idea: what if whenever we decrease the key of a node, we cut the link to the parent node, then bring the entire subtree rooted at the node back up to the top level of the tree?
Modification 4: Have decrease-key decrease the key of a node and, if its priority is smaller than its parent's priority, cut it and add it to the root list.
What will the effect of this operation be? Several things will happen.
1. The node that previously had our node as a child now thinks it has the wrong number of children. Recall that the root of a binomial tree of order n is defined to have n children, but that's not true any more.
2. The collection of trees in the root list will go up, increasing the cost of future dequeue operations.
3. The trees in our heap aren't necessarily going to be binomial trees any more. They might be "formerly" binomial trees that lost children at various points in time.
Number (1) isn't too much of a problem. If we cut a node from its parent, we can just decrease the order of that node by one to indicate that it has fewer children than it thought it previously did. Number (2) also isn't a problem. We can just backcharge the extra work done in the next dequeue-min operation to the decrease-key operation.
Number (3) is a very, very serious issue that we will need to address. Here's the problem: the efficiency of a binomial heap partially stems from the fact that any collection of n nodes can be stored in a collection of Θ(log n) trees of different order. The reason for this is that each binomial tree of order n has 2^n nodes in it. If we can start cutting nodes out of trees, then we risk having trees that have a large number of children (that is, a high order) but which don't have many nodes in them. For example, suppose we start with a single tree of order k and then perform decrease-key operations on all the grandchildren of its root. This leaves a tree whose root still has order k, but which only contains k + 1 total nodes. If we keep repeating this process everywhere, we might end up with a bunch of trees of various orders that have a very small number of nodes in them. Consequently, when we do our coalesce operation to group the trees together, we might not reduce the number of trees to a manageable level, breaking the Θ(log n)-time bound that we really don't want to lose.
At this point, we're in a bit of a bind. We need to have a lot of flexibility with how the trees can be reshaped so that we can get the O(1) time decrease-key functionality, but we can't let the trees get reshaped arbitrarily or we will end up with dequeue-min's amortized runtime increasing to something greater than O(log n).
The insight needed - and, quite honestly, what I think is the real genius in the Fibonacci heap - is a compromise between the two. The idea is the following. If we cut a tree from its parent, we're already planning on decreasing the rank of the parent node by one. The problem really arises when a node loses a lot of children, in which case its rank decreases significantly without any nodes higher up in the tree knowing about it. Therefore, we will say that each node is only allowed to lose one child. If a node loses a second child, then we'll cut that node from its parent, which propagates the information that nodes are missing higher up in the tree.
It turns out that this is a great compromise. It lets us do decrease-keys quickly in most contexts (as long as the nodes aren't children of the same tree), and only rarely do we have to "propagate" a decrease-key by cutting a node from its parent and then cutting that node from its grandparent.
To keep track of which nodes have lost children, we'll assign a "mark" bit to each node. Each node will initially have the mark bit cleared, but whenever it loses a child it will have the bit set. If it loses a second child after the bit has already been set, we'll clear the bit, then cut the node from its parent.
Modification 5: Assign a mark bit to each node that is initially false. When a child is cut from an unmarked parent, mark the parent. When a child is cut from a marked parent, unmark the parent and cut the parent from its parent.
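Modifications 4 and 5 together can be sketched like this. It is a simplified illustration under stated assumptions: a Python set stands in for each child list and for the root list, roots are never marked (the usual textbook convention, which the text above does not spell out), and all names are invented for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.children = set()   # stand-in for the real child list
        self.marked = False


def cut(node, roots):
    """Detach `node` from its parent and move it to the root list."""
    parent = node.parent
    parent.children.discard(node)
    node.parent = None
    node.marked = False          # assumption: roots are always unmarked
    roots.add(node)
    return parent


def decrease_key(node, new_key, roots):
    """Lower node.key and restore heap order by cutting."""
    assert new_key <= node.key
    node.key = new_key
    parent = node.parent
    if parent is None or parent.key <= node.key:
        return                   # heap property still holds; nothing to cut
    parent = cut(node, roots)
    # Cascading cut: a marked parent has now lost its second child.
    while parent.parent is not None and parent.marked:
        parent = cut(parent, roots)
    if parent.parent is not None:
        parent.marked = True     # first lost child of a non-root node
```

The `while` loop is the "propagation" step: it keeps climbing only through nodes that had already lost one child, which is what the amortized argument charges for.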
In this CS Theory Stack Exchange question and this older Stack Overflow question, I've sketched out a proof that shows that if trees are allowed to be modified in this way, then any tree of order n must contain at least Θ(φ^n) nodes, where φ is the golden ratio, about 1.618. This means that the number of nodes in each tree is still exponential in the order of the tree, though with a smaller base than before. As a result, the analysis we did earlier about the time complexity of the dequeue-min operation still holds, though the constant hidden in the Θ(log n) bound will be different.
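The size bound can be explored numerically. The sketch below assumes the standard counting argument (not spelled out in this answer): the i-th child linked to a root had order at least i - 1 when it was linked and has lost at most one child since, so it still has order at least i - 2. The function name is hypothetical.

```python
def min_tree_sizes(max_order):
    """Smallest possible node count of a tree whose root has k children,
    for k = 0..max_order, when each non-root node may lose at most one child.
    """
    sizes = []
    for k in range(max_order + 1):
        if k == 0:
            sizes.append(1)
        else:
            # root + first child (order >= 0) + children 2..k (orders >= 0..k-2)
            sizes.append(2 + sum(sizes[: k - 1]))
    return sizes
```

The values it produces are Fibonacci numbers (1, 2, 3, 5, 8, 13, ...), which is where the heap gets its name and why the sizes grow like powers of φ.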
There's one very last thing to consider - what about the complexity of decrease-key? Previously, it was O(1) because we just cut the tree rooted at the appropriate node and moved it to the root list. However, now we might have to do a "cascading cut," in which we cut a node from its parent, then cut that node from its parent, etc. How does that give O(1) time decrease-keys?
The observation here is that we can add a "charge" to each decrease-key operation that we can then spend to cut the parent node from its parent. Since we only cut a node from its parent if it's already lost two children, we can pretend that each decrease-key operation pays for the work necessary to cut its parent node. When we do cut the parent, we can charge the cost of doing so back to one of the earlier decrease-key operations. Consequently, even though any individual decrease-key operation might take a long time to finish, we can always amortize the work across the earlier calls so that the runtime is amortized O(1).
Step Three: Linked Lists Abound!
There is one final detail we have to talk about. The data structure I've described so far is tricky, but it doesn't seem catastrophically complicated. Fibonacci heaps have a reputation for being fearsome... why is that?
The reason is that in order to implement all of the operations described above, the tree structures need to be implemented in very clever ways.
Typically, you'd represent a multiway tree either by having each parent point down to all the children (perhaps by having an array of children) or by using the left-child/right-sibling representation, where the parent has a pointer to one child, which in turn points to a list of the other children. For a binomial heap, this is perfect. The main operation we need to do on trees is a join operation in which we make one root node a child of another, so it's perfectly reasonable to have the pointers in the tree directed from parents to children.
The problem in a Fibonacci heap is that this representation is inefficient when considering the decrease-key step. Fibonacci heaps need to support all the basic pointer manipulations of a standard binomial heap and the ability to cut a single child from a parent.
Consider the standard representations of multiway trees. If we represent the tree by having each parent node store an array or list of pointers to its children, then we can't efficiently (in O(1)) remove a child node from the list of children. In other words, the runtime for decrease-key would be dominated by the bookkeeping step of removing the child rather than the logical step of moving a subtree to the root list! The same issue appears in the left-child, right-sibling representation.
The solution to this problem is to store the tree in a bizarre fashion. Each parent node stores a pointer to a single (and arbitrary) one of its children. The children are then stored in a circularly-linked list, and each points back up to its parent. Since it's possible to concatenate two circularly-linked lists in O(1) time and to insert or remove a single entry from one in O(1) time, this makes it possible to efficiently support the necessary tree operations:
Make one tree a child of another: if the first tree has no children, set its child pointer to point to the second tree. Otherwise, splice the second tree into the circularly-linked child list of the first tree.
Remove a child from a tree: splice that child node out of the linked list of children for the parent node. If it's the single node chosen to represent the children of the parent node, choose one of the sibling nodes to replace it (or set the pointer to null if it's the last child.)
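The two operations can be sketched as follows, assuming each node carries `parent`, `child`, `prev` and `next` pointers (the field and function names are illustrative, not from any particular implementation):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.child = None             # pointer to one arbitrary child
        self.prev = self.next = self  # a lone node is a one-element ring


def add_child(parent, node):
    """Make `node` (a lone root) a child of `parent` in O(1)."""
    node.parent = parent
    if parent.child is None:
        parent.child = node
        node.prev = node.next = node
    else:                             # splice node in right after parent.child
        first = parent.child
        node.prev, node.next = first, first.next
        first.next.prev = node
        first.next = node


def remove_child(node):
    """Cut `node` out of its parent's child ring in O(1)."""
    parent = node.parent
    if node.next is node:             # it was the only child
        parent.child = None
    else:
        node.prev.next = node.next
        node.next.prev = node.prev
        if parent.child is node:      # pick a sibling as the new representative
            parent.child = node.next
    node.parent = None
    node.prev = node.next = node
```

Neither function loops over the siblings, which is the point of the circular doubly-linked representation.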
There are absurdly many cases to consider and check when performing all these operations simply due to the number of different edge cases that can arise. The overhead associated with all the pointer juggling is one of the reasons why Fibonacci heaps are slower in practice than their asymptotic complexity might suggest (the other big one is the logic for removing the minimum value, which requires an auxiliary data structure).
Modification 6: Use a custom representation of the tree that supports efficient joining of trees and cutting one tree from another.
Conclusion
I hope this answer sheds some light on the mystery that is the Fibonacci heap. I hope that you can see the logical progression from a simpler structure (the binomial heap) to a more complex structure by a series of simple steps based on reasonable insights. It's not unreasonable to want to make insertions amortized-efficient at the expense of deletions, and it's similarly not too crazy to implement decrease-key by cutting out subtrees. From there, the rest of the details are in ensuring that the structure is still efficient, but they're more consequences of the other parts rather than causes.
If you're interested in learning more about Fibonacci heaps, you may want to check out this two-part series of lecture slides. Part one introduces binomial heaps and shows how lazy binomial heaps work. Part two explores Fibonacci heaps. These slides go into more mathematical depth than what I've covered here.

Median of BST in O(logn) time complexity

I came across solution given at http://discuss.joelonsoftware.com/default.asp?interview.11.780597.8 using Morris InOrder traversal using which we can find the median in O(n) time.
But is it possible to achieve the same using O(logn) time? The same has been asked here - http://www.careercup.com/question?id=192816
If you also maintain the count of the number of left and right descendants of a node, you can do it in O(logN) time, by doing a search for the median position. In fact, you can find the kth largest element in O(logn) time.
Of course, this assumes that the tree is balanced. Maintaining the count does not change the insert/delete complexity.
If the tree is not balanced, then you have Omega(n) worst case complexity.
See: Order Statistic Tree.
btw, BigO and Smallo are very different (your title says Smallo).
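A sketch of the search by position described above, assuming every node stores the size of its subtree (the `size` field and the function names are assumptions for this example; a balanced tree would have to keep `size` up to date during rotations):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Subtree size; a real tree would update this on insert and delete.
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node is not None else 0


def select(root, k):
    """Return the k-th smallest key (k is 1-based); O(height) time."""
    node = root
    while node is not None:
        left = size(node.left)
        if k == left + 1:
            return node.key
        if k <= left:
            node = node.left
        else:
            k -= left + 1        # skip the left subtree and this node
            node = node.right
    raise IndexError("k out of range")


def median(root):
    """Lower median of the keys stored in the tree."""
    return select(root, (size(root) + 1) // 2)
```

Each step descends one level, so the running time is proportional to the height of the tree, which is O(log n) only when the tree is balanced.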
Unless you guarantee some sort of balanced tree, it's not possible.
Consider a tree that's completely degenerate -- e.g., every left pointer is NULL (nil, whatever), so each node only has a right child (i.e., for all practical purposes the "tree" is really a singly linked list).
In this case, just accessing the median node (at all) takes linear time -- even if you started out knowing that node N was the median, it would still take N steps to get to that node.
We can find the median by using the rabbit and the turtle pointer. The rabbit moves twice as fast as the turtle in the in-order traversal of the BST. This way when the rabbit reaches the end of traversal, the turtle is at the median of the BST.
Please see the full explanation.
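One way to sketch the rabbit-and-turtle idea is with two lazy in-order iterators over the same tree. This is an illustration under my own assumptions (the `Node` class, the generator, and the choice of the lower median for an even count), not the linked explanation's code.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def inorder(node):
    """Lazily yield keys in sorted order."""
    if node is not None:
        yield from inorder(node.left)
        yield node.key
        yield from inorder(node.right)


_END = object()  # sentinel marking an exhausted iterator


def median_two_pointers(root):
    """Lower median via a slow and a fast in-order iterator; O(n) time."""
    turtle = inorder(root)
    rabbit = inorder(root)
    median = None
    while True:
        # The rabbit advances two keys for every key the turtle advances.
        if next(rabbit, _END) is _END:
            return median
        median = next(turtle)
        if next(rabbit, _END) is _END:
            return median
```

This still visits every node, so it runs in O(n) time rather than O(log n); it only avoids counting the nodes in a separate first pass.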
