I'm studying for some technical interviews coming up and was just going over lecture slides from a year or two ago about data structures.
I'm not clear on why the worst-case runtime of merge for a leftist heap is O(log n) whereas for a skew heap it is O(n), when a skew heap essentially merges in the same way as a leftist heap.
A leftist heap merges A and B by picking the tree with the smaller root and recursively merging its right subtree with the tree that has the larger root. Then it checks the null path lengths and swaps its two subtrees if the result violates the leftist structural property.
A skew heap does the same thing but blindly swaps its two subtrees every time as it recursively merges A and B.
Why would the worst case of merge for a skew heap become O(n)? Is it because we can't guarantee a height bound as it recursively merges (since it's swapping sides every time)? Does this have to do with Floyd's Algorithm, that the sum of the heights from all nodes in a tree grows in O(n)?
A leftist heap has a right path of length at most log(N+1), while a skew heap's right path can be arbitrarily long (it can be N). Since the performance of merge depends on the length of the right path, the worst-case merge times follow from this. However, I don't see how a skew heap ends up in that state. Can you give me a specific case where the right path of a skew heap is as long as N?
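For what it's worth, here is a minimal Python sketch of the two merges (my own illustrative code, not from the lecture slides); structurally, the only difference is that the leftist version consults null path lengths before swapping, while the skew version swaps unconditionally:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.npl = 0  # null path length; only the leftist merge maintains it

def merge_leftist(a, b):
    if a is None: return b
    if b is None: return a
    if b.key < a.key:
        a, b = b, a                      # keep the smaller root on top
    a.right = merge_leftist(a.right, b)  # recurse down the right spine
    # restore the leftist property: npl(left) >= npl(right)
    if a.left is None or (a.right is not None and a.left.npl < a.right.npl):
        a.left, a.right = a.right, a.left
    a.npl = (a.right.npl + 1) if a.right is not None else 0
    return a

def merge_skew(a, b):
    if a is None: return b
    if b is None: return a
    if b.key < a.key:
        a, b = b, a                      # keep the smaller root on top
    a.right = merge_skew(a.right, b)     # recurse down the right spine
    a.left, a.right = a.right, a.left    # swap blindly, no npl bookkeeping
    return a
```

The leftist merge only ever walks down right spines, and the npl invariant guarantees every right spine contains O(log n) nodes, so a single merge is O(log n). The skew merge walks the same spines but has no per-operation bound on their length; only the amortized cost of a skew-heap merge is O(log n).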
Related
How is the bottom-up approach of heap construction of the order O(n)? Anany Levitin says in his book that this is more efficient compared to the top-down approach, which is of order O(log n). Why?
That to me seems like a typo.
There are two standard algorithms for building a heap. The first is to start with an empty heap and to repeatedly insert elements into it one at a time. Each individual insertion takes time O(log n), so we can upper-bound the cost of this style of heap-building at O(n log n). It turns out that, in the worst case, the runtime is Θ(n log n), which happens if you insert the elements in reverse-sorted order.
The other approach is the heapify algorithm, which builds the heap directly by starting with each element in its own binary heap and progressively coalescing them together. This algorithm runs in time O(n) regardless of the input.
The reason why the first algorithm requires time Θ(n log n) is that, if you look at the second half of the elements being inserted, you'll see that each of them is inserted into a heap whose height is Θ(log n), so the cost of doing each bubble-up can be high. Since there are n / 2 elements and each of them might take time Θ(log n) to insert, the worst-case runtime is Θ(n log n).
On the other hand, the heapify algorithm spends the majority of its time working on small heaps. Half the elements are inserted into heaps of height 0, a quarter into heaps of height 1, an eighth into heaps of height 2, etc. This means that the bulk of the work is spent inserting elements into small heaps, which is significantly faster.
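To make the two strategies concrete, here is a rough min-heap sketch in Python (the function names are my own, not a standard library API):

```python
def sift_up(a, i):
    # bubble a[i] up toward the root while it is smaller than its parent
    while i > 0 and a[(i - 1) // 2] > a[i]:
        a[(i - 1) // 2], a[i] = a[i], a[(i - 1) // 2]
        i = (i - 1) // 2

def sift_down(a, i, n):
    # push a[i] down toward the leaves while a child is smaller
    while True:
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and a[l] < a[smallest]:
            smallest = l
        if r < n and a[r] < a[smallest]:
            smallest = r
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_by_insertion(items):
    # repeated insertion: Theta(n log n) in the worst case
    heap = []
    for x in items:
        heap.append(x)
        sift_up(heap, len(heap) - 1)
    return heap

def build_by_heapify(items):
    # heapify (Floyd's method): O(n) regardless of input order
    heap = list(items)
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i, len(heap))
    return heap
```

Feeding build_by_insertion a reverse-sorted input, e.g. list(range(n, 0, -1)), forces every sift_up to climb all the way to the root, which is exactly the Θ(n log n) worst case described above.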
If you consider swapping to be your basic operation -
In top-down construction, the tree is constructed first and a heapify function is called on the nodes. In the worst case, an element would be swapped log n times (to sift it to the top of a tree whose height is log n), for each of the n/2 leaf nodes. This results in an O(n log n) upper bound.
In bottom-up construction, you assume all the leaf nodes are already in order in the first pass, so heapify is called only on the n/2 internal nodes. At each level, the number of possible swaps increases but the number of nodes on which it happens decreases.
For example:
At the level right above the leaf nodes, we have n/4 nodes that can have at most 1 swap each.
At the level above that, we have n/8 nodes that can have at most 2 swaps each, and so on.
On summation, we come up with an O(n) cost for bottom-up construction of a heap.
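Written out, the sum hinted at above (with $i$ counting levels above the leaves and roughly $n/2^{i+1}$ nodes at level $i$, each doing at most $i$ swaps) is

$$\sum_{i \ge 1} \frac{n}{2^{i+1}} \cdot i \;=\; \frac{n}{2} \sum_{i \ge 1} \frac{i}{2^{i}} \;=\; \frac{n}{2} \cdot 2 \;=\; n,$$

so the total number of swaps is bounded by n, i.e. O(n).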
It generally refers to a way of solving a problem, especially in computer science algorithms.
Top down:
Take the whole problem and split it into two or more parts.
Find solutions to these parts.
If these parts turn out to be too big to be solved as a whole, split them further and find solutions to those sub-parts.
Merge the solutions according to the sub-problem hierarchy thus created once all parts have been successfully solved.
Bottom up:
Breaking the problem into the smallest possible (and practical) parts.
Finding solutions to these small sub-problems.
Merging the solutions you get iteratively (again and again) till you have merged all of them to get the final solution to the "big" problem.
The main difference in approach is splitting versus merging. You either start big and split "down" as required, or start with the smallest and merge your way "up" to the final solution.
In the regular heapify(), we perform two comparisons on each node from top to bottom to find the largest of three elements:
the parent node with its left child,
the larger node from the first comparison with the second child.
Bottom-up heapsort, on the other hand, only compares the two children and follows the larger child to the end of the tree ("top-down"). From there, the algorithm goes back towards the tree root ("bottom-up") and searches for the first element larger than the root. From this position, all elements are moved one position towards the root, and the root element is placed in the field that has become free.
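Here is a rough Python sketch (my own code, for a max-heap stored in an array) of the classic sift-down next to the bottom-up variant just described:

```python
def sift_down_classic(a, root, end):
    # two comparisons per level: pick the larger child, then compare it with the parent
    i = root
    while 2 * i + 1 < end:
        child = 2 * i + 1
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[i] >= a[child]:
            return
        a[i], a[child] = a[child], a[i]
        i = child

def sift_down_bottom_up(a, root, end):
    # phase 1: follow the larger child all the way down to a leaf ("top-down")
    j = root
    while 2 * j + 1 < end:
        j = 2 * j + 1
        if j + 1 < end and a[j + 1] > a[j]:
            j += 1
    # phase 2: climb back up to the first element >= the old root value ("bottom-up")
    x = a[root]
    while a[j] < x:
        j = (j - 1) // 2
    # phase 3: shift the elements on that path one step toward the root
    # and drop the old root value into the slot that becomes free
    while j > root:
        a[j], x = x, a[j]
        j = (j - 1) // 2
    a[root] = x
```

The point of the variant is that during heapsort's extraction phase the value moved to the root is usually small, so the climb back up in phase 2 is short and roughly half the comparisons of the classic sift-down are saved.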
A binary heap can be built in two ways:
Top-Down Approach
Bottom-Up Approach
In the Top-Down Approach, you first begin with 3 elements. You consider 2 of them as heaps and the third as a key k. You then create a new heap by joining these two sub-heaps with the key as the root node. Then, you perform heapify to maintain the heap order (either min- or max-heap order).
Then, we take two such heaps (containing 3 elements each) and another element as a key k, and create a new heap. We keep repeating this process, increasing the size of each sub-heap, until all elements are added.
This process adds half the elements in the bottom level, 1/4th in the second-to-last one, 1/8th in the third-to-last one and so on; therefore, the complexity of this approach works out to O(n).
In the Bottom-Up Approach, we first simply create a complete binary tree from the given elements. We then apply a DownHeap operation on each parent of the tree, starting from the last parent and going up the tree until the root. This is a much simpler approach. A crude bound is O(n log n), since DownHeap's worst case is O(log n) and we apply it to n/2 nodes; but because most of those parents sit near the bottom of the tree, the tighter analysis above again gives O(n).
Fibonacci heaps are efficient in an amortized sense, but how efficient are they in the worst case? Specifically, what is the worst-case time complexity of each of these operations on an n-node Fibonacci heap?
find-min
delete-min
insert
decrease-key
merge
The find-min operation on a Fibonacci heap always takes worst-case O(1) time. There's always a pointer maintained that directly points to that object.
The cost of a delete-min is, in the worst case, Θ(n). To see this, imagine starting with an empty heap and doing a series of n insertions into it. Each node will be stored in its own tree, and doing a delete-min will then coalesce all of these trees into O(log n) trees, requiring Θ(n) work to visit all the nodes at least once.
The cost of an insertion is worst-case O(1); this is just creating a single node and adding it to the list. A merge is similarly O(1) since it just splices two lists together.
The cost of a decrease-key in the worst case is Θ(n). It's possible to build a degenerate Fibonacci heap in which all the elements are stored in a single tree consisting of a linked list of n marked nodes. Doing a decrease-key on the bottommost node then triggers a run of cascading cuts that will convert the tree into n independent nodes.
I almost agree with the great answer from #templatetypedef.
There cannot be a tree of a classical Fibonacci heap with $n$ marked nodes: that would mean the height of the tree is $O(n)$, but since for each subtree of rank $k$ its children have ranks $\geq 0, \geq 1, \dots, \geq k-1$, it is easy to see that the depth of the tree is at most $O(\log n)$, and therefore a single decrease-key operation can cost at most $O(\log n)$.
I checked this thread, and it requires some modification of the Fibonacci heap, as it has marked nodes in the root list and performs operations which do not belong to the Fibonacci heap.
In the CLRS book, building a heap by top-down heapify has complexity O(n). A heap can also be built by repeatedly calling insertion, which has complexity O(n lg n) in the worst case.
My question is: is there any insight why the latter method has the worse performance?
I asked this question since I feel there are simple insights behind the math. For example,
quicksort, merge sort, and heapsort are all based on reducing unnecessary comparisons, but with different methods.
quicksort: balanced partition, no need to compare left subset to right subset.
merge sort: simply compare the two minimum elements from two sub-arrays.
heapsort: if A has a larger value than B, then A has a larger value than all of B's descendants, and there is no need to compare with them.
The main difference between the two is what direction they work in: upwards (the O(n log n) algorithm) or downwards (the O(n) algorithm).
In the O(n log n) algorithm done by making n insertions, each insertion might potentially bubble up an element from the bottom of the (current) heap all the way up to the top. So imagine that you've built all of the heap except the last full layer. Imagine that every time you do an insertion into that layer, the value you've inserted is the smallest overall value. In that case, you'd have to bubble the new element all the way up to the top of the heap. During this time, the heap has height (roughly) log n - 1, so the total number of swaps you'll have to do is (roughly) n log n / 2 - n / 2, giving a runtime of Θ(n log n) in the worst case.
In the O(n) algorithm done by building the heap in one pass, new elements are inserted at the tops of various smaller heaps and then bubbled down. Intuitively, there are progressively fewer and fewer elements the higher up you go in the heap, so most of the work is spent on the nodes near the leaves, lower down, rather than on the few elements near the top.
The major difference in the runtimes has to do with the direction. In the O(n log n) version, since elements are bubbled up, the runtime is bounded by the sum of the lengths of the paths from each node to the root of the tree, which is Θ(n log n). In the O(n) version, the runtime is bounded by the lengths of the paths from each node to the leaves of the tree, which is much lower (O(n)), hence the better runtime.
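Roughly, for a complete binary tree: the sum of the node depths is $\sum_{d=0}^{\log n} 2^{d}\, d = \Theta(n \log n)$ (dominated by the roughly $n/2$ leaves at depth about $\log n$), while the sum of the node heights is $\sum_{h=0}^{\log n} \frac{n}{2^{h+1}}\, h = O(n)$, since $\sum_{h \ge 0} h/2^{h} = 2$.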
Hope this helps!
One standard implementation of the Dijkstra algorithm uses a heap to store distances from the starting node S to all unexplored nodes. The argument for using a heap is that we can efficiently pop the minimum distance from it, in O(log n). However, to maintain the invariant of the algorithm, one also needs to update some of the distances in the heap. This involves:
popping non-min elements from the heaps
computing the updated distances
inserting them back into the heap
I understand that popping non-min elements from a heap can be done in O(log n) if one knows the location of that element in the heap. However, I fail to understand how one can know this location in the case of the Dijkstra algorithm. It sounds like a binary search tree would be more appropriate.
More generally, my understanding is that the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element. Is my understanding correct?
However, I fail to understand how one can know this location in the case of the Dijkstra algorithm.
You need an additional array that keeps track of where in the heap the elements live, or an extra data member inside the heap's elements. This has to be updated after each heap operation.
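For illustration, here is a rough sketch of that bookkeeping in Python (all names are mine; Python's built-in heapq has no decrease-key, so the heap is hand-rolled):

```python
class IndexedMinHeap:
    """Binary min-heap of (distance, vertex) pairs with a position map,
    so decrease-key can find a vertex's slot in O(1) and fix it in O(log n)."""

    def __init__(self):
        self.keys = []   # keys[i] = (distance, vertex) stored at heap slot i
        self.pos = {}    # pos[vertex] = current heap slot of that vertex

    def _swap(self, i, j):
        self.keys[i], self.keys[j] = self.keys[j], self.keys[i]
        self.pos[self.keys[i][1]] = i   # keep the position map in sync
        self.pos[self.keys[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.keys[(i - 1) // 2] > self.keys[i]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.keys)
        while True:
            small, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and self.keys[l] < self.keys[small]:
                small = l
            if r < n and self.keys[r] < self.keys[small]:
                small = r
            if small == i:
                return
            self._swap(i, small)
            i = small

    def push(self, dist, v):
        self.keys.append((dist, v))
        self.pos[v] = len(self.keys) - 1
        self._sift_up(len(self.keys) - 1)

    def decrease_key(self, v, new_dist):
        i = self.pos[v]                 # O(1) lookup thanks to the position map
        self.keys[i] = (new_dist, v)
        self._sift_up(i)                # O(log n)

    def pop_min(self):
        top = self.keys[0]
        last = self.keys.pop()
        del self.pos[top[1]]
        if self.keys:
            self.keys[0] = last
            self.pos[last[1]] = 0
            self._sift_down(0)
        return top
```

Dijkstra's algorithm would then call pop_min() to pick the next vertex to settle, and decrease_key(v, d) whenever relaxing an edge finds a shorter path to a vertex already in the heap.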
the only thing that a heap can do better than a balanced binary search tree is to access (without removing) the min element
Even a BST can be amended to keep a pointer to the min element in addition to the root pointer, giving O(1) access to the min (effectively amortizing the O(lg n) work over the other operations).
The only advantage of heaps in terms of worst-case complexity is the "heapify" algorithm, which turns an array into a heap by reshuffling its elements in-place, in linear time. For Dijkstra's, this doesn't matter, since it's going to do n heap operations of O(lg n) cost apiece anyway.
The real reason for heaps, then, is constants. A properly implemented heap is just a contiguous array of elements, while a BST is a pointer structure. Even when a BST is implemented inside an array (which can be done if the number of elements is known from the start, as in Dijkstra's), the pointers take up more memory, and navigating them takes more time than the integer operations that are used to navigate a heap.
We know that heaps and red-black trees both have these properties:
worst-case cost for searching is lg N;
worst-case cost for insertion is lg N.
So, since the implementation and operation of red-black trees are difficult, why don't we just use heaps instead of red-black trees? I am confused.
You can't find an arbitrary element in a heap in O(log n). It takes O(n) to do this. You can find the first element (the smallest, say) in a heap in O(1) and extract it in O(log n). Red-black trees and heaps have quite different uses, internal orderings, and implementations: see below for more details.
Typical use
Red-black tree: storing dictionary where as well as lookup you want elements sorted by key, so that you can for example iterate through them in order. Insert and lookup are O(log n).
Heap: priority queue (and heap sort). Extraction of minimum and insertion are O(log n).
Consistency constraints imposed by structure
Red-black tree: total ordering: left child < parent < right child.
Heap: dominance: parent < children only.
(note that you can substitute a more general ordering than <)
Implementation / Memory overhead
Red-black tree: pointers used to represent structure of tree, so overhead per element. Typically uses a number of nodes allocated on free store (e.g. using new in C++), nodes point to other nodes. Kept balanced to ensure logarithmic lookup / insertion.
Heap: structure is implicit: root is at position 0, children of root at 1 and 2, etc, so no overhead per element. Typically just stored in a single array.
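For example, with the 0-based array layout just described, navigating the heap is pure index arithmetic (a small illustrative snippet):

```python
# implicit 0-based binary heap: no pointers stored, only index arithmetic
def parent(i):      return (i - 1) // 2
def left_child(i):  return 2 * i + 1
def right_child(i): return 2 * i + 2
```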
Red Black Tree:
A form of binary search tree with a deterministic balancing strategy. This balancing guarantees good performance, and it can always be searched in O(log n) time.
Heaps:
We need to search through every element in the heap in order to determine whether an element is inside. Even with optimization, I believe search is still O(n). On the other hand, it is best for finding the min/max in a set: O(1).