How do I merge two heaps (the kind of heap is not specified), where one has n elements and the other has m, in O(log(n+m)) time?
The question is indeed absurd for ordinary array-backed binary heaps. Merely copying the nodes of a single heap into a new heap takes O(N) time, and rebuilding it by inserting the elements one at a time takes O(N log N). How could you merge two such trees in only O(log N) time? Even in the best case, when the two heaps are identical, copying the nodes into a new heap takes at least O(N) time.
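For illustration, here is a minimal sketch (using Python's standard heapq module; the function name is mine) of the only merge available for plain array-backed binary heaps, which is linear, not logarithmic:

    import heapq

    def merge_binary_heaps(a, b):
        # Merging two array-backed binary heaps requires touching every
        # element of both heaps at least once, so the cost is Theta(n + m),
        # not O(log(n + m)).
        merged = a + b          # O(n + m) copy
        heapq.heapify(merged)   # O(n + m) bottom-up heapify
        return merged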
A heap can be constructed from a list in O(n log n) time, because inserting an element into a heap takes O(log n) time and there are n elements.
Similarly, a binary search tree can be constructed from a list in O(n log n) time, because inserting an element into a BST takes O(log n) time on average and there are n elements.
Traversing a heap from min to max takes O(n log n) time (because we have to pop n elements, and each pop requires an O(log n) sink operation). Traversing a BST from min to max takes O(n) time (literally just inorder traversal).
So, it appears to me that constructing both structures takes equal time, but BSTs are faster to iterate over. So, why do we use "Heapsort" instead of "BSTsort"?
Edit: Thank you to Tobias and lrlreon for your answers! In summary, below are the points why we use heaps instead of BSTs for sorting.
Construction of a heap can actually be done in O(n) time, not O(n log n) time. This makes heap construction faster than BST construction.
Additionally, arrays can be easily transformed into heaps in-place, because heaps are always complete binary trees. BSTs can't be easily implemented as an array, since BSTs are not guaranteed to be complete binary trees. This means that BSTs require an additional O(n) of space to sort, while heaps require only O(1).
All operations on heaps are guaranteed to be O(log n) time. BSTs, unless balanced, may have O(n) operations. Heaps are also dramatically simpler to implement than balanced BSTs.
If you need to modify a value after creating the heap, all you need to do is apply the sink or swim operations. Modifying a value in a BST is much more conceptually difficult.
There are multiple reasons I can imagine you would want to prefer a (binary) heap over a search tree:
Construction: A binary heap can actually be constructed in O(n) time by applying the heapify operations bottom-up from the smallest to the largest subtrees.
Modification: All operations of the binary heap are rather straightforward:
Inserted an element at the end? Sift it up until the heap condition holds
Swapped the last element to the beginning? Sift it down until the heap condition holds
Changed the key of an entry? Sift it up or down depending on the direction of the change
Conceptual simplicity: Due to its implicit array representation, a binary heap can be implemented by anyone who knows the basic indexing scheme (2i+1, 2i+2 are the children of i) without considering many difficult special cases.
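To illustrate that simplicity, here is a minimal sift-up/sift-down sketch for a min-heap stored in a Python list; the function names are illustrative, not from any particular library:

    def sift_up(heap, i):
        # Move heap[i] up until its parent is no larger (min-heap condition).
        while i > 0:
            parent = (i - 1) // 2
            if heap[parent] <= heap[i]:
                break
            heap[parent], heap[i] = heap[i], heap[parent]
            i = parent

    def sift_down(heap, i):
        # Move heap[i] towards its smaller child until the heap condition holds.
        n = len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and heap[left] < heap[smallest]:
                smallest = left
            if right < n and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:
                break
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest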
If you look at these operations in a binary search tree, in theory they are also quite simple, but the tree has to be stored explicitly, e.g. using pointers, and most of the operations require the tree to be rebalanced to preserve the O(log n) height, which requires complicated rotations (red-black trees) or splitting/merging nodes (B-trees).
EDIT: Storage: As Irleon pointed out, to store a BST you also need more storage, as at least two child pointers need to be stored for every entry in addition to the value itself, which can be a large storage overhead especially for small value types. At the same time, the heap needs no additional pointers.
To answer your question about sorting: a BST takes O(n) time to traverse in order, but the construction process takes O(n log n) operations which, as mentioned before, are much more complex.
At the same time, Heapsort can actually be implemented in-place by building a max-heap from the input array in O(n) time and then repeatedly swapping the maximum element to the back and shrinking the heap. You can think of Heapsort as selection sort with a helpful data structure that lets you find the next maximum in O(log n) time.
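A minimal in-place Heapsort sketch along those lines (illustrative only, not tuned):

    def heapsort(a):
        n = len(a)

        def sift_down(i, end):
            # Restore the max-heap property for the subtree rooted at i,
            # looking only at a[:end].
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                largest = i
                if left < end and a[left] > a[largest]:
                    largest = left
                if right < end and a[right] > a[largest]:
                    largest = right
                if largest == i:
                    return
                a[i], a[largest] = a[largest], a[i]
                i = largest

        # Build a max-heap bottom-up in O(n).
        for i in range(n // 2 - 1, -1, -1):
            sift_down(i, n)

        # Repeatedly swap the maximum to the back and shrink the heap.
        for end in range(n - 1, 0, -1):
            a[0], a[end] = a[end], a[0]
            sift_down(0, end)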
If the sorting method consists of storing the elements in a data structure and afterwards extracting them in sorted order, then, although both approaches (heap and BST) have the same asymptotic complexity O(n log n), the heap tends to be faster. The reason is that the heap is always a perfectly balanced tree and its operations are always O(log n) in a deterministic way, not just on average. With BSTs, depending on the approach taken for balancing, insertion and deletion tend to take more time than with a heap, no matter which balancing approach is used. In addition, a heap is usually implemented as an array storing the level-order traversal of the tree, without the need to store any kind of pointers. Thus, if you know the number of elements, which usually is the case, the extra storage required for a heap is less than that used for a BST.
In the case of sorting an array, there is a very important reason why a heap would be preferable to a BST: you can use the same array for storing the heap; there is no need for additional memory.
Fibonacci heaps are efficient in an amortized sense, but how efficient are they in the worst case? Specifically, what is the worst-case time complexity of each of these operations on an n-node Fibonacci heap?
find min
delete-min
insert
decrease-key
merge
The find-min operation on a Fibonacci heap always takes worst-case O(1) time. There's always a pointer maintained that directly points to that object.
In the worst case, a delete-min takes Θ(n) time. To see this, imagine starting with an empty heap and doing a series of n insertions into it. Each node will be stored in its own tree; the next delete-min then coalesces all these nodes into O(log n) trees, requiring Θ(n) work to visit every node at least once.
The cost of an insertion is worst-case O(1); this is just creating a single node and adding it to the list. A merge is similarly O(1) since it just splices two lists together.
The cost of a decrease-key in the worst case is Θ(n). It's possible to build a degenerate Fibonacci heap in which all the elements are stored in a single tree consisting of a linked list of n marked nodes. Doing a decrease-key on the bottommost node then triggers a run of cascading cuts that will convert the tree into n independent nodes.
I almost agree with the great answer from templatetypedef.
There cannot be a tree in a classical Fibonacci heap with $n$ marked nodes. That would mean the height of the tree is O(n), but since the children of each subtree of rank $k$ have ranks $\geq 0, \geq 1, \ldots, \geq k-1$, it is easy to see that the depth of the tree is at most O(log n). Therefore a single decrease-key operation can cost O(log n).
I checked this thread, and the construction there takes some modification of the Fibonacci heap, as it has marked nodes in the root list and performs operations which do not belong to the Fibonacci heap.
What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)?
Is it O(log N)? O(N log N)? Something else?
How about for a multiple query, like searching for multiple WHERE clauses in a key-value database?
The wikipedia page on LSM trees is currently lacking this info.
And I'm trying to make sense of the original paper.
I have been wondering the same.
If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).
Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).
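Spelling that sum out (assuming each tree is half the size of the previous one, so there are about log_2(N) trees):

    log_2(N) + (log_2(N) - 1) + (log_2(N) - 2) + ... + 1
      = log_2(N) * (log_2(N) + 1) / 2
      = O(log(N)^2)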
However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).
For a simple search indexed by an LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B-tree, which is O(log n), and the other trees are subsets of B-trees or, in the case of in-memory trees, more efficient trees, which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.
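As a rough sketch of such a lookup (assuming each level is simply a sorted list of (key, value) pairs; the function name and structure are mine, and real LSM implementations use SSTables, Bloom filters, etc.):

    import bisect

    def lsm_get(levels, key):
        # Probe each level from newest to oldest; the first hit wins, since
        # newer levels shadow older ones.  Each probe is an O(log n) binary
        # search, and the number of levels multiplies that cost.
        for level in levels:
            i = bisect.bisect_left(level, (key,))
            if i < len(level) and level[i][0] == key:
                return level[i][1]
        return None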
We know that heaps and red-black trees both have these properties:
worst-case cost for searching is lg N;
worst-case cost for insertion is lg N.
So, since the implementation and operation of red-black trees is difficult, why don't we just use heaps instead of red-black trees? I am confused.
You can't find an arbitrary element in a heap in O(log n). It takes O(n) to do this. You can find the first element (the smallest, say) in a heap in O(1) and extract it in O(log n). Red-black trees and heaps have quite different uses, internal orderings, and implementations: see below for more details.
Typical use
Red-black tree: storing dictionary where as well as lookup you want elements sorted by key, so that you can for example iterate through them in order. Insert and lookup are O(log n).
Heap: priority queue (and heap sort). Extraction of minimum and insertion are O(log n).
Consistency constraints imposed by structure
Red-black tree: total ordering: left child < parent < right child.
Heap: dominance: parent < children only.
(note that you can substitute a more general ordering than <)
Implementation / Memory overhead
Red-black tree: pointers used to represent structure of tree, so overhead per element. Typically uses a number of nodes allocated on free store (e.g. using new in C++), nodes point to other nodes. Kept balanced to ensure logarithmic lookup / insertion.
Heap: structure is implicit: root is at position 0, children of root at 1 and 2, etc, so no overhead per element. Typically just stored in a single array.
Red Black Tree:
A form of binary search tree with a deterministic balancing strategy. This balancing guarantees good performance, and it can always be searched in O(log n) time.
Heaps:
We need to search through every element in the heap in order to determine whether an element is inside. Even with optimizations, I believe search is still O(n). On the other hand, it is best at finding the min/max in a set: O(1).
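For example (a small sketch using Python's heapq; the membership helper is mine):

    import heapq

    def heap_contains(heap, x):
        # A heap gives no ordering between siblings, so membership testing
        # degenerates to a linear scan: O(n).
        return any(item == x for item in heap)

    h = [33, 5, 21, 9, 45, 12, 8]
    heapq.heapify(h)
    print(h[0])                   # minimum, O(1)
    print(heap_contains(h, 21))   # True, but only found by scanning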
I'm studying for some technical interviews coming up and was just going over lecture slides from a year or two ago about data structures.
I'm not clear on why the worst-case runtime of merge for a leftist heap is O(log n) whereas for a skew heap it is O(n), when a skew heap essentially merges in the same way as a leftist heap.
A leftist heap merges A and B by picking the tree with the smaller root and recursively merging its right subtree with the larger tree. Then it checks the null path lengths and swaps its two subtrees if it violates the leftist structure property.
A skew heap does the same thing but blindly swaps its two subtrees every time as it recursively merges A and B.
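For reference, here is a minimal sketch of the leftist-heap merge just described (assuming nodes carry a value, left/right children, and a null path length npl); the skew variant would simply swap the children unconditionally and keep no npl field:

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None
            self.npl = 0  # null path length

    def merge(a, b):
        # Always recurse down the right spine of the tree with the smaller root.
        if a is None:
            return b
        if b is None:
            return a
        if b.value < a.value:
            a, b = b, a
        a.right = merge(a.right, b)
        # Restore the leftist property: left npl must be >= right npl.
        if a.left is None or (a.right is not None and a.left.npl < a.right.npl):
            a.left, a.right = a.right, a.left
        a.npl = a.right.npl + 1 if a.right is not None else 0
        return a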
Why would the worst case of merge for a skew heap become O(n)? Is it because we can't guarantee a height bound as it recursively merges (since it's swapping sides every time)? Does this have to do with Floyd's Algorithm, that the sum of the heights from all nodes in a tree grows in O(n)?
A leftist heap has a right path of length at most log(N+1), while a skew heap's right path can be arbitrarily long (it can be N). Since the performance of merge depends on the length of the right path, the worst-case merge times follow from this. However, I don't know how a skew heap does find. Can you give some special case in which the right path of a skew heap is as long as N?