This is something I do not quite understand. When I read literature on heaps, it always says that the big advantage of a heap is that you have the top (max if max heap) element immediately available. But couldn't you just use a BST and store a pointer to the same node (bottom-rightmost) and update the pointer with insertions/deletions?
If I'm not mistaken, with the BST implementation I'm describing you would have
================================================
             |  Insert     |  Remove Max
================================================
Special BST  |  O(log(n))  |  O(1)
================================================
Max Heap     |  O(log(n))  |  O(log(n))
================================================
making it better.
Pseudo-code:
Insert:
    Same as a regular BST insert, but we can keep track of whether the
    inserted item is the new max, because in that case the traversal goes
    entirely in the right direction.
Delete max:
    Set the right-child pointer of the max node's parent to null. Done.
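To make the idea concrete, here is a rough Python sketch of the insert I have in mind (just an illustration; the class and names are mine):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    class MaxBST:
        """A plain (unbalanced) BST that also keeps a pointer to its max node."""
        def __init__(self):
            self.root = None
            self.max_node = None

        def insert(self, key):
            new = Node(key)
            if self.root is None:
                self.root = self.max_node = new
                return
            cur = self.root
            went_right_only = True       # stays True only if we never go left
            while True:
                if key < cur.key:
                    went_right_only = False
                    if cur.left is None:
                        cur.left = new
                        break
                    cur = cur.left
                else:
                    if cur.right is None:
                        cur.right = new
                        break
                    cur = cur.right
            if went_right_only:          # traversal was entirely rightward,
                self.max_node = new      # so the new node is the max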
What am I missing here?
But couldn't you just use a BST and store a pointer to the same node (bottom-rightmost) and update the pointer with insertions/deletions?
Yes, you could.
with the BST implementation I'm describing you would have [...] Remove Max O(1) [...] making it better.
[...] Set the right-child pointer of the max node's parent to null. Done.
No, Max removal wouldn't (always) be O(1), for the following reasons:
After you have removed the Max, you need to also update the pointer to reference the bottom right-most node. For example, take this tree, before the Max is removed:
8
/ \
5 20 <-- Max pointer
/ /
2 12
/ \
10 13
\
14
You'll have to find the node with value 14 in order to update the Max pointer.
The above operation can be made O(1) by keeping the tree balanced, say according to the AVL rules. In that case the previous Max node's left child (if any) cannot have a right child of its own, so the new Max is either that left child or, when there is no left child, the previous Max's parent. But as some deletions will make the tree unbalanced, they need to be followed by a rebalancing operation, and that may involve several rotations. For instance, take this balanced BST:
8
/ \
5 13
/ \ / \
2 6 9 15 <-- Max pointer
/ \ \ \
1 4 7 10
/
3
After removal of node 15, it is easy to determine that 13 is the next Max, but the subtree rooted at 13 would no longer be balanced. After balancing it, the tree as a whole is unbalanced, and another rotation is needed. The number of rotations could be O(log n).
Concluding: you can use a balanced BST with a Max pointer, but extraction of the Max node is still an O(log n) operation, giving it the same time complexity as the same operation on a binary heap.
What can a binary heap do that a binary search tree cannot?
Considering that a binary heap uses no pointers, and thus has much less "administrative" overhead than a self-balancing BST, the actual space consumption and runtime of the insert/delete operations will be better by a constant factor, while their asymptotic complexity is the same.
Also, a binary heap can be built from an unsorted array in O(n) time, while building a BST costs O(n log n).
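As a quick illustration, here is a minimal sketch using Python's heapq module, which implements a min-heap on top of a plain list:

    import heapq

    data = [9, 4, 7, 1, 8, 2]    # unsorted input array

    heapq.heapify(data)          # rearranges the list into a valid min-heap in O(n)
    print(data[0])               # the minimum is now at index 0: prints 1

    # Building a BST from the same input would cost O(n log n):
    # n inserts at O(log n) each (assuming the tree is kept balanced).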
However, a BST is the way to go when you need to be able to traverse the values in their proper order, or find a value, or find a value's predecessor/successor. A binary heap has worse time complexities for such operations.
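For example, a plain in-order walk of a BST yields the values in sorted order in O(n) overall, something a heap cannot do without repeatedly extracting its top (a minimal sketch, with a hand-built tree):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def in_order(node):
        # Left subtree, then the node itself, then the right subtree.
        if node is not None:
            yield from in_order(node.left)
            yield node.key
            yield from in_order(node.right)

    tree = Node(5, Node(3, Node(1)), Node(8))
    print(list(in_order(tree)))  # [1, 3, 5, 8]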
Both max heaps and balanced BSTs (e.g. AVL trees) perform these operations in O(log n) time. But BSTs take a constant factor more space due to pointers, and their code is more complicated.
Since you're talking about BST's and not Balanced BST's, consider the following skewed BST:
1
\
2
\
3
\
...
\
n
You can hold a pointer to the max (n-th) element, but inserting a value smaller than n takes O(n) time in the worst case, since the tree is a chain. Also, to see the max value in a heap, you can simply read heap[0] (assuming the heap is implemented using an array), so the heap gives you the max in O(1) time as well.
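For example, with Python's heapq (a min-heap, so values are negated here to simulate a max heap; that trick is mine, not part of the question):

    import heapq

    values = [5, 1, 9, 3]
    heap = [-v for v in values]   # negate so the min-heap behaves as a max heap
    heapq.heapify(heap)           # O(n)

    print(-heap[0])               # peek at the max in O(1): prints 9

    heapq.heappush(heap, -12)     # insert in O(log n)
    print(-heap[0])               # prints 12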
Related
If you have an AVL tree, what's the best way to get the median from it? The median would be defined as the element with index ceil(n/2) (index starts with 1) in the sorted list.
So if the list was
1 3 5 7 8
the median is 5. If the list was
1 3 5 7 8 10
the median is 5.
If you can augment the tree, I think it's best to let each node know the size (number of nodes) of its subtree (i.e. 1 + left.size + right.size). Using this, the best way I can think of makes median searching take O(lg n) time, because you can traverse by comparing indexes.
Is there a better way?
Augmenting the AVL tree to store subtree sizes is generally the best approach here if you need to optimize over median queries. It takes time O(log n), which is pretty fast.
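Here is a minimal sketch of that rank-select walk (the node fields are my assumptions; any balanced BST that maintains a size field works the same way):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right
            # size = number of nodes in the subtree rooted here
            self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

    def select(node, k):
        """Return the k-th smallest key (1-indexed); O(height) = O(log n) in an AVL tree."""
        left_size = node.left.size if node.left else 0
        if k <= left_size:
            return select(node.left, k)               # k-th lies in the left subtree
        if k == left_size + 1:
            return node.key                           # this node is the k-th
        return select(node.right, k - left_size - 1)  # skip left subtree and this node

    def median(root):
        return select(root, (root.size + 1) // 2)     # the ceil(n/2)-th element

    # 1 3 5 7 8  ->  median 5
    root = Node(5, Node(3, Node(1)), Node(7, right=Node(8)))
    print(median(root))  # 5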
If you'll be computing the median a huge number of times, you could potentially use an augmented tree and also cache the median value so that you can read it in time O(1). Each time you do an insertion or deletion, you might need to recompute the median in time O(log n), which will slow things down a bit but not impact the asymptotic costs.
Another option would be to thread a doubly-linked list through the nodes in the tree so that you can navigate from a node to its successor or predecessor in constant time. If you do that, then you can store a pointer to the median element, and then on an insertion or a deletion, move the pointer to the left or to the right as appropriate. If you delete the median itself, you can just move the pointer left or right as you'd like. This doesn't require any augmentation and might be a bit faster, but it adds two extra pointers into each node.
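A sketch of just the pointer-update rule on insertion (the threading itself is omitted; the prev/next links, the running count n, and the tie handling are my assumptions):

    # median: node holding the ceil(n/2)-th value; n: element count before the insert.
    # Assumes duplicates equal to the median key are inserted to its right.
    def update_median_after_insert(median, n, key):
        n += 1
        if n % 2 == 1 and key >= median.key:
            median = median.next   # old n was even: the target index grew by one
        elif n % 2 == 0 and key < median.key:
            median = median.prev   # old n was odd: index unchanged, but the insert
                                   # shifted the old median one position right
        return median, n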
Hope this helps!
A balanced BST and a max heap both perform insert and delete in O(log n). However, finding the max value is O(1) in a max heap but O(log n) in a balanced BST.
If we remove the max value from a max heap it takes O(log n), because it is a delete operation.
In a balanced BST, deleting the max element = finding the max value + deleting it; that is log n + log n, which reduces to O(log n). So even deleting the max value in a balanced BST is O(log n).
I have read that one application of a max heap is a priority queue, whose primary purpose is to remove the max value on every dequeue operation. If deleting the max element is O(log n) for both a max heap and a balanced BST, I have the following questions:
What is the purpose of a max heap in a priority queue? Is it used just because it is easy to implement, rather than using a fully searchable balanced BST?
Since there is no balance factor calculation, can a max heap be called an unbalanced binary tree?
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max heap search is O(n). Correct?
All the time complexities are for the worst case. Any help is greatly appreciated.
What is the purpose of a max heap in a priority queue? Is it used just because it is easy to implement, rather than using a fully searchable balanced BST?
Some advantages of a heap are:
Given an unsorted input array, a heap can still be built in O(n) time, while a BST needs O(n log n) time.
If the initial input is an array, that same array can serve as the heap, meaning no extra memory is needed for it. Although one could think of ways to create a BST in-place inside the array, it would be quite odd (for primitive types) and give more processing overhead. A BST is usually created from scratch, copying the data into nodes as they are created.
Interesting fact: a sorted array is already a valid heap (ascending order gives a min-heap, descending order a max-heap), so if it is known that the input is sorted, nothing needs to be done to build the heap.
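A quick way to convince yourself of that (a sketch; it checks the min-heap property, i.e. no element is smaller than its parent):

    def is_min_heap(a):
        # each element at index i >= 1 must be >= its parent at (i - 1) // 2
        return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

    print(is_min_heap([1, 2, 3, 4, 5, 6]))  # True: ascending order is a valid min-heap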
A heap can be stored as an array without the need to store cross references, while a BST usually consists of nodes with left & right references (see the index-arithmetic sketch after this list). This has at least two consequences:
The memory used for a BST is about 3 times greater than for a heap.
Although several operations have the same time complexity for both heap and BST, the overhead for adapting a BST is much greater, so that the actual time spent on these operations is a (constant) factor greater in the BST case.
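Here is a minimal sketch of that implicit layout (standard 0-based index arithmetic, not tied to any particular library):

    # In an array-backed binary heap the tree structure is implicit:
    # no parent/child pointers are stored, only these index formulas.
    def parent(i): return (i - 1) // 2
    def left(i):   return 2 * i + 1
    def right(i):  return 2 * i + 2

    heap = [20, 13, 5, 10, 12, 2]       # a valid max-heap, stored level by level
    i = 1                               # the node holding 13
    assert heap[parent(i)] >= heap[i]   # 20 >= 13
    assert heap[left(i)]   <= heap[i]   # 10 <= 13
    assert heap[right(i)]  <= heap[i]   # 12 <= 13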
Since there is no balance factor calculation, can a max heap be called an unbalanced binary tree?
A heap is in fact a complete binary tree, so it is always as balanced as it can be: the leaves will always be positioned in the last level or the one before it. A self-balancing BST (like AVL, red-black, ...) cannot beat that level of balancing: there you will often have leaves occurring at three different levels or even more.
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max heap search is O(n). Correct?
Yes, this is true. So if the application needs the search feature, then a BST is superior.
What is the purpose of a max heap in a priority queue? Is it used just because it is easy to implement, rather than using a fully searchable balanced BST?
Nope. A max heap fits better, since it is carefully designed to hand you the next element (respecting priority) ASAP: peeking at it is O(1), and removing it is O(log n). That's what you want from the simplest possible priority queue.
Since there is no balance factor calculation, can a max heap be called an unbalanced binary tree?
Nope. There is balancing as well. Long story short, a heap is kept in shape by sift-up and sift-down operations (swapping elements that are out of order).
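For illustration, a minimal sift-down for a generic array-based max heap (a sketch of the standard technique, not any specific library's code):

    def sift_down(heap, i):
        """Restore the max-heap property at index i, assuming both child
        subtrees are already valid heaps. Runs in O(log n)."""
        n = len(heap)
        while True:
            largest = i
            left, right = 2 * i + 1, 2 * i + 2
            if left < n and heap[left] > heap[largest]:
                largest = left
            if right < n and heap[right] > heap[largest]:
                largest = right
            if largest == i:
                return
            heap[i], heap[largest] = heap[largest], heap[i]  # swap the out-of-order pair
            i = largest

    # Extract the max: move the last leaf to the root, then sift it down.
    heap = [20, 13, 5, 10, 12, 2]
    heap[0] = heap.pop()
    sift_down(heap, 0)
    print(heap[0])  # 13, the new max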
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max heap search is O(n). Correct?
Yeah! Just as a linked list or a plain array could be used. It is just going to be more expensive in terms of O-notation and much slower in practice.
Does a skewed binary tree take more space than, say, a perfect binary tree?
I was solving question #654 - Maximum Binary Tree on LeetCode: given an array, you have to build a binary tree such that the root is the maximum number in the array, and the left and right subtrees are built on the same principle from the subarrays to the left and right of the max number. The solution there concludes that in the average and best case (perfect binary tree) the space taken would be O(log n), and in the worst case (skewed binary tree) it would be O(n).
For example, given nums = [1,3,2,7,4,6,5],
the tree would be as such,
7
/ \
3 6
/ \ / \
1 2 4 5
and if given nums = [7,6,5,4,3,2,1],
the tree would be as such,
7
 \
  6
   \
    5
     \
      4
       \
        3
         \
          2
           \
            1
According to my understanding they both should take O(n) space, since they both have n nodes. So I don't understand how they came to that conclusion.
Thanks in advance.
https://leetcode.com/problems/maximum-binary-tree/solution/
Under "Space complexity," it says:
Space complexity : O(n). The size of the set can grow upto n in the worst case. In the average case, the size will be nlogn for n elements in nums, giving an average case complexity of O(logn).
It's poorly worded, but it is correct. It's talking about the amount of memory required during construction of the tree, not the amount of memory that the tree itself occupies. As you correctly pointed out, the tree itself will occupy O(n) space, regardless if it's balanced or degenerate.
Consider the array [1,2,3,4,5,6,7]. You want the root to be the highest number, and the left to be everything that's to the left of the highest number in the array. Since the array is in ascending order, what happens is that you extract the 7 for the root, and then make a recursive call to construct the left subtree. Then you extract the 6 and make another recursive call to construct that node's left subtree. You continue making recursive calls until you place the 1. In all, you have six nested recursive calls: O(n).
Now look what happens if your initial array is [1,3,2,7,5,6,4]. You first place the 7, then make a recursive call with the subarray [1,3,2]. Then you place the 3 and make a recursive call to place the 1. Your tree is:
    7
   /
  3
 /
1
At this point, your call depth is 2. You return and place the 2. Then return from the two recursive calls. The tree is now:
    7
   /
  3
 / \
1   2
Constructing the right subtree also requires a call depth of 2. At no point is the call depth more than two. That's O(log n).
It turns out that the call stack depth is the same as the tree's height. The height of a perfect tree is O(log n), and the height of a degenerate tree is O(n).
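For reference, a minimal recursive sketch of the construction (my own transcription of the approach described above, not LeetCode's exact code); the recursion depth, and hence the auxiliary space, equals the height of the resulting tree:

    class TreeNode:
        def __init__(self, val):
            self.val = val
            self.left = None
            self.right = None

    def construct(nums):
        if not nums:
            return None
        i = nums.index(max(nums))             # position of the maximum
        root = TreeNode(nums[i])
        root.left = construct(nums[:i])       # build from everything left of the max
        root.right = construct(nums[i + 1:])  # build from everything right of it
        return root

    root = construct([1, 3, 2, 7, 4, 6, 5])
    print(root.val)  # 7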
Consider a binary tree and some traversal criterion that defines an ordering of the tree's elements.
Does there exist some particular traversal criterion that would allow a begin_insert operation, i.e. the operation of adding a new element that ends up at position 1 according to the ordering induced by the traversal criterion, with O(log N) cost?
I don't have any strict requirements, such as the tree being guaranteed to be balanced.
EDIT:
But I cannot accept a lack of balance if it allows degeneration to O(N) in worst-case scenarios.
EXAMPLE:
Let's try to see if in-order traversal would work.
Consider the BT (not a binary search tree)
6
/ \
13 5
/ \ /
2 8 9
In-order traversal gives 2-13-8-6-9-5
Perform begin_insert(7) in such a way that in-order traversal gives 7-2-13-8-6-9-5:
6
/ \
13 5
/ \ /
2 8 9
/
7
Now, I think this is not a legitimate O(log N) strategy, because if I keep adding values in this way, the cost degenerates to O(N) as the tree becomes increasingly unbalanced:
6
/ \
13 5
/ \ /
2 8 9
/
7
/
*
/
*
/
This strategy would work if I rebalanced the tree while preserving the ordering:
8
/ \
2 9
/ \ / \
7 13 6 5
but this costs at least O(N).
According to this example, my conclusion would be that in-order traversal does not solve the problem; but since I received feedback that it should work, maybe I am missing something?
Inserting, deleting and finding in a binary tree all rely on the same search algorithm to find the right position for the operation. The complexity of this is O(max height of the tree). The reason is that to find the right location you start at the root node and compare keys to decide whether to go into the left or the right subtree, and you do this until you find the right location. The worst case is when you have to travel down the longest chain, whose length is by definition the height of the tree.
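As a sketch, here is that search loop in Python; the body runs at most once per level, which is why the cost is bounded by the height:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def find(root, key):
        node = root
        while node is not None:          # at most height + 1 iterations
            if key == node.key:
                return node
            node = node.left if key < node.key else node.right
        return None

    # A degenerate left chain of 4 nodes: finding 1 must visit every level.
    root = Node(4, Node(3, Node(2, Node(1))))
    print(find(root, 1).key)  # 1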
If you don't have any constraints and allow any tree then this is going to be O(N) since you allow a tree with only left children (for example).
If you want better guarantees, you must use algorithms that promise an upper bound on the height of the tree. For example, AVL guarantees that your tree is balanced, so the max height is always O(log N) and all the operations above run in O(log N). Red-black trees keep the tree slightly less tightly balanced, but promise that it will not be too unbalanced (the longest root-to-leaf path is at most twice the shortest), so they also keep O(log N) complexity.
Depending on your usage patterns you might be able to find more specialized data structures that give even better complexity (see Fibonacci heap).