Heap vs binary search tree (when is one better than the other?) - algorithm

In what situations is using a min-heap more efficient than using a binary search tree? Is it true that finding the minimum in a binary search tree takes the same time as finding the minimum in a min-heap, namely O(1)?

This is almost like comparing coffee cups and koala bears. Heaps and binary search trees are intended to perform very different functions. A heap is an implementation of the priority queue abstract data type. At the basic level, a priority queue (and thus a heap) is just a bag where you put things, and when you reach in to get an item out you always get the smallest (min-heap) or largest (max-heap) item in the bag.
You can get fancy and give your heap the ability to remove any arbitrary item, or to change the priority of an item in the heap, but those are more advanced functionality and don't fall within the bounds of the traditional definition of the heap data structure.
A binary search tree is a much different beast. It's a bag where you put things, and you can quickly reach in to grab any item by key or you can list all of the items in order (or reverse order).
You can use a binary search tree to implement a priority queue, meaning that you could in principle replace a heap with a binary tree. The binary search tree wouldn't perform as well as the heap, but it would get the job done.
But the reverse isn't true. You can't use a heap to replace a binary search tree.
So the question of which is better really comes down to: what do you want to do?
If you want an ordered set of items from which you can quickly locate any item, or that you can traverse in order, then you want a binary search tree.
If you want an implementation of the priority queue abstract data type: a bag that will quickly give you the smallest (or largest, depending on how you define it) item when you ask for it, then you want to use a heap.

The two have different uses and are not interchangeable.
A heap is a structure that guarantees that the value of a given node is lower than or equal to (for a min heap; greater than or equal to for a max heap) the value of any node underneath it. This allows you to get the minimum (or maximum) value in O(1).
A binary search tree is a structure that keeps all nodes ordered. This allows you to retrieve any value in O(h) (h being the height of the tree; h = log2(n) if the tree is balanced, with n the number of nodes).
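A quick illustration of the heap side of this trade-off, using Python's heapq module: the heap gives O(1) access to the minimum, but there is no fast search for an arbitrary key.

```python
import heapq

# A min-heap only promises fast access to the smallest element.
heap = [7, 3, 9, 1]
heapq.heapify(heap)          # O(n) bottom-up construction
print(heap[0])               # peek the minimum in O(1) -> 1
print(heapq.heappop(heap))   # remove the minimum in O(log n) -> 1

# There is no cheap way to search a heap for an arbitrary key:
# 9 could be almost anywhere in the array, so lookup is O(n).
print(9 in heap)             # -> True, but found by a linear scan
```

A binary search tree inverts this: O(h) to find any key, but finding the minimum also costs O(h) rather than O(1).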

Related

Implementation of Double Ended Priority Queue using Binary Search Tree/Linked List

I have to implement a Doubly Ended Priority Queue using both Doubly Linked list as well as Binary search Tree.
Main functions should be getMin() and getMax()
Using Doubly Linked List:
The idea for getting the minimum and maximum element in O(1) is to insert small elements at one side of the list and greater elements on the other side, but then there will be a problem with every insertion (it will not be O(1)).
Is there any better way to implement it?
Using BST:
I couldn't understand how will I be able to implement the getMin() and getMax() in BST.
Normal priority queues are usually implemented using a heap, so we can get the top value in O(1) and insert new elements in O(logn). I doubt there is a way to implement a priority queue using doubly linked lists that gets the same asymptotic complexity, let alone a double-ended priority queue (I can be wrong, though). Using a BST we can do both operations in O(logn):
Insertions and deletions are the same as in your usual BST
To get the min value, start a traversal at the root and go all the way to the left until the current node has no left child. The last node you visited contains the min value
To get the max value, start a traversal at the root and go all the way to the right until the current node has no right child. The last node you visited contains the max value
Of course, getMin and getMax will only be O(logn) if the BST is balanced; otherwise they can degenerate to O(n)
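The two traversals described above can be sketched with a minimal (hypothetical) BST like this:

```python
# Minimal BST node supporting a double-ended priority queue.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def get_min(root):
    # Walk left until there is no left child.
    while root.left is not None:
        root = root.left
    return root.key

def get_max(root):
    # Walk right until there is no right child.
    while root.right is not None:
        root = root.right
    return root.key

root = None
for k in [5, 2, 8, 1, 9]:
    root = insert(root, k)
print(get_min(root), get_max(root))  # -> 1 9
```

Both walks cost one root-to-leaf path, hence O(logn) on a balanced tree.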
insertion of elements every time (it will not be O(1) then). Is there any better way to implement it?
What you are basically trying to do is search for the first bigger element in order, which can't be done in O(1). Linear search (going through the elements one by one) is probably the best way to do this. If you have huge lists and want to focus on efficiency, you could use exponential search or interpolation search (interpolation only works if you know something about the distribution of the stored keys), but you can't do better than O(loglog(n)).
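For reference, here is a sketch of exponential search for the insertion point. Note the caveat: it assumes random access (an array-backed list), so it would not speed up a plain linked list.

```python
from bisect import bisect_left

def exponential_search(arr, target):
    """Find the insertion index of target in a sorted array in O(log i),
    where i is the final index. Assumes O(1) random access."""
    if not arr or target <= arr[0]:
        return 0
    bound = 1
    # Double the bound until it passes the target (or the end of the array).
    while bound < len(arr) and arr[bound] < target:
        bound *= 2
    # Binary-search only within the bracketed range.
    return bisect_left(arr, target, bound // 2, min(bound + 1, len(arr)))

print(exponential_search([1, 3, 5, 7, 9, 11], 8))  # -> 4 (between 7 and 9)
```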
I couldn't understand how will I be able to implement the getMin() and getMax() in BST.
If you're not allowed to add any additional structure to your BST then the only way to get min and max is traversal as Lucas Sampaio mentioned already.
However, it can be helpful to store a reference to the current minimum and maximum so you can access them faster.

Removing multiple items from balancing binary tree at once

I'm using a red-black binary tree with linked leaves on a project (Java's TreeMap) to quickly find and iterate through the items. The problem is that I can easily get 35000 items or so in the tree, and several times I have to remove "all items above X", which can be almost the entire tree (like 30000 items at once, because all of them are bigger than X), and it takes too much time to remove them and rebalance the tree each time.
Is there any algorithm that can help me here (so I can make my own tree implementation)?
You're looking for the split operation on a red/black tree, which takes the red/black tree and some value k and splits it into two red/black trees, one with all keys greater than or equal to k and one with all keys less than k. This can be implemented in O(log n) time if you augment the structure to store some extra information. In your case, since you're using Java, you can just split the tree and discard the root of the tree you don't care about so that the garbage collector can handle it.
Details on how to implement this are given in this paper, starting on page 9. It's implemented in terms of a catenate (or join) operations which combines two trees, but I think the exposition is pretty clear.
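To illustrate the idea (not the paper's red-black version, which also has to restore balance): here is a hedged sketch of split on a plain, unbalanced BST, where it runs in O(h) and reuses the existing nodes instead of deleting them one by one.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def split(root, k):
    """Split a BST into (keys < k, keys >= k), reusing the existing nodes.
    O(h) on a plain BST; a red-black version must also rebalance."""
    if root is None:
        return None, None
    if root.key < k:
        less, geq = split(root.right, k)
        root.right = less
        return root, geq
    else:
        less, geq = split(root.left, k)
        root.left = geq
        return less, root

def keys(root):
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)

root = None
for x in [50, 20, 70, 10, 30, 60, 90]:
    root = insert(root, x)

kept, discarded = split(root, 60)   # "remove all items >= 60" in one pass
print(keys(kept))        # -> [10, 20, 30, 50]
print(keys(discarded))   # -> [60, 70, 90]
```

Dropping the `discarded` root then lets the garbage collector reclaim the whole subtree, as the answer suggests.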
Hope this helps!

If you have a binomial heap of size 14, how can you tell which node is the root node?

Hi guys, I just had a question about this diagram.
How can I tell which node is the root node and how would I heapify something like this?
Thank you.
Edit: Sorry, when I said heapify I meant make a max heap.
Normally with a regular heap, I would go from left to right, starting at the first node that isn't a leaf node and sift downwards. I don't see how I can do that here though.
This is a binomial heap: it doesn't have one root but a set of roots (because a binomial heap is a set of binomial trees).
What do you mean by "make a max heap" ?
Max heaps and binomial heaps are about as similar to each other as Java and JavaScript are.
If you extract the minimum n times you obtain the elements in sorted order; stored in descending order, that sorted array is a valid max heap. The complexity is O(n*log(n)).
I think you're trying to treat the binomial heap as a binary heap, which doesn't work.
A Binary Heap can be stored in an array without explicit links - the links are implicit in the positions within the array. An unordered array can be "heapified", reordering to make a valid binary heap in O(n) time. That is a key advantage of binary heaps - there's a lightweight implementation that uses memory well.
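A quick illustration with Python's heapq, which implements exactly this implicit-array binary heap: the children of the item at index i live at indices 2*i + 1 and 2*i + 2, with no explicit links.

```python
import heapq

data = [9, 4, 7, 1, 3, 8]
heapq.heapify(data)          # reorders the list in place, O(n)
print(data[0])               # the minimum is now at the root, index 0

# The implicit links let us check the heap property directly:
n = len(data)
for i in range(n):
    for child in (2 * i + 1, 2 * i + 2):
        if child < n:
            assert data[i] <= data[child]
```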
I've never implemented a Binomial Heap and though I've studied them, that was a while ago. I'm pretty confident, though, that a binomial heap isn't a binary heap and can't be implemented that way. Binomial heaps have their own advantages, but they don't keep all the advantages of binary heaps. If binomial heaps were universally superior, no-one would care about binary heaps.
IIRC, the normal implementation of binomial trees (on which binomial heaps are based) is that you have a linked list of children for each parent node and a linked list of roots. Those linked lists use explicit links. This is how you support k children per node, with no upper bound on k.
The important extra operation for binary heaps is the merge. If a binomial heap were stored in an array with implicit links, a merge would obviously require lots of copying - copying items from one array into the other for a start. The efficient merge would therefore be impossible - the key advantage of the binomial heap would be lost.
With explicit links, however, combining two binomial trees into one is an O(1) pointer-fiddling operation (adding an item to the head of a linked list), so two binomial heaps can be merged very efficiently with O(log n) binomial tree merges.
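That O(1) link step can be sketched as follows. For brevity this uses a Python list where a real implementation would use an actual linked list of children, so appending stands in for prepending to the list head:

```python
class BinomialTree:
    def __init__(self, key):
        self.key = key
        self.order = 0
        self.children = []   # stand-in for a linked list of child trees

def link(t1, t2):
    """Combine two binomial trees of equal order into one of order+1.
    O(1): the tree with the larger root becomes a child of the other
    (min-heap convention)."""
    assert t1.order == t2.order
    if t2.key < t1.key:
        t1, t2 = t2, t1
    t1.children.append(t2)   # a true linked list would prepend in O(1)
    t1.order += 1
    return t1

a, b = BinomialTree(3), BinomialTree(5)
c = link(a, b)
print(c.key, c.order)        # -> 3 1
```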
It's a bit like the difference between a sorted array and a binary search tree. Sure, the sorted array has advantages, but it also has limitations. Some operations are more efficient when all you have to do is modify a link or two without moving items around in an array. Sometimes you don't need those operations, and it's more efficient to avoid the need for links and just binary search a sorted array, which is equivalent to searching a perfectly balanced binary search tree with implicit links.
Conceptually, the root should be the only node that has no ancestors - 1 in the case of your diagram.

What is the precise definition of the Heap data structure?

The definition of heap given in wikipedia (http://en.wikipedia.org/wiki/Heap_(data_structure)) is
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then key(A) is ordered with respect to key(B) with the same ordering applying across the heap. Either the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called max heap) or the keys of parent nodes are less than or equal to those of the children (min heap)
The definition says nothing about the tree being complete. For example, according to this definition, the binary tree 5 => 4 => 3 => 2 => 1 where the root element is 5 and all the descendants are right children also satisfies the heap property. I want to know the precise definition of the heap data structure.
As others have said in comments: That is the definition of a heap, and your example tree is a heap, albeit a degenerate/unbalanced one. The tree being complete, or at least reasonably balanced, is useful for more efficient operations on the tree. But an inefficient heap is still a heap, just like an unbalanced binary search tree is still a binary search tree.
Note that "heap" does not refer to a single data structure; it refers to any data structure fulfilling the heap property or (depending on context) supporting a certain set of operations. Among the data structures which are heaps, most efficient ones explicitly or implicitly guarantee the tree to be complete or somewhat balanced. For example, a binary heap is by definition a complete binary tree.
In any case, why do you care? If you care about specific lower or upper bounds on specific operations, state those instead of requiring a heap. If you discuss specific data structure which are heaps and complete trees, state that instead of just speaking about heaps (assuming, of course, that the completeness matters).
Since this question was asked, the Wikipedia definition has been updated:
In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.
However, "heap data structure" really denotes a family of different data structures, which also includes:
Binomial heap
Fibonacci heap
Leftist heap
Skew heap
Pairing heap
2-3 heap
...and these are certainly not necessarily complete trees.
On the other hand, the d-ary heap data structures -- including the binary heap -- most often refer to complete trees, such that they can be implemented in an array in level-order, without gaps:
The 𝑑-ary heap consists of an array of 𝑛 items, each of which has a priority associated with it. These items may be viewed as the nodes in a complete 𝑑-ary tree, listed in breadth first traversal order.
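The implicit parent/child arithmetic that this breadth-first array layout relies on can be sketched as:

```python
# Index arithmetic for a complete d-ary heap stored in breadth-first order.
def parent(i, d):
    return (i - 1) // d

def child(i, d, j):
    """Index of the j-th child (0 <= j < d) of the node at index i."""
    return d * i + j + 1

# For a binary heap (d = 2) this gives the familiar 2i+1 / 2i+2 layout:
print(child(0, 2, 0), child(0, 2, 1))  # -> 1 2
print(parent(5, 2))                    # -> 2
```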

Listing values in a binary heap in sorted order using breadth-first search?

I'm currently reading this paper and on page five, it discusses properties of binary heaps that it considers to be common knowledge. However, one of the points they make is something that I haven't seen before and can't make sense of. The authors claim that if you are given a balanced binary heap, you can list the elements of that heap in sorted order in O(log n) time per element using a standard breadth-first search. Here's their original wording:
In a balanced heap, any new element can be inserted in logarithmic time. We can list the elements of a heap in order by weight, taking logarithmic time to generate each element, simply by using breadth first search.
I'm not sure what the authors mean by this. The first thing that comes to mind when they say "breadth-first search" would be a breadth-first search of the tree elements starting at the root, but that's not guaranteed to list the elements in sorted order, nor does it take logarithmic time per element. For example, running a BFS on this min-heap produces the elements out of order no matter how you break ties:
       1
     /   \
   10     100
  /  \
 11   12
This always lists 100 before either 11 or 12, which is clearly wrong.
Am I missing something? Is there a simple breadth-first search that you can perform on a heap to get the elements out in sorted order using logarithmic time each? Clearly you can do this by destructively modifying the heap, removing the minimum element each time, but the authors' intent seems to be that this can be done non-destructively.
You can get the elements out in sorted order by traversing the heap with a priority queue (which requires another heap!). I guess this is what he refers to as a "breadth first search".
I think you should be able to figure it out (given your rep in algorithms) but basically the key of the priority queue is the weight of a node. You push the root of the heap onto the priority queue. Then:
while pq isn't empty
pop off pq
append to output list (the sorted elements)
push children (if any) onto pq
I'm not really sure (at all) if this is what he was referring to but it vaguely fitted the description and there hasn't been much activity so I thought I might as well put it out there.
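A runnable version of that pseudocode, assuming the binary heap is stored as an array in the usual implicit layout (children of index i at 2*i + 1 and 2*i + 2):

```python
import heapq

def sorted_from_heap(heap):
    """Non-destructively list a binary min-heap's elements in sorted order.
    A secondary priority queue holds the frontier of candidate nodes, so
    each element costs O(log n) to produce."""
    out = []
    if heap:
        pq = [(heap[0], 0)]          # (value, index into the heap array)
        while pq:
            val, i = heapq.heappop(pq)
            out.append(val)
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(heap):
                    heapq.heappush(pq, (heap[c], c))
    return out

h = [1, 10, 100, 11, 12]             # the heap from the question
print(sorted_from_heap(h))           # -> [1, 10, 11, 12, 100]
print(h)                             # the original heap is untouched
```

The key point is that a child is only pushed onto the secondary queue after its parent is output, so the heap property guarantees the output order is sorted.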
If you know that all elements lower than 100 are on the left, you can go left; but even if you do descend to 100, you can see that there are no elements on its left, so you back out. Either way, you pass through any given node at worst twice before realizing there is no element you are searching for below it. That means you visit at most 2*log(N) nodes, which simplifies to O(log N) complexity.
The point is that even if you "screw up" and traverse to the "wrong" node, you visit that node at worst once.
EDIT
This is just how heapsort works: you can imagine that you have to restore the heap, at O(log n) cost, each time you take out the top element.

Resources