What is the correct definition of a heap?

I was reading about heaps in Java programming. In my textbook, I found this definition of a heap: a heap is a complete binary tree with the following properties: 1) the value in the root is the smallest item in the tree;
2) every subtree is a heap
But when I was watching videos about heaps, I found a totally different definition, which says: in a heap, the parent keys are bigger than those of the children.
Now I am confused because the two definitions do not fit with each other.
Which definition is the correct one?
Thanks!

Both definitions are correct.
There are two types of heap.
Min Heap: the parent node is always smaller than its children.
Max Heap: the parent node is always larger than its children.
This smaller/larger relation between a parent and its children is called the heap property. The heap property has to be satisfied by every node of the tree.
The complexity of constructing a heap from a given array is O(n). This operation is called heapify.
Given a heap, adding an element to it or removing an element from it costs O(log n).
Sorting an array using the heap data structure (heap sort) is O(n log n): you repeatedly extract the top (root) element from the min heap. Each extraction costs O(log n) and is repeated n times, so the complexity is O(n log n).
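As a quick illustration, here is a minimal sketch using java.util.PriorityQueue (a binary min heap by default); the values are just example data:

```java
import java.util.PriorityQueue;

public class MinHeapDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // min heap by default
        for (int x : new int[] {5, 1, 4, 2, 3}) {
            minHeap.offer(x);                       // insertion: O(log n)
        }
        // Repeatedly extracting the root yields the elements in ascending
        // order -- this is exactly the idea behind heap sort.
        while (!minHeap.isEmpty()) {
            System.out.print(minHeap.poll() + " "); // prints: 1 2 3 4 5
        }
    }
}
```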

Quoting Wikipedia here:
In computer science, a heap is a specialized tree-based data structure
that satisfies the heap property: If A is a parent node of B then the
key of node A is ordered with respect to the key of node B with the
same ordering applying across the heap. A heap can be classified
further as either a "max heap" or a "min heap". In a max heap, the
keys of parent nodes are always greater than or equal to those of the
children and the highest key is in the root node. In a min heap, the
keys of parent nodes are less than or equal to those of the children
and the lowest key is in the root node. Heaps are crucial in several
efficient graph algorithms such as Dijkstra's algorithm, and in the
sorting algorithm heapsort. A common implementation of a heap is the
binary heap, in which the tree is a complete binary tree.
There are 2 types of heaps:
Min Heap: the parent node is always smaller than the children.
Max Heap: the parent node is always larger than the children.
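In Java, both variants can be obtained from the same class; a minimal sketch (the values are example data):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class MinVsMaxHeap {
    public static void main(String[] args) {
        // Min Heap: PriorityQueue's natural ordering.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        // Max Heap: the same structure with the ordering reversed.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int x : new int[] {3, 1, 2}) {
            minHeap.offer(x);
            maxHeap.offer(x);
        }
        System.out.println(minHeap.peek()); // 1 -- smallest key at the root
        System.out.println(maxHeap.peek()); // 3 -- largest key at the root
    }
}
```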

Related

Implementing priority queue using max heap vs balanced BST

Balanced BSTs and max heaps both perform insert and delete in O(log n). However, finding the max value in a max heap is O(1), whereas it is O(log n) in a balanced BST.
Removing the max value from a max heap takes O(log n) because it is a delete operation.
In a balanced BST, deleting the max element = finding the max value + deleting it; that is O(log n) + O(log n), which reduces to O(log n). So even deleting the max value in a balanced BST is O(log n).
I have read that one application of a max heap is a priority queue, whose primary purpose is to remove the max value on every dequeue operation. If deleting the max element is O(log n) for both a max heap and a balanced BST, I have the following questions:
Is a max heap used for the priority queue just because it is easier to implement than a fully searchable balanced BST?
Since there is no balancing-factor calculation, can the max heap be called an unbalanced binary tree?
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max-heap search is O(n). Correct?
All time complexities are for the worst case. Any help is greatly appreciated.
Is a max heap used for the priority queue just because it is easier to implement than a fully searchable balanced BST?
Some advantages of a heap are:
Given an unsorted input array, a heap can still be built in O(n) time, while a BST needs O(n log n) time (see the sketch after this list).
If the initial input is an array, that same array can serve as heap, meaning no extra memory is needed for it. Although one could think of ways to create a BST using the data in-place in the array, it would be quite odd (for primitive types) and give more processing overhead. A BST is usually created from scratch, copying the data into the nodes as they are created.
Interesting fact: a sorted array is also a heap, so if it is known that the input is sorted, nothing needs to be done to build the heap.
A heap can be stored as an array without the need of storing cross references, while a BST usually consists of nodes with left & right references. This has at least two consequences:
The memory used for a BST is about 3 times greater than for a heap.
Although several operations have the same time complexity for both heap and BST, the overhead for adapting a BST is much greater, so that the actual time spent on these operations is a (constant) factor greater in the BST case.
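To illustrate the first point, here is a minimal sketch of bottom-up (Floyd's) heap construction, which builds a max heap in place in O(n); the code is my own illustration, not from the question:

```java
class HeapBuild {
    // Build a binary max heap in place in O(n) by sifting down every
    // internal node, starting from the last internal node.
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
    }

    // Restore the heap property below index i by swapping the node
    // with its larger child until both children are smaller.
    static void siftDown(int[] a, int i, int n) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < n && a[left] > a[largest]) largest = left;
            if (right < n && a[right] > a[largest]) largest = right;
            if (largest == i) return;          // heap property holds
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;                       // continue one level down
        }
    }
}
```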
Since there is no balancing-factor calculation, can the max heap be called an unbalanced binary tree?
A heap is in fact a complete binary tree, so it is always as balanced as it can be: the leaves will always be positioned in the last or one-but-last level. A self-balancing BST (like AVL, red-black,...) cannot beat that high level of balancing, where you will often have leaves occurring at three levels or even more.
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max-heap search is O(n). Correct?
Yes, this is true. So if the application needs the search feature, then a BST is superior.
Is a max heap used for the priority queue just because it is easier to implement than a fully searchable balanced BST?
Nope. A max heap fits better, since it is carefully arranged to return the next element (respecting priority) as fast as possible, in O(1) time. That's what you want from the simplest possible priority queue.
Since there is no balancing-factor calculation, can the max heap be called an unbalanced binary tree?
Nope. There is balancing as well. Long story short, a heap is balanced by sift-up and sift-down operations (swapping elements which are out of order).
Every balanced BST can be used as a priority queue and is also searchable in O(log n), whereas max-heap search is O(n). Correct?
Yes! A linked list or a plain array could be used as well; it would just be more expensive in terms of O-notation and much slower in practice.
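A minimal sketch of the sift-up balancing mentioned above, for an array-backed max heap (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;

class MaxHeap {
    private final List<Integer> a = new ArrayList<>();

    // Append the new value, then sift it up: swap with the parent
    // while the parent's key is smaller.
    void insert(int value) {
        a.add(value);
        int i = a.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (a.get(parent) >= a.get(i)) break;  // heap property holds
            int tmp = a.get(parent);
            a.set(parent, a.get(i));
            a.set(i, tmp);
            i = parent;
        }
    }

    // The maximum sits at the root: O(1), which is exactly what a
    // priority queue wants from its peek operation.
    int peekMax() {
        return a.get(0);
    }
}
```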

Min-Heap to Max-Heap, Comparison

I want to find the maximum number of comparisons needed to convert a min-heap with n nodes into a max-heap. I think the conversion takes O(n), which would mean there is no way other than re-creating the heap.
As a crude lower bound: given a tree with the (min- or max-) heap property, we have no prior idea of how the values at the leaves compare to one another. In a max heap, the values at the leaves may all be less than all the values at the interior nodes. If the heap has the topology of a complete binary tree, then even finding the min requires at least roughly n/2 comparisons, where n is the number of tree nodes.
If you have a min-heap of known size then you can create a binary max-heap of its elements by filling an array from back to front with the values obtained by iteratively deleting the root node from the min-heap until it is exhausted. Under some circumstances this can even be done in place. Using the rule that the root node is element 0 and the children of node i are elements 2i and 2i+1, the (max-) heap condition will automatically be satisfied for the heap represented by the new array.
Each deletion from a min-heap of size m requires up to log(m) element comparisons to restore the heap condition, however. I think that adds up to O(n log n) comparisons for the whole job. I am doubtful that you can do it with any lower complexity without adding conditions. In particular, if you do not perform genuine heap deletions (incurring the cost of restoring the heap condition), then I think you incur comparable additional costs to ensure that you end up with a heap in the end.
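As a sketch of the back-to-front scheme just described, using java.util.PriorityQueue as the min-heap (the helper name is mine):

```java
import java.util.PriorityQueue;

class HeapConvert {
    // Repeatedly delete the min-heap's root and fill the output array
    // from back to front. The result is sorted in descending order,
    // which automatically satisfies the max-heap property.
    static int[] minHeapToMaxHeapArray(PriorityQueue<Integer> minHeap) {
        int[] maxHeap = new int[minHeap.size()];
        for (int i = maxHeap.length - 1; i >= 0; i--) {
            maxHeap[i] = minHeap.poll();  // each poll costs O(log m)
        }
        return maxHeap;
    }
}
```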

What are the differences between heap and red-black tree?

We know that heaps and red-black trees both have these properties:
worst-case cost for searching is lg N;
worst-case cost for insertion is lg N.
So, since the implementation and operation of red-black trees are difficult, why don't we just use heaps instead of red-black trees? I am confused.
You can't find an arbitrary element in a heap in O(log n). It takes O(n) to do this. You can find the first element (the smallest, say) in a heap in O(1) and extract it in O(log n). Red-black trees and heaps have quite different uses, internal orderings, and implementations: see below for more details.
Typical use
Red-black tree: storing a dictionary where, as well as lookup, you want elements sorted by key, so that you can for example iterate through them in order. Insert and lookup are O(log n).
Heap: priority queue (and heap sort). Extraction of minimum and insertion are O(log n).
Consistency constraints imposed by structure
Red-black tree: total ordering: left child < parent < right child.
Heap: dominance: parent < children only.
(note that you can substitute a more general ordering than <)
Implementation / Memory overhead
Red-black tree: pointers used to represent structure of tree, so overhead per element. Typically uses a number of nodes allocated on free store (e.g. using new in C++), nodes point to other nodes. Kept balanced to ensure logarithmic lookup / insertion.
Heap: structure is implicit: root is at position 0, children of root at 1 and 2, etc, so no overhead per element. Typically just stored in a single array.
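A minimal sketch of that implicit layout in Java (class and method names are mine):

```java
class BinaryHeapIndexing {
    // Binary heap stored in an array: the tree shape is encoded
    // entirely by index arithmetic, so no per-element pointers exist.
    static int parent(int i)     { return (i - 1) / 2; }
    static int leftChild(int i)  { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
}
```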
Red-black tree:
A form of binary search tree with a deterministic balancing strategy. This balancing guarantees good performance: the tree can always be searched in O(log n) time.
Heap:
We need to search through every element in the heap to determine whether a given element is inside. Even with optimizations, I believe search is still O(n). On the other hand, a heap is best for finding the min/max in a set: O(1).
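In Java terms, this trade-off is easy to demonstrate: java.util.TreeMap is backed by a red-black tree and java.util.PriorityQueue by a binary heap. A minimal sketch with example data:

```java
import java.util.PriorityQueue;
import java.util.TreeMap;

public class HeapVsRedBlackTree {
    public static void main(String[] args) {
        TreeMap<Integer, String> rbt = new TreeMap<>();      // red-black tree
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // binary min heap
        for (int x : new int[] {4, 1, 3, 2}) {
            rbt.put(x, "v" + x);
            heap.offer(x);
        }
        rbt.containsKey(3); // O(log n): tree search
        heap.contains(3);   // O(n): linear scan of the backing array
        rbt.firstKey();     // O(log n): walk down to the leftmost node
        heap.peek();        // O(1): the minimum is always at the root
    }
}
```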

What is the precise definition of the Heap data structure?

The definition of a heap given on Wikipedia (http://en.wikipedia.org/wiki/Heap_(data_structure)) is:
In computer science, a heap is a specialized tree-based data structure
that satisfies the heap property: If A is a parent node of B then
key(A) is ordered with respect to key(B) with the same ordering
applying across the heap. Either the keys of parent nodes are always
greater than or equal to those of the children and the highest key is
in the root node (this kind of heap is called max heap) or the keys of
parent nodes are less than or equal to those of the children (min
heap)
The definition says nothing about the tree being complete. For example, according to this definition, the binary tree 5 => 4 => 3 => 2 => 1 where the root element is 5 and all the descendants are right children also satisfies the heap property. I want to know the precise definition of the heap data structure.
As others have said in comments: That is the definition of a heap, and your example tree is a heap, albeit a degenerate/unbalanced one. The tree being complete, or at least reasonably balanced, is useful for more efficient operations on the tree. But an inefficient heap is still a heap, just like an unbalanced binary search tree is still a binary search tree.
Note that "heap" does not refer to a data structure, it refers to any data structure fulfilling the heap property or (depending on context) a certain set of operations. Among the data structures which are heaps, most efficient ones explicitly or implicitly guarantee the tree to be complete or somewhat balanced. For example, a binary heap is by definition a complete binary tree.
In any case, why do you care? If you care about specific lower or upper bounds on specific operations, state those instead of requiring a heap. If you discuss specific data structure which are heaps and complete trees, state that instead of just speaking about heaps (assuming, of course, that the completeness matters).
Since this question was asked, the Wikipedia definition has been updated:
In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.
However, "heap data structure" really denotes a family of different data structures, which also includes:
Binomial heap
Fibonacci heap
Leftist heap
Skew heap
Pairing heap
2-3 heap
...and these are certainly not necessarily complete trees.
On the other hand, the d-ary heap data structures -- including the binary heap -- most often refer to complete trees, such that they can be implemented in an array in level-order, without gaps:
The d-ary heap consists of an array of n items, each of which has a priority associated with it. These items may be viewed as the nodes in a complete d-ary tree, listed in breadth-first traversal order.
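The corresponding index arithmetic, generalizing the binary case, might look like this (a sketch; names are mine):

```java
class DaryHeapIndexing {
    // Level-order array layout of a d-ary heap: node i's children
    // occupy indices d*i + 1 through d*i + d.
    static int parent(int i, int d)       { return (i - 1) / d; }
    static int child(int i, int k, int d) { return d * i + k; }  // k in 1..d
}
```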

O(k log k) time algorithm to find the kth smallest element from a binary heap

We have an n-node binary heap which contains n distinct items (smallest item at the root). For k ≤ n, find an O(k log k) time algorithm to select the kth smallest element from the heap.
O(k log n) is obvious, but I couldn't figure out an O(k log k) one. Maybe we can use a second heap; I'm not sure.
Well, your intuition was right that we need an extra data structure to achieve O(k log k), because if we simply perform operations on the original heap, the log n term will remain in the resulting complexity.
Guessing from the targeted complexity O(k log k), I feel like creating and maintaining a heap of size k to help achieve the goal. As you may be aware, building a heap of size k in top-down fashion takes O(k log k), which really reminds me of our goal.
The following is my attempt (not necessarily elegant or efficient) to attain O(k log k):
We create a new min heap, initializing its root to be the root of the original heap.
We update the new min heap by deleting the current root and inserting the two children of the current root in the original heap. We repeat this process k times.
The resulting heap will consist of k nodes, the root of which is the kth smallest element in the original heap.
Notes: nodes in the new heap should store indexes of their corresponding nodes in the original heap, rather than the node values themselves. In each iteration of step 2, we add a net of one node to the new heap (one deleted, two inserted), so k iterations result in a new heap of size k. During the ith iteration, the node deleted from the new heap is the ith smallest element in the original heap.
Time complexity: each iteration takes O(3 log k) time to delete one element from the new heap and insert two into it. After k iterations, the total is O(3k log k) = O(k log k).
Hope this solution inspires you a bit.
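A sketch of this idea in Java, assuming the original min heap is given as a level-order array and letting the auxiliary heap hold indexes into it (class and method names are mine):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class KthSmallestInHeap {
    // Returns the kth smallest (1-based, k >= 1) element of a binary
    // min heap stored level-order in `heap`. Runs in O(k log k).
    static int kthSmallest(int[] heap, int k) {
        // Auxiliary min heap over indexes, ordered by the values they point at.
        PriorityQueue<Integer> candidates =
                new PriorityQueue<>(Comparator.comparingInt((Integer i) -> heap[i]));
        candidates.offer(0);                    // start at the root
        int index = -1;
        for (int step = 0; step < k; step++) {
            index = candidates.poll();          // (step+1)th smallest overall
            int left = 2 * index + 1, right = 2 * index + 2;
            if (left < heap.length) candidates.offer(left);
            if (right < heap.length) candidates.offer(right);
        }
        return heap[index];
    }
}
```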
Assuming that we're using a min-heap, so that a parent node is always smaller than its child nodes:
Create a sorted list toVisit, which contains the nodes which we will traverse next. This is initially just the root node.
Create an array smallestNodes. Initially this is empty.
While length of smallestNodes < k:
Remove the smallest Node from toVisit
add that node to smallestNodes
add that node's children to toVisit
When you're done, the kth smallest node is in smallestNodes[k-1].
Depending on the implementation of toVisit, you can get insertion in O(log k) time and removal in constant time (since you're only removing the topmost node). That makes O(k log k) total.
