Suppose one wants to implement a max heap using a doubly linked list. Can one achieve the same complexity for the operations Insert, ExtractMaxHeap, and MaxHeapify with a doubly linked list as with the standard array implementation?
My answer is that we can do all three operations in O(log n) time using the array implementation. However, for the doubly linked list: Insert - O(log n), ExtractMaxHeap - O(1), MaxHeapify - O(log n).
In a standard array implementation one can find the child of an element in constant time. If the current node is identified by index i, then the left child is identified by index 2i+1, and the right child by index 2i+2*.
*In heaps with k children, that would be ki+1, ki+2, ..., ki+k. The principle is the same.
Given a node in a doubly linked list, there is no way to get to the first child in constant time. The deeper the node is in the list, the more steps it will take -- walking the chain of the linked list -- to get to the sub-chain having the child nodes.
In the array implementation you don't need to visit the elements that lie between a node and its children: the access to a child (by index) is immediate. This is not true in a linked list. You have no choice but to first visit the next node, then its next, and so on, until you arrive at the "index" where the child node sits. The length of this walk towards the child (about i+1 steps from the node at position i to its left child at position 2i+1) increases exponentially with the depth of the node you start from.
A similar inefficiency occurs when you need to find the parent of a node.
As all basic operations on a heap involve swapping child with parent values, this problem gives these operations a worse time complexity in a doubly linked list implementation than in the array implementation.
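To make the index arithmetic concrete, here is a minimal sketch in Python (0-based indexing, as above; the function names are just illustrative):

    def left_child(i):
        """Index of the left child of node i in a 0-based array heap."""
        return 2 * i + 1

    def right_child(i):
        """Index of the right child of node i."""
        return 2 * i + 2

    def parent(i):
        """Index of the parent of node i (meaningless for the root, i == 0)."""
        return (i - 1) // 2

    # The children of the node at index 3 live at indices 7 and 8, reachable
    # in O(1) -- no walking through indices 4, 5, 6 as a linked list requires.
    assert left_child(3) == 7 and right_child(3) == 8 and parent(8) == 3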
If the elements are stored entirely in a linked list, with no other structure, then no. If the elements are sorted, then insertion will be O(n). If they are not, then extraction will be O(n).
I currently have a double-linked list of objects in descending sorted order. (The list is intrusive--pointers in the objects.) I have a very limited set of operations:
add a node with the highest possible key
remove a node with the highest possible key (doesn't matter which one)
remove a node with key 0 (doesn't matter which one)
increment key of a node with highest current key (doesn't matter which one)
decrement key of any given node whose key is above 0
Operations 1-4 will be constant time, but operation 5 is O(n), where n = the number of nodes with the same key value. This is because such a node, when decremented, has to be moved past its siblings with the same key value and placed after that range, and finding that re-insertion point is O(n).
I thought of the heap (heapsort heap, not malloc heap) as a solution where worst-case would be O(log n) (where n=number of nodes). However, based on my recollection and what Google is finding me, it seems invariably implemented in an array, as opposed to a binary tree. So:
Question: is there an implementation of a heap that uses pointers in the manner of a binary tree, as opposed to an array, that maintains O() of the typical array implementation?
One common way to do this is to use an array-based heap, but:
In the heap you store pointers to nodes;
In each node you store its index in the heap; and
Whenever you swap elements in the heap, you update the indexes in the corresponding nodes.
This preserves the complexity of all the heap operations, and costs around 1.5 pointers and 1 integer per node. (The extra 0.5 is because of the way growable arrays are implemented.)
Alternatively, you can just link the nodes together into a tree with pointers. To support the operations you want, though, this requires 3 pointers per node (parent, left, right).
Both ways work fine, but the array implementation is simpler, faster, and uses a bit less memory.
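For concreteness, here is a minimal sketch of the first scheme in Python (class and field names like Node.heap_index are illustrative, not from any library):

    class Node:
        def __init__(self, key):
            self.key = key
            self.heap_index = -1          # maintained by the heap on every move

    class IndexedMaxHeap:
        def __init__(self):
            self.a = []                   # heap of references to nodes

        def _swap(self, i, j):
            self.a[i], self.a[j] = self.a[j], self.a[i]
            self.a[i].heap_index = i      # keep the back-pointers in sync
            self.a[j].heap_index = j

        def _sift_up(self, i):
            while i > 0 and self.a[i].key > self.a[(i - 1) // 2].key:
                self._swap(i, (i - 1) // 2)
                i = (i - 1) // 2

        def push(self, node):
            node.heap_index = len(self.a)
            self.a.append(node)
            self._sift_up(node.heap_index)

        def increase_key(self, node, new_key):
            # assumes new_key >= node.key, so only sifting up is needed
            node.key = new_key
            self._sift_up(node.heap_index)   # node located in O(1) via back-pointer

Because every node knows its own index, operations like increase_key need no search: they go straight to the node's slot and repair the heap from there in O(log n).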
ETA:
I should point out, though, that if you use pointers then you can use different kinds of heaps. A Fibonacci heap will let you decrement the value of a node in amortized constant time. It's kinda complicated, though, and slow in practice: https://en.wikipedia.org/wiki/Fibonacci_heap
Unfortunately, the answer to the written problem isn't an answer to its headline title.
Solution 1: amortized O(1) data structure
A solution was found with amortized O(1) implementations of all required operations.
It is simply a double-linked list of double-linked lists. The "main" double-linked list nodes are called parents, and we have at most one parent per key value. The parent nodes keep a double-linked list of child nodes with the same key value. Each child additionally points to its parent.
add a node with the highest possible value: If there is no list head, or its value is not max, add the new node at the head of the main linked list. Otherwise, add it at the tail of the head node's child list.
remove a (any) node with the highest possible value: In the case of multiple items with highest value, it doesn't matter which we remove. So, if head parent has children, remove the tail child from the child list. Otherwise, remove the parent from the main list.
remove a (any) node with value 0: Same operations.
increment value of a (any) node with the highest current value: In the case of multiple nodes with the same key value we can choose any, so choose the head parent's tail child and remove it from the child list. If incrementing its value makes it exceed the max value, you're done (it has already been removed). Otherwise it becomes a new head node. If instead there are no children, increment the head parent in place, and remove it if it exceeds the maximum value.
decrement value of any node above 0: If the node is a child, remove it from the child list, then either add it to the parent's successor's child list or insert it as a new node after the parent. For a parent with no children: if the successor in the main list still has a smaller key, you're done; otherwise remove it and add it as the successor's tail child. For a parent with children: same, but promote the head child to take its place. This is O(n), where n = the number of nodes with the given key value, because you must change the parent pointer for all children. However, if the odds of the node selected for decrement being the parent node of all nodes with that key value are 1/n, this amortizes to O(1).
The main downside is that we logically have 7 different pointers for each node. In the parent role we need previous and next parent, plus head and tail child; in the child role we need previous and next child, plus parent. These can be merged, union-style, into two alternate substructures of 4 and 3 pointers, which saves storage but not CPU time (except perhaps the need to zero out unused pointers for cleanliness). Updating them all won't be fast.
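A rough sketch of the node layout and of operation 1 in Python (field names are illustrative; Python has no unions, so both roles' fields are simply present):

    class KeyNode:
        def __init__(self, key):
            self.key = key
            # parent role: linked into the main list, owns a child list
            self.prev_parent = None
            self.next_parent = None
            self.head_child = None
            self.tail_child = None
            # child role: linked into some parent's child list
            self.prev_child = None
            self.next_child = None
            self.parent = None

    def add_max(head, max_key):
        """Operation 1 (add a node with the highest possible key).
        Returns the possibly new head of the main list."""
        node = KeyNode(max_key)
        if head is None or head.key != max_key:
            node.next_parent = head            # becomes a new head parent
            if head is not None:
                head.prev_parent = node
            return node
        node.parent = head                     # head already holds max_key:
        node.prev_child = head.tail_child      # append to its child list
        if head.tail_child is not None:
            head.tail_child.next_child = node
        else:
            head.head_child = node
        head.tail_child = node
        return head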
Solution 2: Sloppy is Good Enough
Another approach is simply to be sloppy. The application benefits from finding nodes with higher scores but it's not critical that they be absolutely in perfect order. So rather than an O(n) operation to move nodes potentially from one end of the chain to the other, we could accept a solution that does an O(1) albeit at times imperfect job.
This could simply be the existing doubly linked list implementation. It supports all operations except decrement in O(1), and it even handles the decrement of a unique key value in O(1). Only the decrement of a non-unique key value would be O(n), as we need to skip the remaining nodes with the previous key value to find the first with the same or a higher key. Rather than pay that worst case, we could simply cap the search at, say, 5 or 10 links. This too would provide a nominally O(1) solution. However, some pernicious usage patterns may slowly cause the entire list to become quite unordered.
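A sketch of the capped decrement in Python, assuming plain list nodes with key/prev/next fields (illustrative names) and a descending-sorted list; updating the list's head pointer when the head node moves is left to the caller:

    def decrement_sloppy(node, cap=10):
        node.key -= 1
        # walk at most `cap` links toward the tail, past larger keys
        spot = node
        steps = 0
        while spot.next is not None and spot.next.key > node.key and steps < cap:
            spot = spot.next
            steps += 1
        if spot is node:
            return                        # still (locally) in order
        # unlink node (node.next exists because spot advanced past it)
        if node.prev is not None:
            node.prev.next = node.next
        node.next.prev = node.prev
        # splice node back in right after spot
        node.next = spot.next
        node.prev = spot
        if spot.next is not None:
            spot.next.prev = node
        spot.next = node

With the cap in place the cost is O(1), at the price of occasionally leaving the node short of its proper position -- exactly the trade-off described above.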
As I understand, binary heap does not support removing random elements. What if I need to remove random elements from a binary heap?
Obviously, I can remove an element and re-arrange the entire heap in O(N). Can I do better?
Yes and no.
The problem is that a binary heap does not support searching for an arbitrary element; finding it is itself O(n).
However, if you have a pointer to the element (and not only its value), you can swap the element with the right-most leaf, remove this leaf, and then re-heapify the relevant sub-heap (by sifting down the newly placed element as much as needed). This results in O(log n) removal, but requires a pointer to the actual element you are looking for.
Amit is right in his answer, but here is one more nuance: the element placed at the removed item's position (where you put the right-most leaf) may need to be bubbled up (compare it with its parent and move it up until the parent is larger than it).
Sometimes it needs to be bubbled down instead (compare it with its children and move it down until all children are smaller than it). It all depends on the case.
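A compact sketch of this removal in Python, for a 0-based max-heap stored in a list a (position i is assumed valid); the replacement leaf is bubbled up or sunk down, and at most one of the two loops does any work:

    def delete_at(a, i):
        last = a.pop()                    # remove the right-most leaf
        if i == len(a):
            return                        # the leaf itself was the target
        a[i] = last
        # bubble up if the moved leaf is too large for this spot...
        while i > 0 and a[i] > a[(i - 1) // 2]:
            a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
            i = (i - 1) // 2
        # ...or sink down if it is too small
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < len(a) and a[l] > a[largest]:
                largest = l
            if r < len(a) and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest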
Depends on what is meant by "random element." If it means that the heap contains elements [e1, e2, ..., eN] and one wants to delete some ei (1 <= i <= N), then this is possible.
If you are using a binary heap implementation from some library, it might be that it doesn't provide you with the API that you need. In that case, you should look for another library that has it.
If you were to implement it yourself, you would need two additional calls:
A procedure deleteAtIndex(heap, i) that deletes the node at index i by positioning the last element in the heap array at i, decrementing the element count, and finally shuffling down/up the new ith element to maintain the heap invariant. The most common use of this procedure is to "pop" the heap by calling deleteAtIndex(heap, 1) -- assuming 1-origin indexing. This operation will run in O(log n) (though, to be complete, I'll note that the bound can be improved to O(log(log n)) under some assumptions about your elements' keys).
A procedure deleteElement(heap, e) that deletes the element e (your arbitrary element). Your heap algorithm would maintain an array ElementIndex such that ElementIndex[e] returns the current index of element e: calling deleteAtIndex(heap, ElementIndex[e]) will then do what you want. It will also run in O(log n) because the array access is constant.
Since binary heaps are often used in algorithms that merely pop the highest- (or lowest-) priority element rather than deleting arbitrary elements, I imagine that some libraries omit the deleteAtIndex API to save space (the extra ElementIndex array mentioned above).
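Here is a minimal Python sketch of the two calls, using a dict as the ElementIndex structure (so elements need only be distinct and comparable; all names are illustrative):

    class DeletableMaxHeap:
        def __init__(self):
            self.a = []
            self.index_of = {}                 # the "ElementIndex" map

        def _swap(self, i, j):
            self.a[i], self.a[j] = self.a[j], self.a[i]
            self.index_of[self.a[i]] = i       # every swap updates the map
            self.index_of[self.a[j]] = j

        def _sift_up(self, i):
            while i > 0 and self.a[i] > self.a[(i - 1) // 2]:
                self._swap(i, (i - 1) // 2)
                i = (i - 1) // 2

        def _sift_down(self, i):
            n = len(self.a)
            while True:
                m, l, r = i, 2 * i + 1, 2 * i + 2
                if l < n and self.a[l] > self.a[m]:
                    m = l
                if r < n and self.a[r] > self.a[m]:
                    m = r
                if m == i:
                    return
                self._swap(i, m)
                i = m

        def push(self, e):
            self.index_of[e] = len(self.a)
            self.a.append(e)
            self._sift_up(len(self.a) - 1)

        def delete_element(self, e):
            """deleteElement: O(1) lookup, then deleteAtIndex behaviour."""
            i = self.index_of.pop(e)
            last = self.a.pop()
            if i < len(self.a):                # e was not the last leaf
                self.a[i] = last
                self.index_of[last] = i
                self._sift_up(i)               # repairs in one direction;
                self._sift_down(i)             # at most one loop does work

For example, after pushing 5, 1, 9 and 3, calling delete_element(1) removes that arbitrary element in O(log n) without any search.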
I don't understand how binary search trees are always described as "sorted". I get that in an array representation of a binary heap you have a fully sorted array. I haven't seen array representations of binary search trees, so it's hard for me to see them as sorted like an array, e.g. [0,1,2,3,4,5], rather than sorted with respect to each node. What is the right way to think about a BST being "sorted" conceptually?
There are many types of binary search trees. All of them have one thing in common: they satisfy an invariant which enables binary search, namely an order relation by which every element in the tree can be compared to any other element in the tree, in a total preorder.
What does that mean?
Let's consider the typical statement of a BST invariant in a textbook, which states that every node's key is greater than all keys in its left sub-tree, and less than all keys in its right sub-tree. We omit conflict resolution details for keys which compare equal.
What does that BST look like? Picture an example. The way I would explain it to a class of three-year-olds is: try to collapse all the nodes to the bottom level of the leaves -- just let them fall down. Or, for high-schoolers: draw a line from each node/key, projecting them onto the x-axis. Once you've done that, it's obvious the keys are already in (ascending) order.
Is this imaginary and casual observation analogous to our definition of a sorted sequence? Yes, it is. Since the elements of the BST satisfy a total preorder, an in-order traversal of the BST must produce those elements in order (exercise: prove it).
It is equivalent to state that if we had stored a BST's keys, by an in-order traversal, in an array, the array would be sorted.
Therefore, by way of our initial definition of a BST, the in-order traversal is the intuitive way of thinking of one as "sorted".
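A tiny Python sketch of that claim (node and field names are illustrative):

    class BSTNode:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def in_order(node):
        """Return keys via in-order traversal: left subtree, node, right subtree."""
        if node is None:
            return []
        return in_order(node.left) + [node.key] + in_order(node.right)

    # A valid BST holding the keys 1..5, inserted in arbitrary order...
    root = BSTNode(4, BSTNode(2, BSTNode(1), BSTNode(3)), BSTNode(5))
    assert in_order(root) == [1, 2, 3, 4, 5]   # ...reads out already sorted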
Does this help? (Figure in the original answer: a binary heap shown as an array.)
As far as data structures are concerned (arrays, trees, linked lists, etc.), "sorted" means that sequentially going through all of a structure's elements, you'll find their values ordered according to some rule (>, <, <=, etc.).
For arrays, this is easy to picture because an array is a linear data structure.
Trees are not linear. However, iterating through a BST you will notice that all the elements are ordered according to the rule left value <= node value < right value (or something similar); the very definition of a sorted data structure.
It is not "sorted" in the same sense an array might be sorted (and trees, except for heaps, are rarely represented as arrays anyway), but they have a structure that allows you to easily traverse the elements in sorted order: simply traverse the nodes of the BST with a depth-first search, and output each node's value after you've looked at its left child (if any) but before you look at its right child (if any).
By the way, the array in which a heap is stored is almost always not sorted. The heap itself can also not be said to be "sorted", because it does not have enough structure to be able to readily produce the elements in sorted order without destroying the heap by successively removing the top element. For example, although you do know that the top element is larger than both of its children (or smaller, depending on the heap type), you cannot tell in advance which child is smaller than the other.
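A small sketch makes the distinction concrete: the array below satisfies the max-heap parent/child inequalities at every index, yet it is not sorted in either direction:

    a = [9, 4, 8, 1, 2, 5, 7]                  # a valid max-heap
    n = len(a)
    assert all(a[i] >= a[c]                    # every parent dominates...
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)    # ...its children
    assert a != sorted(a) and a != sorted(a, reverse=True)  # yet unsorted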
The heap property says:
If A is a parent node of B, then the key of node A is ordered with respect to the key of node B, with the same ordering applying across the heap. Either the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called a max heap), or the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node (min heap).
But why, in this wiki, does the binary heap have to be a complete binary tree? The heap property doesn't imply that, in my impression.
According to the Wikipedia article you provided, a binary heap must conform to both the heap property (as you discussed) and the shape property (which mandates that it is a complete binary tree). Without the shape property, one would lose the runtime advantage that the data structure provides (i.e. the completeness ensures that there is a well-defined way to determine the new root when an element is removed, etc.).
Every item in the array has a position in the binary tree, and this position is calculated from the array index. The positioning formula ensures that the tree is 'tightly packed'.
For example, the binary tree shown in the original answer's figure is represented by the array [1, 2, 3, 17, 19, 36, 7, 25, 100].
Notice that the array is ordered as if you're starting at the top of the tree, then reading each row from left-to-right.
If you add another item to this array, it will represent the slot below the 19 and to the right of the 100. If this new number is less than 19, then values will have to be swapped around, but nonetheless, that is the slot that will be filled by the 10th item of the array.
Another way to look at it: try constructing a binary heap which isn't a complete binary tree. You literally cannot.
You can only guarantee O(log(n)) insertion and (root) deletion if the tree is complete. Here's why:
If the tree is not complete, then it may be unbalanced and in the worst case, simply a linked list, requiring O(n) to find a leaf, and O(n) for insertion and deletion. With the shape requirement of completeness, you are guaranteed O(log(n)) operations since it takes constant time to find a leaf (last in array), and you are guaranteed that the tree is no deeper than log2(N), meaning the "bubble up" (used in insertion) and "sink down" (used in deletion) will require at most log2(N) modifications (swaps) of data in the heap.
This being said, you don't absolutely have to have a complete binary tree, but you just lose these runtime guarantees. In addition, as others have mentioned, having a complete binary tree makes it easy to store the tree in array format, forgoing object-reference representation.
The point that 'complete' makes is that in a heap all interior (not leaf) nodes have two children, except where there are no children left -- all the interior nodes are 'complete'. As you add to the heap, the lowest level of nodes is filled (with childless leaf nodes), from the left, before a new level is started. As you remove nodes from the heap, the right-most leaf at the lowest level is removed (and pushed back in at the top). The heap is also perfectly balanced (hurrah!).
A binary heap can be looked at as a binary tree, but the nodes do not have child pointers, and insertion (push) and deletion (pop or from inside the heap) are quite different to those procedures for an actual binary tree.
This is a direct consequence of the way in which the heap is organised. The heap is held as a vector with no gaps between the nodes. The parent of the i'th item in the heap is item (i - 1) / 2 (assuming a binary heap, and assuming the top of the heap is item 0). The left child of the i'th item is (i * 2) + 1, and the right child one greater than that. When there are n nodes in the heap, a node has no left child if (i * 2) + 1 exceeds n, and no right child if (i * 2) + 2 does.
The heap is a beautiful thing. Its one flaw is that you do need a vector large enough for all entries... unlike a real binary tree, you cannot allocate a node at a time. So if you have a heap for an indefinite number of items, you have to be ready to extend the underlying vector as and when needed -- or run some fragmented structure which can be addressed as if it were a vector.
FWIW: when stepping down the heap, I find it convenient to step to the right child -- (i + 1) * 2 -- if that is < n then both children are present, if it is == n only the left child is present, otherwise there are no children.
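That stepping convention translates directly into a sift-down, sketched here in Python for a 0-based max-heap in a list a:

    def sift_down(a, i):
        """Sink a[i], classifying children via the right-child index."""
        n = len(a)
        while True:
            r = (i + 1) * 2                 # right child; left child is r - 1
            if r < n:                       # both children present
                c = r if a[r] > a[r - 1] else r - 1
            elif r == n:                    # only the left child present
                c = r - 1
            else:                           # no children: i is a leaf
                return
            if a[c] <= a[i]:
                return                      # heap property already holds
            a[i], a[c] = a[c], a[i]         # swap with the larger child
            i = c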
Maintaining a binary heap as a complete binary tree gives multiple advantages, such as:
1. Since the heap is a complete binary tree, its height is the minimum possible, i.e. log(size of tree). Insertion and build-heap depend on the height, so a minimal height reduces their time complexity.
2. All the items of a complete binary tree are stored contiguously in an array, so random access is possible, which also provides cache friendliness.
In order for a binary tree to be considered a heap, it must meet two criteria: 1) it must have the heap property, and 2) it must be a complete tree.
It is possible for a structure to have either of these properties and not have the other, but we would not call such a data structure a heap. You are right that the heap property does not entail the shape property. They are separate constraints.
The underlying structure of a heap is an array where every node corresponds to an index, so if the tree were not complete, some index would be left empty, which is not possible because the code assumes each node occupies an index. I have given a link below so that you can see how the heap structure is built:
http://www.sanfoundry.com/java-program-implement-min-heap/
Hope it helps
I find that all answers so far either do not address the question or, essentially, say "because the definition says so" or use a similarly circular argument. They are surely true but (to me) not very informative.
To me it became immediately obvious that the heap must be a complete tree when I remembered that you insert a new element not at the root (as you do in a binary search tree) but, rather, at the first free slot at the bottom right.
Thus, in a heap, a new element propagates from the bottom up - it is "moved up" within the tree till it finds a suitable place.
In a binary search tree a newly inserted element moves the other way round: the search starts at the root, and the element "moves down" till it finds its place.
The fact that each new element in a heap starts as the bottom right node means that the heap is going to be a complete tree at all times.
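That observation is visible directly in the insertion code; a minimal Python sketch for an array-backed max-heap:

    def heap_insert(a, x):
        a.append(x)                         # the first free bottom slot,
        i = len(a) - 1                      # so the shape stays complete
        while i > 0 and a[i] > a[(i - 1) // 2]:
            a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]   # bubble up
            i = (i - 1) // 2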