Is a dynamically sized heap insertion technically O(n)? - data-structures

Inserting an element into a heap involves appending it to the end of the backing array and then propagating it upwards until it's in the "right spot" and satisfies the heap property, which is an O(log n) operation.
However, in C, for instance, calling realloc to resize the array for the new element can (and likely will) result in copying the entire array to another location in memory, which is O(n) in both the best and worst case, right?
Are heaps in C (or any language, for that matter) usually implemented with a fixed, pre-allocated size, or is the copy operation inconsequential enough to make a dynamically sized heap a viable choice (e.g., a binary heap to keep a quickly searchable list of items)?

A typical scheme is to double the size when you run out of room. This doubling--and the copying that goes with it--does indeed take O(n) time.
However, notice that you don't have to perform this doubling very often. If you average the total cost of all the doublings over all the operations performed on the heap, the extra cost per operation is indeed inconsequential: O(1) amortized per insertion. (This kind of averaging is known as amortized analysis.)
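As a sketch of that scheme, here is a max-heap insert in C that doubles the backing array when it runs out of room. The names (`Heap`, `heap_push`) are mine, not from any library, and the realloc result is left unchecked for brevity:

```c
#include <assert.h>
#include <stdlib.h>

/* A minimal max-heap over ints whose backing array doubles when full.
   Illustrative sketch; error handling omitted for brevity. */
typedef struct {
    int *data;
    size_t size;
    size_t cap;
} Heap;

void heap_push(Heap *h, int v) {
    if (h->size == h->cap) {                  /* out of room: double */
        h->cap = h->cap ? h->cap * 2 : 1;     /* O(n) copy, but rare */
        h->data = realloc(h->data, h->cap * sizeof *h->data);
    }
    size_t i = h->size++;
    h->data[i] = v;
    while (i > 0) {                           /* sift up: O(log n) */
        size_t parent = (i - 1) / 2;
        if (h->data[parent] >= h->data[i]) break;
        int tmp = h->data[parent];
        h->data[parent] = h->data[i];
        h->data[i] = tmp;
        i = parent;
    }
}
```

Over n pushes the capacities copied sum to less than 2n, which is where the O(1) amortized bound comes from.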

Related

What is O(1) space complexity?

I am having a hard time understanding O(1) space complexity. I understand that it means the space required by the algorithm does not grow with the input or the size of the data the algorithm operates on. But what exactly does it mean?
If we use an algorithm on a linked list, say 1->2->3->4, and we declare a temporary pointer and traverse the list until we reach "3", does this mean we still use O(1) extra space? Or does it mean something completely different? I am sorry if this does not make sense at all. I am a bit confused.
To answer your question: if the traversal algorithm allocates a single pointer to walk the list, it is considered to have O(1) space complexity. Moreover, even if the traversal algorithm needed not 1 but 1000 pointers, its space complexity would still be considered O(1).
However, if for some reason the algorithm needs to allocate N pointers when traversing a list of size N, i.e., 3 pointers for a list of 3 elements, 10 pointers for a list of 10 elements, 1000 pointers for a list of 1000 elements, and so on, then the algorithm is considered to have a space complexity of O(N). This is true even when N happens to be very small, e.g., N = 1.
To summarise the two examples above: O(1) denotes constant space use, meaning the algorithm allocates the same number of pointers irrespective of the list size. In contrast, O(N) denotes linear space use, meaning the algorithm's space use grows in proportion to the input size.
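To make the two cases concrete, here is a sketch in C of a traversal that uses one pointer (O(1) auxiliary space) next to one that allocates a pointer per node (O(N) auxiliary space). The function names are mine:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { int value; struct Node *next; } Node;

/* O(1) auxiliary space: one temporary pointer, regardless of list length. */
Node *find(Node *head, int target) {
    for (Node *p = head; p != NULL; p = p->next)
        if (p->value == target) return p;
    return NULL;
}

/* O(N) auxiliary space: records a pointer for every node visited. */
size_t collect(Node *head, Node ***out) {
    size_t n = 0;
    for (Node *p = head; p != NULL; p = p->next) n++;
    *out = malloc(n * sizeof(Node *));     /* one pointer per element */
    size_t i = 0;
    for (Node *p = head; p != NULL; p = p->next) (*out)[i++] = p;
    return n;
}
```

Both functions take O(N) time; only the second one takes O(N) extra space, which is the distinction the question is about.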
Space complexity is simply the amount of memory a program uses: the computer memory the algorithm requires to complete its execution, as a function of the input size.
The space complexity S(P) of an algorithm is the total space taken by the algorithm to complete its execution with respect to the input size. It includes both constant space and auxiliary space.
S(P) = constant space + auxiliary space
Constant space is fixed for a given algorithm, generally the space used by the input and local variables. Auxiliary space is the extra/temporary space used by the algorithm.
Let's say I create some data structure with a fixed size, and no matter what I do to the data structure, it will always have the same fixed size. Operations performed on this data structure are therefore O(1).
An example, let's say I have an array of fixed size 100. Any operation I do, whether that is reading from the array or updating an element, that operation will be O(1) on the array. The array's size (and thus the amount of memory it's using) is not changing.
Another example: let's say I have a LinkedList to which I keep adding elements. Every element I add allocates another node, so the memory required to hold all of its elements grows with the list; its space use is O(N) in the number of elements stored.
Hope this helps!

Sloppy Heap Sort

Has anyone ever heard of this heap repair technique: SloppyHeapSort? It uses a "Sloppy" sift-down approach. Basically, it takes the element to be repaired, moves it to the bottom of the heap (without comparing it to its children) by replacing it with its larger child until it hits the bottom. Then, sift-up is called until it reaches its correct location. This makes just over lg n comparisons (in a heap of size n).
However, this cannot be used for heap construction, only for heap repair. Why is this? I don't understand why it wouldn't work if you were trying to build a heap.
The algorithm, if deployed properly, could certainly be used as part of the heap construction algorithm. It is slightly complicated by the fact that during heap construction, the root of the subheap being repaired is not the beginning of the array, which affects the implementation of sift-up (it needs to stop when the current element of the array is reached, rather than continuing to the bottom of the heap).
It should be noted that the algorithm has the same asymptotic performance as the standard heap-repair algorithm; however, it probably involves fewer comparisons. In part, this is because the standard heap-repair algorithm is called after swapping the root of the heap (the largest element) for the last element in the heap array.
The last element is not necessarily the smallest element in the heap, but it is certainly likely to be close to the bottom. After the swap, the standard algorithm will move the swapped element down as many as log2N levels, with each step requiring two comparisons; because the element is likely to belong near the bottom of the heap, most of the time the maximum number of comparisons will be performed. But occasionally, only two or four comparisons might be performed.
The "sloppy" algorithm instead starts by moving the "hole" from the top of the heap to somewhere near the bottom (log2N comparisons) and then moving the last element up until it finds it home, which will usually take only a few comparisons (but could, in the worst case, take nearly log2N comparisons).
Now, in the case of heapify, heap repair is performed not with the last element in the subheap, but rather with a previously unseen element taken from the original vector. This actually doesn't change the average performance analysis much, because if you start heap repair with a random element, instead of an element likely to be small, the expected number of sift-down operations is still close to the maximum. (Half of the heap is in the last level, so the probability of needing the maximum number of sift-downs for a random element is one-half.)
While the sloppy algorithm (probably) improves the number of element comparisons, it increases the number of element moves. The classic algorithm performs at most log2N swaps, while the sloppy algorithm performs at least log2N swaps, plus the additional ones during sift-up. (In both cases, the swaps can be improved to moves by not inserting the new element until its actual position is known, halving the number of memory stores.)
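A sketch of the sloppy repair described above, for a max-heap stored in an array, assuming only the root may be out of place (the function name and structure are my own):

```c
#include <assert.h>

/* "Sloppy" repair of a max-heap a[0..n-1] where only a[0] may violate
   the heap property. Illustrative sketch, not a library routine. */
void sloppy_repair(int *a, int n) {
    int v = a[0];
    int i = 0;
    /* Phase 1: walk the hole to the bottom, promoting the larger child
       without ever comparing v itself: about log2(n) comparisons. */
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child]) child++;
        a[i] = a[child];
        i = child;
    }
    /* Phase 2: sift v up from the bottom to its proper place,
       which is usually only a few steps away. */
    while (i > 0 && a[(i - 1) / 2] < v) {
        a[i] = a[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    a[i] = v;
}
```

Note that v is held aside and written exactly once at the end, which is the "moves instead of swaps" refinement mentioned above.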
As a postscript, I wasn't able to find any reference to your "sloppy" algorithm. On the whole, when asking about a proposed algorithm it is generally better to include a link.
There is a linear-time algorithm to construct a heap. I believe what the author meant is that using this approach to build a heap is not efficient, and better algorithms exist. Of course you can build a heap by adding the elements one by one using the described strategy; you simply can do better.

Can heaps really only use O(1) auxiliary storage?

One of the biggest advantages of using heaps for priority queues (as opposed to, say, red-black trees) seems to be space-efficiency: unlike balanced BSTs, heaps only require O(1) auxiliary storage.
That is, the ordering of the elements alone is sufficient to satisfy the invariants and guarantees of a heap; no child/parent pointers are necessary.
However, my question is: is the above really true?
It seems to me that, in order to use an array-based heap while satisfying the O(log n) running time guarantee, the array must be dynamically expandable.
The only way we can dynamically expand an array in O(1) time is to over-allocate the memory that backs the array, so that we can amortize the memory operations into the future.
But then, doesn't that overallocation imply a heap also requires auxiliary storage?
If the above is true, that seems to imply heaps have no complexity advantages over balanced BSTs whatsoever, so then, what makes heaps "interesting" from a theoretical point of view?
You appear to confuse binary heaps, heaps in general, and implementations of binary heaps that use an array instead of an explicit tree structure.
A binary heap is simply a binary tree with properties that make it theoretically interesting beyond memory use. They can be built in linear time, whereas building a BST necessarily takes n log n time. For example, this can be used to select the k smallest/largest values of a sequence in better-than-n log n time.
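The linear-time construction referred to here is the classic bottom-up build (Floyd's method): sift down each internal node, starting from the last one. A minimal sketch in C (function names are mine):

```c
#include <assert.h>

/* Standard sift-down for a max-heap a[0..n-1] rooted at index i. */
static void sift_down(int *a, int n, int i) {
    for (;;) {
        int child = 2 * i + 1;
        if (child >= n) return;
        if (child + 1 < n && a[child + 1] > a[child]) child++;
        if (a[i] >= a[child]) return;
        int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
        i = child;
    }
}

/* Bottom-up construction: heapify each internal node from the last
   one up to the root. Total work is O(n), not O(n log n). */
void build_max_heap(int *a, int n) {
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);
}
```

The O(n) bound comes from the fact that most nodes sit near the bottom and therefore sift down only a short distance.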
Implementing a binary heap as an array yields an implicit data structure. This is a topic formalized by theorists, but not actively pursued by most of them. But in any case, the classification is justified: To dynamically expand this structure one indeed needs to over-allocate, but not every data structure has to grow dynamically, so the case of a statically-sized pre-allocated data structure is interesting as well.
Furthermore, there is a difference between space needed for faster growth and space needed because each element is larger than it has to be. The first can be avoided by not growing, and also reduced to arbitrarily small constant factor of the total size, at the cost of a greater constant factor on running time. The latter kind of space overhead is usually unavoidable and can't be reduced much (the pointers in a tree are at least log n bits, period).
Finally, there are many heaps other than binary heaps (Fibonacci, binomial, leftist, pairing, ...), and almost all of them offer better bounds than binary heaps for at least some operations. The most common ones are decrease-key (alter the value of a key already in the structure in a certain way) and merge (combine two heaps into one). The complexity of these operations is important for the analysis of several algorithms that use priority queues, hence the motivation for a lot of research into heaps.
In practice the memory use is important. But with or without over-allocation, the difference is only a constant factor overall, so theorists are not terribly interested in binary heaps. They'd rather get better complexity for decrease-key and merge; most of them are happy if the data structure takes O(n) space. The extremely high memory density, ease of implementation, and cache friendliness are far more interesting for practitioners, and it's they who sing the praises of binary heaps far and wide.

Best algorithm/data structure for a continually updated priority queue

I need to frequently find the minimum value object in a set that's being continually updated. I need to have a priority queue type of functionality. What's the best algorithm or data structure to do this? I was thinking of having a sorted tree/heap, and every time the value of an object is updated, I can remove the object, and re-insert it into the tree/heap. Is there a better way to accomplish this?
A binary heap is hard to beat for simplicity, but it has the disadvantage that decrease-key takes O(n) time. I know, the standard references say that it's O(log n), but first you have to find the item. That's O(n) for a standard binary heap.
By the way, if you do decide to use a binary heap, changing an item's priority doesn't require a remove and re-insert. You can change the item's priority in-place and then either bubble it up or sift it down as required.
If the performance of decrease-key is important, a good alternative is a pairing heap, which is theoretically slower than a Fibonacci heap, but is much easier to implement and in practice is faster than the Fibonacci heap due to lower constant factors. In practice, pairing heap compares favorably with binary heap, and outperforms binary heap if you do a lot of decrease-key operations.
You could also marry a binary heap and a dictionary or hash map, and keep the dictionary updated with the position of the item in the heap. This gives you faster decrease-key at the cost of more memory and increased constant factors for the other operations.
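A sketch of that heap-plus-dictionary idea, assuming items are identified by small integer ids so a plain array can serve as the position map (all names and the fixed capacity are hypothetical):

```c
#include <assert.h>

#define MAXN 64

/* Min-heap storing item ids, keyed by prio[]; pos[id] tracks where each
   id currently sits in the heap, so decrease-key finds it in O(1). */
typedef struct {
    int ids[MAXN];   /* heap array of item ids */
    int pos[MAXN];   /* pos[id] = index of id in ids[] */
    int prio[MAXN];  /* prio[id] = current priority */
    int size;
} IndexedHeap;

static void swap_nodes(IndexedHeap *h, int i, int j) {
    int a = h->ids[i], b = h->ids[j];
    h->ids[i] = b; h->ids[j] = a;
    h->pos[a] = j; h->pos[b] = i;       /* keep the position map in sync */
}

static void bubble_up(IndexedHeap *h, int i) {
    while (i > 0 && h->prio[h->ids[(i - 1) / 2]] > h->prio[h->ids[i]]) {
        swap_nodes(h, (i - 1) / 2, i);
        i = (i - 1) / 2;
    }
}

void ih_push(IndexedHeap *h, int id, int p) {
    h->prio[id] = p;
    h->ids[h->size] = id;
    h->pos[id] = h->size;
    bubble_up(h, h->size++);
}

/* O(log n): the position map replaces the O(n) linear search. */
void ih_decrease_key(IndexedHeap *h, int id, int p) {
    h->prio[id] = p;
    bubble_up(h, h->pos[id]);
}
```

The extra bookkeeping in swap_nodes is exactly the "increased constant factors for the other operations" mentioned above.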
Quoting Wikipedia:
To improve performance, priority queues typically use a heap as their backbone, giving O(log n) performance for inserts and removals, and O(n) to build initially. Alternatively, when a self-balancing binary search tree is used, insertion and removal also take O(log n) time, although building trees from existing sequences of elements takes O(n log n) time; this is typical where one might already have access to these data structures, such as with third-party or standard libraries.
If you are looking for a better way, there must be something special about the objects in your priority queue. For example, if the keys are numbers from 1 to 10, a countsort-based approach may outperform the usual ones.
If your application looks anything like repeatedly choosing the next scheduled event in a discrete event simulation, you might consider the options listed in e.g. http://en.wikipedia.org/wiki/Discrete_event_simulation and http://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-03-FutureEventList.pdf. The latter summarizes results from different implementations in this domain, including many of the options considered in other comments and answers, and a search will find a number of papers in this area. Priority queue overhead really does make a difference in how many times real time you can get your simulation to run; if you wish to simulate something that takes weeks of real time, this can be important.

Big Oh notation - push and pop

I think I am starting to understand at least the theory behind big-O notation, i.e. it is a way of measuring the rate at which an algorithm's running time grows. In other words, big O quantifies an algorithm's efficiency. But the implementation of it is something else.
For example, in the best-case scenario push and pop operations will be O(1) because the number of steps it takes to remove from or add to the stack is fixed. Regardless of the value, the process will be the same.
I'm trying to envision how a sequence of events such as push and pop can degrade performance from O(1) to O(n^2). If I have an array of n/2 capacity, n push and pop operations, and a dynamic array that doubles or halves its capacity when full or half full, how is it possible that the sequence in which these operations occur can affect the speed in which a program completes? Since push and pop work on the top element of the stack, I'm having trouble seeing how efficiency goes from a constant to O(n^2).
Thanks in advance.
You're assuming that the dynamic array does its resize operations quite intelligently. If this is not the case, you might end up with O(n^2) runtime: suppose the array does not double its size when full but is simply resized to size+1. Also, suppose it starts with size 1. You'd insert the first element in O(1). When inserting the second element, the array would need to be resized to size 2, requiring it to copy the previous value. When inserting element k, it would currently have size k-1 and would need to be resized to size k, resulting in k-1 elements that need to be copied, and so on.
Thus, for inserting n elements, you end up copying the array n-1 times: O(n) resizes. The copy operations also depend linearly on n, since the more elements have been inserted, the more need to be copied: O(n) copies per resize. This results in O(n*n) = O(n^2) runtime complexity.
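The difference between the two growth policies can be checked by counting element copies; here is a sketch (function names are mine):

```c
#include <assert.h>

/* Count element copies incurred while pushing n elements, under two
   growth policies for the backing array. */

/* Grow by one each time: every push past the current capacity copies
   everything already stored, so copies total 1 + 2 + ... + (n-1). */
long copies_grow_by_one(int n) {
    long copies = 0;
    int cap = 1;
    for (int size = 1; size <= n; size++) {
        if (size > cap) { copies += cap; cap = size; }
    }
    return copies;   /* n(n-1)/2, i.e. O(n^2) total */
}

/* Double when full: capacities copied are 1 + 2 + 4 + ..., which sums
   to less than 2n, i.e. O(n) total, O(1) amortized per push. */
long copies_doubling(int n) {
    long copies = 0;
    int cap = 1;
    for (int size = 1; size <= n; size++) {
        if (size > cap) { copies += cap; cap *= 2; }
    }
    return copies;
}
```

For 100 pushes, the naive policy performs 4950 copies while doubling performs 127, which is the quadratic-versus-linear gap described above.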
If I implement a stack as (say) a linked list, then pushes and pops will always be constant time (i.e. O(1)).
I would not choose a dynamic array implementation for a stack unless runtime wasn't an issue for me, I happened to have a dynamic array ready-built and available to use, and I didn't have a more efficient stack implementation handy. However, if I did use an array that resized up or down when it became full or half-empty respectively, its runtime would be O(1) while the number of pushes and pops is low enough not to trigger a resize, and O(n) when there is a resize (hence O(n) in the worst case).
I can't think of a case where a dynamic array used as a stack could deliver performance as bad as O(n^2) unless there was a bug in its implementation.
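For reference, the linked-list stack mentioned above takes only a few lines in C; push and pop touch only the head node, so each is O(1) and no resizing is ever needed (illustrative names, allocations unchecked for brevity):

```c
#include <assert.h>
#include <stdlib.h>

/* A stack as a singly linked list: both operations work at the head. */
typedef struct SNode { int value; struct SNode *next; } SNode;

void stack_push(SNode **top, int v) {
    SNode *n = malloc(sizeof *n);   /* unchecked for brevity */
    n->value = v;
    n->next = *top;
    *top = n;
}

int stack_pop(SNode **top) {        /* assumes a non-empty stack */
    SNode *n = *top;
    int v = n->value;
    *top = n->next;
    free(n);
    return v;
}
```

The trade-off versus the array version is a malloc/free per operation and worse cache locality, not asymptotic complexity.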

Resources