So a regular binary heap has an operation extract_min which takes O(log(n)) time in the worst case. Suppose the amortized cost of extract_min is O(1). Let n be the size of the heap.
Consider a sequence of n extract_min operations performed on a heap that initially contains n elements. Does this mean that the entire sequence would be processed in O(n) time, since each operation is O(1)?
Let's get this out of the way first: removing ALL the elements in a heap via extract_min operations takes O(N log N) time.
This is a fact, so when you ask "Does constant amortized time extract_min imply linear time for removing all the elements?", what you are really asking is "Can extract_min take constant amortized time even though it takes O(N log N) time to extract all the elements?"
The answer to this actually depends on what operations the heap supports.
If the heap supports only the add and extract_min operations, then every extract_min that doesn't fail (in constant time) must correspond to a previous add. We can then say that add takes amortized O(log N) time, and extract_min takes amortized O(1) time, because we can assign all of its non-constant costs to a previous add.
If the heap supports an O(N) time make_heap operation (amortized or not), however, then it's possible to perform N extract_min operations without doing anything else that adds up to O(N log N) time. The whole O(N log N) cost would then have to be assigned to the N extract_min operations, and we could not claim that extract_min takes amortized constant time.
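To make the make_heap case concrete, here is a minimal Python sketch (an assumption; the answer names no particular implementation) of an array-based min-heap that counts sift-down swaps separately for a bottom-up make_heap and for emptying the heap with extract_min. The function names are hypothetical, and swap counts stand in for running time.

```python
import random

def sift_down(a, i, n):
    """Sink a[i] within a[:n] until the min-heap property holds; return the swap count."""
    swaps = 0
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return swaps
        a[i], a[smallest] = a[smallest], a[i]
        swaps += 1
        i = smallest

def make_heap(a):
    """Bottom-up (Floyd) construction; the total number of swaps is below len(a)."""
    swaps = 0
    for i in range(len(a) // 2 - 1, -1, -1):
        swaps += sift_down(a, i, len(a))
    return swaps

def extract_all(a):
    """Call extract_min until the heap is empty; return (extracted keys, total swaps)."""
    out, swaps, n = [], 0, len(a)
    while n > 0:
        out.append(a[0])              # the minimum sits at the root
        n -= 1
        a[0] = a[n]                   # move the last leaf to the root ...
        swaps += sift_down(a, 0, n)   # ... and sink it back into place
    return out, swaps

if __name__ == "__main__":
    for n in (1 << 10, 1 << 14):
        data = random.sample(range(10 * n), n)
        build = make_heap(data)
        keys, extract = extract_all(data)
        print(n, build, extract)  # build stays below n; extract grows roughly like n*log2(n)
```

The build phase does linear work, so essentially all of the Θ(N log N) work lands on the extract_min calls, which is why they cannot all be charged O(1).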
I ran into an interview question recently. No additional info is given in the question (maybe the default implementation should be used...).
An arbitrary sequence of n insert and remove operations on an initially empty min-heap
(the location of the element to delete is known) has an amortized cost of:
A) insert O(1), remove O(log n)
B) insert O(log n), remove O(1)
The option (B) is correct.
I was surprised when I saw the answer sheet. I know this is tricky (maybe because the heap starts empty, maybe because the locations of the elements to delete are known...), but I don't know why (A) is false. Why is (B) true?
When assigning amortized costs to operations on a data structure, you need to ensure that, for any sequence of operations performed, the sum of the amortized costs is always at least as big as the sum of the actual costs of those operations.
So let's take Option (A), which assigns an amortized cost of O(1) to insertions and an amortized cost of O(log n) to deletions. The question we have to ask is the following: is it true that for any sequence of operations on an empty binary heap, the real cost of those operations is upper-bounded by the amortized cost of those operations? And in this case, the answer is no. Imagine a sequence consisting purely of n insertions into the heap. The actual cost of performing these operations can be Θ(n log n) if each element has to bubble all the way up to the top of the heap. However, the amortized cost of those operations, with this accounting scheme, would be O(n), since we did n operations and pretended that each one cost O(1) time. Therefore, this amortized accounting scheme doesn't work, since it lets us underestimate the work that we're doing.
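The bad case for this scheme is easy to reproduce. A minimal Python sketch (the helper names are hypothetical): inserting n, n-1, ..., 1 into an initially empty array-based min-heap makes every new key the smallest so far, so it climbs from the new leaf all the way to the root. The total number of swaps is the sum of floor(log2 i) for i = 1..n, which is Θ(n log n).

```python
def insert(heap, key):
    """Append key as a new leaf and sift it up; return the number of swaps performed."""
    heap.append(key)
    i, swaps = len(heap) - 1, 0
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        swaps += 1
        i = parent
    return swaps

def insert_descending(n):
    """Insert n, n-1, ..., 1; every new key is a new minimum and climbs to the root."""
    heap = []
    return sum(insert(heap, key) for key in range(n, 0, -1))

# The i-th insertion lands at depth floor(log2(i)), so it costs exactly that many swaps.
for n in (8, 1024, 1 << 16):
    expected = sum(i.bit_length() - 1 for i in range(1, n + 1))
    print(n, insert_descending(n), expected)
```

Both columns agree, and the per-insertion average keeps growing with n, so no constant per-insertion charge can cover it.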
On the other hand, let's look at Option (B), where we assign O(log n) as our amortized insertion cost and O(1) as our amortized remove cost. Now, can we find a sequence of n operations where the real cost of those operations exceeds the amortized costs? In this case, the answer is no. Here's one way to see this. We've set the amortized cost of an insertion to be O(log n), which matches its real cost, and so the only way that we could end up underestimating the total is with our amortized cost of a deletion (O(1)), which is lower than the true cost of a deletion. However, that's not a problem here. In order for us to be able to do a delete operation, we have to have previously inserted the element that we're deleting. The combined real cost of the insertion and the deletion is O(log n) + O(log n) = O(log n), and the combined amortized cost of the insertion and the deletion is O(log n) + O(1) = O(log n). So in that sense, pretending that deletions are faster doesn't change our overall cost.
A nice intuitive way to see why the second approach works but the first one doesn't is to think about what amortized analysis is all about. The intuition behind amortization is to charge earlier operations a bit more so that future operations appear to take less time. In the case of the second accounting scheme, that's exactly what we're doing: we're shifting the cost of the deletion of an element from the binary heap back onto the cost of inserting that element into the heap in the first place. In that way, since we're only shifting work backwards, the sum of the amortized costs can't be lower than the sum of the real costs. On the other hand, in the first case, we're shifting work forward in time by making deletions pay for insertions. But that's a problem, because if we do a bunch of insertions and then never do the corresponding deletions, we'll have shifted the work to operations that don't exist.
Because the heap is initially empty, you can't have more deletes than inserts.
An amortized cost of O(1) per deletion and O(log N) per insertion is exactly the same as an amortized cost of O(log N) for both inserts and deletes, because you can just count the deletion cost when you do the corresponding insert.
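This "count the deletion when you do the insert" argument can be checked mechanically. A minimal Python sketch, under assumptions not in the original: actual cost is modeled as 1 plus the number of swaps, keys are random floats, each deletion hits a random known index, and h = floor(log2(n_ops)) stands in for the O(log n) height bound. It charges scheme (B) amortized costs (2(1 + h) per insertion, 1 per deletion) and asserts after every operation that the running amortized total never falls below the running actual total. The class and function names are hypothetical.

```python
import random

class CountingHeap:
    """Array-backed min-heap whose operations return their actual cost (1 + swaps)."""

    def __init__(self):
        self.a = []

    def _up(self, i):
        swaps = 0
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            p = (i - 1) // 2
            self.a[p], self.a[i] = self.a[i], self.a[p]
            i, swaps = p, swaps + 1
        return swaps

    def _down(self, i):
        swaps, n = 0, len(self.a)
        while True:
            left, right, m = 2 * i + 1, 2 * i + 2, i
            if left < n and self.a[left] < self.a[m]:
                m = left
            if right < n and self.a[right] < self.a[m]:
                m = right
            if m == i:
                return swaps
            self.a[i], self.a[m] = self.a[m], self.a[i]
            i, swaps = m, swaps + 1

    def insert(self, key):
        self.a.append(key)
        return 1 + self._up(len(self.a) - 1)

    def remove_at(self, i):
        """Delete the element at a known index i (the question's 'location is known')."""
        last = self.a.pop()
        if i == len(self.a):          # the removed element was the last leaf itself
            return 1
        self.a[i] = last
        swaps = self._up(i)
        if swaps == 0:                # it did not move up, so it may need to move down
            swaps = self._down(i)
        return 1 + swaps

def check_scheme_b(n_ops, rng):
    """Run a random insert/remove sequence; verify scheme (B) never undercharges."""
    heap, actual, amortized = CountingHeap(), 0, 0
    h = max(n_ops, 1).bit_length() - 1    # floor(log2(n_ops)) bounds the heap height
    for _ in range(n_ops):
        if heap.a and rng.random() < 0.5:
            actual += heap.remove_at(rng.randrange(len(heap.a)))
            amortized += 1                # deletions are charged O(1)
        else:
            actual += heap.insert(rng.random())
            amortized += 2 * (1 + h)      # each insertion also prepays its own later deletion
        assert actual <= amortized        # holds for every prefix, not only at the end
    return actual, amortized

if __name__ == "__main__":
    print(check_scheme_b(10_000, random.Random(1)))
```

The assertion can only fail if some deletion had no earlier insertion to pay for it, which an initially empty heap rules out.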
It does not work the other way around. Since you can have more inserts than deletes, there might not be enough deletes to pay the cost of each insert.
I have searched lots of websites, and they all say "the time complexity of clearing a heap is O(n log n)." The reason is:
Swapping the tail node with the root costs O(1).
Sifting "the new root" down to a suitable place costs O(level) = O(log n).
So deleting a node (the root) costs O(log n).
So deleting all n nodes costs O(n log n).
In my opinion, the answer is right but not "tight" because:
The heap (and its height) becomes smaller as elements are deleted.
As a result, the cost of "sifting the new root down to a suitable place" becomes smaller.
The aforementioned argument for "O(n log n)" does not account for this change.
The time complexity of creating a heap has been proved to be O(n) here.
I tend to believe that the time complexity of clearing a heap is O(n) as well, because creating and clearing are very similar: both involve "sifting a node to a suitable position" and a "change of heap size".
However, if clearing a heap took O(n) time, there would be a contradiction:
By creating and clearing a heap, it would be possible to sort an array in O(n) time.
The lower bound on the time complexity of comparison-based sorting is Ω(n log n).
I have thought about the question for a whole day but am still confused.
What on earth does clearing a heap cost? Why?
As you correctly observe, the time taken is O((log n) + (log(n-1)) + ... + (log 2) + (log 1)). That's the same as O(log(n!)), which is the same as O(n log n) (proof in many places, but for example: What is O(log(n!)) and O(n!) and Stirling Approximation).
So you're right that the argument given for the time complexity of removing every element of a heap being O(n log n) is wrong, but the result is still right.
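A standard sandwich argument (an elementary alternative to Stirling's approximation) makes the O(log(n!)) = O(n log n) step explicit. It holds for n ≥ 2 and any fixed logarithm base; the lower bound keeps only the top half of the terms, each of which is at least log(n/2):

```latex
\begin{align*}
\log(n!) &= \sum_{k=1}^{n} \log k \le \sum_{k=1}^{n} \log n = n \log n, \\
\log(n!) &\ge \sum_{k=\lceil n/2 \rceil}^{n} \log k \ge \frac{n}{2} \log \frac{n}{2} = \frac{n}{2} \log n - \frac{n}{2} \log 2 .
\end{align*}
```

Together these give log(n!) = Θ(n log n), so the shrinking heap saves only a constant factor.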
Your equivalence between creating and "clearing" the heap is wrong. When you create the heap, there's a lot of slack because the heap invariant allows many choices at every level and this happens to mean that it's possible to find a valid ordering of the elements in O(n) time. When "clearing" the heap, there's no such slack (and the standard proof about comparison sorts needing at least n log n time proves that it's not possible).
A data structure supports an operation foo such that a sequence of n operations foo takes Θ(n log n) time to perform in the worst case.
a) What is the amortized time of a foo operation?
b) How large can the actual time of a single foo operation be?
a) First I assume foo is O(log n) worst case.
So the amortized cost depends on how often foo takes its worst case. Since we know nothing further, the amortized cost is between O(1) and O(log n).
b) O(log n)
Is this correct? What is the proper way to argue here?
a) If n operations take Θ(n log n), then by definition the amortized time for a foo operation is Θ(log n). The amortized time is averaged over all the operations, so you don't count the worst case only against the operation that caused it, but amortize it against all the others, too.
b) foo could occasionally cost Θ(n), as long as that happens no more than O(log n) times. foo could even occasionally cost Θ(n log n), as long as that happens only a constant (i.e., O(1)) number of times.
When you do amortized analysis, you don't multiply the worst case by the number of operations, but rather by the number of times that worst case actually happens.
For example, take the strategy of pushing elements into a vector one at a time, but growing the memory by doubling the allocated size each time the new element does not fit in the current allocation. Each doubling costs O(n) because you have to copy/move all the current elements. But the total time is actually linear, because you copy 1 element once, 2 elements once, 4 elements once, etc.: overall you've done log(n) doublings, but the sum of their costs is just 1+2+4+8+...+n = 2*n-1 = O(n) (for n a power of two). So the amortized time of this push implementation is O(1), even though the worst case is O(n).
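The doubling arithmetic can be checked with a small simulation. A minimal Python sketch, assuming an initial capacity of 1, a growth factor of exactly 2, and a cost model of one unit per element written or copied (real vector implementations choose their own constants); push_all is a hypothetical helper.

```python
def push_all(n, initial_capacity=1):
    """Push n elements into a doubling array; return (total copies, costliest single push)."""
    capacity, size = initial_capacity, 0
    total_copies, worst = 0, 0
    for _ in range(n):
        cost = 1                  # writing the new element itself
        if size == capacity:      # buffer full: allocate twice the space and move everything
            cost += size
            total_copies += size
            capacity *= 2
        size += 1
        worst = max(worst, cost)
    return total_copies, worst

for n in (8, 9, 100_000):
    copies, worst = push_all(n)
    print(n, copies, worst)       # copies stay below 2n; the worst single push is O(n)
```

For example, push_all(8) returns (7, 5): the copies 1 + 2 + 4 add up to 7, and the costliest push writes one element after moving four.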
Given a binary heap, how can I convert it to a binomial queue in linear time, O(n)? I thought of splitting the heap, but I got stuck because deletion takes O(lg n) time.
Assuming that you have access to the backing array that contains the binary heap and you can iterate over it in O(n) time, then you can create your binomial heap simply by doing n inserts. As the Wikipedia article says:
Inserting a new element to a heap can be done by simply creating a new heap containing only this element and then merging it with the original heap. Due to the merge, insert takes O(log n) time. However, across a series of n consecutive insertions, insert has an amortized time of O(1) (i.e. constant).
In other words, doing n inserts into the binomial heap will require O(n) time.
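A minimal Python sketch of that approach, assuming the binary heap is available as a plain Python list (its backing array) and using a hypothetical, stripped-down binomial heap that supports only insertion. Each insertion adds a rank-0 tree and links equal-rank trees like carries in a binary counter, so n insertions perform exactly n minus (the number of 1 bits in n) links, which is O(n). The order of the backing array is not used; any iteration order works.

```python
import heapq

class BinomialTree:
    """A binomial tree node; children are kept in increasing rank order."""

    def __init__(self, key):
        self.key = key
        self.children = []

    @property
    def rank(self):
        return len(self.children)

def link(a, b):
    """Link two trees of equal rank; the smaller root becomes the parent (min-heap order)."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    return a

def build_binomial_heap(items):
    """Insert items one by one; return (roots indexed by rank, number of links performed)."""
    roots, links = [], 0
    for x in items:
        carry = BinomialTree(x)            # inserting = adding a rank-0 tree
        r = 0
        while r < len(roots) and roots[r] is not None:
            carry = link(roots[r], carry)  # like a carry in binary addition
            roots[r] = None
            links += 1
            r += 1
        if r == len(roots):
            roots.append(carry)
        else:
            roots[r] = carry
    return roots, links

def is_heap_ordered(t):
    return all(c.key >= t.key and is_heap_ordered(c) for c in t.children)

def size(t):
    return 1 + sum(size(c) for c in t.children)

if __name__ == "__main__":
    backing = [9, 4, 7, 1, 8, 2, 6, 3, 5]
    heapq.heapify(backing)                 # stand-in for the given binary heap's array
    roots, links = build_binomial_heap(backing)
    print(links, [t.rank for t in roots if t is not None])  # prints 7 [0, 3]
```

Nine elements (binary 1001) end up as a rank-0 tree and a rank-3 tree, mirroring the bits of n.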
You cannot do this in O(n) time by using the standard binary heap remove operation. As you noted, that would be O(log n) for each removal, resulting in O(n log n) complexity.
If an operation has an amortized time of O(1), can it ever, worst-case, take O(N^2) time?
Yes, it can. Amortized complexity takes into account the frequency with which the worst case appears. Thus, as long as the worst case appears in at most about 1 in N^2 operations, the amortized complexity will be constant.
Let's take a simple example: the dynamically expanding array (I will call it a vector, as it is called in C++) in most languages has amortized constant time for pushing an element to its back. Most of the time, pushing an element is a simple assignment of a value, but once in a while all the allocated slots will be used and we need to re-allocate the vector. This is the worst case of a push_back operation, and when it happens the operation has linear complexity. Still, the way the vector grows makes sure that re-allocation is infrequent enough: each time the vector is re-allocated, it doubles its capacity. Thus, before another re-allocation happens, we will have n simple push_back operations (assuming n was the capacity of the vector before re-allocation). As a result, the worst case of linear complexity appears at most once in a linear number of operations.
Analogously to the case above, imagine a data structure that re-allocates in O(N^2) time, but makes sure that re-allocation is performed at most once in N^2 constant-time operations. This would be an example of an operation with amortized complexity of O(1) and worst-case complexity of O(N^2).