Can an operation that takes O(1) amortized time have worst-case O(n^2) time? - complexity-theory

If an operation has an amortized time of O(1), can it ever, worst-case, take O(N^2) time?

Yes, it can. Amortized complexity takes into account how often the worst case occurs. If the O(N^2) worst case occurs at most about once in every N^2 operations, and all other operations take constant time, the amortized complexity is still constant.
Let's take a simple example: the dynamically expanding array (I will call it a vector, as it is called in C++), which in most languages has amortized constant time for pushing an element to its back. Most of the time, pushing an element is a simple assignment of a value, but once in a while all the allocated slots are in use and the vector has to be re-allocated. This is the worst case of a push_back operation, and when it happens the operation has linear complexity, because every existing element has to be copied. Still, the way the vector grows makes sure that re-allocation is infrequent enough. Assume that each time the vector is re-allocated it doubles its capacity. Then, if n was the capacity of the vector before the re-allocation, n simple push_back operations happen before the next re-allocation. As a result, the linear-cost worst case appears at most once in a linear number of operations.
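As a rough illustration of the argument above, here is a minimal Python sketch of a doubling array that counts element copies. The class name `DynamicArray`, the `copies` and `worst_push` counters, and the growth factor of 2 are assumptions made for this example; real vector implementations may grow by a different constant factor.

```python
class DynamicArray:
    """Toy growable array that counts the element copies caused by re-allocation."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0       # total elements copied during re-allocations
        self.worst_push = 0   # most copies caused by any single push_back

    def push_back(self, value):
        copied = 0
        if self.size == self.capacity:
            # Worst case: allocate twice the capacity and copy every element.
            new_slots = [None] * (2 * self.capacity)
            for i in range(self.size):
                new_slots[i] = self.slots[i]
                copied += 1
            self.slots = new_slots
            self.capacity *= 2
        # Common case: a plain assignment into a free slot.
        self.slots[self.size] = value
        self.size += 1
        self.copies += copied
        self.worst_push = max(self.worst_push, copied)


def copies_for(n):
    """Push n values and return (total copies, copies in the worst single push)."""
    arr = DynamicArray()
    for i in range(n):
        arr.push_back(i)
    return arr.copies, arr.worst_push


if __name__ == "__main__":
    total, worst = copies_for(1000)
    print(total, worst)  # 1023 512
```

For 1000 pushes the most expensive push copies 512 elements, yet the total number of copies stays below 2 * 1000, so the average cost per push is bounded by a constant.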
Analogously, imagine a data structure that re-allocates in O(n^2) time but makes sure that re-allocation is performed at most once in every n^2 constant-time operations. This would be an example of an operation with amortized complexity O(1) and worst-case complexity O(n^2).
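The following sketch simulates such a structure as a pure cost model. Everything in it is hypothetical: the class `QuadraticRebuild`, the fixed parameter `n`, and the choice to charge one unit per ordinary operation and n^2 extra units once every n^2 operations. It is only meant to show that the totals behave as described.

```python
class QuadraticRebuild:
    """Hypothetical structure: cheap operations plus a rare n^2-cost rebuild."""

    def __init__(self, n):
        self.n = n
        self.ops_since_rebuild = 0
        self.total_cost = 0
        self.worst_cost = 0

    def operate(self):
        cost = 1                                 # the usual constant-time work
        self.ops_since_rebuild += 1
        if self.ops_since_rebuild == self.n ** 2:
            cost += self.n ** 2                  # simulated O(n^2) re-allocation
            self.ops_since_rebuild = 0
        self.total_cost += cost
        self.worst_cost = max(self.worst_cost, cost)
        return cost


def simulate(n, num_ops):
    """Run num_ops operations and return (total cost, worst single cost)."""
    structure = QuadraticRebuild(n)
    for _ in range(num_ops):
        structure.operate()
    return structure.total_cost, structure.worst_cost


if __name__ == "__main__":
    print(simulate(10, 1000))  # (2000, 101)
```

With n = 10, a single operation can cost 101 units, but 1000 operations cost 2000 units in total, which is 2 units per operation on average.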

Related

Amortized cost of insert/remove on min-heap

I ran into an interview question recently. No additional information is given in the question (maybe the default implementation should be assumed...).
An arbitrary sequence of n insert and remove operations on an initially empty min-heap
(the location of the element to delete is known) has an amortized cost of:
A) insert O(1), remove O(log n)
B) insert O(log n), remove O(1)
Option (B) is correct.
I was surprised when I saw the answer sheet. I know this is tricky: maybe it has to do with the heap starting empty, or with knowing the location of the element to delete. Why is (A) false? Why is (B) true?
When assigning amortized costs to operations on a data structure, you need to ensure that, for any sequence of operations performed, the sum of the amortized costs is always at least as big as the sum of the actual costs of those operations.
So let's take option (A), which assigns an amortized cost of O(1) to insertions and an amortized cost of O(log n) to deletions. The question we have to ask is the following: is it true that for any sequence of operations on an empty binary heap, the real cost of those operations is upper-bounded by the amortized cost of those operations? In this case, the answer is no. Imagine that you do a sequence purely of n insertions into the heap. The actual cost of performing these operations can be Θ(n log n) if each element has to bubble all the way up to the top of the heap. However, the amortized cost of those operations, with this accounting scheme, would be O(n), since we did n operations and pretended that each one cost O(1) time. Therefore, this amortized accounting scheme doesn't work, since it lets us underestimate the work that we're doing.
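To make the "n insertions can cost Θ(n log n)" claim concrete, here is a small sketch that inserts keys in decreasing order into an array-based min-heap and counts the sift-up swaps. The function names are made up for this example, and counting swaps as the cost measure is an assumption.

```python
def insert_counting_swaps(heap, key):
    """Insert key into a binary min-heap stored in a list; return the swaps done."""
    heap.append(key)
    i = len(heap) - 1
    swaps = 0
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
        swaps += 1
    return swaps


def total_swaps_descending(n):
    """Insert n, n-1, ..., 1 so that every new key bubbles up to the root."""
    heap = []
    return sum(insert_counting_swaps(heap, key) for key in range(n, 0, -1))


if __name__ == "__main__":
    print(total_swaps_descending(7))  # 10
```

Each new key is smaller than everything already in the heap, so the key inserted at index i travels floor(log2(i + 1)) levels. Summed over n insertions that grows like n log n, which an O(1)-per-insertion charge cannot cover.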
On the other hand, let's look at option (B), where we assign O(log n) as our amortized insertion cost and O(1) as our amortized removal cost. Now, can we find a sequence of n operations where the real cost of those operations exceeds the amortized costs? In this case, the answer is no. Here's one way to see this. We've set the amortized cost of an insertion to be O(log n), which matches its real cost, so the only way that we could end up underestimating the total is with our amortized cost of a deletion (O(1)), which is lower than the true cost of a deletion. However, that's not a problem here. In order for us to be able to do a delete operation, we have to have previously inserted the element that we're deleting. The combined real cost of the insertion and the deletion is O(log n) + O(log n) = O(log n), and the combined amortized cost of the insertion and the deletion is O(log n) + O(1) = O(log n). So in that sense, pretending that deletions are faster doesn't change our overall cost.
A nice intuitive way to see why the second approach works but the first one doesn't is to think about what amortized analysis is all about. The intuition behind amortization is to charge earlier operations a bit more so that future operations appear to take less time. In the case of the second accounting scheme, that's exactly what we're doing: we're shifting the cost of the deletion of an element from the binary heap back onto the cost of inserting that element into the heap in the first place. In that way, since we're only shifting work backwards, the sum of the amortized costs can't be lower than the sum of the real costs. On the other hand, in the first case, we're shifting work forward in time by making deletions pay for insertions. But that's a problem, because if we do a bunch of insertions and then never do the corresponding deletions we'll have shifted the work to operations that don't exist.
Because the heap is initially empty, you can't have more deletes than inserts.
An amortized cost of O(1) per deletion and O(log N) per insertion is exactly the same as an amortized cost of O(log N) for both inserts and deletes, because you can just count the deletion cost when you do the corresponding insert.
It does not work the other way around. Since you can have more inserts than deletes, there might not be enough deletes to pay the cost of each insert.
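The "pay for the deletion when you insert" argument can be checked with a small accounting simulation. This is only a sketch under stated assumptions: cost is measured in swaps, removal is simplified to removing the minimum, and every insert deposits `num_ops.bit_length()` credits, which is an upper bound on the height of any heap holding at most `num_ops` elements. All names are invented for this example.

```python
import random


def sift_up(heap, i):
    """Move heap[i] up to its place in a min-heap; return the number of swaps."""
    swaps = 0
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        parent = (i - 1) // 2
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
        swaps += 1
    return swaps


def sift_down(heap, i):
    """Move heap[i] down to its place in a min-heap; return the number of swaps."""
    swaps = 0
    while True:
        child = 2 * i + 1
        if child >= len(heap):
            break
        if child + 1 < len(heap) and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child
        swaps += 1
    return swaps


def lowest_bank_balance(num_ops, seed=0):
    """Run random inserts/removals; return the lowest credit balance ever seen."""
    rng = random.Random(seed)
    deposit = num_ops.bit_length()  # O(log n) credits banked by every insert
    heap, bank, lowest = [], 0, 0
    for _ in range(num_ops):
        if heap and rng.random() < 0.5:
            last = heap.pop()
            swaps = 0
            if heap:
                heap[0] = last
                swaps = sift_down(heap, 0)
            bank -= swaps           # removal is charged O(1); the bank pays the rest
        else:
            heap.append(rng.random())
            sift_up(heap, len(heap) - 1)  # the insert pays for its own swaps
            bank += deposit
        lowest = min(lowest, bank)
    return lowest
```

Because every removal is preceded by its own insert, the balance never drops below zero, so charging O(log n) to inserts and O(1) to removals never underestimates the real work.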

Fundamental question of amortized analysis

A data structure supports an operation foo such that a sequence of n operations foo takes Θ(n log n) time to perform in the worst case.
a) What is the amortized time of a foo operation?
b) How large can the actual time of a single foo operation be?
a) First, I assume foo is O(log n) in the worst case.
So the amortized cost depends on how often foo hits its worst case. Since we know nothing further, the amortized cost is between O(1) and O(log n).
b) O(log n)
Is this correct? What is the proper way to argue here?
a) If n operations take Θ(n log n), then by definition the amortized time of a foo operation is Θ(log n). The amortized time is averaged over all the operations, so you don't count the worst case against just the operation that caused it, but spread it over all the others, too.
b) A single foo could occasionally cost O(n), as long as that happens no more than O(log n) times. foo could even occasionally cost O(n log n), as long as that doesn't happen more than a constant (i.e., O(1)) number of times.
When you do amortized analysis, you don't multiply the worst case by the number of operations, but rather by the number of times that worst case actually happens.
For example, take the strategy of pushing elements into a vector one at a time, but growing the memory by doubling the allocated size each time the new element does not fit in the current allocation. Each doubling costs O(n) because you have to copy/move all the current elements. But the total time for n pushes is still linear, because you copy 1 element once, 2 elements once, 4 elements once, etc.: overall you've done log(n) doublings, but the sum of their costs is just 1+2+4+8+...+n = 2*n-1 = O(n). So the amortized time of this push implementation is O(1), even though the worst case of a single push is O(n).
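The sum quoted above can be written out explicitly. For simplicity this assumes n is a power of two, n = 2^k, which is an assumption added here to keep the arithmetic clean:

```latex
\sum_{i=0}^{k} 2^{i} = 2^{k+1} - 1 = 2n - 1 = O(n),
\qquad
\frac{2n - 1}{n} < 2 \quad\Longrightarrow\quad O(1) \text{ amortized per push.}
```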

amortized analysis on a binary heap

A regular binary heap has an operation extract_min which is O(log(n)) in the worst case. Suppose the amortized cost of extract_min is O(1). Let n be the size of the heap.
Consider a sequence of n extract_min operations performed on a heap that initially contains n elements. Does this mean that the entire sequence would be processed in O(n) time, since each operation is O(1)?
Let's get this out of the way first: removing ALL the elements in a heap via extract_min operations takes Θ(N log N) time in the worst case.
This is a fact, so when you ask "Does constant amortized time extract_min imply linear time for removing all the elements?", what you are really asking is "Can extract_min take constant amortized time even though it takes Θ(N log N) time to extract all the elements?"
The answer to this actually depends on what operations the heap supports.
If the heap supports only the add and extract_min operations, then every extract_min that doesn't fail (in constant time) must correspond to a previous add. We can then say that add takes amortized O(log N) time and extract_min takes amortized O(1) time, because we can assign all of its non-constant costs to a previous add.
If the heap supports an O(N) time make_heap operation (amortized or not), however, then it's possible to perform N extract_min operations that add up to O(N log N) time without doing anything else expensive. The whole O(N log N) cost would then have to be assigned to the N extract_min operations, and we could not claim that extract_min takes amortized constant time.
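The make_heap case can be observed with Python's `heapq` module, where `heapq.heapify` plays the role of make_heap and `heapq.heappop` the role of extract_min. The `Counted` wrapper and the use of comparison counts as a stand-in for running time are assumptions of this sketch.

```python
import heapq
import random


class Counted:
    """Wraps a key and counts every '<' comparison made on wrapped keys."""
    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key


def count_comparisons(n, seed=0):
    """Return (comparisons used by heapify, comparisons used by n heappop calls)."""
    keys = list(range(n))
    random.Random(seed).shuffle(keys)
    heap = [Counted(k) for k in keys]

    Counted.comparisons = 0
    heapq.heapify(heap)              # linear-time build, no add operations
    build = Counted.comparisons

    Counted.comparisons = 0
    extracted = [heapq.heappop(heap).key for _ in range(n)]
    extract = Counted.comparisons

    assert extracted == sorted(keys)  # the pops come out in sorted order
    return build, extract
```

Calling `count_comparisons(1024)` shows the build staying within a small multiple of N while the N pops together need noticeably more. There are no earlier add operations to absorb that cost, so it has to be charged to the pops themselves.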

Worst case time complexity of hashing (insertion)

If I use basic collision handling by having to relocate the input value to the next empty slot, wouldn't I need n*(n+1)/2 hits in total?
Example:
Input: 0,0,0;
Allocated size=3;
Thus it would require 6 hits in total to allocate all three values.
I've read that the worst case complexity is O(n) but shouldn't it be O(n^2) then?
Each insertion is O(1) on average (with a good hash function and resizing strategy), but O(N) in the worst case. So N insertions are O(N^2) in the worst case.
N insertions are O(N) on average.
They are indeed O(N^2) in the worst case.
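The count from the question can be reproduced with a minimal linear-probing sketch in which every key is forced into the same home slot. The function name and the rule that a "hit" is any slot inspected, including the empty one finally used, are assumptions of this example.

```python
def probes_with_full_collision(n):
    """Insert n keys that all hash to slot 0 into a table of size n; count probes."""
    table = [None] * n
    total = 0
    for key in range(n):
        slot = 0     # assumed worst case: every key has the same home slot
        probes = 1   # inspecting the home slot counts as one probe
        while table[slot] is not None:
            slot = (slot + 1) % n   # move on to the next slot
            probes += 1
        table[slot] = key
        total += probes
    return total


if __name__ == "__main__":
    print(probes_with_full_collision(3))  # 6
```

The k-th key needs k probes, so the total is 1 + 2 + ... + n = n*(n+1)/2, which matches the 6 hits in the example and the O(N^2) worst case for N insertions.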

Choosing O(n) over O(1) when for all of n, O(1) is faster than O(n)?

What is an example of a case where I would choose an O(n) algorithm over an O(1) algorithm, even if the O(1) algorithm is faster for all n?
Often, real data lends itself to algorithms with worse time complexities. For example, bubble sort, which runs in O(n^2) time in the worst case, is often faster on almost sorted data. Oftentimes, the constant factors might make an algorithm too slow to be practical. Remember, big-O deals with what is more efficient in the limit, rather than in the immediate case. An algorithm that is O(1) with a constant factor of 10000000 will be significantly slower than an O(n) algorithm with a constant factor of 1 for n < 10000000.
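The bubble sort remark can be illustrated by counting comparisons. This sketch assumes the common variant that stops as soon as a full pass makes no swaps; the function name and the comparison counter are choices made for this example.

```python
def bubble_sort_count(items):
    """Bubble sort with early exit; return (sorted copy, number of comparisons)."""
    a = list(items)
    comparisons = 0
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break   # no swaps in this pass, so the list is already sorted
    return a, comparisons


if __name__ == "__main__":
    print(bubble_sort_count([1, 2, 3, 4, 5])[1])  # 4
    print(bubble_sort_count([5, 4, 3, 2, 1])[1])  # 10
```

On already sorted input one pass of n - 1 comparisons is enough, while reversed input needs all n*(n-1)/2 comparisons.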
One example is when the O(1) algorithm consumes a lot of memory while the O(n) one does not, and memory matters more to you than performance.
