Amortized time of dynamic array - algorithm

As a simple example, in a specific implementation of the dynamic array, we double the size of the array each time it fills up.
Because of this, array reallocation may be required, and in the worst case an insertion may require O(n).
However, a sequence of n insertions can always be done in O(n) total time, because the remaining insertions are done in constant time.
The amortized time per operation is therefore O(n) / n = O(1).
--from Wiki
But another book says: "Each doubling takes O(n) time, but happens so rarely that its amortized time is still O(1)."
I hope somebody can tell me: does this "happens so rarely" argument amount to the same thing as the Wiki explanation? Thanks

Amortized basically means averaged over a sequence of operations.
So, if you have an array of capacity n, only the insertion of the (n+1)th item forces a reallocation.
You've done n inserts that each took O(1), plus one insert that took O(n), so in total you've got n+1 actions that cost you about 2n operations.
2n / (n+1) ≈ 2, which is a constant.
That's why the amortized time is still O(1).
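To make this concrete, here is a small Python sketch (a hypothetical simulation, not from either source) that performs n appends into a doubling array and counts how many element copies the reallocations cause; the total work stays within a small constant multiple of n, so the average cost per append is O(1).

    # Hypothetical sketch: count the element copies caused by doubling reallocations.
    def simulate_appends(n):
        capacity, size, copies = 1, 0, 0
        for _ in range(n):
            if size == capacity:       # array is full: double it
                copies += size         # copying the existing elements costs O(size)
                capacity *= 2
            size += 1                  # the append itself is O(1)
        return copies

    for n in (10, 1000, 100000):
        copies = simulate_appends(n)
        print(n, copies, (n + copies) / n)   # total work per append stays below ~3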

Yes, these two statements say the same thing, Wiki just explains it more thoroughly.

Related

How do I measure the O-notation of a for loop?

I have a question about O-notation (big O).
In my code, I am using a for loop to iterate through an array of users.
The for loop has an if-statement that makes it break out of the loop if the right user is found.
My question is: how do I measure the O-notation?
Is it O(N), since I loop through all the users in the array?
Or is it O(1), since the loop breaks and never runs again?
O notation defines an "order of" relationship between an amount of work (however measured) and the number of items processed (usually 'n'). So "O(n)" means "in direct proportion to the number of items n", while "O(1)" means simply "constant".
If a loop processes every item once, then the amount of work is intuitively in direct proportion to n. Suppose instead that your exit condition gets hit, on average, half way through: we might be tempted to say that this is O(n/2), but we still say that it is O(n), because the relationship to n is still direct/linear. Similarly, if you were to assess the relationship to be O(7n^3 + 2n), you'd say it was simply O(n^3), because n^3 is the term that dominates as n grows large.
The answer to your specific question is therefore O(n) because the number of iterations is in direct proportion to n. All that this says is that if N user records take M milliseconds to process, 2N should take about 2M milliseconds.
It is probably worth noting that O notation is strictly concerned with worst case and not the average cost of algorithms (although I have started to find that it is quite common for people to use it in the latter sense). It is always a good idea to specify to avoid ambiguity.
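To illustrate (a hypothetical sketch of the loop described in the question, with made-up names), the early break does not change the classification, because the worst case still scans every user:

    # Hypothetical linear search over an array of users.
    def find_user(users, target):
        for user in users:        # up to n iterations in the worst case
            if user == target:    # right user found:
                return user       # best case O(1), worst case O(n)
        return None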
Big O notation answers the following two questions:
If there are N data elements, how many steps will the algorithm take?
How will the performance of the algorithm change if the number of data elements increases?
The best-case scenario in your case is that the user you are searching for is found at the first index. The time complexity in this case would be O(1), because the number of steps taken by the algorithm is constant and does not change if the number of elements in the array changes.
The worst-case scenario is that your loop will have to iterate over all the users. That makes the time complexity O(N), because the number of steps taken by the algorithm is directly proportional to the number of elements in the array.
Big O notation generally refers to the worst-case scenario, so you can say that the time complexity in your case is O(N).
The best-case complexity of your for loop is O(1) and the worst-case complexity is O(N). For a linear search, the average case and the worst case are both O(N). It also depends on the approach you follow to solve the problem. For example, a loop like for(int i = n; i > 1; i = i/2) has complexity O(log N), and an if-else condition by itself is O(1).
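For the halving loop mentioned above, a quick sketch (hypothetical, just counting iterations) shows that the iteration count grows like log2(n), which is why it is O(log N):

    import math

    # Count the iterations of the halving loop for(int i = n; i > 1; i = i/2).
    def halving_iterations(n):
        i, count = n, 0
        while i > 1:
            i //= 2
            count += 1
        return count

    for n in (8, 1024, 1000000):
        print(n, halving_iterations(n), math.log2(n))   # the count tracks log2(n)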

Amortized cost of insert/remove on min-heap

I ran into an interview question recently. No additional info is given in the question (maybe the default implementation should be assumed...).
An arbitrary sequence of n insert and remove operations on an initially empty min-heap (the location of the element to delete is known) has an amortized cost of:
A) insert O(1), remove O(log n)
B) insert O(log n), remove O(1)
Option (B) is correct.
I was surprised when I saw the answer sheet. I know there is a trick here (maybe the empty heap, maybe knowing the location of the elements to delete, ...), but I don't understand why (A) is false and (B) is true.
When assigning amortized costs to operations on a data structure, you need to ensure that, for any sequence of operations performed, that the sum of the amortized costs is always at least as big as the sum of the actual costs of those operations.
So let's take option (A), which assigns an amortized cost of O(1) to insertions and an amortized cost of O(log n) to deletions. The question we have to ask is the following: is it true that for any sequence of operations on an empty binary heap, the real cost of those operations is upper-bounded by the amortized cost of those operations? And in this case, the answer is no. Imagine that you do a sequence purely of n insertions into the heap. The actual cost of performing these operations can be Θ(n log n) if each element has to bubble all the way up to the top of the heap. However, the amortized cost of those operations, with this accounting scheme, would be O(n), since we did n operations and pretended that each one cost O(1) time. Therefore, this amortized accounting scheme doesn't work, since it will let us underestimate the work that we're doing.
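To see that the Θ(n log n) real cost mentioned above is actually attainable, here is a hypothetical sketch (a plain array-backed min-heap, not any particular library) in which inserting keys in decreasing order forces every new key to bubble all the way up to the root:

    # Hypothetical min-heap insert that counts sift-up swaps.
    def heap_insert(heap, key, swaps):
        heap.append(key)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] > heap[i]:   # bubble up while the parent is larger
            heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
            i = (i - 1) // 2
            swaps[0] += 1

    n = 1 << 12
    heap, swaps = [], [0]
    for key in range(n, 0, -1):   # decreasing keys: each insert bubbles to the root
        heap_insert(heap, key, swaps)
    print(swaps[0])               # roughly the sum of floor(log2(i)), i.e. Theta(n log n)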
On the other hand, let's look at option (B), where we assign O(log n) as our amortized insertion cost and O(1) as our amortized remove cost. Now, can we find a sequence of n operations where the real cost of those operations exceeds the amortized costs? In this case, the answer is no. Here's one way to see this. We've set the amortized cost of an insertion to be O(log n), which matches its real cost, and so the only way that we could end up underestimating the total is with our amortized cost of a deletion (O(1)), which is lower than the true cost of a deletion. However, that's not a problem here. In order for us to be able to do a delete operation, we have to have previously inserted the element that we're deleting. The combined real cost of the insertion and the deletion is O(log n) + O(log n) = O(log n), and the combined amortized cost of the insertion and the deletion is O(log n) + O(1) = O(log n). So in that sense, pretending that deletions are faster doesn't change our overall cost.
A nice intuitive way to see why the second approach works but the first one doesn't is to think about what amortized analysis is all about. The intuition behind amortization is to charge earlier operations a bit more so that future operations appear to take less time. In the case of the second accounting scheme, that's exactly what we're doing: we're shifting the cost of the deletion of an element from the binary heap back onto the cost of inserting that element into the heap in the first place. In that way, since we're only shifting work backwards, the sum of the amortized costs can't be lower than the sum of the real costs. On the other hand, in the first case, we're shifting work forward in time by making deletions pay for insertions. But that's a problem, because if we do a bunch of insertions and then never do the corresponding deletions we'll have shifted the work to operations that don't exist.
Because the heap is initially empty, you can't have more deletes than inserts.
An amortized cost of O(1) per deletion and O(log N) per insertion is exactly the same as an amortized cost of O(log N) for both inserts and deletes, because you can just count the deletion cost when you do the corresponding insert.
It does not work the other way around. Since you can have more inserts than deletes, there might not be enough deletes to pay the cost of each insert.
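A small accounting-style sketch (hypothetical, in the spirit of the banker's method) makes the "count the deletion cost when you do the corresponding insert" idea concrete: each insert deposits enough credit for its own sift-up plus one future delete, and because deletes never outnumber inserts, the balance never goes negative.

    import math, random

    # Hypothetical banker's-method bookkeeping (sizes only, no real heap needed).
    def log_cost(size):                      # height of a heap with 'size' elements
        return math.ceil(math.log2(size)) if size > 1 else 0

    random.seed(0)
    size, credits = 0, 0
    for _ in range(100000):
        if size == 0 or random.random() < 0.6:
            size += 1
            credits += 2 * log_cost(size)    # amortized charge for the insert
            credits -= log_cost(size)        # real cost of sifting the new key up
        else:
            credits -= log_cost(size)        # real cost of the delete (sift-down)
            size -= 1
        assert credits >= 0                  # the prepaid credit always covers the deletes
    print("credit left over:", credits)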

Amortized bound of sorted linked list

I am trying to prove that the amortized complexity of the insert operation in a sorted linked list is O(1).
I know that the worst-case time is O(n), but I am finding it hard to come up with an appropriate potential function.
I'll be glad if someone could help.
Thanks.
O(1) amortized would mean that a sequence of n insertions costs O(n) time in the worst case. That is not true here, because inserting the elements in an adversarial order (so that each new element belongs at the far end of the list) takes Θ(n^2) time, so no potential function can give an O(1) amortized bound.
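A quick sketch (hypothetical, using a Python list to stand in for the linked list, and assuming the list is kept in ascending order and scanned from the head) that counts comparisons shows the quadratic blow-up:

    # Count how many existing elements are examined before each insertion point.
    def sorted_insert(lst, x, stats):
        i = 0
        while i < len(lst) and lst[i] < x:   # walk from the head to the insertion point
            i += 1
            stats[0] += 1
        lst.insert(i, x)

    n = 2000
    lst, comparisons = [], [0]
    for x in range(n):                       # ascending keys: every insert walks the whole list
        sorted_insert(lst, x, comparisons)
    print(comparisons[0], n * (n - 1) // 2)  # n*(n-1)/2 comparisons in total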

What does Wikipedia mean when it says the complexity of inserting an item at the end of a dynamic array is O(1) amortized?

http://en.wikipedia.org/wiki/Dynamic_array#Performance
What exactly does it mean?
I thought inserting at the end would be O(n), as you'd have to allocate, say, twice the space of the original array, and then move all the items to that location and finally insert the item. How is this O(1)?
Amortized O(1) efficiency means that the sum of the runtimes of n insertions will be O(n), even if any individual operation may take a lot longer.
You are absolutely correct that appending an element can take O(n) time because of the work required to copy everything over. However, because the array is doubled each time it is expanded, expensive doubling steps happen exponentially less and less frequently. As a result, the total work done in n inserts comes out to be O(n) rather than O(n2).
To elaborate: suppose you want to insert a total of n elements. The total amount of work done copying elements when resizing the vector will be at most
1 + 2 + 4 + 8 + ... + n ≤ 2n - 1
This is because first you copy one element, then twice that, then twice that, etc., and in the absolute worst case copy over all n elements. The sum of this geometric series works out to 2n - 1, so at most O(n) elements get moved across all copy steps. Since you do n inserts and only O(n) total work copying across all of them, the amortized efficiency is O(1) per operation. This doesn't say each operation takes O(1) time, but that n operations takes O(n) time total.
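As a quick numeric check of that geometric series (a hypothetical snippet, not part of the quoted answer), summing the copy sizes 1 + 2 + 4 + ... + n for power-of-two values of n confirms the 2n - 1 bound:

    # Sum the copy costs 1 + 2 + 4 + ... + n and compare against 2n - 1.
    for exp in (4, 10, 20):
        n = 1 << exp
        total = sum(1 << k for k in range(exp + 1))   # 1 + 2 + 4 + ... + n
        print(n, total, 2 * n - 1)                    # total equals 2n - 1 exactly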
For a graphical intuition behind this, as well as a rationale for doubling the array versus just increasing it by a small amount, you might want to check out these lecture slides. The pictures toward the end might be very relevant.
Hope this helps!
Each reallocation in isolation is O(N), yes. But then on the next N insertions, you don't need to do anything. So the "average" cost per insertion is O(1). We say that "the cost is amortized across multiple operations".

Amortized Analysis of Algorithms

I am currently reading about amortized analysis. I am not able to fully understand how it is different from the normal analysis we perform to calculate the average or worst-case behaviour of algorithms. Can someone explain it with an example of sorting or something?
Amortized analysis gives the average performance (over time) of each operation in the worst case.
In a sequence of operations the worst case does not occur often in each operation - some operations may be cheap, some may be expensive. Therefore, a traditional worst-case per-operation analysis can give an overly pessimistic bound. For example, in a dynamic array only some inserts take linear time, while the others take constant time.
When different inserts take different times, how can we accurately calculate the total time? The amortized approach is going to assign an "artificial cost" to each operation in the sequence, called the amortized cost of an operation. It requires that the total real cost of the sequence should be bounded by the total of the amortized costs of all the operations.
Note, there is sometimes flexibility in the assignment of amortized costs.
Three methods are used in amortized analysis
Aggregate Method (or brute force)
Accounting Method (or the banker's method)
Potential Method (or the physicist's method)
For instance assume we’re sorting an array in which all the keys are distinct (since this is the slowest case, and takes the same amount of time as when they are not, if we don’t do anything special with keys that equal the pivot).
Quicksort chooses a random pivot. The pivot is equally likely to be the smallest key, the second smallest, the third smallest, ..., or the largest. For each key, the probability is 1/n. Let T(n) be a random variable equal to the running time of quicksort on n distinct keys. Suppose quicksort picks the ith smallest key as the pivot. Then we run quicksort recursively on a list of length i − 1 and on a list of length n − i. It takes O(n) time to partition and concatenate the lists (let's say at most n dollars), so the running time is
T(n) ≤ T(i − 1) + T(n − i) + n
Here i is a random variable that can be any number from 1 (pivot is the smallest key) to n (pivot is the largest key), each chosen with probability 1/n, so
E[T(n)] ≤ (1/n) · Σ_{i=1}^{n} [T(i − 1) + T(n − i)] + n
This equation is called a recurrence. The base cases for the recurrence are T(0) = 1 and T(1) = 1. This means that sorting a list of length zero or one takes at most one dollar (unit of time).
So when you solve this recurrence, you get a bound of the form
E[T(j)] ≤ 1 + 8j log_2 j
The expression 1 + 8j log_2 j might be an overestimate, but it doesn't matter. The important point is that this proves that E[T(n)] is in O(n log n). In other words, the expected running time of quicksort is in O(n log n).
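As a sanity check on that bound (a hypothetical snippet, not part of the original answer), iterating the recurrence numerically with T(0) = T(1) = 1 stays comfortably below 1 + 8n log_2 n:

    import math

    # E[T(n)] = n + (1/n) * sum_i (T(i-1) + T(n-i)) = n + (2/n) * (T(0) + ... + T(n-1))
    T = [1.0, 1.0]        # base cases T(0) = T(1) = 1
    prefix = 2.0          # running sum T(0) + ... + T(n-1)
    for n in range(2, 2001):
        T.append(n + 2.0 * prefix / n)
        prefix += T[-1]

    for n in (10, 100, 1000, 2000):
        print(n, round(T[n], 1), round(1 + 8 * n * math.log2(n), 1))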
Also there’s a subtle but important difference between amortized running time
and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small
possibility that quicksort will cost (n^2) dollars, but the probability that this
will happen approaches zero as n grows large.
Quicksort: O(n log n) expected time
Quickselect: Θ(n) expected time
For comparison, the lower bound for comparison-based sorting is Ω(n log n).
Finally, you can find more information about quicksort's average-case analysis here.
average - a probabilistic analysis; the average is taken over all of the possible inputs, and it is an estimate of the likely run time of the algorithm.
amortized - a non-probabilistic analysis, calculated in relation to a batch of calls to the algorithm.
example - dynamic sized stack:
say we define a stack of some size, and whenever we use up the space, we allocate twice the old size, and copy the elements into the new location.
overall our costs are:
O(1) per insertion / deletion
O(n) per insertion (allocation and copying) when the stack is full
so now we ask, how much time would n insertions take?
one might say O(n^2); however, we don't pay O(n) for every insertion.
that would be too pessimistic; the correct answer is O(n) time for n insertions. let's see why:
let's say we start with array size = 1.
ignoring copying, we would pay O(1) per insertion, i.e. O(n) for the n insertions themselves.
now, we do a full copy only when the stack has these numbers of elements:
1,2,4,8,...,n/2,n
for each of these sizes we do an alloc and a copy, so summing the cost we get:
const*(1+2+4+8+...+n/4+n/2+n) = const*(n+n/2+n/4+...+8+4+2+1) <= const*n(1+1/2+1/4+1/8+...)
where (1+1/2+1/4+1/8+...) = 2
so we pay O(n) for all of the copying + O(n) for the actual n insertions
O(n) worst case for n operations -> O(1) amortized per operation.
