Analyzing time complexity of an algorithm

I have two arrays:
A of length n
B of length m
I want to find all elements that are common to both arrays.
Algorithm:
Build a hash map containing all elements of A.
For each element x in B, check whether x is present in the hash map.
Analyzing the overall time complexity:
building the hash map is O(n)
the second step is O(m)
So the overall complexity is O(m+n). Am I correct?
And what does O(m+n) reduce to when m is much larger than n, or vice versa?

O(m) + O(n) = O(m+n). If you know that m > n, then O(m+n) = O(m+m) = O(m).
Note: hash tables don't theoretically guarantee O(1) lookup, but in practice you can count on it (it's the average complexity, i.e. the expected runtime for a random input).
Also note that your algorithm will repeatedly report duplicate elements of B that are also present in A. If that is a problem, you have to record in the hash map that you have already checked/printed that element.
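A minimal Python sketch of the approach described in the question, including the duplicate handling mentioned above (the function name and variables are illustrative, not from the original post):

def common_elements(A, B):
    seen = set(A)          # O(n): build a hash set of all elements of A
    reported = set()       # avoids reporting a duplicate in B more than once
    result = []
    for x in B:            # O(m): one expected-O(1) lookup per element of B
        if x in seen and x not in reported:
            result.append(x)
            reported.add(x)
    return result          # overall expected time: O(n + m)

print(common_elements([1, 2, 3, 4], [3, 3, 4, 5]))   # [3, 4]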

The average-case time complexity is O(m + n). This is what you should consider if you are actually implementing it, since hash maps usually have few collisions. Also note that O(m+n) = O(max(m, n)).
However, if this is a test question, "time complexity" usually means worst-case time complexity. The worst-case time complexity is O(mn), since each lookup in the second step can take O(n) time in the worst case (for example, when all elements hash to the same bucket).

Related

Big O notation of a preprocessed static data structure

From what I understand, O(n) grows linearly with the size of the input data set.
I'm getting confused because I have a query structure that maps keys to lists of preprocessed values, and those lists never change after the structure is initialised.
I define n as the input, an array of keys:
def query(arrOfKeys):  # hypothetical name for the querying structure's lookup
    for key in arrOfKeys:  # O(n): iterating through the input
        preprocessedList = getPreprocessedListDifferentForEachKey(key)  # O(1); this list could have any number of elements
        for anotherPreprocessedList in preprocessedList:  # * <- O(n) or O(1)?
            for element in anotherPreprocessedList:  # * <- O(n) or O(1)?
                ...
I'm unsure whether this is O(1), because the lists are preprocessed, or O(n), because their size depends on the input.
Does this end up being O(n^3) in the worst case, or is it possible to argue O(n)?
It depends. If preprocessedList (and each of its sub-lists) always has a constant length, your two inner loops each have time complexity O(1). If, however, their lengths depend on the input argument arrOfKeys, the inner loops are each O(n), and together they are O(n) * O(n) = O(n^2).
You then multiply by the time complexity of the outer loop, which is O(n).
So if the inner loops are each O(n), the total is O(n^3).
If the length of preprocessedList is variable but does not depend on the length of arrOfKeys, you can call it m and say each inner loop is O(m). The total time complexity is then O(n*m^2), as illustrated in the sketch below.
It's usually fine to introduce another symbol to describe the time complexity, as long as you explain what it stands for and how it relates to the input.
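For concreteness, here is a hypothetical sketch (names and data are mine, not from the question) that counts the work done by the nested loops in the two cases discussed above:

def count_operations(arrOfKeys, table):  # hypothetical stand-in for the query structure
    ops = 0
    for key in arrOfKeys:              # n iterations
        for sublist in table[key]:     # O(1) if constant length, O(m) if it grows
            for _ in sublist:          # O(1) if constant length, O(m) if it grows
                ops += 1
    return ops

# Constant-length lists: 2 sub-lists of 2 elements each -> 4*n operations, i.e. O(n).
table = {k: [[0, 0], [0, 0]] for k in range(10)}
print(count_operations(list(range(10)), table))   # 40

# Lists of length m (here m = 5): n*m^2 operations, i.e. O(n*m^2).
m = 5
table = {k: [[0] * m for _ in range(m)] for k in range(10)}
print(count_operations(list(range(10)), table))   # 250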

Should we ignore constant k in O(nk)?

I was reading CLRS when I encountered this:
Why do we not ignore the constant k in the big-O expressions in parts (a), (b) and (c)?
In this case, you aren't considering the running time of a single algorithm, but of a family of algorithms parameterized by k. Keeping k around lets you compare the extremes: with k = n you sort n/n = 1 list of n elements, and with k = 2 you sort n/2 lists of 2 elements each. Somewhere in the middle there is a value of k, which you are asked to compute in part (c), for which Θ(nk + n lg(n/k)) and Θ(n lg n) coincide.
Going into more detail, insertion sort is O(n^2) because (roughly speaking) in the worst case any single insertion can take O(n) time. However, if the sublists have a fixed length k, then each insertion takes at most O(k) time, independent of how many lists you are sorting. (That is, the bottleneck is no longer the insertion step but the merge phase.)
k is not a constant when you compare different algorithms with different values of k.
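To make the "family of algorithms parameterized by k" concrete, here is an illustrative Python sketch (my own code, not from CLRS) of merge sort that hands sublists of length at most k to insertion sort:

def insertion_sort(a):
    for i in range(1, len(a)):               # each insertion costs at most O(len(a))
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_insertion_sort(a, k):
    if len(a) <= k:                          # n/k sublists, O(k^2) each -> O(nk) total
        return insertion_sort(list(a))
    mid = len(a) // 2                        # lg(n/k) levels of merging, O(n) each
    return merge(merge_insertion_sort(a[:mid], k),
                 merge_insertion_sort(a[mid:], k))

print(merge_insertion_sort([5, 2, 9, 1, 7, 3, 8, 6, 4], k=3))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]

With k = 1 this is ordinary merge sort, and with k = n it degenerates into pure insertion sort; part (c) asks for the largest k for which the Θ(nk + n lg(n/k)) bound still matches Θ(n lg n), which works out to k = Θ(lg n).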

Complexity of two Methods

I have a method that inserts an element into a heap as follows:
(1) If the backing array is full, create a new array of size original.length * 2 and copy each element from the original array into the new one.
(2) To restore the heap property, percolate the new element up/down to its proper position.
The worst-case complexities are O(n) for (1) and O(log n) for (2).
My question is: what is the sum of the two complexities? How do I calculate the worst-case complexity of this algorithm?
Thanks!
For situations like this, if you follow the textbook approach, the worst-case complexity of the algorithm is
O(n) + O(log n)
= O(n)
So the worst-case complexity is O(n).
Actually, the name "worst-case complexity" gives you the answer. Ask yourself:
Is there any case where the cost is O(n)?
If yes, then that is the worst-case complexity.
If you are inserting N elements one by one, then the siftUp/siftDown procedure executes on every insertion, and the total time spent in it is O(N log N) (the sum log 1 + log 2 + ... + log(N-1) + log N).
Array expansion, however, happens seldom. The last expansion takes N steps, the previous one N/2 steps, and so on. The total time spent expanding is
N + N/2 + N/4 + ... + 1 = N*(1 + 1/2 + 1/4 + ...) <= 2N = O(N)
So the amortized time for the expansion part is O(1), and the amortized time for the sifting part is O(log N).
The overall complexity for N insertions is
O(N) + O(N log N) = O(N log N)
or
O(log N) amortized per element.
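For concreteness, a minimal sketch of such an insert (my own illustrative code, not the questioner's): a min-heap in an array that doubles when full, so step (1) is O(n) in the worst case but O(1) amortized, and step (2) is O(log n):

class MinHeap:
    def __init__(self):
        self.data = [None]      # backing array, starts with capacity 1
        self.size = 0

    def insert(self, x):
        if self.size == len(self.data):            # (1) array full: O(n) copy, amortized O(1)
            new = [None] * (len(self.data) * 2)
            new[:self.size] = self.data[:self.size]
            self.data = new
        self.data[self.size] = x
        self.size += 1
        self._sift_up(self.size - 1)               # (2) restore heap property: O(log n)

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.data[parent] <= self.data[i]:
                break
            self.data[parent], self.data[i] = self.data[i], self.data[parent]
            i = parent

h = MinHeap()
for v in [5, 3, 8, 1, 9, 2]:
    h.insert(v)
print(h.data[0])   # 1 -- the minimum sits at the root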

Is n operations of O(1) average time each considered O(n) in average?

I'm studying for a data structures exam and I'm trying to solve this question:
Given an array of n numbers and a number Z, find x, y such that x + y = Z, in O(n) average time.
My suggestion is to move the array's contents into a hash table (using open addressing) and then do the following:
For each number A[i], search for Z - A[i] in the hash table (O(1) on average per lookup). In the worst case you'll perform n searches, each O(1) average time, which is O(n) on average.
Is my analysis correct?
Given that you traverse the whole array a second time, yes: that is O(n) * O(1) (n iterations, each doing an average-case O(1) hash lookup), not O(n) + O(1) as I previously stated, so you are looking at an algorithm of O(n) average complexity.
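A minimal sketch of the same idea in Python (illustrative names); it builds the hash set while scanning, which also avoids accidentally matching an element with itself:

def find_pair_with_sum(A, Z):
    seen = set()                   # elements seen so far; expected O(1) insert/lookup
    for x in A:                    # n iterations
        if Z - x in seen:          # average O(1) hash lookup
            return Z - x, x
        seen.add(x)
    return None

print(find_pair_with_sum([2, 7, 11, 15], 9))   # (2, 7)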

Amortized Analysis of Algorithms

I am currently reading about amortized analysis. I can't fully understand how it differs from the normal analysis we perform to calculate the average or worst-case behaviour of algorithms. Can someone explain it with an example of sorting or something similar?
Amortized analysis gives the average performance (over time) of each operation in the worst case.
In a sequence of operations, the worst case does not occur on every operation: some operations may be cheap and some may be expensive. Therefore, a traditional worst-case per-operation analysis can give an overly pessimistic bound. For example, in a dynamic array only some inserts take linear time, while the others take constant time.
When different inserts take different times, how can we accurately calculate the total time? The amortized approach assigns an "artificial cost" to each operation in the sequence, called the amortized cost of the operation. It requires that the total real cost of the sequence be bounded by the total of the amortized costs of all the operations.
Note that there is sometimes flexibility in the assignment of amortized costs.
Three methods are used in amortized analysis:
Aggregate Method (or brute force)
Accounting Method (or the banker's method)
Potential Method (or the physicist's method)
For instance, assume we're sorting an array in which all the keys are distinct (this is the slowest case, and it takes the same amount of time as when they are not, provided we don't do anything special with keys that equal the pivot).
Quicksort chooses a random pivot. The pivot is equally likely to be the smallest key, the second smallest, the third smallest, ..., or the largest; for each key the probability is 1/n. Let T(n) be a random variable equal to the running time of quicksort on n distinct keys. Suppose quicksort picks the ith smallest key as the pivot. Then we run quicksort recursively on a list of length i - 1 and on a list of length n - i. It takes O(n) time to partition and concatenate the lists (let's say at most n dollars), so the running time is
T(n) <= T(i - 1) + T(n - i) + n
Here i is a random variable that can be any number from 1 (the pivot is the smallest key) to n (the pivot is the largest key), each chosen with probability 1/n, so
E[T(n)] <= sum_{i=1..n} (1/n) * (E[T(i - 1)] + E[T(n - i)]) + n
This equation is called a recurrence. The base cases for the recurrence are T(0) = 1 and T(1) = 1, meaning that sorting a list of length zero or one takes at most one dollar (unit of time).
Solving this recurrence (by induction on j) gives a bound of the form E[T(j)] <= 1 + 8j log_2 j. The expression 1 + 8j log_2 j might be an overestimate, but it doesn't matter. The important point is that this proves that E[T(n)] is in O(n log n). In other words, the expected running time of quicksort is in O(n log n).
Also, there's a subtle but important difference between amortized running time and expected running time. Quicksort with random pivots takes O(n log n) expected running time, but its worst-case running time is in Θ(n^2). This means that there is a small possibility that quicksort will cost Θ(n^2) dollars, but the probability of that happening approaches zero as n grows large.
Quicksort O(n log n) expected time
Quickselect Θ(n) expected time
For comparison, the comparison-based sorting lower bound is Ω(n log n).
Finally, you can find more information about quicksort's average-case analysis here.
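A short sketch of the randomized quicksort being analyzed above, written in the same "partition and concatenate lists" style (my own illustrative code, assuming distinct keys):

import random

def quicksort(keys):
    if len(keys) <= 1:                       # base case: lists of length 0 or 1
        return list(keys)
    pivot = random.choice(keys)              # pivot equally likely to be any key
    smaller = [k for k in keys if k < pivot]     # the i - 1 smaller keys
    larger = [k for k in keys if k > pivot]      # the n - i larger keys
    return quicksort(smaller) + [pivot] + quicksort(larger)   # O(n) partition + concatenate

print(quicksort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]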
Average - a probabilistic analysis: the average is taken over all possible inputs, and it estimates the likely running time of the algorithm.
Amortized - a non-probabilistic analysis: the cost is calculated over a batch of calls to the algorithm.
Example - a dynamically sized stack:
Say we define a stack of some size, and whenever we use up the space, we allocate twice the old size and copy the elements into the new location.
Overall our costs are:
O(1) per insertion/deletion
O(n) per insertion (allocation and copying) when the stack is full
So now we ask: how much time do n insertions take?
One might say O(n^2); however, we don't pay O(n) for every insertion.
That would be too pessimistic. The correct answer is O(n) time for n insertions. Let's see why:
Let's say we start with array size = 1.
Ignoring copying, we pay O(1) per insertion, so O(n) for n insertions.
Now, we do a full copy only when the stack has this many elements:
1, 2, 4, 8, ..., n/2, n
For each of these sizes we do an allocation and a copy, so summing the cost we get:
const*(1 + 2 + 4 + 8 + ... + n/4 + n/2 + n) = const*(n + n/2 + n/4 + ... + 8 + 4 + 2 + 1) <= const*n*(1 + 1/2 + 1/4 + 1/8 + ...)
where (1 + 1/2 + 1/4 + 1/8 + ...) = 2
So we pay O(n) for all of the copying plus O(n) for the actual n insertions:
O(n) worst case for n operations -> O(1) amortized per operation.
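A tiny sketch of this doubling stack (my own illustrative code), counting element copies to confirm that n pushes do only O(n) copying in total:

class DynamicStack:
    def __init__(self):
        self.data = [None]      # start with capacity 1
        self.size = 0
        self.copies = 0         # total elements copied during all expansions

    def push(self, x):
        if self.size == len(self.data):          # full: allocate twice the size
            new = [None] * (2 * len(self.data))
            for i in range(self.size):           # copying costs `size` element moves
                new[i] = self.data[i]
                self.copies += 1
            self.data = new
        self.data[self.size] = x                 # the O(1) part of every push
        self.size += 1

s = DynamicStack()
for i in range(1000):
    s.push(i)
print(s.copies)   # 1023 = 1 + 2 + 4 + ... + 512, i.e. fewer than 2n copies for n = 1000 pushes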
