Analysis of dictionary - algorithm

My first question: in the analysis it is mentioned that
n + (n/2) + (n/4) + ... is at most 2n. How do we get the result "at most 2n"?
We have a collection of arrays, where array i has size 2^i. Each array is
either empty or full, and each is sorted. However, there will be no
relationship between the items in different arrays. Which arrays are full
and which are empty is determined by the binary representation of the
number of items we are storing.
To perform a lookup, we just do a binary search in each occupied array. In
the worst case this takes time O(log(n) + log(n/2) + log(n/4) + ... + 1)
= O((log n)^2).
Following is a question on the above text snippet:
How does the author arrive at O(log(n) + log(n/2) + log(n/4) + ... + 1),
and why is that sum O((log n)^2)?
Thanks!

n + n/2 + n/4 + n/8 ... = n * (1/1 + 1/2 + 1/4 + 1/8 + ...)
The sum 1/1 + 1/2 + 1/4 + 1/8 + ... is a geometric series that converges to 2, so the result is 2n.
Apparently, the author is talking about a collection of arrays with the sizes n, n/2, n/4, ..., and he is doing a binary search in each of them. A binary search in an array with n elements takes O(log n) time, so the total time required is O(log n + log n/2 + log n/4 + ...). Since there are at most log n occupied arrays and each term is at most log n, this sum is O((log n)^2).
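For concreteness, here is a minimal sketch of such a structure (my own illustration, not the author's code; the level layout and the lookup helper are mine): level i, when occupied, holds 2^i sorted items, and a lookup binary-searches every occupied level, which is exactly where the log(n) + log(n/2) + ... cost comes from.

    from bisect import bisect_left

    def lookup(levels, key):
        # levels[i] is either None (empty) or a sorted list of exactly 2**i items.
        for arr in levels:
            if not arr:
                continue
            pos = bisect_left(arr, key)              # binary search: O(log len(arr))
            if pos < len(arr) and arr[pos] == key:
                return True
        return False

    # 6 items stored -> binary 110 -> the size-2 and size-4 levels are occupied.
    levels = [None, [10, 40], [5, 20, 30, 50]]
    print(lookup(levels, 30), lookup(levels, 7))     # True False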

Related

How do you find the complexity of an algorithm given the number of computations performed each iteration?

Say there is an algorithm with input of size n. On the first iteration, it performs n computations, then is left with a problem instance of size floor(n/2) - for the worst case. Now it performs floor(n/2) computations. So, for example, an input of n=25 would see it perform 25+12+6+3+1 computations until an answer is reached, which is 47 total computations. How do you put this into Big O form to find worst case complexity?
You just need to write the corresponding recurrence in a formal manner:
T(n) = T(n/2) + n = n + n/2 + n/4 + ... + 1 =
n(1 + 1/2 + 1/4 + ... + 1/n) < 2 n
=> T(n) = O(n)
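A quick sanity check (my own snippet) of the n = 25 example from the question against the 2n bound:

    def total_work(n):
        # Sum the per-iteration costs n, floor(n/2), floor(n/4), ... down to 1.
        total = 0
        while n >= 1:
            total += n
            n //= 2
        return total

    print(total_work(25))            # 47, i.e. 25 + 12 + 6 + 3 + 1
    print(total_work(25) < 2 * 25)   # True: the sum stays below 2n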

What is the complexity of sum of log functions

I have an algorithm that have the following complexity in big O:
log(N) + log(N+1) + log(N+2) + ... + log(N+M)
Is it the same as log(N+M) since it is the largest element?
OR is it M*log(N+M) because it is a sum of M elements?
The important rules to know in order to solve it are:
Log a + Log b = Log ab, and
Log a - Log b = Log a/b
Add and subtract Log 2, Log 3, ..., Log (N-1) to the given value.
This will give you Log 2 + Log 3 + ... + Log (N+M) - (Log 2 + Log 3 + ... + Log (N-1))
The first part will compute to Log ((N+M)!) and the part after the subtraction sign will compute to Log ((N-1)!)
Hence, this complexity comes to Log ( (N+M)! / (N-1)! ).
UPDATE after OP asked another good question in the comment:
If we have N + N^2 + N^3, it will reduce to just N^3 (the largest element), right? Why we can't apply the same logic here - log(N+M) - largest element?
If we have just two terms that look like Log(N) + Log(M+N), then we can combine them both and say that they will definitely be less than 2 * Log(M+N) and will therefore be O(Log (M+N)).
However, if there is a relation between the number of terms being summed and the size of the largest term, then such a relation makes the calculation less straightforward.
For example, the sum of two Log N terms is O(Log N), while the sum of N terms of Log N each is not O(Log N) but O(N * Log N).
In the given summation, both the values and the number of values depend on M and N, therefore we cannot state this complexity as Log(M+N); however, we can definitely write it as M * (Log (M+N)).
How? Each of the values in the given summation is less than or equal to Log(M + N), and there are M such values in total. Hence the summation of these values is less than M * (Log (M+N)) and is therefore O(M * (Log (M+N))).
Thus, both answers are correct but O(Log ( (N+M)! / (N-1)! )) is a tighter bound.
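A quick numeric check of both bounds (my own snippet, not part of the answer):

    import math

    N, M = 100, 50
    s = sum(math.log(N + k) for k in range(M + 1))   # log(N) + log(N+1) + ... + log(N+M)
    tight = math.lgamma(N + M + 1) - math.lgamma(N)  # log((N+M)! / (N-1)!)
    loose = (M + 1) * math.log(N + M)                # one log(N+M) per term (M+1 terms)

    print(abs(s - tight) < 1e-6, s <= loose)         # True True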
If M does not depend on N and does not vary then the complexity is O(log(N))
For k such that 0 <= k <= M, with N >= M and N >= 2,
log(N+k)=log(N(1+k/N)) = log(N) + log(1+k/N) <= log(N) + log(2)
<= log(N) + log(N) <= 2 log(N)
So
log(N) + log(N+1) + log(N+2) + ... + log(N+M) <= 2(M+1) log(N)
So the complexity in big O is O(log(N)).
To answer your questions:
1) Yes, because there is a fixed number of terms, all less than or equal to log(N+M).
2) In fact there are M + 1 terms (from 0 to M).
Note that O((M+1) log(N+M)) is O(log(N)) when M is a constant.

Tree sort: time complexity

Why is the average case time complexity of tree sort O(n log n)?
From Wikipedia:
Adding one item to a binary search tree is on average an O(log n)
process (in big O notation), so adding n items is an O(n log n)
process
But we don't add each item to a tree that already has n items. We start with an empty tree and gradually increase its size.
So it looks more like
log1 + log2 + ... + logn = log (1*2*...*n) = log n!
Am I missing something?
The reason why O(log(n!)) = O(n log(n)) comes in two parts. First, expand log(n!):
log(1) + log(2) + ... + log(n)
We can both agree here that log(1), log(2), and all the numbers up to log(n-1) are each less than log(n). Therefore, the following inequality can be made,
log(1) + log(2) + ... + log(n) <= log(n) + log(n) + ... + log(n)
Now the other half of the answer depends on the fact that half of the numbers from 1 to n, namely n/2 + 1, ..., n, are each at least n/2, so each of their logs is at least log(n/2). Keeping only those terms gives
log(1) + log(2) + ... + log(n) >= log(n/2) + log(n/2) + ... + log(n/2)   (n/2 terms on the right)
In other words, dropping the first half of the sum and bounding each remaining term from below shows that log(n!) >= (n/2) * log(n/2).
With these two inequalities, (n/2) * log(n/2) <= log(n!) <= n * log(n), it can be proven that O(log(n!)) = O(n log(n)).
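A quick numeric illustration of the sandwich (my own snippet, not part of the answer):

    import math

    for n in (10, 100, 1000):
        log_fact = math.lgamma(n + 1)          # log(n!)
        lower = (n / 2) * math.log(n / 2)      # (n/2) * log(n/2)
        upper = n * math.log(n)                # n * log(n)
        print(n, lower <= log_fact <= upper)   # True for each n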
O(log(n!)) = O(n log(n)); see Stirling's approximation:
https://en.wikipedia.org/wiki/Stirling%27s_approximation

2D Peak finding in linear time

I am reading this O(n) solution of the 2D peak finding problem. The author says that an important detail is to
split by the maximum direction. For square arrays this means that
split directions will be alternating.
Why is this necessary?
This is not necessary. Alternating the split direction gives O(N) for any array.
Let's count the number of comparisons for an array M × N.
First iteration gives 3×M, second gives 3×N/2, third gives 3×M/4, fourth gives 3×N/8, i.e.:
3 * (M + M/4 + M/16 + ...) + 3 * (N/2 + N/8 + N/32 + ...)
We get two geometric series. Since both of these series have common ratio 1/4, we can bound the sum from above:
3 * (4 * M/3 + 2 * N/3) = 4M + 2N
Since O(const × N) = O(N) and O(M + N) = O(N), we have an O(N) algorithm.
If we always choose the vertical direction, then the performance of the algorithm is O(N × log M). If M is much greater than N, this algorithm will be faster. E.g. if M = 1025 and N = 3, then the number of comparisons in the first algorithm is comparable to 1000, and in the second algorithm comparable to 30.
By splitting the array along its longer direction, we get a faster algorithm for specific values of M and N. Is this algorithm O(N)? Yes, because even if we compared both a vertical and a horizontal section at each step we would have 3 × (M + M/2 + M/4 + ...) + 3 × (N + N/2 + N/4 + ...) = 3 × (2M + 2N) comparisons, i.e. O(M + N) = O(N). And we actually choose only one direction at each step.
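A small check of the counting argument (my own snippet, names are mine): iteration t scans 3 × M/2^t cells when t is even and 3 × N/2^t cells when t is odd, matching the series above, and the total stays below 4M + 2N.

    def alternating_cost(m, n):
        # Sum the per-iteration costs 3*M, 3*N/2, 3*M/4, 3*N/8, ...
        total, t = 0.0, 0
        while m / 2**t >= 1 or n / 2**t >= 1:
            total += 3 * (m if t % 2 == 0 else n) / 2**t
            t += 1
        return total

    for m, n in [(1024, 1024), (1025, 3), (16, 256)]:
        print(m, n, alternating_cost(m, n) <= 4 * m + 2 * n)   # True each time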
Splitting the longer side ensures that the length of the split is at most sqrt(area). He could also have gone through the proof by noticing that he is halving the area with each call and looks at at most 3 * sqrt(area) cells to do so.

Average Runtime of Quickselect

Wikipedia states that the average runtime of the quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain (via a recurrence relation and the master method) how the average runtime is O(n)?
Because
we already know which partition our desired element lies in.
We do not need to sort (by partitioning) all the elements, but only operate on the partition we need.
As in quicksort, we partition into halves *, and then into halves of a half, but this time we only need to do the next round of partitioning on the single partition (half) of the two where the element is expected to lie.
It is like (not very accurate)
n + 1/2 n + 1/4 n + 1/8 n + ... < 2n
So it is O(n).
* Half is used for convenience; the actual partition is not exactly 50%.
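A minimal quickselect sketch in Python (my own illustration of the idea above, not code from the answer): after each partition we recurse into only the side that contains the k-th element.

    import random

    def quickselect(items, k):
        # Return the k-th smallest element (k is 0-based).
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if k < len(smaller):                       # answer lies in the left part
            return quickselect(smaller, k)
        if k < len(smaller) + len(equal):          # the pivot itself is the answer
            return pivot
        # otherwise recurse into the right part only
        return quickselect(larger, k - len(smaller) - len(equal))

    print(quickselect([7, 1, 5, 9, 3, 8, 2], 3))   # 5, the 4th smallest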
To do an average case analysis of quickselect one has to consider how likely it is that two elements are compared during the algorithm for every pair of elements and assuming a random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations but it is a clean average case analysis as opposed to the current answers.
Let's assume the array we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we then always pick the next feasible pivot from this permutation, so the pivots are chosen uniformly at random, as every element has the same probability of occurring as the next feasible element in the random permutation.
There is one simple, yet very important, observation: Two elements i and j (with i < j) are compared if and only if one of them is the first pivot chosen from the range [min(k,i), max(k,j)]. If another element from this range is chosen first, they will never be compared, because we continue searching in a sub-array that no longer contains at least one of i and j.
Because of the above observation and the fact that the pivots are chosen uniformly at random, the probability of a comparison between i and j is:
2/(max(k,j) - min(k,i) + 1)
(Two events out of max(k,j) - min(k,i) + 1 possibilities.)
We split the analysis in three parts:
max(k,j) = k, therefore i < j <= k
min(k,i) = k, therefore k <= i < j
min(k,i) = i and max(k,j) = j, therefore i < k < j
In the third case the less-equal signs are omitted because we already consider those cases in the first two cases.
Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.
Case 1
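For instance (my own filling-in of the omitted calculation, not the author's original derivation): the probabilities in this case sum to
sum over i < j <= k of 2/(k - i + 1) = sum_{d=1}^{k-1} 2d/(d+1) = 2(k-1) - 2*(1/2 + 1/3 + ... + 1/k) <= 2k <= 2n
so this case contributes only a linear number of expected comparisons.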
Case 2
Similar to case 1 so this remains as an exercise. ;)
Case 3
We use H_r for the r-th harmonic number which grows approximately like ln(r).
Conclusion
All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).
Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.
If there are any errors please let me know.
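As a sanity check of the comparison probability above (my own snippet, not part of the proof), we can sum 2/(max(k,j) - min(k,i) + 1) over all pairs and watch the expected number of comparisons stay within a constant factor of n:

    def expected_comparisons(n, k):
        # Sum P[i and j are compared] over all pairs i < j.
        total = 0.0
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                total += 2.0 / (max(k, j) - min(k, i) + 1)
        return total

    for n in (100, 500, 2000):
        print(n, expected_comparisons(n, n // 2) / n)   # the ratio stays bounded, well below 4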
In quickselect, as specified, we apply recursion on only one half of the partition.
Average Case Analysis:
First Step: T(n) = cn + T(n/2)
where cn is the time to perform the partition (c is some constant; its exact value doesn't matter) and T(n/2) is the recursion on one half of the partition. Since this is an average-case analysis, we assume the partition element was the median.
As we keep recursing, we get the following set of equations:
T(n/2) = c*(n/2) + T(n/4)
T(n/4) = c*(n/4) + T(n/8)
...
T(2) = c*2 + T(1)
T(1) = c*1
Summing the equations and cross-cancelling like values produces a linear result.
c(n + n/2 + n/4 + ... + 2 + 1) < c(2n)   // sum of a geometric progression
Hence, it's O(n)
I also felt very conflicted at first when I read that the average time complexity of quickselect is O(n) even though we break the list in half each time (like binary search or quicksort). It turns out that breaking the search space in half each time doesn't by itself guarantee an O(log n) or O(n log n) runtime. What makes quicksort O(n log n) and quickselect O(n) is that quicksort must explore all branches of the recursion tree, while quickselect explores only a single branch. Let's compare the time complexity recurrence relations of quicksort and quickselect to prove my point.
Quicksort:
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4))
= n + 2(n/2) + 4T(n/4)
= n + 2(n/2) + 4(n/4) + ... + n(n/n)
= 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
= n (log2(n) + 1) (since we are adding n to itself log2(n) + 1 times)
Quickselect:
T(n) = n + T(n/2)
= n + n/2 + T(n/4)
= n + n/2 + n/4 + ... + n/n
= n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
<= n (1/(1-(1/2))) = 2n (bounding by the infinite geometric series)
I hope this convinces you why the average runtime of quickselect is O(n).
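A quick numeric comparison of the two recurrences (my own snippet, helper names are mine): unrolling T(n) = n + 2T(n/2) versus T(n) = n + T(n/2) level by level and comparing against n*log2(n) and 2n.

    import math

    def t_quicksort(n):
        # T(n) = n + 2*T(n/2): level i has 2^i subproblems of size n/2^i.
        total, weight = 0, 1
        while n >= 1:
            total += weight * n
            weight *= 2
            n //= 2
        return total

    def t_quickselect(n):
        # T(n) = n + T(n/2): one subproblem per level.
        total = 0
        while n >= 1:
            total += n
            n //= 2
        return total

    for n in (2**10, 2**16, 2**20):
        print(n, t_quicksort(n) / (n * math.log2(n)), t_quickselect(n) / n)
        # the first ratio stays near 1, the second stays below 2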

Resources