Tree sort: time complexity - algorithm

Why is the average case time complexity of tree sort O(n log n)?
From Wikipedia:
Adding one item to a binary search tree is on average an O(log n)
process (in big O notation), so adding n items is an O(n log n)
process
But we are not adding an item to a tree of n items each time. We start with an empty tree and gradually increase the size of the tree.
So it looks more like
log 1 + log 2 + ... + log n = log(1*2*...*n) = log(n!)
Am I missing something?

The reason why O(log(n!)) = O(n log(n)) is a two-part answer. First, expand log(n!):
log(1) + log(2) + ... + log(n)
We can both agree that log(1), log(2), and every term up to log(n-1) is less than log(n). Therefore, the following inequality holds:
log(1) + log(2) + ... + log(n) <= log(n) + log(n) + ... + log(n) = n*log(n)
The other half of the answer relies on the fact that half of the numbers from 1 to n are greater than or equal to n/2. Throwing away the first half of the sum and replacing each of the remaining n/2 terms by log(n/2) gives
log(1) + log(2) + ... + log(n) >= log(n/2 + 1) + ... + log(n) >= (n/2)*log(n/2)
With these two inequalities, log(n!) is squeezed between (n/2)*log(n/2) and n*log(n), both of which are Θ(n log n).
So with these two inequalities, it can be proven that O(log(n!)) = O(n log(n)).
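To tie this back to tree sort itself, here is a minimal Python sketch (the names TreeNode, insert and tree_sort are mine, not from the question or Wikipedia) that counts the comparisons made while inserting n random keys into a plain, unbalanced BST and compares the total against n*log2(n):

    import math
    import random

    class TreeNode:
        """Plain (unbalanced) binary search tree node."""
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key, counter):
        """Insert key into the BST, counting one comparison per node visited."""
        if root is None:
            return TreeNode(key)
        counter[0] += 1
        if key < root.key:
            root.left = insert(root.left, key, counter)
        else:
            root.right = insert(root.right, key, counter)
        return root

    def in_order(root, out):
        """Append the keys in sorted order."""
        if root is not None:
            in_order(root.left, out)
            out.append(root.key)
            in_order(root.right, out)

    def tree_sort(items):
        counter = [0]
        root = None
        for x in items:
            root = insert(root, x, counter)
        out = []
        in_order(root, out)
        return out, counter[0]

    n = 100_000
    data = [random.random() for _ in range(n)]
    result, comparisons = tree_sort(data)
    assert result == sorted(data)
    print(comparisons, "comparisons; n*log2(n) =", round(n * math.log2(n)))

On random input the comparison count comes out within a small constant factor of n*log2(n), which is the average case the Wikipedia sentence refers to; on already-sorted input the same code degrades to Θ(n^2).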

O(log(n!)) = O(nlog(n)).
https://en.wikipedia.org/wiki/Stirling%27s_approximation

Related

How do you find the complexity of an algorithm given the number of computations performed each iteration?

Say there is an algorithm with input of size n. On the first iteration, it performs n computations, then is left with a problem instance of size floor(n/2) - for the worst case. Now it performs floor(n/2) computations. So, for example, an input of n=25 would see it perform 25+12+6+3+1 computations until an answer is reached, which is 47 total computations. How do you put this into Big O form to find worst case complexity?
You just need to write the corresponding recurrence in a formal manner and unroll it:
T(n) = T(n/2) + n
     = n + n/2 + n/4 + ... + 1
     = n(1 + 1/2 + 1/4 + ... + 1/n)
     < 2n
=> T(n) = O(n)
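A quick sanity check in Python (total_work is a hypothetical helper, not from the question) that reproduces the 25 + 12 + 6 + 3 + 1 = 47 example and confirms the total stays below 2n:

    def total_work(n):
        """Sum n + floor(n/2) + floor(n/4) + ... + 1."""
        total = 0
        while n >= 1:
            total += n
            n //= 2
        return total

    print(total_work(25))                # 47, i.e. 25 + 12 + 6 + 3 + 1
    for n in (10, 25, 1000, 10**6):
        assert total_work(n) < 2 * n     # consistent with T(n) = O(n)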

Time complexity of this recurrence

Can T(n) = T(n/2) + 2^n be solved with the master method?
How do I find the time complexity of this recurrence? Thanks.
Well, do a few expansions:
T(n) = 2^n + T(n/2)
= 2^n + 2^(n/2) + T(n/4)
= 2^n + 2^(n/2) + 2^(n/4) + T(n/8)
...etc.
Obviously, the complexity is at least Ω(2^n), since there's a 2^n in the expansion.
Furthermore, the terms added in each expansion get small really fast. You probably already know that:
2^n + 2^(n-1) + 2^(n-2)... = O(2^n)
You have:
2^n + 2^(n/2) + 2^(n/4)...
and each of those terms, except for the first, is smaller than the corresponding term above (at least until n drops below 2, around the base case), so your sum is in O(2^n) as well, i.e., the lower bound is tight and the recurrence is in Θ(2^n).
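A small numeric illustration of why the extra terms don't change the order (a sketch of the same unrolling as above, nothing more): sum the expansion 2^n + 2^(n/2) + 2^(n/4) + ... and compare it with the leading 2^n term:

    def unrolled_sum(n):
        """2^n + 2^(n/2) + 2^(n/4) + ... down to the base case."""
        total = 0.0
        while n >= 1:
            total += 2.0 ** n
            n /= 2
        return total

    for n in (8, 16, 32, 64, 128):
        ratio = unrolled_sum(n) / 2.0 ** n
        print(n, round(ratio, 6))    # the ratio tends to 1, so the sum is Theta(2^n)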

What is the complexity of sum of log functions

I have an algorithm that have the following complexity in big O:
log(N) + log(N+1) + log(N+2) + ... + log(N+M)
Is it the same as log(N+M) since it is the largest element?
OR is it M*log(N+M) because it is a sum of M elements?
The important rules to know in order to solve it are:
Log a + Log b = Log ab, and
Log a - Log b = Log a/b
Add and subtract Log 2, Log 3, ... Log N-1 to the given value.
This will give you Log 2 + Log 3 + ... + Log (N+M) - (Log 2 + Log 3 + ... + Log (N-1))
The first part will compute to Log ((N+M)!) and the part after the subtraction sign will compute to Log ((N-1)!)
Hence, this complexity comes to Log ( (N+M)! / (N-1)! ).
UPDATE after OP asked another good question in the comment:
If we have N + N^2 + N^3, it will reduce to just N^3 (the largest element), right? Why we can't apply the same logic here - log(N+M) - largest element?
If we have just two terms that look like Log(N) + Log(M+N), then we can combine them both and say that they will definitely be less than 2 * Log(M+N) and will therefore be O(Log (M+N)).
However, if the number of terms being summed is itself related to the value of the largest term, then the calculation is not as straightforward.
For example, the sum of two Log N terms is O(Log N), while the sum of N copies of Log N is not O(Log N) but O(N * Log N).
In the given summation, both the size of the values and the number of values depend on M and N, so we cannot state this complexity as Log(M+N); however, we can definitely write it as M * (Log (M+N)).
How? Each of the values in the given summation is less than or equal to Log(M + N), and there are total M such values. Hence the summation of these values will be less than M * (Log (M+N)) and therefore will be O(M * (Log (M+N))).
Thus, both answers are correct but O(Log ( (N+M)! / (N-1)! )) is a tighter bound.
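A minimal numeric check of that identity in Python (the values of N and M are arbitrary examples; math.lgamma is used because log(k!) = lgamma(k+1)):

    import math

    N, M = 50, 20
    direct = sum(math.log(N + k) for k in range(M + 1))
    via_factorials = math.lgamma(N + M + 1) - math.lgamma(N)   # log((N+M)! / (N-1)!)
    upper = (M + 1) * math.log(N + M)                          # one log(N+M) per term

    assert abs(direct - via_factorials) < 1e-9
    assert direct <= upper
    print(direct, via_factorials, upper)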
If M does not depend on N and does not vary then the complexity is O(log(N))
For k such that 0 <= k <= M, with N >= M and N >= 2,
log(N+k)=log(N(1+k/N)) = log(N) + log(1+k/N) <= log(N) + log(2)
<= log(N) + log(N) <= 2 log(N)
So
log(N) + log(N+1) + log(N+2) + ... + log(N+M) <= 2(M+1) log(N)
So the complexity in big O is O(log(N)).
To answer your questions:
1) Yes, because there is a fixed number of terms, all less than or equal to log(N+M).
2) In fact there are M + 1 terms (k runs from 0 to M).
Note that O((M+1) log(N+M)) is then O(log(N)), since M is fixed.
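And a quick check of the fixed-M case described in this answer (example values only): with M held constant, the sum grows like log(N) and stays under the 2(M+1) log(N) bound:

    import math

    M = 5                                   # fixed, independent of N
    for N in (10, 100, 10_000, 10**8):
        s = sum(math.log(N + k) for k in range(M + 1))
        bound = 2 * (M + 1) * math.log(N)
        print(N, round(s / math.log(N), 3), s <= bound)   # the ratio stays close to M + 1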

Analysis of dictionary

My first question: in the analysis it is mentioned that
n + n/2 + n/4 + ... is at most 2n. How do we get the result "at most 2n"?
We have a collection of arrays, where array i has size 2^i. Each array is
either empty or full, and each is sorted. However, there will be no
relationship between the items in different arrays. Which arrays are full
and which are empty is determined by the binary representation of the
number of items we are storing.
To perform a lookup, we just do a binary search in each occupied array. In
the worst case this takes time O(log(n) + log(n/2) + log(n/4) + ... + 1)
= O((log n)^2).
Following are my questions on the above text snippet.
How does the author arrive at O(log(n) + log(n/2) + log(n/4) + ... + 1)?
And why is the above sum O((log n)^2)?
Thanks!
n + n/2 + n/4 + n/8 ... = n * (1/1 + 1/2 + 1/4 + 1/8 + ...)
The sum 1/1 + 1/2 + 1/4 + 1/8 + ... is a geometric series that converges to 2, so the result is 2n.
Apparently, the author is talking about a collection of arrays with the sizes n, n/2, n/4, ..., and he is doing a binary search in each of them. A binary search in an array with n elements takes O(log n) time, so the total time required is O(log n + log(n/2) + log(n/4) + ... + 1). Since log(n/2^i) = log(n) - i (base 2), this sum is log(n) + (log(n) - 1) + (log(n) - 2) + ... + 1 + 0, which is about (log n)^2 / 2, and that is O((log n)^2).
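A short Python check of that sum (total_lookup_cost is just an illustrative name, not from the text):

    import math

    def total_lookup_cost(n):
        """log2(n) + log2(n/2) + log2(n/4) + ... + log2(1): one binary search per array."""
        total = 0.0
        size = n
        while size >= 1:
            total += math.log2(size)
            size /= 2
        return total

    for n in (2**10, 2**16, 2**20):
        print(n, total_lookup_cost(n), math.log2(n) ** 2 / 2)   # the sum tracks (log2 n)^2 / 2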

Is log(n!) = Θ(n·log(n))?

I am to show that log(n!) = Θ(n·log(n)).
A hint was given that I should show the upper bound with n^n and show the lower bound with (n/2)^(n/2). This does not seem all that intuitive to me. Why would that be the case? I can definitely see how to convert n^n to n·log(n) (i.e. log both sides of an equation), but that's kind of working backwards.
What would be the correct approach to tackle this problem? Should I draw the recursion tree? There is nothing recursive about this, so that doesn't seem like a likely approach..
Remember that
log(n!) = log(1) + log(2) + ... + log(n-1) + log(n)
You can get the upper bound by
log(1) + log(2) + ... + log(n) <= log(n) + log(n) + ... + log(n)
= n*log(n)
And you can get the lower bound by doing a similar thing after throwing away the first half of the sum:
log(1) + ... + log(n/2) + ... + log(n) >= log(n/2) + ... + log(n)
= log(n/2) + log(n/2+1) + ... + log(n-1) + log(n)
>= log(n/2) + ... + log(n/2)
= n/2 * log(n/2)
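Both bounds are easy to check numerically (a minimal sketch; math.lgamma(n + 1) gives ln(n!), so no huge factorials are needed):

    import math

    for n in (10, 100, 10_000, 10**6):
        log_fact = math.lgamma(n + 1)             # ln(n!)
        upper = n * math.log(n)                   # n * ln(n)
        lower = (n / 2) * math.log(n / 2)         # (n/2) * ln(n/2)
        assert lower <= log_fact <= upper
        print(n, round(lower), round(log_fact), round(upper))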
I realize this is a very old question with an accepted answer, but none of these answers actually use the approach suggested by the hint.
It is a pretty simple argument:
n! (= 1*2*3*...*n) is a product of n numbers each less than or equal to n. Therefore it is less than the product of n numbers all equal to n; i.e., n^n.
Half of the numbers -- i.e. n/2 of them -- in the n! product are greater than or equal to n/2. Therefore their product is greater than the product of n/2 numbers all equal to n/2; i.e. (n/2)^(n/2).
Take logs throughout to establish the result.
Sorry, I don't know how to use LaTeX syntax on stackoverflow..
See Stirling's Approximation:
ln(n!) = n*ln(n) - n + O(ln(n))
where the last 2 terms are less significant than the first one.
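A quick look at how close those leading terms get (sketch only, natural logs throughout):

    import math

    for n in (10, 100, 10_000):
        exact = math.lgamma(n + 1)            # ln(n!)
        stirling = n * math.log(n) - n        # the two leading terms of Stirling's formula
        print(n, round(exact, 2), round(stirling, 2), round(exact - stirling, 2))
        # the gap grows only like O(ln n), negligible next to n*ln(n)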
For lower bound,
lg(n!) = lg(n) + lg(n-1) + ... + lg(n/2) + ... + lg(2) + lg(1)
>= lg(n/2) + lg(n/2) + ... + lg(n/2) + ((n-1)/2) lg 2
(drop the last term lg(1) = 0; replace the first n/2 terms by lg(n/2); replace the last (n-1)/2 terms by lg 2, which will make the cancellation easier later)
= (n/2) lg(n/2) + (n/2) lg 2 - (1/2) lg 2
= (n/2) lg n - (n/2) lg 2 + n/2 - 1/2
= (n/2) lg n - 1/2
lg(n!) >= (1/2)(n lg n - 1)
Combining both bounds :
1/2 (n lg n - 1) <= lg(n!) <= n lg n
By choosing a lower-bound constant slightly smaller than 1/2 we can compensate for the -1 inside the bracket.
Thus lg(n!) = Theta(n lg n)
Helping you further, where Mick Sharpe left you:
Its derivation is quite simple:
see http://en.wikipedia.org/wiki/Logarithm -> Group Theory
log(n!) = log(n * (n-1) * (n-2) * ... * 2 * 1) = log(n) + log(n-1) + ... + log(2) + log(1)
Think of n as infinitely big. What is infinity minus one? Or minus two? Etc.
log(inf) + log(inf) + log(inf) + ... = inf * log(inf)
And then think of inf as n.
Thanks, I found your answers convincing but in my case, I must use the Θ properties:
log(n!) = Θ(n·log n) => log(n!) = O(n log n) and log(n!) = Ω(n log n)
To verify the result I found this web page, where the whole process is explained: http://www.mcs.sdsmt.edu/ecorwin/cs372/handouts/theta_n_factorial.htm
http://en.wikipedia.org/wiki/Stirling%27s_approximation
Stirling's approximation might help you. It is really useful when dealing with problems on factorials involving huge numbers, of the order of 10^10 and above.
This might help:
e^(ln(x)) = x
and
(l^m)^n = l^(m*n)
If you reframe the problem, you can solve this with calculus! This method was originally shown to me via Arthur Breitman https://twitter.com/ArthurB/status/1436023017725964290.
First, take the integral of log(x) from 1 to n: it is n*log(n) - n + 1. Because log is monotonic, for every k the integral of log(x) from k to k+1 is greater than log(k) * 1, so summing over k gives log(n!) < the integral of log(x) from 1 to n+1, which is still Θ(n·log(n)). You can similarly craft the lower bound using log(x-1): for every k, 1*log(k) > the integral of log(x) from k-1 to k, so log(n!) > the integral of log(x) from 0 to n-1, which is (n-1)*(log(n-1) - 1), or n*log(n-1) - n - log(n-1) + 1.
These are very tight bounds!
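A numeric look at those integral bounds (sketch only; natural log throughout, lgamma(n + 1) = ln(n!)):

    import math

    for n in (10, 100, 10_000):
        log_fact = math.lgamma(n + 1)                       # ln(n!)
        upper = (n + 1) * math.log(n + 1) - n               # integral of ln(x) from 1 to n+1
        lower = (n - 1) * (math.log(n - 1) - 1)             # integral of ln(x) from 0 to n-1
        assert lower <= log_fact <= upper
        print(n, round(lower, 1), round(log_fact, 1), round(upper, 1))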

Resources