Big-oh complexity for logarithmic algorithms - complexity-theory

A few more problems I ran into while calculating Big-O complexity. There are two problems I cannot solve because of the logarithm bases. Here are the two problems:
n = # of data items being manipulated
1) n^3 + n^2 log (base 2) n + n^3 log (base 2) n
2) 2n^3 + 1000n^2 + log (base 4) n + 300000n
I am confused when the logs have a base number. How do you go about calculating the complexity for these? Anyone care to explain how you get the complexity with a bit of detail if possible?

The base of the logarithm is irrelevant. You can just ignore it. Therefore:
1) It's O(n^3 log n) because that's the term that grows the fastest.
2) It's O(n^3) for the same reason.
The base is irrelevant because log base a (x) = log base b (x) / log base b (a), so any logarithm differs from another by a constant.
I suggest you read more about the properties of the logarithm here.
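As a quick numerical check of the change-of-base identity above, here is a minimal sketch. The helper name `log_ratio` is made up for illustration; it shows that log base 2 and log base 4 of n differ by the same constant factor whatever n is.

```python
import math

def log_ratio(n, a=2, b=4):
    """Return log_a(n) / log_b(n); by change of base this equals log(b) / log(a)."""
    return math.log(n, a) / math.log(n, b)

# The ratio does not depend on n, so the two logs differ only by a constant factor.
for n in (10, 1_000, 1_000_000):
    print(n, round(log_ratio(n), 6))
```

Because the ratio is a fixed constant, big-O treats O(log2 n) and O(log4 n) as the same class.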

You don't need to "calculate complexity for a base number"; you can just compare its growth rate to that of the other terms (see this graph of logarithm growth rates to get an idea).
Note that to solve these problems, you don't need to pay attention to the base of the logs.
O(x + y + z) === O(max(x,y,z))
So, decide which of the summed terms is largest and you can solve your problems.

In the calculation of asymptotic complexity, n is assumed to be very large and thus constants can be ignored. When you have a sum, only take into account the biggest term.
In your examples, this results in:
1) n^3 log(n)
2) n^3
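To see the dominant-term rule at work on the two expressions from the question, the sketch below divides each expression by its claimed bound. The function names `expr1` and `expr2` are illustrative only. If the bound is right, the ratio should settle toward a constant as n grows.

```python
import math

def expr1(n):
    # n^3 + n^2 log2(n) + n^3 log2(n)
    return n**3 + n**2 * math.log2(n) + n**3 * math.log2(n)

def expr2(n):
    # 2n^3 + 1000n^2 + log4(n) + 300000n
    return 2 * n**3 + 1000 * n**2 + math.log(n, 4) + 300000 * n

for n in (10**3, 10**6, 10**9):
    r1 = expr1(n) / (n**3 * math.log2(n))  # compared against n^3 log n
    r2 = expr2(n) / n**3                   # compared against n^3
    print(n, round(r1, 4), round(r2, 4))
```

The first ratio drifts toward 1 and the second toward 2, which is what "same growth up to a constant" looks like numerically.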

Related

Why is the worst case time complexity of this simple algorithm T(n/2) +1 as opposed to n^2+T(n-1)?

The following question was on a recent assignment at university. I would have thought the answer would be n^2+T(n-1), as I thought the n^2 would make its asymptotic time complexity O(n^2), whereas with T(n/2)+1 its asymptotic time complexity would be O(log2(n)).
The answers were returned and it turns out the correct answer is T(n/2)+1 however I can't get my head around why this is the case.
Could someone possibly explain to me why that's the worst case time complexity of this algorithm? It's possible my understanding of time complexity is just wrong.
The asymptotic time complexity is taking n large. In the case of your example, since the question specifies that k is fixed, the only complexity relevant is the last one. See the Wikipedia formal definition, specifically:
As n grows to infinity, the recurrence that dominates is T(n) = T(n / 2) + 1. You can prove this as well using the formal definition, basically picking x_0 = 10 * k and showing that a finite M can be found using the first two cases. It should be clear that both log(n) and n^2 satisfy the definition, so the tighter bound is the asymptotic complexity.
What does O (f (n)) mean? It means the time is at most c * f (n), for some unknown and possibly large c.
kevmo claimed a complexity of O (log2 n). Well, you can check all the values n ≤ 10k, and let the largest value of T (n) be X. X might be quite large (about 167 k^3 in this case, I think, but it doesn't actually matter). For larger n, the time needed is at most X + log2 (n). Choose c = X, and this is always less than c * log2 (n).
Of course people usually assume that a O (log n) algorithm would be quick, and this one most certainly isn't if say k = 10,000. So you learned as well that O notation must be handled with care.

Big-O Algebra Simplification Issue

I've been working on a problem for several hours now, and I need clarification:
I needed to simplify (as much as possible) the following big-O expressions. For each, I put down what I thought was the correct answer. I would like solutions, but I would appreciate an explanation as well if I am incorrect. I am trying to learn Big O notation as well as possible, and I think doing these problems helped a lot. I just want to make sure I'm on the right path.
a) O(sqrt(n) + log(n)*log(n))
I thought this was O(n)
b) O(3log2 n + 2log3 n)
I thought this was O(log3 (n))
c) O(n^3 + 2n^2 +3n + 4)
I thought this was O(n^3)
Thanks for all your help!
Let's go through this one at a time.
O(sqrt(n) + log(n)*log(n)). I thought this was O(n)
You are correct that this is O(n), but that's not a particularly tight bound. Let's start with a simplifying question: which grows faster, O(sqrt(n)) or O(log(n) * log(n))? Using that information, can you drop one of the two terms from the summation?
O(3log2 n + 2log3 n). I thought this was O(log3 (n))
Remember that "big-O ignores the base of logarithms" (that is, logb n = O(logc n) for any b and c that are greater than one). You're technically right that it's O(log3 n), but that's not the cleanest solution. You'd be better off saying O(log n) here.
O(n^3 + 2n^2 +3n + 4). I thought this was O(n^3)
Exactly right! This works because 2n^2 + 3n + 4 is O(n^3), so you can drop those terms from the summation. Now, can you use a similar trick to simplify your answer to part (a)?
Hope this helps!
OK, the answer is long, but I was pretty thorough.
Intro:
The first thing you need to do is properly define what you mean by big O. Relevant read. Traditionally it is defined only as an upper bound, but that is not very useful in computer science, at least not for a task such as yours. You could technically answer with anything that grows faster than the example, i.e. saying O(n!) for all the questions would technically be OK.
More useful is big Theta, and in CS I have usually seen big O redefined to mean big Theta from the read above. The difference is that your bound must be tighter and must also apply from below.
Definitions/Rules: My favourite method to calculate Big O (and Theta) is using limits. It lets you sum up asymptotic behaviour relations in a simple and straightforward manner.
Basically if (x->inf is implied here and thereafter):
1. lim f(x) / g(x) = infinity - f asymptotically grows faster than g
2. lim f(x) / g(x) is a constant > 0 - f asymptotically grows the same as g
3. lim f(x) / g(x) = 0 - f asymptotically grows slower than g
Number 2. is big Theta. Number 2. and 3. combined are traditional Big O as in "f belongs to O(g)" (or "is O(g)" which is somewhat confusing wording). It means that f will not outgrow g so g is its upper bound.
Now, with a little math, it is pretty easy to prove that Big O (or Theta) cares only about the fastest-growing term. This comes straight from limit properties.
I will use O as big Theta from now on because everything holds for Big O too as it is looser.
Explanation of examples:
Your 3rd example is the easiest. You can safely drop 2n^2 +3n + 4 because n^3 is growing faster. You can prove that n^3 + 2n^2 +3n + 4 is O(n^3) by calculating lim n^3 / (n^3 + 2n^2 +3n + 4).
The same goes for your 2nd example, but you need to go through logarithm properties. Basically:
log b1 (x) = c log b2 (x) - it means you can switch the base of the logarithm at the expense of a constant... and by the rules above a constant factor does not change anything; it's still case 2, just the constant changes.
Your 1st example is the hardest/trickiest, because the limit is the most complicated. However, O(f+g) is either O(f) or O(g): either one grows faster, so the other can be dropped, or they asymptotically grow the same, so either one can be chosen (their fastest-growing term will be the same anyway). This means you need to check which one grows faster; you do this by calculating lim sqrt(n)/(log(n)*log(n)) and choosing according to the rules above. I think this one needs L'Hôpital's rule.
(a) is the toughest one there I think; (b) and (c) use fairly common rules for Big-Oh simplification.
For (a), I suggest making a substitution: let m = [some function of n that makes one of the two terms simpler] and rearrange to get n = [something]. You can then use this to substitute m into the expression, thereby getting rid of all appearances of n, and simplify it according to Big-Oh rules. Then, provided that the function you picked is an increasing function of n, you can substitute n back in and simplify further if need be.
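One way to carry out that substitution, as a sketch. The choice m = log_2 n is an assumption made here for illustration; the answer above leaves the function unspecified.

```latex
\begin{align*}
m &= \log_2 n \quad\Longrightarrow\quad n = 2^m,\\
\sqrt{n} + \log_2 n \cdot \log_2 n &= 2^{m/2} + m^2,\\
\lim_{m\to\infty} \frac{m^2}{2^{m/2}} &= 0
  \quad\text{(an exponential outgrows any polynomial)},\\
\text{so}\quad 2^{m/2} + m^2 &= O\!\left(2^{m/2}\right) = O\!\left(\sqrt{n}\right).
\end{align*}
```

Since m = log_2 n is increasing in n, substituting back is safe, and the tight simplification of (a) is O(sqrt(n)).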

Which is better: O(n log n) or O(n^2)

Okay so I have this project I have to do, but I just don't understand it. The thing is, I have 2 algorithms. O(n^2) and O(n*log2n).
Anyway, I find out in the project info that if n<100, then O(n^2) is more efficient, but if n>=100, then O(n*log2n) is more efficient. I'm supposed to demonstrate this with an example using numbers and words, or by drawing a picture. But the thing is, I don't understand this and I don't know how to demonstrate it.
Anyone here that can help me understand how this works?
Good question. Actually, I always show these 3 pictures:
n = [0; 10]
n = [0; 100]
n = [0; 1000]
So, O(N*log(N)) is far better than O(N^2). It is much closer to O(N) than to O(N^2).
But your O(N^2) algorithm is faster for N < 100 in real life. There are a lot of reasons why it can be faster. Maybe due to better memory allocation or other "non-algorithmic" effects. Maybe O(N*log(N)) algorithm requires some data preparation phase or O(N^2) iterations are shorter. Anyway, Big-O notation is only appropriate in case of large enough Ns.
If you want to demonstrate why one algorithm is faster for small Ns, you can measure execution time of 1 iteration and constant overhead for both algorithms, then use them to correct theoretical plot:
Example
Or just measure execution time of both algorithms for different Ns and plot empirical data.
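The idea of correcting the theoretical curves with per-iteration costs can be sketched as follows. The constants 1.0 and 15.0 are made-up values chosen only so that the crossover lands near the n = 100 mentioned in the question; real values would have to come from measurements of the actual algorithms.

```python
import math

def quadratic_cost(n, c_quad):
    # Modelled cost of the O(n^2) algorithm: c_quad per inner iteration.
    return c_quad * n * n

def nlogn_cost(n, c_nlogn):
    # Modelled cost of the O(n log n) algorithm: c_nlogn per step.
    return c_nlogn * n * math.log2(n)

def crossover(c_quad, c_nlogn, limit=10**6):
    """Smallest n >= 2 at which the quadratic cost exceeds the n log n cost."""
    for n in range(2, limit):
        if quadratic_cost(n, c_quad) > nlogn_cost(n, c_nlogn):
            return n
    return None  # no crossover found below the limit

print(crossover(1.0, 15.0))  # 100 with these hypothetical constants
```

With a larger per-step constant on the n log n side, the quadratic algorithm stays cheaper for longer, which is one plausible reason for a threshold like n = 100.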
Just ask wolframalpha if you have doubts.
In this case, it says
$$\lim_{n \to \infty} \frac{n \log n}{n^2} = 0$$
Or you can also calculate the limit yourself:
$$\lim_{n \to \infty} \frac{n \log n}{n^2} = \lim_{n \to \infty} \frac{\log n}{n} \overset{\text{L'Hôpital}}{=} \lim_{n \to \infty} \frac{1/n}{1} = \lim_{n \to \infty} \frac{1}{n} = 0$$
That means n^2 grows faster, so n log(n) is smaller (better), when n is high enough.
Big-O notation is a notation of asymptotic complexity. This means it calculates the complexity when N is arbitrarily large.
For small Ns, a lot of other factors come in. It's possible that an algorithm has O(n^2) loop iterations, but each iteration is very short, while another algorithm has O(n) iterations with very long iterations. With large Ns, the linear algorithm will be faster. With small Ns, the quadratic algorithm will be faster.
So, for small Ns, just measure the two and see which one is faster. No need to go into asymptotic complexity.
Incidentally, don't write the base of the log. Big-O notation ignores constants - O(17 * N) is the same as O(N). Since log2N is just ln N / ln 2, the base of the logarithm is just another constant and is ignored.
Let's compare them,
On one hand we have:
n^2 = n * n
On the other hand we have:
nlogn = n * log(n)
Putting them side to side:
n * n versus n * log(n)
Let's divide by n which is a common term, to get:
n versus log(n)
Let's compare values (log here is the natural logarithm):
n = 10           log(n) ~ 2.3
n = 100          log(n) ~ 4.6
n = 1,000        log(n) ~ 6.9
n = 10,000       log(n) ~ 9.2
n = 100,000      log(n) ~ 11.5
n = 1,000,000    log(n) ~ 13.8
So we have:
n >> log(n) for n > 1
n^2 >> n log(n) for n > 1
Anyway, I find out in the project info that if n<100, then O(n^2) is
more efficient, but if n>=100, then O(n*log2n) is more efficient.
Let us start by clarifying what is Big O notation in the current context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting
behavior of a function when the argument tends towards a particular
value or infinity. (..) In computer science, big O notation is used to classify algorithms
according to how their run time or space requirements grow as the
input size grows.
Big O notation does not represent a function but rather a set of functions with a certain asymptotic upper-bound; as one can read from source:
Big O notation characterizes functions according to their growth
rates: different functions with the same growth rate may be
represented using the same O notation.
Informally, in computer-science time-complexity and space-complexity theories, one can think of the Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
and O(n log n) as:
An algorithm is said to run in quasilinear time/space if T(n) = O(n log^k n) for some positive constant k; linearithmic time/space is the case k = 1 (source).
Mathematically speaking the statement
Which is better: O(n log n) or O(n^2)
is not accurate since, as mentioned before, Big O notation represents a set of functions. Hence, more accurate would have been "does O(n log n) contain O(n^2)". Nonetheless, such relaxed phrasing is commonly used to quantify (for the worst-case scenario) how one set of algorithms behaves compared with another as their input sizes increase. To compare two classes of algorithms (e.g., O(n log n) and O(n^2)), instead of
Anyway, I find out in the project info that if n<100, then O(n^2) is
more efficient, but if n>=100, then O(n*log2n) is more efficient.
you should analyze how both classes of algorithms behave as their input size (i.e., n) increases in the worst-case scenario, analyzing n as it tends to infinity.
As @cem rightly points out, in the image "big-O denote one of the asymptotically least upper-bounds of the plotted functions, and does not refer to the sets O(f(n))"
As you can see in the image after a certain input, O(n log n) (green line) grows slower than O(n^2) (orange line). That is why (for the worst-case) O(n log n) is more desirable than O(n^2) because one can increase the input size, and the growth rate will increase slower with the former than with the latter.
First, it is not quite correct to compare asymptotic complexities while also constraining N. That is, I can state:
O(n^2) is slower than O(n * log(n)), because the definition of Big O notation assumes that n grows without bound.
For particular N it is possible to say which algorithm is faster by simply comparing N^2 * ALGORITHM_CONSTANT and N * log(N) * ALGORITHM_CONSTANT, where ALGORITHM_CONSTANT depends on the algorithm. For example, if we traverse array twice to do our job, asymptotic complexity will be O(N) and ALGORITHM_CONSTANT will be 2.
Also, I'd like to mention that O(N * log2N), which I assume means the logarithm to base 2 (log2N), is actually the same as O(N * log(N)) because of the properties of logarithms.
We have two ways to compare two algorithms.
-> The first way is very simple: compare them by taking a limit.
T1(n) - Algo1
T2(n) - Algo2
lim (n->infinity) T1(n)/T2(n) = m
(i) if m = 0, Algo1 is faster than Algo2
(ii) if m = k (a nonzero constant), both are the same
(iii) if m = infinity, Algo2 is faster
-> The second way is pretty simple compared to the first: just take the log of both, but do not neglect multiplicative constants.
Algo 1 = log n
Algo 2 = sqr(n)
Let log n = x
Any polynomial > its log
O(sqr(n)) > O(log n)
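The limit test from the first method can be approximated numerically, as in this minimal sketch. The helper `ratios` is a made-up name, sampling a few values of n only suggests where the limit is heading and is not a proof, and reading sqr(n) as the square root in the second example is an assumption.

```python
import math

def ratios(f, g, ns=(10**2, 10**4, 10**6, 10**8)):
    """Evaluate f(n) / g(n) at increasing n to see where the limit is heading."""
    return [f(n) / g(n) for n in ns]

# n log n versus n^2: the ratios shrink toward 0, so n log n grows more slowly.
print(ratios(lambda n: n * math.log(n), lambda n: n * n))

# log n versus sqrt(n): the ratios also shrink toward 0.
print(ratios(lambda n: math.log(n), lambda n: math.sqrt(n)))
```

Ratios heading to 0 correspond to case (i), ratios levelling off at a nonzero constant to case (ii), and ratios blowing up to case (iii).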
I am a mathematician, so I will try to explain why n^2 is faster than n log n for small values of n with a simple limit as n --> 0:
lim n^2 / nlogn = lim n / logn = 0 / -inf = 0
So, for small values of n (in this case a "small value" is n in [1,99]), n^2 is faster than n log n, because, as we see, the limit is 0.
But why n --> 0? Because n in an algorithm can take "big" values, so when n < 100 it is considered a very small value, and we can take the limit n --> 0.

Big O proof with sqrt and log

I am having trouble figuring out how to prove that
t(n) = sqrt(31n + 12n log n + 57)
is
O(sqrt(n) log n)
I haven't had to deal with square roots in big O notation yet, so I am having lots of trouble with this! Any help is greatly appreciated :)
Big O notation is about how algorithm characteristics (clock time taken, memory use, processing time) grow with the size of the problem.
Constant factors get discarded because they don't affect how the value scales.
Minor terms also get discarded because they end up having next to no effect.
So your original equation
sqrt(31n + 12nlogn + 57)
immediately simplifies to
sqrt(n log n)
Square roots distribute, like other kinds of multiplication and division, so this can be straightforwardly converted to:
sqrt(n) sqrt(log n)
Since logs convert multiplication into addition (this is why slide rules work), this becomes:
sqrt(n) log (n/2)
Again, we discard constants, because we're interested in the class of behaviour
sqrt(n) log n
And, we have the answer.
Update
As has been correctly pointed out,
sqrt(n) sqrt(log n)
does not become
sqrt(n) log (n/2)
So the end of my derivation is wrong.
Start by finding the largest-degree factor inside of the sqrt(), which would be 12nlogn. The largest-degree factor makes all the other factors irrelevant in big O terms, so it becomes O(sqrt(12nlogn)). A constant factor is also irrelevant, so it becomes O(sqrt(nlogn)). Then I suppose you can make the argument this is equal to O(sqrt(n) * sqrt(logn)), or O(sqrt(n) * log(n)^(1/2)), and eliminate the power on logn to get O(sqrt(n)logn). But I don't know what the technical justification would be for that last step, because if you can turn sqrt(logn) into logn, why can't you turn sqrt(n) into n?
Hint: Consider the leading terms of the expansion of sqrt(1 + x) for |x| < 1.
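A direct bound also works, shown here as a sketch. It assumes n is large enough that log n >= 1 (n >= 3 for the natural log), and the constants 100 and 10 are just convenient choices, not the only ones possible.

```latex
\begin{align*}
31n + 12n\log n + 57 &\le 31\,n\log n + 12\,n\log n + 57\,n\log n = 100\,n\log n
  && (n \ge 3,\ \log n \ge 1)\\
t(n) = \sqrt{31n + 12n\log n + 57} &\le 10\,\sqrt{n}\,\sqrt{\log n}\\
&\le 10\,\sqrt{n}\,\log n
  && (\sqrt{x} \le x \text{ for } x \ge 1)
\end{align*}
```

The last step only loosens the bound, which an upper bound permits. Replacing sqrt(n) by n would be equally valid as an upper bound, just weaker than the one the question asks for.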

Determining BigO of a recurrence

T (1) = c
T (n) = T (n/2) + dn
How would I determine BigO of this quickly?
Use repeated backsubstitution and find the pattern. An example here.
I'm not entirely sure what dn is, but assuming you mean a constant multiplied by n:
According to Wolfram Alpha, the recurrence equation solution for:
f(n) = f(n / 2) + cn
is:
f(n) = 2c(n - 1) + c1
which would make this O(n).
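The closed form can be checked against the recurrence directly. This is a minimal sketch that assumes n is a power of two, so that n/2 is always exact, and uses arbitrary sample values for the constants. Note the naming: the question's d plays the role of Wolfram Alpha's c, and the question's c (the value of T(1)) plays the role of c1.

```python
def T(n, c=1, d=3):
    """T(1) = c, T(n) = T(n/2) + d*n, evaluated for n a power of two."""
    if n == 1:
        return c
    return T(n // 2, c, d) + d * n

# Compare the recurrence with the closed form 2d(n - 1) + c for n = 1, 2, 4, ..., 1024.
for k in range(0, 11):
    n = 2 ** k
    assert T(n) == 2 * 3 * (n - 1) + 1

print(T(1024))
```

The per-level costs dn + dn/2 + dn/4 + ... form a geometric series bounded by 2dn, which is where the linear bound comes from.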
Well, the recurrence part of the relationship is the T(n/2) part, which is in effect halving the value of n each time.
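Thus you will need approx. (log2 n) steps to get to the termination condition, so the recursion is O(log2 n) levels deep. That is also the overall cost of the algorithm only if the dn part is a constant-time operation at each step; if dn means d times n, the per-level costs dn + dn/2 + dn/4 + ... sum to less than 2dn, and the total is O(n).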
Note that as stated, the problem won't necessarily terminate since halving an arbitrary value of n repeatedly is unlikely to exactly hit 1. I suspect that the T(n/2) part should actually read T(floor (n / 2)) or something like that in order to ensure that this terminates.
Use the master theorem;
see http://en.wikipedia.org/wiki/Master_theorem
By the way, the asymptotic behaviour of your recurrence is O(n) assuming d is positive and sufficiently smaller than n (size of problem)
