Algorithm bounds and functions describing algorithms

OK, so basically I need to understand how I can compare functions, so that I can find Big O, Big Theta, and Big Omega for the algorithms of a program.
My mathematics background is not very strong, but I have the foundations down.
My question is:
Is there a mathematical way to find where two functions intersect, so that eventually one dominates the other from some point n onward?
For example, if I have the functions
2n^2 and 64n log(n) [with log to base 2],
how can I find at which values of n, 2n^2 will upper-bound (hope I used the correct term here) 64n log(n)? And how do I apply this to any other pair of functions?
Or is it simply guesswork?

You just need to find the intersection points of the two functions. So set them equal to each other and solve for n, or just plot them and get a rough idea of where they cross, so that you know at what size of data you should switch between the algorithms. Also, Wolfram Alpha has useful equation-solving and plotting tools, just in case you are a bit rusty on the math.
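For the two functions in the question you can even do it by hand: set 2n^2 = 64 n log2(n), divide both sides by 2n, and you get n = 32 log2(n), which n = 256 satisfies exactly (32 * 8 = 256). Here is a quick numeric check (a minimal Python sketch; nothing in it is specific to these two functions):

    import math

    f = lambda n: 2 * n**2               # 2n^2
    g = lambda n: 64 * n * math.log2(n)  # 64 n log2(n), log to base 2

    # 2n^2 = 64 n log2(n)  =>  n = 32 log2(n), satisfied exactly at n = 256
    for n in [64, 128, 255, 256, 257, 512]:
        print(n, f(n) - g(n))
    # The difference is negative below n = 256, zero at n = 256 and positive
    # above it: 2n^2 upper-bounds 64 n log2(n) for all n >= 256.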

Big O, Big Theta, and Big Omega describe general growth patterns of algorithms. Each algorithm can be implemented in an infinite number of ways, and each implementation has a specific execution time, which relates to the Big O of the algorithm. You can compare the execution time per input size of two implementations of an algorithm, but not of an algorithm by itself. Big O, Big Theta, and Big Omega say next to nothing about the actual execution time of an algorithm. As such, you cannot even guess a specific intersection point where one algorithm becomes faster, because such a concept makes no sense for algorithms: we can discuss intersection points in theory, but not in detail, because they only have meaning for implementations, not for algorithms as such.
It's also important to note that Big O (and similar) expressions carry no constant factors, unlike your functions.
http://en.wikipedia.org/wiki/Big_O_notation

You can divide out an n on both sides, and a common factor of 2, and then solve this:
32 log(n) <= c n for all n >= n_0
Let's say c = 32, because then it reduces to log(n) <= n (which looks nice), and n_0 can be 1.
But this sort of thing has already been done. Unless you are in a CS class, you can just say that O(n log n) is a subset of O(n^2) without proving it; it's well known anyway.

For big-O, big-Theta, and big-Omega, you don't have to find the intersection point. You just have to prove that one function eventually dominates the other (up to constant factors). One useful tool for this is L'Hopital's rule: take the ratio of the two running-time functions and compute its limit as n goes to infinity. If the limit has the form infinity/infinity, you can apply L'Hopital's rule, which says the limit of the ratio equals the limit of the ratio of the derivatives, and that is sometimes easier to evaluate. For example, this shows that log(n) = O(n^c) for any c > 0.
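Written out for that example (taking the natural log; a different base only changes a constant factor), the limit has the form infinity/infinity, so the rule applies:

    lim_{n -> inf} log(n) / n^c
      = lim_{n -> inf} (1/n) / (c n^(c-1))    [L'Hopital: differentiate top and bottom]
      = lim_{n -> inf} 1 / (c n^c)
      = 0

Since the ratio tends to 0, log(n) grows strictly slower than n^c, which gives log(n) = O(n^c).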

Related

Can't understand particular Big-O example in textbook

So far understanding Big-O notation and how it's calculated is ok...most of the situations are easy to understand. However, I just came across this one problem that I cannot for the life of me figure out.
Directions: select the best big-O notation for the expression.
(n^2 + lg(n))(n-1) / (n + n^2)
The answer is O(n). That's all fine and dandy, but how is that rationalized, given the n^3 term in the numerator? n^3 isn't the tightest description, but I thought there was like a "minimum" requirement for the g(n) in f(n) <= O(g(n))?
The book has not explained any of the mathematical inner workings; everything has sort of been injected into a possible solution (taking f(n) and generating a g(n) that's slightly greater than f(n)).
Kinda stumped. Go crazy on the math, or math referencing, if you must.
Also, given a piece of code, how does one determine the time units per line? How do you determine logarithmic times based off of a line of code (or multiple lines of code)? I understand that declaring and setting a variable is considered 1 unit of time, but when things get nasty, how would I approach a solution?
If you throw this expression into Wolfram Alpha and expand (FOIL) the numerator, you get (roughly) a cubic function divided by a quadratic function:
(n^3 - n^2 + n lg(n) - lg(n)) / (n^2 + n)
With Big-O, constants don't matter and the largest power wins, so the whole thing behaves like n^3 / n^2 = n. The overall expression grows in an essentially linear fashion with respect to larger and larger values of n. It isn't exactly linear for small n, but the lower-order terms fade away asymptotically, so O(n) is the right classification (the expression is in fact Theta(n)).
Alternatively, you could annoy mathematicians everywhere and say, "Since this is based on Big-O rules, we can cancel the dominant n^2 of the denominator against the numerator and thus get O(n) by simple division." However, it's important in my mind to remember that the function is only approximately linear, not exactly so.
Mind, this is a less-rigorous explanation than might be satisfactory for your class, but it gives you some math-based perspective on the growth rate.
Non-rigorous answer:
Distributing the numerator product, we find that the numerator is n^3 + n log(n) - n^2 - log n.
We note that the numerator grows as n^3 for large n, and the denominator grows as n^2 for large n.
We interpret that as growth like n^{3 - 2} = n for large n, i.e. the whole expression is O(n).
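To sanity-check that numerically, here is a minimal Python sketch (the helper name is mine) that watches the ratio of the whole expression to n:

    import math

    def expr(n):
        return (n**2 + math.log2(n)) * (n - 1) / (n + n**2)

    for n in [10, 1000, 100000, 10000000]:
        print(n, expr(n) / n)
    # The ratio climbs towards 1 as n grows, so the expression behaves like n
    # for large n: it is O(n) (in fact Theta(n)).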

What makes two functions have fundamentally different efficiency times in Big O notation

Today I was posed with the problem of ordering several functions based on efficiency and determining mathematically which functions would have the same Big O notation. Long story short, I ended up in a disagreement with my classmate as to whether there is a fundamental difference between a function with a run time of 2^n and one with a run time of n^(n/2).
I was taught that in Big O notation leading coefficients end up being insignificant as n approaches infinity, because they are just a vertical scaling of the same parent function: when n is huge, 6*n isn't really THAT different from n, as they both have the same parent growth rate, which makes sense. My argument was that because this vertical scaling is insignificant (it yields a child function of the same parent), any constant transformation would retain the overall parent function, so the child would have the same base growth rate and be simplified to the same notation (in this case O(2^n)).
My classmate made the point that
2^(n/2) = (2^(1/2))^n = sqrt(2)^n
....and because sqrt(2)^n (about 1.414^n) is much smaller than 2^n as n approaches infinity, 2^n should be noticeably larger.
My classmate then proposed that two functions have different Big O notations if
lim of (f(n)'s efficiency) / (g(n)'s efficiency) as n -> infinity
is either infinity (f(n) is bigger) or 0 (g(n) is bigger).
And because 2^n / 2^(n/2) = (2^(n/2) * 2^(n/2)) / 2^(n/2) = 2^(n/2), which approaches infinity, they must have rates of growth that are fundamentally different.
My classmate's criterion for when two algorithms have different Big O notation clearly makes sense for linear vs. linear, linear vs. quadratic, and just about any other common situation, but then again, so does mine. Something that is a transformed linear function (meaning it is translated and/or scaled vertically or horizontally, but not scaled by a negative number, zero, et cetera) will always have a Big O notation of O(n), because it is linear. Any quadratic function will end up being O(n^2), because the constants become insignificant and only the n^2 term matters, since it is a transformed quadratic. (This works for other cases too, you get it.) Obviously, x^2 is fundamentally different from x^3, because you cannot scale a quadratic to match a cubic function, so they must be different enough to get their own categories in Big-O.
Clearly [at least] one of us is thinking about this the wrong way. I mean, either O(2^(n/2)) gets simplified to O(2^n) or it doesn't, right?
So which one (if either) of us is right, and why is the other wrong, and most importantly, how do we tell if two inefficiencies are fundamentally different in a situation like this?
Thanks!
Your initial question compares 2^n and n^(n/2). It's easy to see that n^(n/2) is slower, because at n = 4 they are equal (both 16), and from that point on n^(n/2) is increasingly larger.
2^(n/2) is smaller than both of these. Effectively, if c is a constant and cn denotes n times a constant, then (cn)^c < c^(cn) < (cn)^(cn) asymptotically: polynomials grow slower than exponentials, which grow slower than n^n-style functions.
In the case of 2^n vs. 2^(n/2), 2^n has the larger O, because no constant can make 2^(n/2) keep up with it as n increases.
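To see that last claim directly: suppose some constant k made k * 2^(n/2) >= 2^n for all large n. Dividing both sides by 2^(n/2) gives

    k >= 2^(n/2),   i.e.   n <= 2 log2(k),

which fails as soon as n exceeds the fixed value 2 log2(k). So no constant works, and 2^n is not in O(2^(n/2)).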

Why do we ignore coefficients in Big O notation?

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the coefficients?
For example, this answer says that the final complexity of 2N + 2 is O(N); we remove the leading coefficient 2 and the final constant 2 as well.
Removing the final constant of 2 is perhaps understandable: after all, N may be very large, so "forgetting" the final 2 may only change the grand total by a small percentage.
However, I cannot clearly understand how removing the leading coefficient makes no difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?
The purpose of Big-O notation is to find the dominant factor in the asymptotic behavior of a function as its argument tends towards infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x -> infinity
if and only if there exist a positive real M and a real x_0 such that
|f(x)| <= M |g(x)| for all x > x_0.
That's from Wikipedia. What it actually means is that there is a point (after x_0) beyond which some multiple of g(x) dominates f(x); that multiple of g(x) acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x) + K = O(f(x)), x^n + x^(n-1) = O(x^n), etc. It's just a matter of using the definition to prove them.
In particular, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually), but it's hard to know how much time each operation takes. One algorithm may perform 2n operations and the other n, yet the latter may have a large constant cost per operation. So, for this purpose, it isn't easy to reason about the difference between n and 2n.
From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases, two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another, and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually "Big-Theta") captures this situation: the two functions then belong to exactly the same (Theta) class.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.
Big O provides a good estimate of which algorithms are more efficient for larger inputs, all other things being equal; this is why, for an algorithm with an n^3 and an n^2 term, we ignore the n^2 term: even if the n^2 term has a large constant, it will eventually be dominated by the n^3 term.
However, real algorithms involve more than simple Big O analysis. For example, a sorting routine will often start with an O(n * log(n)) divide-and-conquer algorithm like quicksort or mergesort, and when the partitions become small enough, switch to a simpler O(n^2) algorithm like insertion sort: for small inputs insertion sort is generally faster, although a basic Big O analysis doesn't reveal this.
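To make that concrete, here is a minimal sketch of such a hybrid sort (the threshold of 16 is an arbitrary assumption; real libraries tune it by measurement):

    def insertion_sort(a, lo, hi):
        # O(n^2) comparisons in general, but very low constant factors,
        # so it wins on tiny slices.
        for i in range(lo + 1, hi):
            x = a[i]
            j = i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def merge(left, right):
        # Standard O(n) merge of two sorted lists.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    def hybrid_mergesort(a, lo=0, hi=None, threshold=16):
        # O(n log n) mergesort that hands small partitions to insertion sort.
        if hi is None:
            hi = len(a)
        if hi - lo <= threshold:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        hybrid_mergesort(a, lo, mid, threshold)
        hybrid_mergesort(a, mid, hi, threshold)
        a[lo:hi] = merge(a[lo:mid], a[mid:hi])

Both variants have the same Big O as plain mergesort; the switch only improves the constant factors, which is exactly the part Big O ignores.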
The constant factors often aren't very interesting, and so they're omitted. Certainly a difference in factors on the order of 1000 is interesting, but usually the difference is smaller, and then there are many more constant factors to consider that may dominate the algorithms' own constants. Say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; but what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever), and so on? The constant factor of the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is faster is just to run both on some sample inputs and time the results; over-relying on the algorithms' constant factors would skip that step.
Another thing, from what I have understood: the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove the 99N^2 portion? Won't it make a difference when, let's say, N is one million?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
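Plugging in numbers makes this concrete (a quick Python check):

    for n in [10, 49, 50, 1000000]:
        total = 2*n**3 + 99*n**2
        print(n, 2*n**3, 99*n**2, 99*n**2 / total)
    # At n = 10 the 99N^2 term still dominates; by n = 50 the 2N^3 term has
    # taken over (they tie at N = 49.5); at N = 1,000,000 the 99N^2 term is
    # roughly 0.005% of the total.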
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.
Big O notation is not an absolute measure of complexity.
Rather, it is a designation of how the complexity will change as the variable changes. In other words, as N increases, the complexity will increase like O(f(N)).
To explain why terms are not included, we look at how fast each term increases.
So, O(2n + 2) has two terms, 2n and 2. Looking at the rate of increase: the term 2 never increases, so it contributes nothing to the rate of growth and goes away. And since 2n increases faster than the constant 2, the 2 turns into noise as n gets very large.
Similarly, O(2n^3 + 99n^2) compares O(2n^3) and O(99n^2). For small values, say n < 50, 99n^2 contributes a larger share than 2n^3. However, if n gets very large, say 1,000,000, then 99n^2, although nominally large, is insignificant compared to 2n^3 (roughly 0.005% of the total).
As a consequence, O(n^i) < O(n^(i+1)): each polynomial class is a proper subset of the next.
Coefficients are removed because of the mathematical definition of Big O.
Simplified, the definition says that O(c*f(n)) = O(f(n)) for any constant c. This can be taken on faith; the reason for it is purely mathematical, and the formal proof (sketched in the next answer) is short but dry.
The mathematical reason:
The real reason why we do this is the way Big O notation is defined:
A series (or let's use the word function) f(n) is in O(g(n)) if the sequence f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 for every n. The quotient doesn't become infinitely large (it's always 2), so it is bounded, and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinitely large as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O notation basically describes that your function f(n) is, from some value of n on to infinity, smaller than a function g(n) multiplied by a constant. With the previous example:
2*n^2 is in O(n^2) because we can find a value C such that 2*n^2 is smaller than C*n^2. For example, we can pick C to be 5 or 10, and the condition will be satisfied.
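You can watch this boundedness numerically; a minimal Python sketch (the function name is mine, purely for illustration):

    def ratio_probe(f, g, ns=(10, 1000, 100000, 10000000)):
        # If f(n)/g(n) stays bounded as n grows, f is in O(g).
        for n in ns:
            print(n, f(n) / g(n))

    ratio_probe(lambda n: 2 * n**2,        lambda n: n**2)  # always 2.0: bounded
    ratio_probe(lambda n: n**2 + 2*n + 20, lambda n: n**2)  # tends to 1: bounded
    ratio_probe(lambda n: n**2,            lambda n: n)     # grows like n: unbounded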
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! Whether it's one million or five million times longer doesn't really matter here; you can always use five more computers and have it run in the same amount of time. The real problem is that it scales incredibly badly with input size.
For practical applications the constants do matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) if your algorithm should be great for any input, it should have a low time complexity, and 2) n^3 grows so much faster than n^2 that preferring n^3 over n^2 almost never makes sense.

Meaning of Big O notation

Our teacher gave us the following definition of Big O notation:
O(f(n)): A function g(n) is in O(f(n)) ("big O of f(n)") if there exist constants c > 0 and N such that |g(n)| <= c |f(n)| for all n > N.
I'm trying to tease apart the various components of this definition. First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does "in" mean?
Next, I'm confused by the overall second portion of the statement. Why does saying that the absolute value of g(n) is less than or equal to f(n) for all n > N mean anything about Big O notation?
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does "in" mean?
In this formulation, O(f(n)) is a set of functions. Thus O(N) is the set of all functions that are (in simple terms) proportional to N as N tends to infinity.
The word "in" means ... "is a member of the set".
Why does saying that the absolute value of g(n) is less than or equal to f(n) for all n > N mean anything about Big O notation?
It is the definition. And besides, you have neglected the c term in your synopsis, and that is an important part of the definition.
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
Your intuition is incorrect in two respects.
Firstly, the real definition of O(N^2) is NOT that it takes N^2 operations; it is that the number of operations is proportional to N^2. That's where the c comes into it.
Secondly, it is only proportional to N^2 for large enough values of N. Big O notation is not about what happens for small N. It is about what happens when the problem size scales up.
Also, as a comment notes, "proportional" is not quite the right phraseology here. It might be more correct to say "tends towards proportional"... but in reality there isn't a simple English description of what is going on here. The real definition is the mathematical one.
If you now reread the definition in the light of that, you should see that it fits just nicely.
(Note that the definitions of Big O and related measures of complexity can also be expressed in calculus terminology, i.e. using "limits". However, generally speaking the things we are talking about are quantized: an integer number of instructions, an integer number of bytes of storage, etc. Calculus is really about functions over real numbers. Hence, you could argue that the formulation above is preferable. OTOH, a real mathematician would probably see bus-sized holes in this argumentation.)
O(g(n)) looks like a function, but it is actually a set of functions. If a function f is in O(g(n)), it means that g is an asymptotic upper bound on f to within a constant factor. O(g(n)) contains all functions that are bounded from above by g(n).
More specifically, there exist constants c and n0 such that f(n) < c * g(n) for all n > n0. This means that c * g(n) will always overtake f(n) beyond some value of n: g is asymptotically at least as large as f; it scales at least as fast.
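As a concrete instance (the functions and constants are my own picks, purely for illustration): take f(n) = 5n^2 + 100n and g(n) = n^2. Then c = 6 and n0 = 100 work, since 6n^2 >= 5n^2 + 100n exactly when n >= 100:

    f = lambda n: 5 * n**2 + 100 * n
    g = lambda n: n**2
    c, n0 = 6, 100

    # c*g(n) overtakes f(n) for every n past n0, witnessing that f is in O(n^2).
    assert all(f(n) < c * g(n) for n in range(n0 + 1, 10000))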
This is used in the analysis of algorithms as follows. The running time of an algorithm is impossible to specify practically. It would obviously depend on the machine on which it runs. We need a way of talking about efficiency that is unconcerned with matters of hardware. One might naively suggest counting the steps executed by the algorithm and using that as the measure of running time, but this would depend on the granularity with which the algorithm is specified and so is no good either. Instead, we concern ourselves only with how quickly the running time (this hypothetical thing T(n)) scales with the size of the input n.
Thus, we can report the running time by saying something like:
My algorithm (algo1) has a running time T(n) that is in the set O(n^2). I.e. it's bounded from above by some constant multiple of n^2.
Some alternative algorithm (algo2) might have a time complexity of O(n), which we call linear. This may or may not be better for some particular input size or on some hardware, but there is one thing we can say for certain: as n tends to infinity, algo2 will out-perform algo1.
Practically then, one should favour algorithms with better time complexities, as they will tend to run faster.
This asymptotic notation may be applied to memory usage also.

Asymptotic notations and forming recurrence relations by analysing algorithms

I went through many lectures, videos, and sources regarding asymptotic notations. I understood what O, Omega, and Theta are. But in algorithm analysis, why do we almost always use Big Oh notation, and not Theta and Omega? (I know it sounds noobish, but please help me with this.) What exactly are this upper bound and lower bound, in the context of algorithms?
My next question is: how do we find the complexity of an algorithm? Say I have an algorithm: how do I find the recurrence relation T(N), and then compute the complexity out of it? How do I form these equations? Like in the case of linear search done recursively: T(n) = T(n-1) + 1. How?
It would be great if someone could explain this treating me as a noob, so that I can understand even better. I found some answers on StackOverflow, but they weren't convincing enough.
Thank you.
Why we use big-O so much compared to Theta and Omega: This is partly cultural, rather than technical. It is extremely common for people to say big-O when Theta would really be more appropriate. Omega doesn't get used much in practice both because we frequently are more concerned about upper bounds than lower bounds, and also because non-trivial lower bounds are often much more difficult to prove. (Trivial lower bounds are usually the kind that say "You have to look at all of the input, so the running time is at least equal to the size of the input.")
Of course, these comments about lower bounds also partly explain Theta, since Theta involves both an upper bound and a lower bound.
Coming up with a recurrence relation: There's no simple recipe that addresses all cases. Here's a description for relatively simple recursive algorithms.
Let N be the size of the initial input. Suppose there are R recursive calls in your recursive function. (Example: for mergesort, R would be 2.) Further suppose that all the recursive calls reduce the size of the initial input by the same amount, from N to M. (Example: for mergesort, M would be N/2.) And, finally, suppose that the recursive function does W work outside of the recursive calls. (Example: for mergesort, W would be N for the merge.)
Then the recurrence relation would be T(N) = R*T(M) + W. (Example: for mergesort, this would be T(N) = 2*T(N/2) + N.)
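You can sanity-check such a recurrence by just evaluating it (a minimal Python sketch; T(1) = 1 is an assumed base case, and the closed form in the comment holds for powers of two):

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        # Mergesort-style recurrence: T(N) = 2*T(N/2) + N, with T(1) = 1.
        if n <= 1:
            return 1
        return 2 * T(n // 2) + n

    for n in [2**k for k in (4, 8, 12, 16)]:
        print(n, T(n), int(n * math.log2(n) + n))
    # The last two columns agree: T(n) = n log2(n) + n, i.e. T(n) = Theta(n log n).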
When we create an algorithm, we always want it to be the fastest, and we need to consider every case. This is why we use O: we want to bound the complexity from above and be sure that our algorithm will never exceed that bound.
To assess the complexity, you have to count the number of steps. In the recurrence T(n) = T(n-1) + 1, there will be n steps before we get down to T(0), so the complexity is linear. (I'm talking about time complexity, not space complexity.)
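Here is that linear-search recurrence as runnable code (a minimal sketch; the explicit step counter is mine, added only to make T(n) visible):

    def linear_search(xs, target, i=0, steps=0):
        # One comparison, then recurse on a problem of size n-1:
        # T(n) = T(n-1) + 1, which unrolls to T(n) = O(n).
        if i == len(xs):
            return -1, steps + 1        # miss: looked at every element
        if xs[i] == target:
            return i, steps + 1         # hit
        return linear_search(xs, target, i + 1, steps + 1)

    print(linear_search([3, 1, 4, 1, 5, 9], 9))  # (5, 6): found at index 5 in 6 steps
    print(linear_search([3, 1, 4, 1, 5, 9], 7))  # (-1, 7): a miss costs n + 1 steps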
