I have a homework problem:
If an algorithm takes 0.5 ms for an input size of 100, how long will it take for inputs of 500, 1,000 and 10,000 if it is:
Linear,
O(N log N),
Quadratic,
Cubic,
Exponential.
I understand the basic concepts here, I'm just unsure of how to approach the question mathematically. I'm tempted to simply calculate the amount of time it takes to do process each individual item of input (for instance, in a) I'll divide 0.5 by 100 to get .05, which I'll then multiply by 500, 1000 and 1000 to find how long it will take to process inputs of those sizes.
While this is simple for the linear calculations, and also pretty straightforward for quadratic and cubic ones, I don't quite understand how to apply this strategy to (N log N) or exponential functions: What value do I use as the base of the exponent?
For (N log N), is it as simple as calculating c = 100 * log(100) = 200; then calculating c = 500 * log(500) = 1349, which is 6.745 times 200, then multiplying 6.745 by .5 to arrive at 3.3725 as a final answer?
This is a bad exercise:
It teaches you to extrapolate from a single datapoint.
It teaches you to apply the results of asymptotic analysis where they can't be applied.
To expand on the latter, you cannot predict the performance of a system when given specific inputs based on an asymptotic bound because of hidden terms. Many algorithms have hidden constant and linear terms that dominate the run time for small input sizes like n=100.
But if you are supposed to ignore these facts, your approach is correct.
Suppose the function's runtime is given by T(n). You can solve these problems by doing the following:
Write a general expression for T(n) in terms of the function's growth rate, given there is some hidden constant multiplier. For example, for the n log n case, write T(n) = c n log n for some constant c.
Solve for c using the one data point you have; T(100) is given to you.
Plug in values of n as needed to estimate the runtime.
This works for any growth rate.
A quick note: "exponential" is not precise enough for you to make a meaningful estimate. 2^n and 100^n are both exponential but the latter grows significantly faster than the former. In computer science, typically logs use base 2, though there's no general conventions for exponents.
Hope this helps!
Related
While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the co-efficients?
For example this answer says that the final complexity of 2N + 2 is O(N); we remove the leading co-efficient 2 and the final constant 2 as well.
Removing the final constant of 2 perhaps understandable. After all, N may be very large and so "forgetting" the final 2 may only change the grand total by a small percentage.
However I cannot clearly understand how removing the leading co-efficient does not make difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?
The purpose of the Big-O notation is to find what is the dominant factor in the asymptotic behavior of a function as the value tends towards the infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x->infinity
if and only if there is a positive real M and a real x_0 such as
|f(x)| <= M|g(x)| for all x > x_0.
That's in wikipedia. What that actually means is that there is a point (after x_0) after which some multiple of g(x) dominates f(x). That definition acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x)+K = O(f(x)), f(x^n+x^n-1)=O(x^n), etc. It's just a matter of using the definition to prove those.
In special, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation take. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with it. So, for this purpose, isn't easy to reason about the difference between n and 2n.
From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.
Big O provides a good estimate of what algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 and an n^2 factor we ignore the n^2 factor, because even if the n^2 factor has a large constant it will eventually be dominated by the n^3 factor.
However, real algorithms incorporate more than simple Big O analysis, for example a sorting algorithm will often start with a O(n * log(n)) partitioning algorithm like quicksort or mergesort, and when the partitions become small enough the algorithm will switch to a simpler O(n^2) algorithm like insertionsort - for small inputs insertionsort is generally faster, although a basic Big O analysis doesn't reveal this.
The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors are smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants. Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever) and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' constant factors would hide this step.
An other thing is that, what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove 99N^2 portion even? Will it not make difference when let's say N is one miilion?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.
Big O notation is not an absolute measure of complexity.
Rather it is a designation of how complexity will change as the variable changes. In other words as N increases the complexity will increase
Big O(f(N)).
To explain why terms are not included we look at how fast the terms increase.
So, Big O(2n+2) has two terms 2n and 2. Looking at the rate of increase
Big O(2) this term will never increase it does not contribute to the rate of increase at all so it goes away. Also since 2n increases faster than 2, the 2 turns into noise as n gets very large.
Similarly Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 will contribute a larger nominal percentage than 2n^3. However if n gets very large, say 1000000, then 99n^2 although nominally large it is insignificant (close to 1 millionth) compared to the size of 2n^3.
As a consequence Big O(n^i) < Big O(n^(i+1)).
Coefficients are removed because of the mathematical definition of Big O.
To simplify the definition says Big O(f(n)) = Big O(f(cn)) for a constant c. This needs to be taken on faith because the reason for this is purely mathematical, and as such the proof would be too complex and dry to explain in simple terms.
The mathematical reason:
The real reason why we do this, is the way Big O-Notation is defined:
A series (or lets use the word function) f(n) is in O(g(n)) when the series f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 as n approaches Infinity. The term (2*n^2)/(n^2) doesn't become infinitely large (its always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinetely large, as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O Notation basically describes, that your function f(n) is (from some value of n on to infinity) smaller than a function g(n), multiplied by a constant. With the previous example:
2*n^2 is in O(n^2), because we can find a value C, so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, for example, and the condition will be satisfied.
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! If it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers for it and have it run in the same amount of time, the real problem here is, that it scales incredibly bad with input size.
For practical applications the constants does matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) If your algorithm should be great for any input, it should have a low time complexity, and 2) that n^3 grows so much faster than n^2, that perfering n^3 over n^2 almost never makes sense.
I know that O(log n) refers to an iterative reduction by a fixed ratio of the problem set N (in big O notation), but how do i actually calculate it to see how many iterations an algorithm with a log N complexity would have to preform on the problem set N before it is done (has one element left)?
You can't. You don't calculate the exact number of iterations with BigO.
You can "derive" BigO when you have exact formula for number of iterations.
BigO just gives information how the number iterations grows with growing N, and only for "big" N.
Nothing more, nothing less. With this you can draw conclusions how much more operations/time will the algorithm take if you have some sample runs.
Expressed in the words of Tim Roughgarden at his courses on algorithms:
The big-Oh notation tries to provide a sweet spot for high level algorithm reasoning
That means it is intended to describe the relation between the algorithm time execution and the size of its input avoiding dependencies on the system architecture, programming language or chosen compiler.
Imagine that big-Oh notation could provide the exact execution time, that would mean that for any algorithm, for which you know its big-Oh time complexity function, you could predict how would it behave on any machine whatsoever.
On the other hand, it is centered on asymptotic behaviour. That is, its description is more accurate for big n values (that is why lower order terms of your algorithm time function are ignored in big-Oh notation). It can reasoned that low n values do not demand you to push foward trying to improve your algorithm performance.
Big O notation only shows an order of magnitude - not the actual number of operations that algorithm would perform. If you need to calculate exact number of loop iterations or elementary operations, you have to do it by hand. However in most practical purposes exact number is irrelevant - O(log n) tells you that num. of operations will raise logarythmically with a raise of n
From big O notation you can't tell precisely how many iteration will the algorithm do, it's just estimation. That means with small numbers the different between the log(n) and actual number of iterations could be differentiate significantly but the closer you get to infinity the different less significant.
If you make some assumptions, you can estimate the time up to a constant factor. The big assumption is that the limiting behavior as the size tends to infinity is the same as the actual behavior for the problem sizes you care about.
Under that assumption, the upper bound on the time for a size N problem is C*log(N) for some constant C. The constant will change depending on the base you use for calculating the logarithm. The base does not matter as long as you are consistent about it. If you have the measured time for one size, you can estimate C and use that to guesstimate the time for a different size.
For example, suppose a size 100 problem takes 20 seconds. Using common logarithms, C is 10. (The common log of 100 is 2). That suggests a size 1000 problem might take about 30 seconds, because the common log of 1000 is 3.
However, this is very rough. The approach is most useful for estimating whether an algorithm might be usable for a large problem. In that sort of situation, you also have to pay attention to memory size. Generally, setting up a problem will be at least linear in size, so its cost will grow faster than an O(log N) operation.
Still getting a grip on logarithms being the opposite of exponentials. (Would it also be correct to describe them as the inversion of exponentials?)
There are lots of great SO entries already on Big-O notation including O(log n) and QuickSort n(log n) specifically. Found some useful graphs as well.
In looking at Divide and Conquer algorithms, I'm coming across n log n, which I think is n multiplied by the value of log n. I often try concrete examples like 100 log 100, to help visualize what's going on in an abstract equation.
Just read that log n assumes base 10. Does n log n translate into:
"the number n multiplied by the amount 10 needs to be raised to the power of in order to equal the number n"?
So 100 log 100 equals 200 because 10 needs to be raised to the power of two to equal 100?
Does the base change as an algorithm iterates through a set? Does the base even matter if we're talking in abstractions anyway?
Yes, the base does change depending on the way it iterates, but it doesn't matter. As you might remember, changing the base of logarithms means multiplying them by a constant. Since you mentioned that you have read about Big-O notation, then you probably already know that constants do not make a difference (O(n) is the same as O(2n) or O(1000n)).
EDIT: to clarify something you said - "I'm coming across n log n, which I think is n multiplied by the value of log n". Yes, you are right. And if you want to know why it involves log n, then think of what algorithms like divide and conquer do - they split the input (in two halves or four quarters or ten tenths, depending on the algorithm) during each iteration. The question is "How many times can that input be split up before the algorithm ends?" So you look at the input and try to find how many times you can divide it by 2, or by 4, or by 10, until the operation is meaningless? (unless the purpose of the algorithm is to divide 0 as many times as possible) Now you can give yourself concrete examples, starting with easy stuff like "How many times can 8 be divided by 2?" or ""How many times can 1000 be divided by 10?"
You don't need to worry about the base - if you're dealing with algorithmic complexity, it doesn't matter which base you're in, because the difference is just a constant factor.
Fundamentally, you just need to know that log n means that as n increases exponentially, the running time (or space used) increases linearly. For example, if n=10 takes 1 minute, then n=100 would take 2 miuntes, and n=1000 would take 3 minutes - roughly. (It's usually in terms of upper bounds, with smaller factors ignored... but that's the general gist of it.)
n log n is just that log n multiplied by n - so the time or space taken increases "a bit faster than linearly", basically.
The base does not matter at all. In fact people tend to drop part of the operations, e.g. if one has O(n^4 + 2*n), this is often reduced to O(n^4). Only the most relevant power needs to be considered when comparing algorithms.
For the case of comparing two closely related algorithms, say O(n^4 + 2*n) against O(n^4 + 3*n), one needs to include the linear dependency in order to conserve the relevant information.
Consider a divide and conquer approach based on bisection: your base is 2, so you may talk about ld(n). On the other hand you use the O-notation to compare different algorithms by means of the same base. This being said, the difference between ld, ln, and log10 is just a matter of a general offset.
Logarithms and Exponents are inverse operations.
if
x^n = y
then
Logx(y) = n
For example,
10^3 = 1000
Log10 (1000) = 3
Divide and conquer algorithms work by dividing the problem into parts that are then solved as independent problems. There can also be a combination step that combines the parts. Most divide and conquer algorithms are base 2 which means they cut the problem in half each time. For example, Binary Search, works like searching a phone book for a name. You flip to the middle and say.. Is the name I'm looking for in the first half or last half? (before or after what you flipped to), then repeat. Every time you do this you divide the problem's size by 2. Therefore, it's base 2, not base 10.
Order notation is primarily only concerned with the "order" of the runtime because that is what is most important when trying to determine if a problem will be tractable (solvable in a reasonable amount of time).
Examples of different orders would be:
O(1)
O(n)
O(n * log n)
O(n^2 * log n)
O(n^3.5)
O(n!)
etc.
The O here stands for "big O notation" which is basically provides an upper bound on the growth rate of the function. Because we only care about the growth of the function for large inputs.. we typically ignore lower order terms for example
n^3 + 2 n^2 + 100 n
would be
O(n^3)
because n^3 is the largest order term it will dominate the growth function for large values of N.
When you see O(n * log n) people are just abbreviating... if you understand the algorithm it is typically Log base 2 because most algorithms cut the problem in half. However, it could be log base 3 for example if the algorithm cut the problem into thirds for example.
Note:
In either case, if you were to graph the growth function, it would appear as a logarithmic curve. But, of course a O(Log3 n) would be faster than O(Log2 n).
The reason you do not see O(log10 n) or O(log3 n) etc.. is that it just isn't that common for an algorithm to work better this way. In our phone book example you could split the pages into 3 separate thirds and compare inbetween 1-2 and 2-3. But, then you just made 2 comparisons and ended up knowing which 1/3 the name was in. However, if you just split it in half each time you'd know which 1/4 it was in which is more efficient.
In the vast set of programming languages I know, the function log() is intended to be base e=2.718281....
In mathematical books sometimes it means base "ten" and sometimes base "e".
As another answers pointed out, for the big-O notation does not matter, because, for all base x, the complexities O(log_x (n)) is the same as O(ln(n)) (here log_x means "logarithm in base x" and ln() means "logarithm in base e").
Finally, it's common that, in the analysis of several algorithms, it's more convenient consider that log() is, indeed, "logarithm in base 2". (I've seen some texts taking this approach). This is obviously related to the binary representation of numbers in the computers.
Suppose you run a O(log n) algorithm with an input size of 1000 and the algorithm requires 110
operations. When you double the input size to 2000, the algorithm now requires 120 operations. What is
your best guess for the number of operations required when you again double the input size to 4000?
The Big-O notation is used to indicate the runtime of the algorithm with respect to the input size in the worst-case. It does not predict anything about the actual number of operations. It does not take into account the low order terms and the constant factors.
There's an additive constant, corresponding to run-time overhead, in the solution. The following presumes that the result is Ɵ(log n) rather than just O(log n).
You could go on and explicitly solve for the constants if you wanted to make generalized predictions, but doing so based on two points would be pretty dubious.
Let f(n) be the estimation of the number of operations, just put your question into equation :
f(n) = c * log(n) // O(log n) algorithm
f(1000) = 110
f(2000) = 120
f(4000) = ?
Find c and you'll find your answer. But of course, it would only be a best guess estimation based on the given data and limiting behavior of f.
It won't be an accurate prediction for multiple reasons :
The big O notation only gives your the limiting behavior of the algorithm complexity, not the actual complexity formula.
The number of operation may strongly depends on the nature of the data, not just the multiplicity n.
The limiting behavior is calculated in a particular case of possible data (usually worst case).
Could it be done by keeping a counter to see how many iterations an algorithm goes through, or does the time duration need to be recorded?
The currently accepted won't give you any theoretical estimation, unless you are somehow able to fit the experimentally measured times with a function that approximates them. This answer gives you a manual technique to do that and fills that gap.
You start by guessing the theoretical complexity function of the algorithm. You also experimentally measure the actual complexity (number of operations, time, or whatever you find practical), for increasingly larger problems.
For example, say you guess an algorithm is quadratic. Measure (Say) the time, and compute the ratio of time to your guessed function (n^2):
for n = 5 to 10000 //n: problem size
long start = System.time()
executeAlgorithm(n)
long end = System.time()
long totalTime = end - start
double ratio = (double) time / (n * n)
end
. As n moves towards infinity, this ratio...
Converges to zero? Then your guess is too low. Repeat with something bigger (e.g. n^3)
Diverges to infinity? Then your guess is too high. Repeat with something smaller (e.g. nlogn)
Converges to a positive constant? Bingo! Your guess is on the money (at least approximates the theoretical complexity for as large n values as you tried)
Basically that uses the definition of big O notation, that f(x) = O(g(x)) <=> f(x) < c * g(x) - f(x) is the actual cost of your algorithm, g(x) is the guess you put, and c is a constant. So basically you try to experimentally find the limit of f(x)/g(x); if your guess hits the real complexity, this ratio will estimate the constant c.
Algorithm complexity is defined as (something like:)
the number of operations the algorithm does as a function
of its input size.
So you need to try your algorithm with various input sizes (i.e. for sort - try sorting 10 elements, 100 elements etc.), and count each operation (e.g. assignment, increment, mathematical operation etc.) the algorithm does.
This will give you a good "theoretical" estimation.
If you want real-life numbers on the other hand - use profiling.
As others have mentioned, the theoretical time complexity is a function of number of cpu operations done by your algorithm. In general processor time should be a good approximation for that modulo a constant. But the real run time may vary because of a number of reasons such as:
processor pipeline flushes
Cache misses
Garbage collection
Other processes on the machine
Unless your code is systematically causing some of these things to happen, with enough number of statistical samples, you should have a fairly good idea of the time complexity of your algorithm, based on observed runtime.
The best way would be to actually count the number of "operations" performed by your algorithm. The definition of "operation" can vary: for an algorithm such as quicksort, it could be the number of comparisons of two numbers.
You could measure the time taken by your program to get a rough estimate, but various factors could cause this value to differ from the actual mathematical complexity.
yes.
you can track both, actual performance and number of iterations.
Might I suggest using ANTS profiler. It will provide you this kind of detail while you run your app with "experimental" data.