If I have a program that runs over some data in O(n) time, can I semi-accurately guesstimate the O(n^3) runtime from my O(n) run?
**O(n) = 5 million iterations, 2 minutes total runtime**
**O(n^2) = ??**
(5 million)^2 = 2.5e+13
2.5e+13 / 2.5 million iterations per minute = 10 million minutes
10 million / 60 = 166,667 hours = 6,944 days = 19 years
**O(n^3) = ??**
(5 million)^3 = 1.25e+20
1.25e+20 / 2.5 million iterations per minute = 5e+13 minutes
5e+13 / 60 = 833,333,333,333 hours = 34,722,222,222 days = 95,129,376 years
Technically, knowing O(...) tells you nothing about the execution time for any specific finite input.
Practically, you can make an estimate, for example in the way you did, but the caveat is that it will only give you the order of magnitude, and only under the assumptions that 1. the constant scaling factor omitted in the O(...) notation is roughly 1 in whatever units you chose (number of iterations here) in both programs/algorithms, and 2. the input size is large enough that the lower-order terms omitted by the O(...) notation are no longer relevant.
Whether these assumptions are good assumptions will depend on the particular programs/algorithms you are looking at. It is trivial to come up with examples where this is a really bad approximation, but there are also many cases where such an estimate may be reasonable.
If you just want to estimate whether the alternate program will execute in a non-absurd time frame (e.g. hours vs. centuries), I think it will often be good enough for that, assuming you did not choose a weird unit and assuming there is nothing in the program that inflates the constant factor hidden by the O(...) notation, e.g. an inner loop with exactly 10000000 iterations.
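The extrapolation in the question can be written as a small Python sketch. It bakes in exactly the assumptions listed above: every basic step of the O(n^k) program is assumed to cost the same as one iteration of the measured O(n) run, and lower-order terms are ignored. The function name `extrapolate_minutes` is hypothetical. With a 2-minute run of 5 million iterations, the per-step cost is 2 / 5,000,000 minutes.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def extrapolate_minutes(measured_minutes, n, k):
    """Estimate the runtime of an O(n^k) program from a measured O(n) run.

    Assumes both programs cost the same per basic step and that lower-order
    terms are negligible: the O(n) run did n steps, the O(n^k) run does n**k.
    """
    minutes_per_step = measured_minutes / n
    return minutes_per_step * n ** k


for k in (2, 3):
    minutes = extrapolate_minutes(2, 5_000_000, k)
    print(f"O(n^{k}): {minutes:.3g} minutes, about {minutes / MINUTES_PER_YEAR:.3g} years")
```

As both answers stress, the result is only an order-of-magnitude guess; a different constant per step changes it proportionally.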
If I have a program that runs over some data in O(n) time, can I semi-accurately guesstimate the O(n^3) runtime from my O(n) run?
No.
There is no such thing as "the" O(n^3) runtime, nor "the" O(n) time. Asymptotic complexity describes how the behavior of a particular program or subprogram scales with input size. You can use it to estimate the performance of the same program at one input size from appropriate measurements of that program at other input sizes, but it gives you no information about any other program's specific performance for a given input size.
In particular, your idea seems to be that the usually-ignored coefficient of the bounding function is a property of the machine, but this is not at all the case. The coefficient is mostly a property of the details of the program. If you estimate it for one program then you know it only for that program. Forget programs with different asymptotic complexity: two programs with the same asymptotic complexity can be constructed that have arbitrarily different absolute performance for any given input size.
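A minimal Python sketch of that last point, using two hypothetical functions that both sum a list in O(n) time and count their own basic steps. The `extra` parameter is an illustrative knob: raising it makes the second function arbitrarily slower than the first without changing its asymptotic class.

```python
def sum_plain(values):
    """O(n): one addition per element; returns (sum, steps)."""
    total, steps = 0, 0
    for v in values:
        total += v
        steps += 1
    return total, steps


def sum_with_busywork(values, extra=1000):
    """Also O(n): performs `extra` useless steps per element; returns (sum, steps)."""
    total, steps = 0, 0
    for v in values:
        for _ in range(extra):
            steps += 1  # wasted work, identical for every element
        total += v
        steps += 1
    return total, steps


data = list(range(1_000))
_, fast_steps = sum_plain(data)
_, slow_steps = sum_with_busywork(data)
print(slow_steps / fast_steps)  # equals extra + 1 for any non-empty input
```

Knowing the constant for `sum_plain` tells you nothing about the constant for `sum_with_busywork`, even though both are O(n).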
Suppose a function has a total of 10N + 10 steps. The function class would just be O(N) then. If I want to improve the function's running time, does that mean decreasing the number of steps, or reducing the function class so that it's less than linear?
Literally, you are reducing a function's running time if you make it run in less time. Normally there are two directions for this. Think of real-life running: you can run in less time by strengthening your muscles (upgrading to a NASA supercomputer), or by shortening the distance you have to run (improving or changing the algorithm to reduce the steps). We focus only on the second direction.
Still, there are many factors to consider, such as: what is the realistic input to your function?
If N is small 99% of the time, then the constant factor matters even if both versions are in the same class O(N).
O(10^6*N) and O(2*N) are both O(N), but when N is smaller than 10^6, the constant factor contributes more to the running time than N does.
If N is usually large, you can still say you have improved the function by reducing the constant factor, but the gain is comparatively negligible (though yes, you are still reducing it). If you need an observable boost, you may need to change the algorithm or the data structures to move the function into a better complexity class (for example, from O(N) to O(lg N)).
Therefore, using your own words: "decrease the number of steps" and "reduce the function class" both reduce the running time of the function, but which one is observable, and thus useful, depends on its usage and other realistic factors.
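To make the two options concrete, here is a Python sketch comparing step counts. `steps_original` is the question's 10N + 10; the other two formulas are hypothetical: a same-class version with half the constant, and an O(log N) rewrite assumed to cost 10 steps per halving plus the same overhead of 10. Halving the constant always buys exactly a factor of 2, while the gain from changing class keeps growing with N.

```python
import math


def steps_original(n):
    return 10 * n + 10  # the question's 10N + 10, O(N)


def steps_halved_constant(n):
    return 5 * n + 5  # hypothetical: still O(N), half the work


def steps_logarithmic(n):
    return 10 * math.ceil(math.log2(n + 1)) + 10  # hypothetical O(log N) rewrite


for n in (10, 1_000, 1_000_000):
    print(n,
          steps_original(n) / steps_halved_constant(n),  # speedup from a smaller constant
          steps_original(n) / steps_logarithmic(n))      # speedup from a better class
```

For small N (e.g. N = 10) the two speedups are close, which is the answer's point about constant factors mattering when inputs are small.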
I am trying to understand computational complexity and how fast a computer can execute instructions depending on the algorithm being used. I found a tutorial on http://www.cs.cmu.edu/~pattis/15-1XX/15-200/lectures/aa/ where it shows different complexity classes and the time it takes to run each depending on how many items it is running.
For example, if a computer can execute 1 instruction every 10^-9 seconds, how do you get those running times? I've been trying to understand the table, but it doesn't go very deep into the calculations.
For instance, for O(1), why is it 10^-7 seconds? Shouldn't it just be 10^-9?
Also, for the other running times I'm not sure how you get those values.
If it were 10^-9 seconds, that would mean the program executes only a single instruction. Usually it doesn't.
Moreover, the table gives approximate figures for comparison.
The other piece of information, given in the initial paragraph, is the actual program under evaluation, which is what you would need to actually calculate the number of instructions executed.
Actually, it would be less confusing if the table had assumed that an O(1) operation is exactly 1 operation. However, above the table it says: "Then the following table gives us an intuitive idea of how running times for algorithms in different complexity classes changes with problem size." So the focus of the entries is on the change in running time as N grows.
So I could explain the data in the table as follows. For each complexity class, it assumes the algorithm takes a certain amount of time for N = 10 and then shows how much that time changes when N becomes 100, 1000, and so on.
For example, for O(1), if you assume it takes 10^-7 seconds for N = 10, it will take the same amount for larger N, because the complexity is independent of N.
Another row that might be confusing is O(N^3): if it takes 10^-5 seconds for N = 10, then for N = 100 this value is multiplied by 10^3, and so on. In fact, for N = 10 the table does not exactly calculate 10^3 x 10^-9; rather, it assumes some running time and focuses only on how the running time changes as N becomes larger.
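The scaling rule described above can be written as a short Python sketch. `scaled_time` is a hypothetical helper: it takes the running time assumed for a base size (10^-7 seconds for O(1) and 10^-5 seconds for O(N^3) at N = 10, as in the rows discussed above) and multiplies it by growth(N) / growth(10). The absolute constant never enters the calculation; only the ratio does.

```python
def scaled_time(base_seconds, base_n, n, growth):
    """Scale a running time assumed at size base_n to size n using a growth function."""
    return base_seconds * growth(n) / growth(base_n)


def constant(n):
    return 1  # O(1): independent of n


def cubic(n):
    return n ** 3  # O(N^3)


for n in (10, 100, 1000):
    print(n, scaled_time(1e-7, 10, n, constant), scaled_time(1e-5, 10, n, cubic))
```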
I am really confused by Big(O) notation. Is Big(O) machine-dependent or machine-independent? (By machine I mean the computer on which we run the algorithm.)
Will sorting 1000 numbers using quicksort on an i3 processor and on an i7 processor take the same time? Why don't we consider the machine and its processor speed when calculating the time complexity? I am a neophyte in this stuff.
Big-O is a measure of scalability, not of speed. It shows you what effect on time and memory it has when you e.g. double the amount of data - does it double the execution, or quadruple it?
Whether you use i7 or i3, double is double. Whether a linear algorithm is fast or slow, double is double.
This also has another implication many people ignore. A complex algorithm such as O(n^3) can be faster than a simple algorithm such as O(n) for a given n that is below a certain limit. Example:
loop n times:
    loop n times:
        loop n times:
            sleep 1 second
is O(n^3), as it has 3 nested loops.
loop n times:
    sleep 10 seconds
is O(n), as it has only one loop. For n = 10, the first program executes for 1000 seconds and the second one for only 100. "So O(n) is good!" one would be tempted to say. But with n = 2, the first, complex program executes in only 8 seconds, while the second, simpler one takes 20! Even for n = 3, the first executes in 27 seconds and the second in 30. So while n is low, a complex program might be able to outperform the simpler one. It's just that as n rises, the complex program's running time grows much faster than the simple one's. For n = 1000, the simple code has risen to only 10000 seconds, but the complex one now takes 1000000000 seconds!
Also, this clearly shows you that complexity is not processor-dependent. A second is a second.
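The numbers in this example can be checked without actually sleeping. In this small Python sketch the two functions simply return the total sleep time of each pseudocode program, and the search finds the first n at which the O(n^3) program becomes slower.

```python
def cubic_program_seconds(n):
    return n ** 3 * 1  # three nested loops of n, 1-second sleep innermost


def linear_program_seconds(n):
    return n * 10  # one loop of n, 10-second sleep


# First n where the O(n^3) program takes longer than the O(n) one.
crossover = next(n for n in range(1, 100)
                 if cubic_program_seconds(n) > linear_program_seconds(n))

for n in (2, 3, crossover, 10, 1000):
    print(n, cubic_program_seconds(n), linear_program_seconds(n))
```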
Big(O) notation is a method of expressing the complexity of an algorithm, and hence the relative time it will take to run. The same algorithm, on the same data, will run faster on a faster processor, but it will still perform the same number of operations. It's used as a way of evaluating the relative efficiency of different algorithms that achieve the same result.
Big O notation is not architecture-dependent in any way; it is a mathematical construct. It is a very limited measure of algorithmic complexity: it only gives you a rough upper bound on how performance changes with data size.
Big(O) is algorithm-dependent. Its job is to help compare the relative costs of various algorithms without the need to consider machine dependencies.
A linear search through an array will, on average, look at about half of the elements if the target is found. For all practical purposes that is O(N/2), which is the same as O(1/2 * N). For comparison purposes you toss away the coefficient, hence it is O(N).
A binary search tree can hold N elements for searching as well. On average (for a balanced tree) it will look through about log base 2 of N nodes to find something, hence you will see its cost described as O(log2(N)).
Pop in small values for N, and there isn't a whole lot of difference between the algorithms. Pop in a large value of N, and it will be clear that the binary tree lookup is much faster.
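To see the difference in numbers, here is a Python sketch that counts how many elements each strategy examines. It uses binary search on a sorted list as a stand-in for a balanced binary search tree (both take about log2(N) steps); the function names are hypothetical. The target is placed in the middle, which roughly matches the average case for linear search.

```python
def linear_search_steps(items, target):
    """Count how many elements a linear search examines before it finds target."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count how many probes a binary search makes on a sorted list."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (10, 1_000, 1_000_000):
    data = list(range(n))
    target = n // 2  # middle element: roughly the average case for linear search
    print(n, linear_search_steps(data, target), binary_search_steps(data, target))
```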
Big(O) is not machine-dependent. It is mathematical notation to denote the complexity of an algorithm. Usually we use these notations in theory to compare algorithms' performance.
I'm trying to understand a particular aspect of Big O analysis in the context of running programs on a PC.
Suppose I have an algorithm that has a performance of O(n + 2). Here if n gets really large the 2 becomes insignificant. In this case it's perfectly clear the real performance is O(n).
However, say another algorithm has an average performance of O(n^2 / 2). The book where I saw this example says the real performance is O(n^2). I'm not sure I get why; I mean, the 2 in this case seems not completely insignificant. So I was looking for a nice clear explanation from the book. The book explains it this way:
"Consider though what the 1/2 means. The actual time to check each value is highly dependent on the machine instruction that the code translates to and then on the speed at which the CPU can execute the instructions. Therefore the 1/2 doesn't mean very much."
And my reaction is... huh? I literally have no clue what that says or more precisely what that statement has to do with their conclusion. Can somebody spell it out for me please.
Thanks for any help.
There's a distinction between "are these constants meaningful or relevant?" and "does big-O notation care about them?" The answer to that second question is "no," while the answer to that first question is "absolutely!"
Big-O notation doesn't care about constants because big-O notation only describes the long-term growth rate of functions, rather than their absolute magnitudes. Multiplying a function by a constant only influences its growth rate by a constant amount, so linear functions still grow linearly, logarithmic functions still grow logarithmically, exponential functions still grow exponentially, etc. Since these categories aren't affected by constants, it doesn't matter that we drop the constants.
That said, those constants are absolutely significant! A function whose runtime is 10^100 n will be way slower than a function whose runtime is just n. A function whose runtime is n^2 / 2 will be faster than a function whose runtime is just n^2. The fact that the first two functions are both O(n) and the second two are O(n^2) doesn't change the fact that they don't run in the same amount of time, since that's not what big-O notation is designed for. O notation is good for determining whether in the long term one function will be bigger than another. Even though 10^100 n is a colossally huge value for any n > 0, that function is O(n), and so for large enough n it will eventually beat the function whose runtime is n^2 / 2, because that function is O(n^2).
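Python's arbitrary-precision integers make it easy to check where that crossover lies. Since n^2 / 2 > 10^100 n exactly when n > 2 × 10^100, the sketch below (with hypothetical cost functions, using `fractions.Fraction` so n^2 / 2 is exact) reports False up to and including n = 2 × 10^100 and True just after it.

```python
from fractions import Fraction

BIG = 10 ** 100


def linear_cost(n):
    """Runtime 10^100 * n: O(n), with a colossal constant."""
    return BIG * n


def half_square_cost(n):
    """Runtime n^2 / 2: O(n^2), with a small constant (exact rational arithmetic)."""
    return Fraction(n * n, 2)


for n in (10, BIG, 2 * BIG, 2 * BIG + 1):
    print(half_square_cost(n) > linear_cost(n))
```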
In summary - since big-O only talks about relative classes of growth rates, it ignores the constant factor. However, those constants are absolutely significant; they just aren't relevant to an asymptotic analysis.
Big O notation is most commonly used to describe an algorithm's running time. In this context, I would argue that specific constant values are essentially meaningless. Imagine the following conversation:
Alice: What is the running time of your algorithm?
Bob: 7n^2
Alice: What do you mean by 7n^2?
What are the units? Microseconds? Milliseconds? Nanoseconds?
What CPU are you running it on? Intel i9-9900K? Qualcomm Snapdragon 845? (Or are you using a GPU, an FPGA, or other hardware?)
What type of RAM are you using?
What programming language did you implement the algorithm in? What is the source code?
What compiler / VM are you using? What flags are you passing to the compiler / VM?
What is the operating system?
etc.
So as you can see, any attempt to indicate a specific constant value is inherently problematic. But once we set aside constant factors, we are able to clearly describe an algorithm's running time. Big O notation gives us a robust and useful description of how long an algorithm takes, while abstracting away from the technical features of its implementation and execution.
Now it is possible to specify the constant factor when describing the number of operations (suitably defined) or CPU instructions an algorithm executes, the number of comparisons a sorting algorithm performs, and so forth. But typically, what we're really interested in is the running time.
None of this is meant to suggest that the real-world performance characteristics of an algorithm are unimportant. For example, if you need an algorithm for matrix multiplication, the Coppersmith-Winograd algorithm is inadvisable. It's true that this algorithm takes O(n^2.376) time, whereas the Strassen algorithm, its strongest competitor, takes O(n^2.807) time. However, according to Wikipedia, Coppersmith-Winograd is slow in practice, and "it only provides an advantage for matrices so large that they cannot be processed by modern hardware." This is usually explained by saying that the constant factor for Coppersmith-Winograd is very large. But to reiterate, if we're talking about the running time of Coppersmith-Winograd, it doesn't make sense to give a specific number for the constant factor.
Despite its limitations, big O notation is a pretty good measure of running time. And in many cases, it tells us which algorithms are fastest for sufficiently large input sizes, before we even write a single line of code.
Big-O notation only describes the growth rate of algorithms in terms of mathematical functions, rather than the actual running time of algorithms on some machine.
Mathematically, let f(x) and g(x) be positive for sufficiently large x.
We say that f(x) and g(x) grow at the same rate as x tends to infinity if
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = L \quad \text{for some constant } 0 < L < \infty.$$
Now let f(x) = x^2 and g(x) = x^2/2. Then $\lim_{x \to \infty} f(x)/g(x) = 2$, so x^2 and x^2/2 have the same growth rate, and we can say O(x^2/2) = O(x^2).
As templatetypedef said, the hidden constants in asymptotic notation are absolutely significant. As an example, merge sort runs in O(n log n) worst-case time and insertion sort runs in O(n^2) worst-case time. But because the hidden constant factors in insertion sort are smaller than those of merge sort, in practice insertion sort can be faster than merge sort for small problem sizes on many machines.
You are completely right that constants matter. In comparing many different algorithms for the same problem, the O numbers without constants give you an overview of how they compare to each other. If you then have two algorithms in the same O class, you would compare them using the constants involved.
But even for different O classes the constants are important. For instance, for multidigit or big integer multiplication, the naive algorithm is O(n^2), Karatsuba is O(n^log_2(3)), Toom-Cook O(n^log_3(5)) and Schönhage-Strassen O(n*log(n)*log(log(n))). However, each of the faster algorithms has an increasingly large overhead reflected in large constants. So to get approximate cross-over points, one needs valid estimates of those constants. Thus one gets, as SWAG, that up to n=16 the naive multiplication is fastest, up to n=50 Karatsuba and the cross-over from Toom-Cook to Schönhage-Strassen happens for n=200.
In reality, the cross-over points not only depend on the constants, but also on processor-caching and other hardware-related issues.
Big O without constants is enough for algorithm analysis.
First, the actual time depends not only on how many instructions are executed but also on the time each instruction takes, which is closely tied to the platform where the code runs. That goes beyond theoretical analysis, so the constant is unnecessary in most cases.
Second, Big O is mainly used to measure how the run time will increase as the problem becomes larger, or how the run time decreases as hardware performance improves.
Third, in situations that call for high-performance optimization, the constants are also taken into consideration.
Nowadays, doing a particular task on a computer does not take a large amount of time unless the input value is very large.
Suppose we want to multiply two 10*10 matrices. We will not have a problem unless we need to do this operation many times; that is when asymptotic notation becomes relevant. And when n becomes very big, the constants make almost no difference to the answer, so we tend to leave them out when calculating the complexity.
Time complexity for O(n+n) reduces to O(2n). Now 2 is a constant. So the time complexity will essentially depend on n.
Hence the time complexity of O(2n) equates to O(n).
Also if there is something like this O(2n + 3) it will still be O(n) as essentially the time will depend on the size of n.
Now suppose there is code which is O(n^2 + n). It will be O(n^2), because as the value of n increases, the effect of n becomes less significant compared to the effect of n^2.
E.g.:
n = 2 => 4 + 2 = 6
n = 100 => 10000 + 100 = 10100
n = 10000 => 100000000 + 10000 = 100010000
As you can see, the second term has less and less effect as the value of n keeps increasing. Hence the time complexity evaluates to O(n^2).
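A short Python sketch of the same table, adding the share of the total contributed by the lower-order n term (which works out to 1/(n + 1)).

```python
def total_cost(n):
    return n ** 2 + n  # O(n^2 + n)


for n in (2, 100, 10_000):
    share_of_n_term = n / total_cost(n)  # fraction of the cost due to the n term
    print(n, total_cost(n), f"{share_of_n_term:.4%}")
```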
Could it be done by keeping a counter to see how many iterations an algorithm goes through, or does the time duration need to be recorded?
The currently accepted answer won't give you any theoretical estimation, unless you are somehow able to fit the experimentally measured times with a function that approximates them. This answer gives you a manual technique to do that and fills that gap.
You start by guessing the theoretical complexity function of the algorithm. You also experimentally measure the actual complexity (number of operations, time, or whatever you find practical), for increasingly larger problems.
For example, say you guess an algorithm is quadratic. Measure (say) the time, and compute the ratio of time to your guessed function (n^2):
for n = 5 to 10000 // n: problem size
    long start = System.time()
    executeAlgorithm(n)
    long end = System.time()
    long totalTime = end - start
    double ratio = (double) totalTime / (n * n)
end
As n moves towards infinity, this ratio...
Converges to zero? Then your guess is too high. Repeat with something smaller (e.g. n log n).
Diverges to infinity? Then your guess is too low. Repeat with something bigger (e.g. n^3).
Converges to a positive constant? Bingo! Your guess is on the money (at least it approximates the theoretical complexity for as large n values as you tried).
Basically this uses the definition of big O notation: f(x) = O(g(x)) <=> f(x) <= c * g(x) for some constant c > 0 and all sufficiently large x. Here f(x) is the actual cost of your algorithm, g(x) is your guess, and c is a constant. So you are experimentally looking for the limit of f(x)/g(x); if your guess hits the real complexity, this ratio will estimate the constant c.
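A runnable Python sketch of this procedure, with two stated substitutions: it measures a deterministic operation count instead of wall-clock time (the answer allows either), and the algorithm under test is a hypothetical pairwise loop that performs n(n - 1)/2 operations. Against the guess n the ratio grows without bound, against n^3 it shrinks towards zero, and against n^2 it settles near 0.5.

```python
def count_pairwise_ops(n):
    """Hypothetical algorithm under test: one operation per pair (i, j) with i < j."""
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            ops += 1
    return ops


def ratios(cost, guess, sizes):
    """cost(n) / guess(n) for increasing n: watch for 0, infinity, or a constant."""
    return [cost(n) / guess(n) for n in sizes]


sizes = [10, 100, 1000]
guesses = {"n": lambda n: n, "n^2": lambda n: n * n, "n^3": lambda n: n ** 3}
for name, guess in guesses.items():
    print(name, ratios(count_pairwise_ops, guess, sizes))
```

To use time instead of operations, replace `count_pairwise_ops` with a function that runs the real algorithm between two `time.perf_counter()` calls and returns the difference.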
Algorithm complexity is defined as (something like): "the number of operations the algorithm does as a function of its input size."
So you need to try your algorithm with various input sizes (e.g. for sorting, try sorting 10 elements, 100 elements, etc.), and count each operation (e.g. assignment, increment, mathematical operation, etc.) the algorithm does.
This will give you a good "theoretical" estimation.
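For example, here is a Python sketch that instruments insertion sort to count element comparisons (the algorithm and counter are illustrative choices, not taken from the answer). On reversed input it performs n(n - 1)/2 comparisons, which is where the O(n^2) worst case comes from; on already-sorted input it performs only n - 1.

```python
def insertion_sort_comparisons(values):
    """Sort a copy of values with insertion sort; return (sorted_list, comparisons)."""
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break
            items[j + 1] = items[j]  # shift the larger element one slot right
            j -= 1
        items[j + 1] = key
    return items, comparisons


for n in (10, 100, 1000):
    _, worst = insertion_sort_comparisons(range(n, 0, -1))  # reversed input
    print(n, worst)
```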
If you want real-life numbers on the other hand - use profiling.
As others have mentioned, the theoretical time complexity is a function of the number of CPU operations done by your algorithm. In general, processor time should be a good approximation of that, up to a constant factor. But the real run time may vary for a number of reasons, such as:
Processor pipeline flushes
Cache misses
Garbage collection
Other processes on the machine
Unless your code is systematically causing some of these things to happen, with enough statistical samples you should get a fairly good idea of the time complexity of your algorithm from the observed runtime.
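A Python sketch of that sampling idea: time each input size several times with `time.perf_counter` and keep the median, which is less sensitive to an occasional cache-miss burst, garbage-collection pause, or competing process than a single run. `quadratic_work` is a hypothetical stand-in for the algorithm being measured; if it really is quadratic, doubling n should roughly quadruple the median time, though timings on a real machine will vary.

```python
import statistics
import time


def median_runtime(func, arg, repeats=7):
    """Run func(arg) `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def quadratic_work(n):
    """Hypothetical algorithm under test: n * (n - 1) / 2 inner-loop steps."""
    total = 0
    for i in range(n):
        for _ in range(i):
            total += 1
    return total


for n in (200, 400, 800):
    print(n, median_runtime(quadratic_work, n))
```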
The best way would be to actually count the number of "operations" performed by your algorithm. The definition of "operation" can vary: for an algorithm such as quicksort, it could be the number of comparisons of two numbers.
You could measure the time taken by your program to get a rough estimate, but various factors could cause this value to differ from the actual mathematical complexity.
Yes.
You can track both: actual performance and the number of iterations.
Might I suggest using the ANTS profiler? It will provide this kind of detail while you run your app with "experimental" data.