I am trying to understand computational complexity and how fast a computer can execute instructions depending on the algorithm being used. I found a tutorial at http://www.cs.cmu.edu/~pattis/15-1XX/15-200/lectures/aa/ which shows different complexity classes and the time each takes to run depending on how many items are being processed.
For example, if a computer can execute 1 instruction every 10^-9 seconds, how do you get those running times? I've been trying to understand the table, but it doesn't go into the calculations in any depth.
For instance, O(1) - why is it 10^-7 seconds - shouldn't it just be 10^-9?
I'm also not sure how you get the values for the other running times.
When you say 10^-9 seconds, that would mean there is only one instruction in the given program. Usually there isn't.
Moreover, the table is giving some approximate information for comparison.
Another piece of information, given in the initial paragraph, is that you would need the actual program under evaluation in order to calculate the number of instructions executed.
Actually, it would be less confusing if the table had assumed that an O(1) operation is exactly 1 instruction. However, above the table it says: "Then the following table gives us an intuitive idea of how running times for algorithms in different complexity classes changes with problem size." So the focus of the entries is on the change in running time as N grows.
So I would explain the data in the table as follows: for each complexity class, assume it takes a certain amount of time for N = 10, and then look at how much that time changes when N becomes 100, 1000, and so on.
For example, in the case of O(1), if you assume it takes 10^-7 seconds for N = 10, it will take the same amount of time for larger N's, because the complexity is independent of N.
Another row that might be confusing is O(N^3): you can say that if it takes 10^-5 seconds for N = 10, then for N = 100 this value is multiplied by 10^3, and so on. In other words, the table does not literally compute 10^3 x 10^-9 for N = 10; it assumes some running time for that size and only focuses on how the running time changes when N becomes larger.
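For what it's worth, here is a small sketch of how such a table can be generated. The 1e-7 baseline for N = 10 is an arbitrary assumption of mine, not a value from the tutorial; only the scaling matters:

    import math

    baseline_n = 10
    baseline_time = 1e-7   # assumed running time for N = 10; only the scaling matters

    growth = {
        "O(1)":       lambda n: 1,
        "O(log N)":   lambda n: math.log2(n),
        "O(N)":       lambda n: n,
        "O(N log N)": lambda n: n * math.log2(n),
        "O(N^2)":     lambda n: n ** 2,
        "O(N^3)":     lambda n: n ** 3,
    }

    for name, f in growth.items():
        # scale the assumed baseline by how the growth function changes with N
        times = [baseline_time * f(n) / f(baseline_n) for n in (10, 100, 1000, 10000)]
        print(name, ["%.1e" % t for t in times])

Each row simply multiplies the assumed N = 10 time by f(N)/f(10), which is exactly the "only the change matters" reading described above.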
Related
If I have a program that runs over some data in O(n) time, can I semi-accurately guestimate the O(n^3) runtime from my O(n) run?
**O(n) = 5 million iterations # 2 minutes total runtime**
**O(n^2) = ??**
(5 million)^2 = 2.5e+13 iterations
2.5e+13 / 5 million = 5 million times the original work
5 million * 2 minutes = 10 million minutes = ~166,667 hours = ~6,944 days = ~19 years
**O(n^3) = ??**
(5 million)^3 = 1.25e+20 iterations
1.25e+20 / 5 million = 2.5e+13 times the original work
2.5e+13 * 2 minutes = 5e+13 minutes = ~8.3e+11 hours = ~3.5e+10 days = ~95 million years
Technically knowing O(...) doesn't tell you anything about any execution time for specific finite inputs.
Practically, you can make an estimate, for example in the way you did, but the caveat is that it will only give you the order of magnitude, and only under two assumptions: 1. that the constant scaling factor omitted in the O(...) notation is roughly 1, in whatever unit you chose (number of iterations here), for both programs/algorithms, and 2. that the input is large enough that the lower-order terms omitted by the O(...) notation are no longer relevant.
Whether these assumptions are good assumptions will depend on the particular programs/algorithms you are looking at. It is trivial to come up with examples where this is a really bad approximation, but there are also many cases where such an estimate may be reasonable.
If you just want to estimate whether the alternate program will execute in a non-absurd time frame (e.g. hours vs. centuries), I think it will often be good enough for that, assuming you did not choose a weird unit and there is nothing in the program that blows up the hidden constant factor, e.g. an inner loop with exactly 10000000 iterations.
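To make that concrete, here is a minimal sketch of such an order-of-magnitude extrapolation. The 5-million-iteration / 2-minute figures are taken from the question; everything else follows the two assumptions above:

    measured_n = 5_000_000     # input size of the measured O(n) run
    measured_minutes = 2.0     # measured wall-clock time of that run

    minutes_per_iteration = measured_minutes / measured_n

    for name, iterations in [("O(n)",   measured_n),
                             ("O(n^2)", measured_n ** 2),
                             ("O(n^3)", measured_n ** 3)]:
        minutes = iterations * minutes_per_iteration
        years = minutes / 60 / 24 / 365
        print(f"{name}: ~{minutes:.3g} minutes (~{years:.3g} years)")

The output is only an order-of-magnitude guess; if either assumption fails, the real numbers can be wildly different.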
If I have a program that runs over some data in O(n) time, can I semi-accurately guestimate the O(n^3) runtime from my O(n) run?
No.
There is no such thing as "the O(n^3) runtime", nor "the O(n) time". Asymptotic complexity speaks to how the behavior of a particular program or subprogram scales with input size. You can use that to estimate the performance of the same program at one input size from appropriate measurements of its performance at other input sizes, but it does not give you any information about any other program's specific performance for a given input size.
In particular, your idea seems to be that the usually-ignored coefficient of the bounding function is a property of the machine, but this is not at all the case. The coefficient is mostly a property of the details of the program. If you estimate it for one program then you know it only for that program. Forget programs with different asymptotic complexity: two programs with the same asymptotic complexity can be constructed that have arbitrarily different absolute performance for any given input size.
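For example (a toy of my own, not from the question), both of the functions below are O(n), yet one does a thousand times more work per element at every input size:

    def sum_plain(xs):
        # O(n): one pass over the data
        total = 0
        for x in xs:
            total += x
        return total

    def sum_padded(xs):
        # Still O(n): the inner loop is a fixed 1000 iterations, so it only
        # changes the constant factor, not the asymptotic class.
        total = 0
        for x in xs:
            for _ in range(1000):
                total += x
        return total // 1000

Measuring sum_plain tells you essentially nothing about how long sum_padded takes for a given input size, even though both are "O(n)".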
Let's say you had an algorithm with n^(-1/2) complexity, say a scientific algorithm where one sample doesn't give much information, so it takes ages to process, but having many samples to cross-reference makes it faster. Would you represent that as O(n^(-1/2))? Is that even possible theoretically? TL;DR: can you have a decreasing (inverse) time complexity?
You could define O(n^(-0.5)) using this set:
O(n^(-0.5)) := { g(n) : there exist positive constants c and N such that 0 <= g(n) <= c*n^(-0.5) for all n > N }.
The function n^(-1), for example, belongs to this set.
None of the elements of the set above, however, could be an upper bound on the running time of an algorithm.
Note that for any constant c: if n > c^2, then c*n^(-0.5) < c*(c^2)^(-0.5) = 1.
This means that, for large enough input, your algorithm would have to perform fewer than one basic operation. Since it must execute a whole (natural) number of basic operations, that means it performs exactly 0 operations - nothing at all.
A decreasing running time doesn't make sense in practice (even less if it decreases to zero). If that existed, you would find ways to add dummy elements and increase N artificially.
But most algorithms have at least linear, O(N), complexity (whenever every data element influences the final solution); even if not, just the representation of N gets longer and longer, which will eventually increase the running time (as in O(log N)).
I am really confused about Big-O notation. Is Big-O machine dependent or machine independent? (Machine in the sense of the computer on which we run the algorithm.)
Will sorting 1000 numbers using quicksort on an i3 processor and on an i7 processor take the same time? Why don't we consider the machine and its processor speed when calculating the time complexity? I am a neophyte in this stuff.
Big-O is a measure of scalability, not of speed. It shows you the effect on time and memory when you e.g. double the amount of data: does it double the execution time, or quadruple it?
Whether you use i7 or i3, double is double. Whether a linear algorithm is fast or slow, double is double.
This also has another implication many people ignore. A complex algorithm such as O(n^3) can be faster than a simple algorithm such as O(n) for a given n that is below a certain limit. Example:
    import time

    def complex_program(n):
        for i in range(n):            # three nested loops over n
            for j in range(n):
                for k in range(n):
                    time.sleep(1)     # one second of "work"
is O(n^3), as it has 3 nested loops.
    def simple_program(n):
        for i in range(n):            # a single loop over n
            time.sleep(10)            # ten seconds of "work"
is O(n), as it only has one loop. For n = 10 the first program executes for 1000 seconds, and the second one executes for only 100. "So, O(n) is good!" one would be tempted to say. But if you have n = 2, the first, complex program executes in only 8 seconds, while the second, simpler one executes for 20! Even for n = 3, the first executes in 27 seconds, the second one in 30. So while n is low, a complex program might be able to outperform the simpler one. It's just that as n rises, the complex program gets slower much faster (if that makes sense) than the simple one. For n = 1000, the simple code has risen to only 10,000 seconds, but the complex one is now 1,000,000,000 seconds!
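Tabulating the total sleep time of those two programs shows the crossover point directly:

    for n in (2, 3, 10, 1000):
        cubic_seconds = n ** 3 * 1    # complex program: n^3 sleeps of 1 second each
        linear_seconds = n * 10       # simple program: n sleeps of 10 seconds each
        print(f"n={n}: O(n^3) program {cubic_seconds}s, O(n) program {linear_seconds}s")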
Also, this clearly shows you that complexity is not processor-dependent. A second is a second.
EDIT: Also, you might want to read this question, where Big-O is explained in a number of very high-quality answers.
Big-O notation is a way of describing the complexity of an algorithm, and hence the relative time it will take to run. The same algorithm, on the same data, will run faster on a faster processor, but will still take the same number of operations. It's used as a way of evaluating the relative efficiency of different algorithms for achieving the same result.
Big-O notation is not architecture-dependent in any way; it is a mathematical construct. It is a very limited measure of algorithmic complexity: it only gives you a rough upper bound for how performance changes with data size.
Big-O is algorithm dependent. Its job is to help compare the relative costs of various algorithms, without the need to consider machine dependencies.
Linear search through an array will, on average, look at about half of the elements if the item is found. For all practical purposes that is O(N/2), which is the same as O(1/2 * N). For comparison, you toss away the coefficient, hence it is O(N) in use.
A binary tree can also hold N elements for searching. On average it will look through about log base 2 of N of them to find something, hence you will see it described as costing O(log2(N)).
Pop in small values for N, and there isn't a whole lot of difference between the two algorithms. Pop in a large value of N, and it will be clear that the binary tree lookup is much faster.
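A quick back-of-the-envelope comparison of the two, counting comparisons rather than seconds (since the machine doesn't matter):

    import math

    for n in (10, 1_000, 1_000_000, 1_000_000_000):
        linear_comparisons = n / 2                    # average for a successful linear search
        binary_comparisons = math.ceil(math.log2(n))  # balanced binary search/tree lookup
        print(f"N={n}: linear ~{linear_comparisons:.0f}, binary ~{binary_comparisons}")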
Big-O is not machine dependent. It is mathematical notation to denote the complexity of an algorithm. Usually we use these notations in theory to compare the performance of algorithms.
I am not looking for an algorithm to solve the question below. I just want someone to comment on my answer.
I was asked the following question in an interview:
How to get top 100 numbers out of a large set of numbers (can't fit in memory)
And this is what I said:
Divide the numbers into batches of 1000 each. Sort each batch in "O(1)" time. The total time taken is O(n) up to this point. Now take the first 100 numbers from the 1st and 2nd batches (in O(1)). Take the first 100 from those and the 3rd batch, and so on. This will take O(n) in total, so it is an O(n) algorithm (a rough code sketch follows the calculation below).
The interviewer replied that sorting a batch of 1000 numbers won't take O(1) time, and neither will picking the first 100 out of a batch. After a lot of discussion he said he doesn't have a problem with the algorithm taking O(n) time; he just has a problem with me saying that sorting a batch takes O(1) time.
My explanation was that 1000 doesn't depend on the input size (n). Irrespective of what n is, I'll always make batches of 1000 numbers, and if you have to calculate it, sorting one batch takes O(1000*log 1000), which is essentially O(1).
If you have to make proper calculations, it would be
1000*log 1000 to sort one batch
sort (n/1000) such batches
takes 1000 * log 1000 * n/1000 = O(n*log(1000)) time = O(n) time
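A rough sketch of the batching approach described above (the function name and the streaming iterator are just for illustration):

    from itertools import islice

    def top_100_batched(numbers, batch_size=1000):
        """Keep only the best 100 seen so far, merging in one batch at a time."""
        numbers = iter(numbers)
        best = []                                    # current top 100
        while True:
            batch = list(islice(numbers, batch_size))
            if not batch:
                return sorted(best, reverse=True)
            # Sorting a fixed-size batch costs O(1000 * log 1000), i.e. a constant.
            merged = sorted(best + batch, reverse=True)
            best = merged[:100]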
I also asked a lot of my friends about this, and although they agreed with me, it was only partially.
So I want to know if my reasoning is 100% accurate (please criticize even if it is 99% correct).
Just remember, this post is not asking for the answer to the above posted question. I have already found a better answer at Retrieving the top 100 numbers from one hundred million of numbers
The interviewer is wrong, but it's useful to consider why. What you're saying is correct, but there is an unstated assumption that you depend on. Possibly, the interviewer is making a different assumption.
If we say that sorting 1000 numbers is O(1), we're being a bit informal. Specifically, what we mean is that, in the limit as N goes to infinity, there is a constant greater than or equal to the cost of sorting the 1000 numbers. Since the cost of sorting the fixed-size set is independent of N, the limit isn't going to depend on N, either. Thus, it's O(1) as N goes to infinity.
A generous interpretation is that the interviewer wanted you to treat the sorting step differently. You could be more precise and say that it was O(M*log(M)) as M goes to infinity (or M goes to N, if you prefer), with M representing the size of the batches of numbers. That would make an overall O(N*log(M)) for your approach, as N and M both approach infinity. Of course, that wasn't the limit you described.
Strictly speaking, it's meaningless to say that something is O(1) without specifying the limit. One usually doesn't need to bother for algorithms, because it's clear from the context: the limit commonly taken is as a single parameter approaches infinity. Your description is correct when considering only N, but you could consider more than just N.
It is indeed O(n), but the constants are very high, especially considering you will need to read each element from the filesystem twice [once in the sort, and once in the second phase], and file system access is much slower than memory access. Since this will probably be the bottleneck of the algorithm, your solution will likely run about twice as slowly as one using a priority queue.
Note that for a constant top 100, even the naive solution is O(n):
    result = []
    for i in range(100):
        x = max(numbers)        # find the highest remaining element
        numbers.remove(x)       # remove it from the list
        result.append(x)        # append it to the solution
This solution is also O(n), since you have 100 iterations, and in each iteration you need 2 traversals of the list [with some optimisations, 1 traversal per iteration can be done]. So the total number of traversals is strictly smaller than 1000, and there are no other factors that depend on the size, thus the solution is O(n) - but it is definitely a terrible solution.
I think the interviewer meant that your solution, though O(n), has very large constants.
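For reference, a sketch of the priority-queue alternative mentioned above, using a bounded min-heap so only 100 numbers are ever kept in memory (assuming the input can be streamed):

    import heapq

    def top_100_heap(numbers):
        heap = []                              # min-heap; heap[0] is the smallest of the current top 100
        for x in numbers:
            if len(heap) < 100:
                heapq.heappush(heap, x)
            elif x > heap[0]:
                heapq.heapreplace(heap, x)     # drop the current smallest, insert x
        return sorted(heap, reverse=True)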
For example, say n = Integer.MAX_VALUE or 2^123; then log(n) is about 32 or 123, a small integer. Isn't that O(1)?
What is the difference? I think the reason is that O(1) is constant but O(log(n)) is not. Any other ideas?
If n is bounded above, then complexity classes involving n make no sense. There is no such thing as "in the limit as 2^123 approaches infinity", except in the old joke that "a pentagon approximates a circle, for sufficiently large values of 5".
Generally, when analysing the complexity of code, we pretend that the input size isn't bounded above by the resource limits of the machine, even though it is. This does lead to some slightly odd things going on around log n, since if n has to fit into a fixed-size int type, then log n has quite a small bound, so the bound is more likely to be useful/relevant.
So sometimes, we're analysing a slightly idealised version of the algorithm, because the actual code written cannot accept arbitrarily large input.
For example, your average quicksort formally uses Theta(log n) stack in the worst case, obviously so with the fairly common implementation that call-recurses on the "small" side of the partition and loop-recurses on the "big" side. But on a 32 bit machine you can arrange to in fact use a fixed-size array of about 240 bytes to store the "todo list", which might be less than some other function you've written based on an algorithm that formally has O(1) stack use. The morals are that implementation != algorithm, complexity doesn't tell you anything about small numbers, and any specific number is "small".
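A sketch of that implementation detail: an in-place quicksort that recurses only into the smaller partition and loops over the larger one, so the call stack never grows beyond O(log n):

    def quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            p = partition(a, lo, hi)
            # Recurse on the smaller side, keep looping on the larger side.
            if p - lo < hi - p:
                quicksort(a, lo, p - 1)
                lo = p + 1
            else:
                quicksort(a, p + 1, hi)
                hi = p - 1

    def partition(a, lo, hi):
        # Lomuto scheme: a[hi] is the pivot; returns the pivot's final index.
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i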
If you want to account for bounds, you could say that, for example, your code to sort an array is O(1) running time, because the array has to be below the size that fits in your PC's address space, and hence the time to sort it is bounded. However, you will fail your CS assignment if you do, and you won't be providing anyone with any useful information :-)
Obviously, if you know that the input will always have a fixed number of elements, the algorithm will always run in constant time. Big-O notation is used to denote worst-case running time, which describes the limit as the number of elements grows infinitely large.
The difference is that n isn't fixed. The idea behind Big-O notation is to get an idea of how the size of the input affects the running time (or memory usage). So if an algorithm always takes the same amount of time, whether n = 1 or n = Integer.MAX_VALUE, we say it is O(1). If the algorithm takes a unit of time longer each time the input size doubles, then we say it is O(log n).
Edit: to answer your specific question on the difference between O(1) and O(logn), I'll give you an example. Let's say we want an algorithm that will find the min element in an unsorted array. One approach is to go through each element and keep track of the current min. Another approach is to sort the array and then return the first element.
The first algorithm is O(n), and the second algorithm is O(nlogn). So let's say we start with an array of 16 elements. The first algorithm will run in time 16, the second algorithm will run in time 16*4. If we increase it to 17, then it becomes 17 and 17*4. We might naively say that the second algorithm takes 4 times as long as the first algorithm (if we treat the logn component as constant).
But let's look at what happens when our array contains 2^32 elements. Now the first algorithm takes 2^32 time to complete, where our second algorithm takes 32*2^32 time to complete. It takes 32 times as long. Yes, it's a small difference, but it is still a difference. If the first algorithm takes 1 minute, the second algorithm will take over half an hour!
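In code, the two approaches from that example look roughly like this (a toy sketch; the interesting part is the O(n) vs O(n log n) work, not the implementations themselves):

    def find_min_scan(xs):
        # O(n): one pass, remembering the smallest element seen so far
        smallest = xs[0]
        for x in xs[1:]:
            if x < smallest:
                smallest = x
        return smallest

    def find_min_sorted(xs):
        # O(n log n): sort a copy of the array, then take its first element
        return sorted(xs)[0]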
I think you will get a better idea if it is called O(n^0).
It is a scaling function depending on the input variable N. It is a function, not a number; you should never assume any particular number for the variable N.
It is just like saying that a function f(x) is 3 because f(100) = 3; that is wrong. It is a function, not any particular number. A constant function f(x) = 1 is still a function; it will never be equal to another function g(x) = N, i.e. g(x) ≠ f(x).
It's the growth rate that you want to look at. O(1) implies no growth at all, while O(log n) does have growth. Even though the growth is small, it is still growth.
You're not thinking big enough. Any algorithm that runs on a computer will either run forever or terminate after some finite number of steps; since the computer is only a finite state machine, you cannot write algorithms that run for an arbitrary amount of time and then terminate. By that argument, Big-O notation is only theoretical and has no purpose in a real-life computer program. Even O(2^n) hits an upper limit at O(2^INT_MAX), which is equivalent to O(1).
Realistically, though, Big-O can help you out if you know the constant factors. Even if an algorithm has an upper bound of O(log n), and n can have 32 bits, that could mean the difference between a request taking 1 second and 32 seconds.
Big-O shows how running time (or memory, etc) changes as the size of problem changes.
When size of the problem gets 10 times bigger, an O(n) solution takes 10 times as long, an O(log(n)) solution takes a bit longer, and an O(1) solution takes the same time: O(1) means 'changes as fast as constant 1', but constants don't change.
Familiarize yourself with the big-O notation in a bit more detail.
There is a reason why you leave the "O(n)" in and consider dropping the "O(log n)". They are both "constants": the former is less than 2^32, and the latter is less than 32. But you nevertheless have a natural feeling that you can't call O(n) O(1).
However, if log(n) < 32, it means that an O(n*log n) algorithm runs up to thirty-two times slower than its O(n) version. Big enough to keep writing the "log n", isn't it?