Once I read in a scientific paper:
The computational complexity of the algorithm is O(N^d), where N
is the number of data, d is the dimension. Hence with fixed
dimension, the algorithm complexity is polynomial.
Now, this made me think that (if I'm not mistaken) big-O notation is defined in terms of the size of the input (e.g. the number of input bits). Thus if I fix the dimension of the data, it is natural to arrive at a polynomial bound. Moreover, if I also fixed N, the number of inputs, I would arrive at an O(1) bound; see the connected post:
Algorithm complexity with input is fix-sized
My question is: do you think this is a valid argument for polynomial complexity? Can one really fix the dimension (or even the input size) and claim polynomial complexity?
Yes, that's a reasonable thing to do.
It really depends on the initial problem, but in most cases I would say fixing the number of dimensions is reasonable. I would expect the paper to claim something like "polynomial complexity for practical purposes", or to present some argument for why limiting d is reasonable.
You can compare this with a solution of complexity O(d^N), where fixing the number of dimensions does not make the solution polynomial. So the one presented is clearly better when d is small.
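For illustration, here is a quick sketch (my own, not from the paper) of how differently N^d and d^N behave once d is held fixed; the value d = 3 is an arbitrary choice:

```python
# Illustrative sketch (not from the paper): how N^d and d^N grow once the
# dimension d is held fixed at an arbitrary value, here d = 3.
d = 3
for N in (10, 100, 1000):
    poly = N ** d            # polynomial in N once d is fixed
    expo = d ** N            # still exponential in N even though d is fixed
    print(f"N={N:>5}  N^d = {poly:>10}  d^N has {len(str(expo))} digits")
```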
As a quick recap from university:
Big-O notation is just an upper bound on how your algorithm performs.
Mathematically, f(x) = O(g(x)) means that there exist constants k > 0 and x0 such that
f(x) <= k·g(x) for all x > x0
To answer your question: you cannot fix N, which is the independent variable.
If you fix N, say N < 100, you can indeed claim O(1),
because by the definition above we can pick a large enough k so that f(N) <= k·g(N) holds for all N < 100.
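For what it's worth, the definition above is easy to check numerically. The sketch below uses a made-up f(N) = 5N^2 + 3N (purely for illustration) to show both points: an ordinary O(N^2) bound, and the "fixed N" trick that turns any bound into O(1):

```python
# Made-up example: f(N) = 5N^2 + 3N is O(N^2), witnessed by k = 6, x0 = 3.
f = lambda N: 5 * N**2 + 3 * N
k, x0 = 6, 3
assert all(f(N) <= k * N**2 for N in range(x0 + 1, 10_000))

# The "fixed N" trick from the answer: if N is capped below 100, a single
# constant K bounds f, so the same definition formally yields O(1).
K = max(f(N) for N in range(1, 100))
assert all(f(N) <= K * 1 for N in range(1, 100))
print("both bounds hold")
```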
This only works for some algorithms. It is not clear to me what the "dimension" should be in some cases.
E.g. SubsetSum is NP-complete, therefore no polynomial-time algorithm is known for it. But the input is just N numbers. You could also see it as N numbers of bit length d; with d fixed, the known (pseudo-polynomial) algorithm does run in time polynomial in N.
The same holds for the Shortest Vector Problem (SVP) on lattices. The input is an N x N basis (let's say with integer entries) and you look for the shortest non-zero vector. This is also a hard problem, and no polynomial-time algorithm is known for it yet.
For many problems it's not just the size of the input data that makes the problem difficult, but certain properties or parameters of that data. E.g. many graph problems have their complexity given in terms of the number of nodes and the number of edges separately.
Sometimes the difference between these parameters can be dramatic. For example, with something like O(n^d) the complexity is polynomial when n grows, but exponential when d grows.
If you now happen to have an application where you know that the value of a parameter like the dimension is always the same, or has a (small) maximal value, then regarding this parameter as fixed can give you useful insight. Statements like these are therefore very common in scientific papers.
However, you cannot just fix any parameter. "Memory is finite, therefore sorting data takes constant time" is technically true but useless, because the bound on that parameter is so large that viewing it as fixed does not give you any insight.
So fixing all parameters is usually not an option, because there has to be at least one aspect in which the size of your data varies. It can be an option if your complexity grows very slowly.
E.g. data structures with O(log n) operations are sometimes considered to have effectively constant-time operations if the hidden constant is also quite small. Or union-find structures, where the amortized complexity of the operations is O(α(n)), with α the inverse of the Ackermann function: a function growing so slowly that it never gets above 10 for any input size n that any imaginable hardware could ever handle.
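For reference, here is a minimal union-find sketch (path compression plus union by size), the structure whose amortized cost per operation is the O(α(n)) mentioned above; this is a generic textbook version, not tied to any particular library:

```python
# Minimal union-find with path compression and union by size; the amortized
# cost per operation is O(α(n)), effectively constant in practice.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:     # walk up to the root
            root = self.parent[root]
        while self.parent[x] != root:        # path compression on the way back
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:    # attach the smaller tree to the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

uf = UnionFind(10)
uf.union(1, 2)
uf.union(2, 3)
print(uf.find(1) == uf.find(3))              # True
```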
Suppose we have a random array of integers of size n and we need to calculate its sum.
Claim: the best algorithm does that in O(n).
But I claim we can do this in O(1). Why?
We know for sure that n is bounded (since it's an int, and ints are finite), which means I can sum all the elements in fewer than 2,147,483,647 steps.
Why isn't every algorithm O(1)?
Because we choose to ignore the limitations (i.e. assume resources to be unlimited) of concrete computers when we analyse complexity of algorithms.
Asymptotic complexity gives us useful information about how the complexity grows. Merely concluding that there is a constant limit because of the hardware and ignoring how the limit is reached doesn't give us valuable information.
Besides, the limit that you perceive is much, much higher in reality. Computers aren't limited to representing at most 2,147,483,647 integer values. Using suitable data structures, computers can represent arbitrarily large numbers, until memory runs out... but memory can be streamed from disk, and there are data centers that easily provide hundreds of terabytes of storage.
Although, to be fair: If we allow arbitrary length integers, then the complexity of the sum is worse than linear because even a single addition has linear complexity.
Once we take concrete hardware into consideration, it makes more sense to choose to use concrete units in the analysis: How many seconds does the program take for some concrete input on this particular hardware? And the way to find the answer is not math, but concrete measurement.
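As a concrete (if crude) illustration of "measure, don't just reason": timing the sum for growing array sizes shows the roughly linear growth that O(n) predicts. Only the standard library is assumed; the absolute numbers will of course differ per machine.

```python
# Crude measurement sketch: the total time grows roughly linearly with n,
# which is exactly the information O(n) conveys. Absolute numbers depend
# entirely on the machine running this.
import timeit

for n in (10_000, 100_000, 1_000_000):
    data = list(range(n))
    t = timeit.timeit(lambda: sum(data), number=50)
    print(f"n = {n:>9}: {t:.4f} s for 50 runs")
```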
Why isn't every algorithm O(1)?
TL;DR: Because Big O notation is used to characterize an algorithm with regard to how its cost behaves as its input grows.
Informally, you can think about it as a framework that humans have invented to classify algorithms. If that framework yielded O(1) for every algorithm, it would defeat its own purpose in the first place, i.e. classifying algorithms.
More detailed answer
Let us start by clarifying what Big O notation is in the current context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting
behavior of a function when the argument tends towards a particular
value or infinity. (..) In computer science, big O notation is used to classify algorithms
according to how their run time or space requirements grow as the
input size grows.
The following statement is not accurate:
But, I claim we can do this in O(1). why? we know for sure that n is locked in some field (since it's int and int is finite) which means I can sum all elements in less than 2,147,483,647 steps?
One cannot simply compute "O(2,147,483,647)" and claim "O(1)", since Big O notation does not represent a single function but rather a set of functions with a certain asymptotic upper bound; as one can read from the source:
Big O notation characterizes functions according to their growth
rates: different functions with the same growth rate may be
represented using the same O notation.
Informally, in computer-science time-complexity and space-complexity theories, one can think of the Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
So the complexity is O(n), because with an increment of the input the cost grows linearly rather than staying constant.
A theory in which every algorithm had complexity O(1) would be of little use; acknowledge that.
In the theory of complexity, N is an unbounded variable so that you do obtain a non-trivial asymptotic bound.
For practical algorithms, the asymptotic complexity is useful in that it does model the actual cost for moderate values of N (well below the largest int).
In some pathological cases (such as the most sophisticated matrix multiplication algorithms), N must exceed the capacity of an int before the asymptotic behavior becomes beneficial.
Side remark:
I recall a paper claiming O(1) time for a certain operation. That operation involved a histogram of pixel values, the range of which is indeed constant. But this was a misleading trick, because the hidden constant was proportional to 256 for an 8-bit image and 65536 for a 16-bit one, making the algorithm dead slow. Claiming O(H), where H is the number of bins, would have been more informative and more honest.
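To make the remark concrete, here is a hypothetical sketch of the kind of "O(1)" step being criticized: finding the median grey level by scanning a full histogram. The loop is bounded by the number of bins H, so calling it O(1) hides exactly that factor; O(H) would be the honest statement. The function and its names are my own illustration, not the paper's.

```python
# Hypothetical illustration: the "constant-time" step is really O(H),
# where H is the number of histogram bins (256 for 8-bit, 65536 for 16-bit).
def histogram_median(hist, total):
    """hist: list of H bin counts; total: number of pixels counted in hist."""
    cumulative = 0
    for value, count in enumerate(hist):     # up to H iterations
        cumulative += count
        if 2 * cumulative >= total:
            return value
    return len(hist) - 1

hist = [0] * 256                             # toy 8-bit histogram, 100 pixels
hist[10], hist[200] = 30, 70
print(histogram_median(hist, total=100))     # -> 200
```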
You choose what N is and which runtime dependency you are interested in. You can say:
"Well, computers have limited resources, so I simply consider those limits as constant factors, and now all my algorithms are O(1)."
However, this view is of limited use, because we want to use complexity to classify algorithms to see which performs better and which worse, and a classification that puts all algorithms in the same bucket does not help with that.
The integers are countably infinite by definition, so you cannot prove termination in a fixed number of steps on the basis of n alone. If you redefine the integers as a bounded interval of countable numbers, then you can claim O(1), but only under that redefinition.
IMO the useful part of O-notation is the information about time complexity in relation to the input. In a scenario where my input is bounded, I just focus on the behavior within the bounds, and that is O(n) in this case. You can claim O(1), but that strips the statement of information.
Say something runs at n^0.5 vs log n. It's true that this obviously isn't fast (the log n beats it). However, what about n^0.1 or n^0.01? Would it still be preferable to go with the logarithmic algorithm?
I guess my question is: how small would the exponent have to be before it's worth switching to the "exponential" (power-function) algorithm?
The exponent does not matter. It is n that matters.
No matter how small the exponent of the power-function algorithm is, the logarithmic algorithm will beat it if n is large enough.
So, it all depends on your n. Substitute a specific n, calculate the actual running cost of your power-function algorithm versus your logarithmic algorithm, and see who the winner is.
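Doing exactly that substitution is a one-liner. The sketch below assumes a base-2 logarithm and constant factors of 1, both of which matter in practice, so treat the crossover point as illustrative only:

```python
# Compare n^0.1 against log2(n) at a few sample sizes (constant factors
# assumed to be 1; with real constants the crossover moves around).
import math

for n in (10, 10**3, 10**6, 10**10, 10**15, 10**18):
    power, logarithm = n ** 0.1, math.log2(n)
    winner = "n^0.1" if power < logarithm else "log n"
    print(f"n = 1e{round(math.log10(n)):>2}: n^0.1 = {power:7.2f}, log2 n = {logarithm:7.2f} -> {winner} wins")
```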
Asymptotic complexity can be a bit misleading.
A function that's proportional to log n will be less than one that's proportional to (say) n^0.01 . . . once n gets large enough.
But for smaller values of n, all bets are off, because the constant of proportionality can play a large role. For example, sorting algorithms that have O(n^2) worst-case complexity are often better choices, when n is known to be small, than sorting algorithms that have O(n log n) complexity, because the latter are typically more complicated and therefore have more overhead. It's only when n grows larger that the latter start to win out.
In general, performance decisions should be based on profiling and testing, rather than on purely mathematical arguments about what should theoretically be faster.
In general, given two sublinear algorithms you should choose the one with the smallest constant multiplier. Since complexity theory won't help you with that, you will have to write the programs as efficiently as possible and benchmark them. This necessity might lead you to choose the algorithm which is easier to code efficiently, which might also be a reasonable criterion.
This is of course not the case with superlinear functions, where a large n magnifies costs. But even then, you might find an algorithm whose theoretical efficiency is superior but which requires a very large n to be superior to a simpler algorithm, perhaps so large that it will never be tried.
You're talking about big O, which tends to refer to how an algorithm scales as a function of its input size, as opposed to how fast it is in an absolute-time sense. On certain data sets an algorithm with a worse big O may perform much better in absolute time.
Let's say you have two algorithms: one whose cost is exactly n^0.1 and one whose cost is exactly log(n).
Though O(n^0.1) is asymptotically worse, the log(n) algorithm doesn't overtake it until n is roughly 100,000,000,000.
An algorithm could plausibly have a sqrt(N) running time, but what would one with an even lower exponent look like? Such an algorithm is an obvious candidate for replacing a logarithmic method if one can be found, and that's where big-O analysis ceases to be useful: it depends on N, on the other costs of each operation, and on implementation difficulty.
First of all, exponential complexity has the form k^n (for some constant k > 1), with n in the exponent. Complexities of the form n^k with k < 1 are generally called sub-linear.
Yes, from some finite n₀ onwards, the logarithm is always better than any n^k with k > 0. However, as you observe, this n₀ turns out to be quite large for small values of k. Therefore, if you expect your value of n to be reasonably small, the cost of the logarithm may be worse than that of the power, but here we move from theory to practice.
To see why the logarithm should eventually be better, consider the difference
f(n) = n^k - log n
This curve looks mostly the same for all values of k, but it may dip below the n axis for smaller exponents. Wherever its value is above 0, the logarithm is smaller and thus better. The point at which it crosses the axis for the second time is where the logarithm becomes the better option for all values of n after that.
Notice the minimum: if a minimum exists and lies below 0, the function will eventually cross the axis a second time and tip in favour of the logarithm. So let's find the minimum using the derivative (natural logarithm assumed):
f'(n) = k·n^(k-1) - 1/n = 0   =>   n^k = 1/k   =>   n = (1/k)^(1/k)
From the nature of the function, this is a minimum and it always exists; the function is rising from that point on.
For k = 0.1 this minimum lies at n = (1/0.1)^(1/0.1) = 10^10 = 10,000,000,000, and that is not even where the function crosses the n axis for the second time; only the minimum is there. So in practice, using the power is better than the logarithm at least up to this point (unless you have some constants in play, of course).
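A quick numerical sanity check of that minimum (natural logarithm and k = 0.1 assumed, as above), plus a crude search for the second axis crossing, where the logarithm finally wins for good:

```python
# Check the minimum n = (1/k)^(1/k) derived above, and search for the
# second zero of f(n) = n^k - ln(n). Natural log and k = 0.1 assumed.
import math

k = 0.1
f = lambda n: n**k - math.log(n)
n_min = (1 / k) ** (1 / k)                       # = 10**10, where f'(n) = 0
print(f"minimum at n = {n_min:.0f}, f(n_min) = {f(n_min):.2f}")   # negative

n = n_min
while f(n) < 0:                                  # double n until f turns positive
    n *= 2
print(f"the logarithm wins for good somewhere below n = {n:.3g}")
```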
I know that O(log n) refers to an iterative reduction of the problem set N by a fixed ratio (in big-O notation), but how do I actually calculate how many iterations an algorithm with log N complexity would have to perform on the problem set N before it is done (has one element left)?
You can't. You don't calculate the exact number of iterations from big-O.
You can "derive" the big-O when you have an exact formula for the number of iterations.
Big-O just gives information about how the number of iterations grows with growing N, and only for "big" N.
Nothing more, nothing less. With this you can draw conclusions about how many more operations (or how much more time) the algorithm will take, given some sample runs.
Expressed in the words of Tim Roughgarden in his courses on algorithms:
The big-Oh notation tries to provide a sweet spot for high level algorithm reasoning
That means it is intended to describe the relation between an algorithm's execution time and the size of its input, avoiding dependencies on the system architecture, programming language, or chosen compiler.
Imagine that big-Oh notation could provide the exact execution time: that would mean that for any algorithm whose big-Oh time complexity function you know, you could predict how it would behave on any machine whatsoever.
On the other hand, it is centered on asymptotic behaviour. That is, its description is more accurate for big values of n (which is why the lower-order terms of your algorithm's time function are ignored in big-Oh notation). The reasoning is that small values of n do not demand that you push forward trying to improve your algorithm's performance.
Big O notation only shows the rate of growth, not the actual number of operations that the algorithm would perform. If you need to calculate the exact number of loop iterations or elementary operations, you have to do it by hand. However, for most practical purposes the exact number is irrelevant: O(log n) tells you that the number of operations rises logarithmically as n rises.
From big-O notation you can't tell precisely how many iterations the algorithm will do; it's just an estimate. With small numbers the difference between log(n) and the actual number of iterations can be significant, but the closer you get to infinity, the less significant that difference becomes.
If you make some assumptions, you can estimate the time up to a constant factor. The big assumption is that the limiting behavior as the size tends to infinity is the same as the actual behavior for the problem sizes you care about.
Under that assumption, the upper bound on the time for a size N problem is C*log(N) for some constant C. The constant will change depending on the base you use for calculating the logarithm. The base does not matter as long as you are consistent about it. If you have the measured time for one size, you can estimate C and use that to guesstimate the time for a different size.
For example, suppose a size 100 problem takes 20 seconds. Using common logarithms, C is 10. (The common log of 100 is 2). That suggests a size 1000 problem might take about 30 seconds, because the common log of 1000 is 3.
However, this is very rough. The approach is most useful for estimating whether an algorithm might be usable for a large problem. In that sort of situation, you also have to pay attention to memory size. Generally, setting up a problem will be at least linear in size, so its cost will grow faster than an O(log N) operation.
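Here is the back-of-the-envelope estimate from the example above written out as code (base-10 logarithm and purely illustrative numbers, as in the text):

```python
# Estimate C from one measurement, then extrapolate to a larger size.
import math

measured_n, measured_seconds = 100, 20.0
C = measured_seconds / math.log10(measured_n)    # C = 10

target_n = 1000
estimate = C * math.log10(target_n)              # ~30 seconds
print(f"estimated time for n = {target_n}: {estimate:.0f} s")
```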
What is the use of Big-O notation in computer science if it doesn't give all the information needed?
For example, if one algorithm runs at 1000n and one at n, it is true that they are both O(n). But I still may make a foolish choice based on this information, since one algorithm takes 1000 times as long as the other for any given input.
I still need to know all the parts of the equation, including the constant, to make an informed choice, so what is the importance of this "intermediate" comparison? I end up losing important information when it gets reduced to this form, and what do I gain?
What does that constant factor represent? You can't say with certainty, for example, that an algorithm that is O(1000n) will be slower than an algorithm that's O(5n). It might be that the 1000n algorithm loads all data into memory and makes 1,000 passes over that data, and the 5n algorithm makes five passes over a file that's stored on a slow I/O device. The 1000n algorithm will run faster even though its "constant" is much larger.
In addition, some computers perform some operations more quickly than other computers do. It's quite common, given two O(n) algorithms (call them A and B), for A to execute faster on one computer and B to execute faster on the other computer. Or two different implementations of the same algorithm can have widely varying runtimes on the same computer.
Asymptotic analysis, as others have said, gives you an indication of how an algorithm's running time varies with the size of the input. It's useful for giving you a good starting place in algorithm selection. Quick reference will tell you that a particular algorithm is O(n) or O(n log n) or whatever, but it's very easy to find more detailed information on most common algorithms. Still, that more detailed analysis will only give you a constant number without saying how that number relates to real running time.
In the end, the only way you can determine which algorithm is right for you is to study it yourself and then test it against your expected data.
In short, I think you're expecting too much from asymptotic analysis. It's a useful "first line" filter. But when you get beyond that you have to look for more information.
As you correctly noted, it does not give you information about the exact running time of an algorithm. It is mainly used to indicate the complexity of an algorithm: whether it is linear in the input size, quadratic, exponential, etc. This is important when choosing between algorithms if you know that your input size is large, since even a 1000n algorithm will beat a 1.23·exp(n) algorithm for large enough n.
In real world algorithms, the hidden 'scaling factor' is of course important. It is therefore not uncommon to use an algorithm with a 'worse' complexity if it has a lower scaling factor. Many practical implementations of sorting algorithms are for example 'hybrid' and will resort to some 'bad' algorithm like insertion sort (which is O(n^2) but very simple to implement) for n < 10, while changing to quicksort (which is O(n log(n)) but more complex) for n >= 10.
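A hedged sketch of that hybrid idea (the threshold of 10 is taken from the answer; real library implementations tune the threshold and the pivot choice far more carefully):

```python
# Hybrid sort sketch: insertion sort (O(n^2) but cheap per step) for tiny
# inputs, a simple quicksort (O(n log n) on average) otherwise.
def hybrid_sort(a):
    if len(a) < 10:
        for i in range(1, len(a)):               # insertion sort in place
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a
    pivot = a[len(a) // 2]                       # quicksort on a three-way split
    left = [x for x in a if x < pivot]
    middle = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return hybrid_sort(left) + middle + hybrid_sort(right)

print(hybrid_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 11, 10]))
```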
Big-O tells you how the runtime or memory consumption of a process changes as the size of its input changes. O(n) and O(1000n) are both still O(n) -- if you double the size of the input, then for all practical purposes the runtime doubles too.
Now, we can have an O(n) algorithm and an O(n^2) algorithm where the coefficient of n is 1,000,000 and the coefficient of n^2 is 1, in which case the O(n^2) algorithm would outperform the O(n) one for smaller n values. This doesn't change the fact, however, that the second algorithm's runtime grows more rapidly than the first's, and this is the information that big-O tells us. There will be some input size at which the O(n) algorithm begins to outperform the O(n^2) algorithm.
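Made concrete with the coefficients from the paragraph above (1,000,000 for the linear algorithm, 1 for the quadratic one), the crossover sits at n = 1,000,000:

```python
# Cost models with the coefficients used above; the quadratic algorithm wins
# below n = 1,000,000, the linear one wins beyond it.
cost_linear = lambda n: 1_000_000 * n
cost_quadratic = lambda n: n * n

for n in (1_000, 1_000_000, 1_000_000_000):
    cheaper = "O(n^2)" if cost_quadratic(n) < cost_linear(n) else "O(n)"
    print(f"n = {n:>13,}: cheaper algorithm is {cheaper}")
```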
In addition to the hidden impact of the constant term, complexity notation also only considers the worst case instance of a problem.
Case in point, the simplex method (linear programming) has exponential complexity for all known implementations. However, the simplex method works much faster in practice than the provably polynomial-time interior point methods.
Complexity notation has much value for theoretical problem classification. If you want some more information on practical consequences check out "Smoothed Analysis" by Spielman: http://www.cs.yale.edu/homes/spielman
This is what you are looking for.
Its main purpose is for rough comparisons of logic. The difference between O(n) and O(1000n) is large for n ~ 1000 (n roughly equal to 1000) and for n < 1000, but at values where n >> 1000 (n much larger than 1000) the difference is minuscule.
You are right in saying they both scale linearly, and knowing the coefficient helps in a detailed analysis, but generally in computing the difference between linear (O(cn)) and higher-order (O(cn^x)) performance is more important to note than the difference between two linear times. The larger value is in comparing runtimes of higher orders, where the performance gap grows dramatically.
The overall purpose of Big O notation is to give a sense of relative performance time in order to compare and further optimize algorithms.
You're right that it doesn't give you all information, but there's no single metric in any field that does that.
Big-O notation tells you how quickly the performance gets worse, as your dataset gets larger. In other words, it describes the type of performance curve, but not the absolute performance.
Generally, Big-O notation is useful to express an algorithm's scaling performance as it falls into one of three basic categories:
Linear
Logarithmic (or "linearithmic")
Exponential
It is possible to do deep analysis of an algorithm for very accurate performance measurements, but it is time consuming and not really necessary to get a broad indication of performance.
While answering this question, a debate began in the comments about the complexity of QuickSort. What I remember from my university days is that QuickSort is O(n^2) in the worst case, O(n log(n)) in the average case, and O(n log(n)) (but with a tighter bound) in the best case.
What I need is a correct mathematical explanation of the meaning of average complexity, to explain clearly what it is about to someone who believes the big-O notation can only be used for the worst case.
What I remember is that to define average complexity you should consider the complexity of the algorithm over all possible inputs, and count how many of them are degenerate cases and how many are normal cases. If the number of degenerate cases divided by n tends towards 0 as n gets big, then you can speak of the average complexity over the normal cases.
Is this definition right, or is the definition of average complexity different? And if it's correct, can someone state it more rigorously than I did?
You're right.
Big O (big Theta etc.) is used to measure functions. When you write f = O(g) it doesn't matter what f and g mean. They could be average-case time complexities, worst-case time complexities, space complexities, the distribution of primes, etc.
Worst-case complexity is a function that takes a size n and tells you the maximum number of steps the algorithm performs given an input of size n.
Average-case complexity is a function that takes a size n and tells you the expected number of steps the algorithm performs given an input of size n.
As you see worst-case and average-case complexity are functions, so you can use big O to express their growth.
If you're looking for a formal definition, then:
Average complexity is the expected running time for a random input.
Let's refer to the definition of Big O notation in Wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes f(x)=O(g(x)) as x --> infinity if ...
So what the premise of the definition states is that the function f should take a number as input and yield a number as output. What input number are we talking about? It's supposedly the number of elements in the sequence to be sorted. What output number could we be talking about? It could be the number of operations done to sort the sequence. But stop. What is a function? From Wikipedia:
a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
Are we producing exactly one output with our prior definition? No, we are not. For a given size of sequence we can get a wide variation in the number of operations. So to ensure the definition is applicable to our case, we need to reduce the set of possible outcomes (numbers of operations) to a single value. That can be the maximum ("the worst case"), the minimum ("the best case"), or an average.
The conclusion is that talking about the best/worst/average case is mathematically correct, and using big O notation without one of those qualifiers in the context of sorting complexity is somewhat sloppy.
On the other hand, we could be more precise and use big Theta notation instead of big O notation.
I think your definition is correct, but your conclusions are wrong.
It's not necessarily true that if the proportion of "bad" cases tends to 0, then the average complexity is equal to the complexity of the "normal" cases.
For example, suppose that a fraction 1/(n^2) of the cases are "bad" and the rest are "normal", and that "bad" cases take exactly n^4 operations whereas "normal" cases take exactly n operations.
Then the average number of operations required is equal to:
n^4/(n^2) + n·(n^2-1)/(n^2) = n^2 + n·(n^2-1)/(n^2)
This function is O(n^2), but not O(n).
In practice, though, you might find that time is polynomial in all cases, and the proportion of "bad" cases shrinks exponentially. That's when you'd ignore the bad cases in calculating an average.
Average case analysis does the following:
Take all inputs of a fixed length (say n), sum up all the running times of all instances of this length, and build the average.
The problem is you will probably have to enumerate all inputs of length n in order to come up with an average complexity.
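Which is why, in practice, the average is often estimated by sampling random inputs instead of enumerating them all. A rough sketch for quicksort's comparison count (this estimates the average; it proves nothing):

```python
# Estimate quicksort's average comparison count on random permutations by
# sampling, instead of enumerating all n! inputs.
import math
import random

def quicksort_comparisons(a):
    if len(a) <= 1:
        return 0
    pivot = a[0]
    left = [x for x in a[1:] if x < pivot]
    right = [x for x in a[1:] if x >= pivot]
    return len(a) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

n, samples = 1000, 100
avg = sum(quicksort_comparisons(random.sample(range(n), n)) for _ in range(samples)) / samples
print(f"estimated average comparisons for n = {n}: {avg:.0f}")
print(f"for reference, n*log2(n) = {n * math.log2(n):.0f}")      # same order of growth
```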