How to apply computational complexity theory - algorithm

I have the basics down on computational complexity theory. I can understand why one algorithm might scale better than another. Now that I'm there, how do I actually determine the complexity of the function I've created? How do I understand which functions to use, and which one will scale better? How, for example, will I know that the telephone-book binary search runs in O(log n) time, or that a naive recursive Fibonacci computation runs in exponential time, without resorting to trial and error? How do I determine the complexity of a function in, for example, scikit-learn?
How do I actually apply this stuff?

The tasks performed in scikit-learn are computationally heavy, which is why a good GPU is often recommended for running ML/DS workloads; these tasks are also run in parallel across cores/threads.
So it is hard to determine what the actual complexities of these functions are, but what you can do is test-run them and measure how long they take for a given input size.

scikit-learn is heavily based on numerical optimization (though not only: a k-d tree, for example, is more of a classic computer-science algorithm), so a classic computer-science treatment of computational complexity is surely needed, but not enough.
For something like an interior-point solver or a coordinate-descent-based SVM solver (two examples of the concepts behind ML algorithms), both iterative methods like nearly everything in numerical optimization, it's not important to know how fast you can do binary search; it's more important to know how many iterations your algorithm will need, or more specifically, how your algorithm moves through the optimization space. This is pretty tough, depends on the data (e.g. the eigenvalues of the Hessian of the cost function), and the proofs and analyses are heavily math-based, e.g. metric theory.
Things are even worse when heuristics are in play (and those are very common).
So basically: you won't be able to do this for reasonably complex algorithms.
What you should do:
check the docs / sources to see which algorithm is used, find the underlying research paper, and read its analysis to obtain something like "cubic in the number of samples"
do an empirical analysis with your data
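The empirical route needs no ML stack at all; a minimal pure-Python sketch, using the standard library's sorted (an O(n log n) routine) as the black box under test, with a hypothetical measure helper:

```python
import time

def measure(f, sizes):
    """Time f on one input of each size and return the elapsed times."""
    times = []
    for n in sizes:
        data = list(range(n, 0, -1))  # reverse-sorted input
        start = time.perf_counter()
        f(data)
        times.append(time.perf_counter() - start)
    return times

sizes = [10_000, 20_000, 40_000, 80_000]
times = measure(sorted, sizes)
for n, prev_t, t in zip(sizes[1:], times, times[1:]):
    # for an O(n log n) routine, doubling n should slightly more than
    # double the runtime; for O(n^2) it would roughly quadruple
    print(f"n={n}: {t / prev_t:.2f}x the previous runtime")
```

The same doubling experiment works on any scikit-learn estimator's fit method, though the measured times will be noisier.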

Basically, you need to calculate the number of operations needed by the algorithm
depending on the input size. The Big O is then just the highest order part of the expression with constant factors ignored.
One does not care about the kind of operations (comparison, assignment, ...) as long as the time for the operation is constant.
For complex algorithms, that analysis can be difficult.
For binary search: with each search step, the number of values to be searched is halved. So twice the input requires one more search step (operation):
t(2n) = 1 + t(n). This results in t(n) = c·log2(n) = O(log n), at least for powers of two. For other n, the expression is more complex, but the highest-order term is still c·log2(n).
For Fibonacci: the naive, recursive implementation requires you to calculate fib(n-1) and fib(n-2) for calculating fib(n). Hence you calculate fib(n-2) twice, fib(n-3) three times, fib(n-4) five times and so on (following the Fibonacci series itself). So the number of calculations to do is 1 + 1 + 2 + 3 + 5 + ... + fib(n-1) = fib(n+1) - 1. Since we are interested in the asymptotic behavior (for big n), we can apply the asymptotic approximation (Binet's formula): fib(n) ≈ φ^n / √5, where φ = (1 + √5)/2 ≈ 1.618.
This means naive, recursive Fibonacci is O(φ^n), i.e. exponential complexity.
A better algorithm starts from the beginning of the Fibonacci series and calculates each number once. That's obviously O(n), as it takes n (or n - 2) equal steps.
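The gap between the two approaches can be made concrete by counting calls; a small sketch (function names are illustrative):

```python
calls = 0

def fib_naive(n):
    """Naive recursion: recomputes the same subproblems over and over."""
    global calls
    calls += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_iter(n):
    """Iterative: each Fibonacci number is computed exactly once, O(n)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert fib_naive(20) == fib_iter(20) == 6765
# the naive version makes 2*fib(n+1) - 1 calls: exponential growth
print(calls)  # 21891 calls for n = 20, vs 20 loop iterations
```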

Related

Shouldn't the time complexity of binary search be O(ceil(logn))?

This seems like a common question whose answer should be easy to find, but it isn't: I cannot find an answer anywhere on the Internet. I've never seen anyone ask whether the time complexity might be O(ceil(logn)), and I cannot figure it out myself, so I decided to ask here.
First, suppose I have a sorted array containing n numbers, and I want to search for a value in it using the binary search algorithm. The number of steps required in the worst case is listed below:

n  | steps
1  | 1
2  | 2
3  | 2
4  | 3
5  | 3
6  | 3
7  | 3
8  | 4
9  | 4
10 | 4
As you can see, the steps required for an array of n numbers are ceil(log2(n+1)) (ceil(x) denotes the smallest integer that is greater than or equal to x). So I think the time complexity of binary search should be O(ceil(logn)), but according to Wikipedia, the time complexity should be O(logn). Why? Is there something wrong?
As I have already explained in two other answers (see here and here), the Big-O notation is not what most people think it is. It neither tells you anything about the speed of an algorithm nor anything about the number of processing steps.
The only thing Big-O tells you is how the processing time of an algorithm will change if the number of input elements changes. Does it stay constant? Does it rise linearly? Logarithmically? Quadratically? This is the only thing that Big-O answers.
Thus O(5) is the same as O(1000000), as both simply mean constant, which is typically written as O(1). And O(n + 100000) is the same as O(5n + 8), as both simply mean linear, which is typically written as O(n).
I know that many people will say "Yes, but O(5n) is steeper than O(2n)", and this is absolutely correct, but still they are both linear. Big-O is not about comparing two functions of linear complexity with each other, but about categorizing functions into coarse categories. People get confused by the fact that these categories are named after mathematical functions, so they believe any function may be used for Big-O notation, but that isn't the case. Only functions with different growth characteristics get their own Big-O notation.
The following overview is nowhere near complete but in practice mainly the following Big-O notations are relevant:
O(1) - constant
O(log log n) - double logarithmic
O(log n) - logarithmic
O((log n)^c), c > 1 - polylogarithmic
O(n^c), 0 < c < 1 - fractional power
O(n) - linear
O(n log n) = O(log n!) - linearithmic
O(n^2) - quadratic
O(n^c), c > 1 - polynomial
O(c^n), c > 1 - exponential
O(n!) - factorial
Instead of writing these as functions one could also have given each of them just a name, but writing them as functions has two advantages: people with some math background will immediately have the image of a graph in their head, and it's easy to introduce new types without coming up with fancy names, as long as you can mathematically describe their graphs.
O(ceil(log n)) and O(log n) both represent the same asymptotic complexity (logarithmic complexity).
Or loosely put : O(ceil(log n)) = O(log n)
What you're seeing here is the difference between two different ways of quantifying how much work an algorithm does. The first approach is to count up exactly how many times something happens, yielding a precise expression for the answer. For example, in this case you're noting that the number of rounds of binary search is ⌈lg (n+1)⌉. This precisely counts the number of rounds done in binary search in the worst case, and if you need to know that exact figure, this is a good piece of information to have.
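That exact count is easy to verify empirically; a small sketch (the step-counting helper is hypothetical, not from any library):

```python
import math

def bsearch_steps(n, target_idx):
    """Count comparisons binary search makes finding target_idx in [0, n)."""
    lo, hi, steps = 0, n - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if mid == target_idx:
            return steps
        elif mid < target_idx:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

for n in range(1, 11):
    worst = max(bsearch_steps(n, i) for i in range(n))
    # worst case matches the ceiling formula from the question
    assert worst == math.ceil(math.log2(n + 1))
    print(n, worst)
```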
However, in many cases we aren't interested in the exact amount of work that an algorithm does and are instead more interested in how the algorithm is going to scale as the inputs get larger. For example, suppose that we run binary search on an array of size 10^3 and find that it takes 10μs to run. How long should we expect it to take on an input of size 10^6? We can work out the number of rounds that binary search will perform in the worst case by plugging n = 10^3 into our formula (11 rounds), then try to work out how long each of those iterations will take (10μs / 11 rounds ≈ 0.91μs / round), then work out how many rounds we'll have with n = 10^6 (21) and multiply the number of rounds by the cost per round to get an estimate of about 19.1μs.
That's kind of a lot of work. Is there an easier way to do this?
That's where big-O notation comes in. When we say that the cost of binary search in the worst case is O(log n), we are not saying "the exact amount of work that binary search does is given by the expression log n." Instead, what we're saying is "the runtime of binary search, in the worst case, scales the same way that the function log n scales." Through properties of logarithms, we can see that log n^2 = 2 log n, so if you square the size of the input to a function that grows logarithmically, it's reasonable to guess that the output of that function will roughly double.
In our case, if we know that the runtime of binary search on an input of size 10^3 is 10μs and we're curious what the runtime will be on an input of size 10^6 = (10^3)^2, we can make a pretty reasonable guess that it'll be about 20μs, because we've squared the input. And if you look at the math above, that's really close to the figure we got by doing a more detailed analysis.
So in that sense, saying "the runtime of binary search is O(log n)" is a nice shorthand for saying "whatever the exact formula for the runtime of the function is, it'll scale logarithmically with the size of the input." That allows us to extrapolate from our existing data points without having to work out the leading coefficients or anything like that.
This is especially useful when you factor in that even though binary search does exactly ⌈lg (n+1)⌉ rounds, that expression doesn't capture the full cost of performing the search. Each round might take a slightly different amount of time depending on how the comparison goes, there's probably some setup work that adds a constant, and the cost of each round in terms of real compute time can't be determined purely from the code itself. That's why we often use big-O notation when quantifying how fast algorithms run: it captures what is usually the most salient detail (how things scale) without having to pin down all the mathematical particulars.
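The extrapolation shortcut described above can be written down directly; a sketch assuming a purely logarithmic cost model, where constant factors cancel in the ratio:

```python
import math

def extrapolate_log(t1, n1, n2):
    """If runtime scales like log n, predict the runtime at n2 from a
    single measurement t1 taken at input size n1."""
    return t1 * math.log(n2) / math.log(n1)

# measured: 10 microseconds at n = 10**3; predict for n = 10**6.
# squaring the input size doubles the log, hence doubles the estimate.
print(extrapolate_log(10.0, 10**3, 10**6))  # -> 20.0
```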

Are exponential algorithms (with very small exponents) faster than the general logarithmic algorithm?

Say something runs at n^0.5 vs log n. It's true that this obviously isn't fast (the log n beats it). However, what about n^0.1 or n^0.01? Would it still be preferable to go with the logarithmic algorithm?
I guess, how small should the exponent be to switch to exponential?
The exponent does not matter. It is n that matters.
No matter how small the exponent of an n^k algorithm is (strictly speaking, n^k with constant k is a fractional-power complexity, not an exponential one), the logarithmic algorithm will beat it if n is large enough.
So, it all depends on your n. Substitute a specific n, calculate the actual run-time cost of your n^k algorithm vs your logarithmic algorithm, and see who the winner is.
Asymptotic complexity can be a bit misleading.
A function that's proportional to log n will be less than one that's proportional to (say) n^0.01 ... once n gets large enough.
But for smaller values of n, all bets are off, because the constant of proportionality can play a large role. For example, sorting algorithms that have O(n^2) worst-case complexity are often better choices, when n is known to be small, than sorting algorithms that have O(n log n), because the latter are typically more complicated and therefore have more overhead. It's only when n grows larger that the latter start to win out.
In general, performance decisions should be based on profiling and testing, rather than on purely mathematical arguments about what should theoretically be faster.
In general, given two sublinear algorithms you should choose the one with the smallest constant multiplier. Since complexity theory won't help you with that, you will have to write the programs as efficiently as possible and benchmark them. This necessity might lead you to choose the algorithm which is easier to code efficiently, which might also be a reasonable criterion.
This is of course not the case with superlinear functions, where large n exaggerate costs. But even then, you might find an algorithm whose theoretical efficiency is superior but which requires a very large n to be superior to a simpler algorithm, perhaps so large that it will never be tried.
You're talking about big O which tends to refer to how an algorithm scales as a function of inputs, as opposed to how fast it is in the absolute time sense. On certain data sets an algorithm with a worse big O, may perform much better in absolute time.
Let's say you have two algorithms, one that takes exactly n^0.1 steps and one that takes exactly log(n) steps.
Though O(n^0.1) is asymptotically worse, with base-10 logarithms it takes until n reaches about 10,000,000,000 before n^0.1 actually overtakes log(n).
An algorithm could plausibly have a sqrt(N) running time, but what would one with an even lower exponent look like? It is an obvious candidate for replacing a logarithmic method if such can be found, and that's where big O analysis ceases to be useful - it depends on N and the other costs of each operation, plus implementation difficulty.
First of all, exponential complexity is of the form k^n, with n in the exponent; complexities of the form n^k with 0 < k < 1 are fractional powers, generally called sub-linear.
Yes, from some finite n₀ onwards, the logarithm is always better than any n^k with k > 0. However, as you observe, this n₀ turns out to be quite large for small values of k. Therefore, if you expect your value of n to be reasonably small, the logarithm may lose to the power, but here we move from theory to practice.
To see why the logarithm should eventually win, look at the difference f(n) = n^k - ln n.
This curve looks mostly the same for all values of k, but may dip below the n axis for smaller exponents. Wherever its value is above 0, the logarithm is smaller and thus better. The point at which it crosses the axis for the second time is where the logarithm becomes the better option for all values of n after that.
Notice the minimum: if the minimum exists and is below 0, we can assume that the function will eventually cross the axis for the second time and turn in favour of the logarithm. So let's find the minimum using the derivative: f'(n) = k·n^(k-1) - 1/n = 0 gives n^k = 1/k, i.e. n = (1/k)^(1/k).
From the nature of the function, this is a minimum and it always exists. Therefore, the function is rising from this point on.
For k = 0.1, this turns out to be (1/0.1)^(1/0.1) = 10^10 = 10,000,000,000, and this is not even where the function crosses the n axis for the second time, only where the minimum is. So in practice, using the power is better than the logarithm at least up to this point (unless you have some constants there, of course).
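The closed-form minimum can be sanity-checked numerically; a sketch assuming the difference function is f(n) = n^k - ln n with natural logs, an assumption consistent with the 10,000,000,000 figure quoted above:

```python
import math

def f(n, k):
    """Difference between the fractional power and the natural log."""
    return n**k - math.log(n)

k = 0.1
n_min = (1 / k) ** (1 / k)   # closed-form minimum: n = (1/k)^(1/k) = 1e10
print(n_min)
# the function is lower at the computed minimum than to either side of it
assert f(n_min, k) < f(n_min * 0.5, k)
assert f(n_min, k) < f(n_min * 2.0, k)
```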

How is pre-computation handled by complexity notation?

Suppose I have an algorithm that runs in O(n) for every input of size n, but only after a pre-computation step of O(n^2) for that given size n. Is the algorithm considered O(n) still, with O(n^2) amortized? Or does big O only consider one "run" of the algorithm at size n, and so the pre-computation step is included in the notation, making the true notation O(n+n^2) or O(n^2)?
It's not uncommon to see this accounted for by explicitly separating out the costs into two different pieces. For example, in the range minimum query problem, it's common to see people talk about things like an ⟨O(n^2), O(1)⟩-time solution to the problem, where the O(n^2) denotes the precomputation cost and the O(1) denotes the lookup cost. You also see this with string algorithms sometimes: a suffix tree provides an O(m)-preprocessing-time, O(n+z)-query-time solution to string searching, while Aho-Corasick string matching offers an O(n)-preprocessing-time, O(m+z)-query-time solution.
The reason for doing so is that the tradeoffs involved here really depend on the use case. It lets you quantitatively measure how many queries you're going to have to make before the preprocessing time starts to be worth it.
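As a toy illustration of that break-even reasoning (the cost models here are hypothetical operation counts, not from any real data structure):

```python
def total_cost(n, q, pre, per_query):
    """Total cost of q queries given preprocessing and per-query costs."""
    return pre(n) + q * per_query(n)

n = 1_000
# a hypothetical pair of solutions in the <preprocess, query> style:
# A: O(n^2) preprocessing, O(1) per query; B: no preprocessing, O(n) per query
cost_a = lambda q: total_cost(n, q, lambda n: n * n, lambda n: 1)
cost_b = lambda q: total_cost(n, q, lambda n: 0, lambda n: n)

# break-even: n^2 + q < q*n  =>  q > n^2 / (n - 1), i.e. just over n queries
for q in (100, 1_001, 10_000):
    print(q, cost_a(q), cost_b(q), "A wins" if cost_a(q) < cost_b(q) else "B wins")
```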
People usually care about the total time to get things done when they are talking about complexity etc.
Thus, if getting to the result R requires you to perform steps A and B, then complexity(R) = complexity(A) + complexity(B). This works out to be O(n^2) in your particular example.
You have already noted that for O analysis, the fastest growing term dominates the overall complexity (or in other words, in a pipeline, the slowest module defines the throughput).
However, complexity analysis of A and B will be typically performed in isolation if they are disjoint.
In summary, it's the amount of time taken to get the results that counts, but you can (and usually do) reason about the individual steps independent of one another.
There are cases where you cannot simply take the slowest part of the pipeline. A simple example is BFS, with complexity O(V + E). Since E = O(V^2), it may be tempting to write the complexity of BFS as O(E). However, that would be incorrect, since there can be a graph with no edges, and in that case you still need to iterate over all the vertices.
The point of O(...) notation is not to measure how fast the algorithm is working, because in many specific cases O(n) can be significantly slower than, say O(n^3). (Imagine the algorithm which runs in 10^100 n steps vs. the one which runs in n^3 / 2 steps.) If I tell you that my algorithm runs in O(n^2) time, it tells you nothing about how long it will take for n = 1000.
The point of O(...) is to specify how the algorithm behaves when the input size grows. If I tell you that my algorithm runs in O(n^2) time, and it takes 1 second for n = 500, then you'd expect about 4 seconds for n = 1000, not 1.5 and not 40.
So, to answer your question -- no, the algorithm will not be O(n), it will be O(n^2), because if I double the input size the time will be multiplied by 4, not by 2.
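The doubling argument is easy to check with an operation-count model (the numbers are purely illustrative):

```python
def work(n):
    """Hypothetical operation count: O(n^2) precomputation plus an O(n) run."""
    return n * n + n

# doubling n roughly quadruples the total work, so the whole procedure,
# precomputation included, behaves as O(n^2)
for n in (1_000, 2_000, 4_000):
    print(n, work(2 * n) / work(n))  # ratios close to 4
```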

Is `log(n)` base 10?

Still getting a grip on logarithms being the opposite of exponentials. (Would it also be correct to describe them as the inverse of exponentials?)
There are lots of great SO entries already on Big-O notation including O(log n) and QuickSort n(log n) specifically. Found some useful graphs as well.
In looking at Divide and Conquer algorithms, I'm coming across n log n, which I think is n multiplied by the value of log n. I often try concrete examples like 100 log 100, to help visualize what's going on in an abstract equation.
Just read that log n assumes base 10. Does n log n translate into:
"the number n multiplied by the amount 10 needs to be raised to the power of in order to equal the number n"?
So 100 log 100 equals 200 because 10 needs to be raised to the power of two to equal 100?
Does the base change as an algorithm iterates through a set? Does the base even matter if we're talking in abstractions anyway?
Yes, the base does change depending on the way it iterates, but it doesn't matter. As you might remember, changing the base of logarithms means multiplying them by a constant. Since you mentioned that you have read about Big-O notation, then you probably already know that constants do not make a difference (O(n) is the same as O(2n) or O(1000n)).
EDIT: to clarify something you said - "I'm coming across n log n, which I think is n multiplied by the value of log n". Yes, you are right. And if you want to know why it involves log n, think of what divide-and-conquer algorithms do: they split the input (into two halves, four quarters, or ten tenths, depending on the algorithm) during each iteration. The question is "how many times can the input be split up before the algorithm ends?" So you look at the input and find how many times you can divide it by 2, or by 4, or by 10, until further splitting is pointless. Now you can give yourself concrete examples, starting with easy stuff like "how many times can 8 be divided by 2?" or "how many times can 1000 be divided by 10?"
You don't need to worry about the base - if you're dealing with algorithmic complexity, it doesn't matter which base you're in, because the difference is just a constant factor.
Fundamentally, you just need to know that log n means that as n increases exponentially, the running time (or space used) increases linearly. For example, if n=10 takes 1 minute, then n=100 would take 2 minutes, and n=1000 would take 3 minutes - roughly. (It's usually in terms of upper bounds, with smaller factors ignored... but that's the general gist of it.)
n log n is just that log n multiplied by n - so the time or space taken increases "a bit faster than linearly", basically.
The base does not matter at all. In fact people tend to drop part of the operations, e.g. if one has O(n^4 + 2*n), this is often reduced to O(n^4). Only the most relevant power needs to be considered when comparing algorithms.
For the case of comparing two closely related algorithms, say O(n^4 + 2*n) against O(n^4 + 3*n), one needs to include the linear dependency in order to conserve the relevant information.
Consider a divide and conquer approach based on bisection: your base is 2, so you may talk about ld(n). On the other hand you use the O-notation to compare different algorithms by means of the same base. This being said, the difference between ld, ln, and log10 is just a matter of a general offset.
Logarithms and exponents are inverse operations.
if
x^n = y
then
log_x(y) = n
For example,
10^3 = 1000
log_10(1000) = 3
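The change-of-base rule log_b(n) = log_c(n) / log_c(b) is what makes the base irrelevant for big-O; a quick numerical check:

```python
import math

n = 1_000_000
# any two log bases differ only by a constant factor: here log2(10)
ratio_2_vs_10 = math.log2(n) / math.log10(n)
print(ratio_2_vs_10)  # log2(10) ≈ 3.3219, the same for every n

for m in (10, 10**4, 10**9):
    assert abs(math.log2(m) / math.log10(m) - math.log2(10)) < 1e-9
```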
Divide and conquer algorithms work by dividing the problem into parts that are then solved as independent problems. There can also be a combination step that combines the parts. Most divide and conquer algorithms are base 2, which means they cut the problem in half each time. For example, Binary Search works like searching a phone book for a name: you flip to the middle and ask, "is the name I'm looking for in the first half or the second half?" (before or after what you flipped to), then repeat. Every time you do this you divide the problem's size by 2. Therefore it's base 2, not base 10.
Order notation is primarily only concerned with the "order" of the runtime because that is what is most important when trying to determine if a problem will be tractable (solvable in a reasonable amount of time).
Examples of different orders would be:
O(1)
O(n)
O(n * log n)
O(n^2 * log n)
O(n^3.5)
O(n!)
etc.
The O here stands for "big O notation", which basically provides an upper bound on the growth rate of the function. Because we only care about the growth of the function for large inputs, we typically ignore lower-order terms; for example
n^3 + 2 n^2 + 100 n
would be
O(n^3)
because n^3 is the largest order term it will dominate the growth function for large values of N.
When you see O(n * log n), people are just abbreviating... if you understand the algorithm, it is typically log base 2, because most algorithms cut the problem in half. However, it could be log base 3 if the algorithm cut the problem into thirds instead.
Note:
In either case, if you were to graph the growth function, it would appear as a logarithmic curve. And, of course, an algorithm taking log3 n steps takes fewer steps than one taking log2 n steps, even though both are O(log n).
The reason you do not see O(log10 n) or O(log3 n) etc. is that it just isn't that common for an algorithm to work better this way. In our phone book example, you could split the pages into three thirds and compare at the 1/3 and 2/3 marks, but then you'd have made 2 comparisons and only know which third the name was in. If you'd just split it in half each time, 2 comparisons would tell you which quarter it was in, which is more efficient.
In most programming languages I know, the log() function is base e = 2.718281....
In mathematical books it sometimes means base ten and sometimes base e.
As other answers pointed out, for big-O notation the base does not matter, because for any base x, O(log_x(n)) is the same as O(ln(n)) (here log_x means "logarithm in base x" and ln means "logarithm in base e").
Finally, in the analysis of many algorithms it is more convenient to consider log() to be "logarithm in base 2" (I've seen some texts take this approach). This is obviously related to the binary representation of numbers in computers.

What is the purpose of Big-O notation in computer science if it doesn't give all the information needed?

What is the use of Big-O notation in computer science if it doesn't give all the information needed?
For example, if one algorithm runs at 1000n and one at n, it is true that they are both O(n). But I still may make a foolish choice based on this information, since one algorithm takes 1000 times as long as the other for any given input.
I still need to know all the parts of the equation, including the constant, to make an informed choice, so what is the importance of this "intermediate" comparison? I end up losing important information when it gets reduced to this form, and what do I gain?
What does that constant factor represent? You can't say with certainty, for example, that an algorithm that is O(1000n) will be slower than an algorithm that's O(5n). It might be that the 1000n algorithm loads all data into memory and makes 1,000 passes over that data, and the 5n algorithm makes five passes over a file that's stored on a slow I/O device. The 1000n algorithm will run faster even though its "constant" is much larger.
In addition, some computers perform some operations more quickly than other computers do. It's quite common, given two O(n) algorithms (call them A and B), for A to execute faster on one computer and B to execute faster on the other computer. Or two different implementations of the same algorithm can have widely varying runtimes on the same computer.
Asymptotic analysis, as others have said, gives you an indication of how an algorithm's running time varies with the size of the input. It's useful for giving you a good starting place in algorithm selection. Quick reference will tell you that a particular algorithm is O(n) or O(n log n) or whatever, but it's very easy to find more detailed information on most common algorithms. Still, that more detailed analysis will only give you a constant number without saying how that number relates to real running time.
In the end, the only way you can determine which algorithm is right for you is to study it yourself and then test it against your expected data.
In short, I think you're expecting too much from asymptotic analysis. It's a useful "first line" filter. But when you get beyond that you have to look for more information.
As you correctly noted, it does not give you information on the exact running time of an algorithm. It is mainly used to indicate the complexity of an algorithm: whether it is linear in the input size, quadratic, exponential, etc. This is important when choosing between algorithms if you know that your input size is large, since even a 1000n algorithm will beat a 1.23·exp(n) algorithm for large enough n.
In real world algorithms, the hidden 'scaling factor' is of course important. It is therefore not uncommon to use an algorithm with a 'worse' complexity if it has a lower scaling factor. Many practical implementations of sorting algorithms are for example 'hybrid' and will resort to some 'bad' algorithm like insertion sort (which is O(n^2) but very simple to implement) for n < 10, while changing to quicksort (which is O(n log(n)) but more complex) for n >= 10.
Big-O tells you how the runtime or memory consumption of a process changes as the size of its input changes. O(n) and O(1000n) are both still O(n) -- if you double the size of the input, then for all practical purposes the runtime doubles too.
Now, we can have an O(n) algorithm and an O(n^2) algorithm where the coefficient of n is 1000000 and the coefficient of n^2 is 1, in which case the O(n^2) algorithm would outperform the O(n) one for smaller n values. This doesn't change the fact, however, that the second algorithm's runtime grows more rapidly than the first's, and this is the information that big-O tells us. There will be some input size at which the O(n) algorithm begins to outperform the O(n^2) algorithm.
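The crossover point for exactly that pair of coefficients can be computed directly (the cost models are made up for illustration):

```python
# hypothetical cost models: an O(n) algorithm with a huge constant factor
# vs a plain O(n^2) algorithm with coefficient 1
cost_linear = lambda n: 1_000_000 * n
cost_quadratic = lambda n: n * n

# the quadratic algorithm is cheaper until n reaches the constant factor
assert cost_quadratic(1_000) < cost_linear(1_000)
assert cost_linear(10_000_000) < cost_quadratic(10_000_000)
# they meet exactly where n equals the coefficient, n = 1_000_000
assert cost_linear(1_000_000) == cost_quadratic(1_000_000)
```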
In addition to the hidden impact of the constant term, complexity notation also only considers the worst case instance of a problem.
Case in point, the simplex method (linear programming) has exponential complexity for all known implementations. However, the simplex method works much faster in practice than the provably polynomial-time interior point methods.
Complexity notation has much value for theoretical problem classification. If you want some more information on practical consequences check out "Smoothed Analysis" by Spielman: http://www.cs.yale.edu/homes/spielman
This is what you are looking for.
Its main purpose is for rough comparisons of logic. The difference between O(n) and O(1000n) is large for n ~ 1000 (n roughly equal to 1000) and n < 1000, but when you compare them at values where n >> 1000 (n much larger than 1000), the difference is minuscule.
You are right in saying that they both scale linearly, and knowing the coefficient helps in a detailed analysis, but generally in computing the difference between linear (O(n)) and polynomial (O(n^x)) performance is more important to note than the difference between two linear times. There is larger value in comparing runtimes of higher orders, such as O(n^2) and O(2^n), where the performance difference grows dramatically.
The overall purpose of Big O notation is to give a sense of relative performance time in order to compare and further optimize algorithms.
You're right that it doesn't give you all information, but there's no single metric in any field that does that.
Big-O notation tells you how quickly the performance gets worse, as your dataset gets larger. In other words, it describes the type of performance curve, but not the absolute performance.
Generally, Big-O notation is useful to express an algorithm's scaling performance as it falls into one of three basic categories:
Linear
Logarithmic (or "linearithmic")
Exponential
It is possible to do deep analysis of an algorithm for very accurate performance measurements, but it is time consuming and not really necessary to get a broad indication of performance.
