Why don't we evaluate algorithms by their median-case complexity - algorithm

We have three ways to evaluate an algorithm:
Worst case
Best case
And avg case
The first tells us to look at the worst possible input for the algorithm, and evaluate it's performance.
The second tells us to look at the best input for our algorithm.
The last tells us to look at the avg case of input to the algorithm and so it may be a more accurate measure of an algorithm's performance.
Why aren't we considering an algorithm by it's median case, it's surly be a more accurate then avg-case or at least be complementary factor to it.
Because we look at an input that half of the possible input are below it and above it.
median gives the weight needed to the input that avg may not give.

Median doesn't really have very useful statistical properties.
One thing useful about average is that it becomes asymptotically more unlikely that you will get bad input.
Suppose that average run-time of your algorithm is f(n) in 60% cases and g(n) in 40% cases, where g(n) >> f(n). Then your median is Θ(f(n)), but your solution would often not fit in a time-slot for f(n) algorithm. However, even if probability for g(n) is a very small constant, average will still be Θ(g(n)) alerting you that the algorithm might run for a long time.
Other useful property of expected value is summing. If you have a number of tasks executed sequentially then average total run-time will be equal to a total of average run-times. This makes average easier to both derive and use. There is no similar property for medians.

Related

Why big-Oh is not always a worst case analysis of an algorithm?

I am trying to learn analysis of algorithms and I am stuck with relation between asymptotic notation(big O...) and cases(best, worst and average).
I learn that the Big O notation defines an upper bound of an algorithm, i.e. it defines function can not grow more than its upper bound.
At first it sound to me as it calculates the worst case.
I google about(why worst case is not big O?) and got ample of answers which were not so simple to understand for beginner.
I concluded it as follows:
Big O is not always used to represent worst case analysis of algorithm because, suppose a algorithm which takes O(n) execution steps for best, average and worst input then it's best, average and worst case can be expressed as O(n).
Please tell me if I am correct or I am missing something as I don't have anyone to validate my understanding.
Please suggest a better example to understand why Big O is not always worst case.
Big-O?
First let us see what Big O formally means:
In computer science, big O notation is used to classify algorithms
according to how their running time or space requirements grow as the
input size grows.
This means that, Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. Here, O means order of the function, and it only provides an upper bound on the growth rate of the function.
Now let us look at the rules of Big O:
If f(x) is a sum of several terms, if there is one with largest
growth rate, it can be kept, and all others omitted
If f(x) is a product of several factors, any constants (terms in the
product that do not depend on x) can be omitted.
Example:
f(x) = 6x^4 − 2x^3 + 5
Using the 1st rule we can write it as, f(x) = 6x^4
Using the 2nd rule it will give us, O(x^4)
What is Worst Case?
Worst case analysis gives the maximum number of basic operations that
have to be performed during execution of the algorithm. It assumes
that the input is in the worst possible state and maximum work has to
be done to put things right.
For example, for a sorting algorithm which aims to sort an array in ascending order, the worst case occurs when the input array is in descending order. In this case maximum number of basic operations (comparisons and assignments) have to be done to set the array in ascending order.
It depends on a lot of things like:
CPU (time) usage
memory usage
disk usage
network usage
What's the difference?
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.
The important point here is we're talking in terms of growth, not number of operations. However, with algorithms we do talk about the number of operations relative to the input size.
Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.
In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.
When someone says that the time complexity of MergeSort is O(nlogn), they usually mean that the number of comparisons that MergeSort makes is O(nlogn). That in itself doesn’t tell us what the time complexity of any particular MergeSort might be because that would depend how much time it takes to make a comparison. In other words, the O(nlogn) refers to comparisons as the primitive operation.
The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is O(nlogn), is implicitly referencing an model of computation where a comparison takes constant time and everything else is free.
Example -
If we are sorting strings that are kk bytes long, we might take “read a byte” as a primitive operation that takes constant time with everything else being free.
In this model, MergeSort makes O(nlogn) string comparisons each of which makes O(k) byte comparisons, so the time complexity is O(k⋅nlogn). One common implementation of RadixSort will make k passes over the n strings with each pass reading one byte, and so has time complexity O(nk).
The two are not the same thing.  Worst-case analysis as other have said is identifying instances for which the algorithm takes the longest to complete (i.e., takes the most number of steps), then formulating a growth function using this.  One can analyze the worst-case time complexity using Big-Oh, or even other variants such as Big-Omega and Big-Theta (in fact, Big-Theta is usually what you want, though often Big-Oh is used for ease of comprehension by those not as much into theory).  One important detail and why worst-case analysis is useful is that the algorithm will run no slower than it does in the worst case.  Worst-case analysis is a method of analysis we use in analyzing algorithms.
Big-Oh itself is an asymptotic measure of a growth function; this can be totally independent as people can use Big-Oh to not even measure an algorithm's time complexity; its origins stem from Number Theory.  You are correct to say it is the asymptotic upper bound of a growth function; but the manner you prescribe and construct the growth function comes from your analysis.  The Big-Oh of a growth function itself means little to nothing without context as it only says something about the function you are analyzing.  Keep in mind there can be infinitely many algorithms that could be constructed that share the same time complexity (by the definition of Big-Oh, Big-Oh is a set of growth functions).
In short, worst-case analysis is how you build your growth function, Big-Oh notation is one method of analyzing said growth function.  Then, we can compare that result against other worst-case time complexities of competing algorithms for a given problem.  Worst-case analysis if done correctly yields the worst-case running time if done exactly (you can cut a lot of corners and still get the correct asymptotics if you use a barometer), and using this growth function yields the worst-case time complexity of the algorithm.  Big-Oh alone doesn't guarantee the worst-case time complexity as you had to make the growth function itself.  For instance, I could utilize Big-Oh notation for any other kind of analysis (e.g., best case, average case).  It really depends on what you're trying to capture.  For instance, Big-Omega is great for lower bounds.
Imagine a hypothetical algorithm that in best case only needs to do 1 step, in the worst case needs to do n2 steps, but in average (expected) case, only needs to do n steps. With n being the input size.
For each of these 3 cases you could calculate a function that describes the time complexity of this algorithm.
1 Best case has O(1) because the function f(x)=1 is really the highest we can go, but also the lowest we can go in this case, omega(1). Since Omega is equal to O (the upper bound and lower bound), we state that this function, in the best case, behaves like theta(1).
2 We could do the same analysis for the worst case and figure out that O(n2 ) = omega(n2 ) =theta(n2 ).
3 Same counts for the average case but with theta( n ).
So in theory you could determine 3 cases of an algorithm and for those 3 cases calculate the lower/upper/thight bounds. I hope this clears things up a bit.
https://www.google.co.in/amp/s/amp.reddit.com/r/learnprogramming/comments/3qtgsh/how_is_big_o_not_the_same_as_worst_case_or_big/
Big O notation shows how an algorithm grows with respect to input size. It says nothing of which algorithm is faster because it doesn't account for constant set up time (which can dominate if you have small input sizes). So when you say
which takes O(n) execution steps
this almost doesn't mean anything. Big O doesn't say how many execution steps there are. There are C + O(n) steps (where C is a constant) and this algorithm grows at rate n depending on input size.
Big O can be used for best, worst, or average cases. Let's take sorting as an example. Bubble sort is a naive O(n^2) sorting algorithm, but when the list is sorted it takes O(n). Quicksort is often used for sorting (the GNU standard C library uses it with some modifications). It preforms at O(n log n), however this is only true if the pivot chosen splits the array in to two equal sized pieces (on average). In the worst case we get an empty array one side of the pivot and Quicksort performs at O(n^2).
As Big O shows how an algorithm grows with respect to size, you can look at any aspect of an algorithm. Its best case, average case, worst case in both time and/or memory usage. And it tells you how these grow when the input size grows - but it doesn't say which is faster.
If you deal with small sizes then Big O won't matter - but an analysis can tell you how things will go when your input sizes increase.
One example of where the worst case might not be the asymptotic limit: suppose you have an algorithm that works on the set difference between some set and the input. It might run in O(N) time, but get faster as the input gets larger and knocks more values out of the working set.
Or, to get more abstract, f(x) = 1/x for x > 0 is a decreasing O(1) function.
I'll focus on time as a fairly common item of interest, but Big-O can also be used to evaluate resource requirements such as memory. It's essential for you to realize that Big-O tells how the runtime or resource requirements of a problem scale (asymptotically) as the problem size increases. It does not give you a prediction of the actual time required. Predicting the actual runtimes would require us to know the constants and lower order terms in the prediction formula, which are dependent on the hardware, operating system, language, compiler, etc. Using Big-O allows us to discuss algorithm behaviors while sidestepping all of those dependencies.
Let's talk about how to interpret Big-O scalability using a few examples. If a problem is O(1), it takes the same amount of time regardless of the problem size. That may be a nanosecond or a thousand seconds, but in the limit doubling or tripling the size of the problem does not change the time. If a problem is O(n), then doubling or tripling the problem size will (asymptotically) double or triple the amounts of time required, respectively. If a problem is O(n^2), then doubling or tripling the problem size will (asymptotically) take 4 or 9 times as long, respectively. And so on...
Lots of algorithms have different performance for their best, average, or worst cases. Sorting provides some fairly straightforward examples of how best, average, and worst case analyses may differ.
I'll assume that you know how insertion sort works. In the worst case, the list could be reverse ordered, in which case each pass has to move the value currently being considered as far to the left as possible, for all items. That yields O(n^2) behavior. Doubling the list size will take four times as long. More likely, the list of inputs is in randomized order. In that case, on average each item has to move half the distance towards the front of the list. That's less than in the worst case, but only by a constant. It's still O(n^2), so sorting a randomized list that's twice as large as our first randomized list will quadruple the amount of time required, on average. It will be faster than the worst case (due to the constants involved), but it scales in the same way. The best case, however, is when the list is already sorted. In that case, you check each item to see if it needs to be slid towards the front, and immediately find the answer is "no," so after checking each of the n values you're done in O(n) time. Consequently, using insertion sort for an already ordered list that is twice the size only takes twice as long rather than four times as long.
You are right, in that you can say certainly say that an algorithm runs in O(f(n)) time in the best or average case. We do that all the time for, say, quicksort, which is O(N log N) on average, but only O(N^2) worst case.
Unless otherwise specified, however, when you say that an algorithm runs in O(f(n)) time, you are saying the algorithm runs in O(f(n)) time in the worst case. At least that's the way it should be. Sometimes people get sloppy, and you will often hear that a hash table is O(1) when in the worst case it is actually worse.
The other way in which a big O definition can fail to characterize the worst case is that it's an upper bound only. Any function in O(N) is also in O(N^2) and O(2^N), so we would be entirely correct to say that quicksort takes O(2^N) time. We just don't say that because it isn't useful to do so.
Big Theta and Big Omega are there to specify lower bounds and tight bounds respectively.
There are two "different" and most important tools:
the best, worst, and average-case complexity are for generating numerical function over the size of possible problem instances (e.g. f(x) = 2x^2 + 8x - 4) but it is very difficult to work precisely with these functions
big O notation extract the main point; "how efficient the algorithm is", it ignore a lot of non important things like constants and ... and give you a big picture

Big O of an algorithm that relies on convergence

I'm wondering if its possible to express the time complexity of an algorithm that relies on convergence using Big O notation.
In most algorithmic analysis I've seen, we evaluate our function's rate of growth based on input size.
In the case of an algorithm that has some convergence criteria (where we repeat an operation until some defined error metric is below a threshold, or the rate at which the error metric is changing is below some threshold), how can we measure the time complexity? The number of iterations required to converge and exit that loop seems difficult to reason about since the way an algorithm converges tends to be dependent on the content of the input rather than just it's size.
How can we represent the time complexity of an algorithm that relies on convergence in Big O notation?
In order to analyse an algorithm that relies on convergence, it seems that we have to prove something about the rate of convergence.
Convergence usually has a termination condition that checks if our error metric is below some threshold:
do {
// some operation with time complexity O(N)
} while (errorMetric > 0.01) // if this is false, we've reached convergence
Generally, we seek to define something about the algorithm's manner of convergence - usually by identifying that its a function of something.
For instance, we might be able to show that an algorithm's measure of error is a function of the number of iterations so that the error = 1 / 2^i, where i is the number of iterations.
This can be re-written in terms of the number of iterations like so: iterations = log(1 / E), where E is the desired error value.
Therefore, if we have an algorithm that performs some linear operation on each iteration of the convergence loop (as in the example above), we can surmise that our time complexity is O(N * log(1 / E)). Our function's rate of growth is dependent on the amount of error we're willing to tolerate, in addition to the input size.
So, if we're able to determine some property about the behaviour of convergence, such as if its a function of the error, or size of the input, then we can perform asymptotic analysis.
Take, for example, PageRank, an algorithm called power iteration is used in its computation, which is an algorithm that approximates the dominant eigenvector of a matrix. It seems possible that the rate of convergence can be shown to be a function of the first two eigenvalues (shown in the link).
Asymptotic notations don't rely on convergence.
According to CLRS book (Introduction to Algorithms Third Edition chapter 3 page 43):
When we look at input sizes large enough to make only the order of
growth of the running time relevant, we are studying the
asymptotic efficiency of algorithms.That is, we are concerned with how the running time of an algorithm increases with he size of
the input in the limit, as the size of the input increases without
bound. Usually, an algorithm that is asymptotically more efficient
will be the best choice for all but very small inputs.
You mentioned your code (or idea) has infinitive loop and continue to satisfy the condition and you named satisfying the condition convergence but in this meaning, convergence does not related to asymptotic notations like big O, because it must finish because a necessary condition for a code to be an algorithm is that it's iterations must finish. You need to make sure iterations of your code finish, so you can tell it the algorithm and can asymptotic analysis of it.
Another thing is it's true maybe sometime a result has more running time but another has less running time. It's not about asymptotic analysis. It's best case, worst case. We can show analyse of algorithms in best case or worst case by big O or other asymptotic notations. The most reliable of them is you analyse your algorithm in worst case. Finally, for analysis your code you should describe the step of your algorithm exactly.
From math point of view, the main problem is estimation of the Rate of convergence of used approach. I am not so familiar with numerical methods for speak fluently about higher than 1 Dimensions (matrixes and tensors you probably more interested in). But ley's take other example of Equation Solving than Bisection, already estimated above as O(log(1/e)).
Consider Newton method and assume we try to find one root with accuracy e=10e-8 for all float numbers. We have square as Rate of convergence, so we have approximately 2*log(float_range/e) cycle iterations, what's means the same as Bisection algorithmic complexity O(log(range/accuracy)), if we are able to calculate the derivative for constant time.
Hope, this example has a sense for you.

How to calculate O(log n) in big O notation?

I know that O(log n) refers to an iterative reduction by a fixed ratio of the problem set N (in big O notation), but how do i actually calculate it to see how many iterations an algorithm with a log N complexity would have to preform on the problem set N before it is done (has one element left)?
You can't. You don't calculate the exact number of iterations with BigO.
You can "derive" BigO when you have exact formula for number of iterations.
BigO just gives information how the number iterations grows with growing N, and only for "big" N.
Nothing more, nothing less. With this you can draw conclusions how much more operations/time will the algorithm take if you have some sample runs.
Expressed in the words of Tim Roughgarden at his courses on algorithms:
The big-Oh notation tries to provide a sweet spot for high level algorithm reasoning
That means it is intended to describe the relation between the algorithm time execution and the size of its input avoiding dependencies on the system architecture, programming language or chosen compiler.
Imagine that big-Oh notation could provide the exact execution time, that would mean that for any algorithm, for which you know its big-Oh time complexity function, you could predict how would it behave on any machine whatsoever.
On the other hand, it is centered on asymptotic behaviour. That is, its description is more accurate for big n values (that is why lower order terms of your algorithm time function are ignored in big-Oh notation). It can reasoned that low n values do not demand you to push foward trying to improve your algorithm performance.
Big O notation only shows an order of magnitude - not the actual number of operations that algorithm would perform. If you need to calculate exact number of loop iterations or elementary operations, you have to do it by hand. However in most practical purposes exact number is irrelevant - O(log n) tells you that num. of operations will raise logarythmically with a raise of n
From big O notation you can't tell precisely how many iteration will the algorithm do, it's just estimation. That means with small numbers the different between the log(n) and actual number of iterations could be differentiate significantly but the closer you get to infinity the different less significant.
If you make some assumptions, you can estimate the time up to a constant factor. The big assumption is that the limiting behavior as the size tends to infinity is the same as the actual behavior for the problem sizes you care about.
Under that assumption, the upper bound on the time for a size N problem is C*log(N) for some constant C. The constant will change depending on the base you use for calculating the logarithm. The base does not matter as long as you are consistent about it. If you have the measured time for one size, you can estimate C and use that to guesstimate the time for a different size.
For example, suppose a size 100 problem takes 20 seconds. Using common logarithms, C is 10. (The common log of 100 is 2). That suggests a size 1000 problem might take about 30 seconds, because the common log of 1000 is 3.
However, this is very rough. The approach is most useful for estimating whether an algorithm might be usable for a large problem. In that sort of situation, you also have to pay attention to memory size. Generally, setting up a problem will be at least linear in size, so its cost will grow faster than an O(log N) operation.

Algorithm Analysis: In practice, do coefficients of higher order terms matter?

Consider an^2 + bn + c. I understand that for large n, bn and c become insignificant.
I also understand that for large n, the differences between 2n^2 and n^2 are pretty insignificant compared to the differences between, say n^2 and n*log(n).
However, there is still an order of 2 difference between 2n^2 and n^2. Does this matter in practice? Or do people just think about algorithms without coefficients? Why?
The actual coefficients matter if you're interested in timing. But big-O isn't actually about timing, it's about scalability. When you see an algorithm described as O(n^2), you don't really know how long it will take to solve a problem of size n on a particular computer in a particular language with a particular compiler, but you know that a problem of size 2n should take about 4 times as long.
The reason you can ignore the coefficients is that if you consider the ratio of different size problems, the lower order terms' coefficients are asymptotically dominated, and the highest order term's coefficients cancel in the ratio.
We use time complexity analysis to help us estimate the time cost and understand how far we can go. For example, the lower bound time complexity for sorting algorithm is O(nlgn), it is proved in theory, and we should never try to design a algorithm better than this.
For the coefficient, in many case it's not easy to find a accurate number in theory, since it could be effect by the input data. But it doesn't mean it's not important. Quicksort is the most widely used sorting algorithm, since the coefficient of time complexity is really small, which is only 1.39NlgN for average case.
And another interesting fact about quicksort is that we all know that the worst case for quicksort will cost O(N^2). We can use Median of Medians algorithm to reduce the worst case time complexity of quicksort to O(NlgN), but we seldom use this version in practice. It's because that the coefficient of Median of Medians version is too big, which make it unusable.

Meaning of average complexity when using Big-O notation

While answering to this question a debate began in comments about complexity of QuickSort. What I remember from my university time is that QuickSort is O(n^2) in worst case, O(n log(n)) in average case and O(n log(n)) (but with tighter bound) in best case.
What I need is a correct mathematical explanation of the meaning of average complexity to explain clearly what it is about to someone who believe the big-O notation can only be used for worst-case.
What I remember if that to define average complexity you should consider complexity of algorithm for all possible inputs, count how many degenerating and normal cases. If the number of degenerating cases divided by n tend towards 0 when n get big, then you can speak of average complexity of the overall function for normal cases.
Is this definition right or is definition of average complexity different ? And if it's correct can someone state it more rigorously than I ?
You're right.
Big O (big Theta etc.) is used to measure functions. When you write f=O(g) it doesn't matter what f and g mean. They could be average time complexity, worst time complexity, space complexities, denote distribution of primes etc.
Worst-case complexity is a function that takes size n, and tells you what is maximum number of steps of an algorithm given input of size n.
Average-case complexity is a function that takes size n, and tells you what is expected number of steps of an algorithm given input of size n.
As you see worst-case and average-case complexity are functions, so you can use big O to express their growth.
If you're looking for a formal definition, then:
Average complexity is the expected running time for a random input.
Let's refer Big O Notation in Wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes f(x)=O(g(x)) as x --> infinity if ...
So what the premise of the definition states is that the function f should take a number as an input and yield a number as an output. What input number are we talking about? It's supposedly a number of elements in the sequence to be sorted. What output number could we be talking about? It could be a number of operations done to order the sequence. But stop. What is a function? Function in Wikipedia:
a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
Are we producing exacly one output with our prior defition? No, we don't. For a given size of a sequence we can get a wide variation of number of operations. So to ensure the definition is applicable to our case we need to reduce a set possible outcomes (number of operations) to a single value. It can be a maximum ("the worse case"), a minimum ("the best case") or an average.
The conclusion is that talking about best/worst/average case is mathematically correct and using big O notation without those in context of sorting complexity is somewhat sloppy.
On the other hand, we could be more precise and use big Theta notation instead of big O notation.
I think your definition is correct, but your conclusions are wrong.
It's not necessarily true that if the proportion of "bad" cases tends to 0, then the average complexity is equal to the complexity of the "normal" cases.
For example, suppose that 1/(n^2) cases are "bad" and the rest "normal", and that "bad" cases take exactly (n^4) operations, whereas "normal" cases take exactly n operations.
Then the average number of operations required is equal to:
(n^4/n^2) + n(n^2-1)/(n^2)
This function is O(n^2), but not O(n).
In practice, though, you might find that time is polynomial in all cases, and the proportion of "bad" cases shrinks exponentially. That's when you'd ignore the bad cases in calculating an average.
Average case analysis does the following:
Take all inputs of a fixed length (say n), sum up all the running times of all instances of this length, and build the average.
The problem is you will probably have to enumerate all inputs of length n in order to come up with an average complexity.

Resources