Using worst/avg/best case for asymptotic analysis [closed]

I understand that worst/average/best case analysis is used to express an algorithm's running time as a function of its input, but how is that used in asymptotic analysis? I understand that the upper/tight/lower bounds (big O, big Theta, big Omega) are used to compare two functions and see how one grows relative to the other as n increases, but I'm having trouble seeing the difference between worst/average/best case big O and asymptotic analysis. What exactly do we get out of plugging our worst/average/best case running times into asymptotic analysis and measuring bounds? Would we use asymptotic analysis specifically to compare the worst/average/best cases of two algorithms? If so, do we use a function f(n) for algorithm 1 and g(n) for algorithm 2, or do we do a separate asymptotic analysis for each algorithm, where algorithm 1 is f(n) and we try to find some c*g(n) such that c*g(n) >= f(n) (or c*g(n) <= f(n) for a lower bound), and then do the same thing for algorithm 2? I'm not seeing the big picture here.

Since you want the big picture, let me try to give you the same.
Asymptotic analysis is used to study how the running time grows as the size of the input increases. This growth is studied in terms of the input size, usually denoted N or M, which could mean anything from the number of numbers (as in sorting) or the number of nodes (as in graphs) to the number of bits (as in multiplication of two numbers).
When doing asymptotic analysis, our goal is to find out which algorithm fares better in specific cases. Realize that an algorithm can take quite varying amounts of time even on same-sized inputs. To appreciate this, imagine you are a sorting machine: you are given a set of numbers and you need to sort them. If I give you an already sorted list, you have no work to do; you are done already. If I give you a reverse-sorted list, imagine the number of operations needed to get it sorted. Now that you see this, realize that we need a way of knowing what kind of input to expect. Would it be a best case? Would I get worst-case inputs? To answer this, we need some knowledge of the distribution of the input. Will it all be worst cases? Or average cases? Or mostly best cases?
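(As a concrete illustration of the "sorting machine", here is a minimal sketch, my own and not part of the original answer, using insertion sort: a sorted input costs about n comparisons, a reverse-sorted one about n^2/2.)

    def insertion_sort(a):
        """Sort the list in place and return the number of comparisons made."""
        comparisons = 0
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0:
                comparisons += 1          # one comparison of key against a[j]
                if a[j] > key:
                    a[j + 1] = a[j]       # shift the larger element to the right
                    j -= 1
                else:
                    break
            a[j + 1] = key
        return comparisons

    n = 1000
    print(insertion_sort(list(range(n))))         # already sorted: n - 1 = 999 comparisons
    print(insertion_sort(list(range(n, 0, -1))))  # reverse sorted: n(n-1)/2 = 499500 comparisons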
The input distribution is fairly difficult to ascertain in most cases, which leaves us with two options: either assume the average case and analyze the algorithm under that assumption, or get a guarantee on the running time irrespective of the input distribution. The former is called average-case analysis, and doing it requires a formal definition of what constitutes an average case; this is sometimes difficult to pin down and can demand real mathematical insight. The trouble is worth it when you learn that an algorithm runs much faster in the average case than its worst-case running time suggests. Several randomized algorithms stand testimony to this; for them, an average-case analysis reveals their practical applicability.
The latter, worst-case analysis, is used more often because it provides a clean guarantee on the running time, and in practice coming up with the worst-case scenario is usually fairly intuitive. If you are the sorting machine, the worst case is something like a reverse-sorted array. But what is the average case?
Not so intuitive, right?
Best-case analysis is rarely used, since one does not usually get to count on best cases; still, such an analysis can be done and can reveal interesting behavior.
In conclusion: when we have a problem to solve, we come up with algorithms. Once we have an algorithm, we decide whether it is of any practical use in our situation; if so, we shortlist the candidates and compare them based on their time and space complexity. There can be other metrics for comparison (ease of implementation, for example), but these two are fundamental. Depending on the situation at hand, you would employ worst-case, average-case, or best-case analysis. For example, if worst-case scenarios are rare, it makes much more sense to carry out an average-case analysis; but if the performance of the code is critical and the output must be produced within a strict time limit, it is much more prudent to look at the worst case. The analysis you choose depends on the situation at hand, and with time the intuition for which analysis to apply becomes second nature.
Please ask if you have more questions.
To know more about big-oh and the other notations read my answers here and here.

The Wikipedia article on quicksort provides a good example of how asymptotic analysis is used for the best/average/worst case: quicksort has a worst case of O(n^2), an average case of O(n log n), and a best case of O(n log n). If you're comparing it to another algorithm (say, heapsort), you compare apples to apples: you'd compare quicksort's worst-case big-theta to heapsort's worst-case big-theta, or quicksort's space big-oh to heapsort's space big-oh.
You can also compare big-theta to big-oh if you're interested in upper bounds, or big-theta to big-omega if you're interested in lower bounds.
Big-omega is usually only of theoretical interest - you're far more likely to see analysis in terms of big-oh or big-theta.
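To make the quicksort example tangible, here is a rough sketch (my own illustration, assuming the textbook first-element-pivot variant, which the answer above does not specify): on already-sorted input it makes on the order of n^2/2 comparisons (its worst case), while on shuffled input it makes on the order of n log n (its average case).

    import random

    def quicksort_comparisons(items):
        """Naive quicksort with the first element as pivot; returns the comparison count.
        An explicit stack is used so the worst case does not hit Python's recursion limit."""
        a = list(items)
        comparisons = 0
        stack = [(0, len(a) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot = a[lo]
            i = lo + 1
            for j in range(lo + 1, hi + 1):
                comparisons += 1
                if a[j] < pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[lo], a[i - 1] = a[i - 1], a[lo]   # place pivot in its final position
            stack.append((lo, i - 2))
            stack.append((i, hi))
        return comparisons

    n = 2000
    print(quicksort_comparisons(range(n)))        # sorted input: ~n^2/2, about 2,000,000
    shuffled = list(range(n)); random.shuffle(shuffled)
    print(quicksort_comparisons(shuffled))        # random input: ~2n ln n, a few tens of thousands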

What exactly do we get out of imputing our worst/avg/best case big O into the asymptotic analysis and measuring bounds?
It gives you an idea of how each approach scales as the input grows, which is exactly what helps when you are comparing different approaches to the same problem.
Would we use asymptotic analysis to specifically compare two algorithms of worst/avg/best case big O?
Generally the worst case (expressed with big O) gets the most focus, compared to big Omega and big Theta.
Yes, we use a function f(n) for algorithm 1 and g(n) for algorithm 2, and these functions are the big-O bounds of their respective algorithms.
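For reference, here is the standard textbook definition the original question was reaching for (this is general background, not something stated in the answer above): for functions f(n) and g(n),

\[ f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0 \]
\[ f(n) \in \Omega(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ f(n) \ge c \cdot g(n) \ \text{for all}\ n \ge n_0 \]
\[ f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n)) \]

So you would take, say, the worst-case running time of algorithm 1 as f(n) and the worst-case running time of algorithm 2 (or a simple reference function such as n^2) as g(n), and check which bounds hold; each algorithm, and each of its cases, gets its own function.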

Related

Is the Big-Omega time complexity of all search algorithms O(1)?

I understand that Big Omega defines the lower bound of a function (or best-case runtime).
Considering that almost every search algorithm could "luck out" and find the target element on the first iteration, would it be fair to say that its Big-Omega time complexity is O(1)?
I also understand that giving O(1) as the big Omega may not be useful (other lower bounds may be tighter, i.e. closer to the evaluated function), but the question is: is it correct?
I've found multiple sources claiming the linear search is Big-Omega O(n), even if some cases could complete in a single step, which is different from the best-case scenario as I understand it.
The lower bound (𝛺) is not the fastest answer a given algorithm can give.
The lower bound of a given problem is equal to the worst case scenario of the best algorithm that solves the problem. When doing complexity analysis, you should never forget that "luck" is always in the hands of the input (the instance the algorithm is trying to solve).
When trying to find a lower bound, you imagine the "perfect algorithm" and you try to "trap" it in a very hard case. Usually the algorithm is not concretely defined and is only described in terms of its (hypothetical) performance. You use arguments such as "if the ideal algorithm were that fast, it would not have this particular piece of knowledge and would therefore fail on this particular instance; hence the ideal algorithm doesn't exist". Replace "ideal" with the lower bound you are trying to prove.
For example, the lower bound for the min-search problem in an unsorted array is 𝛺(n). The proof is quite trivial and, as is often the case, goes by contradiction: an algorithm A running in o(n) cannot look at every item of the input array, so there is at least one item it never sees; if that unseen item happened to be the minimum, A would fail. The contradiction proves that the problem is in 𝛺(n).
Maybe you can have a look at that answer I gave on a similar question.
The notations O, o, Θ, Ω, and ω are used in characterizing mathematical functions; for example, f(n) = n^3 log n is in O(n^4) and in Ω(n^3).
So, the question is what mathematical functions we apply them to.
The mathematical functions that we tend to be interested in are things like "the worst-case time complexity of such-and-such algorithm, as a function of the size of its input", or "the average-case space complexity of such-and-such procedure, as a function of the largest element in its input". (Note: when we just say "the complexity of such-and-such algorithm", that's usually shorthand for its worst-case time complexity, as a function of some characteristic of its input that's hopefully obvious in context. Either way, it's still a mathematical function.)
We can use any of these notations in characterizing those functions. In particular, it's fine to use Ω in characterizing the worst case or average case.
We can also use any of these notations in characterizing things like "the best-case […]"; that's unusual, but there are times when it may be relevant. Notably, we're not limited to Ω for that: just as we can use Ω in characterizing the worst case, we can also use O in characterizing the best case. It's all about what characterizations we're interested in.
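As a small worked illustration (these are standard facts about insertion sort, not claims made in the answer above), write T_worst(n) and T_best(n) for its worst-case and best-case running times:

\[ T_{\text{worst}}(n) = \Theta(n^2), \ \text{so}\ T_{\text{worst}}(n) \in O(n^2)\ \text{and}\ T_{\text{worst}}(n) \in \Omega(n^2) \]
\[ T_{\text{best}}(n) = \Theta(n), \ \text{so}\ T_{\text{best}}(n) \in O(n)\ \text{and}\ T_{\text{best}}(n) \in \Omega(n) \]

Using Ω on the worst case and O on the best case are both perfectly legitimate statements; the notation never dictates which case you are talking about.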
You are confusing two different topics: Lower/upper bound, and worst-case/best-case time complexity.
The short answer to your question is: Yes, all search algorithms have a lower bound of Ω(1). Linear search (in the worst case, and on average) also has a lower bound of Ω(n), which is a stronger and more useful claim. The analogy is that 1 < π but also 3 < π, the latter being more useful. So in this sense, you are right.
However, your confusion seems to be between the notations for complexity classes (big-O, big-Ω, big-θ etc), and the concepts of best-case, worst-case, average case. The point is that the best case and the worst case time complexities of an algorithm are completely different functions, and you can use any of the notations above to describe any of them. (NB: Some claim that big-Ω automatically and exclusively describes best case time complexity and that big-O describes worst case, but this is a common misconception. They just describe complexity classes and you can use them with any mathematical functions.)
It is correct to say that the average-case time complexity of linear search is Ω(n), because we are just talking about the function that describes its average time complexity. Its best-case complexity is a different function, which happens not to be Ω(n), because, as you say, it can be constant-time.
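A minimal sketch (my own, not code from the answer) showing that the best-case and worst-case step counts of linear search really are different functions of n:

    def linear_search(a, target):
        """Return (index, steps), where steps is the number of elements examined."""
        for steps, value in enumerate(a, start=1):
            if value == target:
                return steps - 1, steps
        return -1, len(a)

    data = list(range(100))
    print(linear_search(data, 0))    # best case: 1 step, so the best-case function is constant
    print(linear_search(data, -1))   # worst case: n steps, so the worst-case function is linear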

When (not how or why) to calculate Big O of an algorithm

I was asked this question in an interview recently and was curious as to what others thought.
"When should you calculate Big O?"
Most sites/books talk about HOW to calculate Big O, but not actually about when you should do it. I'm an entry-level developer with minimal experience, so I'm not sure I'm thinking along the right track. My thinking is that you would have a target Big O to work towards, develop the algorithm, then calculate its Big O, and then try to refactor the algorithm for efficiency.
My question, then, is: is this what actually happens in industry, or am I far off?
"When should you calculate Big O?"
When you care about the Time Complexity of the algorithm.
When do I care?
When you need your algorithm to scale, meaning that it is expected to receive big datasets as input (e.g. a large number of points and dimensions in a nearest-neighbor algorithm).
Most notably, when you want to compare algorithms!
You are asked to do a task to which several algorithms can be applied. Which one do you choose? You compare their space, time, and development/maintenance complexities, and choose the one that best fits your needs.
Big O or asymptotic notations allow us to analyze an algorithm's running time by identifying its behavior as the input size for the algorithm increases.
So whenever you need to analyse your algorithm's behavior with respect to growth of the input, you will calculate this. Let me give you an example -
Suppose you need to query over 1 billion items, so you write a linear search. Is that okay? How would you know? You calculate the Big-O: it's O(n) for linear search, so in the worst case it executes on the order of 1 billion instructions per query. If your machine executes 10^7 instructions per second (let's assume), that's 100 seconds. So you see: you get a runtime analysis in terms of the growth of the input.
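The back-of-envelope arithmetic from that example, written out (the figure of 10^7 instructions per second is the assumption made in the answer, not a measured value):

    n = 1_000_000_000       # items to scan with linear search
    cost_per_item = 1       # roughly one instruction's worth of work per item (simplification)
    ips = 10**7             # assumed instructions executed per second

    worst_case_seconds = n * cost_per_item / ips
    print(worst_case_seconds)   # 100.0 seconds for a single worst-case query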
When we are solving an algorithmic problem we want to evaluate the algorithm irrespective of the hardware it runs on, so we have asymptotic notations with which we can describe the time and space complexity of our algorithm.
Theta-Notation: bounds the function from above and below (a tight bound)
Omega-Notation: bounds the function from below (a lower bound)
Big-O Notation: bounds the function from above (an upper bound); it is the most frequently quoted because it tells you how badly the algorithm can grow
Now, I think the answer to why Big-O is calculated is that it gives us a fair idea of how badly our algorithm can perform as the input size increases; and if we optimize our algorithm for the worst case, the average and best cases will take care of themselves.
I assume that you want to ask "when should I calculate time complexity?", just to avoid technicalities about Theta, Omega and Big-O.
The right attitude is to guess it almost always. Notable exceptions are pieces of code you will run just once and that you are sure will never receive a bigger input.
The emphasis on guess is because it does not matter that much whether complexity is constant or logarithmic. There is also a little difference between O(n^2) and O(n^2 log n) or between O(n^3) and O(n^4). But there is a big difference between constant and linear.
The main goal of the guess is to answer the question: "What happens if I get a 10 times larger input?" If the complexity is constant, nothing happens (in theory at least). If it is linear, you get a 10 times larger running time. If it is quadratic or worse, you start to have problems.
The secondary goal of the guess is to answer the question: "What is the biggest input I can handle?" Again, quadratic gets you up to about 10,000 at most; O(2^n) ends around 25.
This might sound scary and time consuming, but in practice, getting time complexity of the code is rather trivial, since most of the things are either constant, logarithmic or linear.
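A rough sketch of the "biggest input I can handle" guess described above (the budget of about 10^8 simple operations per second is my own assumption, not a figure from the answer):

    budget_ops = 10**8   # assumed: roughly how many simple operations fit in about a second

    def largest_n(cost, hi=10**9):
        """Binary search for the largest n with cost(n) <= budget_ops."""
        lo = 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if cost(mid) <= budget_ops:
                lo = mid
            else:
                hi = mid - 1
        return lo

    print(largest_n(lambda n: n))            # linear:      about 10^8
    print(largest_n(lambda n: n * n))        # quadratic:   about 10^4
    print(largest_n(lambda n: 2**n, hi=60))  # exponential: about 26

The exact numbers depend entirely on the constant factors, which is why this is a guess rather than a calculation.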
It represents the upper bound.
Big-O is the most useful because it represents the worst-case behavior, so it guarantees that the program will terminate within a certain amount of time; it may stop earlier, but never later.
It gives the worst-case time complexity, i.e. the maximum time required to execute the algorithm.

Big O based comparison of an algorithm's running time

It was during a Data Structures and Algorithms lesson that I ran into this doubt. I have looked up several resources, both online and offline, but still have doubts. I was asked to figure out the running time of an algorithm and I did that correctly: O(n^3). But what puzzled me was this question: how much slower does the algorithm get if n goes from, say, 90 to 900,000? My doubts were:
The algorithm has a running time of O(n^3), so obviously it will take more time for a larger input. But how do I compare an algorithm's performance for different inputs based on just the worst-case time complexity?
Can we just plug in the values of 'n' and divide the big-O to get a ratio?
Please correct me wherever I am mistaken! Thanks!
Your thinking is correct!
Since the algorithm is O(n^3), in the worst case the running time grows with n^3 as the size n increases.
Your assumption is correct: you just divide the value obtained for n = 900,000 by the one for n = 90, and the result is the slowdown factor.
Here, slowdown factor = (900,000)^3 / (90)^3 = 10^12.
So your program slows down by a factor of 10^12; in other words, its efficiency falls by a factor of 10^(-12). Quite a drastic change!
EDIT based on suggested comments :-
But how do I compare an algorithm's performance for different inputs
based on just the worst case time complexity ?
As hinted by G.Bach, one of the commenters on this post, the premise of your question is a little contradictory: to ask for a general answer you should really be talking about big-Theta notation rather than big-O. Big-O is an upper bound, whereas big-Theta is a tight bound. When people only worry about the worst that can happen, big-O is sufficient, i.e. it says "it can't get much worse than this". The tighter the bound the better, of course, but a tight bound isn't always easy to compute.
So, for a worst-case analysis your question is answered the way both of us have answered it. One narrow reason people use O instead of Ω is to drop disclaimers about worst or average cases, but for a tighter analysis you would have to establish both O and big-Omega to frame the question properly. The question can then be answered for varied sizes of n; I leave that for you to figure out, but if you have any doubts, please feel free to comment.
Also, your running time is not directly given by the worst-case analysis, but it is related to it and can be reframed in that way.
P.S. I have taken some ideas/statements from Big-oh vs big-theta, so thanks to those authors too!
I hope this helps. Feel free to comment if I have skimmed over anything.
Your understanding is correct.
If the only performance measure you have is worst-case time complexity, that's all you can compare. If you had more information you could make a better estimate.
Yes, just substitute the values of n and divide 900,000^3 by 90^3 to get the ratio.
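In code, the substitution both answers describe is just this arithmetic:

    ratio = 900_000**3 / 90**3
    print(ratio)   # 1e+12: the worst-case bound grows by a factor of about a trillion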

Right way to discuss computational complexity for small n

When discussing computational complexity, it seems everyone generally goes straight to Big O.
Let's say, for example, I have a hybrid algorithm such as merge sort which uses insertion sort for smaller subarrays (I believe this is called tiled merge sort). It's still ultimately merge sort with O(n log n), but I want to discuss the behaviour/characteristics of the algorithm for small n, in cases where no merging actually takes place.
For all intents and purposes the tiled merge sort is insertion sort, executing exactly the same instructions for the domain of my small n. However, Big O deals with the large and asymptotic cases and discussing Big O for small n is pretty much an oxymoron. People have yelled at me for even thinking the words "behaves like an O(n^2) algorithm in such cases". What is the correct way to describe the algorithm's behaviour in cases of small n within the context of formal theoretical computational analysis? To clarify, not just in the case where n is small, but in the case where n is never big.
One might say that for such small n it doesn't matter, but I'm interested in the cases where it does, for example when there is a large constant factor involved, such as the routine being executed many times, and where in practice it would show a clear trend and be the dominant factor; for example, the initial quadratic growth seen in the graph below. I'm not dismissing Big O, more asking for a way to properly tell both sides of the story.
[EDIT]
If for "small n", constants can easily remove all trace of a growth rate then either
only the asymptotic case is discussed, in which case there is less relevance to any practical application, or
there must be a threshold at which we agree n is no longer "small".
What about the cases where n is not "small" (n is sufficiently big that the growth rate will not be affected significantly by any practical constant), but not yet big enough to show the final asymptotic growth rate, so that only intermediate growth rates are visible (for example the shape in the image above)?
Are there no practical algorithms that exhibit this behaviour? Even if there aren't, theoretical discussion should still be possible. Do we measure instead of discussing the theory purely because that's "what one should do"? If some behaviour is observed in all practical cases, why can't there be theory that's meaningful?
Let me turn the question around the other way. I have a graph that shows segmented super-linear steps. It sounds like many people would say "this is a pure coincidence, it could be any shape imaginable" (at the extreme of course) and wouldn't bat an eyelid if it were a sine wave instead. I know in many cases the shape could be hidden by constants, but here it's quite obvious. How can I give a formal explanation of why the graph produces this shape?
I particularly like Sneftel's words "imprecise but useful guidance".
I know Big O and asymptotic analysis isn't applicable. What is? How far can I take it?
For small n, computational complexity (how things change as n increases towards infinity) isn't meaningful, as other effects dominate.
Papers I've seen which discuss behaviour for small values of n do so by measuring the algorithms on real systems, and discuss how the algorithms perform in practice rather than from a theoretical viewpoint. For example, for the graph you've added to your post I would say 'this graph demonstrates an O(N) asymptotic behaviour overall, but the growth within each tile is bounded quadratic'.
I don't know of a situation where a discussion of such behaviour from a theoretical viewpoint would be meaningful - it is well known that for small n the practical effects outweigh the effects of scaling.
It's important to remember that asymptotic analysis is an analytic simplification, not a mandate for analyzing algorithms. Take selection sort, for instance. Yes, it executes in O(n^2) time. But it is also true that it performs precisely n*(n-1)/2 comparisons, and n-1-k swaps, where k is the number of elements (other than the maximum) which start in the correct position. Asymptotic analysis is a tool for simplifying the (otherwise generally impractical) task of performance analysis, and one we can put aside if we're not interested in the "really big n" segment.
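A quick sketch (my own, illustrating the claim in the paragraph above) that counts selection sort's comparisons and confirms the n(n-1)/2 figure regardless of the input order:

    import random

    def selection_sort_counts(a):
        """Sort a in place; return (comparisons, swaps)."""
        comparisons = swaps = 0
        n = len(a)
        for i in range(n - 1):
            min_idx = i
            for j in range(i + 1, n):
                comparisons += 1
                if a[j] < a[min_idx]:
                    min_idx = j
            if min_idx != i:
                a[i], a[min_idx] = a[min_idx], a[i]
                swaps += 1
        return comparisons, swaps

    n = 100
    print(selection_sort_counts(list(range(n))))   # (4950, 0): exactly n(n-1)/2 comparisons, no swaps
    shuffled = list(range(n)); random.shuffle(shuffled)
    print(selection_sort_counts(shuffled))         # still 4950 comparisons; the swap count varies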
And you can choose how you express your bounds. Say a function requires precisely n + floor(0.01*2^n) operations. That's exponential time, of course. But one can also say "for data sizes up to 10 elements, this algorithm requires between n and 2*n operations." The latter describes not the shape of the curve, but an envelope around that curve, giving imprecise but useful guidance about the practicalities of the algorithm within a particular range of use cases.
You are right.
For small n, i.e. when only insertion sort is performed, the asymptotic behavior is quadratic O(n^2).
And for larger n, when the merging actually kicks in, the behavior switches to O(n log n).
There is no contradiction if you remember that every behavior has its own domain of validity: before the switching threshold, call it N, and after it.
In practice there will be a smooth blend between the curves around N. But in practice too, that value of N is so small that the quadratic behavior does not have enough "room" to manifest itself.
Another way to handle this in the analysis is to say that, N being a constant, the insertion sorts take constant time. But I would not say that this framing is a must.
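For concreteness, a minimal sketch of the kind of hybrid the question describes (the threshold value and the structure are my own assumptions, not taken from the question or the answers):

    import heapq  # heapq.merge gives a linear-time merge of two sorted sequences

    THRESHOLD = 32  # assumed cutoff; below this the routine is pure insertion sort

    def insertion_sort(a, lo, hi):
        """Sort a[lo:hi] in place; O(k^2) on a slice of length k, with a tiny constant."""
        for i in range(lo + 1, hi):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def tiled_merge_sort(a, lo=0, hi=None):
        """Merge sort that hands small subarrays to insertion sort."""
        if hi is None:
            hi = len(a)
        if hi - lo <= THRESHOLD:
            insertion_sort(a, lo, hi)   # for n <= THRESHOLD no merging ever happens
            return
        mid = (lo + hi) // 2
        tiled_merge_sort(a, lo, mid)
        tiled_merge_sort(a, mid, hi)
        a[lo:hi] = list(heapq.merge(a[lo:mid], a[mid:hi]))

For any input with n <= THRESHOLD the call never reaches the merge step, which is exactly the regime the question is asking how to describe.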
Let's unpack things a bit. Big-O is a tool for describing the growth rate of a function. One of the functions to which it is commonly applied is the worst-case running time of an algorithm on inputs of length n, executing on a particular abstract machine. We often forget about the last part because there is a large class of machines with random-access memory that can emulate one another with only constant-factor slowdown, and the class of problems solvable within a particular big-O running-time bound is equivalent across these machines.
If you want to talk about complexity on small inputs, then you need to measure constant factors. One way is to measure running times of actual implementations. People usually do this on physical machines, but if you're hardcore like Knuth, you invent your own assembly language complete with instruction timings. Another way is to measure something that's readily identifiable but also a useful proxy for the other work performed. For comparison sorts, this could be comparisons. For numerical algorithms, this is often floating-point operations.
Complexity is not about the execution time for one particular n on one machine, so there is no need to consider it even if the constant is large. Complexity tells you how the size of the input affects execution time. For small n, you can treat execution time as constant. That is one side of it.
On the other side, you are saying that:
You have a hybrid algorithm working in O(n log n) for n larger than some k and O(n^2) for n smaller than k.
The constant k is so large that algorithm works slowly.
Such an algorithm makes no sense, because you could easily improve it.
Let's take the red-black tree. Operations on this tree run in O(log n) time, but with a large constant factor, so on normal machines it can work slowly (i.e. slower than simpler solutions) in some cases. There is no need to consider that when analyzing complexity. You need to consider it when implementing it in your system: you need to check whether it's the best choice given the actual machine(s) it will be running on and the problems it will be solving.
Read Knuth's "The Art of Computer Programming" series, starting with Volume 1, "Fundamental Algorithms", section 1.2.10, "Analysis of an Algorithm". There he shows (as throughout the rest of his seminal work) how an exact analysis can be conducted for arbitrary problem sizes, using a suitable reference machine, by taking a detailed census of every processor instruction.
Such analyses have to take into account not only the problem size, but also any relevant aspect of the input data distribution that influences the running time. For simplification, the analyses are often limited to the worst case, the expected case, or the output-sensitive case, rather than a general statistical characterization. And for further simplification, asymptotic analysis is used.
Quite apart from the fact that, except for trivial algorithms, the detailed approach is mathematically highly challenging, it has become unrealistic on modern machines. Indeed, it relies on processor behavior similar to the so-called RAM model, which assumes constant time per instruction and per memory access (http://en.wikipedia.org/wiki/Random-access_machine). Except maybe for special hardware combined with a hard real-time OS, these assumptions are nowadays completely wrong.
When you have an algorithm with a time complexity of, say, O(n^2), and another algorithm with a time complexity of, say, O(n), you cannot conclude from these two complexities that the latter algorithm is faster than the former for all input values. You can only say that the latter is asymptotically (i.e. for sufficiently large inputs) faster than the former. Keep in mind that in asymptotic notation constant factors are generally ignored, to make the statement of the time complexity more understandable. For example, merge sort runs in O(n log n) worst-case time and insertion sort runs in O(n^2) worst-case time, but because the hidden constant factors of insertion sort are smaller than those of merge sort, in practice insertion sort can be faster than merge sort for small problem sizes on many machines.
Asymptotic notation does not describe the actual running time of an algorithm. The actual running time depends on the machine, since different machines have different architectures and different instruction execution times. Asymptotic notation only describes, asymptotically, how fast an algorithm is with respect to other algorithms; it does not describe the behavior of the algorithm for small input values (n <= n0). The value of the threshold n0 depends on the hidden constant factors and lower-order terms, and the hidden constant factors depend on the machine the algorithm will be executed on.

Is big-O notation a tool to do best, worst, & average case analysis of an algorithm?

Is big-O notation a tool to do best, worst, & average case analysis of an algorithm?
Or is big-O only for worst case analysis, since it is an upper bounding function?
It is Big O, because orders of magnitude are expressed like O(n), O(logN), etc.
The best, worst, and average cases of an algorithm can all be expressed with Big O notation.
For an example of this applied to sorting algorithms, see
http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
Note that an algorithm can be classified according to multiple, independent criteria such as memory use or CPU use. Often, there is a tradeoff between two or more criteria (e.g. an algorithm that uses little CPU may use quite a bit of memory).
Big "O" is a measure of asymptotic complexity, which is to say, roughly how an algorithm scales as N gets really large.
If the best and worst cases converge to the same asymptotic complexity, you can use a single value; or you can figure them out separately (for example, some sorting algorithms have completely different characteristics on sorted or almost-sorted data than on unsorted data).
The notation itself doesn't convey this though, how you use it does.
... Or is big-O only for worst case analysis ...
If you give just one asymptotic complexity for an algorithm, it doesn't tell the reader whether (or how) the best and worst case differ from the average.
If you give best-case and worst-case complexity, it tells the reader how they differ.
By default, if a single value is listed, it is probably the average complexity which may (or may not) converge with the worst-case.
