Bogosort and O(∞) - algorithm

The well-known bogosort algorithm simply shuffles a deck until it is in order:
while not inOrder(deck) do
    shuffle(deck)
The complexity of this algorithm is O(∞).
First, is O(∞) well-defined? How can a function be within a constant factor of infinity?
Second, are there other well-known randomized algorithms that have this kind of worst case complexity? (of course no one would ever use bogosort...)
Finally, for a randomized algorithm, it seems to me that most of the time we can only speak about expected complexity. When does it make sense to use big-Oh with randomized algorithms?

O(∞) is an abuse of notation. If we're being strict about the notation, it cannot mean the growth is bounded by a constant factor of infinity, because infinity is not a real number.
If we accept this abuse of notation, with its obvious meaning that the growth is "limited above by infinity", it becomes clear that it is too generic to be of much use. After all, what function wouldn't be limited by infinity?
Because the worst-case running time of bogosort has no real upper bound, O(∞) is the only thing one can say about it with big-Oh notation, which, as we've seen, isn't really saying much.
But we can still use big-Oh notation when talking about randomized algorithms: we just need to analyse the aspects of them that do have upper bounds. Bogosort has a best case, and that best case runs in O(n) time. And on average, it runs in O(n·n!) time.
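For concreteness, here is a minimal runnable sketch of bogosort in Python (the names in_order and bogosort are just illustrative choices, not anything standard). The in_order scan is the O(n) best-case work; with n distinct elements the expected number of shuffles is on the order of n!, which is where the O(n·n!) average above comes from, and there is no finite worst-case bound.

```python
import random

def in_order(deck):
    # O(n) scan: this is the only work done in the best case
    return all(deck[i] <= deck[i + 1] for i in range(len(deck) - 1))

def bogosort(deck):
    # Each iteration costs O(n) for the check plus O(n) for the shuffle;
    # the expected number of iterations for n distinct items is about n!,
    # giving O(n * n!) expected time and no worst-case bound at all.
    while not in_order(deck):
        random.shuffle(deck)
    return deck

print(bogosort([3, 1, 2]))  # [1, 2, 3]
```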

Related

Is the Big-Omega time complexity of all search algorithms O(1)?

I understand that Big Omega defines the lower bound of a function (or best-case runtime).
Considering that almost every search algorithm could "luck out" and find the target element on the first iteration, would it be fair to say that its Big-Omega time complexity is O(1)?
I also understand that defining O(1) as the big Omega may not be useful (other lower bounds may be tighter, i.e. closer to the evaluated function), but the question is: is it correct?
I've found multiple sources claiming that linear search is Big-Omega(n), even though some cases could complete in a single step, which is different from the best-case scenario as I understand it.
The lower bound (𝛺) is not the fastest answer a given algorithm can give.
The lower bound of a given problem is equal to the worst case scenario of the best algorithm that solves the problem. When doing complexity analysis, you should never forget that "luck" is always in the hands of the input (the instance the algorithm is trying to solve).
When trying to find a lower bound, you will imagine the "perfect algorithm" and you will try to "trap" it in a very hard case. Usually the algorithm is not defined and is only described based on its (hypothetical) performance. You would use arguments such as "if the ideal algorithm is that fast, it will not have this particular knowledge and will therefore fail on this particular instance, i.e. the ideal algorithm doesn't exist". Replace "ideal" with the lower bound you are trying to prove.
For example, the lower bound for the min-search problem in an unsorted array is Ω(n). The proof is quite trivial and, as is often the case, proceeds by contradiction. Basically, an algorithm A running in o(n) time must skip at least one item of the input array; if the item it skipped happens to be the minimum, A fails. The contradiction proves that the problem is in Ω(n).
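As a small, hedged illustration of that Ω(n) argument in Python: a correct minimum search has to read every element at least once.

```python
def find_min(arr):
    # Every index is read exactly once. Any algorithm that skips even one
    # index i could be fooled by an input whose minimum sits at position i,
    # which is exactly the adversary argument above.
    smallest = arr[0]
    for x in arr[1:]:
        if x < smallest:
            smallest = x
    return smallest

print(find_min([7, 3, 9, 1, 4]))  # 1
```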
Maybe you can have a look at that answer I gave on a similar question.
The notations O, o, Θ, Ω, and ω are used in characterizing mathematical functions; for example, f(n) = n^3 log n is in O(n^4) and in Ω(n^3).
So, the question is what mathematical functions we apply them to.
The mathematical functions that we tend to be interested in are things like "the worst-case time complexity of such-and-such algorithm, as a function of the size of its input", or "the average-case space complexity of such-and-such procedure, as a function of the largest element in its input". (Note: when we just say "the complexity of such-and-such algorithm", that's usually shorthand for its worst-case time complexity, as a function of some characteristic of its input that's hopefully obvious in context. Either way, it's still a mathematical function.)
We can use any of these notations in characterizing those functions. In particular, it's fine to use Ω in characterizing the worst case or average case.
We can also use any of these notations in characterizing things like "the best-case […]" — that's unusual, but there are times when it may be relevant. But, notably, we're not limited to Ω for that; just as we can use Ω in characterizing the worst case, we can also use O in characterizing the best case. It's all about what characterizations we're interested in.
You are confusing two different topics: Lower/upper bound, and worst-case/best-case time complexity.
The short answer to your question is: Yes, all search algorithms have a lower bound of Ω(1). Linear search (in the worst case, and on average) also has a lower bound of Ω(n), which is a stronger and more useful claim. The analogy is that 1 < π but also 3 < π, the latter being more useful. So in this sense, you are right.
However, your confusion seems to be between the notations for complexity classes (big-O, big-Ω, big-θ etc), and the concepts of best-case, worst-case, average case. The point is that the best case and the worst case time complexities of an algorithm are completely different functions, and you can use any of the notations above to describe any of them. (NB: Some claim that big-Ω automatically and exclusively describes best case time complexity and that big-O describes worst case, but this is a common misconception. They just describe complexity classes and you can use them with any mathematical functions.)
It is correct to say that the average time complexity of linear search is Ω(n), because we are just talking about the function that describes its average time complexity. Its best-case complexity is a different function, which happens not to be Ω(n), because, as you say, it can be constant time.
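To make the "different functions" point concrete, here is a plain linear-search sketch; the best-case and worst-case inputs exercise the same code very differently.

```python
def linear_search(arr, target):
    # Best case: target is arr[0] -> one comparison, constant time.
    # Worst case: target is last or absent -> n comparisons, Θ(n).
    for i, x in enumerate(arr):
        if x == target:
            return i
    return -1

print(linear_search([5, 8, 2], 5))  # best case: found at index 0
print(linear_search([5, 8, 2], 9))  # worst case: scans everything, returns -1
```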

Isn't big-Oh asymptotic notation, O(f(n)), the slowest runtime an algorithm can have? (It gives an asymptotic upper bound, which means the slowest runtime.)

I was reading the book "Introduction to Algorithms", and while analyzing Strassen's algorithm for matrix multiplication it says this:
"One might at first think that any matrix multiplication algorithm must take Ω(n^3) time, since the natural definition of matrix multiplication requires that many multiplications. You would be incorrect, however: we have a way to multiply matrices in O(n^3) time."
Isn't O(n^3) time slower than Ω(n^3) time? Omega gives an asymptotic lower bound, which to me means the fastest runtime. Then why does the book say that we can do it in O(n^3), as if that were faster than Ω(n^3) time?
First of all it is not true that, as people commonly seem to believe, Big-O is worst case, Big-Omega is best case, and Big-Theta is average case.
Big-O is an upper bound. We are often interested in an upper bound on the worst case so Big-O gets associated with worst case behavior, but we can also be interested in an upper bound on average case behavior, etc.
When we are using asymptotic notation for running times, "higher" functions are worse, so upper bounds are good news. If an algorithm's running time has an upper bound of O(n^3), that is better than its having a lower bound of Ω(n^3), because a lower bound means it could be worse, could be slower: it is no better than the lower bound.
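For reference, the "natural definition" the book refers to is the triple loop below, which performs n·n·n scalar multiplications for n×n inputs; this sketch is only the naive method (Strassen's algorithm improves on it), shown here just to make the n^3 count concrete.

```python
def matmul(A, B):
    # Naive matrix multiplication of two n x n matrices:
    # three nested loops -> n * n * n scalar multiplications, i.e. Θ(n^3).
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```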

Why is big-Oh not always a worst-case analysis of an algorithm?

I am trying to learn analysis of algorithms and I am stuck on the relation between asymptotic notation (big O, ...) and cases (best, worst and average).
I learned that Big O notation defines an upper bound of an algorithm, i.e. it says the function cannot grow beyond its upper bound.
At first that sounded to me as if it described the worst case.
I googled "why is worst case not big O?" and got plenty of answers that were not so simple for a beginner to understand.
I concluded the following:
Big O is not always used to represent the worst-case analysis of an algorithm: suppose an algorithm takes O(n) execution steps for its best, average and worst inputs; then its best, average and worst cases can all be expressed as O(n).
Please tell me if I am correct, or if I am missing something, as I don't have anyone to validate my understanding.
Please suggest a better example to understand why Big O is not always worst case.
Big-O?
First let us see what Big O formally means:
In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows.
This means that Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. Here, O means the order of the function, and it only provides an upper bound on the growth rate of the function.
Now let us look at the rules of Big O:
If f(x) is a sum of several terms and one of them has the largest growth rate, that term can be kept and all others omitted.
If f(x) is a product of several factors, any constants (terms in the product that do not depend on x) can be omitted.
Example:
f(x) = 6x^4 − 2x^3 + 5
Using the 1st rule we can write it as f(x) = 6x^4.
Using the 2nd rule this gives us O(x^4).
What is Worst Case?
Worst case analysis gives the maximum number of basic operations that have to be performed during execution of the algorithm. It assumes that the input is in the worst possible state and maximum work has to be done to put things right.
For example, for a sorting algorithm which aims to sort an array in ascending order, the worst case occurs when the input array is in descending order. In this case the maximum number of basic operations (comparisons and assignments) has to be done to set the array in ascending order.
The cost being measured can depend on a lot of things, like:
CPU (time) usage
memory usage
disk usage
network usage
What's the difference?
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.
The important point here is we're talking in terms of growth, not number of operations. However, with algorithms we do talk about the number of operations relative to the input size.
Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.
In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.
When someone says that the time complexity of MergeSort is O(n log n), they usually mean that the number of comparisons that MergeSort makes is O(n log n). That in itself doesn't tell us what the time complexity of any particular MergeSort might be, because that would depend on how much time it takes to make a comparison. In other words, the O(n log n) refers to comparisons as the primitive operation.
The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is O(n log n) is implicitly referencing a model of computation where a comparison takes constant time and everything else is free.
Example:
If we are sorting strings that are k bytes long, we might take "read a byte" as a primitive operation that takes constant time, with everything else being free.
In this model, MergeSort makes O(n log n) string comparisons, each of which makes O(k) byte comparisons, so the time complexity is O(k·n log n). One common implementation of RadixSort will make k passes over the n strings, with each pass reading one byte, and so has time complexity O(nk).
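As a hedged illustration of counting a primitive operation instead of measuring time, here is a small merge sort that reports how many element comparisons it made; the count, not the wall-clock time, is what the O(n log n) claim is usually about.

```python
def merge_sort(a):
    # Returns (sorted_list, number_of_element_comparisons).
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = merge_sort(a[:mid])
    right, cr = merge_sort(a[mid:])
    merged, comparisons = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one primitive operation in this model
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

print(merge_sort([5, 3, 8, 1, 9, 2]))  # roughly n log n comparisons
```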
The two are not the same thing.  Worst-case analysis, as others have said, is identifying instances for which the algorithm takes the longest to complete (i.e., takes the greatest number of steps), and then formulating a growth function from this.  One can analyze the worst-case time complexity using Big-Oh, or other variants such as Big-Omega and Big-Theta (in fact, Big-Theta is usually what you want, though Big-Oh is often used for ease of comprehension by those less into theory).  One important detail, and why worst-case analysis is useful, is that the algorithm will run no slower than it does in the worst case.  Worst-case analysis is a method of analysis we use in analyzing algorithms.
Big-Oh itself is an asymptotic measure of a growth function; this can be totally independent of algorithms, as people can use Big-Oh without measuring an algorithm's time complexity at all; its origins stem from number theory.  You are correct to say it is the asymptotic upper bound of a growth function, but the manner in which you prescribe and construct the growth function comes from your analysis.  The Big-Oh of a growth function means little to nothing without context, as it only says something about the function you are analyzing.  Keep in mind there can be infinitely many algorithms that share the same time complexity (by the definition of Big-Oh, Big-Oh is a set of growth functions).
In short, worst-case analysis is how you build your growth function; Big-Oh notation is one method of analyzing said growth function.  Then we can compare that result against the worst-case time complexities of competing algorithms for a given problem.  Worst-case analysis, done exactly, yields the worst-case running time (you can cut a lot of corners and still get the correct asymptotics if you use a barometer operation), and using this growth function yields the worst-case time complexity of the algorithm.  Big-Oh alone doesn't guarantee the worst-case time complexity, since you had to construct the growth function itself.  For instance, I could use Big-Oh notation for any other kind of analysis (e.g., best case, average case); it really depends on what you're trying to capture.  Big-Omega, for instance, is great for lower bounds.
Imagine a hypothetical algorithm that in the best case only needs to do 1 step, in the worst case needs to do n^2 steps, but in the average (expected) case only needs to do n steps, with n being the input size.
For each of these 3 cases you could calculate a function that describes the time complexity of this algorithm.
1. The best case is O(1), because the function f(x) = 1 is really the highest we can go, but also the lowest we can go in this case, Ω(1). Since the lower bound equals the upper bound, we state that this function, in the best case, behaves like Θ(1).
2. We could do the same analysis for the worst case and figure out that O(n^2) = Ω(n^2) = Θ(n^2).
3. The same holds for the average case, but with Θ(n).
So in theory you could determine 3 cases of an algorithm and for those 3 cases calculate the lower/upper/tight bounds. I hope this clears things up a bit.
https://www.google.co.in/amp/s/amp.reddit.com/r/learnprogramming/comments/3qtgsh/how_is_big_o_not_the_same_as_worst_case_or_big/
Big O notation shows how an algorithm grows with respect to input size. It says nothing about which algorithm is faster, because it doesn't account for constant set-up time (which can dominate if you have small input sizes). So when you say
which takes O(n) execution steps
this almost doesn't mean anything. Big O doesn't say how many execution steps there are. There are C + O(n) steps (where C is a constant), and this algorithm grows at rate n as the input size grows.
Big O can be used for best, worst, or average cases. Let's take sorting as an example. Bubble sort is a naive O(n^2) sorting algorithm, but when the list is already sorted it takes O(n). Quicksort is often used for sorting (the GNU standard C library uses it with some modifications). It performs at O(n log n); however, this is only true if the pivot chosen splits the array into two equal-sized pieces (on average). In the worst case we get an empty array on one side of the pivot, and Quicksort performs at O(n^2).
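A hedged sketch of the bubble-sort point above: with an early-exit flag, a single pass over already-sorted input suffices (the O(n) best case), while reverse-sorted input forces the full quadratic amount of work.

```python
def bubble_sort(a):
    # Best case (already sorted): one pass, no swaps, early exit -> O(n).
    # Worst case (reverse sorted): about n^2 / 2 comparisons and swaps -> O(n^2).
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a

print(bubble_sort([1, 2, 3, 4]))  # early exit after a single pass
print(bubble_sort([4, 3, 2, 1]))  # full quadratic work
```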
As Big O shows how an algorithm grows with respect to size, you can look at any aspect of an algorithm. Its best case, average case, worst case in both time and/or memory usage. And it tells you how these grow when the input size grows - but it doesn't say which is faster.
If you deal with small sizes then Big O won't matter - but an analysis can tell you how things will go when your input sizes increase.
One example of where the worst case might not be the asymptotic limit: suppose you have an algorithm that works on the set difference between some set and the input. It might run in O(N) time, but get faster as the input gets larger and knocks more values out of the working set.
Or, to get more abstract, f(x) = 1/x for x > 0 is a decreasing O(1) function.
I'll focus on time as a fairly common item of interest, but Big-O can also be used to evaluate resource requirements such as memory. It's essential for you to realize that Big-O tells how the runtime or resource requirements of a problem scale (asymptotically) as the problem size increases. It does not give you a prediction of the actual time required. Predicting the actual runtimes would require us to know the constants and lower order terms in the prediction formula, which are dependent on the hardware, operating system, language, compiler, etc. Using Big-O allows us to discuss algorithm behaviors while sidestepping all of those dependencies.
Let's talk about how to interpret Big-O scalability using a few examples. If a problem is O(1), it takes the same amount of time regardless of the problem size. That may be a nanosecond or a thousand seconds, but in the limit doubling or tripling the size of the problem does not change the time. If a problem is O(n), then doubling or tripling the problem size will (asymptotically) double or triple the amounts of time required, respectively. If a problem is O(n^2), then doubling or tripling the problem size will (asymptotically) take 4 or 9 times as long, respectively. And so on...
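One informal way to see this scaling, under the assumption that we count basic steps rather than seconds: doubling n roughly doubles the count for a linear loop and quadruples it for a nested one.

```python
def linear_steps(n):
    # one unit of work per element -> O(n)
    return sum(1 for _ in range(n))

def quadratic_steps(n):
    # one unit of work per pair of elements -> O(n^2)
    return sum(1 for _ in range(n) for _ in range(n))

for n in (1000, 2000):
    print(n, linear_steps(n), quadratic_steps(n))
# Doubling n doubles the linear count (1000 -> 2000)
# and quadruples the quadratic count (1,000,000 -> 4,000,000).
```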
Lots of algorithms have different performance for their best, average, or worst cases. Sorting provides some fairly straightforward examples of how best, average, and worst case analyses may differ.
I'll assume that you know how insertion sort works. In the worst case, the list could be reverse ordered, in which case each pass has to move the value currently being considered as far to the left as possible, for all items. That yields O(n^2) behavior. Doubling the list size will take four times as long. More likely, the list of inputs is in randomized order. In that case, on average each item has to move half the distance towards the front of the list. That's less than in the worst case, but only by a constant. It's still O(n^2), so sorting a randomized list that's twice as large as our first randomized list will quadruple the amount of time required, on average. It will be faster than the worst case (due to the constants involved), but it scales in the same way. The best case, however, is when the list is already sorted. In that case, you check each item to see if it needs to be slid towards the front, and immediately find the answer is "no," so after checking each of the n values you're done in O(n) time. Consequently, using insertion sort for an already ordered list that is twice the size only takes twice as long rather than four times as long.
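A minimal insertion sort sketch matching that description; the inner while loop does no shifting on sorted input (the O(n) best case) and the maximum amount on reverse-sorted input (the O(n^2) worst case).

```python
def insertion_sort(a):
    # Sorted input: the while condition fails immediately for every i -> O(n).
    # Reverse-sorted input: each element slides all the way left -> O(n^2).
    # Random input: each element slides about halfway on average -> still O(n^2).
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([2, 4, 1, 3]))  # [1, 2, 3, 4]
```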
You are right, in that you can certainly say that an algorithm runs in O(f(n)) time in the best or average case. We do that all the time for, say, quicksort, which is O(N log N) on average, but only O(N^2) in the worst case.
Unless otherwise specified, however, when you say that an algorithm runs in O(f(n)) time, you are saying the algorithm runs in O(f(n)) time in the worst case. At least that's the way it should be. Sometimes people get sloppy, and you will often hear that a hash table is O(1) when in the worst case it is actually worse.
The other way in which a big O definition can fail to characterize the worst case is that it's an upper bound only. Any function in O(N) is also in O(N^2) and O(2^N), so we would be entirely correct to say that quicksort takes O(2^N) time. We just don't say that because it isn't useful to do so.
Big Omega and Big Theta are there to specify lower bounds and tight bounds, respectively.
There are two "different" but equally important tools:
best-, worst-, and average-case analysis produce a numerical function over the size of possible problem instances (e.g. f(x) = 2x^2 + 8x - 4), but it is very difficult to work precisely with these functions
big O notation extracts the main point ("how efficient the algorithm is"); it ignores a lot of unimportant things like constants and so on, and gives you the big picture

Still sort of confused about Big O notation

So I've been trying to understand Big O notation as well as I can, but there are still some things I'm confused about. I keep reading that if something is O(n), it usually refers to the worst case of an algorithm, but that it doesn't necessarily have to refer to the worst-case scenario, which is why we can say the best case of insertion sort, for example, is O(n). However, I can't really make sense of what that means. I know that if the worst case is O(n^2), it means that the function that represents the algorithm in its worst case grows no faster than n^2 (there is an upper bound). But if you have O(n) as the best case, how should I read that? In the best case, the algorithm grows no faster than n? What I picture is a graph with n as the upper bound.
If the best case scenario of an algorithm is O(n), then n is the upper bound of how fast the operations of the algorithm grow in the best case, so they cannot grow faster than n...but wouldn't that mean that they can grow as fast as O(log n) or O(1), since they are below the upper bound? That wouldn't make sense though, because O(log n) or O(1) is a better scenario than O(n), so O(n) WOULDN'T be the best case? I'm so lost lol
Big-O, Big-Θ, Big-Ω are independent from worst-case, average-case, and best-case.
The notation f(n) = O(g(n)) means f(n) grows no more quickly than some constant multiple of g(n).
The notation f(n) = Ω(g(n)) means f(n) grows no more slowly than some constant multiple of g(n).
The notation f(n) = Θ(g(n)) means both of the above are true.
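Written out formally (this is just the standard quantifier form of the three statements above, not specific to any particular case):

```latex
f(n) = O(g(n))      \iff \exists\, c > 0,\ n_0:\ \forall n \ge n_0,\ f(n) \le c\, g(n)
f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0:\ \forall n \ge n_0,\ f(n) \ge c\, g(n)
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n))
```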
Note that f(n) here may represent the best-case, worst-case, or "average"-case running time of a program with input size n.
Furthermore, "average" can have many meanings: it can mean the average input or the average input size ("expected" time), or it can mean in the long run (amortized time), or both, or something else.
Often, people are interested in the worst-case running time of a program, amortized over the running time of the entire program (so if something costs n initially but only costs 1 time for the next n elements, it averages out to a cost of 2 per element). The most useful thing to measure here is the least upper bound on the worst-case time; so, typically, when you see someone asking for the Big-O of a program, this is what they're looking for.
Similarly, to prove a problem is inherently difficult, people might try to show that the worst-case (or perhaps average-case) running time is at least a certain amount (for example, exponential).
You'd use Big-Ω notation for these, because you're looking for lower bounds on these.
However, there is no special relationship between worst-case and Big-O, or best-case and Big-Ω.
Both can be used for either, it's just that one of them is more typical than the other.
So, upper-bounding the best case isn't terribly useful. Yes, if the algorithm always takes O(n) time, then you can say it's O(n) in the best case, as well as on average, as well as the worst case. That's a perfectly fine statement, except the best case is usually very trivial and hence not interesting in itself.
Furthermore, note that f(n) = n = O(n^2) -- this is technically correct, because f grows more slowly than n^2, but it is not useful because it is not the least upper bound -- there's a very obvious upper bound that's more useful than this one, namely O(n). So yes, you're perfectly welcome to say the best/worst/average-case running time of a program is O(n!). That's mathematically perfectly correct. It's just useless, because when people ask for Big-O they're interested in the least upper bound, not just a random upper bound.
It's also worth noting that it may simply be insufficient to describe the running-time of a program as f(n). The running time often depends on the input itself, not just its size. For example, it may be that even queries are trivially easy to answer, whereas odd queries take a long time to answer.
In that case, you can't just give f as a function of n -- it would depend on other variables as well. In the end, remember that this is just a set of mathematical tools; it's your job to figure out how to apply it to your program and to figure out what's an interesting thing to measure. Using tools in a useful manner needs some creativity, and math is no exception.
Informally speaking, "best case has O(n) complexity" means that when the input meets certain conditions (i.e. is best for the algorithm performed), then the count of operations performed in that best case is linear with respect to n (e.g. is 1n or 1.5n or 5n).
So if the best case is O(n), usually this means that in the best case it is exactly linear with respect to n (i.e. asymptotically no smaller and no bigger than that) - see (1). Of course, if in the best case that same algorithm can be proven to perform at most c * log N operations (where c is some constant), then this algorithm's best case complexity would be informally denoted as O(log N) and not as O(N), and people would say it is O(log N) in its best case.
Formally speaking, "the algorithm's best case complexity is O(f(n))" is an informal and wrong way of saying that "the algorithm's complexity is Ω(f(n))" (in the sense of the Knuth definition - see (2)).
See also:
(1) Wikipedia, "Family of Bachmann-Landau notations"
(2) Knuth, "Big Omicron and Big Omega and Big Theta"
(3) Big Omega notation - what is f = Ω(g)?
(4) What is the difference between Θ(n) and O(n)?
(5) What is a plain English explanation of "Big O" notation?
I find it easier to think of O() in terms of ratios rather than bounds. It is defined in terms of bounds, and so that is a valid way to think of it, but it seems a bit more useful to ask "if I double the number/size of inputs to my algorithm, does my processing time double (O(n)), quadruple (O(n^2)), etc.?". Thinking about it that way makes it a little less abstract - at least to me...

Is big-O notation a tool to do best, worst, & average case analysis of an algorithm?

Is big-O notation a tool to do best, worst, & average case analysis of an algorithm?
Or is big-O only for worst case analysis, since it is an upper bounding function?
It is Big O, because orders of magnitude are expressed like O(n), O(log N), etc.
The best, worst, and average cases of an algorithm can all be expressed with Big O notation.
For an example of this applied to sorting algorithms, see
http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
Note that an algorithm can be classified according to multiple, independent criteria such as memory use or CPU use. Often, there is a tradeoff between two or more criteria (e.g. an algorithm that uses little CPU may use quite a bit of memory).
Big "O" is a measure of asymptotic complexity, which is to say, roughly how an algorithm scales as N gets really large.
If best & worst converge to the same asymptotic complexity, you can use a single value - or you can figure them out separately (for example, some sorting algorithms have completely different characteristics on sorted or almost-sorted data than on unsorted data).
The notation itself doesn't convey this though, how you use it does.
... Or is big-O only for worst case analysis ...
If you give just one asymptotic complexity for an algorithm, it doesn't tell the reader whether (or how) the best and worst case differ from the average.
If you give best-case and worst-case complexity, it tells the reader how they differ.
By default, if a single value is listed, it is probably the average complexity which may (or may not) converge with the worst-case.

Resources