Is somebody aware of a natural program or algorithm that has a non-monotone worst-case behavior?
By non-monotone worst-case behavior I mean that there is a natural number n such that the worst-case runtime for inputs of size n+1 is less than the worst-case runtime for inputs of size n.
Of course, it is easy to construct a program with this behavior. It might even be the case that this happens for small n (like n = 1) in natural programs. But I'm interested in a useful algorithm that is non-monotone for large n.
Is somebody aware of a natural program or algorithm that has a
non-monotone worst-case behavior?
Please define "natural program or algorithm". The concept "algorithm" has a definition I'm aware of, and there are certainly algorithms (as you correctly admit) which have non-monotone worst-case runtime complexity. Is a program "natural" if it does no unecessary work or has minimal runtime complexity for the class of problem it solves? In that case, would you argue that BubbleSort isn't an algorithm? More importantly, I can define a problem the most efficient solution to which has non-monotone worst-case behavior. Would such a problem be "unnatural"? What is your definition of a "natural problem"?
Of course, it is easy to construct a program with this behavior.
Then what's the real question? Until you commit to a definition of natural/useful algorithms and problems, your question has no answer. Are you interested only in pre-existing algorithms which people already use in the real world? If so, you should state that, and the problem becomes one of searching the literature. Frankly, I cannot imagine a reasonable definition of "natural, useful algorithm" which would preclude many examples of algorithms with non-monotone runtime complexity...
But I'm interested in a useful algorithm that is non-monotone for
large n.
Please define "useful algorithm". The concept "algorithm" has a definition I'm aware of, and there are certainly algorithms (as you correctly admit) which have non-monotone worst-case runtime complexity. Is an algorithm "useful" if it correctly solves some problem? I can easily define a problem which can be solved by an algorithm with non-monotone runtime complexity.
Think about a binary search.
When implementing binary search you need to think about the case where the array segment which you're splitting is of odd length. At that point you have 2 choices:
1. Round up/down
2. Check both indexes and make a decision before continuing.
If you choose the first option (let's assume you round down), then for odd-length arrays where the number you're searching for is the one just past the middle point, you'll have an extra iteration to make.
If that odd-length array had one more element, you would have been saved that extra iteration.
If you went for the second option, then most executions of the algorithm on odd-length segments would require more comparisons than the same search run with an extra element.
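A minimal sketch (not the exact implementation discussed above, just one plausible round-down variant) that counts iterations per target, so you can compare lengths n and n + 1 for your own implementation:

```python
# Round-down binary search instrumented with an iteration counter.
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2          # rounds down on odd-length segments
        if arr[mid] == target:
            return mid, iterations
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, iterations

for n in (7, 8):
    arr = list(range(n))
    print(n, [binary_search(arr, t)[1] for t in arr])  # iterations per target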
Note that all this is very implementation dependent, and so there can't be a real answer without a real algorithm (and moreover a real implementation).
Also, all of this assumes you're talking about an actual run-time difference and not an asymptotic one. If that's not the case, then the answer would be "no": there are no algorithms with non-monotonic worst-case asymptotic running time. That would defy the concept of "worst case".
I understand that Big Omega defines the lower bound of a function (or the best-case runtime).
Considering that almost every search algorithm could "luck out" and find the target element on the first iteration, would it be fair to say that its Big-Omega time complexity is Ω(1)?
I also understand that Ω(1) may not be a useful lower bound (other lower bounds may be tighter, i.e. closer to the evaluated function), but the question is: is it correct?
I've found multiple sources claiming that linear search is Ω(n), even though some cases could complete in a single step, which is different from the best-case scenario as I understand it.
The lower bound (𝛺) is not the fastest answer a given algorithm can give.
The lower bound of a given problem is equal to the worst case scenario of the best algorithm that solves the problem. When doing complexity analysis, you should never forget that "luck" is always in the hands of the input (the instance the algorithm is trying to solve).
When trying to find a lower bound, you will imagine the "perfect algorithm" and you will try to "trap" it in a very hard case. Usually the algorithm is not defined and is only described based on its (hypothetical) performance. You would use arguments such as "If the ideal algorithm is that fast, it will not have this particular knowledge and will therefore fail on this particular instance, i.e. the ideal algorithm doesn't exist". Replace "ideal" with the lower bound you are trying to prove.
For example, the lower bound for the min-search problem in an unsorted array is Ω(n). The proof is quite simple and, as is often the case, is made by contradiction: an algorithm A running in o(n) will not look at at least one item of the input array, and if the item it did not see was the minimum, A will fail. The contradiction proves that the problem is in Ω(n).
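A minimal sketch of that adversary argument (the function and its interface are hypothetical, purely for illustration):

```python
# Any procedure that reads fewer than n positions can be fooled by hiding
# the minimum at an index it never inspected.
def adversarial_input(probed_indices, n):
    """probed_indices: the set of array positions the algorithm actually read."""
    assert len(probed_indices) < n          # the o(n) assumption
    unseen = next(i for i in range(n) if i not in probed_indices)
    arr = [1] * n
    arr[unseen] = 0                         # the true minimum, never inspected
    return arr                              # the algorithm's answer must be wrong here
```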
Maybe you can have a look at that answer I gave on a similar question.
The notations O, o, Θ, Ω, and ω are used in characterizing mathematical functions; for example, f(n) = n^3 log n is in O(n^4) and in Ω(n^3).
So, the question is what mathematical functions we apply them to.
The mathematical functions that we tend to be interested in are things like "the worst-case time complexity of such-and-such algorithm, as a function of the size of its input", or "the average-case space complexity of such-and-such procedure, as a function of the largest element in its input". (Note: when we just say "the complexity of such-and-such algorithm", that's usually shorthand for its worst-case time complexity, as a function of some characteristic of its input that's hopefully obvious in context. Either way, it's still a mathematical function.)
We can use any of these notations in characterizing those functions. In particular, it's fine to use Ω in characterizing the worst case or average case.
We can also use any of these notations in characterizing things like "the best-case […]" — that's unusual, but there are times when it may be relevant. But, notably, we're not limited to Ω for that; just as we can use Ω in characterizing the worst case, we can also use O in characterizing the best case. It's all about what characterizations we're interested in.
You are confusing two different topics: Lower/upper bound, and worst-case/best-case time complexity.
The short answer to your question is: Yes, all search algorithms have a lower bound of Ω(1). Linear search (in the worst case, and on average) also has a lower bound of Ω(n), which is a stronger and more useful claim. The analogy is that 1 < π but also 3 < π, the latter being more useful. So in this sense, you are right.
However, your confusion seems to be between the notations for complexity classes (big-O, big-Ω, big-θ etc), and the concepts of best-case, worst-case, average case. The point is that the best case and the worst case time complexities of an algorithm are completely different functions, and you can use any of the notations above to describe any of them. (NB: Some claim that big-Ω automatically and exclusively describes best case time complexity and that big-O describes worst case, but this is a common misconception. They just describe complexity classes and you can use them with any mathematical functions.)
It is correct to say that the average time complexity of linear search is Ω(n), because we are just talking about the function that describes its average time complexity. Its best-case complexity is a different function, which happens not to be Ω(n), because, as you say, it can be constant-time.
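A small illustration of my own (not from the original answer): counting comparisons in linear search shows that the best case and the worst case are two different functions of n, each of which can be bounded with O, Ω or Θ.

```python
# Linear search instrumented with a comparison counter.
def linear_search(arr, target):
    comparisons = 0
    for i, x in enumerate(arr):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons

arr = list(range(100))
print(linear_search(arr, 0))    # best case: 1 comparison   -> Θ(1)
print(linear_search(arr, 99))   # worst case: n comparisons -> Θ(n), hence also Ω(n)
```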
Shouldn't all recursive algorithm have worst case (Space) of O(inf) due to potential overflow?
Take the Fibonacci algorithm as an example (see "What is the space complexity of a recursive Fibonacci algorithm?").
The answer is O(n), but what if the input is large and a stack overflow occurs? Then the worst case is O(inf).
Short answer: No.
Slightly longer answer: As everyone mentioned in the comments, Big-O notation assumes infinite resources, so issues of implementation (in particular hardware related ones) have no bearing. It's only a description of the algorithm and how it hypothetically would scale if you give it arbitrarily large inputs.
That said, if you WERE to include hardware constraints in the definition, the answer would still be no, since Big-O is concerned with processor cycles, or iterations, or tasks -- that sort of measurement of doing something. If you get too deep in a recursive algorithm such that you run out of stack, you're gonna stop performing any algorithm-related operations whatsoever. It'll just stop and you won't be "doing anything" anymore. Nothing more to measure, and certainly not infinite.
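To make that concrete, here is a minimal sketch (assuming the usual naive recursive definition) showing that the recursion depth, and hence the stack space, grows linearly with n, while the limit at which execution simply stops is a fixed property of the runtime rather than of the algorithm:

```python
import sys

# Naive recursive Fibonacci, instrumented to record the maximum recursion depth.
def fib_with_depth(n):
    max_depth = 0
    def fib(k, depth):
        nonlocal max_depth
        max_depth = max(max_depth, depth)
        if k < 2:
            return k
        return fib(k - 1, depth + 1) + fib(k - 2, depth + 1)
    return fib(n, 1), max_depth

value, depth = fib_with_depth(20)
print(value, "max recursion depth:", depth)          # depth grows roughly as n
print("recursion limit of this runtime:", sys.getrecursionlimit())
```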
And anyway, the point of Big-O is to be useful and help with understanding the theory of the algorithm. Saying "everything is Big-O of infinity", even if it were right, wouldn't accomplish any of that. It'd just be a loophole in the definition.
When discussing computational complexity, it seems everyone generally goes straight to Big O.
Let's say, for example, I have a hybrid algorithm such as merge sort which uses insertion sort for smaller subarrays (I believe this is called tiled merge sort). It's still ultimately merge sort with O(n log n), but I want to discuss the behaviour/characteristics of the algorithm for small n, in cases where no merging actually takes place.
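For concreteness, here is a minimal sketch of the kind of hybrid I mean (the cutoff value is arbitrary and purely illustrative):

```python
CUTOFF = 16  # arbitrary threshold, for illustration only

def insertion_sort(a, lo, hi):
    # Sort a[lo:hi] in place.
    for i in range(lo + 1, hi):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def tiled_merge_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    if hi - lo <= CUTOFF:
        insertion_sort(a, lo, hi)   # small subarrays: no merging at all
        return
    mid = (lo + hi) // 2
    tiled_merge_sort(a, lo, mid)
    tiled_merge_sort(a, mid, hi)
    merged, i, j = [], lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(a[j]); j += 1
    merged.extend(a[i:mid]); merged.extend(a[j:hi])
    a[lo:hi] = merged
```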
For all intents and purposes the tiled merge sort is insertion sort, executing exactly the same instructions for the domain of my small n. However, Big O deals with the large and asymptotic cases and discussing Big O for small n is pretty much an oxymoron. People have yelled at me for even thinking the words "behaves like an O(n^2) algorithm in such cases". What is the correct way to describe the algorithm's behaviour in cases of small n within the context of formal theoretical computational analysis? To clarify, not just in the case where n is small, but in the case where n is never big.
One might say that for such small n it doesn't matter, but I'm interested in the cases where it does, for example when there's a large constant factor involved, such as the routine being executed many times, and where in practice it would show a clear trend and be the dominant factor. For example, the initial quadratic growth seen in the graph below. I'm not dismissing Big O, more asking for a way to properly tell both sides of the story.
[EDIT]
If for "small n", constants can easily remove all trace of a growth rate then either
only the asymptotic case is discussed, in which case there is less relevance to any practical application, or
there must be a threshold at which we agree n is no longer "small".
What about the cases where n is not "small" (n is sufficiently big that the growth rate will not be affected significantly by any practical constant), but not yet big enough to show the final asymptotic growth rate, so only intermediate growth rates are visible (for example the shape in the image above)?
Are there no practical algorithms that exhibit this behaviour? Even if there aren't, theoretical discussion should still be possible. Do we measure instead of discussing the theory purely because that's "what one should do"? If some behaviour is observed in all practical cases, why can't there be theory that's meaningful?
Let me turn the question around the other way. I have a graph that shows segmented super-linear steps. It sounds like many people would say "this is a pure coincidence, it could be any shape imaginable" (at the extreme of course) and wouldn't bat an eyelid if it were a sine wave instead. I know in many cases the shape could be hidden by constants, but here it's quite obvious. How can I give a formal explanation of why the graph produces this shape?
I particularly like @Sneftel's words "imprecise but useful guidance".
I know Big O and asymptotic analysis isn't applicable. What is? How far can I take it?
For small n, computational complexity (how things change as n increases towards infinity) isn't meaningful, as other effects dominate.
Papers I've seen which discuss behaviour for small values of n do so by measuring the algorithms on real systems, and discuss how the algorithms perform in practice rather than from a theoretical viewpoint. For example, for the graph you've added to your post I would say 'this graph demonstrates an O(N) asymptotic behaviour overall, but the growth within each tile is bounded quadratic'.
I don't know of a situation where a discussion of such behaviour from a theoretical viewpoint would be meaningful - it is well known that for small n the practical effects outweigh the effects of scaling.
It's important to remember that asymptotic analysis is an analytic simplification, not a mandate for analyzing algorithms. Take selection sort, for instance. Yes, it executes in O(n^2) time. But it is also true that it performs precisely n*(n-1)/2 comparisons, and n-1-k swaps, where k is the number of elements (other than the maximum) which start in the correct position. Asymptotic analysis is a tool for simplifying the (otherwise generally impractical) task of performance analysis, and one we can put aside if we're not interested in the "really big n" segment.
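A quick instrumentation sketch of my own (this variant selects the minimum and skips no-op swaps, so its swap count is not exactly the n-1-k formula above, but the comparison count is exactly n*(n-1)/2 regardless of the input):

```python
import random

# Selection sort with explicit comparison and swap counters.
def selection_sort_counts(arr):
    a = list(arr)
    comparisons = swaps = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return a, comparisons, swaps

data = random.sample(range(100), 10)
_, comps, swaps = selection_sort_counts(data)
print(comps, "comparisons (n*(n-1)/2 =", 10 * 9 // 2, "),", swaps, "swaps")
```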
And you can choose how you express your bounds. Say a function requires precisely n + floor(0.01*2^n) operations. That's exponential time, of course. But one can also say "for data sizes up to 10 elements, this algorithm requires between n and 2*n operations." The latter describes not the shape of the curve, but an envelope around that curve, giving imprecise but useful guidance about the practicalities of the algorithm within a particular range of use cases.
You are right.
For small n, i.e. when only insertion sort is performed, the asymptotic behavior is quadratic O(n^2).
And for larger n, when tiled merge sort enters into play, the behavior switches to O(n log n).
There is no contradiction if you remember that every behavior has its domain of validity, before the switching threshold, let N, and after it.
In practice there will be a smooth blend between the curves around N. But in practice too, that value of N is so small that the quadratic behavior does not have enough "room" to manifest itself.
Another way to deal with this analysis is to say that, N being a constant, the insertion sorts take constant time. But I wouldn't say that this is a must.
Let's unpack things a bit. Big-O is a tool for describing the growth rate of a function. One of the functions to which it is commonly applied is the worst-case running time of an algorithm on inputs of length n, executing on a particular abstract machine. We often forget about the last part because there is a large class of machines with random-access memory that can emulate one another with only constant-factor slowdown, and the class of problems solvable within a particular big-O running-time bound is equivalent across these machines.
If you want to talk about complexity on small inputs, then you need to measure constant factors. One way is to measure running times of actual implementations. People usually do this on physical machines, but if you're hardcore like Knuth, you invent your own assembly language complete with instruction timings. Another way is to measure something that's readily identifiable but also a useful proxy for the other work performed. For comparison sorts, this could be comparisons. For numerical algorithms, this is often floating-point operations.
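As a concrete illustration of the comparison-counting approach (the wrapper below is my own sketch, not a standard facility), comparisons can be counted by wrapping elements in a key class:

```python
# Count comparisons made by a comparison sort by intercepting __lt__.
class CountingKey:
    count = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        CountingKey.count += 1
        return self.value < other.value

data = [5, 3, 8, 1, 9, 2]
CountingKey.count = 0
sorted(data, key=CountingKey)
print("comparisons used by sorted():", CountingKey.count)
```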
Complexity is not about execution time for one n on one machine, so there is no need to consider it even if the constant is large. Complexity tells you how the size of the input affects execution time. For small n, you can treat execution time as constant. That is one side.
On the other side, you are saying that:
1. You have a hybrid algorithm working in O(n log n) for n larger than some k and in O(n^2) for n smaller than k.
2. The constant k is so large that the algorithm works slowly.
3. There is no sense in such an algorithm, because you could easily improve it.
Let's take the red-black tree. Operations on this tree are performed in O(log n) time, but with a relatively large constant. So, on normal machines, it could work slowly (i.e. slower than simpler solutions) in some cases. There is no need to consider this when analysing complexity. You do need to consider it when implementing it in your system: you need to check whether it's the best choice given the actual machine(s) it will run on and the problems it will be solving.
Read Knuth's "The Art of Computer Programming series", starting with "Volume 1. Fundamental Algorithms", section "1.2.10: Analysis of an Algorithm". There he shows (and in all the rest of his seminal work) how exact analysis can be conducted for arbitrary problem sizes, using a suitable reference machine, by taking a detailed census of every processor instruction.
Such analyses have to take into account not only the problem size, but also any relevant aspect of the input data distribution which will influence the running time. For simplification, the analysis are often limited to the study of the worst case, the expected case or the output-sensitive case, rather than a general statistical characterization. And for further simplification, asymptotic analysis is used.
Not counting the fact that except for trivial algorithms the detailed approach is mathematically highly challenging, it has become unrealistic on modern machines. Indeed, it relies on a processor behavior similar to the so-called RAM model, which assumes constant time per instruction and per memory access (http://en.wikipedia.org/wiki/Random-access_machine). Except maybe for special hardware combined to a hard real-time OS, these assumptions are nowadays completely wrong.
When you have an algorithm with a time complexity of, say, O(n^2), and another algorithm with a time complexity of, say, O(n), you can't conclude from those two complexities that the latter algorithm is faster than the former for all input values. You can only say that the latter algorithm is asymptotically (i.e. for sufficiently large input values) faster than the former. Keep in mind that in asymptotic notation constant factors are generally ignored, to make the time complexity of the algorithm easier to grasp. For example: merge sort runs in O(n log n) worst-case time and insertion sort runs in O(n^2) worst-case time, but as the hidden constant factors in insertion sort are smaller than those of merge sort, in practice insertion sort can be faster than merge sort for small problem sizes on many machines.
Asymptotic notation does not describe the actual running time of an algorithm. Actual running time is machine dependent, as different machines have different architectures and different instruction-cycle execution times. Asymptotic notation just describes, asymptotically, how fast an algorithm is with respect to other algorithms. It does not describe the behaviour of the algorithm for small input values (n <= n0). The value of n0 (the threshold) depends on the hidden constant factors and lower-order terms, and the hidden constant factors depend on the machine on which the algorithm will be executed.
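A rough, machine-dependent sketch of that last point, using generic textbook implementations (not taken from any particular library); the exact numbers will vary, but on many machines insertion sort wins at this input size:

```python
import random, timeit

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def merge_sort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

small = [random.random() for _ in range(16)]
print("insertion:", timeit.timeit(lambda: insertion_sort(small), number=10000))
print("merge:    ", timeit.timeit(lambda: merge_sort(small), number=10000))
```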
Can someone explain the reason behind ignoring the constants when computing the running-time complexity of an algorithm, please?
Thanks
When analysing time complexity, constants are difficult to calculate and largely irrelevant.
On some architectures addition might take twice as long as multiplication, so now we have to go through the algorithm and count the number of additions we make, and the number of multiplications we make, in order to get an accurate runtime analysis. Not pretty!
More importantly, even if that is true now, at some point in the future, or on another slightly different architecture, that constant may be different, so the runtime would vary between architectures. So now my algorithm has more than one runtime? One for now on this architecture, and another for another architecture, and each of those may change in the future... Again, not pretty.
In a world where the capabilities of computing change all the time (twice the CPU capability tomorrow, four times the memory in a week, etc.), there is little relevance to constant factors. This isn't the case if we need to quantify real-life runtime, but it is the case when we are analysing the complexity of an algorithm in general, rather than the complexity of a solution in a specific environment.
Also, and probably most importantly, the constant factors are not a good measure of the complexity of the problem, and at the end of the day we are trying to measure complexity. An algorithm that is of a specific complexity class, behaves (or more accurately, is bounded) a certain way for all size inputs. So when I try and measure the general complexity of two solutions, I do so in as general a way as possible, and therefore try and consider all values of input size (i.e. n->infinity).
Finally, it allows theoreticians to group algorithms into the same class, irrespective of certain constant factors that may or may not change and may or may not be improved. This is helpful for, among other reasons, optimality proofs: finding that a problem is Ω(f(n)) is only useful if we consider algorithms in the complexity class O(f(n)), irrespective of the constants.
Because when comparing very large values (let's call the input size n), n^2 will be higher than k*n for any constant k as n -> infinity.
This is of course true for any power/exponent of n.
Note that in real-life projects you do make these optimizations and try to minimize the constants, but usually they are less significant than the degree of the polynomial.
You only ignore constants when doing rough estimates. Most of the time, it's a valid simplification: when comparing two algorithms for large input dimensions, such as sorting an array, O(n log n) will eventually be faster or smaller than O(n²) no matter what.
However, when two algorithms have the same complexity, or when the expected dataset is so small that the asymptotic behaviour is not a valid real world scenario, then the constant is definitely important. For instance, a bubble sort implementation is likely to perform faster on an array of 3 or 4 values than quicksort.
As the value of n grows in the expression 6n^2 + 100n + 300, the constant and the lower-order term become less and less relevant, and we tend to ignore them.
(The original answer illustrated this with a plot taken from the Asymptotic Notation section at Khan Academy.)
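Since that plot isn't reproduced here, a quick sketch of the same point: the 6n^2 term accounts for almost the entire value once n is moderately large.

```python
# Share of 6n^2 + 100n + 300 contributed by the 6n^2 term as n grows.
for n in (10, 100, 1000, 10000):
    total = 6 * n**2 + 100 * n + 300
    print(n, total, round(6 * n**2 / total, 4))
```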
I am trying to determine the worst-case run-time complexity of a couple of algorithms I have created. However, I keep running into the problem of selecting the wrong fundamental operation, or the wrong number of fundamental operations, for an algorithm.
To me it appears that the selection of the fundamental operation is more of an art than a science. After googling and reading my textbooks, I still have not found a good definition. So far I have defined it as "an operation that always occurs within an algorithm's execution", such as a comparison or an array manipulation.
But algorithms often have many comparisons that are always executed, so which operation do you pick?
I agree that to some degree it's an art, so you should always clarify when writing documentation, etc. But usually it's a "visit" to the underlying data structure. So like you said, for an array it's a comparison or a swap, for a hash map it may be a manual examination of a key, for a graph it's a visit to a vertex or edge, etc.
Even practicing complexity theorists have disagreements about this sort of thing, so what follows may be a bit subjective: http://blog.computationalcomplexity.org/2009/05/shaving-logs-with-unit-cost.html
The purpose of big-O notation is to summarize the efficiency of an algorithm for the reader. In practical contexts, I am most concerned with how many clock cycles an algorithm takes, assuming that the big-O constant is neither extremely small nor extremely large (and ignoring the effects of the memory hierarchy); this is the "unit-cost" model alluded to in the linked post.
The reason to count comparisons for sorting algorithms is that the cost of a comparison depends on the type of the input data. You could say that a sorting algorithm takes O(c n log n) cycles where c is the expense of a comparison, but it's simpler in this case to count comparisons instead because the other work performed by the algorithm is O(n log n). There's a sorting algorithm that sorts the concatenation of n sorted arrays of length n in n^2 log n steps and n^2 comparisons; here, I would expect that the number of comparisons and the computational overhead be stated separately, because neither necessarily dominates the other.
This only works when you have actually implemented the algorithm, but you could just use a profiler to see which operation is the bottleneck. That's a practical point of view. In theory, some assume that everything that is not the fundamental operation runs in zero time.
The somewhat simple definition I have heard is:
The operation which is executed at least as many times as any other
operation in the algorithm.
For example, in a sorting algorithm, these tend to be comparisons rather than assignments, as you almost always have to visit and 'check' an element before you re-order it, but the check may not result in a re-ordering. So there will always be at least as many comparisons as assignments.