What exactly does "scalability" mean in terms of Big O?

What exactly does "scalability" mean in terms of Big O? - algorithm

What does people mean when they say things like:
Big O estimates scalability.
The runtime grows “on the order of the square of the size of the input”, given that the algorithm in the worst case runs in O(n^2).
Does it mean for large n quadruple runtime for the doubled input size (assuming the algorithm runs in O(n^2))?
Is you said YES, then suppose that the number of steps our algorithm takes in the worst case is expressed by the function:
It follows that:
Moreover, we can see that:
But we can't quadruple runtime for the doubled input size n (even for large values of n) in this case, because it would simply be wrong. To see this, check the following plot:
And here is the plot of the ratio f(2n)/f(n):
It doesn't look like this ratio is tending towards 4.
So, what is scalabity, then? What does Big O estimate, if not the ratios like f(2n)/f(n) for large enough n? It doesn't seem to me that f scales like n^2.

So, what is scalabity, then? What does Big O estimate?
Big-O notation gives an upper bound on asymptotic complexity for e.g. classifying algorithms and/or functions.
Does it mean for large n quadruple runtime for the doubled input size
No, you are mixing up asymptotic (time) complexity with actual "runtime count" of your function. When deriving the asymptotic complexity in terms of Big-O notation for, say, a function, we will ignore any constant terms, meaning that for your own example, f(n) and f(2n) are identical in terms of asymptotic complexity. Both for f(n) and for g(n) = f(2n), f and g grows (scales, if you wish) quadratically in they asymptotic region. This does not mean their runtime is exactly quadrupled for a doubled input size (this kind of detailed hypothesis doesn't even make sense in term of asymptotic behavior), it only governs an upper bound on growth behavior for sufficiently large input.
Refer e.g. to my answer to this question for more thorough explanation on the non-relevance on both constant and lower order terms when deriving asymptotic complexity for a function:
Role of lower order terms in big O notation

Related

What is difference between different asymptotic notations?

I am really very confused in asymptotic notations. As far as I know, Big-O notation is for worst cast, omega is for best case and theta is for average case. However, I have always seen Big O being used everywhere, even for best case. For e.g. in the following link, see the table where time complexities of different sorting algorithms are mentioned-
https://en.wikipedia.org/wiki/Best,_worst_and_average_case
Everywhere in the table, big O notation is used irrespective of whether it is best case, worst case or average case. Then what is the use of other two notations and where do we use it?

As far as I know, Big-O notation is for worst cast, omega is for best case and theta is for average case.
They aren't. Omicron is for (asymptotic) upper bound, omega is for lower bound and theta is for tight bound, which is both an upper and a lower bound. If the lower and upper bound of an algorithm are different, then the complexity cannot be expressed with theta notation.
The concept of upper,lower,tight bound are orthogonal to the concept of best,average,worst case. You can analyze the upper bound of each case, and you can analyze different bounds of the worst case (and also any other combination of the above).
Asymptotic bounds are always in relation to the set of variables in the expression. For example, O(n) is in relation to n. The best, average and worst cases emerge from everything else but n. For example, if n is the number of elements, then the different cases might emerge from the order of the elements, or the number of unique elements, or the distribution of values.
However, I have always seen Big O being used everywhere, even for best case.
That's because the upper bound is almost always the one that is the most important and interesting when describing an algorithm. We rarely care about the lower bound. Just like we rarely care about the best case.
The lower bound is sometimes useful in describing a problem that has been proven to have a particular complexity. For example, it is proven that worst case complexity of all general comparison sorting algorithms is Ω(n log n). If the sorting algorithm is also O(n log n), then by definition, it is also Θ(n log n).

Big O is for upper bound, not for worst case! There is no notation specifically for worst case/best case. The examples you are talking about all have Big O because they are all upper bounded by the given value. I suggest you take another look at the book from which you learned the basics because this is immensely important to understand :)
EDIT: Answering your doubt- because usually, we are bothered with our at-most performance i.e. when we say, our algorithm performs in O(logn) in the best case-scenario, we know that its performance will not be worse than logarithmic time in the given scenario. It is the upper bound that we seek to reduce usually and hence we usually mention big O to compare algorithms. (not to say that we never mention the other two)

O(...) basically means "not (much) slower than ...".
It can be used for all three cases ("the worst case is not slower than", "the best case is not slower than", and so on).
Omega is the oppsite: You can say, something can't be much faster than ... . Again, it can be used with all three cases. Compared to O(...), it's not that important, because telling someone "I'm certain my program is not faster than yours" is nothing to be proud of.
Theta is a combination: It's "(more or less) as fast as" ..., not just slower/faster.

The Big-O notation is somethin like this >= in terms of asymptotic equality.
For example if you see this :
x = O(x^2) it does say x <= x^2 (in asymptotic terms).
And it does mean "x is at most as complex as x^2", which is something that you are usually interesting it.
Even when you compare Best/Average case, you can say "At best possible input, I will have AT MOST this complexity".

There are two things mixed up: Big O, Omega, Theta, are purely mathematical constructions. For example, O (f (N)) is the set of functions which are less than c * f (n), for some c > 0, and for all n >= some minimum value N0. With that definition, n = O (f (n^4)), because n ≤ n^4. 100 = O (f (n)), because 100 <= n for n ≥ 100, or 100 <= 100 * n for n ≥ 1.
For an algorithm, you want to give worst case speed, average case speed, rarely the best case speed, sometimes amortised average speed (that's when running an algorithm once does work that can be used when it's run again. Like calculating n! for n = 1, 2, 3, ... where each calculation can take advantage of the previous one). And whatever speed you measure, you can give a result in one of the notations.
For example, you might have an algorithm where you can prove that the worst case is O (n^2), but you cannot prove whether there are faster special cases or not, and you also cannot prove that the algorithm isn't actually faster, like O (n^1.9). So O (n^2) is the only thing that you can prove.

Why do we ignore co-efficients in Big O notation?

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the co-efficients?
For example this answer says that the final complexity of 2N + 2 is O(N); we remove the leading co-efficient 2 and the final constant 2 as well.
Removing the final constant of 2 perhaps understandable. After all, N may be very large and so "forgetting" the final 2 may only change the grand total by a small percentage.
However I cannot clearly understand how removing the leading co-efficient does not make difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?

The purpose of the Big-O notation is to find what is the dominant factor in the asymptotic behavior of a function as the value tends towards the infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x->infinity
if and only if there is a positive real M and a real x_0 such as
|f(x)| <= M|g(x)| for all x > x_0.
That's in wikipedia. What that actually means is that there is a point (after x_0) after which some multiple of g(x) dominates f(x). That definition acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x)+K = O(f(x)), f(x^n+x^n-1)=O(x^n), etc. It's just a matter of using the definition to prove those.
In special, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation take. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with it. So, for this purpose, isn't easy to reason about the difference between n and 2n.

From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.

Big O provides a good estimate of what algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 and an n^2 factor we ignore the n^2 factor, because even if the n^2 factor has a large constant it will eventually be dominated by the n^3 factor.
However, real algorithms incorporate more than simple Big O analysis, for example a sorting algorithm will often start with a O(n * log(n)) partitioning algorithm like quicksort or mergesort, and when the partitions become small enough the algorithm will switch to a simpler O(n^2) algorithm like insertionsort - for small inputs insertionsort is generally faster, although a basic Big O analysis doesn't reveal this.
The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors are smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants. Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever) and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' constant factors would hide this step.

An other thing is that, what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove 99N^2 portion even? Will it not make difference when let's say N is one miilion?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.

Big O notation is not an absolute measure of complexity.
Rather it is a designation of how complexity will change as the variable changes. In other words as N increases the complexity will increase
Big O(f(N)).
To explain why terms are not included we look at how fast the terms increase.
So, Big O(2n+2) has two terms 2n and 2. Looking at the rate of increase
Big O(2) this term will never increase it does not contribute to the rate of increase at all so it goes away. Also since 2n increases faster than 2, the 2 turns into noise as n gets very large.
Similarly Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 will contribute a larger nominal percentage than 2n^3. However if n gets very large, say 1000000, then 99n^2 although nominally large it is insignificant (close to 1 millionth) compared to the size of 2n^3.
As a consequence Big O(n^i) < Big O(n^(i+1)).
Coefficients are removed because of the mathematical definition of Big O.
To simplify the definition says Big O(f(n)) = Big O(f(cn)) for a constant c. This needs to be taken on faith because the reason for this is purely mathematical, and as such the proof would be too complex and dry to explain in simple terms.

The mathematical reason:
The real reason why we do this, is the way Big O-Notation is defined:
A series (or lets use the word function) f(n) is in O(g(n)) when the series f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 as n approaches Infinity. The term (2*n^2)/(n^2) doesn't become infinitely large (its always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinetely large, as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O Notation basically describes, that your function f(n) is (from some value of n on to infinity) smaller than a function g(n), multiplied by a constant. With the previous example:
2*n^2 is in O(n^2), because we can find a value C, so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, for example, and the condition will be satisfied.
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! If it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers for it and have it run in the same amount of time, the real problem here is, that it scales incredibly bad with input size.

For practical applications the constants does matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) If your algorithm should be great for any input, it should have a low time complexity, and 2) that n^3 grows so much faster than n^2, that perfering n^3 over n^2 almost never makes sense.

Meaning of Big O notation

Our teacher gave us the following definition of Big O notation:
O(f(n)): A function g(n) is in O(f(n)) (“big O of f(n)”) if there exist
constants c > 0 and N such that |g(n)| ≤ c |f(n)| for all n > N.
I'm trying to tease apart the various components of this definition. First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
Next, I'm confused by the overall second portion of the statement. Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.

First of all, I'm confused by what it means for g(n) to be in O(f(n)). What does in mean?
In this formulation, O(f(n)) is a set of functions. Thus O(N) is the set of all functions that are (in simple terms) proportional to N as N tends to infinity.
The word "in" means ... "is a member of the set".
Why does saying that the absolute value of g(n) less than or equal f(n) for all n > N mean anything about Big O Notation?
It is the definition. And besides you have neglected the c term in your synopsis, and that is an important part of the definition.
My general intuition for what Big O Notation means is that it is a way to describe the runtime of an algorithm. For example, if bubble sort runs in O(n^2) in the worst case, this means that it takes the time of n^2 operations (in this case comparisons) to complete the algorithm. I don't see how this intuition follows from the above definition.
Your intuition is incorrect in two respects.
Firstly, the real definition of O(N^2) is NOT that it takes N^2 operations. it is that it takes proportional to N^2 operations. That's where the c comes into it.
Secondly, it is only proportional to N^2 for large enough values of N. Big O notation is not about what happens for small N. It is about what happens when the problem size scales up.
Also, as a a comment notes "proportional" is not quite the right phraseology here. It might be more correct to say "tends towards proportional" ... but in reality there isn't a simple english description of what is going on here. The real definition is the mathematical one.
If you now reread the definition in the light of that, you should see that it fits just nicely.
(Note that the definitions of Big O, and related measures of complexity can also be expressed in calculus terminology; i.e. using "limits". However, generally speaking the things we are talking about are quantized; i.e. an integer number instructions, an integer number bytes of storage, etc. Calculus is really about functions involving real numbers. Hence, you could argue that the formulation above is preferable. OTOH, a real mathematician would probably see bus-sized holes in this argumentation.)

O(g(n)) looks like a function, but it is actually a set of functions. If a function f is in O(g(n)), it means that g is an asymptotic upper bound on f to within a constant factor. O(g(n)) contains all functions that are bounded from above by g(n).
More specifically, there exists a constant c and n0 such that f(n) < c * g(n) for all n > n0. This means that c * g(n) will always overtake f(n) beyond some value of n. g is asymptotically larger than f; it scales faster.
This is used in the analysis of algorithms as follows. The running time of an algorithm is impossible to specify practically. It would obviously depend on the machine on which it runs. We need a way of talking about efficiency that is unconcerned with matters of hardware. One might naively suggest counting the steps executed by the algorithm and using that as the measure of running time, but this would depend on the granularity with which the algorithm is specified and so is no good either. Instead, we concern ourselves only with how quickly the running time (this hypothetical thing T(n)) scales with the size of the input n.
Thus, we can report the running time by saying something like:
My algorithm (algo1) has a running time T(n) that is in the set O(n^2). I.e. it's bounded from above by some constant multiple of n^2.
Some alternative algorithm (algo2) might have a time complexity of O(n), which we call linear. This may or may not be better for some particular input size or on some hardware, but there is one thing we can say for certain: as n tends to infinity, algo2 will out-perform algo1.
Practically then, one should favour algorithms with better time complexities, as they will tend to run faster.
This asymptotic notation may be applied to memory usage also.

what is order of complexity in Big O notation?

Question
Hi I am trying to understand what order of complexity in terms of Big O notation is. I have read many articles and am yet to find anything explaining exactly 'order of complexity', even on the useful descriptions of Big O on here.
What I already understand about big O
The part which I already understand. about Big O notation is that we are measuring the time and space complexity of an algorithm in terms of the growth of input size n. I also understand that certain sorting methods have best, worst and average scenarios for Big O such as O(n) ,O(n^2) etc and the n is input size (number of elements to be sorted).
Any simple definitions or examples would be greatly appreciated thanks.

Big-O analysis is a form of runtime analysis that measures the efficiency of an algorithm in terms of the time it takes for the algorithm to run as a function of the input size. It’s not a formal bench- mark, just a simple way to classify algorithms by relative efficiency when dealing with very large input sizes.
Update:
The fastest-possible running time for any runtime analysis is O(1), commonly referred to as constant running time.An algorithm with constant running time always takes the same amount of time
to execute, regardless of the input size.This is the ideal run time for an algorithm, but it’s rarely achievable.
The performance of most algorithms depends on n, the size of the input.The algorithms can be classified as follows from best-to-worse performance:
O(log n) — An algorithm is said to be logarithmic if its running time increases logarithmically in proportion to the input size.
O(n) — A linear algorithm’s running time increases in direct proportion to the input size.
O(n log n) — A superlinear algorithm is midway between a linear algorithm and a polynomial algorithm.
O(n^c) — A polynomial algorithm grows quickly based on the size of the input.
O(c^n) — An exponential algorithm grows even faster than a polynomial algorithm.
O(n!) — A factorial algorithm grows the fastest and becomes quickly unusable for even small values of n.
The run times of different orders of algorithms separate rapidly as n gets larger.Consider the run time for each of these algorithm classes with
n = 10:
log 10 = 1
10 = 10
10 log 10 = 10
10^2 = 100
2^10= 1,024
10! = 3,628,800
Now double it to n = 20:
log 20 = 1.30
20 = 20
20 log 20= 26.02
20^2 = 400
2^20 = 1,048,576
20! = 2.43×1018
Finding an algorithm that works in superlinear time or better can make a huge difference in how well an application performs.

Say, f(n) in O(g(n)) if and only if there exists a C and n0 such that f(n) < C*g(n) for all n greater than n0.
Now that's a rather mathematical approach. So I'll give some examples. The simplest case is O(1). This means "constant". So no matter how large the input (n) of a program, it will take the same time to finish. An example of a constant program is one that takes a list of integers, and returns the first one. No matter how long the list is, you can just take the first and return it right away.
The next is linear, O(n). This means that if the input size of your program doubles, so will your execution time. An example of a linear program is the sum of a list of integers. You'll have to look at each integer once. So if the input is an list of size n, you'll have to look at n integers.
An intuitive definition could define the order of your program as the relation between the input size and the execution time.

Others have explained big O notation well here. I would like to point out that sometimes too much emphasis is given to big O notation.
Consider matrix multplication the naïve algorithm has O(n^3). Using the Strassen algoirthm it can be done as O(n^2.807). Now there are even algorithms that get O(n^2.3727).
One might be tempted to choose the algorithm with the lowest big O but it turns for all pratical purposes that the naïvely O(n^3) method wins out. This is because the constant for the dominating term is much larger for the other methods.
Therefore just looking at the dominating term in the complexity can be misleading. Sometimes one has to consider all terms.

Big O is about finding an upper limit for the growth of some function. See the formal definition on Wikipedia http://en.wikipedia.org/wiki/Big_O_notation
So if you've got an algorithm that sorts an array of size n and it requires only a constant amount of extra space and it takes (for example) 2 n² + n steps to complete, then you would say it's space complexity is O(n) or O(1) (depending on wether you count the size of the input array or not) and it's time complexity is O(n²).
Knowing only those O numbers, you could roughly determine how much more space and time is needed to go from n to n + 100 or 2 n or whatever you are interested in. That is how well an algorithm "scales".
Update
Big O and complexity are really just two terms for the same thing. You can say "linear complexity" instead of O(n), quadratic complexity instead of O(n²), etc...

I see that you are commenting on several answers wanting to know the specific term of order as it relates to Big-O.
Suppose f(n) = O(n^2), we say that the order is n^2.

Be careful here, there are some subtleties. You stated "we are measuring the time and space complexity of an algorithm in terms of the growth of input size n," and that's how people often treat it, but it's not actually correct. Rather, with O(g(n)) we are determining that g(n), scaled suitably, is an upper bound for the time and space complexity of an algorithm for all input of size n bigger than some particular n'. Similarly, with Omega(h(n)) we are determining that h(n), scaled suitably, is a lower bound for the time and space complexity of an algorithm for all input of size n bigger than some particular n'. Finally, if both the lower and upper bound are the same complexity g(n), the complexity is Theta(g(n)). In other words, Theta represents the degree of complexity of the algorithm while big-O and big-Omega bound it above and below.

Constant Growth: O(1)
Linear Growth: O(n)
Quadratic Growth: O(n^2)
Cubic Growth: O(n^3)
Logarithmic Growth: (log(n)) or O(n*log(n))

Big O use Mathematical Definition of complexity .
Order Of use in industrial Definition of complexity .

Meaning of average complexity when using Big-O notation

While answering to this question a debate began in comments about complexity of QuickSort. What I remember from my university time is that QuickSort is O(n^2) in worst case, O(n log(n)) in average case and O(n log(n)) (but with tighter bound) in best case.
What I need is a correct mathematical explanation of the meaning of average complexity to explain clearly what it is about to someone who believe the big-O notation can only be used for worst-case.
What I remember if that to define average complexity you should consider complexity of algorithm for all possible inputs, count how many degenerating and normal cases. If the number of degenerating cases divided by n tend towards 0 when n get big, then you can speak of average complexity of the overall function for normal cases.
Is this definition right or is definition of average complexity different ? And if it's correct can someone state it more rigorously than I ?

You're right.
Big O (big Theta etc.) is used to measure functions. When you write f=O(g) it doesn't matter what f and g mean. They could be average time complexity, worst time complexity, space complexities, denote distribution of primes etc.
Worst-case complexity is a function that takes size n, and tells you what is maximum number of steps of an algorithm given input of size n.
Average-case complexity is a function that takes size n, and tells you what is expected number of steps of an algorithm given input of size n.
As you see worst-case and average-case complexity are functions, so you can use big O to express their growth.

If you're looking for a formal definition, then:
Average complexity is the expected running time for a random input.

Let's refer Big O Notation in Wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes f(x)=O(g(x)) as x --> infinity if ...
So what the premise of the definition states is that the function f should take a number as an input and yield a number as an output. What input number are we talking about? It's supposedly a number of elements in the sequence to be sorted. What output number could we be talking about? It could be a number of operations done to order the sequence. But stop. What is a function? Function in Wikipedia:
a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
Are we producing exacly one output with our prior defition? No, we don't. For a given size of a sequence we can get a wide variation of number of operations. So to ensure the definition is applicable to our case we need to reduce a set possible outcomes (number of operations) to a single value. It can be a maximum ("the worse case"), a minimum ("the best case") or an average.
The conclusion is that talking about best/worst/average case is mathematically correct and using big O notation without those in context of sorting complexity is somewhat sloppy.
On the other hand, we could be more precise and use big Theta notation instead of big O notation.

I think your definition is correct, but your conclusions are wrong.
It's not necessarily true that if the proportion of "bad" cases tends to 0, then the average complexity is equal to the complexity of the "normal" cases.
For example, suppose that 1/(n^2) cases are "bad" and the rest "normal", and that "bad" cases take exactly (n^4) operations, whereas "normal" cases take exactly n operations.
Then the average number of operations required is equal to:
(n^4/n^2) + n(n^2-1)/(n^2)
This function is O(n^2), but not O(n).
In practice, though, you might find that time is polynomial in all cases, and the proportion of "bad" cases shrinks exponentially. That's when you'd ignore the bad cases in calculating an average.

Average case analysis does the following:
Take all inputs of a fixed length (say n), sum up all the running times of all instances of this length, and build the average.
The problem is you will probably have to enumerate all inputs of length n in order to come up with an average complexity.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio