Could anyone explain Big O versus Big Omega vs Big Theta? [duplicate] - big-o

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Big Theta Notation - what exactly does big Theta represent?
I understand it in theory, I guess, but what I'm having trouble grasping is the application of the three.
In school, we always used Big O to denote the complexity of an algorithm. Bubble sort was O(n^2) for example.
Now after reading some more theory I get that Big Oh is not the only measure, there's at least two other interesting ones.
But here's my question:
Big O is the upper-bound, Big Omega is the lower bound, and Big Theta is a mix of the two. But what does that mean conceptually? I understand what it means on a graph; I've seen a million examples of that. But what does it mean for algorithm complexity? How does an "upper bound" or a "lower bound" mix with that?
I guess I just don't get its application. I understand that if multiplied by some constant c that if after some value n_0 f(x) is greater than g(x), f(x) is considered O(g(x)). But what does that mean practically? Why would we be multiplying f(x) by some value c? Hell, I thought with Big O notation multiples didn't matter.

The big O notation, and its relatives, the big Theta, the big Omega, the small o and the small omega are ways of saying something about how a function behaves at a limit point (for example, when approaching infinity, but also when approaching 0, etc.) without saying much else about the function. They are commonly used to describe running space and time of algorithms, but can also be seen in other areas of mathematics regarding asymptotic behavior.
The semi-intuitive definition is as follows:
A function g(x) is said to be O(f(x)) if "from some point on", g(x) is lower than c*f(x), where c is some constant.
The other definitions are similar, Theta demanding that g(x) be between two constant multiples of f(x), Omega demanding g(x)>c*f(x), and the small versions demand that this is true for all such constants.
But why is it interesting to say, for example, that an algorithm has run time of O(n^2)?
It's interesting mainly because, in theoretical computer science, we are most interested in how algorithms behave for large inputs. This is true because on small inputs algorithm run times can vary greatly depending on implementation, compilation, hardware, and other such things that are not really interesting when analyzing an algorithm theoretically.
The rate of growth, however, usually depends on the nature of the algorithm, and to improve it you need deeper insights on the problem you're trying to solve. This is the case, for example, with sorting algorithms, where you can get a simple algorithm (Bubble Sort) to run in O(n^2), but to improve this to O(n log n) you need a truly new idea, such as that introduced in Merge Sort or Heap Sort.
On the other hand, if you have an algorithm that runs in exactly 5n seconds, and another that runs in 1000n seconds (which is the difference between a long yawn and a launch break for n=3, for example), when you get to n=1000000000000, the difference in scale seems less important. If you have an algorithm that takes O(log n), though, you'd have to wait log(1000000000000)=12 seconds, perhaps multiplied by some constant, instead of the almost 317,098 years, which, no matter how big the constant is, is a completely different scale.
I hope this makes things a little clearer. Good luck with your studies!

Related

Big Omega, Big Oh, Big theta

I am reading a book Data Structures by Yashavnt P. Kanetkar. In page no. 11 of the book, I found out there are three categories of algorithms as stated by the book :
a. Algorithms that grows at least as fast as some function.
b. Algorithms that grow at the same rate.
c. Algorithms that grow no faster.
Later it is stated :
a. is Big Omega (n)
b. is Big theta (n)
c. is Big Oh (n).
I couldn't understand the meaning so searched for some youtube videos. What I learnt from them was Big Omega is the representation of best case scenario. Big Oh is the representation of the worst case scenario. Big theta is the representation of the average case scenario. I know what these cases are. However I can't understand what the book tried to mean by those three categories and how are they related to the three case scenarios.
Big Theta is a mathematician's approximation, answering the question "how well will it scale"; as the problem gets bigger and bigger, how much more time/memory will the function take? It strips away any constant factors and any terms that become less significant as the problem gets bigger, to leave just one simple expression.
Big Oh is a variant where you're giving "no worse than"; this is sometimes easier to work out, and it errs on the side of being conservative, so it's still useful.
Big Omega is the opposite variant where you're giving "no better than". This is also sometimes useful, but less often.

Why do we ignore co-efficients in Big O notation?

While searching for answers relating to "Big O" notation, I have seen many SO answers such as this, this, or this, but still I have not clearly understood some points.
Why do we ignore the co-efficients?
For example this answer says that the final complexity of 2N + 2 is O(N); we remove the leading co-efficient 2 and the final constant 2 as well.
Removing the final constant of 2 perhaps understandable. After all, N may be very large and so "forgetting" the final 2 may only change the grand total by a small percentage.
However I cannot clearly understand how removing the leading co-efficient does not make difference. If the leading 2 above became a 1 or a 3, the percentage change to the grand total would be large.
Similarly, apparently 2N^3 + 99N^2 + 500 is O(N^3). How do we ignore the 99N^2 along with the 500?
The purpose of the Big-O notation is to find what is the dominant factor in the asymptotic behavior of a function as the value tends towards the infinity.
As we walk through the function domain, some factors become more important than others.
Imagine f(n) = n^3+n^2. As n goes to infinity, n^2 becomes less and less relevant when compared with n^3.
But that's just the intuition behind the definition. In practice we ignore some portions of the function because of the formal definition:
f(x) = O(g(x)) as x->infinity
if and only if there is a positive real M and a real x_0 such as
|f(x)| <= M|g(x)| for all x > x_0.
That's in wikipedia. What that actually means is that there is a point (after x_0) after which some multiple of g(x) dominates f(x). That definition acts like a loose upper bound on the value of f(x).
From that we can derive many other properties, like f(x)+K = O(f(x)), f(x^n+x^n-1)=O(x^n), etc. It's just a matter of using the definition to prove those.
In special, the intuition behind removing the coefficient (K*f(x) = O(f(x))) lies in what we try to measure with computational complexity. Ultimately it's all about time (or any resource, actually). But it's hard to know how much time each operation take. One algorithm may perform 2n operations and the other n, but the latter may have a large constant time associated with it. So, for this purpose, isn't easy to reason about the difference between n and 2n.
From a (complexity) theory point of view, the coefficients represent hardware details that we can ignore. Specifically, the Linear Speedup Theorem dictates that for any problem we can always throw an exponentially increasing amount of hardware (money) at a computer to get a linear boost in speed.
Therefore, modulo expensive hardware purchases two algorithms that solve the same problem, one at twice the speed of the other for all input sizes, are considered essentially the same.
Big-O (Landau) notation has its origins independently in number theory, where one of its uses is to create a kind of equivalence between functions: if a given function is bounded above by another and simultaneously is bounded below by a scaled version of that same other function, then the two functions are essentially the same from an asymptotic point of view. The definition of Big-O (actually, "Big-Theta") captures this situation: the "Big-O" (Theta) of the two functions are exactly equal.
The fact that Big-O notation allows us to disregard the leading constant when comparing the growth of functions makes Big-O an ideal vehicle to measure various qualities of algorithms while respecting (ignoring) the "freebie" optimizations offered by the Linear Speedup Theorem.
Big O provides a good estimate of what algorithms are more efficient for larger inputs, all things being equal; this is why for an algorithm with an n^3 and an n^2 factor we ignore the n^2 factor, because even if the n^2 factor has a large constant it will eventually be dominated by the n^3 factor.
However, real algorithms incorporate more than simple Big O analysis, for example a sorting algorithm will often start with a O(n * log(n)) partitioning algorithm like quicksort or mergesort, and when the partitions become small enough the algorithm will switch to a simpler O(n^2) algorithm like insertionsort - for small inputs insertionsort is generally faster, although a basic Big O analysis doesn't reveal this.
The constant factors often aren't very interesting, and so they're omitted - certainly a difference in factors on the order of 1000 is interesting, but usually the difference in factors are smaller, and then there are many more constant factors to consider that may dominate the algorithms' constants. Let's say I've got two algorithms, the first with running time 3*n and the second with running time 2*n, each with comparable space complexity. This analysis assumes uniform memory access; what if the first algorithm interacts better with the cache, and this more than makes up for the worse constant factor? What if more compiler optimizations can be applied to it, or it behaves better with the memory management subsystem, or requires less expensive IO (e.g. fewer disk seeks or fewer database joins or whatever) and so on? The constant factor for the algorithm is relevant, but there are many more constants that need to be considered. Often the easiest way to determine which algorithm is best is just to run them both on some sample inputs and time the results; over-relying on the algorithms' constant factors would hide this step.
An other thing is that, what I have understood, the complexity of 2N^3 + 99N^2 + 500 will be O(N^3). So how do we ignore/remove 99N^2 portion even? Will it not make difference when let's say N is one miilion?
That's right, in that case the 99N^2 term is far overshadowed by the 2N^3 term. The point where they cross is at N=49.5, much less than one million.
But you bring up a good point. Asymptotic computational complexity analysis is in fact often criticized for ignoring constant factors that can make a huge difference in real-world applications. However, big-O is still a useful tool for capturing the efficiency of an algorithm in a few syllables. It's often the case that an n^2 algorithm will be faster in real life than an n^3 algorithm for nontrivial n, and it's almost always the case that a log(n) algorithm will be much faster than an n^2 algorithm.
In addition to being a handy yardstick for approximating practical efficiency, it's also an important tool for the theoretical analysis of algorithm complexity. Many useful properties arise from the composability of polynomials - this makes sense because nested looping is fundamental to computation, and those correspond to polynomial numbers of steps. Using asymptotic complexity analysis, you can prove a rich set of relationships between different categories of algorithms, and that teaches us things about exactly how efficiently certain problems can be solved.
Big O notation is not an absolute measure of complexity.
Rather it is a designation of how complexity will change as the variable changes. In other words as N increases the complexity will increase
Big O(f(N)).
To explain why terms are not included we look at how fast the terms increase.
So, Big O(2n+2) has two terms 2n and 2. Looking at the rate of increase
Big O(2) this term will never increase it does not contribute to the rate of increase at all so it goes away. Also since 2n increases faster than 2, the 2 turns into noise as n gets very large.
Similarly Big O(2n^3 + 99n^2) compares Big O(2n^3) and Big O(99n^2). For small values, say n < 50, the 99n^2 will contribute a larger nominal percentage than 2n^3. However if n gets very large, say 1000000, then 99n^2 although nominally large it is insignificant (close to 1 millionth) compared to the size of 2n^3.
As a consequence Big O(n^i) < Big O(n^(i+1)).
Coefficients are removed because of the mathematical definition of Big O.
To simplify the definition says Big O(f(n)) = Big O(f(cn)) for a constant c. This needs to be taken on faith because the reason for this is purely mathematical, and as such the proof would be too complex and dry to explain in simple terms.
The mathematical reason:
The real reason why we do this, is the way Big O-Notation is defined:
A series (or lets use the word function) f(n) is in O(g(n)) when the series f(n)/g(n) is bounded. Example:
f(n)= 2*n^2
g(n)= n^2
f(n) is in O(g(n)) because (2*n^2)/(n^2) = 2 as n approaches Infinity. The term (2*n^2)/(n^2) doesn't become infinitely large (its always 2), so the quotient is bounded and thus 2*n^2 is in O(n^2).
Another one:
f(n) = n^2
g(n) = n
The term n^2/n (= n) becomes infinetely large, as n goes to infinity, so n^2 is not in O(n).
The same principle applies, when you have
f(n) = n^2 + 2*n + 20
g(n) = n^2
(n^2 + 2*n + 20)/(n^2) is also bounded, because it tends to 1, as n goes to infinity.
Big-O Notation basically describes, that your function f(n) is (from some value of n on to infinity) smaller than a function g(n), multiplied by a constant. With the previous example:
2*n^2 is in O(n^2), because we can find a value C, so that 2*n^2 is smaller than C*n^2. In this example we can pick C to be 5 or 10, for example, and the condition will be satisfied.
So what do you get out of this? If you know your algorithm has complexity O(10^n) and you input a list of 4 numbers, it may take only a short time. If you input 10 numbers, it will take a million times longer! If it's one million times longer or 5 million times longer doesn't really matter here. You can always use 5 more computers for it and have it run in the same amount of time, the real problem here is, that it scales incredibly bad with input size.
For practical applications the constants does matter, so O(2 n^3) will be better than O(1000 n^2) for inputs with n smaller than 500.
There are two main ideas here: 1) If your algorithm should be great for any input, it should have a low time complexity, and 2) that n^3 grows so much faster than n^2, that perfering n^3 over n^2 almost never makes sense.

Algorithms bounds and functions describing algorithm

ok so basically i need to understand how i can compare functions so that i can find big O big theta and big omega for algorithms of a program
my mathematics background is not very strong but i have the foundations down
and my question is
is there a mathematical way to find where two functions will intersect and eventually one dominate the other from some point n
for example if i have a function
2n^2 and 64nlog(n) [with log to base 2]
how can i find at which values of n, 2n^2 will upper bound( hope i used the correct term here) 64nlog(n) and also how to apply this to any other function
is it just simply guess work?
You just need to find the intercepts of the two functions. So set them equal to each other and solve for n. Or just plot them and get a rough idea of the intercepts so you know at what size of data you should switch between each algorithm. Also, Wolfram Alpha has useful function solving and plotting tools too, just in case you are a bit rusty on math.
Big O, Big Theta, and Big Omega are about general sorts of growth patterns for algorithms. Each algorithm can be implemented an infinite number of ways. Each implementation may have a specific execution time, which relates to the Big O of the algorithm. You can compare the execution-time-per-input-size of two implementations of an algorithm, but not for an algorithm by itself. Big O, Big Theta, and Big Omega say next-to-nothing about the execution-time of an algorithm. As such, you cannot even guess a specific intersection point of where one algorithm becomes faster, because such a concept makes no sense. We can discuss intersection points in theory, but not in detail, because it has no value for algorithms, only implementations.
It's also important to note that Big O (and similar) have no constant factors, like your functions do.
http://en.wikipedia.org/wiki/Big_O_notation
You can divide out an n on both sides, and a common factor 2, and then solve this:
32 log(n) <= c n for n >= n_0
Let's say c = 32, because then it's log(n) <= n (which looks nice), and n_0 can be 1.
But this sort of thing has already been done. Unless you are in CS class, you can just say that O(n log n) is a subset of O(n2) without proving it, it's well-known anyway.
For big-O, big-Theta, and big-Omega, you don't have to find the intersection point. You just have to prove that one function eventually dominates the other (up to constant factors). One useful tool for this for example is L'Hopital's rule: Take the ratio of the two running time functions and compute the limit as n goes to infinity, and if the limit is infinity/infinity then you can apply L'Hopital's rule to get that the limit of the ratio is equal to the limit of the ratio of derivatives, and sometimes get a solution. For example this shows that log(n) = O(n^c) for any c > 0.

Why is the constant always dropped from big O analysis?

I'm trying to understand a particular aspect of Big O analysis in the context of running programs on a PC.
Suppose I have an algorithm that has a performance of O(n + 2). Here if n gets really large the 2 becomes insignificant. In this case it's perfectly clear the real performance is O(n).
However, say another algorithm has an average performance of O(n2 / 2). The book where I saw this example says the real performance is O(n2). I'm not sure I get why, I mean the 2 in this case seems not completely insignificant. So I was looking for a nice clear explanation from the book. The book explains it this way:
"Consider though what the 1/2 means. The actual time to check each value
is highly dependent on the machine instruction that the code
translates to and then on the speed at which the CPU can execute the instructions. Therefore the 1/2 doesn't mean very much."
And my reaction is... huh? I literally have no clue what that says or more precisely what that statement has to do with their conclusion. Can somebody spell it out for me please.
Thanks for any help.
There's a distinction between "are these constants meaningful or relevant?" and "does big-O notation care about them?" The answer to that second question is "no," while the answer to that first question is "absolutely!"
Big-O notation doesn't care about constants because big-O notation only describes the long-term growth rate of functions, rather than their absolute magnitudes. Multiplying a function by a constant only influences its growth rate by a constant amount, so linear functions still grow linearly, logarithmic functions still grow logarithmically, exponential functions still grow exponentially, etc. Since these categories aren't affected by constants, it doesn't matter that we drop the constants.
That said, those constants are absolutely significant! A function whose runtime is 10100n will be way slower than a function whose runtime is just n. A function whose runtime is n2 / 2 will be faster than a function whose runtime is just n2. The fact that the first two functions are both O(n) and the second two are O(n2) doesn't change the fact that they don't run in the same amount of time, since that's not what big-O notation is designed for. O notation is good for determining whether in the long term one function will be bigger than another. Even though 10100n is a colossally huge value for any n > 0, that function is O(n) and so for large enough n eventually it will beat the function whose runtime is n2 / 2 because that function is O(n2).
In summary - since big-O only talks about relative classes of growth rates, it ignores the constant factor. However, those constants are absolutely significant; they just aren't relevant to an asymptotic analysis.
Big O notation is most commonly used to describe an algorithm's running time. In this context, I would argue that specific constant values are essentially meaningless. Imagine the following conversation:
Alice: What is the running time of your algorithm?
Bob: 7n2
Alice: What do you mean by 7n2?
What are the units? Microseconds? Milliseconds? Nanoseconds?
What CPU are you running it on? Intel i9-9900K? Qualcomm Snapdragon 845? (Or are you using a GPU, an FPGA, or other hardware?)
What type of RAM are you using?
What programming language did you implement the algorithm in? What is the source code?
What compiler / VM are you using? What flags are you passing to the compiler / VM?
What is the operating system?
etc.
So as you can see, any attempt to indicate a specific constant value is inherently problematic. But once we set aside constant factors, we are able to clearly describe an algorithm's running time. Big O notation gives us a robust and useful description of how long an algorithm takes, while abstracting away from the technical features of its implementation and execution.
Now it is possible to specify the constant factor when describing the number of operations (suitably defined) or CPU instructions an algorithm executes, the number of comparisons a sorting algorithm performs, and so forth. But typically, what we're really interested in is the running time.
None of this is meant to suggest that the real-world performance characteristics of an algorithm are unimportant. For example, if you need an algorithm for matrix multiplication, the Coppersmith-Winograd algorithm is inadvisable. It's true that this algorithm takes O(n2.376) time, whereas the Strassen algorithm, its strongest competitor, takes O(n2.808) time. However, according to Wikipedia, Coppersmith-Winograd is slow in practice, and "it only provides an advantage for matrices so large that they cannot be processed by modern hardware." This is usually explained by saying that the constant factor for Coppersmith-Winograd is very large. But to reiterate, if we're talking about the running time of Coppersmith-Winograd, it doesn't make sense to give a specific number for the constant factor.
Despite its limitations, big O notation is a pretty good measure of running time. And in many cases, it tells us which algorithms are fastest for sufficiently large input sizes, before we even write a single line of code.
Big-O notation only describes the growth rate of algorithms in terms of mathematical function, rather than the actual running time of algorithms on some machine.
Mathematically, Let f(x) and g(x) be positive for x sufficiently large.
We say that f(x) and g(x) grow at the same rate as x tends to infinity, if
now let f(x)=x^2 and g(x)=x^2/2, then lim(x->infinity)f(x)/g(x)=2. so x^2 and x^2/2 both have same growth rate.so we can say O(x^2/2)=O(x^2).
As templatetypedef said, hidden constants in asymptotic notations are absolutely significant.As an example :marge sort runs in O(nlogn) worst-case time and insertion sort runs in O(n^2) worst case time.But as the hidden constant factors in insertion sort is smaller than that of marge sort, in practice insertion sort can be faster than marge sort for small problem sizes on many machines.
You are completely right that constants matter. In comparing many different algorithms for the same problem, the O numbers without constants give you an overview of how they compare to each other. If you then have two algorithms in the same O class, you would compare them using the constants involved.
But even for different O classes the constants are important. For instance, for multidigit or big integer multiplication, the naive algorithm is O(n^2), Karatsuba is O(n^log_2(3)), Toom-Cook O(n^log_3(5)) and Schönhage-Strassen O(n*log(n)*log(log(n))). However, each of the faster algorithms has an increasingly large overhead reflected in large constants. So to get approximate cross-over points, one needs valid estimates of those constants. Thus one gets, as SWAG, that up to n=16 the naive multiplication is fastest, up to n=50 Karatsuba and the cross-over from Toom-Cook to Schönhage-Strassen happens for n=200.
In reality, the cross-over points not only depend on the constants, but also on processor-caching and other hardware-related issues.
Big O without constant is enough for algorithm analysis.
First, the actual time does not only depend how many instructions but also the time for each instruction, which is closely connected to the platform where the code runs. It is more than theory analysis. So the constant is not necessary for most case.
Second, Big O is mainly used to measure how the run time will increase as the problem becomes larger or how the run time decrease as the performance of hardware improved.
Third, for situations of high performance optimizing, constant will also be taken into consideration.
The time required to do a particular task in computers now a days does not required a large amount of time unless the value entered is very large.
Suppose we wants to multiply 2 matrices of size 10*10 we will not have problem unless we wants to do this operation multiple times and then the role of asymptotic notations becomes prevalent and when the value of n becomes very big then the constants don't really makes any difference to the answer and are almost negligible so we tend to leave them while calculating the complexity.
Time complexity for O(n+n) reduces to O(2n). Now 2 is a constant. So the time complexity will essentially depend on n.
Hence the time complexity of O(2n) equates to O(n).
Also if there is something like this O(2n + 3) it will still be O(n) as essentially the time will depend on the size of n.
Now suppose there is a code which is O(n^2 + n), it will be O(n^2) as when the value of n increases the effect of n will become less significant compared to effect of n^2.
Eg:
n = 2 => 4 + 2 = 6
n = 100 => 10000 + 100 => 10100
n = 10000 => 100000000 + 10000 => 100010000
As you can see the effect of the second expression as lesser effect as the value of n keeps increasing. Hence the time complexity evaluates to O(n^2).

Big-oh vs big-theta [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
What is the difference between Θ(n) and O(n)?
It seems to me like when people talk about algorithm complexity informally, they talk about big-oh. But in formal situations, I often see big-theta with the occasional big-oh thrown in.
I know mathematically what the difference is between the two, but in English, in what situation would using big-oh when you mean big-theta be incorrect, or vice versa (an example algorithm would be appreciated)?
Bonus: why do people seemingly always use big-oh when talking informally?
Big-O is an upper bound.
Big-Theta is a tight bound, i.e. upper and lower bound.
When people only worry about what's the worst that can happen, big-O is sufficient; i.e. it says that "it can't get much worse than this". The tighter the bound the better, of course, but a tight bound isn't always easy to compute.
See also
Wikipedia/Big O Notation
Related questions
What is the difference between Θ(n) and O(n)?
The following quote from Wikipedia also sheds some light:
Informally, especially in computer science, the Big O notation often is
permitted to be somewhat abused to describe an asymptotic tight bound
where using Big Theta notation might be more factually appropriate in a
given context.
For example, when considering a function T(n) = 73n3+ 22n2+ 58, all of the following are generally acceptable, but tightness of bound (i.e., bullets 2 and 3 below) are usually strongly preferred over laxness of bound (i.e., bullet 1
below).
T(n) = O(n100), which is identical to T(n) ∈ O(n100)
T(n) = O(n3), which is identical to T(n) ∈ O(n3)
T(n) = Θ(n3), which is identical to T(n) ∈ Θ(n3)
The equivalent English statements are respectively:
T(n) grows asymptotically no faster than n100
T(n) grows asymptotically no faster than n3
T(n) grows asymptotically as fast as n3.
So while all three statements are true, progressively more information is contained in
each. In some fields, however, the Big O notation (bullets number 2 in the lists above)
would be used more commonly than the Big Theta notation (bullets number 3 in the
lists above) because functions that grow more slowly are more desirable.
I'm a mathematician and I have seen and needed big-O O(n), big-Theta Θ(n), and big-Omega Ω(n) notation time and again, and not just for complexity of algorithms. As people said, big-Theta is a two-sided bound. Strictly speaking, you should use it when you want to explain that that is how well an algorithm can do, and that either that algorithm can't do better or that no algorithm can do better. For instance, if you say "Sorting requires Θ(n(log n)) comparisons for worst-case input", then you're explaining that there is a sorting algorithm that uses O(n(log n)) comparisons for any input; and that for every sorting algorithm, there is an input that forces it to make Ω(n(log n)) comparisons.
Now, one narrow reason that people use O instead of Ω is to drop disclaimers about worst or average cases. If you say "sorting requires O(n(log n)) comparisons", then the statement still holds true for favorable input. Another narrow reason is that even if one algorithm to do X takes time Θ(f(n)), another algorithm might do better, so you can only say that the complexity of X itself is O(f(n)).
However, there is a broader reason that people informally use O. At a human level, it's a pain to always make two-sided statements when the converse side is "obvious" from context. Since I'm a mathematician, I would ideally always be careful to say "I will take an umbrella if and only if it rains" or "I can juggle 4 balls but not 5", instead of "I will take an umbrella if it rains" or "I can juggle 4 balls". But the other halves of such statements are often obviously intended or obviously not intended. It's just human nature to be sloppy about the obvious. It's confusing to split hairs.
Unfortunately, in a rigorous area such as math or theory of algorithms, it's also confusing not to split hairs. People will inevitably say O when they should have said Ω or Θ. Skipping details because they're "obvious" always leads to misunderstandings. There is no solution for that.
Because my keyboard has an O key.
It does not have a Θ or an Ω key.
I suspect most people are similarly lazy and use O when they mean Θ because it's easier to type.
One reason why big O gets used so much is kind of because it gets used so much. A lot of people see the notation and think they know what it means, then use it (wrongly) themselves. This happens a lot with programmers whose formal education only went so far - I was once guilty myself.
Another is because it's easier to type a big O on most non-Greek keyboards than a big theta.
But I think a lot is because of a kind of paranoia. I worked in defence-related programming for a bit (and knew very little about algorithm analysis at the time). In that scenario, the worst case performance is always what people are interested in, because that worst case might just happen at the wrong time. It doesn't matter if the actually probability of that happening is e.g. far less than the probability of all members of a ships crew suffering a sudden fluke heart attack at the same moment - it could still happen.
Though of course a lot of algorithms have their worst case in very common circumstances - the classic example being inserting in-order into a binary tree to get what's effectively a singly-linked list. A "real" assessment of average performance needs to take into account the relative frequency of different kinds of input.
Bonus: why do people seemingly always use big-oh when talking informally?
Because in big-oh, this loop:
for i = 1 to n do
something in O(1) that doesn't change n and i and isn't a jump
is O(n), O(n^2), O(n^3), O(n^1423424). big-oh is just an upper bound, which makes it easier to calculate because you don't have to find a tight bound.
The above loop is only big-theta(n) however.
What's the complexity of the sieve of eratosthenes? If you said O(n log n) you wouldn't be wrong, but it wouldn't be the best answer either. If you said big-theta(n log n), you would be wrong.
Because there are algorithms whose best-case is quick, and thus it's technically a big O, not a big Theta.
Big O is an upper bound, big Theta is an equivalence relation.
There are a lot of good answers here but I noticed something was missing. Most answers seem to be implying that the reason why people use Big O over Big Theta is a difficulty issue, and in some cases this may be true. Often a proof that leads to a Big Theta result is far more involved than one that results in Big O. This usually holds true, but I do not believe this has a large relation to using one analysis over the other.
When talking about complexity we can say many things. Big O time complexity is just telling us what an algorithm is guarantied to run within, an upper bound. Big Omega is far less often discussed and tells us the minimum time an algorithm is guarantied to run, a lower bound. Now Big Theta tells us that both of these numbers are in fact the same for a given analysis. This tells us that the application has a very strict run time, that can only deviate by a value asymptoticly less than our complexity. Many algorithms simply do not have upper and lower bounds that happen to be asymptoticly equivalent.
So as to your question using Big O in place of Big Theta would technically always be valid, while using Big Theta in place of Big O would only be valid when Big O and Big Omega happened to be equal. For instance insertion sort has a time complexity of Big О at n^2, but its best case scenario puts its Big Omega at n. In this case it would not be correct to say that its time complexity is Big Theta of n or n^2 as they are two different bounds and should be treated as such.
I have seen Big Theta, and I'm pretty sure I was taught the difference in school. I had to look it up though. This is what Wikipedia says:
Big O is the most commonly used asymptotic notation for comparing functions, although in many cases Big O may be replaced with Big Theta Θ for asymptotically tighter bounds.
Source: Big O Notation#Related asymptotic notation
I don't know why people use Big-O when talking formally. Maybe it's because most people are more familiar with Big-O than Big-Theta? I had forgotten that Big-Theta even existed until you reminded me. Although now that my memory is refreshed, I may end up using it in conversation. :)

Resources