Why isn't every algorithm O(1)? - algorithm

If we have a random array of integers with size n such that we need to calculate the summation.
Claim: The best algorithm does that in O(n)
But, I claim we can do this in O(1). why?
We know for sure that n is locked in some field (since it's int and int is finite) which means I can sum all elements in less than 2,147,483,647 steps?

Why isn't every algorithm O(1)?
Because we choose to ignore the limitations (i.e. assume resources to be unlimited) of concrete computers when we analyse complexity of algorithms.
Asymptotic complexity gives us useful information about how the complexity grows. Merely concluding that there is a constant limit because of the hardware and ignoring how the limit is reached doesn't give us valuable information.
Besides, the limit that you perceive is much, much higher in reality. Computers aren't limited to representing at most 2'147'483'647 integer values. Using complex data-structures, computers can represent arbitrarily large numers - until memory runs out... but memory can be streamed from the disk. And there are data centers that easily provide hundreds of Tera Bytes of storage.
Although, to be fair: If we allow arbitrary length integers, then the complexity of the sum is worse than linear because even a single addition has linear complexity.
Once we take concrete hardware into consideration, it makes more sense to choose to use concrete units in the analysis: How many seconds does the program take for some concrete input on this particular hardware? And the way to find the answer is not math, but concrete measurement.

Why isn't every algorithm O(1)?
TL;DR: Because the Big O notation is used to quantify an algorithm, with regards of how it behaves with an increment of its input.
Informally, you can think about it as a framework that humans have invented to quantify classes of algorithms. If such framework would yield for every algorithm O(1), it would have defeated its own purpose in the first place i.e, quantify classes of algorithms.
More detailed answer
Let us start by clarifying what is Big O notation in the current context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting
behavior of a function when the argument tends towards a particular
value or infinity. (..) In computer science, big O notation is used to classify algorithms
according to how their run time or space requirements grow as the
input size grows.
The following statement is not accurate:
But, I claim we can do this in O(1). why? we know for sure that n is locked in some field (since it's int and int is finite) which means I can sum all elements in less than 2,147,483,647 steps?
One cannot simply perform "O(2,147,483,647)" and claim that "O(1)" since the Big O notation does not represent a function but rather a set of functions with a certain asymptotic upper-bound; as one can read from source:
Big O notation characterizes functions according to their growth
rates: different functions with the same growth rate may be
represented using the same O notation.
Informally, in computer-science time-complexity and space-complexity theories, one can think of the Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
So the complexity is O(n) because with an increment of the input the complexity grows linear and not constant.

A theory where all algorithms have a complexity O(1) would be of little use, acknowledge that.
In the theory of complexity, N is an unbounded variable so that you do obtain a non-trivial asymptotic bound.
For practical algorithms, the asymptotic complexity is useful in that is does model the exact complexity for moderate values of N (well below the largest int).
In some pathological cases (such as the most sophisticated matrix multiplication algorithms), N must exceed the capacity of an int before the asymptotic behavior becomes beneficial.
Side remark:
I recall of a paper claiming O(1) time for a certain operation. That operation was involving an histogram of pixel values, the range of which is indeed constant. But this was a misleading trick, because the hidden constant was proportional to 256 for an 8 bits image, and 65536 for 16 bits, making the algorithm dead slow. Claiming O(H) where H is the number of bins would have been more informative and more honest.

You chooses what is N and in what runtime dependency you are interested. You can say:
"Well computers have limited ressources, so I simply consider those limits as constant factors and now all my algorithms are O(1)"
However, this view is of limited use, because we want to use complexity to classify algorithms to see which performs better and which worse, and a classification that puts all algorithms in the same bucket does not help with that.

Integers are countable infinite by definition. As such you can not proof termination on basis of n. If you redefine integers as an bounded interval of countable numbers you can claim O(1) if and only if n is such an integer literal.
IMO: The useful part of O-notation is the information about time complexity in relation to input. In a scenario where my input is bounded I just focus on the behavior within the bounds. And that is O(n) in this case. You can claim O(1) but this strips it of information.

Related

Why big-Oh is not always a worst case analysis of an algorithm?

I am trying to learn analysis of algorithms and I am stuck with relation between asymptotic notation(big O...) and cases(best, worst and average).
I learn that the Big O notation defines an upper bound of an algorithm, i.e. it defines function can not grow more than its upper bound.
At first it sound to me as it calculates the worst case.
I google about(why worst case is not big O?) and got ample of answers which were not so simple to understand for beginner.
I concluded it as follows:
Big O is not always used to represent worst case analysis of algorithm because, suppose a algorithm which takes O(n) execution steps for best, average and worst input then it's best, average and worst case can be expressed as O(n).
Please tell me if I am correct or I am missing something as I don't have anyone to validate my understanding.
Please suggest a better example to understand why Big O is not always worst case.
Big-O?
First let us see what Big O formally means:
In computer science, big O notation is used to classify algorithms
according to how their running time or space requirements grow as the
input size grows.
This means that, Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. Here, O means order of the function, and it only provides an upper bound on the growth rate of the function.
Now let us look at the rules of Big O:
If f(x) is a sum of several terms, if there is one with largest
growth rate, it can be kept, and all others omitted
If f(x) is a product of several factors, any constants (terms in the
product that do not depend on x) can be omitted.
Example:
f(x) = 6x^4 − 2x^3 + 5
Using the 1st rule we can write it as, f(x) = 6x^4
Using the 2nd rule it will give us, O(x^4)
What is Worst Case?
Worst case analysis gives the maximum number of basic operations that
have to be performed during execution of the algorithm. It assumes
that the input is in the worst possible state and maximum work has to
be done to put things right.
For example, for a sorting algorithm which aims to sort an array in ascending order, the worst case occurs when the input array is in descending order. In this case maximum number of basic operations (comparisons and assignments) have to be done to set the array in ascending order.
It depends on a lot of things like:
CPU (time) usage
memory usage
disk usage
network usage
What's the difference?
Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.
The important point here is we're talking in terms of growth, not number of operations. However, with algorithms we do talk about the number of operations relative to the input size.
Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.
In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.
When someone says that the time complexity of MergeSort is O(nlogn), they usually mean that the number of comparisons that MergeSort makes is O(nlogn). That in itself doesn’t tell us what the time complexity of any particular MergeSort might be because that would depend how much time it takes to make a comparison. In other words, the O(nlogn) refers to comparisons as the primitive operation.
The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is O(nlogn), is implicitly referencing an model of computation where a comparison takes constant time and everything else is free.
Example -
If we are sorting strings that are kk bytes long, we might take “read a byte” as a primitive operation that takes constant time with everything else being free.
In this model, MergeSort makes O(nlogn) string comparisons each of which makes O(k) byte comparisons, so the time complexity is O(k⋅nlogn). One common implementation of RadixSort will make k passes over the n strings with each pass reading one byte, and so has time complexity O(nk).
The two are not the same thing.  Worst-case analysis as other have said is identifying instances for which the algorithm takes the longest to complete (i.e., takes the most number of steps), then formulating a growth function using this.  One can analyze the worst-case time complexity using Big-Oh, or even other variants such as Big-Omega and Big-Theta (in fact, Big-Theta is usually what you want, though often Big-Oh is used for ease of comprehension by those not as much into theory).  One important detail and why worst-case analysis is useful is that the algorithm will run no slower than it does in the worst case.  Worst-case analysis is a method of analysis we use in analyzing algorithms.
Big-Oh itself is an asymptotic measure of a growth function; this can be totally independent as people can use Big-Oh to not even measure an algorithm's time complexity; its origins stem from Number Theory.  You are correct to say it is the asymptotic upper bound of a growth function; but the manner you prescribe and construct the growth function comes from your analysis.  The Big-Oh of a growth function itself means little to nothing without context as it only says something about the function you are analyzing.  Keep in mind there can be infinitely many algorithms that could be constructed that share the same time complexity (by the definition of Big-Oh, Big-Oh is a set of growth functions).
In short, worst-case analysis is how you build your growth function, Big-Oh notation is one method of analyzing said growth function.  Then, we can compare that result against other worst-case time complexities of competing algorithms for a given problem.  Worst-case analysis if done correctly yields the worst-case running time if done exactly (you can cut a lot of corners and still get the correct asymptotics if you use a barometer), and using this growth function yields the worst-case time complexity of the algorithm.  Big-Oh alone doesn't guarantee the worst-case time complexity as you had to make the growth function itself.  For instance, I could utilize Big-Oh notation for any other kind of analysis (e.g., best case, average case).  It really depends on what you're trying to capture.  For instance, Big-Omega is great for lower bounds.
Imagine a hypothetical algorithm that in best case only needs to do 1 step, in the worst case needs to do n2 steps, but in average (expected) case, only needs to do n steps. With n being the input size.
For each of these 3 cases you could calculate a function that describes the time complexity of this algorithm.
1 Best case has O(1) because the function f(x)=1 is really the highest we can go, but also the lowest we can go in this case, omega(1). Since Omega is equal to O (the upper bound and lower bound), we state that this function, in the best case, behaves like theta(1).
2 We could do the same analysis for the worst case and figure out that O(n2 ) = omega(n2 ) =theta(n2 ).
3 Same counts for the average case but with theta( n ).
So in theory you could determine 3 cases of an algorithm and for those 3 cases calculate the lower/upper/thight bounds. I hope this clears things up a bit.
https://www.google.co.in/amp/s/amp.reddit.com/r/learnprogramming/comments/3qtgsh/how_is_big_o_not_the_same_as_worst_case_or_big/
Big O notation shows how an algorithm grows with respect to input size. It says nothing of which algorithm is faster because it doesn't account for constant set up time (which can dominate if you have small input sizes). So when you say
which takes O(n) execution steps
this almost doesn't mean anything. Big O doesn't say how many execution steps there are. There are C + O(n) steps (where C is a constant) and this algorithm grows at rate n depending on input size.
Big O can be used for best, worst, or average cases. Let's take sorting as an example. Bubble sort is a naive O(n^2) sorting algorithm, but when the list is sorted it takes O(n). Quicksort is often used for sorting (the GNU standard C library uses it with some modifications). It preforms at O(n log n), however this is only true if the pivot chosen splits the array in to two equal sized pieces (on average). In the worst case we get an empty array one side of the pivot and Quicksort performs at O(n^2).
As Big O shows how an algorithm grows with respect to size, you can look at any aspect of an algorithm. Its best case, average case, worst case in both time and/or memory usage. And it tells you how these grow when the input size grows - but it doesn't say which is faster.
If you deal with small sizes then Big O won't matter - but an analysis can tell you how things will go when your input sizes increase.
One example of where the worst case might not be the asymptotic limit: suppose you have an algorithm that works on the set difference between some set and the input. It might run in O(N) time, but get faster as the input gets larger and knocks more values out of the working set.
Or, to get more abstract, f(x) = 1/x for x > 0 is a decreasing O(1) function.
I'll focus on time as a fairly common item of interest, but Big-O can also be used to evaluate resource requirements such as memory. It's essential for you to realize that Big-O tells how the runtime or resource requirements of a problem scale (asymptotically) as the problem size increases. It does not give you a prediction of the actual time required. Predicting the actual runtimes would require us to know the constants and lower order terms in the prediction formula, which are dependent on the hardware, operating system, language, compiler, etc. Using Big-O allows us to discuss algorithm behaviors while sidestepping all of those dependencies.
Let's talk about how to interpret Big-O scalability using a few examples. If a problem is O(1), it takes the same amount of time regardless of the problem size. That may be a nanosecond or a thousand seconds, but in the limit doubling or tripling the size of the problem does not change the time. If a problem is O(n), then doubling or tripling the problem size will (asymptotically) double or triple the amounts of time required, respectively. If a problem is O(n^2), then doubling or tripling the problem size will (asymptotically) take 4 or 9 times as long, respectively. And so on...
Lots of algorithms have different performance for their best, average, or worst cases. Sorting provides some fairly straightforward examples of how best, average, and worst case analyses may differ.
I'll assume that you know how insertion sort works. In the worst case, the list could be reverse ordered, in which case each pass has to move the value currently being considered as far to the left as possible, for all items. That yields O(n^2) behavior. Doubling the list size will take four times as long. More likely, the list of inputs is in randomized order. In that case, on average each item has to move half the distance towards the front of the list. That's less than in the worst case, but only by a constant. It's still O(n^2), so sorting a randomized list that's twice as large as our first randomized list will quadruple the amount of time required, on average. It will be faster than the worst case (due to the constants involved), but it scales in the same way. The best case, however, is when the list is already sorted. In that case, you check each item to see if it needs to be slid towards the front, and immediately find the answer is "no," so after checking each of the n values you're done in O(n) time. Consequently, using insertion sort for an already ordered list that is twice the size only takes twice as long rather than four times as long.
You are right, in that you can say certainly say that an algorithm runs in O(f(n)) time in the best or average case. We do that all the time for, say, quicksort, which is O(N log N) on average, but only O(N^2) worst case.
Unless otherwise specified, however, when you say that an algorithm runs in O(f(n)) time, you are saying the algorithm runs in O(f(n)) time in the worst case. At least that's the way it should be. Sometimes people get sloppy, and you will often hear that a hash table is O(1) when in the worst case it is actually worse.
The other way in which a big O definition can fail to characterize the worst case is that it's an upper bound only. Any function in O(N) is also in O(N^2) and O(2^N), so we would be entirely correct to say that quicksort takes O(2^N) time. We just don't say that because it isn't useful to do so.
Big Theta and Big Omega are there to specify lower bounds and tight bounds respectively.
There are two "different" and most important tools:
the best, worst, and average-case complexity are for generating numerical function over the size of possible problem instances (e.g. f(x) = 2x^2 + 8x - 4) but it is very difficult to work precisely with these functions
big O notation extract the main point; "how efficient the algorithm is", it ignore a lot of non important things like constants and ... and give you a big picture

Can O(k * n) be considered as linear complexity (O(n))?

When talking about complexity in general, things like O(3n) tend to be simplified to O(n) and so on. This is merely theoretical, so how does complexity work in reality? Can O(3n) also be simplified to O(n)?
For example, if a task implies that solution must be in O(n) complexity and in our code we have 2 times linear search of an array, which is O(n) + O(n). So, in reality, would that solution be considered as linear complexity or not fast enough?
Note that this question is asking about real implementations, not theoretical. I'm already aware that O(n) + O(n) is simplified to O(n)?
Bear in mind that O(f(n)) does not give you the amount of real-world time that something takes: only the rate of growth as n grows. O(n) only indicates that if n doubles, the runtime doubles as well, which lumps functions together that take one second per iteration or one millennium per iteration.
For this reason, O(n) + O(n) and O(2n) are both equivalent to O(n), which is the set of functions of linear complexity, and which should be sufficient for your purposes.
Though an algorithm that takes arbitrary-sized inputs will often want the most optimal function as represented by O(f(n)), an algorithm that grows faster (e.g. O(n²)) may still be faster in practice, especially when the data set size n is limited or fixed in practice. However, learning to reason about O(f(n)) representations can help you compose algorithms to have a predictable—optimal for your use-case—upper bound.
Yes, as long as k is a constant, you can write O(kn) = O(n).
The intuition behind is that the constant k doesn't increase with the size of the input space and at some point will be incomparably small to n, so it doesn't have much influence on the overall complexity.
Yes - as long as the number k of array searches is not affected by the input size, even for inputs that are too big to be possible in practice, O(kn) = O(n). The main idea of the O notation is to emphasize how the computation time increases with the size of the input, and so constant factors that stay the same no matter how big the input is aren't of interest.
An example of an incorrect way to apply this is to say that you can perform selection sort in linear time because you can only fit about one billion numbers in memory, and so selection sort is merely one billion array searches. However, with an ideal computer with infinite memory, your algorithm would not be able to handle more than one billion numbers, and so it is not a correct sorting algorithm (algorithms must be able to handle arbitrarily large inputs unless you specify a limit as a part of the problem statement); it is merely a correct algorithm for sorting up to one billion numbers.
(As a matter of fact, once you put a limit on the input size, most algorithms will become constant-time because for all inputs within your limit, the algorithm will solve it using at most the amount of time that is required for the biggest / most difficult input.)

How to calculate O(log n) in big O notation?

I know that O(log n) refers to an iterative reduction by a fixed ratio of the problem set N (in big O notation), but how do i actually calculate it to see how many iterations an algorithm with a log N complexity would have to preform on the problem set N before it is done (has one element left)?
You can't. You don't calculate the exact number of iterations with BigO.
You can "derive" BigO when you have exact formula for number of iterations.
BigO just gives information how the number iterations grows with growing N, and only for "big" N.
Nothing more, nothing less. With this you can draw conclusions how much more operations/time will the algorithm take if you have some sample runs.
Expressed in the words of Tim Roughgarden at his courses on algorithms:
The big-Oh notation tries to provide a sweet spot for high level algorithm reasoning
That means it is intended to describe the relation between the algorithm time execution and the size of its input avoiding dependencies on the system architecture, programming language or chosen compiler.
Imagine that big-Oh notation could provide the exact execution time, that would mean that for any algorithm, for which you know its big-Oh time complexity function, you could predict how would it behave on any machine whatsoever.
On the other hand, it is centered on asymptotic behaviour. That is, its description is more accurate for big n values (that is why lower order terms of your algorithm time function are ignored in big-Oh notation). It can reasoned that low n values do not demand you to push foward trying to improve your algorithm performance.
Big O notation only shows an order of magnitude - not the actual number of operations that algorithm would perform. If you need to calculate exact number of loop iterations or elementary operations, you have to do it by hand. However in most practical purposes exact number is irrelevant - O(log n) tells you that num. of operations will raise logarythmically with a raise of n
From big O notation you can't tell precisely how many iteration will the algorithm do, it's just estimation. That means with small numbers the different between the log(n) and actual number of iterations could be differentiate significantly but the closer you get to infinity the different less significant.
If you make some assumptions, you can estimate the time up to a constant factor. The big assumption is that the limiting behavior as the size tends to infinity is the same as the actual behavior for the problem sizes you care about.
Under that assumption, the upper bound on the time for a size N problem is C*log(N) for some constant C. The constant will change depending on the base you use for calculating the logarithm. The base does not matter as long as you are consistent about it. If you have the measured time for one size, you can estimate C and use that to guesstimate the time for a different size.
For example, suppose a size 100 problem takes 20 seconds. Using common logarithms, C is 10. (The common log of 100 is 2). That suggests a size 1000 problem might take about 30 seconds, because the common log of 1000 is 3.
However, this is very rough. The approach is most useful for estimating whether an algorithm might be usable for a large problem. In that sort of situation, you also have to pay attention to memory size. Generally, setting up a problem will be at least linear in size, so its cost will grow faster than an O(log N) operation.

Why is the constant always dropped from big O analysis?

I'm trying to understand a particular aspect of Big O analysis in the context of running programs on a PC.
Suppose I have an algorithm that has a performance of O(n + 2). Here if n gets really large the 2 becomes insignificant. In this case it's perfectly clear the real performance is O(n).
However, say another algorithm has an average performance of O(n2 / 2). The book where I saw this example says the real performance is O(n2). I'm not sure I get why, I mean the 2 in this case seems not completely insignificant. So I was looking for a nice clear explanation from the book. The book explains it this way:
"Consider though what the 1/2 means. The actual time to check each value
is highly dependent on the machine instruction that the code
translates to and then on the speed at which the CPU can execute the instructions. Therefore the 1/2 doesn't mean very much."
And my reaction is... huh? I literally have no clue what that says or more precisely what that statement has to do with their conclusion. Can somebody spell it out for me please.
Thanks for any help.
There's a distinction between "are these constants meaningful or relevant?" and "does big-O notation care about them?" The answer to that second question is "no," while the answer to that first question is "absolutely!"
Big-O notation doesn't care about constants because big-O notation only describes the long-term growth rate of functions, rather than their absolute magnitudes. Multiplying a function by a constant only influences its growth rate by a constant amount, so linear functions still grow linearly, logarithmic functions still grow logarithmically, exponential functions still grow exponentially, etc. Since these categories aren't affected by constants, it doesn't matter that we drop the constants.
That said, those constants are absolutely significant! A function whose runtime is 10100n will be way slower than a function whose runtime is just n. A function whose runtime is n2 / 2 will be faster than a function whose runtime is just n2. The fact that the first two functions are both O(n) and the second two are O(n2) doesn't change the fact that they don't run in the same amount of time, since that's not what big-O notation is designed for. O notation is good for determining whether in the long term one function will be bigger than another. Even though 10100n is a colossally huge value for any n > 0, that function is O(n) and so for large enough n eventually it will beat the function whose runtime is n2 / 2 because that function is O(n2).
In summary - since big-O only talks about relative classes of growth rates, it ignores the constant factor. However, those constants are absolutely significant; they just aren't relevant to an asymptotic analysis.
Big O notation is most commonly used to describe an algorithm's running time. In this context, I would argue that specific constant values are essentially meaningless. Imagine the following conversation:
Alice: What is the running time of your algorithm?
Bob: 7n2
Alice: What do you mean by 7n2?
What are the units? Microseconds? Milliseconds? Nanoseconds?
What CPU are you running it on? Intel i9-9900K? Qualcomm Snapdragon 845? (Or are you using a GPU, an FPGA, or other hardware?)
What type of RAM are you using?
What programming language did you implement the algorithm in? What is the source code?
What compiler / VM are you using? What flags are you passing to the compiler / VM?
What is the operating system?
etc.
So as you can see, any attempt to indicate a specific constant value is inherently problematic. But once we set aside constant factors, we are able to clearly describe an algorithm's running time. Big O notation gives us a robust and useful description of how long an algorithm takes, while abstracting away from the technical features of its implementation and execution.
Now it is possible to specify the constant factor when describing the number of operations (suitably defined) or CPU instructions an algorithm executes, the number of comparisons a sorting algorithm performs, and so forth. But typically, what we're really interested in is the running time.
None of this is meant to suggest that the real-world performance characteristics of an algorithm are unimportant. For example, if you need an algorithm for matrix multiplication, the Coppersmith-Winograd algorithm is inadvisable. It's true that this algorithm takes O(n2.376) time, whereas the Strassen algorithm, its strongest competitor, takes O(n2.808) time. However, according to Wikipedia, Coppersmith-Winograd is slow in practice, and "it only provides an advantage for matrices so large that they cannot be processed by modern hardware." This is usually explained by saying that the constant factor for Coppersmith-Winograd is very large. But to reiterate, if we're talking about the running time of Coppersmith-Winograd, it doesn't make sense to give a specific number for the constant factor.
Despite its limitations, big O notation is a pretty good measure of running time. And in many cases, it tells us which algorithms are fastest for sufficiently large input sizes, before we even write a single line of code.
Big-O notation only describes the growth rate of algorithms in terms of mathematical function, rather than the actual running time of algorithms on some machine.
Mathematically, Let f(x) and g(x) be positive for x sufficiently large.
We say that f(x) and g(x) grow at the same rate as x tends to infinity, if
now let f(x)=x^2 and g(x)=x^2/2, then lim(x->infinity)f(x)/g(x)=2. so x^2 and x^2/2 both have same growth rate.so we can say O(x^2/2)=O(x^2).
As templatetypedef said, hidden constants in asymptotic notations are absolutely significant.As an example :marge sort runs in O(nlogn) worst-case time and insertion sort runs in O(n^2) worst case time.But as the hidden constant factors in insertion sort is smaller than that of marge sort, in practice insertion sort can be faster than marge sort for small problem sizes on many machines.
You are completely right that constants matter. In comparing many different algorithms for the same problem, the O numbers without constants give you an overview of how they compare to each other. If you then have two algorithms in the same O class, you would compare them using the constants involved.
But even for different O classes the constants are important. For instance, for multidigit or big integer multiplication, the naive algorithm is O(n^2), Karatsuba is O(n^log_2(3)), Toom-Cook O(n^log_3(5)) and Schönhage-Strassen O(n*log(n)*log(log(n))). However, each of the faster algorithms has an increasingly large overhead reflected in large constants. So to get approximate cross-over points, one needs valid estimates of those constants. Thus one gets, as SWAG, that up to n=16 the naive multiplication is fastest, up to n=50 Karatsuba and the cross-over from Toom-Cook to Schönhage-Strassen happens for n=200.
In reality, the cross-over points not only depend on the constants, but also on processor-caching and other hardware-related issues.
Big O without constant is enough for algorithm analysis.
First, the actual time does not only depend how many instructions but also the time for each instruction, which is closely connected to the platform where the code runs. It is more than theory analysis. So the constant is not necessary for most case.
Second, Big O is mainly used to measure how the run time will increase as the problem becomes larger or how the run time decrease as the performance of hardware improved.
Third, for situations of high performance optimizing, constant will also be taken into consideration.
The time required to do a particular task in computers now a days does not required a large amount of time unless the value entered is very large.
Suppose we wants to multiply 2 matrices of size 10*10 we will not have problem unless we wants to do this operation multiple times and then the role of asymptotic notations becomes prevalent and when the value of n becomes very big then the constants don't really makes any difference to the answer and are almost negligible so we tend to leave them while calculating the complexity.
Time complexity for O(n+n) reduces to O(2n). Now 2 is a constant. So the time complexity will essentially depend on n.
Hence the time complexity of O(2n) equates to O(n).
Also if there is something like this O(2n + 3) it will still be O(n) as essentially the time will depend on the size of n.
Now suppose there is a code which is O(n^2 + n), it will be O(n^2) as when the value of n increases the effect of n will become less significant compared to effect of n^2.
Eg:
n = 2 => 4 + 2 = 6
n = 100 => 10000 + 100 => 10100
n = 10000 => 100000000 + 10000 => 100010000
As you can see the effect of the second expression as lesser effect as the value of n keeps increasing. Hence the time complexity evaluates to O(n^2).

What is the purpose of Big-O notation in computer science if it doesn't give all the information needed?

What is the use of Big-O notation in computer science if it doesn't give all the information needed?
For example, if one algorithm runs at 1000n and one at n, it is true that they are both O(n). But I still may make a foolish choice based on this information, since one algorithm takes 1000 times as long as the other for any given input.
I still need to know all the parts of the equation, including the constant, to make an informed choice, so what is the importance of this "intermediate" comparison? I end up loosing important information when it gets reduced to this form, and what do I gain?
What does that constant factor represent? You can't say with certainty, for example, that an algorithm that is O(1000n) will be slower than an algorithm that's O(5n). It might be that the 1000n algorithm loads all data into memory and makes 1,000 passes over that data, and the 5n algorithm makes five passes over a file that's stored on a slow I/O device. The 1000n algorithm will run faster even though its "constant" is much larger.
In addition, some computers perform some operations more quickly than other computers do. It's quite common, given two O(n) algorithms (call them A and B), for A to execute faster on one computer and B to execute faster on the other computer. Or two different implementations of the same algorithm can have widely varying runtimes on the same computer.
Asymptotic analysis, as others have said, gives you an indication of how an algorithm's running time varies with the size of the input. It's useful for giving you a good starting place in algorithm selection. Quick reference will tell you that a particular algorithm is O(n) or O(n log n) or whatever, but it's very easy to find more detailed information on most common algorithms. Still, that more detailed analysis will only give you a constant number without saying how that number relates to real running time.
In the end, the only way you can determine which algorithm is right for you is to study it yourself and then test it against your expected data.
In short, I think you're expecting too much from asymptotic analysis. It's a useful "first line" filter. But when you get beyond that you have to look for more information.
As you correctly noted, it does not give you information on the exact running time of an algorithm. It is mainly used to indicate the complexity of an algorithm, to indicate if it is linear in the input size, quadratic, exponential, etc. This is important when choosing between algorithms if you know that your input size is large, since even a 1000n algorithm well beat a 1.23 exp(n) algorithm for large enough n.
In real world algorithms, the hidden 'scaling factor' is of course important. It is therefore not uncommon to use an algorithm with a 'worse' complexity if it has a lower scaling factor. Many practical implementations of sorting algorithms are for example 'hybrid' and will resort to some 'bad' algorithm like insertion sort (which is O(n^2) but very simple to implement) for n < 10, while changing to quicksort (which is O(n log(n)) but more complex) for n >= 10.
Big-O tells you how the runtime or memory consumption of a process changes as the size of its input changes. O(n) and O(1000n) are both still O(n) -- if you double the size of the input, then for all practical purposes the runtime doubles too.
Now, we can have an O(n) algorithm and an O(n2) algorithm where the coefficient of n is 1000000 and the coefficient of n2 is 1, in which case the O(n2) algorithm would outperform the O(n) for smaller n values. This doesn't change the fact, however, that the second algorithm's runtime grows more rapidly than the first's, and this is the information that big-O tells us. There will be some input size at which the O(n) algorithm begins to outperform the O(n2) algorithm.
In addition to the hidden impact of the constant term, complexity notation also only considers the worst case instance of a problem.
Case in point, the simplex method (linear programming) has exponential complexity for all known implementations. However, the simplex method works much faster in practice than the provably polynomial-time interior point methods.
Complexity notation has much value for theoretical problem classification. If you want some more information on practical consequences check out "Smoothed Analysis" by Spielman: http://www.cs.yale.edu/homes/spielman
This is what you are looking for.
It's main purpose is for rough comparisons of logic. The difference of O(n) and O(1000n) is large for n ~ 1000 (n roughly equal to 1000) and n < 1000, but when you compare it to values where n >> 1000 (n much larger than 1000) the difference is miniscule.
You are right in saying they both scale linearly and knowing the coefficient helps in a detailed analysis but generally in computing the difference between linear (O(cn)) and exponential (O(cn^x)) performance is more important to note than the difference between two linear times. There is a larger value in the comparisons of runtime of higher orders such as and Where the performance difference scales exponentially.
The overall purpose of Big O notation is to give a sense of relative performance time in order to compare and further optimize algorithms.
You're right that it doesn't give you all information, but there's no single metric in any field that does that.
Big-O notation tells you how quickly the performance gets worse, as your dataset gets larger. In other words, it describes the type of performance curve, but not the absolute performance.
Generally, Big-O notation is useful to express an algorithm's scaling performance as it falls into one of three basic categories:
Linear
Logarithmic (or "linearithmic")
Exponential
It is possible to do deep analysis of an algorithm for very accurate performance measurements, but it is time consuming and not really necessary to get a broad indication of performance.

Resources