Why is O(n) equal to O(2n) - complexity-theory

I understand that O(N) is essentially equal to O(cN), where c is some constant. But if N = c, doesn't that make it O(N^2)? Does this hold as c increases, or is there some formal limit?

If N = c, then c is not constant, so this is never the case.

O(n) means that the runtime of the algorithm increases linearly with the size of the input. Roughly speaking, if you double the size of the input, you double the runtime; if you triple the size of the input, you triple the runtime, and so on. So the graph is a straight line.
O(n^2) means that the runtime of the algorithm increases quadratically with the size of the input: if you double the size of the input, you quadruple the runtime. This is bad. The graph curves upward and gets very high very fast.
With O(2n), you have increased the slope of the line, but it is still a line. Since it's linear, it "reduces" to O(n).
Hope that helps.
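A minimal Python sketch of the slope argument (the function names are illustrative, not from any library): each function just counts hypothetical units of work. The 2n count is always exactly twice the n count, a fixed constant factor, whereas doubling n quadruples the n^2 count.

```python
def linear_steps(n: int) -> int:
    """O(n): one unit of work per input element."""
    return sum(1 for _ in range(n))

def double_linear_steps(n: int) -> int:
    """O(2n): two units of work per element, a steeper but still straight line."""
    return sum(2 for _ in range(n))

def quadratic_steps(n: int) -> int:
    """O(n^2): one unit of work per pair of elements."""
    return sum(1 for _ in range(n) for _ in range(n))

for n in (100, 200):
    print(n, linear_steps(n), double_linear_steps(n), quadratic_steps(n))
# 100 100 200 10000
# 200 200 400 40000
```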

Related

Big O (constant) time complexity

Why is each statement in the following code annotated as big O constant (here I use O(1) by convention)?
I mean, if the array size gets bigger, the time complexity may get larger, right? Also, the number in total will get larger and larger; won't it affect the complexity?
Pseudocode:
def find_sum(given_array):
    total = 0  # refers to O(1)
    for i in given_array:  # O(1)
        total += i  # O(1)
    return total  # O(1)
TL;DR: Because Big O notation is used to classify an algorithm by how it behaves as its input grows.
I mean, if the array size gets bigger, the time complexity may get larger, right? Also, the number in total will get larger and larger; won't it affect the complexity?
You are confusing the time taken by the algorithm with its time complexity.
Let us start by clarifying what Big O notation means in this context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. (...) In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows.
Informally, in time-complexity and space-complexity theory, one can think of Big O notation as a way of categorizing algorithms by a worst-case bound on their time and space use, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
So for this code:
def find_sum(given_array):
    total = 0
    for i in given_array:
        total += i
    return total
the complexity is O(n), because as the input grows, the running time grows linearly rather than staying constant. More precisely, it is Θ(n).
IMO it is not very accurate to determine the complexity like this:
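A minimal sketch that instruments the loop with a step counter (find_sum_counting is a hypothetical helper, not part of the question's code): the number of loop-body executions equals the length of the array, which is the linear growth described here.

```python
def find_sum_counting(given_array):
    """Return the sum of given_array and how many times the loop body ran."""
    total = 0
    steps = 0
    for i in given_array:
        total += i
        steps += 1  # one constant-time body execution per element
    return total, steps

for n in (10, 100, 1000):
    print(n, find_sum_counting(list(range(n)))[1])
# 10 10
# 100 100
# 1000 1000
```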
def find_sum(given_array):
    total = 0  # refers to O(1)
    for i in given_array:  # O(1)
        total += i  # O(1)
    return total  # O(1)
Since Big O notation represents a set of functions with a common asymptotic upper bound, as one can read from (source):
Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation.
More accurate would be:
def find_sum(given_array):
    total = 0  # takes c1 time
    for i in given_array:
        total += i  # takes c2 time
    return total  # takes c3 time
So the time complexity would be c1 + n * c2 + c3, which, after dropping constant factors and lower-order terms, simplifies to n. And since the lower and upper bounds of this function are the same, we can use Θ(n) instead of O(n).
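The simplification can be made explicit with the definition of Θ, assuming c1, c2 and c3 are positive constants:

```latex
\begin{aligned}
T(n) &= c_1 + c_2\,n + c_3, \qquad c_1, c_2, c_3 > 0\\
c_2\,n \;\le\; T(n) &\le (c_1 + c_2 + c_3)\,n \qquad \text{for all } n \ge 1\\
\Longrightarrow\quad T(n) &\in \Theta(n)
\end{aligned}
```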
Why is each statement in the following code annotated as big O constant (here I use O(1) by convention)?
Not sure, ask the person who wrote it. It seems clear the overall runtime is not O(1), so if that's the conclusion they arrive at, they are wrong. If they didn't mean to say that, what they wrote is either wrong or confusing.
I mean, if the array size gets bigger, the time complexity may get larger, right?
Yes, it might. Indeed, here, it will, since you are at least iterating over the elements in the array. More elements in the array, more iterations of the loop. Straightforward.
Also, the number in total will get larger and larger; won't it affect the complexity?
This is an interesting insight, and the answer depends on how you conceive of numbers being represented. If you have fixed-length numeric representations (32-bit unsigned ints, double-precision floats, etc.), then addition is a constant-time operation. If you have variable-length representations (like a big integer library, or doing the algorithm by hand), then the complexity of adding depends on the addition method used but necessarily increases with number size (for regular add-with-carry, the cost is proportional to the number of digits, i.e. logarithmic in the magnitude of the numbers). Indeed, with variable-length representations, your complexity should at least include some parameter related to the size (perhaps max or average) of the numbers in the array; otherwise, the runtime might be dominated by adding the numbers rather than looping (e.g., an array of two 1000^1000-bit integers would spend almost all its time adding rather than looping).
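To illustrate the add-with-carry case, here is a minimal sketch on decimal digit strings (the representation and the name add_with_carry are assumptions for illustration; real big-integer libraries work on much larger machine-word "limbs"). The returned step count equals the number of digits in the longer addend, so it grows with the size of the numbers instead of staying constant.

```python
def add_with_carry(a: str, b: str) -> tuple[str, int]:
    """Add two non-negative decimal numbers given as digit strings.

    Returns (sum, steps), where steps counts single-digit additions.
    """
    a, b = a[::-1], b[::-1]              # least significant digit first
    digits, carry, steps = [], 0, 0
    for k in range(max(len(a), len(b))):
        da = int(a[k]) if k < len(a) else 0
        db = int(b[k]) if k < len(b) else 0
        carry, d = divmod(da + db + carry, 10)
        digits.append(str(d))
        steps += 1                       # one step per digit position
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), steps

print(add_with_carry("999", "1"))          # ('1000', 3)
print(add_with_carry("9" * 50, "1")[1])    # 50
```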
No answer so far addresses the second question:
Also, the number in total will get larger and larger; won't it affect the complexity?
which is very important and usually not accounted for.
The answer is, it depends on your computational model. If the underlying machine can add arbitrarily large numbers in constant time, then no, it does not affect the time complexity.
A realistic machine, however, operates on values of fixed width. Modern computers happily add 64-bit quantities. Some may only add 16-bit-wide values at a time. A Turing machine, which is the basis of the whole of complexity theory, works with 1 bit at a time. In any case, once our numbers outgrow the register width, we must account for the fact that addition takes time proportional to the number of bits in the addends, which in this case is log(i) (or log(total); but assuming the array holds 1, 2, ..., n, total grows as i*(i-1)/2, so its bit width is approximately log(i*i) = 2 log(i)).
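A quick check of the bit-width estimate, assuming as above that the array holds 1, 2, ..., n (running_total_bit_widths is a hypothetical helper). Python integers are arbitrary-precision, and int.bit_length() reports how many bits the running total needs; the widths stay within about one bit of 2 log2(i).

```python
import math

def running_total_bit_widths(n, checkpoints):
    """Bits needed by total after adding 1, 2, ..., i, for each i in checkpoints."""
    total = 0
    widths = {}
    for i in range(1, n + 1):   # assumes the array holds 1, 2, ..., n
        total += i              # total == i * (i + 1) // 2 at this point
        if i in checkpoints:
            widths[i] = total.bit_length()
    return widths

for i, bits in sorted(running_total_bit_widths(100_000, {10, 1_000, 100_000}).items()):
    print(i, bits, round(2 * math.log2(i), 1))
# 10 6 6.6
# 1000 19 19.9
# 100000 33 33.2
```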
With this in mind, annotating
total += i  # O(log i)
is more prudent. Now the complexity of the loop
for i in given_array:
    total += i  # O(log(i))
is sum[1..n] log(i) = log(n!) ~ n log(n). The last relation (asymptotic equivalence, ~) comes from Stirling's approximation of the factorial.
There is no way that the loop
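The asymptotic step can be sanity-checked numerically. This sketch uses math.lgamma(n + 1), which equals ln(n!), to avoid building the huge integer n! (log_factorial and stirling_ratio are hypothetical names). The ratio ln(n!) / (n ln n) creeps toward 1, though slowly, roughly like 1 - 1/ln n.

```python
import math

def log_factorial(n: int) -> float:
    """ln(n!) computed as lgamma(n + 1), without forming n! itself."""
    return math.lgamma(n + 1)

def stirling_ratio(n: int) -> float:
    """ln(n!) / (n ln n); tends to 1 as n grows."""
    return log_factorial(n) / (n * math.log(n))

for n in (10, 1_000, 1_000_000):
    print(n, round(stirling_ratio(n), 3))

# The loop's cost, the sum of log(i) for i = 1..n, is exactly ln(n!):
assert math.isclose(sum(math.log(i) for i in range(1, 1001)), log_factorial(1000))
```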
for i in given_array:
    total += i
will run in O(1) time. Even if the input size n happens to be 1, asymptotic analysis still says that it runs in O(n), not in O(1).
Asymptotic analysis measures time/space complexity in relation to the input size; it does not necessarily show the exact number of operations performed.
The fact that O(1) is constant does not mean it is just one (1) operation; it means that the cost of this particular block (which takes O(1)) does not change when the input changes. It does not depend on the input, so it has constant complexity.
O(n), on the other hand, indicates that the complexity depends on n and changes as the input size n changes. O(n) is a linear relation: the runtime grows in proportion to the input size.
Correctly written comments would look like this:
def find_sum(given_array):
    total = 0  # this is O(1), it doesn't depend on the input
    for i in given_array:  # this is O(n), as the loop gets longer as the input gets longer
        total += i  # again, O(1)
    return total  # again, O(1)

How is pre-computation handled by complexity notation?

Suppose I have an algorithm that runs in O(n) for every input of size n, but only after a pre-computation step of O(n^2) for that given size n. Is the algorithm considered O(n) still, with O(n^2) amortized? Or does big O only consider one "run" of the algorithm at size n, and so the pre-computation step is included in the notation, making the true notation O(n+n^2) or O(n^2)?
It's not uncommon to see this accounted for by explicitly separating out the costs into two different pieces. For example, in the range minimum query problem, it's common to see people talk about things like an ⟨O(n^2), O(1)⟩-time solution to the problem, where the O(n^2) denotes the precomputation cost and the O(1) denotes the lookup cost. You also see this with string algorithms sometimes: a suffix tree provides an O(m)-preprocessing-time, O(n+z)-query-time solution to string searching, while Aho-Corasick string matching offers an O(n)-preprocessing-time, O(m+z)-query-time solution.
The reason for doing so is that the tradeoffs involved here really depend on the use case. It lets you quantitatively measure how many queries you're going to have to make before the preprocessing time starts to be worth it.
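As a minimal sketch of the ⟨O(n^2), O(1)⟩ range-minimum-query tradeoff (the function names are hypothetical, and this is the simple full-table approach rather than the more refined RMQ structures), the preprocessing fills a table covering all n(n+1)/2 ranges and each query is a single lookup:

```python
def build_rmq_table(values):
    """O(n^2) preprocessing: table[i][j] == min(values[i..j]) for i <= j."""
    n = len(values)
    table = [[None] * n for _ in range(n)]
    for i in range(n):
        current = values[i]
        for j in range(i, n):
            current = min(current, values[j])  # extend the range by one element
            table[i][j] = current
    return table

def rmq_query(table, i, j):
    """O(1) query: minimum of values[i..j] inclusive, assuming i <= j."""
    return table[i][j]

table = build_rmq_table([5, 2, 8, 1, 9, 3])
print(rmq_query(table, 0, 2), rmq_query(table, 2, 5))  # 2 1
```

With q queries the total cost is O(n^2 + q), versus O(q·n) when each range is scanned directly, so the table only pays off once q is roughly on the order of n.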
People usually care about the total time to get things done when they are talking about complexity etc.
Thus, if getting to the result R requires you to perform steps A and B, then complexity(R) = complexity(A) + complexity(B). This works out to be O(n^2) in your particular example.
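For the question's example, assuming the precomputation takes at most a·n^2 steps and each run at most b·n steps, for some positive constants a and b introduced here for illustration, the quadratic term dominates the sum:

```latex
a\,n^2 + b\,n \;\le\; (a + b)\,n^2 \quad \text{for all } n \ge 1
\quad\Longrightarrow\quad O(n^2) + O(n) = O(n^2)
```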
You have already noted that for O analysis, the fastest growing term dominates the overall complexity (or in other words, in a pipeline, the slowest module defines the throughput).
However, complexity analysis of A and B will be typically performed in isolation if they are disjoint.
In summary, it's the amount of time taken to get the results that counts, but you can (and usually do) reason about the individual steps independent of one another.
There are cases when you cannot specify only the slowest part of the pipeline. A simple example is BFS, with complexity O(V + E). Since E = O(V^2), it may be tempting to write the complexity of BFS as O(E) (assuming E ≥ V). However, that would be incorrect, since a graph can have no edges at all! In that case, you still need to iterate over all the vertices.
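A minimal sketch, assuming an adjacency-dict representation and a full traversal that starts a BFS from every not-yet-visited vertex (bfs_all is a hypothetical name). It counts vertices dequeued and edges examined separately; on a graph with no edges the edge count is 0, but the vertex work is still V, which is why the bound keeps both terms.

```python
from collections import deque

def bfs_all(num_vertices, adjacency):
    """Full BFS traversal; returns (vertices_dequeued, edges_examined).

    adjacency maps a vertex to its neighbours; missing keys mean no edges.
    """
    seen = [False] * num_vertices
    vertices = edges = 0
    for start in range(num_vertices):        # O(V): every vertex is considered
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            vertices += 1
            for v in adjacency.get(u, ()):   # O(E): each adjacency entry once
                edges += 1
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return vertices, edges

print(bfs_all(1000, {}))  # (1000, 0): no edges, yet every vertex is processed
```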
The point of O(...) notation is not to measure how fast the algorithm is working, because in many specific cases O(n) can be significantly slower than, say, O(n^3). (Imagine the algorithm which runs in 10^100 n steps vs. the one which runs in n^3 / 2 steps.) If I tell you that my algorithm runs in O(n^2) time, it tells you nothing about how long it will take for n = 1000.
The point of O(...) is to specify how the algorithm behaves when the input size grows. If I tell you that my algorithm runs in O(n^2) time, and it takes 1 second to run for n = 500, then you'll expect roughly 4 seconds for n = 1000, not 1.5 and not 40.
So, to answer your question -- no, the algorithm will not be O(n), it will be O(n^2), because if I double the input size the time will be multiplied by 4, not by 2.
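Assuming, for illustration, that the running time is exactly a quadratic T(n) = a n^2 + b n + c with a > 0, the factor of 4 is the limit of the doubling ratio:

```latex
\frac{T(2n)}{T(n)} \;=\; \frac{4a\,n^2 + 2b\,n + c}{a\,n^2 + b\,n + c} \;\longrightarrow\; 4 \qquad (n \to \infty)
```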

Is Manacher's algorithm really linear?

I recently read a wikipedia article on Manacher's algorithm and after seeing the sample implementation and dozens of other implementations... I'll be honest I have no clue how this algorithm is linear. The way I see it, it's rather in the best case O(n+n/2) but that's not linear, is it?
http://en.wikipedia.org/wiki/Longest_palindromic_substring
For each character within the original string, we try to expand the P array in both directions until we either reach a string boundary or the symmetry property is no longer satisfied. If that were all, it would mean O(n^2), but with the extra observations it will be less than that. Still, at best I could get my head down to O(n+n/2), not to O(n), as that would essentially mean the inner nested loop is O(1). Anything higher than that breaks the linearity of the whole algorithm.
So, in a nutshell, how is this algorithm linear?
O(n + n/2) is linear: O(n + n/2) = O(n).
The time is still proportional to n.
Or, being more precise, the limit of (n + n/2) / n as n goes to infinity (and indeed its value for every n) is a finite constant, 3/2. So O(n + n/2) and O(n) are equivalent.
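To see why the inner expansion loop adds only linear work overall, Manacher's algorithm can be instrumented to count character comparisons. This is a minimal sketch, assuming a common sentinel formulation in which '^', '#' and '$' do not occur in the input (longest_palindrome is a hypothetical name). Each centre ends with exactly one failed comparison, and every successful comparison pushes the rightmost known palindrome boundary further right, so the total is bounded by a linear function of the input length.

```python
def longest_palindrome(s: str) -> tuple[str, int]:
    """Manacher's algorithm; returns (longest palindromic substring, comparisons)."""
    # Assumes '^', '#', '$' do not occur in s. The '#' separators make every
    # palindrome odd-length; the distinct sentinels stop expansion at the ends.
    t = "^#" + "".join(ch + "#" for ch in s) + "$"
    p = [0] * len(t)        # p[i] = palindrome radius centred at t[i]
    center = right = 0      # centre and right edge of the rightmost palindrome
    comparisons = 0
    for i in range(1, len(t) - 1):
        if i < right:
            # Mirror position's radius, capped so we stay inside the known edge.
            p[i] = min(right - i, p[2 * center - i])
        while True:
            comparisons += 1
            if t[i + 1 + p[i]] != t[i - 1 - p[i]]:
                break       # exactly one failed comparison per centre
            p[i] += 1       # a success always reaches past the old right edge
        if i + p[i] > right:
            center, right = i, i + p[i]
    radius, idx = max((r, i) for i, r in enumerate(p))
    start = (idx - radius) // 2
    return s[start:start + radius], comparisons

print(longest_palindrome("banana")[0])  # anana
```

For an input of n characters, t has 2n + 3 characters: there are 2n + 1 failed comparisons and at most 2n + 1 successful ones, so at most 4n + 2 comparisons in total.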

Which is bigger: O(n*logn) or O(1)?

We are going over the master theorem in my algorithms class, and for one problem I'm trying to compare n log n vs. 1 to figure out which case of the MT it falls under. But I'm having a hard time figuring out which is bigger.
Edit: This is for solving a recurrence problem. The equation is T(n) = 2T(n/4) + N*LogN. Just threw this in in case it helps.
Think about it this way:
O(N*LogN) will increase with N in such a way that for any X, no matter how large, you can find a value of N such that N*LogN is greater than X.
O(1) will stay the same, no matter what N is.
This means that O(1) is asymptotically better, i.e. for some (perhaps very high) value of N the O(N*LogN) will become slower.
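A minimal sketch of the "for any X there is an N" claim, assuming log base 2 (the base only changes a constant factor); first_n_exceeding is a hypothetical name. It finds the smallest N with N·log2(N) > x by doubling and then binary search, which works because N log N keeps increasing. An O(1) cost, by contrast, stays below a fixed bound however large N gets.

```python
import math

def first_n_exceeding(x: float) -> int:
    """Smallest integer N >= 1 with N * log2(N) > x."""
    n = 1
    while n * math.log2(n) <= x:
        n *= 2                  # double until the condition holds
    lo, hi = n // 2, n          # lo fails the test (or is 0); hi passes it
    while hi - lo > 1:          # binary search, valid because N log N increases
        mid = (lo + hi) // 2
        if mid * math.log2(mid) > x:
            hi = mid
        else:
            lo = mid
    return hi

for x in (10, 1e6, 1e12):
    print(x, first_n_exceeding(x))
```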
If an algorithm is O(NlogN) that means that there exists a number A and a quantity of execution time B, such that for any input size N greater than A, the execution time will be less than B times NlogN.
If an algorithm is O(1), that would mean that there exists some fixed amount of time C in which the algorithm would be guaranteed to complete regardless of the input size.
In comparing two algorithms, one of which is O(NlgN) and one of which is O(1), one will generally discover that the O(1) algorithm is faster for values of N that are sufficiently large, but in many cases the O(NlgN) algorithm may be faster for small values of N.
Indeed, while something like an O(N^3) or O(N^4) algorithm would generally seem pretty bad, it's possible that even an O(N^4) algorithm may outperform an O(1) algorithm if N is usually a small number (e.g. 1-5 or so) and never gets very big (even an occasional value of 50 could seriously dog performance).
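To put hypothetical numbers on this, suppose the O(N^4) algorithm costs exactly N^4 steps and the O(1) algorithm always costs 1000 steps (both figures are assumptions for illustration, not measurements). The quartic algorithm wins up to N = 5, loses from N = 6 on, and at N = 50 needs 6,250,000 steps.

```python
CONSTANT_COST = 1000  # hypothetical O(1) algorithm: always 1000 steps

def quartic_cost(n: int) -> int:
    """Hypothetical O(N^4) algorithm: exactly N**4 steps."""
    return n ** 4

for n in (1, 5, 6, 50):
    winner = "O(N^4)" if quartic_cost(n) < CONSTANT_COST else "O(1)"
    print(n, quartic_cost(n), winner)
# 1 1 O(N^4)
# 5 625 O(N^4)
# 6 1296 O(1)
# 50 6250000 O(1)
```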

What does it mean to find big o notation for memory

I have a question, what does it mean to find the big-o order of the memory required by an algorithm?
Like what's the difference between that and the big o operations?
E.g
a question asks
Given the following pseudo-code, with an initialized two dimensional array A, with both dimensions of size n:
for i <- 1 to n do
    for j <- 1 to n-i do
        A[i][j] = i + j
Wouldn't the big o notation for memory just be n^2 and the computations also be n^2?
Big-Oh is about how something grows according to something else (technically, an upper bound on how something grows). The most common introductory usage is for the something to be how fast an algorithm runs according to the size of its input.
There is nothing that says you can't have the something be how much memory is used according to the size of the input.
In your example, since there is a bucket in the array for everything in i and j, the space requirements grow as O(i*j), which is O(n^2)
But if your algorithm was instead keeping track of the largest sum, and not the sums of every number in each array, the runtime complexity would still be O(n^2) while the space complexity would be constant, as the algorithm only ever needs to keep track of current i, current j, current max, and the max being tested.
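A minimal sketch of that contrast, reading the question's loop bounds as inclusive (all_sums and largest_sum are hypothetical names): both functions run the same O(n^2) double loop, but all_sums keeps every i + j it computes, while largest_sum keeps only a running maximum.

```python
def all_sums(n):
    """Keeps every i + j: O(n^2) time and O(n^2) extra space."""
    return [[i + j for j in range(1, n - i + 1)] for i in range(1, n + 1)]

def largest_sum(n):
    """Same O(n^2) double loop, but only a running maximum: O(1) extra space."""
    best = None                           # None until the inner loop first runs
    for i in range(1, n + 1):
        for j in range(1, n - i + 1):     # "for j <- 1 to n-i", inclusive
            if best is None or i + j > best:
                best = i + j
    return best
```

For n < 2 the inner loop never runs, so largest_sum returns None.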
Big-O order of memory means how the number of bytes needed to execute the algorithm varies as the number of elements processed increases. In your example, I think the Big-O order is n squared, because the data is stored in a square array of size n x n.
The big-O order of operations means how does the number of calculations needed to execute the algorithm vary as the number of elements processed increases.
Yes, you are correct: the space and time complexity of the above pseudocode is n^2.
But for the code below, the space (memory) complexity is 1, while the time complexity is n^2.
I usually go by the assignments etc. done within the code, which give you the memory complexity.
for i <- 1 to n do
    for j <- 1 to n-i do
        A[0][0] = i + j
I honestly never heard of "big O for memory", but I can easily guess it is only loosely related to the computation time, probably only setting a lower bound.
As an example, it is easy to design an algorithm which uses n^2 memory and n^3 computation, but I think it is impossible to do it the other way round: you cannot process n^2 data with only n computation.
Your algorithm has complexity roughly 1/2 * n^2, thus O(n^2).
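Counting the inner-loop iterations exactly, with the loop bounds read as inclusive, confirms the 1/2 * n^2 estimate:

```latex
\sum_{i=1}^{n} (n - i) \;=\; \sum_{k=0}^{n-1} k \;=\; \frac{n(n-1)}{2} \;=\; \tfrac{1}{2}n^2 - \tfrac{1}{2}n \;\in\; O(n^2)
```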
If A is given to your algorithm, then the space complexity is O(1). Iterating over an existing 2D array and writing values to existing memory locations uses no additional memory.
However, if the algorithm allocates A, then the space complexity is O(n^2).
The time complexity is O(n^2) either way.
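A minimal sketch of the distinction, assuming a Python list of lists sized (n + 1) x (n + 1) so the 1-based indices from the pseudocode fit (fill_given and fill_allocating are hypothetical names). Both do the same O(n^2) work; only fill_allocating creates the O(n^2) array.

```python
def fill_given(A, n):
    """A is supplied by the caller: only loop counters are extra, O(1) space."""
    for i in range(1, n + 1):
        for j in range(1, n - i + 1):     # "for j <- 1 to n-i", inclusive
            A[i][j] = i + j

def fill_allocating(n):
    """Allocates A itself: the (n + 1) x (n + 1) array is O(n^2) extra space."""
    A = [[0] * (n + 1) for _ in range(n + 1)]
    fill_given(A, n)
    return A
```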
