Are O(n+m) and O(n) notations equivalent if m<n? - algorithm

I am reading about the Rabin-Karp algorithm on Wikipedia, and the time complexity given there is O(n+m). From my understanding, m is necessarily between 0 and n, so in the best case the complexity is O(n) and in the worst case it is O(2n) = O(n), so why isn't it just O(n)?

Basically, Rabin-Karp expresses its asymptotic bound as O(m+n) to convey that it takes linear time relative to m+n, not just to n. Essentially, the variables m and n have to mean something whenever you use asymptotic notation. For the Rabin-Karp algorithm, n represents the length of the text and m represents the length of the pattern. Note that O(2n) means the same thing as O(n), because 2n is still a linear function of n alone. However, in the case of Rabin-Karp, m+n isn't a function of just n. Rather, it's a function of both m and n, which are two independent variables. As such, O(m+n) doesn't reduce to O(n) in the way that O(2n) equates to O(n).
I hope that makes sense. :-P

m and n measure different dimensions of the input data. Text of length n and patterns of length m is not the same as text of length 2n and patterns of length 0.
O(m+n) tells us that the complexity is proportional to both the length of the text and the length of the patterns.
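To make the roles of the two variables concrete, here is a minimal Rabin-Karp sketch in Python (the function name, base, and modulus are my own illustrative choices): the preprocessing loop does work proportional to m, the rolling scan does work proportional to n, giving O(n+m) expected time overall.

```python
def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1 if m > n else 0

    # O(m): polynomial hash of the pattern and of the first text window
    ph = th = 0
    for i in range(m):
        ph = (ph * base + ord(pattern[i])) % mod
        th = (th * base + ord(text[i])) % mod
    high = pow(base, m - 1, mod)  # weight of the leading character

    # O(n): slide the window one position at a time, updating the hash in O(1)
    for i in range(n - m + 1):
        if th == ph and text[i:i + m] == pattern:  # verify to rule out collisions
            return i
        if i < n - m:
            th = ((th - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1
```

Both loops are unavoidable in general: the pattern must be read once (m work) and the text must be scanned once (n work), which is exactly what O(n+m) records.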

There are some scenarios where stating the complexity as O(n+m) is more suitable than just saying O(max(m,n)).
Scenario:
Consider BFS (breadth-first search) or DFS (depth-first search).
It is more intuitive, and conveys more information, to say that the complexity is O(E+V) rather than O(max(E,V)); the former is in sync with the actual algorithmic description.
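As an illustration, here is a minimal BFS sketch in Python (adjacency list as a dict; the names are mine): each vertex is enqueued and dequeued once, and each adjacency list is scanned once, which is precisely the O(V+E) of the algorithmic description.

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first traversal order from start. adj: vertex -> list of neighbours."""
    visited = {start}
    order = []
    q = deque([start])
    while q:                       # each vertex is dequeued exactly once: O(V)
        u = q.popleft()
        order.append(u)
        for v in adj[u]:           # each adjacency entry is scanned once: O(E)
            if v not in visited:
                visited.add(v)
                q.append(v)
    return order
```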

Related

Is Big Omega of any linear algorithm n or can it also be 1?

If we have a linear algorithm (for example, find if a number exists in a given array of numbers), does this mean that Omega(n) = n? The number of steps would be n. And the tightest bound I can make is c*n where c = 1.
But as far as I know, Omega also describes the best case scenario which in this case would be 1 because the searched element can be on the first position of the array and that accounts for only one step. So, by this logic, Omega(n) = 1.
Which variant is the correct one and why? Thanks.
There is a lot of confusion about what asymptotic notation describes.
The running time of an algorithm is in general a function not only of the number of elements, but also of the particular values of the input. Hence T(x), where x is an input of n elements, is not a function of n alone.
One can then study the worst case and best case: to determine these, you choose a configuration of the input corresponding to the slowest or fastest execution time, and these are functions of n only. An additional option is the expected (or average) running time, which corresponds to a given statistical distribution of the inputs. This is also a function of n alone.
Now, Tworst(n), Tbest(n), Texpected(n) can have upper bounds, denoted by O(f(n)), and lower bounds, denoted by Ω(f(n)). When these bounds coincide, the notation Θ(f(n)) is used.
In the case of a linear search, the best case is Θ(1) and the worst and expected cases are Θ(n). Hence the running time for an arbitrary input is Ω(1) and O(n).
Addendum:
The holy grail of algorithmics is the discovery of efficient algorithms, i.e. those whose effective running time is of the same order as the best behavior that can be achieved independently of any algorithm.
For instance, it is obvious that the worst-case of any search algorithm is Ω(n) because whatever the search order, you may have to perform n comparisons (for instance if the key is not there). As the linear search is worst-case O(n), it is worst-case efficient. It is also best-case efficient, but this is not so interesting.
If you have a linear-time algorithm, that means the time complexity has a linear upper bound, namely O(n). This does not mean that it also has a linear lower bound. In your example, finding out whether an element exists, the lower bound is Ω(1); Ω(n) is simply wrong here.
Doing a linear search on an array to find the minimal element takes exactly n steps in all cases, so there the lower bound is Ω(n). Ω(1) would also be correct, since a constant number of steps is also a lower bound for n steps, but it is not a tight lower bound.
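A tiny sketch (the function name is mine) makes both bounds visible by counting comparisons: a hit at position 0 costs one step, a miss costs n steps.

```python
def linear_search(arr, key):
    """Return (index of key or -1, number of comparisons performed)."""
    steps = 0
    for i, x in enumerate(arr):
        steps += 1              # one comparison per element visited
        if x == key:
            return i, steps     # best case: key at index 0, one step
    return -1, steps            # worst case: key absent, n steps
```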

What is the meaning of O(M+N)?

This is a basic question... but I'm thinking that O(M+N) is the same as O(max(M,N)), since the larger term should dominate as we go to infinity? Also, that would be different from O(min(M,N)), is that right? I keep seeing this notation, esp. when discussing graph algorithms. For example, you routinely see: O(|V| + |E|) (e.g., http://algs4.cs.princeton.edu/41undirected/).
Yes, O(M+N) means the same thing as O(max(M, N)). That is different from O(min(M, N)). As @Dr_Asik says, O(M+N) is technically linear, but when M and N have a meaning it is nice to be able to say "linear in what?" Imagine the algorithm is linear in the number of rows and the number of columns. We can either define N = rows + cols and say O(N), or we can say O(M+N) where M is rows and N is columns.
Linear time is noted O(N). Since (M+N) is a linear function, it should simply be noted O(N) as well. Likewise there is no sense in comparing O(1) to O(2), O(10) etc., they're all constant time and should all be noted O(1).
I know this is an old thread, but as I am studying this now I figured I would add my two cents for those currently searching similar questions.
I would argue that O(n+m), in the context of a graph represented as an adjacency list, is exactly that and cannot be changed for the following reasons:
1) O(n+m) = O(n) + O(m), and O(m) is upper-bounded by O(n^2), so
O(n+m) = O(n) + O(n^2) = O(n^2). However, this bound is purely in terms of n only; that is, it only takes the vertices into account and gives a weak upper bound (weak because it is trying to represent the edges with vertices). It does show, though, that O(n) does not equal O(n+m), since there COULD be a quadratic number of edges compared to vertices.
2) Saying O(n+m) takes into account all the elements that have to be passed through when implementing an algorithm such as breadth-first search (BFS). Since it touches every element in the graph exactly once, it can be considered linear, and it is a stricter analysis than upper-bounding the edges with n^2. One could, for the sake of notation, write n = |V| + |E| and say that BFS runs in O(n), giving the reader a sense of linearity; but generally, as the OP has mentioned, it is written as O(n+m) where
n = |V| and m = |E|.
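Point 1) can be checked concretely: in a complete graph, the number of edges m grows quadratically in the number of vertices n. A small sketch (function names are mine):

```python
def complete_graph(n):
    """Adjacency list of the complete (undirected) graph on n vertices."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}

def edge_count(adj):
    # each undirected edge appears in exactly two adjacency lists
    return sum(len(nbrs) for nbrs in adj.values()) // 2

# For n = 5 vertices, m = n*(n-1)/2 = 10 edges, so m can be Θ(n^2).
```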
Thanks a lot, hope this helps someone.

Time complexity in Big-O

What would be the BigO time of this algorithm
Input: Array A storing n >= 1 integers
Output: The sum of the elements at even cells in A
s = A[0]
for i = 2 to n-1 by increments of 2
{
    s = s + A[i]
}
return s
I think the function for this algorithm is F(n) = n*ceiling(n/2), but how do you convert that to Big-O?
The time complexity for that algorithm would be O(n), since the amount of work it does grows linearly with the size of the input. The other way to look at it is that the loop passes over the input once (ignore the fact that it only looks at half of the values; that doesn't matter for Big-O complexity).
The number of operations is not proportional to n*ceiling(n/2), but rather to n/2, which is O(n). Because of the meaning of big-O (which absorbs any constant coefficient), O(n) and O(n/2) are exactly equivalent, so it is always written as O(n).
This is an O(n) algorithm since you look at ~n/2 elements.
Your algorithm will do N/2 iterations, given that there are N elements in the array. Each iteration takes constant time to complete. This gives us O(N) complexity, and here's why.
Generally, the running time of an algorithm is a function f(x) of the size of the input. Saying that f(x) is O(g(x)) means that there exists some constant c such that f(x) <= c*g(x) for all sufficiently large values of x. It is easy to see how this applies in our case: if we assume that each iteration takes a unit of time, then clearly f(N) <= (1/2)*N.
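As a quick check, here is a sketch (the function name is mine) that runs the algorithm while counting the actual number of operations, charging one unit per assignment:

```python
def sum_even_cells(arr):
    """Sum of the elements at even indices, plus the operation count."""
    s = arr[0]
    ops = 1                            # the initial assignment
    for i in range(2, len(arr), 2):    # roughly n/2 iterations
        s += arr[i]
        ops += 1
    return s, ops
```

For an array of length n the counter comes out to about n/2, which is O(n).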
A formal manner to obtain the exact number of operations and your algorithm's order of growth:
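For this particular loop the exact count can be written down directly (a sketch, charging one unit per assignment):

```latex
T(n) \;=\; 1 + \sum_{\substack{i = 2, 4, \dots \\ i \le n-1}} 1
     \;=\; 1 + \left\lfloor \frac{n-1}{2} \right\rfloor
```

Since n/2 <= T(n) <= n for all n >= 2, it follows that T(n) = Θ(n).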

What's wrong with this inductive proof that mergesort is O(n)?

Comparison based sorting is big omega of nlog(n), so we know that mergesort can't be O(n). Nevertheless, I can't find the problem with the following proof:
Proposition P(n): For a list of length n, mergesort takes O(n) time.
P(0): merge sort on the empty list just returns the empty list.
Strong induction: Assume P(1), ..., P(n-1) and try to prove P(n). We know that at each step in a recursive mergesort, two approximately "half-lists" are mergesorted and then "zipped up". The mergesorting of each half list takes, by induction, O(n/2) time. The zipping up takes O(n) time. So the algorithm has a recurrence relation of M(n) = 2M(n/2) + O(n) which is 2O(n/2) + O(n) which is O(n).
Compare the "proof" that linear search is O(1).
Linear search on an empty array is O(1).
Linear search on a nonempty array compares the first element (O(1)) and then searches the rest of the array (O(1)). O(1) + O(1) = O(1).
The problem here is that, for the induction to work, there must be one big-O constant that works both for the hypothesis and the conclusion. That's impossible here and impossible for your proof.
The "proof" only covers a single pass, it doesn't cover the log n number of passes.
The recurrence only shows the cost of a pass as compared to the cost of the previous pass. To be correct, the recurrence relation should have the cumulative cost rather than the incremental cost.
You can see where the proof falls down by viewing the sample merge sort at http://en.wikipedia.org/wiki/Merge_sort
Here is the crux: all induction steps which refer to particular values of n must refer to a particular function T(n), not to O() notation!
O(M(n)) notation is a statement about the behavior of the whole function from problem size to performance guarantee (asymptotically, as n increases without limit). The goal of your induction is to determine a performance bound T(n), which can then be simplified (by dropping constant and lower-order factors) to O(M(n)).
In particular, one problem with your proof is that you can't get from your statement purely about O() back to a statement about T(n) for a given n. O() notation allows you to ignore a constant factor for an entire function; it doesn't allow you to ignore a constant factor over and over again while constructing the same function recursively...
You can still use O() notation to simplify your proof, by demonstrating:
T(n) = F(n) + O(something less significant than F(n))
and propagating this predicate in the usual inductive way. But you need to preserve the constant factor of F(): this constant factor has direct bearing on the solution of your divide-and-conquer recurrence!
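To see concretely where the induction breaks, carry an explicit constant through the step. Suppose the inductive hypothesis is T(n/2) <= c*(n/2) and the merge costs d*n; then (a sketch):

```latex
T(n) \;=\; 2\,T\!\left(\tfrac{n}{2}\right) + d\,n
     \;\le\; 2\,c\,\tfrac{n}{2} + d\,n
     \;=\; (c + d)\,n
```

The bound obtained is (c+d)n, not cn: the constant grows by d at each of the roughly log2(n) levels of the recursion, and the accumulated cost is Θ(n log n). No single constant c works for every n, which is exactly the gap in the proof.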

Complexity for recursive functions - Time and Space

I was interested in knowing how to calculate the time and space complexity of recursive functions like permutation and Fibonacci (described here).
In general we can have recursion in many places, not just in permutation or Fibonacci, so I am looking for the approach generally followed to calculate time and space complexity.
Thank you
Take a look at http://www.cs.duke.edu/~ola/ap/recurrence.html
Time complexity and space complexity are the two things that characterize the performance of an algorithm. Nowadays, as space is relatively inexpensive, people mostly care about time complexity, and for recursive algorithms it is mostly expressed in terms of a recurrence relation.
Consider the binary search algorithm (searching for an element in a sorted array): each time, the middle element (mid) is selected and compared with the element (x) being searched for. If mid > x, search the lower sub-array; otherwise search the upper sub-array. If there are n elements in the array, and T(n) represents the time complexity of the algorithm, then
T(n) = T(n/2) + c, where c is a constant. With the boundary condition T(1) = 1, we can solve for T(n); in this case it comes out to T(n) = O(log n).
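The recurrence above corresponds to a recursive implementation along these lines (a sketch; the names are mine): each call does constant work and recurses on half the range, which is where T(n) = T(n/2) + c comes from.

```python
def binary_search(arr, x, lo=0, hi=None):
    """Return the index of x in sorted arr, or -1 if absent."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:                 # empty range: base case
        return -1
    mid = (lo + hi) // 2        # constant work per call: the "+ c"
    if arr[mid] == x:
        return mid
    if arr[mid] > x:            # recurse on half the range: the "T(n/2)"
        return binary_search(arr, x, lo, mid - 1)
    return binary_search(arr, x, mid + 1, hi)
```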