big O of two nested binary search trees - algorithm

I need to compute the big O of my algorithm.
It consists of two nested for loops, each of which performs a binary search tree lookup.
The complexity of a binary search tree lookup is O(log n).
How can I compute the right complexity of my algorithm?
Is it O(log(n)log(m)) or O(log(nm))?

As you said, one binary search has a complexity of O(log n). You can think of log(n) as the (maximum) number of steps your algorithm needs. For each of these steps, your algorithm performs a nested binary search. That means you can imagine this as log(n) times log(m) steps.
So you are right with O(log(n) x log(m)).
If for example n is bigger than m, then you have O((log(n))^2).
But normally we don't distinguish between different inputs. We consider them together as "the one" input of size n. Then your complexity is just O((log(n))^2).
Not distinguishing different inputs at least makes sense for polynomial algorithms. To see this, consider an example: you have two inputs of sizes n and m with m <= n, so your total input size is n+m. Let's see what happens for different complexities.
O(n+m): since m <= n, we have n+m <= 2n. Normally when talking about complexity we are not interested in constant factors, so we can say we have O(n).
O(log(n+m)) <= O(log(2n)). Now we remember the logarithm rules from school: log(2n) = log(2) + log(n). Again, we are not interested in constants, so we can say we have O(log(n)).
O((n+m)^2) = O(n^2+2nm+m^2) <= O(n^2+2n^2+n^2) = O(4n^2). Again we are not interested in constant factors and just say we have O(n^2).
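The step counting above can be made concrete with a small sketch. This is a hypothetical setup, assuming the "nested" structure means that every step of an outer binary search triggers a full inner binary search; the helper `binary_search_steps` counts comparisons:

```python
def binary_search_steps(arr, target):
    """Count the comparisons a binary search makes on a sorted list."""
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            break
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

n, m = 1024, 256
outer = list(range(n))
inner = list(range(m))

# Worst case (absent target): the outer search takes ~log2(n) steps and
# each of those steps runs a full inner search of ~log2(m) steps,
# so the total work is log2(n) * log2(m).
outer_steps = binary_search_steps(outer, -1)
inner_steps = binary_search_steps(inner, -1)
print(outer_steps, inner_steps, outer_steps * inner_steps)  # 10 8 80
```

With n = 1024 and m = 256 the product matches log2(1024) * log2(256) = 10 * 8, illustrating the O(log(n) x log(m)) bound.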

Related

Is time Complexity O(nm) equal to O(n^2) if m <= n

I am studying the time complexity of an algorithm that finds whether a string of length n contains a substring of length m, where m <= n. The result of the analysis is that the time complexity is O(nm). Taking this time complexity as a starting point and knowing that m <= n, thus mn <= n^2, can we say that the time complexity in big-O notation is O(n^2)?
It is indeed the case that if the runtime of a function is O(mn) and you know for a fact that m <= n, then the runtime of the function is O(n^2).
However, this may not be the best way of characterizing the runtime. If m is a tunable parameter that can range from, say, 0 to n, then O(mn) could be a "better" way of characterizing the runtime. For example, suppose this is an algorithm that finds the m smallest elements out of an n-element array. If you want to use this algorithm to find the top five elements (m = 5) out of ten million (n = 10,000,000), then characterizing the runtime as O(n^2) would greatly overestimate the actual amount of work done, while O(mn) would give a better bound on the runtime.
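A minimal sketch of such an algorithm (the function `m_smallest` is hypothetical, not from the answer): m passes over the array, each doing an O(n) scan for the current minimum, for O(mn) total work. With m = 5 and n huge, that is far less than n^2:

```python
def m_smallest(arr, m):
    """Return the m smallest elements via m linear scans: O(m*n) work."""
    arr = list(arr)           # copy so the caller's list is untouched
    result = []
    for _ in range(m):        # m outer passes
        idx = min(range(len(arr)), key=arr.__getitem__)  # O(n) scan
        result.append(arr.pop(idx))
    return result

print(m_smallest([9, 4, 7, 1, 8, 3], 2))  # [1, 3]
```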
In the case of string searching, the fact that the two strings can have wildly different lengths is a good reason to characterize the runtime as O(mn), as it shows how the runtime scales with the lengths of both input strings, not just one of them.
Hope this helps!

Are O(n+m) and O(n) notations equivalent if m<n?

I am reading about the Rabin-Karp algorithm on Wikipedia and the time complexity mentioned in there is O(n+m). Now, from my understanding, m is necessarily between 0 and n, so in the best case the complexity is O(n) and in the worst case it is also O(2n)=O(n), so why isn't it just O(n)?
Basically, Rabin-Karp expresses its asymptotic complexity as O(m+n) as a means of expressing the fact that it takes linear time relative to m+n, not just n. Essentially, the variables m and n have to mean something whenever you use asymptotic notation. For the Rabin-Karp algorithm, n represents the length of the text and m represents the length of the pattern. Note that O(2n) means the same thing as O(n), because O(2n) is still a linear function of just n. However, in the case of Rabin-Karp, m+n isn't really a function of just n. Rather, it's a function of both m and n, which are two independent variables. As such, O(m+n) doesn't mean the same thing as O(n) in the way that O(2n) equates to O(n).
I hope that makes sense. :-P
m and n measure different dimensions of the input data. Text of length n and patterns of length m is not the same as text of length 2n and patterns of length 0.
O(m+n) tells us that the complexity is proportional to both the length of the text and the length of the patterns.
There are some scenarios where stating the complexity as O(n+m) is more suitable than just saying O(max(m,n)).
Scenario:
Consider BFS (Breadth-First Search) or DFS (Depth-First Search) as a scenario.
It is more intuitive, and conveys more information, to say that the complexity is O(E+V) rather than O(max(E,V)). The former is in sync with the actual algorithmic description.
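As a sketch of why O(E+V) matches the algorithm's structure, here is a standard adjacency-list BFS with counters added for illustration: every vertex is dequeued once and every adjacency-list entry is scanned once.

```python
from collections import deque

def bfs_work(adj, start):
    """BFS over an adjacency list; count vertex dequeues and edge scans."""
    visited = {start}
    queue = deque([start])
    vertices_seen = edges_scanned = 0
    while queue:
        u = queue.popleft()
        vertices_seen += 1
        for v in adj[u]:          # each adjacency entry scanned exactly once
            edges_scanned += 1
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return vertices_seen, edges_scanned

# Small connected undirected graph: each edge appears in both endpoint lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs_work(adj, 0))  # (4, 6): |V| dequeues, 2*|E| edge scans
```

The total work is proportional to V + E, which is exactly what the O(E+V) notation conveys.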

How is O(N) algorithm also an O(N^2) algorithm?

I was reading about Big-O Notation
So, any algorithm that is O(N) is also O(N^2).
It seems confusing to me. I know that Big-O gives only an upper bound.
But how can an O(N) algorithm also be an O(N^2) algorithm?
Is there any examples where it is the case?
I can't think of any.
Can anyone explain it to me?
"Upper bound" means the algorithm takes no longer than (i.e. <=) that long (as the input size tends to infinity, with relevant constant factors considered).
It does not mean it will ever actually take that long.
Something that's O(n) is also O(n log n), O(n^2), O(n^3), O(2^n) and also anything else that's asymptotically bigger than n.
If you're comfortable with the relevant mathematics, you can also see this from the formal definition.
O notation can be naively read as "less than".
With numbers: if I tell you x < 4, then obviously x < 5 and x < 6 and so on.
O(n) means that, if the input size of an algorithm is n (n could be the number of elements, or the size of an element or anything else that mathematically describes the size of the input) then the algorithm runs "about n iterations".
More formally, it means that the number of steps x in the algorithm satisfies:
x <= k*n + C, where k and C are positive real constants.
In other words, for all possible inputs, if the size of the input is n, then the algorithm executes no more than k*n + C steps.
O(n^2) is similar, except the bound is k*n^2 + C. Since n is a natural number, n^2 >= n, so the definition still holds: because x <= k*n + C, it is also true that x <= k*n^2 + C.
So an O(n) algorithm is an O(n^2) algorithm, an O(n^3) algorithm, an O(n^n) algorithm, and so on.
For something to be O(N), it means that for large N, it is less than the function f(N)=k*N for some fixed k. But it's also less than k*N^2. So O(N) implies O(N^2), or more generally, O(N^m) for all m>1.
*I assumed that N>=1, which is indeed the case for large N.
Big-O notation describes an upper bound, so it is not wrong to say that O(n) is also O(n^2). O(n) algorithms are a subset of O(n^2) algorithms. It's the same way that squares are a subset of rectangles, but not every rectangle is a square. So technically it is correct to say that an O(n) algorithm is an O(n^2) algorithm, even if it is not precise.
Definition of big-O:
Some function f(x) is O(g(x)) iff |f(x)| <= M|g(x)| for all x >= x0.
Clearly if g1(x) <= g2(x) then |f(x)| <= M|g1(x)| <= M|g2(x)|.
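The definition can be spot-checked numerically. This sketch uses a hypothetical helper `is_big_o_witness` with arbitrarily chosen constants M and x0; a finite check is of course not a proof, just an illustration of the definition:

```python
def is_big_o_witness(f, g, M, x0, upto=10_000):
    """Check |f(x)| <= M*|g(x)| for x0 <= x < upto.

    A finite spot check of the big-O definition with a concrete
    witness pair (M, x0), not a proof for all x.
    """
    return all(abs(f(x)) <= M * abs(g(x)) for x in range(x0, upto))

f = lambda x: 5 * x + 40   # a linear function

# The same witness pair works for both g1(x) = x and g2(x) = x^2,
# since g1(x) <= g2(x) for x >= 1: so f is O(x) AND O(x^2).
print(is_big_o_witness(f, lambda x: x, M=6, x0=40))      # True
print(is_big_o_witness(f, lambda x: x * x, M=6, x0=40))  # True
```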
An algorithm with just a single loop is O(n), and an algorithm with a nested loop is typically O(n^2).
Now consider the bubble sort algorithm, which uses a nested loop.
If we give already sorted input to a bubble sort algorithm (with an early-exit check), no swaps occur in the first pass, so the algorithm stops after a single O(n) pass. For that scenario it behaves like O(n), while in the other cases it is O(n^2).
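A minimal sketch of this early-exit behavior (a standard bubble sort with a `swapped` flag; the pass counter is added purely for illustration):

```python
def bubble_sort(arr):
    """Bubble sort with an early-exit flag; also return the pass count."""
    arr = list(arr)
    passes = 0
    for i in range(len(arr) - 1):
        passes += 1
        swapped = False
        for j in range(len(arr) - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:   # no swaps: input already sorted, stop early
            break
    return arr, passes

print(bubble_sort([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 1) -> one O(n) pass
print(bubble_sort([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 4) -> O(n^2) work
```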

What's the complexity of for i: for o = i+1

for i = 0 to size(arr)
for o = i + 1 to size(arr)
do stuff here
What's the worst-case time complexity of this? It's not N^2, because the second loop's range decreases by one on every iteration of i. It's not N either; it should be bigger: (N-1) + (N-2) + (N-3) + ... + 1.
It is N ^ 2, since it's the product of two linear complexities.
(There's a reason asymptotic complexity is called asymptotic and not identical...)
See Wikipedia's explanation on the simplifications made.
Think of it as working with an n x n matrix. You are touching approximately half of the elements in the matrix, but O(n^2/2) is the same as O(n^2).
When you want to determine the complexity class of an algorithm, all you need is to find the fastest growing term in the complexity function of the algorithm. For example, if you have complexity function f(n)=n^2-10000*n+400, to find O(f(n)), you just have to find the "strongest" term in the function. Why? Because for n big enough, only that term dictates the behavior of the entire function. Having said that, it is easy to see that both f1(n)=n^2-n-4 and f2(n)=n^2 are in O(n^2). However, they, for the same input size n, don't run for the same amount of time.
In your algorithm, if n=size(arr), the do stuff here code will run f(n)=n+(n-1)+(n-2)+...+2+1 times. It is easy to see that f(n) represents a sum of an arithmetic series, which means f(n)=n*(n+1)/2, i.e. f(n)=0.5*n^2+0.5*n. If we assume that do stuff here is O(1), then your algorithm has O(n^2) complexity.
for i = 0 to size(arr)
I assumed that the loop ends when i becomes greater than size(arr), not equal to it. If the latter is the case, then f(n)=0.5*n^2-0.5*n, and it is still in O(n^2). Remember that O(1), O(n), O(n^2), ... are complexity classes, and that the complexity function of an algorithm describes, for input size n, how many steps the algorithm takes.
It's n*(n-1)/2, which is O(n^2).
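The arithmetic-series argument above can be verified directly: count the iterations of the nested loop and compare with the closed form n*(n-1)/2 (using the exclusive-bound reading of the loop).

```python
def count_iterations(n):
    """Count how many times the loop body runs in:
    for i in 0..n-1: for o in i+1..n-1."""
    count = 0
    for i in range(n):
        for o in range(i + 1, n):
            count += 1
    return count

n = 100
# Both print 4950: the sum 1 + 2 + ... + (n-1) = n*(n-1)/2,
# a quadratic function of n, hence O(n^2).
print(count_iterations(n), n * (n - 1) // 2)
```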

What is the meaning of O(M+N)?

This is a basic question... but I'm thinking that O(M+N) is the same as O(max(M,N)), since the larger term should dominate as we go to infinity? Also, that would be different from O(min(M,N)), is that right? I keep seeing this notation, esp. when discussing graph algorithms. For example, you routinely see: O(|V| + |E|) (e.g., http://algs4.cs.princeton.edu/41undirected/).
Yes, O(M+N) means the same thing as O(max(M, N)). That is different from O(min(M, N)). As @Dr_Asik says, O(M+N) is technically linear, O(N), but when M and N have a meaning, it is nice to be able to say "linear in what?" Imagine the algorithm is linear in the number of rows and the number of columns. We can either define N = rows + cols and say O(N), or we can say O(M+N) where M is rows and N is columns.
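As an illustration of an algorithm that is linear in rows plus columns, here is a classic example not from the answer itself: searching a matrix whose rows and columns are both sorted by walking a "staircase" from the top-right corner. Each step eliminates either a row or a column, so it takes at most M + N steps, and O(M+N) describes that far better than O(M*N) or a bound in one variable.

```python
def staircase_search(matrix, target):
    """Search an M x N matrix with sorted rows and sorted columns
    in O(M + N): start top-right, move left or down each step."""
    if not matrix or not matrix[0]:
        return False
    row, col = 0, len(matrix[0]) - 1
    while row < len(matrix) and col >= 0:
        v = matrix[row][col]
        if v == target:
            return True
        elif v > target:
            col -= 1   # everything below in this column is even larger
        else:
            row += 1   # everything to the left in this row is even smaller
    return False

m = [[1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]]
print(staircase_search(m, 5), staircase_search(m, 10))  # True False
```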
Linear time is noted O(N). Since (M+N) is a linear function, it should simply be noted O(N) as well. Likewise there is no sense in comparing O(1) to O(2), O(10) etc., they're all constant time and should all be noted O(1).
I know this is an old thread, but as I am studying this now I figured I would add my two cents for those currently searching similar questions.
I would argue that O(n+m), in the context of a graph represented as an adjacency list, is exactly that and cannot be changed for the following reasons:
1) O(n+m) = O(n) + O(m), but O(m) is upper bounded by O(n^2) so that
O(n+m) = O(n) + O(n^2) = O(n^2). However, this is purely in terms of n; that is, it only takes into account the vertices and gives a weak upper bound (weak because it tries to represent the edges in terms of vertices). It does show, though, that O(n) does not equal O(n+m), as there COULD be a quadratic number of edges compared to vertices.
2) Saying O(n+m) takes into account all the elements that have to be passed through when implementing an algorithm like Breadth-First Search (BFS). As it touches every element in the graph exactly once, it can be considered linear, and it is a stricter analysis than upper-bounding the edges with n^2. One could, for the sake of notation, write something like n = |V| + |E|, so that BFS runs in O(n), giving the reader a sense of linearity; but generally, as the OP has mentioned, it is written as O(n+m) where n = |V| and m = |E|.
Thanks a lot, hope this helps someone.
