O(n log n) vs O(m) for algorithm - algorithm

I am finding an algorithm for a problem where I have two sets A and B of points with n and m points. I have two algorithms for the sets with complexity O(n log n) and O(m) and I am now wondering whether the complexity for the both algorithms combined is O(n log n) or O(m).
Basically, I am wondering whether there is some relation between m and n which would result in O(m).

If m and n are truly independent of one another and neither quantity influences the other, then the runtime of running an O(n log n)-time algorithm and then an O(m)-time algorithm is will be O(n log n + m). Neither term dominates the other - if n gets huge compared to m then the n log n part dominates, and if m is huge relative to n then the m term dominates.
This gets more complicated if you know how m and n relate to one another in some way. Many graph algorithms, for example, use m to denote the number of edges and n to denote the number of nodes. In those cases, you can sometimes simplify these expressions, but sometimes cannot. For example, the cost of implementing Dijkstra’s algorithm with a Fibonacci heap is O(m + n log n), the same as what we have above.

Size of your input is x: = m + n.
Complexity of a combined (if both are performed at most a constant number of times in the combined algorithm) algorithm is:
O(n log n) + O(m) = O(x log x) + O(x) = O(x log x).

Yes if m ~ n^n, then O(logm) = O(nlogn).
There is a log formula:
log(b^c) = c*log(b)
EDIT:
For both the algos combined the Big O is always the one that is larger because we are concerned about the asymptotic upper bound.
So it will depend on value of n and m. Eg: While n^n < m, the complexity is Olog(m), after that it becomes O(nlog(n)).
For Big-O notation we are only concerned about the larger values, so if n^n >>>> m then it is O(nlog(n)), else if m >>>> n^n then it is O(logm)

Related

Dijkstra's: Where is the equation from? m < n^2/log n

In this passage from my textbook:
where are the inequalities from? (The ones that I've marked with red rectangles.) I feel that they describe a relationship between vertices and edges in a graph, but I don't understand it.
You have two implementations of Dijkstra’s algorithm to choose from. One runs in time O((m + n) log n) = O(m log n), assuming the graph is connected. The other runs in time O(n2). The question is where the crossover point is between these two runtimes. Equating and simplifying gives that
m log n = n2
m = n2 / log n
So if m is asymptotically smaller than n2 / log n, you’d prefer the heap implementation, and if m is asymptotically bigger than n2 / log n you’d prefer the unsorted sequence approach.
(Note that, with a Fibonacci heap, the runtime of Dijkstra’s algorithm is O(m + n log n), which is never asymptotically worse than O(n2).)

Master theorem solution for the case d=log_b(a)

From Dasgupta's Algorithms: if the running time of a divide and conquer algorithm is described by the recurrence T(n)=aT(n/b)+O(n^d), then its solution is:
T(n)=O(n^d) if d>log_b(a)
T(n)=O(n^log_b(a)) if d<log_b(a)
T(n)=O(n^d*log_2(n)) if d=log_b(a)
where each subproblem's size is decreasing by b in the next recursion call, a is the branching factor and O((n/b^k)^d) is the time for deviding and combining the subproblems on the level k for each subproblem.
Cases 1 and 2 are straightforward - they are taken from the geometric series formed when summing the work done at each level of the recursion tree, which is a^k*O((n\b^k)^d)=O(n^d)*(a/b^d)^k.
Where does the log_2(n) come up from in case 3? When d=log_b(a), the ratio a/b^d equals 1, hence the sum of the series is n^d*log_b(a), not n^d*log_2(n)
As a simpler example, first note that O(log n), O(log137 n), and O(log16 n) mean the same thing. The reason for this is that, by the change of basis formula for logarithms, for any fixed constant m we have
log_m n = log n / log m = (1 / log m) · log n = O(log n).
The Master Theorem assumes that a, b, and d are constants. From the change of basis formula for logarithms, we have that
In that sense, O(nd logb n) = O(nd log n), since b is a constant here.
As a note, it’s unusual to see something written out as O(nd log2 n), since the log base here doesn’t matter and just contributes to the (already hidden) constant factor.

Algorithm Running Time for O(n.m^2)

I would like to know, because I couldn't find any information online, how is an algorithm like O(n * m^2) or O(n * k) or O(n + k) supposed to be analysed?
Does only the n count?
The other terms are superfluous?
So O(n * m^2) is actually O(n)?
No, here the k and m terms are not superfluous,they do have a valid existence and essential for computing time complexity. They are wrapped together to provide a concrete-complexity to the code.
It may seem like the terms n and k are independent to each other in the code,but,they both combinedly determines the complexity of the algorithm.
Say, if you've to iterate a loop of size n-elements, and, in between, you have another loop of k-iterations, then the overall complexity turns O(nk).
Complexity of order O(nk), you can't dump/discard k here.
for(i=0;i<n;i++)
for(j=0;j<k;j++)
// do something
Complexity of order O(n+k), you can't dump/discard k here.
for(i=0;i<n;i++)
// do something
for(j=0;j<k;j++)
// do something
Complexity of order O(nm^2), you can't dump/discard m here.
for(i=0;i<n;i++)
for(j=0;j<m;j++)
for(k=0;k<m;k++)
// do something
Answer to the last question---So O(n.m^2) is actually O(n)?
No,O(nm^2) complexity can't be reduced further to O(n) as that would mean m doesn't have any significance,which is not the case actually.
FORMALLY: O(f(n)) is the SET of ALL functions T(n) that satisfy:
There exist positive constants c and N such that, for all n >= N,
T(n) <= c f(n)
Here are some examples of when and why factors other than n matter.
[1] 1,000,000 n is in O(n). Proof: set c = 1,000,000, N = 0.
Big-Oh notation doesn't care about (most) constant factors. We generally leave constants out; it's unnecessary to write O(2n), because O(2n) = O(n). (The 2 is not wrong; just unnecessary.)
[2] n is in O(n^3). [That's n cubed]. Proof: set c = 1, N = 1.
Big-Oh notation can be misleading. Just because an algorithm's running time is in O(n^3) doesn't mean it's slow; it might also be in O(n). Big-Oh notation only gives us an UPPER BOUND on a function.
[3] n^3 + n^2 + n is in O(n^3). Proof: set c = 3, N = 1.
Big-Oh notation is usually used only to indicate the dominating (largest
and most displeasing) term in the function. The other terms become
insignificant when n is really big.
These aren't generalizable, and each case may be different. That's the answer to the questions: "Does only the n count? The other terms are superfluous?"
Although there is already an accepted answer, I'd still like to provide the following inputs :
O(n * m^2) : Can be viewed as n*m*m and assuming that the bounds for n and m are similar then the complexity would be O(n^3).
Similarly -
O(n * k) : Would be O(n^2) (with the bounds for n and k being similar)
and -
O(n + k) : Would be O(n) (again, with the bounds for n and k being similar).
PS: It would be better not to assume the similarity between the variables and to first understand how the variables relate to each other (Eg: m=n/2; k=2n) before attempting to conclude.

n^2 log n complexity

I am just a bit confused. If time complexity of an algorithm is given by
what is that in big O notation? Just or we keep the log?
If that's the time-complexity of the algorithm, then it is in big-O notation already, so, yes, keep the log. Asymptotically, there is a difference between O(n^2) and O((n^2)*log(n)).
A formal mathematical proof would be nice here.
Let's define following variables and functions:
N - input length of the algorithm,
f(N) = N^2*ln(N) - a function that computes algorithm's execution time.
Let's determine whether growth of this function is asymptotically bounded by O(N^2).
According to the definition of the asymptotic notation [1], g(x) is an asymptotic bound for f(x) if and only if: for all sufficiently large values of x, the absolute value of f(x) is at most a positive constant multiple of g(x). That is, f(x) = O(g(x)) if and only if there exists a positive real number M and a real number x0 such that
|f(x)| <= M*g(x) for all x >= x0 (1)
In our case, there must exists a positive real number M and a real number N0 such that:
|N^2*ln(N)| <= M*N^2 for all N >= N0 (2)
Obviously, such M and x0 do not exist, because for any arbitrary large M there is N0, such that
ln(N) > M for all N >= N0 (3)
Thus, we have proved that N^2*ln(N) is not asymptotically bounded by O(N^2).
References:
1: - https://en.wikipedia.org/wiki/Big_O_notation
A simple way to understand the big O notation is to divide the actual number of atomic steps by the term withing the big O and validate you get a constant (or a value that is smaller than some constant).
for example if your algorithm does 10n²⋅logn steps:
10n²⋅logn/n² = 10 log n -> not constant in n -> 10n²⋅log n is not O(n²)
10n²⋅logn/(n²⋅log n) = 10 -> constant in n -> 10n²⋅log n is O(n²⋅logn)
You do keep the log because log(n) will increase as n increases and will in turn increase your overall complexity since it is multiplied.
As a general rule, you would only remove constants. So for example, if you had O(2 * n^2), you would just say the complexity is O(n^2) because running it on a machine that is twice more powerful shouldn't influence the complexity.
In the same way, if you had complexity O(n^2 + n^2) you would get to the above case and just say it's O(n^2). Since O(log(n)) is more optimal than O(n^2), if you had O(n^2 + log(n)), you would say the complexity is O(n^2) because it's even less than having O(2 * n^2).
O(n^2 * log(n)) does not fall into the above situation so you should not simplify it.
if complexity of some algorithm =O(n^2) it can be written as O(n*n). is it O(n)?absolutely not. so O(n^2*logn) is not O(n^2).what you may want to know is that O(n^2+logn)=O(n^2).
A simple explanation :
O(n2 + n) can be written as O(n2) because when we increase n, the difference between n2 + n and n2 becomes non-existent. Thus it can be written O(n2).
Meanwhile, in O(n2logn) as the n increases, the difference between n2 and n2logn will increase unlike the above case.
Therefore, logn stays.

Lower bounds on comparison sorts for a small fraction of inputs?

Can someone please walk me through mathematical part of the solution of the following problem.
Show that there is no comparison sort whose running time is linear for at least half
of the n! inputs of length n. What about a fraction of 1/n of the inputs of length n?
What about a fraction (1/(2)^n)?
Solution:
If the sort runs in linear time for m input permutations, then the height h of the
portion of the decision tree consisting of the m corresponding leaves and their
ancestors is linear.
Use the same argument as in the proof of Theorem 8.1 to show that this is impossible
for m = n!/2, n!/n, or n!/2n.
We have 2^h ≥ m, which gives us h ≥ lgm. For all the possible ms given here,
lgm = Ω(n lg n), hence h = Ω(n lg n).
In particular,
lgn!/2= lg n! − 1 ≥ n lg n − n lg e − 1
lgn!/n= lg n! − lg n ≥ n lg n − n lg e − lg n
lgn!/2^n= lg n! − n ≥ n lg n − n lg e − n
Each of these proofs are a straightforward modification of the more general proof that you can't have a comparison sort that sorts any faster than Ω(n log n) (you can see this proof in this earlier answer). Intuitively, the argument goes as follows. In order for a sorting algorithm to work correctly, it has to be able to determine what the initial ordering of the elements is. Otherwise, it can't reorder the values to put them in ascending order. Given n elements, there are n! different permutations of those elements, meaning that there are n! different inputs to the sorting algorithm.
Initially, the algorithm knows nothing about the input sequence, and it can't distinguish between any of the n! different permutations. Every time the algorithm makes a comparison, it gains a bit more information about how the elements are ordered. Specifically, it can tell whether the input permutation is in the group of permutations where the comparison yields true or in the group of permutations where the comparison yields false. You can visualize how the algorithm works as a binary tree, where each node corresponds to some state of the algorithm, and the (up to) two children of a particular node indicate the states of the algorithm that would be entered if the comparison yields true or false.
In order for the sorting algorithm to be able to sort correctly, it has to be able to enter a unique state for each possible input, since otherwise the algorithm couldn't distinguish between two different input sequences and would therefore sort at least one of them incorrectly. This means that if you consider the number of leaf nodes in the tree (parts where the algorithm has finished comparing and is going to sort), there must be at least one leaf node per input permutation. In the general proof, there are n! permutations, so there must be at least n! leaf nodes. In a binary tree, the only way to have k leaf nodes is to have height at least Ω(log k), meaning that you have to do at least Ω(log k) comparisons. Thus the general sorting lower bound is Ω(log n!) = Ω(n log n) by Stirling's approximation.
In the cases that you're considering, we're restricting ourselves to a subset of those possible permutations. For example, suppose that we want to be able to sort n! / 2 of the permutations. This means that our tree must have height at least lg (n! / 2) = lg n! - 1 = Ω(n log n). As a result. you can't sort in time O(n), because no linear function grows at the rate Ω(n log n). For the second part, seeing if you can get n! / n sorted in linear time, again the decision tree would have to have height lg (n! / n) = lg n! - lg n = Ω(n log n), so you can't sort in O(n) comparisons. For the final one, we have that lg n! / 2n = lg n! - n = Ω(n log n) as well, so again it can't be sorted in O(n) time.
However, you can sort 2n permutations in linear time, since lg 2n = n = O(n).
Hope this helps!

Resources