O(m+n) vs O(m)/O(n) - algorithm

Different algorithms have different time complexities, and this one has been nagging at me for a while.
O(m+n) represents a linear function, just like O(m) or O(n), which also represent linear functions. How is O(m+n) any different from O(m) or O(n)? They all represent linear time. In the case of O(n) or O(m), we neglect the lower-order terms and keep only the highest-degree one. Even for an equation like T(n) = n + 1 + n + 1, we simplify to T(n) = 2n + 2 and, dropping the constants, call it O(n). Either way, we do not take the other parts of the equation into account.
I did read some articles on this, but I didn't quite understand them, because according to those articles (or maybe I misinterpreted them), m and n stand for two loop variables i and j; but if that's the case, then why do we write two-pointer algorithms as O(n^2)?
All of this is very confusing to me; please explain the difference.

m and n can have very different values, which is why O(m+n) is different from O(m) or O(n) (though it is similar to O(max(m,n))).
Simple example:
Breadth-first search on graphs has complexity O(V+E), where V is the vertex count and E is the edge count.
For dense graphs, E can be as large as V·(V-1)/2, so E ~ V^2 and we cannot say the complexity is O(V) - in that case it is O(V^2).
On the other side are very sparse graphs, where E is very small compared with V. In that case we cannot say the complexity is O(E) - it is O(V).
And O(E+V) is valid in all cases.
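To make the O(V+E) accounting concrete, here is a minimal BFS sketch over an adjacency list (the dictionary representation, names, and example graph are mine, purely for illustration): every vertex is enqueued at most once, and every adjacency list is scanned once.

from collections import deque

def bfs(adj, source):
    # Each vertex is enqueued/dequeued at most once -> O(V) work,
    # and each adjacency list is scanned exactly once -> O(E) work,
    # so the total is O(V + E).
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:  # every edge examined a constant number of times
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

# Example: 4 vertices, 3 edges -> roughly 4 + 3 "units" of work.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs(adj, 0))  # [0, 1, 2, 3]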

Related

Why naive implementation of Dijkstra's shortest path algorithm takes O(nm) time

I'm watching the Coursera lectures on algorithms, and the professor introduces the naive implementation of Dijkstra's shortest path algorithm, without using heaps, which takes O(nm) time (n is the number of vertices and m is the number of edges).
The lecture claims that the main loop goes through the remaining vertices besides the source, which are n-1 vertices; this I can understand. But inside the loop, the algorithm goes through the edges whose tail is in the processed vertices and whose head is in the unprocessed vertices, to find the minimum next path. Why does that mean there are m edges to go through? We could just go through the edges that satisfy the criterion (tail in the processed vertices, head in the unprocessed vertices), even in a naive implementation, right?
Could anyone please help me understand this? Thanks.
When you consider big-O time complexity, you should think of it as a form of upper bound. Meaning, if some input can make your program run in O(N) while some other input can make it run in O(NM), we generally say that this is an O(NM) program (especially when dealing with algorithms). This is called considering an algorithm in its worst case.
There are rare cases where we don't consider the worst case (amortized time complexity analysis, or algorithms with random elements such as quicksort). In the case of quicksort, for example, the worst-case time complexity is O(N^2), but the chance of that happening is so small that, in practice, if we select a random pivot we can expect O(N log N) time complexity. That being said, unless you can guarantee (or have high confidence) that your program runs faster than O(NM), it is an O(NM) program.
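As a side note, a minimal sketch of the random-pivot quicksort mentioned above might look like this (the function name and example input are made up for illustration):

import random

def quicksort(a):
    # Random pivot: expected O(N log N); the O(N^2) worst case is
    # still possible but extremely unlikely.
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]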
Let's consider the case of your naive implementation of Dijkstra's algorithm. You didn't specify whether you were considering directed or undirected graphs, so I'll assume that the graph is undirected (the case for a directed graph is extremely similar).
We're going through all the nodes, besides the first node. This means we already have an O(N) loop.
In each layer of the loop, we're considering all the edges that stem from a processed node to an unprocessed node.
However, in the worst-case, there are close to O(M) of these in each layer of the loop; not O(1) or anything less than O(M).
There's an easy way to see this. If each edge were visited only a single time, then we could say that the algorithm runs in O(M). However, in your naive implementation of Dijkstra's algorithm, the same edge can be considered multiple times; in fact, asymptotically speaking, O(N) times.
You can try creating a graph yourself, then dry-running the process of Dijkstra on paper. You'll notice that each edge can be considered up to N times: once in each layer of the O(N) loop.
In other words, the reason we can't say the program is faster than O(NM) is because there is no guarantee that each edge isn't processed N times, or less than O(log N) times, or less than O(sqrt N) times, etc. Therefore, the best upper bound we can give is N, although in practice it may be less than N by some sort of constant. However, we do not consider this constant in big-O time complexity.
However, your thought process may lead to a better implementation of Dijkstra. When considering the algorithm, you may realize that instead of considering every single edge that goes from processed to unprocessed vertices in every iteration of the main loop, we only have to consider the edges adjacent to the current node we are on (I'm talking about something like this or this).
By implementing it like so, you can achieve a complexity of O(N^2).
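To make the loop structure concrete, here is a rough sketch of a naive, edge-scanning Dijkstra in Python. It is only an illustration under my own assumptions (an undirected graph given as a weighted edge list; the names are made up), not the professor's pseudocode. The point is that the O(N) outer loop contains an O(M) scan of the edge list, giving O(NM) overall.

import math

def dijkstra_naive(n, edges, source):
    # Outer loop: O(N) iterations. Each iteration scans the whole edge
    # list (O(M)) for the cheapest edge crossing from processed to
    # unprocessed vertices, so the total work is O(N * M).
    dist = [math.inf] * n
    dist[source] = 0
    processed = {source}
    for _ in range(n - 1):                 # O(N) iterations
        best_v, best_d = None, math.inf
        for u, v, w in edges:              # O(M) scan per iteration
            for a, b in ((u, v), (v, u)):  # undirected: try both directions
                if a in processed and b not in processed and dist[a] + w < best_d:
                    best_v, best_d = b, dist[a] + w
        if best_v is None:                 # remaining vertices are unreachable
            break
        dist[best_v] = best_d
        processed.add(best_v)
    return dist

# Tiny example: edges are (u, v, weight) triples.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
print(dijkstra_naive(4, edges, 0))  # [0, 3, 1, 8]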

How does Kruskal and Prim change when edge weights are in the range of 1 to |V| or some constant W?

I'm reading CLRS Algorithms Edition 3 and I have two problems for my homework (I'm not asking for answers, I promise!). They are essentially the same question, just applied to Kruskal or to Prim. They are as follows:
Suppose that all edge weights in a graph are integers in the range from 1 to |V|. How fast can you make [Prim/Kruskal]'s algorithm run? What if the edge weights are integers in the range from 1 to W for some constant W?
I can see the logic behind the answers I'm thinking of and what I'm finding online (i.e., sort the edge weights using a linear sort, change the data structure being used, etc.), so I don't need help answering it. But I'm wondering why there is a difference between the answer when the range is 1 to |V| and when it is 1 to W. Why ask the same question twice? If it's some constant W, it could literally be anything. But honestly, so could |V| - we could have a crazy large graph, or a very small one. I'm not sure how the two questions posed in this problem are different, and why I need two separate approaches for them.
There's a difference in complexity between an algorithm that runs in O(V) time and one that runs in O(W) time for constant W. Sure, V could be anything, as could W, but that's not really the point: one is linear, the other is O(1). The question is then for which algorithms a restricted range of edge weights could impact complexity (based, as you suggest, on edge-weight sort time and choice of data structure), and what the new optimal complexity would actually be for linearly bounded edge weights versus edge weights bounded by a constant W.
Having bounded edge weights could open up new possibilities for sorting algorithms for Kruskal's, and might change the data structure you'd want to use to implement the queue for Prim's, along with the best way to implement the extract-min and update-key operations for that queue. How tightly the edge weights are bounded can determine whether a particular change of data structure or implementation is even beneficial in terms of final complexity.
For example, knowing that the n elements of a list are bounded in value by a constant W means that a switch to radix sort would improve the asymptotic complexity of sorting them, but if I only knew that they were bounded in value by 2^n, there would be no advantage in switching to radix sort over the traditional comparison sorts and their O(n log n) sorting complexity.
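As a sketch of that idea, bounded edge weights let you replace the comparison sort in Kruskal's algorithm with a counting sort. This is only an illustration (the function name and example edges are mine): it runs in O(E + W), which is O(E) for constant W and O(E + V) for weights bounded by |V|.

def counting_sort_edges(edges, max_weight):
    # Bucket the (u, v, w) edges by weight, then read the buckets out in
    # order: O(E + max_weight) instead of the comparison-sort O(E log E).
    buckets = [[] for _ in range(max_weight + 1)]
    for u, v, w in edges:
        buckets[w].append((u, v, w))
    return [e for bucket in buckets for e in bucket]

edges = [(0, 1, 3), (1, 2, 1), (0, 2, 3), (2, 3, 2)]
print(counting_sort_edges(edges, 3))
# [(1, 2, 1), (2, 3, 2), (0, 1, 3), (0, 2, 3)]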

Why is the Inverse Ackermann function used to describe complexity of Kruskal's algorithm?

In a class on the analysis of algorithms, we are presented with the pseudocode for Kruskal's algorithm.
He then states the following, for disjoint-set forests:
A sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which
are MAKE-SET operations, can be performed on a disjoint-set forest
with union by rank and path compression in worst-case time O(m α(n)).
Used to compute the complexity of Step 2, and steps 5-8
For connected G: |E| ≥ |V| -1; m = O(V + E), n = O(V);
So Steps 2, 5-8: O((V + E) α(V)) = O(E α(V))
α(V) = O(lg V) = O(lg E); so we obtain O(E lg E) ----- // how is α(V) equal here?
Kruskal: Steps 3, 5-8, and step 4: O(E lg E)
Observe: |E| < |V|² -> lg E = O(lg V)
So, Kruskal complexity: O(E lg V)
I have attempted to understand the logic behind this "alpha(n)"/"α(n)" function, and from what I've read it seems that, simplistically, the Ackermann function is one that grows incredibly fast (far faster than exponentially), and the inverse is one that grows incredibly slowly.
If my interpretation is correct, what does "α(n)" represent? Does it mean that MAKE-SET operations are at most O(lg n)? How/Why is using inverse-Ackermann necessary? I was under the impression this operation is performed V times (for each vertex). Following this, α(V) is also simplified to O(lg V) = O(lg E), does this mean that, at a maximum, α(V) may be represented by O(lg V)?
Also, why is the statement |E| < |V|^2 -> lg E = O(lg V) made? How is it known that |E| < |V|^2?
I think my question really boils down to, why is it that a "forest" representation of disjoint sets seems to be more efficient than those implemented with linked lists when my lecturer states they are both O(E log V)? Therefore is there a point in the increased difficulty of implementing disjoint sets with forests?
α(V) = O(lg V) is a common abuse of notation; really we have α(V) ∈ O(lg V) (the inverse Ackermann of V is a member of the set of functions O(lg V)). They're not equal; they're not even the same type of thing - one is a function and the other is a set of functions.
how is it known that |E| < |V|²?
How many edges does a complete undirected graph have? |V|(|V| - 1)/2, which is less than |V|², and you can't have more than that. You could in a multigraph, but that's not what the algorithm operates on, and it's useless to extend it to multigraphs - just throw out all but the best edge between each pair of nodes.
why is it that a "forest" representation of disjoint sets seems to be more efficient than those implemented with linked lists when my lecturer states they are both O(E log V)?
This is a weird thing to ask for several reasons. First, you're effectively measuring the efficiency of disjoint sets through Kruskal's algorithm, not on its own. The "they" in your question refers to two implementations of Kruskal's algorithm. Secondly, as you surely realized, the derivation of the upper bound used α(V) ∈ O(lg V), so it deliberately ignores a significant difference. That makes sense, because the time complexity is asymptotically dominated by the sorting step, but just because a difference is invisible in a big O doesn't mean it isn't there.
Therefore is there a point in the increased difficulty of implementing disjoint sets with forests?
There is no increased difficulty really. It's a super easy data structure that you can write in 5 minutes, just two arrays and some simple code - linked lists may actually be harder, especially if you have to do manual memory management. Note that outside the context of Kruskal's algorithm, the difference is huge in terms of both asymptotic time and actual time.
But even in the context of Kruskal's algorithm, improving the second stage of the algorithm obviously makes the total time better, even if it doesn't show in the worst-case asymptotic time. FWIW, you can improve the first stage too: use a heap (or one of its fancier drop-in replacements) and only heapify the edges, in linear time. The second stage of the algorithm will then extract them one by one, but, crucially, you typically don't have to extract every edge - you can keep track of how many disjoint sets are left and stop when it drops to 1, potentially leaving many (even most) edges unused. In the worst case that doesn't help, but in real life it does. And in special cases, when any of the fast sorts (counting sort, bucket sort, etc.) apply, you can sort the edges faster than O(E log E).
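To make the "two arrays and some simple code" concrete, here is a minimal sketch of a disjoint-set forest with union by rank and path compression, plus the second stage of Kruskal's algorithm with the early-stopping idea described above. The names are mine, and it assumes the edges are already available in increasing weight order.

class DisjointSet:
    # Disjoint-set forest with union by rank and path compression:
    # just two arrays (parent and rank) and a little code.
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression pass
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                   # already in the same set
        if self.rank[ra] < self.rank[rb]:  # union by rank: attach the
            ra, rb = rb, ra                # shorter tree under the taller
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

def kruskal_second_stage(n, sorted_edges):
    # Scan edges in increasing weight order; stop early once n - 1 edges
    # have been accepted, i.e. only one disjoint set is left.
    ds, mst = DisjointSet(n), []
    for u, v, w in sorted_edges:
        if ds.union(u, v):
            mst.append((u, v, w))
            if len(mst) == n - 1:
                break
    return mst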

Time complexity of one algorithm cascaded into another?

I am working with random forests for a supervised classification problem, and I am using the k-means clustering algorithm to split the data at each node. I am trying to calculate the time complexity of the algorithm. From what I understand, the time complexity of k-means is
O(n · K · I · d )
where
n is the number of points,
K is the number of clusters,
I is the number of iterations, and
d is the number of attributes.
K, I, and d are constants or have an upper bound, and n is much larger than these three, so I suppose the complexity is just O(n).
The random forest, on the other hand, is a divide-and-conquer approach, so for n instances the complexity is O(n · log n), though I am not sure about this; correct me if I am wrong.
To get the complexity of the whole algorithm, do I just add these two together?
In this case, you don't add the values together. If you have a divide-and-conquer algorithm, the runtime is determined by a combination of
The number of subproblems made per call,
The sizes of those subproblems, and
The amount of work done per problem.
Changing any one of these parameters can wildly impact the overall runtime of the function. If you increase the number of subproblems made per call by even a small amount, you exponentially increase the total number of subproblems, which can have a large impact overall. Similarly, if you increase the work done per level, then because there are so many subproblems the runtime can swing wildly. Check out the Master Theorem as an example of how to determine the runtime based on these quantities.
In your case, you are beginning with a divide-and-conquer algorithm where all you know is that the runtime is O(n log n) and are adding in a step that does O(n) work per level. Just knowing this, I don't believe it's possible to determine what the runtime will be. If, on the other hand, you make the assumption that
The algorithm always splits the input into two smaller pieces,
The algorithm recursively processes those two pieces independently, and
The algorithm uses your O(n) algorithm to determine which split to make
Then you can conclude that the runtime is O(n log n), since this is the solution to the recurrence given by the Master Theorem.
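For example, under those three assumptions the recurrence would be T(n) = 2·T(n/2) + O(n): two subproblems of half the size, plus O(n) splitting work per call. This is case 2 of the Master Theorem (a = 2, b = 2, f(n) = Θ(n) = Θ(n^(log_2 2))), which solves to T(n) = O(n log n).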
Without more information about the internal workings of the algorithm, though, I can't say for certain.
Hope this helps!

What is the meaning of O(M+N)?

This is a basic question... but I'm thinking that O(M+N) is the same as O(max(M,N)), since the larger term should dominate as we go to infinity? Also, that would be different from O(min(M,N)), is that right? I keep seeing this notation, esp. when discussing graph algorithms. For example, you routinely see: O(|V| + |E|) (e.g., http://algs4.cs.princeton.edu/41undirected/).
Yes, O(M+N) means the same thing as O(max(M, N)). That is different from O(min(M, N)). As @Dr_Asik says, O(M+N) is technically linear O(N), but when M and N have a meaning, it is nice to be able to say "linear in what?" Imagine the algorithm is linear in the number of rows and the number of columns. We can either define N = rows + cols and say O(N), or we can say O(M+N) where M is rows and N is columns.
Linear time is denoted O(N). Since M+N is a linear function, it should simply be denoted O(N) as well. Likewise, there is no sense in comparing O(1) to O(2), O(10), etc.; they're all constant time and should all be denoted O(1).
I know this is an old thread, but as I am studying this now I figured I would add my two cents for those currently searching similar questions.
I would argue that O(n+m), in the context of a graph represented as an adjacency list, is exactly that and cannot be changed for the following reasons:
1) O(n+m) = O(n) + O(m), but O(m) is upper bounded by O(n^2), so that
O(n+m) = O(n) + O(n^2) = O(n^2). However, this is purely in terms of n, that is, it only takes the vertices into account and gives a weak upper bound (weak because it tries to represent the edges with vertices). It does show, though, that O(n) does not equal O(n+m), as there COULD be a quadratic number of edges compared to vertices.
2) Saying O(n+m) takes into account all the elements that have to be passed through when implementing an algorithm such as Breadth-First Search (BFS). Since it takes every element in the graph into account exactly once, it can be considered linear, and it is a stricter analysis than upper bounding the edges with n^2. One could, for the sake of notation, write something like n = |V| + |E| and say BFS runs in O(n), which gives the reader a sense of linearity, but generally, as the OP has mentioned, it is written as O(n+m) where
n = |V| and m = |E|.
Thanks a lot, hope this helps someone.

Resources