I am comparing two algorithms, Prim's and Kruskal's.
I understand the basic concept of time complexity and when the two work best (sparse/dense graphs).
I found this on the Internet, but I am struggling to convert it to English.
dense graph:  Prim = O(N^2)
              Kruskal = O(N^2 * log(N))
sparse graph: Prim = O(N^2)
              Kruskal = O(N * log(N))
It's a bit of a long shot, but could anyone explain what is going on here?
Prim is O(N^2), where N is the number of vertices.
Kruskal is O(E log E), where E is the number of edges. The "E log E" comes from sorting the edges with a good algorithm; you can then process them in time linear in E.
In a dense graph, E ~ N^2. So Kruskal would be O( N^2 log N^2 ), which is simply O( N^2 log N ), since log N^2 = 2 log N.
OK, here goes. O(N2) (the 2 means squared, i.e. O(N^2)) means that the running time of the algorithm for large N grows as the square of N - so a graph twice the size will take four times as long to compute.
The Kruskal rows are merely simplified, and assume that E = c * N^2 in the dense case (and E = c * N in the sparse case). c here is presumably a constant that we can assume to be significantly smaller than N as N gets large. You need to know the following laws of logarithms: log(ab) = log a + log b and log(a^n) = n * log a. These two, combined with the fact that log c << log N (it is much smaller and can be ignored), should let you understand the simplifications there.
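Spelled out for the dense row, using the E = c * N^2 assumption above:

    Kruskal = O(E log E)
            = O(c * N^2 * log(c * N^2))
            = O(c * N^2 * (log c + 2 * log N))
            = O(N^2 * log N)

The constants c and 2 drop out, and log c is negligible next to log N. The sparse row is the same calculation with E = c * N, giving O(N * log N).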
Now, as for the original expressions and where they were derived from, you'd need to check the page you got them from. But I'm assuming that if you're looking at Prim's and Kruskal's, you will be able to understand the derivation - or at least that, if you can't, my explaining it to you is not actually going to help you in the long run...
Kruskal is sensitive to the number of edges (E) in a graph, not the number of nodes.
Prim, however, is only affected by the number of nodes (N), evaluating to O(N^2).
This means that in dense graphs, where the number of edges is almost N^2 (all nodes connected), its O(E*log(E)) complexity is roughly equivalent to O(N^2*log(N)).
The c is a constant to account for the 'almost' and is irrelevant in O notation. Also, log(N^2) is of the same order of magnitude as log(N), since log(N^2) = 2*log(N) and the constant factor 2 disappears in O notation, leaving O(log(N)).
In a sparse graph E is closer to N giving you O(N*log(N)).
The thought is that in a dense graph, the number of edges is O(N^2), while in sparse graphs the number of edges is O(N). So they're taking the O(E lg E) and expanding it with this approximation of E in order to compare it directly to the running time of Prim's, O(N^2).
Basically, it's showing that Kruskal's is better for sparse graphs and Prim's is better for dense graphs.
The two algorithms have big-O defined for different inputs (nodes and edges). So they are converting one to the other to compare them.
N is the number of nodes in the graph; E is the number of edges.
For a dense graph there are O(N^2) edges.
For a sparse graph there are O(N) edges.
Constants are of course irrelevant for big-O, hence the c drops out.
First: n is the number of vertices.
Prim is O(n^2); that part is easy enough.
Kruskal is O(E log(E)), where E is the number of edges. In a dense graph, there are as many as n choose 2 edges, which is roughly n^2 (actually it's n(n-1)/2, but who's counting?). So it's roughly n^2 log(n^2), which is 2 n^2 log n, which is O(n^2 log n), which is bigger than O(n^2).
In a sparse graph, there are as few as n edges, so we have n log n, which is less than O(n^2).
Different algorithms have different time complexities, and I've been quite curious about this one.
O(m+n) represents a linear function, just like O(m) or O(n), which also represent linear functions. How is O(m+n) any different from O(m) or O(n)? They all represent linear time. In the case of O(n) or O(m), we neglect the other terms and just take the highest-degree term. Even in the case of an equation like T(n) = n + 1 + n + 1, we make T(n) = 2n and thus call it O(n). Either way, we do not take the other parts of the equation into account.
I did read some articles on this and I didn't quite understand them, because according to those articles (or maybe I misinterpreted), m and n stand for two variables i and j; but if that's the case, then why do we write two-pointer algorithms as O(n^2)?
All this is very confusing to me; please explain the difference.
m and n might have very different values; that is why O(m+n) is different from O(m) or O(n) (but it is the same as O(max(m, n))).
Simple example:
Breadth-first search on graphs has complexity O(V+E) where V is vertex count, E is edge count.
For dense graphs E might be as large as V*(V-1)/2, so E~V^2 and we cannot say that complexity is O(V) - in this case it is O(V^2).
On the other side are very sparse graphs, where E is very small compared with V. In this case we cannot say the complexity is O(E); it is O(V).
And O(E+V) is valid in all cases.
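For illustration, here is a minimal BFS sketch in Python; it assumes the graph is an adjacency structure mapping each vertex to an iterable of its neighbours (the names are only illustrative):

    from collections import deque

    def bfs_order(adj, start):
        """Visit everything reachable from start in breadth-first order.
        Each vertex is enqueued at most once (the O(V) part) and each adjacency
        list is scanned once (the O(E) part), hence O(V + E) overall."""
        visited = {start}
        order = []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:              # each edge is examined a constant number of times
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
        return order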
In a class for analysis of algorithms, we are presented with this pseudocode for Kruskal's algorithm:
He then states the following, for disjoint-set forests:
A sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which
are MAKE-SET operations, can be performed on a disjoint-set forest
with union by rank and path compression in worst-case time O(m α(n)).
Used to compute the complexity of Step 2 and Steps 5-8:
For connected G: |E| ≥ |V| -1; m = O(V + E), n = O(V);
So Steps 2, 5-8: O((V + E) α(V)) = O(E α(V))
α(V) = O(lg V) = O(lg E); so we obtain O(E lg E) ----- // how is α(V) equal here?
Kruskal: Steps 3, 5-8, and step 4: O(E lg E)
Observe: |E| < |V|^2 -> lg E = O(lg V)
So, Kruskal complexity: O(E lg V)
I have attempted to understand the logic behind this "alpha(n)"/"α(n)" function, and from what I've read it seems that, simplistically, the Ackermann function is one that grows incredibly fast (far faster than exponential), and its inverse is one that grows incredibly slowly.
If my interpretation is correct, what does "α(n)" represent? Does it mean that MAKE-SET operations are at most O(lg n)? How/why is using the inverse Ackermann function necessary? I was under the impression this operation is performed V times (for each vertex). Following this, α(V) is also simplified to O(lg V) = O(lg E); does this mean that, at a maximum, α(V) may be represented by O(lg V)?
Also, why is the |E| < |V|^2 -> lg E = O(lg V) statement made? How is it known that |E| < |V|^2?
I think my question really boils down to, why is it that a "forest" representation of disjoint sets seems to be more efficient than those implemented with linked lists when my lecturer states they are both O(E log V)? Therefore is there a point in the increased difficulty of implementing disjoint sets with forests?
α(V) = O(lg V) is a common abuse of notation; really we have α(V) ∈ O(lg V) (the inverse Ackermann function of V is a member of the set of functions O(lg V)). They're not equal; they're not even the same type of object: one is a function and the other is a set of functions.
how is it known that |E| < |V|²?
How many edges does a complete undirected graph have? |V|(|V| - 1)/2, which is less than |V|², and you can't have more than that. You could in a multigraph, but that's not what the algorithm operates on, and it's useless to extend it to multigraphs - just throw out all but the best edge between each pair of nodes.
why is it that a "forest" representation of disjoint sets seems to be more efficient than those implemented with linked lists when my lecturer states they are both O(E log V)?
This is a weird thing to ask for several reasons. First, you're effectively measuring the efficiency of disjoint sets through Kruskal's algorithm, not on its own. The "they" in your question is two implementations of Kruskal's algorithm. Secondly, as you surely realized, the derivation of an upper bound used α(V) ∈ O(lg V), so it deliberately ignores a significant difference. That makes sense, because the time complexity is asymptotically dominated by the sorting step, but just because a difference is invisible in a big O doesn't mean it isn't there.
Therefore is there a point in the increased difficulty of implementing disjoint sets with forests?
There is no increased difficulty really. It's a super easy data structure that you can write in 5 minutes, just two arrays and some simple code - linked lists may actually be harder, especially if you have to do manual memory management. Note that outside the context of Kruskal's algorithm, the difference is huge in terms of both asymptotic time and actual time.
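For concreteness, a minimal array-based sketch of that forest in Python (union by rank plus path compression, as in the theorem quoted above); the names are only illustrative:

    class DisjointSet:
        """Disjoint-set forest; parent and rank are the 'two arrays' mentioned above."""
        def __init__(self, n):
            self.parent = list(range(n))      # every element starts as its own root
            self.rank = [0] * n

        def find(self, x):
            root = x
            while self.parent[root] != root:  # walk up to the root
                root = self.parent[root]
            while self.parent[x] != root:     # path compression: repoint the path at the root
                self.parent[x], x = root, self.parent[x]
            return root

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return False                  # already in the same set
            if self.rank[rx] < self.rank[ry]: # union by rank: hang the shallower tree
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1
            return True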
But even in the context of Kruskal's algorithm, improving the second stage of the algorithm obviously makes the total time better, even if it doesn't show in the worst-case asymptotic time. FWIW you can improve the first stage too: you can use a heap (or one of its fancier drop-in replacements) and only heapify the edges in linear time. Then the second stage of the algorithm will extract them one by one but, crucially, you typically don't have to extract every edge - you can keep track of how many disjoint sets are left and stop when it drops to 1, potentially leaving many (even most) edges unused. In the worst case that doesn't help, but in real life it does. And in special cases you can sort the edges faster than O(E log E), when any of the fast sorts (counting sort, bucket sort, etc.) apply.
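A rough sketch of that heap-based variant with early stopping, reusing the DisjointSet sketch above; the (weight, u, v) edge format and the 0..n-1 vertex numbering are assumptions made for the example:

    import heapq

    def kruskal_mst(n, edges):
        """edges is a list of (weight, u, v) tuples. Heapify is O(E); each extracted
        edge costs O(log E); each union/find is effectively constant (inverse Ackermann)."""
        edges = list(edges)                   # don't mutate the caller's list
        heapq.heapify(edges)                  # first stage: linear-time heap construction
        ds = DisjointSet(n)
        mst = []
        components = n
        while edges and components > 1:       # stop as soon as one tree remains
            w, u, v = heapq.heappop(edges)
            if ds.union(u, v):                # the edge joins two different trees
                mst.append((u, v, w))
                components -= 1
        return mst

Swapping the heapify/heappop pair for a plain edges.sort() gives the usual sort-first O(E log E) version.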
So I'm teaching myself some graph algorithms, now on Kruskal's, and understand that it's recommended to use union-find so checking whether adding an edge creates a cycle only takes O(Log V) time. For practical purposes, I see why you'd want to, but strictly looking through Big O notation, does doing so actually affect the worst-case complexity?
My reasoning: If instead of union find, we did a DFS to check for cycles, the runtime for that would be O(E+V), and you have to perform that V times for a runtime of O(V^2 + VE). It's more than with union find, which would be O(V * LogV), but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times, which is O(E * logE), the Big O answer. I don't really see a space advantage either since the union-find takes O(V) space and so too do the data structures you need to maintain to find a cycle using DFS.
So a probably overly long explanation for a simple question: Does using union-find in Kruskal's algorithm actually affect worst-case runtime?
and understand that it's recommended to use union-find so checking whether adding an edge creates a cycle only takes O(Log V) time
This isn't right. Using union-find is O(alpha(n) * m) for a sequence of m operations, where alpha(n) is the inverse of the Ackermann function and, for all intents and purposes, can be considered constant. So, much faster than logarithmic:
Since alpha(n) is the inverse of this function, alpha(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.
but the bulk of the complexity of Kruskal's comes from deleting the minimum element of the priority queue E times
This is also wrong. Kruskal's algorithm does not involve using any priority queues; it involves sorting the edges by cost at the beginning. The complexity of that step remains the one you mention, though. However, sorting might be faster in practice than a priority queue (using a priority queue will, at best, be equivalent to a heap sort, which is not the fastest sorting algorithm).
Bottom line, if m is the number of edges and n the number of nodes:
Sorting the edges: O(m log m).
For each edge, calling union-find: O(m * alpha(n)), or basically just O(m).
Total complexity: O(m log m + m * alpha(n)).
If you don't use union-find, total complexity will be O(m log m + m * (n + m)), if we use your O(n + m) cycle finding algorithm. Although O(n + m) for this step is probably an understatement, since you must also update your structure somehow (insert an edge). The naive disjoint-set algorithm is actually O(n log n), so even worse.
Note: in this case, you can write log n instead of log m if you prefer, because m = O(n^2) and log(n^2) = 2log n.
In conclusion: yes, union-find helps a lot.
Even if you use the O(log n) variant of union-find, which would lead to O(m log m + m log n) total complexity (which you could simplify to O(m log m)), in practice you'd rather keep the second part faster if you can. Since union-find is very easy to implement, there's really no reason not to.
I have an algorithm that takes a DAG with n nodes and, for every node, does a binary search on its adjacent nodes. To the best of my knowledge this would be an O(n log n) algorithm; however, since the n inside the log corresponds only to the adjacency list of a node, I was wondering whether this should rather be O(n log m). By m I mean the nodes adjacent to each node (which would intuitively and often be far fewer than n).
Why not O(n log m)? I would say O(n log m) doesn't make sense because m is not technically a size of the input, n is. Besides, in the worst case m can be n, since a node could easily be connected to all the others. Correct?
There are two cases here:
m, the number of adjacent nodes, is bounded by a constant C, and
m, the number of adjacent nodes, is bounded only by n, the number of nodes.
In the first case the complexity is O(n), because log(C) is a constant. In the second case it's O(n*log(n)), for the reason you explained in your question (i.e. "m can be n").
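As a rough illustration of the pattern being analysed: each node's adjacency list is assumed here to be a sorted Python list, and since the question doesn't say what is searched for, the per-node lookup targets below are purely hypothetical:

    from bisect import bisect_left

    def has_neighbour(adj, u, v):
        """Binary-search u's sorted adjacency list for v: O(log m) with m = len(adj[u])."""
        neighbours = adj[u]
        i = bisect_left(neighbours, v)
        return i < len(neighbours) and neighbours[i] == v

    def count_hits(adj, targets):
        """One binary search per node: n searches of O(log m) each, so O(n log m),
        which is O(n log n) in the worst case because m <= n - 1."""
        return sum(has_neighbour(adj, u, targets[u]) for u in range(len(adj)))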
Big O notation provides an upper bound on an algorithm's complexity, so since m equals n in the worst case (n - 1 to be precise), the correct complexity would be O(n log n).
There are certainly DAGs where one node is connected to every other node. Another example would be a DAG with nodes numbered 0, 1, 2, ..., n, where every node has an edge leading to all higher-numbered nodes.
There is precedent for giving a complexity estimate which depends on more than one parameter - http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm quotes a cost of O(|E| + |V| log(|V|)). In some cases this might be useful information.
It is correct that in the worst case a node can have n-1 neighbours, meaning it is connected to every other node, but if that were true of every node then the graph would not be acyclic.
Therefore the average number of neighbours per node is less than n.
The maximum number of edges in a DAG is: (n-1)n/2
If we look at each node, it will have an average of (n-1)/2 neighbours.
So your complexity would still remain O(n log n) in the worst case.
First: The general running time of Dijkstra's shortest-path algorithm is
where m is the number of edges and n the number of vertices
Second: the expected number of decreaseKey operations is the following
Third: The expected running time of Dijkstra with a binary heap, which allows all operations in O(log n) time, is O(m + n log(m/n) log(n)).
But why is the running time on dense graphs linear if we consider a graph dense when m = Ω(n log(n) log(log(n)))?
Can someone help with the O-notation and log calculations here?
First, it isn't hard to show that if m is big omega of n log(n) log(log(n)), then log(n) is big omega of log(m). Therefore you can show that m is big omega of n log(m) log(log(m)).
From this you can show that n is big O of m / (log(m) log(log(m))).
Substitute this back into the expression you have in the third point and we get that the expected running time is:
O(m + n log(m/n) log(n))
= O(m + (m / (log(m) log(log(m)))) * log(log(m) log(log(m))) * log(m / (log(m) log(log(m)))))
From here you can expand all of the logs of products into sums of logs. You'll get a lot of terms. And then it is just a question of demonstrating that every one is O(m) or o(m). Which is straightforward, though tedious.
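Concretely, the dominant product expands roughly as follows (keeping only the leading term of each log):

    (m / (log(m) log(log(m)))) * (log(log(m)) + log(log(log(m)))) * (log(m) - log(log(m)) - log(log(log(m))))
      ~ (m / (log(m) log(log(m)))) * log(log(m)) * log(m)
      = m

Every other term from multiplying out swaps one of those leading factors for a strictly smaller one, so each of them is o(m), and the whole expression is O(m + m) = O(m), i.e. linear.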
My solution is now the following: