Running time of minimum spanning tree? (Prim's method) - algorithm

I have written code that solves MST using Prim's method. I read that this kind of implementation (using a priority queue) should have O(E + V log V) = O(V log V), where E is the number of edges and V the number of vertices, but when I look at my code it simply doesn't look that way. I would appreciate it if someone could clear this up for me.
To me it seems the running time is this:
The while loop runs O(E) times (until we have gone through all the edges).
Inside that loop we extract an element from Q, which takes O(log E) time.
The inner for loop takes O(V) time (although we don't run this loop every time,
it is clear that it will be run V times in total, since we have to add all the vertices).
My conclusion would be that the running time is O(E(log E + V)) = O(E*V).
This is my code:
#include <queue>
#include <vector>
#include <functional>
#include <utility>
using namespace std;

#define p_int pair<int, int>

int N, M;                      //N - nmb of vertices, M - nmb of edges
int graph[100][100] = { 0 };   //adj. matrix
bool in_tree[100] = { false }; //if a node is in the mst
priority_queue< p_int, vector<p_int>, greater<p_int> > Q;
/*
Keeps track of the smallest edge connecting a node in the mst tree to a
node outside the tree. The first part of the pair is the weight of the
edge and the second is the node. We don't remember the parent node
because we don't need it :-)
*/
int mst_prim()
{
    Q.push( make_pair( 0, 0 ) );
    int nconnected = 0;
    int mst_cost = 0;
    while( nconnected < N )
    {
        p_int node = Q.top(); Q.pop();
        if( in_tree[ node.second ] == false )
        {
            mst_cost += node.first;
            in_tree[ node.second ] = true;
            for( int i = 0; i < N; ++i )
                if( graph[ node.second ][i] > 0 && in_tree[i] == false )
                    Q.push( make_pair( graph[ node.second ][i], i ) );
            nconnected++;
        }
    }
    return mst_cost;
}

You can use adjacency lists to speed your solution up (though not for dense graphs), but even then you are not going to get O(V log V) without a Fibonacci heap.
Maybe Kruskal's algorithm would be simpler for you to understand. It features no priority queue; you only have to sort an array of edges once. Basically, it goes like this:
Insert all edges into an array and sort them by weight
Iterate over the sorted edges, and for each edge connecting nodes i and j, check whether i and j are already connected. If they are, skip the edge; otherwise add the edge into the MST.
The only catch is to be able to quickly tell whether two nodes are connected. For this you use the Union-Find data structure, which goes like this:
const int MAX_NODES = 100; // as large as your node count
int T[MAX_NODES];          // parent array; -1 means the node is its own root

int getParent(int a)
{
    if (T[a] == -1) return a;
    return T[a] = getParent(T[a]); // path compression
}

void Unite(int a, int b) // call with roots, e.g. Unite(getParent(A), getParent(B))
{
    if (rand() & 1)
        T[a] = b;
    else
        T[b] = a;
}
In the beginning, just initialize T to all -1. Then, every time you want to find out whether nodes A and B are connected, just compare their roots - if they are the same, the nodes are connected (that is, getParent(A) == getParent(B)). When you insert an edge into the MST, make sure to update the Union-Find with Unite(getParent(A), getParent(B)).
The analysis is simple: you sort the edges and iterate over them, using the Union-Find whose operations take practically O(1) (amortized, thanks to the path compression). So it is O(E log E + E), which equals O(E log E). A sketch putting it all together follows.
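For concreteness, here is a minimal Kruskal sketch built on top of that Union-Find; the Edge struct, the mst_kruskal name, and the MAX_NODES constant are mine, purely for illustration:
#include <algorithm>
#include <cstdlib>
#include <vector>
using namespace std;

const int MAX_NODES = 100;
int T[MAX_NODES]; // parent array; -1 means the node is its own root

int getParent(int a)
{
    if (T[a] == -1) return a;
    return T[a] = getParent(T[a]); // path compression
}

void Unite(int a, int b) // a and b must be roots
{
    if (rand() & 1)
        T[a] = b;
    else
        T[b] = a;
}

struct Edge { int w, u, v; }; // weight and the two endpoints

int mst_kruskal(int n, vector<Edge> edges)
{
    fill(T, T + n, -1);              // every node starts as its own root
    sort(edges.begin(), edges.end(), // sort by weight: O(E log E)
         [](const Edge& a, const Edge& b) { return a.w < b.w; });
    int mst_cost = 0;
    for (const Edge& e : edges) {
        int ru = getParent(e.u), rv = getParent(e.v);
        if (ru != rv) {      // endpoints not yet connected: take the edge
            mst_cost += e.w;
            Unite(ru, rv);
        }
    }
    return mst_cost;
}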
That is it ;-)

I did not have to deal with the algorithm before, but what you have implemented does not match the algorithm as explained on Wikipedia. The algorithm there works as follows:
Put all vertices into the queue. O(V)
While the queue is not empty... O(V)
Take the vertex with the minimum weight from the queue. O(log(V))
Update the weights of adjacent vertices. O(E / V), this is the average number of adjacent vertices.
Reestablish the queue structure. O(log(V))
This gives
O(V) + O(V) * (O(log(V)) + O(E / V))
= O(V) + O(V) * O(log(V)) + O(V) * O(E / V)
= O(V) + O(V * log(V)) + O(E)
= O(V * log(V)) + O(E)
exactly what one expects.
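Here is a minimal sketch of that textbook version, assuming a connected graph given as an adjacency list adj[u] of (neighbour, weight) pairs (all names are illustrative). With std::set the decrease-key costs O(log V) instead of O(1), so this sketch runs in O((V + E) log V); the O(V log V + E) bound above really requires a Fibonacci heap, as the earlier answer noted.
#include <climits>
#include <set>
#include <utility>
#include <vector>
using namespace std;

int mst_prim_textbook(int n, const vector<vector<pair<int,int>>>& adj)
{
    vector<int> key(n, INT_MAX); // cheapest known edge into each vertex
    vector<bool> done(n, false);
    key[0] = 0;
    set<pair<int,int>> q; // (key, vertex) pairs, ordered by key
    for (int v = 0; v < n; ++v)
        q.insert({key[v], v});           // put all vertices into the queue: O(V)
    int mst_cost = 0;
    while (!q.empty()) {                 // O(V) iterations
        int u = q.begin()->second;       // minimum-key vertex: O(log V)
        q.erase(q.begin());
        done[u] = true;
        mst_cost += key[u];
        for (const auto& edge : adj[u]) { // on average O(E/V) neighbours
            int to = edge.first, w = edge.second;
            if (!done[to] && w < key[to]) {
                q.erase({key[to], to});   // emulated decrease-key: O(log V)
                key[to] = w;
                q.insert({key[to], to});
            }
        }
    }
    return mst_cost;
}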

Related

Time Complexity of Printing a Graph in Adjacency List Representation

What is the order of growth of the running time of the following code if the graph uses an adjacency-list representation, where V is the number of vertices, and E is the total number of edges?
// G.V() returns number of vertices, G is the graph.
for (int v = 0; v < G.V(); v++) {
    for (int w : G.adj(v)) {
        System.out.println(v + "-" + w);
    }
}
Why is the time complexity of the above code Theta(V+E), where V is the number of vertices and E is the number of edges?
I believe that if we let printing be the cost function, then it should be Theta(sum of degrees of each v) = Theta(2E) = Theta(E) because we enter the inner loop deg(v) times for vertex v.
if we let printing be the cost function, then
Under that assumption, yes, there will be Theta(E) println calls.
However, the execution time generally does not depend only on printing, but also on other instructions such as v++; there will be Theta(V+E) of them in total.
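As an extreme case, take a graph with V vertices and no edges: there are zero println calls, so Theta(E) alone would predict almost no work, yet the loop test and v++ still execute V times. Theta(V + E) = Theta(V) accounts for exactly that.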

Dijkstra time complexity with C++ pq

Dijkstra's time complexity is O(V + E log V) with binary heaps.
But the C++ pq (if used as a binary heap) does not support decrease-key. One suggested solution is to just insert the same vertex again into the pq with the decreased distance. For example:
From: https://www.hackerearth.com/practice/algorithms/graphs/shortest-path-algorithms/tutorial/
void dijkstra(){
    // the vertex distances are assumed to be initialized to infinity
    memset(vis, false, sizeof vis);      // mark every vertex as unvisited
    dist[1] = 0;
    multiset< pair<int,int> > s;         // a multiset does the job of a min-priority queue
    s.insert({0, 1});                    // insert the source node with distance = 0
    while(!s.empty()){
        pair<int,int> p = *s.begin();    // pop the vertex with the minimum distance
        s.erase(s.begin());
        int x = p.second;                // the vertex itself; p.first is its distance
        if( vis[x] ) continue;           // skip if the popped vertex was visited before
        vis[x] = true;
        for(int i = 0; i < v[x].size(); i++){
            int e = v[x][i].first;       // the neighbouring vertex
            int w = v[x][i].second;      // the weight of the connecting edge
            if( dist[x] + w < dist[e] ){ // check if the next vertex's distance can be decreased
                dist[e] = dist[x] + w;
                s.insert({dist[e], e});  // insert the next vertex with the updated distance
            }
        }
    }
}
With this implementation the complexity should increase (as opposed to the O(V + E log V) claimed in the article), as the heap can grow beyond size V. I believe the complexity should be O(V + E log E).
Am I correct? If not, what should be correct complexity?
Those bounds are actually equivalent for simple connected graphs. Since
|V| − 1 ≤ |E| ≤ |V| (|V| − 1)/2,
we can take logs and find that
log(|V|) − O(1/|V|) ≤ log(|V| − 1) ≤ log(|E|) ≤ log (|V| (|V| − 1)/2) ≤ 2 log(|V|),
hence Θ(log(|V|)) = Θ(log(|E|)).

Time complexity of adjacency list representation?

I am going through this link for adjacency list representation.
http://www.geeksforgeeks.org/graph-and-its-representations/
I have a simple doubt about one part of the code, which follows:
// A utility function to print the adjacency list representation of a graph
void printGraph(struct Graph* graph)
{
    int v;
    for (v = 0; v < graph->V; ++v)
    {
        struct AdjListNode* pCrawl = graph->array[v].head;
        printf("\n Adjacency list of vertex %d\n head ", v);
        while (pCrawl)
        {
            printf("-> %d", pCrawl->dest);
            pCrawl = pCrawl->next;
        }
        printf("\n");
    }
}
Here, for every vertex, the while loop is executed say d times, where d is the degree of that vertex.
So, I think the time complexity is like
d0 + d1 + d2 + ... + d(V-1), where di is the degree of vertex i.
All this sums to O(E), but the link says the time complexity is O(|V| + |E|).
Not sure what the problem with my understanding is. Some help needed here.
The important thing here is that for the time complexity to be valid, we need to cover every possible situation:
The outer loop is executed O(|V|) regardless of the graph structure.
Even if we had no edges at all, for every iteration of the outer loop we would still have to do a constant number of operations (O(1)).
The inner loop is executed once for every edge, thus O(deg(v)) times, where deg(v) is the degree of the current node.
Thus the runtime of a single iteration of the outer loop is O(1 + deg(v)). Note that we cannot leave out the 1, because deg(v) might be 0 but we still need to do some work in that iteration
Summing it all up, we get a runtime of O(|V| * 1 + deg(v1) + deg(v2) + ...) = O(|V| + |E|).
As mentioned before, |E| could be rather small, so we still need to account for the work done exclusively in the outer loop. Thus, we cannot simply remove the |V| term.
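Spelled out as a sum, and using the fact that each edge contributes to exactly two degrees (once per endpoint, or once per direction in a digraph):
sum over all v of (1 + deg(v)) = |V| + (deg(v1) + deg(v2) + ...) = |V| + 2|E| = Theta(|V| + |E|).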

Longest path in ordered graph

Let G = (V, E) be a directed graph with nodes v_1, v_2,..., v_n. We say that G is an ordered graph if it has the following properties.
Each edge goes from a node with lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.
Each node except v_n has at least one edge leaving it. That is, for every node v_i, there is at least one edge of the form (v_i, v_j).
Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n.
If you want to see the nice latex version: here
My attempt:
Dynamic programming. Opt(i) = max{Opt(j)} + 1, for all j such that j is reachable from i.
Is there perhaps a better way to do this? I think even with memoization my algorithm will still be exponential. (This is just from an old midterm review I found online.)
Your approach is right, you will have to do
Opt(i) = max{Opt(j)} + 1 for all j such that j is reachable from i
However, this is exponential only if you run it without memoization. With memoization, you will already have the memoized optimal value for every node j, j > i, when you are at node i.
For the worst-case complexity, let us assume that every two nodes that could be connected are connected. This means v_1 is connected with (v_2, v_3, ..., v_n); v_i is connected with (v_(i+1), v_(i+2), ..., v_n).
Number of vertices (V) = n
Hence, number of edges (E) = n*(n-1)/2 = O(V^2)
Let us focus our attention on a vertex v_k. For this vertex, we have to go through the already derived optimal values of (n-k) nodes.
Number of ways of reaching v_k directly = (k-1)
Hence the worst-case time complexity is sigma((k-1)*(n-k)) from k=1 to k=n, which is a sum over a degree-2 polynomial and hence results in O(n^3) time complexity.
Simplistically, the worst case time complexity is O(n^3) == O(V^3) == O(E) * O(V) == O(EV).
Thanks to the first property, this problem can be solved in O(V^2), or even better in O(E), where V is the number of vertices and E is the number of edges. Indeed, it uses a dynamic programming approach which is quite similar to the one you give. Let opt[i] be the length of the longest path from v_1 to v_i. Then
opt[i] = max(opt[j]) + 1, where j < i and v_i and v_j are connected;
using this equation, it can be solved in O(V^2).
Even better, we can solve it in another order:
int LongestPath() {
    for (int v = 1; v <= V; ++v) opt[v] = -1;
    opt[1] = 0;
    for (int v = 1; v <= V; ++v) {
        if (opt[v] >= 0) { // v is reachable from vertex 1
            /* Each edge can be visited at most once,
               thus the runtime is bounded by |E|. */
            for (int w : adj[v]) // adj[v]: all w with an edge (v, w)
                opt[w] = max(opt[v] + 1, opt[w]);
        }
    }
    return opt[V];
}
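A minimal self-contained driver for the sketch above; the globals V, opt and adj, as well as the example graph in main, are purely illustrative:
#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

const int MAXV = 100;
int V;                      // number of vertices, 1-based
int opt[MAXV + 1];          // opt[v] = longest path length from vertex 1 to v
vector<int> adj[MAXV + 1];  // adj[v] lists the w with an edge (v, w), v < w

int LongestPath() {
    for (int v = 1; v <= V; ++v) opt[v] = -1;
    opt[1] = 0;
    for (int v = 1; v <= V; ++v)
        if (opt[v] >= 0)             // v is reachable from vertex 1
            for (int w : adj[v])     // each edge is touched once: O(E)
                opt[w] = max(opt[v] + 1, opt[w]);
    return opt[V];
}

int main() {
    V = 4;
    adj[1] = {2, 3};
    adj[2] = {4};
    adj[3] = {4};
    printf("%d\n", LongestPath()); // prints 2 (e.g. the path 1 -> 2 -> 4)
    return 0;
}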

Complexity of non-recursive DFS code

I think the complexity of this code is:
Time: O(V), where V is the number of vertices
Space: O(V), where V is the number of vertices
public void dfs() {
    Stack<Integer> stack = new Stack<Integer>();
    stack.add(source);
    visited[source] = true; // mark the source before starting
    while (!stack.empty()) {
        int vertex = stack.pop();
        System.out.println(" print v: " + vertex);
        for (int v : graph.adj(vertex)) {
            if (!visited[v]) {
                visited[v] = true;
                stack.add(v);
                edgeTo[v] = vertex;
            }
        }
    }
}
Please correct me if I am wrong
Assuming that graph.adj() always produces a bounded number of vertices (maybe just one), then you are right.
However, if it depends in any way on the total number of vertices in the graph, then it is not. If this dependency is linear, then the algorithm is O(n^2).
Generalizing, if f(n) is the average number of vertices returned by graph.adj(), then the answer is O(n*f(n)).
You are traversing the adjacency list of each unvisited node, and each node is visited exactly once. Thus, you are in effect visiting each edge once, and so the complexity is O(E), which can be as much as O(V^2) in the worst case.
