Time complexity of Prim's MST Algorithm

Can someone explain to me why Prim's algorithm, using an adjacency matrix, results in a time complexity of O(V^2)?

(Sorry in advance for the sloppy-looking ASCII math; I don't think we can use LaTeX to typeset answers.)
The traditional way to implement Prim's algorithm with O(V^2) complexity is to keep, in addition to the adjacency matrix, an array -- let's call it distance -- holding each vertex's minimum distance to the tree built so far.
This way, we only ever check distance to find the next target, and since we do this V times and there are V members of distance, our complexity is O(V^2).
This on its own wouldn't be enough, as the values in distance would quickly become out of date. To keep the array current, at the end of each step we iterate through the newly added vertex's row of the adjacency matrix and update distance appropriately. This doesn't affect our time complexity, since it merely means that each step takes O(V+V) = O(2V) = O(V). Therefore our algorithm is O(V^2).
Without using distance we would have to iterate through all E edges every single time; at worst there are V^2 edges, meaning our time complexity would be O(V^3).
Proof:
To prove that without the distance array it is impossible to compute the MST in O(V^2) time, consider that on each iteration, with a tree of size n, there are V-n vertices that could potentially be added.
To calculate which one to choose, we must check each of these to find its minimum distance from the tree, and then compare those minima against each other to find the smallest.
In the worst-case scenario each of these vertices has an edge to every node in the tree, giving n(V-n) edges to examine and a complexity of O(n(V-n)) for the step.
Since our total is the sum of these steps as n goes from 1 to V, our final time complexity is:
sum_{n=1}^{V} O(n(V-n)) = O((V-1) V (V+1) / 6) = O(V^3)
QED

Note: This answer just borrows jozefg's answer and tries to explain it more fully since I had to think a bit before I understood it.
Background
An Adjacency Matrix representation of a graph constructs a V x V matrix (where V is the number of vertices). The value of cell (a, b) is the weight of the edge linking vertices a and b, or zero if there is no edge.
Adjacency Matrix
      A  B  C  D  E
   ----------------
A     0  1  0  3  2
B     1  0  0  0  2
C     0  0  0  4  3
D     3  0  4  0  1
E     2  2  3  1  0
Prim's Algorithm is an algorithm that takes a graph and a starting node, and finds a minimum spanning tree on the graph - that is, it finds a subset of the edges so that the result is a tree that contains all the nodes and the combined edge weights are minimized. It may be summarized as follows:
1. Place the starting node in the tree.
2. Repeat until all nodes are in the tree:
   a. Find all edges that join nodes in the tree to nodes not in the tree.
   b. Of those edges, choose one with the minimum weight.
   c. Add that edge and the connected node to the tree.
Analysis
We can now start to analyse the algorithm like so:
At every iteration of the loop, we add one node to the tree. Since there are V nodes, it follows that there are O(V) iterations of this loop.
Within each iteration of the loop, we need to find and test the edges that join the tree to the rest of the graph. If there are E edges, the naive searching implementation uses O(E) to find the edge with minimum weight.
So in combination, we should expect the complexity to be O(VE), which may be O(V^3) in the worst case.
However, jozefg gave a good answer to show how to achieve O(V^2) complexity.
Distance to Tree

            |  A    B    C    D    E
------------|------------------------
Iteration 0 |  0    1*   #    3    2
          1 |  0    0    #    3    2*
          2 |  0    0    3    1*   0
          3 |  0    0    3*   0    0
          4 |  0    0    0    0    0

NB. # = infinity (not connected to the tree)
    * = minimum-weight edge chosen in this iteration
Here the distance vector represents the smallest weighted edge joining each node to the tree, and is used as follows:
Initialize distance with the edge weights to the starting node A; complexity O(V).
To find the next node to add, simply find the minimum element of distance (and remove it from the list). This is O(V).
After adding a new node, there are O(V) new edges connecting the tree to the remaining nodes; for each of these determine if the new edge has less weight than the existing distance. If so, update the distance vector. Again, O(V).
Using these three steps reduces the searching time from O(E) to O(V), and adds an extra O(V) step to update the distance vector at each iteration. Since each iteration is now O(V), the overall complexity is O(V^2).
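Putting the three steps together, a minimal Python sketch of this O(V^2) implementation might look as follows. It assumes the matrix convention used above (0 means "no edge") and a connected graph; the function name is illustrative.

def prim_mst_cost(w):
    """O(V^2) Prim's algorithm using a distance vector.

    w: adjacency matrix where w[i][j] == 0 means "no edge" (the
    convention used in this answer). Assumes the graph is connected.
    Returns the total weight of a minimum spanning tree.
    """
    INF = float('inf')
    n = len(w)
    in_tree = [False] * n
    dist = [INF] * n
    dist[0] = 0                 # start from vertex 0 (vertex A above)
    total = 0
    for _ in range(n):
        # Step 2: find the minimum element of distance -- O(V).
        u = min((v for v in range(n) if not in_tree[v]), key=dist.__getitem__)
        in_tree[u] = True
        total += dist[u]
        # Step 3: relax distance using u's row of the matrix -- O(V).
        for v in range(n):
            if not in_tree[v] and w[u][v] != 0 and w[u][v] < dist[v]:
                dist[v] = w[u][v]
    return total

# The adjacency matrix from the Background section:
W = [[0, 1, 0, 3, 2],
     [1, 0, 0, 0, 2],
     [0, 0, 0, 4, 3],
     [3, 0, 4, 0, 1],
     [2, 2, 3, 1, 0]]
print(prim_mst_cost(W))  # 7, matching the trace above: B(1), E(2), D(1), C(3)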

First of all, it's obviously at least O(V^2), because that is how big the adjacency matrix is.
Looking at http://en.wikipedia.org/wiki/Prim%27s_algorithm, you need to execute the step "Repeat until Vnew = V" V times.
Inside that step, you need to work out the shortest link between any vertex in Vnew and any vertex outside Vnew. To do this, maintain an array of size V holding, for each vertex, either infinity (if the vertex is already in Vnew) or the length of the shortest link between that vertex and any vertex in Vnew (initially this just comes from the lengths of the links between the starting vertex and every other vertex). To find the next vertex to add to Vnew, simply search this array, at cost O(V). Once you have a new vertex, look at all the links from that vertex to every other vertex and see whether any of them gives a shorter link from Vnew to that other vertex; if so, update the array. This also costs O(V).
So you have V steps (V vertices to add), each costing O(V), which gives you O(V^2).

Related

Depth first search (DFS) vs breadth first search (BFS) pseudocode and complexity

I have to develop pseudocode for an algorithm that computes the number of connected
components in a graph G = (V, E) given vertices V and edges E.
I know that I can use either depth-first search or breadth-first search to calculate the number of connected components.
However, I want to use the most efficient algorithm to solve this problem, but I am unsure of the complexity of each algorithm.
Below is an attempt at writing DFS in pseudocode form.
function DFS((V, E))
    mark each node in V with 0
    count ← 0
    for each vertex in V do
        if vertex is marked with 0 then
            DFSExplore(vertex)

function DFSExplore(vertex)
    count ← count + 1
    mark vertex with count
    for each edge (vertex, neighbour) do
        if neighbour is marked with 0 then
            DFSExplore(neighbour)
Below is an attempt at writing BFS in pseudocode form.
function BFS((V, E))
    mark each node in V with 0
    count ← 0, init(queue)               # create empty queue
    for each vertex in V do
        if vertex is marked 0 then
            count ← count + 1
            mark vertex with count
            inject(queue, vertex)        # queue containing just vertex
            while queue is non-empty do
                u ← eject(queue)         # dequeues u
                for each edge (u, w) adjacent to u do
                    if w is marked with 0 then
                        count ← count + 1
                        mark w with count
                        inject(queue, w) # enqueues w
My lecturer said that BFS has the same complexity as DFS.
However, when I looked up the complexity of depth-first search it was given as O(V^2), while the complexity of breadth-first search is O(V + E) when an adjacency list is used and O(V^2) when an adjacency matrix is used.
I want to know how to calculate the complexity of DFS/BFS, and how I can adapt the pseudocode to solve the problem.
The time complexity of both DFS and BFS is the same, i.e. O(V+E), if you are using an adjacency list. So if you ask which one is better, it completely depends on the type of problem you are going to solve. Say you want to solve a problem where the goal is near the starting vertex: then BFS would be the better choice. Also, if you consider memory, DFS is the better option, because there is no need to store child pointers.
(Image omitted; courtesy of DSA Made Easy by Narasimha Karumanchi.)
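To adapt your pseudocode to the component-counting problem, one traversal per still-unmarked vertex suffices. A minimal Python sketch using BFS (the graph is assumed to be a dict of adjacency lists; names are illustrative):

from collections import deque

def count_components(adj):
    """Count connected components of an undirected graph in O(V + E).

    adj: dict mapping each vertex to a list of its neighbours.
    """
    seen = set()
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1                    # a new, previously unseen component
        seen.add(start)
        queue = deque([start])
        while queue:                  # standard BFS over this component
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count

# Two components: {0, 1, 2} and {3, 4}
g = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(count_components(g))  # 2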

What is the Time Complexity of finding all possible ways from one node to another?

Here I use pseudocode to present my algorithm; it's a variation of DFS.
The coding style imitates Introduction to Algorithms. Every time we come across a vertex, it is painted BLACK. Suppose the starting vertex is START and the target vertex is TARGET, and the graph is represented as G = (V, E). One more thing: assume that the graph is connected, and strongly connected if it's a directed graph.
FIND-ALL-PATH(G, START, TARGET)
    for each vertex u in G.V
        u.color = WHITE
    path = 0                      // store the result
    DFS-VISIT(G, START)

DFS-VISIT(G, u)
    if u == TARGET
        path = path + 1
        return
    u.color = BLACK
    for each v in G.Adj[u]
        if v.color == WHITE
            DFS-VISIT(G, v)
    u.color = WHITE               // re-paint the vertex to find other possible ways
How do we analyze the time complexity of the algorithm above? If it were normal DFS then of course it would be O(N+E), because each vertex is visited only once and each edge is visited at most twice; but what about this variant? It seems hard to pin down how many times each vertex or edge is visited.
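For concreteness, here is a direct Python rendering of the pseudocode above (a sketch only; the graph is assumed to be a dict of adjacency lists, and the names are illustrative):

def count_paths(adj, start, target):
    """Count simple paths from start to target by backtracking DFS.

    adj: dict mapping each vertex to the list of its neighbours.
    """
    visited = set()                    # vertices currently coloured BLACK
    count = 0                          # the 'path' counter in the pseudocode

    def visit(u):
        nonlocal count
        if u == target:
            count += 1
            return
        visited.add(u)                 # paint u BLACK
        for v in adj[u]:
            if v not in visited:       # v is still WHITE
                visit(v)
        visited.remove(u)              # re-paint WHITE to find other ways

    visit(start)
    return count

# 3 paths from 0 to 3: 0-1-3, 0-1-2-3, 0-2-3
print(count_paths({0: [1, 2], 1: [2, 3], 2: [3], 3: []}, 0, 3))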
To analyze the time complexity of FIND-ALL-PATH, let's first see what the time complexity of DFS-VISIT is. I am assuming you are using an adjacency list to represent the graph.
Here, in one call of DFS-VISIT, every vertex connected to u (the vertex you passed as the argument) is explored once (i.e. its color is changed to BLACK). Since this is a recursive function, each recursive call gets its own stack frame, and the set G.Adj[u] examined in each frame consists precisely of the elements adjacent to u. Therefore every node in all the adjacency lists put together is examined (has its color changed) exactly once, and whenever it is examined we do a constant amount of work (an O(1) operation). There are E list entries overall for a directed graph and 2E for an undirected graph in the adjacency-list representation, so we say the time is O(E), where E is the number of edges. Some books add an extra O(N) term, where N is the number of vertices, and quote the overall time complexity of DFS-VISIT as O(N+E) (I think the reason for the extra O(N) is the for loop that gets executed N times, though it may be something else). For a connected graph N = O(E), so you can either include it or ignore it; it doesn't affect the asymptotic time of DFS-VISIT.
The time complexity of the function FIND-ALL-PATH is N times the time complexity of DFS-VISIT, where N is the number of vertices in the graph. So the algorithm you wrote above is not exactly the same as the depth-first traversal algorithm, although it does the same kind of work. The time taken by your algorithm is greater because you are, in effect, calling DFS-VISIT for each vertex in your graph. Your function FIND-ALL-PATH could be optimized so that, before calling DFS-VISIT, you first check whether the vertex's color has already been changed to BLACK (that is what is generally done in a depth-first traversal).
i.e. you should have written the function like this:
FIND-ALL-PATH(G, START, TARGET)
    for each vertex u in G.V
        u.color = WHITE
    path = 0                      // store the result
    for each vertex u in G.V
        if u.color is WHITE
            DFS-VISIT(G, u)
Now this function written above will have the same time complexity as DFS-VISIT.
Also note that some time is taken to initialize the color of all vertices to WHITE, which is an O(N) operation.
So the overall time complexity of your function FIND-ALL-PATH is O(N) + O(N*(N+E)); you can ignore the first O(N), as it is dominated by the other term.
Thus, time complexity = O(N*(N+E)), or, if you charge just O(E) for DFS-VISIT, you can say the time complexity is O(N*E).
Let me know if you have doubt at any point mentioned above.
For a directed graph:
Suppose there are 7 vertices {0, 1, 2, 3, 4, 5, 6} and consider the worst case, where every vertex is connected to every other vertex.
The number of edge traversals needed to explore every path from x to 6 is as follows:
(6->6): 0
(5->6): 1
(4->6): (4->5->6) + (4->6) = (1 + 1) + (1 + 0) = 3
(3->6): (3->4->6) + (3->5->6) + (3->6) = (1 + 3) + (1 + 1) + (1 + 0) = 7
(2->6): 4 + (7 + 3 + 1 + 0) = 15
(1->6): 5 + (15 + 7 + 3 + 1 + 0) = 31
(0->6): 6 + (31 + 15 + 7 + 3 + 1 + 0) = 63
So the time to cover every path from 0 to 6 = (1 + 3 + 7 + 15 + ... + T(n-1) + T(n)) + (total number of vertices - 1) = (2^(n+1) - 2 - n) + (V - 1),
where n = V - 1.
So the final time complexity = O(2^V).
For an undirected graph:
Every edge will be traversed twice: 2 * ((2^(n+1) - 2 - n) + (V - 1)) = O(2^(V+1)) = O(2^V).

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it. How could we get the sum of the reachable values for each node? For example, in the following graph (image omitted):
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: to get the sums for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m is the number of edges. DFS should be used: for each node, recursively visit its successors, and save the sum once it has been computed so that we never calculate it again. A set has to be created for each node to avoid endless recursion caused by cycles.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding the Strongly Connected Components (SCCs), which can be done in O(|V|+|E|). Then build a new graph, G', of the SCCs (each SCC is a node in this graph), where each node's value is the sum of the values of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT: the previous answer (struck out) is a mistake from this point on; editing in a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
// visited flags are reset before each top-level call
DFS(v):
    if v.visited:
        return 0
    v.visited = true              // mark before recursing (also marks leaves)
    if v is leaf:
        return v.value
    return v.value + sum([DFS(u) for u in v.children])
This is O(V^2 + VE) in the worst case, but since the condensed graph has fewer nodes, V and E are now significantly smaller.
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after topological sort of the DAG).
Pseudo code:
1. Find the SCCs.
2. Build G'.
3. Topologically sort G'.
4. Find D[i] for each node in G'.
5. Apply the value D[i] to every node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).
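A Python sketch of this recipe follows, with one caveat: the DP D[i] = value(i) + sum D[j] as stated can count a component twice when it is reachable along two different branches, so this sketch condenses the SCCs as described and then aggregates with a plain DFS from each condensation node instead (O(V'*(V'+E')) on the smaller graph G'). All names are illustrative.

import sys
from collections import defaultdict

def reachable_sums(n, edges, value):
    """Sum of values reachable from each node of a directed graph.

    n: number of nodes 0..n-1; edges: list of (u, v); value: list of values.
    The result for u includes u's own component; subtract value[u] to match
    the question's convention of excluding the node itself.
    """
    sys.setrecursionlimit(max(10_000, 2 * n + 10))
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Kosaraju's algorithm: order by finish time, then sweep the reverse graph.
    order, seen = [], [False] * n
    def dfs1(u):
        seen[u] = True
        for v in adj[u]:
            if not seen[v]:
                dfs1(v)
        order.append(u)
    for u in range(n):
        if not seen[u]:
            dfs1(u)

    comp = [-1] * n
    def dfs2(u, c):
        comp[u] = c
        for v in radj[u]:
            if comp[v] == -1:
                dfs2(v, c)
    k = 0
    for u in reversed(order):
        if comp[u] == -1:
            dfs2(u, k)
            k += 1

    # Build G': one node per SCC, value = sum of that SCC's node values.
    comp_value = [0] * k
    for u in range(n):
        comp_value[comp[u]] += value[u]
    cadj = [set() for _ in range(k)]
    for u, v in edges:
        if comp[u] != comp[v]:
            cadj[comp[u]].add(comp[v])

    # Aggregate: iterative DFS over the DAG from each condensation node.
    def reach_sum(c):
        stack, reached = [c], {c}
        while stack:
            x = stack.pop()
            for y in cadj[x]:
                if y not in reached:
                    reached.add(y)
                    stack.append(y)
        return sum(comp_value[x] for x in reached)

    sums = [reach_sum(c) for c in range(k)]
    return [sums[comp[u]] for u in range(n)]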
You can use the DFS or BFS algorithm to solve your problem.
Both have complexity O(V + E).
You don't have to count all values for all nodes, and you don't need recursion.
Just do something like this.
Typically DFS looks like this.
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add some lines:

unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
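A short Python rendering of this modified traversal (a sketch only; names are illustrative):

from collections import deque

def reachable_value_sum(adj, value, start):
    """One BFS from `start`, accumulating the value of every vertex reached.

    adj: dict mapping each vertex to a list of its successors;
    value: dict mapping each vertex to its value.
    """
    seen = {start}                    # "mark x"
    total = 0
    queue = deque([start])            # "list L = x"
    while queue:
        u = queue.popleft()           # vertex from the front of the list
        total += value[u]             # "sum += v->value"
        for w in adj[u]:
            if w not in seen:         # unmarked neighbour
                seen.add(w)
                queue.append(w)
    return total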

Code on Weighted, Acyclic Graph

We have a weighted acyclic graph G(V, E) with positive and negative edges. We change the weights of this graph with the following code, to give a graph G' without negative edges. Let V = {1, 2, ..., n} and let G_ij be the weight of the edge from vertex i to vertex j.
Change_weight(G)
    for i = 1 to n
        for j = 1 to n
            G_i = min G_ij for all j
            if G_i < 0 (we have a bar on G)
                G_ij = G_ij + G_i for all j
                G_ki = G_ki + G_i for all k
We have two axioms:
1) The shortest path between every two vertices in G is the same as in G'.
2) The length of the shortest path between every two vertices in G is the same as in G'.
I read this in a low-quality PDF, so I'm not sure the code is exactly as given; I have added the picture. The book says the above axioms are false. Could anyone help me? I think they are true.
I think (2) is false, by the following counterexample: the original graph is given on the left, and the result after the algorithm is run is on the right (images omitted). The shortest path from 1 to 3 changed: it used to pass through vertex 2, but after the algorithm is run it no longer passes through vertex 2.
Algorithm
My reading of the PDF is:
Change_weight(G)
    for i = 1 to n
        c_i = min c_ij for all j
        if c_i < 0
            c_ij = c_ij - c_i for all j
            c_ki = c_ki + c_i for all k
The interpretation is that for each vertex i we increase its outgoing edges by |c_i| and decrease its incoming edges by |c_i|, where c_i is the (negative) minimum outgoing weight; this makes all of i's outgoing edges non-negative.
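A small Python sketch of this interpretation (0-indexed vertices, weights stored in a dict keyed by edge; the processing order matters, as discussed under "Topological sort" below; names are illustrative):

def change_weight(w, n):
    """Apply the reweighting to a DAG with vertices 0..n-1.

    w: dict {(i, j): weight}. For each vertex i whose minimum outgoing
    weight c_i is negative, raise all outgoing edges by -c_i and lower
    all incoming edges by -c_i, so i's outgoing edges become non-negative.
    """
    for i in range(n):
        out = [w[(i, j)] for j in range(n) if (i, j) in w]
        if not out:
            continue
        c = min(out)
        if c < 0:
            for j in range(n):
                if (i, j) in w:
                    w[(i, j)] -= c    # c_ij = c_ij - c_i  (increase)
            for k in range(n):
                if (k, i) in w:
                    w[(k, i)] += c    # c_ki = c_ki + c_i  (decrease)
    return w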
Claim 1
"the shortest path between every two vertex in G is the same as G'"
With my reading of the PDF, this claim is true, because every path from vertex i to vertex j is changed by the same amount (c_j - c_i), and so the relative order of paths is unchanged. (Note that a path may go via intermediate vertices, but the net effect on those is 0: for each intermediate vertex k, the incoming edge changes by +c_k and the outgoing edge by -c_k, which cancel.)
Claim 2
"the length of shortest path between every two vertex in G is the same as G'".
This cannot be true - suppose we start with an original graph which has a single edge A to B with weight -1.
In the modified graph this weight will become 0.
Therefore the length of the shortest path has changed from -1 in G to 0 in G' so the statement is false.
Example
Shown below is what would happen to your graph as you applied this algorithm to node 1, followed by node 2 (images omitted):
Topological sort
Note that, as shown in the example, we still end up with some negative weights, which is probably unintended. This happens because the weights of incoming edges are reduced.
However, if we work backwards through the graph (e.g. by using a topological sort), then we will always end up with non-negative weights everywhere.
In the given example, working backwards means we first update 2, and then 1, as shown below (images omitted):

Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1

You are given a complete undirected graph with N vertices. All but K edges have a cost of A. Those K edges have a cost of B and you know them (as a list of pairs). What's the minimum cost from node 0 to node N - 1.
2 <= N <= 500k
0 <= K <= 500k
1 <= A, B <= 500k
The problem is, obviously, when those K edges cost more than the other ones and node 0 and node N - 1 are connected by a K-edge.
Dijkstra doesn't work. I've even tried something very similar with a BFS.
Step 1: Let G(0) be the set of "good" adjacent nodes of node 0.
Step 2: For each node in G(0):
    compute G(node)
    if G(node) contains N - 1
        return step
    else
        add node to some queue
repeat Step 2 and increment step
The problem is that this uses up a lot of time due to the fact that for every node you have to make a loop from 0 to N - 1 in order to find the "good" adjacent nodes.
Does anyone have any better ideas? Thank you.
Edit: Here is a link from the ACM contest: http://acm.ro/prob/probleme/B.pdf
This is laborious case work:
A < B, and 0 and N-1 are joined by A -> trivial.
B < A, and 0 and N-1 are joined by B -> trivial.
B < A, and 0 and N-1 are joined by A -> do BFS on the graph with only the K edges.
A < B, and 0 and N-1 are joined by B ->
You can check in O(N) time whether there is a path of length 2*A (try every vertex as the middle one).
To check other path lengths, the following algorithm should do the trick:
Let X(d) be the set of nodes reachable from 0 using d of the shorter edges. You can find X(d) with the following procedure: take each vertex v whose distance is still unknown and iteratively check the edges between v and the vertices of X(d-1). If you find a short edge, then v is in X(d); otherwise every edge you checked was a long one. Since there are at most K long edges, you can step on them at most K times in total, so you find the distance of every vertex in O(N + K) time overall.
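A Python sketch of this layered search (it returns the number of short edges on a shortest all-short-edge path from 0 to N-1; the K long edges are given as pairs; names are illustrative). The any(...) probe short-circuits on the first short edge it finds, so failed probes only ever touch long edges, which is where the O(N + K) bound comes from:

def layers_to_target(n, long_pairs):
    """Number of short edges on a shortest all-short-edge path 0 -> n-1.

    long_pairs: the K pairs joined by a long edge; every other pair of
    vertices is joined by a short edge. Returns None if unreachable.
    """
    long_set = {frozenset(p) for p in long_pairs}
    unknown = set(range(1, n))        # vertices with unknown distance
    layer = [0]                       # X(0) = {0}
    d = 0
    while layer:
        d += 1
        nxt = []
        for v in list(unknown):
            # v belongs to X(d) iff some u in X(d-1) reaches v by a short
            # edge, i.e. the pair {u, v} is NOT one of the K long edges.
            if any(frozenset((u, v)) not in long_set for u in layer):
                nxt.append(v)
        for v in nxt:
            unknown.discard(v)
        if n - 1 in nxt:
            return d
        layer = nxt
    return None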
I propose a solution to a somewhat more general problem where you might have more than two types of edges and the edge weights are not bounded. For your scenario the idea is probably a bit overkill, but the implementation is quite simple, so it might be a good way to go about the problem.
You can use a segment tree to make Dijkstra more efficient. You will need the operations
set an upper bound in a range: given u, L, R, for all x[i] with L <= i <= R, set x[i] = min(x[i], u)
find the global minimum
The upper bounds can be pushed down the tree lazily, so both operations can be implemented in O(log n).
When relaxing outgoing edges, look for the edges with cost B, sort them and update the ranges in between all at once.
The runtime should be O(n log n + m log m) if you sort all the edges upfront (by outgoing vertex).
EDIT: Got accepted with this approach. The good thing about it is that it avoids any kind of special casing. It's still ~80 lines of code.
In the case when A < B, I would go with a kind of BFS where you check where you can't reach, instead of where you can. Here's the pseudocode:
G(k) is the set of nodes reachable by k cheap edges and no fewer. We start with G(0) = {v0}.

while G(k) isn't empty and G(k) doesn't contain vN-1 and k*A < B
    cnt = array[N] of zeroes
    for every node n in G(k)
        for every expensive edge (n, m)
            cnt[m]++
    # now cnt[m] == |G(k)| iff m can't be reached by a cheap edge from any node of G(k)
    set G(k+1) to {m : cnt[m] < |G(k)|} except {n : n is in G(0), ..., G(k)}
    k++
This way you avoid iterating through the (many) cheap edges and only iterate through the relatively few expensive edges.
As you have correctly noted, the problem comes when A > B and the edge from 0 to N-1 has cost A.
In this case you can simply delete all edges in the graph that have cost A: an optimal route will only use edges of cost B, since any route that uses an A edge already costs at least as much as the direct A edge.
Then you can perform a simple BFS, since the costs of all remaining edges are the same. This gives optimal performance, as pointed out in this link: Finding shortest path for equal weighted graph.
Moreover, you can stop the BFS as soon as the accumulated cost exceeds A.
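A sketch of this last case in Python (assuming A > B and that the direct 0 -> N-1 edge costs A; the K cost-B edges are given as pairs; names are illustrative):

from collections import deque

def min_cost_when_A_direct(n, b_edges, A, B):
    """BFS over the K cost-B edges only; the answer is min(A, B * hops)."""
    adj = [[] for _ in range(n)]
    for u, v in b_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n                    # hops using B-edges only
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if B * dist[u] >= A:           # no B-only path can beat A any more
            break
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    if dist[n - 1] != -1:
        return min(A, B * dist[n - 1])
    return A                           # fall back to the direct A edge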
