running time variation Prim - performance

I wanted to find the "minimum spanning tree", given a set A with edges that must be in the "minimum spanning tree" (so it's not a real minimum spanning tree, but given A, it has the least sum of weights). So the "minimum spanning tree" must definitely contain all edges of A. I made some modifications to Prim's algorithm, which can be found below. I then wanted to find the running time of this algorithm; however, I'm having trouble finding the running time of the check whether the intersection of two sets is empty.
Could somebody please help me? And what would the total running time be then? I have already put the running time for each step next to that step, except for "?".
Notation clarification:
δ(W) = {{v ,w} ∈ E : w ∈ W, v ∈ V\W} for W ⊂ V
Algorithm:
1. T = ∅, W = {v} for some v ∈ V    O(1)
2. While W ≠ V    n iterations
       If (A ∩ δ(W) ≠ ∅) do    ?
           Take e = {v,w} ∈ (A ∩ δ(W))    O(1)
           T = T ∪ {e}    O(1)
           W = W ∪ {v,w}    O(1)
       Else
           Find e = {v,w} ∈ δ(W) s.t. c_e ≤ c_f ∀ f ∈ δ(W)    O(m)
           T = T ∪ {e}    O(1)
           W = W ∪ {v,w}    O(1)
   End while
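A minimal Python sketch of the steps above may help when reasoning about the "?" cost. It is only an illustration under stated assumptions: the graph is connected, A contains no cycle, edges are stored as a dict from `frozenset({u, v})` to cost, and A is a Python set of such frozensets (the names `modified_prim`, `edges` and `required` are made up for this example). With A held in a hash set, the test A ∩ δ(W) ≠ ∅ is done here by scanning the edge list once, so in this sketch it costs O(m) per iteration and O(nm) overall.

```python
def modified_prim(vertices, edges, required):
    """Follow the steps of the question literally.

    vertices: iterable of hashable vertex labels (the set V)
    edges:    dict mapping frozenset({u, v}) -> cost (the set E with costs c)
    required: set of frozenset({u, v}) that must be in the tree (the set A)
    Assumes the graph is connected and A contains no cycle.
    """
    vertices = set(vertices)
    in_tree = {next(iter(vertices))}       # W = {v} for some v in V
    tree = set()                           # T = empty set
    while in_tree != vertices:
        # delta(W): edges with exactly one endpoint in W, one O(m) scan
        cut = [e for e in edges if len(e & in_tree) == 1]
        # A intersected with delta(W): O(1) membership test per cut edge
        forced = [e for e in cut if e in required]
        if forced:
            e = forced[0]
        else:
            e = min(cut, key=edges.get)    # cheapest edge leaving W
        tree.add(e)
        in_tree |= e                       # W = W united with {v, w}
    return tree
```

For example, on a triangle with costs 1, 1 and 5 where the cost-5 edge is required, the returned tree contains that edge plus one of the cost-1 edges.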

Related

unable to comprehend prims algorithm

Please help me understand the pseudocode of Prim's algorithm (as it appears in Cormen and on the wiki).
Prim's algorithm.
MST-PRIM (G, w, r) {
    for each u ∈ G.V
        u.key = ∞
        u.parent = NIL
    r.key = 0
    Q = G.V
    while (Q ≠ ∅)
        //1
        u = Extract-Min(Q)
        for each v ∈ G.Adj[u]
            if (v ∈ Q) and w(u,v) < v.key
                v.parent = u
                v.key = w(u,v)
}
I am able to understand it up to //1 (the while loop): r.key = 0 ensures that the neighbours (adjacent vertices) of the root are scanned first.
But since u already belongs to Q (the queue of nodes not yet included in Prim's minimum spanning tree) and v is also in Q, I don't see how this helps in generating the MST.
Also, both Cormen and (thus) the wiki state:
Prior to each iteration of the while loop of lines 6–11,
1. A = { (v, v.parent) : v ∈ V - {r} - Q }.
2. The vertices already placed into the minimum spanning tree are those in V−Q.
3. For all vertices v ∈ Q, if v.parent ≠ NIL, then v.key < ∞ and v.key is the weight of a light edge (v, v.parent) connecting v to some vertex already placed into the minimum spanning tree.
As A is our MST, how does 1. help? v has already been included in our MST (as shown by v ∈ V - {r} - Q), so why should it be included again?
For the part you have doubts about:
    u = Extract-Min(Q)
    for each v ∈ G.Adj[u]
        if (v ∈ Q) and w(u,v) < v.key
            v.parent = u
            v.key = w(u,v)
"For each vertex v, the attribute v.key is the minimum weight of any edge connecting v to a vertex in the tree; by convention, key = ∞ if there is no such edge." (http://en.wikipedia.org/wiki/Prim's_algorithm)
Therefore, u = Extract-Min(Q) will get the vertex with the minimum key.
for each v ∈ G.Adj[u] will visit all the neighbors of u.
if (v ∈ Q) and w(u,v) < v.key is the condition that prevents cycles and checks whether the key of v should be updated.
Then the following lines of code update the neighbor's parent and key:
    v.parent = u
    v.key = w(u,v)
"Prior to each iteration of the while loop of lines 6–11,
1. A = { (v, v.parent) : v ∈ V - {r} - Q }. " (http://en.wikipedia.org/wiki/Prim's_algorithm)
Based on the above statement, before the while loop A is empty, as Q = G.V. After the while loop, A contains the edges that form the MST: one edge (v, v.parent) for each vertex v other than the root. For root r, its parent is NIL. Root r is excluded by the term V - {r}, but it still appears in A as the v.parent of its children.
Therefore in this link http://en.wikipedia.org/wiki/Prim's_algorithm , it states that: "2. The vertices already placed into the minimum spanning tree are those in V−Q."
and "When the algorithm terminates, the min-priority queue Q is empty; the minimum spanning tree A for G is thus A = { (v, v.parent) : v ∈ V - {r} }."
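As a rough illustration of the pseudocode discussed above, here is a small Python sketch. It is not the textbook's implementation: it assumes the graph is given as a dict of dicts (`adj[u][v]` is the weight of edge {u, v}), that vertex labels are mutually comparable, and it replaces the decrease-key operation with re-insertion into a `heapq` heap plus a check that skips stale entries. The name `mst_prim` and the variable `in_queue` (standing in for Q) are choices made for this example.

```python
import heapq
import math


def mst_prim(adj, root):
    """Prim's algorithm on an undirected graph given as adj[u][v] = weight.

    Returns (parent, key): parent[v] is v's tree parent (None for the root),
    key[v] is the weight of the edge (v, parent[v]).
    """
    key = {u: math.inf for u in adj}
    parent = {u: None for u in adj}
    key[root] = 0
    in_queue = set(adj)                    # plays the role of Q
    heap = [(0, root)]
    while heap:
        _, u = heapq.heappop(heap)         # Extract-Min(Q)
        if u not in in_queue:
            continue                       # stale heap entry, u already left Q
        in_queue.remove(u)                 # u is now in V - Q, i.e. in the tree
        for v, weight in adj[u].items():
            if v in in_queue and weight < key[v]:
                parent[v] = u
                key[v] = weight
                heapq.heappush(heap, (weight, v))
    return parent, key


graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2},
    "c": {"a": 4, "b": 2},
}
parent, key = mst_prim(graph, "a")
# The set A from the invariant: one edge (v, v.parent) per non-root vertex.
tree_edges = {(v, p) for v, p in parent.items() if p is not None}
```

Here `tree_edges` ends up as the two edges ("b", "a") and ("c", "b"), matching A = { (v, v.parent) : v ∈ V - {r} } once Q is empty.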

why E dominates v?

I analyzed the running time of Kruskal's algorithm and came up with O(E log E + E log V + V).
I asked my prof and he said that if the graph is very sparse with many isolated vertices, V dominates E, which makes sense. Otherwise E dominates V, and I cannot understand why.
I can give an example where the graph is not sparse but V is still greater than E.
Can anyone help me clear up this confusion?
A tree in an undirected graph has |V|-1 edges.
Since a tree is a connected graph with as few edges as possible, for every connected undirected graph |E| is in Omega(|V|), so |V| is dominated by |E|.
This basically means that if |E| < |V|-1, the graph is not connected.
Now, since Kruskal's algorithm is designed to find a spanning tree, you can abort the algorithm once you have found that |E| < |V|-1: there is no spanning tree at all, so there is no point in looking for one.
From this we conclude that when |E| < |V|-1, there is no point in discussing the complexity of Kruskal's algorithm, and we can safely assume that |E| >= |V|-1, so |V| is dominated by |E|.
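The early abort described above could look like the following Python sketch. It is only an illustration: vertices are assumed to be numbered 0..num_vertices-1, edges are `(weight, u, v)` tuples, and the union-find uses path halving without union by rank to keep the example short. The function name `kruskal` and its return convention (`None` when no spanning tree exists) are choices made here.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree, or None if none exists.

    edges: list of (weight, u, v) with u, v in range(num_vertices).
    """
    # Fewer than |V| - 1 edges: the graph cannot be connected, abort early.
    if len(edges) < num_vertices - 1:
        return None

    parent = list(range(num_vertices))

    def find(x):
        # Find the set representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):     # the O(E log E) sorting step
        root_u, root_v = find(u), find(v)
        if root_u != root_v:               # edge joins two components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    # Enough edges does not guarantee connectivity, so check the result.
    return tree if len(tree) == num_vertices - 1 else None
```

Once the early abort has been passed, |E| >= |V| - 1 holds, which is why the sorting step dominates the loop that initialises one set per vertex.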
Density = number of edges / number of possible edges = E / (V(V-1)/2)
Let the graph be a tree E = V - 1
So V = (E + 1)
And Kruskal's complexity is
O(E log E + E log V + V) = O(E log E + E log (E + 1) + (E + 1)) = O( E log E )
So E dominates. E will dominate as long as V = O(E).

How to generate matrices which satisfy the triangle inequality?

Let's consider a square matrix E
(n is the dimension of the matrix E and is fixed, for example n = 4 or n = 5). The matrix entries
satisfy the following conditions:
The task is to generate all matrices E. My question is how to do that. Is there any common approach or algorithm? Is that even possible? What should I start with?
Naive solution
A naive solution to consider is to generate every possible n-by-n matrix E where each component is a nonnegative integer no greater than n, then take from those only the matrices that satisfy the additional constraints. What would be the complexity of that?
Each component can take on n + 1 values, and there are n^2 components, so there are O((n+1)^(n^2)) candidate matrices. That has an insanely high growth rate.
Link: WolframAlpha analysis of (n+1)^(n^2)
I think it's safe to say that this is not a feasible approach.
Better solution
A better solution follows. It involves a lot of math.
Let S be the set of all matrices E that satisfy your requirements. Let N = {1, 2, ..., n}.
Definitions:
Let a metric on N have the usual definition, except with the requirement of symmetry omitted.
Let I and J partition the set N. Let D(I,J) be the n x n matrix that has D_ij = 1 when i is in I and j is in J, and D_ij = 0 otherwise.
Let A and B be in S. Then A is adjacent to B if and only if there exist I and J partitioning N such that A + D(I,J) = B.
We say A and B are adjacent if and only if A is adjacent to B or B is adjacent to A.
Two matrices A and B in S are path-connected if and only if there exists a sequence of adjacent elements of S between them.
Let the function M(E) denote the sum of the elements of matrix E.
Lemma 1:
E = D(I,J) is a metric on N.
Proof:
This is a trivial statement except for the case of an edge going from I to J. Let i be in I and j be in J. Then E_ij = 1 by definition of D(I,J). Let k be in N. If k is in I, then E_ik = 0 and E_kj = 1, so E_ik + E_kj >= E_ij. If k is in J, then E_ik = 1 and E_kj = 0, so E_ik + E_kj >= E_ij.
Lemma 2:
Let E be in S such that E != zeros(n,n). Then there exist I and J partitioning N such that E' = E - D(I,J) is in S with M(E') < M(E).
Proof:
Let (i,j) be such that E_ij > 0. Let I be the subset of N that can be reached from i by a directed path of cost 0. I cannot be empty, because i is in I. I cannot be N, because j is not in I. This is because E satisfies the triangle inequality and E_ij > 0.
Let J = N - I. Then I and J are both nonempty and partition N. By the definition of I, there does not exist any (x,y) such that E_xy = 0 and x is in I and y is in J. Therefore E_xy >= 1 for all x in I and y in J.
Thus E' = E - D(I,J) >= 0. That M(E') < M(E) is obvious, because all we have done is subtract from elements of E to get E'. Now, since E is a metric on N and D(I,J) is a metric on N (by Lemma 1) and E >= D(I,J), we have E' is a metric on N. Therefore E' is in S.
Theorem:
Let E be in S. Then E and zeros(n,n) are path-connected.
Proof (by induction):
If E = zeros(n,n), then the statement is trivial.
Suppose E != zeros(n,n). Let M(E) be the sum of the values in E. Then, by induction, we can assume that the statement is true for any matrix E' having M(E') < M(E).
Since E != zeros(n,n), by Lemma 2 we have some E' in S such that M(E') < M(E). Then by the inductive hypothesis E' is path-connected to zeros(n,n). Therefore E is path-connected to zeros(n,n).
Corollary:
The set S is path-connected.
Proof:
Let A and B be in S. By the Theorem, A and B are both path-connected to zeros(n,n). Therefore A is path-connected to B.
Algorithm
The Corollary tells us that everything in S is path-connected. So an effective way to discover all of the elements of S is to perform a breadth-first search over the graph defined by the following.
The elements of S are the nodes of the graph
Nodes of the graph are connected by an edge if and only if they are adjacent
Given a node E, you can find all of the (potentially) unvisited neighbors of E by simply enumerating all of the possible matrices D(I,J) (of which there are 2^n - 2) and generating E' = E + D(I,J) for each. Enumerating the D(I,J) should be relatively straightforward (there is one for every possible subset I of N, except for the empty set and N).
Note that, in the preceding paragraph, E and D(I,J) are both metrics on N. So when you generate E' = E + D(I,J), you don't have to check that it satisfies the triangle inequality - E' is the sum of two metrics, so it is a metric. To check that E' is in S, all you have to do is verify that the maximum element in E' does not exceed n.
You can start the breadth-first search from any element of S and be guaranteed that you won't miss any of S. So you can start the search with zeros(n,n).
Be aware that the cardinality of the set S grows extremely fast as n increases, so computing the entire set S will only be tractable for small n.
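The search described above can be sketched in Python as follows. This is only an illustration under assumptions of my own: N is taken as {0, ..., n-1}, matrices are stored as tuples of tuples so they can go into a set, and the names `partition_matrices` and `enumerate_metrics` are invented for the example. Every matrix it produces is a sum of D(I,J) matrices, so it satisfies the triangle inequality; that it finds all of S rests on the Corollary.

```python
from collections import deque


def partition_matrices(n):
    """All D(I,J) with I a nonempty proper subset of {0..n-1}, J its complement."""
    result = []
    for mask in range(1, 2 ** n - 1):      # skips the empty set and N itself
        in_i = [(mask >> v) & 1 for v in range(n)]
        result.append(tuple(
            tuple(1 if in_i[i] and not in_i[j] else 0 for j in range(n))
            for i in range(n)
        ))
    return result


def enumerate_metrics(n):
    """Breadth-first search from zeros(n,n), adding one D(I,J) per step."""
    steps = partition_matrices(n)
    zero = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    seen = {zero}
    queue = deque([zero])
    while queue:
        current = queue.popleft()
        for d in steps:
            candidate = tuple(
                tuple(current[i][j] + d[i][j] for j in range(n))
                for i in range(n)
            )
            # No triangle-inequality check needed; only the bound n is tested.
            if candidate in seen or max(map(max, candidate)) > n:
                continue
            seen.add(candidate)
            queue.append(candidate)
    return seen
```

Calling `enumerate_metrics(2)` yields the 9 matrices with zero diagonal and off-diagonal entries in {0, 1, 2}. The result grows quickly with n, so this is only practical for small n.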

Find the set S which maximize the minimum distance between points in S union A

I would like to find a set S of given cardinality k that maximizes the minimum distance between its points, and between its points and a given set A. Is there a simple algorithm to find the solution of this max-min problem?
Given a universe X ⊆ R^d and A ⊆ X,
find argmax_{S ⊆ X, |S| = k} min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s')
Thanks !
For a fixed k, with X and A as input, brute forcing is obviously in P (as C(|X|,k) is polynomial in |X|).
If k is also an input then it might depend on 'distance':
If 'distance' is arbitrary then your problem is equivalent to finding a fixed-size clique in a graph (which is NP-complete):
NP-Hardness :
Take an instance of the clique problem, that is a graph (G,E) and an integer k.
Add to this graph a vertex 'a' connected to every other vertex, let (G',E') be your modified graph.
((G',E'), k+1) is then an instance equivalent to your original instance of the clique problem.
Create a map Phi from G' to R^d (you can map G' on N anyway ...) and define 'distance' such that distance(Phi(c),Phi(d)) = 1 if (c,d) ∈ E', 0 otherwise.
X = Phi(G'), A = Phi({a}), k+1 give you an instance of your problem.
You can notice that by construction s ≠ Phi(a) <=> distance(s,Phi(a)) = 1 = max distance(.,.), i.e. min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') = min_{s, s' ∈ S, s' ≠ s} distance(s,s') if |S| = k >= 2
Solve this instance (X,A,k+1) of your problem: this gives you a set S of cardinality k+1 such that min( distance(s,s') | s,s' ∈ S, s ≠ s') is maximal.
Check whether there are s,s' ∈ S, s ≠ s', with distance(s,s') = 0 (this can be done in O(k^2) time as |S| = k+1):
if it is the case then there is no set of cardinality k+1 such that forall s,s' distance(s,s') = 1, i.e. there is no subgraph of G' which is a (k+1)-clique
if it's not the case then map S back into G'; by definition of 'distance' there is an edge between any two vertices of Phi^-1(S): it's a (k+1)-clique
This solves, via a polynomial reduction, a problem equivalent to the clique problem (known to be NP-hard).
NP-Easiness :
Let X, A, k be an instance of your problem.
For any subset S of X, min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') can only take values in {distance(x,y) : x,y ∈ X}, which has polynomial cardinality.
To find a set that maximizes this distance we will just test every possible distance for a valid set, in decreasing order.
To test a distance d we first reduce X to X' containing only points at distance >= d from A \ {the point itself}.
Then we create a graph (X',E) ¤ where (s,s') ∈ E iff distance(s,s') >= d.
Test this graph for a k-clique (it's NP-easy); by construction there is one iff its vertices S are a set with min_{s ∈ S, s' ∈ S∪A, s' ≠ s} distance(s,s') >= d
If 'distance' is Euclidean or has any funny property it might be in P; I don't know, but I wouldn't hope too much if I were you.
¤ I assumed that X (and thus X') was finite here
This is probably NP-hard. Greedily choosing the point furthest away from previous choices is a 2-approximation. There might be a complicated approximation scheme for low d based on the scheme for Euclidean TSP by Arora and Mitchell. For d = 10, forget it.
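The greedy rule mentioned above could be sketched like this in Python. It is just an illustration: points are assumed to be tuples of floats, 'distance' is taken to be Euclidean via `math.dist` (Python 3.8+), and the function name `greedy_farthest_points` is made up. When A is empty the first pick is simply the first point of X. The sketch does not try to verify the approximation claim.

```python
import math


def greedy_farthest_points(X, A, k):
    """Pick k points of X, each time taking the point farthest from A and
    from the points already chosen (Euclidean distance assumed)."""
    # nearest[x] = distance from x to the closest point of A (inf if A is empty)
    nearest = {
        x: min((math.dist(x, a) for a in A), default=math.inf)
        for x in X
    }
    chosen = []
    for _ in range(k):
        best = max(nearest, key=nearest.get)   # farthest remaining candidate
        chosen.append(best)
        del nearest[best]
        # The new point may now be the closest one for some candidates.
        for x in nearest:
            nearest[x] = min(nearest[x], math.dist(x, best))
    return chosen


points = [(0.0,), (1.0,), (5.0,), (10.0,)]
picked = greedy_farthest_points(points, A=[(0.0,)], k=2)
```

In this one-dimensional example the first pick is (10.0,), the point farthest from A, and the second is (5.0,), which lies at distance 5 from both (0.0,) and (10.0,).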
The top 2 results for the search "sphere packing np complete" turned up references to a 1981 paper proving that optimal packing and covering in 2D is NP-complete. I do not have access to a research library, so I can't read the paper. But I expect that your problem can be rephrased as that one, in which case you have a proof that you have an NP-complete problem.

Graph Minimum Spanning Tree using BFS

This is a problem from a practice exam that I'm struggling with:
Let G = (V, E) be a weighted undirected connected graph, with positive
weights (you may assume that the weights are distinct). Given a real
number r, define the subgraph Gr = (V, {e in E | w(e) <= r}). For
example, G0 has no edges (obviously disconnected), and Ginfinity = G
(which by assumption is connected). The problem is to find the
smallest r such that Gr is connected.
Describe an O(mlogn)-time algorithm that solves the problem by
repeated applications of BFS or DFS.
The real problem is doing it in O(mlogn). Here's what I've got:
r = min( w(e) ) => O(m)
while true do => O(m)
    Gr = G with edges e | w(e) > r removed => O(m)
    if | BFS( Gr ).V | < |V| => O(m + n)
        r++ (or r = next smallest w(e))
    else
        return r
That's a whopping O(m^2 + mn). Any ideas for getting it down to O(mlogn)? Thanks!
You are iterating over all possible edge costs, which makes the outer loop O(m). Notice that if the graph is disconnected when you discard all edges >w(e), it is also disconnected for >w(e') where w(e') < w(e). You can use this property to do a binary search over the edge costs, which cuts the outer loop down to O(log m) = O(log n) iterations.
lo = min(w(e) for e in edges), hi = max(w(e) for e in edges)
while lo < hi:
    mid = (lo + hi) // 2    # assumes integer weights
    if connected(graph after discarding all e where w(e) > mid):
        hi = mid        # mid is large enough, the answer is <= mid
    else:
        lo = mid + 1    # mid is too small, the answer is > mid
return lo
The binary search has a complexity of O(log(max_e - min_e)) (you can actually bring it down to O(log(edges))), and discarding edges and determining connectivity can be done in O(edges+vertices), so this can be done in O((edges+vertices)*log(edges)).
Warning: I have not tested this in code yet, so there may be bugs. But the idea should work.
How about the following algorithm?
First take a list of all edges (or all distinct edge lengths, using ) from the graph and sort them. That takes O(m*log m) = O(m*log n) time: m is less than n^2 in a simple graph, so O(log m)=O(log n^2)=O(2*log n)=O(log n).
It is obvious that r should be equal to the weight of some edge. So you can do a binary search on the index of the edge in the sorted array.
For each index you try, you take the length of the corresponding edge as r, and check the graph for connectivity with BFS or DFS, using only the edges of length <= r.
Each iteration of the binary search takes O(m + n) = O(m) time (the graph is connected, so n <= m + 1), and you have to make O(log m)=O(log n) iterations.
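Putting the sorted-weights idea together, a Python sketch might look like this. It is an illustration under a few assumptions: vertices are numbered 0..n-1, edges are `(u, v, weight)` tuples, G itself is connected and has at least one edge, and the names `is_connected` and `smallest_connecting_weight` are invented here.

```python
from collections import deque


def is_connected(n, edges, r):
    """BFS from vertex 0 using only edges of weight <= r; O(m + n)."""
    adj = [[] for _ in range(n)]
    for u, v, weight in edges:
        if weight <= r:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n
    seen[0] = True
    reached = 1
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                reached += 1
                queue.append(v)
    return reached == n


def smallest_connecting_weight(n, edges):
    """Smallest r such that G_r is connected (G assumed connected)."""
    weights = sorted({weight for _, _, weight in edges})   # O(m log m)
    lo, hi = 0, len(weights) - 1
    while lo < hi:                                         # O(log m) rounds
        mid = (lo + hi) // 2
        if is_connected(n, edges, weights[mid]):
            hi = mid          # weights[mid] suffices, try smaller
        else:
            lo = mid + 1      # weights[mid] is too small
    return weights[lo]


example = [(0, 1, 1.0), (1, 2, 5.0), (2, 3, 2.0), (0, 3, 7.0), (0, 2, 3.0)]
r = smallest_connecting_weight(4, example)
```

For `example`, the edges of weight 1.0 and 2.0 leave two components, and adding the edge of weight 3.0 joins them, so `r` is 3.0. Searching over indices of the sorted weights rather than over values keeps the number of rounds at O(log m) even for real-valued weights.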
