CLRS - Chapter 22
Theorem 22.10
In a depth-first search of an undirected graph G, every edge of G is
either a tree edge or a back edge.
Proof Let (u,v) be an arbitrary edge of G, and suppose without loss of
generality that u.d < v.d. Then the search must discover and finish v
before it finishes u (while u is gray), since v is on u’s adjacency
list. If the first time that the search explores edge (u,v), it is in
the direction from u to v, then v is undiscovered (white) until that
time, for otherwise the search would have explored this edge already
in the direction from v to u. Thus, (u,v) becomes a tree edge. If the
search explores (u,v) first in the direction from v to u, then (u,v)
is a back edge, since u is still gray at the time the edge is first
explored.
I understand the proof, but I am not quite convinced about forward edges.
In the above image, there is a forward edge from the first vertex to the third vertex (first row). The first vertex is the source.
As I understand it, DFS(S) would include a forward edge 1 -> 3. (I am obviously wrong, but I need somebody to set me straight!)
It looks like you didn't include the definition of "forward edge," so I'll start with the definition I learned.
Assuming u.d < v.d, DFS labels the edge (u,v) a forward edge if
when crossing the edge from u to v, v has already been marked as visited.
Because of that though, I claim that you cannot have forward edges in an undirected graph.
Assume for the sake of contradiction that it was possible. Therefore, the destination node is already marked as visited. Thus, DFS has already gone there and crossed all of the adjacent edges. In particular, you had to have already crossed that edge in the opposite direction. Thus, the edge has already been marked as a certain type of edge and thus won't be marked as a "forward edge".
Because of this, forward edges can only occur in directed graphs.
Now, just in case you mixed up "forward edges" and "tree edges", the edge you describe still isn't necessarily a tree edge. It is only a tree edge if when crossing, that was the first time you've visited the destination node. The easy way to think about it in undirected graphs is that when you traverse an edge, it is a back edge if the destination node has been reached already, and a tree edge otherwise.
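To make the rule above concrete, here is a minimal C++ sketch that labels each edge of an undirected graph by the first direction in which DFS explores it. The names `EdgeLabels`, `dfsClassify` and `classifyEdges` are made up for this illustration, and the code assumes a simple graph (no parallel edges or self-loops) given as adjacency lists over vertices 0..n-1.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container for the two edge types an undirected DFS can produce.
struct EdgeLabels {
    std::vector<std::pair<int, int>> tree;  // (parent, child)
    std::vector<std::pair<int, int>> back;  // (descendant, ancestor)
};

static void dfsClassify(int u, int parent,
                        const std::vector<std::vector<int>>& adj,
                        std::vector<int>& disc, int& clock, EdgeLabels& out) {
    disc[u] = clock++;
    for (int v : adj[u]) {
        if (disc[v] == -1) {
            // v is undiscovered, so this edge becomes a tree edge
            out.tree.push_back({u, v});
            dfsClassify(v, u, adj, disc, clock, out);
        } else if (v != parent && disc[v] < disc[u]) {
            // v was discovered before u and is an ancestor of u: back edge
            out.back.push_back({u, v});
        }
        // Otherwise the edge was already labeled from its other endpoint,
        // so it is never labeled a second time as "forward".
    }
}

EdgeLabels classifyEdges(const std::vector<std::vector<int>>& adj, int source) {
    EdgeLabels out;
    std::vector<int> disc(adj.size(), -1);  // -1 means undiscovered
    int clock = 0;
    dfsClassify(source, -1, adj, disc, clock, out);
    return out;
}
```

On a triangle with vertices 0, 1, 2 and source 0, the search takes 0 -> 1 and 1 -> 2 as tree edges and then labels the edge between 2 and 0 as a back edge while it is at vertex 2. When the search returns to 0 and looks at the same edge in the direction 0 -> 2, it is already labeled, which is why the would-be forward edge never appears.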
I believe that what you are missing is some assumption about the order in which the algorithm would visit the different vertices.
Let's assume the algorithm visits the vertices in lexicographic order. Let's name the vertices this way:
---------
|       |
S - A - B
|   |   |
C - D - E
In this case, the tree edges (the edges along which the search moves forward) will be S->A, A->B, B->E, E->D, D->C. The rest of the edges are back edges.
Now let's rename the graph:
---------
|       |
S - B - A
|   |   |
C - D - E
In this case, the tree edges will be S->A, A->B, B->D, D->C, D->E (note that S->A and S->B are not the same edges as in the previous example).
As you can see, the output depends on the order in which the algorithm selects the vertices. When the graph is anonymous, any output may be correct.
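The two runs above can be reproduced with a small C++ sketch. It assumes the top bar in the drawings is an edge between the two outer vertices of the first row, and it uses `std::set` so that neighbours are always tried in lexicographic order. `Graph`, `makeGraph` and `dfsTreeEdges` are names invented for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// std::set keeps each neighbour list sorted, which gives the lexicographic visiting order.
using Graph = std::map<char, std::set<char>>;

Graph makeGraph(const std::vector<std::pair<char, char>>& edges) {
    Graph g;
    for (const auto& e : edges) {
        g[e.first].insert(e.second);
        g[e.second].insert(e.first);
    }
    return g;
}

static void visit(char u, const Graph& g, std::set<char>& seen,
                  std::vector<std::string>& treeEdges) {
    seen.insert(u);
    for (char v : g.at(u)) {
        if (seen.count(v)) continue;  // already discovered: not a tree edge
        treeEdges.push_back(std::string(1, u) + "->" + std::string(1, v));
        visit(v, g, seen, treeEdges);
    }
}

// Returns the edges along which the search advances, in discovery order.
std::vector<std::string> dfsTreeEdges(const Graph& g, char source) {
    std::set<char> seen;
    std::vector<std::string> treeEdges;
    visit(source, g, seen, treeEdges);
    return treeEdges;
}
```

With the first labeling (edges S-A, A-B, S-B, S-C, A-D, B-E, C-D, D-E) the call `dfsTreeEdges(g, 'S')` yields S->A, A->B, B->E, E->D, D->C. Swapping the names A and B gives S->A, A->B, B->D, D->C, D->E, matching the two lists above.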
In the DFS tree of a general graph, there are TREE, FORWARD, BACK and CROSS edges.
In the DFS tree of an undirected graph, the would-be FORWARD edges are labeled as BACK edges.
The would-be CROSS edges are labeled as TREE edges.
In both cases, the reason is that the edges can be traversed in both directions: you first encounter them as BACK and TREE, and by the time you meet them a second time, as FORWARD and possibly CROSS, they are already labeled.
In a sense, an edge is both FORWARD and BACK, or both CROSS and TREE, but is first found as BACK or TREE, respectively.
The heuristic solution that I've been given is:
Perform a depth-first-search on the graph
Delete all the leaves
The remaining graph forms a vertex cover
I've been given the question: "Show that this heuristic is at most twice as large as the optimal solution to the vertex cover". How can I show this?
I assume that the graph is connected (if it's not the case, we can solve this problem for each component separately).
I also assume that a dfs-tree is rooted and a leaf is a vertex that doesn't have children in the rooted dfs-tree (it's important. If we define it differently, the algorithm may not work).
We need to show two things:
The set of vertices returned by the algorithm is a vertex cover. Indeed, there can be only two types of edges with respect to the dfs-tree of any undirected graph: tree edges (such an edge is covered, as at least one of its endpoints is not a leaf) and back edges (again, one of the endpoints is not a leaf, because a back edge goes from a vertex to its ancestor, and a leaf cannot be an ancestor of any vertex).
Let's consider the dfs-tree and ignore the rest of the edges. I'll show that it's not possible to cover the tree edges using fewer than half as many vertices as there are non-leaf vertices. Let S be a minimum vertex cover. Consider a vertex v such that v is not a leaf and v is not in S (that is, v is returned by the heuristic in question but it's not in the optimal answer). v is not a leaf, thus there is an edge v -> u in the dfs-tree (where u is a child of v). The edge v -> u is covered by S and v is not in S. Thus, u is in S. Let's define a mapping f from the vertices returned by the heuristic that are not in S to S by f(v) = u (if v has several children, pick any one of them). Note that v is the parent of u in the dfs-tree. But there can be only one parent for any vertex in a tree! Thus, f is an injection. It means that the number of vertices in the set returned by the heuristic but not in the optimal answer is not greater than the size of the optimal answer. The returned vertices that are in S number at most |S| as well, so the heuristic returns at most 2|S| vertices. That's exactly what we needed to show.
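As a rough illustration of the heuristic under the assumptions stated above (connected graph, rooted dfs-tree, a leaf is a vertex without children), here is a short C++ sketch. The function names are invented for this example, and `isVertexCover` is only a checker for experimenting with small inputs.

```cpp
#include <cassert>
#include <vector>

// Marks every vertex that gets at least one child in the rooted DFS tree.
static void dfsMark(int u, const std::vector<std::vector<int>>& adj,
                    std::vector<bool>& visited, std::vector<bool>& hasChild) {
    visited[u] = true;
    for (int v : adj[u]) {
        if (!visited[v]) {
            hasChild[u] = true;  // v becomes a child of u
            dfsMark(v, adj, visited, hasChild);
        }
    }
}

// Returns the non-leaf vertices of a DFS tree rooted at `root`.
// Assumes the graph is connected.
std::vector<int> dfsVertexCover(const std::vector<std::vector<int>>& adj, int root) {
    std::vector<bool> visited(adj.size(), false), hasChild(adj.size(), false);
    dfsMark(root, adj, visited, hasChild);
    std::vector<int> cover;
    for (int u = 0; u < static_cast<int>(adj.size()); ++u)
        if (hasChild[u]) cover.push_back(u);
    return cover;
}

// Checks that every edge has at least one endpoint in `cover`.
bool isVertexCover(const std::vector<std::vector<int>>& adj,
                   const std::vector<int>& cover) {
    std::vector<bool> in(adj.size(), false);
    for (int u : cover) in[u] = true;
    for (int u = 0; u < static_cast<int>(adj.size()); ++u)
        for (int v : adj[u])
            if (!in[u] && !in[v]) return false;
    return true;
}
```

On a star with centre 0 and leaves 1, 2, 3, rooting the search at leaf 1 returns {0, 1}, which is twice the optimum {0}, so the factor of two can actually be reached. On a triangle rooted at 0 the dfs-tree is the path 0, 1, 2, and the sketch returns {0, 1}.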
Bad news: the heuristic does not work.
Strictly speaking, a single isolated vertex is a counter-example for the question.
Nevertheless, the heuristic does not provide a vertex cover at all, even if you correct it for the isolated vertex and for 2-vertex cliques.
Take a look at fully connected graphs with 1 to 3 vertices:
1 - strictly speaking, an isolated vertex is not a leaf (it has degree 0, while a leaf is a vertex with degree 1), so the heuristic will keep it, while a vertex cover will not
2 - the heuristic will drop both leaves, while a vertex cover will keep at least 1 of them
3 - the heuristic will leave 1 vertex, while a vertex cover has to keep at least 2 vertices of this clique
I have searched the net and could not find any explanation of a DFS algorithm for finding all articulation vertices of a graph. There is not even a wiki page.
From reading around, I got to know the basic facts from here. PDF
There is a variable at each node which is actually looking at back edges and finding the closest and upmost node towards the root node. After processing all edges it would be found.
But I do not understand how to find this down & up variable at each node during the execution of DFS. What is this variable doing exactly?
Please explain the algorithm.
Thanks.
Finding articulation vertices is an application of DFS.
In a nutshell,
1. Apply DFS on a graph. Get the DFS tree.
2. A node which is visited earlier is a "parent" of those nodes which are reached by it and visited later.
3. If any child of a node does not have a path to any of the ancestors of its parent, it means that removing this node would make this child disjoint from the graph.
4. There is an exception: the root of the tree. If it has more than one child, then it is an articulation point, otherwise not.
Point 3 essentially means that this node is an articulation point.
Now for a child, this path to the ancestors of the node would be through a back-edge from it or from any of its children.
All this is explained beautifully in this PDF.
I'll try to develop an intuitive understanding on how this algorithm works and also give commented pseudocode that outputs Bi-Components as well as bridges.
It's actually easy to develop a brute force algorithm for articulation points. Just take out a vertex, and run BFS or DFS on a graph. If it remains connected, then the vertex is not an articulation point, otherwise it is. This will run in O(V(E+V)) = O(EV) time. The challenge is how to do this in linear time (i.e. O(E+V)).
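The brute-force idea can be sketched in a few lines of C++. This is only an illustration: the function names are invented here, and the sketch assumes a connected undirected graph stored as adjacency lists over vertices 0..n-1.

```cpp
#include <cassert>
#include <vector>

// Counts the vertices reachable from `start` when vertex `removed` is deleted.
static int reachable(const std::vector<std::vector<int>>& adj, int start, int removed) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> stack = {start};
    seen[start] = true;
    int count = 0;
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        ++count;
        for (int v : adj[u]) {
            if (v != removed && !seen[v]) {
                seen[v] = true;
                stack.push_back(v);
            }
        }
    }
    return count;
}

// Removes each vertex in turn and tests whether the rest stays connected.
// One traversal per vertex gives O(V(V+E)) time overall.
std::vector<int> articulationPointsBruteForce(const std::vector<std::vector<int>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> result;
    if (n < 3) return result;  // removing a vertex cannot separate fewer than two others
    for (int r = 0; r < n; ++r) {
        int start = (r == 0) ? 1 : 0;  // any vertex other than r
        if (reachable(adj, start, r) < n - 1) result.push_back(r);
    }
    return result;
}
```

For a triangle on vertices 1, 2, 3 with an extra vertex 0 attached to vertex 1, only vertex 1 is reported. A checker like this is mainly useful for validating a linear-time implementation on small graphs.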
Articulation points connect two (or more) subgraphs. This means there are no edges from one subgraph to another. So imagine you are within one of these subgraphs and visiting its nodes. As you visit a node, you flag it and then move on to the next unflagged node using some available edge. While you are doing this, how do you know you are still within the same subgraph? The insight here is that if you are within the same subgraph, you will eventually see a flagged node through an edge while visiting an unflagged node. This is called a back edge and indicates that you have a cycle. As soon as you find a back edge, you can be confident that all the nodes from that flagged node to the one you are visiting right now are part of the same subgraph and there are no articulation points in between. If you didn't see any back edges, then all the intermediate nodes you visited so far are articulation points.
So we need an algorithm that visits vertices and marks all the nodes between the target of a back edge and the currently-being-visited node as being within the same subgraph. There may obviously be subgraphs within subgraphs, so we need to select the largest subgraph we have so far. These subgraphs are called Bi-Components. We can implement this algorithm by assigning each bi-component an ID, which is initialized as just a count of the number of vertices we have visited so far. Later, as we find back edges, we can reset the bi-component ID to the lowest we have found so far.
We obviously need two passes. In the first pass, we want to figure out which vertex we can see from each vertex through back edges, if any. In the second pass we want to visit vertices in the opposite direction and collect the minimum bi-component ID (i.e. earliest ancestor accessible from any descendants). DFS naturally fits here. In DFS we go down first and then come back up so both of the above passes can be done in a single DFS traversal.
Now without further ado, here's the pseudocode:
time = 0
visited[i] = false for all i
parent[i] = null for all i

GetArticulationPoints(u)
    visited[u] = true
    u.st = time++
    u.low = u.st        //keeps track of highest ancestor reachable from any descendants
    dfsChild = 0        //number of dfs children; needed for the root check below
    isArticulation = false
    for each ni in adj[u]
        if not visited[ni]
            parent[ni] = u
            GetArticulationPoints(ni)
            ++dfsChild
            u.low = Min(u.low, ni.low)    //while coming back up, get the lowest reachable ancestor from descendants
            if parent[u] != null and ni.low >= u.st    //subtree of ni cannot climb above u
                isArticulation = true
            if ni.low > u.st
                Output (u, ni) as bridge
        else if ni <> parent[u]           //while going down, note down the back edges
            u.low = Min(u.low, ni.st)

    //For dfs root node, we can't mark it as articulation point because
    //disconnecting it may not decompose graph. So we have extra check just for root node.
    if isArticulation or (parent[u] = null and dfsChild > 1)
        Output u as articulation point
    output u.low as bicomponent ID
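For the bridge part of the pseudocode, a compact C++ sketch may help. It reuses the `st` and `low` naming from above, uses the usual strict test (a tree edge from u to its child v is a bridge when `low[v] > st[u]`), and assumes a simple undirected graph, since it identifies the parent edge by vertex rather than by edge index. `bridgeDfs` and `findBridges` are names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

static void bridgeDfs(int u, int parent, const std::vector<std::vector<int>>& adj,
                      std::vector<int>& st, std::vector<int>& low, int& clock,
                      std::vector<std::pair<int, int>>& bridges) {
    st[u] = low[u] = clock++;
    for (int v : adj[u]) {
        if (st[v] == -1) {
            // tree edge: recurse, then pull the child's low value up
            bridgeDfs(v, u, adj, st, low, clock, bridges);
            low[u] = std::min(low[u], low[v]);
            // nothing in v's subtree reaches u or above, so (u, v) is a bridge
            if (low[v] > st[u]) bridges.push_back({u, v});
        } else if (v != parent) {
            // back edge to an already discovered vertex
            low[u] = std::min(low[u], st[v]);
        }
    }
}

std::vector<std::pair<int, int>> findBridges(const std::vector<std::vector<int>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> st(n, -1), low(n, -1);  // -1 means not yet visited
    std::vector<std::pair<int, int>> bridges;
    int clock = 0;
    for (int s = 0; s < n; ++s)
        if (st[s] == -1) bridgeDfs(s, -1, adj, st, low, clock, bridges);
    return bridges;
}
```

On a triangle 1, 2, 3 with a pendant vertex 0 attached to 1, the only bridge reported is (0, 1): the triangle edges all lie on a cycle, so their low values never exceed the discovery time of the upper endpoint.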
One fact that seems to be left out of all the explanations:
Fact #1: In a depth first search spanning tree (DFSST), every backedge connects a vertex to one of its ancestors.
This is essential for the algorithm to work, it is why an arbitrary spanning tree won't work for the algorithm. It is also the reason why the root is an articulation point iff it has more than 1 child: there cannot be a backedge between the subtrees rooted at the children of the spanning tree's root.
A proof of the statement: suppose (u, v) is a non-tree edge where neither endpoint is an ancestor of the other, and (WLOG) u is visited before v in the DFS. Let p be the deepest common ancestor of u and v. Then the DFS would have to visit p, then u, then return to p before visiting v. But it isn't possible to return to p from u's subtree before visiting v, because there is an edge between u and v.
Call V(c) the set of vertices in the subtree rooted at c in the DFSST
Call N(c) the set of vertices that have a neighbor in V(c) (by tree edge or by backedge)
Fact #2:
For a non root node u,
If u has a child c such that N(c) ⊆ V(c) ∪ {u} then u is an articulation point.
Reason: for every vertex w in V(c), every path from the root to w must contain u. If not, such a path would have to contain a back edge that connects a proper ancestor of u to a vertex in V(c) (by Fact #1), so N(c) would contain a vertex outside V(c) ∪ {u}.
Fact #3:
The converse of Fact #2 is also true.
Reason: if no child c of u satisfies N(c) ⊆ V(c) ∪ {u}, then every descendant of u has a path to the root that doesn't pass through u.
A descendant in V(c) can bypass u with a path through a backedge that connects V(c) to a vertex of N(c) outside V(c) ∪ {u}.
So for the algorithm, you only need to know 2 things about each non-root vertex u:
The depth of the vertex, say D(u)
The minimum depth of N(u), also called the lowpoint, let's say L(u)
So if a vertex u has a child c, and L(c) is less than D(u), then that means the subtree rooted at c has a backedge that reaches out to a proper ancestor of u, so c cannot be cut off by removing u. If this holds for every child of u, then u is not an articulation point by Fact #3. Conversely, if some child c has L(c) >= D(u), then u is an articulation point by Fact #2.
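If the low value of some child v of a non-root vertex u is greater than or equal to the dfsnum of u, then u is an articulation point. The root is an articulation point when it has more than one DFS child.
#include <algorithm>
#include <iostream>
using namespace std;

int adjMatrix[256][256];
int low[256], num = 0, dfsnum[256];  // fill dfsnum with -1 before the first call

// Call as cutvertex(root, -1) after setting every dfsnum[] entry to -1.
void cutvertex(int u, int parent) {
    low[u] = dfsnum[u] = num++;
    int children = 0;    // DFS-tree children of u
    bool isCut = false;
    for (int v = 0; v < 256; ++v) {
        if (!adjMatrix[u][v]) continue;      // not a neighbour
        if (dfsnum[v] == -1) {               // tree edge u -> v
            ++children;
            cutvertex(v, u);
            // no back edge from v's subtree climbs above u
            if (parent != -1 && low[v] >= dfsnum[u]) isCut = true;
            low[u] = min(low[u], low[v]);
        } else if (v != parent) {            // back edge
            low[u] = min(low[u], dfsnum[v]);
        }
    }
    if (parent == -1 && children > 1) isCut = true;  // root rule
    if (isCut) cout << "Cut Vertex: " << u << "\n";
}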
This is not a homework. I am trying to do exercises from a textbook to understand MST (minimum spanning tree).
Suppose I have a cycle C in a weighted undirected graph G. As I understand, the following is correct:
The heaviest edge in C belongs to no MST of G. That is, there is no MST of G, which contains that edge.
The lightest edge in C belongs to some MST of G. That is, there is an MST of G, which contains that edge.
Now I wonder if the following claims are correct too.
The lightest edge in C belongs to all MST of G. That is, there is no MST of G, which does not contain that edge.
Any edge in C except the heaviest one belongs to some MST. That is, for each edge in C except the heaviest one there is an MST, which contains that edge.
Could you prove the last claim?
Even for the first claim if there are multiple edges which are lightest, all need not be included in the MST.
The first one of your claims is always true. The lightest edge is on the MST for any graph.
The second one is not always true. It is always true if the entire graph is a
cycle, so that every node has exactly 2 edges incident to it. However, in the general case,
an edge (u,v) of weight k is never on an MST whenever there is another path between the nodes u and v
connecting them at a total weight less than k.
I don't think your claims are valid. The problem is that you are only considering a cycle in a larger graph.
Consider for example a graph G consisting of 6 nodes in a cycle (with random weights >1). Your claims might hold for that graph but now add 1 node in the center of the graph and connect it with 6 links of cost 1. The MST of your entire graph now will consist of only those 6 edges (which form a star).
If you now look at your claims, you'll see:
The lightest edge in your cycle does not belong to the MST (=star)
None of the edges in the cycle are in the MST
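The counterexample is easy to check mechanically. Below is a small C++ sketch using Kruskal's algorithm with a plain union-find. The concrete cycle weights 2..7 are an assumption made for the example (the text only requires them to exceed 1), and `Edge`, `kruskal` and `hexagonWithHub` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

// Union-find lookup with path halving.
static int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Kruskal's algorithm: returns the edges of one minimum spanning tree.
std::vector<Edge> kruskal(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        int a = findRoot(parent, e.u), b = findRoot(parent, e.v);
        if (a != b) {            // e joins two different components: keep it
            parent[a] = b;
            mst.push_back(e);
        }
    }
    return mst;
}

// A 6-cycle on vertices 0..5 with weights 2..7, plus a hub (vertex 6)
// joined to every cycle vertex by an edge of weight 1.
std::vector<Edge> hexagonWithHub() {
    std::vector<Edge> edges;
    for (int i = 0; i < 6; ++i) {
        edges.push_back({i, (i + 1) % 6, 2 + i});  // cycle edge
        edges.push_back({i, 6, 1});                // spoke to the hub
    }
    return edges;
}
```

Calling `kruskal(7, hexagonWithHub())` returns exactly the six spokes, so neither the lightest cycle edge (weight 2) nor any other cycle edge is in the MST.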
I have the following problem on my homework:
Give an O(n+m) algorithm to determine whether an edge e would be part of some MST of a graph
(We are allowed to get help from others on this assignment, so this isn't cheating.)
I think that I could do a BFS and check whether this edge is an edge between two layers and, if so, whether this edge is the smallest across those layers. But what could I say when this edge is not a tree edge of the BFS tree?
As a hint, if an edge is not the heaviest edge in any cycle that contains it, there is some MST that contains that edge. To see this, consider any MST. If the MST already contains the edge, great! We're done. If not, then add the edge into the MST. This creates a cycle in the graph. Now, find the heaviest edge in that cycle and remove it from the graph. Everything is now still connected (because if two nodes used to be connected by a path that went across that edge, now they can be connected by just going around the cycle the other way). Moreover, since the cost of the edge that was deleted wasn't any smaller than the cost of the edge in question (because the edge isn't the heaviest edge in the cycle), the cost of this tree can't be any greater than before. Since we started with an MST, we must therefore end with an MST.
Using this property, see if you can find whether the edge is the heaviest edge on any cycle in linear time.
We will solve this using MST cycle property, which says that, "For any cycle C in the graph, if the weight of an edge e of C is larger than the weights of all other edges of C, then this edge cannot belong to an MST."
Now, run the following O(E+V) algorithm to test if the edge E connecting vertices u and v will be a part of some MST or not.
Step 1
Run dfs from one of the end-points(either u or v) of the edge E considering only those edges that have weight less than that of E.
Step 2
Case 1
If at the end of this dfs, the vertices u and v get connected, then edge E cannot be a part of any MST. This is because in this case there definitely exists a cycle in the graph in which the edge E has the maximum weight, so it cannot be a part of an MST (from the cycle property).
Case 2
But if at the end of the dfs u and v stay disconnected, then edge E must be the part of some MST as in this case E is never the maximum weight edge of the cycles that it is a part of.
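The two steps above translate almost directly into code. Here is a minimal C++ sketch, assuming the graph is stored as adjacency lists of (neighbour, weight) pairs. `WeightedAdj`, `addEdge` and `edgeInSomeMst` are names made up for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// adj[x] holds (neighbour, weight) pairs of an undirected weighted graph.
using WeightedAdj = std::vector<std::vector<std::pair<int, int>>>;

void addEdge(WeightedAdj& adj, int a, int b, int w) {
    adj[a].push_back({b, w});
    adj[b].push_back({a, w});
}

// Returns true if the edge (u, v) of weight w lies on some MST, i.e. if v is
// NOT reachable from u using only edges strictly lighter than w.
// Runs one iterative DFS, so the cost is O(V + E).
bool edgeInSomeMst(const WeightedAdj& adj, int u, int v, int w) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> stack = {u};
    seen[u] = true;
    while (!stack.empty()) {
        int x = stack.back();
        stack.pop_back();
        if (x == v) return false;  // a lighter u..v path makes (u, v) the heaviest edge on a cycle
        for (const auto& nb : adj[x]) {
            // only follow edges strictly lighter than the edge being tested
            if (nb.second < w && !seen[nb.first]) {
                seen[nb.first] = true;
                stack.push_back(nb.first);
            }
        }
    }
    return true;
}
```

The edge under test is skipped automatically, because its own weight is not strictly less than w. On a triangle with weights 10, 100 and 1000, the function returns true for the two lighter edges and false for the edge of weight 1000.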
Check whether there is a path between u and v, other than the edge (u,v) itself, whose edges are all cheaper than (u,v). Such a path closes a cycle with (u,v). If there is one, then (u,v) is not on the MST. Otherwise it is. This can be proved by the cut property and the cycle property.
If we have an (arbitrary) connected undirected graph G, whose edges have distinct weights,
does every MST of G contain the minimum weighted edge?
is there an MST of G that does not contain the maximum weighted edge?
Also, I'm more thankful if someone can give a hint of the key things one must keep in mind when dealing with such MST questions.
This is a homework problem. Thanks.
is there an MST of G that does not contain the maximum weighted edge?
There may be, but there doesn't have to be. Consider a 4-vertex graph as follows:
[A]--{2}--[B]
 |         |
 |         |
{1}       {3}
 |         |
 |         |
[C]-{50}--[D]
The minimum spanning tree consists of the edge set {CA, AB, BD}. The maximum edge weight is 50, along {CD}, but it's not part of the MST. But if G were already equal to its own MST, then obviously it would contain its own maximum edge.
does every MST of G contain the minimum weighted edge?
Yes. MSTs have a cut property. A cut is simply a partition of the vertices of the graph into two disjoint sets. For any cut you can make, if the weight of an edge crossing that cut is strictly smaller than the weights of all the other edges crossing the cut, then this edge belongs to all MSTs of the graph. Because you guaranteed that the edge weights are distinct, you have also guaranteed that there is an edge which is smaller than all other edges, and that edge is the lightest edge crossing any cut that separates its two endpoints.
Also, I'm more thankful if someone can give a hint of the key things one must keep in mind when dealing with such MST questions.
Your best bet is to reason about things using the properties of MSTs in general, and to try to construct specific counterexamples which you think will prove your case. I gave an instance of each line of reasoning above. Because of the cut and cycle properties, you can always determine exactly which edges are in an MST, so you can systematically test each edge to determine whether or not it's in the MST.
Does every MST of G contain the minimum weighted edge?
Yes. Let's assume we have an MST which does not contain the min weight edge. Adding this edge to the MST creates a cycle. Any other edge on that cycle can then be removed to break the cycle while keeping the tree connected. Because the weights are distinct, the removed edge is strictly heavier than the min weight edge, so the new spanning tree is lighter than the MST we started with, which is a contradiction.
Is there an MST of G that does not contain the maximum weighted edge?
Depends on the graph. If the graph itself is a tree then we need to include all of its n-1 edges in the MST, so the max weight edge cannot be excluded. Also, if the max weight edge is a cut-edge, so that excluding it disconnects the graph, then the max weight edge cannot be excluded. But if the max weight edge is part of a cycle, then it can be excluded from the MST.
For your first question the answer is yes, and Kruskal's algorithm proves it. It will always select the minimum cost edge.
For the second question the answer is yes, and it's trivial to find an example graph:
1 - 2 (cost 10)
2 - 3 (cost 100)
3 - 1 (cost 1000)
The third edge will never be selected as it introduces a cycle. So basically, if the edge with the maximum cost would create a cycle if inserted in the MST, it won't be inserted.
I see you too are studying for CSC263 through the 2009 test? (Same here!)
Another way to see that the minimum is always in the MST is to look simply at this minimum edge (call it e):
         e
v1 ---------------- v2
(Assume v1 and v2 have connections to other vertices.) Now, for e NOT to be included in the final MST means that at some point we have, without loss of generality, v1 in the MST but not v2. However, the only way to add v2 without adding e would be to say that the addition of v1 didn't add e to the queue (by definition, e would be at the top of the queue, because it has the lowest weight), but this contradicts the MST construction theorem.
So essentially, it is impossible to have an edge with minimum weight not get to the queue which means that any MST constructed would have it.