Edmonds matching algorithm. How to start with an empty matching set? - algorithm

I would like to run Edmonds' matching algorithm (the blossom algorithm) on a graph (example graph in the picture), but how do I start with an empty matching set?
The algorithm works this way:
Given: graph G and matching M in G
Task: find a matching M' with |M'| = |M| + 1, or |M'| = |M| if M is maximum
1 let F be the forest consisting of all M-exposed nodes;
2 while there is an outer node x and an edge {x, y} with y \notin V(F), add {x, y} and the matching edge covering y to F;
3 if there are adjacent outer nodes x, y in the same tree, then shrink the cycle (M-blossom) in F \cup {x, y} and go to Step 2;
4 if there are adjacent outer nodes x, y in different trees, then augment M along the M-augmenting path P(x) \cup {x, y} \cup P(y);
5 in reverse order, undo each shrinking and re-establish near-perfect matchings in blossoms.

You don't usually begin the algorithm with an empty M. You provide an initial matching, generally by generating it with a greedy algorithm that scans every edge e of the graph G and adds e to M if M + e is still a matching.
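The greedy initialization described above can be sketched as follows. This is a minimal illustration, not part of the original answer: the function name `greedy_matching` and the edge-list input format are assumptions chosen for the example. The result is a maximal matching (no edge can be added), which is not necessarily a maximum one, so it is only a starting point for the augmenting steps.

```python
def greedy_matching(edges):
    """Return a maximal (not necessarily maximum) matching.

    `edges` is an iterable of (u, v) pairs; this input format is an
    assumption made for the sketch.
    """
    covered = set()   # vertices already covered by a chosen edge
    matching = []
    for u, v in edges:
        # Adding {u, v} keeps M a matching only if both ends are still exposed.
        if u != v and u not in covered and v not in covered:
            matching.append((u, v))
            covered.update((u, v))
    return matching


# Path 1-2-3-4-5: the scan order decides which edges are taken.
print(greedy_matching([(1, 2), (2, 3), (3, 4), (4, 5)]))  # [(1, 2), (3, 4)]
```

Scanning the edges in a different order can give a smaller matching, for example `[(2, 3), (1, 2), (3, 4)]` yields only `[(2, 3)]`, which is why the augmenting phase is still needed afterwards.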

Related

Find the Optimal vertex cover of a tree with k blue vertices

I need to find a 'dynamic - programming' kind of solution for the following problem:
Input:
Perfect binary tree, T = (V,E) (each node has exactly 2 children, except the leaves).
V = V(blue) ∪ V(black).
V(blue) ∩ V(black) = ∅.
(In other words, some vertices in the tree are blue)
Root of the tree 'r'.
integer k
A legal Solution:
A subset of vertices V' ⊆ V which is a vertex cover of T, and |V' ∩ V(blue)| = k. (In other words, the cover V' contains k blue vertices)
Solution Value:
The value of a legal solution V' is the number of vertices in the set = |V'|.
For convenience, we will define the value of an "un-legal" solution to be ∞.
What we need to find:
A solution with minimal Value.
(In other words, The best solution is a solution which is a cover, contains exactly k blue vertices and the number of vertices in the set is minimal.)
I need to define a typical sub-problem (for example, if I know the value of the solution for a subtree, I can use it to find the value of the solution to the whole problem)
and suggest a formula to solve it.
To me, looks like you are on the right track!
Still, I think you will have to use an additional parameter to tell us how far is any picked vertex from the current subtree's root.
For example, it can be just the indication whether we pick the current vertex, as below.
Let fun (v, b, p) be the optimal size for subtree with root v such that, in this subtree, we pick exactly b blue vertices, and p = 1 if we pick vertex v or p = 0 if we don't.
The answer is the minimum of fun (r, k, 0) and fun (r, k, 1): we want the answer for the full tree (v = r), with exactly k blue vertices picked (b = k), and we can either pick or not pick the root.
Now, how do we calculate this?
For the leaves, fun (v, 0, 0) is 0 and fun (v, t, 1) is 1, where t tells us whether vertex v is blue (1 if yes, 0 if no).
All other combinations are invalid, and we can simulate it by saying the respective values are positive infinities: for example, for a leaf vertex v, the value fun (v, 3, 1) = +infinity.
In the implementation, the infinity can be just any value greater than any possible answer.
For all internal vertices, let v be the current vertex and u and w be its children.
We have two options: to pick or not to pick the vertex v.
Suppose we pick it.
Then the value we get for fun (v, b, 1) is 1 (the picked vertex v) plus the minimum of fun (u, x, q) + fun (w, y, s) such that x + y is either b if the vertex v is black or b - 1 if it is blue, and q and s can be arbitrary (each is 0 or 1): if we picked the vertex v, the edges v--u and v--w are already covered by our vertex cover.
Now let us not pick the vertex v.
Then the value we get for fun (v, b, 0) is just the minimum of fun (u, x, 1) + fun (w, y, 1) such that x + y = b: if we did not pick the vertex v, the edges v--u and v--w have to be covered by u and w.
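The recurrence above can be sketched in code roughly as follows. The tree representation (a `children` dict mapping each internal vertex to its two children, and a `blue` set) and the function name are assumptions made for this illustration; they do not come from the question. `table[p][b]` plays the role of fun (v, b, p).

```python
INF = float("inf")  # stands for "no legal solution"


def min_cover_with_k_blue(children, blue, root, k):
    """Smallest vertex cover of the tree that contains exactly k blue vertices.

    children: dict, internal vertex -> (left child, right child)  (assumed format)
    blue:     set of blue vertices
    Returns INF when no such cover exists.
    """

    def solve(v):
        t = 1 if v in blue else 0
        table = [[INF] * (k + 1) for _ in range(2)]   # table[p][b] = fun(v, b, p)
        if v not in children:                         # leaf
            table[0][0] = 0
            if t <= k:
                table[1][t] = 1
            return table
        u, w = children[v]
        tu, tw = solve(u), solve(w)
        for x in range(k + 1):
            for y in range(k + 1 - x):
                # v not picked: both children must be picked to cover v--u, v--w.
                table[0][x + y] = min(table[0][x + y], tu[1][x] + tw[1][y])
                # v picked: each child may be picked or not.
                b = x + y + t
                if b <= k:
                    cand = 1 + min(tu[0][x], tu[1][x]) + min(tw[0][y], tw[1][y])
                    table[1][b] = min(table[1][b], cand)
        return table

    top = solve(root)
    return min(top[0][k], top[1][k])


# Root "r" with leaf children "a" (blue) and "b" (black).
tree = {"r": ("a", "b")}
print(min_cover_with_k_blue(tree, {"a"}, "r", 0))  # 1  (pick r only)
print(min_cover_with_k_blue(tree, {"a"}, "r", 1))  # 2  (for example a and b)
```

Each internal vertex combines its children's tables in O(k^2) time, so the whole computation takes O(|V| k^2) under this representation.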

Proof of correctness: Algorithm for diameter of a tree in graph theory

In order to find the diameter of a tree I can take any node from the tree, perform BFS to find a node which is farthest away from it and then perform BFS on that node. The greatest distance from the second BFS will yield the diameter.
I am not sure how to prove this, though? I have tried using induction on the number of nodes, but there are too many cases.
Any ideas would be much appreciated...
Let's call the endpoint found by the first BFS x. The crucial step is proving that the x found in this first step always "works" -- that is, that it is always at one end of some longest path. (Note that in general there can be more than one equally-longest path.) If we can establish this, it's straightforward to see that a BFS rooted at x will find some node as far as possible from x, which must therefore be an overall longest path.
Hint: Suppose (to the contrary) that there is a longer path between two vertices u and v, neither of which is x.
Observe that, on the unique path between u and v, there must be some highest (closest to the root) vertex h. There are two possibilities: either h is on the path from the root of the BFS to x, or it is not. Show a contradiction by showing that in both cases, the u-v path can be made at least as long by replacing some path segment in it with a path to x.
[EDIT] Actually, it may not be necessary to treat the 2 cases separately after all. But I often find it easier to break a configuration into several (or even many) cases, and treat each one separately. Here, the case where h is on the path from the BFS root to x is easier to handle, and gives a clue for the other case.
[EDIT 2] Coming back to this later, it now seems to me that the two cases that need to be considered are (i) the u-v path intersects the path from the root to x (at some vertex y, not necessarily at the u-v path's highest point h); and (ii) it doesn't. We still need h to prove each case.
I'm going to work out j_random_hacker's hint. Let s, t be a maximally distant pair. Let u be the arbitrary vertex. We have a schematic like
    u
    |
    |
    |
    x
   / \
  /   \
 /     \
s       t ,
where x is the junction of s, t, u (i.e. the unique vertex that lies on each of the three paths between these vertices).
Suppose that v is a vertex maximally distant from u. If the schematic now looks like
    u
    |
    |
    |
    x   v
   / \ /
  /   *
 /     \
s       t ,
then
d(s, t) = d(s, x) + d(x, t) <= d(s, x) + d(x, v) = d(s, v),
where the inequality holds because d(u, t) = d(u, x) + d(x, t) and d(u, v) = d(u, x) + d(x, v). There is a symmetric case where v attaches between s and x instead of between x and t.
The other case looks like
    u
    |
    *---v
    |
    x
   / \
  /   \
 /     \
s       t .
Now,
d(u, s) <= d(u, v) <= d(u, x) + d(x, v)
d(u, t) <= d(u, v) <= d(u, x) + d(x, v)
d(s, t) = d(s, x) + d(x, t)
        = d(u, s) + d(u, t) - 2 d(u, x)
        <= 2 d(x, v)
2 d(s, t) <= d(s, t) + 2 d(x, v)
          = d(s, x) + d(x, v) + d(v, x) + d(x, t)
          = d(v, s) + d(v, t),
so max(d(v, s), d(v, t)) >= d(s, t) by an averaging argument, and v belongs to a maximally distant pair.
Here's an alternative way to look at it:
Suppose G = ( V, E ) is a nonempty, finite tree with vertex set V and edge set E.
Consider the following algorithm:
1. Let count = 0. Let all edges in E initially be uncolored. Let C initially be equal to V.
2. Consider the subset V' of V containing all vertices with exactly one uncolored edge:
   - if V' is empty, then let d = count * 2, and stop.
   - if V' contains exactly two elements and they share an (uncolored) edge, then color that edge green, let d = count * 2 + 1, and stop.
   - otherwise, proceed as follows:
3. Increment count by one.
4. Remove all vertices from C that have no uncolored edges.
5. For each vertex in V with two or more uncolored edges, re-color each of its green edges red (some vertices may have zero such edges).
6. For each vertex in V', color its uncolored edge green.
7. Return to step 2.
That basically colors the graph from the leaves inward, marking paths with maximal distance to a leaf in green and marking those with only shorter distances in red. Meanwhile, the nodes of C, the center, with shorter maximal distance to a leaf are pared away until C contains only the one or two nodes with the largest maximum distance to a leaf.
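The leaves-inward idea can be sketched without the edge coloring, keeping only the round counter and the number of vertices that have not been pared away yet. This is a simplified illustration under stated assumptions, not the exact procedure above: the function name `diameter_by_peeling` and the adjacency-dict input are hypothetical, and the red/green bookkeeping is omitted because it is not needed to obtain d.

```python
def diameter_by_peeling(adj):
    """Diameter (in edges) of a nonempty tree, found by stripping leaves in rounds.

    adj: dict, vertex -> set of neighbouring vertices (assumed input format).
    """
    degree = {v: len(ns) for v, ns in adj.items()}  # edges not yet stripped
    remaining = len(adj)                            # vertices not yet stripped
    leaves = [v for v, d in degree.items() if d == 1]
    count = 0
    while remaining > 2:
        count += 1
        next_leaves = []
        for leaf in leaves:
            degree[leaf] = 0
            remaining -= 1
            for nb in adj[leaf]:
                if degree[nb] > 0:
                    degree[nb] -= 1
                    if degree[nb] == 1:
                        next_leaves.append(nb)
        leaves = next_leaves
    # One vertex left: d = 2 * count.  Two adjacent vertices left: d = 2 * count + 1.
    return 2 * count + (1 if remaining == 2 else 0)


path4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(diameter_by_peeling(path4))  # 3
```

The one or two vertices left at the end correspond to the center C described above.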
By construction, all simple paths from leaf vertices to their nearest center vertex that traverse only green edges are the same length (count), and all other simple paths from a leaf vertex to its nearest center vertex (traversing at least one red edge) are shorter. It can furthermore be proven that
this algorithm always terminates under the conditions given, leaving every edge of G colored either red or green, and leaving C with either one or two elements.
at algorithm termination, d is the diameter of G, measured in edges.
Given a vertex v in V, the maximum-length simple paths in G starting at v are exactly those that contain all vertices of the center, terminate at a leaf, and traverse only green edges between center and the far endpoint. These go from v, across the center, to one of the leaves farthest from the center.
Now consider your algorithm, which might be more practical, in light of the above. Starting from any vertex v, there is exactly one simple path p from that vertex, ending at a center vertex, and containing all vertices of the center (because G is a tree, and if there are two vertices in C then they share an edge). It can be shown that the maximal simple paths in G having v as one endpoint all have the form of the union of p with a simple path from center to leaf traversing only green edges.
The key point for our purposes is that the incoming edge of the other endpoint is necessarily green. Therefore, when we perform a search for the longest paths starting there, we have access to those traversing only green edges from leaf across (all vertices of) the center to another leaf. Those are exactly the maximal-length simple paths in G, so we can be confident that the second search will indeed reveal the graph diameter.
procedure TreeDiameter(T)
    pick an arbitrary vertex v, where v ∈ V
    u = BFS(T, v)    (u is a vertex farthest from v)
    t = BFS(T, u)    (t is a vertex farthest from u)
    return distance(u, t)
Result: complexity = O(|V|)
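A runnable version of the two-BFS procedure might look like the sketch below. The helper name `farthest` and the adjacency-dict representation are assumptions for the example; the pseudocode does not say how BFS reports the farthest vertex, so here it returns both the vertex and its distance.

```python
from collections import deque


def farthest(adj, start):
    """BFS from `start`; return (a vertex farthest from start, its distance)."""
    dist = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        v = queue.popleft()
        last = v  # BFS dequeues in nondecreasing distance, so the final one is farthest
        for nb in adj[v]:
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return last, dist[last]


def tree_diameter(adj):
    """Diameter (in edges) of a nonempty tree given as dict vertex -> neighbours."""
    v = next(iter(adj))        # arbitrary start vertex
    u, _ = farthest(adj, v)    # first BFS: one end of some longest path
    _, d = farthest(adj, u)    # second BFS: distance to the other end
    return d


tree = {1: {2}, 2: {1, 3}, 3: {2, 4, 6}, 4: {3, 5}, 5: {4}, 6: {3}}
print(tree_diameter(tree))  # 4
```

Each BFS visits every vertex and edge once, which matches the O(|V|) bound stated above, since a tree has |V| - 1 edges.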

How to generate matrices which satisfy the triangle inequality?

Let's consider a square matrix E
(n is the dimension of the matrix E and is fixed, for example n = 4 or n = 5). The matrix entries
satisfy the following conditions:
The task is to generate all matrices E. My question is how to do that? Is there any common approach or algorithm? Is that even possible? What to start with?
Naive solution
A naive solution to consider is to generate every possible n-by-n matrix E where each component is a nonnegative integer no greater than n, then take from those only the matrices that satisfy the additional constraints. What would be the complexity of that?
Each component can take on n + 1 values, and there are n^2 components, so there are O((n+1)^(n^2)) candidate matrices. That has an insanely high growth rate.
Link: WolframAlpha analysis of (n+1)^(n^2)
I think it's safe to say that this is not a feasible approach.
Better solution
A better solution follows. It involves a lot of math.
Let S be the set of all matrices E that satisfy your requirements. Let N = {1, 2, ..., n}.
Definitions:
Let a metric on N have the usual definition, except with the requirement of symmetry omitted.
Let I and J partition the set N. Let D(I,J) be the n x n matrix that has D_ij = 1 when i is in I and j is in J, and D_ij = 0 otherwise.
Let A and B be in S. Then A is adjacent to B if and only if there exist I and J partitioning N such that A + D(I,J) = B.
We say A and B are adjacent if and only if A is adjacent to B or B is adjacent to A.
Two matrices A and B in S are path-connected if and only if there exists a sequence of adjacent elements of S between them.
Let the function M(E) denote the sum of the elements of matrix E.
Lemma 1:
E = D(I,J) is a metric on N.
Proof:
This is a trivial statement except for the case of an edge going from I to J. Let i be in I and j be in J. Then E_ij = 1 by definition of D(I,J). Let k be in N. If k is in I, then E_ik = 0 and E_kj = 1, so E_ik + E_kj >= E_ij. If k is in J, then E_ik = 1 and E_kj = 0, so E_ik + E_kj >= E_ij.
Lemma 2:
Let E be in S such that E != zeros(n,n). Then there exist I and J partitioning N such that E' = E - D(I,J) is in S with M(E') < M(E).
Proof:
Let (i,j) be such that E_ij > 0. Let I be the subset of N that can be reached from i by a directed path of cost 0. I cannot be empty, because i is in I. I cannot be N, because j is not in I. This is because E satisfies the triangle inequality and E_ij > 0.
Let J = N - I. Then I and J are both nonempty and partition N. By the definition of I, there does not exist any (x,y) such that E_xy = 0 and x is in I and y is in J. Therefore E_xy >= 1 for all x in I and y in J.
Thus E' = E - D(I,J) >= 0. That M(E') < M(E) is obvious, because all we have done is subtract from elements of E to get E'. Now, since E is a metric on N and D(I,J) is a metric on N (by Lemma 1) and E >= D(I,J), we have E' is a metric on N. Therefore E' is in S.
Theorem:
Let E be in S. Then E and zeros(n,n) are path-connected.
Proof (by induction):
If E = zeros(n,n), then the statement is trivial.
Suppose E != zeros(n,n). Let M(E) be the sum of the values in E. Then, by induction, we can assume that the statement is true for any matrix E' having M(E') < M(E).
Since E != zeros(n,n), by Lemma 2 we have some E' in S such that M(E') < M(E). Then by the inductive hypothesis E' is path-connected to zeros(n,n). Therefore E is path-connected to zeros(n,n).
Corollary:
The set S is path-connected.
Proof:
Let A and B be in S. By the Theorem, A and B are both path-connected to zeros(n,n). Therefore A is path-connected to B.
Algorithm
The Corollary tells us that everything in S is path-connected. So an effective way to discover all of the elements of S is to perform a breadth-first search over the graph defined by the following.
The elements of S are the nodes of the graph
Nodes of the graph are connected by an edge if and only if they are adjacent
Given a node E, you can find all of the (potentially) unvisited neighbors of E by simply enumerating all of the possible matrices D(I,J) (of which there are 2^n - 2) and generating E' = E + D(I,J) for each. Enumerating the D(I,J) should be relatively straightforward (there is one for every possible subset I of N, except for the empty set and N).
Note that, in the preceding paragraph, E and D(I,J) are both metrics on N. So when you generate E' = E + D(I,J), you don't have to check that it satisfies the triangle inequality - E' is the sum of two metrics, so it is a metric. To check that E' is in S, all you have to do is verify that the maximum element in E' does not exceed n.
You can start the breadth-first search from any element of S and be guaranteed that you won't miss any of S. So you can start the search with zeros(n,n).
Be aware that the cardinality of the set S grows extremely fast as n increases, so computing the entire set S will only be tractable for small n.
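The search described above could be sketched as follows. The exact conditions defining S did not survive in the question, so this sketch assumes, based only on the answer's wording, that S consists of n-by-n nonnegative integer matrices with zero diagonal, entries at most n, and the triangle inequality. The function name `enumerate_matrices` and the `bound` parameter are hypothetical. The code follows the procedure as stated and relies on the path-connectedness claim; it does not independently check that every element of S is reached.

```python
from collections import deque
from itertools import combinations


def enumerate_matrices(n, bound=None):
    """BFS from zeros(n, n), repeatedly adding D(I, J) while entries stay <= bound."""
    bound = n if bound is None else bound  # assumed upper limit on entries
    # One D(I, J) per nonempty proper subset I of {0, ..., n-1}, with J = complement.
    deltas = []
    for size in range(1, n):
        for subset in combinations(range(n), size):
            in_i = set(subset)
            deltas.append(tuple(
                tuple(1 if (i in in_i and j not in in_i) else 0 for j in range(n))
                for i in range(n)
            ))
    zero = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    seen = {zero}
    queue = deque([zero])
    while queue:
        e = queue.popleft()
        for d in deltas:
            f = tuple(tuple(e[i][j] + d[i][j] for j in range(n)) for i in range(n))
            # A sum of two metrics is a metric, so only the entry bound is checked.
            if f not in seen and max(max(row) for row in f) <= bound:
                seen.add(f)
                queue.append(f)
    return seen


print(len(enumerate_matrices(2)))  # 9
```

For n = 2 the two off-diagonal entries can each be 0, 1 or 2 independently, which gives the 9 matrices printed above. Matrices are stored as tuples of tuples so they can be placed in a set.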

Prove traversing a k-ary tree twice yields the diameter

I've known the algorithm to find the diameter of a tree mentioned here for quite some time:
1. Select a random node A.
2. Run BFS on this node to find the furthermost node from A. Name this node S.
3. Now run BFS starting from S, find the furthermost node from S, and name it D.
4. The path between S and D is the diameter of the tree.
But why does it work?
I would accept both Ivan's and coproc's answer if I can. These are 2 very different approaches that both answer my question.
Say S = [A - B - C - D - ... X - Y - Z] is the diameter of the tree.
Consider each node in S, call it #. If you start from # and go "away" from the diameter, there won't be a chain longer than min(length(#, A), length(#, Z)).
So a DFS from any node of the tree will end at A or Z, i.e. at one end of the diameter, and a second DFS from there will of course lead you to the other end of the diameter.
Suppose you've completed steps 1 and 2 and have found S, and that there is no diameter in the tree that includes S. Pick a diameter PQ of the tree. You basically have to check the possible cases and in all of them, find that either PS or SQ is at least as long as PQ - which would be a contradiction.
In order to systematically check all cases, you can assume that the tree is rooted at A. Then the shortest path between any two vertices U and V is calculated in the following way - let W be the lowest common ancestor of U and V. Then the length of UV is equal to the sum of the distances between U and W and between V and W - and, in a rooted tree, these distances are just differences in the levels of the nodes (and S has a maximum level in this tree).
Then analyze all possible positions S could take with respect to the subtree rooted at W (lowest common ancestor of P and Q) and the vertices P and Q. For example, the first case is simple - S is not in the subtree rooted at W. Then, we can trivially improve the path by selecting the one of P and Q that is more distant to the root, and connecting it to the S. The rest of the cases are similar.
This algorithm works for any connected acyclic undirected graph, that is, for any tree; the tree does not need a distinguished root.
A proof can be constructed by choosing any two additional points S2 and D2 and showing that their distance d(S2,D2) ≤ d(S,D). From the algorithm we know
by step 2: d(A,S)≥d(A,D), d(A,S)≥d(A,S2), d(A,S)≥d(A,D2) and
by step 3: d(S,D)≥d(S,A), d(S,D)≥d(S,S2), d(S,D)≥d(S,D2).
By distinguishing at most 5 cases (e.g. the paths SD and S2D2 have no edge in common, the paths SD and S2D2 have edges in common and A is connected to the edges running to S, etc. see image below) one can decompose the above distances into sub-paths and rewrite the inequalities based on the sub-paths. The conclusion follows from simple algebra. The details are left to the reader as an exercise. :-)
A few lemmas/facts before we get started with proof.
T is a tree, so there is exactly 1 path between any pair of vertices.
If S--D is the diameter then a BFS with source as S (or D) will end up giving D (or S) the largest distance. (By definition of diameter)
Also lets define |XY| to be the length of the path X--Y.
Define |XX| = 0.
Let A be the random node selected by the algorithm.
After Step 2, let the furthest node found be P.
If P is either S or D then using Lemma 2 we are done. So we must show that P has to be either S or D.
Claim : If S--D is the diameter, then P is either S or D.
Proof: I am going to prove the above by proving the Contrapositive. The proof is for a tree with a unique diameter but it should work with minor changes (mostly the equalities) for non-unique diameters too.
If P is neither S nor D then S--D is not the diameter.
Assume P is neither S nor D.
Case 1: The Path A--P intersects S--D
Let the point of intersection be K. BFS marked P as the farthest node from A, and by Lemma 1 each of the paths below is unique, so
|AP| > |AS|
|AK| + |KP| > |AK| + |KS|
Therefore we get |KP| > |KS|.
Similarly |KP| > |KD|.
Now we consider the path SP
|SP| = |SK| + |KP|
|SP| > |SK| + |KD|
|SP| > |SD|
So SP is longer than the diameter which means SD is NOT the diameter.
Case 2: The Path A--P does NOT intersect S--D
Now we know BFS marked P as the farthest node. So we have
|AP| > |AD|
|AP| > |AS|
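We can write |AD| = |AK| + |KD|, where K is the vertex of the diameter closest to A (possibly S or D itself). Similarly |AS| = |AK| + |KS|.
Without loss of generality assume |AD| >= |AS|
|AK| + |KD| >= |AK| + |KS|
|KD| >= |KS|
Now consider the path PD. Let B be the last vertex that the paths A--P and A--K have in common (B may be A itself), so that
|AP| = |AB| + |BP|
|AD| = |AB| + |BK| + |KD|
|BP| > |BK| + |KD| (from |AP| > |AD|)
The path from P to D runs through B and K, so
|PD| = |PB| + |BK| + |KD|
|PD| > 2|BK| + 2|KD| (|BP| > |BK| + |KD|)
|PD| > |KD| + |KD| (|BK| > 0 since the path A--P does not touch the diameter)
|PD| > |SK| + |KD| (|KD| >= |KS|)
|PD| > |SD|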
So SD is not the diameter and hence the claim.
Let the set s represent the nodes along the diameter of the tree, with A and Z being the end nodes, so that the distance from A to Z is the diameter. For any node n that is a member of s, the longest possible path from n will end in either A or Z. Now if you pick a random node v in the tree, it either is a member of the set, or it has a path to a node n in this set. Since the longest path from n ends at either A or Z, and the path from v to n cannot be longer than either the path from n to A or the path from n to Z (if it were, then v would have to be a member of the set), running BFS on any node v will first find either A or Z, and the subsequent call will find the complementary end point. Not a math girl, just throwing out thoughts.

Directed graph with max indegree of a vertex

I was trying to look at few applications of network flow when I came across this problem:
We begin with a directed graph, G = (V,E). We need to add more edges to the graph such that for all distinct u, v \in V we have e = (u -> v) or e = (v -> u) but not both, i.e. we want to add edges so that every pair of vertices is connected by exactly one arc (either outgoing or incoming, but not both). So, in total we will have |V|(|V|-1)/2 edges. While we build this graph, we need to ensure that the indegree of a given vertex, say w, is the maximum among all the vertices of the graph (if that is possible, given the original graph). Note that we cannot change the orientation of the edges in the original graph.
I am trying to solve it using network flow by building a network without vertex w (and with 2 new vertices for source, s and sink, t). But I'm not sure how to represent the capacities and flow direction in the new graph so as to simplify the problem to network flow in order to find the edge orientations in the graph. Maybe what I'm doing is wrong, but I just wrote if someone might get a hint from it.
When attacking this kind of problem, I tend to write down a mathematical program and then massage it. Clearly, we should orient all missing edges involving w toward w. Let k be the resulting in-degree of w. For all distinct i, j, let x_{ij} = 1 if arc i->j appears in the solution and let x_{ij} = 0 if arc j->i appears.
forall j. sum_i x_{ij} <= k
forall i <> j. x_{ij} = 1 - x_{ji}
forall i <> j. x_{ij} in {0, 1}
Rewrite to use x_{ij} only if i < j.
(*) forall j. sum_{i<j} x_{ij} + sum_{i>j} (1-x_{ji}) <= k
forall i < j. x_{ij} in {0, 1}
Now (*) begins to resemble conservation constraints, as each variable appears once negatively and once positively. Let's change the inequality to an equality.
(*) forall j. x_{sj} + sum_{i<j} x_{ij} + sum_{i>j} (1-x_{ji}) = k
              ^^^^^^                                           ^
forall i < j. x_{ij} in {0, 1}
forall i. x_{si} >= 0
^^^^^^^^^^^^^^^^^^^^^
We're almost all the way to a flow LP -- we just need to clean out the constants 1 and k. I'll let you handle the rest (it involves introducing t).
