Find the optimal vertex cover of a tree with k blue vertices - algorithm

I need to find a dynamic-programming kind of solution for the following problem:
Input:
Perfect binary tree T = (V,E) (each node has exactly 2 children except the leaves).
V = V(blue) ∪ V(black).
V(blue) ∩ V(black) = ∅.
(In other words, some vertices in the tree are blue)
Root of the tree 'r'.
integer k
A legal Solution:
A subset of vertices V' ⊆ V which is a vertex cover of T, and |V' ∩ V(blue)| = k. (In other words, the cover V' contains k blue vertices)
Solution Value:
The value of a legal solution V' is the number of vertices in the set = |V'|.
For convenience, we define the value of an illegal solution to be ∞.
What we need to find:
A solution with minimal Value.
(In other words, The best solution is a solution which is a cover, contains exactly k blue vertices and the number of vertices in the set is minimal.)
I need to define a typical subproblem (i.e., if I know the solution value of a subtree, I can use it to find the solution value for the whole problem)
and suggest a formula to solve it.

To me, it looks like you are on the right track!
Still, I think you will have to use an additional parameter to tell us how far any picked vertex is from the current subtree's root.
For example, it can be just the indication whether we pick the current vertex, as below.
Let fun (v, b, p) be the optimal size for the subtree with root v such that, in this subtree, we pick exactly b blue vertices, and p = 1 if we pick vertex v or p = 0 if we don't.
The answer is the minimum of fun (r, k, 0) and fun (r, k, 1): we want the answer for the full tree (v = r) with exactly k blue vertices picked (b = k), and we can either pick or not pick the root.
Now, how do we calculate this?
For the leaves, fun (v, 0, 0) is 0 and fun (v, t, 1) is 1, where t tells us whether vertex v is blue (1 if yes, 0 if no).
All other combinations are invalid, and we can simulate it by saying the respective values are positive infinities: for example, for a leaf vertex v, the value fun (v, 3, 1) = +infinity.
In the implementation, the infinity can be just any value greater than any possible answer.
For all internal vertices, let v be the current vertex and u and w be its children.
We have two options: to pick or not to pick the vertex v.
Suppose we pick it.
Then the value we get for fun (v, b, 1) is 1 (the picked vertex v) plus the minimum of fun (u, x, q) + fun (w, y, r) such that x + y is either b if the vertex v is black or b - 1 if it is blue, and q and r can be arbitrary: if we picked the vertex v, the edges v--u and v--w are already covered by our vertex cover.
Now let us not pick the vertex v.
Then the value we get for fun (v, b, 0) is just the minimum of fun (u, x, 1) + fun (w, y, 1) such that x + y = b: if we did not pick the vertex v, the edges v--u and v--w have to be covered by u and w.
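A minimal sketch of this recurrence in Python, assuming the tree is given as left/right child arrays (None marking a leaf's missing children) and an is_blue flag per vertex; all names here are illustrative, not part of the original problem:

import math
from functools import lru_cache

INF = math.inf

def min_cover(root, left, right, is_blue, k):
    @lru_cache(maxsize=None)
    def fun(v, b, p):
        # fun(v, b, p): minimum cover size in the subtree rooted at v using
        # exactly b blue vertices, where p = 1 iff v itself is picked.
        if left[v] is None:  # leaf
            if p == 0:
                return 0 if b == 0 else INF
            return 1 if b == (1 if is_blue[v] else 0) else INF
        best = INF
        if p == 1:
            need = b - (1 if is_blue[v] else 0)  # blue budget left for the children
            for x in range(need + 1):            # empty range if need < 0
                for q in (0, 1):
                    for r in (0, 1):
                        best = min(best, 1 + fun(left[v], x, q) + fun(right[v], need - x, r))
        else:
            # v is not picked, so both child edges must be covered by the children.
            for x in range(b + 1):
                best = min(best, fun(left[v], x, 1) + fun(right[v], b - x, 1))
        return best

    return min(fun(root, k, 0), fun(root, k, 1))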

Related

Shortest path in directed graph G

I had an exam yesterday and I would like to check if I was answering correctly on one of the questions.
The question:
G = (V, E, w) is a directed, simple graph (V: set of vertices, E: set of edges, w: non-negative weight function). There is a non-empty subset of E denoted E(red).
A path p in G will be called n-red if there are n red edges on p. d_red(u, v) will be the weight of the lightest path from vertex u to vertex v that is at least 1-red. If all paths from u to v are 0-red, d_red(u, v) = Infinity.
The weight of a path p is the sum of the weights of all edges that are part of p.
Input:
G = (V, E, w)
s, t that are elements of V
f_red: E -> { true, false }
f_red(red edge) = true
f_red(non-red edge) = false
Output:
d_red(s, t) (the lightest path that includes at least one red edge).
Runtime Constraint: O(V log V + E)
In a few words, my solution was to use Dijkstra's algorithm. A Boolean variable that is initially false is used to keep track of whether at least one red edge has been encountered. This is checked at every iteration with f_red, and the variable is set to true if f_red(current edge) = true. If the variable is still false at the end, return d_red(s, t) = Infinity.
What do you think about that?
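For comparison, here is a sketch (not necessarily the intended exam solution) of carrying the red-edge flag inside the Dijkstra state itself, so it is tracked per path rather than as one global variable; the graph, s, t names and the (v, weight, is_red) adjacency format are illustrative assumptions:

import heapq, math

def d_red(graph, s, t):
    # State = (vertex, flag); flag becomes 1 once the path has used a red edge.
    dist = {(s, 0): 0}
    pq = [(0, s, 0)]
    while pq:
        d, u, flag = heapq.heappop(pq)
        if d > dist.get((u, flag), math.inf):
            continue  # stale heap entry
        for v, w, is_red in graph[u]:
            nf = flag | (1 if is_red else 0)
            nd = d + w
            if nd < dist.get((v, nf), math.inf):
                dist[(v, nf)] = nd
                heapq.heappush(pq, (nd, v, nf))
    return dist.get((t, 1), math.inf)  # must have used at least one red edge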

Count cycles of length 3 using DFS

Let G = (V,E) be an undirected graph. How can we count cycles of length 3 exactly once using the following DFS:
DFS(G,s):
    foreach v in V do
        color[v] <- white; p[v] <- nil
    DFS-Visit(s)

DFS-Visit(u):
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
There is a cycle whenever we discover a node that has already been discovered (grey). The edge to that node is called a back edge. The cycle has length 3 when p[p[p[v]]] = v, right? So
DFS-Visit(u):
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = grey and p[p[p[v]]] = v then
            // we got a cycle of length 3
        else if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
However, how can I create a proper counter to count the number of cycles, and how can I count each cycle only once?
I'm not sure I understand how your condition parent[parent[parent[v]]] == v works. IMO it should never be true as long as parent represents a tree structure (it corresponds to the spanning tree associated with the DFS).
Directed graphs
Back edges, cross edges and forward edges can all "discover" new cycles.
We separate the following possibilities (let's say you reach a u -> v edge):
Back edge: u and v belong to the same 3-cycle iff parent[parent[u]] = v.
Cross edge: u and v belong to the same 3-cycle iff parent[u] = parent[v].
Forward edge: u and v belong to the same 3-cycle iff parent[parent[v]] = u.
Undirected graphs
There are no cross edges. Back edges and forward edges are redundant. Therefore you only have to check back edges: when you reach a u -> v back edge, u and v belong to the same 3-cycle iff parent[parent[u]] = v.
def dfs(u):
    color[u] = GREY
    for v in adj[u]:
        # Back edge
        if color[v] == GREY:
            if parent[parent[u]] == v:
                print("({}, {}, {})".format(v + 1, parent[u] + 1, u + 1))
        # v unseen
        elif color[v] == WHITE:
            parent[v] = u
            dfs(v)
    color[u] = BLACK
If you want to test it:
WHITE, GREY, BLACK = 0, 1, 2
nb_nodes, nb_edges = map(int, input().split())
adj = [[] for _ in range(nb_nodes)]
for _ in range(nb_edges):
    u, v = map(int, input().split())
    adj[u - 1].append(v - 1)
    adj[v - 1].append(u - 1)
parent = [None] * nb_nodes
color = [WHITE] * nb_nodes
for s in range(nb_nodes):  # run the DFS, covering every component
    if color[s] == WHITE:
        dfs(s)
If a solution without using DFS is okay, there is an easy solution which runs in O(NM log(N³)), where N is the number of vertices in the graph and M is the number of edges.
We are going to iterate over edges instead of iterating over vertices. For every edge u-v, we have to find every vertex which is connected to both u and v. We can do this by iterating over every vertex w in the graph and checking if there is an edge v-w and an edge w-u. Whenever you find such a vertex, order u, v, w and add the ordered triplet to a BBST that doesn't allow repetitions (e.g. std::set in C++). The count of length-3 cycles will be exactly the size of the BBST (the number of elements added) after you check every edge in the graph.
Let's analyze the complexity of the algorithm:
We iterate over every edge. Current complexity is O(M)
For each edge, we iterate over every vertex. Current complexity is O(NM)
For each (edge, vertex) pair that forms a cycle, we add a triplet to a BBST. Adding to a BBST has O(log(K)) complexity, where K is the size of the BBST. In the worst case, every triplet of vertices forms a cycle, so we may add up to O(N³) elements, and the cost of an insertion can get as high as O(log(N³)). The final complexity is therefore O(NM log(N³)). This may sound like a lot, but in the worst case M = O(N²), so the complexity is O(N³ log(N³)). Since we may have up to O(N³) cycles of length 3, our algorithm is just a log factor away from an optimal algorithm.
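A minimal sketch of this edge-iteration idea, assuming adj is a list of adjacency sets; a Python set stands in for the BBST here (being hash-based, its expected insertion cost is O(1) rather than O(log K)):

def count_triangles(n, edges, adj):
    triangles = set()
    for u, v in edges:                       # iterate over every edge
        for w in range(n):                   # try every third vertex
            if w in adj[u] and w in adj[v]:  # w closes the triangle u-v-w
                triangles.add(tuple(sorted((u, v, w))))  # ordered triplet, no repetitions
    return len(triangles)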

Proof of correctness: Algorithm for diameter of a tree in graph theory

In order to find the diameter of a tree I can take any node from the tree, perform BFS to find a node which is farthest away from it and then perform BFS on that node. The greatest distance from the second BFS will yield the diameter.
I am not sure how to prove this, though? I have tried using induction on the number of nodes, but there are too many cases.
Any ideas would be much appreciated...
Let's call the endpoint found by the first BFS x. The crucial step is proving that the x found in this first step always "works" -- that is, that it is always at one end of some longest path. (Note that in general there can be more than one equally-longest path.) If we can establish this, it's straightforward to see that a BFS rooted at x will find some node as far as possible from x, and the path between them must therefore be an overall longest path.
Hint: Suppose (to the contrary) that there is a longer path between two vertices u and v, neither of which is x.
Observe that, on the unique path between u and v, there must be some highest (closest to the root) vertex h. There are two possibilities: either h is on the path from the root of the BFS to x, or it is not. Show a contradiction by showing that in both cases, the u-v path can be made at least as long by replacing some path segment in it with a path to x.
[EDIT] Actually, it may not be necessary to treat the 2 cases separately after all. But I often find it easier to break a configuration into several (or even many) cases, and treat each one separately. Here, the case where h is on the path from the BFS root to x is easier to handle, and gives a clue for the other case.
[EDIT 2] Coming back to this later, it now seems to me that the two cases that need to be considered are (i) the u-v path intersects the path from the root to x (at some vertex y, not necessarily at the u-v path's highest point h); and (ii) it doesn't. We still need h to prove each case.
I'm going to work out j_random_hacker's hint. Let s, t be a maximally distant pair. Let u be the arbitrary vertex. We have a schematic like
u
|
|
|
x
/ \
/ \
/ \
s t ,
where x is the junction of s, t, u (i.e. the unique vertex that lies on each of the three paths between these vertices).
Suppose that v is a vertex maximally distant from u. If the schematic now looks like
u
|
|
|
x v
/ \ /
/ *
/ \
s t ,
then
d(s, t) = d(s, x) + d(x, t) <= d(s, x) + d(x, v) = d(s, v),
where the inequality holds because d(u, t) = d(u, x) + d(x, t), d(u, v) = d(u, x) + d(x, v), and d(u, v) ≥ d(u, t) since v is maximally distant from u; hence d(x, t) ≤ d(x, v). There is a symmetric case where v attaches between s and x instead of between x and t.
The other case looks like
u
|
*---v
|
x
/ \
/ \
/ \
s t .
Now,
d(u, s) <= d(u, v) <= d(u, x) + d(x, v)
d(u, t) <= d(u, v) <= d(u, x) + d(x, v)
d(s, t) = d(s, x) + d(x, t)
= d(u, s) + d(u, t) - 2 d(u, x)
<= 2 d(x, v)
2 d(s, t) <= d(s, t) + 2 d(x, v)
= d(s, x) + d(x, v) + d(v, x) + d(x, t)
= d(v, s) + d(v, t),
so max(d(v, s), d(v, t)) >= d(s, t) by an averaging argument, and v belongs to a maximally distant pair.
Here's an alternative way to look at it:
Suppose G = ( V, E ) is a nonempty, finite tree with vertex set V and edge set E.
Consider the following algorithm:
Let count = 0. Let all edges in E initially be uncolored. Let C initially be equal to V.
Consider the subset V' of V containing all vertices with exactly one uncolored edge:
if V' is empty then let d = count * 2, and stop.
if V' contains exactly two elements that share an (uncolored) edge, then color that edge green, let d = count * 2 + 1, and stop.
otherwise, proceed as follows:
Increment count by one.
Remove all vertices from C that have no uncolored edges.
For each vertex in V with two or more uncolored edges, re-color each of its green edges red (some vertices may have zero such edges).
For each vertex in V', color its uncolored edge green.
Return to step (2).
That basically colors the graph from the leaves inward, marking paths with maximal distance to a leaf in green and marking those with only shorter distances in red. Meanwhile, the nodes of C, the center, with shorter maximal distance to a leaf are pared away until C contains only the one or two nodes with the largest maximum distance to a leaf.
By construction, all simple paths from leaf vertices to their nearest center vertex that traverse only green edges are the same length (count), and all other simple paths from a leaf vertex to its nearest center vertex (traversing at least one red edge) are shorter. It can furthermore be proven that
this algorithm always terminates under the conditions given, leaving every edge of G colored either red or green, and leaving C with either one or two elements.
at algorithm termination, d is the diameter of G, measured in edges.
Given a vertex v in V, the maximum-length simple paths in G starting at v are exactly those that contain all vertices of the center, terminate at a leaf, and traverse only green edges between the center and the far endpoint. These go from v, across the center, to one of the leaves farthest from the center.
Now consider your algorithm, which might be more practical, in light of the above. Starting from any vertex v, there is exactly one simple path p from that vertex, ending at a center vertex, and containing all vertices of the center (because G is a tree, and if there are two vertices in C then they share an edge). It can be shown that the maximal simple paths in G having v as one endpoint all have the form of the union of p with a simple path from center to leaf traversing only green edges.
The key point for our purposes is that the incoming edge of the other endpoint is necessarily green. Therefore, when we perform a search for the longest paths starting there, we have access to those traversing only green edges from leaf across (all vertices of) the center to another leaf. Those are exactly the maximal-length simple paths in G, so we can be confident that the second search will indeed reveal the graph diameter.
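To make the peeling process concrete, here is a sketch that drops the edge colors (they matter for the proof, not for computing d) and keeps only the round counter; adj is an assumed adjacency-list representation of the tree:

from collections import deque

def diameter_by_peeling(adj):
    # Repeatedly strip all current leaves at once; the survivors are the center.
    n = len(adj)
    degree = [len(a) for a in adj]
    leaves = deque(v for v in range(n) if degree[v] <= 1)
    remaining, count = n, 0
    while remaining > 2:
        count += 1
        for _ in range(len(leaves)):  # peel exactly one layer
            v = leaves.popleft()
            remaining -= 1
            for u in adj[v]:
                degree[u] -= 1
                if degree[u] == 1:
                    leaves.append(u)
    # One center vertex gives an even diameter; two give an odd one.
    return count * 2 if remaining == 1 else count * 2 + 1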
procedure TreeDiameter(T)
    pick an arbitrary vertex v ∈ V
    u = BFS(T, v)
    t = BFS(T, u)
    return distance(u, t)
Result: Complexity = O(|V|)
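A runnable version of this double-BFS procedure, assuming the tree is given as an adjacency list over vertices 0..n-1, might look like:

from collections import deque

def bfs_farthest(adj, src):
    # Standard BFS; returns the vertex farthest from src and its distance.
    dist = {src: 0}
    q = deque([src])
    far = src
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[far]:
                    far = v
                q.append(v)
    return far, dist[far]

def tree_diameter(adj):
    u, _ = bfs_farthest(adj, 0)  # first BFS from an arbitrary vertex
    _, d = bfs_farthest(adj, u)  # second BFS from the farthest endpoint
    return d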

Tarjan's lowest common ancestor algorithm explanation

I am having a tough time understanding Tarjan's lowest common ancestor algorithm. Can somebody explain it with an example?
I am stuck after the DFS search, what exactly does the algorithm do?
My explanation will be based on the Wikipedia link posted above :).
I assume that you already know about the disjoint-set union structure used in the algorithm.
(If not, please read about it; you can find it in "Introduction to Algorithms".)
The basic idea is that every time the algorithm visits a node x, the recorded ancestor of all its descendants will be that node x.
So to find the least common ancestor (LCA) r of two nodes (u, v), there are two cases:
Node u is a descendant of node v (or vice versa); this case is obvious.
Node u is in the ith branch and v is in the jth branch (i < j) of node r, so after visiting node u, the algorithm backtracks to node r, which is the ancestor of the two nodes, marks the ancestor of node u as r, and goes on to visit node v.
At the moment it visits node v, u is already marked as visited (black), so the answer will be r. Hope you get it!
I will explain using the code from CP-Algorithms:
void dfs(int v)
{
    visited[v] = true;
    ancestor[v] = v;
    for (int u : adj[v]) {
        if (!visited[u]) {
            dfs(u);
            union_sets(v, u);
            ancestor[find_set(v)] = v;
        }
    }
    for (int other_node : queries[v]) {
        if (visited[other_node])
            cout << "LCA of " << v << " and " << other_node
                 << " is " << ancestor[find_set(other_node)] << ".\n";
    }
}
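For readers who want a self-contained, runnable version, here is a Python sketch of the same algorithm with the union-find helpers (union_sets / find_set) that the C++ snippet assumes spelled out; the adjacency-list and query-list formats are illustrative:

import sys
sys.setrecursionlimit(10**6)

def tarjan_offline_lca(adj, queries_at, root):
    # adj[v]: neighbour list of the rooted tree;
    # queries_at[v]: the other endpoints of all queries involving v.
    n = len(adj)
    parent = list(range(n))   # union-find parent pointers
    ancestor = [0] * n        # ancestor[representative of a set]
    visited = [False] * n
    answers = {}

    def find_set(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union_sets(a, b):
        parent[find_set(b)] = find_set(a)

    def dfs(v):
        visited[v] = True
        ancestor[v] = v
        for u in adj[v]:
            if not visited[u]:
                dfs(u)
                union_sets(v, u)
                ancestor[find_set(v)] = v
        for other in queries_at[v]:
            if visited[other]:
                answers[(min(v, other), max(v, other))] = ancestor[find_set(other)]

    dfs(root)
    return answers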
Let's outline a proof of the algorithm.
Lemma 1: For each vertex v and its parent p, after we visit v from p and union v with p, p and all vertices in the subtree rooted at v (i.e. p and all descendants of v, including v) will be in one disjoint set represented by p (i.e. ancestor[representative of the disjoint set] is p).
Proof: Suppose the tree has height h. Then proceed by induction on vertex height, starting from the leaf nodes.
Lemma 2: For each vertex v, right before we mark it as visited, the following statements are true:
Each of v's ancestors pi will be in a disjoint set that contains precisely pi and all vertices in the subtrees of pi that pi has already finished visiting.
Every visited vertex so far is in one of these disjoint sets.
Proof: We proceed by induction. The statement is vacuously true for the root (the only vertex with depth 0) as it has no ancestors. Now suppose the statement holds for every vertex of depth k for k ≥ 0, and suppose v is a vertex of depth k + 1. Let p be v's parent. Before p visits v, suppose it has already visited its children c1, c2, ..., cn. By Lemma 1, p and all vertices in the subtrees rooted at c1, c2, ..., cn are in one disjoint set represented by p. Furthermore, all newly visited vertices after we visited p are the vertices in this disjoint set. Since p is of depth k, we can use the induction hypothesis to conclude that v indeed satisfies 1 and 2.
We are now ready to prove the algorithm.
Claim: For each query (u, v), the algorithm outputs the lowest common ancestor of u and v.
Proof: Without loss of generality suppose we visit u before we visit v in the DFS. Then either v is a descendant of u or not.
If v is a descendant of u, by Lemma 1 we know that u and v are in one disjoint set that is represented by u, which means ancestor[find_set(v)] is u, the correct answer.
If v is not a descendant of u, then by Lemma 2 we know that u must be in one of the disjoint sets, each of them represented by an ancestor of v, at the time we mark v. Let p be the representing vertex of the disjoint set u is in. By Lemma 2 we know p is an ancestor of v, and u is in an already-visited subtree of p and therefore a descendant of p. These facts are not changed after we have visited all of v's children, so p is indeed a common ancestor of u and v. To see that p is the lowest common ancestor, suppose q is the child of p of which v is a descendant (i.e. if we travel back to the root from v, q is the last vertex before we reach p; q can be v). Suppose for contradiction that u is also a descendant of q. Then by Lemma 2 u is in the disjoint set represented by q as well as in the one represented by p; since p ≠ q these are two different disjoint sets, a contradiction.

find the minimum size dominating set for a tree using greedy algorithm

Dominating Set (DS) := given an undirected graph G = (V, E), a set of vertices S ⊆ V is a dominating set if every vertex v in V is either in S or adjacent to a vertex in S. The entire vertex set V is a trivial dominating set in any graph.
Find a minimum-size dominating set for a tree.
I'll attempt to prove this in a more formal way.
OUTLINE
To prove your greedy algorithm is correct, you need to prove two things:
First, that your greedy choice is valid and can always be used in the formation of an optimal solution, and
second, that your problem has an optimal substructure property, that is, you can form an optimal solution from optimal solutions to subproblems of your own problem.
Greedy Choice: In your tree T = (V, E), find a vertex v with the highest number of adjacent leaves. Add it to your dominating set.
Optimal Substructure
T' = (V', E') such that:
V' = V \ ({a : a ∈ V, a is adjacent to v, and a's degree ≤ 2} ∪ {v})
E' = E minus every edge incident to a removed vertex
In other words
Look for a vertex v with the highest number of adjacent leaves, remove any of its adjacent vertices with degree less than or equal to 2, then remove v itself and add it to your dominating set. Repeat this until you have no vertices left.
PROOF
Greedy choice proof
For any leaf l, it must be that either l itself or its parent is in the dominating set. In our case, the vertex v we would have chosen is in this situation.
Let A = {v1, v2, ..., vk} be a minimum dominating set of T. If A already has v as a member, we are done. If it does not, there are two situations:
v has some neighbouring leaf l. Then l must be part of the dominating set, otherwise our set does not dominate the entire tree. We can thus form A' = (A \ {l}) ∪ {v} and still have a dominating set, since v dominates l and everything l dominated. Since |A'| = |A|, A' is still optimal.
v does not have any neighbouring leaf. Then, because v was chosen to have the highest number of adjacent leaves, no vertex in T is adjacent to a leaf, so T has no leaves at all. Then T is not a finite tree. Contradiction.
Thus, we will always be able to form an optimal solution with our greedy choice.
Optimal Substructure proof
Suppose that A is a minimum dominating set for T = (V, E), but that A' = A \ {v} is not a minimum dominating set for T' as defined above.
Take a minimum dominating set for T' and call it B; by assumption, |B| < |A'|. It can be shown (below) that B' = B ∪ {v} is a dominating set for T. Then, since |A'| = |A| - 1 and |B'| = |B| + 1, we get |B'| < |A|. This is a contradiction, since we assumed that A is a minimum dominating set. Thus A' must also be a minimum dominating set of T'.
Proving B' = B ∪ {v} is a dominating set for T:
v may have had adjacent vertices that are not in T'. We will show that any vertex that was not carried over to T' is dominated by vertices in B' (this means that we picked our set optimally): Let y be some vertex adjacent to v and not in T'. By definition of T', y can only have degree 1 or 2. Now, y is dominated by v, which is in B'. If y is a leaf, then we are done. If y is of degree 2, then y is connected to one other node, which stays in T' and is therefore dominated by some vertex of B; y itself is still dominated by v. Hence, B' is a dominating set for T.
1- Always start from the leaves.
2- Add their parent to the DS and cut off the children.
3- Mark the parents of the selected parents as already dominated.
4- After completing the process, check whether the marked nodes have any children that are not dominated, and add those to the DS.
Good luck
