The problem is:
Given a graph and N sets of vertices, how do we check whether the vertices in each set lie on a path (which need not be simple) in the graph?
For example, if a set of vertices is {A, B, C}
1) On a graph
A --> B --> C
The vertices A, B and C are on a path
2) On a graph
A ---> B ---> ...
       ^
C -----+
The vertices A, B and C are not on a path
3) On a graph
A <--- B <--- ...
       |
C <----+
The vertices A, B and C are not on a path
4) On a graph
A ---> X -----> B ---> C --> ...
|               ^
+---------------+
The vertices A, B and C are on a path
Here is a simple algorithm with complexity O(N * (V + E)).
for each set S of vertices with more than 1 element {
    let k = size(S);
    initialize an empty map M from a vertex to another vertex it reaches;
    mark all vertices of G unvisited;
    pick and remove a vertex v from S;
    while S is not empty {
        among the unvisited nodes, find one vertex v' in S that v can reach
        (marking nodes visited as the search explores them);
        if v' does not exist {
            pick and remove a vertex v from S;
            continue;
        }
        M[v] = v';
        remove v' from S;
        v = v';
    }
    // Check that the relations in M form a path
    {
        if size(M) != k - 1 {
            return false;
        }
        take the vertex v that is a key of M but not a value of M; // the path's start
        while M is not empty {
            if there is no v' s.t. M[v] = v' {
                return false;
            }
            remove v's entry from M;
            v = v';
        }
        return true;
    }
}
The for-loop runs N times; in the worst case the while-loop visits every node and edge, at cost V + E.
Is there any known algorithm to solve this problem?
If the graph is a DAG, could we have a better solution?
Thank you
Acyclicity here is not a meaningful assumption, since we can merge each strong component into one vertex. The paths are a red herring too; with a bunch of two-node sets, we're essentially making a bunch of reachability queries in a DAG. Doing this faster than O(n²) is thought to be a hard problem: https://cstheory.stackexchange.com/questions/25298/dag-reachability-with-on-log-n-space-and-olog-n-time-queries
You should check out the Floyd-Warshall algorithm. It will give you the lengths of the shortest paths between all pairs of vertices in the graph in O(V³). Once you have those results, you can do a brute-force depth-first traversal to determine whether you can go from one node to the next, and the next, etc. That should happen in O(nⁿ), where n is the number of vertices in your current set. Total complexity, then, should be O(V³ + N·nⁿ) (or something like that).
The nⁿ seems daunting, but if n is small compared to V it won't be a big deal in practice.
I can guess that one could improve on that given certain restrictions on the graph.
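To make this concrete, here is a small sketch of that approach (my code, not part of the answer; the graph and the query set are made up): a boolean Floyd-Warshall closure in O(V³), followed by a brute force over orderings of one query set. The brute force below enumerates permutations, so it is O(n! · n), in the same spirit as the O(nⁿ) bound above. For paths that need not be simple, a set lies on one path exactly when some ordering of it has each vertex reaching the next.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int V = 4;
    // reach[i][j] starts as the adjacency matrix (true if there is an edge i -> j)
    vector<vector<bool>> reach(V, vector<bool>(V, false));
    reach[0][1] = reach[1][2] = reach[0][3] = true;   // made-up example edges

    // Floyd-Warshall, boolean version: transitive closure in O(V^3)
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;

    // Brute force over orderings of one query set {0, 1, 2}
    vector<int> s = {0, 1, 2};
    sort(s.begin(), s.end());
    bool onPath = false;
    do {
        bool ok = true;
        for (size_t i = 0; i + 1 < s.size(); i++)
            if (!reach[s[i]][s[i + 1]]) { ok = false; break; }
        if (ok) { onPath = true; break; }
    } while (next_permutation(s.begin(), s.end()));
    cout << (onPath ? "on a path" : "not on a path") << endl;
}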
Remarks: c' is log c with base 17, i.e. c' = log_17(c), and MST means minimum spanning tree. The conclusion in question is that transforming every edge cost c of a graph G in this way leaves the MST unchanged.
It's easy to prove the conclusion is correct when we use a linear function to transform the cost of every edge. But log is not a linear function, and I could not understand why the conclusion is still correct.
Supplementary notes:
I did not consider specific algorithms, such as the greedy algorithm. I simply considered the relationship between the sums of the weights of the two trees after the transformation.
Numerically, if (a + b) > (c + d), it may not hold that (log a + log b) > (log c + log d).
So if one tree generated by G has two edges a and b, another tree generated by G has edges c and d with a + b < c + d, and the first tree is an MST, then in the transformed graph G' the sum of the weights of the second tree's edges may nevertheless be smaller.
Because of this, I wanted to construct a counterexample based on "if (a + b) > (c + d), then (log a + log b) may not be > (log c + log d)", but I failed.
One way to characterize when a spanning tree T is a minimum spanning tree is that, for every edge e not in T, the cycle formed by e and edges of T (the fundamental cycle of e with respect to T) has no edge more expensive than e. Using this characterization, I hope you see how to prove that transforming the costs with any increasing function preserves minimum spanning trees.
There's a one line proof that this condition is necessary. If the fundamental cycle contained a more expensive edge, we could replace it with e and get a spanning tree that costs less than T.
It's less obvious that this condition is sufficient, since at first glance it looks like we're trying to prove global optimality from a local optimality condition. To prove this statement, let T be a spanning tree that satisfies the condition, let T' be a minimum spanning tree, and let G' be the graph whose edges are the union of the edges of T and T'. Run Kruskal's algorithm on G', breaking ties by favoring edges in T over edges not in T. Let T'' be the resulting minimum spanning tree in G'. Since T' is a spanning tree in G', the cost of T'' is not greater than T', hence T'' is a minimum spanning tree in G as well as G'.
Suppose to the contrary that T'' ≠ T. Then there exists an edge in T but not in T''. Let e be the first such edge considered by Kruskal's algorithm. At the time that e was considered, it formed a cycle C in the edges that had been selected from T''. Since T is acyclic, C \ T is nonempty. By the tie breaking criterion, we know that every edge in C \ T costs less than e. Observing that some edge e' in C \ T must have one endpoint in each of the two connected components of T \ {e}, we infer that the fundamental cycle of e' with respect to T contains e, which violates the local optimality condition. In conclusion, T = T'', hence is a minimum spanning tree in G.
If you want a deeper dive, this logic gets abstracted out in the theory of matroids.
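To spell out why this settles the original question (my restatement of the argument above): T is an MST iff for every edge e ∉ T and every edge f on the fundamental cycle of e with respect to T, c(f) ≤ c(e). If φ is any strictly increasing function, such as φ(x) = log_17(x), then c(f) ≤ c(e) implies φ(c(f)) ≤ φ(c(e)), so T satisfies exactly the same characterization under the transformed costs and is therefore still an MST. Only the order of the costs matters, never their sums, which is why the non-linearity of log is harmless.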
Well, it's pretty easy to understand... let's see if I can break it down for you:
c' = log_17(c) // here 17 is the base
log may not be a linear function... but we can say that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
I hope you get the equation I've written... In words it means: for a base b with b > 1, log_b(x) is greater than log_b(y) whenever x > y.
So, if we apply this rule to the edge costs, we see that the edges that were selected for the MST of G would still be the cheapest choices for constructing the MST of G', where c' = log_17(c).
UPDATE: As I can see you're having trouble understanding the proof, I'm elaborating a bit:
I guess you know that MST construction is greedy. We're going to use Kruskal's algorithm to show why the claim is correct. (In case you don't know how Kruskal's algorithm works, you can read about it anywhere; just google it and you'll find millions of resources.) Now, let me write some steps of Kruskal's edge selection for the MST of G:
// the following edges are sorted by cost, i.e. c_0 <= c_1 <= c_2 <= ...
c_0: A, F // here, the edge with cost c_0 connects A and F; we take it into the MST
c_1: A, B // also taken into the MST
c_2: B, R // also taken into the MST
c_3: A, R // not taken, because A and R are already connected through A -> B -> R
c_4: F, X // also taken into the MST
...
...
so on...
Now, when constructing the MST of G', we have to select edges whose costs have the form c' = log_17(c) // where 17 is the base
If we convert the costs using log base 17, then c_0 becomes c_0', c_1 becomes c_1', and so on...
But we know that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
So we may say that:
log_17(c_0) <= log_17(c_1), because c_0 <= c_1
and in general:
log_17(c_i) <= log_17(c_j), where i <= j
And now we may say:
c_0' <= c_1' <= c_2' <= c_3' <= ...
So, the edge selection process to construct the MST of G' would be:
// the following edges are sorted by cost, i.e. c_0' <= c_1' <= c_2' <= ...
c_0': A, F // here, the edge with cost c_0' connects A and F; we take it into the MST
c_1': A, B // also taken into the MST
c_2': B, R // also taken into the MST
c_3': A, R // not taken, because A and R are already connected through A -> B -> R
c_4': F, X // also taken into the MST
...
...
so on...
Which is the same as the MST of G...
That ultimately proves the theorem....
I hope you get it... if not, ask me in the comments about what is not clear to you...
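If it helps to see this in code, here is a small sketch (mine, not part of the answer; the example graph is made up). It runs Kruskal's algorithm twice on the same graph, once with the raw costs and once with c' = log_17(c), and prints the chosen edges; because the transform is strictly increasing, the sorted order, and therefore the selected edge set, is identical both times.

#include <bits/stdc++.h>
using namespace std;

struct DSU {                        // disjoint-set union, used by Kruskal
    vector<int> p;
    DSU(int n) : p(n) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        p[a] = b;
        return true;
    }
};

struct E { double c; int u, v; };

vector<pair<int,int>> kruskal(int n, vector<E> edges) {
    sort(edges.begin(), edges.end(),
         [](const E& a, const E& b) { return a.c < b.c; });
    DSU dsu(n);
    vector<pair<int,int>> mst;
    for (const E& e : edges)
        if (dsu.unite(e.u, e.v))    // take the edge unless it closes a cycle
            mst.push_back({e.u, e.v});
    return mst;
}

int main() {
    vector<E> edges = {{4.0, 0, 1}, {7.0, 1, 2}, {3.0, 0, 2}, {9.0, 2, 3}, {5.0, 1, 3}};
    for (auto [u, v] : kruskal(4, edges)) cout << u << "-" << v << " ";
    cout << " (raw costs)\n";
    for (E& e : edges) e.c = log(e.c) / log(17.0);   // c' = log_17(c)
    for (auto [u, v] : kruskal(4, edges)) cout << u << "-" << v << " ";
    cout << " (log_17 costs)\n";                     // same edges as above
}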
A source in a directed graph is a node that has no edges going into it. Give a linear-time algorithm
that takes as input a directed graph in adjacency list format, and outputs all of its sources.
solution:
Finding the sources of a directed graph.
We will keep an array in[u] which holds the indegree (number of incoming edges) of each node. For a
source, this value is zero.
function sources(G)
Input: Directed graph G = (V, E)
Output: A list of G's source nodes
for all u ∈ V: in[u] = 0
for all u ∈ V:
    for all edges (u, w) ∈ E:
        in[w] = in[w] + 1
L = empty linked list
for all u ∈ V:
    if in[u] is 0: add u to L
return L
The thing I particularly do not understand about the code above is the innermost for loop: what exactly does in[w] = in[w] + 1 mean? I think it is counting the indegree of each node, but I cannot picture exactly how it does that. Can someone please help me visualize this?
in[w] = in[w] + 1 increases the number of edges going into w.
Maybe an example will help:
Consider a simple graph:
a ---> b
The adjacency list representation is:
a: {b}
b: {}
Now the algorithm will loop through all vertices.
For a, it will loop over the edge (a,b) and increase b's count.
For b, there are no edges.
Now a's count is still zero, thus it is a source vertex.
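If it helps, here is the book's pseudocode translated directly into C++ (my sketch; the identifiers just mirror the pseudocode). The nested loops are exactly the part you asked about: for every edge (u, w) in the adjacency list, the inner loop executes in[w]++ once, so when the loops finish, in[w] equals the number of edges arriving at w.

#include <bits/stdc++.h>
using namespace std;

// Returns all sources (indegree-0 vertices) of a directed graph
// given as an adjacency list. Runs in linear time, O(|V| + |E|).
vector<int> sources(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> in(n, 0);            // in[u] = number of edges going into u
    for (int u = 0; u < n; u++)
        for (int w : adj[u])         // for every edge (u, w) ...
            in[w]++;                 // ... w gains one incoming edge
    vector<int> L;
    for (int u = 0; u < n; u++)
        if (in[u] == 0)
            L.push_back(u);
    return L;
}

int main() {
    // The example above: a ---> b, with a = 0 and b = 1
    vector<vector<int>> adj = {{1}, {}};
    for (int u : sources(adj))
        cout << u << "\n";           // prints 0, i.e. vertex a
}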
I am having a tough time understanding Tarjan's lowest common ancestor algorithm. Can somebody explain it with an example?
I am stuck after the DFS search, what exactly does the algorithm do?
My explanation will be based on the wikipedia link posted above :).
I assume that you already know about the union-find (disjoint set) structure used in the algorithm.
(If not, please read about it; you can find it in "Introduction to Algorithms".)
The basic idea is that every time the algorithm visits a node x, x becomes the marked ancestor of all of its descendants visited so far.
So to find the least common ancestor (LCA) r of two nodes (u, v), there are two cases:
Node u is a child of node v (or vice versa); this case is obvious.
Node u is in the ith branch and v is in the jth branch (i < j) of node r. So after visiting node u, the algorithm backtracks to node r, which is the ancestor of the two nodes, marks the ancestor of node u as r, and goes on to visit node v.
At the moment it visits node v, since u is already marked as visited (black), the answer will be r. Hope you get it!
I will explain using the code from CP-Algorithms:
void dfs(int v)
{
    visited[v] = true;
    ancestor[v] = v;
    for (int u : adj[v]) {
        if (!visited[u]) {
            dfs(u);
            union_sets(v, u);
            ancestor[find_set(v)] = v;
        }
    }
    for (int other_node : queries[v]) {
        if (visited[other_node])
            cout << "LCA of " << v << " and " << other_node
                 << " is " << ancestor[find_set(other_node)] << ".\n";
    }
}
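Note that this snippet relies on the disjoint-set helpers that the CP-Algorithms article defines alongside dfs. A minimal version (my reconstruction; treat it as a sketch rather than the article's exact code) could be:

const int MAXN = 100005;                 // hypothetical bound on the number of nodes
vector<int> adj[MAXN], queries[MAXN];    // tree edges and LCA queries per node
int parent[MAXN], ancestor[MAXN];
bool visited[MAXN];

void make_set(int v) {
    parent[v] = v;                       // every vertex starts in its own set
}

int find_set(int v) {                    // find with path compression
    return v == parent[v] ? v : parent[v] = find_set(parent[v]);
}

void union_sets(int a, int b) {
    a = find_set(a);
    b = find_set(b);
    if (a != b)
        parent[b] = a;                   // merge b's set into a's
}

Before calling dfs(root), call make_set(v) for every vertex v (with visited[v] = false), and add each query (u, v) to both queries[u] and queries[v] so that it is answered at whichever endpoint is visited second.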
Let's outline a proof of the algorithm.
Lemma 1: For each vertex v and its parent p, after we visit v from p and union v with p, p and all vertices in the subtree rooted at v (i.e. p and all descendants of v, including v itself) will be in one disjoint set represented by p (i.e. ancestor[root of the disjoint set] is p).
Proof: Suppose the tree has height h. Then proceed by induction on vertex height, starting from the leaf nodes.
Lemma 2: For each vertex v, right before we mark it as visited, the following statements are true:
1. Each of v's ancestors p_i is in a disjoint set that contains precisely p_i and all vertices in the subtrees of p_i that p_i has already finished visiting.
2. Every vertex visited so far is in one of these disjoint sets.
Proof: We proceed by induction. The statement is vacuously true for the root (the only vertex of depth 0), as it has no ancestors. Now suppose the statement holds for every vertex of depth k, k ≥ 0, and let v be a vertex of depth k + 1 with parent p. Before p visits v, suppose it has already visited its children c_1, c_2, ..., c_n. By Lemma 1, p and all vertices in the subtrees rooted at c_1, c_2, ..., c_n are in one disjoint set represented by p. Furthermore, all vertices newly visited after we visited p are exactly the vertices in this disjoint set. Since p is of depth k, we can use the induction hypothesis to conclude that v indeed satisfies 1 and 2.
We are now ready to prove the algorithm.
Claim: For each query (u, v), the algorithm outputs the lowest common ancestor of u and v.
Proof: Without loss of generality, suppose we visit u before we visit v in the DFS. Then either v is a descendant of u or it is not.
If v is a descendant of u, by Lemma 1 we know that u and v are in one disjoint set that is represented by u, which means ancestor[find_set(v)] is u, the correct answer.
If v is not a descendant of u, then by Lemma 2 we know that u must be in one of the disjoint sets, each of them represented by an ancestor of v at the time we mark v. Let p be the representing vertex of the disjoint set u is in. By Lemma 2, p is an ancestor of v, and u is in an already-visited subtree of p and is therefore a descendant of p. Neither fact changes while we visit v's children, so p is indeed a common ancestor of u and v. To see that p is the lowest common ancestor, let q be the child of p of which v is a descendant (i.e. if we travel back to the root from v, q is the last vertex before we reach p; q can be v itself). Suppose for contradiction that u were also a descendant of q. Then by Lemma 2, u would be in both the disjoint set represented by p and the disjoint set represented by q, but these sets are disjoint, a contradiction. In conclusion, p is the lowest common ancestor of u and v.
(This is derived from a recently completed programming contest)
You are given G, a connected graph with N nodes and N-1 edges.
(Notice that this implies G forms a tree.)
Each edge of G is directed (not necessarily toward any root).
For each vertex v of G, it is possible to invert zero or more edges so that there is a directed path from every other vertex w to v. Let f(v) be the minimum number of edge inversions needed to achieve this.
By what linear or loglinear algorithm can we determine the subset of vertices that have the minimal overall f(v), including the value of f(v) for those vertices?
For example consider the 4 vertex graph with these edges:
A<--B
C<--B
D<--B
The value of f(A) = 2, f(B) = 3, f(C) = 2 and f(D) = 2...
...so the desired output is {A, C, D} and 2.
(Note that we only need to calculate f(v) for the vertices that have a minimal f(v), not all of them.)
Code:
For posterity here is the code of the solution:
#include <bits/stdc++.h>
using namespace std;

int main()
{
    struct Edge
    {
        bool fwd;   // true: the input edge is directed src -> dest
        int dest;
    };
    int n;
    cin >> n;
    vector<vector<Edge>> V(n+1);
    for (int i = 0; i < n-1; i++)
    {
        int src, dest;
        cin >> src >> dest;
        V[src].push_back(Edge{true, dest});
        V[dest].push_back(Edge{false, src});
    }
    // First DFS: compute f(1) directly. While walking away from vertex 1,
    // an edge needs inverting iff it points away from vertex 1.
    vector<int> F(n+1, -1);
    vector<bool> done(n+1, false);
    vector<int> todo;
    todo.push_back(1);
    done[1] = true;
    F[1] = 0;
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (done[e.dest])
                continue;
            if (e.fwd)              // points away from vertex 1: must invert
                F[1]++;
            done[e.dest] = true;
            todo.push_back(e.dest);
        }
    }
    // Second DFS: propagate f across each tree edge using
    // f(u) = f(v) + 1 if the edge is directed u -> v, and f(v) - 1 otherwise.
    todo.push_back(1);
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (F[e.dest] != -1)
                continue;
            if (e.fwd)              // edge next -> e.dest: one inversion fewer
                F[e.dest] = F[next] - 1;
            else                    // edge e.dest -> next: one inversion more
                F[e.dest] = F[next] + 1;
            todo.push_back(e.dest);
        }
    }
    int minf = INT_MAX;
    for (int i = 1; i <= n; i++)
        minf = min(minf, F[i]);
    cout << minf << endl;
    for (int i = 1; i <= n; i++)
        if (F[i] == minf)
            cout << i << " ";
    cout << endl;
}
I think that the following algorithm works correctly, and it certainly works in linear time.
The motivation for this algorithm is the following. Let's suppose that you already know the value of f(v) for some single node v. Now, consider any node u adjacent to v. If we want to compute the value of f(u), we can reuse some of the information from f(v) in order to compute it. Note that in order to get from any node w in the graph to u, one of two cases must happen:
That path passes through the edge connecting u and v. In that case, the way that we get from w to u is to go from w to v, then to follow the edge from v to u.
That path does not pass through the edge connecting u and v. In that case, the way that we get from w to u is the exact same way that we got from w to v, except that we stop as soon as we get to u.
The reason that this observation is important is that it means that if we know the number of edges we'd flip to get from any node to v, we can easily modify it to get the set of edges that we'd flip to get from any node to u. Specifically, it's going to be the same set of edges as before, except that we want to direct the edge connecting u and v so that it connects v to u rather than the other way around.
If the edge between u and v is initially directed (u, v), then we have to make all the flips we made to get every node pointing at v, plus one more flip so that this edge points from v back to u. Thus f(u) = f(v) + 1. Otherwise, if the edge is originally directed (v, u), then the set of edges that we'd flip would be the same as before (pointing everything at v), except that we wouldn't flip the edge (v, u). Thus f(u) = f(v) - 1.
Consequently, once we know the value of f for a single node v, we can compute it for each adjacent node u as follows:
f(u) = f(v) + 1 if (u, v) is an edge.
f(u) = f(v) - 1 otherwise
This means that we can compute f(v) for all nodes v as follows:
Compute f(v) for some initial node v, chosen arbitrarily.
Do a DFS starting from v. When reaching a node u, compute its f score using the above logic.
All that's left to do is to compute f(v) for some initial node. To do this, we can run a DFS from v outward. Every time we see an edge pointed the wrong way, we have to flip it. Thus the initial value of f(v) is given by the number of wrong-pointing edges we find during the initial DFS.
We thus can compute the f score for each node in O(n) time by doing an initial DFS to compute f(v) for the initial node, then a secondary DFS to compute f(u) for each other node u. You can then for-loop over each of the n f-scores to find the minimum score, then do one more loop to find all values with that f-score. Each of these steps takes O(n) time, so the overall algorithm takes O(n) time as well.
Hope this helps! This was an awesome problem!
What is a good algorithm for getting the minimum vertex cover of a tree?
INPUT:
Each node's neighbours.
OUTPUT:
The minimum number of vertices in a cover.
I didn't fully understand after reading the answers here, so I thought I'd post one from here
The general idea is that you root the tree at an arbitrary node, and ask whether that root is in the cover or not. If it is, then you calculate the min vertex cover of the subtrees rooted at its children by recursing. If it isn't, then every child of the root must be in the vertex cover so that every edge between the root and its children is covered. In this case, you recurse on the root's grandchildren.
So for example, if you had the following tree:
    A
   / \
  B   C
 / \ / \
D  E F  G
Note that by inspection, you know the min vertex cover is {B, C}. We will find this min cover.
Here we start with A.
A is in the cover
We move down to the two subtrees of B and C, and recurse on this algorithm. We can't simply state that B and C are not in the cover, because even if AB and AC are covered, we can't say anything about whether we will need B and C to be in the cover or not.
(Think about the following tree, where both the root and one of its children are in the min cover ({A, D}):
    A
   /|\
  B C D
     /|\
    E F G
)
A is not in the cover
But we know that AB and AC must be covered, so we have to add B and C to the cover. Since B and C are in the cover, we can recurse on their children instead of recursing on B and C (even if we did, it wouldn't give us any more information).
"Formally"
Let C(x) be the size of the min cover rooted at x.
Then,
C(x) = min(
    1 + sum(C(i) for i in x's children),                       // root in cover
    len(x's children) + sum(C(i) for i in x's grandchildren)   // root not in cover
)
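Here is that recurrence written out in C++ (my sketch; the Node type is made up for illustration):

#include <bits/stdc++.h>
using namespace std;

struct Node { vector<Node*> children; };   // hypothetical tree node type

// C(x) from the recurrence above, computed naively.
int minCover(Node* x) {
    if (x == nullptr) return 0;
    // Option 1: x is in the cover; recurse on its children.
    int withX = 1;
    for (Node* c : x->children)
        withX += minCover(c);
    // Option 2: x is not in the cover; every child must be taken,
    // so count the children and recurse on the grandchildren.
    int withoutX = (int)x->children.size();
    for (Node* c : x->children)
        for (Node* g : c->children)
            withoutX += minCover(g);
    return min(withX, withoutX);
}

int main() {
    // The example tree above: A has children B, C; B has D, E; C has F, G.
    Node D{{}}, E{{}}, F{{}}, G{{}};
    Node B{{&D, &E}}, C{{&F, &G}};
    Node A{{&B, &C}};
    cout << minCover(&A) << "\n";          // prints 2, matching the cover {B, C}
}

As written this revisits subtrees (a child's subtree is explored both directly and through its grandchildren), so it can be exponential on deep trees; memoizing the result per node, as the dynamic-programming answers below do, makes it linear.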
T(V,E) is a tree, which implies that for any leaf, any minimal vertex cover has to include either the leaf or the vertex adjacent to the leaf. This gives us the following algorithm for finding S, the vertex cover:
Find all leaves of the tree (BFS or DFS), O(|V|) in a tree.
If (u,v) is an edge such that v is a leaf, add u to the vertex cover and prune (u,v). This will leave you with a forest T_1(V_1,E_1), ..., T_n(V_n,E_n).
Now, if V_i = {v}, meaning |V_i| = 1, that tree can be dropped, since all edges incident on v are already covered. This gives a termination condition for the recursion: with one vertex or none we stop, compute S_i as the cover of each T_i, and define S as the vertices from step 2 union the covers of the T_i.
All that remains is to check the base case: if the original tree has only one vertex, we return the empty cover and never start the recursion, and the minimal vertex cover can be computed.
Edit:
Actually, after thinking about it for a bit, it can be accomplished with a simple DFS variant.
I hope you can find more answers related to your question here.
I was thinking about my solution. You will probably need to polish it, but since dynamic programming is one of your tags, you probably need to:
For each vertex u, define S+(u) as the size of the largest independent set in u's subtree that includes u, and S-(u) as the size of the largest independent set in u's subtree that excludes u.
S+(u) = 1 + Sum(S-(v)) for each child v of u.
S-(u) = Sum(max{S-(v), S+(v)}) for each child v of u.
The answer is max(S+(r), S-(r)), where r is the root of your tree.
After reading this, I changed the above algorithm to find the maximum independent set, since the wiki article states:
A set is independent if and only if its complement is a vertex cover.
So by changing min to max we can find the maximum independent set, and by taking the complement, the minimum vertex cover, since the two problems are equivalent.
{- Haskell implementation of Artem's algorithm -}
data Tree = Branch [Tree]
    deriving Show

{- first int is the min cover; second int is the min cover that includes the root -}
minVC :: Tree -> (Int, Int)
minVC (Branch subtrees) = let
    costs = map minVC subtrees
    minWithRoot = 1 + sum (map fst costs) in
    (min minWithRoot (sum (map snd costs)), minWithRoot)
We can use a DFS based algorithm to solve this problem:
DFS(node x)
{
    discovered[x] = true;
    /* Scan the adjacency list of node x */
    while ((y = getNextAdj()) != NULL)
    {
        if (discovered[y] == false)
        {
            DFS(y);
            /* y is a child of node x.
               If the child was not selected as a vertex for the minimum
               vertex cover, then select the parent. */
            if (y->isSelected == false)
            {
                x->isSelected = true;
            }
        }
    }
}
The leaf node will never be selected for the vertex cover.
To find the minimum vertex cover, for each node we have two choices: include it or not. According to the problem, for each edge (u, v) either u or v should be in the cover, so if the current vertex is not included then we must include its children, and if we do include the current vertex then we may or may not include its children, whichever is optimal.
Here, DP1[v] for any vertex v = the answer when we include v.
DP2[v] for any vertex v = the answer when we don't include v.
DP1[v] = 1 + sum(min(DP2[c], DP1[c])) - include the current vertex; each child may or may not be included, whichever is optimal.
DP2[v] = sum(DP1[c]) - don't include the current vertex, so we must include every child. Here, c ranges over the children of vertex v.
Then our solution is min(DP1[root], DP2[root]).
#include <bits/stdc++.h>
using namespace std;

vector<vector<int>> g;
int dp1[100010], dp2[100010];   // dp1: v included, dp2: v excluded

void dfs(int curr, int p){
    for(auto it : g[curr]){
        if(it == p){
            continue;
        }
        dfs(it, curr);
        dp1[curr] += min(dp1[it], dp2[it]);  // curr in cover: child is free
        dp2[curr] += dp1[it];                // curr not in cover: child must be
    }
    dp1[curr] += 1;                          // count curr itself
}

int main(){
    int n;
    cin >> n;
    g.resize(n+1);
    for(int i = 0; i < n-1; i++){
        int u, v;
        cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    dfs(1, 0);
    cout << min(dp1[1], dp2[1]);
    return 0;
}
I would simply use a linear program to solve the minimum vertex cover problem.
A formulation as an integer linear program could look like the one given here: ILP formulation
I don't think that your own implementation would be faster than these highly optimized LP solvers.
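For reference, the standard formulation (my summary of the usual one, in the spirit of the linked page) is:

minimize    sum of x_v over all v ∈ V
subject to  x_u + x_v ≥ 1   for every edge (u, v) ∈ E
            x_v ∈ {0, 1}    for every vertex v ∈ V

One remark specific to this question: a tree is bipartite, and for bipartite graphs the LP relaxation (x_v ∈ [0, 1]) already has an integral optimal solution, so even an ordinary LP solver returns a true minimum vertex cover here.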