Count cycles of length 3 using DFS - algorithm

Let G=(V,E) be an undirected graph. How can we count cycles of length 3 exactly once using the following DFS:
DFS(G,s):
    foreach v in V do
        color[v] <- white; p[v] <- nil
    DFS-Visit(s)

DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
There is a cycle whenever we reach a node that has already been discovered (grey); the edge to that node is called a back edge. The cycle has length 3 when p[p[p[v]]] = v, right? So
DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = grey and p[p[p[v]]] = v then
            // we got a cycle of length 3
        else if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
However, how can I create a proper counter for the number of cycles, and how can I make sure each cycle is counted only once?

I'm not sure I understand how your condition parent[parent[parent[v]]] == v works. IMO it should never be true as long as parent represents a tree structure (it corresponds to the spanning tree associated with the DFS).
Directed graphs
Back edges, cross edges and forward edges can all "discover" new cycles.
We separate the following possibilities (let's say you reach a u -> v edge):
Back edge: u and v belong to the same 3-cycle iff parent[parent[u]] = v.
Cross edge: u and v belong to the same 3-cycle iff parent[u] = parent[v].
Forward edge: u and v belong to the same 3-cycle iff parent[parent[v]] = u.
Undirected graphs
There are no cross edges anymore, and each forward edge is just a back edge seen from the other endpoint, so they are redundant. Therefore you only have to check back edges: when you reach a u -> v back edge, u and v belong to the same 3-cycle iff parent[parent[u]] = v.
def dfs(u):
    color[u] = GREY
    for v in adj[u]:
        # Back edge
        if color[v] == GREY:
            if parent[parent[u]] == v:
                print("({}, {}, {})".format(v + 1, parent[u] + 1, u + 1))
        # v unseen
        elif color[v] == WHITE:
            parent[v] = u
            dfs(v)
    color[u] = BLACK
If you want to test it:
WHITE, GREY, BLACK = 0, 1, 2
nb_nodes, nb_edges = map(int, input().split())
adj = [[] for _ in range(nb_nodes)]
for _ in range(nb_edges):
    u, v = map(int, input().split())
    adj[u - 1].append(v - 1)
    adj[v - 1].append(u - 1)
parent = [None] * nb_nodes
color = [WHITE] * nb_nodes
for s in range(nb_nodes):
    if color[s] == WHITE:
        dfs(s)

If a solution without using DFS is okay, there is an easy solution which runs in O(NMlog(N³)), where N is the number of vertices in the graph and M is the number of edges.
We iterate over edges instead of iterating over vertices. For every edge u-v, we have to find every vertex which is connected to both u and v. We can do this by iterating over every vertex w in the graph and checking whether the edges v-w and w-u both exist. Whenever we find such a vertex, we order u, v, w and add the ordered triplet to a BBST that doesn't allow repetitions (e.g. std::set in C++). The count of length-3 cycles will be exactly the size of the BBST (the number of elements added) after checking every edge in the graph.
Let's analyze the complexity of the algorithm:
We iterate over every edge. Current complexity is O(M)
For each edge, we iterate over every vertex. Current complexity is O(NM)
For each (edge, vertex) pair that forms a cycle, we add a triplet to a BBST. Adding to a BBST has O(log(K)) complexity, where K is the size of the BBST. In the worst case, every triplet of vertices forms a cycle, so we may add up to O(N³) elements, and the cost of one insertion can get as high as O(log(N³)). The final complexity is therefore O(NMlog(N³)). This may sound like a lot, but in the worst case M = O(N²), so the complexity is O(N³log(N³)). Since there may be up to O(N³) cycles of length 3, our algorithm is just a log factor away from an optimal algorithm.
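The edge-iteration algorithm above can be sketched in Python. Note this is my own sketch: a built-in set (a hash set) stands in for the BBST, which even drops the log factor from insertions; `count_triangles` and its parameters are assumed names, not from the question.

```python
def count_triangles(n, edges):
    """Count 3-cycles of an undirected graph with n vertices, given as an edge list."""
    # Build adjacency sets so membership tests are O(1).
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triangles = set()
    # For every edge u-v, look for a third vertex w adjacent to both endpoints.
    for u, v in edges:
        for w in range(n):
            if w != u and w != v and w in adj[u] and w in adj[v]:
                # Sorting the triplet deduplicates the 3 (edge, vertex)
                # pairs that discover the same triangle.
                triangles.add(tuple(sorted((u, v, w))))
    return len(triangles)
```

For instance, a triangle 0-1-2 with a pendant vertex 3 attached to 2 contains exactly one 3-cycle.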

Related

Find the Optimal vertex cover of a tree with k blue vertices

I need to find a 'dynamic-programming' kind of solution for the following problem:
Input:
Perfect Binary-Tree, T = (V,E) - (each node has exactly 2 children except the leaves).
V = V(blue) ∪ V(black).
V(blue) ∩ V(black) = ∅.
(In other words, some vertices in the tree are blue)
Root of the tree 'r'.
integer k
A legal Solution:
A subset of vertices V' ⊆ V which is a vertex cover of T, and |V' ∩ V(blue)| = k. (In other words, the cover V' contains k blue vertices)
Solution Value:
The value of a legal solution V' is the number of vertices in the set = |V'|.
For convenience, we will define the value of an "un-legal" solution to be ∞.
What we need to find:
A solution with minimal Value.
(In other words, The best solution is a solution which is a cover, contains exactly k blue vertices and the number of vertices in the set is minimal.)
I need to define a typical sub-problem (that is, if I know the solution value of a subtree, I can use it to find the solution value for the whole problem) and suggest a formula to solve it.
To me, it looks like you are on the right track!
Still, I think you will have to use an additional parameter to tell us how far is any picked vertex from the current subtree's root.
For example, it can be just the indication whether we pick the current vertex, as below.
Let fun (v, b, p) be the optimal size for subtree with root v such that, in this subtree, we pick exactly b blue vertices, and p = 1 if we pick vertex v or p = 0 if we don't.
The answer is the minimum of fun (r, k, 0) and fun (r, k, 1): we want the answer for the full tree (v = r), with exactly k vertices covered in blue (b = k), and we can either pick or not pick the root.
Now, how do we calculate this?
For the leaves, fun (v, 0, 0) is 0 and fun (v, t, 1) is 1, where t tells us whether vertex v is blue (1 if yes, 0 if no).
All other combinations are invalid, and we can simulate it by saying the respective values are positive infinities: for example, for a leaf vertex v, the value fun (v, 3, 1) = +infinity.
In the implementation, the infinity can be just any value greater than any possible answer.
For all internal vertices, let v be the current vertex and u and w be its children.
We have two options: to pick or not to pick the vertex v.
Suppose we pick it.
Then the value we get for fun (v, b, 1) is 1 (the picked vertex v) plus the minimum of fun (u, x, q) + fun (w, y, r) such that x + y is either b if the vertex v is black or b - 1 if it is blue, and q and r can be arbitrary: if we picked the vertex v, the edges v--u and v--w are already covered by our vertex cover.
Now let us not pick the vertex v.
Then the value we get for fun (v, b, 0) is just the minimum of fun (u, x, 1) + fun (w, y, 1) such that x + y = b: if we did not pick the vertex v, the edges v--u and v--w have to be covered by u and w.
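The recurrence above can be sketched in Python, memoized with lru_cache. The tree encoding (a children dict mapping each internal node to its two children) and the name min_cover_with_k_blue are my own assumptions, not part of the question:

```python
import math
from functools import lru_cache

def min_cover_with_k_blue(children, blue, root, k):
    """children[v] = (left, right) for internal nodes; leaves are absent.
    blue is the set of blue vertices. Returns the minimal |V'| or math.inf."""
    INF = math.inf

    @lru_cache(maxsize=None)
    def fun(v, b, p):
        is_blue = 1 if v in blue else 0
        if v not in children:              # leaf base cases
            if p == 1:                     # picked leaf: size 1, contributes its color
                return 1 if b == is_blue else INF
            return 0 if b == 0 else INF    # unpicked leaf: size 0, no blue picked
        u, w = children[v]
        best = INF
        if p == 1:                         # pick v: both child edges already covered
            need = b - is_blue             # blue budget left for the two subtrees
            if need >= 0:
                for x in range(need + 1):
                    for q in (0, 1):
                        for r in (0, 1):
                            best = min(best, 1 + fun(u, x, q) + fun(w, need - x, r))
        else:                              # don't pick v: both children must be picked
            for x in range(b + 1):
                best = min(best, fun(u, x, 1) + fun(w, b - x, 1))
        return best

    return min(fun(root, k, 0), fun(root, k, 1))
```

For example, a root 0 with two leaf children 1 and 2 where only the root is blue: with k = 1 we can cover both edges by picking just the root (size 1); with k = 0 we must pick both leaves (size 2).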

Checking if a graph is bipartite in a DFS

I wonder what the complexity of this algorithm of mine will be, and why. It checks whether a graph (given in the form of a neighbors list) is bipartite or not using DFS.
The algorithm works as following:
We will use edges classification, and look for back edges.
If we find one, it means there is a cycle in the graph.
We will then check whether the cycle is odd or not, using the π (parent) attribute of each vertex, counting the number of edges participating in the cycle.
If the cycle is an odd one, return false. Else, continue the process.
Initially I thought the complexity would be O(|V| + |E|), where |V| is the number of vertices in the graph and |E| is the number of edges, but I am afraid it might be O(|V| + |E|²), and I wonder which option is correct and why (it may be neither of the above as well). Amortized or expected running times may also differ, and I wonder how I can check those as well.
pseudo code
DFS(G=(V,E))
    // π[u] – parent of u in the DFS tree
    for each vertex u ∈ V {
        color[u] ← WHITE
        π[u] ← NULL }
    time ← 0
    for each vertex u ∈ V {
        if color[u] = WHITE
            DFS-VISIT(u) }
and for the DFS-Visit:
DFS-Visit(u)
    // white vertex u has just been discovered
    color[u] ← GRAY
    time ← time + 1
    d[u] ← time
    for each v ∈ Adj[u] {        // going over all edges {u, v}
        if color[v] = WHITE {
            π[v] ← u
            DFS-VISIT(v) }
        else if color[v] = GRAY  // there is a cycle in the graph
            CheckIfOddCycle(u, v) }
    color[u] ← BLACK             // change the color of vertex u to black as we finished going over it
    f[u] ← time ← time + 1
and as for deciding what type of cycle is it:
CheckIfOddCycle(u, v)
    count ← 1
    p ← u
    while (p != v) {
        p ← π[p]
        count++ }
    if count is an odd number {
        print("The graph is not bipartite!")
        stop the search, as the result is now concluded! }
Thanks!
To determine whether or not a graph is bipartite, do a DFS or BFS that covers all the edges in the entire graph, and:
When you start on a new vertex that is disconnected from all previous vertices, color it blue;
When you discover a new vertex connected to a blue vertex, color it red;
When you discover a new vertex connected to a red vertex, color it blue;
When you find an edge to a previously discovered vertex, return FALSE if it connects blue to blue or red to red.
If you make it through the entire graph, return TRUE.
This algorithm takes very little work on top of the BFS or DFS, and is therefore O(|V|+|E|).
This algorithm is also essentially the same as the algorithm in your question. When we discover a back-edge with the same color on both sides, it means that the cycle(s) we just discovered are of odd length.
But really this algorithm has nothing to do with cycles. A graph can have a lot more cycles than it has vertices or edges, and a DFS or BFS will not necessarily find them all, so it wouldn't be accurate to say that we are searching for odd cycles.
Instead we are just trying to make a bipartite partition and returning whether or not it's possible to do so.
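The blue/red coloring scheme described above can be written as a short 2-coloring BFS in Python. This is a sketch; the name is_bipartite and the adjacency-list encoding are my own choices:

```python
from collections import deque

def is_bipartite(adj):
    """adj: list of adjacency lists. BFS 2-coloring over every component."""
    n = len(adj)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0                    # new component: color its start vertex "blue"
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:    # newly discovered vertex gets the opposite color
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False        # edge with the same color on both ends
    return True
```

A path 0-1-2 is bipartite, while a triangle is not; either way the work is O(|V| + |E|).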

Find the most expensive path from s to t using DFS

In a given graph G=(V,E) each edge has a cost c(e). We have a starting node s and a target node t. How can we find the most expensive path from s to t using the following DFS algorithm?
DFS(G,s):
    foreach v in V do
        color[v] <- white; parent[v] <- nil
    DFS-Visit(s)

DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = white then
            parent[v] = u; DFS-Visit(v)
    color[u] <- black
What I have tried:
So first we create an array to maintain the cost to each node:
DFS(G,s,t):
    foreach v in V do
        color[v] <- white; parent[v] <- nil; cost[v] <- -inf
    cost[s] <- 0
    DFS-Visit(s,t)
    return cost[t]
Second, we should still visit a node even if it is grey, in order to update its cost:
DFS-Visit(u,t)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] != black then
            parent[v] = u
            if cost[v] < cost[u] + c(u,v) then
                cost[v] = cost[u] + c(u,v)
            if t != v then
                DFS-Visit(v,t)
    color[u] <- black
and we don't want to go past t. What do you think? Is my approach correct?
Unfortunately this problem is NP-complete. The proof is by a simple reduction from the Longest Path Problem (https://en.wikipedia.org/wiki/Longest_path_problem) to this one.
Proof:
Suppose we had an algorithm that could solve your problem in polynomial time, i.e. find the longest path between two given nodes s and t. We could then apply this algorithm to each pair of nodes (O(n^2) times) and obtain a solution for the Longest Path Problem in polynomial time.
If a simple but highly inefficient algorithm suffices, then you can adapt the DFS algorithm so that at each node, you conduct DFS of its adjacent nodes in all permuted orders. Keep track of the maximum cost obtained over all orders.
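Another simple exhaustive option with the same exponential flavor is a backtracking DFS over all simple s→t paths. This is my own sketch, not the permuted-order variant suggested above; most_expensive_path and the weighted adjacency-list encoding are assumed names:

```python
def most_expensive_path(adj, s, t):
    """adj[u] = list of (v, cost) pairs. Exhaustively enumerates all simple
    s -> t paths via backtracking DFS; exponential time in general."""
    best = float("-inf")
    visited = [False] * len(adj)

    def dfs(u, cost):
        nonlocal best
        if u == t:                      # never go past t
            best = max(best, cost)
            return
        visited[u] = True
        for v, c in adj[u]:
            if not visited[v]:
                dfs(v, cost + c)
        visited[u] = False              # backtrack: u may appear on other paths

    dfs(s, 0)
    return best
```

On the 4-cycle 0-1-3-2-0 with edge costs c(0,1)=c(1,3)=1 and c(0,2)=c(2,3)=2, the most expensive simple path from 0 to 3 is 0→2→3 with cost 4.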

Does the DFS algorithm differentiate between an ancestor and a parent while computing back edges?

Below is the general code for DFS with logic for marking back edges and tree edges. My doubt is this: back edges from a vertex should point back to an ancestor, and edges that merely point back to the direct parent are not back edges (let's assume an undirected graph).
In an undirected graph we have the edge in both directions between two vertices x and y. So after visiting x, when I process y, y has x as an adjacent vertex, but since x is already visited, the code will mark that edge as a back edge.
Am I right in saying that? Should we add extra logic to avoid this, in case my assumption is valid?
DFS(G)
    for v in vertices[G] do
        color[v] = white
        parent[v] = nil
    time = 0
    for v in vertices[G] do
        if color[v] = white then
            DFS-Visit(v)

DFS-Visit(v) induces a depth-first tree on the graph starting at v:

DFS-Visit(v)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a)
            v->a is a tree edge
        elseif color[a] = grey then
            v->a is a back edge
    color[v] = black
    time = time + 1

white means unexplored, gray means frontier, black means processed
Yes, this implementation determines frontier nodes only by color (visited / not visited) and thus doesn't separate parent from ancestor nodes. So every tree edge will also be reported as a back edge when it is scanned back from the child's side.
In order to separate tree and back edges you need to distinguish edges to the parent from edges to other ancestors. A simple way is to pass the parent node as a parameter p to DFS-Visit. For example:
DFS-Visit(v, p)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a, v)
            v->a is a tree edge
        elseif color[a] = grey and (a is not p) then
            v->a is a back edge
    color[v] = black
    time = time + 1
UPDATE: I hadn't noticed that you already store parent nodes, so there is no need to introduce a parameter:
DFS-Visit(v)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a)
            v->a is a tree edge
        elseif color[a] = grey and (a is not parent[v]) then
            v->a is a back edge
    color[v] = black
    time = time + 1
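The parent-check idea can be demonstrated with a small runnable Python sketch (my own encoding: adjacency lists, 0-indexed vertices). Note the parent check misclassifies parallel edges between the same two vertices; a multigraph would need edge identities instead:

```python
def classify_edges(adj):
    """Return (tree_edges, back_edges) of an undirected simple graph given as
    adjacency lists. The single edge back to the DFS parent is skipped, so tree
    edges are not also reported as back edges."""
    n = len(adj)
    color = ['white'] * n
    parent = [None] * n
    tree, back = [], []

    def visit(v):
        color[v] = 'grey'
        for a in adj[v]:
            if color[a] == 'white':
                parent[a] = v
                tree.append((v, a))
                visit(a)
            elif color[a] == 'grey' and a != parent[v]:
                back.append((v, a))      # grey non-parent neighbor: real back edge
        color[v] = 'black'

    for v in range(n):
        if color[v] == 'white':
            visit(v)
    return tree, back
```

On a triangle, the edges (0,1) and (1,2) come out as tree edges and only (2,0) as a back edge, instead of every edge being reported twice.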

Finding "Best Roots" in a Directed Tree Graph?

(This is derived from a recently completed programming contest)
You are given G, a connected graph with N nodes and N-1 edges.
(Notice that this implies G forms a tree.)
Each edge of G is directed. (not necessarily upward to any root)
For each vertex v of G it is possible to invert zero or more edges such that there is a directed path from every other vertex w to v. Let the minimum possible number of edge inversions to achieve this be f(v).
By what linear or loglinear algorithm can we determine the subset of vertices that have the minimal overall f(v) (including the value of f(v) of those vertices)?
For example consider the 4 vertex graph with these edges:
A<--B
C<--B
D<--B
The value of f(A) = 2, f(B) = 3, f(C) = 2 and f(D) = 2...
..so therefore the desired output is {A,C,D} and 2
(note we only need to calculate the f(v) of vertices that have a minimal f(v) - not all of them)
Code:
For posterity here is the code of solution:
#include <bits/stdc++.h>
using namespace std;

int main()
{
    struct Edge
    {
        bool fwd;   // true if the input edge is directed src -> dest
        int dest;
    };
    int n;
    cin >> n;
    vector<vector<Edge>> V(n + 1);
    for (int i = 0; i < n - 1; i++)
    {
        int src, dest;
        scanf("%d %d", &src, &dest);
        V[src].push_back(Edge{true, dest});
        V[dest].push_back(Edge{false, src});
    }
    vector<int> F(n + 1, -1);
    vector<bool> done(n + 1, false);
    vector<int> todo;
    // First pass: F[1] = number of edges pointing away from node 1.
    todo.push_back(1);
    done[1] = true;
    F[1] = 0;
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (done[e.dest])
                continue;
            if (e.fwd)      // edge points away from the root and must be inverted
                F[1]++;
            done[e.dest] = true;
            todo.push_back(e.dest);
        }
    }
    // Second pass: reroot. If next -> e.dest exists, F[e.dest] = F[next] - 1,
    // otherwise F[e.dest] = F[next] + 1.
    todo.push_back(1);
    while (!todo.empty())
    {
        int next = todo.back();
        todo.pop_back();
        for (Edge e : V[next])
        {
            if (F[e.dest] != -1)
                continue;
            F[e.dest] = e.fwd ? F[next] - 1 : F[next] + 1;
            todo.push_back(e.dest);
        }
    }
    int minf = INT_MAX;
    for (int i = 1; i <= n; i++)
        minf = min(minf, F[i]);
    cout << minf << endl;
    for (int i = 1; i <= n; i++)
        if (F[i] == minf)
            cout << i << " ";
    cout << endl;
}
I think that the following algorithm works correctly, and it certainly works in linear time.
The motivation for this algorithm is the following. Let's suppose that you already know the value of f(v) for some single node v. Now, consider any node u adjacent to v. If we want to compute the value of f(u), we can reuse some of the information from f(v) in order to compute it. Note that in order to get from any node w in the graph to u, one of two cases must happen:
That path passes through the edge connecting u and v. In that case, the way that we get from w to u is to go from w to v, then to follow the edge from v to u.
That path does not pass through the edge connecting u and v. In that case, the way that we get from w to u is the exact same way that we got from w to v, except that we stop as soon as we get to u.
The reason that this observation is important is that it means that if we know the number of edges we'd flip to get from any node to v, we can easily modify it to get the set of edges that we'd flip to get from any node to u. Specifically, it's going to be the same set of edges as before, except that we want to direct the edge connecting u and v so that it connects v to u rather than the other way around.
If the edge from u to v is initially directed (u, v), then we have to flip all the normal edges we flipped to get every node pointing at v, plus one more edge to get v pointed back at u. Thus f(u) = f(v) + 1. Otherwise, if the edge is originally directed (v, u), then the set of edges that we'd flip would be the same as before (pointing everything at v), except that we wouldn't flip the edge (v, u). Thus f(u) = f(v) - 1.
Consequently, once we know the value of f for a single node v, we can compute it for each adjacent node u as follows:
f(u) = f(v) + 1 if (u, v) is an edge.
f(u) = f(v) - 1 otherwise
This means that we can compute f(v) for all nodes v as follows:
Compute f(v) for some initial node v, chosen arbitrarily.
Do a DFS starting from v. When reaching a node u, compute its f score using the above logic.
All that's left to do is to compute f(v) for some initial node. To do this, we can run a DFS from v outward. Every time we see an edge pointed the wrong way, we have to flip it. Thus the initial value of f(v) is given by the number of wrong-pointing edges we find during the initial DFS.
We thus can compute the f score for each node in O(n) time by doing an initial DFS to compute f(v) for the initial node, then a secondary DFS to compute f(u) for each other node u. You can then for-loop over each of the n f-scores to find the minimum score, then do one more loop to find all values with that f-score. Each of these steps takes O(n) time, so the overall algorithm takes O(n) time as well.
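The two passes can also be sketched in Python (0-indexed; I'm assuming each input edge (u, v) means a directed edge u → v, which is my own convention, as is the name best_roots):

```python
def best_roots(n, edges):
    """edges: list of directed (u, v) pairs forming a tree when undirected.
    Returns (min_f, list of vertices attaining it)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append((v, True))    # original direction u -> v
        adj[v].append((u, False))   # reversed view of the same edge
    f = [None] * n
    # First DFS: f[0] = number of edges pointing away from vertex 0.
    f[0] = 0
    stack, seen, order = [0], [False] * n, []
    seen[0] = True
    while stack:
        u = stack.pop()
        order.append(u)             # each vertex appears after its tree parent
        for v, fwd in adj[u]:
            if not seen[v]:
                seen[v] = True
                if fwd:             # u -> v points away from 0: must be flipped
                    f[0] += 1
                stack.append(v)
    # Second pass: reroot along the tree.
    # If u -> v exists then f(v) = f(u) - 1, otherwise f(v) = f(u) + 1.
    for u in order:
        for v, fwd in adj[u]:
            if f[v] is None:
                f[v] = f[u] - 1 if fwd else f[u] + 1
    m = min(f)
    return m, [v for v in range(n) if f[v] == m]
```

On the 4-vertex example from the question (B→A, B→C, B→D, with A,B,C,D numbered 0..3) this yields f = [2, 3, 2, 2], so the answer is 2 attained by {A, C, D}.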
Hope this helps! This was an awesome problem!
