Does the DFS algorithm differentiate between an ancestor and a parent while computing back edges?

Below is the general code for DFS, with logic for marking back edges and tree edges. My understanding is that back edges from a vertex point back to an ancestor, and that edges pointing to the parent are not back edges (let's assume an undirected graph).
In an undirected graph we have an edge back and forth between two vertices x and y. So after visiting x, when I process y, y has x as an adjacent vertex, but since x is already visited, the code will mark it as a back edge.
Am I right in saying that? Should we add any extra logic to avoid this, in case my assumption is valid?
DFS(G)
    for v in vertices[G] do
        color[v] = white
        parent[v] = nil
    time = 0
    for v in vertices[G] do
        if color[v] = white then
            DFS-Visit(v)    // induce a depth-first tree on the graph starting at v

DFS-Visit(v)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a)
            v->a is a tree edge
        elseif color[a] = gray then
            v->a is a back edge
    color[v] = black
    time = time + 1

// white means unexplored, gray means frontier, black means processed

Yes, this implementation distinguishes frontier nodes only by color (visited vs. not visited) and thus doesn't separate the parent from other ancestors. So in an undirected graph, every tree edge will also be reported as a back edge when it is scanned from the child.
To separate tree and back edges you need to distinguish edges to the parent from edges to other ancestors. A simple way is to pass the parent node as a parameter p to DFS-Visit. For example:
DFS-Visit(v, p)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a, v)
            v->a is a tree edge
        elseif color[a] = gray and (a is not p) then
            v->a is a back edge
    color[v] = black
    time = time + 1
UPDATE: I hadn't noticed that you already store parent nodes. So there is no need to introduce a parameter:
DFS-Visit(v)
    color[v] = gray
    time = time + 1
    discovery[v] = time
    for a in Adj[v] do
        if color[a] = white then
            parent[a] = v
            DFS-Visit(a)
            v->a is a tree edge
        elseif color[a] = gray and (a is not parent[v]) then
            v->a is a back edge
    color[v] = black
    time = time + 1
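As a concrete check of the parent test, here is a runnable sketch in Python (the adjacency-list format and the name classify_edges are my own for the demo; tree/back labels follow the answer):

```python
WHITE, GRAY, BLACK = 0, 1, 2

def classify_edges(adj):
    n = len(adj)
    color = [WHITE] * n
    parent = [None] * n
    edges = []                      # (u, v, kind) in discovery order

    def visit(v):
        color[v] = GRAY
        for a in adj[v]:
            if color[a] == WHITE:
                parent[a] = v
                edges.append((v, a, "tree"))
                visit(a)
            elif color[a] == GRAY and a != parent[v]:
                edges.append((v, a, "back"))   # ancestor, not the parent
        color[v] = BLACK

    for v in range(n):
        if color[v] == WHITE:
            visit(v)
    return edges

# Triangle 0-1-2 plus a pendant vertex 3:
print(classify_edges([[1, 2], [0, 2], [0, 1, 3], [2]]))
```

Without the `a != parent[v]` guard, the edge 1->0 would also be reported as a back edge; with it, only the genuine back edge 2->0 remains.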


How can I find shortest path with maximum number of yellow edges

Given a directed graph G=(V,E) and a weight function w: E -> ℝ+, where every edge is coloured yellow or black, and a source vertex s:
How can I find a shortest path from s with the maximum number of yellow edges?
I thought of using the Dijkstra algorithm and changing the weight of the yellow edges (by epsilon), but I do not see how that would work.
You can use the Dijkstra shortest path algorithm, but add a new vector Y with an element for each node, tracking the number of yellow edges on the path taken so far to reach that node.
Initially, set Y[i] = 0 for each node i.
Also suppose Yellow(u,v) is a function that returns 1 if (u,v) is yellow and 0 otherwise.
Normally, in the Dijkstra algorithm you have:
for each neighbor v of u still in Q:
    alt ← dist[u] + Graph.Edges(u, v)
    if alt < dist[v]:
        dist[v] ← alt
        prev[v] ← u
You can now change this to:
for each neighbor v of u still in Q:
    alt ← dist[u] + Graph.Edges(u, v)
    if alt < dist[v]:
        dist[v] ← alt
        prev[v] ← u
        Y[v] ← Y[u] + Yellow(u,v)
    else if alt == dist[v] AND Y[u] + Yellow(u,v) > Y[v]:
        prev[v] ← u
        Y[v] ← Y[u] + Yellow(u,v)
Explanation:
In the else branch that we added, the algorithm decides between alternative shortest paths (with identical costs, hence the condition alt == dist[v]) and picks the one that has more yellow edges.
Note that this will still find the shortest path in the graph. If there are multiple, it picks the one with higher number of yellow edges.
Proof:
Consider the set of visited nodes Visited at any point in the algorithm. Note that Visited is the set of nodes that are removed from Q.
We already know that for each v ∈ Visited, dist[v] is the shortest path from Dijkstra Algorithm's proof.
We now show that for each v ∈ Visited, Y[v] is maximum, and we do this by induction.
When |Visited| = 1, we have Visited = {s}, and Y[s] = 0.
Now suppose the claim holds for |Visited| = k for some k >= 1, we show that when we add a new node u to Visited and the size of Visited grows to k+1, the claim still holds.
Let (t_i,u) represent all edges from a node in Visited to the new node u, for which (t_i,u) is on a shortest path to u, i.e. t_i ∈ Visited and (t_i,u) is the last edge on the shortest path from s to u.
The else part of our algorithm guarantees that Y[u] is updated to the maximum value among all such shortest paths.
To see why, consider, without loss of generality, the following situation:
Suppose s-t1-u and s-t2-u are both shortest paths, and the distance of u was updated first through t1 and later through t2.
At the moment we update u through t2, the distance of u doesn't change, because s-t1-u and s-t2-u are both shortest paths. However, in the else branch of the algorithm, Y[u] will be updated to:
Y[u] = Max (Y[t1] + Yellow(t1,u) , Y[t2] + Yellow(t2,u) )
Also from the induction hypothesis, we know that Y[t1] and Y[t2] are already maximum. Hence Y[u] is maximum among both shortest paths from s to u.
Notice that for simplicity, and without loss of generality, the argument was described for only two such paths, but it holds for all (t_i, u) edges.
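The modified relaxation can be sketched in runnable Python (a binary heap stands in for the priority queue Q; the adjacency-list format and the function name are assumptions for the demo):

```python
import heapq

def dijkstra_max_yellow(n, adj, s):
    """adj[u] = list of (v, weight, is_yellow) for directed edges u -> v."""
    INF = float("inf")
    dist = [INF] * n
    Y = [0] * n                 # max yellow-edge count along a shortest path
    prev = [None] * n
    done = [False] * n
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if done[u]:
            continue
        done[u] = True
        for v, w, yellow in adj[u]:
            alt = d + w
            if alt < dist[v]:
                dist[v], prev[v], Y[v] = alt, u, Y[u] + yellow
                heapq.heappush(pq, (alt, v))
            elif alt == dist[v] and Y[u] + yellow > Y[v]:
                prev[v], Y[v] = u, Y[u] + yellow  # tie: prefer more yellow
    return dist, Y, prev

# Two shortest paths 0->3 of length 2; the one through 1 has two yellow edges.
adj = [
    [(1, 1, 1), (2, 1, 0)],  # 0 -> 1 (yellow), 0 -> 2 (black)
    [(3, 1, 1)],             # 1 -> 3 (yellow)
    [(3, 1, 0)],             # 2 -> 3 (black)
    [],
]
dist, Y, prev = dijkstra_max_yellow(4, adj, 0)
print(dist[3], Y[3], prev[3])  # 2 2 1
```

Note that the tie-breaking update is safe here because weights are strictly positive, so a tie alt == dist[v] can only occur while v has not yet been finalized.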

Checking if a graph is bipartite in a DFS

I wonder what the complexity of this algorithm of mine is, and why. It checks whether a graph (given as a list of neighbors) is bipartite or not, using DFS.
The algorithm works as follows:
We will use edge classification and look for back edges.
If we find one, it means there is a cycle in the graph.
We will then check whether the cycle is odd or not, using the π (parent) attribute stored at each vertex to count the number of edges participating in the cycle.
If the cycle is an odd one, return false. Else, continue the process.
Initially I thought the complexity would be O(|V| + |E|), where |V| is the number of vertices in the graph and |E| the number of edges, but I am afraid it might be O(|V| + |E|²), and I wonder which option is correct and why (it may be neither of the above as well). Amortized or expected running times may also differ, and I wonder how I can determine those as well.
Pseudocode:
DFS(G=(V,E))
    // π[u] – parent of u in the DFS tree
    for each vertex u ∈ V {
        color[u] ← WHITE
        π[u] ← NULL }
    time ← 0
    for each vertex u ∈ V {
        if color[u] = WHITE
            DFS-VISIT(u) }
and for the DFS-Visit:
DFS-Visit(u)
    // white vertex u has just been discovered
    color[u] ← GRAY
    time ← time + 1
    d[u] ← time
    for each v ∈ Adj[u] {    // going over all edges {u, v}
        if color[v] = WHITE {
            π[v] ← u
            DFS-VISIT(v) }
        else if color[v] = GRAY    // there is a cycle in the graph
            CheckIfOddCycle(u, v) }
    color[u] ← BLACK    // change the color of u to black as we finished going over it
    f[u] ← time ← time + 1
and as for deciding what type of cycle is it:
CheckIfOddCycle(u, v)
    int count ← 1
    vertex p ← u
    while (p != v) {
        p ← π[p]
        count++ }
    if count is an odd number {
        S.O.P("The graph is not bipartite!")
        stop the search, as the result is now concluded }
Thanks!
To determine whether or not a graph is bipartite, do a DFS or BFS that covers all the edges in the entire graph, and:
When you start on a new vertex that is disconnected from all previous vertices, color it blue;
When you discover a new vertex connected to a blue vertex, color it red;
When you discover a new vertex connected to a red vertex, color it blue;
When you find an edge to a previously discovered vertex, return FALSE if it connects blue to blue or red to red.
If you make it through the entire graph, return TRUE.
This algorithm takes very little work on top of the BFS or DFS, and is therefore O(|V|+|E|).
This algorithm is also essentially the same as the algorithm in your question. When we discover a back-edge with the same color on both sides, it means that the cycle(s) we just discovered are of odd length.
But really this algorithm has nothing to do with cycles. A graph can have a lot more cycles than it has vertices or edges, and a DFS or BFS will not necessarily find them all, so it wouldn't be accurate to say that we are searching for odd cycles.
Instead we are just trying to make a bipartite partition and returning whether or not it's possible to do so.
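The blue/red procedure above can be sketched in runnable Python with a BFS (the adjacency-list format is an assumption; 0 stands for blue and 1 for red):

```python
from collections import deque

def is_bipartite(adj):
    n = len(adj)
    side = [None] * n          # None = undiscovered, 0 = blue, 1 = red
    for s in range(n):
        if side[s] is not None:
            continue
        side[s] = 0            # new component: color its start vertex blue
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]   # neighbor gets the opposite color
                    q.append(v)
                elif side[v] == side[u]:    # blue-blue or red-red edge
                    return False
    return True

print(is_bipartite([[1], [0, 2], [1]]))        # path of 3 vertices: True
print(is_bipartite([[1, 2], [0, 2], [0, 1]]))  # triangle: False
```

The extra work per edge is O(1), so the whole check stays O(|V| + |E|) as stated.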

Count cycles of length 3 using DFS

Let G=(V,E) be an undirected graph. How can we count cycles of length 3 exactly once using the following DFS:
DFS(G,s):
    foreach v in V do
        color[v] <- white; p[v] <- nil
    DFS-Visit(s)

DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
There is a cycle whenever we discover a node that has already been discovered (grey). The edge to that node is called a back edge. The cycle has length 3 when p[p[p[v]]] = v, right? So:
DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = grey and p[p[p[v]]] = v then
            // we got a cycle of length 3
        else if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
However, how can I maintain a proper counter for the number of cycles, and how can I count each cycle only once?
I'm not sure I understand how your condition parent[parent[parent[v]]] == v works. IMO it should never be true as long as parent represents a tree structure (because it should correspond to the spanning tree associated with the DFS).
Directed graphs
Back edges, cross edges and forward edges can all "discover" new cycles.
We separate the following possibilities (say you reach an edge u -> v):
Back edge: u and v belong to the same 3-cycle iff parent[parent[u]] = v.
Cross edge: u and v belong to the same 3-cycle iff parent[u] = parent[v].
Forward edge: u and v belong to the same 3-cycle iff parent[parent[v]] = u.
Undirected graphs
There are no cross edges any more, and back edges and forward edges are redundant (each undirected edge is seen from both ends). Therefore you only have to check back edges: when you reach a back edge u -> v, u and v belong to the same 3-cycle iff parent[parent[u]] = v.
def dfs(u):
    color[u] = GREY
    for v in adj[u]:
        # Back edge
        if color[v] == GREY:
            if parent[parent[u]] == v:
                print("({}, {}, {})".format(v + 1, parent[u] + 1, u + 1))
        # v unseen
        elif color[v] == WHITE:
            parent[v] = u
            dfs(v)
    color[u] = BLACK
If you want to test it:
WHITE, GREY, BLACK = 0, 1, 2

nb_nodes, nb_edges = map(int, input().split())
adj = [[] for _ in range(nb_nodes)]
for _ in range(nb_edges):
    u, v = map(int, input().split())
    adj[u - 1].append(v - 1)
    adj[v - 1].append(u - 1)

parent = [None] * nb_nodes
color = [WHITE] * nb_nodes
for u in range(nb_nodes):
    if color[u] == WHITE:
        dfs(u)
If a solution without DFS is okay, there is an easy solution which runs in O(NM log(N³)), where N is the number of vertices in the graph and M is the number of edges.
We are going to iterate over edges instead of iterating over vertices. For every edge u-v, we have to find every vertex connected to both u and v. We can do this by iterating over every vertex w in the graph and checking whether the edges v-w and w-u both exist. Whenever we find such a vertex, we order u, v, w and add the ordered triplet to a BBST (balanced binary search tree) that doesn't allow repetitions (e.g. std::set in C++). The count of length-3 cycles will be exactly the size of the BBST (the number of elements added) after we have checked every edge in the graph.
Let's analyze the complexity of the algorithm:
We iterate over every edge. Current complexity is O(M)
For each edge, we iterate over every vertex. Current complexity is O(NM).
For each (edge, vertex) pair that forms a cycle, we add a triplet to the BBST. Adding to a BBST has O(log K) complexity, where K is the size of the BBST. In the worst case every triplet of vertices forms a cycle, so we may add up to O(N³) elements, and the cost of an insertion can get as high as O(log(N³)). The final complexity is therefore O(NM log(N³)). This may sound like a lot, but in the worst case M = O(N²), so the complexity becomes O(N³ log(N³)). Since there may be up to O(N³) cycles of length 3, our algorithm is just a log factor away from an optimal one.
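The edge-iteration idea can be sketched in Python, with a hash set standing in for the BBST (so the dedup step averages O(1) per insertion instead of O(log K); the function name is mine):

```python
def count_triangles(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triangles = set()               # ordered triplets, deduplicated
    for u, v in edges:              # for every edge u-v ...
        for w in range(n):          # ... try every third vertex w
            if w in adj[u] and w in adj[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return len(triangles)

# The complete graph K4 contains 4 triangles:
print(count_triangles(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))  # 4
```

Sorting each triplet before insertion is what guarantees that the same 3-cycle, found via any of its three edges, maps to a single set element.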

Merge most of the black vertices of DAG together so that it remains DAG?

I have a DAG (directed acyclic graph) whose vertices have one of two colours, black or white. I need to merge as many black vertices as possible, with the constraint that the graph should remain acyclic; hence the final DAG should have the minimum number of black vertices. What is the best algorithm for this problem?
Here is one possible strategy. It reduces your problem to a colouring problem (which you can then solve with established heuristic algorithms from the literature).
Call the DAG G = (V,E), where V is the set of vertices. Let B be the set of black vertices and W the set of white vertices. We want to construct a new simple graph G' = (B,E'). We construct it as follows:
algorithm construct_G'(G):
    let G' be a graph with vertex set B and no edges
    for each pair of vertices v, v' in B:
        let (G'', v'') = merge(v, v', G)
        # G'' is the graph resulting from merging v and v';
        # assume v and v' merge to become v''
        if detect_cycle(G'', v'') = true:
            add edge (v, v') to G'
    output G'

algorithm detect_cycle(G, v):
    do a BFS in G starting at v, with the modification that, when reaching any vertex v':
        if v is connected to v': return true
    return false
Note that G' is a simple graph, not a DAG, and that when doing the BFS on G you cannot go against the direction of an edge in G.
Essentially, we build G' on the set of black vertices of G such that two vertices v and v' are adjacent in G' iff merging them would create a cycle in G. If v is not adjacent to v' in G', it is safe to merge them. The problem is then reduced to finding the minimum number of colours required to vertex-colour G'. For background on vertex colouring, see https://en.wikipedia.org/wiki/Graph_coloring#Vertex_coloring. Basically, vertex colouring is about finding the minimum number of sets such that each set contains only pairwise non-adjacent vertices, then assigning a label (colour) to each set (every vertex in the same set gets the same label). All black vertices with the same label in G' can be merged in G.
Heuristic algorithms for graph colouring could be found here:
http://heuristicswiki.wikispaces.com/Graph+coloring
and here: http://heuristicswiki.wikispaces.com/Degree+based+ordering
I hope it helps. Let me know if you find a better solution or a bug in the above solution.
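The merge-and-test step can be sketched in Python (the dict-of-sets graph representation and the function names are assumptions; detect_cycle is replaced here by a standard white/grey/black DFS cycle check on the merged graph):

```python
def merge(adj, a, b):
    """Return a new digraph in which vertex b is collapsed into vertex a."""
    merged = {u: set(vs) for u, vs in adj.items() if u != b}
    merged[a] |= adj[b] - {a}          # b's out-edges now leave a
    for u in merged:
        if b in merged[u]:             # redirect in-edges of b to a
            merged[u].discard(b)
            if u != a:                 # drop would-be self-loops
                merged[u].add(a)
    return merged

def has_cycle(adj):
    """Standard white/grey/black DFS cycle detection on a digraph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}

    def visit(u):
        color[u] = GREY
        for v in adj[u]:
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True            # grey-to-grey edge closes a cycle
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in adj)

# a -> b -> c: merging a and c would create the cycle {a,c} -> b -> {a,c}
dag = {"a": {"b"}, "b": {"c"}, "c": set()}
print(has_cycle(merge(dag, "a", "c")))  # True: unsafe, add edge (a,c) to G'
print(has_cycle(merge(dag, "a", "b")))  # False: safe to merge a and b
```

In the construction above, the first result means (a,c) becomes an edge of G', while the second means a and b may receive the same colour.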
Let the graph be G = (V,E).
Topologically sort the graph to get the list of vertices L(V).
L(B) = list of black vertices extracted from L(V), with the order maintained.
Let n = number of vertices in L(B).
Let DVA = array of deleted vertices of size n, initialized with 0.

for i = vertices 1 to n in L(B)
    if (DVA[i] == 1)
        continue
    for j = vertices i+1 to n in L(B)
        if (DVA[j] == 1)
            continue
        if (detect_cycle(G, i, j) == 0)    // merging i and j will not create a cycle
            merge j into i in G
            DVA[j] = 1
This algorithm relies on the fact that the topological order of the black vertices does not change when two vertices are merged (except for those two vertices).
I guess this method will produce a fairly good result, but I am not sure whether it produces the optimal result of having the least number of black vertices.

How to find if a graph is a tree and its center

Is there an algorithm (or a sequence of algorithms) to find, given a generic graph structure G=(V,E) with no notion of parent, child or leaf nodes but only neighborhood relations:
1) Whether G is a tree or not (is it sufficient to check |V| = |E|+1?)
2) If the graph is actually a tree, its leaves and its center? (i.e. the node of the graph which minimizes the tree depth)
Thanks
If the "center" of the tree is defined as "the node of the graph which minimizes the tree depth", there's an easier way to find it than finding the diameter.
d[] = degrees of all nodes
que = { leaves, i.e. nodes i with d[i] == 1 }
while len(que) > 1:
    i = que.pop_front()
    d[i]--
    for j in neighbors[i]:
        if d[j] > 0:
            d[j]--
            if d[j] == 1:
                que.push_back(j)
and the last one left in que is the center.
You can prove this by thinking about the diameter path. To simplify, assume the length of the diameter path is odd, so that the middle node of the path is unique; call that node M. We can see that:
M will not be pushed to the back of que until every other node on the diameter path has been pushed into que.
If there were another node N pushed after M has already been pushed into que, then N would have to lie on a path longer than the diameter path. Therefore N can't exist, and M must be the last node pushed (and left) in que.
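The peeling loop can be written as runnable Python (assuming the input is already known to be a tree; if the diameter is even, the tree has two adjacent centers and this version returns one of them):

```python
from collections import deque

def tree_center(adj):
    n = len(adj)
    if n == 1:
        return 0
    d = [len(nbrs) for nbrs in adj]                 # degrees of all nodes
    que = deque(i for i in range(n) if d[i] == 1)   # start from the leaves
    while len(que) > 1:
        i = que.popleft()
        d[i] -= 1
        for j in adj[i]:
            if d[j] > 0:
                d[j] -= 1
                if d[j] == 1:                       # j became a leaf: peel it next
                    que.append(j)
    return que[0]

# Path 0-1-2-3-4: the center is 2.
print(tree_center([[1], [0, 2], [1, 3], [2, 4], [3]]))  # 2
```

Each vertex enters the queue once and each edge is examined a constant number of times, so this runs in O(|V| + |E|) = O(|V|) on a tree.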
For (1), all you have to do is verify |V| = |E| + 1 and that the graph is fully connected.
For (2), you need to find a maximal diameter and then pick a node in the middle of the diameter path. I vaguely remember an easy way to do this for trees:
You start with an arbitrary node a, then find a node at maximal distance from a; call it b. Then you search from b and find a node at maximal distance from b; call it c. The path from b to c is a maximal diameter.
There are other ways to do it that might be more convenient for you. Check Google too.
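The two-search method can be sketched in Python with BFS, since the tree is unweighted (the adjacency-list format and the function names are assumptions):

```python
from collections import deque

def farthest(adj, src):
    """BFS from src; return the farthest node and its distance."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    far = max(dist, key=dist.get)
    return far, dist[far]

def diameter_endpoints(adj, a=0):
    b, _ = farthest(adj, a)      # b: farthest from an arbitrary node a
    c, d = farthest(adj, b)      # c: farthest from b; d = diameter length
    return b, c, d

# Path 0-1-2-3-4: the diameter has length 4, between the two endpoints.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(diameter_endpoints(adj))
```

Walking back d // 2 steps along the b-to-c path (via BFS parents) then yields the middle node, i.e. the center.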
No, it is not enough: a tree is a CONNECTED graph with n-1 edges. A graph that is not connected can also have n-1 edges, and it won't be a tree.
You can run a BFS to check whether the graph is connected and then count the number of edges; that gives you enough information to decide whether the graph is a tree.
The leaves are the nodes v whose degree d(v) satisfies d(v) = 1 (i.e. they have only one vertex connected to them).
(1) This answer assumes undirected graphs.
(2) Here, n denotes the number of vertices.
