Checking if a graph is bipartite in a DFS - algorithm

I wonder what the complexity of this algorithm of mine is, and why. It is used to check whether a graph (given as adjacency lists) is bipartite or not, using DFS.
The algorithm works as follows:
We will use edge classification and look for back edges.
If we find one, it means there is a cycle in the graph.
We then check whether the cycle is odd or not, using the π (parent) attribute added to each vertex, counting the number of edges participating in the cycle.
If the cycle is odd, return false. Otherwise, continue the process.
Initially I thought the complexity would be O(|V| + |E|), where |V| stands for the number of vertices in the graph and |E| for the number of edges, but I am afraid it might be O(|V| + |E|²). I wonder which option is correct and why (it may be neither of the above as well). Amortized or expected running times may also differ, and I wonder how I can check those as well.
Pseudocode:

DFS(G=(V,E))
// π[u] – parent of u in the DFS tree
for each vertex u ∈ V {
    color[u] ← WHITE
    π[u] ← NULL }
time ← 0
for each vertex u ∈ V {
    if color[u] = WHITE
        DFS-VISIT(u) }
and for the DFS-Visit:
DFS-Visit(u)
// white vertex u has just been discovered
color[u] ← GRAY
time ← time + 1
d[u] ← time
for each v ∈ Adj[u] {        // going over all edges {u, v}
    if color[v] = WHITE {
        π[v] ← u
        DFS-VISIT(v) }
    else if color[v] = GRAY  // there is a cycle in the graph
        CheckIfOddCycle(u, v) }
color[u] ← BLACK             // change the color of vertex u to black as we finished going over it
f[u] ← time ← time + 1
and as for deciding what type of cycle it is:
CheckIfOddCycle(u, v)
count ← 1
p ← u
while (p ≠ v) {
    p ← π[p]
    count++ }
if count is an odd number {
    print("The graph is not bipartite!")
    stop the search, as the result is now concluded! }
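For reference, a runnable Python sketch of the pseudocode above (an interpretation, not the original code: the graph is assumed to be a dict of adjacency lists, the time/discovery bookkeeping is dropped, and the tree edge back to the parent is skipped explicitly, since in an undirected adjacency list every parent would otherwise look like a back edge):

```python
def is_bipartite_odd_cycle(adj):
    """DFS that rejects on the first odd cycle found via a back edge.

    adj: dict mapping each vertex to a list of neighbours (undirected).
    Returns True iff no odd cycle is found, i.e. the graph is bipartite.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    parent = {u: None for u in adj}

    def cycle_is_odd(u, v):
        # Count the edges on the cycle: the back edge (u, v) itself,
        # plus the tree path from u up to its ancestor v.
        count = 1
        p = u
        while p != v:
            p = parent[p]
            count += 1
        return count % 2 == 1

    def visit(u):
        color[u] = GREY
        for v in adj[u]:
            if color[v] == WHITE:
                parent[v] = u
                if not visit(v):
                    return False
            elif color[v] == GREY and v != parent[u]:
                # Back edge (skipping the tree edge to the parent,
                # which the pseudocode leaves implicit): a cycle exists.
                if cycle_is_odd(u, v):
                    return False
        color[u] = BLACK
        return True

    return all(color[u] != WHITE or visit(u) for u in adj)
```

Note that each back edge triggers a walk up the parent chain, so on a bipartite graph with many back edges the total cost can exceed O(|V| + |E|), which is exactly the concern raised above.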
Thanks!

To determine whether or not a graph is bipartite, do a DFS or BFS that covers all the edges in the entire graph, and:
When you start on a new vertex that is disconnected from all previous vertices, color it blue;
When you discover a new vertex connected to a blue vertex, color it red;
When you discover a new vertex connected to a red vertex, color it blue;
When you find an edge to a previously discovered vertex, return FALSE if it connects blue to blue or red to red.
If you make it through the entire graph, return TRUE.
This algorithm takes very little work on top of the BFS or DFS, and is therefore O(|V|+|E|).
This algorithm is also essentially the same as the algorithm in your question. When we discover a back-edge with the same color on both sides, it means that the cycle(s) we just discovered are of odd length.
But really this algorithm has nothing to do with cycles. A graph can have a lot more cycles than it has vertices or edges, and a DFS or BFS will not necessarily find them all, so it wouldn't be accurate to say that we are searching for odd cycles.
Instead we are just trying to make a bipartite partition and returning whether or not it's possible to do so.
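The colouring procedure above can be sketched as a BFS in Python (a sketch, with 0/1 standing in for blue/red and the graph given as a dict of adjacency lists):

```python
from collections import deque

def is_bipartite(adj):
    """BFS two-colouring. adj: dict mapping vertex -> list of neighbours."""
    color = {}                       # 0 = "blue", 1 = "red"
    for start in adj:
        if start in color:
            continue
        color[start] = 0             # new disconnected vertex: colour it blue
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # opposite colour of u
                    queue.append(v)
                elif color[v] == color[u]:
                    return False     # edge joins blue-blue or red-red
    return True
```

The outer loop handles disconnected components, matching the "start on a new vertex" rule above.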


How can I find shortest path with maximum number of yellow edges

Given a directed graph G=(V,E), a weight function w: E → ℝ+, every edge colored either yellow or black, and a source vertex s:
How can I find a shortest path with the maximum number of yellow edges?
I thought of using Dijkstra's algorithm and changing the weight of the yellow edges (by an epsilon), but I do not see how that would work.
You can use the Dijkstra shortest path algorithm, but add a new array Y with an element for each node, keeping track of the number of yellow edges on the path found so far to that node.
Initially, set Y[i] = 0 for each node i.
Also suppose Yellow(u,v) is a function that returns 1 if (u,v) is yellow and 0 otherwise.
Normally, in the Dijkstra algorithm you have:
for each neighbor v of u still in Q:
    alt ← dist[u] + Graph.Edges(u, v)
    if alt < dist[v]:
        dist[v] ← alt
        prev[v] ← u
You can now change this to:
for each neighbor v of u still in Q:
    alt ← dist[u] + Graph.Edges(u, v)
    if alt < dist[v]:
        dist[v] ← alt
        prev[v] ← u
        Y[v] ← Y[u] + Yellow(u,v)
    else if alt == dist[v] AND Y[u] + Yellow(u,v) > Y[v]:
        prev[v] ← u
        Y[v] ← Y[u] + Yellow(u,v)
Explanation:
In the added else branch, the algorithm decides between alternative shortest paths (with identical costs, hence the condition alt == dist[v]) and picks the one that has more yellow edges.
Note that this still finds a shortest path in the graph. If there are multiple, it picks the one with the higher number of yellow edges.
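Here is a hedged Python sketch of this modified Dijkstra (the function name and input format are my own; a lazy binary-heap version, where a node is re-pushed on a tie with more yellow edges so that the improved count propagates to its successors):

```python
import heapq

def dijkstra_max_yellow(n, edges, yellow, s):
    """Dijkstra that, among equal-cost shortest paths, prefers the one
    with more yellow edges.

    n: number of nodes 0..n-1; edges: dict (u, v) -> positive weight;
    yellow: set of (u, v) pairs that are yellow; s: source node.
    """
    adj = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        adj[u].append((v, w))

    INF = float("inf")
    dist = [INF] * n
    Y = [0] * n          # max yellow edges on a shortest path found so far
    prev = [None] * n
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            alt = d + w
            yel = Y[u] + (1 if (u, v) in yellow else 0)
            if alt < dist[v] or (alt == dist[v] and yel > Y[v]):
                dist[v] = alt
                Y[v] = yel
                prev[v] = u
                # Re-push even on a tie so the better yellow count
                # reaches v's successors; Y[v] only ever increases on
                # ties, so this terminates.
                heapq.heappush(pq, (alt, v))
    return dist, Y, prev
```

On a diamond graph where two shortest paths exist from 0 to 3, the path with the yellow edges wins the tie.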
Proof:
Consider the set of visited nodes Visited at any point in the algorithm. Note that Visited is the set of nodes that are removed from Q.
We already know that for each v ∈ Visited, dist[v] is the shortest path from Dijkstra Algorithm's proof.
We now show that for each v ∈ Visited, Y[v] is maximum, and we do this by induction.
When |Visited| = 1, we have Visited = {s}, and Y[s] = 0.
Now suppose the claim holds for |Visited| = k for some k >= 1, we show that when we add a new node u to Visited and the size of Visited grows to k+1, the claim still holds.
Let (t_i,u) represent all edges from a node in Visited to the new node u, for which (t_i,u) is on a shortest path to u, i.e. t_i ∈ Visited and (t_i,u) is the last edge on the shortest path from s to u.
The else part of our algorithm guarantees that Y[u] is updated to the maximum value among all such shortest paths.
To see why, consider (without loss of generality) the following situation:
Suppose s-t1-u and s-t2-u are both shortest paths and the distance of u was updated first through t1 and later through t2.
At the moment that we update u through t2, the distance of u doesn't change because S-t1-u and S-t2-u are both shortest paths. However in the else part of the algorithm, Y[u] will be updated to:
Y[u] = Max (Y[t1] + Yellow(t1,u) , Y[t2] + Yellow(t2,u) )
Also from the induction hypothesis, we know that Y[t1] and Y[t2] are already maximum. Hence Y[u] is maximum among both shortest paths from s to u.
Notice that for simplicity only two such paths are considered here, but the argument holds for all (t_i,u) edges.

Can Dijkstra's Algorithm work on a graph with weights of 0?

If there exists a weighted graph G in which all weights are 0, does Dijkstra's algorithm still find the shortest path? If so, why?
As per my understanding of the algorithm, Dijkstra's algorithm will run like a normal BFS if there are no edge weights, but I would appreciate some clarification.
Explanation
Dijkstra itself has no problem with 0-weight edges, per the definition of the algorithm. It only gets problematic with negative weights.
The reason is that in every round Dijkstra settles a node. If you later find a negative-weight edge, it could lead to a shorter path to that already settled node. The node would then need to be unsettled, which Dijkstra's algorithm does not allow (and allowing it would break the complexity of the algorithm). This becomes clear if you take a look at the actual algorithm and some illustration.
The behavior of Dijkstra on such an all-zero graph is the same as if all edges had the same positive value, like 1 (except for the resulting shortest path lengths). Dijkstra will simply visit all nodes, in no particular order. Basically, like an ordinary breadth-first search.
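A minimal sketch to illustrate: running a textbook heap-based Dijkstra on a small all-zero-weight graph (graph and names are made up for the demonstration) gives distance 0 to every reachable node, just as the argument above predicts.

```python
import heapq

def dijkstra(adj, source):
    """Textbook Dijkstra with a binary heap and lazy deletion.
    adj: dict mapping u -> list of (v, weight) pairs."""
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                   # stale entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

# All-zero weights: every reachable node ends up at distance 0.
adj = {
    "a": [("b", 0), ("c", 0)],
    "b": [("d", 0)],
    "c": [],
    "d": [],
}
dist = dijkstra(adj, "a")
```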
Details
Take a look at the algorithm description from Wikipedia:
1  function Dijkstra(Graph, source):
2
3      create vertex set Q
4
5      for each vertex v in Graph:           // Initialization
6          dist[v] ← INFINITY                // Unknown distance from source to v
7          prev[v] ← UNDEFINED               // Previous node in optimal path from source
8          add v to Q                        // All nodes initially in Q (unvisited nodes)
9
10     dist[source] ← 0                      // Distance from source to source
11
12     while Q is not empty:
13         u ← vertex in Q with min dist[u]  // Node with the least distance
14                                           // will be selected first
15         remove u from Q
16
17         for each neighbor v of u:         // where v is still in Q.
18             alt ← dist[u] + length(u, v)
19             if alt < dist[v]:             // A shorter path to v has been found
20                 dist[v] ← alt
21                 prev[v] ← u
22
23     return dist[], prev[]
The problem with negative values lies in lines 15 and 17. When you remove node u, you settle it. That is, you declare that the shortest path to this node is now known. But that means you won't consider u again in line 17 as a neighbor of some other node (since it's no longer contained in Q).
With negative values it could happen that you later find a shorter path (due to negative weights) to that node. You would need to consider u again in the algorithm and re-do all the computation that depended on the previous shortest path to u. So you would need to add u and every other node that was removed from Q that had u on its shortest path back to Q.
Especially, you would need to consider all edges that could lead to your destination, since you never know where some nasty -1_000_000 weighted edge hides.
The following example illustrates the problem:
Dijkstra will declare the red way as shortest path from A to C, with length 0. However, there is a shorter path. It is marked blue and has a length of 99 - 300 + 1 = -200.
With negative weights you could even create a more dangerous scenario: negative cycles, that is, cycles in the graph with a negative total weight. You would then need a way to stop moving along such a cycle forever, endlessly decreasing the current distance.
Notes
In an undirected graph, edges with weight 0 can be eliminated and their endpoint nodes merged; a shortest path between the merged nodes always has length 0. If the whole graph only has 0 weights, the graph can be merged into a single node, and the answer to every shortest path query is simply 0.
The same holds for directed graphs if you have such an edge in both directions. If not, you can't do that optimization, as you would change the reachability of nodes.

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it. How could we get the sum of the reachable values for each node? For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: to get the sums for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m stands for the number of edges. DFS should be used: for each node we recursively visit its successors, and save the sum once it is computed so that we don't have to calculate it again in the future. A set needs to be created for each node to avoid endless recursion caused by cycles.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding the Strongly Connected Components (SCCs), which can be done in O(|V|+|E|). Then build a new graph, G', of the SCCs (each SCC is a node in this graph), where each node's value is the sum of the values of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U1, U2, ..., Uk | U_i is a SCC of the graph G}
E' = {(U_i,U_j) | there is node u_i in U_i and u_j in U_j such that (u_i,u_j) is in E }
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of question linked in comments.
EDIT: the previous answer (struck out) is a mistake from this point; editing with a new answer. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
DFS(v):
    if v.visited:
        return 0
    v.visited = true
    # v.value is added for every node, not just leaves
    return v.value + sum([DFS(u) for u in v.children])
This is O(V² + VE) in the worst case, but since the condensed graph has fewer nodes, V and E are now significantly lower.
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after topological sort of the DAG).
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
Assign D[i] to every node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).
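The whole pipeline can be sketched in Python (a sketch under assumptions: Kosaraju's algorithm for the SCCs, one DFS per condensed node instead of the D[i] recurrence, since a plain "sum of children" DP would double count descendants shared by several paths, and each node's own value subtracted at the end to match the example sums):

```python
def reachable_sum_per_node(n, edges, value):
    """For every node, the sum of values of all other nodes reachable
    from it. n: node count (0..n-1); edges: list of (u, v) pairs;
    value: list of node values."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Kosaraju pass 1: record finishing order with an iterative DFS.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            advanced = False
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(adj[v])))
                    advanced = True
                    break
            if not advanced:
                order.append(u)
                stack.pop()

    # Pass 2: label SCCs on the reverse graph, in reverse finish order.
    comp, ncomp = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = ncomp
        stack = [s]
        while stack:
            u = stack.pop()
            for v in radj[u]:
                if comp[v] == -1:
                    comp[v] = ncomp
                    stack.append(v)
        ncomp += 1

    # Condensation: summed values and deduplicated DAG edges.
    cval = [0] * ncomp
    for u in range(n):
        cval[comp[u]] += value[u]
    cadj = [set() for _ in range(ncomp)]
    for u, v in edges:
        if comp[u] != comp[v]:
            cadj[comp[u]].add(comp[v])

    # One DFS per condensed node: counts each reachable SCC exactly once.
    def total_from(c):
        visited, stack, total = {c}, [c], 0
        while stack:
            x = stack.pop()
            total += cval[x]
            for y in cadj[x]:
                if y not in visited:
                    visited.add(y)
                    stack.append(y)
        return total

    csum = [total_from(c) for c in range(ncomp)]
    # Subtract each node's own value to match "sum of the other
    # reachable nodes"; drop this if a node should count itself.
    return [csum[comp[u]] - value[u] for u in range(n)]
```

On a graph with a 2-cycle (1 ↔ 2), the nodes of the cycle correctly count each other exactly once.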
You can use the DFS or BFS algorithms to solve your problem; both have complexity O(V + E).
You don't have to recount all values for all nodes, and you don't need recursion.
Just do something like this.
Typically such a traversal looks like this (removing from the front of the list and appending to the back makes it breadth-first):

unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add some lines:

unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
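The augmented traversal above can be sketched in Python (a sketch; graph and values given as dicts, computing the reachable sum from a single start vertex):

```python
from collections import deque

def reachable_value_sum(adj, value, start):
    """BFS from `start`, accumulating node values along the way.
    adj: dict u -> list of neighbours; value: dict u -> number.
    Each reachable node is counted exactly once, including `start`
    itself (subtract value[start] if it should not count itself)."""
    marked = {start}
    queue = deque([start])
    total = 0
    while queue:
        v = queue.popleft()
        total += value[v]              # "visit v"
        for w in adj[v]:
            if w not in marked:        # marking prevents recounting on cycles
                marked.add(w)
                queue.append(w)
    return total
```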

Dijkstra with negative edges. Don't understand the examples, they work according to CLRS pseudocode

EDIT 2: It seems this isn't from CLRS (I assumed it was because it followed the same format of CLRS code that was given to us in this Algos and DS course).
Still, in this course we were given this code as being "Dijkstra's Algorithm".
I read Why doesn't Dijkstra's algorithm work for negative weight edges? and Negative weights using Dijkstra's Algorithm (second one is specific to the OP's algorithm I think).
Looking at the Pseudocode from CLRS ("Intro to Algorithms"), I don't understand why Dijkstra wouldn't work on those examples of graphs with negative edges.
In the pseudocode (below), we Insert nodes back onto the heap if the new distance to them is shorter than the previous distance to them, so it seems to me that the distances would eventually be updated to the correct distances.
For example:
The claim here is that (A,C) will be set to 1 and never updated to the correct distance -2.
But the pseudocode from CLRS says that we first put C and B on the Heap with distances 1 and 2 respectively; then we pop C, see no outgoing edges; then we pop B, look at the edge (B,C), see that Dist[C] > Dist[B] + w(B,C), update Dist[C] to -2, put C back on the heap, see no outgoing edges and we're done.
So it worked fine.
Same for the example in the first answer to this question: Negative weights using Dijkstra's Algorithm
The author of the answer claims that the distance to C will not be updated to -200, but according to this pseudocode that's not true, since we would put B back on the heap and then compute the correct shortest distance to C.
(pseudocode from CLRS)
Dijkstra(G(V, E, ω), s ∈ V)
    for v in V do
        dist[v] ← ∞
        prev[v] ← nil
    end for
    dist[s] ← 0
    H ← {(s, 0)}
    while H ≠ ∅ do
        v ← DeleteMin(H)
        for (v, w) ∈ E do
            if dist[w] > dist[v] + ω(v, w) then
                dist[w] ← dist[v] + ω(v, w)
                prev[w] ← v
                Insert((w, dist[w]), H)
            end if
        end for
    end while
EDIT: I understand that we assume that once a node is popped off the heap, the shortest distance to it has been found; but still, it seems (according to CLRS) that we do put nodes back on the heap if the distance is shorter than previously computed, so when the algorithm finishes we should get the correct shortest distances regardless.
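To check this concretely, a Python sketch of that reinsert-on-improvement variant, run on the first example (the weights A→C = 1, A→B = 2, B→C = −4 are inferred from the description above). It does produce −2 for C, though in general this variant loses Dijkstra's running-time guarantee once negative edges are allowed:

```python
import heapq

def lazy_dijkstra(adj, s):
    """The 'reinsert on improvement' variant from the question: a node
    is pushed back whenever a shorter distance is found, so on this
    small example it recovers the correct distance even with a
    negative edge. (Not Dijkstra's algorithm as originally described,
    and it can take exponential time on adversarial negative-weight
    graphs.)"""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                 # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# The example from the question: A->C (1), A->B (2), B->C (-4).
adj = {"A": [("C", 1), ("B", 2)], "B": [("C", -4)], "C": []}
dist = lazy_dijkstra(adj, "A")   # C is first settled at 1, then re-reached at -2
```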
That implementation is technically not Dijkstra's algorithm, which is described by Dijkstra here (could not find any better link): the set A he talks about contains the nodes for which the minimum path is known. So once you add a node to this set, it is fixed: you know the minimum path to it, and it no longer participates in the rest of the algorithm. The paper also talks about transferring nodes, so they cannot be in two sets at once.
This is in line with Wikipedia's pseudocode:
1  function Dijkstra(Graph, source):
2
3      create vertex set Q
4
5      for each vertex v in Graph:           // Initialization
6          dist[v] ← INFINITY                // Unknown distance from source to v
7          prev[v] ← UNDEFINED               // Previous node in optimal path from source
8          add v to Q                        // All nodes initially in Q (unvisited nodes)
9
10     dist[source] ← 0                      // Distance from source to source
11
12     while Q is not empty:
13         u ← vertex in Q with min dist[u]  // Node with the least distance will be selected first
14         remove u from Q
15
16         for each neighbor v of u:         // where v is still in Q.
17             alt ← dist[u] + length(u, v)
18             if alt < dist[v]:             // A shorter path to v has been found
19                 dist[v] ← alt
20                 prev[v] ← u
21
22     return dist[], prev[]
And its heap pseudocode as well.
However, note that Wikipedia also states, at the time of this answer:
Instead of filling the priority queue with all nodes in the initialization phase, it is also possible to initialize it to contain only source; then, inside the if alt < dist[v] block, the node must be inserted if not already in the queue (instead of performing a decrease_priority operation).[3]:198
Doing this would still lead to reinserting a node in some cases with negative valued edges, such as the example graph given in the accepted answer to the second linked question.
So it seems that some authors make this confusion. In this case, they should clearly state that either this implementation works with negative edges or that it's not a proper Dijkstra's implementation.
I guess the original paper might be interpreted as a bit vague. Nowhere in it does Dijkstra make any mention of negative or positive edges, nor does he make it clear beyond any alternative interpretation that a node cannot be updated once in the A set. I don't know if he himself further clarified things in any subsequent works or speeches, or if the rest is just a matter of interpretation by others.
So from my point of view, you could argue that it's also a valid Dijkstra's.
As to why you might implement it this way: because it will likely be no slower in practice if we only have positive edges, and because it is quicker to write without having to perform additional checks or not-so-standard heap operations.

Count cycles of length 3 using DFS

Let G=(V,E) be an undirected graph. How can we count cycles of length 3 exactly once using the following DFS:
DFS(G,s):
    foreach v in V do
        color[v] <- white; p[v] <- nil
    DFS-Visit(s)

DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
There is a cycle whenever we discover a node that already has been discovered (grey). The edge to that node is called back edge. The cycle has length 3 when p[p[p[v]]] = v, right? So
DFS-Visit(u)
    color[u] <- grey
    foreach v in Adj[u] do
        if color[v] = grey and p[p[p[v]]] = v then
            // we got a cycle of length 3
        else if color[v] = white then
            p[v] = u; DFS-Visit(v)
    color[u] <- black
However how can I create a proper counter to count the number of cycles and how can I count each cycle only once?
I'm not sure I understand how your condition parent[parent[parent[v]]] == v works. IMO it should never be true as long as parent represents a tree structure (because it should correspond to the spanning tree associated with the DFS).
Directed graphs
Back edges, cross edges and forward edges can all "discover" new cycles. For example:
We separate the following possibilities (let's say you reach a u -> v edge):
Back edge: u and v belong to the same 3-cycle iff parent[parent[u]] = v.
Cross edge: u and v belong to the same 3-cycle iff parent[u] = parent[v].
Forward edge: u and v belong to the same 3-cycle iff parent[parent[v]] = u.
Undirected graphs
There are no more cross edges, and back edges and forward edges are redundant. Therefore you only have to check back edges: when you reach a u -> v back edge, u and v belong to the same 3-cycle iff parent[parent[u]] = v.
def dfs(u):
    color[u] = GREY
    for v in adj[u]:
        # Back edge
        if color[v] == GREY:
            if parent[parent[u]] == v:
                print("({}, {}, {})".format(v + 1, parent[u] + 1, u + 1))
        # v unseen
        elif color[v] == WHITE:
            parent[v] = u
            dfs(v)
    color[u] = BLACK
If you want to test it:

WHITE, GREY, BLACK = 0, 1, 2

nb_nodes, nb_edges = map(int, input().split())
adj = [[] for _ in range(nb_nodes)]
for _ in range(nb_edges):
    u, v = map(int, input().split())
    adj[u - 1].append(v - 1)
    adj[v - 1].append(u - 1)
parent = [None] * nb_nodes
color = [WHITE] * nb_nodes
for s in range(nb_nodes):      # run the DFS from every unseen node
    if color[s] == WHITE:
        dfs(s)
If a solution without using DFS is okay, there is an easy solution which runs in O(NMlog(N³)) where N is the number of vertices in the graph and M is the number of edges.
We are going to iterate over edges instead of iterating over vertices. For every edge u-v, we have to find every vertex which is connected to both u and v. We can do this by iterating over every vertex w in the graph and checking if there is an edge v-w and w-u. Whenever you find such vertex, order u,v,w and add the ordered triplet to a BBST that doesn't allow repetitions (eg: std::set in C++). The count of length 3 cycles will be exactly the size of the BBST (amount of elements added) after you check every edge in the graph.
Let's analyze the complexity of the algorithm:
We iterate over every edge. Current complexity is O(M)
For each edge, we iterate over every vertex. Current complexity is O(NM)
For each (edge,vertex) pair that forms a cycle, we are going to add a triplet to a BBST. Adding to a BBST has O(log(K)) complexity where K is the size of the BST. In worst case, every triplet of vertices forms a cycle, so we may add up to O(N³) elements to the BST, and the complexity to add some element can get as high as O(log(N³)). Final complexity is O(NMlog(N³)) then. This may sound like a lot, but in worst case M = O(N²) so the complexity will be O(N³log(N³)). Since we may have up to O(N³) cycles of length 3, our algorithm is just a log factor away from an optimal algorithm.
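A Python sketch of this edge-iteration method (a hash set stands in for the BBST here, which makes the expected cost O(NM) instead of O(NM log N³); the triplet-deduplication idea is the same):

```python
def count_triangles(n, edge_list):
    """Count 3-cycles in an undirected graph on nodes 0..n-1.
    For every edge (u, v), scan all vertices w and record each
    triangle as a sorted triplet so it is counted exactly once."""
    edges = set()
    for u, v in edge_list:
        edges.add((u, v))
        edges.add((v, u))
    triangles = set()
    for u, v in edge_list:
        for w in range(n):
            # w closes a triangle iff both v-w and w-u are edges
            if (v, w) in edges and (w, u) in edges:
                triangles.add(tuple(sorted((u, v, w))))
    return len(triangles)
```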
