Simple Paths between 2 nodes - algorithm

I know that I, and probably many others, am stuck here.
According to CLRS (3rd edition, exercise 22.4-2), there is an O(n) algorithm for counting the simple paths between two nodes in a directed acyclic graph.
I went through similar questions, "Number of paths between two nodes in a DAG" and "All the paths between 2 nodes in graph", but in both cases there is no proper explanation or pseudocode, or, where there is, I doubt that it is the most efficient one (O(n)).
Could someone post exact code or pseudocode that settles the matter? Going through the links above, I didn't find a single answer that stands out.
It would be better if the code also handled cyclic graphs: if the graph contains a cycle but no path between the two nodes passes through it, the number of paths should be finite; otherwise it is infinite.

Jeremiah Willcock's answer is correct but light on details. Here's the linear-time algorithm for DAGs.
for each node v, initialize num_paths[v] = 0
let t be the destination and set num_paths[t] = 1
for each node v in reverse topological order (sinks before sources):
    for each successor w of v:
        set num_paths[v] = num_paths[v] + num_paths[w]
let s be the origin and return num_paths[s]
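For illustration, the pseudocode above could be sketched in Python as follows. The adjacency-dict representation, the function name count_paths_dag, and the use of a DFS post-order to obtain the reverse topological order are assumptions of this sketch, not part of the original answer.

```python
def count_paths_dag(graph, s, t):
    """Count paths from s to t in a DAG given as {node: [successors]}."""
    order, seen = [], set()

    def visit(v):
        seen.add(v)
        for w in graph.get(v, ()):
            if w not in seen:
                visit(w)
        order.append(v)  # post-order: every successor is appended before v

    for v in list(graph):
        if v not in seen:
            visit(v)

    num_paths = {v: 0 for v in order}
    num_paths[t] = 1
    # `order` lists sinks before sources, i.e. reverse topological order
    for v in order:
        for w in graph.get(v, ()):
            num_paths[v] += num_paths[w]
    return num_paths.get(s, 0)


# Hypothetical example graph: two routes from "a" to "d"
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(count_paths_dag(example, "a", "d"))  # 2
```

The recursive visit is fine for small graphs; a very deep graph would need an explicit stack to stay within Python's recursion limit.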
I'm pretty sure the problem for general directed graphs is #P-complete, but I couldn't find anything in a couple minutes of searching except an unsourced question on cstheory.
Okay, here's some pseudocode. I've integrated the contents of the previous algorithm with the topological sort and added some cycle detection logic. In case of a cycle between s and t, the values of num_paths may be inaccurate but will be zero-nonzero depending on whether t is reachable. Not every node in a cycle will have in_cycle set to true, but every SCC root (in the sense of Tarjan's SCC algorithm) will, which suffices to trigger the early exit if and only if there is a cycle between s and t.
REVISED ALGORITHM
let the origin be s
let the destination be t
for each node v, initialize
    color[v] = WHITE
    num_paths[v] = 0
    in_cycle[v] = FALSE
num_paths[t] = 1
let P be an empty stack
push (ENTER, s) onto P
while P is not empty:
    pop (op, v) from P
    if op == ENTER:
        if color[v] == WHITE:
            color[v] = GRAY
            push (LEAVE, v) onto P
            for each successor w of v:
                push (ENTER, w) onto P
        else if color[v] == GRAY:
            in_cycle[v] = TRUE
    else: # op == LEAVE
        color[v] = BLACK
        for each successor w of v:
            set num_paths[v] = num_paths[v] + num_paths[w]
        if num_paths[v] > 0 and in_cycle[v]:
            return infinity
return num_paths[s]
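A fairly direct Python transcription of the revised pseudocode might look like the sketch below. The adjacency-dict input, the function name count_paths, and the use of math.inf for "infinity" are assumptions made for illustration.

```python
import math
from collections import defaultdict

WHITE, GRAY, BLACK = 0, 1, 2


def count_paths(graph, s, t):
    """Number of paths from s to t, or math.inf if a cycle lies between them.

    graph is assumed to be a dict {node: [successors]}.
    """
    color = defaultdict(int)      # every node starts WHITE (0)
    num_paths = defaultdict(int)  # every node starts with 0 paths
    in_cycle = set()
    num_paths[t] = 1
    stack = [("ENTER", s)]
    while stack:
        op, v = stack.pop()
        if op == "ENTER":
            if color[v] == WHITE:
                color[v] = GRAY
                stack.append(("LEAVE", v))
                for w in graph.get(v, ()):
                    stack.append(("ENTER", w))
            elif color[v] == GRAY:
                # v is still on the current DFS path, so we closed a cycle
                in_cycle.add(v)
        else:  # op == "LEAVE"
            color[v] = BLACK
            for w in graph.get(v, ()):
                num_paths[v] += num_paths[w]
            if num_paths[v] > 0 and v in in_cycle:
                return math.inf
    return num_paths[s]
```

For example, on the hypothetical graph {"a": ["b"], "b": ["a", "c"], "c": []} the call count_paths(graph, "a", "c") returns math.inf, because the cycle a -> b -> a can be traversed any number of times before reaching c.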

Related

The special case for using Tarjan's algorithm to find bridges in directed graphs

I am trying to get better understanding of Tarjan's algorithm for finding SCC, articulation points and bridges. I am considering a special case where the graph contains only 2 nodes with edges 0->1 and 1->0. The following code will output [0,1] as a bridge.
from collections import defaultdict

class Solution(object):
    def criticalConnections(self, n, connections):
        """
        :type n: int
        :type connections: List[List[int]]
        :rtype: List[List[int]]
        """
        g = defaultdict(set)
        pre = [-1]*n
        low = [-1]*n
        cnt = [0]
        for c in connections:
            g[c[0]].add(c[1]) # undirected graph, connect
            g[c[1]].add(c[0]) # in both directions
        ans = []
        def dfs(edge):
            v, w = edge
            pre[w] = cnt[0]
            low[w] = pre[w]
            cnt[0] += 1
            for i in g[w]:
                if i == v: continue # we don't want to go back through the same path.
                                    # if we go back is because we found another way back
                if pre[i] == -1:
                    dfs((w,i))
                    # low[i] > pre[w] indicates no back edge to
                    # w's ancestors; otherwise, low[i] will be
                    # < pre[w]+1 since back edge makes low[i] smaller
                    if low[i] > pre[w]:
                        #print(low[i], pre[w]+1, (w,i))
                        ans.append([w,i])
                    low[w] = min(low[w], low[i]) # low[i] might be an ancestor of w
                else: # if i was already discovered means that we found an ancestor
                    low[w] = min(low[w], pre[i]) # finds the ancestor with the least
                                                 # discovery time
        dfs((-1,0))
        return ans
print(Solution().criticalConnections(2, [[0,1],[1,0]]))
However, from many discussions online, after removing node 1, node 0 can still be considered as connected (to itself) which means edge 0->1 is not a bridge. Am I missing something here?
Or is Tarjan's algorithm not suitable for this kind of degenerate graph with 2 nodes?
A bridge in a directed graph is an edge whose deletion increases the number of strongly connected components of the graph; in an undirected graph, it is an edge whose deletion increases the number of connected components. In your graph, removing either edge increases the number of strongly connected components, so the output of this code is correct in this case.
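The claim about the two-node graph can be checked with a small sketch that counts strongly connected components by mutual reachability. This is a brute-force check for tiny graphs, not Tarjan's algorithm, and the helper names reachable and count_sccs are hypothetical.

```python
def reachable(graph, start):
    """Set of nodes reachable from start (including start itself)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def count_sccs(nodes, edges):
    """Count strongly connected components of a small directed graph."""
    graph = {v: [] for v in nodes}
    for u, w in edges:
        graph[u].append(w)
    reach = {v: reachable(graph, v) for v in nodes}
    # u and v share a component when each can reach the other
    components = {
        frozenset(u for u in reach[v] if v in reach[u]) for v in nodes
    }
    return len(components)


print(count_sccs([0, 1], [(0, 1), (1, 0)]))  # 1: both edges present
print(count_sccs([0, 1], [(1, 0)]))          # 2: edge 0->1 removed
```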

Returning all shortest paths in lowest run time and complexity

This post is the result that constantly appears for this problem, but it doesn't provide an optimal solution.
Currently I am trying to return all shortest paths starting at from and ending at target using BFS, but I am running into a bottleneck with either my algorithm or the data structures I use.
pseudocode:
// The graph is an adjacency list of type unordered_map<string, unordered_set<string>>
// deque with pair of (visited unordered_set, vector with current path)
deque q = [({from}, [from])];
while q:
    pair = q.dequeue()
    visited = pair.first
    path = pair.second
    foreach adjacent_node to path[-1] in the graph:
        if (adjacent_node == target):
            res.append(path + [adjacent_node])
        else if adjacent_node not in visited:
            newPath = path + [adjacent_node]
            visited.add(adjacent_node)
            q.push((visited, newPath))
Currently the bottleneck seems to be with the queue's pair of items. I'm unsure how to solve the problem without storing a visited set with every path, or without copying a new path into the queue.
First, you should know that the number of shortest paths can be huge, so returning them all may not be practical. Consider a graph with 2k+1 layers numbered from 1 to 2k+1, in which each layer is fully connected to the next, odd layers have a single node, and even layers have q nodes. Although this graph has only k(q+1)+1 nodes and 2kq edges, there are q^k different shortest paths from the first layer to the last, which can be too many for an ordinary computer to handle. However, if you're sure that the number of shortest paths is relatively small, I can suggest the following algorithm.
The basic idea is to store a list back[x] for each node x, such that v is in back[x] if and only if there is an edge from v to x and the shortest distance between from and x equals the shortest distance between from and v plus one. back[x] can be computed during the BFS. Then you can perform a depth-first search from target over back to print all the shortest paths. Pseudocode (by the way, I noticed that your code is not correct):
queue q = [ from ]
dist = map<node, int>    # BFS distance from `from`
dist[from] = 0
back = map<node, list<node>>
while q.not_empty():
    now = q.pop_front()
    if (now == target):
        continue
    foreach adjacent_node to now in the graph:
        if (adjacent_node not in dist):
            # first time reached, so this is its shortest distance
            dist[adjacent_node] = dist[now] + 1
            back[adjacent_node] = [ now ]
            q.push(adjacent_node)
        else if (dist[adjacent_node] == dist[now] + 1):
            # another predecessor on an equally short path
            back[adjacent_node].push(now)
# Now collect all shortest paths
ret = []
current = []
def collect(x):
    current.push(x)
    if (x == from):
        ret.push(current.reversed())
    else:
        foreach v in back[x]:
            collect(v)
    current.pop()
collect(target)
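A runnable Python sketch of this idea: a BFS records, for every node, the predecessors that lie on some shortest path, and a DFS over those predecessor lists enumerates the paths. The dist dictionary, the adjacency-dict input, and the function name all_shortest_paths are assumptions of this sketch.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    dist = {source: 0}   # BFS distance from source
    back = {source: []}  # back[x]: predecessors of x on shortest paths
    queue = deque([source])
    while queue:
        now = queue.popleft()
        if now == target:
            continue  # no need to expand beyond the target
        for nxt in graph.get(now, ()):
            if nxt not in dist:
                dist[nxt] = dist[now] + 1
                back[nxt] = [now]
                queue.append(nxt)
            elif dist[nxt] == dist[now] + 1:
                back[nxt].append(now)

    if target not in dist:
        return []

    paths, current = [], []

    def collect(x):
        current.append(x)
        if x == source:
            paths.append(current[::-1])
        else:
            for v in back[x]:
                collect(v)
        current.pop()

    collect(target)
    return paths
```

Only one predecessor list is stored per node, so no visited set or path is copied during the BFS; copies are made only when a complete path is emitted.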
Sorry for my poor English. Feel free to point out my typos and mistakes.

sort graph by distance to end nodes

I have a list of nodes which belong in a graph. The graph is directed and does not contain cycles. Also, some of the nodes are marked as "end" nodes. Every node has a set of input nodes I can use.
The question is the following: How can I sort (ascending) the nodes in the list by the biggest distance to any reachable end node? Here is an example of how the graph could look.
I have already added the calculated distance by which I can sort the nodes (grey). The end nodes have the distance 0 while C, D and G have the distance 1. However, F has the distance of 3 because the approach over D would be shorter (2).
I have come up with a concept that I think would solve the problem. Here is some pseudo-code:
sortedTable<Node, depth> // used to store nodes and their currently calculated distance
tempTable<Node> // used to store nodes
currentDepth = 0;
- fill tempTable with end nodes
while (tempTable is not empty)
{
    - create empty newTempTable<Node node>
    // add tempTable to sortedTable
    for (every "node" in tempTable)
    {
        if ("node" is in sortedTable)
        {
            - overwrite depth in sortedTable with currentDepth
        }
        else
        {
            - add (node, currentDepth) to sortedTable
        }
        // get the node in the next layer
        for (every "newNode" connected to node)
        {
            - add newNode to newTempTable
        }
    }
    - tempTable = newTempTable
    currentDepth++;
}
This approach should work. However, the problem with this algorithm is that it basically builds a tree from the graph starting at every end node and then corrects old distance calculations at every depth. For example, G would first get depth 1 (calculated directly over B), then depth 3 (calculated over A, D and F), and then depth 4 (calculated over A, C, E and F).
Do you have a better solution to this problem?
It can be done with dynamic programming.
The graph is a DAG, so first do a topological sort on the graph, let the sorted order be v1,v2,v3,...,vn.
Now, set D(v)=0 for every "end node", and then, going from last to first in the topological order, compute:
D(v) = max { D(u) + 1, for each edge (v,u) }
It works because the graph is a DAG: when the nodes are processed in reverse topological order, D(u) is already known for every outgoing edge (v,u).
Example on your graph:
Topological sort (one possible):
H,G,B,F,D,E,C,A
Then, the algorithm:
init:
D(B)=D(A)=0
Go back from last to first:
D(A) - no out edges, done
D(C) = max{D(A) + 1} = max{0+1}=1
D(E) = max{D(C) + 1} = 2
D(D) = max{D(A) + 1} = 1
D(F) = max{D(E)+1, D(D)+1} = max{2+1,1+1} = 3
D(B) = 0
D(G) = max{D(B)+1,D(F)+1} = max{1,4}=4
D(H) = max{D(G) + 1} = 5
As a side note, if the graph is not a DAG, but a general graph, this is a variant of the Longest Path Problem, which is NP-Complete.
Luckily, it does have an efficient solution when our graph is a DAG.
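A minimal Python sketch of this dynamic program, run on the example graph from the walkthrough above. The edge list is read off that walkthrough; the function name, the adjacency-dict layout, and the choice to leave out nodes that cannot reach any end node are assumptions of this sketch.

```python
def longest_distance_to_end(graph, end_nodes):
    """graph: {node: [successors]} for a DAG. Returns {node: D(node)}."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for u in graph.get(v, ()):
            visit(u)
        order.append(v)  # post-order, i.e. reverse topological order

    for v in graph:
        visit(v)

    dist = {}
    for v in order:
        # D(u) is already known for every successor u that reaches an end node
        candidates = [dist[u] + 1 for u in graph.get(v, ()) if u in dist]
        if v in end_nodes:
            candidates.append(0)
        if candidates:
            dist[v] = max(candidates)
    return dist


graph = {
    "H": ["G"], "G": ["B", "F"], "F": ["E", "D"], "E": ["C"],
    "D": ["A"], "C": ["A"], "A": [], "B": [],
}
dist = longest_distance_to_end(graph, {"A", "B"})
print(sorted(dist, key=lambda v: (dist[v], v)))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```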

minimum weight vertex cover of a tree

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework but it is one of the questions in the algorithm design manual, which I'm currently reading; an answer set gives the solution as
1. Perform a DFS; at each step update Score[v][include], where v is a vertex and include is either true or false.
2. If v is a leaf, set Score[v][false] = 0 and Score[v][true] = w_v, where w_v is the weight of vertex v.
3. During the DFS, when moving up from the last child of node v, update Score[v][include]: Score[v][false] = sum over c in children(v) of Score[c][true], and Score[v][true] = w_v + sum over c in children(v) of min(Score[c][true], Score[c][false]).
4. Extract the actual cover by backtracking Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response to Ali's answer: suppose I have this graph, with the vertices given by A etc. and the weights in parentheses after:
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.
There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that'd take exponential time. My guess is "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
A vertex cover of a tree may include both alternating and adjacent vertices. It may not exclude two adjacent vertices, because it must cover every edge.
The answer lies in the way Score is calculated recursively. The cost of not including a vertex is the cost of including all of its children. The cost of including a vertex, however, is its own weight plus, for each child, whichever is cheaper, including that child or not, because both are allowed.
As your solution suggests, the scores can be computed with a DFS in post-order. The trick when extracting the cover is to walk back down from the root: include a vertex if its parent was excluded (otherwise we'd be excluding two adjacent vertices) or if Score says including it is cheaper, and exclude it otherwise.
Here's some pseudocode:
compute_scores(v)
    for c in children(v)
        compute_scores(c)
    Score[v][false] = Sum for c in children(v) of Score[c][true]
    Score[v][true] = v weight + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
extract_cover(v, parent_included)
    if not parent_included or Score[v][true] < Score[v][false] then
        add v to the vertex cover
        v_included = true
    else
        v_included = false
    for c in children(v)
        extract_cover(c, v_included)
call compute_scores(root), then extract_cover(root, true)
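A minimal Python sketch of the Score recurrence, with the cover extracted in a second, top-down pass. The two-pass structure is an assumption of this sketch: whether a vertex has to be included depends on whether its parent was, which is only known when walking down from the root. The children and weight dictionaries and the function name are hypothetical.

```python
def min_weight_vertex_cover(children, weight, root):
    """Return (total weight, cover) for a rooted tree."""
    score = {}

    def compute(v):
        excluded, included = 0, weight[v]
        for c in children.get(v, ()):
            compute(c)
            excluded += score[c][True]  # v excluded: every child is forced in
            included += min(score[c][True], score[c][False])
        score[v] = {False: excluded, True: included}

    compute(root)

    cover = []

    def extract(v, parent_included):
        # v is forced in if its parent is out; otherwise take the cheaper option
        include = (not parent_included) or score[v][True] < score[v][False]
        if include:
            cover.append(v)
        for c in children.get(v, ()):
            extract(c, include)

    extract(root, True)  # the root has no parent edge left to cover
    return min(score[root].values()), cover


children = {"A": ["B", "E"], "B": ["C", "D"]}
weight = {"A": 9, "B": 3, "C": 2, "D": 4, "E": 1}
print(min_weight_vertex_cover(children, weight, "A"))  # (4, ['B', 'E'])
```

With the weight of E changed to 1000, the same call returns (12, ['A', 'B']), matching the case discussed in the question.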
It doesn't mean anything confusing; it is just dynamic programming, and you seem to understand almost all of the algorithm. To make it clearer:
First perform a DFS on your graph and find the leaves.
For every leaf, assign values as the algorithm says.
Now start from the leaves and assign values to each leaf's parent using that formula.
Keep assigning values to the parents of nodes that already have values until you reach the root of your graph.
That is all. "Backtracking" in your algorithm means that you assign a value to each node whose children already have values. As I said above, this way of solving a problem is called dynamic programming.
Edit, to address the changes in your question: you have the following graph, and the answer is clearly B, E, but you thought this algorithm gives you just B. That is incorrect; the algorithm gives you B and E.
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6, which means we use C and D; score[B][true] = 3, which means we use B
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4, which means we use B and E; score[A][true] = 12, which means we use B and A.
You select 4, so you must use B and E. If it were just B, your answer would be 3, but as you correctly found, your answer is 4 = 3 + 1 = B + E.
Also when E = 1000
A(9)---B(3)---C(2)
 \       \
  E(1000) D(4)
It is 100% correct that the answer is B and A, because it would be wrong to use E just to avoid selecting adjacent nodes. This algorithm finds that the answer is A and B, and you can verify it by checking. Consider these covers:
C D A = 15
B E = 1003
A B = 12
Although the first two covers have no adjacent nodes, they cost more than the last one, which does. So it is best to use A and B for the cover.
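For a tree this small, the claim can be double-checked by brute force. The sketch below simply tries every subset of vertices and keeps the cheapest one that touches every edge; the function name and the edge list (read off the drawing above) are assumptions, and the approach is exponential, so it is only meant as a sanity check.

```python
from itertools import combinations


def brute_force_min_cover(weight, edges):
    """Return (cost, cover) of the cheapest vertex cover by exhaustive search."""
    nodes = sorted(weight)
    best = None
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            # a vertex cover must contain an endpoint of every edge
            if all(u in chosen or v in chosen for u, v in edges):
                cost = sum(weight[x] for x in chosen)
                if best is None or cost < best[0]:
                    best = (cost, chosen)
    return best


edges = [("A", "B"), ("A", "E"), ("B", "C"), ("B", "D")]
weight = {"A": 9, "B": 3, "C": 2, "D": 4, "E": 1000}
cost, cover = brute_force_min_cover(weight, edges)
print(cost, sorted(cover))  # 12 ['A', 'B']
```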

Shortest path with depth first search

How can I get the shortest path using DFS? I've seen this question asked several times, but the replies generally use BFS or a different algorithm. What about in the context of a robot traversing a maze? BFS isn't possible since it jumps from node to node and a robot would require backtracking.
Right now I am trying to solve the problem using:
def dfs(self, v):
    v.visited = True
    for adj in v.adj:
        if adj.visited is False:
            # set the parent
            adj.successor = v
            # explore the other nodes
            self.dfs(adj)
However, this does not necessarily return the shortest path. Is there another way to approach this problem? I've seen some suggestions to use depth first iterative deepening, but I can't find many examples implementing this algorithm.
If anyone is interested in how I solved this here is my solution:
def dfs(self, v):
    if v.successor is None:  # this is the root node
        v.cost = 0
    v.visited = True
    for adj in v.adj:
        if adj.visited is False:
            # set the parent
            adj.successor = v
            adj.cost = v.cost + 1
            self.dfs(adj)
        # if the cost is less switch the successor
        # and set the new cost
        elif adj.cost > v.cost + 1:
            adj.successor = v
            adj.cost = v.cost + 1
            self.dfs(adj)
This is essentially a version of DFS that keeps track of the cost of each vertex. If a vertex's recorded cost is more than the current vertex's cost plus one step, the current vertex becomes its new successor and the search continues from it. This allows finding the shortest path without using BFS.
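The same idea can be tried out on a plain adjacency dict. This is a standalone sketch, not the poster's class: the function name dfs_shortest, the cost and parent dictionaries, and the example graph are assumptions made for illustration.

```python
def dfs_shortest(graph, start):
    """DFS that re-explores a vertex whenever a cheaper route to it is found."""
    cost = {start: 0}
    parent = {start: None}

    def dfs(v):
        for adj in graph.get(v, ()):
            # unvisited, or reachable through v in fewer steps than before
            if adj not in cost or cost[adj] > cost[v] + 1:
                cost[adj] = cost[v] + 1
                parent[adj] = v
                dfs(adj)

    dfs(start)
    return cost, parent


# DFS first reaches "t" the long way (s -> a -> b -> t), then corrects it
graph = {"s": ["a", "t"], "a": ["b"], "b": ["t"], "t": []}
cost, parent = dfs_shortest(graph, "s")
print(cost["t"], parent["t"])  # 1 s
```

Because a vertex can be explored again each time its cost drops, this generally does more work than BFS, which settles every vertex once.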
