Find a path whose length can be divided by 3 - algorithm

Given an undirected graph (not weighted) and two vertices u and v,
how can I find a path between u and v whose length is divisible by 3?
Note that the path need not necessarily be a simple path.
I thought about a variation of DFS with a stack that stores the path (and is used for backtracking), but I cannot quite see how to keep track of non-simple paths as well.
The time complexity should be O(V+E), so I expect it must be a variant of BFS or DFS.

One way to do this is to compute a modified version of the graph and do a BFS or DFS on that graph.
Imagine stacking the graph on top of itself three times. Each node now appears three times. Annotate the first copy as "mod 0," the second copy as "mod 1," and the third copy as "mod 2." Then, change the edges so that any edge from a node u to a node v always goes from the node u to the node v in the next layer of the graph. Thus if there was an edge from u to v, there is now an edge from u mod 0 to v mod 1, u mod 1 to v mod 2, and u mod 2 to v mod 0. If you do a BFS or DFS over this graph and find a path from u mod 0 to any node v mod 0, you necessarily have a path whose length must be a multiple of three.
You can explicitly construct this graph in time O(m + n) by copying the graph two times and rewiring the edges appropriately, and from there a BFS or DFS will take time O(m + n). This uses memory Θ(m + n), though.
An alternative solution would be to simulate doing this without actually building the new graph. Do a BFS, and store for each node three distances - a mod 0 distance, a mod 1 distance, and a mod 2 distance. Whenever you dequeue a node from the queue, enqueue its successors, but tag them as being at the next mod layer (e.g. if you dequeued a node at level mod 0, enqueue its successors at mod 1, etc.) You can independently track whether you have reached a node at distances mod 0, mod 1, and mod 2, and should not enqueue a node at a given mod level multiple times. This also takes time O(m + n), but doesn't explicitly construct the second graph and thus only requires O(n) storage space.
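As a concrete illustration, here is a minimal Python sketch of that second variant (it tracks the residue mod 3 during the BFS instead of building the layered graph explicitly). It assumes the graph is given as a dict mapping each vertex to a list of neighbours, and it only reports whether such a walk exists:

from collections import deque

def path_length_divisible_by_3(adj, u, v):
    """Return True if some u-v walk has length divisible by 3."""
    # seen[x][r] is True once x has been reached by a walk of length r (mod 3)
    seen = {x: [False, False, False] for x in adj}
    seen[u][0] = True
    queue = deque([(u, 0)])            # state = (vertex, walk length mod 3)
    while queue:
        x, r = queue.popleft()
        if x == v and r == 0:
            return True
        nr = (r + 1) % 3               # one more edge moves us to the next "layer"
        for y in adj[x]:
            if not seen[y][nr]:
                seen[y][nr] = True
                queue.append((y, nr))
    return False

Each (vertex, residue) state is enqueued at most once, so this is O(V + E). To recover the walk itself, store a parent pointer per (vertex, residue) state and trace back from (v, 0).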
Hope this helps!

A cheating way:
Just DFS/BFS from A to B. If the length L satisfies L % 3 == 0, you are done; if L % 3 == 1, walk to a neighbour and back (adding 2 to the length); otherwise, walk to a neighbour and back, and there and back again (adding 4).
If that does not meet your constraints, then:
You could do this using a modified BFS. While doing the search, try to mark all the cycles you encounter; you can get all the simple cycles along with the lengths.
After that, if the path from A to B has length L % 3 == 0, then you found it.
If not, then in the case where all the cycles have lengths Lk % 3 == 0, there's no solution;
If there is some cycle with length K % 3 != 0, you could first go from A to this cycle, go around it once or twice, then go back to A and then to B. You are guaranteed to find a path whose length is a multiple of 3 this way.
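For what it's worth, here is a Python sketch of the first ("cheating") idea only, assuming the graph is an adjacency-list dict: find any u-v path with BFS, then pad its length to a multiple of 3 by bouncing between v and one of its neighbours.

from collections import deque

def any_path(adj, u, v):
    """Return some u-v path as a list of vertices, or None if v is unreachable."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            path = [v]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None

def path_mod3(adj, u, v):
    """Return a (possibly non-simple) u-v path whose length is divisible by 3."""
    path = any_path(adj, u, v)
    if path is None:
        return None
    length = len(path) - 1
    if length % 3 == 0:
        return path
    w = next(iter(adj[v]))             # any neighbour of v (one exists, since length >= 1)
    # each bounce v -> w -> v adds 2 edges:
    # length % 3 == 1 needs one bounce (+2), length % 3 == 2 needs two (+4)
    bounces = 1 if length % 3 == 1 else 2
    return path + [w, v] * bounces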

Here is a pseudo code. It uses BFS, a simple counter i and a modulo test.
i := 0
1. Enqueue the node u.
2. i++, dequeue a node and examine it.
3. If the element v is found in this node and i mod 3 == 0, quit the search and return a result.
4. Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
5. If the queue is empty, every node of the graph has been examined – quit the search and return "not found".
6. If the queue is not empty, repeat from Step 2.
NOTE: replacing the queue with a stack should work as well; it will then be DFS.

Related

How to count all reachable nodes in a directed graph?

There is a directed graph (which might contain cycles), and each node has a value on it. How could we get the sum of the reachable values for each node? For example, in the following graph:
the reachable sum for node 1 is: 2 + 3 + 4 + 5 + 6 + 7 = 27
the reachable sum for node 2 is: 4 + 5 + 6 + 7 = 22
.....
My solution: To get the sum for all nodes, I think the time complexity is O(n + m), where n is the number of nodes and m stands for the number of edges. DFS should be used: for each node we recursively visit its successors, and save the sum for a node once it has been calculated so that we don't need to calculate it again in the future. A set needs to be created for each node to avoid endless calculation caused by loops.
Does it work? I don't think it is elegant enough, especially since many sets have to be created. Is there any better solution? Thanks.
This can be done by first finding the Strongly Connected Components (SCCs), which can be done in O(|V|+|E|). Then, build a new graph, G', of the SCCs (each SCC is a node in this graph), where each node has a value which is the sum of the values of the nodes in that SCC.
Formally,
G' = (V',E')
Where V' = {U_1, U_2, ..., U_k | U_i is an SCC of the graph G}
E' = {(U_i, U_j) | there is a node u_i in U_i and a node u_j in U_j such that (u_i, u_j) is in E}
Then, this graph (G') is a DAG, and the question becomes simpler, and seems to be a variant of the question linked in the comments.
EDIT: the part of the answer from this point down to the DP solution below is a mistake (it was struck out in the original); a corrected answer follows. Sorry about that.
Now, a DFS can be used from each node to find the sum of values:
DFS(v):
    if v.visited:
        return 0
    if v is leaf:
        return v.value
    v.visited = true
    return sum([DFS(u) for u in v.children])
This is O(V^2 + VE) in the worst case, but since the new graph has fewer nodes, V and E are now significantly lower.
Some local optimizations can be made, for example, if a node has a single child, you can reuse the pre-calculated value and not apply DFS on the child again, since there is no fear of counting twice in this case.
A DP solution for this problem (DAG) can be:
D[i] = value(i) + sum {D[j] | (i,j) is an edge in G' }
This can be calculated in linear time (after topological sort of the DAG).
Pseudo code:
Find SCCs
Build G'
Topological sort G'
Find D[i] for each node in G'
Assign D[i] as the value for every node u_i in U_i, for each U_i.
Total time is O(|V|+|E|).
You can use DFS or BFS algorithms for solving your problem.
Both have complexity O(V + E).
You don't have to count the values for all nodes, and you don't need recursion.
Just do something like this.
Typically DFS looks like this.
unmark all vertices
choose some starting vertex x
mark x
list L = x
while L nonempty
    choose some vertex v from front of list
    visit v
    for each unmarked neighbor w
        mark w
        add it to end of list
In your case you have to add a few lines:
unmark all vertices
choose some starting vertex x
mark x
list L = x
float sum = 0
while L nonempty
    choose some vertex v from front of list
    visit v
    sum += v->value
    for each unmarked neighbor w
        mark w
        add it to end of list
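A runnable version of the pseudocode above, as a Python sketch (assumptions: the graph is a dict mapping every node to a list of successors, and the node values live in a separate dict):

from collections import deque

def reachable_sum(adj, value, start):
    """Sum of the values of all nodes reachable from start (start included)."""
    marked = {start}
    L = deque([start])          # swapping this queue for a stack gives DFS instead
    total = 0
    while L:
        v = L.popleft()
        total += value[v]       # to exclude start's own value, as in the question's
                                # example, subtract value[start] from the result
        for w in adj[v]:
            if w not in marked:
                marked.add(w)
                L.append(w)
    return total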

Complete graph with only two possible costs. What's the shortest path's cost from 0 to N - 1

You are given a complete undirected graph with N vertices. All but K edges have a cost of A. Those K edges have a cost of B and you know them (as a list of pairs). What's the minimum cost from node 0 to node N - 1.
2 <= N <= 500k
0 <= K <= 500k
1 <= A, B <= 500k
The problem is, obviously, when those K edges cost more than the other ones and node 0 and node N - 1 are connected by a K-edge.
Dijkstra doesn't work. I've even tried something very similar with a BFS.
Step 1: Let G(0) be the set of "good" nodes adjacent to node 0.
Step 2: For each node in G(0):
            compute G(node)
            if G(node) contains N - 1
                return step
            else
                add node to some queue
        repeat Step 2 and increment step
The problem is that this uses up a lot of time, because for every node you have to loop over all vertices from 0 to N - 1 in order to find its "good" adjacent nodes.
Does anyone have any better ideas? Thank you.
Edit: Here is a link from the ACM contest: http://acm.ro/prob/probleme/B.pdf
This is laborious case work:
A < B and 0 and N-1 are joined by A -> trivial.
B < A and 0 and N-1 are joined by B -> trivial.
B < A and 0 and N-1 are joined by A ->
    Do BFS on the graph with only the K edges.
A < B and 0 and N-1 are joined by B ->
    You can check in O(N) time whether there is a path of cost 2*A (try every vertex as the middle one).
    To check other path lengths, the following algorithm should do the trick:
Let X(d) be the set of nodes reachable from 0 by using d shorter edges. You can find X(d) using the following algorithm: take each vertex v with unknown distance and iteratively check edges between v and vertices from X(d-1). If you find a short edge, then v is in X(d); otherwise you stepped on a long edge. Since there are at most K long edges, you can step on them at most K times. So you should find the distance of each vertex in at most O(N + K) time.
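A Python sketch of that X(d) idea (an assumption here: the K expensive edges are given as a list of index pairs). The expensive pairs are kept in a hash set so each membership test is O(1), and every failed test consumes one of the K long edges, which is what gives the O(N + K) bound:

def cheap_distances(n, expensive_pairs, source=0):
    """BFS distances, counted in short edges, from source in a complete graph
    on n vertices whose edges are all short except for the given pairs."""
    expensive = set()
    for a, b in expensive_pairs:
        expensive.add((a, b))
        expensive.add((b, a))
    dist = [None] * n
    dist[source] = 0
    frontier = [source]                   # X(d-1): vertices at distance d-1
    unknown = set(range(n)) - {source}
    d = 0
    while frontier and unknown:
        d += 1
        next_frontier = []
        for v in list(unknown):
            for u in frontier:
                if (v, u) in expensive:   # stepped on one of the K long edges
                    continue
                dist[v] = d               # found a short edge into X(d-1)
                next_frontier.append(v)
                unknown.discard(v)
                break
        frontier = next_frontier
    return dist

For the case "A < B and 0 and N-1 are joined by B", the answer is then min(B, A * dist[N-1]) (with an unreachable N-1 treated as infinity), since any route that uses a long edge already costs at least B.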
I propose a solution to a somewhat more general problem where you might have more than two types of edges and the edge weights are not bounded. For your scenario the idea is probably a bit overkill, but the implementation is quite simple, so it might be a good way to go about the problem.
You can use a segment tree to make Dijkstra more efficient. You will need the operations
set an upper bound in a range: given u, L, R, for all x[i] with L <= i <= R set x[i] = min(x[i], u)
find a global minimum
The upper bounds can be pushed down the tree lazily, so both can be implemented in O(log n)
When relaxing outgoing edges, look for the edges with cost B, sort them and update the ranges in between all at once.
The runtime should be O(n log n + m log m) if you sort all the edges upfront (by outgoing vertex).
EDIT: Got accepted with this approach. The good thing about it is that it avoids any kind of special casing. It's still ~80 lines of code.
In the case when A < B, I would go with kind of a BFS, where you would check where you can't reach instead of where you can. Here's the pseudocode:
G(k) is the set of nodes reachable by k cheap edges and no less. We start with G(0) = {v0}
while G(k) isn't empty and G(k) doesn't contain v(N-1) and k*A < B
    A = array[N] of zeroes
    for every node n in G(k)
        for every expensive edge (n,m)
            A[m]++
    # now we have that A[m] == |G(k)| iff m can't be reached by a cheap edge from any of G(k)
    set G(k+1) to {m; A[m] < |G(k)|} except {n; n is in G(0),...G(k)}
    k++
This way you avoid iterating through the (many) cheap edges and only iterate through the relatively few expensive edges.
As you have correctly noted, the problem comes when A > B and edge from 0 to n-1 has a cost of A.
In this case you can simply delete all edges in the graph that have a cost of A. This is because an optimal route will then only use edges with cost B.
Then you can perform a simple BFS since the costs of all edges are the same. It will give you optimal performance as pointed out by this link: Finding shortest path for equal weighted graph
Moreover, you can stop your BFS when the total cost exceeds A.
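A Python sketch of this case (assuming the K cost-B edges are given as a list of pairs): BFS on the subgraph of B edges, cutting off as soon as the accumulated cost can no longer beat the direct A edge.

from collections import deque

def cost_when_B_cheaper(n, b_edges, A, B):
    """Minimum 0 -> n-1 cost when B < A and the direct 0 -> n-1 edge costs A."""
    adj = [[] for _ in range(n)]
    for a, b in b_edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        v = queue.popleft()
        if B * dist[v] >= A:        # longer B-only paths cannot beat the direct A edge
            break
        for w in adj[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    if dist[n - 1] is not None and B * dist[n - 1] < A:
        return B * dist[n - 1]
    return A                        # fall back to the single direct edge of cost A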

Time complexity of Prim's MST Algorithm

Can someone explain to me why Prim's algorithm using an adjacency matrix results in a time complexity of O(V^2)?
(Sorry in advance for the sloppy-looking ASCII math, I don't think we can use LaTeX to typeset answers)
The traditional way to implement Prim's algorithm with O(V^2) complexity is to have an array in addition to the adjacency matrix, let's call it distance, which holds for each vertex the minimum distance (edge weight) from that vertex to the tree built so far.
This way, we only ever check distance to find the next target, and since we do this V times and there are V members of distance, our complexity is O(V^2).
This on its own wouldn't be enough, as the original values in distance would quickly become out of date. To update this array, all we do at the end of each step is iterate through the newly added vertex's row of the adjacency matrix and update the distances appropriately. This doesn't affect our time complexity, since it merely means that each step takes O(V+V) = O(2V) = O(V). Therefore our algorithm is O(V^2).
Without distance we have to iterate through all E edges every single time, and E can be as large as V^2, meaning our time complexity would be O(V^3).
Proof:
To see why the naive approach without the distance array costs O(V^3), consider that on each iteration, with a tree of size n, there are V-n vertices that could potentially be added.
To calculate which one to choose, we must check each of these to find its minimum distance from the tree, and then take the minimum over all of them.
In the worst-case scenario, each of these nodes has a connection to every node in the tree, resulting in n * (V-n) edges to inspect and a complexity of O(n(V-n)) for that iteration.
Since our total is the sum of these steps as n goes from 1 to V, our final time complexity is:
(sum of O(n(V-n)) for n = 1 to V) = O((V-1) V (V+1) / 6) = O(V^3)
QED
Note: This answer just borrows jozefg's answer and tries to explain it more fully since I had to think a bit before I understood it.
Background
An Adjacency Matrix representation of a graph constructs a V x V matrix (where V is the number of vertices). The value of cell (a, b) is the weight of the edge linking vertices a and b, or zero if there is no edge.
Adjacency Matrix

      A  B  C  D  E
    -----------------
  A   0  1  0  3  2
  B   1  0  0  0  2
  C   0  0  0  4  3
  D   3  0  4  0  1
  E   2  2  3  1  0
Prim's Algorithm is an algorithm that takes a graph and a starting node, and finds a minimum spanning tree on the graph - that is, it finds a subset of the edges so that the result is a tree that contains all the nodes and the combined edge weights are minimized. It may be summarized as follows:
Place the starting node in the tree.
Repeat until all nodes are in the tree:
    Find all edges that join nodes in the tree to nodes not in the tree.
    Of those edges, choose one with the minimum weight.
    Add that edge and the connected node to the tree.
Analysis
We can now start to analyse the algorithm like so:
At every iteration of the loop, we add one node to the tree. Since there are V nodes, it follows that there are O(V) iterations of this loop.
Within each iteration of the loop, we need to find and test the edges that join the tree to nodes outside it. If there are E edges, the naive searching implementation uses O(E) to find the edge with minimum weight.
So in combination, we should expect the complexity to be O(VE), which may be O(V^3) in the worst case.
However, jozefg gave a good answer to show how to achieve O(V^2) complexity.
Distance to Tree

              |  A  B  C  D  E
  ------------|-----------------
  Iteration 0 |  0  1* #  3  2
            1 |  0  0  #  3  2*
            2 |  0  0  3  1* 0
            3 |  0  0  3* 0  0
            4 |  0  0  0  0  0

NB. # = infinity (not connected to tree)
    * = minimum weight edge in this iteration
Here the distance vector represents the smallest weighted edge joining each node to the tree, and is used as follows:
Initialize with the edge weights to the starting node A with complexity O(V).
To find the next node to add, simply find the minimum element of distance (and remove it from the list). This is O(V).
After adding a new node, there are O(V) new edges connecting the tree to the remaining nodes; for each of these determine if the new edge has less weight than the existing distance. If so, update the distance vector. Again, O(V).
Using these three steps reduces the searching time from O(E) to O(V), and adds an extra O(V) step to update the distance vector at each iteration. Since each iteration is now O(V), the overall complexity is O(V^2).
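A Python sketch of this O(V^2) scheme (assumption: the graph is an adjacency matrix W in which W[i][j] == 0 means "no edge", as in the matrix above):

INF = float('inf')

def prim_mst_cost(W, start=0):
    """O(V^2) Prim over an adjacency matrix; returns the total MST weight."""
    V = len(W)
    in_tree = [False] * V
    distance = [INF] * V                # cheapest known edge joining each vertex to the tree
    distance[start] = 0
    total = 0
    for _ in range(V):                  # V iterations ...
        # ... each scanning the distance array once: O(V) per iteration
        u = min((v for v in range(V) if not in_tree[v]), key=lambda v: distance[v])
        if distance[u] == INF:
            break                       # remaining vertices are unreachable (disconnected graph)
        in_tree[u] = True
        total += distance[u]
        for v in range(V):              # update distances using u's row of the matrix
            if W[u][v] and not in_tree[v] and W[u][v] < distance[v]:
                distance[v] = W[u][v]
    return total

On the 5-vertex matrix shown earlier this returns 1 + 2 + 1 + 3 = 7, the sum of the starred entries in the trace.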
First of all, it's obviously at least O(V^2), because that is how big the adjacency matrix is.
Looking at http://en.wikipedia.org/wiki/Prim%27s_algorithm, you need to execute the step "Repeat until Vnew = V" V times.
Inside that step, you need to work out the shortest link between any vertex in Vnew and any vertex outside Vnew. Maintain an array of size V, holding for each vertex either infinity (if the vertex is already in Vnew) or the length of the shortest link between that vertex and any vertex in Vnew (in the beginning this just comes from the lengths of the links between the starting vertex and every other vertex). To find the next vertex to add to Vnew, just search this array, at cost V. Once you have a new vertex, look at all the links from that vertex to every other vertex and see if any of them give shorter links from Vnew to those vertices. If they do, update the array. This also costs V.
So you have V steps (V vertices to add), each taking cost V, which gives you O(V^2).

How to find the number of different shortest paths between two vertices, in directed graph and with linear-time?

Here is the exercise:
Let v and w be two vertices in a directed graph G = (V, E). Design a linear-time algorithm to find the number of different shortest paths (not necessarily vertex disjoint) between v and w. Note: the edges in G are unweighted
For this exercise, I summarise as follows:
It is a directed graph.
It asks for the number of different shortest paths. First, the paths should be shortest; then there might be more than one such shortest path, all of the same length.
between v and w, so both from v to w and from w to v should be counted.
linear-time.
The graph is not weighted.
From the points above, I have the following thoughts:
1. I don't need to use Dijkstra's algorithm, because the graph is not weighted and we are trying to find all shortest paths, not just a single one.
2. I maintain a count of the number of shortest paths.
3. I would like to use BFS from v first, and also maintain a global level counter.
4. I increase the global level by one each time the BFS reaches a new level.
5. I also maintain the shortest level, i.e. the level of the shortest path to w.
6. The first time I meet w while traversing, I assign the global level to the shortest level and do count++.
7. As long as the global level equals the shortest level, I increase count each time I meet w again.
8. If the global level becomes bigger than the shortest level, I terminate the traversal, because I am looking for shortest paths, not arbitrary paths.
Then I do steps 2-8 again from w to v.
Is my algorithm correct? If I do v to w and then w to v, is that still considered as linear-time?
Here are some ideas on this.
There can only be multiple shortest paths from v->w through a node x either if there are multiple paths into x through the same vertex, or if x is encountered multiple times at the same BFS level.
Proof: If there are multiple paths entering x through the same vertex, there are obviously multiple ways through x. This is simple. Now let us assume there is (at most) only one way into x through each vertex going into x.
If x has been encountered before, none of the current paths can contribute to another shortest path. Since x has been encountered before, all paths that can follow will be at least one longer than the previous shortest path. Hence none of these paths can contribute to the sum.
This means, however, that we encounter each node at most once and are done. So a normal BFS is just fine.
We do not even need to know the level, instead we can get the final number once we encounter the final node.
This can be compiled into a very simple algorithm, which is mainly just BFS.
- Mark nodes as visited as usual with BFS.
- Instead of adding just nodes to the queue in the BFS, add nodes plus the number of incoming paths.
- If a node that has already been visited should be added, ignore it.
- If you find a node again that is currently in the queue, do not add it again; instead, add the counts together.
- Propagate the counts on the queue when adding new nodes.
- When you encounter the final node, the number stored with it is the number of possible paths.
Your algorithm breaks on a graph like
  *   *   *   1
 / \ / \ / \ / \
v   *   *   *   w
 \ / \ / \ / \ /
  *   *   *   2
with all edges directed left to right. It counts two paths, one through 1 and the other through 2, but both 1 and 2 can be reached from v by eight different shortest paths, for a total of sixteen.
As qrqrq shows, your algorithm fails on some graphs, but the idea of BFS is good. Instead, maintain an array z of size |V| which you initialize to zero; keep the number of shortest paths to a discovered vertex i at distance less than level in z[i]. Also maintain an array d of size |V| such that d[i] is the distance from v to vertex i if that distance is less than level. Initialize level to 0, d[v] to 0, and z[v] to 1 (there is a single path of length 0 from v to v), and set all other entries of d to -1 and of z to 0.
Now whenever you encounter an edge from i to j in your BFS, then:
If d[j] = -1, then set d[j] := level and z[j] := z[i].
If d[j] = level, then set z[j] := z[j] + z[i].
Otherwise, do nothing.
The reason is that every shortest path from v to i, extended by the edge (i, j), gives a shortest path from v to j. This will give the number of shortest paths from v to each vertex in linear time. Now do the same again, but starting from w.
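The same bookkeeping as a Python sketch (assuming the directed graph is a dict of adjacency lists; the explicit level counter is replaced by the equivalent test d[j] == d[i] + 1):

from collections import deque

def count_shortest_paths(adj, source):
    """For every vertex, return its BFS distance from source and the number of
    distinct shortest paths from source to it, as two dicts (d, z)."""
    d = {x: -1 for x in adj}
    z = {x: 0 for x in adj}
    d[source] = 0
    z[source] = 1                       # a single path of length 0 from source to itself
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if d[j] == -1:              # first time j is reached: new shortest distance
                d[j] = d[i] + 1
                z[j] = z[i]
                queue.append(j)
            elif d[j] == d[i] + 1:      # another family of shortest paths into j
                z[j] += z[i]
    return d, z

After a run from v, z[w] is the number of shortest v-to-w paths; run it again from w for the other direction.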
This algorithm looks correct to me.
BFS, as you know, is a linear-time (O(N)) search, because the time T required to complete it is at worst T = C + a * N, where N is the size of the graph (nodes plus edges) and C, a are fixed constants.
In your case, performing the search twice - first from v to w, and then from w to v - takes at worst 2T = 2C + 2a * N, which also satisfies the linear-time requirement O(N) if you define a new C' = 2C and a new a' = 2a, because both C' and a' are also fixed constants.
int edgeCb( graphPT g, int x, int y )
{
    if ( dist[ y ] > dist[ x ] + 1 ) {
        dist[ y ] = dist[ x ] + 1;   // New, shorter way to reach y
        ways[ y ] = ways[ x ];       // y can be reached in as many ways as its parent x
    } else if ( dist[ y ] == dist[ x ] + 1 ) {
        ways[ y ] += ways[ x ];      // Another shortest way to reach y
    } else {
        // We already found a shorter way and that is the best
        assert( dist[ y ] < g->nv );
    }
    return 1;
}
The above code gives me proper results for all kinds of graphs mentioned in this post. Basically it is an edge callback for the BFS traversal.
dist[ start ] = 0;
ways[ start ] = 1;
for all other vertices x
    dist[ x ] = numberOfVertices;   // This is beyond the maximum possible distance
BFS( g, start );
If ways[ end ] is not zero, then it represents the number of shortest paths, and dist[ end ] represents the shortest distance.
In case ways[ end ] == 0, end can't be reached from start.
Please let me know if there are any loopholes in this.
Simplest solution, by changing BFS:
Set count(v) = 0 for every vertex and count(s) = 1 for the source s. For every neighbour u of the vertex v being processed, if d(v) + 1 == d(u), then count(u) += count(v). Now reset everything and do the same from the end vertex.
Can I do it this way?
I traverse using BFS until I reach the destination vertex and maintain levels.
Once I reach the destination level, I use the level table as follows:
From the level table, I start traversing back, counting the number of parents of the current vertex on our path (the first time around, this is the destination vertex).
At every step, I multiply the number of distinct parents found at that particular level into the number of shortest paths to the destination vertex.
I move up the levels, only considering the nodes that fall on my path, and multiply the number of distinct parents found at each level, until I reach level 0.
Does this work?
Just check the good explanation given here:
https://www.geeksforgeeks.org/number-shortest-paths-unweighted-directed-graph/
Briefly, we can modify any shortest-path algorithm so that, at the update step, it increases a counter of the shortest paths discovered so far whenever the current path proposal has the same length as the shortest path found up to that moment.
In the particular case of an unweighted graph (or one with the same constant weight on all edges), the easiest way is to modify a BFS.

Number of paths between two nodes in a DAG

I want to find the number of paths between two nodes in a DAG. O(V^2) and O(V+E) are acceptable.
O(V+E) suggests somehow using BFS or DFS, but I don't know how.
Can somebody help?
Do a topological sort of the DAG, then scan the vertices from the target backwards to the source. For each vertex v, keep a count of the number of paths from v to the target. When you get to the source, the value of that count is the answer. That is O(V+E).
The number of distinct paths from node u to v is the sum of distinct paths from nodes x to v, where x is a direct descendant of u.
Store the number of paths to the target node v for each node (temporarily set to 0), then go from v (where the value is 1) in the opposite edge orientation and recompute this value for each node (the sum of the values of all direct descendants) until you reach u.
If you process the nodes in topological order (again with the opposite orientation), you are guaranteed that all direct descendants are already computed when you visit a given node.
Hope it helps.
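A Python sketch of that backward counting (assumptions: the DAG is a dict mapping every vertex, including sinks, to a list of successors; Kahn's algorithm supplies the topological order):

from collections import deque

def count_paths_dag(adj, source, target):
    """Number of distinct source -> target paths in a DAG."""
    # Kahn's algorithm: repeatedly remove vertices of in-degree 0
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    order = []
    queue = deque(v for v in adj if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # scan backwards: count[v] = number of paths from v to target
    count = {v: 0 for v in adj}
    count[target] = 1
    for v in reversed(order):
        if v != target:
            count[v] = sum(count[w] for w in adj[v])
    return count[source]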
This question has been asked elsewhere on SO, but nowhere has the simpler solution of using DFS + DP been mentioned; all solutions seem to use topological sorting. The simpler solution goes like this (paths from s to t):
Add a field to the vertex representation to hold an integer count. Initially, set vertex t’s count to 1 and other vertices’ count to 0. Start running DFS with s as the start vertex. When t is discovered, it should be immediately marked as finished (BLACK), without further processing starting from it. Subsequently, each time DFS finishes a vertex v, set v’s count to the sum of the counts of all vertices adjacent to v. When DFS finishes vertex s, stop and return the count computed for s. The time complexity of this solution is O(V+E).
Pseudo-code:
simple_path(s, t)
    if (s == t)
        return 1
    else if (path_count[s] != NULL)
        return path_count[s]
    else
        path_count[s] = 0
        for each node w ∈ adj[s]
            do path_count[s] = path_count[s] + simple_path(w, t)
        end
        return path_count[s]
    end
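And the same pseudocode as runnable Python (a sketch; adj maps each node to its list of successors):

def simple_path_count(adj, s, t, memo=None):
    """Number of distinct s -> t paths in a DAG, via DFS with memoisation."""
    if memo is None:
        memo = {}
    if s == t:
        return 1
    if s in memo:
        return memo[s]
    # for very deep DAGs, raise Python's recursion limit or use the
    # topological-order version sketched earlier
    memo[s] = sum(simple_path_count(adj, w, t, memo) for w in adj[s])
    return memo[s]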
