Is there an alternative method to the DFS algorithm to check whether there are cycles in a directed graph represented with an adjacency matrix?
I found piecemeal information on the properties of matrices.
Maybe I can multiply matrix A by itself n times and check whether any resulting matrix has a non-zero entry on its diagonal.
While this approach may be right, how can I explicitly extract the list of vertices representing a cycle?
And what about the complexity of this hypothetical algorithm?
Thanks in advance for your help.
Let's say that after n iterations you have a matrix where the cell at row i and column j is M[n][i][j].
By definition, M[n][i][j] = sum over k of (M[n - 1][i][k] * A[k][j]). Say M[13][5][5] > 0, meaning there is a cycle of length 13 starting and ending at 5. For M[13][5][5] > 0 to hold, there must be some k such that M[12][5][k] * A[k][5] > 0. Say k = 6; now you know one more node in the cycle (6). It also follows that M[12][5][6] > 0 and A[6][5] > 0.
For M[12][5][6] > 0 to hold, there must be some k such that M[11][5][k] * A[k][6] > 0. Say k = 9; now you know one more node in the cycle (9). It also follows that M[11][5][9] > 0 and A[9][6] > 0.
Then you can repeat the same step to recover the remaining nodes in the cycle.
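To make the back-tracking concrete, here is a small Python sketch of the whole procedure (my own illustration, so treat it as a sketch rather than a reference implementation): it computes the powers M[1..n] naively, finds the smallest L with M[L][s][s] > 0, and then walks backwards through the powers exactly as described above to recover the cycle. With the naive multiplication this costs O(n^4) overall, which also answers the complexity question for this approach.

def find_cycle(A):
    # A: n x n 0/1 adjacency matrix of a directed graph
    n = len(A)
    powers = [None, [row[:] for row in A]]  # powers[L] = A^L
    for L in range(2, n + 1):
        prev = powers[L - 1]
        powers.append([[sum(prev[i][k] * A[k][j] for k in range(n))
                        for j in range(n)] for i in range(n)])
    for L in range(1, n + 1):               # smallest cycle length first
        for s in range(n):
            if powers[L][s][s] > 0:         # closed walk of length L at s
                cycle = [s]
                target = s
                for l in range(L, 1, -1):   # find a predecessor k of target
                    k = next(k for k in range(n)
                             if powers[l - 1][s][k] > 0 and A[k][target] > 0)
                    cycle.append(k)
                    target = k
                return list(reversed(cycle))  # vertices in cycle order
    return None                             # the graph is acyclic

Because the lengths are tried in increasing order, the first closed walk found is a shortest one, and a shortest closed walk is always a simple cycle.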
Depth-first search can be modified to decide the existence of a cycle. The first time that the algorithm discovers a node which has previously been visited, the cycle can be extracted from the stack, as the previously found node must still be on the stack; it would make sense to use a user-defined stack instead of the call stack. The complexity would be O(|V|+|E|), as for unmodified depth-first search itself.
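As a sketch of that (my own, assuming adj maps every vertex to a list of its successors), the modified DFS could look like this in Python; the explicit stack list plays the role of the user-defined stack, so the cycle can be sliced out as soon as a back edge is found:

def find_cycle_dfs(adj):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}  # assumes every vertex is a key of adj
    stack = []                       # the current DFS path

    def visit(v):
        color[v] = GRAY
        stack.append(v)
        for w in adj[v]:
            if color[w] == GRAY:               # back edge: w is on the stack
                return stack[stack.index(w):]  # the cycle w ... v
            if color[w] == WHITE:
                cycle = visit(w)
                if cycle:
                    return cycle
        color[v] = BLACK
        stack.pop()
        return None

    for v in adj:
        if color[v] == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None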
I was torn between these two methods:
M1:
Use adjacency list to represent graph G with vertices P and edges A
Use DFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] > 6, return false; otherwise return true
M2:
Use adjacency list to represent graph G with vertices P and edges A
Use BFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] > 6, return false; otherwise return true
Both these methods have a worst case of O(|P| + |A|), so I think both would be a correct answer to this question. I had chosen the DFS method, reasoning that with DFS you should be able to find the "outlier" of freedom degree 7 earlier than with BFS, since with BFS you would have to traverse every single vertex up to degree 7 in every case.
Apparently this is wrong according to the teacher: using DFS, you can't compute the distances. I don't understand why you wouldn't be able to compute the distances. I could keep a number n indicating the degree of freedom I am currently at. Starting from root p, the child would have n = 1. Now I store n in array d. Then I keep traversing down until no child is to be found, incrementing n and storing the value in my array d. When the back-tracking starts, the value n is decremented until we find an unvisited child node of any of the visited nodes on the stack. If there is an unvisited child, increment once again, then keep incrementing until no more children are found, and decrement until the next unvisited child from the stack is found, and so on...
I believe that would be a way to store the distances with DFS
Both BFS and DFS can do the job: they can both limit their search to a depth of 6, and at the end of the traversal they can check whether the whole population was reached or not. But there are some important differences:
With BFS
The BFS traversal is the algorithm I would opt for. When a BFS search determines the degree of a person, it is definitive: no correction needs to be made to it.
Here is a sketch of how you can do this with BFS:
visited = set()    # empty set
frontier = []      # empty array
visited.add(p)     # search starts at person p
frontier.append(p)
for degree in [1, 2, 3, 4, 5, 6]:
    nextFrontier = []  # empty array
    for person in frontier:
        for acquaintance in A[person]:
            if acquaintance not in visited:
                visited.add(acquaintance)
                nextFrontier.append(acquaintance)
    frontier = nextFrontier
    if size(visited) == size(P):  # have we reached the whole population?
        return True
# After six rounds we did not reach all people, so...
return False
This assumes that you can find the list of acquaintances for a given person via A[person]. If A is not structured like an adjacency list but as a list of pairs, then first do some preprocessing on the original A to create such an adjacency list.
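For example, a minimal version of that preprocessing (assuming the raw input is a list of (a, b) acquaintance pairs called pairs):

from collections import defaultdict

A = defaultdict(list)  # adjacency list keyed by person
for a, b in pairs:     # pairs: the raw list of acquaintance pairs
    A[a].append(b)
    A[b].append(a)     # acquaintance is mutual in this problem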
With DFS
A DFS algorithm has as downside that it will not necessarily start with optimal paths, and so it will find that some persons have degree 6, while there really are shorter, uninvestigated paths that could improve on that degree. This means that a DFS algorithm may need to revisit nodes and even partial paths (edges) to register such improvements and cascade them through a visited path up to degree 6. And there might even be several improvements to be applied for the same person.
A DFS algorithm could look like this:
degreeOfPerson = dict()  # empty key/value dictionary
for person in P:
    degreeOfPerson[person] = 7  # some value greater than 6

function dfs(person, degree):
    if degree >= 7:
        return  # don't lose time for higher degrees than 6.
    for acquaintance in A[person]:
        if degree < degreeOfPerson[acquaintance]:  # improvement?
            degreeOfPerson[acquaintance] = degree
            dfs(acquaintance, degree+1)

# start DFS
degreeOfPerson[p] = 0
dfs(p, 1)

# Check if all persons got a degree of maximum 6
for person in P:
    if degreeOfPerson[person] > 6:
        return False
return True
Example
If the graph has three nodes, linked as a triangle a-b-c, with starting point a, then this would be the sequence. Indentation means (recursive) call of dfs:
degreeOfPerson[a] = 0
a->b: degreeOfPerson[b] = 1
    b->c: degreeOfPerson[c] = 2
        c->a: # cannot improve degreeOfPerson[a]. Backtrack
        c->b: # cannot improve degreeOfPerson[b]. Backtrack
    b->a: # cannot improve degreeOfPerson[a]. Backtrack
a->c: degreeOfPerson[c] = 1 # improvement!
    c->a: # cannot improve degreeOfPerson[a]. Backtrack
    c->b: # cannot improve degreeOfPerson[b]. Backtrack
Time Complexity
The number of times the same edge can be visited with DFS is not more than the maximum degree we are looking for -- in your case 6. If that is a constant, then it does not affect the time complexity. If however the degree to check for is an input value, then the time complexity of DFS becomes O(maxdegree * |E| + |V|).
A simple depth-first search algorithm does not necessarily yield the shortest path in an undirected graph. For example, consider a simple triangle graph. If you start at one vertex, you will process the other two vertices. A naive algorithm will find that one of them is at distance one from the source and the other at distance two. However, this is incorrect, since the distance from the source to either vertex is actually one.
A much more natural approach is to use the breadth-first search (BFS) algorithm. It can be shown that a breadth-first search computes shortest paths, and it requires significantly fewer modifications.
You definitely can use depth-first search to compute the distances from one node to another, but it is not a natural approach. In fact, it is very common to miscompute distances using a depth-first search algorithm (see: http://www-student.cse.buffalo.edu/~atri/cse331/support/dfs-bfs/index.html), particularly when the underlying graph has cycles. There are some special cases you must handle if you want to do it this way, but it definitely is possible.
With that being said, the depth-first search algorithm you describe does not appear to be correct. For example, it will fail on the triangle graph that I described above. This is true because the standard depth-first search only visits each vertex once, and you would not revisit a vertex after its distance has been set. Thus, if you take the "longer path" to a vertex in a cycle at first, you will end up with an incorrect distance value.
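For reference, a minimal BFS that computes the distances correctly (my own sketch, assuming adj maps each vertex to a list of its neighbours):

from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:        # first visit is the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

On the triangle graph above, bfs_distances correctly assigns distance 1 to both neighbours of the source.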
Given an undirected graph (not weighted) and two vertices u and v,
how can I find a path between u and v whose length is divisible by 3?
Note that the path need not necessarily be a simple path.
I thought about a variation of DFS and a stack that stores the path (and for backtracking), but cannot fully understand how to keep track of non-simple paths as well.
The time complexity should be O(V+E), so I expect it must be a variant of BFS or DFS.
One way to do this is to compute a modified version of the graph and do a BFS or DFS on that graph.
Imagine stacking the graph on top of itself three times. Each node now appears three times. Annotate the first copy as "mod 0," the second copy as "mod 1," and the third copy as "mod 2." Then, change the edges so that any edge from a node u to a node v always goes from the node u to the node v in the next layer of the graph. Thus if there was an edge from u to v, there is now an edge from u mod 0 to v mod 1, u mod 1 to v mod 2, and u mod 2 to v mod 0. If you do a BFS or DFS over this graph and find a path from u mod 0 to any node v mod 0, you necessarily have a path whose length must be a multiple of three.
You can explicitly construct this graph in time O(m + n) by copying the graph two times and rewiring the edges appropriately, and from there a BFS or DFS will take time O(m + n). This uses memory Θ(m + n), though.
An alternative solution would be to simulate doing this without actually building the new graph. Do a BFS, and store for each node three distances - a mod 0 distance, a mod 1 distance, and a mod 2 distance. Whenever you dequeue a node from the queue, enqueue its successors, but tag them as being at the next mod layer (e.g. if you dequeued a node at level mod 0, enqueue its successors at mod 1, etc.) You can independently track whether you have reached a node at distances mod 0, mod 1, and mod 2, and should not enqueue a node at a given mod level multiple times. This also takes time O(m + n), but doesn't explicitly construct the second graph and thus only requires O(n) storage space.
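Here is a minimal Python sketch of that second variant (my own, assuming adj maps each vertex to a list of its neighbours): each BFS state is a (vertex, distance mod 3) pair and is enqueued at most once, so only 3n states exist and the O(m + n) bound holds.

from collections import deque

def has_path_mod3(adj, u, v):
    seen = {(u, 0)}
    queue = deque([(u, 0)])
    while queue:
        node, mod = queue.popleft()
        if node == v and mod == 0:
            return True              # reached v with length divisible by 3
        for nxt in adj[node]:
            state = (nxt, (mod + 1) % 3)
            if state not in seen:    # each (vertex, mod) state visited once
                seen.add(state)
                queue.append(state)
    return False

To recover the actual path, store a parent pointer per (vertex, mod) state and walk it back from (v, 0).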
Hope this helps!
A cheating way:
Just DFS/BFS from A to B; if the length L satisfies L % 3 == 0, finish; if L % 3 == 1, walk to a neighbour and back (adding 2); else, walk to a neighbour and back, and there and back again (adding 4).
If that does not meet your constraints, then:
You could do this using a modified BFS. While doing the search, try to mark all the cycles you encounter; you can get all the simple cycles along with the lengths.
After that, if the path from A to B has length L % 3 == 0, then you found it.
If not, then in the case where all the cycles have lengths Lk with Lk % 3 == 0, there's no solution.
If there is some cycle with length K % 3 != 0, you could first go from A to this cycle, go around it once or twice, then return to A and continue to B. You are guaranteed to find a path whose length is divisible by 3 this way.
Here is pseudocode. It uses BFS with a simple counter and a modulo test.
i := 0
1. Enqueue the node u.
2. i++; dequeue a node and examine it.
3. If the element v is found in this node and i mod 3 == 0, quit the search and return a result.
4. Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
5. If the queue is empty, every node on the graph has been examined – quit the search and return "not found".
6. If the queue is not empty, repeat from Step 2.
NOTE: replacing queue with a stack should work as well. It will be DFS then.
I ran into a dynamic programming problem on interviewstreet named "Far Vertices".
The problem is like:
You are given a tree that has N vertices and N-1 edges. Your task is
to mark as small a number of vertices as possible so that the maximum
distance between two unmarked vertices be less than or equal to K. You
should write this value to the output. Distance between two vertices i
and j is defined as the minimum number of edges you have to pass in
order to reach vertex i from vertex j.
I was trying to do dfs from every node of the tree, in order to find the max connected subset of the nodes, so that every pair of subset did not have distance more than K.
But I could not define the state, and transitions between states.
Is there anybody that could help me?
Thanks.
The problem consists essentially of finding the largest subtree of diameter <= k, and subtracting its size from n. You can solve it using DP.
Some useful observations:
The diameter of a tree rooted at node v (T(v)) is:
0 if v has no children,
max(diameter T(c), height T(c) + 1) if there is one child c,
max(max(diameter T(c)) over all children c of v, max(height T(c1) + height T(c2) + 2) over all pairs of distinct children c1, c2 of v) otherwise.
Since we care about maximizing tree size and bounding tree diameter, we can flip the above around to suggest limits on each subtree:
For any tree rooted at v, the subtree of interest is at most k deep.
If n is a node in T(v) and none of its children are <= k away from v (i.e. n sits at the depth limit), the maximum size of T(n) is 1.
If n has one child c, the maximum size of T(n) of diameter <= k is max size T(c) + 1.
Now for the tricky bit. If n has more than one child, we have to find all the possible tree sizes resulting from allocating the available depth to each child. So say we are at depth 3, k = 7, we have 4 depth left to play with. If we have three children, we could allocate all 4 to child 1, 3 to child 1 and 1 to child 2, 2 to child 1 and 1 to children 2 and 3, etc. We have to do this carefully, making sure we don't exceed diameter k. You can do this with a local DP.
What we want for each node is to calculate maxSize(d), which gives the max size of the tree rooted at that node that is at most d deep and has diameter <= k. Nodes with 0 and 1 children are easy to figure this for, as above (for example, for one child, v.maxSize(i) = c.maxSize(i - 1) + 1, and v.maxSize(0) = 1).

For nodes with 2 or more children, you compute dp[i][j], which gives the max size of a k-diameter-bound tree using up to the ith child and taking up to j depth. The recursion is dp[i][j] = max(child(i).maxSize(m - 1) + dp[i - 1][min(j, k - m)]) for m from 1 to j, with dp[i][0] = 1. This says: try giving the ith child 1 to j depth, and give the rest of the available depth to the previous children. The "rest of the available depth" is the minimum of j (the depth we are working with) and k - m, because the depth given to child i plus the depth given to the rest cannot exceed k. Transfer the values of the last row of dp to the maxSize table for this node.

If you run the above using a depth-limited DFS, it will compute all the necessary maxSize entries in the correct order, and the answer for node v is v.maxSize(k). Then you do this once for every node in the tree, and the answer is the maximum value found.
Sorry for the muddled nature of the explanation. It was hard for me to think through, and difficult to describe. Working through a few simple examples should make it clearer. I haven't calculated the complexity, but n is small, and it went through all the test cases in .5 to 1s in Scala.
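Here is a compact Python sketch of the DP described above (my own reconstruction, so treat it as illustrative rather than the reference implementation): dfs(v) returns the maxSize table for v, and the answer subtracts the best root's value from n.

def min_marked(n, k, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def max_size(root):
        # dfs(v) returns ms, where ms[j] is the max size of a tree rooted
        # at v that is at most j deep and has diameter <= k
        def dfs(v, parent):
            dp = [1] * (k + 1)       # no children used yet: just v itself
            for c in adj[v]:
                if c == parent:
                    continue
                cms = dfs(c, v)
                ndp = dp[:]          # option: give this child no depth
                for j in range(1, k + 1):
                    for m in range(1, j + 1):
                        # give child c depth m (its edge included); earlier
                        # children get at most min(j, k - m), so any path
                        # through v stays within diameter k
                        cand = cms[m - 1] + dp[min(j, k - m)]
                        if cand > ndp[j]:
                            ndp[j] = cand
                dp = ndp
            return dp
        return dfs(root, -1)[k]

    return n - max(max_size(r) for r in range(n))

Each root costs O(n * k^2) in the worst case, so the whole thing is O(n^2 * k^2), which is fine for the problem's small n.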
A few basic things I can notice (maybe very obvious to others):
1. There is only one route possible between two given vertices.
2. The farthest vertices will be ones with only one edge (the leaves).
Now to solve the issue.
I would start with the set of vertices that have only one edge, call them EDGE[], and calculate the distances between the vertices in EDGE[]. This will give you (EDGE[i], EDGE[j], distance) value triples.
For all the vertex pairs in EDGE[] that have a distance > K, do:
EDGE[i].occur++; EDGE[i].distance = MAX(EDGE[i].distance, distance)
EDGE[j].occur++; EDGE[j].distance = MAX(EDGE[j].distance, distance)
Find the CANDIDATES in EDGE[] that have max(distance); among those, mark the ones with max(occur).
Repeat until all edge-vertex pairs have distance less than or equal to K.
'Length' of a path is the number of edges in the path.
Given a source and a destination vertex, I want to find the number of paths form the source vertex to the destination vertex of given length k.
We can visit each vertex as many times as we want, so if a path from a to b goes like this: a -> c -> b -> c -> b it is considered valid. This means there can be cycles and we can go through the destination more than once.
Two vertices can be connected by more than one edge. So if vertex a and vertex b are connected by two edges, then the paths a -> b via edge 1 and a -> b via edge 2 are considered different.
Number of vertices N is <= 70, and K, the length of the path, is <= 10^9.
As the answer can be very large, it is to be reported modulo some number.
Here is what I have thought so far:
We can use breadth-first search without marking any vertices as visited; at each iteration, we keep track of the number of edges n_e required for that path and the product p of the number of duplicate edges each edge in our path has.
The search should terminate if n_e is greater than k; if we ever reach the destination with n_e equal to k, we terminate the search and add p to our count of the number of paths.
I think we could use depth-first search instead of breadth-first search, as we do not need the shortest path and the size of the queue used in breadth-first search might not be enough.
The second algorithm I am thinking about is something similar to the Floyd-Warshall algorithm, using this approach. Only we don't need a shortest path, so I am not sure this is correct.
The problem I have with my first algorithm is that K can be up to 1000000000, which means my search would run until it has 10^9 edges; the edge count n_e is incremented by just 1 at each level, which will be very slow, and I am not sure it will ever terminate for large inputs.
So I need a different approach to solve this problem; any help would be greatly appreciated.
So, here's a nifty graph theory trick that I remember for this one.
Make an adjacency matrix A, where A[i][j] is 1 if there is an edge between i and j, and 0 otherwise (if the graph has parallel edges, let A[i][j] be the number of edges between i and j).
Then, the number of paths of length k between i and j is just the [i][j] entry of A^k.
So, to solve the problem, build A and construct A^k using matrix multiplication (the usual trick for doing exponentiation applies here). Then just look up the necessary entry.
EDIT: Well, you need to do the modular arithmetic inside the matrix multiplication to avoid overflow issues, but that's a much smaller detail.
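A sketch of that in Python (plain lists; the helper names are mine): the modulus is applied inside the multiplication, and exponentiation by squaring gives O(N^3 log K) time, which comfortably handles K up to 10^9 for N <= 70.

def mat_mult(X, Y, mod):
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if X[i][k]:
                for j in range(n):
                    Z[i][j] = (Z[i][j] + X[i][k] * Y[k][j]) % mod
    return Z

def mat_pow(A, k, mod):
    # A^k mod `mod` by repeated squaring
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while k:
        if k & 1:
            R = mat_mult(R, A, mod)
        A = mat_mult(A, A, mod)
        k >>= 1
    return R

def count_walks(A, s, t, k, mod):
    # A[i][j] = number of edges between i and j (parallel edges supported)
    return mat_pow(A, k, mod)[s][t]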
Actually, the [i][j] entry of A^k counts the number of distinct walks, not paths, between i and j in a simple graph. This can easily be proved by induction.
However, the harder question is to count the number of distinct paths in a given graph.
There are quite a few different algorithms for that, but an upper bound on the count is:
(n-2)*(n-3)*...*(n-k), where k is the given parameter stating the length of the path.
Let me add some more content to the above answers (as this is the extended problem I faced). The extended problem is:
Find the number of paths of length k in a given undirected tree.
The solution is simple: for the given adjacency matrix A of the graph G, compute A^(k-1) and A^k, take the difference A^k - A^(k-1), and count the number of 1s in the elements above the diagonal (or below).
Let me also add the Python code.
import numpy as np

def count_paths(v, n, a):
    # v: number of vertices, n: expected path length, a: adjacency matrix
    paths = 0
    b = np.array(a, copy=True)
    for i in range(n - 2):
        b = np.dot(b, a)   # b = a^(n-1)
    c = np.dot(b, a)       # c = a^n
    x = c - b
    for i in range(v):
        for j in range(i + 1, v):
            if x[i][j] == 1:
                paths = paths + 1
    return paths

print(count_paths(5, 2, np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
])))
Can someone explain to me why Prim's algorithm using an adjacency matrix results in a time complexity of O(V^2)?
(Sorry in advance for the sloppy looking ASCII math, I don't think we can use LaTEX to typeset answers)
The traditional way to implement Prim's algorithm with O(V^2) complexity is to have an array in addition to the adjacency matrix; let's call it distance, and have it hold, for each vertex, the minimum distance from that vertex to the tree built so far.
This way, we only ever check distance to find the next target, and since we do this V times and there are V members of distance, our complexity is O(V^2).
This on its own wouldn't be enough, as the original values in distance would quickly become out of date. To update this array, all we do at the end of each step is iterate through our adjacency matrix and update distance appropriately. This doesn't affect our time complexity, since it merely means that each step takes O(V+V) = O(2V) = O(V). Therefore our algorithm is O(V^2).
Without using distance, we would have to iterate through all E edges every single time, and E can be as large as V^2, meaning our time complexity would be O(V^3).
Proof:
To prove that without the distance array it is impossible to compute the MST in O(V^2) time, consider that then on each iteration with a tree of size n, there are V-n vertices to potentially be added.
To calculate which one to choose we must check each of these to find their minimum distance from the tree and then compare that to each other and find the minimum there.
In the worst case scenario, each of the nodes contains a connection to each node in the tree, resulting in n * (V-n) edges and a complexity of O(n(V-n)).
Since our total would be the sum of each of these steps as n goes from 1 to V, our final time complexity is:
(sum O(n(V-n)) as n = 1 to V) = O(1/6(V-1) V (V+1)) = O(V^3)
QED
Note: This answer just borrows jozefg's answer and tries to explain it more fully since I had to think a bit before I understood it.
Background
An Adjacency Matrix representation of a graph constructs a V x V matrix (where V is the number of vertices). The value of cell (a, b) is the weight of the edge linking vertices a and b, or zero if there is no edge.
Adjacency Matrix
      A  B  C  D  E
   ----------------
A     0  1  0  3  2
B     1  0  0  0  2
C     0  0  0  4  3
D     3  0  4  0  1
E     2  2  3  1  0
Prim's Algorithm is an algorithm that takes a graph and a starting node, and finds a minimum spanning tree on the graph - that is, it finds a subset of the edges so that the result is a tree that contains all the nodes and the combined edge weights are minimized. It may be summarized as follows:
1. Place the starting node in the tree.
2. Repeat until all nodes are in the tree:
   a. Find all edges that join nodes in the tree to nodes not in the tree.
   b. Of those edges, choose one with the minimum weight.
   c. Add that edge and the connected node to the tree.
Analysis
We can now start to analyse the algorithm like so:
At every iteration of the loop, we add one node to the tree. Since there are V nodes, it follows that there are O(V) iterations of this loop.
Within each iteration of the loop, we need to find and test the edges that join the tree to the remaining nodes. If there are E edges, a naive searching implementation uses O(E) to find the minimum-weight such edge.
So in combination, we should expect the complexity to be O(VE), which may be O(V^3) in the worst case.
However, jozefg gave a good answer to show how to achieve O(V^2) complexity.
Distance to Tree
              |  A  B  C  D  E
              |----------------
Iteration  0  |  0  1* #  3  2
           1  |  0  0  #  3  2*
           2  |  0  0  4  1* 0
           3  |  0  0  3* 0  0
           4  |  0  0  0  0  0

NB. # = infinity (not connected to tree)
    * = minimum weight edge in this iteration
Here the distance vector represents the smallest weighted edge joining each node to the tree, and is used as follows:
Initialize with the edge weights to the starting node A with complexity O(V).
To find the next node to add, simply find the minimum element of distance (and remove it from the list). This is O(V).
After adding a new node, there are O(V) new edges connecting the tree to the remaining nodes; for each of these determine if the new edge has less weight than the existing distance. If so, update the distance vector. Again, O(V).
Using these three steps reduces the searching time from O(E) to O(V), and adds an extra O(V) step to update the distance vector at each iteration. Since each iteration is now O(V), the overall complexity is O(V^2).
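To tie the analysis together, here is a sketch in Python (my own; it assumes W is the V x V weight matrix with 0 meaning "no edge"):

import math

def prim_total_weight(W):
    V = len(W)
    in_tree = [False] * V
    dist = [math.inf] * V   # cheapest edge joining each node to the tree
    dist[0] = 0             # start from vertex 0 (A in the example)
    total = 0
    for _ in range(V):      # O(V) iterations...
        u = min((x for x in range(V) if not in_tree[x]),
                key=lambda x: dist[x])         # ...each with an O(V) search
        in_tree[u] = True
        total += dist[u]
        for v in range(V):                     # ...and an O(V) update
            if not in_tree[v] and W[u][v] != 0 and W[u][v] < dist[v]:
                dist[v] = W[u][v]
    return total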
First of all, it's obviously at least O(V^2), because that is how big the adjacency matrix is.
Looking at http://en.wikipedia.org/wiki/Prim%27s_algorithm, you need to execute the step "Repeat until Vnew = V" V times.
Inside that step, you need to work out the shortest link between any vertex in Vnew and any vertex outside Vnew. Maintain an array of size V, holding for each vertex either infinity (if the vertex is already in Vnew) or the length of the shortest link between any vertex in Vnew and that vertex (in the beginning this just comes from the lengths of the links between the starting vertex and every other vertex). To find the next vertex to add to Vnew, just search this array, at cost V. Once you have a new vertex, look at all the links from that vertex to every other vertex and see if any of them give shorter links from Vnew to that vertex. If they do, update the array. This also costs V.
So you have V steps (V vertices to add), each of cost V, which gives you O(V^2).