I have a "tree"-like structure of nodes and I'm trying to figure out an algorithm that will find previous "chokepoint" when end node is given. Here is a picture to better demonstrate:
So when 15 is specified as end node I want to find 7
And if 7 is specified as end node I want to find 1
But in the example above, if anything other than 7, 15 or 16 is specified as the end node, the found node is the previous one, since that is the only node connecting to the end node.
So the node I am searching for is the previous node that all paths must go through to get to the end node.
I tried an algorithm where I start from the end node and go backwards (using breadth-first search); every node I find that has 2 or more outputs I add to a new list, and nodes with one output I skip. For example, with 15 as the end node, I end up adding 10 and 7 to the list of potential nodes, but I'm not sure how to proceed from there, since I should not continue traversing from 7.
Is there potentially an algorithm out there that already does that and if not how could I achieve this?
I believe your "choke points" are what is commonly known as "dominators". In a directed graph, one node X dominates another Y if all paths to Y must go through X. In your graph, 1 and 7 dominate all greater nodes.
See: https://en.wikipedia.org/wiki/Dominator_(graph_theory)
The dominators of a directed graph form a tree. The Wikipedia article gives a simple algorithm for finding them all in quadratic time.
You can do it in linear time, but it's tricky. The classic algorithm is from Lengauer and Tarjan. You can find it here: https://www.cs.princeton.edu/courses/archive/fall03/cs528/handouts/a%20fast%20algorithm%20for%20finding.pdf
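As a rough illustration of the simple quadratic approach (not Lengauer and Tarjan's algorithm), here is a minimal sketch of the iterative data-flow computation of dominator sets. The graph `example_graph`, the function names, and the assumption that every node is reachable from the root are all made up for this example; the graph is not the one from the question's picture.

```python
from collections import defaultdict


def dominators(succ, root):
    """Return {node: set of nodes that dominate it}, assuming all nodes are reachable from root."""
    nodes = set(succ) | {v for targets in succ.values() for v in targets}
    pred = defaultdict(set)
    for u, targets in succ.items():
        for v in targets:
            pred[v].add(u)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            pred_sets = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_sets) if pred_sets else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom, node):
    """The closest strict dominator: the one that is itself dominated by all the others."""
    strict = dom[node] - {node}
    return max(strict, key=lambda d: len(dom[d])) if strict else None


# Hypothetical graph: 1 splits into 2 and 3, which rejoin at 4; 4 splits again and rejoins at 7.
example_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7]}
dom = dominators(example_graph, 1)
previous_chokepoint_of_7 = immediate_dominator(dom, 7)
previous_chokepoint_of_4 = immediate_dominator(dom, 4)
```

The "previous chokepoint" of an end node corresponds to its immediate dominator, which is the parent of that node in the dominator tree.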
A topological sort is an ordering of the graph such that each arrow agrees with the order. For example in your example we might come up with the order:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 15
Do a topological sort using any of the O(n) algorithms for it.
Next, we walk the graph and track how many incoming edges each node has.
Finally we walk the graph in our sorted order and track how many edges we have seen one end of but not the other, and how many nodes have no incoming edges. Any time we come to a node where every edge seen so far has ended and every future node has incoming edges, that is a chokepoint.
After that, we can prepare two maps. The first is from each node to its topological order. The second is a balanced binary tree of where the chokepoints are.
The analysis in advance is O(n). The actual lookups are now O(log(n)).
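One possible reading of these steps is sketched below. It assumes a DAG with a single source in which every node appears as a key of `succ` (sinks map to an empty list); it uses a sorted list with `bisect` in place of the balanced binary tree; and the graph and all names are hypothetical. Note that this finds nodes that every path from earlier nodes to later nodes passes through, which is not always the same as the immediate dominator of a given end node.

```python
from bisect import bisect_left
from collections import deque


def analyse(succ):
    """Return (topological order, position map, sorted positions of chokepoints)."""
    indeg = {n: 0 for n in succ}
    for targets in succ.values():
        for v in targets:
            indeg[v] += 1

    # Kahn's algorithm for the topological order.
    remaining = dict(indeg)
    queue = deque(n for n in succ if remaining[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)

    pos = {n: i for i, n in enumerate(order)}
    cuts = []
    open_edges = 0  # edges whose start has been seen but whose end has not
    for n in order:
        open_edges -= indeg[n]        # edges ending at n are now closed
        if open_edges == 0:           # nothing jumps over n
            cuts.append(pos[n])
        open_edges += len(succ[n])    # edges leaving n are now open
    return order, pos, cuts


def previous_chokepoint(order, pos, cuts, end):
    """O(log n) lookup of the last chokepoint strictly before `end` in the order."""
    i = bisect_left(cuts, pos[end])
    return order[cuts[i - 1]] if i > 0 else None


# Hypothetical graph, not the one from the question's picture.
example_graph = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}
order, pos, cuts = analyse(example_graph)
before_7 = previous_chokepoint(order, pos, cuts, 7)
before_2 = previous_chokepoint(order, pos, cuts, 2)
```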
Typically, in Dijkstra's algorithm, for each encountered node, we check whether that node was processed before attempting to update the distances of its neighbors and adding them to the queue. This method is under the assumption that if a distance to a node is set once then the distance to that node cannot improve for the rest of the algorithm, and so if the node was processed once already, then the distances to its neighbors cannot improve. However, this is not true for graphs with negative edges.
If there are no negative cycles and we remove that "processed" check, will the algorithm always work for graphs with negative edges?
Edit: an example of a graph where the algorithm would fail would be nice
Edit 2: Java code https://pastebin.com/LSnfzBW4
Example usage:
3 3 1 <-- 3 nodes, 3 edges, starting point at node 1
1 2 5 <-- edge of node 1 and node 2 with a weight of 5 (unidirectional)
2 3 -20 <-- more edges
1 3 2
The algorithm will produce the correct answer, but since nodes can now be visited multiple times the time complexity will be exponential.
Here's an example demonstrating the exponential complexity:
w(1, 3) = 4
w(1, 2) = 100
w(2, 3) = -100
w(3, 5) = 2
w(3, 4) = 50
w(4, 5) = -50
w(5, 7) = 1
w(5, 6) = 25
w(6, 7) = -25
If the algorithm is trying to find the shortest path from node 1 to node 7, it will first reach node 3 via the edge with weight 4 and then explore the rest of the graph. Then, it will find a shorter path to node 3 by going to node 2 first, and then it will explore the rest of the graph again.
Every time the algorithm reaches one of the odd indexed nodes, it will first go to the next odd indexed node via the direct edge and explore the rest of the graph. Then it will find a shorter path to the next odd indexed node via the even indexed node and explore the rest of the graph again. This means that every time one of the odd indexed nodes is reached, the rest of the graph will be explored twice, leading to a complexity of at least O(2^(|V|/2)).
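To make the repeated exploration visible, here is a small sketch of Dijkstra's algorithm with the "processed" check removed, run on the edge weights listed above. The function name and the `pops` counter are additions for this illustration only; this is not the Java code from the question.

```python
import heapq


def dijkstra_no_processed_check(edges, source):
    """Dijkstra variant that never skips a popped node; returns (distances, number of pops)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])

    dist = {n: float("inf") for n in adj}
    dist[source] = 0
    heap = [(0, source)]
    pops = 0
    while heap:
        d, u = heapq.heappop(heap)
        pops += 1
        for v, w in adj[u]:
            if d + w < dist[v]:  # push only when the distance improves
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, pops


edges = [
    (1, 3, 4), (1, 2, 100), (2, 3, -100),
    (3, 5, 2), (3, 4, 50), (4, 5, -50),
    (5, 7, 1), (5, 6, 25), (6, 7, -25),
]
dist, pops = dijkstra_no_processed_check(edges, 1)
```

The final distances are correct, but `pops` ends up well above the number of nodes, because nodes 3, 5 and 7 are re-queued each time a shorter route through an even-numbered node is discovered.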
If I understand your question correctly, I don't think it's possible. Without the processed check the algorithm would fall into an infinite loop. For example, take a bidirectional graph with two nodes, a and b, joined by a single edge. The algorithm first inserts node a into the priority queue; then, since there is an edge between a and b, it inserts node b and pops node a. Since node a is not marked as processed when b is handled, it inserts node a into the priority queue again, and so on, which leads to an infinite loop.
For finding shortest paths in graphs with negative edges, the Bellman-Ford algorithm would be the right way.
If the negative edges all leave the start node, Dijkstra's algorithm works. In other situations it usually doesn't work with negative edges.
Given an undirected graph, I want an algorithm (in O(|V|+|E|)) that will find the heaviest edge in the graph that forms part of a cycle. For example, if my graph is as below, and I run DFS(A), then the heaviest such edge in the graph will be BC.
(*) In this problem, I have at most 1 cycle.
I'm trying to write a modified DFS, that will return the desired heavy edge, but I'm having some trouble.
Because I have at most 1 cycle, I can save the edges in the cycle in an array, and find the maximum edge easily at the end of the run, but I think this answer seems a bit messy, and I'm sure there's a better recursive answer.
I think the easiest way to solve this is to use a union-find data structure (https://en.wikipedia.org/wiki/Disjoint-set_data_structure) in a manner similar to Kruskal's MST algorithm:
Put each vertex in its own set
Iterate through the edges in order of weight. For each edge, merge the sets of the adjacent vertices if they're not already in the same set.
Remember the last edge for which you found that its adjacent vertices were already in the same set. That's the one you're looking for.
This works because the last and heaviest edge that you visit in any cycle must already have its adjacent vertices connected by edges you visited earlier.
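The steps above could be sketched roughly as follows. The vertex numbering (A=0, B=1, C=2, D=3) and the way the weights 10, 8, 5 and 2 are assigned to edges are assumptions, since the original picture is not available. Note that sorting the edges costs O(|E| log |E|), so this sketch does not meet the O(|V|+|E|) bound asked for in the question.

```python
def heaviest_cycle_edge(num_vertices, edges):
    """Return the last (heaviest) edge whose endpoints were already connected, or None."""
    parent = list(range(num_vertices))

    def find(x):
        # Find the set representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    answer = None
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            answer = (u, v, w)  # this edge closes a cycle
        else:
            parent[ru] = rv     # merge the two sets
    return answer


# Hypothetical layout: A-B has weight 10, and B, C, D form the cycle.
edges = [(0, 1, 10), (1, 2, 8), (2, 3, 5), (3, 1, 2)]
result = heaviest_cycle_edge(4, edges)
```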
Use Tarjan's Strongly Connected Components algorithm.
Once you have split your graph into many strongly connected graphs assign a COMP_ID to each node which specifies the component ID to which this node belongs (This can be done with a small edit on the algorithm. Define a global integer value which starts at 1. Every time you pop nodes from the stack they all correspond to the same component, save the value of this variable to the COMP_ID of these nodes. When the pop loop ends increment the value of this integer by one).
Now, iterate over all the edges. You have 2 possibilities:
1. If this edge links two nodes from two different components, then this edge can't be the answer, since it can't possibly be a part of a cycle.
2. If this edge links two nodes from the same component, then this edge is a part of some cycle. All you have left to do now is to choose the maximum edge among all the edges of type 2.
The described approach runs in a total complexity of O(|V| + |E|) because every node and edge corresponds to at most one strongly connected component.
In the graph example you provided COMP_ID will be as follows:
COMP_ID[A] = 1
COMP_ID[B] = 2
COMP_ID[C] = 2
COMP_ID[D] = 2
Edge 10 connects COMP_ID 1 with COMP_ID 2, thus it can't be the answer. The answer is the maximum among edges {2, 5, 8}, since they all connect COMP_ID 2 with itself; thus the answer is 8.
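For an undirected graph, one way to realise the same idea in linear time is to find bridges with a low-link DFS, since an undirected edge lies on a cycle exactly when it is not a bridge. The sketch below uses that bridge formulation rather than a literal strongly-connected-components pass, and the edge layout (A=0, B=1, C=2, D=3) is assumed, as the picture is missing. It is recursive, so very deep graphs would need an iterative rewrite.

```python
def heaviest_edge_on_a_cycle(num_vertices, edges):
    """Return the heaviest non-bridge edge (u, v, w), or None if the graph has no cycle."""
    adj = [[] for _ in range(num_vertices)]
    for idx, (u, v, _w) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc = [-1] * num_vertices      # discovery time, -1 means unvisited
    low = [0] * num_vertices        # lowest discovery time reachable from the subtree
    is_bridge = [False] * len(edges)
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue            # do not walk straight back along the tree edge
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    is_bridge[idx] = True
            else:
                low[u] = min(low[u], disc[v])

    for start in range(num_vertices):
        if disc[start] == -1:
            dfs(start, -1)

    on_cycle = [e for i, e in enumerate(edges) if not is_bridge[i]]
    return max(on_cycle, key=lambda e: e[2]) if on_cycle else None


edges = [(0, 1, 10), (1, 2, 8), (2, 3, 5), (3, 1, 2)]
result = heaviest_edge_on_a_cycle(4, edges)
```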
Consider a Graph connecting various cities through railways. Every Node is a city which has various railway lines (edges) to reach the other city. You need to find if a meeting point exists i.e. one such route (i.e. sequence of lines) which when taken always arrives to the same city no matter from which city you start.
Eg.
Consider Graph G = [[2, 1], [2, 0], [3, 1], [1, 0]]. The k^th element of Graph (counting from 0) will give the list of stations directly reachable from station k.
The outgoing lines are numbered 0, 1, 2... The r^th element of the list for station k, gives the number of the station directly reachable by taking line r from station k.
Then one could take the path [1, 0]. That is, from the starting station, take the second direction, then the first. If the first direction was the red line, and the second was the green line, you could phrase this as:
if you are lost, take the green line for 1 stop, then the red line for 1 stop.
So, consider following the directions starting at each station:
0 -> 1 -> 2.
1 -> 0 -> 2.
2 -> 1 -> 2.
3 -> 0 -> 2.
So, no matter the starting station, the path leads to station 2.
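The walk-through above can be checked with a few lines of code. The helper name `follow` is just for illustration.

```python
def follow(graph, start, path):
    """Follow the given sequence of line numbers from `start` and return the final station."""
    station = start
    for line in path:
        station = graph[station][line]
    return station


G = [[2, 1], [2, 0], [3, 1], [1, 0]]
ends = [follow(G, start, [1, 0]) for start in range(len(G))]
```

Here `ends` holds the final station for each of the four starting stations.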
The number of lines ranges from 0 to 5 and the number of stations from 2 to 50. So in the worst case there might be 2^(49*5) subsets of routes, so brute force is out of the question.
Edit1 :
After mcdowella mentioned that this problem is known as finding synchronizing sequences in DFAs, and since I am only interested in whether a meeting path exists or not, I found this pdf (slide 5), which states that
Adler and Weiss, 1970 (Conjecture)
Every finite strongly connected aperiodic directed graph of uniform outdegree has
a synchronizing coloring.
Alternatively,
Every strongly connected graph with 'x' cycles all having gcd 1 (which states
aperiodicity) has a meeting path.
Which works for most cases. However, it's not hard to come up with something like this :
This graph is not strongly connected, so aperiodicity is out of the question, and yet it still has a meeting path [0 -> 1]. So what am I missing here?
You don't say what to do if the path says to go out along x and there is no outgoing link labelled x, so I am going to suppose that all nodes have a full set of outgoing links, or we treat such missing links as links back to the current node, or copies of the link labelled 0, or something.
I start with a set of possible nodes that we may be on, initialised to the set of all nodes.
For each label, take the set of possible nodes and compute the set of nodes that you get by going from any node in the current set, following the current label, to another node. If, for each possible label, the result is always the same set as the current set of possible nodes, give up. This means that each label maps each node in the current set to a different node, and, given any node in the current set and any path, of whatever length, you can find a unique node in the current set with a path that ends in the chosen node, so the situation looks hopeless to me.
If, for some label, the set of nodes after applying this label to the current set is smaller than the current set, note down that label, make the new smaller set the current set, and repeat.
If this process terminates in a set of size one, you have worked out a path that ends with that node as a meeting point, and the path is no longer than the number of nodes in the original graph, since each step reduces the number of nodes in the current set by at least one. Each step costs you at most the number of edges in the graph, so for a graph with N nodes and K labels per node, the cost is at most KN^2.
In fact, since the check at each stage amounts to looking for at least one node in the current set with two incoming edges with the same label on it, and then removing all nodes in the current set which don't have an incoming edge with the chosen label, I would hope that you can make the cost at each step linear in the number of nodes discarded, and argue that the total cost is something below O(KN^2)
(I'm pretty sure that I have seen this worked out properly somewhere as an exercise in robot navigation or something so a web search might be more reliable than reading this, but I've had fun writing it, and it looks plausible to me).
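A minimal sketch of the set-shrinking procedure as described, assuming every station has the same number of outgoing lines. It gives up as soon as no single label makes the current set smaller, which follows the description above; it is not claimed to find a meeting path in every case where one exists. The function name is made up.

```python
def greedy_meeting_path(graph):
    """Return (path, meeting_station), or None if no single label shrinks the current set."""
    current = set(range(len(graph)))
    num_labels = len(graph[0])
    path = []
    while len(current) > 1:
        for label in range(num_labels):
            image = {graph[station][label] for station in current}
            if len(image) < len(current):
                path.append(label)   # note the label and continue with the smaller set
                current = image
                break
        else:
            return None              # every label keeps the set the same size: give up
    return path, next(iter(current))


G = [[2, 1], [2, 0], [3, 1], [1, 0]]
result = greedy_meeting_path(G)
```

On the example graph this takes line 0, then line 1, then line 0 again, shrinking the set from four stations to three, to two, to one.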
Edit -
It appears that the problem is referred to as a search for synchronizing sequences for finite automata. math.uni.wroc.pl/~kisiel/auto/Homing.pdf looks very promising but I haven't gone through it in detail.
TL;DR: There is an O(n^2) algorithm which determines if a meeting path exists, which is described here:
Consider P(G), the power graph of the original graph G. The power graph is created by taking all subsets of the set of nodes of the original graph G, and making each of those subsets into nodes themselves. The edges connect nodes as follows:
(using G = [[2, 1], [2, 0], [3, 1], [1, 0]]), and looking at the edge 1 (or line 1, as your problem states)
{0, 1, 2} -> {1, 0, 1} = {0, 1}, since, when taking line 1, 0 -> 1, 2 -> 1, 1 -> 0.
{0, 3} -> {1, 0}, since, when taking line 1, 0 -> 1 and 3 -> 0.
etc.
Now, if there are n stations, then, if there exists a path from the node {0, 1, ..., n-1} in P(G) to a singleton (set of one element) node in P(G), there is a meeting point. Because, if you take that path as the series of lines, starting at any of the stations will end in the same station. Now, creating the powerset is of course very expensive (O(2^n)), but making an important remark causes the amount of computation to be O(n^2).
This remark is very similar to that of (Černý, 1964) on the word synchronization problem of DFAs. A proof sketch of this remark is at the end of this answer. The remark is that, when looking at the power graph of G:
Every node representing a subset of size 2 has a path to a singleton node if and only if there exists a meeting path
That is, when we create P(G), we only need to create nodes that represent subsets of size 2 or less. This means that P(G) will have only n^2 nodes.
So, essentially the algorithm is:
1. Create P_2(G), the power graph of G where each node represents a subset of size 2 or less.
2. For each node representing a subset of size 2 in P_2(G):
   If there is no path from this node to a singleton node, return False
3. return True (will only happen if, for every node representing a subset of size 2, there is a path to a singleton node).
Part 2 of the algorithm can be done with DFS: you can reverse all of the edges of P_2(G) and begin the DFS stack with all of the singleton nodes. Then, if the DFS tree contains all of the nodes representing subsets of size 2, then all of the nodes representing subsets of size 2 have a path to a singleton node.
Part 1 is O(n^2) and Part 2 can be done in O(n^2) by reversing the edges of the graph and performing DFS as described above.
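A possible sketch of this check is shown below. It builds only the reversed edges of P_2(G) and searches backwards from the singleton nodes, using a breadth-first traversal, which reaches the same set of nodes as the DFS described. The function name and the second test graph are made up for illustration.

```python
from collections import deque
from itertools import combinations


def meeting_path_exists(graph):
    """True if every 2-element subset of stations can be driven to a single station."""
    n = len(graph)
    num_labels = len(graph[0])
    pairs = [frozenset(p) for p in combinations(range(n), 2)]

    # Reversed edges of P_2(G): image subset -> list of pairs that map onto it.
    reverse = {}
    for pair in pairs:
        for label in range(num_labels):
            image = frozenset(graph[station][label] for station in pair)
            reverse.setdefault(image, []).append(pair)

    # Search backwards from all singleton nodes at once.
    reached = set()
    queue = deque(frozenset([station]) for station in range(n))
    while queue:
        node = queue.popleft()
        for previous in reverse.get(node, []):
            if previous not in reached:
                reached.add(previous)
                queue.append(previous)
    return len(reached) == len(pairs)


G = [[2, 1], [2, 0], [3, 1], [1, 0]]
has_meeting_path = meeting_path_exists(G)
swap_only = meeting_path_exists([[1], [0]])  # two stations that only swap places
```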
I hope this has been mostly clear.
Proof of the remark:
We first treat the direction where, a meeting path exists implies that, for every node representing a subset of size 2, there is a path to a singleton node. Just take the meeting path. Since it's a meeting path, starting at any node in G ends up in the same node. So, taking the meeting path will reach a singleton node.
Now, if for every node representing a subset of size 2, there is a path to a singleton node: then, one can construct a meeting path. Take two different nodes representing subsets of size 2, name them A and B. Then, take A's path to a singleton. Taking this path, A -> {i} and B -> C, For some C and {i}. Then, take C's path to a singleton. {i} -> {j} and C -> {k}. Then, there is a meeting path for the node representing {j, k}. So, we can find a meeting path for the union of A and B. Thus, we can do this for any pair of nodes. Inductively, you can find a meeting path for the union of any set of nodes representing subsets of size 2, so this can be done until the entire set of nodes is the starting point and a singleton node is the end.
As I already wrote in a comment on mcdowella's comment, I suspect this problem to be NP-complete. But a reasonable heuristic may allow you to get to the target in a reasonable time.
The following idea is based on A*. You represent the current set of stations as nodes (i.e. every set corresponds to one node). Since you have at most 50 stations, every set can be represented as a 64-bit number (where each bit describes whether the corresponding station is in the set).
You want to maintain a list O of open nodes and a list V of visited nodes. Start with V and O containing a single node that represents all stations.
The algorithm then has the following structure:
1. Choose the node in O with the least number of stations -> n.
2. Remove n from O.
3. For every line:
   a. Transform n to the set of stations nt that result from travelling from the stations in n with the specified line (and count the number of stations to speed things up).
   b. If nt has a single station, you have found a meeting point (there may be more than one).
   c. If nt has not been visited before, add it to O and V and set its path to the current path.
   d. If nt has already been visited, you have two choices. Either update the path of nt to achieve a path of minimal length or just ignore it.
4. Go to 1
The size of this graph is exponential in the length of the path. Therefore, this algorithm has exponential worst-case time and space complexity. However, since you always pick the node with the fewest stations, you take a step that is assumed to take you to the target on the fastest route. This may be a wrong step, which is why we need to keep the remaining graph. You will also calculate the graph on the fly, which avoids keeping the entire graph in memory (unless there is no solution).
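A compact sketch of this search follows. It uses Python integers as bitsets and a heap as the open list O, and it takes the "just ignore it" option for sets that were already visited. The function name is hypothetical.

```python
import heapq


def best_first_meeting_path(graph):
    """Return a list of line numbers leading every station to one station, or None."""
    n = len(graph)
    num_labels = len(graph[0])
    if n == 1:
        return []

    start = (1 << n) - 1                  # bit s is set when station s is in the set
    visited = {start}
    open_heap = [(n, start, [])]          # (number of stations, bitset, path so far)
    while open_heap:
        _, state, path = heapq.heappop(open_heap)
        for label in range(num_labels):
            nxt = 0
            for station in range(n):
                if (state >> station) & 1:
                    nxt |= 1 << graph[station][label]
            new_path = path + [label]
            count = bin(nxt).count("1")
            if count == 1:
                return new_path           # a single station remains: meeting point found
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(open_heap, (count, nxt, new_path))
    return None


G = [[2, 1], [2, 0], [3, 1], [1, 0]]
path = best_first_meeting_path(G)
```

On the example graph this returns the path [1, 0] from the question.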
I've got an undirected graph that I need to traverse using depth first search.
The excel chart below shows each node has been marked after traversal in the marked column, and the edgeTo column shows which node brought us to that node. For example, we got to node 1 from node 5, we got to node 2 from node 7, etc.
My question is for node 6 and 8, since they are separated from the main graph, how do I properly traverse it? My guess is that I start at 6 and go to 8, but since 6 will already have been visited at that point, I do not go back to 6 from 8. Hence row 6 is left blank in the edgeTo column.
Am I correct? Is my chart correct?
Depth first search is basically used to find a path between two nodes in a graph. The graph of your example is disconnected, i.e. there exist two nodes in your graph such that no path in your graph has those nodes as endpoints.
6 and 8 are obviously nodes that belong to a different subgraph and therefore you can't find a path between 0 and 8 and the DFS will return IMPOSSIBLE or No path found. Apart from that your chart is correct.
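If the goal is to visit every vertex rather than to find a path from one particular start, a common approach is to restart the DFS from each vertex that is still unmarked. The sketch below uses a small made-up graph (not the one from the chart) in which vertices 3 and 4 play the role of the separate component; the first vertex visited in each component keeps an empty `edge_to` entry, which matches the guess in the question.

```python
def dfs_forest(adj):
    """Run DFS from every unmarked vertex; return the marked and edge_to arrays."""
    n = len(adj)
    marked = [False] * n
    edge_to = [None] * n

    def dfs(u):
        marked[u] = True
        for v in adj[u]:
            if not marked[v]:
                edge_to[v] = u   # we reached v from u
                dfs(v)

    for start in range(n):
        if not marked[start]:
            dfs(start)           # start of a new component: edge_to[start] stays None
    return marked, edge_to


# Hypothetical undirected graph: component {0, 1, 2} and separate component {3, 4}.
adj = [[1], [0, 2], [1], [4], [3]]
marked, edge_to = dfs_forest(adj)
```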
I'm trying to generate an undirected graph in which each node has a maximum degree associated with it. That is, if a node has a maximum degree of 2, it can connect to at most two nodes (one connected node would be allowed, but not 0). My problem is that I'm trying to generate a graph in which it's possible to get from any node to any other. Currently, I can have nodes "randomly" connect to one another, but the problem is that it's possible to create divided graphs, i.e. if you have 10 nodes, then sometimes two separate graphs of 5 nodes each inadvertently form. If anyone knows of an efficient solution, I'd love to hear it!
EDIT: Suppose that I have a graph with ten nodes, and I specify a maximum degree of 2. In this case, here is something that would be desirable:
Whereas this is what I'm trying to avoid:
Both graphs have a maximum degree of 2 per node, but in the second image, it's not possible to select an arbitrary node and be able to get to any other arbitrary node.
This problem is a pretty well-known problem in graph theory, soluble in polynomial time, the name of which I forget (which is probably "find a graph given its degree sequence"). Anyhow, Király's solution is a nice way to do it, explained much better here than by me. This algorithm solves for the exact graphs that satisfy the given degree sequence, but it should be easy to modify for your more loose constraints.
The obvious solution would be to build it as an N-way tree -- if the maximum degree is two, you end up with a binary tree.
To make it undirected, you'll have pointers not only to the "child" nodes, but also a backward pointer to the "parent" node. At least presumably, that one doesn't count toward the degree of the node (if it does, your degree of two basically ends up as a doubly-linked linear list instead of a tree).
Edit: post-clarification, it appears that the latter really is the case. Although they're drawn differently (with links going in various directions), your first picture showing the desired result is topologically just a linear linked list. As noted above, since you want an undirected graph, it ends up as a doubly linked list.
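A rough sketch of the tree idea: add the nodes one at a time in random order and attach each new node to a random, already placed node that still has spare degree. The result is always connected because it is a tree. This assumes a single `max_degree` shared by all nodes (the question allows a per-node limit), and the function name and `seed` parameter are made up.

```python
import random


def random_connected_graph(num_nodes, max_degree, seed=None):
    """Return the edges of a random tree in which no node exceeds max_degree."""
    if max_degree < 2 and num_nodes > 2:
        raise ValueError("a connected graph on more than 2 nodes needs max_degree >= 2")
    rng = random.Random(seed)
    order = list(range(num_nodes))
    rng.shuffle(order)

    degree = [0] * num_nodes
    edges = []
    available = []  # placed nodes that can still accept an edge
    for node in order:
        if available:
            i = rng.randrange(len(available))
            other = available[i]
            edges.append((other, node))
            degree[other] += 1
            degree[node] += 1
            if degree[other] == max_degree:
                available[i] = available[-1]  # swap-remove the saturated node
                available.pop()
        if degree[node] < max_degree:
            available.append(node)
    return edges


edges = random_connected_graph(10, 2, seed=1)
```

With `max_degree=2` every result is a single path through all the nodes, matching the linked-list observation above; extra edges could then be added between nodes that still have spare degree.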
It sounds like you already know what the graph should look like, so I believe you can use a depth-first search approach, although breadth-first search can be used to avoid recursion.
For example, if you have the nodes 1-5, and k=2, then you can build a graph by starting at node 1, and then simply randomly choosing an unvisited node. Like so:
1 [Start at 1]
1-2 [expand 2, add edge(1,2) to graph]
1-2-3 [expand 3, add edge(2,3) to graph]
1-2-3-4 [expand 4, add edge(3,4) to graph]
1-2-3-4-5 [expand 5, add edge(4,5) to graph]
1-2-3-4-5-1 [expand 1, add edge(5,1) to graph] (this step may or may not be done)
If an edge is never used twice, then p paths will lead to degree p*2 overall, with the degree of the start and end nodes dependent on whether the paths are really tours. To avoid duplicate work, it is probably easier to just label the vertices as the integers 1 through N, then create edges such that each vertex v connects to the vertex numbered ((v + j - 1) mod N) + 1, where j and N are co-prime and j < N-1. The last bit makes things a bit problematic, as the number of co-primes from 1 to N can be limited if N is not prime. This means solutions don't exist for certain values, at least in the form of a new Hamiltonian path/tour. However, if you ignore the co-prime aspect and simply let j range over the integers from 1 through p, then go through each vertex and create the edges (instead of using the path approach), you can make all the vertices have degree k, where k is an even number >= 2. This is achievable in O(N*k), although it may be pushed back as far as O(N^2) if the co-prime method is used.
Thus the path for k=4 would look like this, if started at 1, with j=2:
1 [Start at 1]
1-3 [expand 3, add edge(1,3) to graph]
1-3-5 [expand 5, add edge(3,5) to graph]
1-3-5-2 [expand 2, add edge(5,2) to graph]
1-3-5-2-4 [expand 4, add edge(2,4) to graph]
1-3-5-2-4-1 [expand 1, add edge(4,1) to graph] (this step may or may not be done)
Since |V| = 5 and k = 4, the resulting edges form a complete graph, which is expected. It also works out since 2 and 5 are co-prime.
Obtaining an odd degree is a bit more difficult. First obtain the degree k-1, then add edges in such a way that an odd degree is obtained overall. It seems fairly easy to get very close (with one or two exceptions) to all vertices having an odd degree, but it is impossible with an odd number of vertices, since the sum of all degrees must be even, and it requires a careful selection of edges with an even number of vertices. That selection isn't easy to put into an algorithm. However, it can be approximated by simply picking two unused vertices and creating an edge between them, such that the vertices are not used twice and the edges are not used twice.
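The "go through each vertex and create the edges" variant can be sketched as below. It uses 0-based vertex labels, so the neighbour of v is simply (v + j) mod N, and it assumes each offset j is smaller than N/2 so that every offset adds exactly 2 to each vertex's degree. The function name is made up.

```python
def circulant_edges(num_vertices, offsets):
    """Connect every vertex v to (v + j) mod N for each offset j; return sorted unique edges."""
    edges = set()
    for v in range(num_vertices):
        for j in offsets:
            w = (v + j) % num_vertices
            edges.add((min(v, w), max(v, w)))  # store undirected edges once
    return sorted(edges)


# N = 5 with offsets 1 and 2 gives degree k = 4 everywhere, i.e. the complete graph.
edges = circulant_edges(5, [1, 2])
degree = [sum(1 for a, b in edges if v in (a, b)) for v in range(5)]
```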