procedure explore(G, v)
Input: G = (V, E) is a graph; v ∈ V
Output: visited(u) is set to true for all nodes u reachable from v

visited(v) = true
previsit(v)
for each edge (v, u) ∈ E:
    if not visited(u): explore(G, u)
postvisit(v)
All this pseudocode does is find one path, right? It does nothing while backtracking, if I'm not wrong?
It just explores the graph (it doesn't return a path). Everything that's reachable from the starting vertex will be explored and have visited set to true, not just the vertices on one of the paths.
It moves on to the next edge while backtracking, and it calls postvisit.
So if we're at a, which has edges to b, c and d, we'll start by going to b. Then, when we eventually return to a, we'll go to c (if it hasn't been visited already), and similarly we'll go to d after returning to a for the second time.
It's called depth-first search, in case you were wondering. Wikipedia also gives an example of the order in which vertices get explored in a tree (the numbers correspond to the visit order, starting at 1):
In the above, you're not just exploring the vertices going down the left (1-4): after 4 you go back to 3 to visit 5, then back to 2 to visit 6, and so on, until all 12 are visited.
With regard to previsit and postvisit: previsit happens when we first get to a vertex, and postvisit happens after we've explored all of its children (and their descendants in the corresponding DFS tree). So, in the above example, previsit for 1 happens right at the start, but postvisit for 1 happens only at the very end, because all the other vertices are children of 1 or descendants of those children. The order will go something like:
pre 1, pre 2, pre 3, pre 4, post 4, pre 5, post 5, post 3, pre 6, post 6, post 2, ...
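The pre/post interleaving can be reproduced with a minimal Python sketch of explore that records an event for each previsit and postvisit. The graph is hypothetical: a has edges to b, c and d as in the example above, and the extra edge b -> e is invented so that a nested postvisit shows up.

```python
from typing import Dict, List, Set


def explore(graph: Dict[str, List[str]], v: str,
            visited: Set[str], events: List[str]) -> None:
    """Recursive DFS from v, recording previsit/postvisit events in order."""
    visited.add(v)              # visited(v) = true
    events.append(f"pre {v}")   # previsit(v)
    for u in graph[v]:          # for each edge (v, u)
        if u not in visited:
            explore(graph, u, visited, events)
    events.append(f"post {v}")  # postvisit(v): every descendant is finished


# Hypothetical example graph: a -> b, c, d and b -> e.
graph = {"a": ["b", "c", "d"], "b": ["e"], "c": [], "d": [], "e": []}
events: List[str] = []
explore(graph, "a", set(), events)
print(", ".join(events))
# pre a, pre b, pre e, post e, post b, pre c, post c, pre d, post d, post a
```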
I'm trying to implement an algorithm to find what I call 'guaranteed ancestors' in a directed graph. I have a list of nodes which each can point to zero, one or multiple child nodes.
Below you see an example of a simple graph. I've marked all circles with a unique number.
Let's imagine we're trying to determine which nodes I'm guaranteed to have visited before reaching node 13 starting at node 0.
When I solve this simple example by hand, I start at node 13 and work my way back, asking which nodes I am guaranteed to visit no matter which direction I go. The first node I notice with this property is node 10: whether the path to node 13 goes through node 11 or node 12, it must pass through node 10 first. Similarly, I can conclude that I have to visit node 9 if I want to reach node 13. Working all the way up the graph, I conclude that node 13 has nodes 0, 1, 9 and 10 as its guaranteed ancestors.
I'm not sure what such an algorithm is called, but I'm sure there is a name for this specific search.
Here are the constraints you can assume about my graph.
There is a single defined "head/root" node, which is the only node without any other nodes pointing to it.
The graph is acyclic (Ideally the algorithm would be able to handle cycles too, but I have a different check, verifying that the graph is acyclic, so this is not a must.)
There are no "dead" nodes, e.g. nodes which can't be reached from the head/root node.
This has to run on more complicated graphs with up to 500 nodes and many nodes with multiple "parents", which could be connected back and forth. Runtime is a priority as well - I assume we should be able to solve this problem in linear time complexity.
I've tried simplifying the problem to an algorithm that determines whether a single node is a guaranteed ancestor of another node, which I believe is fairly simple to do in O(n). However, if I want a complete list of all guaranteed ancestors, I assume I'd have to run this algorithm for every node, leaving me with O(n^2).
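The per-node check described above can be sketched in Python as follows. A node u is a guaranteed ancestor of the target if, once u is removed, the target can no longer be reached from the root. Each check is one O(n + m) BFS, so the whole list costs O(n·(n + m)). The edge list is the example graph used for the articulation-point answer in this thread, read as directed parent -> child pairs (an assumption). The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Example graph, assumed to be directed parent -> child; root 0, target 13.
EXAMPLE_EDGES: List[Tuple[int, int]] = [
    (0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (4, 5), (4, 6), (3, 7),
    (7, 8), (6, 9), (8, 9), (9, 10), (10, 11), (10, 12), (11, 13), (12, 13),
]


def reachable_without(adj: Dict[int, List[int]], root: int, removed: int) -> Set[int]:
    """BFS from root that treats `removed` as if it were not in the graph."""
    if root == removed:
        return set()
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def guaranteed_ancestors(edges: List[Tuple[int, int]], root: int, target: int) -> List[int]:
    """u is a guaranteed ancestor if removing u cuts the target off from root."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    nodes = {n for edge in edges for n in edge}
    return sorted(u for u in nodes
                  if u != target and target not in reachable_without(adj, root, u))


print(guaranteed_ancestors(EXAMPLE_EDGES, 0, 13))  # [0, 1, 9, 10]
```

This gives the same answer as the hand analysis. It is the quadratic baseline from the question, not a linear-time method.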
Does anyone know the correct name of the algorithm I'm describing?
1. Assign a weight of 1 to every edge.
2. Run Dijkstra to find the shortest path between the head and the target node.
3. Assign a weight of 2 * (edge count of graph) to every edge in that path.
4. Run Dijkstra again to find the cheapest path.
5. Identify the edges that are present in both paths (they could not be avoided, even though they were very expensive).
6. The nodes at both ends of every edge identified in step 5 are critical, i.e. they must ALL be visited by any route between the head and the target node.
Consider an example:
The first Dijkstra run would return a path containing node 1 or node 2 (they both lie on 5-hop paths). The second run would return a path containing the other of those two nodes.
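The six steps can be sketched in Python with `heapq` as the priority queue. This assumes a directed graph given as (u, v) edge pairs. The edge list is the example graph from the articulation-point answer in this thread, read as parent -> child (an assumption), and ties between equally short paths are broken by heap order. The sketch only shows the mechanics of the two runs. It has been traced on this one example, not proven in general.

```python
import heapq
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

# Example graph (assumed directed parent -> child), head 0, target 13.
EXAMPLE_EDGES: List[Edge] = [
    (0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (4, 5), (4, 6), (3, 7),
    (7, 8), (6, 9), (8, 9), (9, 10), (10, 11), (10, 12), (11, 13), (12, 13),
]


def dijkstra_path(adj: Dict[int, List[int]], weight: Dict[Edge, int],
                  src: int, dst: int) -> List[int]:
    """Return one cheapest src -> dst path as a node list (dst assumed reachable)."""
    dist = {src: 0}
    prev: Dict[int, int] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + weight[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def critical_nodes(edges: List[Edge], head: int, target: int) -> Set[int]:
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    weight = {e: 1 for e in edges}                        # step 1
    first = dijkstra_path(adj, weight, head, target)      # step 2
    first_edges = set(zip(first, first[1:]))
    for e in first_edges:                                 # step 3
        weight[e] = 2 * len(edges)
    second = dijkstra_path(adj, weight, head, target)     # step 4
    shared = first_edges & set(zip(second, second[1:]))   # step 5
    return {n for e in shared for n in e}                 # step 6


print(sorted(critical_nodes(EXAMPLE_EDGES, 0, 13)))  # [0, 1, 9, 10]
```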
This is almost the definition of an articulation point (or cut vertex) in an undirected graph. See Biconnected component:
a cut vertex is any vertex whose removal increases the number of connected components.
The difference is that your graph is directed, and that you consider the root also as such a vertex.
So my suggestion is to temporarily consider the graph to be undirected, and to apply a depth-first algorithm to identify such cut vertices, and include the root.
The algorithm is given as pseudo code in the same Wikipedia article. I have rewritten it in JavaScript, so it can be run here for the graph that you have given as example:
function buildAdjacencyList(n, edges) {
// Indexes in adj represent node identifiers.
// Values in adj are lists of neighbors: start out with empty lists
let adj = [];
for (let i = 0; i < n; i++) adj.push([]);
for (let [start, end] of edges) {
adj[start].push(end );
adj[end ].push(start); // make edge bidirectional
}
return adj;
}
function markArticulationPoints(nodes, node, depth) {
node.visited = true;
node.depth = depth;
node.low = depth;
for (let neighborId of node.neighbors) {
let neighbor = nodes[neighborId];
if (!neighbor.visited) {
neighbor.parent = node;
markArticulationPoints(nodes, neighbor, depth + 1);
if (neighbor.low >= node.depth) node.isArticulation = true;
if (neighbor.low < node.low) node.low = neighbor.low;
} else if (neighbor != node.parent && neighbor.depth < node.low) {
node.low = neighbor.depth;
}
}
}
function getArticulationPoints(adj, root) {
// Create object for each node, having meta data for algorithm
let nodes = [];
for (let i = 0; i < adj.length; i++) {
nodes.push({
neighbors: adj[i],
visited: false,
depth: Infinity,
low: Infinity,
parent: -1,
isArticulation: i == root // root is considered articulation point
});
}
markArticulationPoints(nodes, nodes[root], 0); // start DFS algorithm
// Collect articulation points from meta data
let result = [];
for (let i = 0; i < adj.length; i++) {
if (nodes[i].isArticulation) result.push(i);
}
return result;
}
// Build adjacency list for example graph, but with undirected edges
let adj = buildAdjacencyList(14, [
[0, 1],
[1, 2],
[1, 3],
[2, 4],
[2, 5],
[4, 5],
[4, 6],
[3, 7],
[7, 8],
[6, 9],
[8, 9],
[9, 10],
[10, 11],
[10, 12],
[11, 13],
[12, 13]
]);
let result = getArticulationPoints(adj, 0);
console.log("Articulation points:", ...result);
Looking at wiki's description on node ordering of contraction hierarchies
https://en.wikipedia.org/wiki/Contraction_hierarchies
I can't seem to understand how they come up with this "correct" order.
I have followed some heuristics from several papers on the subject. These mostly include the edge difference and increasing the cost of neighbors when a contraction happens (also mentioned on Wikipedia).
Following those heuristics, my algorithm looks like this:
First, run over all nodes in the graph and compute the edge difference (look at the outgoing/incoming edges and subtract the number of edges created through contraction).
Use this list for the contraction phase: get the node with the minimum value. On contraction, add +1 to the cost of its adjacent nodes.
I come up with the following contraction order, which is not the same as the one on Wikipedia.
node order list after edge difference = {-1, -1, -1, -1, -1, -1}, c = contracted.
Contract node 0, add +1 to node 1 as it's a neighbor. node order list now = {c, 0, -1, -1, -1, -1}
Contract node 2, add +1 to node 1 and 3. node order list now = {c, 1, c, 0, -1, -1}
Contract node 4, add +1 to node 3 and 5. node order list now = {c, 1, c, 1, c, 0}
Contract node 5, no neighbors. node order list now = {c, 1, c, 1, c, c}
Contract node 1, no neighbors. node order list now = {c, c, c, 1, c, c}
Contract node 3, no neighbors. node order list now = {c, c, c, c, c, c}
This only gives two shortcuts, one between 1 and 3 and the other one between 3 and 5, but missing one from 1 to 5. Wiki's example gives 3 shortcuts including the last mentioned.
What am I missing?
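The process described in the question can be reproduced with a minimal Python sketch. It makes several assumptions, none taken from Wikipedia or any paper:

- The graph is the 6-node path 0-1-2-3-4-5, inferred from the neighbours listed in the steps above.
- Contracting a node adds a shortcut between every pair of its remaining neighbours (no witness search).
- The priority is the edge difference (shortcuts added minus edges removed) plus the number of already-contracted neighbours, recomputed at every step.
- Ties go to the lowest node id.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def contract_in_order(n: int, edges: List[Tuple[int, int]]):
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deleted_neighbors = {v: 0 for v in range(n)}  # the "+1 per contracted neighbour" term
    remaining = set(range(n))
    order: List[int] = []
    shortcuts: List[Tuple[int, int]] = []

    def edge_difference(v: int) -> int:
        # shortcuts contracting v would add (no witness search) minus edges it removes
        added = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b not in adj[a])
        return added - len(adj[v])

    while remaining:
        v = min(remaining, key=lambda x: (edge_difference(x) + deleted_neighbors[x], x))
        for a, b in combinations(sorted(adj[v]), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                shortcuts.append((a, b))
        for u in adj[v]:
            adj[u].discard(v)
            deleted_neighbors[u] += 1
        adj[v] = set()
        remaining.remove(v)
        order.append(v)
    return order, shortcuts


path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(contract_in_order(6, path_edges))  # ([0, 2, 4, 5, 1, 3], [(1, 3), (3, 5)])
```

Under these assumptions the sketch gives the same order and the same two shortcuts as listed above. The 1-5 shortcut would only appear if node 3 were contracted while both 1 and 5 were still present.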
I have a situation where the predecessors of a node must be visited before the node itself is visited. Here is the code for that:
nodeQ.Enqueue(rootNode);
while (!nodeQ.Empty())
{
    var node = nodeQ.Dequeue();
    bool skip = false; // reset for every dequeued node
    foreach (var predecessor in node.Predecessors)
    {
        if (predecessor is not visited)
        {
            // a predecessor is still pending: put the node back into the queue
            nodeQ.Enqueue(node);
            skip = true;
            break;
        }
    }
    if (skip) continue;
    Visit(node);
    foreach (var successor in node.Successors)
    {
        if (successor is not already visited)
        {
            nodeQ.Enqueue(successor);
        }
    }
}
The above algorithm works for linear control flow graphs without cycles (read: loops).
The normal BFS traversal doesn't ensure that the predecessors of a node are visited before the node itself.
Example CFG:
The Normal BFS traversal will be:
0, 1, 2, 3, 12, 4, 5, 9, 10, 11, 8, 6, 7
However, I want the order to be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, which can be achieved by the small modification in my code above.
However, this modification will cause endless skipping of blocks when there are loops involved.
Example CFG where this can fail:
In this scenario, my code will endlessly postpone visiting nodes 1, 2 and 3.
So I was looking for a way to traverse the nodes of a CFG (with or without loops) such that the predecessors of a node are visited before the node itself.
I was thinking of identifying back edges, i.e., checking whether a node N dominates its predecessor P: if it does, P->N is a back edge and there is no need to consider P as a predecessor of N. However, this doesn't seem to work, as node N doesn't always dominate node P.
I solved this problem by first finding the dominators, creating a dominator tree and then traversing the tree in DFS pre-order.
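The dominator-tree approach can be sketched in Python under these assumptions:

- The CFG is a dict mapping each block id to its successor list.
- Dominators are computed with the plain iterative data-flow equations Dom(entry) = {entry} and Dom(n) = {n} ∪ ⋂ Dom(p) over the predecessors p of n. This is simple and roughly quadratic, not the faster Lengauer-Tarjan algorithm.
- Each node's immediate dominator is taken as its deepest strict dominator.
- Siblings in the tree are visited in reverse postorder, which the answer does not specify.

```python
from typing import Dict, List, Set


def reverse_postorder(succ: Dict[int, List[int]], entry: int) -> List[int]:
    """Nodes reachable from entry, in reverse DFS postorder (entry first)."""
    seen: Set[int] = set()
    post: List[int] = []

    def dfs(n: int) -> None:
        seen.add(n)
        for s in succ.get(n, []):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]


def dominator_tree_preorder(succ: Dict[int, List[int]], entry: int) -> List[int]:
    rpo = reverse_postorder(succ, entry)
    preds: Dict[int, List[int]] = {n: [] for n in rpo}
    for n in rpo:
        for s in succ.get(n, []):
            preds[s].append(n)

    # Iterate Dom(n) = {n} | intersection of Dom(p) until nothing changes.
    dom: Dict[int, Set[int]] = {n: set(rpo) for n in rpo}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n], changed = new, True

    # Strict dominators of n form a chain; the deepest one is the immediate dominator.
    children: Dict[int, List[int]] = {n: [] for n in rpo}
    for n in rpo[1:]:
        idom = max(dom[n] - {n}, key=lambda d: len(dom[d]))
        children[idom].append(n)  # appended in reverse postorder

    # Pre-order walk of the dominator tree.
    order: List[int] = []
    stack = [entry]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(reversed(children[n]))
    return order


loop_cfg = {0: [1], 1: [2], 2: [3], 3: [1, 4]}  # hypothetical CFG with a loop 1-2-3
print(dominator_tree_preorder(loop_cfg, 0))  # [0, 1, 2, 3, 4]
```

Unlike the re-enqueueing loop, this terminates on the loop: the back edge 3 -> 1 does not change node 1's immediate dominator, so node 1 is not held back waiting for node 3.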
For others who come to this question seeking CFG and dominator calculation guidance, it might be useful to check out the IBM WALA tools. I'm finding a lot of useful information here for my particular quest along these lines.
What is an efficient algorithm for enumerating all subgraphs of a parent graph? In my particular case, the parent graph is a molecular graph, so it will be connected and typically contain fewer than 100 vertices.
Edit: I am only interested in the connected subgraphs.
There is a better answer to this question in the accepted answer to this related question. It avoids the computationally complex step marked "you fill in above function" in @ninjagecko's answer, and it can deal efficiently with compounds that have several rings.
See the linked question for the full details, but here's the summary. (N(v) denotes the set of neighbors of vertex v. In the "choose a vertex" step, you can choose any arbitrary vertex.)
GenerateConnectedSubgraphs(verticesNotYetConsidered, subsetSoFar, neighbors):
    if subsetSoFar is empty:
        let candidates = verticesNotYetConsidered
    else:
        let candidates = verticesNotYetConsidered intersect neighbors
    if candidates is empty:
        yield subsetSoFar
    else:
        choose a vertex v from candidates
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar,
                                   neighbors)
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar union {v},
                                   neighbors union N(v))
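The pseudocode translates fairly directly into Python. This sketch assumes the graph is a dict mapping each vertex to the set of its neighbours, so N(v) is `adj[v]`. Picking `min(candidates)` is one arbitrary but deterministic way to "choose a vertex". Like the pseudocode, it also yields the empty set once.

```python
from typing import Dict, FrozenSet, Iterator, Set


def connected_subgraphs(adj: Dict[int, Set[int]]) -> Iterator[FrozenSet[int]]:
    """Yield every connected vertex subset exactly once (plus the empty set)."""

    def generate(not_considered: FrozenSet[int], subset: FrozenSet[int],
                 neighbors: FrozenSet[int]) -> Iterator[FrozenSet[int]]:
        candidates = not_considered if not subset else not_considered & neighbors
        if not candidates:
            yield subset
            return
        v = min(candidates)  # any choice works; min() keeps the output deterministic
        # Branch 1: v is excluded from this subset for good.
        yield from generate(not_considered - {v}, subset, neighbors)
        # Branch 2: v is included, and its neighbours become candidates.
        yield from generate(not_considered - {v}, subset | {v}, neighbors | adj[v])

    yield from generate(frozenset(adj), frozenset(), frozenset())


path = {1: {2}, 2: {1, 3}, 3: {2}}  # the path 1-2-3
print(sorted(sorted(s) for s in connected_subgraphs(path)))
# [[], [1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
```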
Comparison with mathematical subgraphs:
You could give each element a number from 0 to N-1, then enumerate each subgraph as a binary number of length N. You wouldn't need to scan the graph at all.
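The numbering idea can be shown as a minimal Python sketch, treating a subgraph as a vertex subset as the answer does. Bit i of an N-bit counter decides whether vertex i is included, so no graph scan is needed, and there are 2^N results.

```python
from typing import FrozenSet, Iterator, List


def all_vertex_subsets(vertices: List[int]) -> Iterator[FrozenSet[int]]:
    """Yield all 2^N vertex subsets; bit i of mask says whether vertices[i] is in."""
    for mask in range(1 << len(vertices)):
        yield frozenset(v for i, v in enumerate(vertices) if (mask >> i) & 1)


print(sum(1 for _ in all_vertex_subsets([1, 2, 3, 4, 5])))  # 32
```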
If what you really want is subgraphs with a certain property (fully connected, etc.), that is different, and you'd need to update your question. As a commenter noted, 2^100 is very large, so you definitely don't want to enumerate (as above) the mathematically-correct-but-physically-boring disconnected subgraphs. Assuming a billion enumerations per second, it would literally take you at least 40 trillion years to enumerate them all.
Connected-subgraph-generator:
If you want some kind of enumeration that retains the DAG property of subgraphs under some metric, e.g. (1,2,3)->(2,3)->(2), (1,2,3)->(1,2)->(2), you'd just want an algorithm that generates all CONNECTED subgraphs as an iterator (yielding each element). This can be accomplished by recursively removing a single element at a time (optionally from the "boundary"), checking whether the remaining set of elements is in a cache (else adding it), yielding it, and recursing. This works fine if your molecule is very chain-like with very few cycles. For example, if your molecule were a 5-pointed star of N = 100 elements, it would only have about (100/5)^5 = 3.2 million results (less than a second). But if you start adding more than a single ring, e.g. aromatic compounds and others, you might be in for a rough ride.
e.g. in python
class Graph(object):
    def __init__(self, vertices):
        self.vertices = frozenset(vertices)
        # add edge logic here and to methods, etc. etc.

    def removal_disconnects(self, element):
        # You fill in this function: return True if removing `element`
        # would disconnect the graph. Easy if there is 0 or 1 ring in the
        # molecule (keep track of whether the molecule has a ring, e.g.
        # self.numRings, maybe even more data); if you know there are
        # 0 rings, the operation takes O(1) time.
        # This stub returns False (every removal is allowed), which is what
        # the demonstration below, a graph without edges, assumes.
        return False

    def subgraphs(self):
        cache = set()

        def helper(graph):
            yield graph
            for element in graph:
                if graph.removal_disconnects(element):
                    continue
                subgraph = Graph(graph.vertices - {element})
                if subgraph not in cache:
                    cache.add(subgraph)
                    yield from helper(subgraph)

        yield from helper(self)

    def __eq__(self, other):
        return self.vertices == other.vertices

    def __hash__(self):
        return hash(self.vertices)

    def __iter__(self):
        return iter(self.vertices)

    def __repr__(self):
        return 'Graph(%s)' % repr(set(self.vertices))
Demonstration:
G = Graph({1,2,3,4,5})
for subgraph in G.subgraphs():
print(subgraph)
Result:
Graph({1, 2, 3, 4, 5})
Graph({2, 3, 4, 5})
Graph({3, 4, 5})
Graph({4, 5})
Graph({5})
Graph(set())
Graph({4})
Graph({3, 5})
Graph({3})
Graph({3, 4})
Graph({2, 4, 5})
Graph({2, 5})
Graph({2})
Graph({2, 4})
Graph({2, 3, 5})
Graph({2, 3})
Graph({2, 3, 4})
Graph({1, 3, 4, 5})
Graph({1, 4, 5})
Graph({1, 5})
Graph({1})
Graph({1, 4})
Graph({1, 3, 5})
Graph({1, 3})
Graph({1, 3, 4})
Graph({1, 2, 4, 5})
Graph({1, 2, 5})
Graph({1, 2})
Graph({1, 2, 4})
Graph({1, 2, 3, 5})
Graph({1, 2, 3})
Graph({1, 2, 3, 4})
There is an algorithm called gSpan [1] that has been used to find frequent subgraphs; it can also be used to enumerate all subgraphs. You can find an implementation of it here [2].
The idea is the following: graphs are represented by so-called DFS codes. A DFS code corresponds to a depth-first search on a graph G and has an entry of the form
(i, j, l(v_i), l(v_i, v_j), l(v_j)) for each edge (v_i, v_j) of the graph, where the vertex subscripts correspond to the order in which the vertices are discovered by the DFS and l(·) denotes a vertex or edge label. It is possible to define a total order on the set of all DFS codes (as is done in [1]) and, as a consequence, to obtain a canonical graph label for a given graph by computing the minimum over all DFS codes representing this graph: two graphs with the same minimum DFS code are isomorphic.
Now, starting from all possible DFS codes of length 1 (one per edge), all subgraphs of a graph can be enumerated by successively adding one edge at a time to the codes, which gives rise to an enumeration tree where each node corresponds to a graph. If the enumeration is done carefully (i.e., compatible with the order on the DFS codes), minimal DFS codes are encountered first. Therefore, whenever a DFS code is encountered that is not minimal, its corresponding subtree can be pruned. Please consult [1] for further details.
[1] https://sites.cs.ucsb.edu/~xyan/papers/gSpan.pdf
[2] http://www.nowozin.net/sebastian/gboost/
In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A breadth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
    # Add every vertex reachable from `start` by a walk of at most d edges to st
    st.add(start)
    if d > 0:
        for edge in graph[start]:
            find_d(graph, edge, st, d - 1)
    return st

graph = {1: [2, 3],
         2: [1, 4, 5, 6],
         3: [1, 4],
         4: [2, 3, 5],
         5: [2, 4, 6],
         6: [2, 5]}

print(find_d(graph, 1, set(), 2))
Let's say that we have a function verticesWithin(d,x) that finds all vertices within distance d of vertex x.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that, for d >= 1, verticesWithin(d,x) is the union of verticesWithin(d-1,y[i]) for all i within range, where y=verticesWithin(1,x). If d == 0, it's simply {x}. (I'm assuming that a vertex is deemed to be at distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x itself as a member of y.
Also, if the return type of verticesWithin(d,x) is changed from a simple list or set, to a list of d sets representing increasing distance from x, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
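A minimal Python sketch of the caching idea, using the union relation above with `functools.lru_cache` as the cache. It assumes an undirected graph stored as an adjacency dict, and the d == 1 case reads the adjacency list directly, as suggested. The list-of-levels variant is not implemented here, and the factory name `make_vertices_within` is hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, List


def make_vertices_within(adj: Dict[int, List[int]]) -> Callable[[int, int], FrozenSet[int]]:
    @lru_cache(maxsize=None)
    def vertices_within(d: int, x: int) -> FrozenSet[int]:
        if d == 0:
            return frozenset([x])                 # distance 0: just x itself
        if d == 1:
            return frozenset([x, *adj[x]])        # read the adjacency list directly
        result = {x}
        for y in adj[x]:                          # union over the neighbours y of x
            result |= vertices_within(d - 1, y)   # cached per (d - 1, y)
        return frozenset(result)

    return vertices_within


graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4], 4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}
vertices_within = make_vertices_within(graph)
print(sorted(vertices_within(2, 6)))  # [1, 2, 4, 5, 6]
```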
You already mention the option of calculating A^d, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v of zeros and ones, representing a set of vertices. The vector w := A v now has a nonzero entry at every node that can be reached from the starting node in exactly one step. Iterating, u := A w has a nonzero entry for every node you can reach from the starting node in exactly two steps, etc.
For d=3, you could do the following (MATLAB pseudo-code):
v = zeros(n, 1); v(j) = 1;   % j'th unit vector, n = number of vertices
w = v;
for i = 1:d
    v = A*v;
    w = w + v;
end
The vector w now has a positive entry for each node that can be reached from the jth node in at most d steps.
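The same iteration works without ever forming A. With the graph kept as an adjacency list, each product A*v only touches the nonzero entries of v and their neighbours. This minimal Python sketch assumes an undirected graph (so A is symmetric) and stores vectors sparsely as dicts. The entries count walks, so they grow with d; only their positivity matters.

```python
from typing import Dict, List


def within_d_matvec(adj: Dict[int, List[int]], j: int, d: int) -> Dict[int, int]:
    v = {j: 1}        # j'th unit vector, stored sparsely as {node: value}
    w = dict(v)
    for _ in range(d):
        # Sparse v := A*v for symmetric A: each nonzero v[u] adds v[u] to every neighbour of u.
        nxt: Dict[int, int] = {}
        for u, val in v.items():
            for nb in adj[u]:
                nxt[nb] = nxt.get(nb, 0) + val
        v = nxt
        for node, val in v.items():   # w := w + v
            w[node] = w.get(node, 0) + val
    return w


graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4], 4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}
w = within_d_matvec(graph, 6, 2)
print(sorted(node for node, val in w.items() if val > 0))  # [1, 2, 4, 5, 6]
```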
Breadth-first search starting from the given vertex is an optimal solution in this case. You will find all the vertices within distance d, and you will never even visit any vertex at distance >= d + 2.
Here is recursive code, although the recursion can easily be done away with by using a queue, if so desired.
// Returns the set of nodes within distance d of x
Set<Node> getNodesWithinDist(Node x, int d)
{
    Set<Node> s = new HashSet<Node>(); // our return value
    s.add(x); // x is at distance 0 from itself
    if (d > 0) {
        for (Node y : adjList(x)) {
            s.addAll(getNodesWithinDist(y, d - 1));
        }
    }
    return s;
}
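Here is a queue-based version of the same idea as a minimal Python sketch, assuming an adjacency dict. The recursive code can reach a vertex once per walk that leads to it. This BFS visits each vertex at most once and stops expanding at depth d, so vertices at distance d + 1 are never even discovered.

```python
from collections import deque
from typing import Dict, List, Set


def nodes_within_dist(adj: Dict[int, List[int]], x: int, d: int) -> Set[int]:
    dist = {x: 0}              # BFS distance of every discovered vertex
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue           # vertices at distance d are not expanded further
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4], 4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}
print(sorted(nodes_within_dist(graph, 6, 2)))  # [1, 2, 4, 5, 6]
```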