Finding the list of common children (descendants) for any two nodes in a cyclic graph - algorithm

I have a cyclic directed graph and I was wondering if there is any algorithm (preferably an optimum one) to make a list of common descendants between any two nodes? Something almost opposite of what Lowest Common Ancestor (LCA) does.

As user1990169 suggests, you can compute the set of vertices reachable from each of the starting vertices using DFS and then return the intersection.
If you're planning to do this repeatedly on the same graph, then it might be worthwhile first to compute and contract the strong components to supervertices representing a set of vertices. As a side effect, you can get a topological order on supervertices. This allows a data-parallel algorithm to compute reachability from multiple starting vertices at the same time. Initialize all vertex labels to {}. For each start vertex v, set the label to {v}. Now, sweep all vertices w in topological order, updating the label of w's out-neighbors x by setting it to the union of x's label and w's label. Use bitsets for a compact, efficient representation of the sets. The downside is that we cannot prune as with single reachability computations.
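A minimal sketch of the label-propagation sweep described above, assuming the strong components have already been contracted so that the input is a DAG. The function names (`multi_source_reachability`, `common_descendants`) and the dict-of-lists graph format are illustrative assumptions; Python integers stand in for the bitsets, and the topological order comes from Kahn's algorithm.

```python
from collections import deque

def multi_source_reachability(dag, starts):
    """dag: dict node -> list of out-neighbors, assumed acyclic (already condensed).
    Returns dict node -> int bitmask; bit i is set when starts[i] reaches the node
    (each start vertex is labeled with itself, as in the description above)."""
    nodes = set(dag)
    for outs in dag.values():
        nodes.update(outs)
    indegree = {v: 0 for v in nodes}
    for outs in dag.values():
        for x in outs:
            indegree[x] += 1
    label = {v: 0 for v in nodes}
    for i, s in enumerate(starts):
        label[s] |= 1 << i
    # Kahn's algorithm: a vertex is popped only after all its predecessors,
    # so its label is final by the time it is propagated.
    queue = deque(v for v in nodes if indegree[v] == 0)
    while queue:
        w = queue.popleft()
        for x in dag.get(w, ()):
            label[x] |= label[w]
            indegree[x] -= 1
            if indegree[x] == 0:
                queue.append(x)
    return label

def common_descendants(dag, starts):
    """Vertices whose label contains every start vertex."""
    label = multi_source_reachability(dag, starts)
    full = (1 << len(starts)) - 1
    return {v for v, bits in label.items() if bits == full}
```

On a cyclic graph, the supervertices of the condensation would take the place of the nodes here, and every original vertex inherits the label of its supervertex.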

I would recommend using a DFS (depth first search).
For each input node
    Create a collection to store reachable nodes
    Perform a DFS to find reachable nodes
    When a node is reached
        If it's already stored, stop searching that path // Prevents looping on cycles
        Else store it and continue
Find the intersection between all collections of nodes
Note: You could easily use BFS (breadth first search) instead with the same logic if you wanted.
When you implement this keep in mind there will be a few special cases you can look for to further optimize your search such as:
If an input node doesn't have any outgoing edges, then there are no common nodes
If one input node (A) reaches another input node (B), then A can reach everything B can. This means A's search can stop there, and only B's collection is needed for the intersection.
etc.
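The steps above can be sketched as follows. This is an illustrative version, assuming the graph is a dict mapping each node to a list of out-neighbors; the names `reachable` and `common_descendants` are made up for the example. The DFS is iterative so that deep graphs do not hit the recursion limit.

```python
def reachable(graph, start):
    """Iterative DFS; returns every node reachable from start via at least one edge."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already stored: stop searching this path (handles cycles)
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen

def common_descendants(graph, a, b):
    """Intersection of the two reachable sets."""
    return reachable(graph, a) & reachable(graph, b)
```

Note that with this definition a node counts as its own descendant only if it lies on a cycle.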

Why not just reverse the direction of the edge and use LCA?

Related

Finding the closest marked node in a graph

In a graph with a bunch of normal nodes and a few special marked nodes, is there a common algorithm to find the closest marked node from a given starting position in the graph?
Or is the best way to do a BFS search to find the marked nodes and then doing Dijkstra's on each of the discovered marked nodes to see which one is the closest?
This depends on the graph, and your definition of "closest".
If you compute "closest" ignoring edge weights, or your graph has no edge weights, a simple breadth-first search (BFS) will suffice. The first marked node reached via BFS is, by definition of BFS, the closest (or, if there are several closest nodes, tied for closest). If you keep track of the number of expanded BFS levels, you can locate all closest nodes by finishing the current level instead of stopping as soon as you find the first marked node.
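A small sketch of the level-by-level BFS for the unweighted case, assuming a dict-of-lists graph and a set of marked nodes (the function name `closest_marked` is hypothetical). It finishes the level on which the first marked node appears, so all tied nodes are returned.

```python
def closest_marked(graph, start, marked):
    """Return (distance, set of marked nodes at that distance),
    or (None, set()) if no marked node is reachable."""
    if start in marked:
        return 0, {start}
    seen = {start}
    frontier = [start]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for node in frontier:
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        # Check the whole level before stopping, so ties are all reported.
        hits = {n for n in next_frontier if n in marked}
        if hits:
            return depth, hits
        frontier = next_frontier
    return None, set()
```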
If you have edge weights and need to use them in your computation, use Dijkstra instead. If edges can have negative weights, Dijkstra is no longer correct; use Bellman-Ford, which also detects negative cycles reachable from the start (in which case shortest distances are not well defined).
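For the weighted case, a sketch using Dijkstra with the standard-library `heapq`, assuming non-negative weights and a graph stored as a dict of `(neighbor, weight)` lists. The name `closest_marked_weighted` is an assumption for this example. The search can stop at the first marked node popped from the heap, because nodes are popped in order of increasing distance.

```python
import heapq

def closest_marked_weighted(graph, start, marked):
    """graph: dict node -> list of (neighbor, weight), weights non-negative.
    Returns (distance, node) for a nearest marked node, or None if unreachable."""
    dist = {start: 0}
    heap = [(0, 0, start)]  # (distance, tie-breaker, node)
    counter = 1             # tie-breaker avoids comparing nodes directly
    done = set()
    while heap:
        d, _, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node in marked:
            return d, node  # first marked node popped is a closest one
        for nb, w in graph.get(node, ()):
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, counter, nb))
                counter += 1
    return None
```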
As mentioned by SaiBot, if the start node is always the same, and you will perform several queries with changing "marked" nodes, there are faster ways to do things. In particular, you can store in each node the "parent" found in a first full traversal, and the node's distance to the start node. When adding a new batch of k marked nodes, you would immediately know the closest to the start by looking at this distance for each marked node.
The fastest way would be to perform Dijkstra right away from your starting position (starting node). When "closeness" is defined as the number of edges that have to be traversed, you can just assign a weight of 1 to each edge. In case precomputation is allowed there will be faster ways to do it.

detecting mutual edges in a graph

I have an adjacency list representation of a graph, but it is not symmetric, i.e., if a node A has an edge to B, it is not necessarily true that B has an edge to A. I guess this makes it a directed graph (digraph).
What is a good way to detect all the bidirectional paths from a node? I know I can use DFS to detect the paths from a node to other nodes of the graph. I guess what I am looking for is a bidirectional DFS where only the bidirectional edges are taken into account.
So one way to do that is to look at the neighbour for a node and figure out if this is a bidirectional relationship. However, for this I will need to go through all the immediate connections of this neighbouring node and see if the first node is also a connection and if yes, to continue with the recursion. I wonder if this is an efficient way to do this?
With a fairly standard "adjacency set" representation, where you use some sort of set data structure (hash- or tree-based) instead of lists to represent the edges coming out of a node, you can just query whether the reversed version of an edge exists in the graph. Building an adjacency set representation from an adjacency list representation is straightforward.
Alternatively, you can build one set of all the edges in the graph and then filter edges out of the graph whose reversed version isn't in the set. This can be more convenient if you wouldn't have any further use for the adjacency set representation after using it in the other approach. If you want to keep memory usage down, you can remove edges from the set while building it if you find their reversed version in the graph, and then remove edges from the graph afterward if they're in the set instead of if their reversed version isn't.
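A rough sketch of the edge-set approach, assuming the adjacency list is a dict mapping each node to an iterable of out-neighbors (the name `mutual_edges` is illustrative). It builds one set of all edges and keeps only those whose reversed version is also present.

```python
def mutual_edges(adj):
    """adj: dict node -> iterable of out-neighbors.
    Returns dict node -> set of neighbors connected by edges in both directions."""
    edges = {(u, v) for u, outs in adj.items() for v in outs}
    mutual = {u: set() for u in adj}
    for u, v in edges:
        if (v, u) in edges:  # the reversed edge exists, so the edge is mutual
            mutual[u].add(v)
    return mutual
```

Running an ordinary DFS over the returned structure then visits only nodes connected through bidirectional edges.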

Creating a graph and finding strongly connected components in a single pass (not just Tarjan!)

I have a particular problem where each vertex of a directed graph has exactly four outward-pointing paths (which can point back to the same vertex).
At the beginning, I have only the starting vertex and I use DFS to discover/enumerate all the vertices and edges.
I can then use something like Tarjan's algo to break the graph into strongly connected components.
My question is: is there a more efficient way of doing this than discovering the graph and then applying an algorithm? For example, is there a way of combining the two parts to make them more efficient?
To avoid having to "discover" the graph at the outset, the key property that Tarjan's algorithm needs is that, at any point in its execution, it depends only on the subgraph it has explored so far, and it only ever extends this explored region by enumerating the neighbours of some already-visited vertex. (If, for example, it required knowing the total number of nodes or edges in the graph at the outset, you would be sunk.) From looking at the Wikipedia page, it seems that the algorithm does indeed have this property, so no, you don't need to perform a separate discovery phase at the start. You can discover each vertex "lazily" at two lines of the pseudocode: at "for each (v, w) in E do", enumerate all neighbours of v just as you currently do in your discovery DFS; at "for each v in V do", pick v to be any vertex you have already discovered as a w in the previous step but haven't yet visited with a call to strongconnect(v).
That said, since your initial DFS discovery phase only takes linear time anyway, I'd be surprised if eliminating it sped things up much. If your graph is so large that it doesn't fit in cache, it could halve the total time, though.
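A sketch of the lazy variant, assuming the graph is only available through a `neighbours(v)` callback (a hypothetical name) that returns the out-neighbours of a vertex. Because every vertex is discovered from the starting vertex, a single call to `strongconnect(start)` replaces the outer "for each v in V" loop. This recursive version is only suitable for graphs shallow enough for Python's recursion limit.

```python
def lazy_scc(start, neighbours):
    """Tarjan's algorithm that discovers vertices through neighbours(v) as it goes.
    Returns the strongly connected components reachable from start, as sets."""
    index = {}        # discovery order of each vertex
    low = {}          # lowest index reachable from the vertex's DFS subtree
    on_stack = set()
    stack = []
    components = []

    def strongconnect(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in neighbours(v):          # vertices are discovered here, lazily
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    strongconnect(start)
    return components
```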

Tree root finding

How could I get a tree with a root from a set of nodes and edges?
(I'm working with a connectivity matrix where each edge has a weight, graph[i][j], and there are no negative edges.) Later I need to do a DFS and find LCAs in that tree, so it would be good to optimize for that.
I suppose that your matrix represents the child relationship (i.e. M[i][j] tells that j is the child of i), for a directed graph G(V,E).
You have 2 different strategies:
1. Use a bit vector: go through each cell of your matrix and mark the child index in the vector if the cell's weight is not null. The root is the vertex not set in the vector.
2. Look for the columns (or rows, if your matrix is column first) whose cells are all null (no ancestors).
The second solution is better for dense matrices. Its worst case is when the root is the last entry, which takes O(V²) time. You can stop at the first hit, or run until the end to get all the roots if your graph has several.
The first one is better suited for sparse matrices. Its running time is O(E) if you can enumerate only the non-null cells (with a plain two-dimensional array, scanning every cell still costs O(V²)). You also get all the roots with this algorithm.
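A sketch of the bit-vector strategy on a plain square matrix, assuming that `matrix[i][j]` being non-zero (or not `None`) means an edge from parent i to child j. The function name `find_roots` is illustrative, and a list of booleans plays the role of the bit vector.

```python
def find_roots(matrix):
    """matrix[i][j] truthy means an edge from parent i to child j (assumed convention).
    Returns the indices of all vertices that are nobody's child."""
    n = len(matrix)
    has_parent = [False] * n   # stands in for the bit vector
    for i in range(n):
        for j in range(n):
            if matrix[i][j]:
                has_parent[j] = True
    return [v for v in range(n) if not has_parent[v]]
```

A tree has exactly one such vertex; a longer result indicates a forest or disconnected input.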
If you are certain that your graph has only one root, you can use the walk the edges up technique, as described in other answers.
Here is a computationally MUCH SLOWER version that is also much easier to code. For small graphs, it is just fine.
Find the node with in-degree zero!
You have to compute all node degrees, O(n), but depending on the setting, this is often much easier to code and thus less prone to error.
Pick one node in the tree and walk up, that is, against the orientation of the edges. When you find a node without an ancestor you have the root.
If you need to do something like this often, just remember the parent node for each node.
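If the parent of each node is remembered as suggested, walking up is a short loop. This sketch assumes a dict `parent` mapping each child to its parent, with no entry for the root, and assumes the structure really is a tree (a cycle would make the loop run forever).

```python
def find_root(parent, node):
    """parent: dict child -> parent; the root has no entry. Assumes no cycles."""
    while node in parent:
        node = parent[node]  # step against the edge orientation
    return node
```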
A DFS from any node of a graph gives you a tree (assuming the graph is connected, of course).
You can iterate this, starting from each node as a possible root; you will eventually get a spanning tree this way, if there is one. The complexity will be O(V^2+VE).
EDIT: It works because, for any finite graph, if there is a path from node a to node b, there will be a path from a to b in the tree that a DFS from a creates. So, assuming a spanning tree exists, there is a root r from which you can reach each v in V. When the iteration chooses r as the root, there is a path from r to each v in V, so there will be a path from r to v in the spanning tree.

Is there a proper algorithm to solve edge-removing problem?

There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If instead of just "lit" or "unlit" you would keep a set of nodes from which a node is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from-node in the set, traverse the edges and remove the node you just "unlit" from their set (and possibly traverse from there too, and so on)
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B >- D
\-> C >-/
Then at node A you would store that it was a source (and thus lit by itself), and in both B and C you would store they were lit by A, and in D you would store that it was lit by both B and C.
Then say we remove the edge from B to D: In D we remove B from the lit-source-list, but it remains lit as it is still lit by C. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originated at C, and remove C from D's set, which is now also unlit. In this case we are done, but if the graph were bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
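A rough sketch of this bookkeeping, under the explicit assumption that the graph is acyclic: on a cycle, nodes can keep each other in their sets after being cut off from every source, so this simple version would not notice they went dark. The names `build_lit_from` and `remove_edge` and the dict-of-sets graph format are made up for the example.

```python
def build_lit_from(graph, sources):
    """graph: dict node -> set of out-neighbors.
    Returns dict node -> set of lit predecessors; a source also stores itself."""
    lit_from = {v: set() for v in graph}
    for outs in graph.values():
        for x in outs:
            lit_from.setdefault(x, set())
    stack = []
    for s in sources:
        lit_from.setdefault(s, set()).add(s)
        stack.append(s)
    visited = set(sources)
    while stack:
        u = stack.pop()          # u is lit, so it lights all its out-neighbors
        for v in graph.get(u, ()):
            lit_from[v].add(u)
            if v not in visited:
                visited.add(v)
                stack.append(v)
    return lit_from

def remove_edge(graph, lit_from, u, v):
    """Remove edge u -> v and return the set of nodes that became unlit.
    Assumes the graph has no cycles (see the caveat above)."""
    graph[u].discard(v)
    unlit = set()
    stack = [(u, v)]
    while stack:
        src, dst = stack.pop()
        if src not in lit_from[dst]:
            continue             # src was not lighting dst, nothing changes
        lit_from[dst].discard(src)
        if not lit_from[dst]:    # last lit-from entry removed: dst goes dark
            unlit.add(dst)
            for nxt in graph.get(dst, ()):
                stack.append((dst, nxt))
    return unlit
```

On the A, B, C, D example, removing B -> D returns an empty set, and then removing A -> C returns C and D.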
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question and do the DFS again. This gets you the nodes that remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs), each taking O(n + m) time and space (where n = number of nodes, m = number of edges), which dominates the complexity. You need at least Ω(n + m) time and space just to read the input, so the algorithm is optimal.
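The search-twice-and-compare procedure can be sketched as follows, assuming a dict-of-lists graph and starting the BFS from all sources at once. The names `lit_nodes` and `newly_unlit` are illustrative; instead of mutating the graph, the second search simply skips the removed edge.

```python
from collections import deque

def lit_nodes(graph, sources, skip_edge=None):
    """Multi-source BFS; skip_edge is a (u, v) pair treated as removed."""
    visited = set(sources)
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        for y in graph.get(x, ()):
            if (x, y) != skip_edge and y not in visited:
                visited.add(y)
                queue.append(y)
    return visited

def newly_unlit(graph, sources, removed_edge):
    """Nodes lit before the removal but not after it."""
    before = lit_nodes(graph, sources)
    after = lit_nodes(graph, sources, removed_edge)
    return before - after
```

Unlike incremental bookkeeping, this version handles cycles without extra care, at the cost of two full traversals per removal.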
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
not connected is not a problem, since nodes in unreachable connected components will not be reached during the search
there is a smart way to do the DFS or BFS from all nodes at once (I will describe BFS). You just have to put them all at the beginning on the stack/queue.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes]
mark all starting nodes as visited
while (q not empty)
{
    x = q.pop()
    forall (y neighbour of x) {
        if (y was not visited) {
            visited[y] = true
            q.push(y)
        }
    }
}
Replace Queue with a Stack and you get a sort of DFS.
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the removed edges.
EDIT: Extend this description a bit
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information about connected source nodes on the edges while building the graph (for example, if an edge is connected to the sources S1 and S2, its source list contains S1 and S2), and create the nodes with the information about their input edges and output edges. When an edge is removed, update the output edges of that edge's target node by considering the node's remaining input edges. Then traverse all the target nodes of the updated edges using DFS or BFS. (If the graph has cycles, consider marking visited nodes.) While updating the graph, it is also possible to find the nodes without any edge that has a source connection (the lit->unlit nodes). However, it might not be a good solution if you'd like to remove multiple edges at the same time, since that may cause it to traverse the same edges again and again.

Resources