I have a cyclic directed graph. Starting at the leaves, I wish to propagate data attached to each node downstream to all nodes that are reachable from that node. In particular, I need to keep pushing data around any cycles that are reached until the cycles stabilise.
I'm completely sure that this is a stock graph traversal problem. However, I'm having a fair bit of difficulty trying to find a suitable algorithm --- I think I'm missing a few crucial search keywords.
Before I attempt to write my own half-assed O(n^3) algorithm, can anyone point me at a proper solution? And what is this particular problem called?
Since the graph is cyclic (i.e. can contain cycles), I would first break it down into strongly connected components. A strongly connected component of a directed graph is a subgraph where each node is reachable from every other node in the same subgraph. This would yield a set of subgraphs. Notice that a strongly connected component of more than one node is effectively a cycle.
Now, in each component, any information in one node will eventually end up in every other node of the graph (since they are all reachable). Thus for each subgraph we can simply take all the data from all the nodes in it and make every node have the same set of data. No need to keep going through the cycles. Also, at the end of this step, all nodes in the same component contains exactly the same data.
The next step would be to collapse each strongly connected component into a single node. As the nodes within the same component all have the same data, and are therefore basically the same, this operation does not really change the graph. The newly created "super node" will inherit all the edges going out or coming into the component's nodes from nodes outside the component.
Since we have collapsed all strongly connected components, there will be no cycles in the resultant graph (why? because had there been a cycle formed by the resultant nodes, they would all have been placed in the same component in the first place). The resultant graph is now a Directed Acyclic Graph. There are no cycles, and a simple depth first traversal from all nodes with indegree=0 (i.e. nodes that have no incoming edges), propagating data from each node to its adjacent nodes (i.e. its "children"), should get the job done.
Related
Although we can check a if a graph is bipartite using BFS and DFS (2 coloring ) on any given undirected graph, Same implementation may not work for the directed graph.
So for testing same on directed graph , Am building a new undirected graph G2 using my source graph G1, such that for every edge E[u -> v] am adding an edge [u,v] in G2.
So by applying a 2 coloring BFS I can now find if G2 is bipartite or not.
and same applies for the G1 since these two are structurally same. But this method is costly as am using extra space for graph. Though this will suffice my purpose as of now, I'd like know if there any better implementations for the same.
Thanks In advance.
You can execute the algorithm to find the 2-partition of an undirected graph on a directed graph as well, you just need a little twist. (BTW, in the algorithm below I assume that you will eventually find a 2-coloring. If not, then you will run into a node that is already colored and you find you need to color it to the other color. Then you just exit saying it's not bipartite.)
Start from any node and do the 2-coloring by traversing the edges. If you have traversed every edge and every node in the graph then you have your partition. If not, then you have a component that is 2-colored and there are no edges leaving the component. Pick any node not in the component and repeat. If you get into a situation when you have a few components that are all 2-colored, and there are no edges leaving any of them, and you encounter an edge that originates in a node in the component you are currently building and goes into a node in one of the previous components then you just merge the current component with the older one (and possibly need to flip the color of every node in one of the components -- flip it in the smaller component). After merging just continue. You can do the merge, because at the time of the merge you have scanned only one edge between the two components, so flipping the coloring of one of the components leaves you in a valid state.
The time complexity is still O(max(|N|,|E|)), and all you need is an extra field for every node indicating which component that node is in.
I have an undirected graph which initially has no edges. Now in every step an edge is added or deleted and one has to check whether the graph has at least one circle. Probably the easiest sufficient condition for that is
connected components + number of edges <= number of nodes.
As the "steps" I mentioned above are executed millions of times, this check has to be really fast. So I wonder what would be a quick way to check the condition depending on the fact that in each step only one edge changes.
Any suggestions?
If you are keen, you can try to implement a fully dynamic graph connectivity data structure like described in "Poly-logarithmic deterministic fully-dynamic graph algorithms I: connectivity and minimum spanning tree" by Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup.
When adding an edge, you check whether the two endpoints are connected. If not, the number of connected components decreases by one. After deleting an edge, check if the two endpoints are stil connected. If not, the number of connected components increases by one. The amortized runtime of edge insertion and deletion would be O(log^2 n), but I can imagine the constant factor is quite high.
There are newer result with better bounds. There is also an experimental evaluation of some of the dynamic connectivity algorithms that considers implementation details as well. There is also a Javascript implementation. I have no idea how good it is.
I guess in practice you can have it much easier by maintaining a spanning forest. You get edge additions and non-tree edge deletions (almost) for free. For tree edge deletions you could just use "brute force" in the form of BFS or DFS to check whether the end points are still connected. Especially if the number of nodes is bounded, maybe that works well enough in practice, BFS and DFS are both O(n^2) for dense graphs and you can charge some of that work to the operations where you got lucky and didn't have a lot to do.
I suggest you label all the nodes. Use integers, that's easiest.
At any point, your graph will be divided into a number of disjoint subgraphs. Initially, each node is in its own subgraph.
Maintain the condition that each subgraph has a unique label, and all the nodes in the subgraph carry that label. Initially, just give each node a unique label. If your problem includes adding nodes, you might want to maintain a variable to hold the next available label.
If and only if a new edge would connect two nodes with identical labels, then the edge would create a cycle.
Whenever you add an edge, you will connect two previously disjoint subgraphs. You must relabel one of the subgraphs to match the other, which will require visiting all the nodes of one subgraph. This is the highest computatonal burden in this scheme.
If you don't mind allocating more space, you should also maintain a list of labels in use, associated with a count of the nodes carrying that label. This will allow you to choose the smaller subgraph when relabeling.
If you know which two nodes are being connected by the new edge, you could use some sort of path finding algorithm to detect an alternative path between the two nodes. In other words, if a path exists which connects the two nodes of your new edge before you add the new edge, adding the new edge will create a circle.
Your problem then reduces to finding the paths between two given nodes.
There is a directed graph having a single designated node called root from which all other nodes are reachable. Each terminal node (no outgoing edges) takes an string value.
Intermediate nodes have one or more outgoing edges but no value associated with them. Edges connecting a node to its neighbor have a string label. The labels for edges emanating from a single node are unique. There could be possible cycles in the graph!
What is the best graph algorithm for checking if two such directed (possibly having cycles) graphs (as described above) are equal?
The graph isomorphism problem is one of the more intriguing problems in TCS. There is an entire page dedicated to it on the wiki.
In your particular case you have two rooted directed graph with a source and a sink.
You could start two BFS in parallel, and check for isomorphism level by level; i.e. levelize the graph and check whether the subset of nodes at each level are isomorphic across the two graphs.
Note that since you have a Directed, Rooted graph you should still be able to levelize it for the purpose of finding isomorphism. Do not enque nodes already visited during the BFS; i.e. levelize using the shortest path to the node from the root when determining the level to group in.
Within a set the comparison should be relatively easy. You have many properties to distinguish nodes at the same level (degree, labels) and should be able to create suitable signatures to sort them. Since you are looking for perfect isomorphism, you should get an exact match.
I'm not sure exactly sure what the correct terminology is for my question so I'll just explain what I want to do. I have a directed graph and after I delete a node I want all independently related nodes to be removed as well.
Here's an example:
Say, I delete node '11', I want node '2' to be deleted as well(and in my own example, they'll be nodes under 2 that will now have to be deleted as well) because its not connected to the main graph anymore. Note, that node '9' or '10' should not be deleted because node '8' and '3' connect to them still.
I'm using the python library networkx. I searched the documentation but I'm not sure of the terminology so I'm not sure what this is called. If possible, I would want to use a function provided by the library than create my own recursion through the graph(as my graph is quite large).
Any help or suggestions on how to do this would be great.
Thanks!
I am assuming that the following are true:
The graph is acyclic. You mentioned this in your comment, but I'd like to make explicit that this is a starting assumption.
There is a known set of root nodes. We need to have some way of knowing what nodes are always considered reachable, and I assume that (somehow) this information is known.
The initial graph does not contain any superfluous nodes. That is, if the initial graph contains any nodes that should be deleted, they've already been deleted. This allows the algorithm to work by maintaining the invariant that every node should be there.
If this is the case, then given an initial setup, the only reason that a node is in the graph would be either
The node is in the root reachable set of nodes, or
The node has a parent that is in the root reachable set of nodes.
Consequently, any time you delete a node from the graph, the only nodes that might need to be deleted are that node's descendants. If the node that you remove is in the root set, you may need to prune a lot of the graph, and if the node that you remove is a descendant node with few of its own descendants, then you might need to do very little.
Given this setup, once a node is deleted, you would need to scan all of that node's children to see if any of them have no other parents that would keep them in the graph. Since we assume that the only nodes in the graph are nodes that need to be there, if the child of a deleted node has at least one other parent, then it should still be in the graph. Otherwise, that node needs to be removed. One way to do the deletion step, therefore, would be the following recursive algorithm:
For each of children of the node to delete:
If that node has exactly one parent: (it must be the node that we're about to delete)
Recursively remove that node from the graph.
Delete the specified node from the graph.
This is probably not a good algorithm to implement directly, though, since the recursion involved might get pretty deep if you have a large graph. Thus you might want to implement it using a worklist algorithm like this one:
Create a worklist W.
Add v, the node to delete, to W.
While W is not empty:
Remove the first entry from W; call it w.
For each of w's children:
If that child has just one parent, add it to W.
Remove w from the graph.
This ends up being worst-case O(m) time, where m is the number of edges in the graph, since in theory every edge would have to be scanned. However, it could be much faster, assuming that your graph has some redundancies in it.
Hope this helps!
Let me provide you with the python networkX code that solves your task:
import networkx as nx
import matplotlib.pyplot as plt#for the purpose of drawing the graphs
DG=nx.DiGraph()
DG.add_edges_from([(3,8),(3,10),(5,11),(7,11),(7,8),(11,2),(11,9),(11,10),(8,9)])
DG.remove_node(11)
connected_components method surprisingly doesn't work on the directed graphs, so we turn the graph to undirected, find out not connected nodes and then delete them from the directed graph
UG=DG.to_undirected()
not_connected_nodes=[]
for component in nx.connected_components(UG):
if len(component)==1:#if it's not connected, there's only one node inside
not_connected_nodes.append(component[0])
for node in not_connected_nodes:
DG.remove_node(node)#delete non-connected nodes
If you want to see the result, add to the script the following two lines:
nx.draw(DG)
plt.show()
If a graph has back edges, is it singly connected or not? By back edges I mean connections from child node to one of its ancestors, under the same root. If a node is connected to a node higher than it, but not its ancestor, then it's a cross node.
http://en.wikipedia.org/wiki/Polytree
This link clarifies the concept of singly connected graph.
If a graph has back edges, that doesn't prevent it from being singly-connected. But it might not be singly-connected for other reasons. For example, if the graph is undirected.
It seems that you are trying to make an analogy with linked lists (where singly-connected and doubly-connected are common terms with an usual meaning).
However, this isn't a big deal for graphs, and the term connectivity is more usually associated with reachability (ie.: is there a path from a node to another?)
If I understand you question correctly, you want to know whether a Polytree can contain back edges (edges from a node to one of its ancestors).
From the wikipedia article you linked to, a Polytree is a DAG that remains a tree even if the edges are made undirected. If a directed graph contained back edges, it would mean there would be a cycle in the graph (you can reach the node from its ancestor and then go back to the ancestor using the back edge). Thus it would no longer be a DAG, let alone a tree. If it isn;t a DAG, it cannot be a Polytree. So, no a Polytree cannot have a back edge.