Detect Disconnected Subtours in Graph - algorithm

I have used an Integer Linear Program (ILP) to generate a path from source to sink in a graph. Each variable Pij in the solution represents a transition from node i to j in the path. Unfortunately, the result is a collection of 2 or more disconnected directed subtours, that do not have a transition from one to the other, thereby making the path useless.
There are ways to prevent subtours from happening, but they first require the detection of such disconnected subtours in primitive solution. Now here's my question: how do I detect disconnected subtours?
Some features of the problem:
I have the primitive solution in the form of both, an adjacency matrix and adjacency list
The source and sink vertices are distinct. No looping back to source
Any vertex in the graph may be traversed any number of times. No rule that it should be traversed only once
My thoughts:
If there are disconnected subtours, then they are actually acting as independent, directed graphs on their own, and hence the problem can be restated as detecting all graphs within an adjacency matrix/list
My first instict was to detect all Strongly Connected Components (SCC's) in the graph, but I retracted after realising that Kosaraju's and other such algorithms are infeasible over disconnected subtours. I can apply such algorithms within each subtour, though.
What could solve the problem (According to me):
Modification of existing SCC detection algorithms to operate on disconnected graphs
Adaptation of graph traversal algorithms to operate on disconnected graphs
Any existing method (of course)
Experience of anyone to have faced a similar issue.

Related

Efficient way to check whether a graph has become disconnected?

Using a disjoint set forest, we can efficiently check incrementally whether a graph has become connected while adding edges to it. (Or rather, it allows us to check whether two vertices already belong to the same connected component.) This is useful, for example, in Kruskal's algorithm for computing the minimum spanning tree.
Is there a similarly efficient data structure and algorithm to check whether a graph has become disconnected while removing edges from it?
More formally, I have a connected graph G = (V, E) from which I'm iteratively removing edges one by one. Without loss of generality, let's say these are e_1, e_2, and so on. I want to stop right before G becomes disconnected, so I need a way to check whether the removal of e_i will make the graph disconnected. This can of course be done by removing e_i and then doing a breadth-first or depth-first search from an arbitrary vertex, but this is an O(|E|) operation, making the entire algorithm O(|E|²). I'm hoping there is a faster way by somehow leveraging results from the previous iteration.
I've looked at partition refinement but it doesn't quite solve this problem.
There’s a family of data structures that are designed to support queries of the form “add an edge between two nodes,” “delete an edge between two nodes,” and “is there a path from node x to node y?” This is called the dynamic connectivity problem and there are many data structures that you can use to solve it. Unfortunately, these data structures tend to be much more complicated than a disjoint-set forest, so you may want to search around to see if you can find a ready-made implementation somewhere.
The layered forest structure of Holm et al works by maintaining a hierarchy of spanning forests for the graph and doing some clever bookkeeping with edges to avoid rescanning them when seeing if two parts of the graph are still linked. It supports adding and deleting edges in amortized time O(log2 n) and queries of the form “are these two nodes connected?” in time O(log n / log log n).
A more recent randomized structure called the randomized cutset was developed by Kapron et al. It has worst-case O(log5 n) costs for all three operations (IIRC).
There might be a better way to solve your particular problem that doesn’t require these heavyweight approaches, but I figured I’d mention these in case they’re helpful.

Finding strongly connected subgraph that contains no negative cycles

Is there an algorithm that solves the following decision problem:
Given a strongly connected weighted directed graph G, defined by its transition matrix, is there a strongly connected spanning subgraph of G that has no negative cycles?
A strongly connected spanning subgraph of G is a strongly connected subgraph of G that shares the same vertexes with G. You can look to this paper for the definition of strongly connected spanning subgraph. In this paper they present an approximation for the minimum strongly connected subgraph problem.
A naive approach to this problem is to find a negative cycle of the graph using the Ford-Bellman or Floyd-Warshall algorithm, deleting an edge from this cycle, and repeating while the graph is still strongly connected. But this naive approach has poor time complexity because we will potentially run the Ford-bellman algorithm and check for strong connectivity many times -- moreover I am unable to prove if this algorithm is correct in all instances.
I'm hoping to find experts here who can tell me if this decision problem can be solved in a polynomial time and what algorithm does so. Many thanks in advance.
Here is a naive solution that has a reasonable chance of finding a strongly connecting spanning subgraph with no negative cycles in polynomial time. But is emphatically not guaranteed to find one.
Turn all weights to their negatives. Now use Ford-Bellman or Floyd-Warshall to find a negative cycle. This is actually a positive cycle in the original graph.
Now we pick one vertex in the cycle, and contract the other vertices to it.
Edges that connect to/from removed vertices are replaced by ones representing traveling along that edge and around the cycle to the one we keep. If you wind up with multiple edges between two vertices, only keep the best one.
Repeat the exercise on the new, smaller, graph.
This algorithm runs in guaranteed polynomial time (each iteration runs in polynomial time and removes at least one vertex). If it manages to reduce your graph to a point, then you just walk the process backwards and find that you have in fact found a strongly connected spanning graph with no negative cycles.
However if it fails to do so, there is no guarantee that there isn't one. You just didn't find it.
(I suspect that a guaranteed algorithm will be NP-complete.)
This problem is NP hard in general, this can be proven by reducing Hamiltonian cycle into it.

Check if a changing undirected graph has at least one circle

I have an undirected graph which initially has no edges. Now in every step an edge is added or deleted and one has to check whether the graph has at least one circle. Probably the easiest sufficient condition for that is
connected components + number of edges <= number of nodes.
As the "steps" I mentioned above are executed millions of times, this check has to be really fast. So I wonder what would be a quick way to check the condition depending on the fact that in each step only one edge changes.
Any suggestions?
If you are keen, you can try to implement a fully dynamic graph connectivity data structure like described in "Poly-logarithmic deterministic fully-dynamic graph algorithms I: connectivity and minimum spanning tree" by Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup.
When adding an edge, you check whether the two endpoints are connected. If not, the number of connected components decreases by one. After deleting an edge, check if the two endpoints are stil connected. If not, the number of connected components increases by one. The amortized runtime of edge insertion and deletion would be O(log^2 n), but I can imagine the constant factor is quite high.
There are newer result with better bounds. There is also an experimental evaluation of some of the dynamic connectivity algorithms that considers implementation details as well. There is also a Javascript implementation. I have no idea how good it is.
I guess in practice you can have it much easier by maintaining a spanning forest. You get edge additions and non-tree edge deletions (almost) for free. For tree edge deletions you could just use "brute force" in the form of BFS or DFS to check whether the end points are still connected. Especially if the number of nodes is bounded, maybe that works well enough in practice, BFS and DFS are both O(n^2) for dense graphs and you can charge some of that work to the operations where you got lucky and didn't have a lot to do.
I suggest you label all the nodes. Use integers, that's easiest.
At any point, your graph will be divided into a number of disjoint subgraphs. Initially, each node is in its own subgraph.
Maintain the condition that each subgraph has a unique label, and all the nodes in the subgraph carry that label. Initially, just give each node a unique label. If your problem includes adding nodes, you might want to maintain a variable to hold the next available label.
If and only if a new edge would connect two nodes with identical labels, then the edge would create a cycle.
Whenever you add an edge, you will connect two previously disjoint subgraphs. You must relabel one of the subgraphs to match the other, which will require visiting all the nodes of one subgraph. This is the highest computatonal burden in this scheme.
If you don't mind allocating more space, you should also maintain a list of labels in use, associated with a count of the nodes carrying that label. This will allow you to choose the smaller subgraph when relabeling.
If you know which two nodes are being connected by the new edge, you could use some sort of path finding algorithm to detect an alternative path between the two nodes. In other words, if a path exists which connects the two nodes of your new edge before you add the new edge, adding the new edge will create a circle.
Your problem then reduces to finding the paths between two given nodes.

Is it possible to develop an algorithm to solve a graph isomorphism?

Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
Graph isomorphism problem might not be hard. But it's very hard to prove this problem is not hard.
There are three possibilities for this problem.
1. Graph isomorphism problem is NP-hard.
2. Graph isomorphism problem has a polynomial time solution.
3. Graph isomorphism problem is neither NP-hard or P.
If two graphs are isomorphic, then there exist a permutation for this isomorphism. Take this permutation as a certificate, we could prove this two graphs are isomorphic to each other in polynomial time. Thus, graph isomorphism lies in the territory of NP set. However, it has been more than 30 years that no one could prove whether this problem is NP-hard or P. Thus, this problem is intrinsically hard despite its simple problem description.
If I understand the question properly, you can have ONE single algorithm, which will work by accepting one of several reference graphs as its input (in addition to the input of the unknown graph which isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph rather than asserting if the graphs are isomorph relative to a particular set of operations or characteristics. This implies that the algorithm be supplied some specific reference graph, rather than working off some set of "abstract" rules such as whether neither graphs have loops, or both graphs are fully connected etc. even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm which would assert isomorphism of these two graphs. Note that the considerations mentioned in other remarks and answers and relative to the fact that the problem may be NP-Hard are merely indicative that a simple algorithm [or any algorithm for that matter] may not be sufficient to solve the problem in a reasonable amount of time for graphs which size and complexity are too big. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference and the user supplied graph, make the the list of their Connected Components i.e. the list of sub-graphs therein which are disconnected from the rest of the graph. Finding these connected components is done by following either a breadth-first or a depth-first path from starting at a given node and "marking" all nodes on that path with an arbitrary [typically incremental] element ID number. Once a given path has been fully visited, repeat the operation from any other non-marked node, and do so until there are no more non-marked nodes.
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records" : node and edge, with the following fields, respectively:
- node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edges weights, sum of incoming edges weight.
node
- edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of of number of connected elements, grouped-by all fields but the id (number of nodes, number of edges, sum of nodes weights, sum of edges weights)
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of nodes weights, sum of edges weights) and contain the same list of nodes and edges, again, counted by grouping by all characteristics but the id.
5.2 For each candidate, finalize its confirmation as an isomorph graph relative to the corresponding connected element in the reference graph. This is done by starting at a candidate node-match, i.e. a node, hopefully unique which has the exact same characteristics on both graphs. In case there is not such a node, one needs to disqualify each possible candidate until isomorphism can be confirmed (or all candidates are exhausted). For the candidate node match, walk the graph, in, say, breadth first, and by finding matches for the other nodes, on the basis of the direction and weight of the edges and weight of the nodes.
The main tricks with this algorithm is are to keep proper accounting of the candidates (whether candidate connected element at higher level or candidate node, at lower level), and to also remember and mark other identified items as such (and un-mark them if somehow the hypothetical candidate eventually proves to not be feasible.)
I realize the above falls short of a formal algorithm description, but that should give you an idea of what is required and possibly a starting point, would you decide to implement it.
You can remark that the requirement of matching nodes and edges weights may appear to be an added difficulty for asserting isomorphism, effectively simplify the algorithm because the underlying node/edge characteristics render these more unique and hence make it more likely that the algorithm will a) find unique node candidates and b) either quickly find other candidates on the path and/or quickly assert non-isomorphism.

how to find Connected Component dynamically

Using disjoint-set data structure can easily get connected component of Graph. And, it just supports Incremental Connected Components.
However, in my case, removing edge is very common so that I am looking for an algorithm or new structure can maintain Connected Components fully dynamically(including adding and removing edge)
Thanks
Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity (Holm, de Lichtenberg and Thorup 2001) gives an algorithm that allows an arbitrary sequence of edge insertions, deletions and connectivity queries, with updates (insertions and deletions) taking O(log(n)^2) amortised time, and queries taking O(log(n)/log(log(n))) time, with n being the number of vertices in the graph. These time bounds assume that the graph starts with no edges.
I only skimmed the first 2 of its 38 pages, but don't be (too) scared -- the paper describes a bunch of new algorithms on dynamic graphs (that is, graphs that can be efficiently modified over time) of which connectivity is the simplest.

Resources