Possible solution to find a Hamiltonian path in polynomial time - algorithm

I was thinking recently about a possible solution to find, in polynomial time, whether an undirected graph has a Hamiltonian path or not.
The main concept used as part of this implementation is based on an observation that I noticed while
trying to find (i.e. on paper) a Hamiltonian path for several undirected graphs.
The steps could be defined as follows:
1. Read the adjacency matrix of the graph.
2. While the adjacency matrix is being read, a map (i.e. a dictionary-based structure) will be created for all the nodes. The starting node of the path will also be selected while the adjacency matrix is being read. These operations can be described as follows:
2.1. The map will store all the nodes from the graph as a key-value structure. Each entry in the map will be represented as: (key: node index, value: node class). The node class will contain the following details about the node: the node index, the number of edges incident to it, and a flag indicating whether the node has already been visited. Since each entry in the map contains just the value corresponding to that node, any read access from the map for a given node index will take constant time (i.e. O(1)).
2.2. As part of reading the adjacency matrix and building the map at step 2.1, the starting node will also be retained. The starting node of the path will be the node which has the minimum number of edges incident to it. If multiple nodes in the graph have this property, then the node with the lowest index will be selected. In this context, we can assume that each node has an index associated with it, starting from zero: 0, 1, 2, etc.
3. The starting node identified at step 2.2 will be marked as visited.
4. The next operations will be followed for the remaining nodes. The loop will end either when the number of visited nodes is equal to the number of nodes in the graph, or when no unvisited adjacent node has been found for the current node. Therefore, the next steps will be followed as part of this loop:
4.1. The first operation will be to find the next node to visit. The next node to be visited must satisfy the following constraints:
- It has an edge to the current node.
- It has not been visited so far.
- It has the minimum number of incident edges among the nodes adjacent to the current node.
4.2. If a next node hasn't been found, then the algorithm will end and indicate that no Hamiltonian path was found.
4.3. If a next node has been found, then it becomes the current node. It will be marked as visited, and the number of visited nodes will be incremented.
5. If the number of visited nodes is equal to the number of nodes in the graph, then a Hamiltonian path has been found. Either way, a message will be displayed based on the outcome of the algorithm.
The implementation / tests are available on GitHub: https://github.com/george-cionca/hamiltonian-path
My main questions are:
1. Is there an undirected graph for which this algorithm would not produce the correct answer?
2. On the repository's page, I included a few more details and stated that this implementation provides a solution in quadratic time (i.e. O(n^2)). Is there any aspect that I haven't taken into account for the time complexity?

The algorithm is not guaranteed to find the correct answer
As I understand it, your algorithm is a heuristic greedy algorithm. That is, the path starts at the vertex with the lowest degree, and the path continues toward the unvisited vertex with the lowest degree (or the one with the fewest edges to unvisited nodes).
This fails if the vertex with the lowest degree is not the correct vertex.
Consider, for example, a graph with a single vertex v1 that connects, through two edges, two large complete graphs. We then have vertex v1 that connects to, say, v2 and v7, and we have vertices {v2, v3, v4, v5, v6} and {v7, v8, v9, v10, v11}, with both sets fully connected.
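A Hamiltonian path certainly exists, as we can cover one cluster, move through v1 to the other cluster and cover that one. However, your algorithm will start at v1 (the vertex of lowest degree), step into one of the clusters, exhaust it, and then be unable to reach the other cluster because v1 has already been visited.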
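The counterexample can be checked mechanically. The following is a minimal sketch, not the code from the linked repository: `greedy_min_degree_path` is my own reading of the steps in the question (ties broken by lowest index), and `is_hamiltonian_path` is a hypothetical helper that only verifies a path given to it.

```python
def greedy_min_degree_path(adj):
    """Greedy heuristic as described in the question: start at the vertex of
    minimum degree, then keep moving to the unvisited neighbour of minimum
    degree (ties broken by lowest index). Returns the path that was built."""
    degree = {v: len(neigh) for v, neigh in adj.items()}
    current = min(adj, key=lambda v: (degree[v], v))
    path, visited = [current], {current}
    while len(path) < len(adj):
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates:
            break  # stuck: no unvisited neighbour left
        current = min(candidates, key=lambda v: (degree[v], v))
        path.append(current)
        visited.add(current)
    return path


def is_hamiltonian_path(adj, path):
    """Verify that `path` visits every vertex exactly once along edges."""
    return (len(path) == len(adj)
            and set(path) == set(adj)
            and all(b in adj[a] for a, b in zip(path, path[1:])))


# Counterexample from the answer: v1 bridges two complete graphs on 5 vertices.
adj = {v: set() for v in range(1, 12)}
for cluster in (range(2, 7), range(7, 12)):
    for a in cluster:
        for b in cluster:
            if a != b:
                adj[a].add(b)
for a, b in ((1, 2), (1, 7)):
    adj[a].add(b)
    adj[b].add(a)

greedy_path = greedy_min_degree_path(adj)
known_path = [3, 4, 5, 6, 2, 1, 7, 8, 9, 10, 11]
print(greedy_path)                           # the greedy walk stops early
print(is_hamiltonian_path(adj, known_path))  # yet a Hamiltonian path exists
```

The greedy walk covers v1 and one cluster and then has nowhere to go, while the hand-written path that enters v1 from one cluster and leaves into the other passes the verifier.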
A note on solving famous problems
It will not have escaped your notice that the Hamiltonian path problem is NP-complete. As you present a polynomial-time algorithm to solve the problem, correctness would mean you would have proven P=NP. This is highly unlikely. When it seems like you have proven something famously unsolved and widely believed to be false, I recommend somewhat lowering your expectations, and looking for a mistake you might have made as opposed to looking for verification that the algorithm works. In this case, you might have looked at the implicit assumptions of the algorithm (such as the lowest-degree vertex being a valid starting point) and tried to think of a counterexample for this intuition.

Algorithm to visit every node in a directed cyclic graph

As the title says, I have a graph that contains cycles and is directed. It's strongly connected so there's no danger of getting "stuck". Given a start node, I want to find a path (ideally the shortest, but that's not the thing I'm optimising for) that visits every node.
It's worth saying that many of the nodes in this graph are frequently connected both ways - i.e. it's almost undirected. I'm wondering if there's a modified DFS that might work well for this particular use case?
If not, should I be looking at the Held-Karp algorithm? The visit-once and return-to-starting-point restrictions don't apply for me.
The easiest approach would probably be to choose a root arbitrarily and compute a BFS tree on G (i.e., paths from the root to each other vertex) and a BFS tree on the transpose of G (i.e., paths from each other vertex to the root). Then for each other vertex you can navigate to and from the root by alternating tree paths. There are various quick optimizations to this method.
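As a rough illustration of the two-BFS-tree idea, here is a minimal sketch. The function names and the example graph are my own assumptions, and the only "quick optimization" included is skipping vertices that an earlier round trip already covered.

```python
from collections import deque


def bfs_parents(adj, root):
    """BFS from root; parent[v] is the vertex from which v was discovered."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def transpose(adj):
    """Reverse every edge of the directed graph."""
    rev = {u: [] for u in adj}
    for u, targets in adj.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev


def covering_walk(adj, root):
    """Walk that visits every vertex of a strongly connected digraph by going
    root -> v -> root along BFS tree paths for each still-unvisited v."""
    out_parent = bfs_parents(adj, root)            # tree paths root -> v
    in_parent = bfs_parents(transpose(adj), root)  # tree paths v -> root
    walk, seen = [root], {root}
    for v in adj:
        if v in seen:
            continue
        down, x = [], v
        while x != root:                 # collect root -> v in reverse
            down.append(x)
            x = out_parent[x]
        walk.extend(reversed(down))
        x = v
        while x != root:                 # in the transpose tree, the parent
            x = in_parent[x]             # is the next hop toward the root
            walk.append(x)
        seen.update(walk)
    return walk


# Small strongly connected example (hypothetical).
adj = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
walk = covering_walk(adj, 0)
print(walk)
```

The walk is not necessarily short, and it returns to the root after every excursion even though the question does not require that; dropping the final return leg is an easy trim.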
Another possibility would be to use A* on the search space consisting of states (current node, set of visited nodes), with heuristic equal to the number of nodes not visited yet. The worst-case running time is comparable to Held–Karp (which you could also apply after running Floyd–Warshall to form a complete asymmetric distance matrix).

Algorithm: Minimal path alternating colors

Let G be a directed weighted graph with nodes colored black or white, and all weights non-negative. No other information is specified--no start or terminal vertex.
I need to find a path (not necessarily simple) of minimal weight which alternates colors at least n times. My first thought is to run Kosaraju's algorithm to get the component graph, then find a minimal path between the components. Then you could select nodes with in-degree equal to zero since those will have at least as many color alternations as paths which start at components with in-degree positive. However, that also means that you may have an unnecessarily long path.
I've thought about maybe trying to modify the graph somehow, by perhaps making copies of the graph that black-to-white edges or white-to-black edges point into, or copying or deleting edges, but nothing that I'm brain-storming seems to work.
The comments mention using Dijkstra's algorithm, and in fact there is a way to make this work. If we create a new "root" vertex in the graph and connect it to every other vertex with a zero-weight directed edge, we can run a modified Dijkstra's algorithm from the root outwards, terminating when a given path's inversion count reaches n. It is important to note that we must allow revisiting each vertex in the implementation, so the key of each vertex in our priority queue will not be merely node_id, but a tuple (node_id, inversion_count), representing that vertex as reached with that many inversions. In doing so, we implicitly make n copies of each vertex, one per inversion count. Visually, we are effectively making n copies of our graph, and redirecting each edge between a (black_vertex, white_vertex) pair so that it leads from the i-th inversion graph to the (i+1)-th. We run the algorithm until we reach a path with n inversions. Alternatively, we can connect each vertex on the n-th inversion graph to a "sink" vertex, and run any conventional path-finding algorithm on this graph, unmodified. This will run in O(n(E + V log(nV))) time. You could optimize this quite heavily, and also consider using A* instead, with smallest_inversion_weight * (n - inversion_count) as a heuristic.
Furthermore, another idea hit me regarding using knowledge of the inversion requirement to speed up the search, but I was unable to find a way to implement it without exceeding O(V^2) time. The idea is that you can use an addition chain (like binary exponentiation) to decompose the shortest n-inversion path into two smaller paths, and rinse and repeat in a divide-and-conquer fashion. The issue is that you would need to construct tables for the shortest i-inversion path between any two vertices, which would be O(V^2) entries per i, and O(V^2 log n) overall. To construct each table, for every entry in the preceding table you'd need to append V other paths, so it'd be O(V^3 log n) time overall. Maybe someone else will see a way to merge these two ideas into an O((log n)(E + V log(V log n))) time algorithm or something.
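A minimal sketch of the layered search, under a few assumptions of mine: the graph is a dict of `(neighbour, weight)` lists, colours are stored in a dict, and the virtual root is simulated by seeding the queue with every vertex at distance 0. The function name is hypothetical.

```python
import heapq


def min_weight_alternating_path(edges, color, n):
    """Dijkstra over states (node, alternations so far).

    edges: dict node -> list of (neighbour, non-negative weight)
    color: dict node -> colour label
    Returns the minimum total weight of a walk whose colour changes at
    least n times, or None if no such walk exists.
    """
    inf = float("inf")
    dist = {(v, 0): 0 for v in color}        # every vertex is a start state
    heap = [(0, v, 0) for v in color]
    heapq.heapify(heap)
    while heap:
        d, u, k = heapq.heappop(heap)
        if d > dist.get((u, k), inf):
            continue                          # outdated queue entry
        if k >= n:
            return d                          # first settled goal state
        for v, w in edges.get(u, ()):
            nk = k + (color[u] != color[v])   # crossing colours adds one
            nd = d + w
            if nd < dist.get((v, nk), inf):
                dist[(v, nk)] = nd
                heapq.heappush(heap, (nd, v, nk))
    return None


# Hypothetical example: a and c are black, b is white.
color = {"a": "B", "b": "W", "c": "B"}
edges = {"a": [("b", 5)], "b": [("a", 1), ("c", 1)], "c": [("b", 1)]}
print(min_weight_alternating_path(edges, color, 2))
```

Because a state is keyed by both the vertex and its alternation count, the same vertex can be settled once per layer, which is what allows non-simple walks such as b, c, b.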

Effect on shortest paths after edges have been deleted

A directed graph has been provided as input, and I have found the shortest paths to a particular node 'T' using both the asynchronous and the synchronous Bellman-Ford algorithm.
I was trying to find out the effect on the shortest paths after some edges are deleted.
In my approach, I tried to mark the distances at start nodes of the deleted edges as infinity and was trying to apply asynchronous Bellman-Ford, but I get stuck at the point because other nodes will not update their value as they already have the shortest path minimum value.
Can anyone help me to figure out a way to find the new shortest paths without having to run the full algorithm again on the new graph?
You cannot. A simple explanation can be found in the Bellman-Ford algorithm itself:
Let V be the set of nodes. A minimal path from the starting node to any other node passes through at most |V| nodes (|V|-1 edges). This is the reason why you relax the edges |V|-1 times, so that the 'information' from all nodes will propagate to the source.
If you have already applied the Bellman-Ford algorithm on a graph, you can start relaxing the neighbors of every deleted node and propagate the changes to their neighbors until you reach a path that wasn't using the deleted node (that is, until no more updates are made). Beware of negative cycles.
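One way to read the propagation idea is the sketch below. It is my own interpretation rather than the answerer's exact procedure: it assumes each node stores a next-hop pointer toward the target, resets every node whose stored route crosses a deleted edge, and then resumes relaxation from the surviving labels. It also assumes there are no negative cycles. All function names are hypothetical.

```python
import math


def relax_until_stable(nodes, edges, dist, nxt):
    """Bellman-Ford relaxation toward the target, starting from given labels."""
    for _ in range(max(len(nodes) - 1, 1)):
        changed = False
        for (u, v), w in edges.items():
            if dist[v] + w < dist[u]:
                dist[u], nxt[u] = dist[v] + w, v
                changed = True
        if not changed:
            break
    return dist, nxt


def shortest_to_target(nodes, edges, target):
    """Full run: distance from every node to `target`, plus next hops."""
    dist = {v: math.inf for v in nodes}
    nxt = {v: None for v in nodes}
    dist[target] = 0
    return relax_until_stable(nodes, edges, dist, nxt)


def repair_after_deletion(nodes, edges, dist, nxt, deleted):
    """Reset only the labels whose stored route uses a deleted edge."""
    remaining = {e: w for e, w in edges.items() if e not in deleted}
    stale = set()
    for start in nodes:
        trail, u = [], start
        while nxt[u] is not None:          # follow next hops to the target
            trail.append(u)
            if (u, nxt[u]) in deleted or nxt[u] in stale:
                stale.update(trail)        # everything upstream is invalid
                break
            u = nxt[u]
    new_dist = {v: (math.inf if v in stale else d) for v, d in dist.items()}
    new_nxt = {v: (None if v in stale else h) for v, h in nxt.items()}
    return relax_until_stable(nodes, remaining, new_dist, new_nxt)


# Hypothetical example with target T.
nodes = ["T", "a", "b", "c"]
edges = {("a", "T"): 1, ("b", "a"): 1, ("b", "T"): 5, ("c", "b"): 1}
dist, nxt = shortest_to_target(nodes, edges, "T")
deleted = {("a", "T")}
repaired_dist, _ = repair_after_deletion(nodes, edges, dist, nxt, deleted)
fresh_dist, _ = shortest_to_target(
    nodes, {e: w for e, w in edges.items() if e not in deleted}, "T")
print(repaired_dist == fresh_dist)
```

Labels that are kept remain valid because their routes are intact and deleting edges can only lengthen shortest paths. In the worst case every label is stale and the repair costs as much as a full run, which is consistent with the "you cannot" above.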

How to detect if the given graph has a cycle containing all of its nodes? Does the suggested algorithm have any flaws?

I have a connected, undirected graph with N nodes and 2N-3 edges. You can consider the graph as being built on top of an initial graph that has 3 nodes and 3 edges. Every node added to the graph has 2 connections to existing nodes in the graph. When all nodes have been added (N-3 nodes added in total), the final graph is constructed.
Originally I'm asked, what is the maximum number of nodes in this graph that can be visited exactly once (except for the initial node), i.e., what is the maximum number of nodes contained in the largest Hamiltonian path of the given graph? (Okay, saying largest Hamiltonian path is not a valid phrase, but considering the question's nature, I need to find a max. number of nodes that are visited once and the trip ends at the initial node. I thought it can be considered as a sub-graph which is Hamiltonian, and consists max. number of nodes, thus largest possible Hamiltonian path).
Since I'm not asked to find a path, I should first check whether a Hamiltonian path exists for the given number of nodes. I know that planar graphs and cycle graphs (Cn) are Hamiltonian graphs (I also know Ore's theorem for Hamiltonian graphs, but the graph I will be working on will most probably not be a dense graph, thus making Ore's theorem pretty much useless in my case). Therefore I need to find an algorithm for checking if the graph is a cycle graph, i.e. does there exist a cycle which contains all the nodes of the given graph.
Since DFS is used for detecting cycles, I thought some minor manipulation of DFS could help me detect what I am looking for, as in keeping track of explored nodes, and finally checking if the last node visited has a connection to the initial node. Unfortunately, I could not succeed with that approach.
Another approach I tried was excluding a node, and then try to reach to its adjacent node starting from its other adjacent node. That algorithm may not give correct results according to the chosen adjacent nodes.
I'm pretty much stuck here. Can you help me think of another algorithm to tell me if the graph is a cycle graph?
Edit
I realized by the help of the comment (thank you for it n.m.):
A cycle graph consists of a single cycle and has N edges and N vertices. If there exist a cycle which contains all the nodes of the given graph, that's a Hamiltonian cycle. – n.m.
that I am actually searching for a Hamiltonian path, which I did not intend to do so:)
On second thought, I think checking the Hamiltonian property of the graph while building it will be more efficient, and time efficiency is also what I'm looking for.
After some thinking, I thought whatever the number of nodes will be, the graph seems to be Hamiltonian due to node addition criteria. The problem is I can't be sure and I can't prove it. Does adding nodes in that fashion, i.e. adding new nodes with 2 edges which connect the added node to the existing nodes, alter the Hamiltonian property of the graph? If it doesn't alter the Hamiltonian property, how so? If it does alter, again, how so? Thanks.
EDIT #2
I, again, realized that building the graph the way I described might alter the Hamiltonian property. Consider an input given as follows:
1 3
2 3
1 5
1 3
This input says that the 4th node is connected to node 1 and node 3, the 5th to node 2 and node 3, and so on.
The 4th and 7th nodes are connected to the same nodes, thus lowering the maximum number of nodes that can be visited exactly once by 1. If I detect these collisions (NOT including an input such as 3 3, which is an example that you suggested, since the problem states that the newly added edges are connected to 2 other nodes) and lower the maximum number of nodes, starting from N, I believe I can get the right result.
See, I do not choose the connections, they are given to me and I have to find the max. number of nodes.
I think counting the same connections while building the graph and subtracting the number of same connections from N will give the right result? Can you confirm this or is there a flaw with this algorithm?
What we have in this problem is a connected, non-directed graph with N nodes and 2N-3 edges. Consider the graph given below,
A
/ \
B _ C
( )
D
The graph does not have a Hamiltonian cycle, yet it is constructed in conformance with your rules for adding nodes. So searching for a Hamiltonian cycle may not give you the solution. Moreover, even if it were possible, Hamiltonian cycle detection is an NP-complete problem with O(2^N) complexity. So the approach may not be ideal.
What I suggest is to use a modified version of Floyd's Cycle Finding algorithm (Also called the Tortoise and Hare Algorithm).
The modified algorithm is,
1. Initialize a List CYC_LIST to ∅.
2. Add the root node to the list CYC_LIST and set it as unvisited.
3. Call the function Floyd() twice with the unvisited node in the list CYC_LIST for each of the two edges. Mark the node as visited.
4. Add all the previously unvisited vertices traversed by the Tortoise pointer to the list CYC_LIST.
5. Repeat steps 3 and 4 until no more unvisited nodes remains in the list.
6. If the list CYC_LIST contains N nodes, then the Graph contains a Cycle involving all the nodes.
The algorithm calls Floyd's Cycle Finding algorithm a maximum of 2N times. Floyd's Cycle Finding algorithm takes linear time (O(N)). So the complexity of the modified algorithm is O(N^2), which is much better than the exponential time taken by the Hamiltonian Cycle based approach.
One possible problem with this approach is that it will detect closed paths along with cycles unless stricter checking criteria are implemented.
Reply to Edit #2
Consider the Graph given below,
A------------\
/ \ \
B _ C \
|\ /| \
| D | F
\ / /
\ / /
E------------/
According to your algorithm this graph does not have a cycle containing all the nodes.
But there is a cycle in this graph containing all the nodes.
A-B-D-C-E-F-A
So I think there is some flaw with your approach. But if your algorithm were correct, it would be far better than my approach, since mine takes O(n^2) time, whereas yours takes just O(n).
To add some clarification to this thread: finding a Hamiltonian Cycle is NP-complete, which implies that finding a longest cycle is also NP-complete because if we can find a longest cycle in any graph, we can find the Hamiltonian cycle of the subgraph induced by the vertices that lie on that cycle. (See also for example this paper regarding the longest cycle problem)
We can't use Dirac's criterion here: Dirac only tells us minimum degree >= n/2 -> Hamiltonian Cycle, that is the implication in the opposite direction of what we would need. The other way around is definitely wrong: take a cycle over n vertices, every vertex in it has exactly degree 2, no matter the size of the circle, but it has (is) an HC. What you can tell from Dirac is that no Hamiltonian Cycle -> minimum degree < n/2, which is of no use here since we don't know whether our graph has an HC or not, so we can't use the implication (nevertheless every graph we construct according to what OP described will have a vertex of degree 2, namely the last vertex added to the graph, so for arbitrary n, we have minimum degree 2).
The problem is that you can construct both graphs of arbitrary size that have an HC and graphs of arbitrary size that do not have an HC. For the first part: if the original triangle is A,B,C and the vertices added are numbered 1 to k, then connect the 1st added vertex to A and C and the k+1-th vertex to A and the k-th vertex for all k >= 1. The cycle is A,B,C,1,2,...,k,A. For the second part, connect both vertices 1 and 2 to A and B; that graph does not have an HC.
What is also important to note is that the property of having an HC can change from one vertex to the other during construction. You can both create and destroy the HC property when you add a vertex, so you would have to check for it every time you add a vertex. A simple example: take the graph after the 1st vertex was added, and add a second vertex along with edges to the same two vertices of the triangle that the 1st vertex was connected to. This constructs from a graph with an HC a graph without an HC. The other way around: add now a 3rd vertex and connect it to 1 and 2; this builds from a graph without an HC a graph with an HC.
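The create/destroy example can be confirmed by exhaustive search on these tiny graphs. This is only a sketch for checking the claim: the brute-force `has_hamiltonian_cycle` helper is my own and takes factorial time, so it is not a proposal for solving the original problem. The first added vertex is attached to A and C, as in the construction described earlier.

```python
from itertools import permutations


def has_hamiltonian_cycle(vertices, edges):
    """Try every vertex order that starts at a fixed vertex (tiny graphs only)."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    for perm in permutations(rest):
        cycle = (first, *perm, first)
        if all(frozenset(pair) in edge_set for pair in zip(cycle, cycle[1:])):
            return True
    return False


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
step1 = triangle + [("1", "A"), ("1", "C")]   # 1st vertex on A and C
step2 = step1 + [("2", "A"), ("2", "C")]      # 2nd vertex on the same pair
step3 = step2 + [("3", "1"), ("3", "2")]      # 3rd vertex on 1 and 2

after_1 = has_hamiltonian_cycle("ABC1", step1)
after_2 = has_hamiltonian_cycle("ABC12", step2)
after_3 = has_hamiltonian_cycle("ABC123", step3)
print(after_1, after_2, after_3)
```

After the second vertex, B, 1 and 2 all have degree 2 and all need both of their edges to A and C, which would force degree 3 at A within the cycle; adding vertex 3 gives 1 and 2 another way out, for example A-B-C-1-3-2-A.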
Storing the last known HC during construction doesn't really help you because it may change completely. You could have an HC after the 20th vertex was added, then not have one for k in [21,2000], and have one again for the 2001st vertex added. Most likely the HC you had on 23 vertices will not help you a lot.
If you want to figure out how to solve this problem efficiently, you'll have to find criteria that work for all your graphs that can be checked for efficiently. Otherwise, your problem doesn't appear to me to be simpler than the Hamiltonian Cycle problem is in the general case, so you might be able to adjust one of the algorithms used for that problem to your variant of it.
Below I have added three extra nodes (3,4,5) in the original graph and it does seem like I can keep adding new nodes indefinitely while keeping the property of Hamiltonian cycle. For the below graph the cycle would be 0-1-3-5-4-2-0
1---3---5
/ \ / \ /
0---2---4
As there were no extra restrictions about how you can add a new node with two edges, I think by construction you can have a graph that holds the property of hamiltonian cycle.

Is it possible to develop an algorithm to solve a graph isomorphism?

Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
Graph isomorphism problem might not be hard. But it's very hard to prove this problem is not hard.
There are three possibilities for this problem.
1. Graph isomorphism problem is NP-hard.
2. Graph isomorphism problem has a polynomial time solution.
3. Graph isomorphism problem is neither NP-hard nor in P.
If two graphs are isomorphic, then there exists a permutation of the vertices that realizes this isomorphism. Taking this permutation as a certificate, we can verify in polynomial time that the two graphs are isomorphic to each other. Thus, graph isomorphism is in NP. However, for more than 30 years no one has been able to prove whether this problem is NP-hard or in P. Thus, this problem is intrinsically hard despite its simple description.
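To make the certificate argument concrete, here is a minimal sketch of the polynomial-time verification step. It assumes undirected graphs given as edge lists and a `mapping` dict that covers every vertex; the function name is my own.

```python
def is_isomorphism(edges_g, edges_h, mapping):
    """Check that `mapping` (vertex of G -> vertex of H) is one-to-one and
    carries the edge set of G exactly onto the edge set of H."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    if len(set(mapping.values())) != len(mapping):
        return False                      # two vertices share an image
    mapped = {frozenset(mapping[v] for v in e) for e in g}
    return mapped == h


# Two drawings of a path on three vertices (hypothetical example).
edges_g = [(1, 2), (2, 3)]
edges_h = [("b", "a"), ("b", "c")]
print(is_isomorphism(edges_g, edges_h, {1: "a", 2: "b", 3: "c"}))
print(is_isomorphism(edges_g, edges_h, {1: "b", 2: "a", 3: "c"}))
```

Checking a proposed mapping takes time linear in the number of edges; the hard part is finding the mapping, not verifying it.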
If I understand the question properly, you can have ONE single algorithm, which will work by accepting one of several reference graphs as its input (in addition to the unknown graph whose isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph, rather than asserting whether the graphs are isomorphic relative to a particular set of operations or characteristics. This implies that the algorithm is supplied with some specific reference graph, rather than working off some set of "abstract" rules, such as whether neither graph has loops, or both graphs are fully connected etc., even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm which would assert isomorphism of these two graphs. Note that the considerations mentioned in other remarks and answers, relating to the fact that the problem may be NP-hard, merely indicate that a simple algorithm [or any algorithm for that matter] may not be sufficient to solve the problem in a reasonable amount of time for graphs whose size and complexity are too big. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference and the user-supplied graph, make the list of their Connected Components, i.e. the list of sub-graphs therein which are disconnected from the rest of the graph. Finding these connected components is done by following either a breadth-first or a depth-first path starting at a given node and "marking" all nodes on that path with an arbitrary [typically incremental] element ID number. Once a given path has been fully visited, repeat the operation from any other non-marked node, and do so until there are no more non-marked nodes.
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records" : node and edge, with the following fields, respectively:
- node: node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edges weights, sum of incoming edges weights
- edge: edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of the number of connected elements, grouped by all fields but the id (number of nodes, number of edges, sum of nodes weights, sum of edges weights)
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of nodes weights, sum of edges weights) and contain the same list of nodes and edges, again, counted by grouping by all characteristics but the id.
5.2 For each candidate, finalize its confirmation as an isomorph graph relative to the corresponding connected element in the reference graph. This is done by starting at a candidate node-match, i.e. a node, hopefully unique which has the exact same characteristics on both graphs. In case there is not such a node, one needs to disqualify each possible candidate until isomorphism can be confirmed (or all candidates are exhausted). For the candidate node match, walk the graph, in, say, breadth first, and by finding matches for the other nodes, on the basis of the direction and weight of the edges and weight of the nodes.
The main tricks with this algorithm are to keep proper accounting of the candidates (whether candidate connected elements at the higher level or candidate nodes at the lower level), and to also remember and mark other identified items as such (and un-mark them if the hypothetical candidate eventually proves not to be feasible).
I realize the above falls short of a formal algorithm description, but it should give you an idea of what is required and possibly a starting point, should you decide to implement it.
Note that the requirement of matching node and edge weights, which may appear to be an added difficulty for asserting isomorphism, effectively simplifies the algorithm, because the underlying node/edge characteristics render these more unique and hence make it more likely that the algorithm will a) find unique node candidates and b) either quickly find other candidates on the path and/or quickly assert non-isomorphism.
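A compact sketch of the candidate-and-backtrack idea for small graphs follows. It is a simplification of the outline above, not a faithful implementation: it skips the connected-component bookkeeping and instead prunes with a per-node signature (node value plus the sorted values of outgoing and incoming edges). It assumes directed edges stored as `(src, dst) -> value`, values that are mutually comparable and never `None`, and a hypothetical function name.

```python
from collections import Counter


def match_labeled_graphs(ref_nodes, ref_edges, usr_nodes, usr_edges):
    """Return a mapping ref_id -> usr_id preserving node values, edge
    directions and edge values, or None if no such mapping exists."""
    if len(ref_nodes) != len(usr_nodes) or len(ref_edges) != len(usr_edges):
        return None

    def signature(nodes, edges, v):
        out_w = sorted(w for (a, _), w in edges.items() if a == v)
        in_w = sorted(w for (_, b), w in edges.items() if b == v)
        return (nodes[v], tuple(out_w), tuple(in_w))

    ref_sig = {v: signature(ref_nodes, ref_edges, v) for v in ref_nodes}
    usr_sig = {v: signature(usr_nodes, usr_edges, v) for v in usr_nodes}
    if Counter(ref_sig.values()) != Counter(usr_sig.values()):
        return None                        # cheap early rejection

    sig_count = Counter(usr_sig.values())
    order = sorted(ref_nodes, key=lambda v: sig_count[ref_sig[v]])
    mapping, used = {}, set()

    def consistent(r, u):
        # Edges between r and every already matched node must agree.
        for r2, u2 in mapping.items():
            if ref_edges.get((r, r2)) != usr_edges.get((u, u2)):
                return False
            if ref_edges.get((r2, r)) != usr_edges.get((u2, u)):
                return False
        return ref_edges.get((r, r)) == usr_edges.get((u, u))

    def extend(i):
        if i == len(order):
            return True
        r = order[i]
        for u in usr_nodes:
            if u in used or usr_sig[u] != ref_sig[r] or not consistent(r, u):
                continue
            mapping[r] = u
            used.add(u)
            if extend(i + 1):
                return True
            del mapping[r]                 # un-mark and try the next candidate
            used.discard(u)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical reference graph and a user graph with different node ids.
ref_nodes = {"r": 1, "x": 2, "y": 2}
ref_edges = {("r", "x"): 5, ("r", "y"): 7}
usr_nodes = {0: 1, 1: 2, 2: 2}
usr_edges = {(0, 2): 5, (0, 1): 7}
print(match_labeled_graphs(ref_nodes, ref_edges, usr_nodes, usr_edges))
```

With distinctive node and edge values most nodes have a single candidate, so the backtracking rarely branches; on graphs where many nodes share a signature the search can still blow up, which is the usual caveat for this problem.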
