My professor wants us to implement it for a single source node to all other nodes in the network. He said to keep track of the shortest path by using parent nodes, but I have no idea what this means in the context of the algorithm.
I can implement my code more or less properly, in the sense that my output distances are all correct for any network I run it on.
But most online resources talk about visiting nodes and marking them as visited once you explore all of the neighboring nodes. So for instance, if nodes A and B neighbor node C, and the new distance to A is smaller than that of B, do I mark node C visited? And then what happens if I get to node A and realize that the path it leads me down would make an already recorded distance larger?
In order to get a path (as opposed to just a cost) from Dijkstra's algorithm, instead of saving a best cost for each node, save the pair (best_cost, from_where). The from_where is a handle to the adjacent node that produced the best_cost.
You can then follow the from_where pointers all the way back to the origin to get the best path. I suspect "parent" is his name for the from_where element in the 2-tuple/pair.
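A sketch of that idea in Python (the adjacency-list representation and the names `dijkstra_with_parents` and `path_to` are my own, not from the question):

```python
import heapq

def dijkstra_with_parents(graph, source):
    """graph: {node: [(neighbour, edge_cost), ...]}.
    Returns {node: (best_cost, from_where)}; from_where is None for the source."""
    best = {source: (0, None)}           # node -> (best_cost, from_where)
    heap = [(0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u][0]:
            continue                     # stale queue entry, skip it
        for v, w in graph[u]:
            if v not in best or cost + w < best[v][0]:
                best[v] = (cost + w, u)  # remember which node produced the best cost
                heapq.heappush(heap, (cost + w, v))
    return best

def path_to(best, node):
    """Follow the from_where pointers back to the origin."""
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return path[::-1]
```

For example, with `graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}` and source `'A'`, the entry for `'C'` ends up as `(3, 'B')`, and walking the pointers back yields the path `['A', 'B', 'C']`.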
My professor wants us to implement it for a single source node to all other nodes in the network. He said to keep track of the shortest path by using parent nodes, but I have no idea what this means in the context of the algorithm.
Well, that just means that for each node, you store the node it was reached from on the shortest path to it. This way, once your algorithm is done, you can walk the shortest path in reverse order to recover not only the distance of the shortest path, but the path itself.
But most online resources talk about visiting nodes and marking them as visited once you explore all of the neighboring nodes.
You mark a node visited after it has been selected as the unvisited node with the lowest tentative distance. Unless there are negative edge weights, you will never later find a path to it with a lower distance. (With negative edge weights this greedy finalization no longer works at all, and you would need Bellman-Ford instead; with a negative-weight cycle, shortest paths aren't even well defined.)
I was thinking recently about a possible solution for determining, in polynomial time, whether an undirected graph has a Hamiltonian path or not.
The main concept of this implementation is based on an observation I made while trying to find Hamiltonian paths for several undirected graphs by hand (on paper).
The steps could be defined as follows:
Read the adjacency matrix of the graph.
While the adjacency matrix is being read, a map (i.e. a dictionary-based structure) will be created for all the nodes, and the starting node of the path will be selected. These operations can be described as follows:
2.1. The map will store all the nodes from the graph, as a key - value structure.
Each entry in the map will be represented as: (key: node index, value: node class)
The node class will contain the following details about the node: node index, number of incident
edges to it, and a flag to indicate if the current node has already been visited or not.
Since each entry in the map stores only the value corresponding to that node, any read access
from the map for a given node index takes constant time (i.e. O(1)).
2.2. As part of reading the adjacency matrix and building the map at step 2.1., the starting
node will also be retained. The starting node of the path will be represented by the node which
has the minimum number of edges incident to it.
If multiple nodes exist in the graph with this property, then the node with the lowest index
number will be selected. In this context, we can assume that each node will have an index
associated to it, starting from zero: 0, 1, 2, etc.
The starting node identified at step 2.2. will be marked as visited.
The next operations will be followed for the remaining nodes. The loop will end either when
the number of visited nodes equals the number of nodes in the graph, or when no unvisited node
adjacent to the current node can be found.
Therefore, the next steps will be followed as part of this loop:
4.1. The first operation will be to find the next node to visit.
The next node to be visited will have to respect the following constraints:
To have an edge to the current node
To not have been visited so far
To have the minimum number of incident edges among the nodes adjacent to the current node.
4.2. If a next node hasn't been found, then the algorithm will end and indicate that no
Hamiltonian paths were found.
4.3. If a next node has been found, then it will become the current node. It will be marked
as visited, and the number of visited nodes will be incremented.
If the number of visited nodes is equal to the number of nodes from the graph, then a Hamiltonian
path has been found. Either way, a message will be displayed based on the outcome of the algorithm.
The implementation / tests are available on GitHub: https://github.com/george-cionca/hamiltonian-path
My main questions are:
Is there an undirected graph which would cause this algorithm to not generate the correct solution?
On the repository's page, I included a few more details and stated that this implementation provides a solution in quadratic time (i.e. O(n²)). Is there any aspect that I haven't taken into account for the time complexity?
The algorithm is not guaranteed to find the correct answer
As I understand it, your algorithm is a heuristic greedy algorithm. That is, the path starts at the vertex with the lowest degree, and the path continues toward the unvisited vertex with the lowest degree (or the one with the fewest edges to unvisited nodes).
This fails if the vertex with the lowest degree is not the correct vertex.
Consider, for example, a graph with a single vertex v1 that connects, through two edges, two large complete graphs. We then have vertex v1 that connects to, say, v2 and v7, and we have vertices {v2, v3, v4, v5, v6} and {v7, v8, v9, v10, v11}, with both sets fully connected.
A Hamiltonian path certainly exists, as we can cover one cluster, move to the other and clear that one. However, your algorithm will start at v1 and be unable to find the path.
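A quick way to see this is to build the counterexample and run the greedy rule on it. The code below is my own compressed rendering of the question's steps (minimum-degree start, then the unvisited neighbour with fewest incident edges, lowest index on ties), not the repository's exact implementation:

```python
def greedy_hamiltonian_path(adj):
    """adj: {node: set(neighbours)}. Greedy heuristic from the question:
    start at the minimum-degree node (lowest index on ties), then always
    move to the unvisited neighbour with the fewest incident edges.
    Returns a path covering all nodes, or None if the greedy walk gets stuck."""
    current = min(adj, key=lambda v: (len(adj[v]), v))
    path, visited = [current], {current}
    while len(visited) < len(adj):
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates:
            return None                  # stuck: no unvisited neighbour
        current = min(candidates, key=lambda v: (len(adj[v]), v))
        path.append(current)
        visited.add(current)
    return path

# Two 5-cliques {2..6} and {7..11}, bridged only through vertex 1.
adj = {v: set() for v in range(1, 12)}
def add_edge(a, b):
    adj[a].add(b); adj[b].add(a)
for clique in (range(2, 7), range(7, 12)):
    for a in clique:
        for b in clique:
            if a < b:
                add_edge(a, b)
add_edge(1, 2)
add_edge(1, 7)

# A Hamiltonian path exists (e.g. 6-5-4-3-2-1-7-8-9-10-11), but the greedy
# rule starts at the degree-2 vertex 1, walks into one clique, and gets stuck.
print(greedy_hamiltonian_path(adj))      # None
```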
A note on solving famous problems
It will not have escaped your notice that the Hamiltonian path problem is NP-complete. Since you present a polynomial-time algorithm to solve it, correctness would mean you had proven P = NP. This is highly unlikely. When it seems like you have proven something famously unsolved and widely believed to be false, I recommend lowering your expectations somewhat and looking for a mistake you might have made, rather than for verification that the algorithm works. In this case, you might have looked at the implicit assumptions of the algorithm (such as the lowest-degree vertex being a valid starting point) and tried to think of a counterexample for that intuition.
As the title says, I have a directed graph that contains cycles. It's strongly connected, so there's no danger of getting "stuck". Given a start node, I want to find a path (ideally the shortest, but that's not the thing I'm optimising for) that visits every node.
It's worth saying that many of the nodes in this graph are frequently connected both ways - i.e. it's almost undirected. I'm wondering if there's a modified DFS that might work well for this particular use case?
If not, should I be looking at the Held-Karp algorithm? The visit-once and return-to-starting-point restrictions don't apply in my case.
The easiest approach would probably be to choose a root arbitrarily and compute a BFS tree on G (i.e., paths from the root to each other vertex) and a BFS tree on the transpose of G (i.e., paths from each other vertex to the root). Then for each other vertex you can navigate to and from the root by alternating tree paths. There are various quick optimizations to this method.
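A sketch of that approach in Python (the names and the adjacency-list representation are mine; `adj` maps each node to its out-neighbours in a strongly connected digraph):

```python
from collections import deque

def bfs_parents(adj, root):
    """BFS from root; returns {node: parent on its tree path}, root maps to None."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def visit_all_nodes(adj, root):
    """Walk starting and ending at root that visits every node, alternating
    root->v paths from the BFS tree of G with v->root paths from the BFS
    tree of the transpose of G."""
    transpose = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            transpose[v].append(u)
    down = bfs_parents(adj, root)        # tree of paths root -> v in G
    up = bfs_parents(transpose, root)    # tree of paths v -> root in G
    walk = [root]
    for v in adj:
        if v == root:
            continue
        forward = []                     # reconstruct root -> v from parent pointers
        x = v
        while x is not None:
            forward.append(x)
            x = down[x]
        walk.extend(reversed(forward[:-1]))  # skip root: we are already there
        x = up[v]                        # then v -> root along the transpose tree
        while x is not None:
            walk.append(x)
            x = up[x]
    return walk
```

Every consecutive pair in the returned walk is an edge of G, and every node appears at least once (nodes may be revisited, which the question allows).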
Another possibility would be to use A* on the search space consisting of states (current node × set of visited nodes), with a heuristic equal to the number of nodes not yet visited. The worst-case running time is comparable to Held–Karp (which you could also apply after running Floyd–Warshall to form a complete asymmetric distance matrix).
In a graph with a bunch of normal nodes and a few special marked nodes, is there a common algorithm to find the closest marked node from a given starting position in the graph?
Or is the best way to do a BFS to find the marked nodes and then run Dijkstra's from each of the discovered marked nodes to see which one is the closest?
This depends on the graph, and your definition of "closest".
If you compute "closest" ignoring edge weights, or your graph has no edge weights, a simple breadth-first search (BFS) will suffice. The first node reached via BFS is, by definition of BFS, the closest (or, if there are several closest nodes, tied for closest). If you keep track of the number of BFS levels expanded, you can locate all closest nodes by finishing the current level instead of stopping as soon as you find the first marked node.
If you have edge weights, and need to use them in your computation, use Dijkstra instead. If the edges can have negative weights, you will need to use Bellman-Ford instead (which also detects negative cycles, for which "closest" is no longer well defined).
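For the unweighted case, the BFS can simply stop at the first marked node it dequeues; a sketch (the adjacency-list representation is illustrative):

```python
from collections import deque

def nearest_marked(adj, start, marked):
    """Return (node, distance) for the closest marked node, or None
    if no marked node is reachable from start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in marked:
            return node, dist            # first marked node dequeued is closest
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```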
As mentioned by SaiBot, if the start node is always the same, and you will perform several queries with changing "marked" nodes, there are faster ways to do things. In particular, you can store in each node the "parent" found in a first full traversal, and the node's distance to the start node. When adding a new batch of k marked nodes, you would immediately know the closest to the start by looking at this distance for each marked node.
The fastest way would be to perform Dijkstra right away from your starting position (starting node). When "closeness" is defined as the number of edges that have to be traversed, you can just assign a weight of 1 to each edge. In case precomputation is allowed there will be faster ways to do it.
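One way to realise that precomputation idea, assuming a fixed start node, an unweighted graph, and changing batches of marked nodes (names are mine):

```python
from collections import deque

def all_distances(adj, start):
    """One BFS from the fixed start node; with every edge weight equal
    to 1 this is exactly Dijkstra on the unit-weight graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closest_of(dist, marked):
    """Answer one query: the marked node closest to the start,
    by looking up each marked node's precomputed distance."""
    reachable = [m for m in marked if m in dist]
    return min(reachable, key=dist.get) if reachable else None
```

After the single `all_distances` pass, each query over a batch of k marked nodes costs only O(k) lookups.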
I just want to get the distance from a source node to every other node. But this is different from general graph problems: since it is a tree, the path between any two nodes is unique, so I expect a more efficient answer.
Is it possible to get the answer in efficient time?
You're absolutely right that in a tree, the difficulty of finding a path between two nodes is a lot lower than in a general graph because once you find any path (at least, one without cycles) you know it's the shortest. So all you have to do is just find all paths starting at the given node and going to each other node. You can do this with either a depth-first or a breadth-first search in time O(n). To find the lengths, just keep track of the lengths of the edges you've seen along the paths you've traveled as you travel them.
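In a weighted tree, that search looks like this (a sketch; the pair-list representation `{node: [(neighbour, edge_length), ...]}` is my own choice):

```python
def distances_from(tree, source):
    """tree: {node: [(neighbour, edge_length), ...]} for an undirected tree.
    Returns {node: distance from source} in O(n) time via an iterative DFS."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, length in tree[u]:
            if v not in dist:            # paths are unique: never revisit a node
                dist[v] = dist[u] + length
                stack.append(v)
    return dist
```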
This is not different from "graph problems": a tree is a special case of a graph. Dijkstra's algorithm is a standard graph-traversal algorithm. Just modify it a little: keep all of the path lengths as you find them, and don't worry about the compare-update step, since in a tree the first length you find for a node is the only one. Continue until you run out of nodes to check, and there are your path lengths.
I am working on a graph library that requires determining whether two nodes are connected and, if they are, the degree of separation between them,
i.e. the number of nodes that have to be traversed to reach the target node from the source node.
Since it's an unweighted graph, a BFS gives the shortest path. But how do I keep track of the number of nodes discovered before reaching the target node?
A simple counter that increments on discovering a new node will give a wrong answer, as it may include nodes which are not even on the path.
Another way would be to treat this as a weighted graph with uniform edge weights and use Dijkstra's shortest path algorithm.
But I want to manage it with BFS only.
How can I do it?
During the BFS, have each node store a pointer to its predecessor node (the node in the graph along whose edge the node was first discovered). Then, once you've run BFS, you can repeatedly follow this pointer from the destination node to the source node. If you count up how many steps this takes, you will have the distance from the destination to the source node.
Alternatively, if you need to repeatedly determine the distances between nodes, you might want to use the Floyd-Warshall all-pairs shortest paths algorithm, which if precomputed would let you immediately read off the distances between any pair of nodes.
Hope this helps!
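A sketch of the predecessor-pointer approach (the adjacency-list representation and names are illustrative):

```python
from collections import deque

def degree_of_separation(adj, source, target):
    """BFS storing each node's predecessor (the node along whose edge it was
    first discovered); then walk the pointers back from target to source and
    count the steps. Returns -1 if the nodes are not connected."""
    pred = {source: None}
    queue = deque([source])
    while queue and target not in pred:
        u = queue.popleft()
        for v in adj[u]:
            if v not in pred:
                pred[v] = u              # first discovery fixes the predecessor
                queue.append(v)
    if target not in pred:
        return -1
    steps, node = 0, target
    while pred[node] is not None:        # follow pointers back to the source
        node = pred[node]
        steps += 1
    return steps
```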
I don't see why a simple counter wouldn't work. In this case, breadth-first search would definitely give you the shortest path. So what you want to do is attach a property to every node called 'count'. Now when you encounter a node that you have not visited yet, you populate the 'count' property with whatever the current count is and move on.
If later on, you come back to the node, you should know by the populated count property that it has already been visited.
EDIT: To expand a bit on my answer here, you'll have to maintain a variable that'll track the degree of separation from your starting node as you navigate the graph. For every new set of children that you load into the queue, make sure that you increment the value in that variable.
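The level-by-level counting described in the edit can be sketched like this (adjacency-list representation assumed):

```python
from collections import deque

def bfs_levels(adj, start, target):
    """Process the queue one whole level at a time; the separation counter
    is incremented once per set of children loaded. Returns -1 if unreachable."""
    seen = {start}
    level = deque([start])
    separation = 0
    while level:
        if target in level:
            return separation
        next_level = deque()
        for node in level:
            for child in adj[node]:
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        level = next_level
        separation += 1                  # one more degree of separation
    return -1
```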
If all you want to know is the distance (possibly to cut off the search if the distance is too large), and all edges have the same weight (i.e. 1):
Pseudocode:
Let Token := a new sentinel object which is different from every node in the graph
Let Distance := 0
Let Queue := an empty queue of nodes
Push Start node onto Queue, then push Token onto Queue

(Breadth-first search:)
While Queue is not empty:
    Pop Head from the front of Queue
    If Head is Target node:
        Return Distance
    If Head is Token:
        If Queue is empty:        (no nodes left to explore; avoids looping forever)
            Stop
        Increment Distance
        Push Token onto back of Queue
    Else if Head has not yet been seen:
        Mark Head as seen
        Push all neighbours of Head onto back of Queue

(Did not find target)
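This translates almost line for line into Python (a sketch, assuming an adjacency-list dict; note the guard against an empty queue when only the token remains, which would otherwise loop forever):

```python
from collections import deque

TOKEN = object()                         # sentinel distinct from every node

def bfs_distance(adj, start, target):
    """Level-counting BFS: the token marks the end of each level,
    so Distance is the number of levels fully expanded so far."""
    distance = 0
    queue = deque([start, TOKEN])
    seen = set()
    while queue:
        head = queue.popleft()
        if head == target:
            return distance
        if head is TOKEN:
            if not queue:                # nothing left to explore
                break
            distance += 1
            queue.append(TOKEN)
        elif head not in seen:
            seen.add(head)
            queue.extend(adj[head])      # neighbours may repeat; seen-check handles it
    return None                          # did not find target
```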