Calculate the degree of separation between two nodes in a connected graph - algorithm

I am working on a graph library that needs to determine whether two nodes are connected and, if they are, the degree of separation between them,
i.e. the number of nodes that must be traversed to reach the target node from the source node.
Since it's an unweighted graph, a BFS gives the shortest path. But how do I keep track of the number of nodes discovered before reaching the target node?
A simple counter that increments on discovering each new node will give a wrong answer, as it may include nodes that are not even on the path.
Another way would be to treat this as a weighted graph with uniform edge weights and use Dijkstra's shortest path algorithm.
But I want to manage it with BFS only.
How can I do it?

During the BFS, have each node store a pointer to its predecessor node (the node in the graph along whose edge the node was first discovered). Then, once you've run BFS, you can repeatedly follow this pointer from the destination node to the source node. If you count up how many steps this takes, you will have the distance from the destination to the source node.
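As a rough illustration of the predecessor idea, here is a minimal Python sketch. It assumes the graph is a dict mapping each node to a list of its neighbours; the names `bfs_predecessors` and `degree_of_separation` are made up for this example.

```python
from collections import deque

def bfs_predecessors(graph, source):
    """Run BFS from source and record, for each node, the node it was discovered from."""
    pred = {source: None}  # the source has no predecessor
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in pred:  # first discovery of nb
                pred[nb] = node
                queue.append(nb)
    return pred

def degree_of_separation(graph, source, target):
    """Count predecessor steps from target back to source; None if not connected."""
    pred = bfs_predecessors(graph, source)
    if target not in pred:
        return None
    steps = 0
    node = target
    while pred[node] is not None:
        node = pred[node]
        steps += 1
    return steps
```

Returning `None` for an unreachable target also answers the "are they connected at all" part of the question.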
Alternatively, if you need to repeatedly determine the distances between nodes, you might want to use the Floyd-Warshall all-pairs shortest paths algorithm, which if precomputed would let you immediately read off the distances between any pair of nodes.
Hope this helps!

I don't see why a simple counter wouldn't work. In this case, breadth-first search would definitely give you the shortest path. So what you want to do is attach a property to every node called 'count'. Now when you encounter a node that you have not visited yet, you populate the 'count' property with whatever the current count is and move on.
If later on, you come back to the node, you should know by the populated count property that it has already been visited.
EDIT: To expand a bit on my answer here, you'll have to track the degree of separation from your starting node as you navigate the graph. The simplest way is per node: whenever you load a new set of children into the queue, set each child's count to its parent's count plus one.
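One way to read this answer, sketched in Python: the 'count' property is kept in a separate dict instead of on the node objects, and each newly discovered child gets its parent's count plus one. The dict-of-lists graph representation and the name `bfs_counts` are assumptions for the example.

```python
from collections import deque

def bfs_counts(graph, start):
    """Return a dict mapping every reachable node to its degree of separation from start."""
    count = {start: 0}  # doubles as the "already visited" marker
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in count:  # not visited yet
                count[child] = count[node] + 1
                queue.append(child)
    return count
```

A node that is missing from the returned dict is not connected to `start`.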

If all you want to know is the distance (possibly to cut off the search if the distance is too large), and all edges have the same weight (i.e. 1):
    Let Token := a new node object which is different from every node in the graph.
    Let Distance := 0
    Let Queue := an empty queue of nodes
    Push Start node and Token onto Queue
    While Queue is not empty:
        If the head of Queue is Target node:
            return Distance
        If the head of Queue is Token:
            If Token is the only element in Queue:
                break out of the loop (no nodes are left to explore)
            Increment Distance
            Push Token onto back of the Queue
        Else if the head of Queue has not yet been seen:
            Mark the head of the Queue as seen
            Push all neighbours of the head of the Queue onto the back of Queue
        Pop the head of Queue
    (Did not find target)
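The token idea could look roughly like this in Python. This is only a sketch: the graph is assumed to be a dict of neighbour lists, `distance_with_token` is a made-up name, and the check for a queue that holds nothing but the token is added here so that an unreachable target ends the search.

```python
from collections import deque

def distance_with_token(graph, start, target):
    """BFS distance from start to target using a level-separator token; None if unreachable."""
    token = object()  # distinct from every node in the graph
    queue = deque([start, token])
    seen = set()
    distance = 0
    while queue:
        head = queue.popleft()
        if head is token:
            if not queue:  # only the token was left: nothing more to explore
                return None
            distance += 1  # everything behind the token is one level deeper
            queue.append(token)
            continue
        if head == target:
            return distance
        if head not in seen:
            seen.add(head)
            queue.extend(graph.get(head, ()))
    return None
```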


DAG - Algorithm to ensure there is a single source and a single sink

I have to ensure that a graph in our application is a DAG with a unique source and a unique sink.
Specifically I have to ensure that for a given start node and end node (both of which are known at the outset) that every node in the graph lies on a path from the start node to the end node.
I already have an implementation of Tarjan's algorithm which I use to identify cycles, and a topological sorting algorithm that I can run once Tarjan's algorithm reports the graph is a DAG.
What is the most efficient way to make sure the graph meets this criterion?
If your graph is represented by an adjacency matrix, then node x is a source node if the xth column of the matrix is all zeros, and a sink node if row x of the matrix is all zeros. You could run two quick passes over the matrix to count the all-zero rows and columns to determine how many sources and sinks exist and what they are. This takes time O(n^2) and is probably the fastest way to check this.
If your graph is represented by an adjacency list, you can find all sink nodes in time O(n) by checking which nodes have no outgoing edges. You can find all sources by maintaining for each node a boolean value indicating whether it has any incoming edges, which is initially false. You can then iterate across all the edges in the list in time O(n + m), marking all nodes with incoming edges. The nodes that weren't marked as having incoming edges are then sources. This process takes time O(m + n) and has so little overhead that it's probably one of the fastest approaches.
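For the adjacency-list case, a minimal Python sketch might look like the following. It assumes the graph is a dict in which every node appears as a key mapped to its list of successors; `find_sources_and_sinks` is a hypothetical helper name.

```python
def find_sources_and_sinks(adj):
    """Return (sources, sinks) of a directed graph given as {node: [successors]}."""
    has_incoming = {node: False for node in adj}
    for successors in adj.values():
        for succ in successors:
            has_incoming[succ] = True  # some edge points at succ
    sources = [n for n in adj if not has_incoming[n]]
    sinks = [n for n in adj if not adj[n]]  # no outgoing edges
    return sources, sinks
```

Checking for a unique source and a unique sink then amounts to checking that both returned lists have length one.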
Hope this helps!
A simple breadth-first or depth-first search should satisfy this. First, keep a set of the sink nodes that you've seen. Second, keep a set of the nodes that you've discovered using BFS/DFS. The graph will then be connected iff there is a single connected component. Assuming you're using some kind of adjacency list style representation for your graph, the algorithm would look something like this:
    create an empty queue
    create an empty set to store seen vertices
    create an empty set for sink nodes
    add source node to queue
    while queue is not empty
        get next vertex from queue; if it is already in seen vertices, skip it
        add vertex to seen vertices
        if num of adjacent nodes == 0
            add vertex to sink node set
        for each node in adjacent nodes
            if node is not in seen vertices
                add node to queue
    return size of sink nodes == 1 && size of seen vertices == total number in graph
This will be linear in the number of vertices and edges in the graph.
Note that as long as you know a source vertex to start from, this will also ensure the property of a single source: any other vertex that is a source won't be discovered by the BFS/DFS, and thus the size of the seen vertices won't be the total number in the graph.
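A Python sketch of this check, under the assumption that the graph is already known to be a DAG and is stored as a dict with every node as a key. The function name is made up, and vertices are marked as seen when they are enqueued rather than when they are dequeued, which keeps each vertex in the queue at most once.

```python
from collections import deque

def has_single_sink_and_all_reachable(adj, source):
    """True if BFS from source reaches every vertex and finds exactly one sink."""
    seen = {source}
    sinks = set()
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if not adj[v]:  # no outgoing edges: v is a sink
            sinks.add(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(sinks) == 1 and len(seen) == len(adj)
```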
Suppose your algorithm takes as input a weakly connected DAG with exactly one node s whose in-degree is zero and exactly one node t whose out-degree is zero, while all other nodes have positive in-degree and out-degree. Then s can reach all other nodes, and all other nodes can reach t. To see this, take any node v and keep stepping backwards along an incoming edge: because the graph is finite and acyclic, this walk must stop at a node with in-degree zero, which can only be s, so s reaches v. The symmetric argument, stepping forwards along outgoing edges, shows that v reaches t. On the other hand, if the DAG is not weakly connected, it definitely does not satisfy your requirements. In sum, you can first compute the weakly connected components of the DAG using BFS/DFS, while counting the nodes whose in-degree or out-degree is zero. The complexity of this algorithm is O(|V| + |E|).
First, do a DFS on the graph, starting from the source node, which, as you say, is known in advance. If you encounter a back edge[1], then you have a cycle and you can exit with an error. During this traversal you can also identify whether there are nodes not reachable from the source[2] and take the appropriate action.
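This first phase could be sketched in Python as below. The three-colour marking (white = not yet seen, gray = still on the DFS stack, black = completely processed) is one common way to recognise back edges; the dict-based graph and the name `dfs_check` are assumptions for the example. An explicit stack is used so that large graphs do not hit Python's recursion limit. Note that only cycles reachable from the source are detected.

```python
WHITE, GRAY, BLACK = 0, 1, 2

def dfs_check(adj, source):
    """Return (has_cycle, unreachable): a cycle is reported on the first back edge."""
    color = {node: WHITE for node in adj}
    color[source] = GRAY
    stack = [(source, iter(adj[source]))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if color[nxt] == GRAY:  # back edge to a node still on the stack
                return True, None
            if color[nxt] == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(adj[nxt])))
                break  # descend into nxt first
        else:
            color[node] = BLACK  # all successors processed
            stack.pop()
    unreachable = {n for n, c in color.items() if c == WHITE}
    return False, unreachable
```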
Once you have determined the graph is a DAG, you can ensure that every node lies on a path from the source to the sink by another DFS, starting from the source, as follows:
    bool have_path(source, sink) {
        if source == sink {
            source.flag = true
            return true
        }
        // a dead end other than `sink` has no path to `sink`
        if succ(source) is empty
            return false
        // traverse all successor nodes of `source`
        for dst in succ(source) {
            if not dst.flag and not have_path(dst, sink)
                return false // exit as soon as we find a node with no path to `sink`
        }
        source.flag = true
        return true
    }
The procedure have_path sets a boolean flag in each node for which there exists some path from that node to the sink. At the same time, the procedure traverses only nodes reachable from the source, and it traverses all of them. If the procedure returns true, then all the nodes reachable from the source lie on a path to the sink. Unreachable nodes were already handled in the first phase.
[1] an edge linking a node with a greater DFS number to a node with a lesser DFS number that is not already completely processed, i.e. is still on the DFS stack
[2] e.g. they would not have an assigned DFS number

How can I do this graph traversal?

I have a directed cyclic graph consisting of nodes a, b, c, d, e, f, g, where every node is connected to every other node. The edges may be unidirectional or bidirectional. I need to print out a valid order like this, e.g. f->a->c->b->e->d->g, such that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
Basically, first we can try to find a start node: a node with no incoming edge (there could be at most one such node). I may or may not find one. I will also do some preprocessing to find the total number of nodes (let's call it n). Now I will start a DFS from the start node, marking nodes as visited when I reach them and counting how many nodes I have visited. If I can reach n nodes by this method, I am done. If I hit a node from which there are no outgoing edges to any unvisited node, I have hit a dead end, so I will mark that node as unvisited again, step the pointer back, and go to its previous node to try a different route.
This covers the case where I find a start node. If I don't find a start node, I will just have to try this with various nodes.
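For reference, the backtracking search described above could be sketched in Python like this. The dict-of-lists graph and the name `find_full_path` are assumptions for the example, and the search can take exponential time in the worst case.

```python
def find_full_path(adj, start):
    """Backtracking DFS for a path from start that visits every node exactly once."""
    n = len(adj)
    path = [start]
    visited = {start}

    def extend():
        if len(path) == n:  # every node is on the path
            return True
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if extend():
                    return True
                path.pop()  # dead end: undo and try another route
                visited.remove(nxt)
        return False

    return path if extend() else None
```

If no start node is known, the same function can simply be tried from each node in turn.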
I have no idea if I am even close to the solution. Can anyone help me in this regard?
In my opinion, if a node has no incoming edges, that node must be the start node. You can traverse the graph starting from it. If you cannot visit all n nodes from this start node, then there is no solution (as you said, all the nodes must be present in the output list). This is because if you start from some other node, you won't be able to reach this start node.
The problem with your solution is that if you enter a loop you don't know if and when to exit.
A DFS under these conditions can easily become a non-polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated; I hope there's room for simplification.
Here is my suggested solution:
1) For each node, construct the table of the nodes it can reach (if a can reach b and c, b can reach d, and c can reach e, then a can reach b, c, d, e, even though there is not a single path from a passing through all of them).
If no node can reach all the other ones, you're done: the path you're looking for does not exist.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table at the previous point.
Once you have found a loop, you can shrink it (and its nodes) to a representative node whose ingoing (outgoing) connections are the union of the ingoing (outgoing) connections of the nodes in the loop.
You keep reducing loops until you cannot do any more.
3) At this point you are left with an acyclic graph. If there is a path connecting all nodes, there is a single node connected to all of them, and starting from it you can perform a depth-first search.
Write down the path by replacing the traversal of representative nodes with a loop from the entry point of the loop to the exit point.
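Step 1 of the answer above, the reachability table, might be built as in this Python sketch. It assumes a dict-of-lists graph, and `reachability_table` is a made-up name. With this construction a node appears in its own entry only if it lies on a loop, which is the test that step 2 relies on.

```python
def reachability_table(adj):
    """Map each node to the set of nodes reachable from it by one or more edges."""
    table = {}
    for start in adj:
        reached = set()
        stack = list(adj[start])
        while stack:
            node = stack.pop()
            if node not in reached:
                reached.add(node)
                stack.extend(adj[node])
        table[start] = reached
    return table
```

Running one search per node costs O(n * (n + m)) in total, which is polynomial.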

Breadth first search with fixed stopping distance

I'm looking at a graph problem where I'm given a source node and need to find all other nodes up to a fixed distance away, where each edge between nodes has a uniform cost. So I've implemented a breadth first search using the standard FIFO queue technique, but stopping the BFS at a fixed distance is causing me problems.
If I were using DFS, I could pass in the current depth with each recursive call, but I can't do that here. I cannot modify the nodes of the graph to keep an extra parameter (distance) either. Any suggestions or references?
Just use two queues and bounce back and forth between them. Every time you switch from one to the other, increment your depth count by one.
To elaborate...
Maintain an "active" queue and an "inactive" queue.
Pop a node from the active queue. Add its neighbors to the inactive queue. Repeat until the active queue is empty. Then swap the queues.
This maintains the invariant that if the distance to all nodes in the active queue is d, the distance to all nodes in the inactive queue is d+1. Easy enough to keep track of and stop when you want.
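A minimal Python sketch of the two-queue approach, assuming a dict-of-lists graph. The name `nodes_within` is hypothetical, and distances are recorded in a separate dict so the graph nodes themselves are never modified.

```python
from collections import deque

def nodes_within(adj, source, max_depth):
    """Return {node: distance} for all nodes at most max_depth edges from source."""
    found = {source: 0}  # also serves as the visited set
    active, inactive = deque([source]), deque()
    depth = 0  # distance of every node currently in the active queue
    while active and depth < max_depth:
        while active:
            node = active.popleft()
            for nb in adj[node]:
                if nb not in found:
                    found[nb] = depth + 1
                    inactive.append(nb)
        active, inactive = inactive, active  # swap the queues
        depth += 1
    return found
```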
You can store the depth alongside the value you put in the queue. You can also keep a separate array to store the depth at which you reached each node.
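Storing the depth with each queue entry could look like this in Python. It is a sketch that assumes a dict-of-lists graph; `bfs_limited` is a made-up name.

```python
from collections import deque

def bfs_limited(adj, source, max_depth):
    """Return (node, depth) pairs in BFS order, never going deeper than max_depth."""
    visited = {source}
    order = []
    queue = deque([(source, 0)])  # each entry carries its own depth
    while queue:
        node, depth = queue.popleft()
        order.append((node, depth))
        if depth == max_depth:
            continue  # do not expand nodes on the boundary
        for nb in adj[node]:
            if nb not in visited:
                visited.add(nb)
                queue.append((nb, depth + 1))
    return order
```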
Encapsulate the vertices you pass together with their distance from the source of the BFS.
Another possibility would be to just mark the vertices in the queue; usually frameworks for graphs allow you to assign weights to elements of the graph, which is a mechanism you could use for your purpose.
One last possibility would be to insert a marking vertex that isn't actually in the graph into the queue after the frontier of one level of the BFS has been completely processed so you know when a new level of BFS depth starts. That would make your queue look something like v u w x y MARKER s t j l k with all of these being vertices of the graph, except for MARKER.

Dijkstra's Algorithm - Parent Nodes?

My professor wants us to implement it for a single source node to all other nodes in the network. He said to keep track of the shortest path by using parent nodes, but I have no idea what this means in the context of the algorithm.
I can implement my code more or less properly, in the sense that my output distances are all correct for any network I run it on.
But most online resources talk about visiting nodes and marking them as visited once you explore all of the neighboring nodes. So for instance, if nodes A and B neighbor node C, and the new distance to A is smaller than that of B, do I mark node C visited? And then what happens if I get to node A and realize that the path it leads me down would actually cause an already recorded distance to actually be larger?
In order to get a path (as opposed to just a cost) from Dijkstra's algorithm, instead of saving a best_cost for each node, save the pair (best_cost, from_where). The from_where is a handle to the adjacent node that produced the best_cost.
You can then follow the from_where pointers all the way back to the origin to get the best path. I suspect "parent" is his name for the from_where element in the 2-tuple/pair.
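A sketch of what this could look like in Python, assuming the graph is a dict mapping each node to a list of (neighbour, weight) pairs with non-negative weights. The names `dijkstra` and `path_to` are chosen for the example, and the `parent` dict plays the role of from_where.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, parent) for shortest paths from source; weights must be >= 0."""
    dist = {source: 0}
    parent = {source: None}  # the "from_where" entry for each node
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nb, w in graph.get(node, ()):
            nd = d + w
            if nb not in dist or nd < dist[nb]:
                dist[nb] = nd
                parent[nb] = node  # best known way into nb is via node
                heapq.heappush(heap, (nd, nb))
    return dist, parent

def path_to(parent, target):
    """Follow parent pointers back to the source; assumes target was reached."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```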
"My professor wants us to implement it for a single source node to all other nodes in the network. He said to keep track of the shortest path by using parent nodes, but I have no idea what this means in the context of the algorithm."
Well, that just means that for each node, you store the node it came from on the shortest path to it. This way, you can walk the shortest path in reverse order once you're done with your algorithm, to find not only the distance of the shortest path but also the shortest path itself.
"But most online resources talk about visiting nodes and marking them as visited once you explore all of the neighboring nodes."
You mark a node as visited when it is the unvisited node with the lowest distance. As long as there are no negative edge weights, you won't later find a path to it with a lower distance (with negative edge weights this guarantee no longer holds, and Dijkstra's algorithm can return wrong distances).

Most efficient way to visit nodes of a DAG in order

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.
If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.
What's the most efficient algorithm to do this?
You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.
The complexity of such an algorithm is O(|V|+|E|), which is quite good. You want to traverse all nodes, so if you wanted a faster algorithm than that, you would have to solve it without even looking at all edges, which would be dangerous, because a single edge could wreck the order completely.
There are some answers here: Good graph traversal algorithm
In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).
If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.
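The counting scheme described above is commonly known as Kahn's algorithm. A minimal Python sketch follows, assuming a dict mapping every node to the list of nodes it points to, and a caller-supplied `visit` callback; both the representation and the function name are assumptions for the example.

```python
from collections import deque

def visit_in_topological_order(adj, visit):
    """Call visit(node) on every node so that all parents come before their children."""
    remaining = {node: 0 for node in adj}  # untraversed incoming links per node
    for node in adj:
        for child in adj[node]:
            remaining[child] += 1
    ready = deque(n for n in adj if remaining[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visit(node)
        visited += 1
        for child in adj[node]:
            remaining[child] -= 1
            if remaining[child] == 0:  # all parents of child have been visited
                ready.append(child)
    if visited != len(adj):
        raise ValueError("graph has a cycle")
```

Each node and each edge is handled once, so the running time is O(|V| + |E|).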
As mentioned in the other answers, this problem can be solved by Topological Sorting.
A very simple algorithm for that (not the most efficient):
Keep an array (or map) indegree[] where indegree[node] = number of incoming edges of node
while there is at least one node n with indegree[n] = 0:
    for each node n in nodes where indegree[n] = 0:
        visit n
        indegree[n] = -1 # mark n as visited
        for each node x adjacent to n:
            indegree[x] = indegree[x] - 1 # its parent has been visited, so one less edge coming into it
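You can traverse a DAG in O(N + E) (without any topological sort) by just running your DFS from every node with zero in-degree, because those are the valid starting points. This works because the graph has no cycles, so zero in-degree nodes must exist, and searches started from them must together cover the whole graph. Note, though, that the order in which a plain DFS first reaches nodes does not by itself guarantee that all parents of a node are visited before it.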
