How can I do this graph traversal? - algorithm

I have a directed cyclic graph consisting of nodes a, b, c, d, e, f, g, where every node is connected to every other node. The edges may be unidirectional or bidirectional. I need to print out a valid order, e.g. f->a->c->b->e->d->g, such that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
First we can try to find a start node: a node with no incoming edge (there can be at most one such node). I may or may not find one. I will also do some preprocessing to find the total number of nodes (let's call it n). Then I start a DFS from the start node, marking nodes as visited when I reach them and counting how many nodes I have visited. If I can reach n nodes this way, I am done. If I hit a node with no outgoing edges to any unvisited node, I have hit a dead end: I mark that node as unvisited again, move the pointer back, and return to the previous node to try a different route.
That covers the case where I find a start node. If I don't find one, I will just have to try this with each node as the start.
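A minimal Python sketch of the backtracking search described above, assuming the graph is given as a dict that maps every node to a list of its successors and that each node may appear only once in the output. The name `find_full_path` and the input format are illustrative assumptions, not part of the original question. Note that the worst case of this search is exponential in the number of nodes.

```python
def find_full_path(graph):
    """Return a path that visits every node exactly once, or None.

    `graph` maps each node to an iterable of its successors; every node
    is assumed to appear as a key (assumption of this sketch).
    """
    nodes = list(graph)
    n = len(nodes)

    # Nodes without an incoming edge: if one exists, it has to be the start.
    has_incoming = {v for succs in graph.values() for v in succs}
    roots = [u for u in nodes if u not in has_incoming]
    if len(roots) > 1:
        return None  # two such nodes cannot both come first
    starts = roots or nodes

    path, visited = [], set()

    def extend(u):
        path.append(u)
        visited.add(u)
        if len(path) == n:
            return True
        for v in graph[u]:
            if v not in visited and extend(v):
                return True
        # Dead end: mark u as unvisited again and back up one step.
        path.pop()
        visited.remove(u)
        return False

    for s in starts:
        if extend(s):
            return path
    return None
```

For a small graph such as `{"f": ["a"], "a": ["c", "b"], "c": ["b"], "b": ["e", "a"], "e": ["d"], "d": ["g", "e"], "g": []}` this search starts at `f`, the only node without an incoming edge.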
I have no idea if I am even close to the solution. Can anyone help me in this regard?

In my opinion, if there is no incoming edge to a node, it means that node is a start node. You can traverse the graph using this start node. And if this start node can not visit all the n nodes, then there is no solution (as you said that all the nodes must be present in the output list.). This is because if you start with some other nodes, you won't be able to reach this start node.

The problem with your solution is that if you enter a loop, you don't know if and when to exit.
A DFS under these conditions can easily become a non-polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated; I hope there's room for simplification.
Here is my suggested solution:
1) For each node, construct the table of the nodes it can reach (if a can reach b and c, b can reach d, and c can reach e, then a can reach b, c, d, e, even though there is no single path from a passing through all of them).
If no node can reach all the other ones, you're done: the path you're looking for does not exist.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table in the previous step.
Once you have found a loop, you can shrink it (and its nodes) to a representative node whose incoming (outgoing) connections are the union of the incoming (outgoing) connections of the nodes in the loop.
Keep reducing loops until no more remain.
3) At this point you are left with an acyclic graph. If there is a path connecting all nodes, there is a single node connected to all of them, and starting from it you can perform a depth-first search.
4) Write down the path by replacing the traversal of each representative node with a loop from the entry point of the loop to the exit point.

Related

DFS on each node of the graph

Let's say that in my directed graph G = (V, E), each node v has a number n[v] assigned to it. For each vertex in G, I want to find max(n[v]): the maximum n[v] over the nodes reachable from that vertex.
What would be an efficient way to solve this?
I am thinking along the lines of a DFS from each node, without backtracking, comparing the n[v] of all the 'visited' nodes and storing the maximum n[v] value for that path.
However, I am afraid that this might not be an efficient solution.
Why do you think this is inefficient?
If you run DFS from all source nodes (nodes that have only outgoing edges and no incoming edges), you are guaranteed to visit all nodes, provided the graph is acyclic.
You can easily store the max(n[v]) on a node only after you visit all of its children.
And then if you reach a node that already has a max(n[v]) after starting from another source, you know you don't have to continue exploring that node's children, because the value is only assigned once you've visited all the children.
You're doing this in ~|E| steps (unless you have isolated nodes with no in or out edges), since you'll traverse each edge exactly once.
I guess if you want to be super accurate you'll be doing this in |E| + |S| steps where |S| is the number of source nodes in your graph.
This seems to be pretty efficient to me. I'm not sure if it's possible to get a deterministic answer without checking every edge at least once.
Edit:
Also, I guess if you want to be extra complete, you may need a pre-step that checks all |V| nodes to determine which are source nodes, and only then run DFS from the source nodes.
So overall it would be |V| + |E| + |S| steps, which is O(|V| + |E|).
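A rough Python sketch of this idea, assuming the graph is acyclic and given as a dict of successor lists, with the numbers in a separate dict. The names `max_reachable`, `graph`, and `value` are made up for illustration, and a node is treated as reachable from itself. Because results are memoized, starting the search from every vertex costs no more than starting only from the sources, so the separate source-finding pre-step is skipped here.

```python
def max_reachable(graph, value):
    """For each vertex, the largest value among the vertices reachable
    from it (the vertex itself included). Assumes the graph is a DAG.

    `graph` maps a vertex to its successors; `value` maps every vertex
    to its number n[v]. Both names are assumptions of this sketch.
    """
    best = {}

    def visit(u):
        if u in best:            # already finished from an earlier start
            return best[u]
        m = value[u]
        for v in graph.get(u, ()):
            m = max(m, visit(v))
        best[u] = m              # assigned only after all children are done
        return m

    for u in value:
        visit(u)
    return best
```

The recursion depth grows with the longest path, so a very deep graph would need an explicit stack instead. On a graph with cycles this sketch would recurse forever; one option there is to collapse strongly connected components first.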

calculate the degree of separation between two nodes in a connected graph

I am working on a graph library that needs to determine whether two nodes are connected and, if so, the degree of separation between them,
i.e. the number of nodes that must be traversed to reach the target node from the source node.
Since it's an unweighted graph, a BFS gives the shortest path. But how do I keep track of the number of nodes discovered before reaching the target node?
A simple counter that increments on discovering a new node will give a wrong answer, as it may include nodes that are not even on the path.
Another way would be to treat this as a weighted graph with uniformly weighted edges and use Dijkstra's shortest path algorithm.
But I want to manage it with BFS only.
How can I do it?
During the BFS, have each node store a pointer to its predecessor node (the node in the graph along whose edge the node was first discovered). Then, once you've run BFS, you can repeatedly follow this pointer from the destination node to the source node. If you count up how many steps this takes, you will have the distance from the destination to the source node.
Alternatively, if you need to repeatedly determine the distances between nodes, you might want to use the Floyd-Warshall all-pairs shortest paths algorithm, which if precomputed would let you immediately read off the distances between any pair of nodes.
Hope this helps!
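The predecessor-pointer approach could look roughly like this in Python. This is a sketch under the assumption that the graph is a dict of neighbour lists; `degree_of_separation` is a hypothetical name. It counts edges on the shortest path, so subtract one if you want the number of intermediate nodes.

```python
from collections import deque

def degree_of_separation(graph, source, target):
    """Number of edges on a shortest path from source to target,
    or None if target cannot be reached."""
    parent = {source: None}      # predecessor of each discovered node
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in graph.get(u, ()):
            if v not in parent:  # first discovery fixes the predecessor
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return None
    # Walk the predecessor pointers back to the source, counting steps.
    steps = 0
    node = target
    while parent[node] is not None:
        node = parent[node]
        steps += 1
    return steps
```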
I don't see why a simple counter wouldn't work. In this case, breadth-first search would definitely give you the shortest path. So what you want to do is attach a property to every node called 'count'. Now when you encounter a node that you have not visited yet, you populate the 'count' property with whatever the current count is and move on.
If later on, you come back to the node, you should know by the populated count property that it has already been visited.
EDIT: To expand a bit on my answer here, you'll have to maintain a variable that'll track the degree of separation from your starting node as you navigate the graph. For every new set of children that you load into the queue, make sure that you increment the value in that variable.
If all you want to know is the distance (possibly to cut off the search if the distance is too large), and all edges have the same weight (i.e. 1):
Pseudocode:
Let Token := a new node object which is different from every node in the graph.
Let Distance := 0
Let Queue := an empty queue of nodes
Push Start node and Token onto Queue
(Breadth-first search):
While Queue is not empty:
    If the head of Queue is Target node:
        return Distance
    If the head of Queue is Token:
        If Token is the only element of Queue:
            stop (Did not find target)
        Increment Distance
        Push Token onto back of the Queue
    Else if the head of Queue has not yet been seen:
        Mark the head of the Queue as seen
        Push all neighbours of the head of the Queue onto the back of Queue
    Pop the head of Queue
(Did not find target)
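The same token idea as a runnable Python sketch, assuming a dict of neighbour lists. The function name `bfs_distance` is an assumption for illustration. The token marks the end of each BFS level, and the search stops once the token is the only thing left in the queue, so an unreachable target does not loop forever.

```python
from collections import deque

def bfs_distance(graph, start, target):
    """Distance in edges from start to target, or None if unreachable."""
    token = object()             # sentinel that differs from every node
    queue = deque([start, token])
    seen = set()
    distance = 0
    while queue:
        head = queue.popleft()
        if head is token:
            if not queue:        # nothing left but the token: give up
                return None
            distance += 1        # one full BFS level has been processed
            queue.append(token)
            continue
        if head == target:
            return distance
        if head not in seen:
            seen.add(head)
            queue.extend(graph.get(head, ()))
    return None
```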

Safe Vertex deletion that won't cause graph disconnection

Here are two exercises about safe vertex deletions:
5-28. An articulation vertex of a graph G is a vertex whose deletion disconnects G. Let G be a graph with n vertices and m edges. Give a simple O(n + m) algorithm for finding a vertex of G that is not an articulation vertex—i.e. , whose deletion does not disconnect G.
5-29. Following up on the previous problem, give an O(n + m) algorithm that finds a deletion order for the n vertices such that no deletion disconnects the graph. (Hint: think DFS/BFS.)
For 5-28, here is my thought:
I will just do a DFS, but not a complete one. The very first vertex that finishes being processed will be a non-articulation vertex, as it must be a leaf, or a leaf with a back edge pointing to an ancestor (which is also not an articulation vertex).
For 5-29:
I am not yet sure how to do it nicely. What comes to mind is that in the graph, any vertex in a cycle can't be deleted safely. Also, if there is no cycle, then deleting vertices backwards up a DFS tree is also safe.
Could anyone give me some hints or tell me whether my thinking is correct or wrong?
I think your solution to 5-28 is correct; it is guaranteed to find a node which is not an articulation vertex in O(n+m) time.
For 5-29, I think one way to do it is based on your solution to 5-28. While doing DFS, keep track of when each node leaves the stack (the time it finishes being processed). As you said, the one that leaves first must be a leaf node, so deleting it will not disconnect the graph. Then you can delete the node that leaves the stack second; it must also be a leaf node once the first node has been removed. So we can delete the nodes in the order in which they are popped from the stack during the DFS. This needs only one pass of DFS, so the running time is O(n+m).
Another simple way is to do it with BFS. For 5-28, deleting any node with maximum depth will not disconnect the graph, because every other node can be reached through a node of smaller depth. So for 5-29, we can delete all nodes sorted by depth in descending order. Again we only need one BFS, so the running time is O(n+m). I think this approach is easier to understand.
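Here is a small Python sketch of the BFS variant, assuming a connected undirected graph stored as a dict of neighbour lists. The name `safe_deletion_order` is hypothetical. BFS discovers vertices in non-decreasing depth, so reversing the discovery order gives a deepest-first deletion order without an explicit sort.

```python
from collections import deque

def safe_deletion_order(adj, root=None):
    """Order in which all vertices can be deleted without ever
    disconnecting the remaining graph.

    `adj` maps each vertex of a connected undirected graph to its
    neighbours (assumption of this sketch).
    """
    if not adj:
        return []
    if root is None:
        root = next(iter(adj))
    order = [root]               # BFS discovery order, shallowest first
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    # Deepest vertices go first; the BFS root is deleted last.
    return order[::-1]
```

At every step the vertices that remain form a prefix of the BFS discovery order, and each of them still has its BFS parent present, which is why the rest stays connected.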
5-29:
Expanding on your idea from 5-28, when you finish processing a vertex, it is a non-articulation vertex, so delete it. Then continue the DFS, and every time you finish processing another vertex, delete it too. Since you deleted the previous vertices that finished processing, every time you finish processing a vertex it is actually the first time you finish processing a vertex (for the graph without the previously deleted ones).
Another method, easier to prove and a bit less efficient (but still O(V + E)): create a DFS tree from the graph, then do a topological sort, then remove the vertices one by one, starting from the last one in the sorted order and moving back to the first one. At every step you remove the last one, and you know for sure (because it's a topologically sorted graph) that it doesn't point to any other node, meaning no edges will be deleted except the edges leading to it. That means all other nodes are still reachable from the first node, and if the graph is bidirectional, then all nodes can reach the first node too, making it connected.
For the first problem, I will just delete the vertex you want to test from the graph and then run a DFS/BFS starting from any other vertex, counting the number of visited vertices. If it's less than (original size - 1), then the tested vertex is an articulation vertex.
Same idea applies to the second problem. You randomly pick a vertex and delete it, which will in general cut the graph into two blocks. If the deleted vertex is not an articulation vertex, then one of the two blocks must be empty. Otherwise, both blocks have some vertices, in which case, all vertices in BOTH blocks have to be listed in front of this vertex in the final "safe-deletion" order, while it is not important to decide which block to be completely removed first. So we can write a little recursive function like this:
vertex[] safe_order_cut (vertex[] v) {
    if (v.length==0) return empty_vertex_list;
    vertex x = randomly_pick(v);
    vertex v1[], v2[];
    cut_graph(v,x,v1,v2);
    return safe_order_cut(v1) + safe_order_cut(v2) + x;
}
The connectivity problem (and related cut vertex problems) has been extensively studied in graph theory. If you are interested, you can read the wiki pages for more algorithms.

Most efficient way to visit nodes of a DAG in order

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.
If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.
What's the most efficient algorithm to do this?
You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.
The complexity of such an algorithm is O(|V|+|E|), which is quite good. You want to traverse all nodes, so a faster algorithm would have to solve the problem without even looking at all edges, which would be dangerous, because a single edge could wreck the order completely.
There are some answers here:
Good graph traversal algorithm
and here:
http://en.wikipedia.org/wiki/Topological_sorting
In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).
If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.
As mentioned in the other answers, this problem can be solved by Topological Sorting.
A very simple algorithm for that (not the most efficient):
Keep an array (or map) indegree[] where indegree[node] = number of incoming edges of node
while there is at least one node n with indegree[n]=0:
    for each node n in nodes where indegree[n]=0:
        visit(n)
        indegree[n]=-1 # mark n as visited
        for each node x adjacent to n:
            indegree[x]=indegree[x]-1 # its parent has been visited, so one less edge coming into it
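A queue-based version of the same indegree idea (usually called Kahn's algorithm) avoids rescanning all nodes on every pass and runs in O(|V|+|E|). The following Python sketch assumes the DAG is a dict mapping each node to its children; `visit_in_topological_order` and the `visit` callback are illustrative names only.

```python
from collections import deque

def visit_in_topological_order(graph, visit):
    """Call visit(node) on every node so that all parents of a node
    are visited before the node itself. Raises ValueError on a cycle."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1

    # Nodes with no unvisited parent are ready to be visited.
    ready = deque(u for u, d in indegree.items() if d == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visit(u)
        visited += 1
        for v in graph.get(u, ()):
            indegree[v] -= 1     # one parent of v has now been visited
            if indegree[v] == 0:
                ready.append(v)

    if visited != len(indegree):
        raise ValueError("graph contains a cycle")
```

Usage could be as simple as `visit_in_topological_order(graph, print)`.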
You can traverse a DAG in O(N) (without any topsort) by just running your DFS from every node with zero indegree, because those are the valid "starting points". This works because the graph has no cycles, so zero-indegree nodes must exist, and searching from them must cover the whole graph.

Is there a proper algorithm to solve edge-removing problem?

There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If instead of just "lit" or "unlit" you would keep a set of nodes from which a node is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from-node in the set, traverse the edges and remove the node you just "unlit" from their set (and possibly traverse from there too, and so on)
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B >- D
 \-> C >-/
Then at node A you would store that it was a source (and thus lit by itself), and in both B and C you would store they were lit by A, and in D you would store that it was lit by both B and C.
Then say we remove the edge from B to D: In D we remove B from the lit-source list, but it remains lit as it is still lit by C. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originate at C, and remove C from D's set, which is now also unlit. In this case we are done, but if the graph were bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
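To make the bookkeeping concrete, here is a rough Python sketch of the lit-by sets, assuming the graph is acyclic and stored as a dict of successor lists, with `sources` given as a set. On a graph with cycles, nodes on a cycle would keep each other in their sets and could wrongly stay lit, so this sketch does not cover that case. The names `build_lit_by` and `remove_edge` are hypothetical.

```python
def build_lit_by(graph, sources):
    """Map each node to the set of lit predecessors it is lit through."""
    lit = set(sources)
    stack = list(sources)
    lit_by = {}
    while stack:
        u = stack.pop()          # u is lit, so it lights its successors
        for v in graph.get(u, ()):
            lit_by.setdefault(v, set()).add(u)
            if v not in lit:
                lit.add(v)
                stack.append(v)
    return lit_by

def remove_edge(graph, sources, lit_by, u, v):
    """Remove the edge u -> v, update lit_by in place, and return the
    set of nodes that were lit before and are no longer lit."""
    graph[u].remove(v)
    went_dark = set()
    pending = [(u, v)]
    while pending:
        a, b = pending.pop()
        feeders = lit_by.get(b)
        if not feeders or a not in feeders:
            continue             # b was not lit through a
        feeders.discard(a)
        if not feeders and b not in sources:
            went_dark.add(b)     # last feeder gone: b is now unlit
            pending.extend((b, c) for c in graph.get(b, ()))
    return went_dark
```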
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question. Do the DFS again. You get the nodes which still remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs), which take O(n + m) time and space (where n = number of nodes, m = number of edges) and dominate the complexity. You need Ω(n + m) time and space just to read the input, therefore the algorithm is optimal.
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
A graph that is not connected is not a problem, since nodes in unreachable components will simply not be reached during the search.
There is a smart way to do the DFS or BFS from all source nodes at once (I will describe BFS): just put them all on the stack/queue at the beginning.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes]
visited[s] = true for all starting nodes s
while (q not empty)
{
    x = q.pop()
    forall (y neighbour of x) {
        if (y was not visited) {
            visited[y] = true
            q.push(y)
        }
    }
}
Replace Queue with a Stack and you get a sort of DFS.
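Putting the two searches together, a Python sketch of the whole approach might look like this. It assumes a dict of successor lists with at most one edge between any ordered pair of nodes; `lit_nodes` and `newly_unlit` are made-up names. Instead of modifying the graph, the second search simply skips the removed edge.

```python
from collections import deque

def lit_nodes(graph, sources, removed_edge=None):
    """All nodes reachable from any source, ignoring removed_edge
    (a (from, to) pair) if one is given."""
    lit = set(sources)           # sources are lit from the start
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if (u, v) == removed_edge:
                continue         # pretend this edge is gone
            if v not in lit:
                lit.add(v)
                queue.append(v)
    return lit

def newly_unlit(graph, sources, removed_edge):
    """Nodes that were lit before removing the edge and are not after."""
    before = lit_nodes(graph, sources)
    after = lit_nodes(graph, sources, removed_edge)
    return before - after
```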
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the removed edges.
EDIT: Extend this description a bit
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information about connected source nodes on the edges while building the graph (for example, if an edge has connectivity to the sources S1 and S2, its source list contains S1 and S2), and create the nodes with the information about their input edges and output edges. When an edge is removed, update the output edges of the target node of that edge by considering the input edges of the node, and traverse through all the target nodes of the updated edges using DFS or BFS. (In case of a cyclic graph, consider marking visited nodes.) While updating the graph, it is also possible to find nodes without any edge that has a source connection (the lit->unlit nodes). However, it might not be a good solution if you'd like to remove multiple edges at the same time, since that may cause the same edges to be traversed again and again.
