Breadth first search with fixed stopping distance - algorithm

I'm looking at a graph problem where I'm given a source node and need to find all other nodes up to a fixed distance away, where each edge between nodes has a uniform cost. So I've implemented a breadth first search using the standard FIFO queue technique, but stopping the BFS at a fixed distance is causing me problems.
If I were using DFS, I could pass in the current depth with each recursive call, but I can't do that here. I cannot modify the nodes of the graph to keep an extra parameter (distance) either. Any suggestions or references?

Just use two queues and bounce back and forth between them. Every time you switch from one to the other, increment your depth count by one.
To elaborate...
Maintain an "active" queue and an "inactive" queue.
Pop a node from the active queue. Add its neighbors to the inactive queue. Repeat until the active queue is empty. Then swap the queues.
This maintains the invariant that if the distance to all nodes in the active queue is d, the distance to all nodes in the inactive queue is d+1. Easy enough to keep track of and stop when you want.

You can pass the depth to the value you put in the queue. You can also keep a separate array to store the depth you reached each node.

Encapsulate the vertices you pass together with their distance from the source of the BFS.
Another possibility would be to just mark the vertices in the queue; usually frameworks for graphs allow you to assign weights to elements of the graph, which is a mechanism you could use for your purpose.
One last possibility would be to insert a marking vertex that isn't actually in the graph into the queue after the frontier of one level of the BFS has been completely processed so you know when a new level of BFS depth starts. That would make your queue look something like v u w x y MARKER s t j l k with all of these being vertices of the graph, except for MARKER.

Related

Finding the closest marked node in a graph

In a graph with a bunch of normal nodes and a few special marked nodes, is there a common algorithm to find the closest marked node from a given starting position in the graph?
Or is the best way to do a BFS search to find the marked nodes and then doing Dijkstra's on each of the discovered marked nodes to see which one is the closest?
This depends on the graph, and your definition of "closest".
If you compute "closest" ignoring edge weights, or your graph has no edge weights, a simple breadth-first search (BFS) will suffice. The first node reached vía BFS is, by definition of BFS, the closest (or, if there are several closest nodes, tied for closeness). If you keep track of the number of expanded BFS levels, you can locate all closest nodes by reaching the end of the level instead of stopping as soon as you find the first marked node.
If you have edge weights, and need to use them in your computation, use Dijkstra instead. If the edges can have negative weights, and there happen to be any negative cycles, then you will need to instead use Bellman-Ford.
As mentioned by SaiBot, if the start node is always the same, and you will perform several queries with changing "marked" nodes, there are faster ways to do things. In particular, you can store in each node the "parent" found in a first full traversal, and the node's distance to the start node. When adding a new batch of k marked nodes, you would immediately know the closest to the start by looking at this distance for each marked node.
The fastest way would be to perform Dijkstra right away from your starting position (starting node). When "closeness" is defined as the number of edges that have to be traversed, you can just assign a weight of 1 to each edge. In case precomputation is allowed there will be faster ways to do it.

Finding the list of common children (descendants) for any two nodes in a cyclic graph

I have a cyclic directed graph and I was wondering if there is any algorithm (preferably an optimum one) to make a list of common descendants between any two nodes? Something almost opposite of what Lowest Common Ancestor (LCA) does.
As user1990169 suggests, you can compute the set of vertices reachable from each of the starting vertices using DFS and then return the intersection.
If you're planning to do this repeatedly on the same graph, then it might be worthwhile first to compute and contract the strong components to supervertices representing a set of vertices. As a side effect, you can get a topological order on supervertices. This allows a data-parallel algorithm to compute reachability from multiple starting vertices at the same time. Initialize all vertex labels to {}. For each start vertex v, set the label to {v}. Now, sweep all vertices w in topological order, updating the label of w's out-neighbors x by setting it to the union of x's label and w's label. Use bitsets for a compact, efficient representation of the sets. The downside is that we cannot prune as with single reachability computations.
I would recommend using a DFS (depth first search).
For each input node
Create a collection to store reachable nodes
Perform a DFS to find reachable nodes
When a node is reached
If it's already stored stop searching that path // Prevent cycles
Else store it and continue
Find the intersection between all collections of nodes
Note: You could easily use BFS (breadth first search) instead with the same logic if you wanted.
When you implement this keep in mind there will be a few special cases you can look for to further optimize your search such as:
If an input node doesn't have any vertices then there are no common nodes
If one input node (A) reaches another input node (B), then A can reach everything B can. This means the algorithm wouldn't have to be ran on B.
etc.
Why not just reverse the direction of the edge and use LCA?

calculate the degree of separation between two nodes in a connected graph

I am working on a graph library that requires to determine whether two nodes are connected or not and if connected what is the degree of separation between them
i.e number of nodes needed to travel to reach the target node from the source node.
Since its an non-weighted graph, a bfs gives the shortest path. But how to keep the track of number of nodes discovered before reaching the target node.
A simple counter which increments on discovering a new node will give a wrong answer as it may include nodes which are not even in the path.
Another way would be to treat this as a weighted graph of uniform weighted edges and using Djkastra's shortest path algorithm.
But I want to manage it with bfs only.
How to do it ?
During the BFS, have each node store a pointer to its predecessor node (the node in the graph along whose edge the node was first discovered). Then, once you've run BFS, you can repeatedly follow this pointer from the destination node to the source node. If you count up how many steps this takes, you will have the distance from the destination to the source node.
Alternatively, if you need to repeatedly determine the distances between nodes, you might want to use the Floyd-Warshall all-pairs shortest paths algorithm, which if precomputed would let you immediately read off the distances between any pair of nodes.
Hope this helps!
I don't see why a simple counter wouldn't work. In this case, breadth-first search would definitely give you the shortest path. So what you want to do is attach a property to every node called 'count'. Now when you encounter a node that you have not visited yet, you populate the 'count' property with whatever the current count is and move on.
If later on, you come back to the node, you should know by the populated count property that it has already been visited.
EDIT: To expand a bit on my answer here, you'll have to maintain a variable that'll track the degree of separation from your starting node as you navigate the graph. For every new set of children that you load into the queue, make sure that you increment the value in that variable.
If all you want to know is the distance (possibly to cut off the search if the distance is too large), and all edges have the same weight (i.e. 1):
Pseudocode:
Let Token := a new node object which is different from every node in the graph.
Let Distance := 0
Let Queue := an empty queue of nodes
Push Start node and Token onto Queue
(Breadth-first-search):
While Queue is not empty:
If the head of Queue is Target node:
return Distance
If the head of Queue is Token:
Increment Distance
Push Token onto back of the Queue
If the head of Queue has not yet been seen:
Mark the head of the Queue as seen
Push all neighbours of the head of the Queue onto the back of Queue
Pop the head of Queue
(Did not find target)

Safe Vertex deletion that won't cause graph disconnection

Here are two excises about safe vertex deletions
5-28. An articulation vertex of a graph G is a vertex whose deletion disconnects G. Let G be a graph with n vertices and m edges. Give a simple O(n + m) algorithm for finding a vertex of G that is not an articulation vertex—i.e. , whose deletion does not disconnect G.
5-29. Following up on the previous problem, give an O(n + m) algorithm that finds a deletion order for the n vertices such that no deletion disconnects the graph. (Hint: think DFS/BFS.)
For 5-28, here is my thought:
I will just do a dfs, but not complete. The very first vertex which finished being processed will be a non-articulation vertex as it must be a leaf, or a leaf with a back edge pointing back to its ancestor (it is also not a articulation vertex).
For 5-29
I am not yet sure how to do it nicely. What comes into my mind is that in the graph, any vertex in a cycle can't deleted safely. Also, if there is no cycle, then deleting vertex backwards up from a dfs tree is also safe.
Could anyone give me some hints or tell me whether my thinking is correct or wrong?
I think your solution to 5-28 is correct, it guarantees to find an node which is not articulation in O(n+m) time.
For 5-29, I think one way to do it is based on your solution to 5-28. While doing dfs, keep tract of when did each node leaves the stack(the time finished being processed). As you said, the one leaves first must be a leaf node so delete it will not disconnect the graph. Then you can delete the node leaves the stack secondly, it must also be a leaf node when we removed the first node. So we can delete the nodes in the reverse order of when they are popped from stack while doing DFS. Doing this only need one pass of DFS, thus running time is O(n+m).
Another simple way is to do it with BFS. For 5.28, deleting any node with maximum depth will not make the graph disconnect. Because each other nodes can be reached by a node with less depth. So for 5.29, we can delete all nodes by their sort depth in descending order. And also, we only need 1 BFS so the running time is O(n+m). I think it's easier for people to understand this approach.
5-29:
Expanding on your idea from 5-28, when you finish processing a vertex, it is a non-articulation vertex, so delete it. Then continue the DFS, and every time you finish processing another vertex, delete it too. Since you deleted the previous vertices that finished processing, every time you finish processing a vertex it is actually the first time you finish processing a vertex (for the graph without the previously deleted ones).
Another method, easier to prove, and a bit less efficient (but still O(V + E)) - Crete a DFS tree from the graph, then do topological sorting, then remove the vertices one by one starting from the last one in the sorted graph and moving back to the first one. At every step you remove the last one, and you know for sure (because it's a topological sorted graph) that it doesn't point to any other node, meaning no edges will be deleted except the edges leading to it. That means all other nodes are still reachable from the first node, and if the graph were bi-directional, then all nodes can reach the first node too, making it connected.
For the first problem, I will just delete the vertex you want to test from the graph and then run a DFS/BFS starting from any other vertex, counting the number of visited vertices. If it's less than (original size - 1), then the tested vertex is an articulation vertex.
Same idea applies to the second problem. You randomly pick a vertex and delete it, which will in general cut the graph into two blocks. If the deleted vertex is not an articulation vertex, then one of the two blocks must be empty. Otherwise, both blocks have some vertices, in which case, all vertices in BOTH blocks have to be listed in front of this vertex in the final "safe-deletion" order, while it is not important to decide which block to be completely removed first. So we can write a little recursive function like this:
vertex[] safe_order_cut (vertex[] v)
if (v.length==0) return empty_vertex_list;
vertex x = randomly_pick(v);
vertex v1[], v2[];
cut_graph(v,x,v1,v2);
return safe_order_cut(v1) + safe_order_cut(v2) + x;
The connectivity problem (and related cut vertex problems) has been extensively studied in graph theory. If you are interested, you can read the wiki pages for more algorithms.

Is there a proper algorithm to solve edge-removing problem?

There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If instead of just "lit" or "unlit" you would keep a set of nodes from which a node is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from-node in the set, traverse the edges and remove the node you just "unlit" from their set (and possibly traverse from there too, and so on)
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B >- D
\-> C >-/
Then at node A you would store that it was a source (and thus lit by itself), and in both B and C you would store they were lit by A, and in D you would store that it was lit by both A and C.
Then say we remove the edge from B to D: In D we remove B from the lit-source-list, but it remains lit as it is still lit by A. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originated at C, and remove C from D's set which is now also unlit. In this case we are done, but if the set was bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question. Do again the DFS. You can the nodes which still remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs) which take O(n + m) times and space (where n = number of nodes, m = number of edges), which dominate the complexity. You need at least o(n + m) time and space to read the input, therefore the algorithm is optimal.
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
not connected is not a problem, since nodes in unreachable connected components will not be reached during the search
there is a smart way to do the DFS or BFS from all nodes at once (I will describe BFS). You just have to put them all at the beginning on the stack/queue.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes]
while (q not empty)
{
x = q.pop()
forall (y neighbour of x) {
if (y was not visited) {
visited[y] = true
q.push(y)
}
}
}
Replace Queue with a Stack and you get a sort of DFS.
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the remove edges.
EDIT: Extend this description a bit
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information of connected source nodes on the edges while building the graph.(such as if edge has connectivity to the sources S1 and S2, its source list contains S1 and S2 ) And create the Nodes with the information of input edges and output edges. When an edge is removed, update the output edges of the target node of that edge by considering the input edges of the node. And traverse thru all the target nodes of the updated edges by using DFS or BFS. (In case of a cycle graph, consider marking). While updating the graph, it is also possible to find nodes without any edge that has source connection (lit->unlit nodes). However, it might not be a good solution, if you'd like to remove multiple edges at the same time since that may cause to traverse over same edges again and again.

Resources