How to remove invalid edges in graded directed acyclig graph? - algorithm

The following graph sample is a portion of a directed acyclic graph which is to be layered and cleaned up so that only edges connecting consecutive layers are kept.
So what I need is to eliminate edges that form "shortcuts", that is, that jump between non-consecutive layers.
The following considerations apply:
The bluish ring layering is valid because, starting at 83140 and ending at 29518, both branches have the same amount (3) of intermediary nodes, and there is no path that is longer between start and end node;
The green ring, starting at 94347 and ending at 107263, has an invalid edge (already red-crossed), because the left branch encompasses only one intermediary node, while the right branch encompasses three intermediary nodes; Besides, since the first edge of that branch is already valid - we know it pertains to the valid blue ring - it is possible to know which is the right edge to cross-out - otherwise it would be impossible to know which layer should be assigned to node 94030 and so it should be eliminated;
If we consider the pink ring after considering the green one, we know that the lower red-crossed edge is to be removed.
BUT if we consider only the yellow ring, both branches seem to be right (they contain the same number of inner nodes), but actually they only seem right because they contain symmetric errors (shortcuts jumping the same amount of nodes on both branches). If we take this ring locally, at least one of the branches would end up in wrong layers, so it is necessary to use more global data to avoid this error.
My questions are:
What typical concepts and operations are involved in the formulation and possible solution of this problem?
Is there an algorithm for that?

First, topologically sort the graph.
Now from the beginning of sorted array, start breadth first search and try to find the proper "depth" (i.e distance from root) of every node. Since a node can have multiple parents, for a node x, depth[x] is maximum of depth of all it's parents, plus one. We initialize depth for all nodes as -1.
Now in bfs traversal, when we encounter a node p, we try to update the depth of all it's childs c, where depth[c] = max(depth[c],depth[p]+1). Now there are two ways we can detect a child with shortcut.
if depth[p]+1 < depth[c], it means c has a parent with higher depth than p. So edge p to c must be a shortcut.
if depth[p]+1 > depth[c] and depth[c]!=-1, it means c have a parent with lower depth than p. So p is a better parent, and that other parent of c must have a shortcut with p.
In both cases, we mark c as problematic.
Now our goal is for every 'problematic' node x, we check all it's parent, whose depth should be depth[x]-1. If any of them have depth that is lower than that, that one have a shortcut edge with x that needs to be removed.
Since the graph can have multiple roots, we should have a variable to mark visited nodes, and repeat the above thing for any that's left unvisited.
This will sort the yellow ring problem, because before we visit any node, all it's predecessors has already been visited and properly ranked. This is ensured by the topological sort.
(Note : we can do this by just one pass. Instead of marking problematic nodes, we can maintain a parent variable for all nodes, and delete edge with the old parent whenever case 2 occurs. case 1 should be obvious)

Related

Graph theory: best algorithm to find combination of edges “directions”, where each node has at most one edge directed to it

I’m dealing with a graph where there are a certain number of nodes, and there are predefined connections between them which don’t have “directions” yet.
Problem is to give all the edges a direction (ex. If there’s a connection between A And B, give this edge the A->B direction, or B->A), in a way that no node is at the receiving end of more than one edge.
Examples:
For this model (A-B-C), A->B->C works, but A->B<-C does not work, as B is at the receiving end of more than one connection. Although A<-B->C works, as B is on the giving end of both of its connections.
I’ve tried loop detection, but the fact that these nodes can be arbitrarily connected to one another, there can be numerous loops which may or may not be directly attached to each other, I could not find a solution to make use of the information.
Number of nodes can be north of thousands, and connections can be many hundreds in my case. This also rules out brute force.
It is not guaranteed that there will be a definite solution, the aim of the algorithm is to find a combination where there’s the least number of connections causing nodes to have more than one edge pointing to them.
Not a complete algorithm, but given your description of the problem in the comments, I feel like these steps will probably bring the problem back into the brute-forcible range.
First, you should "trim" your graph. Any nodes of degree one should be pruned, with their connected edge being directed at the pruned node. Since no other edge can point to that node, we know that this choice is optimal. Rinse and repeat until all nodes remaining have two or more edges.
Next, as you mentioned, you should exclude any isolated nodes. You can actually extend this up to connected components of size <= 3. This is because for up to three nodes, your number of edges cannot exceed the number of nodes, so you can randomly assign one edge, and the rest will fall into place.
Now, what will remain are a bunch of large, highly-connected, connected components. You could actually do one more check and see if any of these form a single cycle (all nodes degree two) and then assign one edge randomly, but this is probably a fairly rare case. You'll probably just want to start brute forcing each of these independently. It'd probably be best to start from the nodes with the smallest number of edges first, updating the degree of nodes as you assign edges (and also pruning any degree one edges as before), backtracking as necessary.
This is a continuation of the answer by Dillon Davis.
After tree-like branches are removed, and simple cycles are resolved, the remaining graph has nodes of degree 2 or more. I propose that (for the purposes of analyzing the graph) all of the nodes of degree 2 can be removed.
Allow me to explain by example. In this example, when a node is represented by a number, that number is the degree of the node. When a node is represented by a letter, that node has degree 2. So the graph
3 - A - B - C - 4
represents a node of degree 3, connected to a chain of nodes of degree 2, connected to a node of degree 4.
The two ideal choices for this section of the graph are
3 -> A -> B -> C -> 4
3 <- A <- B <- C <- 4
These are ideal in the sense that each lettered node has exactly one incoming edge. I propose that these aren't just ideal choices, they are the only choices. Consider the first ideal solution
3 -> A -> B -> C -> 4
If node 4 has too many incoming edges, we can reduce its count by reversing the edge to C, giving
3 -> A -> B -> C <- 4
But that hasn't improved the situation, it trades "too many edges into 4" with "too many edges into C". Subsequently reversing the edge between C and B resolves C, but breaks B. Keep reversing along the chain and eventually the connection between A and 3 is reversed, and we've arrived at the second ideal solution.
Which leads me to conclude that (for the purposes of analysis)
3 - A - B - C - 4
is equivalent to
3 - 4
So how is this useful in simplifying the problem. Consider the following graph:
When nodes A and B are removed, the remaining edge connects the top node 3 to itself, so that edge can be removed. Likewise for C and D. Which leaves a graph with a single edge. Choose either direction for that edge. Then complete the solution by choosing a direction for the simple cycle A-B-3, and independently choose a direction for the simple cycle C-D-3.
Here's another example:
In this case, removing A and B creates redundant edges between the remaining nodes. After removing the redundant edges, choose either direction for the edge. The direction of that edge determines the direction of the cycle 3-A-3, and cycle 3-B-3.
I wasn't sure about adding another answer, but the answer by user3386109 gave me insight into what I believe is the complete solution, and I felt that it differs too drastically from the spirit of my original answer to include as an edit.
To recap, we have a few tools under out belt:
We can prune nodes with a single edge optimally, repeating the process to completion
We can assign a direction to any edge in a simple cycle (connected components with only nodes of degree 2) and the rest will follow (optimally).
Nodes with two edges in more complex cycles can be temporarily ignored, as their edge directions will be assigned by higher degree nodes.
After reading the last point, the problem itself becomes a bit more clear. Once we have pruned the degree one nodes in bullet one, all remaining nodes have at least two edges. We can say for certain in the optimal graph that each of these nodes will have at least one directional edge pointing to them. As proof, since each node has at least two edges, but the connected component is not a simple cycle (else it would be eliminated in bullet 2), we have more edges than nodes. If any node has zero edges directed towards it, one of those edges could be reversed to reduce the number of conflicting edges, or to "free up" another node to have zero inward edges, to then do the same.
Armed with this knowledge, we know that the minimal number of conflicts (extra edges directed at nodes that already have an edge directed at them) equals the number of edges minus the number of vertices in our pruned graph. We can also conclude that as long as we manage to direct at least one edge to each node, we'll have an optimal graph, regardless of how we scatter the conflicting edges.
Originally I tried to draft an algorithm based on bullet three to accomplish this assignment, but it turns out the answer is actually a lot simpler than that even. The only way we can accidentally create a node with no edges directed away from it is by actively directing all edges away from that node. The solution is to pick a single edge in the connected component, and assign it a direction at random. Then, do a search (DFS, BFS, anything) outward from the node its directed at, assigning directions to the edges as you go, in the direction you that traverse them. Any node you reach will have an edge directed at it (the edge you took to reach it), and the root node has the edge you manually assigned to it.
In the end, this will produce a graph with the minimal number of extra edges directed at nodes. If you instead wish to minimize the number of nodes containing conflicting edges, solve the problem as stated above, and then form a subgraph of the nodes of degree three or more and their connecting edges. Solve for the minimal vertex cover of this subgraph, and then reverse the direction of the edges connecting nodes not in the minimal vertex cover yet containing conflicting edges, with those of the corresponding node in the minimal vertex cover.

Given a source node, dest node, and intermediate nodes, how does one detect if the shortest Manhattan Distance is blocked?

Here is the full title I would have posted, but it happens to be too long:
Given a source node, dest node, and intermediate nodes, how does one detect if the shortest Manhattan Distance is blocked by the intermediate nodes?
I've drawn a diagram to make it more clear. On the left side, "u" is the source node and "v" is the destination node. The nodes labeled 1 through 6 are the intermediate nodes. The shortest Manhattan Distance from u -> v would be 12, but the intermediate nodes form a wall blocking it. The diagram on the right, with u' being the source, and v' being the destination, shows that the intermediate nodes 1 through 5 do not block the shortest Manhattan distance from u' to v'.
I'm trying to find an algorithm that won't require me to actually do a graph search (e.g. BFS), because the distance between u and v could potentially be very large.
If all you want to do is detect whether the shortest path (one consisting of moves that monotonically take you in the right direction) is blocked, then you are trying to check whether the blocking nodes cut the rectangle whose corners are given by the source and destination node into two different regions that are disconnected. If no shortest path from the source to the destination is possible, then every path must have some point in it that's blocked.
Let's suppose for simplicity that your start point is below and to the left of the destination point. In that case, find, in O(n), all of the other points that are obstacle points contained in the bounding box holding the start and end point. You now want to see if there is some subset of those nodes that cuts the rectangle into two pieces, one containing the bottom-left corner and one containing the top-right corner. This is possible iff there is a path of the blocking nodes from the left side to the right side, from the left side to the bottom side, from the top side to the right side, or from the top side to the bottom side. Thus we just need to check if any of these are possible.
Fortunately, this can be done efficiently by modeling the problem as a graph search in a graph that has size O(n), where n is the number of blocking points, and has nothing to do with the size of the bounding box. That is, no matter how far apart the test points are, the size of the graph to search depends solely on the number of blocking points.
The graph you want to construct has two parts. First, build a graph where each blocking point is connected to each other blocking point in the 3x3 square surrounding it. These edges link together blocking points that could be part of the same barrier, in that no path from the source to the target could pass between two blocking points joined by an edge. Now, add in four new nodes representing the top wall, left wall, right wall, and bottom wall and connect them to each node that is adjacent to the appropriate wall. That way, for example, a path from the left wall node to the right wall node would represent a series of blocking nodes that make it impossible to get from the bottom-left node to the top-right node.
This graph has size O(n), where n is the number of blocking nodes, since there are O(n) nodes and each node can have at most 12 edges - one for each of the 8 neighbors and potentially one for each of the four walls. You could construct it in at worst quadratic time by scanning over each node and, for each other node, seeing if they are adjacent. There is probably a better way to do this, but nothing comes to me at the moment.
Now that you have the graph, for each of the pairs of walls that, if connected, would disconnect the graph, run a graph search in this graph between those two wall nodes. If a path exists, report that the shortest path is blocked. If not, report that some shortest path is unblocked. These searches could be done with a simple DFS, or since you're running multiple searches and just want to know if they're connected, using a strongly connected components algorithm once and checking if any pair of important nodes are in the same SCC. Either approach takes time O(n).
Thus the time to solve this problem is at most O(n2), with the bottleneck being the time required to construct the graph.
Hope this helps!
Here's my idea:
I'll describe the case when the destination is upper and to right from the source, for other cases, rotate. (For simple cases where the nodes have the same x/y coordinate, just checks whether there's a blocking node directly between them)
Take the matrix with source and destination in corners. Now, a column at a time, from left to right and inside the column, bottom up, mark blocked nodes. A node B is blocked iff any of following is true:
B is an intermediate node
the nodes left to B and bottom from B are both blocked (both were already checked given the order of processing) or outside the bounds of the matrix
In the end, if destination is blocked, there's no free shortest path.
The time required is O(m*n), where m, n are the lengths of sides of the matrix. So when you'll only have several intermediate nodes, templatetypedef's solution may be more appropriate.
EDIT: Got it a little wrong previously, now I hope I didn't miss anything

Finding Center of The Tree

I have a question which is part of my program.
For a tree T=(V,E) we need to find the node v in the tree that minimize the length of the longest path from v to any other node.
so how we find the center of the tree? Is there can be only one center or more?
If anyone can give me good algorithm for this so i can get the idea on how i can fit into my program.
There are two approaches to do this (both runs in the same time):
using BFS (which I will describe here)
using FIFO queue.
Select any vertex v1 on your tree. Run BFS from this vertex. The last vertex (v2) you will proceed will be the furthest vertex from v1. Now run another BFS, this time from vertex v2 and get the last vertex v3.
The path from v2 to v3 is the diameter of the tree and your center lies somewhere on it. More precisely it lies in the middle of it. If the path has 2n + 1 points, there will be only 1 center (in the position n + 1). If the path has 2n points, there will be two centers at the positions n and n + 1.
You only use 2 BFS calls which runs in 2 * O(V) time.
Consider a tree with two nodes? Which is the center? Either one will suffice, ergo a tree can have more than one center.
Now, think about what it means to be the center. If all of the branches are the same height, the center is the root (all paths go through the root). If the branches are of different heights then the center must be either the root or in the branch with the greatest height otherwise the maximum path is greater than the height of the tallest branch and the root would be a better choice. Now, how far down the tallest branch do we need to look? Half the difference in height between the tallest branch (from the root) and the next tallest branch (if the difference is at most 1 the root will suffice). Why, because as we go down the tallest branch by one level we are lengthening the path to the deepest node of the next tallest branch by one and reducing the distance to the deepest node in the current branch by one. Eventually, they will be equal when you've traversed half the difference in the depth. Now as we go down the tallest branch, we simply need to choose at each level the tallest sub-branch. If more than one has the same height, we simply choose one arbitrarily.
Essentially, what you are doing is finding the longest path in the graph, which is the distance between the tallest branch of the tree and the next tallest branch, then finding the middle node on that branch. Because there may be multiple paths of the same length as the longest path and the length of the longest path may be even, you have multiple possible centers.
Rather than do this homework problem for you, I'm going to ask you through the thought process that gets the answer...
1) What would you do with the graph a-b-c (three vertices, two edges, and definitely acyclic)? Imagine for a moment that you have to put some labels on some of the vertices, you know you're going to get the minimum of the longest path on the "center" vertex. (b, with eventual label "1") But doing that in one step requires psychic powers. So ask yourself what b is 1 step away from. If the longest path to b is 1, and we've just stepped one step backwards along that path, what's the length of our path so far? (longest path = 1, -1 for the back one step. Aha: 0). So that must be the label for the leaves.
2) What does this suggest as a first cut for an algorithm? Mark the leaves "0", mark their upstreams "1", mark their upstreams "2" and so on. Marching in from the leaves and counting the distance as we go...
3) Umm... We have a problem with the graph a-b-c-d. (From now on, a labelled vertex will be replaced with its label.) Labeling the leaves "0" gives 0-b-c-0... We can't get two centers... Heck, what do we do in the simpler condition 0-b-1? We want to label b with both "1" and "2"... Handle those in reverse order...
In 0-b-1, if we extend the path from b's left by one, we get a path of length 1. If we extend the path from b's right, we get 2. We want to track "the length of the longest path from v to any other node", so we want to keep track of the longest path to b. That means we mark b with a "2".
0-b-1 -> 0-2-1
In 0-b-c-0, the computer doesn't actually simultaneously update b and c. It updates one of them, giving either 0-1-c-0 or 0-b-1-0, and the next update gives 0-1-2-0 or 0-2-1-0. Both b and c are "center"s of this graph since each of them meets the demand "the node v in the tree that minimize the length of the longest path from v to any other node." (That length is 2.)
This leads to another observation: The result of the computation is not to label a graph, it's to find the last vertex we label and/or the vertex that ends up with the largest label. (It's possible that we won't find a good way to order labels, so we'll end up needing to find the max at the end. Or maybe we will. Who knows.)
4) So now we have something like: Make two copies of the graph -- the labelled copy and the burn-down copy. The first one will store the labels and it's the one that will have the final answer in it. The burn-down copy will get smaller and smaller as we delete unlabeled vertices from it (to find new labelable vertices). (There are other ways to organize this so that only one copy of the graph is used. When you get to the end of understanding this answer, you should find a way to reduce this waste.) Outline:
label = 0
while the burndown graph is nonempty
collect all the leaves in the burndown-graph into the set X
for each leaf in the set X
if the leaf does not have a label
set the leaf's label (to the current value of label)
delete the leaf from the burn-down graph (these leafs are two copies of the same leaf in the input graph)
label = label+1
find the vertex with the largest label and return it
5) If you actually watch this run, you'll notice several opportunities for short-cutting. Including replacing the search on the last line of the outline with a much quicker method for recognizing the answer.
And now for general strategy tips for algorithm problems:
* Do a few small examples by hand. If you don't understand how to do the small cases, there's no way you can jump straight in and tell the computer how to do the large cases.
* If any of the above steps seemed unmotivated or totally opaque, you will need to study much, much harder to do well in Computer Science. It may be the Computer Science is not for you...

Is there an effient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

Wikipedia: Directed Acyclic Graph
Not sure if leaf node is still proper terminology since it's not really a tree (each node can have multiple children and also multiple parents) and also I'm actually trying to find all the root nodes (which is really just a matter of semantics, if you reverse the direction of all the edges it'd they'd be leaf nodes).
Right now we're just traversing the entire graph (that's reachable from the specified node), but that's turning out to be somewhat expensive, so I'm wondering if there's a better algorithm for doing this. One thing I'm thinking is that we keep track of nodes that have been visited already (while traversing a different path) and don't recheck those.
Are there any other algorithmic optimizations?
We also thought about keeping a list of root nodes that this node is a descendant of, but it seems like maintaining such a list would be fairly expensive as well if we need to check if it changes every time a node is added, moved, or removed.
Edit:
This is more than just finding a single node, but rather finding ALL nodes that are endpoints.
Also there is no master list of nodes. Each node has a list of it's children and it's parents. (Well, that's not completely true, but pulling millions of nodes from the DB ahead of time is prohibitively expensive and would likely cause an OutOfMemory exception)
Edit2:
May or may not change possible solutions, but the graph is bottom-heavy in that there's at most a few dozen root nodes (what I'm trying to find) and some millions (possibly tens or hundreds of millions) leaf nodes (where I'm starting from).
There are a few methods that each may be faster depending on your structure, but in general what youre going to want is a traversal.
A depth first search, goes through each possible route, keeping track of nodes that have already been visited. It's a recursive function, because at each node you have to branch and try each child node of it. There's no faster method if you dont know which way to look for the object you just have to try each way! You definitely need to keep track of where you have already been because it would be wasteful otherwise. It should require on the order of the number of nodes to do a full traversal.
A breadth first search is similar but visits each child of the node before "moving on" and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if it is expected to be all the way down a path, because it forces you to traverse every possible edge.
Youre right about maybe keeping a list of known root nodes, the tradeoff there is that you basically have to do the search whenever you alter the graph. If you are altering the graph rarely this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.
EDIT: Info Update.
It sounds like we are actually looking for a path between two arbitrary nodes, the root/leaf semantic keeps getting switched. The DepthFirstSearch (DFS) starts at one node, and then for each unvisited child, recurse. Break if you find the target node. Due to the way recursion evaluates, this will traverse all the way down the 'left' path, then enumerate nodes at this distance before ever getting to the 'right' path. This is time costly and inefficient if the target node is potentially the first child on the right. BreadthFirst walks in steps, covering all children before moving forward. Because your graph is bottom heavy like a tree, both will be approximately the same execution time.
When the graph is bottom heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have more parents than children, this direction will be much faster. You can also combine the approaches, stepping one up and one down , then comparing lists of nodes, and meeting somewhere in the middle. (this combination might seem the fastest if you ignore that twice as much work is done at each step).
However, since you said that your graph is stored as a list of lists of children, you have no real way of traversing the graph backwards. A node does not know what its parents are. This is a problem. To fix it you have to get a node to know what its parents are by adding that data on graph update, or by creating a duplicate of the whole structure (which you have said is too large). It will need the whole structure to be rewritten, which sounds probably out of the question due to it being a large database at this point.
There's a lot of work to do.
http://en.wikipedia.org/wiki/Graph_(data_structure)
Just color (keep track of) visited nodes.
Sample in Python:
def reachable(nodes, edges, start, end):
color = {}
for n in nodes:
color[n] = False
q = [start]
while q:
n = q.pop()
if color[n]:
continue
color[n] = True
for adj in edges[n]:
q.append(adj)
return color[end]
For a vertex x you want to compute a bit array f(x), each bit corresponds to a root vertex Ri, and 1 (resp 0) means "x can (resp can't) be reached from root vertex Ri.
You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.
Keep U not too big, and precompute f for each vertex x of U.
Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.

Is there a proper algorithm to solve edge-removing problem?

There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If instead of just "lit" or "unlit" you would keep a set of nodes from which a node is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from-node in the set, traverse the edges and remove the node you just "unlit" from their set (and possibly traverse from there too, and so on)
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B >- D
\-> C >-/
Then at node A you would store that it was a source (and thus lit by itself), and in both B and C you would store they were lit by A, and in D you would store that it was lit by both A and C.
Then say we remove the edge from B to D: In D we remove B from the lit-source-list, but it remains lit as it is still lit by A. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originated at C, and remove C from D's set which is now also unlit. In this case we are done, but if the set was bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question. Do again the DFS. You can the nodes which still remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs) which take O(n + m) times and space (where n = number of nodes, m = number of edges), which dominate the complexity. You need at least o(n + m) time and space to read the input, therefore the algorithm is optimal.
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
not connected is not a problem, since nodes in unreachable connected components will not be reached during the search
there is a smart way to do the DFS or BFS from all nodes at once (I will describe BFS). You just have to put them all at the beginning on the stack/queue.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes]
while (q not empty)
{
x = q.pop()
forall (y neighbour of x) {
if (y was not visited) {
visited[y] = true
q.push(y)
}
}
}
Replace Queue with a Stack and you get a sort of DFS.
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the remove edges.
EDIT: Extend this description a bit
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information of connected source nodes on the edges while building the graph.(such as if edge has connectivity to the sources S1 and S2, its source list contains S1 and S2 ) And create the Nodes with the information of input edges and output edges. When an edge is removed, update the output edges of the target node of that edge by considering the input edges of the node. And traverse thru all the target nodes of the updated edges by using DFS or BFS. (In case of a cycle graph, consider marking). While updating the graph, it is also possible to find nodes without any edge that has source connection (lit->unlit nodes). However, it might not be a good solution, if you'd like to remove multiple edges at the same time since that may cause to traverse over same edges again and again.

Resources