Identify Redundant Dependence in Graph - algorithm

I have a directed acyclic graph (DAG) where each node stands for a task and each directed edge A -> B means task A must be done before task B starts.
A simple illustration could be like this:
So this is actually a workflow. In this graph, edge A -> B is considered redundant because task B needs task C done first, and task C needs task A done first (not to mention another path, A -> D -> E -> B, which also makes A -> B unnecessary).
The problem is: I want to identify (say, just output) all the redundant dependencies (edges) in the graph. My friend and I came up with this idea: iterate through all edges in the graph and, for each edge X -> Y, remove it and check whether Y is still reachable from X (for example, with DFS/BFS). If a path still exists (other than the removed edge), then edge X -> Y is redundant and can be permanently removed; otherwise, put it back. The worst-case complexity is O(n^2), where n is the number of edges in the graph, since each DFS/BFS may pass through nearly all edges.
Is there a way to optimize this?

Have you heard of transitive reduction? From Wikipedia:
A transitive reduction of a directed graph is a graph with as few edges as possible that has the same reachability relation as the given graph. Equivalently, the given graph and its transitive reduction should have the same transitive closure as each other, and its transitive reduction should have as few edges as possible among all graphs with this property. Transitive reductions were introduced by Aho, Garey & Ullman (1972), who provided tight bounds on the computational complexity of constructing them.
You can get details from Transitive Reduction. For a directed acyclic graph with n vertices and m edges, the transitive reduction can be found in O(nm) time.
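One possible way to reach the O(nm) bound on a DAG is sketched below. It relies on the observation that an edge u -> v is redundant exactly when v is reachable from some other child of u. The function name `redundant_edges` and the adjacency-dict input format are assumptions made for this illustration, not something taken from the Wikipedia article.

```python
def redundant_edges(graph):
    """Return the edges of a DAG that are implied by a longer path.

    graph: dict mapping each node to a list of its direct successors.
    One search per node, each touching every edge at most once: O(n*m).
    """
    redundant = []
    for u, children in graph.items():
        # Collect everything reachable from u via a path of length >= 2,
        # i.e. the strict descendants of u's children.
        seen = set()
        stack = [g for c in children for g in graph.get(c, ())]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        # A direct edge u -> v is redundant if v is also reached indirectly.
        redundant.extend((u, v) for v in children if v in seen)
    return redundant


# The workflow from the question: A -> B is implied by A -> C -> B.
workflow = {"A": ["B", "C", "D"], "C": ["B"], "D": ["E"], "E": ["B"], "B": []}
print(redundant_edges(workflow))
```

On the example workflow this reports only the edge A -> B. The sketch assumes the input really is acyclic; on a graph with cycles the "strict descendant" argument no longer holds.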

Topological sorting using DFS with a stack yields a result in linear time. Start with a vertex, mark it visited, then recursively topologically sort all of its unvisited adjacent vertices; once all of them are explored, push the vertex onto the stack.
Then simply pop and print from the stack; this produces the ordering in linear time. For more, refer to the algorithm explained at the following link.
Topological sorting
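A minimal sketch of the DFS-with-a-stack procedure described above, assuming the graph is given as a dict of successor lists (the name `topological_order` is made up for this example). Note that it only orders the tasks; it does not by itself identify redundant edges.

```python
def topological_order(graph):
    """DFS-based topological sort of a DAG given as {node: [successors]}."""
    visited = set()
    stack = []

    def visit(node):
        visited.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visit(nxt)
        # All successors are finished, so the node can be pushed.
        stack.append(node)

    for node in graph:
        if node not in visited:
            visit(node)
    # Popping the stack yields the nodes in topological order.
    return stack[::-1]


tasks = {"A": ["B", "C", "D"], "C": ["B"], "D": ["E"], "E": ["B"], "B": []}
order = topological_order(tasks)
print(order)
```

Being recursive, this version can hit Python's recursion limit on very deep graphs; an explicit stack would avoid that.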

Related

Path double cover, recursion set up

I'm working on the path double cover problem. I have an undirected connected graph G, and I replace every edge with 2 directed edges pointing in opposite directions. The goal is then to find a set of paths (no loops) in this directed graph such that every vertex is used once as the start of a path and once as the end of another path. Each directed edge is used exactly once.
undirected graph G
directed graph G
For this example there is set of paths P={(1,2,4),(4,3,1),(2,1,3),(3,4,2)}.
Two graphs are currently known that cannot be covered in this way: K3 and K5 (the complete graphs on 3 and 5 vertices).
I want to write a script that finds such a covering or tells me that there isn't one. I tried generating all possible paths and then searching among them, but for bigger graphs this approach isn't usable (n! complexity). I don't know how to set up the recursion so that I can keep track of what I've already used. I don't care much about time complexity, but it would be awesome if you had any tips for doing it more quickly. :D
Thanks for any suggestions. :D
Your definition is a bit confusing: you say that you need to find a set of paths (no loops) in the directed graph, with 1 outgoing edge per vertex. There is no way for these edges not to form a loop (at most n - 1 edges can be tree edges).
I'm going to assume that you instead mean "only one cycle; no subcycles".
In that case, your task becomes that of determining whether your graph has a Hamiltonian Cycle or not.
We can use Ore's Theorem as a quick check:
If deg v + deg w ≥ n for every pair of distinct non-adjacent vertices v and w of G then G is Hamiltonian.
Note that this says "if" and not "iff" / "if and only if", so a graph can be Hamiltonian and still not satisfy this check.
To take things one step further, we can use the Bondy–Chvátal theorem:
A graph is Hamiltonian if and only if its closure is Hamiltonian.
We obtain the closure with a method similar to the Ore's Theorem check: we repeatedly add a new edge connecting a nonadjacent pair of vertices u and v with deg(v) + deg(u) ≥ n until no more pairs with this property can be found.
Once this is done, we check whether the closure is Hamiltonian. If the closure is a complete graph, then it is Hamiltonian. I was unable to find any proof that the closure will be complete iff the graph G is Hamiltonian; however, it does seem to happen with every example graph I can conjure up, so at least it may be a stronger correlation than Ore's Theorem.
In the end, you just need to determine whether the graph has a Hamiltonian cycle. I've listed above two quadratic-time checks that positively identify some such graphs (maybe all; again, I'm not sure about the completeness of the closure).
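The closure construction described above can be sketched as follows. The function names `bondy_chvatal_closure` and `is_complete` and the dict-of-sets graph representation are assumptions for this illustration only.

```python
from itertools import combinations


def bondy_chvatal_closure(adj):
    """Repeatedly join nonadjacent u, v with deg(u) + deg(v) >= n.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    Returns a new dict; the input is left unchanged.
    """
    closure = {v: set(nbrs) for v, nbrs in adj.items()}
    n = len(closure)
    changed = True
    while changed:
        changed = False
        for u, v in combinations(list(closure), 2):
            if v not in closure[u] and len(closure[u]) + len(closure[v]) >= n:
                closure[u].add(v)
                closure[v].add(u)
                changed = True
    return closure


def is_complete(adj):
    """True if every vertex is adjacent to all other vertices."""
    return all(len(nbrs) == len(adj) - 1 for nbrs in adj.values())


# K4 with the edge 0-1 removed: the closure adds it back.
almost_k4 = {0: {2, 3}, 1: {2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
# A 5-cycle: every degree is 2, so no pair reaches the threshold of 5.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print(is_complete(bondy_chvatal_closure(almost_k4)))
print(is_complete(bondy_chvatal_closure(cycle5)))
```

The 5-cycle is worth noting: it is Hamiltonian, yet its closure is the cycle itself and not a complete graph. So a complete closure (on at least 3 vertices) is a sufficient test, not an equivalence.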

Cycle detection that handles a series of directed edges? [duplicate]

I came upon wait-for graphs and I wonder, are there any efficient algorithms for detecting if adding an edge to a directed graph results in a cycle?
The graphs in question are mutable (they can have nodes and edges added or removed). And we're not interested in actually knowing an offending cycle, just knowing there is one is enough (to prevent adding an offending edge).
Of course it'd be possible to use an algorithm for computing strongly connected components (such as Tarjan's) to check if the new graph is acyclic or not, but running it again every time an edge is added seems quite inefficient.
If I understood your question correctly, then a new edge (u,v) is only inserted if there was no path from v to u before (i.e., if (u,v) does not create a cycle). Thus, your graph is always a DAG (directed acyclic graph). Using Tarjan's Algorithm to detect strongly connected components (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) sounds like an overkill in this case. Before inserting (u,v), all you have to check is whether there is a directed path from v to u, which can be done with a simple BFS/DFS.
So the simplest way of doing it is the following (n = |V|, m = |E|):
Inserting (u,v): Check whether there is a path from v to u (BFS/DFS). Time complexity: O(m)
Deleting edges: Simply remove them from the graph. Time complexity: O(1)
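A minimal sketch of this simple scheme, assuming an adjacency-set representation. The class name `Dag` and its method names are hypothetical, chosen only for the example.

```python
class Dag:
    """Directed graph that refuses any edge that would close a cycle."""

    def __init__(self):
        self.succ = {}  # node -> set of direct successors

    def _reachable(self, src, dst):
        # Iterative DFS from src; only nodes reachable from src are visited.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in self.succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def add_edge(self, u, v):
        """Insert (u, v) unless a path v -> ... -> u exists. O(m) worst case."""
        if self._reachable(v, u):
            return False
        self.succ.setdefault(u, set()).add(v)
        self.succ.setdefault(v, set())
        return True

    def remove_edge(self, u, v):
        """Deleting an edge can never create a cycle. O(1) expected."""
        self.succ.get(u, set()).discard(v)


g = Dag()
print(g.add_edge(1, 2), g.add_edge(2, 3), g.add_edge(3, 1))
```

The third insertion is refused because 1 already reaches 3. Self-loops are refused as well, since a node trivially reaches itself.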
Although inserting (u,v) takes O(m) time in the worst case, it is probably pretty fast in your situation. When doing the BFS/DFS starting from v to check whether u is reachable, you only visit vertices that are reachable from v. I would guess that in your setting the graph is pretty sparse and that the number of vertices reachable by another is not that high.
However, if you want to improve the theoretical running time, here are some hints (mostly showing that this will not be very easy). Assume we aim for testing in O(1) time whether there exists a directed path from v to u. The keyword in this context is the transitive closure of a DAG (i.e., a graph that contains an edge (u, v) if and only if there is a directed path from u to v in the DAG). Unfortunately, maintaining the transitive closure in a dynamic setting seems to be not that simple. There are several papers considering this problem and all papers I found were STOC or FOCS papers, which indicates that they are very involved. The newest (and fastest) result I found is in the paper Dynamic Transitive Closure via Dynamic Matrix Inverse by Sankowski (http://dl.acm.org/citation.cfm?id=1033207).
Even if you are willing to understand one of those dynamic transitive closure algorithms (or even want to implement it), they will not give you any speed-up, for the following reason. These algorithms are designed for situations where you have a lot of connectivity queries (which can then be performed in O(1) time) and only a few changes to the graph. The goal then is to make these changes cheaper than recomputing the transitive closure. However, such an update is still slower than a single check for connectivity. Thus, if you need to do an update on every connectivity query, it is better to use the simple approach mentioned above.
So why do I mention this approach of maintaining the transitive closure if it does not fit your needs? Well, it shows that searching for an algorithm with only O(1) query time probably does not lead you to a solution faster than the simple one using BFS/DFS. What you could try is to get a query time that is faster than O(m) but worse than O(1), while updates are also faster than O(m). This is a very interesting problem, but it sounds to me like a very ambitious goal (so maybe do not spend too much time on trying to achieve it..).
As Mark suggested, it is possible to use a data structure that stores which nodes are connected. A |V| x |V| boolean matrix works best. Its values can be initialized with the Floyd–Warshall algorithm in O(|V|^3).
Let T(i) be the set of vertices that have a path to vertex i, and F(j) the set of vertices reachable by a path from vertex j. With matrix element (a,b) meaning "there is a path from a to b", T(i) consists of the true entries in the i-th column and F(j) of the true entries in the j-th row.
Adding an edge (i,j) is a simple operation. If i and j weren't connected before, then for each a in T(i) and each b in F(j), set matrix element (a,b) to true. But the operation isn't cheap: in the worst case it is O(|V|^2). That happens for a directed line, where adding an edge from the end vertex to the start vertex makes every vertex connected to every other vertex.
Removing an edge (i,j) is not so simple, but it is no more expensive in the worst case :-) If there is still a path from i to j after removing the edge, then nothing changes. That is checked with Dijkstra (a plain BFS/DFS also suffices), in less than O(|V|^2). The vertex pairs (a,b) that are no longer connected are:
a in T(i) - i - T(j),
b in F(j) + j
Only T(j) is changed by removing edge (i,j), so it has to be recalculated. That is done by any kind of graph traversal (BFS, DFS), following edges in the opposite direction from vertex j, in less than O(|V|^2). Since setting the matrix elements is again O(|V|^2) in the worst case, this operation has the same worst-case complexity as adding an edge.
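The edge-insertion update on the boolean matrix might look like the sketch below. It assumes `reach[a][b]` means "there is a path from a to b", treats i and j themselves as members of T(i) and F(j) so that the pair (i,j) is also set, and, to match the original question, refuses an edge that would close a cycle. The function name `add_edge` is hypothetical.

```python
def add_edge(reach, i, j):
    """Insert edge (i, j) into a reachability matrix, or refuse a cycle.

    reach: square list of lists of bool, reach[a][b] == path from a to b.
    Worst case O(|V|^2) for the update, O(1) for the cycle test.
    """
    n = len(reach)
    if i == j or reach[j][i]:
        return False  # j already reaches i, so (i, j) would close a cycle
    # T(i) plus i itself: everything that can reach i (column i).
    sources = [a for a in range(n) if a == i or reach[a][i]]
    # F(j) plus j itself: everything reachable from j (row j).
    targets = [b for b in range(n) if b == j or reach[j][b]]
    for a in sources:
        for b in targets:
            reach[a][b] = True
    return True


reach = [[False] * 3 for _ in range(3)]
print(add_edge(reach, 0, 1), add_edge(reach, 1, 2), add_edge(reach, 2, 0))
```

After the first two insertions the matrix already records that 0 reaches 2, which is why the third edge is rejected without any traversal.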
This is a problem which I recently faced in a slightly different situation (optimal ordering of interdependent compiler instructions).
While I can't improve on O(n*n) theoretical bounds, after a fair amount of experimentation and assuming heuristics for my case (for example, assuming that the initial ordering wasn't created maliciously) the following was the best compromise algorithm in terms of performance.
(In my case I had an acceptable "right side failure": after the initial nodes and arcs were added (which was guaranteed to be possible), it was acceptable for the optimiser to occasionally reject the addition of further arcs where one could actually be added. This approximation isn't necessary for this algorithm when carried to completion, but it does admit such an approximation if you wish to do so, and so limiting its runtime further).
While a graph is topologically sorted, it is guaranteed to be cycle-free. In the first phase when I had a static bulk of nodes and arcs to add, I added the nodes and then topologically sorted them.
During the second phase, adding additional arcs, there are two situations when considering an arc from A to B. If A already lies to the left of B in the sort, an arc can simply be added and no cycle can be generated, as the list is still topologically sorted.
If B is to the left of A, we consider the sub-sequence between B and A and partition it into two disjoint sequences X, Y, where X is those nodes which can reach A (and Y the others). If A is not reachable from B, i.e. there are no direct arcs from B into X or to A, then the sequence can be reordered XABY before adding the A to B arc, showing it is still cycle-free and maintaining the topological sort. The efficiency over the naive algorithm here is that we only need to consider the subsequence between B and A, as our list is topologically sorted: A is not reachable from any node to the right of A. For my situation, where localised reorderings are the most frequent and important, this is an important gain.
As we don't reorder within the sequences X,A,B,Y, clearly any arcs which start or end within the same sequence are still ordered correctly, and the same in each flank, and any "fly-over" arcs from the left to the right flanks. Any arcs between the flanks and X,A,B,Y are also still ordered correctly as our reordering is restricted to this local region. So we only need to consider arcs between our four sequences. Consider each possible "problematic" arc for our final ordering XABY in turn: YB YA YX BA BX AX. Our initial order was B[XY]A, so AX and YB cannot occur. X reaches A, but Y does not, therefore YX and YA do not occur or A could be reached from the source of the arc in Y (potentially via X) a contradiction. Our criterion for acceptability was that there are no links BX or BA. So there are no problematic arcs, and we are still topologically sorted.
Conversely, if our only acceptability criterion fails (that is, A is reachable from B), adding the arc A->B clearly creates a cycle: B -(X)-> A -> B. So the criterion is necessary as well as sufficient.
This can be implemented reasonably efficiently if we can add a flag to each node. Consider the nodes [BXY] going right-to-left from the node immediately to the left of A. If that node has a direct arc to A then set the flag. At an arbitrary such node, we need only consider direct outgoing arcs: the nodes to its right are either after A (and so irrelevant), or else have already been flagged if they can reach A, so the flag on such an arbitrary node is set when it has a direct link to any flagged node. If B is not flagged at the end of the process, the reordering is acceptable and the flagged nodes comprise X.
Though this always yields a correct ordering if carried to completion (as far as I can tell), as I mentioned in the introduction it is particularly efficient if your initial build is approximately correct (in the sense of accommodating of likely additional arcs without reordering).
There also exists an effective approximation, if your context is such that "outrageous" arcs can be rejected (those which would massively reorder) by limiting the A to B distance you are prepared to scan. If you have an initial list of the additional arcs you wish to add, they can be ordered by increasing distance in the initial ordering until you run out of some scanning "credit", and call your optimisation a day at that point.
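A rough sketch of the flag-and-reorder step described in this answer, carried to completion (without the scanning "credit" approximation). The class name `OrderedDag` and its attributes are made up for the example, and positions are recomputed on every call for brevity; a real implementation would maintain them incrementally.

```python
class OrderedDag:
    """Keeps a topological order and repairs it locally when arcs are added."""

    def __init__(self, nodes):
        self.order = list(nodes)               # current topological order
        self.succ = {n: set() for n in self.order}

    def add_arc(self, a, b):
        if a == b:
            return False
        pos = {n: i for i, n in enumerate(self.order)}
        ia, ib = pos[a], pos[b]
        if ia < ib:
            self.succ[a].add(b)                # order already consistent
            return True
        # Scan B..A right-to-left, flagging nodes that can reach A.
        segment = self.order[ib:ia]            # B, then the X and Y nodes
        flagged = {a}
        for node in reversed(segment):
            if self.succ[node] & flagged:
                flagged.add(node)
        if b in flagged:
            return False                       # B reaches A: arc closes a cycle
        x = [n for n in segment if n in flagged]
        y = [n for n in segment if n not in flagged]  # starts with B
        self.order[ib:ia + 1] = x + [a] + y    # reorder to X A B Y
        self.succ[a].add(b)
        return True


d = OrderedDag([1, 2, 3, 4])
d.add_arc(1, 2)
d.add_arc(3, 4)
d.add_arc(4, 1)
print(d.order)
```

Here the arc 4 -> 1 forces a local reorder: node 3 is the only one between 1 and 4 that reaches 4, so it forms X and the order becomes X A B Y. A later attempt to add 2 -> 3 is rejected because 3 already reaches 2.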
If the graph is directed, you only have to check the parent nodes (navigating up until you reach the root) of the node where the new edge should start. If one of the parent nodes is equal to the end of the edge, adding the edge would create a cycle.
If all previous jobs are in topologically sorted order, then adding an edge that appears to break the sort, and cannot be fixed, means you have a cycle.
https://stackoverflow.com/a/261621/831850
So if we have a sorted list of nodes:
1, 2, 3, ..., x, ..., z, ...
such that each node is waiting for nodes to its left.
Say we want to add an edge x->z. That appears to break the sort. We can move the node at x to position z+1, which fixes the sort iff none of the nodes in (x, z] have an edge to the node at x.

Find all critical edges of an MST

I have this question from Robert Sedgewick's book on algorithms.
Critical edges. An MST edge whose deletion from the graph would cause the
MST weight to increase is called a critical edge. Show how to find all critical edges in a
graph in time proportional to E log E. Note: This question assumes that edge weights
are not necessarily distinct (otherwise all edges in the MST are critical).
Please suggest an algorithm that solves this problem.
One approach I can think of does the job in time proportional to E·V.
My approach is to run Kruskal's algorithm. Whenever we encounter an edge whose insertion into the MST would create a cycle, and that cycle already contains an edge with the same weight, then the edge already inserted is not a critical edge (otherwise, all other MST edges are critical edges).
Is this algorithm correct? How can I extend it to do the job in time proportional to E log E?
The condition you suggest for when an edge is critical is correct I think. But it's not necessary to actually find a cycle and test each of its edges.
The Kruskal algorithm adds edges in increasing weight order, so the sequence of edge additions can be broken into blocks of equal-weight edge additions. Within each equal-weight block, if there is more than one edge that joins the same two components, then all of these edges are non-critical, because any one of the other edges could be chosen instead. (I say they are all non-critical because we are not actually given a specific MST as part of the input -- if we were then this would identify a particular edge to call non-critical. The edge that Kruskal actually chooses is just an artefact of initial edge ordering or how sorting was implemented.)
But this is not quite sufficient: it might be that after adding all edges of weight 4 or less to the MST, we find that there are 3 weight-5 edges, connecting component pairs (1, 2), (2, 3) and (1, 3). Although no component pair is joined by more than 1 of these 3 edges, we only need (any) 2 of them -- using all 3 would create a cycle.
For each equal-weight block, having weight say w, what we actually need to do is (conceptually) create a new graph in which each component of the MST so far (i.e. using edges having weight < w) is a vertex, and there is an edge between 2 vertices whenever there is a weight-w edge between these components. (This may result in multi-edges.) We then run DFS on each component of this graph to find any cycles, and mark every edge belonging to such a cycle as non-critical. DFS takes O(nEdges) time, so the sum of the DFS times for each block (whose sizes sum to E) will be O(E).
Note that Kruskal's algorithm takes time O(E log E), not O(E) as you seem to imply -- although people like Bernard Chazelle have gotten close to linear-time MST construction, TTBOMK no one has got there yet! :)
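The block-by-block procedure above could be sketched like this. For each block of equal-weight edges it builds the contracted multigraph and uses a DFS low-link pass to find the edges that lie on no cycle (bridges); those are reported as critical. The function names, the `(u, v, weight)` edge format, and the choice of low-link bridge finding as the concrete DFS are assumptions of this sketch.

```python
from collections import defaultdict


def bridges(adj):
    """Edge ids lying on no cycle of a multigraph {v: [(nbr, edge_id)]}."""
    disc, low, out = {}, {}, set()

    def dfs(u, parent_edge):
        disc[u] = low[u] = len(disc)
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue                      # skip only the edge we came by
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    out.add(idx)
    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return out


def critical_edges(n, edges):
    """Indices of critical MST edges; edges is a list of (u, v, weight)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = sorted(range(len(edges)), key=lambda i: edges[i][2])
    critical, i = set(), 0
    while i < len(order):
        j = i
        while j < len(order) and edges[order[j]][2] == edges[order[i]][2]:
            j += 1
        block = order[i:j]                    # all edges of one weight
        adj = defaultdict(list)               # contracted multigraph
        for idx in block:
            cu, cv = find(edges[idx][0]), find(edges[idx][1])
            if cu != cv:
                adj[cu].append((cv, idx))
                adj[cv].append((cu, idx))
        critical |= bridges(adj)
        for idx in block:                     # now merge the components
            parent[find(edges[idx][0])] = find(edges[idx][1])
        i = j
    return critical


print(critical_edges(3, [(0, 1, 1), (1, 2, 2), (0, 2, 2)]))
```

In the small example only the weight-1 edge is critical, since either weight-2 edge can stand in for the other. The recursive DFS is fine for a sketch but may need an explicit stack for large graphs.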
Yes, your algorithm is correct. We can prove that by comparing the execution of Kruskal's algorithm to a similar execution where the cost of some MST edge e is changed to infinity. Until the first execution considers e, both executions are identical. After e, the first execution has one fewer connected component than the second. This condition persists until an edge e' is considered that, in the second execution, joins the components that e would have. Since edge e is the only difference between the forests constructed so far, it must belong to the cycle created by e'. After e', the executions make identical decisions, and the difference in the forests is that the first execution has e, and the second, e'.
One way to implement this algorithm is using a dynamic tree, a data structure that represents a labelled forest. One configuration of this ADT supports the following methods in logarithmic time.
MakeVertex() - constructs and returns a fresh vertex.
Link(u, c, v) - vertices u and v must not be connected. Creates an unmarked edge from vertex u to vertex v with cost c.
Mark(u, v) - vertices u and v must be endpoints of an edge e. Marks e.
Connected(u, v) - indicates whether vertices u and v are connected.
FindMax(u, v) - vertices u and v must be connected. Returns the endpoints of an unmarked edge on the unique path from u to v with maximum cost, together with that cost. The endpoints of this edge are given in the order that they appear on the path.
I make no claim that this is a good algorithm in practice. Dynamic trees, like Swiss Army knives, are versatile but complicated and often not the best tool for the job. I encourage you to think about how to take advantage of the fact that we can wait until all of the edges are processed to figure out what the critical edges are.

computing number of nodes which can be reached by a specific node in a directed graph for each node

In a directed graph (suppose it has lots of cycles), I need to compute, for each node, the number of nodes that can be reached from it. How can I do that with minimal effort? Which algorithm do I need to use?
Note: I think a reasonable algorithm for this problem should compute these numbers recursively (e.g. the result for 'node a' depends on that of 'node b' if a is connected to b).
The algorithm you're looking for is called the Floyd-Warshall algorithm, a very nice and efficient dynamic programming algorithm. It can be used to calculate the set of nodes reachable from each individual node in a graph (the transitive closure), although it's more often used to calculate the shortest paths from each individual node in a graph to all other nodes.
(Edit: the Floyd-Warshall algorithm is more complicated than it needs to be for your uses, because it's been extended a bit by Floyd to calculate shortest paths. You may find this page helpful, which only describes the "Warshall" part of the algorithm - the part you need.)
I happen to be studying it right now for class and have the paper on my desk. The recurrence for the transitive closure version of F-W is:
T(i,j,k) = T(i,j,k-1) ∨ (T(i,k,k-1) ∧ T(k,j,k-1))
Where T(a,b,c) is true if and only if there is a path from a to b using only the first c vertices in the graph (you must give them an arbitrary numbering before running the algorithm).
Intuitively, the recurrence says that there's a path from i to j using the first k vertices if:
there's already a path from i to j using only the first k-1 vertices, OR
there's a path from i to k and a path from k to j, each using only the first k-1 vertices.
You can build up the entire 3-dimensional table of T(i,j,k) in the typical dynamic programming fashion, and then count all of the TRUE entries along the source node that you want (using the max k), to get the size of the transitive closure for that source node.
If you're still following my poor explanation, you can make the algorithm extremely efficient with a few tricks:
It turns out that you don't need the k dimension in your table; you can just overwrite your same row of values over and over. Now the program would look like:
T(i,j) = T(i,j) || (T(i,k) && T(k,j))
If T(i,k) is 0 then you can skip the whole thing since nothing will change on that step.
If T(i,k) is 1 then the new value will just be T(i,j) || T(k,j). This can be done in huge chunks because block OR is extremely fast on modern processors.
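Putting those tricks together, here is a minimal sketch that stores each row of T as a Python integer used as a bitset, so that the "block OR" becomes a single `|=`. The function name `reachable_counts` and the edge-list input format are assumptions for this example.

```python
def reachable_counts(n, edges):
    """Warshall's transitive closure with one bitset (int) per row.

    n: number of vertices, numbered 0..n-1.
    edges: iterable of directed (u, v) pairs.
    Returns, for each vertex, how many vertices it can reach.
    """
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v          # bit v of row u: T(u, v) is true
    for k in range(n):
        mask = 1 << k
        for i in range(n):
            if rows[i] & mask:     # T(i, k) is 1 ...
                rows[i] |= rows[k]  # ... so T(i, j) |= T(k, j) for all j
            # when T(i, k) is 0, nothing changes and the row is skipped
    return [bin(row).count("1") for row in rows]


print(reachable_counts(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))
```

In the example, vertices 0, 1 and 2 sit on a cycle, so each of them reaches all four vertices (itself included, via the cycle), while vertex 3 reaches none. A vertex is counted as reaching itself only when it lies on a cycle.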
Hope that helps...

What algorithm can I apply to this DAG?

I have a DAG representing a list of properties. These properties are such that if a>b, then a has a directed edge to b. It is transitive as well, so that if a>b and b>c, then a has a directed edge to c.
However, the directed edge from a to c is superfluous because a has a directed edge to b and b has a directed edge to c. How can I prune all these superfluous edges? I was thinking of using a minimum spanning tree algorithm, but I'm not really sure what the appropriate algorithm is in this situation.
I suppose I could do a depth first search from each node and all its outgoing edges and compare if it can reach certain nodes without using certain edges, but this seems horribly inefficient and slow.
After the algorithm is complete, the output would be a linear list of all the nodes in an order that is consistent with the graph. So if a has three directed edges to b, c, and d, and b and c each have a directed edge to d, the output could be either abcd or acbd.
This is called the transitive reduction problem. Formally speaking, you are looking for a minimal (fewest edges) directed graph, the transitive closure of which is equal to the transitive closure of the input graph. (The diagram on the above Wikipedia link makes it clear.)
Apparently there exists an efficient algorithm for solving this problem that takes the same time as for producing a transitive closure (i.e. the more common inverse problem of adding transitive links instead of removing them), however the link to the 1972 paper by Aho, Garey, and Ullman costs $25 to download, and some quick googling didn't turn up any nice descriptions.
EDIT: Scott Cotton's graphlib contains a Java implementation! This Java library looks to be very well organised.
Actually, after looking around a little more, I think a topological sort is what I'm really after here.
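For the ordering part of the question, a topological sort can be sketched with Kahn's algorithm (repeatedly emit a node with no remaining incoming edges). This is just one common way to do it; the function name `kahn_order` and the dict input are assumptions for the example.

```python
from collections import deque


def kahn_order(graph):
    """Topological order of a DAG given as {node: [successors]}.

    Raises ValueError if the graph contains a cycle.
    """
    indegree = {node: 0 for node in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] = indegree.get(s, 0) + 1
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for s in graph.get(node, ()):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order


# a > b, a > c, a > d, b > d, c > d, as in the question.
props = {"a": ["b", "c", "d"], "b": ["d"], "c": ["d"], "d": []}
print("".join(kahn_order(props)))
```

The superfluous edge a -> d does no harm here: the sort yields a valid order (abcd or acbd, depending on tie-breaking) without pruning anything first.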
Suppose there are already n nodes with directed edges:
1. Starting from any point M, loop over all its child edges, select the biggest child (say N), and remove the other edges; the complexity should be O(n). If no N exists (no child edges), go to step 3.
2. Start from N and repeat step 1.
3. Start from point M, select the smallest parent node (say T), and remove the other edges.
4. Start from T and repeat step 3, and so on.
Actually, this is just an ordering algorithm, and the total complexity should be O(0.5n^2), i.e. O(n^2).
One problem is that if we want to loop over a node's parent nodes, we need more memory to record edges so that we can trace back from child to parent. This can be improved in step 3 by choosing one node from the remaining nodes bigger than M, which means we need to keep a list of the nodes that are left.

Resources