Does order of exploration in DFS affect edge classification

Does order of exploration in DFS affect edge classification - algorithm

I'm implementing DFS and edge classification for college (based on the code provided in this paper: https://courses.csail.mit.edu/6.006/fall11/rec/rec14.pdf).
I've used this graph as a test:
The letters in italic are simply the vertices' names, while the numbers inside the vertices are discovery and finalization time, respectively. Edges are classified as Back, Forward or Cross; all else are tree edges.
As you can see, this graph was visited in the following order: first s, then its neighbors (following DFS); when there were no more accessible neighbors, visitation began on t.
In order to test our algorithm, the teacher will provide a text file with each edge as a line. When performing the DFS, I simply follow the order of appearance in the file for each vertex; in this case, that would be first s and then v, not t. This in turn gives a different edge classification: t to v is considered a cross edge, t to u a back one and u to t a tree one.
Is this behavior normal? Does the order of exploration affect the final edge classification? If so, are both classifications, mine and the example's, correct? If not, is there any way I can know which vertex to explore first?

Is this behavior normal? Does the order of exploration affect the final edge classification?
Absolutely. The order in which the algorithm chooses edges for exploration decides edge classification in cases when there are any back or cross edges (i.e. when the graph is not a tree.)
If so, are both classifications, mine and the example's, correct?
Yes, both classifications are correct, as long as there is no specific order of exploration requested by the problem setter.
If not, is there any way I can know which vertex to explore first?
The problem may specify a particular order of exploration - for example, the graph on your picture appears to be traversed in reverse alphabetical order of vertex naming: z before w from s, y before w from z, and v before u from t.

Related

Cycle detection that handles a series of directed edges? [duplicate]

I came upon wait-for graphs and I wonder, are there any efficient algorithms for detecting if adding an edge to a directed graph results in a cycle?
The graphs in question are mutable (they can have nodes and edges added or removed). And we're not interested in actually knowing an offending cycle, just knowing there is one is enough (to prevent adding an offending edge).
Of course it'd be possible to use an algorithm for computing strongly connected components (such as Tarjan's) to check if the new graph is acyclic or not, but running it again every time an edge is added seems quite inefficient.

If I understood your question correctly, then a new edge (u,v) is only inserted if there was no path from v to u before (i.e., if (u,v) does not create a cycle). Thus, your graph is always a DAG (directed acyclic graph). Using Tarjan's Algorithm to detect strongly connected components (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) sounds like an overkill in this case. Before inserting (u,v), all you have to check is whether there is a directed path from v to u, which can be done with a simple BFS/DFS.
So the simplest way of doing it is the following (n = |V|, m = |E|):
Inserting (u,v): Check whether there is a path from v to u (BFS/DFS). Time complexity: O(m)
Deleting edges: Simply remove them from the graph. Time complexity: O(1)
Although inserting (u,v) takes O(m) time in the worst case, it is probably pretty fast in your situation. When doing the BFS/DFS starting from v to check whether u is reachable, you only visit vertices that are reachable from v. I would guess that in your setting the graph is pretty sparse and that the number of vertices reachable by another is not that high.
However, if you want to improve the theoretical running time, here are some hints (mostly showing that this will not be very easy). Assume we aim for testing in O(1) time whether there exists a directed path from v to u. The keyword in this context is the transitive closure of a DAG (i.e., a graph that contains an edge (u, v) if and only if there is a directed path from u to v in the DAG). Unfortunately, maintaining the transitive closure in a dynamic setting seems to be not that simple. There are several papers considering this problem and all papers I found were STOC or FOCS papers, which indicates that they are very involved. The newest (and fastest) result I found is in the paper Dynamic Transitive Closure via Dynamic Matrix Inverse by Sankowski (http://dl.acm.org/citation.cfm?id=1033207).
Even if you are willing to understand one of those dynamic transitive closure algorithms (or even want to implement it), they will not give you any speed up for the following reason. These algorithms are designed for the situation, where you have a lot of connectivity queries (which then can be performed in O(1) time) and only few changes in the graph. The goal then is to make these changes cheaper than recomputing the transitive closure. However, this update is still slower that a single check for connectivity. Thus, if you need to do an update on every connectivity query, it is better to use the simple approach mentioned above.
So why do I mention this approach of maintaining the transitive closure if it does not fit your needs? Well, it shows that searching an algorithm consuming only O(1) query time does probably not lead you to a solution faster than the simple one using BFS/DFS. What you could try is to get a query time that is faster than O(m) but worse than O(1), while updates are also faster than O(m). This is a very interesting problem, but it sounds to me like a very ambitious goal (so maybe do not spend too much time on trying to achieve it..).

As Mark suggested it is possible to use data structure that stores connected nodes. It is the best to use boolean matrix |V|x|V|. Values can be initialized with Floyd–Warshall algorithm. That is done in O(|V|^3).
Let T(i) be set of vertices that have path to vertex i, and F(j) set of vertices where exists path from vertex j. First are true's in i'th row and second true's in j'th column.
Adding an edge (i,j) is simple operation. If i and j wasn't connected before, than for each a from T(i) and each b from F(j) set matrix element (a,b) to true. But operation isn't cheap. In worst case it is O(|V|^2). That is in case of directed line, and adding edge from end to start vertex makes all vertices connected to all other vertices.
Removing an edge (i,j) is not so simple, but not more expensive operation in the worst case :-) If there is a path from i to j after removing edge, than nothing changes. That is checked with Dijkstra, less than O(|V|^2). Vertices that are not connected any more are (a,b):
a in T(i) - i - T(j),
b in F(j) + j
Only T(j) is changed with removing edge (i,j), so it has to be recalculated. That is done by any kind of graph traversing (BFS, DFS), by going in opposite edge direction from vertex j. That is done in less then O(|V|^2). Since setting of matrix element is in worst case is again O(|V|^2), this operation has same worst case complexity as adding edge.

This is a problem which I recently faced in a slightly different situation (optimal ordering of interdependent compiler instructions).
While I can't improve on O(n*n) theoretical bounds, after a fair amount of experimentation and assuming heuristics for my case (for example, assuming that the initial ordering wasn't created maliciously) the following was the best compromise algorithm in terms of performance.
(In my case I had an acceptable "right side failure": after the initial nodes and arcs were added (which was guaranteed to be possible), it was acceptable for the optimiser to occasionally reject the addition of further arcs where one could actually be added. This approximation isn't necessary for this algorithm when carried to completion, but it does admit such an approximation if you wish to do so, and so limiting its runtime further).
While a graph is topologically sorted, it is guaranteed to be cycle-free. In the first phase when I had a static bulk of nodes and arcs to add, I added the nodes and then topologically sorted them.
During the second phase, adding additional arcs, there are two situations when considering an arc from A to B. If A already lies to the left of B in the sort, an arc can simply be added and no cycle can be generated, as the list is still topologically sorted.
If B is to the left of A, we consider the sub-sequence between B and A and partition it into two disjoint sequences X, Y, where X is those nodes which can reach A (and Y the others). If A is not reachable from B, ie there are no direct arcs from B into X or to A, then the sequence can be reordered XABY before adding the A to B arc, showing it is still cycle-free and maintaining the topological sort. The efficiency over the naive algorithm here is that we only need consider the subsequence between B and A as our list is topologically sorted: A is not reachable from any node to the right of A. For my situation, where localised reorderings are the most frequent and important, this an important gain.
As we don't reorder within the sequences X,A,B,Y, clearly any arcs which start or end within the same sequence are still ordered correctly, and the same in each flank, and any "fly-over" arcs from the left to the right flanks. Any arcs between the flanks and X,A,B,Y are also still ordered correctly as our reordering is restricted to this local region. So we only need to consider arcs between our four sequences. Consider each possible "problematic" arc for our final ordering XABY in turn: YB YA YX BA BX AX. Our initial order was B[XY]A, so AX and YB cannot occur. X reaches A, but Y does not, therefore YX and YA do not occur or A could be reached from the source of the arc in Y (potentially via X) a contradiction. Our criterion for acceptability was that there are no links BX or BA. So there are no problematic arcs, and we are still topologically sorted.
Our only acceptability criterion (that A is not reachable from B) is clearly sufficient to create a cycle on adding the arc A->B: B -(X)-> A -> B, so the converse is also shown.
This can be implemented reasonably efficiently if we can add a flag to each node. Consider the nodes [BXY] going right-to-left from the node immediately to the left of A. If that node has a direct arc to A then set the flag. At an arbitrary such node, we need only consider direct outgoing arcs: the nodes to its right are either after A (and so irrelevant), or else have already been flagged if reachable from A, so the flag on such an arbitrary node is set when any flagged nodes are encountered by direct link. If B is not flagged at the end of the process, the reordering is acceptable and the flagged nodes comprise X.
Though this always yields a correct ordering if carried to completion (as far as I can tell), as I mentioned in the introduction it is particularly efficient if your initial build is approximately correct (in the sense of accommodating of likely additional arcs without reordering).
There also exists an effective approximation, if your context is such that "outrageous" arcs can be rejected (those which would massively reorder) by limiting the A to B distance you are prepared to scan. If you have an initial list of the additional arcs you wish to add, they can be ordered by increasing distance in the initial ordering until you run out of some scanning "credit", and call your optimisation a day at that point.

If the graph is directed, you would only have to check the parent nodes (navigate up until you reach the root) of the node where the new edge should start. If one of the parent nodes is equal to the end of the edge, adding the edge would create a cycle.

If all previous jobs are in Topologically sorted order. Then if you add an edge that appears to brake the sort, and can not be fixed, then you have a cycle.
https://stackoverflow.com/a/261621/831850
So if we have a sorted list of nodes:
1, 2, 3, ..., x, ..., z, ...
Such that each node is waiting for nodes to its left.
Say we want to add an edge from x->z. Well that appears to brake the sort. So we can move the node at x to position z+1 which will fix the sort iif none of the nodes (x, z] have an edge to the node at x.

Will a standard Kruskal-like approach for MST work if some edges are fixed?

The problem: you need to find the minimum spanning tree of a graph (i.e. a set S of edges in said graph such that the edges in S together with the respective vertices form a tree; additionally, from all such sets, the sum of the cost of all edges in S has to be minimal). But there's a catch. You are given an initial set of fixed edges K such that K must be included in S.
In other words, find some MST of a graph with a starting set of fixed edges included.
My approach: standard Kruskal's algorithm but before anything else join all vertices as pointed by the set of fixed edges. That is, if K = {1,2}, {4,5} I apply Kruskal's algorithm but instead of having each node in its own individual set initially, instead nodes 1 and 2 are in the same set and nodes 4 and 5 are in the same set.
The question: does this work? Is there a proof that this always yields the correct result? If not, could anyone provide a counter-example?
P.S. the problem only inquires finding ONE MST. Not interested in all of them.

Yes, it will work as long as your initial set of edges doesn't form a cycle.
Keep in mind that the resulting tree might not be minimal in weight since the edges you fixed might not be part of any MST in the graph. But you will get the lightest spanning tree which satisfies the constraint that those fixed edges are part of the tree.
How to implement it:
To implement this, you can simply change the edge-weights of the edges you need to fix. Just pick the lowest appearing edge-weight in your graph, say min_w, subtract 1 from it and assign this new weight,i.e. (min_w-1) to the edges you need to fix. Then run Kruskal on this graph.
Why it works:
Clearly Kruskal will pick all the edges you need (since these are the lightest now) before picking any other edge in the graph. When Kruskal finishes the resulting set of edges is an MST in G' (the graph where you changed some weights). Note that since you only changed the values of your fixed set of edges, the algorithm would never have made a different choice on the other edges (the ones which aren't part of your fixed set). If you think of the edges Kruskal considers, as a sorted list of edges, then changing the values of the edges you need to fix moves these edges to the front of the list, but it doesn't change the order of the other edges in the list with respect to each other.
Note: As you may notice, giving the lightest weight to your edges is basically the same thing as you suggest. But I think it is a bit easier to reason about why it works. Go with whatever you prefer.
I wouldn't recommend Prim, since this algorithm expands the spanning tree gradually from the current connected component (in the beginning one usually starts with a single node). The case where you join larger components (because your fixed edges might not all be in a single component), would be needed to handled separately - it might not be hard, but you would have to take care of it. OTOH with Kruskal you don't have to adapt anything, but simply manipulate your graph a bit before running the regular algorithm.

If I understood the question properly, Prim's algorithm would be more suitable for this, as it is possible to initialize the connected components to be exactly the edges which are required to occur in the resulting spanning tree (plus the remaining isolated nodes). The desired edges are not permitted to contain a cycle, otherwise there is no spanning tree including them.
That being said, apparently Kruskal's algorithm can also be used, as it is explicitly stated that is can be used to find an edge that connects two forests in a cost-minimal way.
Roughly speaking, as the forests of a given graph form a Matroid, the greedy approach yields the desired result (namely a weight-minimal tree) regardless of the independent set you start with.

Find paths that cover all edges between two nodes

we hope you are able to help us with the following problem:
A directed graph that may contain cycles is given. One has to find a set of paths that fulfill the following criterion:
all edges that can be passed on the way from node A to node B must be covered by the paths within the set (one edge can be part of more than one paths from the set)
the solution does not have to be necessarily the one with the lowest number of paths and the paths does not have to be necessarily the shortest ones. However, the solution should be efficiently implementable using a programming language just as java. We need the solution to generate a few test cases and it is important to cover all edges between a node A and a node B.
does everyone know a suitable algorithm? or does no efficient solution exist?
thanks a lot in advance for your advise! (we have already searched for a solution, but the one we found was focused on shortest paths and were extremely inefficient)
Here is a graphical representation of our problem:
http://i.stack.imgur.com/wIY34.jpg

Consider all edges R(A) reachable from A. They can be found by adding a node on each edge (i.e. turning each edge U->V to U->X->V) and then perform a Breadth First Search starting from A.
Edges outside of R(A) clearly cannot be be on a path from A to B, since then they'd be reachable from A. So all paths to B must go through edges of R(A).
So the set of edges, U, we want to "cover" are all edges of R(A) that B is reachable from.
Now we are looking for a set of paths S from A to B, which contains all edges of U.
A straightforward method is the following:
Color all edges of R(A) black and set S={ }
While there are black edges remaining:
Take a black edge UV.
If B is reachable from V:
Construct a path P = A -> ... -> U -> V -> ... -> B
Color all edges of P as gray
Add P to S
Else:
Color UV as gray.
Then return S
As #user189 pointed out, if we consider reachable edges from A that go through B, we are allowing paths that go twice through B. (I.e. a->b->c->g->f->e in the example image).
His suggested solution (removing the node B before computing R(A) ) fixes this.
Regarding complexity:
R(A) can be computed in O(|E|) time and the paths from A to an edge UV in R(A) can be directly read from the BFS tree. To check for reachability to B from V and to find the path, we can use a BFS tree starting from B and following edges backwards, computed in O(|E|) time.
If we reference the paths implicitly through the edge UV that connects the two BFS trees, and use a O(1) read/update structure to maintain the set of black edges and to look up edges in the BFS trees, I think we can do this in O(|E|) time.

Find all critical edges of an MST

I have this question from Robert Sedgewick's book on algorithms.
Critical edges. An MST edge whose deletion from the graph would cause the
MST weight to increase is called a critical edge. Show how to find all critical edges in a
graph in time proportional to E log E. Note: This question assumes that edge weights
are not necessarily distinct (otherwise all edges in the MST are critical).
Please suggest an algorithm that solves this problem.
One approach I can think of does the job in time E.V.
My approach is to run the kruskal's algorithm.
But whenever we encounter an edge whose insertion in the MST creates a cycle and if that
cycle already contains an edge with the same edge weight, then, the edge already inserted will not be a critical edge (otherwise all other MST edges are critical edges).
Is this algorithm correct? How can I extend this algorithm to do the job in time E log E.

The condition you suggest for when an edge is critical is correct I think. But it's not necessary to actually find a cycle and test each of its edges.
The Kruskal algorithm adds edges in increasing weight order, so the sequence of edge additions can be broken into blocks of equal-weight edge additions. Within each equal-weight block, if there is more than one edge that joins the same two components, then all of these edges are non-critical, because any one of the other edges could be chosen instead. (I say they are all non-critical because we are not actually given a specific MST as part of the input -- if we were then this would identify a particular edge to call non-critical. The edge that Kruskal actually chooses is just an artefact of initial edge ordering or how sorting was implemented.)
But this is not quite sufficient: it might be that after adding all edges of weight 4 or less to the MST, we find that there are 3 weight-5 edges, connecting component pairs (1, 2), (2, 3) and (1, 3). Although no component pair is joined by more than 1 of these 3 edges, we only need (any) 2 of them -- using all 3 would create a cycle.
For each equal-weight block, having weight say w, what we actually need to do is (conceptually) create a new graph in which each component of the MST so far (i.e. using edges having weight < w) is a vertex, and there is an edge between 2 vertices whenever there is a weight-w edge between these components. (This may result in multi-edges.) We then run DFS on each component of this graph to find any cycles, and mark every edge belonging to such a cycle as non-critical. DFS takes O(nEdges) time, so the sum of the DFS times for each block (whose sizes sum to E) will be O(E).
Note that Kruskal's algorithm takes time O(Elog E), not O(E) as you seem to imply -- although people like Bernard Chazelle have gotten close to linear-time MST construction, TTBOMK no one has got there yet! :)

Yes, your algorithm is correct. We can prove that by comparing the execution of Kruskal's algorithm to a similar execution where the cost of some MST edge e is changed to infinity. Until the first execution considers e, both executions are identical. After e, the first execution has one fewer connected component than the second. This condition persists until an edge e' is considered that, in the second execution, joins the components that e would have. Since edge e is the only difference between the forests constructed so far, it must belong to the cycle created by e'. After e', the executions make identical decisions, and the difference in the forests is that the first execution has e, and the second, e'.
One way to implement this algorithm is using a dynamic tree, a data structure that represents a labelled forest. One configuration of this ADT supports the following methods in logarithmic time.
MakeVertex() - constructs and returns a fresh vertex.
Link(u, c, v) - vertices u and v must not be connected. Creates an unmarked edge from vertex u to vertex v with cost c.
Mark(u, v) - vertices u and v must be endpoints of an edge e. Marks e.
Connected(u, v) - indicates whether vertices u and v are connected.
FindMax(u, v) - vertices u and v must be connected. Returns the endpoints of an unmarked edge on the unique path from u to v with maximum cost, together with that cost. The endpoints of this edge are given in the order that they appear on the path.
I make no claim that this is a good algorithm in practice. Dynamic trees, like Swiss Army knives, are versatile but complicated and often not the best tool for the job. I encourage you to think about how to take advantage of the fact that we can wait until all of the edges are processed to figure out what the critical edges are.

What algorithm can I apply to this DAG?

I have a DAG representing a list of properties. These properties are such that if a>b, then a has a directed edge to b. It is transitive as well, so that if a>b and b>c, then a has a directed edge to c.
However, the directed edge from a to c is superfluous because a has a directed edge to b and b has a directed edge to c. How can I prune all these superfluous edges? I was thinking of using a minimum spanning tree algorithm, but I'm not really sure what is the appropriate algorithm to apply in this situation
I suppose I could do a depth first search from each node and all its outgoing edges and compare if it can reach certain nodes without using certain edges, but this seems horribly inefficient and slow.
After the algorithm is complete, the output would be a linear list of all the nodes in an order that is consistent with the graph. So if a has three directed edges to b,c, and d. b and c also each of which has a directed edge to d, the output could be either abcd or acbd.

This is called the transitive reduction problem. Formally speaking, you are looking for a minimal (fewest edges) directed graph, the transitive closure of which is equal to the transitive closure of the input graph. (The diagram on the above Wikipedia link makes it clear.)
Apparently there exists an efficient algorithm for solving this problem that takes the same time as for producing a transitive closure (i.e. the more common inverse problem of adding transitive links instead of removing them), however the link to the 1972 paper by Aho, Garey, and Ullman costs $25 to download, and some quick googling didn't turn up any nice descriptions.
EDIT: Scott Cotton's graphlib contains a Java implementation! This Java library looks to be very well organised.

Actually, after looking around a little more, I think a Topologicalsort is what I'm really after here.

If these are already n nodes with directed edges:
Starting from any point M, loop all its child edge, select the biggest child (like N), remove other edges, the complexity should be o(n) . If no N exists (no child edge, goto step 3).
start from N, repeat step 1.
start from point M, select the smallest parent node ( like T), remove others' edges.
start from T, repeat step 3.....
Actually it's just a ordering algorithm, and the totally complexity should be o(0.5n^2).
One problem is that if we want loop one node's parent nodes, then we need more memory to log edge so we can trace back from child to parent. This can be improved in the step 3 where we choose one node from the left nodes bigger than M, this means we need to keep a list of nodes to know what nodes are left..

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio