Given an undirected graph, I want an algorithm (in O(|V|+|E|)) that finds the heaviest edge in the graph that is part of a cycle. For example, if my graph is as below and I run DFS(A), then the heaviest such edge will be BC.
(*) In this problem, I have at most 1 cycle.
I'm trying to write a modified DFS, that will return the desired heavy edge, but I'm having some trouble.
Because I have at most 1 cycle, I can save the edges in the cycle in an array, and find the maximum edge easily at the end of the run, but I think this answer seems a bit messy, and I'm sure there's a better recursive answer.
I think the easiest way to solve this is to use a union-find data structure (https://en.wikipedia.org/wiki/Disjoint-set_data_structure) in a manner similar to Kruskal's MST algorithm:
1. Put each vertex in its own set.
2. Iterate through the edges in increasing order of weight. For each edge, merge the sets of its two endpoints if they're not already in the same set.
3. Remember the last edge whose endpoints you found to be already in the same set. That's the one you're looking for.
This works because the last and heaviest edge that you visit in any cycle must already have its adjacent vertices connected by edges you visited earlier.
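The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `heaviest_cycle_edge`, the 0..n-1 vertex labelling and the `(weight, u, v)` edge format are assumptions made for illustration. Note that sorting the edges costs O(|E| log |E|), so this is not strictly linear.

```python
def heaviest_cycle_edge(n, edges):
    """Return the last (heaviest) edge that closes a cycle, or None.

    n     -- number of vertices, labelled 0..n-1 (assumed labelling)
    edges -- list of (weight, u, v) tuples (assumed format)
    """
    parent = list(range(n))          # each vertex starts in its own set

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    answer = None
    for w, u, v in sorted(edges):    # increasing order of weight
        ru, rv = find(u), find(v)
        if ru == rv:
            answer = (w, u, v)       # endpoints were already connected
        else:
            parent[ru] = rv          # merge the two sets
    return answer


# A=0, B=1, C=2, D=3; which cycle edges carry weights 5 and 2 is assumed.
example = [(10, 0, 1), (8, 1, 2), (5, 2, 3), (2, 1, 3)]
result = heaviest_cycle_edge(4, example)
```

Here `result` is the weight-8 edge between B and C, in line with the example in the question.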
Use Tarjan's Strongly Connected Components algorithm.
Once you have split your graph into many strongly connected graphs assign a COMP_ID to each node which specifies the component ID to which this node belongs (This can be done with a small edit on the algorithm. Define a global integer value which starts at 1. Every time you pop nodes from the stack they all correspond to the same component, save the value of this variable to the COMP_ID of these nodes. When the pop loop ends increment the value of this integer by one).
Now, iterate over all the edges. You have 2 possibilities:
1. If this edge links two nodes from two different components, then this edge can't be the answer, since it can't possibly be part of a cycle.
2. If this edge links two nodes from the same component, then this edge is part of some cycle. All you have left to do now is to choose the maximum edge among all the edges of type 2.
The described approach runs in a total complexity of O(|V| + |E|) because every node and edge corresponds to at most one strongly connected component.
In the graph example you provided COMP_ID will be as follows:
COMP_ID[A] = 1
COMP_ID[B] = 2
COMP_ID[C] = 2
COMP_ID[D] = 2
Edge 10 connects COMP_ID 1 with COMP_ID 2, so it can't be the answer. The answer is the maximum among edges {2, 5, 8}, since they all connect COMP_ID 2 with itself; thus the answer is 8.
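A rough sketch of this idea for an undirected graph follows. One caveat, stated plainly: run on an undirected graph, a Tarjan-style low-link DFS only yields the components shown above if it never walks back along the edge it arrived by, and what it then computes are 2-edge-connected components rather than strongly connected components in the directed sense. The function name, the `(u, v, w)` edge format and the 0..n-1 labelling are assumptions for illustration, and the component numbers may come out in a different order than in the example.

```python
import sys


def heaviest_edge_on_cycle(n, edges):
    """edges: list of (u, v, w); vertices are 0..n-1 (assumed labelling)."""
    sys.setrecursionlimit(max(10000, 2 * n + 100))
    adj = [[] for _ in range(n)]
    for eid, (u, v, _) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))

    index = [0] * n          # DFS discovery number, 0 = unvisited
    low = [0] * n            # lowest discovery number reachable
    comp_id = [0] * n
    stack = []
    counter = [0, 0]         # [last DFS number, last component id]

    def dfs(u, parent_edge):
        counter[0] += 1
        index[u] = low[u] = counter[0]
        stack.append(u)
        for v, eid in adj[u]:
            if eid == parent_edge:
                continue                     # do not reuse the edge we came by
            if index[v] == 0:
                dfs(v, eid)
                low[u] = min(low[u], low[v])
            else:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:               # u is the root of a component
            counter[1] += 1
            while True:
                x = stack.pop()
                comp_id[x] = counter[1]
                if x == u:
                    break

    for s in range(n):
        if index[s] == 0:
            dfs(s, -1)

    # Only edges inside one component can lie on a cycle.
    same = [e for e in edges if comp_id[e[0]] == comp_id[e[1]]]
    return max(same, key=lambda e: e[2]) if same else None
```

With A=0, B=1, C=2, D=3 and edges `[(0, 1, 10), (1, 2, 8), (2, 3, 5), (1, 3, 2)]` (the placement of weights 5 and 2 is assumed), the function returns the weight-8 edge between B and C.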
I have seen ways to detect a cycle in a graph, but I still have not managed to find a way to detect a "bridge-like" cycle. So let's say we have found a cycle in a connected (and undirected) graph. How can we determine whether removing this cycle will disconnect the graph or not? By removing the cycle, I mean removing the edges in the cycle (so the vertices are unaffected).
One way to do it is clearly to count the number of components before and after the removal. I'm just curious to know if there's a better way.
If there happens to be an established algorithm for that, could anyone please point me to a related work/paper/publication?
Here's the naive algorithm, complexity wise I don't think there's a more efficient way of doing the check.
Start with your list of edges (cycleEdges)
Get the set of vertices within cycleEdges (cycleVertices)
If a vertex in cycleVertices only contains edges that are part of cycleEdges return FALSE
For each vertex in cycleVertices
Recursively follow the vertex's edges that are not in cycleEdges (avoid already visited vertices)
If a vertex is reached that is not in cycleVertices, add it to the set outsideVertices (stop recursively searching this path)
If only vertices that are in cycleVertices have been reached Return FALSE
If outsideVertices contains 1 element Return TRUE
Choose a vertex in outsideVertices and remove it from outsideVertices
Recursively follow that vertex's edges that are not in cycleEdges (avoid already visited vertices) (favor choosing edges that contain a vertex in outsideVertices to improve running time for large graphs)
If a vertex is reached that is in outsideVertices remove it from outsideVertices
If outsideVertices is empty Return TRUE
Return FALSE
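For comparison, the simpler check mentioned in the question (delete the cycle's edges, then test whether the graph is still connected) can be sketched as below. This is only a sketch under assumptions: the graph is simple and connected, vertices are labelled 0..n-1, and the function name and argument formats are made up for illustration.

```python
from collections import deque


def removal_disconnects(n, edges, cycle_edges):
    """True if deleting cycle_edges disconnects the graph on vertices 0..n-1."""
    removed = {frozenset(e) for e in cycle_edges}
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if frozenset((u, v)) not in removed:   # skip the cycle's edges
            adj[u].append(v)
            adj[v].append(u)

    # Breadth-first search from vertex 0 over the remaining edges.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) < n                       # some vertex became unreachable
```

A single traversal is enough because the original graph is assumed connected, so the whole check runs in O(|V| + |E|).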
You can do it in O(E+V).
You can find all bridges in your graph in O(E+V) with DFS plus dynamic programming.
http://www.geeksforgeeks.org/bridge-in-a-graph
Save them (just make a boolean[E] array and set the entry for each bridge to true).
Then you can tell in O(1) whether an edge is a bridge or not.
You can then take each edge of your cycle and check whether it is a bridge.
Vish's answer mentions articulation points, which is definitely in the right direction. More can be said, though. Articulation points can be found via a modified DFS algorithm that looks something like this:
Execute DFS, assigning each vertex its DFS number (e.g. the number of vertices visited before it). When you encounter a vertex that has already been discovered, compare its DFS number to that of the current vertex; this lets you store a LOW number for the current vertex (i.e. the lowest DFS number that this vertex has "seen"). As you recurse back toward the start vertex, compare each parent vertex's DFS number with its child's LOW number: if a parent ever sees a child's LOW number that is greater than or equal to its own DFS number, then that parent is an articulation point.
I'm using "child" and "parent" as descriptors a lot here because in the DFS tree we have to consider a special case for the root. If it ever sees a child's LOW number that is greater than or equal to its own DFS number (which is always the case for the root) and it has at least two children in the DFS tree, then the root is an articulation point.
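The DFS described above might look roughly like this in code. It is a sketch under assumptions: the graph is simple and undirected, it is given as adjacency lists over vertices 0..n-1, and the names `articulation_points`, `num` and `low` are chosen here for illustration.

```python
import sys


def articulation_points(n, adj):
    """adj: adjacency lists for vertices 0..n-1 of a simple undirected graph."""
    sys.setrecursionlimit(max(10000, 2 * n + 100))
    num = [0] * n            # DFS number, 0 = unvisited
    low = [0] * n            # lowest DFS number "seen" from this subtree
    result = set()
    counter = [0]

    def dfs(u, parent):
        counter[0] += 1
        num[u] = low[u] = counter[0]
        children = 0
        for v in adj[u]:
            if num[v] == 0:                  # tree edge to an unvisited child
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= num[u]:
                    result.add(u)            # child cannot bypass u
            elif v != parent:                # edge to an already seen vertex
                low[u] = min(low[u], num[v])
        if parent == -1 and children >= 2:   # special case for the root
            result.add(u)

    for s in range(n):
        if num[s] == 0:
            dfs(s, -1)
    return result
```

For two triangles that share one vertex, the shared vertex is the only articulation point reported; for a single triangle the result is empty.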
Here's a useful art. point image
Another concept to be familiar with, especially for undirected graphs, is biconnected components, aka any subgraph whose vertices are in a cycle with all other vertices.
Here's a useful colored image with biconnected components
You can prove that any two biconnected components can only share one vertex max; two "shared" vertices would mean that the two are in a cycle, as well as with all the other vertices in the components so the two components are actually one large component. As you can see in the graph, any vertex shared by two components (has more than one color) is an articulation point. Removing the cycle that contains any articulation point will thus disconnect the graph.
Well, since in a cycle any vertex y can be reached from any other vertex x and vice versa, a cycle is a strongly connected component, so we can contract the cycle into a single vertex that represents it. This operation can be performed in O(n+m) using DFS. Now we can apply DFS again to check whether the contracted cycles are articulation vertices: if they are, then removing them will disconnect the graph; otherwise it will not. Total time is 2(n+m) = O(n+m).
A similar question has been asked before, at https://cs.stackexchange.com/questions/28635/find-an-mst-in-a-graph-with-edge-weights-from-1-2
The question is:
Given a connected undirected graph G=(V,E) and a weight function w: E → {1,2}, suggest an efficient algorithm that finds an MST of the graph in O(V+E) without using Kruskal.
I had a look at the suggested solutions in the thread above, but I am still not sure how to make them work. The first suggestion doesn't consider components. The second suggestion doesn't give details on how to tell whether a newly considered edge will form a cycle if it is added to the current MST. The tricky part is how to tell whether two vertices are in the same component in linear time.
My current thought is
Sort the vertices, which can be done in linear time.
Consider edges with weight 1 first. Add an edge to the MST while the number of edges is less than or equal to |V1|-1, where V1 is the set of vertices on the edges of weight 1. We need to make sure all the vertices on edges of weight 1 are checked. A hash set can be used to store V1 and the edges.
Add V2 to the graph using the same logic.
Could anyone suggest if my thought has flaws? If so, what is the best way to tackle this question? Thank you very much!
I would suggest you to do something like the second answer in the given question:
This is Prim's algorithm:
Start by picking any vertex to be the root of the tree.
While the tree does not contain all vertices in the graph, find the shortest edge leaving the tree and add it to the tree.
So now if we can perform the finding in the set of edges leaving the tree in O(1) time and we can keep the set updated so the search can always happen in constant time in total O(|E|) time, then we are good to go.
For this to happen, think of the set of edges leaving the tree as a linked list. Whenever a vertex is added to the set of vertices that form the MST, iterate through its adjacency list, add the edges of weight 1 to the front of the list, and add the edges of weight 2 to the end of the list. Whenever you want the minimum edge leaving the tree, just take one from the front of the linked list!
The only problem with this method is that you should only add edges to the list that are "leaving the tree", because otherwise we might end up with cycles. To check this "leaving the tree" property, use the set of selected vertices: when you are adding the edges of a newly added vertex to the list, first check whether the vertex at the other end of the edge is already in the set of selected vertices, and add the edge to the list only if that other end is not in the set. You can check membership in a set in O(1) time, and this way you won't end up with cycles!
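A minimal sketch of this list-based Prim variant is shown below, using `collections.deque` as the linked list. The function name, the adjacency format `adj[u] = [(v, w), ...]` and the 0..n-1 labelling are assumptions for illustration. One detail goes beyond the description above and is my addition: an edge can stop "leaving the tree" after it was queued, so the sketch checks again when the edge is taken off the list.

```python
from collections import deque


def mst_weights_1_2(n, adj, root=0):
    """adj[u] is a list of (v, w) with w in {1, 2}; graph assumed connected."""
    in_tree = [False] * n
    in_tree[root] = True
    frontier = deque()       # weight-1 edges at the front, weight-2 at the back
    tree = []

    def add_edges(u):
        for v, w in adj[u]:
            if in_tree[v]:
                continue                     # edge does not leave the tree
            if w == 1:
                frontier.appendleft((u, v, w))
            else:
                frontier.append((u, v, w))

    add_edges(root)
    while frontier and len(tree) < n - 1:
        u, v, w = frontier.popleft()
        if in_tree[v]:
            continue                         # became internal after queuing
        in_tree[v] = True
        tree.append((u, v, w))
        add_edges(v)
    return tree
```

Each edge is queued at most twice and removed at most twice, so the whole run is O(|V| + |E|).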
I have a connected, non-directed, graph with N nodes and 2N-3 edges. You can consider the graph as it is built onto an existing initial graph, which has 3 nodes and 3 edges. Every node added onto the graph and has 2 connections with the existing nodes in the graph. When all nodes are added to the graph (N-3 nodes added in total), the final graph is constructed.
Originally I'm asked, what is the maximum number of nodes in this graph that can be visited exactly once (except for the initial node), i.e., what is the maximum number of nodes contained in the largest Hamiltonian path of the given graph? (Okay, saying largest Hamiltonian path is not a valid phrase, but considering the question's nature, I need to find a max. number of nodes that are visited once and the trip ends at the initial node. I thought it can be considered as a sub-graph which is Hamiltonian, and consists max. number of nodes, thus largest possible Hamiltonian path).
Since I'm not asked to find a path, I should first check whether a Hamiltonian path exists for the given number of nodes. I know that planar graphs and cycle graphs (Cn) are Hamiltonian graphs (I also know Ore's theorem for Hamiltonian graphs, but the graph I will be working on will most probably not be dense, which makes Ore's theorem pretty much useless in my case). Therefore I need to find an algorithm for checking whether the graph is a cycle graph, i.e. whether there exists a cycle that contains all the nodes of the given graph.
Since DFS is used for detecting cycles, I thought some minor manipulation of DFS could help me detect what I am looking for, such as keeping track of explored nodes and finally checking whether the last node visited has a connection to the initial node. Unfortunately, I could not succeed with that approach.
Another approach I tried was excluding a node, and then try to reach to its adjacent node starting from its other adjacent node. That algorithm may not give correct results according to the chosen adjacent nodes.
I'm pretty much stuck here. Can you help me think of another algorithm to tell me if the graph is a cycle graph?
Edit
I realized by the help of the comment (thank you for it n.m.):
A cycle graph consists of a single cycle and has N edges and N vertices. If there exists a cycle which contains all the nodes of the given graph, that's a Hamiltonian cycle. - n.m.
that I am actually searching for a Hamiltonian path, which I did not intend to do so:)
On a second thought, I think checking the Hamiltonian property of the graph while building it will be more efficient, which is I'm also looking for: time efficiency.
After some thinking, I thought whatever the number of nodes will be, the graph seems to be Hamiltonian due to node addition criteria. The problem is I can't be sure and I can't prove it. Does adding nodes in that fashion, i.e. adding new nodes with 2 edges which connect the added node to the existing nodes, alter the Hamiltonian property of the graph? If it doesn't alter the Hamiltonian property, how so? If it does alter, again, how so? Thanks.
EDIT #2
I, again, realized that building the graph the way I described might alter the Hamiltonian property. Consider an input given as follows:
1 3
2 3
1 5
1 3
This input says that the 4th node is connected to node 1 and node 3, the 5th to node 2 and node 3, and so on.
The 4th and 7th nodes are connected to the same nodes, which lowers the maximum number of nodes that can be visited exactly once by 1. If I detect these collisions (NOT including an input such as 3 3, which is the example you suggested, since the problem states that the newly added edges connect to 2 other nodes) and lower the maximum number of nodes accordingly, starting from N, I believe I can get the right result.
See, I do not choose the connections, they are given to me and I have to find the max. number of nodes.
I think counting the same connections while building the graph and subtracting the number of same connections from N will give the right result? Can you confirm this or is there a flaw with this algorithm?
What we have in this problem is a connected, non-directed graph with N nodes and 2N-3 edges. Consider the graph given below,
  A
 / \
B _ C
 ( )
  D
The graph does not have a Hamiltonian cycle, yet it is constructed according to your rules for adding nodes. So searching for a Hamiltonian cycle may not give you the solution. Moreover, even where it is possible, Hamiltonian cycle detection is an NP-complete problem with O(2^N) complexity, so the approach may not be ideal.
What I suggest is to use a modified version of Floyd's Cycle Finding algorithm (Also called the Tortoise and Hare Algorithm).
The modified algorithm is,
1. Initialize a list CYC_LIST to ∅.
2. Add the root node to the list CYC_LIST and set it as unvisited.
3. Call the function Floyd() twice with the unvisited node in the list CYC_LIST for each of the two edges. Mark the node as visited.
4. Add all the previously unvisited vertices traversed by the Tortoise pointer to the list CYC_LIST.
5. Repeat steps 3 and 4 until no more unvisited nodes remains in the list.
6. If the list CYC_LIST contains N nodes, then the Graph contains a Cycle involving all the nodes.
The algorithm calls Floyd's Cycle Finding algorithm at most 2N times. Floyd's Cycle Finding algorithm takes linear time, O(N). So the complexity of the modified algorithm is O(N^2), which is much better than the exponential time taken by the Hamiltonian Cycle based approach.
One possible problem with this approach is that it will detect closed paths along with cycles unless stricter checking criteria are implemented.
Reply to Edit #2
Consider the Graph given below,
   A------------\
  / \            \
 B _ C            \
 |\ /|             \
 | D |              F
 \   /             /
  \ /             /
   E------------/
According to your algorithm this graph does not have a cycle containing all the nodes.
But there is a cycle in this graph containing all the nodes.
A-B-D-C-E-F-A
So I think there is some flaw in your approach. But if your algorithm is correct, it is far better than my approach, since mine takes O(n^2) time whereas yours takes just O(n).
To add some clarification to this thread: finding a Hamiltonian Cycle is NP-complete, which implies that finding a longest cycle is also NP-complete because if we can find a longest cycle in any graph, we can find the Hamiltonian cycle of the subgraph induced by the vertices that lie on that cycle. (See also for example this paper regarding the longest cycle problem)
We can't use Dirac's criterion here: Dirac only tells us that minimum degree >= n/2 -> Hamiltonian Cycle, which is the implication in the opposite direction of what we would need. The other way around is definitely wrong: take a cycle over n vertices; every vertex in it has degree exactly 2, no matter the size of the cycle, but it has (is) an HC. What you can tell from Dirac is that no Hamiltonian Cycle -> minimum degree < n/2, which is of no use here since we don't know whether our graph has an HC or not, so we can't use the implication. (In any case, every graph we construct according to what OP described will have a vertex of degree 2, namely the last vertex added to the graph, so for arbitrary n we have minimum degree 2.)
The problem is that you can construct both graphs of arbitrary size that have an HC and graphs of arbitrary size that do not have an HC. For the first part: if the original triangle is A,B,C and the vertices added are numbered 1 to k, then connect the 1st added vertex to A and C and the k+1-th vertex to A and the k-th vertex for all k >= 1. The cycle is A,B,C,1,2,...,k,A. For the second part, connect both vertices 1 and 2 to A and B; that graph does not have an HC.
What is also important to note is that the property of having an HC can change from one vertex to the other during construction. You can both create and destroy the HC property when you add a vertex, so you would have to check for it every time you add a vertex. A simple example: take the graph after the 1st vertex was added, and add a second vertex along with edges to the same two vertices of the triangle that the 1st vertex was connected to. This constructs from a graph with an HC a graph without an HC. The other way around: add now a 3rd vertex and connect it to 1 and 2; this builds from a graph without an HC a graph with an HC.
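The create-and-destroy example above is small enough to check by brute force. The sketch below is a plain backtracking search, exponential in the worst case and meant for tiny graphs only; the function name and the numeric labels (A, B, C as 0, 1, 2 and the added vertices 1, 2, 3 as 3, 4, 5) are assumptions for illustration.

```python
def has_hamiltonian_cycle(n, edges):
    """Brute-force check on vertices 0..n-1; exponential time, small n only."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def extend(path, used):
        if len(path) == n:
            return path[0] in adj[path[-1]]   # can the path be closed?
        for v in adj[path[-1]]:
            if v not in used:
                used.add(v)
                path.append(v)
                if extend(path, used):
                    return True
                path.pop()                    # backtrack
                used.remove(v)
        return False

    return n >= 3 and extend([0], {0})


triangle = [(0, 1), (1, 2), (0, 2)]
step1 = triangle + [(3, 0), (3, 2)]   # 1st added vertex joined to A and C
step2 = step1 + [(4, 0), (4, 2)]      # 2nd added vertex joined to A and C too
step3 = step2 + [(5, 3), (5, 4)]      # 3rd added vertex joined to the 1st and 2nd
```

Calling `has_hamiltonian_cycle` with `(4, step1)`, `(5, step2)` and `(6, step3)` gives True, False and True respectively, matching the description of the property being destroyed and then restored.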
Storing the last known HC during construction doesn't really help you because it may change completely. You could have an HC after the 20th vertex was added, then not have one for k in [21,2000], and have one again for the 2001st vertex added. Most likely the HC you had on 23 vertices will not help you a lot.
If you want to figure out how to solve this problem efficiently, you'll have to find criteria that work for all your graphs that can be checked for efficiently. Otherwise, your problem doesn't appear to me to be simpler than the Hamiltonian Cycle problem is in the general case, so you might be able to adjust one of the algorithms used for that problem to your variant of it.
Below I have added three extra nodes (3,4,5) in the original graph and it does seem like I can keep adding new nodes indefinitely while keeping the property of Hamiltonian cycle. For the below graph the cycle would be 0-1-3-5-4-2-0
  1---3---5
 / \ / \ /
0---2---4
As there were no extra restrictions about how you can add a new node with two edges, I think by construction you can have a graph that holds the property of hamiltonian cycle.
Here is the full question:
Assume we have a directed graph G = (V,E), we want to find a graph G' = (V,E') that has the following properties:
G' has the same strongly connected components as G
G' has same component graph as G
E' is minimized. That is, E' is as small as possible.
Here is what I got:
First, run the strongly connected components algorithm. Now we have the strongly connected components. Now go to each strong connected component and within that SCC make a simple cycle; that is, a cycle where the only nodes that are repeated are the start/finish nodes. This will minimize the edges within each SCC.
Now, we need to minimize the edges between the SCCs. Alas, I can't think of a way of doing this.
My 2 questions are: (1) Does the algorithm prior to the part about minimizing edges between SCCs sound right? (2) How does one go about minimizing the edges between SCCs.
For (2), I know that this is equivalent to minimizing the number of edges in a DAG. (Think of the SCCs as the vertices). But this doesn't seem to help me.
The algorithm seems right, as long as you allow for closed walks (i.e. repeating vertices.) Proper cycles might not exist (e.g. in an "8" shaped component) and finding them is NP-hard.
It seems that it is sufficient to group the inter-component edges by ordered pairs of components they connect and leave only one edge in each group.
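The grouping suggested above could be sketched like this. It assumes the SCC id of every vertex has already been computed (the `comp` mapping is a hypothetical input), and the function name is made up for illustration. It implements only the "one edge per ordered pair of components" rule from this answer; whether that is enough to minimize E' is left as stated in the answer.

```python
def thin_inter_component_edges(edges, comp):
    """Keep one edge for every ordered pair of distinct components.

    edges -- iterable of directed edges (u, v)
    comp  -- dict mapping each vertex to its SCC id (assumed precomputed)
    """
    kept = {}
    for u, v in edges:
        key = (comp[u], comp[v])
        # Skip edges inside a component and pairs that already have an edge.
        if key[0] != key[1] and key not in kept:
            kept[key] = (u, v)
    return list(kept.values())
```

The pass over the edges is linear, assuming constant-time dictionary lookups.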
Regarding step 2 (minimizing the edges between the SCCs), you could randomly select a vertex and run DFS, keeping only the longest path for each (root, end) pair and removing the other paths. Store all the vertices searched in a list L.
Choose another vertex, if it exists in L, skip to the next vertex; if not, repeat the procedure above.
Begin with a connected graph G containing edges of distinct weights, and an empty set of edges T
While the vertices of G connected by T are disjoint:
    Begin with an empty set of edges E
    For each component:
        Begin with an empty set of edges S
        For each vertex in the component:
            Add the cheapest edge from the vertex in the component to another vertex in a disjoint component to S
        Add the cheapest edge in S to E
    Add the resulting set of edges E to T.
The resulting set of edges T is the minimum spanning tree of G.
From Wikipedia. I understand the outer loop is O(log V) since you're joining sets. But then comes the internal loop.
If you use equivalence relations to keep track of the sets, that means you're only getting the element representing the set, so you can't determine the edge with the smallest weight between the two sets because you don't have all the elements. If you modify the structure to hold references to the children, you still have to get all the children of each set. That means, in the worst case, O(V/2) = O(V) for each set.
Afterwards, you still have to find the smallest edge connecting the two, which means going over all the edges connecting the two components. So you need to iterate over each node and see if its edge connects to an element in the other component, and if it does, if it's smaller than the minimum edge you currently have.
Meaning, an outer loop to iterate over the nodes and an inner loop to iterate over that node's edges: O(V*E). Since it's inside an O(log V) loop, you get O(V*E*log V).
Now, it seems as if you just have to iterate through all the edges, but how would you choose the minimum edge between the 2 components? I can tell if a given edge connects nodes in different components, but I can't tell which one connecting them has minimum weight. And if I get the one with the minimum weight, it might not connect them.
If hash tables are allowed, then I see how it can be an O(E log N) algorithm.
Every component is stored as a different hash set. Initially, each set contains a single node. The step of finding the minimum "bridges" for all components takes O(E) time, since we examine each edge at most twice, and we assume constant-time lookup in the hash sets. Then we proceed by merging the sets, which takes O(N) time. Since the graph is connected, E>=N-1, so we have a total cost of O(E) per iteration. Each iteration at least halves the number of components, so there are O(log N) iterations, which gives O(E log N) overall.
--EDIT--
Following throwawayacct's comment, there is no need for hash structures at all. At each iteration we have a forest resulting from the previous iteration, so we can recompute its connected components in O(E) time. This can be done, for example, by a simple DFS traversal from all nodes that sets a unique "color" for each component. Then, when scanning the edges in order to find bridges, we only consider edges connecting nodes of different colors.
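A rough sketch of the recolouring version follows. It assumes a connected graph with distinct edge weights, vertices labelled 0..n-1, and edges given as `(w, u, v)` tuples; the function name and these formats are assumptions for illustration. The point it tries to show is that nobody has to search between pairs of components: a single scan over all edges updates the cheapest outgoing edge of both components an edge touches.

```python
def boruvka_mst(n, edges):
    """edges: list of (w, u, v) with distinct weights; graph assumed connected."""
    tree = []
    while len(tree) < n - 1:
        # Recolour the components of the current forest in O(V + E).
        adj = [[] for _ in range(n)]
        for _, u, v in tree:
            adj[u].append(v)
            adj[v].append(u)
        color = [-1] * n
        ncolors = 0
        for s in range(n):
            if color[s] != -1:
                continue
            color[s] = ncolors
            stack = [s]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if color[v] == -1:
                        color[v] = ncolors
                        stack.append(v)
            ncolors += 1

        # One scan over all edges: cheapest outgoing edge per colour.
        cheapest = [None] * ncolors
        for e in edges:
            w, u, v = e
            cu, cv = color[u], color[v]
            if cu == cv:
                continue                      # both ends in one component
            if cheapest[cu] is None or w < cheapest[cu][0]:
                cheapest[cu] = e
            if cheapest[cv] is None or w < cheapest[cv][0]:
                cheapest[cv] = e

        # Two components may pick the same edge; the set adds it once.
        new_edges = {e for e in cheapest if e is not None}
        if not new_edges:
            break                             # graph was not connected
        tree.extend(new_edges)
    return tree
```

Each pass costs O(V + E) and at least halves the number of components, which is where the O(E log N) bound comes from.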