Algorithm for Finding Graph Connectivity

I'm tackling an interesting programming question. It is this: we keep adding undirected edges to a graph until a given subgraph is connected (i.e. we can use some path to get from each vertex to any other vertex in that subgraph). We stop as soon as the subgraph is connected.
For example, suppose we have vertices 1, 2, 3 and 4, and we want the subgraph on 1, 2, 3 to be connected.
Let's say the edges arrive in the order (3,4), then (2,3), then (1,4), then (1,3). We only need to add the first 3 edges for the subgraph to be connected, then we stop (edge (1,3) isn't needed).
Obviously I can run a BFS every time an edge is added to see if we can reach the required vertices, but if there are, say, m edges then we would potentially have to run BFS m times, which seems too slow. Any better options? Thanks.

You should research the marvelous "Disjoint-set data structure" and the corresponding union-find algorithm. It can seem magical, but its costs are tiny: the amortized time per operation is O(α(n)) and the space is O(n), where α is the inverse Ackermann function.
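For concreteness, here is a minimal union-find sketch for the question above (the names DSU, targets, etc. are mine, not from any library): each component root tracks how many of the required vertices it contains, so you can stop as soon as one component holds all of them.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Disjoint-set union (union-find) with path compression and union by size.
struct DSU {
    vector<int> parent, size_, targets; // targets[r] = how many required vertices are in r's component
    DSU(int n, const vector<int>& required) : parent(n), size_(n, 1), targets(n, 0) {
        iota(parent.begin(), parent.end(), 0);
        for (int v : required) targets[v] = 1;
    }
    int find(int v) { return parent[v] == v ? v : parent[v] = find(parent[v]); }
    // Merge the components of a and b; return the root of the merged component.
    int unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return a;
        if (size_[a] < size_[b]) swap(a, b);
        parent[b] = a;
        size_[a] += size_[b];
        targets[a] += targets[b];
        return a;
    }
};

int main() {
    // Example from the question, 0-indexed: vertices {0,1,2,3}, required subgraph {0,1,2} (was {1,2,3}).
    int n = 4;
    vector<int> required = {0, 1, 2};
    vector<pair<int,int>> edges = {{2,3}, {1,2}, {0,3}, {0,2}}; // (3,4), (2,3), (1,4), (1,3)

    DSU dsu(n, required);
    int need = required.size();
    for (size_t i = 0; i < edges.size(); ++i) {
        int root = dsu.unite(edges[i].first, edges[i].second);
        if (dsu.targets[root] == need) {                         // all required vertices share one component
            cout << "connected after " << i + 1 << " edges\n";   // prints 3 for this example
            return 0;
        }
    }
    cout << "never connected\n";
}
```

Each edge then costs nearly constant amortized time, so processing all m edges is roughly O(m α(n)).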

You can run BFS just once to find the connected components. Then, each time you add an edge, if it joins vertices of two different components, you merge those components by reference. The claimed complexity of this approach is O(|V| + |E|).
Note that the implementation needs some indirection (references), especially to update the component label of the vertices when two components merge.

I would normally do this using a disjoint set structure, as Doug suggests. It would be like Kruskal's algorithm for finding the minimum spanning tree, except you process edges in the given order.
If you don't need a spanning tree as output, though, then you can do this with an incremental BFS or DFS:
Pick any vertex, and find the vertices connected to it with BFS or DFS. Color these vertices red. If you start with no edges, of course, then there will be only one red vertex at this stage.
As you add edges, don't do anything else until you add an edge that connects a red vertex to a non-red vertex. Then run BFS or DFS, excluding the new edge, to find all the new vertices that will connect to the red set. Color them all red.
Stop when all vertices are red.
This is a little simpler in practice than using disjoint set, and takes O(|V|+|E|) time, since each vertex will be traversed by exactly one BFS/DFS search.
It does the work in chunks, though, so if you need each edge test to be fast individually, then disjoint set is better.
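A sketch of this incremental coloring, assuming (as in the steps above) that we stop once every vertex is red; tracking only a required subset of vertices is the same idea.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Incremental BFS sketch: "red" means reachable from the start vertex using the edges added so far.
int main() {
    int n = 4, start = 0;
    vector<pair<int,int>> edges = {{2,3}, {1,2}, {0,3}, {0,2}};   // same example as above, 0-indexed

    vector<vector<int>> adj(n);
    vector<bool> red(n, false);
    red[start] = true;
    int redCount = 1;

    // BFS from v over the edges added so far, coloring everything it reaches red.
    auto spread = [&](int v) {
        queue<int> q;
        q.push(v); red[v] = true; ++redCount;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int w : adj[u])
                if (!red[w]) { red[w] = true; ++redCount; q.push(w); }
        }
    };

    for (size_t i = 0; i < edges.size(); ++i) {
        auto [a, b] = edges[i];
        adj[a].push_back(b);
        adj[b].push_back(a);
        if (red[a] != red[b]) spread(red[a] ? b : a);              // the new edge crosses the red frontier
        if (redCount == n) {
            cout << "all vertices connected after " << i + 1 << " edges\n";   // prints 3 here
            break;
        }
    }
}
```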

Related

Necessary to MST edges in a graph

Given an edge-weighted graph G(V,E), I need to find the number of edges that belong to every MST, the number that belong to at least one but not all MSTs, and the number that belong to none.
The graph is given as input in the following form (example):
3 3
1 2 1
1 3 1
2 3 2
The first 3 is the number of nodes, the second 3 is the number of edges. The following three lines are the edges, where the first number is the start node, the second is the end node and the third is the weight.
I've thought about running Kruskal once to find an MST and then, for each edge of G, checking in (linear?) time whether it can replace an edge in this MST without altering the overall weight. If it can't, it belongs to no MST. If it can, it belongs to some but not all. I can also mark the edges in the first MST (say with a second value, 1 or 0) and at the end check how many of them could not be replaced; those are the ones belonging to every possible MST. This algorithm is probably O(V^2) and I am not quite sure how to write it in C++. My questions are: could we somehow reduce its complexity? And if I add an edge to the MST, how can I check (and implement in C++) whether the cycle formed contains an edge of smaller weight?
You can do this by adding some extra work to Kruskal's algorithm.
In Kruskal's algorithm, edges with the same weight can be in any order, and the order in which they are actually checked determines which MST you get out of all possible MSTs. For every MST, there is a sort-consistent order of the edges that will produce that tree.
No matter what sort-consistent order is used, the state of the union-find structure will be the same between weights.
If there is only one edge of a specific weight, then it is in every MST if Kruskal's algorithm selects it, because it would be selected with any sort-consistent edge order, and otherwise it is in no MSTs.
If there are multiple edges with the same weight, and Kruskal's algorithm would select at least one of them, then you can pause Kruskal's at that point and make a new (small) graph with just those edges that connect different sets, using the sets that they connect as vertices.
All of those edges are in at least one MST, because Kruskal's might pick them first. Any of those edges that are bridges in the new graph are in every MST, because Kruskal's would select them no matter what. See: https://en.wikipedia.org/wiki/Bridge_(graph_theory)
So, modify Kruskal's algorithm as follows:
When Kruskal's would select an edge, before doing the union, gather and process ALL the non-redundant edges with the same weight.
Make a small graph with those edges, using the find() sets they connect as vertices.
Use Tarjan's algorithm or equivalent (see the link) to find all the bridges in the graph. Those edges are in ALL MSTs.
The remaining edges in the small graph are in some, but not all, MSTs.
Perform all the unions for edges in the small graph and move on to the next weight.
Since bridge-finding can be done in linear time, and every edge is in at most one small graph, the whole algorithm is still O(|V| + |E| log |E|).
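To make the steps concrete, here is a hedged sketch of this modified Kruskal's run on the example from the question (variable names are my own; the per-weight "small graph" is built on the union-find representatives, and its bridges are found with a low-link DFS that skips the incoming edge by id so parallel edges are handled correctly).

```cpp
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> p;
    DSU(int n) : p(n) { iota(p.begin(), p.end(), 0); }
    int find(int v) { return p[v] == v ? v : p[v] = find(p[v]); }
    void unite(int a, int b) { p[find(a)] = find(b); }
};

// Classify every edge as: in every MST, in some but not all MSTs, or in no MST.
int main() {
    int n = 3;
    vector<array<int,3>> edges = {{0,1,1}, {0,2,1}, {1,2,2}};   // {u, v, weight}; the question's example

    int m = edges.size();
    vector<int> order(m);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int a, int b){ return edges[a][2] < edges[b][2]; });

    enum Kind { IN_NONE, IN_SOME, IN_ALL };
    vector<Kind> kind(m, IN_NONE);
    DSU dsu(n);

    for (int i = 0; i < m; ) {
        int j = i;
        while (j < m && edges[order[j]][2] == edges[order[i]][2]) ++j;   // [i, j) = one weight class

        // Small graph: vertices are current components, edges are the non-redundant edges of this weight.
        map<int,int> id;                      // component root -> small-graph vertex
        vector<vector<pair<int,int>>> adj;    // adj[v] = {neighbour, index of the original edge}
        for (int k = i; k < j; ++k) {
            int e = order[k];
            int a = dsu.find(edges[e][0]), b = dsu.find(edges[e][1]);
            if (a == b) continue;             // endpoints already connected: in no MST
            for (int r : {a, b}) if (!id.count(r)) { id[r] = adj.size(); adj.emplace_back(); }
            adj[id[a]].push_back({id[b], e});
            adj[id[b]].push_back({id[a], e});
            kind[e] = IN_SOME;                // provisional: in at least one MST
        }

        // Bridge-finding DFS; a bridge of the small graph is in every MST.
        int V = adj.size(), timer = 0;
        vector<int> tin(V, -1), low(V, 0);
        function<void(int,int)> dfs = [&](int v, int parentEdge) {
            tin[v] = low[v] = timer++;
            for (auto [to, e] : adj[v]) {
                if (e == parentEdge) continue;
                if (tin[to] == -1) {
                    dfs(to, e);
                    low[v] = min(low[v], low[to]);
                    if (low[to] > tin[v]) kind[e] = IN_ALL;   // bridge
                } else low[v] = min(low[v], tin[to]);
            }
        };
        for (int v = 0; v < V; ++v) if (tin[v] == -1) dfs(v, -1);

        for (int k = i; k < j; ++k)           // only now perform the unions for this weight
            dsu.unite(edges[order[k]][0], edges[order[k]][1]);
        i = j;
    }

    int all = 0, some = 0, none = 0;
    for (Kind k : kind) { if (k == IN_ALL) ++all; else if (k == IN_SOME) ++some; else ++none; }
    cout << "in every MST: " << all << ", in some but not all: " << some << ", in none: " << none << "\n";
}
```

For the example input it prints "in every MST: 2, in some but not all: 0, in none: 1", matching the unique MST made of the two weight-1 edges.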

DFS count isolated nodes for articulation points

I have a graph and a starting node. For every node in the graph, I want to find how many nodes become isolated (unreachable from the starting node) when that node is removed, using DFS.
For example, if I start on a fixed node 1 and remove node 2, how many isolated nodes will I have? And if I remove node 3?
I know I can just run a DFS for every node (removing a different node each time), but then I have to traverse the graph once per node; I want to solve it with just one run.
I have been told it can be done in O(|V|·|A|), where |V| is the number of vertices and |A| the number of edges.
I've been playing with preorder and postorder numbers (prenum/postnum), but with no success.
Let N be the number of vertices and M be the number of edges. If you just want a O(NM) solution as you stated, you don't need to go any further than running a DFS for each vertex.
The complexity of each DFS is O(N+M), so the total complexity will be O(N(N+M)) = O(N²+NM). Usually we have more edges than vertices, so NM grows much faster than N² and we can say the complexity is O(NM). Keep in mind, though, that if you physically delete the current vertex at each step your implementation will have a much worse complexity, because physically deleting a vertex means removing entries from a lot of adjacency lists, which is costly no matter how you represent the graph. There is an implementation trick to speed up the process: instead of physically deleting the current vertex before each DFS, just mark it as deleted, and when you go through the adjacency lists during the DFS simply ignore the marked vertex.
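A sketch of this brute-force approach with the marking trick, on a small made-up graph; for each removed vertex it counts how many of the other vertices (excluding the start and the removed one) become unreachable from the start.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Brute-force sketch: for every vertex, "delete" it by marking it, run a DFS
// from the starting vertex, and count how many other vertices become unreachable.
int main() {
    int n = 5, start = 0;
    vector<vector<int>> adj(n);
    auto addEdge = [&](int a, int b){ adj[a].push_back(b); adj[b].push_back(a); };
    addEdge(0,1); addEdge(1,2); addEdge(2,3); addEdge(1,4);   // small example graph

    for (int removed = 0; removed < n; ++removed) {
        if (removed == start) continue;                // removing the start itself is a separate convention
        vector<bool> seen(n, false);
        stack<int> st;
        st.push(start); seen[start] = true;
        int reached = 1;
        while (!st.empty()) {
            int v = st.top(); st.pop();
            for (int w : adj[v]) {
                if (w == removed || seen[w]) continue; // the marked (deleted) vertex is simply skipped
                seen[w] = true; ++reached; st.push(w);
            }
        }
        // There are n-2 vertices other than 'start' and 'removed'; the unreachable ones are isolated.
        cout << "remove " << removed << ": " << (n - 2) - (reached - 1) << " isolated vertices\n";
    }
}
```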
However, I feel that you can solve this problem in O(N+M) using Tarjan's algorithm for finding articulation points. This algorithm finds every vertex that, when removed from the graph, splits the graph into more than one connected component (these vertices are called articulation points). It's easy to see that there won't be isolated vertices if you remove a vertex that is not an articulation point. However, if you remove an articulation point, you split the graph into two parts G and G', where G is the connected component of the starting vertex and G' is the rest of the graph. All vertices of G' are isolated because you can't reach them with a DFS from the starting vertex. I think you can find the size of G' for each vertex deletion efficiently, maybe even while running Tarjan's. If I find a solution I will edit this answer later.
EDIT: I managed to solve the problem in O(N+M). I will give some hints so you can find the answer by yourself:
1. Every undirected graph can be decomposed into (not disjoint) sets of biconnected components: each biconnected component is a subset of the vertices of the graph whose members remain connected to each other even if any single vertex of the graph is removed.
2. Tarjan's O(N+M) algorithm for finding bridges and articulation points can be altered to find the biconnected components, i.e. which vertices belong to each biconnected component, or which biconnected components contain each vertex.
3. If you remove any vertex that is not an articulation point, the answer for this vertex is obviously N-1.
4. If you remove an articulation point, every vertex in the same biconnected component as the starting vertex will still be accessible, but you don't know about the other biconnected components. Don't worry, there is a way to find this efficiently.
5. You can compress every graph G into a graph B of its biconnected components. The compression algorithm is simple: every biconnected component becomes a vertex in B, and you link biconnected components that share an articulation point. We can prove that the resulting graph B is a tree. You must use this tree somehow to solve the problem presented in hint 4.
Good luck!
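For reference, a minimal sketch of the primitive these hints build on: a Tarjan-style DFS computing discovery times and low-link values and flagging articulation points. Extending it to biconnected components, their sizes, and the block-cut tree is exactly the exercise the hints describe (the example graph and the names below are mine).

```cpp
#include <bits/stdc++.h>
using namespace std;

// One DFS computes tin (discovery time) and low (lowest discovery time reachable
// via the subtree plus one back edge) and flags articulation points.
int n;
vector<vector<int>> adj;
vector<int> tin, low_;
vector<bool> isArticulation;
int timer_ = 0;

void dfs(int v, int parent) {
    tin[v] = low_[v] = timer_++;
    int children = 0;
    for (int to : adj[v]) {
        if (to == parent) continue;
        if (tin[to] != -1) {
            low_[v] = min(low_[v], tin[to]);                         // back edge
        } else {
            dfs(to, v);
            low_[v] = min(low_[v], low_[to]);
            if (low_[to] >= tin[v] && parent != -1) isArticulation[v] = true;
            ++children;
        }
    }
    if (parent == -1 && children > 1) isArticulation[v] = true;      // the root is a special case
}

int main() {
    n = 5;
    adj.assign(n, {});
    auto addEdge = [&](int a, int b){ adj[a].push_back(b); adj[b].push_back(a); };
    addEdge(0,1); addEdge(1,2); addEdge(2,0); addEdge(1,3); addEdge(3,4);   // triangle 0-1-2 plus a tail 1-3-4

    tin.assign(n, -1); low_.assign(n, 0); isArticulation.assign(n, false);
    for (int v = 0; v < n; ++v) if (tin[v] == -1) dfs(v, -1);
    for (int v = 0; v < n; ++v) if (isArticulation[v]) cout << v << " is an articulation point\n";   // 1 and 3
}
```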

Minimum vertex cover

I am trying to get a vertex cover for an "almost" tree with 50,000 vertices. The graph is generated as a tree with random edges added in, making it "almost" a tree.
I used the approximation method where you marry two vertices, add them to the cover and remove them from the graph, then move on to another set of vertices. After that I tried to reduce the number of vertices by removing the vertices that have all of their neighbors inside the vertex cover.
My question is how would I make the vertex cover even smaller? I'm trying to go as low as I can.
Here's an idea, but I have no idea if it is an improvement in practice:
From https://en.wikipedia.org/wiki/Biconnected_component "Any connected graph decomposes into a tree of biconnected components called the block-cut tree of the graph." Furthermore, you can compute such a decomposition in linear time.
I suggest that when you marry and remove two vertices you do this only for two vertices within the same biconnected component. When you have run out of vertices to merge you will have a set of trees not connected with each other. The vertex cover problem on trees is tractable via dynamic programming: for each node compute the cost of the best answer if that node is added to the cover and if that node is not added to the cover. You can compute the answers for a node given the best answers for its children.
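A sketch of that tree DP on a small made-up tree; dp[v][1] is the best cover size of v's subtree with v in the cover, dp[v][0] with v left out (in which case every child must be in).

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimum vertex cover on a tree (or forest) by dynamic programming.
int main() {
    int n = 6;
    vector<vector<int>> adj(n);
    auto addEdge = [&](int a, int b){ adj[a].push_back(b); adj[b].push_back(a); };
    addEdge(0,1); addEdge(0,2); addEdge(1,3); addEdge(1,4); addEdge(2,5);   // example tree

    vector<array<int,2>> dp(n);            // dp[v][1] = v in cover, dp[v][0] = v not in cover
    vector<bool> visited(n, false);
    function<void(int)> solve = [&](int v) {
        visited[v] = true;
        dp[v] = {0, 1};
        for (int to : adj[v]) {
            if (visited[to]) continue;
            solve(to);
            dp[v][1] += min(dp[to][0], dp[to][1]);   // child is free to be in or out
            dp[v][0] += dp[to][1];                   // child must be in the cover
        }
    };

    int answer = 0;
    for (int v = 0; v < n; ++v)            // handles a forest of disconnected trees too
        if (!visited[v]) { solve(v); answer += min(dp[v][0], dp[v][1]); }
    cout << "minimum vertex cover size: " << answer << "\n";   // prints 2 for this tree (vertices 1 and 2)
}
```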
Another way - for all I know better - would be to compute the minimum spanning tree of the graph and to use dynamic programming to compute the best vertex cover for that tree, neglecting the links outside the tree, remove the covered links from the graph, and then continue by marrying vertices as before.
I think I prefer the minimum spanning tree one. In producing the minimum spanning tree you are deleting only a small number of links. A tree with N nodes has N-1 links, so even if you don't get back the original tree you get back one with just as many links. A vertex cover for the full graph is also a vertex cover for the minimum spanning tree, so if the correct answer for the full graph has V vertices there is an answer for the minimum spanning tree with at most V vertices. If there were k random edges added to the tree, there are k edges (not necessarily the same ones) that need to be added to turn the minimum spanning tree back into the full graph. You can certainly make sure these new edges are covered with at most k extra vertices. So if the optimum answer has V vertices you will obtain an answer with at most V+k vertices.
Here's an attempt at an exact answer which is tractable when only a small number of links are added, or when they don't change the inter-node distances very much.
Find a minimum spanning tree, and divide edges into "tree edges" and "added edges", where the tree edges form a minimum spanning tree, and the added edges were not chosen for this. They may not be the edges actually added during construction but that doesn't matter. All trees on N nodes have N-1 edges so we have the same number of added edges as were used during creation, even if not the same edges.
Now pretend you can peek at the answer in the back of the book just enough to see, for one vertex from each added edge, whether that vertex was part of the best vertex cover. If it was, you can remove that vertex and its links from the problem. If not, the other vertex must be so you can remove it and its links from the problem.
You now have to find a minimum vertex cover for a tree or a number of disconnected trees, and we know how to do this - see my other answer for a bit more handwaving.
If you can't peek at the back of the book for an answer, and there are k added edges, try all 2^k possible answers that might have been in the back of the book and find the best. If you are lucky then added link A is in a different subtree from added link B. In that case you can confine the two calculations needed for the two possibilities for added link A (or B) to the dynamic programming calculations for the relevant subtree so you have only doubled the work instead of quadrupled it. In general, if your k added edges are in k different subtrees that don't interfere with each other, the cost is multiplied by 2 instead of 2^k.
Minimum vertex cover is an NP-complete problem, which means that you cannot expect to solve it exactly in a reasonable time even for something like 100 vertices (not to mention 50k).
For a tree there is a polynomial time greedy algorithm which is based on DFS, but the fact that you have "random edges added" screws everything up and makes this algorithm useless.
Wikipedia has an article on approximation algorithms for it; it describes one that reaches factor 2 and states that no better constant-factor algorithm is known, which makes it quite unlikely that you will find one.

Polygons from a list of edges

Given N points in a map of edges Map<Point, List<Edge>>, is it possible to get the polygons formed by these edges in O(N log N)?
What I know is that you have to walk all the vertices and, for each one, get the edges that have that vertex as a starting point. These are edges of a Voronoi diagram, and each vertex has at most 3 edges containing it. So, in the map, the key is a vertex, and the value is the list of edges for which that vertex is the start node.
For example:
Points: a,b,c,d,e,f,g
Edges: [a,b], [a,c], [a,d], [b,c], [d,e], [e,g], [g,f]
My idea is to iterate the map counterclockwise until I get back to the initial vertex. That gives a polygon; I put it in a list of polygons and keep looking for others. The problem is that I do not want to exceed O(N log N) complexity.
Thanks!
You can loop through the edges and compute the distance from the midpoint of each edge to all sites. Then sort the distances in ascending order; for inner Voronoi polygons pick the first and the second, for outer polygons pick the first. Basically, an edge separates/divides 2 polygons.
It's something like O(m log n).
If I did find a polynomial solution to this problem I would not post it here, because I am fairly certain this is at least NP-hard. I think your best bet is to do a DFS. You might find this link useful: Finding all cycles in undirected graphs.
You might be able to use the solution below if you can formulate your graph as a directed graph. There are 2^E possible directed graphs (because each edge can be oriented in 2 directions). You could pick a random directed graph and use the solution below to find all of the cycles in this graph. You could do this multiple times for different random directed graphs, keeping track of all the cycles, until you've reached a satisfactory error bound.
You can efficiently create a directed graph with a little bit of state (maybe store a + or - with an edge to note the direction?), and once you have done this in O(n) the first time you can randomly flip x << E directions to get a new graph in what is essentially constant time.
Since you can create subsequent directed graphs in constant time you need to choose the number of times to run the cycle finding algorithm to have it still be polynomial and efficient.
UPDATE - The below only works for directed graphs
Off the top of my head it seems like a better idea to think of this as a graph problem. Your map of vertices to edges is a graph representation. Your problem reduces to finding all of the cycles in the graph, because each cycle will be a polygon. I think Tarjan's strongly connected components algorithm will be of use here, as it can do this in O(|V|+|E|).
You can find more information on the algorithm here https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

Completely disconnecting a bipartite graph

I have a disconnected bipartite undirected graph. I want to completely disconnect the graph. The only operation I can perform is to remove a node. Removing a node automatically deletes its edges. The task is to minimize the number of nodes to be removed. Each node in the graph has at most 4 edges.
By completely disconnecting a graph, I mean that no two nodes should be connected through a link. Basically an empty edge set.
I think you cannot prove your algorithm is optimal because, in fact, it is not optimal.
To completely disconnect your graph while minimizing the number of nodes removed, you have to remove all the nodes belonging to a minimum vertex cover of your graph. Finding a minimum vertex cover is NP-hard in general, but for bipartite graphs there is a polynomial-time solution.
Find a maximum matching in the graph (for example with the Hopcroft–Karp algorithm). Then use König's theorem to get the minimum vertex cover:
Consider a bipartite graph where the vertices are partitioned into left (L) and right (R) sets. Suppose there is a maximum matching which partitions the edges into those used in the matching (E_m) and those not (E_0). Let T consist of all unmatched vertices from L, as well as all vertices reachable from those by going left-to-right along edges from E_0 and right-to-left along edges from E_m. This essentially means that for each unmatched vertex in L, we add into T all vertices that occur in a path alternating between edges from E_0 and E_m.
Then (L \ T) ∪ (R ∩ T) is a minimum vertex cover.
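Here is a hedged sketch of that recipe, with the simpler Kuhn augmenting-path matching standing in for Hopcroft–Karp (this only changes the matching step's running time, not the construction); the example graph and the names adjL, matchL, matchR are mine.

```cpp
#include <bits/stdc++.h>
using namespace std;

// adjL[l] lists the right-side neighbours of left vertex l.  After a maximum
// matching is found, König's construction marks T = vertices reachable from
// unmatched left vertices along alternating paths; the minimum vertex cover
// (the nodes to remove) is (L \ T) together with (R ∩ T).
int main() {
    int nL = 3, nR = 3;
    vector<vector<int>> adjL = {{0}, {0, 1}, {1}};   // edges: L0-R0, L1-R0, L1-R1, L2-R1

    vector<int> matchL(nL, -1), matchR(nR, -1);
    vector<bool> used;
    function<bool(int)> tryKuhn = [&](int l) -> bool {      // Kuhn's augmenting-path search
        for (int r : adjL[l]) {
            if (used[r]) continue;
            used[r] = true;
            if (matchR[r] == -1 || tryKuhn(matchR[r])) {
                matchL[l] = r; matchR[r] = l;
                return true;
            }
        }
        return false;
    };
    for (int l = 0; l < nL; ++l) { used.assign(nR, false); tryKuhn(l); }

    // König: alternate unmatched edges left->right and matched edges right->left.
    vector<bool> inTL(nL, false), inTR(nR, false);
    function<void(int)> mark = [&](int l) {
        inTL[l] = true;
        for (int r : adjL[l]) {
            if (inTR[r] || r == matchL[l]) continue;
            inTR[r] = true;
            if (matchR[r] != -1 && !inTL[matchR[r]]) mark(matchR[r]);
        }
    };
    for (int l = 0; l < nL; ++l) if (matchL[l] == -1) mark(l);

    cout << "remove left vertices: ";
    for (int l = 0; l < nL; ++l) if (!inTL[l]) cout << l << ' ';   // L \ T
    cout << "\nremove right vertices: ";
    for (int r = 0; r < nR; ++r) if (inTR[r]) cout << r << ' ';    // R ∩ T
    cout << "\n";
}
```

For this example the maximum matching has size 2, T ends up containing every left vertex, and the printed cover is the two right vertices {0, 1}, which indeed touch every edge.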
Here's a counter-example to your suggested algorithm.
The best solution is to remove both nodes A and B, even though they are different colors.
Since all the edges go from one set to the other, find these two sets using, say, BFS and 2-coloring. Then remove the nodes in the smaller set.
Since there are no edges among the remaining nodes, they are disconnected as well.
[As a pre-processing step you can leave out nodes with 0 edges first.]
I have thought of an algorithm for it but am not able to prove whether it's optimal.
My algorithm: on each disconnected subgraph, I run a BFS and 2-color it accordingly. Then I count the number of nodes of each color, take the minimum of the two and store it. I repeat the procedure for each subgraph and add the stored values up to get the required minimum. Help me prove the algorithm if it's correct.
EDIT: The above algorithm is not optimal. The accepted answer has been verified to be correct.
