Find Minimum Vertex Connected Sub-graph - algorithm

First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E) where V is about 16 millions and E is about 180 millions.
For a given set S, which is a subset of V (size of S will be around 30), is it possible to find a weakly connected sub-graph G'=(V',E') where S is a subset of V' but try to keep the number of V' and E' as small as possible?
The graph G may change and I hope there's a way to find the sub-graph in real time. (When a process is writing into G, G will be locked, so don't worry about G get changed when your sub-graph calculation is still running.)
My current solution is find the shortest path for each pair of vertex in S and merge those paths to get the sub-graph. The result is OK but the running time is pretty expensive.
Is there a better way to solve this problem?

If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Do a breadth-first-search of the graph, starting with S as the root set.
When you the search discovers a new vertex, remember its predecessor and assign it to the same set as its predecessor.
When you discover an edge that connects two sets, merge the sets and follow the predecessor links to add the connecting path to G'
Another way to think about doing exactly the same thing:
Sort all the edges in E according to their distance from S. You can use BFS discovery order for this
Use Kruskal's algorithm to generate a spanning tree for G, processing the edges in that order (https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
Pick a root in S, and remove any subtrees that don't contain a member of S. When you're done, every leaf will be in S.
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.

Related

How to traverse on only a cycle in a graph?

I'm attempting ch23 in CLRS on MSTs, here's a question:
Given a graph G and a minimum spanning tree T , suppose that we decrease the weight of one of the edges not in T . Give an algorithm for finding the minimum spanning tree in the modified graph.
A solution I found was to add this new changed edge in T, then exactly one simple cycle is created in T, traverse this cycle and delete the max-weight edge in this cycle, voila, the new updated MST is found!
My question is, how do I only traverse nodes on this simple-cycle? Since DFS/BFS traversals might go out of the cycle if I, say, start the traversal in T from one endpoint of this newly added edge in T.
One solution I could think of was to find the biconnected components in T after adding the new edge. Only one BCC will be found, which is this newly formed simple-cycle, then I can put in a special condition in my DFS code saying to only traverse edges/nodes in this BCC, and once a back-edge is found, stop the traversal.
Edit: graph G is connected and undirected btw
Your solution is basically good. To make it more formal you can use Tarjan's bridge-finding algorithm
This algorithm find the cut-edges (aka bridges) in the graph in linear time. Consider E' to be the cut-edges set. It is easy to prove that every edge in E' can not be on circle. So, E / E' are must be the cycle in the graph.
You can use hash-map or array build function to find the difference between your E and the cut-edges set
From here you can run simple for-loop to find the max weight edge which you want to remove.
Hope that help!

Find Two vertices with lowest path weight

I am trying to solve this question but got stuck.
Need some help,Thanks.
Given an undirected Connected graph G with non-negative values at edges.
Let A be a subgroup of V(G), where V(G) is the group of vertices in G.
-Find a pair of vertices (a,b) that belongs to A, such that the weight of the shortest path between them in G is minimal, in O((E+V)*log(v)))
I got the idea of using Dijkstra's algorithm in each node which will give me O(V*((E+V)logv))),which is too much.
So thought about connecting the vertices in A somehow,did'nt find any useful way.
Also tried changing the way Dijkstra's algorithm work,But it get's to hard to prove with no improvment in time complexity.
Note that if the optimal pair is (a, b), then from every node u in the optimal path, a and b are the closest two nodes in A.
I believe we should extend Dijkstra's algorithm in the following manners:
Start with all nodes in A, instead of a single source_node.
For each node, don't just remember the shortest_distance and the previous_node, but also the closest_source_node to remember which node in A gave the shortest distance.
Also, for each node, remember the second_shortest_distance, the second_closest_source_node, and previous_for_second_closest_source_node (shorter name suggestions are welcome). Make sure that second_closest_source_node is never the closest_source_node. Also, think carefully about how you update these variables, the optimal path for a node can become part of the second best path for it's neighbour.
Visit the entire graph, don't just stop at the first node whose closest_source and second_closest_source are found.
Once the entire graph is covered, search for the node whose shortest_distance + second_shortest_distance is smallest.

Minimize set of edges in a directed graph keeping connected components

Here is the full question:
Assume we have a directed graph G = (V,E), we want to find a graph G' = (V,E') that has the following properties:
G' has same connected components as G
G' has same component graph as G
E' is minimized. That is, E' is as small as possible.
Here is what I got:
First, run the strongly connected components algorithm. Now we have the strongly connected components. Now go to each strong connected component and within that SCC make a simple cycle; that is, a cycle where the only nodes that are repeated are the start/finish nodes. This will minimize the edges within each SCC.
Now, we need to minimize the edges between the SCCs. Alas, I can't think of a way of doing this.
My 2 questions are: (1) Does the algorithm prior to the part about minimizing edges between SCCs sound right? (2) How does one go about minimizing the edges between SCCs.
For (2), I know that this is equivalent to minimizing the number of edges in a DAG. (Think of the SCCs as the vertices). But this doesn't seem to help me.
The algorithm seems right, as long as you allow for closed walks (i.e. repeating vertices.) Proper cycles might not exist (e.g. in an "8" shaped component) and finding them is NP-hard.
It seems that it is sufficient to group the inter-component edges by ordered pairs of components they connect and leave only one edge in each group.
Regarding the step 2,minimize the edges between the SCCs, you could randomly select a vertex, and run DFS, only keeping the longest path for each pair of (root, end), while removing other paths. Store all the vertices searched in a list L.
Choose another vertex, if it exists in L, skip to the next vertex; if not, repeat the procedure above.

Prim and Kruskal's algorithms complexity

Given an undirected connected graph with weights. w:E->{1,2,3,4,5,6,7} - meaning there is only 7 weights possible.
I need to find a spanning tree using Prim's algorithm in O(n+m) and Kruskal's algorithm in O( m*a(m,n)).
I have no idea how to do this and really need some guidance about how the weights can help me in here.
You can sort edges weights faster.
In Kruskal algorithm you don't need O(M lg M) sort, you just can use count sort (or any other O(M) algorithm). So the final complexity is then O(M) for sorting and O(Ma(m)) for union-find phase. In total it is O(Ma(m)).
For the case of Prim algorithm. You don't need to use heap, you need 7 lists/queues/arrays/anything (with constant time insert and retrieval), one for each weight. And then when you are looking for cheapest outgoing edge you check is one of these lists is nonempty (from the cheapest one) and use that edge. Since 7 is a constant, whole algorithms runs in O(M) time.
As I understand, it is not popular to answer homework assignments, but this could hopefully be usefull for other people than just you ;)
Prim:
Prim is an algorithm for finding a minimum spanning tree (MST), just as Kruskal is.
An easy way to visualize the algorithm, is to draw the graph out on a piece of paper.
Then you create a moveable line (cut) over all the nodes you have selected. In the example below, the set A will be the nodes inside the cut. Then you chose the smallest edge running through the cut, i.e. from a node inside of the line to a node on the outside. Always chose the edge with the lowest weight. After adding the new node, you move the cut, so it contains the newly added node. Then you repeat untill all nodes are within the cut.
A short summary of the algorithm is:
Create a set, A, which will contain the chosen verticies. It will initially contain a random starting node, chosen by you.
Create another set, B. This will initially be empty and used to mark all chosen edges.
Choose an edge E (u, v), that is, an edge from node u to node v. The edge E must be the edge with the smallest weight, which has node u within the set A and v is not inside A. (If there are several edges with equal weight, any can be chosen at random)
Add the edge (u, v) to the set B and v to the set A.
Repeat step 3 and 4 until A = V, where V is the set of all verticies.
The set A and B now describe you spanning tree! The MST will contain the nodes within A and B will describe how they connect.
Kruskal:
Kruskal is similar to Prim, except you have no cut. So you always chose the smallest edge.
Create a set A, which initially is empty. It will be used to store chosen edges.
Chose the edge E with minimum weight from the set E, which is not already in A. (u,v) = (v,u), so you can only traverse the edge one direction.
Add E to A.
Repeat 2 and 3 untill A and E are equal, that is, untill you have chosen all edges.
I am unsure about the exact performance on these algorithms, but I assume Kruskal is O(E log E) and the performance of Prim is based on which data structure you use to store the edges. If you use a binary heap, searching for the smallest edge is faster than if you use an adjacency matrix for storing the minimum edge.
Hope this helps!

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path from any node in T and any node in N. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V'.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed enviroments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority for nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(1).
We create pair (w(u), id(u)) for each node u.
each node u builds a multiset relay node. That is, a set M(u) of 1-hop neigbhors such that each 2-hop neighbor is a neighbor to at least one node in M(u). [the minimum M(u), the better is the solution].
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors.
or u is selected in the M(v), where v is a 1-hop neighbor of u with the smallest (w(u),id(u)).
-- the trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best I could get from O(n^3) is to O(n^2.37) by matrix multiplication.
-- I really wish to know what is the approximation ration of this last solution.
I like this reference for heuristics of steiner tree:
The Steiner tree problem, Hwang Frank ; Richards Dana 1955- Winter Pawel 1952
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.

Resources