Minimum vertex cover - algorithm

I am trying to get a vertex cover for an "almost" tree with 50,000 vertices. The graph is generated as a tree with random edges added in making it "almost" a tree.
I used the approximation method where you marry two vertices, add them to the cover and remove them from the graph, then move on to another set of vertices. After that I tried to reduce the number of vertices by removing the vertices that have all of their neighbors inside the vertex cover.
My question is how would I make the vertex cover even smaller? I'm trying to go as low as I can.
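For reference, the approach described above can be sketched roughly as follows. This is only an illustration under assumptions not stated in the question: vertices are labelled 0..n-1, the graph has no self-loops, and the function name `approx_vertex_cover` is made up here.

```python
def approx_vertex_cover(n, edges):
    """Matching-based 2-approximation followed by a pruning pass."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cover = set()
    for u, v in edges:
        # "Marry" u and v only if this edge is still uncovered.
        if u not in cover and v not in cover:
            cover.update((u, v))

    # Pruning: drop a vertex if every neighbour is currently in the cover.
    # Low-degree vertices are tried first (an arbitrary heuristic choice).
    for v in sorted(cover, key=lambda x: len(adj[x])):
        if all(w in cover for w in adj[v]):
            cover.remove(v)
    return cover
```

The pruning pass checks each vertex against the current cover, so two adjacent vertices can never both be dropped.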

Here's an idea, but I have no idea if it is an improvement in practice:
From https://en.wikipedia.org/wiki/Biconnected_component "Any connected graph decomposes into a tree of biconnected components called the block-cut tree of the graph." Furthermore, you can compute such a decomposition in linear time.
I suggest that when you marry and remove two vertices you do this only for two vertices within the same biconnected component. When you have run out of vertices to merge you will have a set of trees not connected with each other. The vertex cover problem on trees is tractable via dynamic programming: for each node compute the cost of the best answer if that node is added to the cover and if that node is not added to the cover. You can compute the answers for a node given the best answers for its children.
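The dynamic programme on trees mentioned above might look like the sketch below. It assumes the input really is a forest on vertices 0..n-1 and returns only the size of a minimum cover; the name `forest_min_vertex_cover` is hypothetical. It is written iteratively so that a 50,000-vertex tree does not hit Python's recursion limit.

```python
def forest_min_vertex_cover(n, edges):
    """Size of a minimum vertex cover of a forest (assumes no cycles)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * n
    seen = [False] * n
    take = [1] * n  # best cost for the subtree of v if v IS in the cover
    skip = [0] * n  # best cost for the subtree of v if v is NOT in the cover
    total = 0
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        order = [root]
        for v in order:  # BFS: the list grows while we iterate over it
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    parent[w] = v
                    order.append(w)
        # Children appear after their parent, so process in reverse.
        for v in reversed(order):
            p = parent[v]
            if p != -1:
                take[p] += min(take[v], skip[v])
                skip[p] += take[v]  # p is out, so its child v must be in
        total += min(take[root], skip[root])
    return total
```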
Another way - for all I know better - would be to compute the minimum spanning tree of the graph and to use dynamic programming to compute the best vertex cover for that tree, neglecting the links outside the tree, remove the covered links from the graph, and then continue by marrying vertices as before.
I think I prefer the minimum spanning tree one. In producing the minimum spanning tree you are deleting a small number of links. A tree with N nodes has N-1 links, so even if you don't get back the original tree you get back one with as many links as it. A vertex cover for the full graph is also a vertex cover for the minimum spanning tree, so if the correct answer for the full graph has V vertices there is an answer for the minimum spanning tree with at most V vertices. If there were k random edges added to the tree, there are k edges (not necessarily the same ones) that need to be added to turn the minimum spanning tree into the full graph. You can certainly make sure these new edges are covered with at most k vertices. So if the optimum answer has V vertices you will obtain an answer with at most V+k vertices.

Here's an attempt at an exact answer which is tractable when only a small number of links are added, or when they don't change the inter-node distances very much.
Find a minimum spanning tree, and divide edges into "tree edges" and "added edges", where the tree edges form a minimum spanning tree, and the added edges were not chosen for this. They may not be the edges actually added during construction but that doesn't matter. All trees on N nodes have N-1 edges so we have the same number of added edges as were used during creation, even if not the same edges.
Now pretend you can peek at the answer in the back of the book just enough to see, for one vertex from each added edge, whether that vertex was part of the best vertex cover. If it was, you can remove that vertex and its links from the problem. If not, the other vertex must be so you can remove it and its links from the problem.
You now have to find a minimum vertex cover for a tree or a number of disconnected trees, and we know how to do this - see my other answer for a bit more handwaving.
If you can't peek at the back of the book for an answer, and there are k added edges, try all 2^k possible answers that might have been in the back of the book and find the best. If you are lucky then added link A is in a different subtree from added link B. In that case you can confine the two calculations needed for the two possibilities for added link A (or B) to the dynamic programming calculations for the relevant subtree so you have only doubled the work instead of quadrupled it. In general, if your k added edges are in k different subtrees that don't interfere with each other, the cost is multiplied by 2 instead of 2^k.
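A rough sketch of the 2^k enumeration, without the subtree optimisation, is below. Instead of deleting vertices it pins them as "in" or "out" and lets the tree dynamic programme respect the pins. All names are hypothetical, vertices are assumed to be labelled 0..n-1, and the function returns only the size of the best cover.

```python
from itertools import product

INF = float("inf")

def exact_cover_size(n, edges):
    # Union-find splits edges into spanning-forest edges and "added" edges.
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x
    adj = [[] for _ in range(n)]
    added = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            added.append((u, v))
        else:
            root[ru] = rv
            adj[u].append(v)
            adj[v].append(u)

    # BFS order of the forest: every vertex appears after its parent.
    order, parent, seen = [], [-1] * n, [False] * n
    for r in range(n):
        if seen[r]:
            continue
        seen[r] = True
        order.append(r)
        i = len(order) - 1
        while i < len(order):
            v = order[i]
            i += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    parent[w] = v
                    order.append(w)

    best = INF
    # One guess per added edge (u, v): "u is in the cover", or
    # "u is not in the cover, so v must be".
    for bits in product((True, False), repeat=len(added)):
        state, ok = {}, True  # vertex -> True (pinned in) / False (pinned out)
        for (u, v), u_in in zip(added, bits):
            for x, s in ([(u, True)] if u_in else [(u, False), (v, True)]):
                if state.setdefault(x, s) != s:
                    ok = False  # contradictory guesses
        if not ok:
            continue
        take = [INF if state.get(v) is False else 1 for v in range(n)]
        skip = [INF if state.get(v) is True else 0 for v in range(n)]
        total = 0
        for v in reversed(order):
            p = parent[v]
            if p == -1:
                total += min(take[v], skip[v])
            else:
                take[p] += min(take[v], skip[v])
                skip[p] += take[v]
        best = min(best, total)
    return best
```

The running time is proportional to 2^k times the size of the graph, so this is only practical when the number of added edges k is small.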

Minimum vertex cover is an NP-hard problem (its decision version is NP-complete), which means that no polynomial-time exact algorithm is known, and in the worst case an exact solution can be out of reach even for something like 100 vertices (not to mention 50k).
For a tree there is a polynomial time greedy algorithm which is based on DFS, but the fact that you have "random edges added" screws everything up and makes this algorithm useless.
Wikipedia has an article about an approximation algorithm; it claims that the algorithm reaches a factor of 2 and that no better constant-factor algorithm is known, which makes it quite unlikely that you will find one.

Related

Algorithm for Finding Graph Connectivity

I'm tackling an interesting question in programming. It is this: we keep adding undirected edges to a graph, until the graph (or subgraph) is connected (i.e. we can use some path to get from each vertex to any other vertex in that subgraph). We stop as soon as the graph is connected.
For example, say we have vertices 1, 2, 3 and 4, and we want the subgraph 1,2,3 to be connected.
Let's say the edges arrive as (3,4), then (2,3), then (1,4), then (1,3). We only need to add the first 3 edges for the subgraph to be connected, then we stop (edge (1,3) isn't needed).
Obviously I can run a BFS every time an edge is added to see if we can reach the required vertices, but if there are say m edges then we would potentially have to run BFS m times which seems too slow. Any better options? Thanks.
You should research the marvelous "Disjoint-set data structure" and the corresponding union-find algorithm. It can seem magical, but its costs are tiny: with union by rank (or size) and path compression, each operation takes O(α(n)) amortized time and the structure uses O(n) space, where α is the inverse Ackermann function.
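As a minimal sketch of how this could apply to the question: the function below adds edges in order and reports how many were needed before all the target vertices share one component. The function name, the 1-based labelling, and the per-component target counter are assumptions made for this illustration.

```python
def edges_until_connected(n, edges, targets):
    """Number of edges consumed before all targets are connected, or None."""
    parent = list(range(n + 1))  # vertices are labelled 1..n
    size = [1] * (n + 1)
    targets = set(targets)
    # How many target vertices each component currently contains.
    count = [1 if v in targets else 0 for v in range(n + 1)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    if len(targets) <= 1:
        return 0
    for i, (u, v) in enumerate(edges, start=1):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # union by size
        size[ru] += size[rv]
        count[ru] += count[rv]
        if count[ru] == len(targets):
            return i
    return None
```

With the example from the question, `edges_until_connected(4, [(3, 4), (2, 3), (1, 4), (1, 3)], {1, 2, 3})` stops after the third edge.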
You can run BFS just once to find the connected components. Then, each time you add an edge between vertices of two different components, you can merge the two components by reference. The complexity of this algorithm is O(|V| + |E|).
Note that the implementation needs some form of indirection (for example, a shared reference per component), so that the component number of many vertices can be updated at once.
I would normally do this using a disjoint set structure, as Doug suggests. It would be like Kruskal's algorithm for finding the minimum spanning tree, except you process edges in the given order.
If you don't need a spanning tree as output, though, then you can do this with an incremental BFS or DFS:
Pick any vertex, and find the vertices connected to it with BFS or DFS. Color these vertices red. If you start with no edges, of course, then there will be only one red vertex at this stage.
As you add edges, don't do anything else until you add an edge that connects a red vertex to a non-red vertex. Then run BFS or DFS, excluding the new edge, to find all the new vertices that will connect to the red set. Color them all red.
Stop when all vertices are red.
This is a little simpler in practice than using disjoint set, and takes O(|V|+|E|) time, since each vertex will be traversed by exactly one BFS/DFS search.
It does the work in chunks, though, so if you need each edge test to be fast individually, then disjoint set is better.
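The red-colouring idea could be sketched like this, for the case where the whole graph must become connected. Vertex labels 0..n-1, the `start` parameter and the function name are assumptions of this sketch. The new edge is stored before the search, which is harmless because its red endpoint is already coloured.

```python
from collections import deque

def edges_until_all_red(n, edges, start=0):
    """Number of edges consumed before every vertex is red, or None."""
    adj = [[] for _ in range(n)]
    red = [False] * n
    red[start] = True
    red_count = 1
    if n == 1:
        return 0
    for i, (u, v) in enumerate(edges, start=1):
        adj[u].append(v)
        adj[v].append(u)
        if red[u] == red[v]:
            continue  # both red or both non-red: nothing to do yet
        # Exactly one endpoint is red; flood from the other one.
        first = v if red[u] else u
        red[first] = True
        red_count += 1
        queue = deque([first])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if not red[y]:
                    red[y] = True
                    red_count += 1
                    queue.append(y)
        if red_count == n:
            return i
    return None
```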

Will a standard Kruskal-like approach for MST work if some edges are fixed?

The problem: you need to find the minimum spanning tree of a graph (i.e. a set S of edges in said graph such that the edges in S together with the respective vertices form a tree; additionally, from all such sets, the sum of the cost of all edges in S has to be minimal). But there's a catch. You are given an initial set of fixed edges K such that K must be included in S.
In other words, find some MST of a graph with a starting set of fixed edges included.
My approach: standard Kruskal's algorithm but before anything else join all vertices as pointed by the set of fixed edges. That is, if K = {1,2}, {4,5} I apply Kruskal's algorithm but instead of having each node in its own individual set initially, instead nodes 1 and 2 are in the same set and nodes 4 and 5 are in the same set.
The question: does this work? Is there a proof that this always yields the correct result? If not, could anyone provide a counter-example?
P.S. The problem only asks for ONE MST. I'm not interested in all of them.
Yes, it will work as long as your initial set of edges doesn't form a cycle.
Keep in mind that the resulting tree might not be minimal in weight since the edges you fixed might not be part of any MST in the graph. But you will get the lightest spanning tree which satisfies the constraint that those fixed edges are part of the tree.
How to implement it:
To implement this, you can simply change the edge weights of the edges you need to fix. Just pick the lowest edge weight appearing in your graph, say min_w, subtract 1 from it and assign this new weight, i.e. (min_w-1), to the edges you need to fix. Then run Kruskal on this graph.
Why it works:
Clearly Kruskal will pick all the edges you need (since these are the lightest now) before picking any other edge in the graph. When Kruskal finishes the resulting set of edges is an MST in G' (the graph where you changed some weights). Note that since you only changed the values of your fixed set of edges, the algorithm would never have made a different choice on the other edges (the ones which aren't part of your fixed set). If you think of the edges Kruskal considers, as a sorted list of edges, then changing the values of the edges you need to fix moves these edges to the front of the list, but it doesn't change the order of the other edges in the list with respect to each other.
Note: As you may notice, giving the lightest weight to your edges is basically the same thing as you suggest. But I think it is a bit easier to reason about why it works. Go with whatever you prefer.
I wouldn't recommend Prim, since this algorithm expands the spanning tree gradually from the current connected component (in the beginning one usually starts with a single node). The case where you join larger components (because your fixed edges might not all be in a single component) would need to be handled separately. It might not be hard, but you would have to take care of it. OTOH with Kruskal you don't have to adapt anything, but simply manipulate your graph a bit before running the regular algorithm.
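For comparison, the pre-merging variant from the question can be sketched as below: the fixed edges are united first, then ordinary Kruskal runs over all edges. The edge format `(weight, u, v)`, the 0..n-1 labelling and the function name are assumptions of this sketch.

```python
def constrained_mst(n, edges, fixed):
    """Lightest spanning tree containing every edge in `fixed`.

    Returns (total_weight, tree_edges), or None if the graph is not
    connected. Raises ValueError if the fixed edges contain a cycle.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    tree = []
    for w, u, v in fixed:
        if not union(u, v):
            raise ValueError("fixed edges contain a cycle")
        tree.append((w, u, v))
    # Regular Kruskal; fixed edges reappear here but are skipped because
    # their endpoints are already in the same set.
    for w, u, v in sorted(edges):
        if union(u, v):
            tree.append((w, u, v))
    if len(tree) != n - 1:
        return None
    return sum(w for w, _, _ in tree), tree
```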
If I understood the question properly, Prim's algorithm would be more suitable for this, as it is possible to initialize the connected components to be exactly the edges which are required to occur in the resulting spanning tree (plus the remaining isolated nodes). The desired edges are not permitted to contain a cycle, otherwise there is no spanning tree including them.
That being said, apparently Kruskal's algorithm can also be used, as it is explicitly stated that it can be used to find an edge that connects two forests in a cost-minimal way.
Roughly speaking, as the forests of a given graph form a matroid, the greedy approach yields the desired result (namely a weight-minimal tree) regardless of the independent set you start with.

Tricky algorithm for finding alternative route in graph with few added edges

Okay, so I found this a bit tricky.
Basically, you have a directed graph (let's call it the base graph), that has some leaves and a node with 0 indegree that is called root. It may contain cycles.
From that base graph, a tree has been made, that contains the root and all leaves, and some connection between them. The nodes and edges that are not needed to connect the root to the leaves are left out.
Now imagine one or more edges in the tree "break", and can no longer be used. The problem now is to
a) If possible, find an alternative route(s) to the disconnected node(s), introducing as few previously unused edges from the base graph as possible.
b) If not possible, select which edges to "repair", repairing as few edges as possible to get all leaves connected again.
This is supposed to represent an electrical grid, and the breaks are power outages.
If just one edge is broken, it is easy enough. But say you have a graph with 100 leaves, 500 edges, and 50 edges break. Now to find which combination of adding previously unused edges from the base graph, and if necessary repairing some edges, to connect all leaves, seems like a very hard problem.
I imagined one could do some sort of brute force, where ALL combinations of unused edges, from using 1 to all of them, are tested. Or if repairs are needed, testing ALL combinations of repairs with all combinations of new edges. When the number of edges gets high, this seems very, very inefficient to me.
My question is, does anyone have any smart ideas for how this could be done more efficiently? I hope I explained it well enough.
This is an NP-hard problem, and I'll explain why. Imagine that you have three layers of nodes: the root node, a layer of intermediate connecting nodes, and then a layer of leaf nodes. Edges go from root to intermediate nodes, and from an intermediate node to some subset of leaf nodes. Suppose you have some choice of intermediate nodes and edges to leaf nodes that gives you a connected tree graph, where each intermediate node has an edge to only one leaf node. Now imagine all edges in the reduced graph are removed. Then to find the minimum number of edges needed to add to repair the graph, this is equivalent to finding the minimum number of remaining intermediate nodes whose edges cover all of the leaf nodes. This is equivalent to the set cover problem for the leaf nodes http://en.wikipedia.org/wiki/Set_cover_problem and is NP-hard. Thus there is almost certainly no fast algorithm for your problem in the worst case (unless P = NP). Maybe if you bound the number of edges that are removed, you can come up with a polynomial time algorithm where the exponent in the polynomial depends (hopefully weakly) on how many edges were removed.
Seems like the start to a good efficient heuristic/solution would be to weight the edges. A couple simple approaches (not the most space efficient) as to how you could weight the edges based on the total number of edges are listed below.
If using any number of undamaged edges is better than using a single alternative edge, and using any number of alternative edges is better than using a single damaged edge, weight them as follows (E is the total number of edges):
Undamaged edge: 1
Alternative edge: E
Damaged edge: E^2
In the case of 100 vertices and 500 edges, alternative edges would be weighted as 500, while damaged edges would be weighted as 250000.
If using any number of undamaged edges is better than using a single alternative edge or a single damaged edge, but alternative and damaged edges are otherwise equally costly, weight them as follows:
Undamaged edge: 1
Alternative/damaged edge: E
In the case of 100 vertices and 500 edges, alternative/damaged edges would be weighted as 500.
You could then try a number of approaches to find either the exact solution or a heuristic result. The main suggestion I have for an algorithm is below.
Find the directed minimum spanning tree. If you use the weighting listed above, then I believe the result is optimal, if I'm understanding things correctly.
However, if you have intermediate nodes (nodes that are neither the root nor a leaf), then this is likely to result in an overestimating heuristic. In that case, you might be able to get around it by running all-pairs shortest paths first and using those path costs as input for the directed minimum spanning tree algorithm, but that's probably a heuristic as well.

Sum of Vertices in Induced Graph - Dynamic Programming

This is a homework question so I'll be glad to get a hint.
I have a graph G, where each vertex v has a weight w(v).
S(G) is the sum of the weights of all the vertices in the graph.
I need to find an algorithm that determines whether there is a set of vertices A such that G[A] (the subgraph of G induced by A) is a tree and S(G[A]) = S(G[V\A]).
I know that I should go over all the vertices, sum their weights, and then try to find a tree that reaches half of that sum, but I'm not sure how exactly. I'm pretty sure it involves dynamic programming.
Thank you very much,
Yaron.
This is not really a dynamic programming problem, it is a search problem, the key being that you are trying to find a tree.
If you think about it, you already know an algorithm or two that will tell you the minimum spanning tree. By the same logic, you can make a maximum spanning tree. For example, if you find the maximum spanning tree and the sum of its weights is less than 50% (or whatever the target value is), then you know the problem is impossible.
So, following this logic, you can go along as though you were making a spanning tree and reject any path that goes over the target amount. This strategy is known as "branch and bound". Let's imagine how we could do this with Kruskal's algorithm:
(1) you will have a set of trees; start with each vertex as a separate "tree"
(2) maintain a queue of edges you have not used yet, sorted from least to greatest
(3) maintain a stack of edges that you have used
(4) look for an edge that (a) connects two different trees, and (b) keeps the combined weight of the two trees less than or equal to the target value (equal means you have found a solution)
(4a) if no such edge exists, then pop a value from the stack (remove the edge and separate the trees) and try the next value in the queue
(4b) if such an edge does exist, then add the edge (combine two of the trees), push it onto the stack and go back to step 4
Obviously there are different ways to do the same process. For example, you could use a variant of Prim's algorithm as well.

How to get the new MST from the old one if one edge in the graph changes its weight?

We know the original graph and the original MST. Now we change one edge's weight in the graph. Besides rerunning Prim's or Kruskal's algorithm, is there any way we can generate the new MST from the old one?
Here's how I would do it:
If the changed edge is in the original MST:
If its weight was reduced, then of course it must be in the new MST.
If its weight was increased, then delete it from the original MST and look for the lowest-weight edge that connects the two subtrees that remain (this could select the original edge again). This can be done efficiently in a Kruskal-like way by building up a disjoint-set data structure to hold the two subtrees, and sorting the remaining edges by weight: select the first one with endpoints in different sets. If you know a way of quickly deleting an edge from a disjoint-set data structure, and you built the original MST using Kruskal's algorithm, then you can avoid recomputing them here.
Otherwise:
If its weight was increased, then of course it will remain outside the MST.
If its weight was reduced, add it to the original MST. This will create a cycle. Scan the cycle, looking for the heaviest edge (this could select the original edge again). Delete this edge. If you will be performing many edge mutations, cycle-finding may be sped up by calculating all-pairs shortest paths using the Floyd-Warshall algorithm. You can then find all edges in the cycle by initially leaving the new edge out, and looking for the shortest path in the MST between its two endpoints (there will be exactly one such path).
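The four cases above can be sketched in a simple linear-time form as follows. This version does not use a disjoint-set structure or Floyd-Warshall; it just walks the tree directly. The data layout (a dict from `(u, v)` with `u < v` to weight, and a set of such keys for the MST) and all names are assumptions made for this illustration.

```python
def _parents(tree_edges, src):
    """DFS over the tree edges from src; maps each reached vertex to its parent."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {src: None}
    stack = [src]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in parent:
                parent[y] = x
                stack.append(y)
    return parent

def update_mst(weights, mst, changed, new_w):
    """Return the new MST edge set after `changed` gets weight `new_w`."""
    old_w = weights[changed]
    weights = dict(weights)
    weights[changed] = new_w
    mst = set(mst)
    if changed in mst:
        if new_w <= old_w:
            return mst  # a tree edge got lighter: nothing changes
        mst.remove(changed)
        side = _parents(mst, changed[0])  # vertices on one side of the cut
        # Lightest edge crossing the cut (may be the changed edge itself).
        best = min((e for e in weights if (e[0] in side) != (e[1] in side)),
                   key=weights.get)
        mst.add(best)
        return mst
    if new_w >= old_w:
        return mst  # a non-tree edge got heavier: nothing changes
    # Walk the unique tree path between the endpoints of the changed edge.
    parent = _parents(mst, changed[0])
    path, x = [], changed[1]
    while parent[x] is not None:
        p = parent[x]
        path.append((min(x, p), max(x, p)))
        x = p
    heaviest = max(path, key=weights.get)
    if weights[heaviest] > new_w:
        mst.remove(heaviest)
        mst.add(changed)
    return mst
```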
Besides the linear-time algorithm proposed by j_random_hacker, you can find a sub-linear algorithm in the book "Handbook of Data Structures and Applications" (Chapter 36) or in these papers: Dynamic graphs, Maintaining Minimum Spanning Trees in Dynamic Graphs.
You can change the problem a little while keeping the result the same.
Take the structure of the original MST and run DFS from each vertex; this gives you the maximum-weight edge on the tree path between each pair of vertices. The complexity of this step is O(N^2).
Instead of changing one edge's weight to w, we can assume we are adding a new edge (u,v) with weight w to the original MST. The added edge creates a loop in the tree, and we must cut one edge on the loop to generate the new MST. Obviously, we only need to compare the added edge with the maximum edge on path(u,v). The complexity of this step is O(1).
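A small sketch of this precomputation is below. It assumes the MST is given as `(u, v, w)` triples on vertices 0..n-1; the names `max_edge_table` and `new_edge_improves` are made up for the example.

```python
def max_edge_table(n, tree_edges):
    """table[s][t] = heaviest edge weight on the tree path from s to t."""
    adj = [[] for _ in range(n)]
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    table = [[float("-inf")] * n for _ in range(n)]
    for s in range(n):
        # One DFS per source vertex: O(N) each, O(N^2) in total.
        stack = [(s, -1)]
        while stack:
            x, p = stack.pop()
            for y, w in adj[x]:
                if y != p:
                    table[s][y] = max(table[s][x], w)
                    stack.append((y, x))
    return table

def new_edge_improves(table, u, v, w):
    """O(1) check: would adding edge (u, v) of weight w give a lighter tree?"""
    return w < table[u][v]
```

The table only answers whether the tree changes; finding which edge to cut still needs the path itself, or a table that stores the edge rather than just its weight.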
