Maximum independent set on non-tree representations - algorithm

When trying to find a maximum (largest-size) independent set in a graph, an exact solution can be computed greedily when the graph is a forest/tree. The general pseudocode for this is below.
S is an empty set
While forest F has at least 1 edge
    Let v be a leaf node and let (u,v) be the lone edge incident to v
    Add v to S
    Delete both u and v from the forest, including all incident edges
Return S + nodes remaining in forest F
Suppose the graph is not a forest/tree (it does not satisfy the definition) but has some other structure. There are obvious reasons why this algorithm would then not be guaranteed to work. Is it nevertheless possible that it still produces an independent set in that graph, just not a maximum (largest-size) one? If so, what would such a graph look like?
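The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `forest_mis` and the adjacency-dict input format are assumptions made for illustration. On a graph with a cycle the loop can run out of leaves while edges remain, so this sketch raises an error in that case rather than returning the leftover nodes.

```python
def forest_mis(adj):
    """Greedy leaf-removal independent set on a forest.

    adj: dict mapping each node to a set of its neighbours (undirected).
    """
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    chosen = set()
    leaves = [v for v in g if len(g[v]) == 1]
    while leaves:
        v = leaves.pop()
        if v not in g or len(g[v]) != 1:
            continue  # stale entry: v was deleted or is no longer a leaf
        (u,) = g[v]  # the lone neighbour of the leaf
        chosen.add(v)
        del g[v]
        # Delete u together with all of its incident edges.
        for x in g.pop(u):
            if x == v:
                continue
            g[x].discard(u)
            if len(g[x]) == 1:
                leaves.append(x)
    if any(g.values()):
        # Edges remain but no leaf exists, so the input contained a cycle.
        raise ValueError("input is not a forest")
    return chosen | set(g)  # chosen leaves plus the isolated nodes left over


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(forest_mis(path))
```

For the 4-node path this selects two non-adjacent nodes, which is the maximum for that graph.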

Related

Find Minimum Vertex Connected Sub-graph

First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E) where |V| is about 16 million and |E| is about 180 million.
For a given set S, which is a subset of V (the size of S will be around 30), is it possible to find a weakly connected sub-graph G'=(V',E') where S is a subset of V', while keeping |V'| and |E'| as small as possible?
The graph G may change, and I hope there's a way to find the sub-graph in real time. (When a process is writing into G, G will be locked, so don't worry about G being changed while the sub-graph calculation is still running.)
My current solution is to find the shortest path for each pair of vertices in S and merge those paths to get the sub-graph. The result is OK, but the running time is quite expensive.
Is there a better way to solve this problem?
If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Do a breadth-first-search of the graph, starting with S as the root set.
When the search discovers a new vertex, remember its predecessor and assign it to the same set as its predecessor.
When you discover an edge that connects two sets, merge the sets and follow the predecessor links to add the connecting path to G'
Another way to think about doing exactly the same thing:
Sort all the edges in E according to their distance from S. You can use BFS discovery order for this
Use Kruskal's algorithm to generate a spanning tree for G, processing the edges in that order (https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)
Pick a root in S, and remove any subtrees that don't contain a member of S. When you're done, every leaf will be in S.
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.
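The first variant above (a multi-source BFS combined with a disjoint-set structure) could look roughly like this. It is a sketch under assumptions: the graph is given as an undirected adjacency dict (edge directions ignored, since only weak connectivity matters), `terminals` is a list of distinct vertices playing the role of S, and the name `connect_terminals` is made up for illustration.

```python
from collections import deque


def connect_terminals(adj, terminals):
    """Return (vertices, edges) of a connected subgraph containing all terminals."""
    parent = {t: t for t in terminals}  # disjoint-set forest over the terminals

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {t: t for t in terminals}  # terminal whose search region reached a vertex
    pred = {t: None for t in terminals}  # BFS predecessor links
    queue = deque(terminals)
    components = len(terminals)
    sub_vertices, sub_edges = set(terminals), set()

    def add_path(v):
        # Follow predecessor links back to the terminal that discovered v.
        while pred[v] is not None:
            sub_vertices.add(v)
            sub_edges.add(frozenset((v, pred[v])))
            v = pred[v]

    while queue and components > 1:
        u = queue.popleft()
        for w in adj[u]:
            if w not in owner:
                owner[w], pred[w] = owner[u], u
                queue.append(w)
                continue
            ru, rw = find(owner[u]), find(owner[w])
            if ru != rw:  # this edge joins two different sets
                parent[ru] = rw
                components -= 1
                sub_edges.add(frozenset((u, w)))
                add_path(u)
                add_path(w)
    return sub_vertices, sub_edges


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(connect_terminals(line, [0, 4]))
```

The search stops as soon as all terminals are in one set, so on a large graph it only explores the neighbourhood of S rather than the whole of G.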

Show that the heuristic solution to vertex cover is at most twice as large as the optimal solution

The heuristic solution that I've been given is:
Perform a depth-first-search on the graph
Delete all the leaves
The remaining graph forms a vertex cover
I've been given the question: "Show that this heuristic is at most twice as large as the optimal solution to the vertex cover". How can I show this?
I assume that the graph is connected (if it's not the case, we can solve this problem for each component separately).
I also assume that a dfs-tree is rooted and a leaf is a vertex that doesn't have children in the rooted dfs-tree (it's important. If we define it differently, the algorithm may not work).
We need to show two things:
1. The set of vertices returned by the algorithm is a vertex cover. Indeed, there can be only two types of edges with respect to the dfs-tree of any undirected graph: a tree edge (such an edge is covered because at least one of its endpoints is not a leaf) and a back edge (again, one of its endpoints is not a leaf, because a back edge goes from a vertex to its ancestor, and a leaf cannot be an ancestor of another vertex).
2. Let's consider the dfs-tree and ignore the rest of the edges. I'll show that it's not possible to cover the tree edges using fewer than half as many vertices as there are non-leaf vertices. Let S be a minimum vertex cover. Consider a vertex v such that v is not a leaf and v is not in S (that is, v is returned by the heuristic in question but it's not in the optimal answer). v is not a leaf, thus there is an edge v -> u in the dfs-tree (where u is a child of v). The edge v -> u is covered by S. Thus, u is in S. Let's define a mapping f from the vertices returned by the heuristic that are not in S as f(v) = u (where v and u have the same meaning as in the previous sentence). Note that v is the parent of u in the dfs-tree. But there can be only one parent for any vertex in a tree! Thus, f is an injection into S. It means that the number of vertices returned by the heuristic but not in the optimal answer is not greater than the size of the optimal answer. All the other vertices returned by the heuristic are in S, so the heuristic returns at most |S| + |S| = 2|S| vertices. That's exactly what we needed to show.
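A small sketch of the heuristic under the definition used in this answer (a leaf is a vertex with no children in the rooted dfs-tree) may make the argument easier to check. The function name `dfs_tree_cover` and the adjacency-dict format are assumptions, and the graph is assumed to be connected.

```python
def dfs_tree_cover(adj, root):
    """Return the non-leaf vertices of a DFS tree rooted at `root`.

    adj: dict mapping each vertex to a list of neighbours (undirected, connected).
    """
    parent = {root: None}
    has_child = set()  # vertices with at least one child in the DFS tree
    stack = [(root, iter(adj[root]))]
    while stack:
        v, neighbours = stack[-1]
        for w in neighbours:
            if w not in parent:  # tree edge v -> w
                parent[w] = v
                has_child.add(v)
                stack.append((w, iter(adj[w])))
                break
        else:
            stack.pop()  # all neighbours of v explored
    return has_child


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(dfs_tree_cover(triangle, 0))
```

On the triangle the DFS tree is the path 0 -> 1 -> 2, so the sketch keeps vertices 0 and 1, which cover all three edges.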
Bad news: the heuristic does not work.
Strictly speaking, a single isolated vertex is a counter-example for the question.
Nevertheless, the heuristic does not provide a vertex cover at all, even if you correct it for an isolated vertex and for 2-vertex cliques.
Take a look at complete graphs with 1 to 3 vertices:
1 - strictly speaking, an isolated vertex is not a leaf (it has degree 0, while a leaf is a vertex with degree 1), so the heuristic will keep it, while a vertex cover will not
2 - the heuristic will drop both leaves, while a vertex cover will keep at least 1 of them
3 - the heuristic will leave 1 vertex, while a vertex cover has to keep at least 2 vertices of this clique

How can I find a way to minimize the number of edges?

I am thinking about an algorithm to solve the problem below:
A graph composed of vertices and edges is given.
There are N customers, each of whom wants to travel from one vertex to another.
Each customer requirement needs a directed path connecting the two vertices.
The problem is how to find the minimum number of edges that satisfies all customer requirements.
There is a simple example:
Customer 1 wants to travel from vertex a to vertex b.
Customer 2 wants to travel from vertex b to vertex c.
Customer 3 wants to travel from vertex a to vertex c.
The simplest way is to give an edge for each customers:
edge 1: vertex a -> vertex b
edge 2: vertex b -> vertex c
edge 3: vertex a -> vertex c
But actually only 2 edges (i.e. edge 1 and edge 2) are needed to satisfy all three customer requirements.
If the number of customers is large, how can I find the minimum number of edges that satisfies all customer requirements?
Is there an algorithm to solve this problem?
You can model the problem as a mixed integer program. You can define binary variables for "arc a -> b is used" and "customer c uses arc a -> b" and write down the requirements as linear inequalities. If your graph is not too large, you can solve such models in reasonable time with a mixed integer program solver (CPLEX, GUROBI, but there are also free alternatives on the web).
I know that this solution requires some work if you are not familiar with linear programming, but it guarantees to find best solutions in finite time and you can probably solve it for (say) 1000 customers and 1000 arcs.
If you have N vertices, you can always construct a solution with N (directed) edges. Just create a directed cycle V_1 -> V_2 -> V_3 ->... -> V_N -> V_1. You can never have directed path from every vertex V_a to every other vertex V_b with fewer edges (because you'd have a directed tree which necessarily contains a leaf). The leaf is either un-reachable (if the edge goes from leaf out) or the leaf is a sink (can't connect to anything else) if the edge is ->leaf.
No need to use any new algorithm. You can use a BFS/DFS algorithm.
count = 0
for each request (source, destination)
    find whether any path exists from source to destination
    if no path exists
        add a direct edge from source to destination
        count++
return count
The key point is that the path search runs over the edges added so far, not over the original graph.
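The greedy procedure above can be written out as a short Python sketch. The names `reachable` and `greedy_edges` are hypothetical, and requests are assumed to be (source, destination) pairs. Note that this greedy depends on the order in which requests are processed, so it is not guaranteed to find the minimum.

```python
from collections import deque


def reachable(adj, src, dst):
    """BFS over the directed edges added so far."""
    if src == dst:
        return True
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w == dst:
                return True
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def greedy_edges(requests):
    """Add a direct edge only when the request is not already satisfied."""
    adj = {}
    added = []
    for src, dst in requests:
        if not reachable(adj, src, dst):
            adj.setdefault(src, set()).add(dst)
            added.append((src, dst))
    return added


print(greedy_edges([("a", "b"), ("b", "c"), ("a", "c")]))  # [('a', 'b'), ('b', 'c')]
print(greedy_edges([("a", "c"), ("a", "b"), ("b", "c")]))  # all three edges are added
```

The second call shows the order dependence: when the request a -> c comes first, the sketch adds three edges even though two would suffice.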
You can use a disjoint-set data structure.
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
count = 0
for each edge (vertex_a, vertex_b)
    if root(vertex_a) != root(vertex_b)
        count++
        union(vertex_a, vertex_b)
If I think of the same problem for undirected edges, what we are looking for is the minimum spanning tree (MST) of the original graph (constructed from all edges). The brief explanation is that for each edge E (v1 -> v2), if there is a second path from v1 to v2, there exists a cycle, and for each existing cycle there is an edge we can omit.
For finding MST of a directed graph there is Chu–Liu/Edmonds' algorithm you can use.
Note that you are assigning a weight of 1 to all of your edges.

Consensus on multiple graphs

Let G = (V,E) be a Directed Acyclic Graph (DAG). V is the set of vertexes, while E is the set of edges.
Now, suppose that G is corrupted by some annotators in a crowd, according to the crowdsourcing paradigm:
Some of them may decide to remove some edge e belonging to E
Some of them may decide to add an edge e which was not existing
The result of the work of an annotator i is a graph whose set of vertexes V is the same as the original one and whose set of edges Ei may differ from the original one. If n is the number of annotators, we come up with n different graphs, having the same set of vertexes V, but a different set of edges E. Let G1 = (V,E1), ..., Gn = (V,En) be the set of graphs.
I would like to know whether there is a way of merging these graphs, so as to find a consensus on the presence/absence of each possible edge e between two vertexes v1,v2 in V. The purpose of this operation is to fuse the opinions of the annotators about the construction of the set of edges E in the graph G. The final graph has to be a DAG.
Let...
U be the distinct union of all Ei sets plus the original set E
T be some arbitrary threshold value
H(x) be some heuristic function
F be the final consensus set of edges
Pseudocode:
for each Edge e in U
if H(e) >= T then F.Add(e)
The question is then of course how to define your heuristic function. A naive approach would be set based voting. Count the number of E sets containing the edge, and if enough people agree that it's in the graph, include it. This is a simple and efficient function to implement. Some weaknesses of this heuristic are its inability to detect and compensate for bad annotators or small crowd sizes.
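The set-based voting heuristic described above might be sketched like this, with H(e) taken to be the number of annotator edge sets containing e. The function name `consensus_edges` and the representation of edges as (source, target) tuples are assumptions. The sketch does not by itself check that the resulting graph is acyclic.

```python
from collections import Counter


def consensus_edges(edge_sets, threshold):
    """Keep every edge that appears in at least `threshold` of the edge sets."""
    votes = Counter(e for edges in edge_sets for e in set(edges))
    return {e for e, n in votes.items() if n >= threshold}


annotators = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("a", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
]
print(consensus_edges(annotators, threshold=2))
```

With three annotators and a threshold of 2, only the edges a -> b and b -> c survive in this example.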
For each edge count the number of graphs that contains it. If it is greater than some threshold, assume it was an original edge.
You may face some problems if some of the actions are biased. That is, each user does not randomly choose a particular edge to act upon.

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easily tricked into giving suboptimal solutions. Example case: an equilateral triangle with vertices at the corners, at the midpoints of the sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
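The three steps above (collapse to a distance graph on N, compute its minimum spanning tree, expand the tree edges back into paths) can be sketched as follows. This is an illustration under assumptions: the graph is an unweighted, connected adjacency dict, distances come from one BFS per required node, the spanning tree is grown with Prim's algorithm, and the names `bfs_tree` and `steiner_via_closure` are made up. The union of the expanded paths is connected but is not necessarily a tree.

```python
from collections import deque


def bfs_tree(adj, src):
    """Return (dist, parent) dicts of a BFS from src."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w], dist[w] = u, dist[u] + 1
                queue.append(w)
    return dist, parent


def steiner_via_closure(adj, required):
    required = list(required)
    info = {t: bfs_tree(adj, t) for t in required}
    in_tree = {required[0]}
    vertices, edges = set(required), set()
    while len(in_tree) < len(required):
        # Prim step: cheapest distance-graph edge (a, b) leaving the current tree.
        _, a, b = min(
            ((info[a][0][b], a, b) for a in in_tree for b in required if b not in in_tree),
            key=lambda t: t[0],
        )
        # Expand that edge into the shortest path from b back to a in G.
        parent, v = info[a][1], b
        while parent[v] is not None:
            vertices.add(v)
            edges.add(frozenset((v, parent[v])))
            v = parent[v]
        in_tree.add(b)
    return vertices, edges


g = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(steiner_via_closure(g, [0, 4, 5]))
```

On this small example the result contains all six vertices and five edges, which is the whole tree, since every vertex lies on a path between required nodes.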
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path from any node in T and any node in N. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V'.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
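Variant (a) above could be sketched as follows. Because the graph is unweighted, every spanning tree has the same total weight, so this sketch simply uses a BFS spanning tree; that substitution, the function name `prune_spanning_tree`, and the adjacency-dict format are assumptions made for illustration.

```python
from collections import deque


def prune_spanning_tree(adj, required):
    """Build a spanning tree, then repeatedly delete leaves that are not required."""
    required = set(required)
    root = next(iter(required))
    tree = {root: set()}  # tree adjacency, filled in by BFS
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in tree:
                tree[w] = {u}
                tree[u].add(w)
                queue.append(w)
    leaves = [v for v in tree if len(tree[v]) <= 1 and v not in required]
    while leaves:
        v = leaves.pop()
        if v not in tree:
            continue  # already removed
        for u in tree.pop(v):
            tree[u].discard(v)
            if len(tree[u]) <= 1 and u not in required:
                leaves.append(u)  # u just became a removable leaf
    return set(tree)


g = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2, 6], 6: [5]}
print(prune_spanning_tree(g, [0, 4]))
```

Here the branch 2 - 5 - 6 is pruned, leaving the path from 0 to 4. The quality of the result depends on which spanning tree is built in the first place.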
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(u) = 1.
We create the pair (w(u), id(u)) for each node u.
Each node u builds a multiset relay node. That is, a set M(u) of 1-hop neighbors such that each 2-hop neighbor is a neighbor of at least one node in M(u). [The smaller M(u), the better the solution.]
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors,
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest (w(v), id(v)).
-- The trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best I could do was to improve O(n^3) to O(n^2.37) by matrix multiplication.
-- I really wish to know what the approximation ratio of this last solution is.
I like this reference for heuristics of steiner tree:
The Steiner Tree Problem, by Frank Hwang, Dana Richards and Pawel Winter
You could try to do the following:
Create a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.

Resources