Is there a way to compute (accurate or hevristics) this problem on medium sized (up to 1000 nodes) weighted graph?
Place n (for example 5) sensors in nodes of the graph in such way that the sum of distances from every other node to the closest sensor will be minimal.
I'll show that this problem is NP-hard by reduction from Vertex Cover. This applies even if the graph is unweighted (you don't say whether it's weighted or not).
Given an unweighted graph G = (V, E) and an integer k, the question asked by Vertex Cover is "Does there exist a set of at most k vertices such that every edge has at least one endpoint in this set?" We will build a new graph G' = (V', E), which is the same as G except that all isolated vertices have been discarded, solve your problem on G', and then use it to answer the original question about Vertex Cover.
Suppose there does exist such a set S of k vertices. If we consider this set S to be the locations to put sensors in your problem, then every vertex in S has a distance of 0, and every other vertex is at a distance of exactly 1 away from a vertex that is in S (because if there was some vertex u for which this wasn't true, it would mean that none of u's neighbours are in S, so for each such neighbour u, the edge uv is not covered by the vertex cover, which would be a contradiction.)
This type of problem is called graph clustering. One of the popular methods to solve it is the Markov Cluster (MCL) Algorithm. A web search should provide some implementation examples. However it does not generally provide the optimal solution.
Related
I am looking back at some of my algorithms homework(exam soon haha), and I am having troubles understanding the solution to one of the questions.
The Question
Tutor Solution
I am having difficulties visualizing the solution, I understand that if you have an odd number of cycles than your graph cannot be bipartite. But as I stated, I don't understand the shortest path from s to u and v to s.
Let's say G is bipartite. Then the vertex set can be divided into V1 and V2 s.t. every edge of G includes a vertex from each set. Then the vertices of every path in G alternate between V1 and V2, so the parity of the length of every path between every pair of vertices is the same. I.e., if G is bipartite and v,w are two vertices of G, then the length of every path connecting v & w is either even or odd.
If there is an edge (u,v) connecting two vertices in the same layer then this is violated. The BFS path to v and u have the same length, so same parity, since v & u are in the same layer. The edge between v & u gives us a path longer by 1, so of a different parity. Contradiction.
The tutor's solution isn't as clear as it could be, since it talks about cycles, and the two paths don't necessarily form a cycle since they can share vertices. And the definition of bipartite graph is slightly wrong (or uses non-standard definitions of edges, cross-product etc. where the things aren't pairs but 2-element sets).
But instead of the definition as given, I'd just say that a graph is bipartite if the vertices can be coloured black or white, such that each edge goes between different-coloured vertices. (Equivalently you can simply say "the graph is 2-colourable").
From this definition, it follows that if there's an even length path between two vertices they must be the same colour, otherwise different colours. In the BFS on a bipartite graph, a vertex u is layer i has a path of length i from s to u, so all vertices in the same layer have the same colour. Thus there can be no edge between two vertices in the same layer.
I've received a task to find an algorithm which divides a graph G(V,E) into pairs of neighboring edges (colors the graph, such that every pair of neighboring edges has the same color).
I've tried to tackle this problem by drawing out some random graphs and came to a few conclusions:
If a vertex is connected to 2(4,6,8...) vertices of degree 1, these make a pair of edges.
If a vertex of degree 1 is directly connected to a cycle, it doesn't matter which edge of the cycle is paired with the lone edge.
However, I couldn't come up with any other conclusions, so I tried a different approach. I thought about using DFS, finding articulation points and dividing graph into subgraphs with an even number of edges, because those should be dividable by this rule as well, and so on until I end up with only subgraphs of |E(G')| = 2.
Another thing I've come up with is to create a graph G', where E(G) = V(G') and V(G) = E(G'). That way I could get a graph, where I could remove pairs of vertices (former edges) either via DFS or always starting with leaf vertices along with their adjacent vertices.
The last technique is most appealing to me, but it seems to be the slowest one. Any feedback or tips on which of these methods would be the best is much appreciated.
EDIT: In other words, imagine the graph as a layout of a town. Vertices being crossroads, edges being the roads. We want to decorate (sweep, color) each road exactly once, but we can only decorate two connected roads at the same time. I hope this helps for clarification.
For example, having graph G with E={ab,bd,cd,ac,ae,be,bf,fd}, one of possible pair combinations is P={{ab,bf},{ac,cd},{ae,eb},{bd,df}}.
One approach is to construct a new graph G where:
A vertex in G corresponds to an edge in the original graph
An edge in G connects vertices a and b in G where a and b represent edges in the original graph that meet at a vertex in the original graph
Then, if I have understood the original problem correctly, the objective for G is to find the maximum matching, which can be done, for example, with the Blossom algorithm.
Given a undirected graph. How to find the size of maximum subset of vertices of the graph in which each vertex has at degree atleast p, where the degree in subset is find among the vertices in subset only.
Vertices of degree less than p can never be part of the solution. Remove them entirely, including their edges. Look at the new graph and repeat, etc.
When this process stops, all vertices have degree at least p.
Then, look at the connected components of that graph and pick the largest one! (As Evgeny Kluev correctly points out, this is unnecessary of course. In my head, the remaining subgraph should have been connected, but of course the original problem makes no such demands.)
Let G (U u V, E) be a weighted directed bipartite graph (i.e. U and V are the two sets of nodes of the bipartite graph and E contains directed weighted edges from U to V or from V to U). Here is an example:
In this case:
U = {A,B,C}
V = {D,E,F}
E = {(A->E,7), (B->D,1), (C->E,3), (F->A,9)}
Definition: DirectionalMatching (I made up this term just to make things clearer): set of directed edges that may share the start or end vertices. That is, if U->V and U'->V' both belong to a DirectionalMatching then V /= U' and V' /= U but it may be that U = U' or V = V'.
My question: How to efficiently find a DirectionalMatching, as defined above, for a bipartite directional weighted graph which maximizes the sum of the weights of its edges?
By efficiently, I mean polynomial complexity or faster, I already know how to implement a naive brute force approach.
In the example above the maximum weighted DirectionalMatching is: {F->A,C->E,B->D}, with a value of 13.
Formally demonstrating the equivalence of this problem to any other well known problem in graph theory would also be valuable.
Thanks!
Note 1: This question is based on Maximum weighted bipartite matching _with_ directed edges but with the extra relaxation that it is allowed for edges in the matching to share the origin or destination. Since that relaxation makes a big difference, I created an independent question.
Note 2: This is a maximum weight matching. Cardinality (how many edges are present) and the number of vertices covered by the matching is irrelevant for a correct result. Only the maximum weight matters.
Note 2: During my research to solve the problem I found this paper, I think it would be helpful to others trying to find a solution: Alternating cycles and paths in edge-coloured
multigraphs: a survey
Note 3: In case it helps, you can also think of the graph as its equivalent 2-edge coloured undirected bipartite multigraph. The problem formulation would then turn into: Find the set of edges without colour-alternating paths or cycles which has maximum weight sum.
Note 4: I suspect that the problem might be NP-hard, but I am not that experienced with reductions so I haven't managed to prove it yet.
Yet another example:
Imagine you had
4 vertices: {u1, u2} {v1, v2}
4 edges: {u1->v1, u1->v2, u2->v1, v2->u2}
Then, regardless of their weights, u1->v2 and v2->u2 cannot be in the same DirectionalMatching, neither can v2->u2 and u2->v1. However u1->v1 and u1->v2 can, and so can u1->v1 and u2->v1.
Define a new undirected graph G' from G as follows.
G' has a node (A, B) with weight w for each directed edge (A, B) with weight w in G
G' has undirected edge ((A, B),(B, C)) if (A, B) and (B, C) are both directed edges in G
http://en.wikipedia.org/wiki/Line_graph#Line_digraphs
Now find a maximal (weighted) independent vertex set in G'.
http://en.wikipedia.org/wiki/Vertex_independent_set
Edit: stuff after this point only works if all of the edge weights are the same - when the edge weights have different values its a more difficult problem (google "maximum weight independent vertex set" for possible algorithms)
Typically this would be an NP-hard problem. However, G' is a bipartite graph -- it contains only even cycles. Finding the maximal (weighted) independent vertex set in a bipartite graph is not NP-hard.
The algorithm you will run on G' is as follows.
Find the connected components of G', say H_1, H_2, ..., H_k
For each H_i do a 2-coloring (say red and blue) of the nodes. The cookbook approach here is to do a depth-first search on H_i alternating colors. A simple approach would be to color each vertex in H_i based on whether the corresponding edge in G goes from U to V (red) or from V to U (blue).
The two options for which nodes to select from H_i are either all the red nodes or all the blue nodes. Choose the colored node set with higher weight. For example, the red node set has weight equal to H_i.nodes.where(node => node.color == red).sum(node => node.w). Call the higher-weight node set N_i.
Your maximal weighted independent vertex set is now union(N_1, N_2, ..., N_k).
Since each vertex in G' corresponds to one of the directed edges in G, you have your maximal DirectionalMatching.
This problem can be solved in polynomial time using the Hungarian Algorithm. The "proof" by Vor above is wrong.
The method of structuring the problem for the above example is as follows:
D E F
A # 7 9
B 1 # #
C # 3 #
where "#" means negative infinity. You then resolve the matrix using the Hungarian algorithm to determine the maximum matching. You can multiply the numbers by -1 if you want to find a minimum matching.
I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path from any node in T and any node in N. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V'.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed enviroments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority for nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(1).
We create pair (w(u), id(u)) for each node u.
each node u builds a multiset relay node. That is, a set M(u) of 1-hop neigbhors such that each 2-hop neighbor is a neighbor to at least one node in M(u). [the minimum M(u), the better is the solution].
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors.
or u is selected in the M(v), where v is a 1-hop neighbor of u with the smallest (w(u),id(u)).
-- the trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best I could get from O(n^3) is to O(n^2.37) by matrix multiplication.
-- I really wish to know what is the approximation ration of this last solution.
I like this reference for heuristics of steiner tree:
The Steiner tree problem, Hwang Frank ; Richards Dana 1955- Winter Pawel 1952
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.