Vertex with minimal distance to n other vertices - algorithm

Given a directed, weighted, cyclic graph, and minimal path distance between vertices given by m(x,y), find the vertex v that minimises m(a,v) + m(b,v) + m(c,v) + ... for n vertices a, b, c...
For example if the graph was undirected and we wanted the vertex v with minimal paths to vertices a and b, v would just be the vertex in the centre of the minimal path from a to b.
I can imagine an approach involving depth-first traversal etc., but wanted to ask what SO would suggest. Thanks, hope this was clear.

Now that I am thinking a bit more about it, you should probably look at Bidirectional Search.
The basic idea would be to start a Dijkstra from each of your query nodes (a, b, c, ...) at the same time. The first vertex that is found by all of them is your result vertex.
You can implement this by putting all (unvisited vertex, distance, query vertex) triplets you encounter in one priority queue (sorted by distance) and processing it similarly to Dijkstra. You will need to label the vertices you have seen with the query node from which you reached them. When a vertex is labeled with all query vertices, this is your result (let's call it the median vertex).
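For checking results, here is a minimal sketch of a simpler baseline, not the single-queue search described above: it runs one ordinary Dijkstra per query vertex and picks the vertex with the smallest total distance. The names dijkstra and median_vertex and the adjacency-dict layout are assumptions made for the illustration, and edge weights are assumed non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source.

    graph maps vertex -> list of (neighbour, weight); weights are assumed
    non-negative and vertices are assumed comparable (e.g. strings).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def median_vertex(graph, queries):
    """Return (vertex, total) minimising the sum of m(q, v) over q in queries."""
    tables = [dijkstra(graph, q) for q in queries]
    # Only vertices reachable from every query vertex are candidates.
    candidates = set(tables[0])
    for t in tables[1:]:
        candidates &= set(t)
    if not candidates:
        return None
    # Ties are broken by the string form of the vertex, for determinism.
    best = min(candidates, key=lambda v: (sum(t[v] for t in tables), str(v)))
    return best, sum(t[best] for t in tables)
```

This costs one full Dijkstra per query vertex, so it is mainly useful as a reference to compare a faster interleaved search against.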

Related

Find an O(n) algorithm that returns all interesting vertices of the graph

Problem : A directed graph G with n vertices and a special vertex u is provided. We call a vertex v ‘interesting’ if there is a path from v to a vertex w such that there is a cycle containing the vertices w and u. Write an O(n) time algorithm which takes G (the whole graph) and the node u as input and returns all the interesting vertices.
Inefficient algorithm: My idea initially was to consider the node u and compute all the cycles that contain u. (This itself seems like traversing through the nodes using DFS and then forward-tracking as well when you encounter a visited node.) Now from each vertex on these cycles we can compute the number of nodes of the graph that do not belong to the cycle(s) but are connected with each particular vertex w not equal to u on a cycle. Add all these values to get the desired answer. This isn't an O(n) algorithm.
There are two cases:
If there are no cycles containing u, then no vertex can be "interesting".
If there are any cycles containing u, then a vertex v is "interesting" if and only if there's a path from v to u. (We don't need to worry about the w in the problem description, because if a cycle contains two vertices u and w, then any path that ends at u can be extended to end at w and vice versa.)
So, the most efficient algorithm would be as follows:
Using DFS, determine if u is in a cycle. (We don't need to find all cycles; we just need to determine whether there are any.)
Using DFS in the "reverse" direction, find all vertices from which u is reachable.
DFS requires O(|V| + |E|) time, which is greater than O(n) = O(|V|) unless |E| is in O(n); but then, there's no way to even read in the entire graph definition in less than |E| time, so this is unavoidable. Whoever gave you this question must not have really thought this through.
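The two steps above can be sketched roughly as follows. The function name interesting_vertices and the successor-list representation are assumptions for the example; the cycle test relies on the fact that u lies on a cycle exactly when u is reachable from one of its own successors.

```python
def interesting_vertices(adj, u):
    """adj maps each vertex to a list of its successors in a directed graph."""

    def reachable(start_nodes, edges):
        # Iterative DFS; returns every vertex reachable from start_nodes
        # (the start nodes themselves included).
        seen = set(start_nodes)
        stack = list(seen)
        while stack:
            x = stack.pop()
            for y in edges.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    # Step 1: u is on a cycle iff u can be reached from one of its successors.
    if u not in reachable(adj.get(u, []), adj):
        return set()

    # Step 2: search the reversed graph from u to find all vertices
    # that have a path to u.
    rev = {}
    for x, succs in adj.items():
        for y in succs:
            rev.setdefault(y, []).append(x)
    return reachable([u], rev)
```

Both searches touch each vertex and edge at most once, so the total is O(|V| + |E|), matching the bound discussed above.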

Efficient algorithm to extract a subgraph within a maximum distance from multiple vertices

I have an algorithmic problem where there's a straightforward solution, but it seems wasteful. I'm wondering if there's a more efficient way to do the same thing.
Here's the problem:
Input: A large graph G with non-negative edge weights (interpreted as lengths), a list of vertices v, and a list of distances d the same length as v.
Output: The subgraph S of G consisting of all of the vertices that are at a distance of at most d[i] from v[i] for some i.
The obvious solution is to use Dijkstra's algorithm starting from each v[i], modified so that it bails out after hitting a distance of d[i], and then taking the union of the subgraphs that each search traverses. However, in my use case it's frequently going to be the case that the search trees from the v[i]s overlap substantially. That means the Dijkstra approach will wastefully traverse the vertices in the overlap multiple times before I take the union.
In the case that there is only one vertex in v, the Dijkstra approach runs in O(|S|log|S|), taking |S| to be the number of vertices (my graph is sparse, so I ignore the edges term). Is it possible to achieve the same asymptotic run time when v has more than one vertex?
My first idea was to combine the searches out of each v[i] into the same priority queue, but the "bail out" condition mentioned above complicates this approach. Sometimes a vertex will be reached in a shorter distance from one v[i], but you would still want to search through it from another v[j] if the second vertex has a larger d[j] allotted to it.
Thanks!
You can solve this with the complexity of a single Dijkstra run.
Let D be the maximum of the distances in d.
Define a new start vertex, and give it edges to each of the vertices in v.
The length of the edge between start and v[i] should be set to D-d[i].
Then in this new graph, S is given by all vertices within a length D of the start vertex, so apply Dijkstra to the start vertex.
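A minimal sketch of this idea, assuming an adjacency-dict graph and non-negative d[i]: instead of materialising the extra start vertex, the priority queue is seeded with each v[i] at distance D - d[i], which is equivalent. The function name vertices_within is made up for the example.

```python
import heapq

def vertices_within(graph, v, d):
    """Vertices at distance at most d[i] from v[i] for some i.

    graph maps vertex -> list of (neighbour, length), lengths non-negative.
    Vertices are assumed comparable (e.g. strings) so heap ties can be broken.
    """
    D = max(d)
    dist = {}
    heap = []
    # Seeding v[i] at D - d[i] plays the role of the virtual start vertex
    # with an edge of length D - d[i] to v[i].
    for vi, di in zip(v, d):
        offset = D - di
        if offset < dist.get(vi, float("inf")):
            dist[vi] = offset
            heapq.heappush(heap, (offset, vi))
    result = set()
    while heap:
        dx, x = heapq.heappop(heap)
        if dx > dist[x]:
            continue  # stale queue entry
        result.add(x)
        for y, w in graph.get(x, []):
            nd = dx + w
            # Bail out beyond D: such vertices are outside every radius.
            if nd <= D and nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return result
```

Each vertex of S is settled once, so overlapping search regions are not traversed repeatedly. The edges of S can then be taken as the edges of G with both endpoints in the returned set.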

Show that the heuristic solution to vertex cover is at most twice as large as the optimal solution

The heuristic solution that I've been given is:
Perform a depth-first-search on the graph
Delete all the leaves
The remaining graph forms a vertex cover
I've been given the question: "Show that this heuristic is at most twice as large as the optimal solution to the vertex cover". How can I show this?
I assume that the graph is connected (if it's not the case, we can solve this problem for each component separately).
I also assume that a dfs-tree is rooted and a leaf is a vertex that doesn't have children in the rooted dfs-tree (it's important. If we define it differently, the algorithm may not work).
We need to show two things:
The set of vertices returned by the algorithm is a vertex cover. Indeed, there can be only two types of edges with respect to the dfs-tree of any undirected graph: tree edges (such an edge is covered because at least one of its endpoints is not a leaf) and back edges (again, one of the endpoints is not a leaf, because a back edge goes from a vertex to its ancestor, and a leaf cannot be an ancestor of another vertex).
Let's consider the dfs-tree and ignore the rest of the edges. I'll show that it's not possible to cover the tree edges using fewer than half of the non-leaf vertices. Let S be a minimum vertex cover. Consider a vertex v such that v is not a leaf and v is not in S (that is, v is returned by the heuristic in question but is not in the optimal answer). v is not a leaf, thus there is an edge v -> u in the dfs-tree (where u is a child of v). The edge v -> u is covered by S. Thus, u is in S. Let's define a mapping f from the vertices returned by the heuristic that are not in S as f(v) = u (where v and u have the same meaning as in the previous sentence). Note that v is the parent of u in the dfs-tree. But there can be only one parent for any vertex in a tree! Thus, f is an injection into S. It means that the number of vertices in the set returned by the heuristic but not in the optimal answer is not greater than the size of the optimal answer, so the heuristic returns at most 2|S| vertices. That's exactly what we needed to show.
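To make the definition of "leaf" used in this argument concrete, here is a small sketch that builds a rooted DFS tree and returns every vertex that has at least one child in it. The name dfs_vertex_cover is an assumption for the example, and the graph is assumed undirected and connected, as in the answer.

```python
def dfs_vertex_cover(adj, root):
    """Non-leaf vertices of a DFS tree rooted at root.

    adj maps vertex -> list of neighbours (undirected, connected graph).
    """
    visited = {root}
    has_child = set()
    # Explicit stack of (vertex, iterator over its neighbours) so the
    # traversal is a genuine depth-first search without recursion.
    stack = [(root, iter(adj[root]))]
    while stack:
        node, neighbours = stack[-1]
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                has_child.add(node)  # node gains a child in the DFS tree
                stack.append((nxt, iter(adj[nxt])))
                break
        else:
            stack.pop()  # all neighbours explored, backtrack
    return has_child
```

With this rooted definition the root counts as a non-leaf whenever the graph has at least one edge, which is what the covering argument relies on.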
Bad news: the heuristic does not work.
Strictly speaking, a single isolated vertex is a counterexample to the claim.
Moreover, the heuristic does not produce a vertex cover at all, even if you correct it for isolated vertices and for 2-vertex cliques.
Take a look at complete graphs with 1 to 3 vertices:
1 - strictly speaking, an isolated vertex is not a leaf (it has degree 0, while a leaf is a vertex with degree 1), so the heuristic will keep it, while a minimum vertex cover will not
2 - the heuristic will drop both leaves, while a vertex cover must keep at least 1 of them
3 - the heuristic will leave 1 vertex, while a vertex cover has to keep at least 2 vertices of this clique

Minimum sum of distances from sensor nodes to all others

Is there a way to solve this problem (exactly or heuristically) on a medium-sized (up to 1000 nodes) weighted graph?
Place n (for example 5) sensors in nodes of the graph in such way that the sum of distances from every other node to the closest sensor will be minimal.
I'll show that this problem is NP-hard by reduction from Vertex Cover. This applies even if the graph is unweighted (you don't say whether it's weighted or not).
Given an unweighted graph G = (V, E) and an integer k, the question asked by Vertex Cover is "Does there exist a set of at most k vertices such that every edge has at least one endpoint in this set?" We will build a new graph G' = (V', E), which is the same as G except that all isolated vertices have been discarded, solve your problem on G', and then use it to answer the original question about Vertex Cover.
Suppose there does exist such a set S of k vertices. If we consider this set S to be the locations to put sensors in your problem, then every vertex in S has a distance of 0, and every other vertex is at a distance of exactly 1 from a vertex that is in S (because if there were some vertex u for which this wasn't true, it would mean that none of u's neighbours are in S; u has at least one neighbour v since isolated vertices were discarded, so the edge uv is not covered by the vertex cover, which would be a contradiction).
This type of problem is called graph clustering. One of the popular methods to solve it is the Markov Cluster (MCL) Algorithm. A web search should provide some implementation examples. However it does not generally provide the optimal solution.

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
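A rough sketch of the three steps for the unweighted case, assuming a connected graph given as adjacency lists. The names bfs_parents and steiner_via_mst are made up for the example; distances come from one BFS per required vertex and the spanning tree is built with a simple Prim-style loop.

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS from source; returns (distance dict, parent dict)."""
    parent = {source: None}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist, parent

def steiner_via_mst(adj, terminals):
    """Approximate connected subgraph containing all terminals.

    Returns (vertex set, edge set); edges are frozensets of two endpoints.
    The graph is assumed connected and undirected.
    """
    terminals = list(terminals)
    # Step 1: path lengths between required vertices (the "collapsed" graph).
    info = {t: bfs_parents(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    nodes = {terminals[0]}
    edges = set()
    # Step 2: Prim's algorithm on the complete graph over the terminals.
    while len(in_tree) < len(terminals):
        a, b = min(
            ((s, t) for s in in_tree for t in terminals if t not in in_tree),
            key=lambda pair: info[pair[0]][0][pair[1]],
        )
        in_tree.add(b)
        # Step 3: expand the chosen edge (a, b) into the BFS path from b to a.
        parent = info[a][1]
        x = b
        while parent[x] is not None:
            nodes.add(x)
            edges.add(frozenset((x, parent[x])))
            x = parent[x]
        nodes.add(x)
    return nodes, edges
```

The union of expanded paths can contain a cycle when two paths share vertices, so a final spanning tree over the returned vertices may drop a few edges.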
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from T and from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path between any node in T and any node in N, i.e. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V' and to T.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
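Solution b) can be sketched as follows for an unweighted graph. As a simplification, this version does not precompute all shortest paths; it runs a multi-source BFS from the current tree in each round to find the nearest remaining required node. The function name shortest_path_heuristic is an assumption for the example.

```python
from collections import deque

def shortest_path_heuristic(adj, required):
    """Grow a tree from one required node by repeatedly attaching the
    nearest remaining required node via a shortest path.

    adj maps vertex -> list of neighbours (unweighted, undirected).
    Returns the vertex set V' of the resulting connected subgraph.
    """
    required = list(required)
    tree = {required[0]}
    remaining = set(required[1:]) - tree
    while remaining:
        # Multi-source BFS: every tree vertex starts at distance 0.
        parent = {x: None for x in tree}
        queue = deque(tree)
        target = None
        while queue and target is None:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent:
                    parent[y] = x
                    if y in remaining:
                        target = y  # first one discovered is a nearest one
                        break
                    queue.append(y)
        if target is None:
            raise ValueError("required nodes are not all connected")
        # Walk back to the tree, adding every node on the path.
        x = target
        while x is not None:
            tree.add(x)
            remaining.discard(x)
            x = parent[x]
    return tree
```

Each round costs one BFS, so the total is O(|N| * (|V| + |E|)); precomputing all shortest paths, as suggested above, trades memory for avoiding the repeated searches.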
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed environments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority to nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(u) = 1.
We create a pair (w(u), id(u)) for each node u.
Each node u builds a multipoint relay set, that is, a set M(u) of 1-hop neighbors such that each 2-hop neighbor is a neighbor of at least one node in M(u). [The smaller M(u), the better the solution.]
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors.
or u is selected in M(v), where v is the 1-hop neighbor of u with the smallest (w(v), id(v)).
-- the trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best I could do was to go from O(n^3) to O(n^2.37) by matrix multiplication.
-- I really wish to know what the approximation ratio of this last solution is.
I like this reference for heuristics of steiner tree:
The Steiner Tree Problem, by Frank Hwang, Dana Richards, and Pawel Winter
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.

Resources