Efficient algorithm to extract a subgraph within a maximum distance from multiple vertices - algorithm

I have an algorithmic problem where there's a straightforward solution, but it seems wasteful. I'm wondering if there's a more efficient way to do the same thing.
Here's the problem:
Input: A large graph G with non-negative edge weights (interpreted as lengths), a list of vertices v, and a list of distances d the same length as v.
Output: The subgraph S of G consisting of all of the vertices that are at a distance of at most d[i] from v[i] for some i.
The obvious solution is to use Dijkstra's algorithm starting from each v[i], modified so that it bails out after hitting a distance of d[i], and then taking the union of the subgraphs that each search traverses. However, in my use case it's frequently going to be the case that the search trees from the v[i]s overlap substantially. That means the Dijkstra approach will wastefully traverse the vertices in the overlap multiple times before I take the union.
In the case that there is only one vertex in v, the Dijkstra approach runs in O(|S|log|S|), taking |S| to be the number of vertices (my graph is sparse, so I ignore the edges term). Is it possible to achieve the same asymptotic run time when v has more than one vertex?
My first idea was to combine the searches out of each v[i] into the same priority queue, but the "bail out" condition mentioned above complicates this approach. Sometimes a vertex will be reached in a shorter distance from one v[i], but you would still want to search through it from another v[j] if the second vertex has a larger d[j] allotted to it.
Thanks!

You can solve this with the complexity of a single Dijkstra run.
Let D be the maximum of the distances in d.
Define a new start vertex, and give it edges to each of the vertices in v.
The length of the edge between start and v[i] should be set to D-d[i].
Then in this new graph, S is given by all vertices within a length D of the start vertex, so apply Dijkstra to the start vertex.

Related

A BFS Algorithm for Weighted Graphs - To Find Shortest Distance

I've seen quite a few posts (viz. post1, post2, post3) on this topic but none of the posts provides an algorithm to back up respective queries. Consequently I'm not sure to accept the answers to those posts.
Here I present a BFS based shortest-path (single source) algorithm that works for non-negative weighted graph. Can anyone help me understand why BFS (in light of below BFS based algorithm) are not used for such problems (involving weighted graph)!
The Algorithm:
SingleSourceShortestPath (G, w, s):
//G is graph, w is weight function, s is source vertex
//assume each vertex has 'col' (color), 'd' (distance), and 'p' (predecessor)
properties
Initialize all vertext's color to WHITE, distance to INFINITY (or a large number
larger than any edge's weight, and predecessor to NIL
Q:= initialize an empty queue
s.d=0
s.col=GREY //invariant, only GREY vertex goes inside the Q
Q.enqueue(s) //enqueue 's' to Q
while Q is not empty
u = Q.dequeue() //dequeue in FIFO manner
for each vertex v in adj[u] //adj[u] provides adjacency list of u
if v is WHITE or GREY //candidate for distance update
if u.d + w(u,v) < v.d //w(u,v) gives weight of the
//edge from u to v
v.d=u.d + w(u,v)
v.p=u
if v is WHITE
v.col=GREY //invariant, only GREY in Q
Q.enqueue(v)
end-if
end-if
end-if
end-for
u.col=BLACK //invariant, don't update any field of BLACK vertex.
// i.e. 'd' field is sealed
end-while
Runtime: As far as I see it is O(|V| + |E|) including the initialization cost
If this algorithm resembles any existing one, pls let me know
Since the pseudocode is Dijksta's algorithm with FIFO queue instead of priority queue that is always sorted based on the distances. The crucial invariant that each visited (black) vertex has computed shortest distance possible so far will not be necessarily true. And that is the reason why priority queue is a must for computation of distance in (positively) weighted graphs.
You can use your algorithm for unweighted graphs or make it unweighed by replacing each edge with weight n with n-1 vertexes connected by edges with weight 1.
Counterexample:
State of the computation after first Q.enqueue(s):
State of the computation after first iteration:
Important for this graph to be a counterexample is that adj[u] = adj[S] = [F, M] and not [M, F], hence F is queued first by Q.enqueue(v)
State of the computation after second iteration:
Since vertex F is dequeued first by u = Q.dequeue() (unlike when distance priority queue is used), this iteration will not update any distance, F will become black and the invariant will be violated.
State of the computation after the final iteration:
I used to have same confusion. Check out SPFA algorithm. When the author published this algorithm back in 1994, he claimed it has a better performance than Dijkstra with O(E) complexity, which is wrong.
You can treat this algorithm as an variation/improve of Bellman-Ford. Worst case complexity is still O(VE), since one node may be add/remove from the queue multiple times. But for random sparse graph, it definitely outperforms original Bellman-Ford since it skip lots of unnecessary relaxing steps.
Although this name "SPFA" seems not well accepted in academia, upon published it became very popular among ACM students due to its simplicity and ease to implement. Performance wise Dijkstra is preferred.
It looks like you implemented Dijkstra's classical algorithm, without a heap. You are going through the matrix through each edge and then seeing if you can improve the distance.
Normally people say it's BFS when there is no edge weight.
BFS: graph with constant edge weight.
Dijkstra: graph with edge weights (can handle some negative edges if
it doens't have negative cycle)
Bellman-ford and SPFA: graph with negative cycle.
Your code is Dijkastra or SPFA variant and not a simple BFS (although it IS based on BFS based alrorithm)

Find the Sunflower subgraph induced in a graph in polynomial amount of time.

A subgraph Sn of a graph G is a sunflower graph, if consists of a Cycle Cn = {v1,v2,..,vn} of n vertices together with other n independent vertices {u1,u2,...,un} such that for each i, ui is adjacent to vi and vj, where j = i-1(mod n).
You could think of a sunflower - in the sense of the question - as a cycle of triangles. In time O(N^3) you can check each triple of points to see if it is a triangle and create a new graph whose vertices denote triangles in the original graph and where two vertices are linked if the two triangles share one or more vertices.
Then a depth first search looking for back edges should find cycles in this graph. Not all cycles are good. I think it may be enough to check that no two successive edges in the supposed cycle in the derived graph are produced by the same vertex in the original graph, and that you can check this as part of the depth first search. It may take some detailed analysis of cases to establish this, unless you can find a neat proof.

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e. (as in, when you traverse the path you are getting closer to your destination e. in terms of the edge weight of the remaining path.)
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2n-2. Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 20 = 22-2 path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2n-2 paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2i-2 paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σnk=2 (2k-2) = 1 + Σn-2k=0 (2k-2) = 1 + (2n-1 - 1) = 2n-1 = 2(n+1)-2.
So we see that there are DAGs that have 2n-2 distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a
path from s to e, so you'll know whether there are any paths s-...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
traverse the vertices of [s;e] in the order imposed by the topological
sorting
for every vertex v, store a two-dimensional array of information; let's call it
prev[][] since it's gonna store information about the predecessors
of a node on the paths leading towards it
in prev[i][j], store how long the total path of length (counted in
vertices) i is as a sum of the edge weights, if j is the predecessor of the
current vertex on that path. For example, pres+1[1][s] would have
the weight of the edge s->s+1 in it, while all other entries in pres+1
would be 0/undefined.
when calculating the array for a new vertex v, all we have to do is check
its incoming edges and iterate over the arrays for the start vertices of those
edges. For example, let's say vertex v has an incoming edge from vertex w,
having weight c. Consider what the entry prev[i][w] should be.
We have an edge w->v, so we need to set prev[i][w] in v to
min(prew[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a
path of length i - 1 that leads to w, and add the cost of the edge w->v.
Why the minimum? The vertex w can have many predecessors for paths of length
i - 1; however, we want to stay below a cost limit, which greedy minimization
at each vertex will do for us. We will need to do this for all i in [1;s-v].
While calculating the array for a vertex, do not set entries that would give you
a path with cost above d; since all edges have positive weights, we can only get
more costly paths with each edge, so just ignore those.
Once you reached e and finished calculating pree, you're done with this
part of the algorithm.
Iterate over pree, starting with pree[e-s]; since we have no cycles, all
paths are simple paths and therefore the longest path from s to e can have e-s edges. Find the largest
i such that pree[i] has a non-zero (meaning it is defined) entry; if non exists, there is no path fitting your criteria. You can reconstruct
any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n²m) - the arrays have O(n²) entries, we have to iterate over O(m) arrays, one array for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex though since you need to know the predecessor to reconstruct the path), which would change the size of the arrays from n² to n since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing down the space complexity of the algorithm to O(n²) and the time complexity to O(nm). You can also try and do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.

minimum connected subgraph containing a given set of nodes

I have an unweighted, connected graph. I want to find a connected subgraph that definitely includes a certain set of nodes, and as few extras as possible. How could this be accomplished?
Just in case, I'll restate the question using more precise language. Let G(V,E) be an unweighted, undirected, connected graph. Let N be some subset of V. What's the best way to find the smallest connected subgraph G'(V',E') of G(V,E) such that N is a subset of V'?
Approximations are fine.
This is exactly the well-known NP-hard Steiner Tree problem. Without more details on what your instances look like, it's hard to give advice on an appropriate algorithm.
I can't think of an efficient algorithm to find the optimal solution, but assuming that your input graph is dense, the following might work well enough:
Convert your input graph G(V, E) to a weighted graph G'(N, D), where N is the subset of vertices you want to cover and D is distances (path lengths) between corresponding vertices in the original graph. This will "collapse" all vertices you don't need into edges.
Compute the minimum spanning tree for G'.
"Expand" the minimum spanning tree by the following procedure: for every edge d in the minimum spanning tree, take the corresponding path in graph G and add all vertices (including endpoints) on the path to the result set V' and all edges in the path to the result set E'.
This algorithm is easy to trip up to give suboptimal solutions. Example case: equilateral triangle where there are vertices at the corners, in midpoints of sides and in the middle of the triangle, and edges along the sides and from the corners to the middle of the triangle. To cover the corners it's enough to pick the single middle point of the triangle, but this algorithm might choose the sides. Nonetheless, if the graph is dense, it should work OK.
The easiest solutions will be the following:
a) based on mst:
- initially, all nodes of V are in V'
- build a minimum spanning tree of the graph G(V,E) - call it T.
- loop: for every leaf v in T that is not in N, delete v from V'.
- repeat loop until all leaves in T are in N.
b) another solution is the following - based on shortest paths tree.
- pick any node in N, call it v, let v be a root of a tree T = {v}.
- remove v from N.
loop:
1) select the shortest path from any node in T and any node in N. the shortest path p: {v, ... , u} where v is in T and u is in N.
2) every node in p is added to V'.
3) every node in p and in N is deleted from N.
--- repeat loop until N is empty.
At the beginning of the algorithm: compute all shortest paths in G using any known efficient algorithm.
Personally, I used this algorithm in one of my papers, but it is more suitable for distributed enviroments.
Let N be the set of nodes that we need to interconnect. We want to build a minimum connected dominating set of the graph G, and we want to give priority for nodes in N.
We give each node u a unique identifier id(u). We let w(u) = 0 if u is in N, otherwise w(1).
We create pair (w(u), id(u)) for each node u.
each node u builds a multiset relay node. That is, a set M(u) of 1-hop neigbhors such that each 2-hop neighbor is a neighbor to at least one node in M(u). [the minimum M(u), the better is the solution].
u is in V' if and only if:
u has the smallest pair (w(u), id(u)) among all its neighbors.
or u is selected in the M(v), where v is a 1-hop neighbor of u with the smallest (w(u),id(u)).
-- the trick when you execute this algorithm in a centralized manner is to be efficient in computing 2-hop neighbors. The best I could get from O(n^3) is to O(n^2.37) by matrix multiplication.
-- I really wish to know what is the approximation ration of this last solution.
I like this reference for heuristics of steiner tree:
The Steiner tree problem, Hwang Frank ; Richards Dana 1955- Winter Pawel 1952
You could try to do the following:
Creating a minimal vertex-cover for the desired nodes N.
Collapse these, possibly unconnected, sub-graphs into "large" nodes. That is, for each sub-graph, remove it from the graph, and replace it with a new node. Call this set of nodes N'.
Do a minimal vertex-cover of the nodes in N'.
"Unpack" the nodes in N'.
Not sure whether or not it gives you an approximation within some specific bound or so. You could perhaps even trick the algorithm to make some really stupid decisions.
As already pointed out, this is the Steiner tree problem in graphs. However, an important detail is that all edges should have weight 1. Because |V'| = |E'| + 1 for any Steiner tree (V',E'), this achieves exactly what you want.
For solving it, I would suggest the following Steiner tree solver (to be transparent: I am one of the developers):
https://scipjack.zib.de/
For graphs with a few thousand edges, you will usually get an optimal solution in less than 0.1 seconds.

Is there an algorithm to find the best set of Pairs of vertices in a weighted graph without repetition?

Is there any efficient algorithm to find the set of edges with the following properties, in a complete weighted graph with an even number of vertices.
the set has the smallest, maximum edge weight for any set that meats the other criteria possible
every vertex is connected to exactly one edge in the set
All weights are positive
dI cannot think of one better than brute force, but I do not recognise it as NP hard.
One way to solve this problem in polynomial time is as follows:
Sort the edge weights in O(E log E) time
For each edge, assign it a pseudo-weight E' = 2^{position in the ordering} in ~O(E) time
Find the minimum weight perfect matching among pseudo-weights in something like O(V^3) time (depending on the algorithm you pick, it could be slower or faster)
This minimizes the largest edge that the perfect matching contains, which is exactly what you're looking for in something like O(V^3) time.
Sources for how to do part 3 are given below
[1] http://www2.isye.gatech.edu/~wcook/papers/match_ijoc.pdf
[2] http://www.cs.illinois.edu/class/sp10/cs598csc/Lectures/Lecture11.pdf
[3] http://www.cs.ucl.ac.uk/staff/V.Kolmogorov/papers/blossom5.ps
with sample C++ source available at http://ciju.wordpress.com/2008/08/10/min-cost-perfect-matching/
try this (I just thought this up so I've got neither the proof that it will give an optimum answer or whether it will produce a solution in every cases):
search for the heaviest vertices V(A, B)
remove vertice V from the graph
if A is only connected to a single other vertice T(A, C) then remove all other edges connected to C, repeat step 3 with those edges as well
if B is only connected to a single other vertice S(B, D) then remove all other edges connected to D, repeat step 4 with those edges as well
repeat from step #1

Resources