Minimum spanning tree for pairs of adjacent edges - algorithm

For some graph, there is a cost associated with each pair of adjacent edges. I hope to find a subgraph such that every point is connected and the cost is minimised (a minimum spanning tree).
For the above example, the solution will include the edges AB, BC and CD, but not DA, avoiding the expensive CDA and DAB triplets, and getting a score of 28 (weight of ABC + BCD).
To motivate this question, let's imagine that we're designing a road network between places, and whenever a car turns around a sharp bend it slows down. Creating the ideal network, one with a small number of sharp bends, may benefit from us taking node triplets into account.
The graphs I intend to apply this algorithm to will have 5,000 to 20,000 nodes, and 15,000 to 80,000 edges. Presumably, the function will be of this type or similar:
(
nodes: [T],
edges: [(int, int)],
distance: (a: T, b: T, c: T) => float
) => [(int, int)]
Where b is connected to both a and c, but a and c are not necessarily connected.
What algorithm solves this problem?
Thank you for any help you may give.

The quadratic objective feels like enough leeway to construct gadgets for an NP-hardness reduction, though I have no proof at this time.
Since your graph is sparse, I’m hoping that the max degree is small, especially given your comment about road networks. I’d suggest the following integer programming formulation:
Variables: for each edge {v, w}, let there be a 0-1 variable x(v, w) that is 1 if {v, w} belongs to the spanning tree and 0 otherwise. Also, for each vertex v and each nonempty subset S of edges incident to v, let there be a 0-1 variable y(v, S) that is 1 if the subset of edges incident to v in the tree is S and 0 otherwise.
Objective: minimize ∑_{v,S} ∑_{{u,w}⊆S} distance(u, v, w) · y(v, S), where {u, w} ranges over pairs of neighbors that S connects to v.
Initial constraints: we require for each vertex v that ∑_S y(v, S) = 1, that is, each vertex has to choose exactly one neighborhood in the tree. We also require for each edge {v, w} that ∑_{S∋w} y(v, S) = x(v, w), that is, the neighborhood that v chooses has to be consistent with whether the edge exists.
Connectivity constraints: right now nothing forces the chosen edges to form a connected graph. It’s possible to formulate connectivity constraints statically, but instead I’d recommend the following approach. Run the solver with the constraints so far and compute the connected components of the edges it selects. If there’s exactly one component, great, it’s the optimal solution. Otherwise, for each component C, require that ∑_{{v,w}∈E(C,V∖C)} x(v, w) ≥ 1 (that is, the tree contains at least one edge with one endpoint in C and one endpoint not in C) and try again.
I usually use OR-Tools because it’s the preferred library where I work, but you have many options.
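The two solver-independent pieces of this approach can be sketched in plain Python: scoring a chosen edge set with the pairwise objective, and generating the cut constraints for a disconnected intermediate solution. This is only a sketch under assumptions: the names `pair_cost` and `cut_constraints` are made up for illustration, edges are undirected `(u, v)` tuples, and the actual solver call (OR-Tools or otherwise) is left out.

```python
from itertools import combinations

def pair_cost(chosen_edges, distance):
    """Sum distance(u, v, w) over every pair of chosen edges {u, v}, {v, w} that meet at v."""
    neighbours = {}
    for u, v in chosen_edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    return sum(distance(u, v, w)
               for v, adjacent in neighbours.items()
               for u, w in combinations(adjacent, 2))

def cut_constraints(nodes, all_edges, chosen_edges):
    """Return one list of crossing edges per component, or [] if the choice is connected.

    Each returned list L stands for the constraint sum(x[e] for e in L) >= 1.
    """
    adjacency = {v: [] for v in nodes}
    for u, v in chosen_edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    component = {}
    count = 0
    for start in nodes:
        if start in component:
            continue
        component[start] = count
        stack = [start]
        while stack:  # depth-first search over the chosen edges only
            v = stack.pop()
            for w in adjacency[v]:
                if w not in component:
                    component[w] = count
                    stack.append(w)
        count += 1
    if count == 1:
        return []
    # An edge crosses the cut of component c if exactly one endpoint lies in c.
    return [[(u, v) for u, v in all_edges
             if (component[u] == c) != (component[v] == c)]
            for c in range(count)]
```

In a full loop, each returned list would be added to the model as a new constraint before re-solving, and the loop stops when `cut_constraints` returns an empty list.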

Related

Need assistance in resolving this variation of shortest path algorithm

The picture below is from a question I was asked during a Samsung interview. I had to write a program to find the minimum distance between I and M. There was an additional constraint: we may change one of the edges. For example, the edge FM can be moved to join L and M, and the edge value will still be 4.
If you notice, the distance between I and M via I -> E -> F -> G -> M is 20. However, if we change one of the edges such that the L to M edge value is now 4, we have to move edge FM to join L and M. By this method, the distance between I and M is 20.
An arbitrary edge u, v can be changed to u, t or t,v. It can not be changed to x,y. So one of the vertices in the edge has to be same.
Please find the picture below to illustrate the scenario -
So my problem is that I had to write a program for this. To find the minimum distance between two vertices, I thought of using Dijkstra's algorithm. However, I was not sure how to take care of the additional constraint, where I had the option of changing one endpoint of an edge. If I could get some help to solve this, I would really appreciate it.
If we move an edge (A, B), the new end should be either the start S or the target T vertex (otherwise, the answer is not optimal).
Let's assume that we move the edge (A, B) and the new end is T (the case when it's S is handled similarly). We need to know the shortest path from S to A that doesn't use this edge (once we know it, we can update the answer with the S->A->T path).
Let's compute the shortest path from S to all other vertices using Dijkstra's algorithm.
Let's fix a vertex A and compute the two minimums of dist[B] + weight(A, B) for all B adjacent to A. Let's iterate over all edges adjacent to A. Let the current edge be (A, B). If dist[B] + weight(A, B) is equal to the first minimum, let d be the second minimum. Otherwise, let d be the first minimum. We need to update the answer with d + weight(A, B) (it means that (A, B) becomes (A, T) now).
This solution is linear in the size of the graph (not counting the Dijkstra's algorithm run time).
To avoid code duplication, we can handle the case when the edge is redirected to S by swapping S and T and running the same algorithm (the final answer is the minimum of the results of these two runs).
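The steps above can be sketched roughly as follows. This is a sketch under assumptions rather than a definitive implementation: the graph is undirected with non-negative weights and is given as a dict mapping each vertex to a list of `(neighbour, weight)` pairs, the function names are invented for illustration, and the start vertex itself is treated as reachable at cost 0 (a case the description leaves implicit).

```python
import heapq
from math import inf

def dijkstra(adj, source):
    """Standard Dijkstra; adj maps vertex -> list of (neighbour, weight)."""
    dist = {v: inf for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in adj[v]:
            if d + weight < dist[w]:
                dist[w] = d + weight
                heapq.heappush(heap, (dist[w], w))
    return dist

def best_redirecting_to(adj, s, t):
    """Best s-t distance when at most one edge (a, b) is re-attached as (a, t)."""
    dist = dijkstra(adj, s)
    best = dist[t]  # option: move nothing
    for a, neighbours in adj.items():
        # The two smallest ways of arriving at a through one of its edges.
        two = heapq.nsmallest(2, (dist[b] + w for b, w in neighbours))
        first = two[0] if two else inf
        second = two[1] if len(two) > 1 else inf
        for b, w in neighbours:
            if a == s:
                d = 0  # assumption: the start needs no path to itself
            elif dist[b] + w == first:
                d = second  # this edge gave the first minimum, so use the other
            else:
                d = first
            best = min(best, d + w)  # reach a, then take the moved edge to t
    return best

def shortest_with_one_move(adj, s, t):
    # Second run with s and t swapped covers edges redirected to s.
    return min(best_redirecting_to(adj, s, t), best_redirecting_to(adj, t, s))
```

For a chain 0-1-2-3 with weights 10, 1, 10, the best move re-attaches one of the weight-10 end edges so that it joins 0 and 3 directly.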
In the graph you've shown, the shortest path I see is I -> E -> F -> M with a length of 13.
Moving the edge F -> M so that it connects L -> M just makes things worse. The new shortest path is I -> E-> F -> L -> M with a length of 18.
The obvious answer is to move edge F -> M so that it connects I directly to M, giving a length of 4.
In other words, find the shortest edge that's connected to I or M and use it to connect I directly to M.
For future reference, it's highly unlikely that you'll be asked to implement Dijkstra's algorithm from memory in an interview. So you need to look for something simpler.

Minimum Cut With the Fewest Edges Algorithm

Let G = (V, E) be a flow network
with source s, sink t, and capacity function c(·). Assume that, for every
edge e ∈ E, c(e) is an integer. Define the size of an s-t cut (A, B) in G
to be the number of edges directed from A to B. Our goal is to identify,
from among all minimum cuts in G, a minimum cut whose size is as small
as possible.
Let us define a new capacity function c'(·) for G as follows. For each
edge e ∈ E, let c'(e) = m·c(e)+1. Suppose (A, B) is a minimum
cut in G with respect to the capacity function c'(·).
(a) Show that (A, B) is a minimum cut with respect to the original capacity
function c(·).
(b) Show that, amongst all minimum cuts in G, (A, B) is a cut of smallest
size.
(c) Use the results of parts (a) and (b) to obtain a polynomial-time algorithm
to find a minimum cut of smallest size in a flow network.
How can I write a polynomial-time algorithm for this? Any ideas?
I won't spoil the answer, but I will leave a hint to any student who finds this post in the future. Consider what happens if you take two min-cuts (A,B) and (C,D) in G, such that the number of edges in one is minimal and the number of edges in the other is not. Then map them to G' and consider the value of these two cuts.
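For readers who have already worked through parts (a) and (b) and want to check part (c), here is a minimal sketch of the resulting procedure. It rests on assumptions not stated in the excerpt: m is taken to be the number of edges |E|, edges are given as `(u, v, capacity)` triples, and a plain Edmonds-Karp max-flow (BFS augmenting paths) is used as the polynomial-time subroutine; any other max-flow algorithm would do. The function name is made up for illustration.

```python
from collections import defaultdict, deque

def min_cut_fewest_edges(edges, s, t):
    """Return the edges (u, v, c) of a minimum s-t cut that has the fewest edges."""
    m = len(edges)  # assumption: m = |E|
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, c in edges:
        residual[u][v] += m * c + 1  # scaled capacity c'(e) = m*c(e) + 1
        residual[v][u] += 0          # make sure the reverse arc exists

    def reachable():
        """BFS over arcs with positive residual capacity; returns a parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck along the BFS path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

    # Vertices still reachable from s form the source side A of a min cut for c'.
    return [(u, v, c) for u, v, c in edges if u in parent and v not in parent]
```

In the test graph used to check this sketch, the single edge out of the source and the pair of edges behind it both have total capacity 2, and the function returns the single-edge cut.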
Look up Dijkstra's algorithm; it's usually used for shortest paths in a graph. I don't fully understand the algorithm you are trying to achieve, but I feel it is very similar and the thinking behind Dijkstra's could be used.

Single edge addition to minimize number of bridges in a graph

You are given a graph G with N vertices and M edges with N<=10^4 and M<=10^5. Now, you have to add exactly one edge (u,v) to the graph so that the total number of bridges is minimized. G may have multiple edges, but no self loops. On the other hand, the newly generated graph, after adding the edge, G', may have both self loops and multiple edges. If many such (u,v) with u<=v is possible then output the lexicographically smallest one (the vertices are numbered from 1..n).
A trivial idea would be to try all edges in order and then use the bridge-finding algorithm to count the bridges. This takes time O(V^2 * E), so it is clearly useless. How can I do better in terms of runtime?
EDIT: Following advice by j_random_hacker, I add the following details about the source of the above problem. This is a problem named Computer Network (specifically problem 3) from India's IOI Training Camp '14 Practice Test (Test 3). It was an onsite offline test, so I cannot prove that it is not from a present contest, by giving a link. But I have a PDF of the problem statement.
This is not a complete answer but some ideas to steer you towards it:
To avoid having to run the bridge-finding algorithm after trying each possible edge, it pays to ask: By how much can adding a single edge (u, v) change the number of bridges in a graph G?
If u and v are not already connected by any path in G, then certainly (u, v) will itself become a bridge. What about the "bridgeness" (bridgity? bridgulence?) of all other pairs of vertices? Does it change? (Most importantly: Can any edge go from being a bridge to being a non-bridge? If you can prove that this can never happen, then you can immediately discard all such vertex pairs (u, v) from consideration as they can only ever make the situation worse.)
If u and v are already connected in G, there are 2 possibilities:
Every path P that connects them shares some edge (x, y) (note that x and y are not necessarily distinct from u and v). Then (x, y) is a bridge in G, and adding (u, v) will cause (x, y) to stop being a bridge, because it will then become possible to get from x to y "the long way", by going from x back to u, via the new edge (u, v) to v, and then back up to y. (This assumes that x is closer to u on P than y is, but clearly the argument still works if y is closer: just swap u and v.) There could be multiple such bridges (x, y): in that case, all of them will become non-bridges after (u, v) is added.
There are at least 2 edge-disjoint paths P and Q already connecting u and v. Obviously no edge (x, y) on P or Q can be a bridge, since if (x, y) on P were deleted, it's still possible to get from x to y "the long way" via Q. The question is, again: What about the bridgeness of all other vertex pairs? You should be able to prove that this property doesn't change, meaning that adding the edge (u, v) leaves the total number of bridges unchanged, and can therefore be disregarded as a useless move (unless there are no bridges at all to start with).
We see that the first of these two possibilities (every path connecting u and v shares some edge) is the only case in which adding an edge (u, v) can be useful. Furthermore, it seems that the more bridges we can find in a single path in G, the more of them we can neutralise by choosing to connect the endpoints of that path.
So it seems like "Find the path in G that contains the most bridges" might be the right criterion. But first we need to ask ourselves: Does the number of bridges in a path P accurately count the number of bridges eliminated by adding an edge from the start of P to the end? (We know that adding such an edge must eliminate at least those bridges, but perhaps some others are also eliminated as a "side effect" -- and if so, then we need to count them somehow to make sure that we add the edge that eliminates the most bridges overall.)
Happily the answer is that no other bridges are eliminated. This time I'll do the proof myself.
Suppose that there is a path P from u to v, and suppose to the contrary that adding the edge (u, v) would eliminate a bridge (x, y) that is not on P. Then it must be that the single edge (x, y) is the only path from x to y, and that adding (u, v) would create a second path Q from x, via the edge (u, v) in either direction, to y that avoids the edge (x, y). But for any such Q, we could replace the edge (u, v) in Q with the path P, which from our initial assumption avoids (x, y), and still get a path Q' from x to y that avoids the edge (x, y) -- this means that (x, y) must have already been connected by two edge-disjoint paths (namely the single edge (x, y) and Q'), so it could not have been a bridge in the first place. Since this is a contradiction, it follows that no such "removed as a side effect" bridge (x, y) can exist.
So "Find the path in G that contains the most bridges, and add an edge between its endpoints" definitely gives the right answer -- but there is still a problem: this sounds a lot like the Longest Path problem, which is NP-hard for general graphs, and therefore slow to solve.
However, there is a way out. (There must be: you already have an O(V^2*E) algorithm, so it can't be that your problem is NP-hard :-) ) Think of the biconnected components in your input graph G as being vertices in another graph G'. What do the edges between these vertices (in G') correspond to in G? Do they have any particular structure? Final (big) hint: What is a critical path?
This answer is a spoiler. You should probably think along with j_random_hacker's answer instead.
If I understand your problem correctly:
Think of the graph as a tree of biconnected components. Find the longest path in this tree and link up its ends with the new edge.
There is a linear-time algorithm for finding biconnected components using depth first search. Finding the longest path in a tree takes linear time and can be done using depth-first search---make it do "find the farthest vertex and return both it and its distance" and use that. So this takes linear time overall.
(You can roll it all into a single depth-first search that returns the number of bridge edges in the bridgiest path and the farthest vertex in said bridgiest path.)
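A rough sketch of this approach, written as separate passes rather than the single combined search. Several things here are assumptions: vertices are numbered from 0, edges are `(u, v)` pairs with parallel edges allowed, the components used are those left after deleting the bridges (2-edge-connected components), the longest path is found with two breadth-first passes per tree, and the lexicographic tie-break from the question is not handled. All function names are invented for illustration.

```python
from collections import deque

def build_adj(n, edges):
    adj = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    return adj

def find_bridges(n, adj):
    """Iterative DFS with low-link values; returns the set of bridge edge ids."""
    disc, low, bridges, timer = [-1] * n, [0] * n, set(), 0
    for root in range(n):
        if disc[root] != -1:
            continue
        disc[root] = low[root] = timer
        timer += 1
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            v, parent_edge, it = stack[-1]
            descended = False
            for w, eid in it:
                if eid == parent_edge:
                    continue  # skip the tree edge itself, but not a parallel copy
                if disc[w] == -1:
                    disc[w] = low[w] = timer
                    timer += 1
                    stack.append((w, eid, iter(adj[w])))
                    descended = True
                    break
                low[v] = min(low[v], disc[w])
            if not descended:
                stack.pop()
                if stack:
                    p = stack[-1][0]
                    low[p] = min(low[p], low[v])
                    if low[v] > disc[p]:
                        bridges.add(parent_edge)
    return bridges

def farthest(start, tree):
    """BFS in the component tree; returns the last (farthest) node and all distances."""
    dist, queue, last = {start: 0}, deque([start]), start
    while queue:
        last = queue.popleft()
        for nxt in tree[last]:
            if nxt not in dist:
                dist[nxt] = dist[last] + 1
                queue.append(nxt)
    return last, dist

def edge_removing_most_bridges(n, edges):
    """Return (bridges_removed, u, v) for one best edge (u, v) to add."""
    adj = build_adj(n, edges)
    bridges = find_bridges(n, adj)
    comp, rep = [-1] * n, []  # component id per vertex, one representative each
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = len(rep)
        rep.append(s)
        stack = [s]
        while stack:
            v = stack.pop()
            for w, eid in adj[v]:
                if eid not in bridges and comp[w] == -1:
                    comp[w] = comp[s]
                    stack.append(w)
    tree = [[] for _ in rep]
    for eid in bridges:
        u, v = edges[eid]
        tree[comp[u]].append(comp[v])
        tree[comp[v]].append(comp[u])
    best, seen = (0, rep[0], rep[0]), set()
    for c in range(len(rep)):
        if c in seen:
            continue
        a, dist_c = farthest(c, tree)   # first pass: one end of a longest path
        b, dist_a = farthest(a, tree)   # second pass: the other end
        seen.update(dist_c)
        if dist_a[b] > best[0]:
            best = (dist_a[b], rep[a], rep[b])
    return best
```

Any vertex inside the two end components works as an endpoint, so a complete solution would pick the smallest-numbered vertices among all optimal choices.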

graph traversal - finding 2 shortest paths for 2 entities such they are never in contact (both at the same node)

I'm trying to figure out something like this.
In a graph, there are 2 start nodes and 2 destination nodes. Find the optimal paths from the 1st start to the 1st destination, and the other start to the other destination, such that if two entities were to travel along these paths, they would never be at the same node.
My first thought (although I really have no idea) would be to use any shortest path algorithm, let's say Dijkstra's. Run the algorithm once for the first entity, and store the node chosen for every step in an array. Run the algorithm a second time for the second entity, and if the chosen node for a step is the same as it was for the first entity at that array index, then they would collide, so choose the next best node instead. There must be a better way to do this.
Could use some suggestions. Thanks!
I think you might consider a dynamic programming approach. At iteration 0, let s_0 = {(origin_0, origin_1)}. For iteration k+1, let s_{k+1} = {(x,y) | x != y, there exists a (prev_x, prev_y) in s_k s.t. e(prev_x, x) in E and e(prev_y, y) in E}. This should proceed forward with |s| < V^2 for every s. So if the best case distance is d, you should be able to do this in d*V^2 time. Good luck on doing better!
Update: The above solution actually runs in d * E^2, as per the comments below. Note that it will converge within d = V^2 iterations, so the total time is (VE)^2. But more importantly, this algorithm is actually the same as just running Bellman-Ford on the product graph G' = (V', E') where V' = {(x,y) | x <- V, y <- V, x != y} and two nodes u = (x,y), v = (x',y') in V' share an edge if there is an edge e(x,x') and e(y,y') in the original edge set E. But now that we've defined our algorithm as Bellman-Ford on the product graph, we might as well run Dijkstra's algorithm! Note that the order of G' (number of vertices) is V^2 - V = O(V^2), and the number of edges in G' is O(E^2). Thus, running Dijkstra's using a Fibonacci heap will take us at most O(E^2 + V^2 log V), which for anything other than the sparsest of graphs will essentially be O(E^2)! That's a major improvement. If you actually want to run this, you can use a graph library to build the product graph, and then just call shortest paths from (x0, y0) to (xT, yT).
The memory cost of this algorithm is O(E^2) because that's what it takes to explicitly form the product graph. You really don't need to do that though: Dijkstra's only needs to keep the vertices' min cost in memory, and you could generate the edges on the fly to keep that down to O(V^2) memory. The code will be a lot uglier though, as you may have to roll Dijkstra's yourself. Also, if you're operating on a really big graph that you couldn't possibly precompute the product of, you might consider running iterative deepening (http://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search) on the product graph. If this problem is actually important for a real world use case of yours, feel free to ask more.
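A hand-rolled version with product-graph edges generated on the fly might look like the sketch below. The assumptions are mine, not the answer's: the graph is a dict mapping each node to a list of `(neighbour, weight)` pairs, the cost of a joint step is the sum of the two edge weights, both entities must move on every step (add zero-weight self-loops to allow waiting), and only sharing a node counts as a collision, so swapping places across one edge is not ruled out. The function name is made up for illustration.

```python
import heapq

def non_colliding_paths(adj, start, goal):
    """Dijkstra over pairs (x, y) with x != y, never building the product graph.

    start and goal are (node_of_first_entity, node_of_second_entity) pairs.
    Returns (total_cost, path_of_first, path_of_second), or None if no plan exists.
    """
    dist = {start: 0}
    parent = {start: None}
    heap = [(0, start)]
    while heap:
        d, state = heapq.heappop(heap)
        if d > dist[state]:
            continue  # stale heap entry
        if state == goal:
            states = []
            while state is not None:  # walk back to the start
                states.append(state)
                state = parent[state]
            states.reverse()
            return d, [s[0] for s in states], [s[1] for s in states]
        x, y = state
        # Product-graph neighbours: every pair of simultaneous moves ...
        for x2, wx in adj[x]:
            for y2, wy in adj[y]:
                if x2 == y2:
                    continue  # ... except those that put both on one node
                nxt = (x2, y2)
                nd = d + wx + wy
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    parent[nxt] = state
                    heapq.heappush(heap, (nd, nxt))
    return None
```

Only the distance and parent maps are stored, which is at most one entry per pair of distinct nodes.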
Here's what should be a quicker way. As above, run the two queries in parallel, but store two costs for each arc, one for each query. Initially both costs are the same for all arcs. When a query step reaches a node, set the costs for the other query, for all incoming arcs of that node, to infinity.

How hard is this in terms of computational complexity?

So I have a problem that is basically like this: I have a bunch of strings, and I want to construct a DAG such that every path corresponds to a string and vice versa. However, I have the freedom to permute my strings arbitrarily. The order of characters does not matter. The DAGs that I generate have a cost associated with them. Basically, the cost of a branch in the DAG is proportional to the length of its child paths.
For example, let's say I have the strings BAAA, CAAA, DAAA, and I construct a DAG representing them without permuting them. I get:
() -> (B, C, D) -> A -> A -> A
where the tuple represents branching.
A cheaper representation for my purposes would be:
() -> A -> A -> A -> (B, C, D)
The problem is: Given n strings, permute the strings such that the corresponding DAG has the cheapest cost, where the cost function is: If we traverse the graph from the source in depth first, left to right order, the total number of nodes we visit, with multiplicity.
So the cost of the first example is 12, because we must visit the A's multiple times on the traversal. The cost of the second example is 6, because we only visit the A's once before we deal with the branches.
I have a feeling this problem is NP Hard. It seems like a question about formal languages and I'm not familiar enough with those sorts of algorithms to figure out how I should go about the reduction. I don't need a complete answer per se, but if someone could point out a class of well known problems that seem related, I would much appreciate it.
To rephrase:
Given words w1, …, wn, compute permutations x1 of w1, …, xn of wn to minimize the size of the trie storing x1, …, xn.
Assuming an alphabet of unlimited size, this problem is NP-hard via a reduction from vertex cover. (I believe it might be fixed-parameter tractable in the size of the alphabet.) The reduction is easy: given a graph, let each vertex be its own letter and create a two-letter word for each edge.
There is exactly one node at depth zero, and as many nodes at depth two as there are edges. The possible sets of nodes at depth one are exactly the sets of nodes that are vertex covers.
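To make the correspondence concrete, here is a small brute-force sketch (exponential, for illustration only; the function names are made up). It counts trie nodes as distinct prefixes, then tries both letter orders for every edge word. For a simple graph, the smallest trie should have 1 + (minimum vertex cover size) + (number of edges) nodes.

```python
from itertools import product

def trie_size(words):
    """Number of trie nodes, counted as distinct prefixes including the empty root."""
    return len({w[:i] for w in words for i in range(len(w) + 1)})

def min_trie_size_for_graph(edges):
    """Try both letter orders for each two-letter edge word; keep the smallest trie."""
    best = None
    for flips in product([False, True], repeat=len(edges)):
        words = [(v, u) if flip else (u, v) for (u, v), flip in zip(edges, flips)]
        size = trie_size(words)
        if best is None or size < best:
            best = size
    return best
```

The same `trie_size` reproduces the costs in the question once the root is excluded: 12 for BAAA, CAAA, DAAA and 6 for AAAB, AAAC, AAAD.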
