Minimal spanning tree with K extra node - algorithm

Assume we're given a graph on a 2D-plane with n nodes and edge between each pair of nodes, having a weight equal to a euclidean distance. The initial problem is to find MST of this graph and it's quite clear how to solve that using Prim's or Kruskal's algorithm.
Now let's say we have k extra nodes, which we can place in any integer point on our 2D-plane. The problem is to find locations for these nodes so as new graph has the smallest possible MST, if it is not necessary to use all of these extra nodes.
It is obviously impossible to find the exact solution (in poly-time), but the goal is to find the best approximate one (which can be found within 1 sec). Maybe you can come up with some hints of the most efficient way of going throw possible solutions, or provide with some articles, where the similar problem is covered.

It is very interesting problem which you are working on. You have many options to attack this problem. The best known heuristics in such situation are - Genetic Algorithms, Particle Swarm Optimization, Differential Evolution and many others of this kind.
What is nice for such kind of heuristics is that you can limit their execution to a certain amount of time (let say 1 second). If it was my task to do I would try first Genetic Algorithms.

You could try with a greedy algorithm, try the longest edges in the MST, potentially these could give the largest savings.
Select the longest edge, now get the potential edge from each vertex that are closed in angle to the chosen one, from each side.
from these select the best Steiner point.
Fix the MST ...
repeat until 1 sec is gone.
The challenge is what to do if one of the vertexes is itself a Steiner point.

Related

Is there an efficient algorithm for finding or approximating the shortest walk of a graph which must visit some subset of vertices of the graph?

The title is a mouth-full, but put simply, I have a large, undirected, incomplete graph, and I need to visit some subset of vertices in the (approximately) shortest time possible. Note that this isn't TSP, as I don't need to visit all vertices.
The naive approach would be to simply brute-force the solution by trying every possible walk which includes the required vertices, using A*, for example, to calculate the walks between required vertices. However, this is O(n!) where n is the number of required vertices. This is unfeasible for me, as n > 40 in my average case, and n ≈ 80 in my worst case.
Is there a more efficient algorithm for this, perhaps one that approximates the solution?
This question is similar to the question here, but differs in the fact that my graph is larger than the one in the linked question. There are several other similar questions, but none that I've seen exactly solve my specific problem.
If you allow visiting the same nodes several times, find the shortest path between each pair of mandatory vertices. Then you solve the TSP between the mandatory vertices, using the above shortest path costs. If you disallow multiple visits, the problem is much worse.
I am afraid you cannot escape the TSP.

Maximum weight matching (MWM) for a predefined number of nodes

I am given a weighted graph and want to find a set of edges such that every node is only incident to one edge and such that the sum of the selected edge weights is maximized. As far as I know this problem is generally referred to as maximum weight matching and there exist fast approximations for it: https://web.eecs.umich.edu/~pettie/papers/ApproxMWM-JACM.pdf
However, for my application it would be better if only a certain ratio of nodes is paired. It's more important for my application that the nodes that get paired have a high weight between them. Leaving some nodes unpaired is no big problem.
Currently, I sort the weights between nodes in descending order and always select the edge with the highest weight until I have paired a certain number of nodes. Of course I ensure that pairs of nodes are mutually exclusive. This is only a 1/2 approximation to the original problem and it's probably even worse for the modified problem.
Could you please suggest an algorithm for this issue or tell me how this problem is called?
Some thoughts.
The greedy algorithm for this problem is a 2-approximation. Imagine that, during the execution of the greedy algorithm, we keep score versus an optimal solution in the following manner. Every time we add an edge to the greedy solution, we delete the incident edges in the optimal solution, which I claim must have weight no greater than the greedy edge. If no edge would be deleted from the optimal solution, we delete the two heaviest edges instead, which also I claim must have weight no greater than the greedy edge. Since the greedy solution must have at least half as many edges as the optimal solution, I claim that there are no optimal edges left at the end, and hence greedy is a 2-approximation because we never deleted more than 2x weight with each greedy edge.
A complete 20,000-vertex graph is right on that line where I don't know whether integer programming would be a good idea. I think it's still worth a try because it's easy enough.
There are polynomial-time algorithms for computing the maximum-weight independent set in the intersection of two matroids (for this problem, the matching matroid and the uniform matroid whose bases have size equal to the size of the desired matching). I don't know if they would be practical.

Shortest path to connect n points

I have n points and I need to connect all of them minimizing the final distance. The image above represents an algorithm that in each node it connects to the nearest one but the final output might be really of.
I've been searching a lot, I know some pathfinding algos but unaware of one that solves exactly this case. I found a question on Math Stackexchange but the answer is not providing any algorithm - https://math.stackexchange.com/a/581844/156584.
Is there any algorithm that solves exactly this problem? Otherwise I can bruteforce it.
Edit: Some clarification regarding the result I'm expecting: each node can be connected to 2 other nodes, creating a continuous path (like taking a pen and without ever lifting it, connect the nodes minimizing the final distance). I don't want to create a cycle (that being the travelling salesman problem).
PS: this question can also be translated to "complete graph with n vertices, and wanting to choose the set of edges such that the graph is connected, but the sum of the edge weights is minimized"
This problem is known as the shortest Hamiltonian path problem and it is NP-hard. So if the number of points is small, you can use backtracking or dynamic programming to find an optimal solution. If the number of points is large, you can use heuristics and/or approximations to obtain a relatively good answer(it is not always possible to find the best one in this case, though).

Topological sort of cyclic graph with minimum number of violated edges

I am looking for a way to perform a topological sorting on a given directed unweighted graph, that contains cycles. The result should not only contain the ordering of vertices, but also the set of edges, that are violated by the given ordering. This set of edges shall be minimal.
As my input graph is potentially large, I cannot use an exponential time algorithm. If it's impossible to compute an optimal solution in polynomial time, what heuristic would be reasonable for the given problem?
Eades, Lin, and Smyth proposed A fast and effective heuristic for the feedback arc set problem. The original article is behind a paywall, but a free copy is available from here.
There’s an algorithm for topological sorting that builds the vertex order by selecting a vertex with no incoming arcs, recursing on the graph minus the vertex, and prepending that vertex to the order. (I’m describing the algorithm recursively, but you don’t have to implement it that way.) The Eades–Lin–Smyth algorithm looks also for vertices with no outgoing arcs and appends them. Of course, it can happen that all vertices have incoming and outgoing arcs. In this case, select the vertex with the highest differential between incoming and outgoing. There is undoubtedly room for experimentation here.
The algorithms with provable worst-case behavior are based on linear programming and graph cuts. These are neat, but the guarantees are less than ideal (log^2 n or log n log log n times as many arcs as needed), and I suspect that efficient implementations would be quite a project.
Inspired by Arnauds answer and other interesting topological sorting algorithms have I created the cyclic-toposort project and published it on github. cyclic_toposort does exactly what you desire in that it quickly sorts a directed cyclic graph providing the minimal amount of violating edges. It optionally also provides the maximum groupings of nodes that are on the same topological level (and can therefore be activated at the same time) if desired.
If the problem is still relevant to you then I would be happy if you check out my project and let me know what you think!
This project was useful to my own neural network topology research, so I had to create something like this anyway. I am answering your question this late in case anyone else stumbles upon this thread in search for the same question.

Finding X the lowest cost trees in graph

I have Graph with N nodes and edges with cost. (graph may be Complete but also can contain zero edges).
I want to find K trees in the graph (K < N) to ensure every node is visited and cost is the lowest possible.
Any recommendations what the best approach could be?
I tried to modify the problem to finding just single minimal spanning tree, but didn't succeeded.
Thank you for any hint!
EDIT
little detail, which can be significant. To cost is not related to crossing the edge. The cost is the price to BUILD such edge. Once edge is built, you can traverse it forward and backwards with no cost. The problem is not to "ride along all nodes", the problem is about "creating a net among all nodes". I am sorry for previous explanation
The story
Here is the story i have heard and trying to solve.
There is a city, without connection to electricity. Electrical company is able to connect just K houses with electricity. The other houses can be connected by dropping cables from already connected houses. But dropping this cable cost something. The goal is to choose which K houses will be connected directly to power plant and which houses will be connected with separate cables to ensure minimal cable cost and all houses coverage :)
As others have mentioned, this is NP hard. However, if you're willing to accept a good solution, you could use simulated annealing. For example, the traveling salesman problem is NP hard, yet near-optimal solutions can be found using simulated annealing, e.g. http://www.codeproject.com/Articles/26758/Simulated-Annealing-Solving-the-Travelling-Salesma
You are describing something like a cardinality constrained path cover. It's in the Traveling Salesman/ Vehicle routing family of problems and is NP-Hard. To create an algorithm you should ask
Are you only going to run it on small graphs.
Are you only going to run it on special cases of graphs which do have exact algorithms.
Can you live with a heuristic that solves the problem approximately.
Assume you can find a minimum spanning tree in O(V^2) using prim's algorithm.
For each vertex, find the minimum spanning tree with that vertex as the root.
This will be O(V^3) as you run the algorithm V times.
Sort these by total mass (sum of weights of their vertices) of the graph. This is O(V^2 lg V) which is consumed by the O(V^3) so essentially free in terms of order complexity.
Take the X least massive graphs - the roots of these are your "anchors" that are connected directly to the grid, as they are mostly likely to have the shortest paths. To determine which route it takes, you simply follow the path to root in each node in each tree and wire up whatever is the shortest. (This may be further optimized by sorting all paths to root and using only the shortest ones first. This will allow for optimizations on the next iterations. Finding path to root is O(V). Finding it for all V X times is O(V^2 * X). Because you would be doing this for every V, you're looking at O(V^3 * X). This is more than your biggest complexity, but I think the average case on these will be small, even if their worst case is large).
I cannot prove that this is the optimal solution. In fact, I am certain it is not. But when you consider an electrical grid of 100,000 homes, you can not consider (with any practical application) an NP hard solution. This gives you a very good solution in O(V^3 * X), which I imagine is going to give you a solution very close to optimal.
Looking at your story, I think that what you call a path can be a tree, which means that we don't have to worry about Hamiltonian circuits.
Looking at the proof of correctness of Prim's algorithm at http://en.wikipedia.org/wiki/Prim%27s_algorithm, consider taking a minimum spanning tree and removing the most expensive X-1 links. I think the proof there shows that the result has the same cost as the best possible answer to your problem: the only difference is that when you compare edges, you may find that the new edge join two separated components, but in this case you can maintain the number of separated components by removing an edge with cost at most that of the new edge.
So I think an answer for your problem is to take a minimum spanning tree and remove the X-1 most expensive links. This is certainly the case for X=1!
Here is attempt at solving this...
For X=1 I can calculate minimal spanning tree (MST) with Prim's algorithm from each node (this node is the only one connected to the grid) and select the one with the lowest overall cost
For X=2 I create extra node (Power plant node) beside my graph. I connect it with random node (eg. N0) by edge with cost of 0. I am now sure I have one power plant plug right (the random node will definitely be in one of the tree, so whole tree will be connected). Now the iterative part. I take other node (eg. N1) and again connected with PP with 0 cost edge. Now I calculate MST. Then repeat this process with replacing N1 with N2, N3 ...
So I will test every pair [N0, NX]. The lowest cost MST wins.
For X>2 is it really the same as for X=2, but I have to test connect to PP every (x-1)-tuple and calculate MST
with x^2 for MST I have complexity about (N over X-1) * x^2... Pretty complex, but I think it will give me THE OPTIMAL solution
what do you think?
edit by random node I mean random but FIXED node
attempt to visualize for x=2 (each description belongs to image above it)
Let this be our city, nodes A - F are houses, edges are candidates to future cables (each has some cost to build)
Just for image, this could be the solution
Let the green one be the power plant, this is how can look connection to one tree
But this different connection is really the same (connection to power plant(pp) cost the same, cables remains untouched). That is why we can set one of the nodes as fixed point of contact to the pp. We can be sure, that the node will be in one of the trees, and it does not matter where in the tree is.
So let this be our fixed situation with G as PP. Edge (B,G) with zero cost is added.
Now I am trying to connect second connection with PP (A,G, cost 0)
Now I calculate MST from the PP. Because red edges are the cheapest (the can actually have even negative cost), is it sure, that both of them will be in MST.
So when running MST I get something like this. Imagine detaching PP and two MINIMAL COST trees left. This is the best solution for A and B are the connections to PP. I store the cost and move on.
Now I do the same for B and C connections
I could get something like this, so compare cost to previous one and choose the better one.
This way I have to try all the connection pairs (B,A) (B,C) (B,D) (B,E) (B,F) and the cheapest one is the winner.
For X=3 I would just test other tuples with one fixed again. (A,B,C) (A,B,D) ... (A,C,D) ... (A,E,F)
I just came up with the easy solution as follows:
N - node count
C - direct connections to the grid
E - available edges
1, Sort all edges by cost
2, Repeat (N-C) times:
Take the cheapest edge available
Check if adding this edge will not caused circles in already added edge
If not, add this edge
3, That is all... You will end up with C disjoint sets of edges, connect every set to the grid
Sounds like the famous Traveling Salesman problem. The problem known to be NP-hard. Take a look at the Wikipedia as your starting point: http://en.wikipedia.org/wiki/Travelling_salesman_problem

Resources