Minimize sum of distances in point pairs - algorithm

I have a set of points on a 2-dimensional grid. I want to group the points into pairs while minimizing the sum of the Euclidean distances between the points of each pair.
Example:
Given the points:
p1: (1,1)
p2: (5,5)
p3: (1,3)
p4: (6,6)
Best solution:
pair1 = (p1,p3), distance = 2
pair2 = (p2,p4), distance = sqrt(2) ≈ 1.41
Minimized total distance: 2 + sqrt(2) ≈ 3.41
I suspect this problem might be solvable with a variant of the Hungarian Algorithm?!
What is the fastest way to solve the problem?
(Small remark: I should always have fewer than 12 points.)

The problem you are trying to solve is similar to the shortest path through a fully connected (mesh) network, where you are not allowed to visit each vertex/node more than once, and you don't care about connecting the minimal pairs.
This problem is approachable when using techniques from graph theory, metric spaces, and other results from computational geometry.
This problem is similar to the one in the Wikipedia article on the Closest pair of points problem, and the article offers some useful insights regarding Voronoi diagrams and Delaunay triangulation, as well as using recursive divide-and-conquer algorithms to solve the problem.
Note that solving the closest pair of points is not the solution, as you could have four points (A,B,C,D) in a line, where d(B,C) is least, but then you would also have d(A,D), and the sum would be larger than d(A,B) and d(C,D).
This stackoverflow question explains how to find the shortest distance between two points, and has a useful hint to skip computing the square root while comparing distances. Answers suggest using a divide and conquer approach (linear), but observe that splitting both X and Y coordinates might partition more appropriately.
This math stackexchange question addresses a similar problem, and suggests using Prim's algorithm, Kruskal's algorithm, or notes that this is a special case of the Travelling Salesman problem, which is NP-hard.
My approach would be to solve your problem (pairing the closest points) using a greedy algorithm to compute a minimal spanning tree, and then remove half of the edges from the spanning tree (leaving disconnected pairs), likely using a second greedy algorithm or a variant of the first.

There are so few pairings possible for 12 or fewer points (about 10000 or fewer, as pointed out in a comment) that you can check all pairings by brute force; even with this solution you can solve about 10000 problems per second with 12 or fewer points on a modern personal computer. If you want a faster solution, you can enumerate nearest neighbors in order for each point and then just check pairings that are minimal with respect to which nearest neighbors are used for each point. In the worst case I don't think this gives a speed-up, but if, for example, your 12 points come in 6 pairs of very close points (where unpaired points are far away), then you'd find the solution very quickly, because the minimal pairing with respect to nearest neighbors would match each point with its first nearest neighbor.
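As a rough sketch of the brute-force check described above (the function name min_pairing is made up for illustration, and an even number of points is assumed), one can always pair the first remaining point with each possible partner and recurse on the rest. For 12 points this visits 11 * 9 * 7 * 5 * 3 * 1 = 10395 pairings.

```python
import math

def min_pairing(points):
    """Return (total_distance, pairs) for the cheapest perfect pairing.

    Assumes len(points) is even; brute force over all pairings.
    """
    if not points:
        return 0.0, []
    first, rest = points[0], points[1:]
    best_total, best_pairs = math.inf, []
    # The first point must be paired with someone: try every partner.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = min_pairing(remaining)
        total = math.dist(first, partner) + sub_total
        if total < best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs

total, pairs = min_pairing([(1, 1), (5, 5), (1, 3), (6, 6)])
print(total, pairs)
```

On the four points from the question this pairs (1,1) with (1,3) and (5,5) with (6,6). Memoizing on the set of remaining points would avoid repeated work, but for fewer than 12 points that is hardly needed.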

Related

Minimal spanning tree with K extra node

Assume we're given a graph on a 2D-plane with n nodes and edge between each pair of nodes, having a weight equal to a euclidean distance. The initial problem is to find MST of this graph and it's quite clear how to solve that using Prim's or Kruskal's algorithm.
Now let's say we have k extra nodes, which we can place at any integer point on our 2D plane. The problem is to find locations for these nodes so that the new graph has the smallest possible MST; it is not necessary to use all of the extra nodes.
It is presumably not possible to find the exact solution in polynomial time, so the goal is to find the best approximate one (which can be found within 1 sec). Maybe you can come up with some hints on the most efficient way of going through possible solutions, or point to some articles where a similar problem is covered.
It is a very interesting problem that you are working on. You have many options to attack it. The best-known heuristics for such situations are genetic algorithms, particle swarm optimization, differential evolution, and many others of this kind.
What is nice about such heuristics is that you can limit their execution to a certain amount of time (let's say 1 second). If it were my task, I would try genetic algorithms first.
You could try a greedy algorithm: start with the longest edges in the MST, since these potentially give the largest savings.
Select the longest edge, then, at each of its endpoints, take the edges that are closest in angle to the chosen one, on each side.
From these, select the best Steiner point.
Fix the MST ...
Repeat until 1 sec is gone.
The challenge is what to do if one of the vertices is itself a Steiner point.

Minimum manhattan distance with certain blocked points

The minimum Manhattan distance between any two points in the Cartesian plane is the sum of the absolute differences of their respective X and Y coordinates. For example, if we have two points (X,Y) and (U,V), then the distance is ABS(X-U) + ABS(Y-V). Now, how should I determine the minimum distance between several pairs of points, moving only parallel to the coordinate axes, such that certain given points are not visited on the selected path? I need a very efficient algorithm, because the number of avoided points can range up to 10000, with the same range for the number of queries. The coordinates of the points would be less than 50000 in absolute value. I would be given the set of points to be avoided at the beginning, so I might use some offline algorithm and/or precomputation.
As an example, the Manhattan distance between (0,0) and (1,1) is 2 from either path (0,0)->(1,0)->(1,1) or (0,0)->(0,1)->(1,1). But, if we are given the condition that (1,0) and (0,1) cannot be visited, the minimum distance increases to 6. One such path would then be: (0,0)->(0,-1)->(1,-1)->(2,-1)->(2,0)->(2,1)->(1,1).
This problem can be solved by breadth-first search, which is the standard approach for shortest paths on an unweighted grid (a plain depth-first search finds a path, but not necessarily a shortest one). You can also use the A* algorithm, which may give better results in practice, but in theory (worst case) is no better than BFS.
The reason is that your problem reduces to solving a maze: you can have so many obstacles that the grid essentially becomes a maze, and graph searches such as BFS are the standard tools for solving mazes. See Maze Solving Algorithms (Wikipedia) for more information.
My final recommendation: use the A* algorithm and hope for the best.
You are not understanding the solutions here or we are not understanding the problem:
1) You have a cartesian plane. Therefore, every node has exactly 4 adjacent nodes, given by x+/-1, y+/-1 (ignoring the edges)
2) Do a BFS (or DFS, A*). All you can traverse is x/y +/- 1. Prestore your 10000 obstacles and just check on demand whether the node at x/y +/- 1 is visitable. You don't need a real graph object.
If it's too slow, you said you can do an offline calculation: an indexed obstacle lookup table of 10^10 cells at one bit each only requires 1.25 GB. Leave the algorithm running?
Where am I going wrong?
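For what it's worth, the BFS from points 1) and 2) can be sketched as below. The function name blocked_manhattan and the limit parameter are assumptions for illustration; limit is a bounding box that keeps the search finite when the goal is unreachable, and the obstacles are kept in a plain set instead of a graph object.

```python
from collections import deque

def blocked_manhattan(start, goal, blocked, limit=50000):
    """Shortest axis-parallel path length from start to goal avoiding blocked cells.

    Returns None if the goal cannot be reached inside the box |x|, |y| <= limit.
    """
    blocked = set(blocked)
    if start in blocked or goal in blocked:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return dist[(x, y)]
        # Every cell has exactly four neighbours: x +/- 1 and y +/- 1.
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if abs(nxt[0]) > limit or abs(nxt[1]) > limit:
                continue
            if nxt in blocked or nxt in dist:
                continue
            dist[nxt] = dist[(x, y)] + 1
            queue.append(nxt)
    return None

print(blocked_manhattan((0, 0), (1, 1), [(1, 0), (0, 1)], limit=5))
```

This reproduces the example from the question: with (1,0) and (0,1) blocked, the distance from (0,0) to (1,1) grows from 2 to 6. A single query can still touch a large part of the grid, so for 10000 queries some precomputation or A* is probably still worth considering.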

Best subsample in the Maxmin distance sense

I have a set of N points in a D-dimensional metric space. I want to select K of them in such a way that the smallest distance between any two points in the subset is the largest.
For instance, with N=4 and K=3 in 3D Euclidean space, the solution is the face of the tetrahedron having the longest short side.
Is there a classical way to achieve that ? Can it be solved exactly in polynomial time ?
I have googled as much as I could, but I have not figured out yet how to call this problem.
In my case N=50, K=10 and D=300 typically.
Clarification:
A brute force approach would be to try every combination of K points among the N and determine the closest pair in every subset. The solution is given by the subset whose closest pair is the farthest apart.
Done the trivial way, an O(K^2) process, to be repeated N! / K!(N-K)! times.
Hum, 10^2 50! / 10! 40! = 1027227817000
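To make the brute-force idea concrete, here is a small sketch (the name best_subsample is hypothetical, and K >= 2 is assumed). It is only practical for tiny inputs, as the count above shows.

```python
import math
from itertools import combinations

def best_subsample(points, k):
    """Return the k-subset of points whose closest pair is as far apart as possible."""
    def closest_pair(subset):
        # O(K^2) scan over all pairs inside one candidate subset.
        return min(math.dist(p, q) for p, q in combinations(subset, 2))
    # Try every one of the N! / (K! (N-K)!) subsets.
    return max(combinations(points, k), key=closest_pair)

pts = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
print(best_subsample(pts, 3))
```

math.dist works for any number of coordinates, so the same code applies unchanged to D = 300 as long as every point is a sequence of the same length.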
I think you might find papers on unit disk graphs informative but discouraging. For instance, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.3113&rep=rep1&type=pdf states that the maximum independent set problem on unit disk graphs is NP-complete, even if the disk representation is known. A unit disk graph is the graph you get by placing points in the plane and forming links between every pair of points at most a unit distance apart.
So I think that if you could solve your problem in polynomial time you could run it on a unit disk graph for different values of K until you find a value at which the smallest distance between two chosen points was just over one, and I think this would be a maximum independent set on the unit disk graph, which would be solving an NP-complete problem in polynomial time.
(Just about to jump on a bicycle so this is a bit rushed, but searching for papers on unit disk graphs might at least turn up some useful search terms)
Here's an attempt to explain it piece by piece:
Here is another attempt to relate the two problems.
For maximum independent set see http://en.wikipedia.org/wiki/Maximum_independent_set#Finding_maximum_independent_sets. A decision problem version of this is "Are there K vertices in this graph such that no two are joined by an edge?" If you can solve this you can certainly find a maximum independent set by finding the largest K by asking this question for different K and then finding the K nodes by asking the question on versions of the graph with one or more nodes deleted.
I state without proof that finding the maximum independent set in a unit disk graph is NP-complete. Another reference for this is http://web.sau.edu/lilliskevinm/wirelessbib/ClarkColbournJohnson.pdf.
A decision version of your problem is "Do there exist K points with distance at least D between any two points?" Again, you can solve this in polynomial time iff you can solve your original problem in polynomial time - play around until you find the largest D that gives answer yes, and then delete points and see what happens.
A unit disk graph has an edge exactly when the distance between two points is 1 or less. So if you could solve the decision version of your original problem you could solve the decision version of the unit disk graph problem just by setting D = 1 and solving your problem.
So I think I have constructed a series of links showing that if you could solve your problem you could solve an NP-complete problem by turning it into your problem, which causes me to think that your problem is hard.

Sorting points such that the minimal Euclidean distance between consecutive points would be maximized

Given a set of points in a 3D Cartesian space, I am looking for an algorithm that will sort these points, such that the minimal Euclidean distance between two consecutive points would be maximized.
It would also be beneficial if the algorithm tends to maximize the average Euclidean distance between consecutive points.
Edit:
I've crossposted on https://cstheory.stackexchange.com/ and got a good answer. See https://cstheory.stackexchange.com/questions/8609/sorting-points-such-that-the-minimal-euclidean-distance-between-consecutive-poin.
Here is a bound on the value of the solution (an upper bound on the minimum consecutive distance you can hope to achieve), which might serve as a building block for branch and bound or for a less reliable, incomplete search algorithm:
Sort the distances between the points and consider them in non-increasing order. Use http://en.wikipedia.org/wiki/Disjoint-set_data_structure to keep track of sets of points, merging two sets when connected by a link between two points. The length of the shortest distance you encounter up to the point when you merge all the points into one set is an upper bound to the minimum distance in a perfect solution, because a perfect solution also merges all the points into one. However your upper bound may be longer than the minimum distance for a perfect solution, because the links you are joining up will probably form a tree, not a path.
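A minimal sketch of that bound, assuming the points are given as coordinate tuples (the function name bottleneck_upper_bound is made up for illustration): the distances are processed in non-increasing order with a small disjoint-set structure, and the distance that performs the final merge is returned.

```python
import math
from itertools import combinations

def bottleneck_upper_bound(points):
    """Upper bound on the minimum distance between consecutive points in any ordering."""
    parent = list(range(len(points)))

    def find(i):
        # Find the set representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        ((math.dist(points[i], points[j]), i, j)
         for i, j in combinations(range(len(points)), 2)),
        reverse=True,
    )
    components = len(points)
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                # Last merge: d is the shortest distance used so far.
                return d
    return math.inf  # fewer than two points, nothing to bound

print(bottleneck_upper_bound([(0, 0, 0), (1, 0, 0), (3, 0, 0)]))
```

For the three collinear points in the example the bound is 2, and the ordering (1,0,0), (3,0,0), (0,0,0) actually achieves it; in general the merged links form a tree, so the bound need not be attainable by a path.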
You can model your problem as a graph: draw an edge between every pair of points, so that you have a complete graph. Your problem then becomes finding the longest path in this graph, which is NP-hard; see the Wikipedia article on the longest path problem.
In fact, I answered the second part of the problem (maximizing the average), which means maximizing the length of a path that visits every node of the graph. If you weight the edges as 1/distance, it becomes a travelling salesman problem (minimize the path length), which is NP-hard. For this case it may be useful to look at Metric TSP approximation.

Algorithm to find two points furthest away from each other

I'm looking for an algorithm to be used in a racing game I'm making. The map/level/track is randomly generated, so I need to find two locations, start and goal, that make use of most of the map.
The algorithm is to work inside a two dimensional space
From each point, one can only traverse to the next point in four directions; up, down, left, right
Points can only be either blocked or nonblocked, only nonblocked points can be traversed
Regarding the calculation of distance, it should not be the "bird path" for a lack of a better word. The path between A and B should be longer if there is a wall (or other blocking area) between them.
I'm unsure where to start; comments are very welcome, and proposed solutions are preferred in pseudocode.
Edit: Right. After looking through gs's code I gave it another shot. Instead of Python, this time I wrote it in C++. But still, even after reading up on Dijkstra's algorithm, the flood fill and Hosam Aly's solution, I fail to spot any crucial difference. My code still works, but not as fast as you seem to be getting yours to run. Full source is on pastie. The only interesting lines (I guess) are the Dijkstra variant itself on lines 78-118.
But speed is not the main issue here. I would really appreciate the help if someone would be kind enough to point out the differences in the algorithms.
In Hosam Aly's algorithm, is the only difference that he scans from the borders instead of every node?
In Dijkstra's you keep track of and overwrite the distance walked, but not in flood fill, but that's about it?
Assuming the map is rectangular, you can loop over all border points, and start a flood fill to find the most distant point from the starting point:
bestSolution = { start: (0,0), end: (0,0), distance: 0 };
for each point p on the border
    flood-fill all points in the map to find the most distant point
    if newDistance > bestSolution.distance
        bestSolution = { p, distantP, newDistance }
    end if
end loop
I guess this would be in O(n^2). If I am not mistaken, it's (L+W) * 2 * (L*W) * 4, where L is the length and W is the width of the map, (L+W) * 2 represents the number of border points over the perimeter, (L*W) is the number of points, and 4 is the assumption that flood-fill would access a point a maximum of 4 times (from all directions). Since n is equivalent to the number of points, this is equivalent to (L + W) * 8 * n, which should be better than O(n^2). (If the map is square, the order would be O(16n^1.5).)
Update: as per the comments, since the map is more of a maze (than one with simple obstacles as I was thinking initially), you could use the same logic as above, but checking all points in the map (as opposed to points on the border only). This should be in the order of O(4n^2), which is still better than both F-W and Dijkstra's.
Note: Flood filling is more suitable for this problem, since all vertices are directly connected through only 4 borders. A breadth first traversal of the map can yield results relatively quickly (in just O(n)). I am assuming that each point may be checked in the flood fill from each of its 4 neighbors, thus the coefficient in the formulas above.
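A possible Python rendering of the "check all points" variant, as a sketch only: the map is assumed to be a list of strings in which "." marks a nonblocked cell, and the names bfs_distances and farthest_pair are invented for this example.

```python
from collections import deque

def bfs_distances(grid, start):
    """Flood fill from start; returns {cell: walking distance} for reachable cells."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def farthest_pair(grid):
    """Return (distance, start, end) for the two cells with the longest shortest path."""
    best = (0, None, None)
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell != ".":
                continue
            dist = bfs_distances(grid, (r, c))
            end, d = max(dist.items(), key=lambda item: item[1])
            if best[1] is None or d > best[0]:
                best = (d, (r, c), end)
    return best

maze = ["..#",
        ".##",
        "..."]
print(farthest_pair(maze))
```

On this small maze the result is a distance of 5 between (0,1) and (2,2), because the only route leads around the wall through the left column. Restricting the outer loop to border cells gives the cheaper border-only version.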
Update 2: I am thankful for all the positive feedback I have received regarding this algorithm. Special thanks to #Georg for his review.
P.S. Any comments or corrections are welcome.
Follow up to the question about Floyd-Warshall or the simple algorithm of Hosam Aly:
I created a test program which can use both methods. Those are the files:
maze creator
find longest distance
In all test cases Floyd-Warshall was slower by a large margin, probably because the very limited number of edges helps the flood-fill approach.
These were the times; at each step the field size was quadrupled, and 3 out of 10 cells were obstacles.
Size       Hosam Aly   Floyd-Warshall
(10x10)    0m0.002s    0m0.007s
(20x20)    0m0.009s    0m0.307s
(40x40)    0m0.166s    0m22.052s
(80x80)    0m2.753s    -
(160x160)  0m48.028s   -
The time of Hosam Aly seems to be quadratic, therefore I'd recommend using that algorithm.
Also the memory consumption by Floyd-Warshall is n^2, clearly more than needed.
If you have any idea why Floyd-Warshall is so slow, please leave a comment or edit this post.
PS: I haven't written C or C++ in a long time, I hope I haven't made too many mistakes.
It sounds like what you want is the end points separated by the graph diameter. A fairly good and easy to compute approximation is to pick a random point, find the farthest point from that, and then find the farthest point from there. These last two points should be close to maximally separated.
For a rectangular maze, this means that two flood fills should get you a pretty good pair of starting and ending points.
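The two-flood-fill approximation could look roughly like this. It is a sketch under the assumption that the map is a list of strings with "." for nonblocked cells and that the seed is a nonblocked cell; the names bfs_distances and double_sweep are hypothetical.

```python
from collections import deque

def bfs_distances(grid, start):
    """Flood fill from start; returns {cell: walking distance} for reachable cells."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def double_sweep(grid, seed):
    """Approximate the most distant pair with two flood fills."""
    first = bfs_distances(grid, seed)
    a = max(first, key=first.get)      # farthest cell from the seed
    second = bfs_distances(grid, a)
    b = max(second, key=second.get)    # farthest cell from a
    return a, b, second[b]

maze = ["..#",
        ".##",
        "..."]
print(double_sweep(maze, (1, 0)))
```

The returned distance is never larger than the true maximum, so it can also serve as a quick lower bound. If the open cells form a tree (a maze without loops), two sweeps give the exact answer; with loops it may fall short.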
I deleted my original post recommending the Floyd-Warshall algorithm. :(
gs did a realistic benchmark and guess what, F-W is substantially slower than Hosam Aly's "flood fill" algorithm for typical map sizes! So even though F-W is a cool algorithm and much faster than Dijkstra's for dense graphs, I can't recommend it anymore for the OP's problem, which involves very sparse graphs (each vertex has only 4 edges).
For the record:
An efficient implementation of Dijkstra's algorithm takes O(E log V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth first search, which is O(V). This can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs [pdf].
The input is the adjacency matrix and the output is the all-pairs shortest-path distance matrix. The run-time is O(M(n)*log(n)) for n points where M(n) is the run-time of your matrix multiplication algorithm.
The paper also gives the method for computing the actual paths (in the same run-time) if you need this too.
Seidel's algorithm is cool because the run-time is independent of the number of edges, but we actually don't care here because our graph is sparse. However, this may still be a good choice (despite the slightly-worse-than n^2 run-time) if you want the all pairs distance matrix, and this might also be easier to implement and debug than floodfill on a maze.
Here is the pseudocode:
Let A be the nxn (0-1) adjacency matrix of an unweighted, undirected graph, G
All-Pairs-Distances(A)
    Z = A * A
    Let B be the nxn matrix s.t. b_ij = 1 iff i != j and (a_ij = 1 or z_ij > 0)
    if b_ij = 1 for all i != j return 2B - A //base case
    T = All-Pairs-Distances(B)
    X = T * A
    Let D be the nxn matrix s.t. d_ij = 2t_ij if x_ij >= t_ij * degree(j), otherwise d_ij = 2t_ij - 1
    return D
To get the pair of points with the greatest distance we just return argmax_ij(d_ij)
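A direct transcription of the pseudocode into plain Python might look like the following. It is a sketch, not a tuned implementation: the helper matmul is a naive cubic multiplication (so it does not reach the O(M(n)*log(n)) bound), the function names are made up, and the graph is assumed to be connected, since otherwise the recursion never reaches the base case.

```python
def matmul(a, b):
    """Naive product of two square integer matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_distances(a):
    """Distance matrix of a connected, unweighted, undirected graph from its adjacency matrix."""
    n = len(a)
    z = matmul(a, a)
    # b links i and j when they are at distance 1 or 2 in the graph of a.
    b = [[1 if i != j and (a[i][j] == 1 or z[i][j] > 0) else 0
          for j in range(n)] for i in range(n)]
    if all(b[i][j] == 1 for i in range(n) for j in range(n) if i != j):
        return [[2 * b[i][j] - a[i][j] for j in range(n)] for i in range(n)]  # base case
    t = all_pairs_distances(b)
    x = matmul(t, a)
    degree = [sum(row) for row in a]
    return [[2 * t[i][j] if x[i][j] >= t[i][j] * degree[j] else 2 * t[i][j] - 1
             for j in range(n)] for i in range(n)]

# Path graph 0 - 1 - 2 - 3
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
d = all_pairs_distances(path)
print(max((d[i][j], i, j) for i in range(4) for j in range(4)))
```

For the four-node path the largest entry is d[0][3] = 3, i.e. the two end nodes. For a maze, the adjacency matrix has one row per nonblocked cell, so the matrices grow quickly with the map size.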
Finished a Python mockup of the Dijkstra solution to the problem.
Code got a bit long so I posted it somewhere else: http://refactormycode.com/codes/717-dijkstra-to-find-two-points-furthest-away-from-each-other
In the size I set, it takes about 1.5 seconds to run the algorithm for one node. Running it for every node takes a few minutes.
It doesn't seem to work though: it always displays the top-left and bottom-right corners as the longest path, at 58 tiles. That is of course true when you don't have obstacles. But even after adding a couple of randomly placed ones, the program still finds that one to be the longest. Maybe it's still true; it's hard to test without more advanced shapes.
But maybe it can at least show my ambition.
Ok, "Hosam's algorithm" is a breadth first search with a preselection on the nodes.
Dijkstra's algorithm should NOT be applied here, because your edges don't have weights.
The difference is crucial, because if the weights of the edges vary, you need to keep a lot of options (alternate routes) open and check them with every step. This makes the algorithm more complex.
With the breadth first search, you simply explore all edges once in a way that guarantees that you find the shortest path to each node, i.e. by exploring the edges in the order you find them.
So basically the difference is Dijkstra's has to 'backtrack' and look at edges it has explored before to make sure it is following the shortest route, while the breadth first search always knows it is following the shortest route.
Also, in a maze the points on the outer border are not guaranteed to be part of the longest route.
For instance, if you have a maze in the shape of a giant spiral, but with the outer end going back to the middle, you could have two points one at the heart of the spiral and the other in the end of the spiral, both in the middle!
So, a good way to do this is to use a breadth first search from every point, but remove the starting point after a search (you already know all the routes to and from it).
Complexity of breadth first is O(n), where n = |V|+|E|. We do this once for every node in V, so it becomes O(n^2).
Your description sounds to me like a maze routing problem. Check out the Lee Algorithm. Books about place-and-route problems in VLSI design may help you - Sherwani's "Algorithms for VLSI Physical Design Automation" is good, and you may find VLSI Physical Design Automation by Sait and Youssef useful (and cheaper in its Google version...)
If your objects (points) do not move frequently you can perform such a calculation in a much shorter than O(n^3) time.
All you need is to break the space into large grids and pre-calculate the inter-grid distance. Then selecting point pairs that occupy most distant grids is a matter of simple table lookup. In the average case you will need to pair-wise check only a small set of objects.
This solution works if the distance metrics are continuous. Thus if, for example there are many barriers in the map (as in mazes), this method might fail.
