I have a set of X "data points" with x and y coordinates, and I want to assign them to an MxN grid such that each "grid point" is occupied exactly once. To simplify the question, assume here that the number of "data points" equals the number of "grid points".
My acceptance criterion is that the sum of squared distances between each "data point" and its assigned "grid point" is minimized.
Of course I can do this using a brute force method, but there are X! possibilities, i.e. about 5*10^8 when you have 12 "data points".
Is there an elegant algorithm that does this with less computational effort than O(n!)?
Just to visualize the problem I show an example with 6 "grid points" (A to F) in blue and "data points" (1 to 6) in red.
It is interesting to see that "3" is the nearest point to "B", but if it were assigned there, "1" would end up very far from its nearest remaining point. Intuitively, the human eye trivially assigns 2->A, 4->C, 6->E, 5->F, but makes the non-trivial assignments 1->B and 3->D. This is what I want to achieve programmatically.
There is already a question with the same name here, but it contains no discussion of an algorithm.
Your question is a variant of the balanced assignment problem, or equivalently, the problem of finding a minimal matching in a balanced, weighted bipartite graph. Specifically, your problem is actually a bit more restricted, since the edge weights correspond to spatial distance, so certain properties (e.g. the triangle inequality) hold.
Have a look at the Hungarian Algorithm for a polynomial-time solution.
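As a sanity check for small inputs, the minimization criterion can be spelled out with a brute-force sketch (O(n!), so only usable for tiny n); for real inputs, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time given the matrix of squared distances. A minimal stdlib-only sketch of the cost criterion:

```python
from itertools import permutations

def best_assignment(data_pts, grid_pts):
    """Brute-force minimizer of the sum of squared distances (O(n!)).
    Only for tiny n; useful as a correctness check for a
    polynomial-time (Hungarian-style) implementation."""
    def cost(perm):
        return sum((dx - gx) ** 2 + (dy - gy) ** 2
                   for (dx, dy), (gx, gy) in zip(data_pts, perm))
    best = min(permutations(grid_pts), key=cost)
    return list(best), cost(best)
```

For the 12-point case in the question this brute force is already borderline; the Hungarian Algorithm brings the effort down to O(n^3).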
Related
In hill climbing for 1 dimension, I try two neighbors - a small delta to the left and one to the right of my current point - and then keep the one that gives a higher value of the objective function. How do I extend this to an n-dimensional space? How does one define a neighbor in n dimensions? Do I have to try 2^n neighbors (a delta applied to each dimension)?
You don't need to compare each pair of neighbors; you need to compute a set of neighbors, e.g. on a circle (a sphere or hypersphere in higher dimensions) with radius delta, and then take the one with the highest value to "climb up". In any case you will discretize the neighborhood of your current solution and compute the score function for each neighbor. When you can differentiate your function, gradient ascent/descent based algorithms may solve your problem:
1) Compute the gradient (direction of steepest ascent)
2) Go a small step into the direction of the gradient
3) Stop if solution does not change
A common problem with these algorithms is that they often find only local maxima/minima. You can find a great overview of gradient descent/ascent algorithms here: http://sebastianruder.com/optimizing-gradient-descent/
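A minimal sketch of those three steps in pure Python, using a central-difference numerical gradient for the case where the analytic gradient is unavailable; the toy objective below is an assumption for illustration only:

```python
def grad_descent(f, x0, lr=0.1, tol=1e-8, max_iter=10000, h=1e-6):
    """Steps 1-3 above with a numerical gradient, so only 2*n
    function evaluations per iteration are needed in n dimensions."""
    x = list(x0)
    for _ in range(max_iter):
        # 1) estimate the gradient via central differences
        g = []
        for i in range(len(x)):
            xp, xm = x[:], x[:]
            xp[i] += h
            xm[i] -= h
            g.append((f(xp) - f(xm)) / (2 * h))
        # 2) take a small step against the gradient (descent)
        new = [xi - lr * gi for xi, gi in zip(x, g)]
        # 3) stop if the solution no longer changes
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# toy objective with its minimum at (1, -2)
f = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
```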
If you are using IEEE-754 floating-point numbers, then the obvious answer is something like (2^52*(log_2(delta)+1023))^(n-1)+1 neighbors if delta >= 2^(-1022) (more or less, depending on your search space), as that is the only way you can be certain that no neighboring solutions within a distance of delta are missed.
Even if you instead take a random fixed-size sample of all points within a given distance delta, let's say delta = .1, you would still have the problem that if the distance from the local optimum is .0001, the probability of finding an improvement in just one dimension is less than .0001/.1/2 = 0.05%, so you would need to take more and more random samples as you get closer to the local optimum (whose value you don't know).
Obviously hill climbing is not intended for the real number space or theoretical graph spaces with infinite degree. You should instead be using a global search algorithm.
One example of a multidimensional search algorithm which needs only O(n) neighbours instead of O(2^n) neighbours is the Torczon simplex method described in Multidirectional search: A direct search algorithm for parallel machines (1989). I chose this over the more widely known Nelder-Mead method because the Torczon simplex method has a convergence proof (convergence to a local optimum given some reasonable conditions).
Let's say we want to Voronoi-partition a rectangular surface with N points.
The Voronoi tessellation results in N regions corresponding to the N points.
For each region, we calculate its area and divide it by the total area of the whole surface - call these numbers a1, ..., aN. Their sum equals unity.
Suppose now we have a preset list of N numbers, b1, ..., bN, their sum equaling unity.
How can one find a choice (any) of the coordinates of the N points for Voronoi partitioning, such that a1==b1, a2==b2, ..., aN==bN?
Edit:
After thinking about this a bit, maybe Voronoi partitioning isn't the best solution; the whole point is to come up with a random, irregular division of the surface such that the N regions have appropriate sizes. Voronoi seemed to me like the logical choice, but I may be mistaken.
I'd go for some genetic algorithm.
Here is the basic process:
1) Create 100 sets of random points that lie in your rectangle.
2) For each set, compute the Voronoi diagram and the cell areas.
3) For each set, evaluate how well it matches your preset weights (call this its score).
4) Sort the sets of points by score.
5) Dump the 50 worst sets.
6) Create 50 new sets out of the 50 remaining ones by mixing points and adding some random ones.
7) Jump to step 2 until a condition is met (score above a threshold, number of iterations, time spent, etc.).
You will end up (hopefully) with a "somewhat appropriate" result.
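A hedged sketch of steps 1) to 7) in pure Python: to keep it self-contained, cell areas are approximated by assigning each cell of a coarse grid to its nearest site (a discrete stand-in for a true Voronoi area computation), and the population size, generation count, and mutation scheme are arbitrary choices:

```python
import random

def cell_areas(points, res=20):
    # approximate Voronoi cell areas on the unit square: assign each
    # grid cell to its nearest site and count
    counts = [0] * len(points)
    for i in range(res):
        for j in range(res):
            gx, gy = (i + 0.5) / res, (j + 0.5) / res
            k = min(range(len(points)),
                    key=lambda t: (points[t][0] - gx) ** 2 + (points[t][1] - gy) ** 2)
            counts[k] += 1
    return [c / (res * res) for c in counts]

def score(points, targets):
    # sum of squared deviations from the target area fractions
    return sum((a - b) ** 2 for a, b in zip(cell_areas(points), targets))

def evolve(targets, pop=20, gens=15, seed=1):
    rng = random.Random(seed)
    n = len(targets)
    sets = [[(rng.random(), rng.random()) for _ in range(n)]  # step 1
            for _ in range(pop)]
    for _ in range(gens):
        sets.sort(key=lambda s: score(s, targets))   # steps 2-4
        survivors = sets[:pop // 2]                  # step 5
        children = []
        for _ in range(pop - len(survivors)):        # step 6
            a, b = rng.sample(survivors, 2)
            child = [a[i] if rng.random() < 0.5 else b[i] for i in range(n)]
            child[rng.randrange(n)] = (rng.random(), rng.random())  # mutate
            children.append(child)
        sets = survivors + children                  # step 7: loop
    return min(sets, key=lambda s: score(s, targets))
```

In a real implementation you would replace cell_areas with exact areas from e.g. scipy.spatial.Voronoi, clipped to the rectangle.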
If what you are looking for does not necessarily have to be a Voronoi tessellation, and could be a Power diagram, there is a nice algorithm described in the following article:
F. Aurenhammer, F. Hoffmann, and B. Aronov, "Minkowski-type theorems and least-squares clustering," Algorithmica, 20:61-76 (1998).
Their version of the problem is as follows: given N points (p_i) in a polygon P, and a set of non-negative real numbers (a_i) summing to the area of P, find weights (w_i), such that the area of the intersection of the Power cell Pow_w(p_i) with P is exactly a_i. In Section 5 of the paper, they prove that this problem can be written as a convex optimization problem. To implement this approach, you need:
software to compute Power diagrams efficiently, such as CGAL, and
software for convex optimization. I found that using quasi-Newton solvers such as L-BFGS gives very good results in practice.
I have some code on my webpage that does exactly this, under the name "quadratic optimal transport". However, this code is neither very clean nor well documented, so it might be just as fast to implement your own version of the algorithm. You can also look at my SGP2011 paper on this topic, available on the same page, for a short description of the implementation of Aurenhammer, Hoffmann and Aronov's algorithm.
Assume coordinates where the rectangle is axis-aligned with its left edge at x = 0, its right edge at x = 1, and its horizontal bisector at y = 0. Let B(0) = 0 and B(i) = b1 + ... + bi. Simply putting points at ((B(i-1) + B(i))/2, 0) isn't right. Instead, we want the x coordinates to be xi such that bi = (x(i+1) - x(i-1)) / 2, replacing x(0) by 0 and x(n+1) by 1. This system is tridiagonal and should have an easy solution. Perhaps you don't want such a boring Voronoi diagram, though; it will be a bunch of vertical strips.
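A minimal sketch of the one-dimensional construction, assuming all sites lie on the horizontal bisector of the unit square: the Voronoi boundary between consecutive sites x_i and x_(i+1) is their midpoint, which must land at the cumulative sum m_i = b1 + ... + bi, giving the recurrence x_(i+1) = 2*m_i - x_i once a starting x_1 is chosen. The starting choice x_1 = b1/2 is an assumption that works for many inputs but is not guaranteed to keep every site inside its own strip:

```python
def strip_sites(b, x1=None):
    """Sites on the horizontal bisector of the unit square whose
    vertical Voronoi strips have area fractions b[0..n-1] (summing to 1).
    The boundary between strip i and i+1 is the midpoint of x_i and
    x_(i+1), which must equal m_i = b[0] + ... + b[i]:
        (x_i + x_(i+1)) / 2 = m_i   =>   x_(i+1) = 2*m_i - x_i
    """
    if x1 is None:
        x1 = b[0] / 2.0  # assumed starting choice; not guaranteed safe
    xs = [x1]
    m = 0.0
    for bi in b[:-1]:
        m += bi
        xs.append(2 * m - xs[-1])
    return xs
```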
For a more random-looking diagram, maybe something physics-inspired: drop points randomly, compute the Voronoi diagram and the area of each cell, make overweight cells attract the points of their neighbors and underweight cells repel them, compute a small delta for each point, and repeat until equilibrium is reached.
The Voronoi tessellation can be computed when you compute the minimum spanning tree and remove the longest edges. Each center of a subtree of the MST is then a point of the Voronoi diagram. Thus the Voronoi diagram is a subset of the minimum spanning tree.
I have an algorithm problem here. It is different from the normal Fermat Point problem.
Given a set of n points in the plane, I need to find which one can minimize the sum of distances to the rest of n-1 points.
Is there any algorithm you know of that runs in less than O(n^2)?
Thank you.
One solution is to assume the geometric median is close to the mean, and exhaustively calculate the sum of distances for a subset of points close to the mean. You can choose the k*log(n) points closest to the mean, where k is an arbitrarily chosen constant (complexity O(n log n)).
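A sketch of that heuristic in pure Python; the function names and the choice k = 3 are assumptions for illustration, and on adversarial inputs the candidate set near the mean can miss the true minimizer:

```python
from math import hypot, log2

def sum_dist(p, pts):
    # sum of Euclidean distances from p to all points
    return sum(hypot(p[0] - q[0], p[1] - q[1]) for q in pts)

def near_mean_candidate(pts, k=3):
    """Evaluate only the ~k*log2(n) points closest to the mean and
    return the one minimizing the sum of distances to all points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    cand = sorted(pts, key=lambda p: hypot(p[0] - mx, p[1] - my))
    cand = cand[:max(1, round(k * log2(max(n, 2))))]
    return min(cand, key=lambda p: sum_dist(p, pts))
```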
Another possible solution is the Delaunay triangulation, which can be computed in O(n log n) time. The triangulation yields a graph with one vertex for each point and edges satisfying the Delaunay property.
Once you have the triangulation, you can start at any point, compare its sum-of-distances with those of its neighbors, and keep moving iteratively. You can stop when the current point has a smaller sum-of-distances than all of its neighbors. Intuitively, this halts at the globally optimal point.
I think the underlying assumption here is that you have a dataset of points which you can easily bound, as many algorithms which would be "good enough" in practice may not be rigorous enough for theory and/or may not scale well for arbitrarily large solutions.
A very simple solution which is probably "good enough" is to sort the coordinates on the Y ordinate, then do a stable sort on the X ordinate.
Take the rectangle defined by the min(X,Y) and max(X,Y) values; this is O(1), as the values are at known locations in the sorted dataset.
Now, working from the center of your sorted dataset, find the coordinate values as close as possible to {Xctr = Xmin + (Xmax - Xmin) / 2, Yctr = Ymin + (Ymax - Ymin) / 2} -- complexity O(N), bounded by your minimization criterion, the distance being the familiar radius from {Xctr, Yctr}.
The worst case would be comparing your centroid to every other point, but once you move away from the middle points you will no longer improve on the best found so far and can terminate the search.
I'm creating a simple game and came up with this problem while designing the AI for it:
Given a set of N points inside a rectangle in Cartesian coordinates, I need to find the widest straight path through this rectangle. The path must be empty (i.e. not containing any point).
I wonder whether there are any efficient algorithms to solve this problem. Can you suggest any keywords/papers/anything related to it?
EDIT: The rectangle is always defined by the 4 points at its corners. I added an image for illustration; the path in the picture above is determined by the two red lines.
This is the widest empty corridor problem. Houle and Maciel gave an O(n^2)-time, O(n)-space algorithm in a 1988 tech report entitled "Finding the widest empty corridor through a set of points", which seems not to be available online. Fortunately, Janardan and Preparata describe this algorithm in Section 4 of their paper Widest-corridor problems, which is available.
Loop through all pairs of points. Construct a line l through the pair. (^1) On each side of l, either there are other points or there are not. If not, there is no path on that side of l. If there are, loop through those points, calculating the perpendicular distance d from l to each one, and record the minimum d. That is the widest path on that side of l. Continue looping through all pairs, comparing the widest path for each pair with the previous widest.
This algorithm can be considered naive and runs in O(n^3) time.
Edit: The above algorithm misses a case. At ^1 above, insert: "Construct two lines perpendicular to l, one through each point of the pair. If there is no third point between these lines, then record the distance d between the two points; this constitutes a path." Then continue the algorithm at ^1. With this additional case, the algorithm is still O(n^3).
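A direct O(n^3 log n) sketch of the same idea, using the observation that the optimal corridor runs parallel to the line through some pair of points: for each pair, project every point onto the pair's unit normal and take the largest gap between consecutive projections. Corridors clipped by the rectangle boundary are covered if the rectangle's four corners are included in the point set, as in the question:

```python
from math import hypot

def widest_corridor(points):
    """Width of the widest empty straight corridor through the points.
    Candidate directions are the directions of point pairs; for each,
    the largest gap between consecutive projections onto the pair's
    normal is the widest empty corridor parallel to that pair."""
    best = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            L = hypot(dx, dy)
            if L == 0:
                continue  # coincident points define no direction
            nx, ny = -dy / L, dx / L          # unit normal to the pair
            proj = sorted(p[0] * nx + p[1] * ny for p in points)
            for a, b in zip(proj, proj[1:]):
                best = max(best, b - a)
    return best
```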
Myself, I would start by looking at the Delaunay triangulation of the point set:
http://en.wikipedia.org/wiki/Delaunay_triangulation
There appear to be plenty of resources there on efficient algorithms to build this - Fortune's algorithm, at O(n log n), for starters.
My intuition tells me that your widest path will be defined by one of the edges in this graph (Namely, it would run perpendicular to the edge, and its width would be equal to the length of the edge). How to sort the edges, check the candidates and identify the widest path remains. I like this question, and I'm going to keep thinking about it. :)
EDIT 1: My intuition fails me! A simple equilateral triangle is a counter-example: the widest path is shorter than any of the edges in the triangulation. Still thinking...
EDIT 2: So, we need a black-box algorithm which, given two points in the set, finds the widest path through the point set which is bounded by those two points. (Visualize two parallel lines running through the two points; rotate them in harmony with each other until there are no points between them). Let's call the runtime of this algorithm 'R'.
Given such an algorithm, we can do the following:
Build the Delaunay triangulation of the point set : O(n log n)
Sort the edges by width : O(n log n)
Beginning with the largest edge and moving down, use the black-box algorithm to determine the widest path involving those two points, storing it as X : O(nR)
Stop when the edge being examined is shorter than the width of X.
Steps 1 and 2 are nice, but the O(nR) is kind of scary. If R turns out to be O(n), that's already O(n^2) for the whole algorithm. The nice thing is that, for a general set of random points, we would expect that we wouldn't have to go through all the edges.
Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)?
I have found some papers that state that this problem is at least as hard as the problem that asks to find three collinear points (a triangle with area 0). These papers describe an O(n^2) solution to this problem by reducing it to an instance of the 3-sum problem. I couldn't find any solution for what I'm interested in however. See this (look for General Position) for such a paper and more information on 3-sum.
There are O(n^2) algorithms for finding the minimum area triangle.
For instance you can find one here: http://www.cs.tufts.edu/comp/163/fall09/CG-lecture9-LA.pdf
If I understood that pdf correctly, the basic idea is as follows:
1. For each pair of points AB, find the point closest to the line through AB.
2. Construct a dual of the points so that lines <-> points: the line y = mx + c is mapped to the point (m, c).
3. In the dual, for a given point (which corresponds to a line through a pair in the original set), the line nearest to it vertically gives us the required point for step 1.
Apparently steps 2 and 3 can be done in O(n^2) time.
Also, I doubt the papers showed 3SUM-hardness by reducing to 3SUM; it should be the other way around.
There's an algorithm that finds the required area with complexity O(n^2*log(n)).
For each point Pi in the set, do the following (without loss of generality we can assume Pi is at the origin, or translate the points to make it so).
Then for any two points (x1,y1), (x2,y2) the triangle area is 0.5*|x1*y2 - x2*y1|, so we need to minimize that value. Instead of iterating through all pairs of remaining points (which would give O(n^3) complexity), we sort those points using the predicate x1*y2 < x2*y1. It is claimed that to find the triangle with minimal area we only need to check pairs of adjacent points in the sorted array.
So the complexity of this procedure is O(n log n) per point, and the whole algorithm runs in O(n^2*log(n)).
P.S. I can't quickly find a proof that this algorithm is correct :(; I hope to find it later and post it.
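A sketch of the procedure described above, assuming an atan2-based angular sort as the concrete realization of the x1*y2 < x2*y1 predicate, together with an O(n^3) brute force to cross-check it (the adjacency claim lacks a proof, as noted):

```python
from math import atan2
from itertools import combinations

def min_area_adjacent(points):
    """For each point taken as the origin, sort the remaining points
    by angle and evaluate only circularly adjacent pairs; the area of
    triangle (0, p1, p2) is 0.5*|x1*y2 - x2*y1|."""
    best = float('inf')
    for i, (px, py) in enumerate(points):
        rel = [(x - px, y - py) for j, (x, y) in enumerate(points) if j != i]
        rel.sort(key=lambda v: atan2(v[1], v[0]))  # angular order
        m = len(rel)
        for k in range(m):
            (x1, y1), (x2, y2) = rel[k], rel[(k + 1) % m]
            best = min(best, 0.5 * abs(x1 * y2 - x2 * y1))
    return best

def min_area_brute(points):
    # O(n^3) reference used to cross-check the adjacency claim
    return min(0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (c[0] - a[0]) * (b[1] - a[1]))
               for a, b, c in combinations(points, 3))
```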
The problem
Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)?
is better resolved in this paper: James King, A Survey of 3SUM-Hard Problems, 2004.