Minimum distance between two sets of 2D points (x1,y1), (x2,y2) - algorithm

I have a problem as described in the title, and I am using MATLAB 2020.
I have two sets of 2D points, and I want to find the two points (one from each set)
that have the minimal distance among all such pairs, min(distance(pi,pj)).
I did some research (Google) and found this article:
"Optimal Algorithms for Computing the Minimum Distance Between Two Finite Planar Sets"
on this web page:
What is the fastest algorithm to calculate the minimum distance between two sets of points?
I tried to implement the algorithm in MATLAB, using code for the Gabriel graph (which I found on Google)
here:
http://matgeom.sourceforge.net/doc/api/matGeom/graphs/gabrielGraph.html
The problem is that when I run the code, which is supposed to be the algorithm, against a "brute force algorithm" (two loops), the brute force is always faster, no matter how many points I use, and it is way faster, which contradicts both my intuition and the article mentioned above.
When I checked the execution time of the code lines, I found that the line
dist = dist + (repmat(p1(:,i), [1 n2])-repmat(p2(:,i)', [n1 1])).^2;
in
minDistancePoints(p1, varargin)
is the "problem".
Any advice?
Thank you.
P.S.
Let
set1 = rand(100,2)
set2 = rand(100,2)
I want to find point1 in set1 and point2 in set2 that have the minimum distance among all the point pairs.

Using implicit expansion, we can compute all the possible combinations at once and then find the point in p1 that minimizes the sum of the distances:
p1 = [0 -1;
      2  3;
      8  8]
p2 = [1 1;
      2 3;
      3 5;
      3 3]
[~,closest_p1] = min(sum(sum((permute(p1,[3,2,1])-p2).^2,2).^0.5))
I add a dimension to p1 with permute(p1,[3,2,1]), so now we can compute all the combinations along this new third dimension.
closest_p1 gives the index of the point in p1 that minimizes the sum of the Euclidean distances to every point in p2. In this example closest_p1 = 2.
Notice also that the algorithm you use seems to compute all the possible combinations as well.
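If the goal is the single closest pair (one point from set1, one from set2) rather than the smallest summed distance, a brute-force sketch along the same implicit-expansion lines (R2016b or newer; variable names are mine) could be:
set1 = rand(100, 2);
set2 = rand(100, 2);
D = hypot(set1(:,1) - set2(:,1)', set1(:,2) - set2(:,2)');   % 100x100 pairwise distances
[dmin, k] = min(D(:));                                       % smallest pairwise distance
[i1, i2] = ind2sub(size(D), k);                              % i1 indexes set1, i2 indexes set2
closestPair = [set1(i1,:); set2(i2,:)];
With the Statistics and Machine Learning Toolbox, pdist2(set1, set2) builds the same pairwise distance matrix in one call.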

Related

How to convert the half-spaces that constitute a convex hull to a set of extreme points?

I have a convex set in a Euclidean space (3D, but would like answers for nD) that is characterized by a finite set of half-spaces (normal vector + point).
Is there a better algorithm to find the extreme points of the convex set, other than computing by brute force all points that are intersections of 3 (or, in general, n) half-space boundaries and eliminating those that are not extreme points?
The key term is vertex enumeration of a polytope P. The idea of the algorithm described below is to consider the dual polytope P*. Then the vertices of P correspond to the facets of P*. The facets of P* are efficiently computed with Qhull, and then it remains to find the vertices by solving the corresponding sub-systems of linear equations.
The algorithm is implemented in the BSD-licensed toolset Analyze N-dimensional Polyhedra in terms of Vertices or (In)Equalities for Matlab, authored by Matt J, specifically its component lcon2vert. However, for the purpose of reading the algorithm and re-implementing it in another language, it is easier to work with the older and simpler con2vert file by Michael Kleder, which Matt J's project builds on.
I'll explain what it does step by step. The individual Matlab commands (e.g., convhulln) are documented on MathWorks site, with references to underlying algorithms.
The input consists of a set of linear inequalities of the form Ax<=b, where A is a matrix and b is a column vector.
Step 1. Attempt to locate an interior point of the polytope
The first try is c = A\b, which is the least-squares solution of the overdetermined linear system Ax=b. If A*c<b holds componentwise, this is an interior point. Otherwise, multivariable minimization is attempted, with the objective function being the maximum of 0 and all entries of A*c-b. If this fails to find a point where A*c-b<0 holds, the program exits with "unable to find an interior point".
Step 2. Translate the polytope so that the origin is its interior point
This is done by b = b - A*c in Matlab. Since 0 is now an interior point, all entries of b are positive.
Step 3. Normalize so that the right hand side is 1
This is just the division of ith row of A by b(i), done by D = A ./ repmat(b,[1 size(A,2)]); in Matlab. From now on, only the matrix D is used. Note that the rows of D are the vertices of the dual polytope P* mentioned at the beginning.
Step 4. Check that the polytope P is bounded
The polytope P is unbounded if the vertices of its dual P* lie on the same side of some hyperplane through the origin. This is detected by using the built-in function convhulln that computes the volume of the convex hull of given points. The author checks whether appending zero row to matrix D increases the volume of the convex hull; if it does, the program exits with "Non-bounding constraints detected".
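A condensed MATLAB sketch of Steps 1-4 on a small example polytope (the constraints, tolerances and variable names below are mine, chosen to match the description above):
A = [ 1  0;  0  1; -1  0;  0 -1;  1  1];     % example constraints A*x <= b
b = [ 3;  2;  1;  1;  4];
% Step 1: find an interior point
c = A \ b;                                   % least-squares first try
if ~all(A*c < b)
    c = fminsearch(@(x) max([0; A*x - b]), c);
    assert(all(A*c < b), 'unable to find an interior point');
end
% Step 2: translate so that the origin is interior
b = b - A*c;
% Step 3: normalise the right-hand side to 1; rows of D are the vertices of P*
D = A ./ repmat(b, [1 size(A,2)]);
% Step 4: boundedness check -- appending the origin must not enlarge the hull of D
[~, vol1] = convhulln(D);
[~, vol2] = convhulln([D; zeros(1, size(D,2))]);
assert(vol2 <= vol1 + 1e-12, 'Non-bounding constraints detected');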
Step 5. Computation of vertices
This is the loop
for ix = 1:size(k,1)
    F = D(k(ix,:),:);
    G(ix,:) = F \ ones(size(F,1),1);
end
Here, the matrix k encodes the facets of the dual polytope P*, with each row listing the vertices of the facet. The matrix F is the submatrix of D consisting of the vertices of a facet of P*. Backslash invokes the linear solver, and finds a vertex of P.
Step 6: Clean-up
Since the polytope was translated at Step 2, this translation is undone with V = G + repmat(c',[size(G,1),1]);. The remaining two lines attempt to eliminate repeated vertices (not always successfully).
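Continuing the Step 1-4 sketch above (it reuses c and D from there), Steps 5 and 6 amount to roughly the following; the duplicate removal at the end is a crude stand-in for what con2vert actually does:
k = convhulln(D);                          % facets of the dual polytope P*
G = zeros(size(k,1), size(D,2));
for ix = 1:size(k,1)
    F = D(k(ix,:), :);                     % dual vertices spanning one facet
    G(ix,:) = F \ ones(size(F,1), 1);      % solve F*v = 1: a vertex of the translated P
end
V = G + repmat(c', [size(G,1), 1]);        % undo the Step 2 translation
V = unique(round(V, 10), 'rows');          % crude removal of repeated vertices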
I am the author of polco, a tool which implements the "double description method". The double description method is known to work well for many degenerate problems. It has been used to compute tens of millions of generators mostly for computational systems biology problems.
The tool is written in Java, runs in parallel on multicore CPUs, and supports various input and output formats, including text and Matlab files. You will find more information and publications about the software and the double description method via the given link to a university department of ETH Zurich.

Algorithm to validate free polygon vertices for a non-intersecting shape

I have been tasked with allowing a user to enter any number of arbitrary points on a canvas and link them together in the order that they were specified to draw a polygon.
However, each time the user tries to add a point I must validate whether the polygon can still be drawn without intersecting itself.
I have searched SO and found only this post which doesn't help me.
I need to form constraints every time a new point is added to the canvas, and check that the next point doesn't violate those constraints.
I've added some poorly drawn illustrations of what I'm trying to achieve below. It might help to define the coordinate system I'm using: point 1 is the origin (0,0), x is positive to the right, and y is positive towards the top.
Adding Point 3
The first two points have only the constraint that 1 != 2. Adding point 3, I have to make sure that it doesn't sit anywhere on the line that passes through both 1 and 2.
Adding Point 4
Now, having added point 3, point 4 is blocked out as illustrated below:
The Yellow areas are constrained by line 1-2 and the Green areas are constrained by the line 2-3.
In pretty unreadable markup (there's no MathJax or anything) I figured the constraints for 4 are:
Y_4 < ( (Y_2 - Y_1) / (X_2 - X_1) )*X_4 + Y_1
Y_3 < ( (Y_2 - Y_1) / (X_2 - X_1) )*X_3 ? Y_4 > ( (Y_3 - Y_2) / (X_3 - X_2) )*X_4 + Y_2 : Y_4 < ( (Y_3 - Y_2) / (X_3 - X_2) )*X_4 + Y_2
Y_4 =/= ( (Y_3 - Y_1) / (X_3 - X_1) )*X_4 + Y_1
Adding Point 5
Now adding on point 5 the constrained areas are:
And it's starting to get complicated to do.
I was wondering if there are any established algorithms for this kind of thing, or if there are general equations in terms of vertex n to generate the constraint equations. There could feasibly be tens, if not hundreds, of points, so figuring it out by brute force and hand-coding doesn't seem to be an option.
You can do it like that:
Add a new point.
Add two new edges adjacent to this point.
Check if there is a pair of intersecting edges by iterating over all pairs of edges and checking if they intersect or not.
This algorithm has O(n^2) time complexity per addition of a new point. If it is too slow, you can make it linear using the following observation:
If the polygon was valid before the new point was added, no existing edges could intersect. That's why there is no need to iterate over all pairs of edges. So you can iterate only over the pairs <new, any>, where new is a newly created edge and any is any edge of the polygon. There are 2 * n = O(n) such pairs (because adding one point yields only 2 new edges).
This algorithm has O(n) time complexity per point so it should be fast enough for tens or hundreds of points.
Checking if two edges intersect is simple: an edge is just a segment and checking if two segments intersect is a well-known problem.
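A minimal MATLAB-style sketch of that O(n) check (the function names are mine, and for brevity only the newly added chain edge is tested; the closing edge back to the first vertex can be checked the same way):
function ok = canAddPoint(pts, newPt)
% pts: n-by-2 existing vertices in insertion order; newPt: 1-by-2 candidate point.
    n = size(pts, 1);
    edges = [pts(1:n-1,:), pts(2:n,:)];            % existing chain edges, one per row
    ok = true;
    for i = 1:size(edges,1) - 1                    % skip the edge sharing the last vertex
        if segmentsCross(edges(i,1:2), edges(i,3:4), pts(n,:), newPt)
            ok = false;
            return;
        end
    end
end
function tf = segmentsCross(p1, p2, p3, p4)
% Proper (strict) segment intersection via orientation tests.
    d = @(a,b,c) (b(1)-a(1))*(c(2)-a(2)) - (b(2)-a(2))*(c(1)-a(1));
    tf = d(p1,p2,p3)*d(p1,p2,p4) < 0 && d(p3,p4,p1)*d(p3,p4,p2) < 0;
end
You would call canAddPoint(points, candidate) before committing each new point.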
What you want to achieve is called a polygon simplicity test. You will find relevant information in this article: "Orientation, simplicity, and inclusion test for planar polygons" by F. Feito, J. C. Torres and A. Ureña.

Find the point minimizing the distance from a set of N lines

Given Multiple (N) lines in 3d space, find the point minimizing the distance to all lines.
Given that the shortest distance between a line [aX + b] and a point [P] lies along the perpendicular from [P] to the line, I can express the total squared distance as the sum of the squared line distances, e.g. ([aX+b]1 - [P])^2 + ... + ([aX+b]n - [P])^2.
Since those perpendiculars are orthogonal to the lines, I can use the dot product to express [P] in terms of each line.
I have considered using least squares to estimate the point minimizing the distance; the problem is that standard least squares approximates the best-fitting line/curve for a given set of points, whereas what I need is the opposite: given a set of lines, estimate the best-fitting point.
How should this be approached?
From Wikipedia, we read that the squared distance between the line a'x + b = 0 and the point p is (a'p+b)^2 / (a'a). We can therefore see that the point minimizing the sum of squared distances is the solution of a weighted linear regression problem, with one observation for each line. The regression model has the following properties:
Sample data: a for each line a'x + b = 0
Sample outcome: -b for each line a'x + b = 0
Sample weight: 1/(a'a) for each line a'x + b = 0
You should be able to solve this problem with any standard statistical software.
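For example, in MATLAB, for lines written as a(1)*x + a(2)*y + b = 0 (the 2D case; the coefficients below are invented), this weighted regression can be solved with lscov:
lines_a = [1 1; 1 -1; 0 1];      % one row of coefficients a' per line
lines_b = [-2; 0; -1];           % offsets b, so each line is a'*x + b = 0
w = 1 ./ sum(lines_a.^2, 2);     % weights 1/(a'a)
p = lscov(lines_a, -lines_b, w)  % the point minimizing sum of (a'p+b)^2/(a'a)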
An approach:
form the equations giving the distance from the point to each line
these equations give you N distances
optimize the set of distances by the criterion you want (least squares, minimax, etc.)
This reduces to a simple optimization problem once you have the N equations. Of course, the difficulty of the last step depends heavily on the criterion you choose (least squares is simple, minimax not so simple).
One thing that might help you move forward is to find the simplest form of the equation giving the distance from a point to a line. Your thinking is correct in your #1, but you will need to think a bit more (or check "distance from a point to a line" with any search engine).
I have solved the same problem using hill climbing. Consider a single point and its 26 neighbours one step away from it (the points on a cube centred at the current point). If the distance from the current point is better than the distance from all the neighbours, divide the step by 2; otherwise make the neighbour with the best distance the new current point. Continue until the step is small enough.
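A rough MATLAB sketch of this hill-climbing scheme; representing each line by a point Q(i,:) and a unit direction U(i,:), and minimizing the sum of squared distances, are my assumptions:
Q = [0 0 0; 1 0 0];                                % a point on each example line
U = [0 0 1; 0 1 0];                                % unit direction of each line
distSq = @(p) sum(vecnorm(cross(U, p - Q, 2), 2, 2).^2);   % sum of squared distances
[dx, dy, dz] = ndgrid(-1:1, -1:1, -1:1);
offs = [dx(:) dy(:) dz(:)];
offs(all(offs == 0, 2), :) = [];                   % the 26 neighbour offsets
p = [0 0 0];
step = 1;
for iter = 1:10000
    if step < 1e-9, break; end
    cand = p + step * offs;                        % 26 candidates on a cube around p
    vals = arrayfun(@(k) distSq(cand(k,:)), (1:size(cand,1))');
    [best, k] = min(vals);
    if best < distSq(p)
        p = cand(k,:);                             % move to the best neighbour
    else
        step = step / 2;                           % no neighbour improves: shrink the step
    end
end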
The following is a solution using calculus. Writing each line as y = m_i*x + c_i, the objective is
F(x,y) = sum_i (y - m_i*x - c_i)^2 / (1 + m_i^2)
Using partial differentiation:
dF/dx = sum_i -2*m_i*(y - m_i*x - c_i) / (1 + m_i^2)
dF/dy = sum_i 2*(y - m_i*x - c_i) / (1 + m_i^2)
To minimize F(x,y), set
dF/dx = dF/dy = 0
Use gradient descent with a suitable learning rate (and random restarts) to find the minimum.
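A tiny gradient-descent sketch of the above in MATLAB (slopes, intercepts and learning rate are invented for illustration):
m = [0.5; -1; 2];  c = [1; 0; -2];         % example slopes and intercepts
w = 1 ./ (1 + m.^2);                       % the 1/(1+m_i^2) weights
x = 0; y = 0; lr = 0.05;
for iter = 1:5000
    r  = y - m*x - c;                      % residuals
    gx = sum(-2 * m .* r .* w);            % dF/dx
    gy = sum( 2 * r .* w);                 % dF/dy
    x = x - lr*gx;
    y = y - lr*gy;
end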
You can apply the following answer (which talks about finding the point that is closest to a set of planes) to this problem, since just as a plane can be defined by a point on the plane and a normal to the plane, a line can be defined by a point the line passes through and a "normal" vector orthogonal to the line:
https://math.stackexchange.com/a/3483313/365886
You can solve the resulting quadratic form by observing that the minimizer of 1/2 x^T A x - b^T x + c is x_min = A^{-1} b.
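For completeness, a sketch of assembling and solving that quadratic form when each line is given by a point Q(i,:) and a unit direction U(i,:) (my notation, not the linked answer's):
Q = [0 0 0; 1 0 0; 0 1 0];                 % example points, one on each line
U = [0 0 1; 0 1 0; 1 0 0];                 % corresponding unit directions
A = zeros(3);  b = zeros(3,1);
for i = 1:size(Q,1)
    P = eye(3) - U(i,:)' * U(i,:);         % projection orthogonal to line i
    A = A + P;                             % quadratic term
    b = b + P * Q(i,:)';                   % linear term
end
p = A \ b                                  % x_min = A^{-1} b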

Find Voronoi tessellation with area constraints

Let's say we want to Voronoi-partition a rectangular surface with N points.
The Voronoi tessellation results in N regions corresponding to the N points.
For each region, we calculate its area and divide it by the total area of the whole surface - call these numbers a1, ..., aN. Their sum equals unity.
Suppose now we have a preset list of N numbers, b1, ..., bN, their sum equaling unity.
How can one find a choice (any) of the coordinates of the N points for Voronoi partitioning, such that a1==b1, a2==b2, ..., aN==bN?
Edit:
After thinking about this a bit, maybe Voronoi partitioning isn't the best solution; the whole point is to come up with a random irregular division of the surface such that the N regions have the appropriate sizes. Voronoi seemed to me like the logical choice, but I may be mistaken.
I'd go for some genetic algorithm.
Here is the basic process:
1) Create 100 sets of random points that lie inside your rectangle.
2) For each set, compute the Voronoi diagram and the region areas.
3) For each set, evaluate how well it matches your preset weights (call this its score).
4) Sort the sets of points by score.
5) Dump the 50 worst sets.
6) Create 50 new sets out of the 50 remaining sets by mixing points and adding some random ones.
7) Jump to step 2 until you meet a stopping condition (score above a threshold, number of generations, time spent, etc.).
You will end up (hopefully) with a "somewhat appropriate" result.
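A rough sketch of how step 3 might score one candidate set, estimating the region areas by Monte Carlo sampling inside the unit rectangle rather than by exact Voronoi cell clipping (all names and sizes below are illustrative):
N = 6;
targets = ones(1, N) / N;                   % the preset b1..bN
seeds = rand(N, 2);                         % one candidate set of points
samples = rand(20000, 2);                   % uniform samples in the rectangle
d2 = (samples(:,1) - seeds(:,1)').^2 + (samples(:,2) - seeds(:,2)').^2;
[~, owner] = min(d2, [], 2);                % nearest seed = Voronoi region
areas = accumarray(owner, 1, [N 1])' / size(samples, 1);   % estimated a1..aN
score = sum((areas - targets).^2);          % lower is better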
If what you are looking for does not necessarily have to be a Voronoi tessellation, and could be a power diagram, there is a nice algorithm described in the following article:
F. Aurenhammer, F. Hoffmann, and B. Aronov, "Minkowski-type theorems and least-squares clustering," Algorithmica, 20:61-76 (1998).
Their version of the problem is as follows: given N points (p_i) in a polygon P, and a set of non-negative real numbers (a_i) summing to the area of P, find weights (w_i), such that the area of the intersection of the Power cell Pow_w(p_i) with P is exactly a_i. In Section 5 of the paper, they prove that this problem can be written as a convex optimization problem. To implement this approach, you need:
software to compute Power diagrams efficiently, such as CGAL and
software for convex optimization. I found that using quasi-Newton solvers such as L-BFGS gives very good result in practice.
I have some code on my webpage that does exactly this, under the name "quadratic optimal transport". However, this code is not very clean nor very well documented, so it might be just as fast to implement your own version of the algorithm. You can also look at my SGP 2011 paper on this topic, which is available on the same page, for a short description of the implementation of Aurenhammer, Hoffmann and Aronov's algorithm.
Assume coordinates where the rectangle is axis-aligned, with its left edge at x = 0, its right edge at x = 1, and its horizontal bisector at y = 0. Let B(0) = 0 and B(i) = b1 + ... + bi. Put the points at ((B(i-1) + B(i))/2, 0). That isn't right, though. We want the x coordinates to be xi such that bi = (x(i+1) - x(i-1)) / 2, replacing x(0) by 0 and x(n+1) by 1. This system is tridiagonal and should have an easy solution. Perhaps you don't want such a boring Voronoi diagram, though; it will be a bunch of vertical divisions.
For a more random-looking diagram, maybe something physics-inspired: drop points randomly, compute the Voronoi diagram, compute the area of each cell, make overweight cells attractive to the points of their neighbours and underweight cells repulsive, compute a small delta for each point, and repeat until equilibrium is reached.
The Voronoi tessellation can be computed when you compute the minimum spanning tree and remove the longest edges. Each center of a subtree of the MST is then a point of the Voronoi diagram. Thus the Voronoi diagram is a subset of the minimum spanning tree.

Finding distance to the closest point in a point cloud on a uniform grid

I have a 3D grid of size AxBxC with an equal distance, d, between the points of the grid. Given a number of points, what is the best way of finding the distance to the closest point for each grid point (every grid point should contain the distance to the closest point in the point cloud), given the assumptions below?
Assume that A, B and C are quite big in relation to d, giving a grid of maybe 500x500x500 and that there will be around 1 million points.
Also assume that if the distance to the nearest point exceeds a distance D, we do not care about the nearest-point distance, and it can safely be set to some large number (D is maybe 2 to 10 times d).
Since there will be a great number of grid points and points to search from, a simple exhaustive:
for each grid point:
    for each point:
        if distance between points < minDistance:
            minDistance = distance between points
is not a good alternative.
I was thinking of doing something along the lines of:
create a container of size A*B*C where each element holds a container of points
for each point:
    define indexX = round((point position x - grid min position x)/d)
    // same for y and z
    add the point to the correct index of the container
for each grid point:
    search the container of that grid point and find the closest point
    if no points in container and D > 0.5d:
        search the 26 container indices nearest to the grid point for a closest point
        ... continue with the next layer until a point is found or the distance to that layer
            is greater than D
Basically: put the points in buckets and do a radial search outwards until a point is found for each grid point. Is this a good way of solving the problem, or are there better/faster ways? A solution which is well suited to parallelisation is preferred.
Actually, I think I have a better way to go, as the number of grid points is much larger than the number of sample points. Let |Grid| = N and |Samples| = M; then the nearest-neighbour search algorithms will be something like O(N lg M), as you need to look up all N grid points, and each lookup is (best case) O(lg M).
Instead, loop over the sample points. Store for each grid point the closest sample point found so far. For each sample point, just check all grid points within distance D of the sample to see if the current sample is closer than any previously processed samples.
The running time is then O(N + (D/d)^3 M) which should be better when D/d is small.
Even when D/d is larger, you might still be OK if you can work out a cutoff strategy. For example, if we're checking a grid point at distance 5 from our sample, and that grid point is already marked as being at distance 1 from a previous sample, then all grid points "beyond" that grid point don't need to be checked, because the previous sample is guaranteed to be closer than the current sample we're processing. All you have to do (and I don't think it is easy, but it should be doable) is define what "beyond" means and figure out how to iterate through the grid so as to avoid doing any work for areas "beyond" such grid points.
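A MATLAB sketch of this sample-centric approach (grid size, spacing and cutoff below are invented; each sample only touches the grid points inside a cube of half-width D around it):
A = 100; B = 100; C = 100; d = 1; D = 4*d;         % illustrative grid and cutoff
samples = rand(1000, 3) .* ([A B C] - 1) * d;      % example sample points
nearDist = inf(A, B, C);                           % distance to the nearest sample
r = ceil(D / d);                                   % half-width of the local cube, in cells
for s = 1:size(samples, 1)
    p = samples(s, :);
    i0 = max(1, floor(p(1)/d) + 1 - r);  i1 = min(A, floor(p(1)/d) + 1 + r);
    j0 = max(1, floor(p(2)/d) + 1 - r);  j1 = min(B, floor(p(2)/d) + 1 + r);
    k0 = max(1, floor(p(3)/d) + 1 - r);  k1 = min(C, floor(p(3)/d) + 1 + r);
    [gi, gj, gk] = ndgrid(i0:i1, j0:j1, k0:k1);    % local block of grid indices
    dist = sqrt(((gi-1)*d - p(1)).^2 + ((gj-1)*d - p(2)).^2 + ((gk-1)*d - p(3)).^2);
    idx = sub2ind([A B C], gi, gj, gk);
    nearDist(idx) = min(nearDist(idx), dist);      % keep the closest sample seen so far
end
nearDist(nearDist > D) = inf;                      % "don't care" beyond the cutoff D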
Take a look at octrees. They're a data structure often used to partition 3d spaces efficiently in such a manner as to improve efficiency of lookups for objects that are near each other spatially.
You can build a nearest neighbor search structure (Wikipedia) on your sample points, then ask it for each of your grid points. There are a bunch of algorithms mentioned on the Wikipedia page. Perhaps octtrees, kd-trees, or R-trees would be appropriate.
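For example, with the Statistics and Machine Learning Toolbox, a k-d tree built over the sample points answers all the grid queries directly (sizes below are illustrative):
samples = rand(1000, 3);                       % sample points
[gx, gy, gz] = ndgrid(linspace(0, 1, 50));     % a 50x50x50 grid
gridPts = [gx(:) gy(:) gz(:)];
tree = KDTreeSearcher(samples);                % k-d tree on the samples
[~, dist] = knnsearch(tree, gridPts);          % nearest-sample distance per grid point
distGrid = reshape(dist, size(gx));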
One approach, which may or may not suit your application, is to recast your thinking and define each grid 'point' to be the centre of a cube which divides your space into cells. You then have a 3D array of such cells and store the points in the cells -- choose the most appropriate data structure. To use your own words, put the points in buckets in the first place.
I guess that you may be running some sort of large scale simulation and the approach I suggest is not unusual in such applications. At each time step (if I've guessed correctly) you have to recalculate the distance from the cell to the nearest point, and move points from cells to cells. This will parallelise very easily.
EDIT: Googling around for particle-particle and particle-particle particle-mesh may throw up some ideas for you.
A note on Keith Randall's method,
expanding shells or cubes around the startpoints:
One can expand in various orders. Here's some python-style pseudocode:
S = set of 1M startpoints
near = grid 500x500x500 -> nearest s in S
       initially s for s in S, else 0
for r in 1 .. D:
    for s in S:
        nnew = 0
        for p in shell of radius r around s:
            if near[p] == 0:
                near[p] = s
                nnew += 1
        if nnew == 0:
            remove s from S  # bonk, stop expanding from s
"Stop expanding from s early" is fine in 1d (bonk left, bonk right);
but 2d / 3d shells are irregular.
It's easier / faster to do whole cubes in one pass:
near = grid 500x500x500 -> { dist, nearest s in S }
       initially { 0, s } for s in S, else { infinity, 0 }
for s in S:
    for p in approximatecube of radius D around s:
        if |p - s| < near[p].dist:  # is s nearer?
            near[p] = { |p - s|, s }
Here "approximatecube" may be a full DxDxD cube,
or you could lop off the corners like (here 2d)
0 1 2 3 4
1 1 2 3 4
2 2 3 4 4
3 3 4 4
4 4 4
Also fwiw,
with erik's numbers, there are on average 500^3/1M ~ 2^7 ~ 5^3 empties
per sample point.
So I at first thought that 5x5x5 cubes around 1M sample points
would cover most of the whole grid.
Not so: roughly 1/e of the grid points stay empty (Poisson distribution).
