Closest pair of points across a line - algorithm

I have two sets of 2D points, separated from each other by a line in the plane. I'd like to efficiently find the pair of points, consisting of one point from each set, with the minimum distance between them. There's a really convenient looking paper by Radu Litiu, Closest Pair for Two Separated Sets of Points, but it uses an L1 (Manhattan) distance metric instead of Euclidean distance.
Does anyone know of a similar algorithm which works with Euclidean distance?
I can just about see an extension of the standard divide & conquer closest pair algorithm -- divide the two sets by a median line perpendicular to the original splitting line, recurse on the two sides, then look for a closer pair consisting of one point from each side of the median. If the minimal distance from the recursive step is d, then the companion for a point on one side of the median must lie within a box of dimensions 2d*d. But unlike with the original algorithm, I can't see any way to bound the number of points within that box, so the algorithm as a whole just becomes O(m*n).
Any ideas?

Evgeny's answer works, but it's a lot of effort without library support: compute a full Voronoi diagram plus an additional sweep line algorithm. It's easier to enumerate, for each of the two sets, the points whose Voronoi cells intersect the separating line, in order, and then test the candidate pairs (one point from each enumeration) with a linear-time merge step.
To compute the needed fragment of the Voronoi diagram, assume that the x-axis is the separating line. Sort the points in the set by x-coordinate, discarding any point that lies farther from the separating line than some other point with the same x. Begin scanning the points in order of x-coordinate, pushing them onto a stack. Between pushes, if the stack has at least three points, say p, q, r, with r most recently pushed, test whether the perpendicular bisector of pq intersects the separating line after the perpendicular bisector of qr. If so, discard q, and repeat the test with the new top three. Crude ASCII art:
Case 1: retain q

    ------1-2-------------- separating line
         /  |
      p /   |
        \   |
         q------r

Case 2: discard q

    --2---1---------------- separating line
        \ /
     p   X   r
      \     /
         q

(The slanted and vertical strokes are the bisectors of pq and qr, 1 and 2 mark where they meet the separating line, and X is their crossing; in case 2 the cell of q never reaches the separating line.)
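Below is a minimal Python sketch of this scan, assuming the separating line is the x-axis and all points of the set lie strictly on one side of it; the helper names (bisector_hit, voronoi_frontier) are mine, not part of the original answer.

    def bisector_hit(p, q):
        """x-coordinate where the perpendicular bisector of p and q meets y = 0.

        A point (x, 0) is equidistant from p = (px, py) and q = (qx, qy) when
        (x - px)^2 + py^2 == (x - qx)^2 + qy^2, which solves to the value below.
        """
        (px, py), (qx, qy) = p, q
        # Requires px != qx; points sharing an x-coordinate are deduplicated first.
        return (qx * qx + qy * qy - px * px - py * py) / (2.0 * (qx - px))

    def voronoi_frontier(pts):
        """Return, in x order, the points whose Voronoi cells can touch y = 0."""
        # For each x keep only the point nearest the axis (smallest |y|).
        best = {}
        for x, y in pts:
            if x not in best or abs(y) < abs(best[x][1]):
                best[x] = (x, y)
        ordered = sorted(best.values())

        stack = []
        for r in ordered:
            # Pop q while its cell is hidden: the pq bisector reaches the axis
            # at or after the qr bisector does (case 2 in the picture above).
            while len(stack) >= 2:
                p, q = stack[-2], stack[-1]
                if bisector_hit(p, q) >= bisector_hit(q, r):
                    stack.pop()
                else:
                    break
            stack.append(r)
        return stack

Running this once per set and then merging the two ordered frontiers (testing the pairs whose cells face each other along the line) yields the candidate pairs for the closest cross-set distance.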

For each point of one set, find the closest point in the other set. While doing this, keep only the pair of points having the minimal distance between them. This reduces the given problem to another one: "for every point in set A, find the nearest neighbor in set B", which can be solved with a sweep line algorithm over (1) one set of points and (2) the Voronoi diagram of the other set.
The algorithm complexity is O((M+N) log M). Note that this algorithm does not use the fact that the two sets of points are separated from each other by a line.
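The Voronoi-plus-sweep structure is a substantial implementation. As a practical stand-in (a different technique from the one described, but giving the same nearest-neighbor answers), a KD-tree over set B queried once per point of A works well; this sketch assumes numpy and scipy are available.

    import numpy as np
    from scipy.spatial import cKDTree

    def closest_cross_pair(A, B):
        """Return (distance, index_in_A, index_in_B) of the closest pair with
        one point taken from each of the (n, 2) arrays A and B."""
        tree = cKDTree(B)
        dists, idx = tree.query(A)      # nearest B-point for every A-point
        i = int(np.argmin(dists))
        return float(dists[i]), i, int(idx[i])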

Well, what about this:
Determine on which side each point lies:
let P be your points (P0, ..., Pi, ..., Pn)
let A, B be the start and end points of the separating line
so: side(Pi) = signum of the 2D cross product (B-A) x (Pi-A)
This is based on the simple fact that the sign of this product depends on the orientation of the three points (see the triangle/polygon winding rule for more info).
Find the minimal distance over all pairs (Pi, Pj) where side(Pi) != side(Pj):
first compute the side of every point, which is O(N),
then loop through all Pi and, inside that loop,
loop through all Pj, searching for the minimal distance.
If the Pi and Pj groups are of approximately equal size, this is O((N/2)^2).
You can further optimize the search by 'sorting' the points Pi, Pj by their 'distance' from AB.
You can use a dot product to do that: this time, instead of (B-A),
use a vector perpendicular to it, say (C-A).
Discard every point Pi2 (and similarly every Pj2)
where ((B-A).(Pi1-A)) is close to ((B-A).(Pi2-A))
and |((C-A).(Pi1-A))| << |((C-A).(Pi2-A))|,
because that means Pi2 is behind Pi1 (farther from AB)
and close to the normal of AB passing near Pi1.
The complexity after this optimization depends strongly on the dataset;
it should be O(N + Ni*Nj), where Ni, Nj are the numbers of remaining points Pi, Pj.
You need 2N dot products and Ni*Nj distance comparisons (which do not need to be square-rooted).
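A minimal sketch of the side test and the basic O(Ni*Nj) pair scan described above (without the sorting/pruning optimization):

    from math import sqrt

    def side(A, B, P):
        # Sign of the 2D cross product (B - A) x (P - A): tells which side
        # of the line through A and B the point P lies on.
        return (B[0] - A[0]) * (P[1] - A[1]) - (B[1] - A[1]) * (P[0] - A[0])

    def min_cross_distance(A, B, points):
        left  = [p for p in points if side(A, B, p) > 0]
        right = [p for p in points if side(A, B, p) < 0]
        best = None
        for p in left:                        # O(Ni * Nj) pair scan
            for q in right:
                d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2   # no sqrt yet
                if best is None or d2 < best[0]:
                    best = (d2, p, q)
        return (sqrt(best[0]), best[1], best[2]) if best else None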

A typical approach to this problem is a sweep-line algorithm. Suppose you have a coordinate system that contains all the points and the line separating the two sets. Now imagine a line perpendicular to the separating line hopping from point to point in ascending order. For convenience, you may rotate and translate the point sets and the separating line so that the separating line coincides with the x-axis; the sweep line is then parallel to the y-axis.
While hopping from point to point with the sweep line, keep track of the shortest distance between two points from different sets. If the first few points are all from the same set, it's easy to find a rule that tells you which of them you have to remember until you hit the first point from the other set.
Suppose you have a total of N points. You will have to sort all points, in O(N log N). The sweep-line algorithm itself then runs in O(N).

(I'm not sure if this bears any similarity to David's idea... I only saw it now after I logged in to post my thoughts.) For the sake of argument, let's say we transposed everything so the dividing line is the x-axis, and sorted our points by x-coordinate. Assuming N is not too large, we can scan along the x-axis (that is, traverse our sorted list of a's and b's), keeping a record of the overall minimum distance and two lists of passed points. The current point in the scan is tested against the passed points from the other list, and we can stop as soon as we reach a passed point whose distance to (the scan's x-coordinate, 0) is greater than or equal to the overall minimum. In the example below, when reaching b2, we can stop testing at a2.
    scan ->
    Y
    |          a2
    |
    |   a1            a3
    X--------------------------
    |      b1        b3
    |            b2
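A minimal Python sketch of this scan, assuming the dividing line is the x-axis with a_pts above it and b_pts below. For a simple, provably safe stopping rule it walks backwards through the other list only while the x-gap alone could still beat the best distance found so far (a slightly more conservative cutoff than the one described above); as the question notes, the worst case remains O(m*n).

    from math import inf, dist

    def scan_closest_pair(a_pts, b_pts):
        events = sorted([(p, 0) for p in a_pts] + [(p, 1) for p in b_pts])
        passed = ([], [])         # passed[0]: a-points seen, passed[1]: b-points seen
        best, best_pair = inf, None
        for p, which in events:
            for q in reversed(passed[1 - which]):   # most recently passed first
                if p[0] - q[0] >= best:             # x-gap alone already too large
                    break
                d = dist(p, q)
                if d < best:
                    best, best_pair = d, (q, p) if which else (p, q)
            passed[which].append(p)
        return best, best_pair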

Related

Queries to figure out if point lies inside polygon

I have been given a strictly convex polygon of S sides and Q queries to process.
All polygon vertices and query points are given as (x, y) pairs. The vertices of the polygon are given in anti-clockwise order.
The aforementioned variables are limited such that 1<=S<=10^6 and 1<=Q<=10^5 and 1<=|x|,|y|<=10^9.
For each query I should output Yes if the given point lies inside the polygon; otherwise, No.
I tried an O(S) inclusion test (ray casting); it timed out on the bigger test cases and also didn't pass all of the preliminary ones.
Obviously my implementation didn't cover all the edge cases. I have since learned that there is a specific algorithm for this question which can answer each query in O(log S) using binary search, but I can't figure out how to implement it from the pseudocode (this is my first time doing computational geometry).
Could anyone provide me with an algorithm which covers all edge cases within the required time complexity, O(Q log S), or point me to a page or paper that implements it?
First, you can split your convex polygon into a left and a right chain, both starting at the topmost point and ending at the bottommost point. The points in both chains are then sorted by y-coordinate.
Assume the query point has coordinates (qx, qy). Using binary search, find the segment of the left chain and the segment of the right chain that intersect the line y = qy. If both segments exist and qx lies between the x-coordinates of the two segments' intersections with the line y = qy, the point is inside the polygon.
The complexity of each query is O(log(S)).
You can do a scan line algorithm.
You need to sort the Q query points by their x-coordinate.
Then find the polygon vertex with the lowest x and consider a line moving along the x-axis. You need to track the two chains of the polygon it currently crosses.
Then move along the polygon and the query set in ascending x-coordinate. For every query point you now just have to check whether it lies between the two chains you are tracking.
The complexity is O(Q log Q + S) if Q is not sorted and O(Q + S) if Q is already sorted.
There is no need to sort; a convex polygon is already sorted!
For a convex polygon, point location is quick and easy: split the polygon in two using the straight line through vertex 0 and vertex S/2. The signed area test tells you on which side the query point lies, and hence which half to keep (each half is again a convex polygon).
Continue recursively until S = 3, then compare against the supporting line of the third side.
O(log(S)) tests in total per query.
(A figure in the original answer illustrated this; its numbers showed the order of the splits.)
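A minimal sketch of this halving search, written as a binary search over the fan of triangles rooted at vertex 0 of the anti-clockwise polygon (an equivalent formulation of the recursion described above); boundary points are treated as inside here, and Python's arbitrary-precision integers handle the 10^9 coordinates safely.

    def cross(o, a, b):
        # Signed area (x2) of triangle o, a, b.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def inside_convex(poly, p):
        """True if p lies inside or on the boundary of the strictly convex
        polygon poly, whose vertices are listed in anti-clockwise order."""
        n = len(poly)
        # p must lie in the angular wedge between edges v0->v1 and v0->v[n-1].
        if cross(poly[0], poly[1], p) < 0 or cross(poly[0], poly[n - 1], p) > 0:
            return False
        # Binary search for the fan triangle (v0, v[lo], v[lo+1]) containing p.
        lo, hi = 1, n - 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if cross(poly[0], poly[mid], p) >= 0:
                lo = mid
            else:
                hi = mid
        # Inside iff p is to the left of (or on) the edge v[lo] -> v[lo+1].
        return cross(poly[lo], poly[lo + 1], p) >= 0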

Tangents range for all pairs of points in a box

Suppose I have a box with a lot of points. I need to be able to calculate the min and max angles over all lines which go through all possible pairs of the points. I can do it in O(n^2) time by just pairing every point with every other one, but is there a faster algorithm?
Taking the idea of the dual plane proposed by Evgeny Kluev, and my comment about finding the left-most intersection point, I'll try to give an equivalent direct solution without any dual space.
The solution is simple: sort your points by (x, y) lexicographically. Now draw a line through each two adjacent points in the sorted order. It can be proved that the minimal angle is achieved by one of these lines. In order to get maximal angle, you need to sort by (x, -y) lexicographically, and also check only adjacent pairs of points.
Let's prove the idea for the minimal angle. Consider the two points A and B which yield the minimal possible angle. Among all such pairs we can choose the one with the minimal difference of x-coordinates.
Suppose they have the same y. If there is no other point between them, then they are adjacent. If there are points between them, then clearly at least one of those is adjacent to A in our order, and all of them yield the same angle.
Now suppose there exists a point P with x-coordinate strictly between A and B, i.e. Ax < Px < Bx. If P lies on AB, then AP has the same angle but a smaller difference of x-coordinates, hence a contradiction. If P is not on AB, then either AP or PB gives a smaller angle, which is also a contradiction.
So A and B lie on two adjacent vertical lines, with no other points between these lines. If A and B are the only points on their vertical lines, then the pair AB is clearly adjacent in sorted order, QED. If there are several points on these lines, the minimal angle is obviously achieved by taking the highest point on the left vertical line (which must be A) and the lowest point on the right vertical line (which must be B). Since we sort points of equal x by y, these two points are also adjacent.
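A minimal sketch of this procedure, assuming the "angle" of a pair is the slope angle of the line through the two points, folded into the range [0, pi):

    from math import atan2, pi

    def extreme_angles(points):
        """(min_angle, max_angle) over lines through all pairs of points,
        checking only pairs adjacent in the two lexicographic orders."""
        def angle(p, q):
            a = atan2(q[1] - p[1], q[0] - p[0])
            return a if a >= 0 else a + pi      # fold into [0, pi)

        by_xy  = sorted(points)                                  # (x, y) order
        by_xny = sorted(points, key=lambda p: (p[0], -p[1]))     # (x, -y) order
        min_a = min(angle(p, q) for p, q in zip(by_xy,  by_xy[1:]))
        max_a = max(angle(p, q) for p, q in zip(by_xny, by_xny[1:]))
        return min_a, max_a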
Sort the points (or use a hash map) to find out whether there are any horizontal lines, i.e. pairs of points with equal y.
Then solve the problem in the dual plane, where you only need to find the leftmost and the rightmost intersection points. Use binary searches to find a pair of horizontal coordinates such that all intersection points lie between them. (You could quickly get approximate results just by continuing the binary searches from these coordinates.)
Then sort the lines according to their tangents (slopes) in the dual plane, and for pairs of adjacent lines in this sorted order find the intersections closest to those horizontal coordinates. This does not guarantee good complexity in the worst case (when some lines in the primal plane are almost horizontal), but in most cases the time complexity is determined by the sorting: O(N log N) + O(binary_search_complexity).

How to find the nearest point to a given query line from a set of points in O(log n) time without using the duality concept?

Assume that we are given a set S of n points and an arbitrary query line l. Do some preprocessing (other than duality) so that we can report the nearest (closest) point of S to l in O(log n) time (no restriction on space).
You say "no restriction on space", which implies no restriction on preprocessing time.
Consider the sorted abscissas of the sites after rotation by an arbitrary angle: the site closest to a vertical line is found by dichotomy after Lg(N) comparisons.
Now consider the continuous set of rotations: you can partition it into angle ranges such that the order of the sorted abscissas does not change.
So you will find all limiting angles by taking the sites in pairs, and store the angle value as well as the corresponding ordering of the rotated abscissas.
For a query, find the enclosing angle interval by a first binary search (among O(N²) angles), then the closest site by a search on the rotated abscissas (binary search among O(N) abscissas).
Done the straightforward way, this will require O(N³) storage.
Given that the ordering permutations for two consecutive angles just differ by a single swap, it is not unthinkable that O(N²) storage can be achieved by a suitable data structure.
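The per-angle lookup this scheme relies on is just a binary search over the sorted abscissas of the sites in the rotated frame; a minimal sketch (bisect plays the role of the "dichotomy"):

    from bisect import bisect_left

    def nearest_abscissa(sorted_xs, q):
        """Index of the value in sorted_xs closest to the query abscissa q."""
        i = bisect_left(sorted_xs, q)
        if i == 0:
            return 0
        if i == len(sorted_xs):
            return len(sorted_xs) - 1
        # Pick the nearer of the two neighbours around the insertion point.
        return i if sorted_xs[i] - q < q - sorted_xs[i - 1] else i - 1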
The distance from the line y = m*x + c to a point (xi, yi) is d = |yi - m*xi - c| / sqrt(1 + m^2).
We want to minimize F(xi, yi) = (yi - m*xi - c)^2.
These are quadratic expressions in (m, c).
The condition to satisfy is:
F(xi, yi) <= F(xj, yj) for all other points (xj, yj).
If you had a solver for this, you would get intervals of (m, c) where the conditions are satisfied.
You can then use an interval tree to find the interval in which a query line (m, c) lies.
You need to find the line which goes through I and is perpendicular to the query line, then solve the pair of equations of the two lines to get their intersection. That intersection is the point of the query line closest to I, but it is not necessarily in S. Let's call the intersection I'. If your elements in S are ordered by x, then you can simply do a binary search for the x in S closest to I'.x and return the point having that x.
You don't have to phrase it directly in terms of duality, but the key observation is that, for two points, the boundary between the lines closer to one point and the lines closer to the other is a set of lines satisfying certain inequalities on the slope and intercept. So if you use these inequalities, then in a sense you are treating the line as a "point" (a pair of numbers satisfying certain inequalities that determine the nearest point) no matter what you do. The only other option seems to be preprocessing so that you can quickly find the closest projection of your points onto an arbitrary given line (e.g. by computing a small number of projections and eliminating the rest from consideration), but that seems hard to make run in guaranteed O(log n) time per query line.
This ought to work:
Preprocess with a sweep line rotating through angles 0 to pi: project all the points onto the sweep line and record the sweep angle theta together with the parameters of the projected points, doing this once each time two projected points coincide, i.e. "cross over each other". By "parameters" I mean: pick any fixed origin A and record (p - A) dot [cos theta, sin theta] for every point p. There will be O(n^2) of these sweep-line records, so they can be searched by angle in O(log n) time. Given a query line Q, use binary search to find the two recorded sweep lines that bracket Q's perpendicular direction. The recorded projections have the same ordering of points as the projections onto Q's perpendicular. Now find the parameter of the point R that is the projection of Q onto its own perpendicular through A. Use one more binary search within the bracketing sweep-line records to find the point whose parameter is closest to R's. This is the closest point to Q.

How to query all points which lie on a line

Suppose I have a set of points,
and then I define a line L. How do I obtain the points that lie on L (labelled b, d, and f in my figure)?
Can this be solved using a kd-tree (with a slight modification)?
==EDIT==
How my program works:
A set of points is defined first.
L is defined later; it has nothing to do with the point set.
My only idea right now:
Get the middle point m of line L.
Based on point m, get all points within a radius of length(L)/2 using a KD-tree.
For every such point, test whether it lies on line L.
Perhaps I'll add a collinearity threshold for points that lie only approximately on the query line.
The running time of my approach will depend on the length of L: the longer the line, the bigger the query and the more points that need to be checked.
You can have logarithmic-time look-up. My algorithm achieves that at the cost of a giant memory usage (up to cubic in the number of points):
If you know the direction of the line in advance, you can achieve logarithmic-time lookup quite easily: let a*x + b*y = c be the equation of the line, then a / b describes the direction, and c describes the line position. For each a, b (except [0, 0]) and point, c is unique. Then sort the points according to their value of c into an index; when you get the line, do a search in this index.
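A minimal sketch of this fixed-direction case (the helper names are mine; exact integer arithmetic is assumed for a and b so the keys compare reliably):

    from bisect import bisect_left, bisect_right

    def build_index(points, a, b):
        """Index points by their key c = a*x + b*y for a fixed direction (a, b)."""
        pts = sorted(points, key=lambda p: a * p[0] + b * p[1])
        keys = [a * x + b * y for x, y in pts]
        return keys, pts

    def points_on_line(index, c):
        """All indexed points lying on the line a*x + b*y = c."""
        keys, pts = index
        lo, hi = bisect_left(keys, c), bisect_right(keys, c)
        return pts[lo:hi]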
If all your lines are horizontal or vertical, two indexes suffice: one for x and one for y. With four indexes you can look up lines at 45° as well. You don't need to get the direction exactly right: if you know the bounding region of all the points, you can scan every point in a strip parallel to the indexed direction that spans the query line within the bounding region.
The above paragraphs define "direction" as the ratio a / b. This yields infinite ratios, however. A better definition treats a "direction" as a pair (a, b) where at least one of a, b is non-zero, with two pairs (a1, b1), (a2, b2) defining the same direction iff a1 * b2 == b1 * a2. Then { (a / b, 1) for b non-zero, (1, 0) for b zero } is one particular way of describing the space of directions: take (1, 0) as the "direction at infinity" and order all other directions by their first component.
Be aware of floating point inaccuracies. Rational arithmetic is recommended. If you choose floating point arithmetic, be sure to use epsilon comparison when checking point-line incidence.
Algorithm 1: Just choose some value n, prepare n indexes, then choose one at query time. Unfortunately, the downside is obvious: the lookup is still a range sweep and thus linear, and the expected speedup drops as the direction gets further away from an indexed one. It also doesn't provide anything useful if the bounding region is much bigger than the region where most of the points are (you could search extremal points separately from the dense region, however).
The theoretical lookup speed is still linear.
In order to achieve logarithmic lookup this way, we would need an index for every possible direction, and we can't have infinitely many indexes. Fortunately, similar directions still produce similar indexes - indexes that differ by only a few swaps. If the directions are similar enough, they produce identical indexes. Thus, we can use the same index for an entire range of directions: only directions for which two different points lie on the same line can cause a change of index.
Algorithm 2 achieves the logarithmic lookup time at the cost of a huge index:
When preparing:
For each pair of points (A, B), determine the direction from A to B. Collect the directions into an ordered set, calling the set the set of significant directions.
Turn this set into a list and add the "direction at infinity" to both ends.
For each pair of consecutive significant directions, choose an arbitrary direction within that range and prepare an index of all points for that direction. Collect the indexes into a list. Do not store any specific key values in this index, only references to points.
Prepare an index over these indexes, where the direction is the key.
When looking up points by a line:
determine the line direction.
look up the right point index in the index of indexes. If the line direction falls at the boundary between two ranges, choose one arbitrarily. If not, you are guaranteed to find at most one point on the line.
Since there are only O(n^2) significant directions, there are O(n^2) ranges in this index. The lookup will take O(log n) time to find the right one.
look up the points in the index for this range, using the position with respect to the line direction as the key. This lookup will take O(log n) time.
Slight improvement can be obtained because the first and the last index are identical if the "direction at infinity" is not among the significant directions. Further improvements can be performed depending on what indexes are used. An array of indexes into an array of points is very compact, but if a binary search tree (such as a red-black tree or an AVL tree) is used for the index of points, you can do further improvements by merging subtrees identical by value to be identical by reference.
If the points are uniformly distributed, you could divide the plane into a Sqrt(n) x Sqrt(n) grid. Every grid cell then contains 1 point on average, which is a constant.
Every line intersects at most 2 * Sqrt(n) grid cells [right? Proof needed :)]. Inspecting those cells takes O(Sqrt(n)) time, because each cell contains a constant number of points on average (of course this does not hold if the points are distributed with some bias).
Compute the bounding box of all of your points
Divide that bounding box into a uniform grid of x by y cells
Store each of your point in the cell it belongs to
Now for each line you want to test, all you have to do is find the cells it intersects, and test the points in those cells with "distance to line = 0".
Of course, this is only efficient if you are going to test many lines against a given set of points.
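A rough sketch of this grid approach (all helper names are mine): bucket the points into a uniform grid, walk the cells a query segment crosses, and test only the points stored there. The cell walk below uses dense sampling for brevity; an exact supercover line traversal would be the rigorous choice.

    from collections import defaultdict
    from math import floor

    def build_grid(points, cell):
        grid = defaultdict(list)
        for x, y in points:
            grid[(floor(x / cell), floor(y / cell))].append((x, y))
        return grid

    def cells_on_segment(p, q, cell, samples_per_cell=4):
        """Approximate set of grid cells touched by the segment p-q."""
        dx, dy = q[0] - p[0], q[1] - p[1]
        steps = max(1, int(samples_per_cell * (abs(dx) + abs(dy)) / cell))
        return {(floor((p[0] + t * dx / steps) / cell),
                 floor((p[1] + t * dy / steps) / cell)) for t in range(steps + 1)}

    def points_near_segment(grid, p, q, cell, eps=1e-9):
        """Points (from cells along p-q) within distance eps of the line p-q."""
        seg_len = ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5
        hits = []
        for c in cells_on_segment(p, q, cell):
            for x, y in grid.get(c, []):
                # |cross product| / |p-q| is the distance from (x, y) to the line.
                cross = (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])
                if abs(cross) <= eps * seg_len:
                    hits.append((x, y))
        return hits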
You can try the following:
For each point, compute its distance to the line,
or, more simply, plug each point's coordinates into the line equation; if it holds (meaning 0 = 0), the point is on the line.
EDIT:
If you have many points, there is another way.
If you can sort the points, create 2 sorted lists:
1. sorted by x value
2. sorted by y value
Let's say your line starts at (x1,y1) and ends at (x2,y2).
It's easy to filter out all the points whose x value is not in [x1,x2] OR whose y value is not in [y1,y2].
If no points remain, there are no points on this line.
Now split the line in 2, so you have 2 lines; run the same process again - you can see where this is going.
Once you have a small enough number of points (a threshold for you to choose) - say 10 - check whether they are on the line in the usual way.
This also enables you to get "as near" as you need to the line, and to skip places where there are no relevant points.
If you have enough memory, then a Hough-transform-like approach is possible.
Fill an r-theta array with lists of matching points (not counters). Then for every query line find its r-theta representation and check the points in the list at the corresponding r-theta coordinates.
The subtle part is choosing the array resolution.

segment intersection in 3 dimension

We can solve the problem of segment intersection in 2D in O(n lg n) time: given a set of line segments, determine whether any two of them intersect. Now here's a problem from CLRS.
Ques. Professor Charon has a set of n sticks, which are lying on top of each other in some
configuration. Each stick is specified by its endpoints, and each endpoint is an ordered triple giving its (x, y, z) coordinates. No stick is vertical. He wishes to pick up all the sticks, one at a time, subject to the condition that he may pick up a stick only if there is no other stick on top of it.
a. Give a procedure that takes two sticks a and b and reports whether a is above, below,
or unrelated to b.
b. Describe an efficient algorithm that determines whether it is possible to pick up all the sticks, and if so, provides a legal sequence of stick pickups to do so.
I find this is an extension of segment intersection to 3D. In 2D, the sweep line moves along "y" and the status array is sorted according to the "x" coordinate. I think that in 3D the sweep line should move along the "z" dimension, but I'm not sure how to sort now, since I have to take care of both "x" and "y".
If we somehow figure that out, then I guess that if there is an intersection, it is not possible (for part (b)) to pick up all the sticks.
Am I going in the right direction?
It is possible to use segment intersection in 2D to solve this problem.
Checking whether one segment is above the other is the same as checking for their intersection in the XY projection and, if they intersect, comparing the Z coordinates of the intersection point on each segment. E.g. for segments a = ((0,0,0), (2,2,2)) and b = ((0,2,3), (2,0,5)), the projections onto XY are ((0,0), (2,2)) and ((0,2), (2,0)). The 2D intersection is (1,1); the Z value at (1,1) is 1 for a and 4 for b. That means b is above a.
So, use segment intersection in 2D to find which segments are related. To find the order in which to remove the segments, use topological sorting.
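A minimal sketch of the above/below test for two sticks (part (a)); parallel and degenerate projections are simply reported as "unrelated" here for brevity.

    def above_below(stick_a, stick_b):
        """'a above b', 'b above a', or 'unrelated' for sticks given as
        ((x, y, z), (x, y, z)) endpoint pairs."""
        (ax, ay, az), (bx, by, bz) = stick_a    # a runs (ax,ay) -> (bx,by) in XY
        (cx, cy, cz), (dx, dy, dz) = stick_b    # b runs (cx,cy) -> (dx,dy) in XY
        # Solve (ax,ay) + t*((bx,by)-(ax,ay)) == (cx,cy) + s*((dx,dy)-(cx,cy)).
        det = (bx - ax) * (cy - dy) - (by - ay) * (cx - dx)
        if det == 0:
            return 'unrelated'                  # parallel projections (simplified)
        t = ((cx - ax) * (cy - dy) - (cy - ay) * (cx - dx)) / det
        s = ((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / det
        if not (0 <= t <= 1 and 0 <= s <= 1):
            return 'unrelated'                  # projections do not cross
        za = az + t * (bz - az)                 # height of stick a at the crossing
        zb = cz + s * (dz - cz)                 # height of stick b at the crossing
        if za > zb:
            return 'a above b'
        if zb > za:
            return 'b above a'
        return 'unrelated'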
