Suppose that n points are given. Call the set of lines determined by them S. Can you find a line that is not parallel to any of the lines in S in better than quadratic time?
Sort all n points by increasing abscissa. The largest slope over all pairs is attained between two consecutive points, since the slope between two non-consecutive points is a weighted average of the slopes of the consecutive pairs between them. So pick the largest slope among the n-1 consecutive possibilities and use any strictly larger slope.
The case of vertically aligned points is harmless: ignore these infinite slopes and keep the largest among the finite ones.
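As a concrete sketch in Python (the function name and the (x, y)-tuple representation are my own assumptions):

```python
def non_parallel_slope(points):
    """Return a slope not parallel to any line determined by the points.

    Sketch of the idea above: after sorting by abscissa, the largest
    finite slope over all pairs occurs between two consecutive points,
    so any strictly larger slope works.
    """
    pts = sorted(points)                  # sort by x, then y: O(n log n)
    best = float("-inf")
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if x2 != x1:                      # ignore vertical pairs (infinite slope)
            best = max(best, (y2 - y1) / (x2 - x1))
    # If every point shares the same x, all determined lines are vertical,
    # so any finite slope (say 0) does the job.
    return best + 1.0 if best != float("-inf") else 0.0
```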
I'm new to coding and today I completed the trivial solution for the Closest-Pair problem in 2-D space (two nested for loops).
However, I gave up on finding any solution that could do it in O(n log n). Even after researching it, I still don't understand how this can be faster than the trivial method.
What I understand:
-> At first we split the array into two halves and sort everything considering only the X coordinates. This can be done in O(n log n).
Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)?
In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.
There is a solution in 1-D which absolutely makes sense to me: after sorting, we know that the distance between two non-adjacent points can't be lower than the distance between at least two adjacent ones. However, this is not true in 2-D space, since the additional Y coordinate could make the lowest distance occur between two points that are not adjacent on the X axis.
First of all, heed the advice of user #Evg - this answer cannot substitute for a comprehensive description and mathematically rigorous analysis of the algorithm.
However, here are some ideas to get the intuition started:
(Recursion structure)
The question states:
Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)? In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.
The recursion, however, does not stop at level 1. Assume for the sake of the argument that some O(n log n) algorithm works: finding closest pairs among N/2 points by applying that very algorithm takes O((N/2) log (N/2)) - not O((N/2)^2).
(Consequences of finding a closest pair in one half)
If you have found a closest pair (p, q) in the 'left' half of the point set, this pair's distance sets an upper bound to the width of a corridor around the halving line from which a closer pair (r, s) with r from the left, s from the right half can be drawn. If the closest distance found so far is 'small', it significantly reduces the size of the candidate set. As the points have been ordered by their x coordinate, the algorithm can exploit the information efficiently.
Said corridor may still cover up to the whole set of N points, but if it does, it provides information about the geometry of the point set: the points of each half will basically be aligned along a vertical line. This information can be exploited algorithmically - the most naive way would be to execute the algorithm once again, but sorting along y coordinates and halving the point set by a horizontal line. Note that executing any algorithm a constant number of times does not change the asymptotic run time expressed by the O(.) notation.
(Finding a close pair with one point from each half)
Consider checking a pair of points (r, s), one point from each half. It is known that the difference in their x and y coordinates, resp., mustn't exceed the minimal distance d found so far. It is known from the recursion that there can be no points r', s' (r' from the left, s' from the right half) closer to r, s, resp., than d. So given some r there cannot be 'many' candidates from the other half.
Imagine a circle of radius d drawn around r. Any point s from the other half being closer than d must be located within that circle. Let there be a few of them - however, the minimum distance among each pair must still be at least d. The maximum number of points that can be placed within a circle of radius d such that the distance between each pair of them is at least d is 7 - think of a regular hexagon with side length d, its vertices plus its center coinciding with the circle's center.
So after the recursion, every r from the left half needs to be checked against at most a constant number of points from the other half, which makes the part of the algorithm after the recursion run in O(N).
Note that finding the pairing candidates for a given r is an efficient operation - the points from both halves have been sorted by the same criterion.
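Putting the pieces together, here is a simplified Python sketch of the scheme discussed above, assuming points are (x, y) tuples. For brevity the corridor ("strip") is re-sorted by y at every level, which costs O(n log^2 n) overall; the textbook O(n log n) version instead maintains y-sorted lists during the merge:

```python
import math

def closest_pair(points):
    """Divide & conquer closest pair; returns the smallest pairwise
    Euclidean distance (assumes at least two points)."""
    def solve(px):                       # px is sorted by x
        n = len(px)
        if n <= 3:                       # base case: brute force
            return min(math.dist(a, b)
                       for i, a in enumerate(px) for b in px[i + 1:])
        mid = n // 2
        x_mid = px[mid][0]
        d = min(solve(px[:mid]), solve(px[mid:]))     # recurse on halves
        # corridor of half-width d around the dividing vertical line
        strip = sorted((p for p in px if abs(p[0] - x_mid) < d),
                       key=lambda p: p[1])
        for i, p in enumerate(strip):
            # only a constant number of y-successors can be closer than d
            for q in strip[i + 1:i + 8]:
                if q[1] - p[1] >= d:
                    break
                d = min(d, math.dist(p, q))
        return d

    return solve(sorted(points))
```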
Given a group of n points spread over 3 horizontal lines (y=0, y=1, y=2),
devise an O(n^2) algorithm to find whether there is a crossing line, i.e. a line passing through one of the given points on each of the three horizontal lines.
I believe this algorithm is O(n^2). First we assume:
The x-coordinates of the points on the line y=2 can be represented by a finite number of bits.
That these x-coordinates are initially sorted. If not, sorting them first is O(n log n).
The definition of the crossing line is made with respect to the precision of representing the points with the finite number of bits.
Now, construct a hash function as follows:
Find the minimum distance between adjoining points on the line y=2. Since the x-coordinates are sorted, this is O(n). Call this min distance d.
The hash function is floor(2*x/d). Since any two points on y=2 differ in x by at least d, this hash maps each point on y=2 to a unique bin. This operation is also O(n).
Then, do the following:
For each pair of points on the lines y=0 and y=1, compute the intersection of the line through them with y=2. This is O(n^2).
For each intersection point on y=2, look up the hash table to see whether there is a point from the line y=2 there. If there is, check whether it is the same point (within machine precision); if so, there is a crossing line. Otherwise, there is not. This is O(n^2) since each hash lookup is O(1).
Therefore, the algorithm is O(n^2). No code, just ideas.
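A Python sketch of these ideas (the names are mine; it assumes at least two points on y=2 and exact hits - a near-hit lying right on a bin boundary would additionally require checking the neighbouring bin):

```python
from math import floor

def has_crossing_line(xs0, xs1, xs2, eps=1e-9):
    """xs0, xs1, xs2: x-coordinates of the points on y=0, y=1, y=2.
    xs2 must be sorted and contain at least two points."""
    # minimum gap between adjoining points on y=2: O(n)
    d = min(b - a for a, b in zip(xs2, xs2[1:]))
    # floor(2x/d) maps each y=2 point to its own bin, since any two
    # of them differ by at least d, i.e. by at least 2 in units of d/2
    table = {floor(2 * x / d): x for x in xs2}
    for x0 in xs0:                        # O(n^2) pairs
        for x1 in xs1:
            t = 2 * x1 - x0               # where the line (x0,0)-(x1,1) meets y=2
            hit = table.get(floor(2 * t / d))
            if hit is not None and abs(hit - t) <= eps:
                return True
    return False
```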
My problem is: we have N points in a 2D space, each point with a positive weight. Given a query consisting of two real numbers a, b and one integer k, find the position of an a x b rectangle, with edges parallel to the axes, such that the sum of the weights of the top-k points (i.e. the k points with the highest weights) covered by the rectangle is maximized.
Any suggestion is appreciated.
P.S.:
There are two related problems, which are already well-studied:
Maximum region sum: find the rectangle with the highest total weight sum. Complexity: O(N log N).
Top-k query for orthogonal ranges: find the top-k points in a given rectangle. Complexity: O(log^2 N + k).
You can reduce this problem to finding two points in the rectangle: the rightmost one and the topmost one. So effectively you can select every pair of points and calculate the top-k weight (which according to you is O(log^2 N + k)). Complexity: O(N^2 * (log^2 N + k)).
Now, a given pair of points might not form a valid pair: they might be too far apart, or one point may be both to the right of and above the other. So, in practice, this will be much faster.
My guess is the optimal solution will be a variation of maximum region sum problem. Could you point to a link describing that algorithm?
A non-optimal answer is the following:
Generate all the possible k-plets of points (there are N × (N-1) × … × (N-k+1) of them, so this is O(N^k) and can be done via recursion).
Filter this list down by eliminating all k-plets which are not enclosed in an a×b rectangle: this is O(k N^k) at worst.
Find the k-plet which has the maximum weight: this is O(k N^(k-1)) at worst.
Thus, this algorithm is O(k N^k).
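A brute-force Python sketch of steps 1-3 combined (the names are my own; points are (x, y, weight) triples, and "enclosed in an a×b rectangle" is tested via the bounding box of the k-plet):

```python
from itertools import combinations

def best_k_group(points, a, b, k):
    """Maximum total weight of k points whose bounding box fits in a x b.
    Returns None if no k points fit; overall cost is O(k * N^k)."""
    best = None
    for group in combinations(points, k):          # all k-plets: O(N^k)
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        # step 2: keep only k-plets enclosed in an a x b rectangle
        if max(xs) - min(xs) <= a and max(ys) - min(ys) <= b:
            w = sum(p[2] for p in group)           # step 3: maximize weight
            if best is None or w > best:
                best = w
    return best
```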
Improving the algorithm
Step 2 can be integrated into step 1 by stopping the branch recursion when a set of points is already too large. This does not change the need to scan the elements at least once, but it can reduce the number significantly: think of cases where there are no solutions because all points are separated by more than the size of the rectangle; that can be detected in O(N^2).
Also, the permutation generator in step 1 can be made to return the points in order by x or y coordinate, by pre-sorting the point array correspondingly. This is useful because it lets us discard many possibilities up front. Suppose the array is sorted by y coordinate, so the k-plets returned will be ordered by y coordinate. Now, if we are discarding a branch because it contains a point whose y coordinate falls outside the maximal rectangle, we can also discard all the following sibling branches, because their y coordinates will be greater than or equal to the current one, which is already out of bounds.
This adds O(n log n) for the sort, but the improvement can be quite significant in many cases -- again, when there are many outliers. The sort coordinate should be the one corresponding to the minimum rectangle side divided by the corresponding side of the 2D field -- by which I mean the maximum coordinate minus the minimum coordinate over all points.
Finally, if all the points lie within an a×b rectangle, then the algorithm performs as O(k N^k) anyway. If this is a concrete possibility, it should be checked (an easy O(N) loop); if so, it's enough to return the k points with the top weights, which is also O(N).
I wanted to know an efficient algorithm to match (partition into n/2 distinct pairs) n=2k points in general position in the plane in such a way that the segments joining the matched points do not cross. Any idea would help out immensely.
Mr. SRKV, there is a simpler way of doing it:
1. Sort all the points by x-coordinate.
2. Pair the leftmost point with the next leftmost one.
3. Remove the two points that we just paired.
4. Continue from step 2 until there are no points left.
In case two points have the same x-coordinate, the tie-breaking rule is the following:
Join the point with the lowest y-coordinate to the point with the 2nd-lowest y-coordinate.
If there is an odd number of points with the same x-coordinate, join the lone remaining point (the topmost in y) to the point with the next x-coordinate (if there are several such points, the one with the lowest y).
Total complexity: O(n log n) to sort and O(n) to traverse, so asymptotically it is O(n log n).
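The steps above fit in a few lines of Python (a sketch under my naming; it assumes an even number of points, and sorting tuples by (x, y) realizes the tie-breaking rules automatically):

```python
def noncrossing_matching(points):
    """Pair consecutive points in (x, y)-sorted order.  Consecutive pairs
    occupy disjoint x-ranges (ties resolved by y), so the segments
    joining them cannot cross."""
    pts = sorted(points)                 # O(n log n): by x, then by y
    return [(pts[i], pts[i + 1]) for i in range(0, len(pts), 2)]
```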
Find the convex hull.
Working your way around the hull (let's say clockwise), take adjacent pairs of vertices and add them to your set of pairs. Delete each pair from the graph as you do so. If the hull contains an even number of points, then all of them will be deleted, otherwise 1 will be left over.
If the graph still contains points, goto 1.
If each hull contains an even number of points, then it's clear that the segments found by this algorithm cannot intersect: any two of them come either from the same hull or from different (nested) hulls, and in either case they cannot cross. I'm convinced it will work even when some hulls have an odd number of points.
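A Python sketch of the hull-peeling idea, assuming general position and an even total number of points (the hull routine is Andrew's monotone chain; the repeated hull computations make this O(n^2 log n) in the worst case):

```python
def cross(o, a, b):
    # z-component of (a-o) x (b-o): > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(pts)
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_peeling_matching(points):
    """Repeatedly pair adjacent hull vertices and delete them."""
    remaining = list(points)
    pairs = []
    while len(remaining) >= 2:
        hull = convex_hull(remaining)
        for i in range(0, len(hull) - 1, 2):   # a leftover odd vertex stays
            pairs.append((hull[i], hull[i + 1]))
            remaining.remove(hull[i])
            remaining.remove(hull[i + 1])
    return pairs
```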
I have two sets of 2D points, separated from each other by a line in the plane. I'd like to efficiently find the pair of points, consisting of one point from each set, with the minimum distance between them. There's a really convenient looking paper by Radu Litiu, Closest Pair for Two Separated Sets of Points, but it uses an L1 (Manhattan) distance metric instead of Euclidean distance.
Does anyone know of a similar algorithm which works with Euclidean distance?
I can just about see an extension of the standard divide & conquer closest pair algorithm -- divide the two sets by a median line perpendicular to the original splitting line, recurse on the two sides, then look for a closer pair consisting of one point from each side of the median. If the minimal distance from the recursive step is d, then the companion for a point on one side of the median must lie within a box of dimensions 2d × d. But unlike with the original algorithm, I can't see any way to bound the number of points within that box, so the algorithm as a whole just becomes O(mn).
Any ideas?
Evgeny's answer works, but it's a lot of effort without library support: compute a full Voronoi diagram plus an additional sweep line algorithm. It's easier to enumerate for both sets of points the points whose Voronoi cells intersect the separating line, in order, and then test all pairs of points whose cells intersect via a linear-time merge step.
To compute the needed fragment of the Voronoi diagram, assume that the x-axis is the separating line. Sort the points in the set by x-coordinate, discarding points with larger y than some other point with equal x. Begin scanning the points in order of x-coordinate, pushing them onto a stack. Between pushes, if the stack has at least three points, say p, q, r, with r most recently pushed, test whether the line bisecting pq intersects the separating line after the line bisecting qr. If so, discard q, and repeat the test with the new top three. Crude ASCII art:
Case 1: retain q

    ------1-2-------------- separating line
           /    |
        p /     |
          \     |
           q-------r
Case 2: discard q

    --2---1---------------- separating line
         \ /
        p X r
         \ /
          q
For each point of one set, find the closest point in the other set; while doing this, keep only the pair of points with the minimal distance between them. This reduces the given problem to another one: "algorithm to find, for all points in set A, the nearest neighbor in set B", which can be solved with a sweep line algorithm over (1) one set of points and (2) the Voronoi diagram of the other set.
The algorithm's complexity is O((M+N) log M). Note that this algorithm does not use the fact that the two sets of points are separated from each other by a line.
Well, what about this:
Determine which side each point is on:
let P be your points (P0, ..., Pi, ..., Pn)
let A, B be the start and end points of the separator line
so: side(Pi) = signum of the 2D cross product (B-A) x (Pi-A)
This is based on the simple fact that the sign of the cross product depends on the orientation of the points (see the triangle/polygon winding rule for more info).
Then find the minimal distance over all pairs (Pi, Pj) where side(Pi) != side(Pj):
first compute the sides of all points, which is O(N);
then cycle through all Pi, and inside that loop cycle through all Pj, searching for the minimal distance.
If the Pi and Pj groups are of approximately equal size, this is O((N/2)^2).
You can further optimize the search by 'sorting' the points Pi, Pj by 'distance' from the line AB. You can use a dot product to do that: this time, instead of (B-A), use a vector perpendicular to it, say (C-A). Discard every point Pi2 (and similarly every Pj2) for which ((B-A).(Pi1-A)) is close to ((B-A).(Pi2-A)) and |((C-A).(Pi1-A))| << |((C-A).(Pi2-A))|, because that means Pi2 is behind Pi1 (farther from AB) and close to the normal of AB passing near Pi1.
The complexity after this optimization depends strongly on the dataset. It should be O(N + Ni*Nj), where Ni and Nj are the numbers of remaining points Pi and Pj. You need 2N products and Ni*Nj distance comparisons (the distances do not need to be square-rooted).
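A Python sketch of the basic O(Ni*Nj) version (the names are mine; note the side test uses the sign of the 2D cross product (B-A) x (Pi-A), which is what the winding rule actually relies on):

```python
import math

def min_cross_distance(points, A, B):
    """Minimal distance between points on opposite sides of line AB.
    Returns None if one side is empty."""
    def side(P):
        # sign of the 2D cross product (B-A) x (P-A)
        return (B[0] - A[0]) * (P[1] - A[1]) - (B[1] - A[1]) * (P[0] - A[0])
    left  = [P for P in points if side(P) > 0]    # classify sides: O(N)
    right = [P for P in points if side(P) < 0]
    best = None
    for p in left:                                # O(Ni * Nj) comparisons
        for q in right:
            d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2   # no sqrt needed yet
            if best is None or d2 < best:
                best = d2
    return math.sqrt(best) if best is not None else None
```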
A typical approach to this problem is a sweep-line algorithm. Suppose you have a coordinate system that contains all points and the line separating points from different sets. Now imagine a line perpendicular to the separating line hopping from point to point in ascending order. For convenience, you may rotate and translate the point set and the separating line such that the separating line equals the x-axis. The sweep-line is now parallel with the y-axis.
Hopping from point to point with the sweep-line, keep track of the shortest distance of two points from different sets. If the first few points are all from the same set, it's easy to find a formula that will tell you which one you'll have to remember until you hit the first point from the other set.
Suppose you have a total of N points. You will have to sort all points in O(N*log(N)). The sweep-line algorithm itself will run in O(N).
(I'm not sure if this bears any similarity to David's idea... I only saw it now after I logged in to post my thoughts.) For the sake of argument, let's say we transposed everything so that the dividing line is the x-axis, and sorted our points by x-coordinate. Assuming N is not too large, we can scan along the x-axis (that is, traverse our sorted lists of a's and b's) while keeping a record of the overall minimum and two lists of passed points. The current point in the scan is tested against the passed points from the other list, most recent first, until the distance from a listed point to (the scan's x-coordinate, 0) reaches or exceeds the overall minimum. In the example below, when reaching b2, we can stop testing at a2.
scan ->
Y
| a2
|
| a1 a3
X--------------------------
| b1 b3
| b2
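The scan can be sketched in Python like this (a hedged sketch under my own naming; a_pts and b_pts are the two sets, already separated by the x-axis):

```python
import math

def closest_separated_pair(a_pts, b_pts):
    """Scan along the x-axis, testing each point against the already-passed
    points of the other set, most recent first, and stopping as soon as the
    x-gap alone reaches the best distance found so far."""
    events = sorted([(p, 'a') for p in a_pts] + [(p, 'b') for p in b_pts])
    passed = {'a': [], 'b': []}
    best = math.inf                    # returned unchanged if a set is empty
    for p, tag in events:
        other = 'b' if tag == 'a' else 'a'
        for q in reversed(passed[other]):
            if p[0] - q[0] >= best:    # all earlier points are even farther
                break
            best = min(best, math.dist(p, q))
        passed[tag].append(p)
    return best
```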