Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)?
I have found some papers that state that this problem is at least as hard as the problem that asks to find three collinear points (a triangle with area 0). These papers describe an O(n^2) solution to this problem by reducing it to an instance of the 3-sum problem. I couldn't find any solution for what I'm interested in however. See this (look for General Position) for such a paper and more information on 3-sum.
There are O(n^2) algorithms for finding the minimum-area triangle.
For instance you can find one here: http://www.cs.tufts.edu/comp/163/fall09/CG-lecture9-LA.pdf
If I understood that pdf correctly, the basic idea is as follows:
1. For each pair of points A, B, find the point that is closest to the line AB.
2. Construct a dual of the points, so that lines <-> points: the line y = mx + c is mapped to the point (m, c).
3. In the dual, for a given point (which corresponds to a segment in the original set of points), the vertically nearest line gives us the required point for step 1.
Apparently steps 2 and 3 can be done in O(n^2) time.
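To make the duality step a bit more concrete, here is a minimal sketch. It is not taken from the linked PDF; it only assumes the mapping stated above (line y = mx + c maps to point (m, c)). Under that convention a point (a, b) corresponds to the dual line c = -a*m + b, since those are exactly the lines passing through (a, b). The function names are made up for illustration. The point of the sketch is that the vertical gap between a point and a line is the same number in both planes, which is why "nearest line vertically" in the dual picks out the point closest to the line AB.

```python
def line_through(p, q):
    """Return (m, c) for the non-vertical line y = m*x + c through p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def dual_line_of_point(p):
    """Point (a, b) becomes the dual line c = -a*m + b, as (slope, intercept)."""
    a, b = p
    return -a, b


def vertical_gap_primal(p, line):
    """Vertical distance from point p to the line y = m*x + c."""
    m, c = line
    a, b = p
    return abs(b - (m * a + c))


def vertical_gap_dual(p, line):
    """Vertical distance from the dual point (m, c) to the dual line of p."""
    m, c = line
    slope, intercept = dual_line_of_point(p)
    return abs((slope * m + intercept) - c)
```

For a fixed line AB the perpendicular distance is just the vertical distance scaled by a constant factor, so the vertically nearest point is also the nearest one.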
Also I doubt the papers showed 3SUM-hardness by reducing to 3SUM. It should be the other way round.
There's an algorithm that finds the required area with complexity O(n^2*log(n)).
For each point Pi in the set do the following (without loss of generality we can assume that Pi is at the origin, or translate the points to make it so).
Then for any two points (x1,y1), (x2,y2) the triangle area is 0.5*|x1*y2-x2*y1|, so we need to minimize that value. Instead of iterating through all pairs of the remaining points (which gives O(n^3) complexity overall), we sort those points using the predicate x1*y2 < x2*y1. It is claimed that to find the triangle with minimal area we only need to check pairs of adjacent points in the sorted array.
So the complexity of this procedure for each point is n*log(n) and the whole algorithm works in O(n^2*log(n))
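Since the adjacency claim is unproven here, a brute-force reference is handy for checking any faster implementation on small inputs. The sketch below is only that O(n^3) baseline, not the sorted variant; it uses the same area formula, 0.5*|x1*y2 - x2*y1| after translating one vertex to the origin. The function names are my own.

```python
from itertools import combinations


def triangle_area(p, a, b):
    """Area of triangle (p, a, b): translate p to the origin, then 0.5*|x1*y2 - x2*y1|."""
    x1, y1 = a[0] - p[0], a[1] - p[1]
    x2, y2 = b[0] - p[0], b[1] - p[1]
    return 0.5 * abs(x1 * y2 - x2 * y1)


def min_area_triangle_bruteforce(points):
    """Try every triple of points; O(n^3), intended only as a reference."""
    return min(combinations(points, 3), key=lambda t: triangle_area(*t))
```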
P.S. I can't quickly find a proof that this algorithm is correct :( I hope to find it later and post it then.
The problem
Given a set of n points, can we find three points that describe a triangle with minimum area in O(n^2)? If yes, how, and if not, can we do better than O(n^3)
is better resolved in this paper: James King, A Survey of 3sum-Hard Problems, 2004
Related
I was asked this question at Yahoo for a machine learning profile. Given a set of points with (x,y) coordinates, I was asked to find the pair of points with the lowest distance in O(n) or O(log n) time.
Obviously I was able to come up with an O(n^2) solution, but I was nowhere near a better algorithm. Even though the problem statement was screaming for divide and conquer, I just could not come up with the reasoning for the merge step. I also googled this question and found that it is actually very popular, but I still could not get hold of the reasoning behind the merge step.
Can anyone help me out with this?
Input: (x1,y1),(x2,y2),(x3,y3),(x4,y4),(x5,y5)
The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:
1.Sort points according to their x-coordinates.
2.Split the set of points into two equal-sized subsets by a vertical line x=xmid.
3.Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dLmin and dRmin, respectively.
4.Find the minimal distance dLRmin among the pair of points in which one point lies on the left of the dividing vertical and the second point lies to the right.
5.The final answer is the minimum among dLmin, dRmin, and dLRmin.
http://en.wikipedia.org/wiki/Closest_pair_of_points
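The five steps above can be sketched roughly as follows. This is one common way to do the merge step (keep each half sorted by y, then compare each point in the strip around the dividing line with its next few neighbours in y order); the function names and the constant 7 follow the usual textbook presentation rather than anything stated in this answer.

```python
from heapq import merge
from math import dist, inf


def closest_pair_distance(points):
    """Smallest Euclidean distance between any two of the given points."""
    return _solve(sorted(points))[0]          # step 1: sort by x


def _solve(px):
    """Return (minimum distance, points sorted by y) for an x-sorted list."""
    n = len(px)
    if n <= 3:                                 # small case: compare all pairs
        best = min((dist(p, q) for i, p in enumerate(px) for q in px[i + 1:]),
                   default=inf)
        return best, sorted(px, key=lambda p: p[1])
    mid = n // 2
    xmid = px[mid][0]                          # step 2: dividing line
    d_left, left_y = _solve(px[:mid])          # step 3: recurse on both halves
    d_right, right_y = _solve(px[mid:])
    d = min(d_left, d_right)
    by_y = list(merge(left_y, right_y, key=lambda p: p[1]))
    # Step 4: only points closer than d to the dividing line can improve d.
    strip = [p for p in by_y if abs(p[0] - xmid) < d]
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 8]:           # a bounded number of neighbours
            if q[1] - p[1] >= d:
                break
            d = min(d, dist(p, q))
    return d, by_y                             # step 5: overall minimum
```

For example, `closest_pair_distance([(0, 0), (5, 5), (1, 1), (9, 0), (5, 6)])` finds the pair (5, 5), (5, 6).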
In this problem r is a fixed positive integer. You are given N rectangles, all the same size, in the plane. The sides are either vertical or horizontal. We assume the area of the intersection of all N rectangles has non-zero area. The problem is how to find N-r of these rectangles, so as to maximize the area of the intersection. This problem arises in practical microscopy when one repeatedly images a given biological specimen, and alignment changes slightly during this process, due to physical reasons (e.g. differential expansion of parts of the microscope and camera). I have expressed the problem for dimension d=2. There is a similar problem for each d>0. For d=1, an O(N log(N)) solution is obtained by sorting the lefthand endpoints of the intervals. But let's stick with d=2. If r=1, one can again solve the problem in time O(N log(N)) by sorting coordinates of the corners.
So, is the original problem solved by solving first the case (N,1) obtaining N-1 rectangles, then solving the case (N-1,1), getting N-2 rectangles, and so on, until we reduce to N-r rectangles? I would be interested to see an explicit counter-example to this optimistic attempted procedure. It would be even more interesting if the procedure works (proof please!), but that seems over-optimistic.
If r is fixed at some value r>1, and N is large, is this problem in one of the NP classes?
Thanks for any thoughts about this.
David
Since the intersection of axis-aligned rectangles is an axis-aligned rectangle, there are O(N^4) possible intersections (O(N) lefts, O(N) rights, O(N) tops, O(N) bottoms). The obvious O(N^5) algorithm is to try all of these, checking for each whether it's contained in at least N - r rectangles.
An improvement to O(N^3) is to try all O(N^2) intervals in the X dimension and run the 1D algorithm in the Y dimension on those rectangles that contain the given X-interval. (The rectangles need to be sorted only once.)
How large is N? I expect that fancy data structures might lead to an O(N^2 log N) algorithm, but it wouldn't be worth your time if a cubic algorithm suffices.
I think I have a counter-example. Let's say you have r := N-2, i.e. you want to find the two rectangles with maximum overlap. Let's say you have two rectangles covering the same area (= maximum overlap). Those two will be the optimal result in the end.
Now we need to construct some more rectangles, such that at least one of those two get removed in a reduction step.
Let's say we have three rectangles which overlap a lot..but they are not optimal. They have a very small overlapping area with the other two rectangles.
Now if you want to optimize the area for four rectangles, you will remove one of the two optimal rectangles, right? Or maybe you don't HAVE to, but you're not sure which decision is optimal.
So, I think your reduction algorithm is not quite correct. Atm I'm not sure if there is a good algorithm for this or in which complexity class this belongs to, though. If I have time I think about it :)
Postscript. This is pretty defective, but may spark some ideas. It's especially defective where there are outliers in a quadrant that are near the X and Y axes - they will tend to reinforce each other, as if they were both at 45 degrees, pushing the solution away from that quadrant in a way that may not make sense.
-
If r is a lot smaller than N, and N is fairly large, consider this:
Find the average center.
Sort the rectangles into 2 sequences by (X - center.x) + (Y - center.y) and (X - center.x) - (Y - center.y), where X and Y are the center of each rectangle.
For any solution, all of the rejected rectangles will be members of up to 4 subsequences, each of which is a head or tail of one of the 2 sequences. Assuming N is a lot bigger than r, most of the time will be spent sorting the sequences - O(n log n).
To find the solution, first find the intersection given by removing the r rectangles at the head and tail of each sequence. Use this base intersection to eliminate consideration of the "core" set of rectangles that you know will be in the solution. This will reduce the intersection computations to just working with up to 4*r + 1 rectangles.
Each of the 4 sequence heads and tails should be associated with an array of r rectangles, each entry representing the intersection given by intersecting the "core" with the i innermost rectangles from the head or tail. This precomputation reduces the complexity of finding the solution from O(r^4) to O(r^3).
This is not perfect, but it should be close.
Defects with a small r will come from should-be-rejects that are at off angles, with alternatives that are slightly better but on one of the 2 axes. The maximum error is probably computable. If this is a concern, use a real area-of-non-intersection computation instead of the simple "X+Y" difference formula I used.
Here is an explicit counter-example (with N=4 and r=2) to the greedy algorithm proposed by the asker.
The maximum intersection between three of these rectangles is between the black, blue, and green rectangles. But, it's clear that the maximum intersection between any two of these three is smaller than intersection between the black and the red rectangles.
I now have an algorithm, pretty similar to Ed Staub's above, with the same time estimates. It's a bit different from Ed's, since it is valid for all r
The counter-example by mhum to the greedy algorithm is neat. Take a look.
I'm still trying to get used to this site. Somehow an earlier answer by me was truncated to two sentences. Thanks to everyone for their contributions, particularly to mhum whose counter-example to the greedy algorithm is satisfying. I now have an answer to my own question. I believe it is as good as possible, but lower bounds on complexity are too difficult for me. My solution is similar to Ed Staub's above and gives the same complexity estimates, but works for any value of r>0.
One of my rectangles is determined by its lower left corner. Let S be the set of lower left corners. In time O(N log(N)) we sort S into Sx according to the sizes of the x-coordinates. We don't care about the order within Sx between two lower left corners with the same x-coord. Similarly the sorted sequence Sy is defined by using the sizes of the y-coords. Now let u1, u2, u3 and u4 be non-negative integers with u1+u2+u3+u4=r. We compute what happens to the area when we remove various rectangles that we now name explicitly. We first remove the u1-sized head of Sx and the u2-sized tail of Sx. Let Syx be the result of removing these u1+u2 entries from Sy. We remove the u3-sized head of Syx and the u4-sized tail of Syx. One can now prove that one of these possible choices of (u1,u2,u3,u4) gives the desired maximal area of intersection. (Email me if you want a pdf of the proof details.) The number of such choices is equal to the number of integer points in the regular tetrahedron in 4-d euclidean space with vertices at the 4 points whose coordinate sum is r and for which 3 of the 4 coordinates are equal to 0. This is bounded by the volume of the tetrahedron, giving a complexity estimate of O(r^3).
So my algorithm has time complexity O(N log(N)) + O(r^3).
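Here is a rough sketch of the enumeration over (u1, u2, u3, u4) described above, assuming all rectangles share a width w and height h and are given by their lower-left corners. It recomputes the bounding box of the kept corners from scratch for every choice, so it runs in roughly O(N log N + r^3 N) rather than the O(N log N) + O(r^3) bound, which would need precomputed head/tail extremes. The function and parameter names are my own, and it assumes r < N.

```python
def best_intersection_area(corners, w, h, r):
    """Largest intersection area after discarding r equal-size rectangles.

    corners: lower-left corners (x, y); w, h: common width and height.
    """
    n = len(corners)
    sx = sorted(range(n), key=lambda i: corners[i][0])   # Sx: indices by x
    sy = sorted(range(n), key=lambda i: corners[i][1])   # Sy: indices by y
    best = 0
    for u1 in range(r + 1):
        for u2 in range(r + 1 - u1):
            # Drop the u1-sized head and the u2-sized tail of Sx.
            dropped = set(sx[:u1]) | set(sx[n - u2:])
            syx = [i for i in sy if i not in dropped]     # Syx
            for u3 in range(r + 1 - u1 - u2):
                u4 = r - u1 - u2 - u3
                # Drop the u3-sized head and the u4-sized tail of Syx.
                kept = syx[u3:len(syx) - u4]
                xs = [corners[i][0] for i in kept]
                ys = [corners[i][1] for i in kept]
                width = max(0, w - (max(xs) - min(xs)))
                height = max(0, h - (max(ys) - min(ys)))
                best = max(best, width * height)
    return best
```

The width and height formulas rely on the rectangles being the same size: the intersection of the kept rectangles shrinks by exactly the spread of their corners in each coordinate.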
I believe this produces a perfect solution.
David's solution is easier to implement, and should be faster in most cases.
This relies on the assumption that for any solution, at least one of the rejects must be a member of the convex hull. Applying this recursively leads to:
Compute a convex hull.
Gather the set of all candidate solutions produced by:
{Remove a hull member, repair the hull} r times
(The hull doesn't really need to be repaired the last time.)
If h is the number of initial hull members, then the complexity is less than
h^r, plus the cost of computing the initial hull. I am assuming that a hull algorithm is chosen such that the sorted data can be kept and reused in the hull repairs.
This is just a thought, but if N is very large, I would probably try a Monte-Carlo algorithm.
The idea would be to generate random points (say, uniformly in the convex hull of all rectangles), and score how each random point performs. If the random point is in N-r or more rectangles, then update the number of hits of each subset of N-r rectangles.
In the end, the N-r rectangle subset with the most random points in it is your answer.
This algorithm has many downsides, the most obvious one being that the result is random and thus not guaranteed to be correct. But as most Monte-Carlo algorithms it scales well, and you should be able to use it with higher dimensions as well.
I am looking at the wikipedia entry for how to solve this. It lists five steps
1.Sort points along the x-coordinate
2.Split the set of points into two equal-sized subsets by a vertical line x = xmid
3.Solve the problem recursively in the left and right subsets. This will give the left-side and right-side minimal distances dLmin and dRmin respectively.
4.Find the minimal distance dLRmin among the pair of points in which one point lies on the left of the dividing vertical and the second point lies to the right.
5.The final answer is the minimum among dLmin, dRmin, and dLRmin.
The fourth step is the one I am having trouble understanding. How do I choose which point to the left of the line to compare with a point to the right of the line? I know I am not supposed to compare all points, but I am unclear about how to choose the points to compare. Please do not send me a link; I have searched, gone to numerous links, and have not found an explanation that helps me understand step 4.
Thanks
Aaron
The answer to your question was in the next paragraph of the wikipedia article:
It turns out that step 4 may be accomplished in linear time. Again, a naive approach would require the calculation of distances for all left-right pairs, i.e., in quadratic time. The key observation is based on the following sparsity property of the point set. We already know that the closest pair of points is no further apart than dist = min(dLmin,dRmin). Therefore for each point p of the left of the dividing line we have to compare the distances to the points that lie in the rectangle of dimensions (dist, 2 * dist) to the right of the dividing line, as shown in the figure. And what is more, this rectangle can contain at most 6 points with pairwise distances at least dRmin. Therefore it is sufficient to compute at most 6n left-right distances in step 4. The recurrence relation for the number of steps can be written as T(n) = 2T(n / 2) + O(n), which we can solve using the master theorem to get O(n log n).
I don't think I can put it much clearer than they already have, but do you have any specific questions about this step of the algorithm?
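If it helps, step 4 on its own might look something like the sketch below. It assumes the two halves, the current best distance (min of dLmin and dRmin) and the x-coordinate of the dividing line are already known; the function name and parameters are hypothetical. For each left point within dist of the line it looks only at right points inside the (dist, 2 * dist) rectangle. The binary search used here makes this step O(n log n) rather than linear; a linear version would keep both halves presorted by y and sweep through them together.

```python
from bisect import bisect_left, bisect_right
from math import dist


def min_cross_distance(left, right, d, xmid):
    """Smallest left-right distance if it beats d, otherwise d itself."""
    # Only right-hand points within d of the dividing line can matter.
    cand = sorted((q for q in right if q[0] - xmid <= d), key=lambda q: q[1])
    ys = [q[1] for q in cand]
    best = d
    for p in left:
        if xmid - p[0] > d:
            continue                      # p is too far from the dividing line
        # Right-hand candidates whose y lies within d of p's y.
        lo = bisect_left(ys, p[1] - d)
        hi = bisect_right(ys, p[1] + d)
        for q in cand[lo:hi]:
            best = min(best, dist(p, q))
    return best
```

Because the right-hand points are at least dRmin apart from each other, the slice `cand[lo:hi]` stays small for every p, which is where the "at most 6 points" bound comes in.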
I'm creating a simple game and came up with this problem while designing the AI for it:
Given a set of N points inside a rectangle in Cartesian coordinates, I need to find the widest straight path through this rectangle. The path must be empty (i.e. not containing any point).
Is there an efficient algorithm to solve this problem? Can you suggest any keyword, paper, or anything else related to this problem?
EDIT: The rectangle is always defined by the 4 points at its corners. I added an image for illustration. The path in the picture is the one bounded by the two red lines.
This is the widest empty corridor problem. Houle and Maciel gave an O(n^2)-time, O(n)-space algorithm in a 1988 tech report entitled "Finding the widest empty corridor through a set of points", which seems not to be available online. Fortunately, Janardan and Preparata describe this algorithm in Section 4 of their paper Widest-corridor problems, which is available.
Loop through all pairs of points. Construct a line l through the pair. (^1) On each side of l, either there are other points, or not. If not, then there is not a path on that side of l. If there are other points, loop through points calculating the perpendicular distance d from l to each such point. Record the minimum d. That is the widest path on that side of l. Continue looping through all pairs, comparing widest path for that pair with the previous widest path.
This algorithm can be considered naive and runs in O(n^3) time.
Edit: The above algorithm misses a case. At ^1 above, insert: "Construct two lines perpendicular to l through each point of the pair. If there is no third point between the lines, then record distance d between the points. This constitutes a path." Continue the algorithm at ^1. With additional case, algorithm is still O(n^3)
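A minimal sketch of this naive procedure, including the extra case from the edit, could look like the following. It assumes distinct points given as tuples and ignores the enclosing rectangle entirely, so it only illustrates the O(n^3) loop structure; the function name is made up.

```python
from itertools import combinations
from math import hypot, inf


def widest_corridor_naive(points):
    """Width of the widest empty strip found by the pairwise procedure above."""
    best = 0.0
    for p, q in combinations(points, 2):
        dx, dy = q[0] - p[0], q[1] - p[1]
        len_sq = dx * dx + dy * dy
        length = hypot(dx, dy)
        nearest = [inf, inf]      # closest other point on each side of line l
        blocked = False           # is a third point between the perpendiculars?
        for s in points:
            if s == p or s == q:
                continue
            ex, ey = s[0] - p[0], s[1] - p[1]
            signed = (dx * ey - dy * ex) / length   # signed distance to l
            if signed > 0:
                nearest[0] = min(nearest[0], signed)
            elif signed < 0:
                nearest[1] = min(nearest[1], -signed)
            if 0 < dx * ex + dy * ey < len_sq:
                blocked = True
        for width in nearest:
            if width != inf:      # no points on that side means no path there
                best = max(best, width)
        if not blocked:           # extra case: corridor perpendicular to pq
            best = max(best, length)
    return best
```

For the four corners of a 4 by 3 rectangle this returns 4.0, the strip between the two short sides.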
Myself, I would start by looking at the Delaunay triangulation of the point set:
http://en.wikipedia.org/wiki/Delaunay_triangulation
There appear to be plenty of resources there on efficient algorithms to build this - Fortune's algorithm, at O(n log n), for starters.
My intuition tells me that your widest path will be defined by one of the edges in this graph (Namely, it would run perpendicular to the edge, and its width would be equal to the length of the edge). How to sort the edges, check the candidates and identify the widest path remains. I like this question, and I'm going to keep thinking about it. :)
EDIT 1: My intuition fails me! A simple equilateral triangle is a counter-example: the widest path is shorter than any of the edges in the triangulation. Still thinking...
EDIT 2: So, we need a black-box algorithm which, given two points in the set, finds the widest path through the point set which is bounded by those two points. (Visualize two parallel lines running through the two points; rotate them in harmony with each other until there are no points between them). Let's call the runtime of this algorithm 'R'.
Given such an algorithm, we can do the following:
Build the Delaunay triangulation of the point set : O(n log n)
Sort the edges by length : O(n log n)
Beginning with the longest edge and moving down, use the black box algorithm to determine the widest path involving those two points; storing it as X : O(nR)
Stop when the edge being examined is shorter than the width of X.
Steps 1 and 2 are nice, but the O(nR) is kind of scary. If R turns out to be O(n), that's already O(n^2) for the whole algorithm. The nice thing is that, for a general set of random points, we would expect that we wouldn't have to go through all the edges.
What is the best algorithm to find whether any three points are collinear in a set of n points? Please also explain the complexity if it is not trivial.
Thanks
Bala
If you can come up with a better than O(N^2) algorithm, you can publish it!
This problem is 3-SUM Hard, and whether there is a sub-quadratic algorithm (i.e. better than O(N^2)) for it is an open problem. Many common computational geometry problems (including yours) have been shown to be 3SUM hard and this class of problems is growing. Like NP-Hardness, the concept of 3SUM-Hardness has proven useful in proving 'toughness' of some problems.
For a proof that your problem is 3SUM hard, refer to the excellent survey paper here: http://www.cs.mcgill.ca/~jking/papers/3sumhard.pdf
Your problem appears on page 3 (conveniently called 3-POINTS-ON-LINE) in the above mentioned paper.
So, the currently best known algorithm is O(N^2) and you already have it :-)
A simple O(d*N^2) time and space algorithm, where d is the dimensionality and N is the number of points (probably not optimal):
Create a bounding box around the set of points (make it big enough so there are no points on the boundary)
For each pair of points, compute the line passing through them.
For each line, compute its two collision points with the bounding box.
The two collision points define the original line, so if there any matching lines they will also produce the same two collision points.
Use a hash set to determine if there are any duplicate collision point pairs.
There are 3 collinear points if and only if there were duplicates.
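A sketch of the same "hash every line and look for duplicates" idea in 2D is below. Instead of bounding-box collision points, it swaps in a normalized integer equation a*x + b*y + c = 0 as the hash key, which avoids floating-point comparisons; that substitution is mine, and it assumes distinct points with integer coordinates.

```python
from itertools import combinations
from math import gcd


def line_key(p, q):
    """Canonical (a, b, c) with a*x + b*y + c = 0 for the line through p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    g = gcd(gcd(a, b), c)             # g >= 1 because p != q
    a, b, c = a // g, b // g, c // g
    if a < 0 or (a == 0 and b < 0):   # fix the overall sign
        a, b, c = -a, -b, -c
    return a, b, c


def has_three_collinear(points):
    """True if two different pairs of points define the same line."""
    seen = set()
    for p, q in combinations(points, 2):
        key = line_key(p, q)
        if key in seen:
            return True
        seen.add(key)
    return False
```

Two different pairs can only share a line if at least three distinct points lie on it, which is the "if and only if" above.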
Another simple (maybe even trivial) solution which doesn't use a hash table, runs in O(n^2 log n) time, and uses O(n) space:
Let S be a set of points, we will describe an algorithm which finds out whether or not S contains some three collinear points.
For each point o in S do:
Pass a line L parallel to the x-axis through o.
Replace every point in S below L with its reflection through o. (For example, if o is the origin, (-a,-b) with b>0 will become (a,b) after the reflection.) Reflecting through the point o, rather than across the line L, keeps every point on the same line through o. Let the new set of points be S'.
The angle of each point p in S' is the angle that the segment po makes with the line L. Let us sort the points of S' by their angles.
Walk through the sorted points in S'. If there are two consecutive points which are collinear with o - return true.
If no collinear points were found in the loop - return false.
The loop runs n times, and each iteration performs O(n log n) steps. It is not hard to prove that if there are three points on a line they will be found, and that we will find nothing otherwise.
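A possible sketch of this sort-based check follows. To stay exact it sorts reduced integer direction vectors instead of floating-point angles, and it folds opposite directions together by reflecting through o, so that two points on the same line through o get the same key. It assumes distinct points with integer coordinates; the function name is my own.

```python
from math import gcd


def has_three_collinear_sorted(points):
    """O(n^2 log n) time, O(n) extra space: sort directions around each point."""
    for o in points:
        dirs = []
        for p in points:
            if p == o:
                continue
            dx, dy = p[0] - o[0], p[1] - o[1]
            g = gcd(dx, dy)                   # g >= 1 because p != o
            dx, dy = dx // g, dy // g
            if dy < 0 or (dy == 0 and dx < 0):
                dx, dy = -dx, -dy             # reflect through o
            dirs.append((dx, dy))
        dirs.sort()
        # Equal neighbours mean two points on one line through o.
        if any(a == b for a, b in zip(dirs, dirs[1:])):
            return True
    return False
```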