Fast algorithm for calculating union of 'local convex hulls'

I have a set of 2D points from which I want to generate a polygon (or collection of polygons) outlining the 'shape' of those points, using the following concept:
For each point in the set, calculate the convex hull of all points within radius R of that point. After doing this for each point, take the union of these convex hulls to produce the final shape.
A brute force approach of actually constructing all these convex hulls is something like O(N^2 + R^2 log R). Is there a known, more efficient algorithm to produce the same result? Or perhaps a different way of expressing the problem?
Note: I am aware of alpha shapes, they are different; I am looking for an algorithm to perform what is described above.
Update: I have a proposed solution.
Proposition: take the Delaunay triangulation of the set of points, remove all triangles having circumradius greater than R, then take the union of the remaining triangles.
(This proposed solution does not work - disproved experimentally in MATLAB.)
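As a sketch of this proposition (which, per the note above, did not survive experimental testing), assuming a triangulation is already available from some library (e.g. `scipy.spatial.Delaunay`), the circumradius filter might look like this; the function names are mine:

```python
import math

def circumradius(p, q, r):
    """Circumradius of triangle pqr: R = abc / (4 * area)."""
    a = math.dist(q, r)
    b = math.dist(p, r)
    c = math.dist(p, q)
    area = abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2
    if area == 0:
        return float('inf')  # degenerate (collinear) triangle
    return a * b * c / (4 * area)

def filter_triangles(points, triangles, R):
    """Keep only triangles (index triples) whose circumradius is at most R."""
    return [t for t in triangles
            if circumradius(points[t[0]], points[t[1]], points[t[2]]) <= R]
```

The union of the surviving triangles is then the candidate shape.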

A sweep line algorithm can improve searching for the R-neighbors. Alternatively, you can consider only pairs of points that are in neighboring squares of a square grid of cell width R. Both of these ideas can get rid of the N^2 - of course only if the points are relatively sparse.
I believe that a clever combination of sweeping and convex hull finding can get rid of the N^2 even if the points are not sparse (as in Olexiy's example), but I cannot come up with a concrete algorithm.
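The grid idea above can be sketched as follows (a hypothetical helper, not from the original answer): bucket the points into R-by-R cells, so the R-neighbors of a point can only come from its own cell and the 8 surrounding ones; exact neighbors are then found by a distance check on these candidates.

```python
from collections import defaultdict

def grid_neighbors(points, R):
    """Group points into R x R grid cells; candidate R-neighbors of a point
    can only lie in its own cell or one of the 8 adjacent cells."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // R), int(y // R))].append(i)

    def candidates(i):
        x, y = points[i]
        cx, cy = int(x // R), int(y // R)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                out.extend(j for j in cells[(cx + dx, cy + dy)] if j != i)
        return out

    return candidates
```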

Yes, using rotating calipers. My prof wrote some stuff on this, it starts on page 19.

Please let me know if I misunderstood the problem.
I don't see how you get O(N^2) time for brute-forcing all the convex hulls in the worst case (1). What if almost any 2 points are closer than R - in that case you need at least N^2 log N just to construct the convex hulls, let alone computing their union.
Also, where does the R^2 log R in your estimate come from?
(1) The worst case (as I see it) for huge N: take a circle of radius R/2 and randomly place points on its border and just outside it.

Related

Efficient sorting of integer vertices of a convex polygon

I am given as input n pairs of integers which describe points in the 2D plane that are known ahead of time to be vertices of some convex polygon.
I'd like to efficiently sort these points in a clockwise or counter-clockwise fashion.
At first I thought of doing something like the initial step of Graham Scan, but I can't see a simple way to break ties for vertices that make the same angle with the anchor point.
Notice that, as you walk along the sides of the polygon, sometimes these vertices may be getting closer to the anchor point, and sometimes they may be getting farther.
Something that does seem to work is producing a point in the interior of the polygon (for instance, the average of the n points) and using it as the anchor point for radial sorting of the input.
Indeed, because the anchor point lies in the interior, any ray emanating from it contains at most one input point, so there will be no ties.
The overall complexity is not affected: computing the midpoint is an O(n) task, and the bottleneck is the actual sorting.
This does involve a few more operations than the hopeful version of Graham scan (where we assume there are no ties to be broken), but the bigger loss is leaving integer arithmetic behind by introducing division into the mix.
This in turn can be remedied by scaling everything by a factor of n, but at this point it seems like grasping at straws.
Am I missing something?
Is there a simpler, efficient way to solve this sorting problem, preferably one that can avoid floating point calculations?
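A sketch of the centroid-anchor idea above in pure integer arithmetic (scaling by n as described, and comparing angles with cross products instead of division); this is my own illustration, not a vetted implementation:

```python
from functools import cmp_to_key

def ccw_sort(pts):
    """Sort the vertices of a convex polygon counter-clockwise around their
    centroid, using only integer arithmetic (coordinates scaled by n)."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)

    def vec(p):  # p scaled by n, relative to the centroid; stays integral
        return (n * p[0] - sx, n * p[1] - sy)

    def half(v):  # 0 for angles in [0, pi), 1 for [pi, 2*pi)
        return 0 if (v[1] > 0 or (v[1] == 0 and v[0] > 0)) else 1

    def cmp(p, q):
        u, v = vec(p), vec(q)
        if half(u) != half(v):
            return half(u) - half(v)
        cross = u[0] * v[1] - u[1] * v[0]
        return -1 if cross > 0 else (1 if cross < 0 else 0)

    return sorted(pts, key=cmp_to_key(cmp))
```

Because the centroid is strictly interior, no two vertices share a ray from the anchor, so the cross-product comparator never reports a tie.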

Choose rectangles with maximal intersection area

In this problem r is a fixed positive integer. You are given N rectangles, all the same size, in the plane. The sides are either vertical or horizontal. We assume the intersection of all N rectangles has non-zero area. The problem is how to find N-r of these rectangles, so as to maximize the area of their intersection.
This problem arises in practical microscopy, when one repeatedly images a given biological specimen and the alignment changes slightly during the process, due to physical reasons (e.g. differential expansion of parts of the microscope and camera).
I have expressed the problem for dimension d=2; there is a similar problem for each d>0. For d=1, an O(N log(N)) solution is obtained by sorting the lefthand endpoints of the intervals. But let's stick with d=2. If r=1, one can again solve the problem in time O(N log(N)) by sorting coordinates of the corners.
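As an illustration of the d=1 case mentioned above, extended to general r (my own sketch, under the assumption that the optimal removals always come from the extremes of the endpoint orderings): sort once, then try every split of the r removals between the largest-left side and the smallest-right side.

```python
def best_1d(intervals, r):
    """Among N intervals, remove r of them so that the remaining
    intersection [max left, min right] is as long as possible.
    Enumerate how many of the r removals come from the largest-left
    side (u) versus the smallest-right side (r - u)."""
    best = -float('inf')
    for u in range(r + 1):
        # drop the u intervals with the largest left endpoints
        rest = sorted(intervals, key=lambda iv: iv[0])[:len(intervals) - u]
        # then drop r - u intervals with the smallest right endpoints
        rest = sorted(rest, key=lambda iv: iv[1])[r - u:]
        left = max(iv[0] for iv in rest)
        right = min(iv[1] for iv in rest)
        best = max(best, right - left)
    return best
```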
So, is the original problem solved by solving first the case (N,1) obtaining N-1 rectangles, then solving the case (N-1,1), getting N-2 rectangles, and so on, until we reduce to N-r rectangles? I would be interested to see an explicit counter-example to this optimistic attempted procedure. It would be even more interesting if the procedure works (proof please!), but that seems over-optimistic.
If r is fixed at some value r>1, and N is large, is this problem in one of the NP classes?
Thanks for any thoughts about this.
David
Since the intersection of axis-aligned rectangles is an axis-aligned rectangle, there are O(N^4) possible intersections (O(N) lefts, O(N) rights, O(N) tops, O(N) bottoms). The obvious O(N^5) algorithm is to try all of these, checking for each whether it's contained in at least N - r rectangles.
An improvement to O(N^3) is to try all O(N^2) intervals in the X dimension and run the 1D algorithm in the Y dimension on those rectangles that contain the given X-interval. (The rectangles need to be sorted only once.)
How large is N? I expect that fancy data structures might lead to an O(N^2 log N) algorithm, but it wouldn't be worth your time if a cubic algorithm suffices.
I think I have a counter-example. Let's say you have r := N-2, i.e. you want to find two rectangles with maximum overlap. Let's say you have two rectangles covering the same area (= maximum overlap). Those two will be the optimal result in the end.
Now we need to construct some more rectangles, such that at least one of those two gets removed in a reduction step.
Let's say we have three rectangles which overlap a lot, but they have a very small overlapping area with the other two rectangles.
Now if you want to optimize the area for four rectangles, you will remove one of the two optimal rectangles, right? Or maybe you don't HAVE to, but you're not sure which decision is optimal.
So, I think your reduction algorithm is not quite correct. At the moment I'm not sure if there is a good algorithm for this or which complexity class it belongs to, though. If I have time I'll think about it :)
Postscript. This is pretty defective, but may spark some ideas. It's especially defective where there are outliers in a quadrant that are near the X and Y axes - they will tend to reinforce each other, as if they were both at 45 degrees, pushing the solution away from that quadrant in a way that may not make sense.
-
If r is a lot smaller than N, and N is fairly large, consider this:
Find the average center.
Sort the rectangles into 2 sequences by (X - center.x) + (Y - center.y) and (X - center.x) - (Y - center.y), where X and Y are the center of each rectangle.
For any solution, all of the reject rectangles will be members of up to 4 subsequences, each of which is a head or tail of one of the 2 sequences. Assuming N is a lot bigger than r, most of the time will be in sorting the sequences - O(n log n).
To find the solution, first find the intersection given by removing the r rectangles at the head and tail of each sequence. Use this base intersection to eliminate consideration of the "core" set of rectangles that you know will be in the solution. This will reduce the intersection computations to just working with up to 4*r + 1 rectangles.
Each of the 4 sequence heads and tails should be associated with an array of r rectangles, each entry representing the intersection given by intersecting the "core" with the i innermost rectangles from the head or tail. This precomputation reduces the complexity of finding the solution from O(r^4) to O(r^3).
This is not perfect, but it should be close.
Defects with a small r will come from should-be-rejects that are at off angles, with alternatives that are slightly better but on one of the 2 axes. The maximum error is probably computable. If this is a concern, use a real area-of-non-intersection computation instead of the simple "X+Y" difference formula I used.
Here is an explicit counter-example (with N=4 and r=2) to the greedy algorithm proposed by the asker.
The maximum intersection between three of these rectangles is between the black, blue, and green rectangles. But, it's clear that the maximum intersection between any two of these three is smaller than the intersection between the black and the red rectangles.
I now have an algorithm, pretty similar to Ed Staub's above, with the same time estimates. It's a bit different from Ed's, since it is valid for all r.
The counter-example by mhum to the greedy algorithm is neat. Take a look.
I'm still trying to get used to this site. Somehow an earlier answer by me was truncated to two sentences. Thanks to everyone for their contributions, particularly to mhum whose counter-example to the greedy algorithm is satisfying. I now have an answer to my own question. I believe it is as good as possible, but lower bounds on complexity are too difficult for me. My solution is similar to Ed Staub's above and gives the same complexity estimates, but works for any value of r>0.
One of my rectangles is determined by its lower left corner. Let S be the set of lower left corners. In time O(N log(N)) we sort S into Sx according to the sizes of the x-coordinates. We don't care about the order within Sx between two lower left corners with the same x-coord. Similarly the sorted sequence Sy is defined by using the sizes of the y-coords.
Now let u1, u2, u3 and u4 be non-negative integers with u1+u2+u3+u4=r. We compute what happens to the area when we remove various rectangles that we now name explicitly. We first remove the u1-sized head of Sx and the u2-sized tail of Sx. Let Syx be the result of removing these u1+u2 entries from Sy. We remove the u3-sized head of Syx and the u4-sized tail of Syx. One can now prove that one of these possible choices of (u1,u2,u3,u4) gives the desired maximal area of intersection. (Email me if you want a pdf of the proof details.)
The number of such choices is equal to the number of integer points in the regular tetrahedron in 4-d Euclidean space with vertices at the 4 points whose coordinate sum is r and for which 3 of the 4 coordinates are equal to 0. This is bounded by the volume of the tetrahedron, giving a complexity estimate of O(r^3).
So my algorithm has time complexity O(N log(N)) + O(r^3).
I believe this produces a perfect solution.
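A direct, unoptimized sketch of this enumeration (my own illustration), assuming all rectangles share width w and height h and are given by their lower-left corners; for simplicity it re-sorts inside the loop, so as written it does not achieve the stated O(N log(N)) + O(r^3) bound:

```python
def max_intersection(corners, w, h, r):
    """All rectangles have width w and height h and are given by their
    lower-left corners. Enumerate (u1,u2,u3,u4) with u1+u2+u3+u4 = r:
    drop u1/u2 from the head/tail of the x-sorted corners, then u3/u4
    from the head/tail of what remains, y-sorted. Returns the best
    intersection area over all such removals."""
    best = 0.0
    by_x = sorted(corners)          # sorted by x-coordinate
    n = len(by_x)
    for u1 in range(r + 1):
        for u2 in range(r + 1 - u1):
            kept_x = by_x[u1:n - u2]
            by_y = sorted(kept_x, key=lambda c: c[1])
            m = len(by_y)
            for u3 in range(r + 1 - u1 - u2):
                u4 = r - u1 - u2 - u3
                kept = by_y[u3:m - u4]
                xs = [c[0] for c in kept]
                ys = [c[1] for c in kept]
                # same-size rectangles: intersection width/height shrink
                # by the spread of the lower-left corners
                area = (max(w - (max(xs) - min(xs)), 0)
                        * max(h - (max(ys) - min(ys)), 0))
                best = max(best, area)
    return best
```

Precomputing prefix/suffix extrema instead of re-sorting would recover the stated bound.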
David's solution is easier to implement, and should be faster in most cases.
This relies on the assumption that for any solution, at least one of the rejects must be a member of the convex hull. Applying this recursively leads to:
Compute a convex hull.
Gather the set of all candidate solutions produced by:
{Remove a hull member, repair the hull} r times
(The hull doesn't really need to be repaired the last time.)
If h is the number of initial hull members, then the complexity is less than
h^r, plus the cost of computing the initial hull. I am assuming that a hull algorithm is chosen such that the sorted data can be kept and reused in the hull repairs.
This is just a thought, but if N is very large, I would probably try a Monte-Carlo algorithm.
The idea would be to generate random points (say, uniformly in the convex hull of all rectangles), and score how each random point performs. If the random point is in N-r or more rectangles, then update the number of hits of each subset of N-r rectangles.
In the end, the N-r rectangle subset with the most random points in it is your answer.
This algorithm has many downsides, the most obvious one being that the result is random and thus not guaranteed to be correct. But like most Monte-Carlo algorithms, it scales well, and you should be able to use it in higher dimensions as well.

Testing whether a polygon is simple or complex

For a polygon defined as a sequence of (x,y) points, how can I detect whether it is complex or not? A complex polygon has intersections with itself, as shown:
Is there a better solution than checking every pair, which would have a time complexity of O(N^2)?
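For reference, the O(N^2) brute force mentioned in the question can be sketched as follows (my own sketch; it only detects proper crossings, so collinear overlapping edges would need extra handling):

```python
def is_simple(poly):
    """Brute-force O(N^2) simplicity test: no two non-adjacent edges
    of the polygon may intersect (adjacent edges share one endpoint,
    which does not count as an intersection)."""
    n = len(poly)

    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

    def segs_intersect(p1, p2, q1, q2):
        d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
        d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
        if ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0)):
            return True   # proper crossing
        return False      # collinear overlaps ignored in this sketch

    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share an endpoint
            if segs_intersect(poly[i], poly[(i + 1) % n],
                              poly[j], poly[(j + 1) % n]):
                return False
    return True
```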
There are sweep methods which can determine this much faster than a brute force approach. In addition, they can be used to break a non-simple polygon into multiple simple polygons.
For details, see this article, in particular, this code to test for a simple polygon.
See the Bentley-Ottmann algorithm for a sweep-based O((N + I) log N) method, where N is the number of line segments and I is the number of intersection points.
In fact, this can be done in linear time using Chazelle's triangulation algorithm. It either triangulates the polygon or finds that the polygon is not simple.

Is there a linear-time algorithm for finding the convex hull of a complex polygon?

I know there's a worst-case O(n log n) algorithm for finding the convex hull of a complex polygon and a worst-case O(n) algorithm for finding the convex hull of a simple polygon. Is there a worst-case O(n) algorithm for finding the convex hull of a complex polygon?
A complex polygon is a polygon where the line segments may intersect. Finding the convex hull of a complex polygon is equivalent to finding the convex hull of an unordered list of points.
If your point sets are such that some non-comparison based sorting mechanism (like radix sort) will be faster than comparison based methods, then it seems you can use the Graham scan algorithm (http://www.math.ucsd.edu/~ronspubs/72_10_convex_hull.pdf) to compute it. The time complexity of the Graham scan is dominated by the sorting step. The rest is linear.
I'm pretty sure not. Convex hull on arbitrary point sets can be shown to be equivalent to sorting. We can order an arbitrary point set and connect the points in sequence making it into a complex polygon, thereby reducing the problem on arbitrary point sets to yours.
Here is a link to a proof that convex hull is equivalent to sorting. I'm too damn lazy and too bad a typist to write it out myself.
In general, no, there is not an O(n) solution. There is a pixelated version that is better than O(n log n). It is, however, so hobbled in other ways that you'd be crazy to use it in practice.
You render the first polygon (using verts 0, 1, 2) into screen space, then re-render the verts themselves using a distinct ID so they can be identified later. For example, you might clear the frame buffer to RGBA ffffffff and use fffffffe for space that is covered by the convex hull. Each vertex would be rendered using its ID as its RGBA; 00000000, 00000001, etc.
A 16-bit example:
fffffffffffffff
fffffff0fffffff
ffffffeeeffffff
fffffeeeeefffff
ffffeeeeeeeffff
fffeeeeeeeeefff
ff2eeeeeeeee1ff
fffffffffffffff
Checking a new point is a simple lookup in the current frame buffer. If the pixel it occupies is 'shaded' with polygon or with a vertex ID, the new vertex is rejected.
If the new vertex is outside the existing polygon, you find the first pixel between the new vertex and some point inside the convex hull (something in the middle of the first poly works fine) and march along the circumference of the hull - in both directions - until you find yourself on the far side of the hull from the new vertex. (I'll leave this as an exercise to the user. There are plenty of solutions that all suck, from an efficiency perspective.) Fill in the poly defined by these two points and the new vertex with the ID for polygon space - being careful not to erase any vertex IDs - and go on to the next pixel.
When you're done, any pixel which contains a vertex ID that is not completely surrounded by hull IDs is a convex hull vertex.
While the algorithm's complexity is O(n) in the number of vertices, its deficiencies are obvious. Nobody in their right mind would use it unless they had a ridiculous, insane, staggering number of points to process, so that nearly every vertex would be immediately rejected, and unless they could accept the limitation of an aliased result.
Friends don't let friends implement this algorithm.
If your points come from a finite universe (which is always the case in practice) you can do radix sort and then run Andrew's monotone chain algorithm.
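A sketch of Andrew's monotone chain as mentioned above (using an ordinary comparison sort here; substituting radix sort on bounded integer coordinates is what would make the whole thing linear):

```python
def convex_hull(points):
    """Andrew's monotone chain: O(n log n) from sorting, O(n) after.
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

    lower, upper = [], []
    for p in pts:                 # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):       # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # each list's last point is the other's first; drop the duplicates
    return lower[:-1] + upper[:-1]
```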

Largest triangle from a set of points [duplicate]

Possible Duplicate:
How to find largest triangle in convex hull aside from brute force search
I have a set of random points from which I want to find the largest triangle by area whose vertices each lie on one of those points.
So far I have figured out that the largest triangle's vertices will only lie on the outside points of the cloud of points (the convex hull), so I have programmed a function to do just that (using Graham scan in n log n time).
However, that's where I'm stuck. The only way I can figure out how to find the largest triangle from these points is to use brute force at n^3 time, which is still acceptable in an average case as the convex hull algorithm usually kicks out the vast majority of points. However, in a worst case scenario where the points are on a circle, this method would fail miserably.
Does anyone know an algorithm to do this more efficiently?
Note: I know that CGAL has this algorithm but they do not go into any details on how it's done. I don't want to use libraries; I want to learn this and program it myself (and also to be able to tweak it to exactly the way I want it to operate, just like the Graham scan, where other implementations pick up collinear points that I don't want).
Don't know if this helps, but if you choose two points from the convex hull and rotate all points of the hull so that the connecting line of the two points is parallel to the x-axis, either the point with the maximum or the one with the minimum y-coordinate forms the triangle with the largest area together with the two points chosen first.
Of course once you have tested one point for all possible base lines, you can remove it from the list.
Here's a thought on how to get it down to O(n^2 log n). I don't really know anything about computational geometry, so I'll mark it community wiki; please feel free to improve on this.
Preprocess the convex hull by finding for each point the range of slopes of lines through that point such that the set lies completely on one side of the line. Then invert this relationship: construct an interval tree for slopes with points in leaf nodes, such that when querying with a slope you find the points such that there is a tangent through those points.
If there are no sets of three or more collinear points on the convex hull, there are at most four points for each slope (two on each side), but in case of collinear points we can just ignore the intermediate points.
Now, iterate through all pairs of points (P,Q) on the convex hull. We want to find the point R such that triangle PQR has maximum area. Taking PQ as the base of the triangle, we want to maximize the height by finding R as far away from the line PQ as possible. The line through R parallel to PQ must be such that all points lie on one side of the line, so we can find a bounded number of candidates in time O(log n) using the preconstructed interval tree.
To improve this further in practice, do branch-and-bound in the set of pairs of points: find an upper bound for the height of any triangle (e.g. the maximum distance between two points), and discard any pair of points whose distance multiplied by this upper bound is less than the largest triangle found so far.
I think the rotating calipers method may apply here.
Off the top of my head, perhaps you could do something involving gridding/splitting the collection of points up into groups? Maybe... separating the points into three groups (not sure what the best way to do that in this case would be, though), doing something to discard those points in each group that are closer to the other two groups than other points in the same group, and then using the remaining points to find the largest triangle that can be made having one vertex in each group? This would actually make the case of all points being on a circle a lot simpler, because you'd just focus on the points that are near the center of the arcs contained within each group, as those would be the ones in each group furthest from the other two groups.
I'm not sure if this would give you the proper result for certain triangles/distributions of points, though. There may be situations where the resultant triangle isn't of optimal area, either because the grouping and/or the vertex choosing aren't/isn't optimal. Something like that.
Anyway, those are my thoughts on the problem. I hope I've at least been able to give you ideas for how to work on it.
How about dropping a point at a time from the convex hull? Starting with the convex hull, calculate the area of the triangle formed by each triple of adjacent points (p1p2p3, p2p3p4, etc.). Find the triangle with minimum area, then drop the middle of the three points that formed that triangle. (In other words, if the smallest area triangle is p3p4p5, drop p4.) Now you have a convex polygon with N-1 points. Repeat the same procedure until you are left with three points. This should take O(N^2) time.
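A sketch of this drop-a-point procedure (with the author's own caveat that it is unproven and may fail on pathological cases):

```python
def shrink_heuristic(hull):
    """Repeatedly remove the middle vertex of the smallest-area triple
    of adjacent hull vertices until three points remain. A heuristic,
    not proven to give the maximum-area triangle."""
    pts = list(hull)

    def area2(p, q, r):  # twice the triangle area
        return abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0]))

    while len(pts) > 3:
        n = len(pts)
        # index whose adjacent triple (prev, this, next) has minimum area
        i = min(range(n),
                key=lambda k: area2(pts[k-1], pts[k], pts[(k+1) % n]))
        pts.pop(i)
    return pts
```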
I would not be at all surprised if there is some pathological case where this doesn't work, but I expect that it would work for the majority of cases. (In other words, I haven't proven this, and I have no source to cite.)
