Given a set of 2D points X, I would like to find a convex hull consisting of maximum n points. Of course, this is not always possible. Therefore, I am looking for an approximate convex hull consisting of max. n points that maximally covers the set of points X.
Stated more formally, if F(H,X) returns the amount of points of X the convex hull H covers, where |H| is the amount of points out of which the hull is constructed, then I seek the following quantity:
H_hat = argmax_H F(H,X), s.t |H|<=n
Another way to regard the problem is the task of finding the polygon consisting of max. n corners of a set X such that it maximally covers said set X.
The way I've come up with is the following:
X = getSet() \\ Get the set of 2D points
H = convexHull(X) \\ Compute the convex hull
while |H| > n do
n_max = 0
for h in H:
H_ = remove(h,H) \\ Remove one point of the convex hull
n_c = computeCoverage(X,H_) \\ Compute the coverage of the modified convex hull.
\\ Save which point can be removed such that the coverage is reduced minimally.
if n_c > n_max:
n_max = n_c
h_toremove = h
\\ Remove the point and recompute the convex hull.
X = remove(h,X)
H = convexHull(X)
However, this is a very slow way of doing this. I have a large set of points and n (the constraint) is small. This means that the original convex hull contains a lot of points, hence the while-loop iterates for a very long time. Is there a more efficient way of doing this? Or are there any other suggestions for approximate solutions?
A couple of ideas.
The points that we exclude when deleting a vertex lie in the triangle formed by that vertex and the two vertices adjacent to it on the convex hull. Deleting any other vertex does not affect the set of potentially excluded points. Therefore, we only have to recompute coverage twice for each deleted vertex.
Speaking of recomputing coverage, we don't necessarily have to look at every point. This idea doesn't improve the worst case, but I think it should be a big improvement in practice. Maintain an index of points as follows. Pick a random vertex to be the "root" and group the points by which triangle formed by the root and two other vertices contains them (O(m log m) time with a good algorithm). Whenever we delete a non-root vertex, we unite and filter the two point sets for the triangles involving the deleted vertex. Whenever we recompute coverage, we can scan only points in two relevant triangles. If we ever delete this root, choose a new one and redo the index. The total cost of this maintenance will be O(m log^2 m) in expectation where m is the number of points. It's harder to estimate the cost of computing coverage, though.
If the points are reasonably uniformly distributed within the hull, maybe use area as a proxy for coverage. Store the vertices in a priority queue ordered by the area of their triangle formed by their neighbors (ear). Whenever we delete a point, update the ear area of its two neighbors. This is an O(m log m) algorithm.
May be the following approach could work for you: Initially, compute the convex hull H. Then select a subset of n points at random from H, which constructs a new polygon that might not cover all the points, so let's call it quasi-convex hull Q. Count how many points are contained in Q (inliers). Repeat this for a certain amount of times and keep the Q proposal with the most inliers.
This seems a bit related to RANSAC, but for this task, we don't really have a notion of what an outlier is, so we can't really estimate the outlier ratio. Hence, I don't know how good the approximation will be or how many iterations you need to get a reasonable result. May be you can add some heuristics instead of choosing the n points purely at random or you can have a threshold of how many points should at least be contained in Q and then stop when you reach that threshold.
Edit
Actually after thinking about it, you could use a RANSAC approach:
max_inliers = 0
best_Q = None
while(true):
points = sample_n_points(X)
Q = construct_convex_hull(points)
n_inliers = count_inliers(Q, X)
if n_inliers > max_inliers:
max_inliers = n_inliers
best_Q = Q
if some_condition:
break
The advantage of this is that the creation of the convex hull is faster than in your approach, as it only uses a maximum of n points. Also, checking the amount of inliers should be fast as it can be just a bunch of sign comparisons with each leg of the convex hull.
The following does not solve the question in the way I have formulated it, but it solved the problem which spawned the question. Therefore I wanted to add it in case anybody else encounters something similar.
I investigated two approaches:
1) Regards the convex hull as a polygon and apply a polygon simplification algorithm on it. The specific algorithm I investigated was the Ramer-Douglas-Peucker Algorithm.
2) Apply the algorithm described in in the question, without recomputing the convex hull.
Neither approaches will give you (as far as I can tell), the desired solution to the optimization problem stated, but for my tasks they worked sufficiently well.
Related
I have a small set of N points in the plane, N < 50.
I want to enumerate all triples of points from the set that form a triangle containing no other point.
Even though the obvious brute force solution could be viable for my tiny N, it has complexity O(N^4).
Do you know a way to decrease the time complexity, say to O(N³) or O(N²) that would keep the code simple ? No library allowed.
Much to my surprise, the number of such triangles is pretty large. Take any point as a center and sort the other ones by increasing angle around it. This forms a star-shaped polygon, that gives N-1 empty triangles, hence a total of Ω(N²). It has been shown that this bound is tight [Planar Point Sets with a Small Number of Empty convex Polygons, I. Bárány and P. Valtr].
In the case of points forming a convex polygon, all triangles are empty, hence O(N³). Chances of a fast algorithm are getting low :(
The paper "Searching for empty Convex polygons" by Dobkin, David P. / Edelsbrunner, Herbert / Overmars, Mark H. contains an algorithm linear in the number of possible output triangles for solving this problem.
A key problem in computational geometry is the identification of subsets of a point set having particular properties. We study this problem for the properties of convexity and emptiness. We show that finding empty triangles is related to the problem of determininng pairs of vertices that see each other in starshaped polygon. A linear time algorithm for this problem which is of independent interest yields an optimal algorithm for finding all empty triangles. This result is then extended to an algorithm for finding
empty convex r-gons (r > 3) and for determining a largest empty convex subset. Finally, extensions to higher dimensions are mentioned.
The sketch of the algorithm by Dobkin, Edelsbrunner and Overmars goes as follows for triangles:
for every point in turn, build the star-shaped polygon formed by sorting around it the points on its left. This takes N sorting operations (which can be lowered to total complexity O(N²) via an arrangement, anyway).
compute the visibility graph inside this star-shaped polygon and report all the triangles that are formed with the given point. This takes N visibility graph constructions, for a total of M operations, where M is the number of empty triangles.
Shortly, from every point you construct every empty triangle on the left of it, by triangulating the corresponding star-shaped polygon in all possible ways.
The construction of the visibility graph is a special version for the star-shaped polygon, which works in a traversal around the polygon, where every vertex has a visibility queue which gets updated.
The figure shows a star-shaped polygon in blue and the edges of its visibility graph in orange. The outline generates 6 triangles, and inner visibility edges 8 of them.
for each pair of points (A, B):
for each of the two half-planes defined by (A, B):
initialize a priority queue Q to empty.
for each point M in the half plane,
with increasing angle(AB, AM):
if angle(BA, BM) is smaller than all angles in Q:
print A,B,M
put M in Q with priority angle(BA, BM)
Inserting and querying the minimum in a priority queue can both be done in O(log N) time, so the complexity is O(N^3 log N) this way.
If I understand your questions, what you're looking for is https://en.wikipedia.org/wiki/Delaunay_triangulation
To quote from said Wikipedia article: "The most straightforward way of efficiently computing the Delaunay triangulation is to repeatedly add one vertex at a time, retriangulating the affected parts of the graph. When a vertex v is added, we split in three the triangle that contains v, then we apply the flip algorithm. Done naively, this will take O(n) time: we search through all the triangles to find the one that contains v, then we potentially flip away every triangle. Then the overall runtime is O(n2)."
Let X be a collection of n points in some moderate-dimensional space - for now, say R^5. Let S be the convex hull of X, let p be a point in S, and let v be any direction. Finally, let L = {p + lambda v : lambda a real number} be the line passing through p in direction v.
I am interested in finding a reasonably efficient algorithm for computing the intersection of S with L. I'd also be interested in hearing if it is known that no such algorithm exists! Note that this intersection can be represented by the (extreme) two points of intersection of L with the boundary of S. I'm particularly interested in finding an algorithm that behaves well when n is large.
I should say that it is easy to do this very efficiently in two dimensions. In that case, one can order the points of X in 'clockwise order' as seen from p, and then do binary search. So, the initial ordering takes O(n log(n)) steps and then further lookups take O(log(n)) steps. I don't see what the analogous algorithm should be in higher dimensions. Part of the problem is that a convex body in two dimensions has n vertices and n faces, while a convex body in 3 or higher dimensions can have n vertices but many, many more than n faces.
You can write a simple linear program for this. You want to minimise/maximise lambda subject to the constraint that x + lambda v lies in the convex hull of your input points. "Lies in the convex hull of" is coordinatewise equality between two points, one of which is a nonnegative weighted average of your input points such that the weights sum to 1.
As a practical matter, it may be useful to start with a handful of randomly chosen points, get a convex combination or a certificate of infeasibility, then interpret the cerificate of infeasibility as a linear inequality and find the input point that most violates it. If you're using a practical solver, this means you want to formulate the dual, switch a bunch of presolve things off, and run the above essentially as a cutting plane method using certificates of unboundedness instead. It is likely that, unless you have pathological data, you will only need to tell the LP solver about a small handful of your input points.
I'm sorry if topic is confusing. It's because I don't know what i'm really searching for. I have a set P of points (|P| < 10^5). Each point have integer coordinates (between -10^4, 10^4) How to find straight, which goes across all of points specified on input. Condition is that line must be the thickest, and you have to output how thick that straight is with accuracy up to 2 places after decimal point. Any hint, clue, idea or algorithm name would be appreciate.
PS. It's neither homework nor SPOJ. I'm just preparing to programming contest by doing problems from last edition. And that one I can't solve, even I don't know where to start for searching solution...
You could start by determining the convex hull of this point cloud (see e.g. http://softsurfer.com/Archive/algorithm_0109/algorithm_0109.htm), and try to find the two parallel lines that bound this polygon with the shortest distance.
I think this should be an easier problem because it allows you to base the direction of the parallel lines on the segments of the convex hull (of which there are a limited number).
One implementation could be to process each segment of the convex hull in turn. Per segment, draw a line through it (this is one of the two parallel lines), and then determine the closest other parallel line that encloses the convex hull. Do this for each segment of the convex hull while recording the minimum distance you have found between the parallel lines so far. At the end you should have your optimum result.
Obviously, this still requires an efficient way to determine the closest other parallel line. A (naive, but maybe good enough) way of doing this, is to take all vertices of the convex hull that are not on the current segment, and determine the perpendicular distance to the line through it (e.g. http://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line). The maximum distance for all these vertices is also the minimum distance to the parallel line.
In pseudo-code:
Function FindThinnestLine(PointCloud P)
CH = ConvexHull(P)
optS = nothing
optDist = infinite
For each segment S in CH
L = the line through S
/* Find the minimum distance that the line parallel to L must have in order to enclose CH */
maxDist = 0
For each vertex P in CH, except the two that limit S
dist = The distance between L and P
maxDist = max(dist, maxDist)
/* If the current S has a smaller maxDist, it is our new optimum */
if(maxDist < optDist)
optS = S
optDist = maxDist
Return the line through optS and the line parallel to optS at a distance of optDist as the result
End Function
This is an O(n^2) algorithm, with n being the number of segments in your convex hull.
Edit
Come to think of it, you don't need to iterate over O(n) vertices of the convex hull for every S (in order to find the maxDist), only for the first S. Let's say we call this first vertex oppV (opp for opposite to S), and let's say we process the segments of the convex hull in clockwise order. For every subsequent S that we process, the new oppV can be either the same vertex, or one of its right neigbours (but never a left neighbour, otherwise the segments wouldn't form a convex polygon).
Hence, processing the segments of the convex hull can then be done in O(n) (but creating the convex hull is still O(n log n)).
The thickest line containing all points of a given subset of P should be the perpendicular bisector of the segment xy, where x and y are those two points of the subset with the highest distance d between them. The thickness of this line would be d as well.
Given n points on the plane. No 3 are collinear.
Given the number k.
Find the subset of k points, such that the convex hull of the k points has minimum perimeter out of any convex hull of a subset of k points.
I can think of a naive method runs in O(n^k k log k). (Find the convex hull of every subset of size k and output the minimum).
I think this is a NP problem, but I can't find anything suitable for reduction to.
Anyone have ideas on this problem?
An example,
the set of n=4 points {(0,0), (0,1), (1,0), (2,2)} and k=3
Result:
{(0,0),(0,1),(1,0)}
Since this set contains 3 points the convex hull of and the perimeter of the result is smaller than that of any other sets of 3 points.
This can be done in O(kn^3) time and O(kn^2) space (or maybe O(kn^3) if you want the actual points).
This paper: http://www.win.tue.nl/~gwoegi/papers/area-k-gons.pdf
by Eppstein et al, has algorithms to solve this problem for minimum perimeter and other weight functions like area, sum of internal angles etc which follow certain constraints, even though the title says minimum area (See Corollary 5.3 for perimeter).
The basic idea is a dynamic programming approach as follows (read the first few paragraphs of Section 4):
Suppose S is the given set of points and Q is the convex hull of k points with minimum perimeter.
Let p1 be the bottom-most point of Q, p2 and p3 are the next points on the hull in counter-clockwise order.
We can decompose Q into a triangle p1p2p3 and a convex hull of k-1 points Q' (which shares the side p1p3 with triangle p1p2p3).
The main observation is that Q' is the optimal for k-1, in which the bottommost point is p1 and the next point is p3 and all the points of Q' lie on the same side of the line p2->p3.
Thus maintaining a 4d array of optimum polygons for the each quadruple (pi, pj, pk, m) such that
the polygon is a convex hull of exactly m points of S.
pi is the bottom most point of the polygon.
pj is the next vertex in counter-clockwise order,
all points of the polygon lie to the left of the line pi -> pj.
all points lie on the same side of pj->pk as pi does.
can help us find the optimum polygons for m=k, given the optimum polygons for m <= k-1.
The paper describes exactly how to go about doing that in order to achieve the stated space and time bounds.
Hope that helps.
It's not exactly pretty solution. In fact, it's quite a pain to implement, but it surely gives polynomial complexity. Although complexity is also big (n^5*k is my rough estimate), someone may find a way to improve it or find here an idea for better solution. Or it may be enough for you: even this complexity is much better than bruteforce.
Note: optimal solution (set S) with hull H includes all points from orignal set inside H. Otherwise, we could throw away one of the border points of H and include that missed point, reducing perimeter.
(update just like 'optimization' mbeckish posted)
Assumption: no two points from the set form a vertical line. It can be achieved easily by rotating whole set of points by some irrational angle around origin of coordinates.
Due to assumption above, any complex hull has one leftmost and one rightmost point. Also, these two points divide the hull into top and bottom parts.
Now, let's take one segment from the top part of this hull and one from the bottom part. Let's call those two segments middle segments and perimeter of the right part of this hull - right perimeter.
Note: those two segments is all we need to know about right part of our convex hull to continue building it to the left. But having just two points instead of 4 is not enough: we could not uphold condition of 'convexness' this way.
It leads to a solution. For each set of points {p0, p1, p2, p3} and number i (i <= k) we store minimal right perimeter that can be achieved if [p0, p1], [p2, p3] are two middle segments and i is the number of points in the right part of this solution (including the ones inside of it, not only on the border).
We go through all points from right to left. For each new point p we check all combinations of points {p0, p1, p2, p3} such that point p can continue this hull to the left (either on the top or on the bottom part). For each such set and size i, we already store optimal perimeter size (see paragraph above).
Note: if you add point p to a right-hull formed by points {p0, p1, p2, p3}, you'll increment set size i at least by 1. But sometimes this number will be > 1: you'll have to include all points in the triangle {p, p0, p2}. They aren't on the hull, but inside it.
Algorithm is over :) In addition, despite scary complexity, you may note that not all segments [p0, p1], [p2, p3] can be middle segments: it should reduce actual computation time substantially.
update This provides only optimal perimeter size, not the set itself. But finding the set is simple: for each 'state' above you store not only perimeter size, but also last point added. Then, you can 'trace' your solution back. It's quite standard trick, I suppose it's not a problem for you, you seem to be good at algorithms :)
update2 This is essentially DP (dynamic programming), only a bit bloated
One possible optimization: You can ignore any subsets whose convex hull contains points that are not in the subset.
Proof:
If your convex hull contains points that are not in your subset, then remove a point from your subset that is on the hull, and replace it with a point in the interior of the hull. This will yield a hull of equal or smaller perimeter.
In the planar case, you can use an algorithm known as the Jarvis march, which has worst case complexity O(n^2). In this algorithm, you start building a hull at an arbitrary point and then check which point needs to be added next. Pseudocode taken from wikipedia:
jarvis(S)
pointOnHull = leftmost point in S
i = 0
repeat
P[i] = pointOnHull
endpoint = S[0] // initial endpoint for a candidate edge on the hull
for j from 1 to |S|-1
if (S[j] is on left of line from P[i] to endpoint)
endpoint = S[j] // found greater left turn, update endpoint
i = i+1
pointOnHull = endpoint
until endpoint == P[0] // wrapped around to first hull point
As far as I understand it, convex hulls are unique to each set of points, so there is no need to find a minimum. You just find one, and it will be the smallest one by definition.
Edit
The posted solution solves for the convex hull with the fewest number of points. Any hull with more points will have a longer perimeter, and I misunderstood the question to seeking a minimum perimeter, instead of a minimum perimeter for a set with K points.
This new problem is probably NP as suspected, and most similar to the longest path problem. Unfortunately, I lack the ingenuity to provide a worthwhile reduction.
This is a question that I was asked on a job interview some time ago. And I still can't figure out sensible answer.
Question is:
you are given set of points (x,y). Find 2 most distant points. Distant from each other.
For example, for points: (0,0), (1,1), (-8, 5) - the most distant are: (1,1) and (-8,5) because the distance between them is larger from both (0,0)-(1,1) and (0,0)-(-8,5).
The obvious approach is to calculate all distances between all points, and find maximum. The problem is that it is O(n^2), which makes it prohibitively expensive for large datasets.
There is approach with first tracking points that are on the boundary, and then calculating distances for them, on the premise that there will be less points on boundary than "inside", but it's still expensive, and will fail in worst case scenario.
Tried to search the web, but didn't find any sensible answer - although this might be simply my lack of search skills.
For this specific problem, with just a list of Euclidean points, one way is to find the convex hull of the set of points. The two distant points can then be found by traversing the hull once with the rotating calipers method.
Here is an O(N log N) implementation:
http://mukeshiiitm.wordpress.com/2008/05/27/find-the-farthest-pair-of-points/
If the list of points is already sorted, you can remove the sort to get the optimal O(N) complexity.
For a more general problem of finding most distant points in a graph:
Algorithm to find two points furthest away from each other
The accepted answer works in O(N^2).
Boundary point algorithms abound (look for convex hull algorithms). From there, it should take O(N) time to find the most-distant opposite points.
From the author's comment: first find any pair of opposite points on the hull, and then walk around it in semi-lock-step fashion. Depending on the angles between edges, you will have to advance either one walker or the other, but it will always take O(N) to circumnavigate the hull.
You are looking for an algorithm to compute the diameter of a set of points, Diam(S). It can be shown that this is the same as the diameter of the convex hull of S, Diam(S) = Diam(CH(S)). So first compute the convex hull of the set.
Now you have to find all the antipodal points on the convex hull and pick the pair with maximum distance. There are O(n) antipodal points on a convex polygon. So this gives a O(n lg n) algorithm for finding the farthest points.
This technique is known as Rotating Calipers. This is what Marcelo Cantos describes in his answer.
If you write the algorithm carefully, you can do without computing angles. For details, check this URL.
A stochastic algorithm to find the most distant pair would be
Choose a random point
Get the point most distant to it
Repeat a few times
Remove all visited points
Choose another random point and repeat a few times.
You are in O(n) as long as you predetermine "a few times", but are not guaranteed to actually find the most distant pair. But depending on your set of points the result should be pretty good. =)
This question is introduced at Introduction to Algorithm. It mentioned 1) Calculate Convex Hull O(NlgN). 2) If there is M vectex on Convex Hull. Then we need O(M) to find the farthest pair.
I find this helpful links. It includes analysis of algorithm details and program.
http://www.seas.gwu.edu/~simhaweb/alg/lectures/module1/module1.html
Wish this will be helpful.
Find the mean of all the points, measure the difference between all points and the mean, take the point the largest distance from the mean and find the point farthest from it. Those points will be the absolute corners of the convex hull and the two most distant points.
I recently did this for a project that needed convex hulls confined to randomly directed infinite planes. It worked great.
See the comments: this solution isn't guaranteed to produce the correct answer.
Just a few thoughts:
You might look at only the points that define the convex hull of your set of points to reduce the number,... but it still looks a bit "not optimal".
Otherwise there might be a recursive quad/oct-tree approach to rapidly bound some distances between sets of points and eliminate large parts of your data.
This seems easy if the points are given in Cartesian coordinates. So easy that I'm pretty sure that I'm overlooking something. Feel free to point out what I'm missing!
Find the points with the max and min values of their x, y, and z coordinates (6 points total). These should be the most "remote" of all the boundary points.
Compute all the distances (30 unique distances)
Find the max distance
The two points that correspond to this max distance are the ones you're looking for.
Here's a good solution, which works in O(n log n). It's called Rotating Caliper’s Method.
https://www.geeksforgeeks.org/maximum-distance-between-two-points-in-coordinate-plane-using-rotating-calipers-method/
Firstly you find the convex hull, which you can make in O(n log n) with the Graham's scan. Only the point from the convex hull can provide you the maximal distance. This algorithm arranges points of the convex hull in the clockwise traversal. This property will be used later.
Secondly, for all the points on the convex hull, you'll need to find the most distant point on this hull (it's called the antipodal point here). You don't have to find all the antipodal points separately (which would give quadratic time). Let's say the points of the convex hall are called p_1, ..., p_n, and their order corresponds to the clockwise traversal. There is a property of convex polygons that when you iterate through points p_j on the hull in the clockwise order and calculate the distances d(p_i, p_j), these distances firstly don't decrease (and maybe increase) and then don't increase (and maybe decrease). So you can find the maximum distance easily in this case. But when you've found the correct antipodal point p_j* for the p_i, you can start this search for p_{i+1} with the candidates points starting from that p_j*. You don't need to check all previously seen points. in total p_i iterates through points p_1, ..., p_n once, and p_j iterates through these points at most twice, because p_j can never catch up p_i as it would give zero distance, and we stop when the distance starts decreasing.
A solution that has runtime complexity O(N) is a combination of the above
answers. In detail:
(1) One can compute the convex hull with runtime complexity O(N) if you
use counting sort as an internal polar angle sort and are willing to
use angles rounded to the nearest integer [0, 359], inclusive.
(2) Note that the number of points on the convex hull is then N_H which is usually less than N.
We can speculate about the size of the hull from information in Cormen et al. Introduction to Algorithms, Exercise 33-5.
For sparse-hulled distributions of a unit-radius disk, a convex polygon with k sides, and a 2-D normal distribution respectively as n^(1/3), log_2(n), sqrt(log_2(n)).
The furthest pair problem is then between comparison of points on the hull.
This is N_H^2, but each leading point's search for distance point can be
truncated when the distances start to decrease if the points are traversed
in the order of the convex hull (those points are ordered CCW from first point).
The runtime complexity for this part is then O(N_H^2).
Because N_H^2 is usually less than N, the total runtime complexity
for furthest pair is O(N) with a caveat of using integer degree angles to reduce the sort in the convex hull to linear.
Given a set of points {(x1,y1), (x2,y2) ... (xn,yn)} find 2 most distant points.
My approach:
1). You need a reference point (xa,ya), and it will be:
xa = ( x1 + x2 +...+ xn )/n
ya = ( y1 + y2 +...+ yn )/n
2). Calculate all distance from point (xa,ya) to (x1,y1), (x2,y2),...(xn,yn)
The first "most distant point" (xb,yb) is the one with the maximum distance.
3). Calculate all distance from point (xb,yb) to (x1,y1), (x2,y2),...(xn,yn)
The other "most distant point" (xc,yc) is the one with the maximum distance.
So you got your most distant points (xb,yb) (xc,yc) in O(n)
For example, for points: (0,0), (1,1), (-8, 5)
1). Reference point (xa,ya) = (-2.333, 2)
2). Calculate distances:
from (-2.333, 2) to (0,0) : 3.073
from (-2.333, 2) to (1,1) : 3.480
from (-2.333, 2) to (-8, 5) : 6.411
So the first most distant point is (-8, 5)
3). Calculate distances:
from (-8, 5) to (0,0) : 9.434
from (-8, 5) to (1,1) : 9.849
from (-8, 5) to (-8, 5) : 0
So the other most distant point is (1, 1)