I'm looking for an algorithm for calculating the maximum area in a self-intersecting polygon. As it's self-intersecting it is not trivial to calculate the area with methods such as the shoelace formula.
The polygon in the example prioritises vertices with the "smallest" letter alphabetically, sometimes going back to the same vertex in a non-alphabetical way because it is self-intersecting, although that shouldn't make a difference to the expected area.
In this case the algorithm should output 40: the area of the square (36) plus the area of the 4 outer triangles (4).
The intersection points are known in advance, so there is no need to calculate them (as the example shows).
The last point is guaranteed to connect back to the figure, i.e. no dangling lines
The polygon is always traceable, i.e. it can be drawn, without lifting the pen
Time complexity ideally not worse than O(n log(n)) where n is the number of points
I think I've got it.
1. Find the vertex with the lowest x. In case of a tie, pick the one with the lowest y. This process is O(n).
2. Of the 2+ segments connected to the vertex found in point 1, pick the one that forms the smallest angle with the x-axis, so as to start traversing the polygon in a clockwise direction. This process is O(s), where s is the number of segments that are connected to the starting vertex.
3. Keep choosing the next connected segments, ignoring the order of the segments in the original polygon. In case of an intersection, pick the segment that forms the smallest angle with the current segment with the direction negated, measured clockwise. This is in order to choose the segment that lies on the outside of the polygon. This process is O(n i/2), where i is the average number of segments connected to each vertex.
4. Once back to the starting point, calculate the area using the shoelace formula. This could actually be calculated in parallel whilst traversing the polygon in points 2 and 3.
This algorithm's worst case time complexity is O(n i/2) where n is the number of vertices and i is the average number of segments connected to each vertex. The best case complexity is O(n) for a polygon that never intersects. In my case the polygons rarely intersect so this process is close to O(n).
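The traversal above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `outline_area` and the input format (a closed walk given as a list of `(x, y)` tuples, with every intersection point already listed as a vertex) are assumptions made for illustration. It also assumes y-up coordinates and picks the starting segment by sweeping clockwise from the negative x direction, which is one way to make the traversal start clockwise.

```python
import math
from collections import defaultdict


def outline_area(path):
    """Area enclosed by the outer boundary of a closed, possibly
    self-intersecting walk. Assumes intersection points are already
    present in `path` as vertices (hypothetical input format)."""
    # Undirected adjacency between consecutive vertices of the walk.
    adj = defaultdict(set)
    for a, b in zip(path, path[1:] + path[:1]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    def next_vertex(prev, cur):
        # Direction pointing back along the segment we arrived on.
        back = math.atan2(prev[1] - cur[1], prev[0] - cur[0])

        def clockwise_angle(w):
            ang = (back - math.atan2(w[1] - cur[1], w[0] - cur[0])) % (2 * math.pi)
            # Going straight back is the last resort, not the first choice.
            return ang if ang > 1e-12 else 2 * math.pi

        return min(adj[cur], key=clockwise_angle)

    start = min(adj)  # tuples compare by x first, then y
    first = next_vertex((start[0] - 1, start[1]), start)

    # Shoelace sum accumulated while walking the outline.
    twice_area = start[0] * first[1] - first[0] * start[1]
    prev, cur = start, first
    while True:
        nxt = next_vertex(prev, cur)
        if (cur, nxt) == (start, first):
            break  # about to repeat the first directed segment
        twice_area += cur[0] * nxt[1] - nxt[0] * cur[1]
        prev, cur = cur, nxt
    return abs(twice_area) / 2
```

For a bow-tie walk such as `[(0, 0), (2, 2), (4, 4), (4, 0), (2, 2), (0, 4)]` the plain shoelace formula cancels to zero, while this sketch follows the outline and adds up both triangles.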

Build the set of segments from the points given
For each point, test a ray to see if it intersects an even or odd number of those segments.
The even intersect counts are internal points, remove them.
The shoelace algorithm then tells you the area of the remaining shape.
Finding all empty triangles

I have a small set of N points in the plane, N < 50.
I want to enumerate all triples of points from the set that form a triangle containing no other point.
Even though the obvious brute force solution could be viable for my tiny N, it has complexity O(N^4).
Do you know a way to decrease the time complexity, say to O(N³) or O(N²), that would keep the code simple? No library allowed.
Much to my surprise, the number of such triangles is pretty large. Take any point as a center and sort the other ones by increasing angle around it. This forms a star-shaped polygon, that gives N-1 empty triangles, hence a total of Ω(N²). It has been shown that this bound is tight [Planar Point Sets with a Small Number of Empty Convex Polygons, I. Bárány and P. Valtr].
In the case of points forming a convex polygon, all triangles are empty, hence O(N³). Chances of a fast algorithm are getting low :(
The paper "Searching for Empty Convex Polygons" by David P. Dobkin, Herbert Edelsbrunner and Mark H. Overmars contains an algorithm linear in the number of possible output triangles for solving this problem.
A key problem in computational geometry is the identification of subsets of a point set having particular properties. We study this problem for the properties of convexity and emptiness. We show that finding empty triangles is related to the problem of determining pairs of vertices that see each other in a star-shaped polygon. A linear time algorithm for this problem which is of independent interest yields an optimal algorithm for finding all empty triangles. This result is then extended to an algorithm for finding empty convex r-gons (r > 3) and for determining a largest empty convex subset. Finally, extensions to higher dimensions are mentioned.
The sketch of the algorithm by Dobkin, Edelsbrunner and Overmars goes as follows for triangles:
for every point in turn, build the star-shaped polygon formed by sorting around it the points on its left. This takes N sorting operations (which can be lowered to total complexity O(N²) via an arrangement, anyway).
compute the visibility graph inside this star-shaped polygon and report all the triangles that are formed with the given point. This takes N visibility graph constructions, for a total of M operations, where M is the number of empty triangles.
In short, from every point you construct every empty triangle to the left of it, by triangulating the corresponding star-shaped polygon in all possible ways.
The construction of the visibility graph is a special version for the star-shaped polygon, which works in a traversal around the polygon, where every vertex has a visibility queue which gets updated.
The figure shows a star-shaped polygon in blue and the edges of its visibility graph in orange. The outline generates 6 triangles, and inner visibility edges 8 of them.
for each pair of points (A, B):
    for each of the two half-planes defined by (A, B):
        initialize a priority queue Q to empty.
        for each point M in the half plane,
                with increasing angle(AB, AM):
            if angle(BA, BM) is smaller than all angles in Q:
                print A,B,M
            put M in Q with priority angle(BA, BM)
Inserting and querying the minimum in a priority queue can both be done in O(log N) time, so the complexity is O(N^3 log N) this way.
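As a rough illustration, the pseudocode above might look like this in plain Python (no libraries beyond the standard one). This is a sketch under stated assumptions: the points are distinct tuples in general position (no three collinear), the names `empty_triangles` and `angle_between` are made up for the example, and a running minimum stands in for the priority queue since only the smallest angle is ever queried. Each triangle is found once per edge, so the results are collected in a set.

```python
import math
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def angle_between(u, v):
    """Unsigned angle between vectors u and v, in [0, pi]."""
    return math.atan2(abs(u[0] * v[1] - u[1] * v[0]), u[0] * v[0] + u[1] * v[1])


def empty_triangles(points):
    """All triples of points whose triangle contains no other point.
    Assumes general position (no three points collinear)."""
    found = set()
    for a, b in combinations(points, 2):
        ab = (b[0] - a[0], b[1] - a[1])
        ba = (-ab[0], -ab[1])
        for side in (1, -1):
            # Points strictly on one side of the line through A and B.
            half = [m for m in points if side * cross(a, b, m) > 0]
            half.sort(key=lambda m: angle_between(ab, (m[0] - a[0], m[1] - a[1])))
            smallest = math.inf  # smallest angle(BA, BM) seen so far
            for m in half:
                at_b = angle_between(ba, (m[0] - b[0], m[1] - b[1]))
                if at_b < smallest:
                    found.add(frozenset((a, b, m)))
                    smallest = at_b
    return found
```

For example, `empty_triangles([(0, 0), (6, 0), (0, 6), (1, 2)])` reports the three small triangles that use the inner point and leaves out the large one that contains it.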
If I understand your questions, what you're looking for is
To quote from said Wikipedia article: "The most straightforward way of efficiently computing the Delaunay triangulation is to repeatedly add one vertex at a time, retriangulating the affected parts of the graph. When a vertex v is added, we split in three the triangle that contains v, then we apply the flip algorithm. Done naively, this will take O(n) time: we search through all the triangles to find the one that contains v, then we potentially flip away every triangle. Then the overall runtime is O(n²)."

Segment Intersection

Here is a question from CLRS.
A disk consists of a circle plus its interior and is represented by its center point and radius. Two disks intersect if they have any point in common. Give an O(n lg n)-time algorithm to determine whether any two disks in a set of n intersect.
It's not my homework. I think we can take the horizontal diameter of every circle as its representative line segment. If two segments become consecutive in the ordering, we check the distance between the two centers. If it is less than or equal to the sum of the radii of the circles, then they intersect.
Please let me know if I'm correct.
Build a Voronoi diagram for disk centers. This is an O(n log n) job.
Now for each edge of the diagram take the corresponding pair of centers and check whether their disks intersect.
Build a k-d tree with the centres of the circles.
For every circle (p, r), find using the k-d tree the set S of circles whose centres are nearer than 2r from p.
Check if any of the circles in S touches the current circle.
I think the average cost for this algorithm is O(NlogN).
The logic is that we loop over the set, O(N), and for every element get a subset of nearby elements, O(N log N), so, a priori, the complexity is O(N^2 log N). But we also have to consider that the probability of two random circles being less than 2r apart and not touching is less than 3/4 (if they touch we can short-circuit the algorithm).
That means that the average size of S is probabilistically limited to be a small value.
Another approach to solve the problem:
Divide the plane using a grid whose diameter is that of the biggest circle.
Use a hashing algorithm to classify the grid cells in N groups.
For every circle calculate the grid cells it overlaps and the corresponding groups.
Get all the circles in a group and...
Check if the biggest circle touches any other circle in the group.
Recurse applying this algorithm to the remaining circles in the group.
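A much simplified, single-level variant of the grid idea could be sketched as below. It is not the recursive scheme described above: it only buckets disk centres into cells whose side equals the largest diameter and compares each disk against the 3×3 block of cells around it. The function name `any_disks_intersect` and the `(x, y, r)` tuple format are assumptions for illustration, and the worst case can still be quadratic when many small disks crowd into one cell.

```python
import math
from itertools import product


def any_disks_intersect(disks):
    """disks: list of (x, y, r). True if any two disks share a point."""
    if len(disks) < 2:
        return False
    cell = 2 * max(r for _, _, r in disks)  # side = largest diameter
    if cell == 0:
        # All radii are zero: only coinciding centres intersect.
        centres = [(x, y) for x, y, _ in disks]
        return len(set(centres)) < len(centres)

    grid = {}
    for x, y, r in disks:
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        # Intersecting disks have centres at most `cell` apart, so the
        # partner's centre must lie in this cell or one of its neighbours.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for ox, oy, other_r in grid.get((cx + dx, cy + dy), ()):
                if (x - ox) ** 2 + (y - oy) ** 2 <= (r + other_r) ** 2:
                    return True
        grid.setdefault((cx, cy), []).append((x, y, r))
    return False
```

Comparing squared distances avoids a square root and keeps tangent disks (distance exactly equal to the sum of the radii) counted as intersecting, as the CLRS definition requires.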
Triangle partitioning

This was a problem in the 2010 Pacific ACM-ICPC contest. The gist of it is trying to find a way to partition a set of points inside a triangle into three subtriangles such that each partition contains exactly a third of the points.
Coordinates of a bounding triangle: (v1x,v1y),(v2x,v2y),(v3x,v3y)
A number 3n < 30000 representing the number of points lying inside the triangle
Coordinates of the 3n points: (x_i,y_i) for i=1...3n
A point (sx,sy) that splits the triangle into 3 subtriangles such that each subtriangle contains exactly n points.
The way the splitting point splits the bounding triangle into subtriangles is as follows: Draw a line from the splitting point to each of the three vertices. This will divide the triangle into 3 subtriangles.
We are guaranteed that such a point exists. Any such point will suffice (the answer is not necessarily unique).
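To make the splitting rule concrete, here is a small Python sketch that counts how many points fall in each of the three subtriangles for a candidate splitting point. It only verifies a candidate and does not search for one. The names `split_counts` and `in_triangle` are hypothetical, and points lying exactly on a dividing line are counted in no subtriangle.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if p lies strictly inside triangle (a, b, c), any orientation."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def split_counts(v1, v2, v3, s, points):
    """Number of points inside (s, v1, v2), (s, v2, v3) and (s, v3, v1)."""
    return tuple(
        sum(in_triangle(p, s, a, b) for p in points)
        for a, b in ((v1, v2), (v2, v3), (v3, v1))
    )
```

A candidate `s` is a valid answer for 3n points exactly when `split_counts` returns `(n, n, n)`.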
Here is an example of the problem for n=2 (6 points). We are given the coordinates of each of the colored points and the coordinates of each vertex of the large triangle. The splitting point is circled in gray.
Can someone suggest an algorithm faster than O(n^2)?
Here's an O(n log n) algorithm. Let's assume no degeneracy.
The high-level idea is, given a triangle PQR,
[diagram of triangle PQR showing the center point C and a point S]
we initially place the center point C at P. Slide C toward R until there are n points inside the triangle CPQ and one (S) on the segment CQ. Slide C toward Q until either triangle CRP is no longer deficient (perturb C and we're done) or CP hits a point. In the latter case, slide C away from P until either triangle CRP is no longer deficient (we're done) or CQ hits a point, in which case we begin sliding C toward Q again.
Clearly the implementation cannot “slide” points, so for each triangle involving C, for each vertex S of that triangle other than C, store the points inside the triangle in a binary search tree sorted by angle with S. These structures suffice to implement this kinetic algorithm.
I assert without proof that this algorithm is correct.
As for the running time, each event is a point-line intersection and can be handled in time O(log n). The angles PC and QC and RC are all monotonic, so each of O(1) lines hits each point at most once.
Main idea is: if we have got the line, we can try to find a point on it using linear search. If the line is not good enough, we can move it using binary search.
1. Sort the points based on the direction from vertex A. Sort them for B and C too.
2. Set current range for vertex A to be all the points.
3. Select 2 middle points from the range for vertex A. These 2 points define subrange for 'A'. Get some line AD lying between these points.
4. Iterate for all the points lying between B and AD (starting from BA). Stop when n points found. Select subrange of directions from B to points n and next after n (if there is no point after n, use BC). If less than n points can be found, set current range for vertex A to be the left half of the current range and go to step 3.
5. Same as step 4, but for vertex C.
6. If subranges A, B, C intersect, choose any point from there and finish. Otherwise, if A&B is closer to A, set current range for vertex A to be the right half of the current range and go to step 3. Otherwise set current range for vertex A to be the left half of the current range and go to step 3.
Complexity: sorting O(n * log n), search O(n * log n). (Combination of binary and linear search).
Here is an approach that takes O(log n) passes of cost n each.
Each pass starts with an initial point, which divides the triangle into three subtriangles. If each has n points, we are finished. If not, consider the subtriangle which is furthest away from the desired n. Suppose it has too many, just for now. The imbalances sum to zero, so at least one of the other two subtriangles has too few points. The third subtriangle either also has too few, or has exactly n points - or the original subtriangle would not have the highest discrepancy.
Take the most imbalanced subtriangle and consider moving the centre point along the line leading away from it. As you do so, the imbalance of the most imbalanced subtriangle will reduce. For each point in the triangle, you can work out when that point crosses into or out of the most imbalanced subtriangle as you move the centre point. Therefore you can work out in time n where to move the centre point to give the most imbalanced triangle any desired count.
As you move the centre point you can choose whether points move in or out of the most imbalanced subtriangle, but you can't choose which of the other two subtriangles they go to, or from - but you can predict which easily from which side of the line along which you are sliding the centre point they live, so you can move the centre point along this line to get the lowest maximum discrepancy after the move. In the worst case, all of the points moved go into, or out of, the subtriangle that was exactly balanced. However, if the imbalanced subtriangle has n + k points, by moving k/2 of them, you can move, at worst, to the case where it and the previously balanced subtriangle are out by k/2. The third subtriangle may still be unbalanced by up to k, in the other direction, but in this case a second pass will reduce the maximum imbalance to something below k/2.
Therefore in the case of a large unbalance, we can reduce it by at worst a constant factor in two passes of the above algorithm, so in O(log n) passes the imbalance will be small enough that we are into special cases where we worry about an excess of at most one point. Here I am going to guess that the number of such special cases is practically enumerable in a program, and the cost amounts to a small constant addition.
I think there is a linear time algorithm. See the last paragraph of the paper "Illumination by floodlights" by Steiger and Streinu. Their algorithm works for any k1, k2, k3 that sum up to n. Therefore, k1=k2=k3=n/3 is a special case.
Find the largest convex black area in an image

I have an image of which this is a small cut-out:
As you can see, it consists of white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.
How can I find the largest convex black area in this image that doesn't contain a white pixel in it?
Here is a small hand-drawn example of what I mean by the largest convex black area:
P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.
Trying to find maximum convex area is a difficult task to do. Wouldn't you just be fine with finding rectangles with maximum area? This problem is much easier and can be solved in O(n) - linear time in number of pixels. The algorithm follows.
Say you want to find largest rectangle of free (white) pixels (Sorry, I have images with different colors - white is equivalent to your black, grey is equivalent to your white).
You can do this very efficiently by two pass linear O(n) time algorithm (n being number of pixels):
1) in a first pass, go by columns, from bottom to top, and for each pixel, denote the number of consecutive pixels available up to this one:
repeat, until:
2) in a second pass, go by rows, read current_number. For each number k keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). Close the sums (potential rectangles) for k > current_number and look if the sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all opened potential rectangles (for all k).
This way you will obtain all maximum rectangles. It is not the same as maximum convex area of course, but probably would give you some hints (some heuristics) on where to look for maximum convex areas.
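A compact way to express the two passes is the well-known stack-based "largest rectangle in a histogram" scan, sketched below in Python. It is a variation on the description above rather than a literal transcription: the column heights are accumulated row by row from top to bottom instead of in a separate bottom-to-top pass, and a stack of open rectangles replaces the per-height sums. The function name and the grid format (a list of rows, truthy meaning a free pixel) are assumptions.

```python
def largest_free_rectangle(grid):
    """Area of the largest axis-aligned rectangle of free (truthy) cells."""
    if not grid:
        return 0
    heights = [0] * len(grid[0])  # consecutive free cells ending at this row
    best = 0
    for row in grid:
        for x, free in enumerate(row):
            heights[x] = heights[x] + 1 if free else 0

        # Stack of (start_x, height) for rectangles that are still open.
        stack = []
        for x, h in enumerate(heights + [0]):  # trailing 0 closes everything
            start = x
            while stack and stack[-1][1] >= h:
                start, open_h = stack.pop()
                best = max(best, open_h * (x - start))
            stack.append((start, h))
    return best
```

Every cell is pushed and popped at most once per row, so the total work stays linear in the number of pixels.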
I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).
The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be formed with p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).
The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by considering the sign of their cross product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, qi - 1, qi, and qj.
The base cases where i = 0 are easy, since the only “polygon” is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k < i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate cross product).
At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.
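The dynamic program can be sketched in Python as follows. This is an unoptimised illustration, roughly O(n⁴), under a few assumptions that are not part of the answer above: the points are distinct tuples in general position (no three collinear), the angular sort uses `math.atan2` rather than a cross-product comparison, the emptiness test is a plain scan over all points, and the function only returns the best area rather than the polygon itself. The name `largest_empty_convex_area` is made up for the example.

```python
import math


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(r, a, b, c):
    d1, d2, d3 = cross(a, b, r), cross(b, c, r), cross(c, a, r)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def largest_empty_convex_area(points):
    best = 0  # twice the best area found so far
    for p in points:  # guess the lowest (then leftmost) vertex
        above = [q for q in points if (q[1], q[0]) > (p[1], p[0])]
        above.sort(key=lambda q: math.atan2(q[1] - p[1], q[0] - p[0]))
        q = [p] + above
        n = len(above)
        # dp[(i, j)]: twice the largest area of an empty convex polygon
        # that starts at q[0] and ends with the edge q[i] -> q[j].
        dp = {(0, j): 0 for j in range(1, n + 1)}
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                if any(strictly_inside(r, p, q[i], q[j]) for r in points):
                    continue  # triangle q0 qi qj is not empty
                options = [
                    dp[(k, i)]
                    for k in range(i)
                    if (k, i) in dp and cross(q[k], q[i], q[j]) >= 0
                ]
                if options:
                    dp[(i, j)] = max(options) + cross(p, q[i], q[j])
                    best = max(best, dp[(i, j)])
    return best / 2
```

For instance, with the corners of a 6×6 square plus the interior point (2, 3), the square itself is not empty, and the sketch settles on a quadrilateral that uses the interior point as a vertex.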
You could try treating the pixels as vertices and performing Delaunay triangulation of the pointset. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.
If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at:
I thought of an approach to solve this problem:
Out of the set of all points generate all possible 3-point-subsets. This is a set of all the triangles in your space. From this set remove all triangles that contain another point and you obtain the set of all empty triangles.
For each of the empty triangles you would then grow it to its maximum size. That is, for every point outside the polygon you would insert it between the two closest points of the polygon and check if there are points within this new triangle. If not, you will remember that point and the area it adds. Of all the candidate points you add the one that maximizes the added area. When no more points can be added, the maximum convex polygon has been constructed. Record the area for each polygon and remember the one with the largest area.
Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.
I think you can reduce b) to be a problem of a) and then you only need to find the most efficient method to determine whether a point is within a triangle. The reduction of the search space can be achieved as follows: Take a triangle and increase all edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Good for us is that only 3 of those subregions can contain points that would adhere to the convexity constraint. Thus for each point that you test you need to determine if it's in a convex-expanding subregion, which again is the question of whether it's in a certain triangle.
The whole polygon as it evolves and approaches the shape of a circle will have smaller and smaller regions that still allow convex expansion. A point once in a concave region will not become part of the convex-expanding region again so you can quickly reduce the number of points you'll have to consider for expansion. Additionally while testing points for expansion you can further cut down the list of possible points. If a point is tested false, then it is in the concave subregion of another point and thus all other points in the concave subregion of the tested points need not be considered as they're also in the concave subregion of the inner point. You should be able to cut down to a list of possible points very quickly.
Still you need to do this for every empty triangle of course.
Intersection of N rectangles

I'm looking for an algorithm to solve this problem:
Given N rectangles in the Cartesian plane, find out if the intersection of those rectangles is empty or not. Each rectangle can lie in any direction (its edges are not necessarily parallel to Ox and Oy).
Do you have any suggestion to solve this problem? :) I can think of testing the intersection of each rectangle pair. However, it's O(N*N) and quite slow :(
Either use a sorting algorithm according to smallest X value of the rectangle, or store your rectangles in an R-tree and search it.
Straight-forward approach (with sorting)
Let us denote low_x() - the smallest (leftmost) X value of a rectangle, and high_x() - the highest (rightmost) X value of a rectangle.
Sort the rectangles according to low_x(). # O(n log n)
For each rectangle in sorted array: # O(n)
    Find its highest X point. # O(1)
    Compare it with all rectangles whose low_x() is smaller than this.high_x() # O(log n)
Complexity analysis
This should work on O(n log n) on uniformly distributed rectangles.
The worst case would be O(n^2), for example when the rectangles don't overlap but are one above another. In this case, generalize the algorithm to have low_y() and high_y() too.
Data-structure approach: R-Trees
R-trees (a spatial generalization of B-trees) are one of the best ways to store geospatial data, and can be useful in this problem. Simply store your rectangles in an R-tree, and you can spot intersections with a straightforward O(n log n) complexity. (n searches, log n time for each).
Observation 1: given a polygon A and a rectangle B, the intersection A ∩ B can be computed by 4 intersection with half-planes corresponding to each edge of B.
Observation 2: cutting a half plane from a convex polygon gives you a convex polygon. The first rectangle is a convex polygon. This operation increases the number of vertices by at most 1.
Observation 3: the signed distance of the vertices of a convex polygon to a straight line is a unimodal function.
Here is a sketch of the algorithm:
Maintain the current partial intersection D in a balanced binary tree in a CCW order.
When cutting a half-plane defined by a line L, find the two edges in D that intersect L. This can be done in logarithmic time through some clever binary or ternary search exploiting the unimodality of the signed distance to L. (This is the part I don't exactly remember.) Remove all the vertices on the one side of L from D, and insert the intersection points to D.
Repeat for all edges L of all rectangles.
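A simpler version of this idea, without the balanced tree or the logarithmic search, can be sketched in Python as below. Each cut walks the whole current polygon (in the style of Sutherland-Hodgman clipping), so a cut costs time linear in the polygon size rather than logarithmic. The names `clip` and `common_intersection` are hypothetical, and each rectangle is assumed to be given as four vertices in counterclockwise order.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def clip(poly, a, b):
    """Part of the convex polygon `poly` on or to the left of line a -> b."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        sp, sq = cross(a, b, p), cross(a, b, q)
        if sp >= 0:
            out.append(p)  # p is on the kept side
        if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
            t = sp / (sp - sq)  # edge p -> q crosses the line here
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def common_intersection(rects):
    """Vertices of the intersection of all rectangles ([] if empty).
    Each rectangle is a list of 4 vertices in counterclockwise order."""
    poly = list(rects[0])
    for rect in rects[1:]:
        for i in range(4):
            poly = clip(poly, rect[i], rect[(i + 1) % 4])
            if not poly:
                return []
    return poly
```

The intersection is empty exactly when the returned list is empty. Rectangles that merely touch leave a degenerate polygon (a point or a segment), which this sketch treats as non-empty, and no tolerance is applied to floating-point comparisons.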
This seems like a good application of Klee's measure problem. Basically, if you read about it, there are lower bounds of O(n log n) on the runtime of the best algorithms that can be found for rectilinear intersections.
I think you should use something like the sweep line algorithm: finding intersections is one of its applications. Also, have a look at the answers to this question
Since the rectangles need not be parallel to the axes, it is easier to transform the problem to an already solved one: compute the intersections of the borders of the rectangles.
build a set S which contains all borders, together with the rectangle they're belonging to; you get a set of tuples of the form ((x_start,y_start), (x_end,y_end), r_n), where r_n is of course the ID of the corresponding rectangle
now use a sweep line algorithm to find the intersections of those lines
The sweep line stops at every x-coordinate in S, i.e. all start values and all end values. For every new start coordinate, put the corresponding line in a temporary set I. For each new end-coordinate, remove the corresponding line from I.
Additionally to adding new lines to I, you can check for each new line whether it intersects with one of the lines currently in I. If they do, the corresponding rectangles do, too.
You can find a detailed explanation of this algorithm here.
The runtime is O(n*log(n) + c*log(n)), where c is the number of intersection points of the lines in I.
Pick the smallest rectangle from the set (or any rectangle), and go over each point within it. If one of its points also exists in all other rectangles, the intersection is not empty. If all points are free from ALL other rectangles, the intersection is empty.
