Find Voronoi tessellation with area constraints - algorithm

Let's say we want to Voronoi-partition a rectangular surface with N points.
The Voronoi tessellation results in N regions corresponding to the N points.
For each region, we calculate its area and divide it by the total area of the whole surface - call these numbers a1, ..., aN. Their sum equals unity.
Suppose now we have a preset list of N numbers, b1, ..., bN, their sum equaling unity.
How can one find any choice of coordinates for the N points such that the resulting Voronoi partition satisfies a1==b1, a2==b2, ..., aN==bN?
Edit:
After a bit of thinking about this, maybe Voronoi partitioning isn't the best solution; the whole point is to come up with a random, irregular division of the surface such that the N regions have the appropriate sizes. Voronoi seemed like the logical choice, but I may be mistaken.

I'd go for some genetic algorithm.
Here is the basic process:
1) Create 100 sets of random points that belong in your rectangle.
2) For each set, compute the Voronoi diagram and the cell areas
3) For each set, evaluate how well it compares with your preset weights (call it its score)
4) Sort sets of points by score
5) Dump the 50 worst sets
6) Create 50 new sets out of the 50 remaining ones by mixing their points and adding some random ones.
7) Jump back to step 2 until you meet a stopping condition (score above a threshold, iteration count, time spent, etc.)
You will end up (hopefully) with a "somewhat appropriate" result.
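Here is a minimal Python sketch of that loop (my own illustration, not tested against your setup). Cell areas are estimated by assigning a fixed sample grid to the nearest site with scipy's cKDTree, which sidesteps clipping infinite Voronoi cells against the rectangle; "mixing" is read here as picking each point from one of two parents and injecting one fresh random point per child:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
N = 8
b = rng.random(N); b /= b.sum()          # the preset target fractions b1..bN

# Monte Carlo grid for estimating Voronoi cell areas inside the unit square.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 100),
                            np.linspace(0, 1, 100)), -1).reshape(-1, 2)

def areas(sites):
    """Fraction of the square closest to each site ~ normalized cell areas."""
    _, owner = cKDTree(sites).query(grid)
    return np.bincount(owner, minlength=len(sites)) / len(grid)

def score(sites):                        # step 3: lower is better
    return np.abs(areas(sites) - b).sum()

pop = [rng.random((N, 2)) for _ in range(100)]   # step 1
for _ in range(100):                             # step 7: fixed budget
    pop.sort(key=score)                          # steps 2-4
    del pop[50:]                                 # step 5
    while len(pop) < 100:                        # step 6: mix and mutate
        i, j = rng.choice(50, 2, replace=False)
        child = np.where(rng.random((N, 1)) < 0.5, pop[i], pop[j])
        child[rng.integers(N)] = rng.random(2)   # inject one random point
        pop.append(child)

print("best score:", score(min(pop, key=score)))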

If what you are looking for does not necessarily have to be a Voronoi tesselation, and could be a Power diagram, there is a nice algorithm described in the following article:
F. Aurenhammer, F. Hoffmann, and B. Aronov, "Minkowski-type theorems and least-squares clustering," Algorithmica, 20:61-76 (1998).
Their version of the problem is as follows: given N points (p_i) in a polygon P, and a set of non-negative real numbers (a_i) summing to the area of P, find weights (w_i), such that the area of the intersection of the Power cell Pow_w(p_i) with P is exactly a_i. In Section 5 of the paper, they prove that this problem can be written as a convex optimization problem. To implement this approach, you need:
software to compute Power diagrams efficiently, such as CGAL, and
software for convex optimization. I found that quasi-Newton solvers such as L-BFGS give very good results in practice.
I have some code on my webpage that does exactly this, under the name "quadratic optimal transport". However, this code is neither very clean nor well documented, so it might be as fast to implement your own version of the algorithm. You can also look at my SGP2011 paper on this topic, available on the same page, for a short description of how to implement Aurenhammer, Hoffmann and Aronov's algorithm.
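For illustration only, here is a small Monte Carlo sketch of that convex formulation (my own stand-in, not the answerer's code): power-cell areas inside the unit square are estimated as sample fractions, and L-BFGS drives area_i(w) toward a_i. A serious implementation would compute exact power-cell areas with CGAL, as suggested above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 6
p = rng.random((N, 2))                   # fixed sites in P = [0,1]^2
a = rng.random(N); a /= a.sum()          # targets summing to area(P) = 1

# Fixed sample grid; power-cell areas are estimated as sample fractions.
g = np.stack(np.meshgrid(np.linspace(0, 1, 256),
                         np.linspace(0, 1, 256)), -1).reshape(-1, 2)
sq = ((g[:, None, :] - p[None, :, :]) ** 2).sum(-1)   # |x - p_i|^2

def F(w):
    """Convex objective whose gradient is (estimated) area_i(w) - a_i."""
    d = sq - w                           # power distance |x - p_i|^2 - w_i
    areas = np.bincount(d.argmin(1), minlength=N) / len(g)
    return -(a @ w) - d.min(1).mean(), areas - a

res = minimize(F, np.zeros(N), jac=True, method="L-BFGS-B")
print("max |area_i - a_i|:", np.abs(F(res.x)[1]).max())
```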

Assume coordinates where the rectangle is axis-aligned, with left edge at x = 0, right edge at x = 1, and horizontal bisector at y = 0. Let B(0) = 0 and B(i) = b1 + ... + bi. Put points at ((B(i-1) + B(i))/2, 0). That isn't right. We want the x-coordinates to be xi such that bi = (x(i+1) - x(i-1)) / 2, replacing x(0) by 0 and x(n+1) by 1. This system is tridiagonal and should have an easy solution. But perhaps you don't want such a boring Voronoi diagram; it will be a bunch of vertical divisions.
For a more random-looking diagram, maybe something physics-inspired: drop points randomly, compute the Voronoi diagram and the area of each cell, make overweight cells attract the points of their neighbors and underweight cells repel them, move each point by a small delta accordingly, and repeat until equilibrium is reached.
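One literal reading of that scheme, sketched in Python (hypothetical, with Monte Carlo area estimates and no convergence guarantee):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
N = 8
b = rng.random(N); b /= b.sum()          # target area fractions
pts = rng.random((N, 2))                 # random initial drop

# Fixed sample grid for Monte Carlo area estimates in the unit square.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 100),
                            np.linspace(0, 1, 100)), -1).reshape(-1, 2)

def areas(sites):
    _, owner = cKDTree(sites).query(grid)
    return np.bincount(owner, minlength=len(sites)) / len(grid)

for _ in range(300):
    err = areas(pts) - b                 # > 0: overweight cell, attracts
    for i in range(N):
        d = pts - pts[i]                 # vectors from point i to the others
        r = np.linalg.norm(d, axis=1)
        r[i] = 1.0                       # d[i] is zero; avoid dividing by 0
        pts[i] += 0.02 * (err[:, None] * d / r[:, None]).sum(0)
    pts = np.clip(pts, 0, 1)

print("max area error:", np.abs(areas(pts) - b).max())
```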

The Voronoi tessellation can be computed by first computing the minimum spanning tree and removing its longest edges. Each center of a subtree of the MST is then a point of the Voronoi diagram. Thus the Voronoi diagram is a subset of the minimum spanning tree.

Related

Mesh with minimal area between two polylines

I have two polylines v and u, with n and m vertices respectively, in 3D. I want to connect v[0] to u[0] and v[n-1] to u[m-1], and also the inner vertices somehow, to obtain a triangle mesh strip with minimal surface area.
My naïve solution is to get a near-optimal initial mesh by successively adding the smallest diagonal, and then switching the diagonal in every quadrilateral where doing so produces a smaller area, until this is no longer possible.
But I am afraid I can end up in a local minimum rather than the global one. What better options are there for achieving a minimal mesh?
This can be solved with a Dynamic Program.
Let's visualize the problem as a table, where the columns represent the vertices of the first polyline and the rows represent the vertices of the second polyline:
0 1 2 3 ... n-1 -> v
0
1
2
...
m-1
Every cell represents an edge between the polylines. You start at (0, 0) and want to find a path to (n-1, m-1) by taking either (+1, 0) or (0, +1) steps. Every step that you make has a cost (the area of the resulting triangle) and you want to find the path that results in the minimum cost.
So you can iteratively (just in the style of dynamic programming) calculate the cost that is necessary to reach any cell (by comparing the resulting cost of the two possible incoming directions). Remember the direction that you chose and you will have a complete path of minimum cost in the end. The overall runtime will be O(n * m).
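A sketch of this DP in Python (my own illustration; v and u are expected as (n, 3) and (m, 3) NumPy arrays):

```python
import numpy as np

def min_area_strip(v, u):
    """DP over the (n x m) table above; returns the minimal total triangle
    area and the chosen steps ('v' = advance along v, 'u' = advance along u)."""
    def tri(a, b, c):                   # triangle area in 3D via cross product
        return 0.5 * np.linalg.norm(np.cross(b - a, c - a))
    n, m = len(v), len(u)
    cost = np.full((n, m), np.inf)
    move = np.empty((n, m), dtype="U1")
    cost[0, 0] = 0.0
    for i in range(n):
        for j in range(m):
            if i > 0 and cost[i-1, j] + tri(v[i-1], v[i], u[j]) < cost[i, j]:
                cost[i, j] = cost[i-1, j] + tri(v[i-1], v[i], u[j])
                move[i, j] = "v"
            if j > 0 and cost[i, j-1] + tri(u[j-1], u[j], v[i]) < cost[i, j]:
                cost[i, j] = cost[i, j-1] + tri(u[j-1], u[j], v[i])
                move[i, j] = "u"
    path = []
    i, j = n - 1, m - 1                 # walk the remembered directions back
    while (i, j) != (0, 0):
        path.append(move[i, j])
        i, j = (i - 1, j) if move[i, j] == "v" else (i, j - 1)
    return cost[n-1, m-1], path[::-1]
```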
If you know that your vertices are more or less nicely distributed, you can restrict the calculation of the table to a few entries near the diagonal. This could get the runtime down to O(k * max(n, m)), where k is the variable radius around the diagonal. But you may miss the optimal solution if the assumption of a nice vertex distribution does not hold.
You could also employ an A*-like strategy where you calculate a cell only when you think it could belong to the minimum path (with the help of some heuristic).

Two closest points in Manhattan distance

I'm wondering about Manhattan distance. It is very specific and (I don't know if it's a good word) simple. For example, when we are given a set of n points in this metric, it is very easy to find the two farthest points in linear time. But is it also easy to find the two closest points?
I heard that there exists a universal algorithm for finding the two closest points in any metric, but it's complicated. I'm wondering whether in this situation (the Manhattan metric) one can use the special properties of this distance to come up with an easier algorithm that is friendlier to implement.
EDIT: n points on a plane, and let's say -10^9 <= x, y <= 10^9 for all points.
Assuming you're talking about n points on a plane: find the minimal and maximal values of the x and y coordinates. Create a matrix of size (maxX-minX) x (maxY-minY), such that every point maps to a cell in the matrix. Fill the matrix with the n given points (not all cells will be filled; set NaN there, for example). Scan the matrix - the shortest distance is between adjacent filled cells (there might be several such pairs).

find a point closest to other points

Given N points (in 2D) with x and y coordinates, you have to find a point P (among the N given points) such that the sum of distances from the other (N-1) points to P is minimum.
For example, N points are given: p1(x1,y1), p2(x2,y2), ..., pN(xN,yN).
We have to find a point P among p1, p2, ..., pN whose sum of distances from all other points is minimum.
I used a brute-force approach, but I need a better one. I also tried the median, the mean, etc., but they do not work for all cases.
Then I came up with the idea of treating the points as the vertices of a polygon, finding the centroid of this polygon, and then choosing the given point nearest to the centroid. But I'm not sure whether the centroid minimizes the sum of distances to the vertices of the polygon, so I'm not sure whether this is a good approach. Is there an algorithm for solving this problem?
If your points are nicely distributed and if there are so many of them that brute force (calculating the total distance from each point to every other point) is unappealing the following might give you a good enough answer. By 'nicely distributed' I mean (approximately) uniformly or (approximately) randomly and without marked clustering in multiple locations.
Create a uniform k*k grid, where k is an odd integer, across your space. If your points are nicely distributed the one which you are looking for is (probably) in the central cell of this grid. For all the other cells in the grid count the number of points in each cell and approximate the average position of the points in each cell (either use the cell centre or calculate the average (x,y) for points in the cell).
For each point in the central cell, compute the distance to every other point in the central cell, and the weighted average distance to the points in the other cells. This will, of course, be the distance from the point to the 'average' position of points in the other cells, weighted by the number of points in the other cells.
You'll have to juggle the increased accuracy of higher values for k against the increased computational load and figure out what works best for your points. If the distribution of points across cells is far from uniform then this approach may not be suitable.
This sort of approach is quite widely used in large-scale simulations where points have properties, such as gravity and charge, which operate over distances. Whether it suits your needs, I don't know.
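A rough Python sketch of this scheme (hypothetical; it assumes the sought point really does fall in the central cell, and uses a small epsilon to guard against degenerate bounding boxes):

```python
import numpy as np

def approx_best_point(pts, k=5):
    """Candidates are the points in the central cell of a k*k grid (k odd);
    every other occupied cell is replaced by one weighted average position."""
    pts = np.asarray(pts, float)
    lo, hi = pts.min(0), pts.max(0)
    cell = np.minimum(((pts - lo) / (hi - lo + 1e-12) * k).astype(int), k - 1)
    cid = cell[:, 0] * k + cell[:, 1]
    mid = (k // 2) * k + k // 2                    # id of the central cell
    central = np.flatnonzero(cid == mid)
    counts = np.bincount(cid, minlength=k * k).astype(float)
    cx = np.bincount(cid, weights=pts[:, 0], minlength=k * k)
    cy = np.bincount(cid, weights=pts[:, 1], minlength=k * k)
    outer = (counts > 0) & (np.arange(k * k) != mid)
    centers = np.stack([cx[outer], cy[outer]], 1) / counts[outer][:, None]
    w = counts[outer]                              # weights: points per cell
    best, best_cost = None, np.inf
    for i in central:                              # exact inside the cell
        cost = np.linalg.norm(pts[central] - pts[i], axis=1).sum() \
             + (w * np.linalg.norm(centers - pts[i], axis=1)).sum()
        if cost < best_cost:
            best, best_cost = int(i), cost
    return best                                    # index into pts (or None)
```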
The point in question is known as the geometric median.
The centroid or center of mass, defined similarly to the geometric median but as minimizing the sum of the squares of the distances to each sample, can be found by a simple formula: its coordinates are the averages of the coordinates of the samples. No such formula is known for the geometric median, however, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and k-th roots, can exist in general.
I'm not sure if I understand your question, but when you calculate the minimum spanning tree, the sum of edge lengths from any point to any other point through the tree is minimum.

Find the largest convex black area in an image

I have an image of which this is a small cut-out:
As you can see, it consists of white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.
How can I find the largest convex black area in this image that doesn't contain a white pixel in it?
Here is a small hand-drawn example of what I mean by the largest convex black area:
P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.
Trying to find the maximum convex area is a difficult task. Would you be fine with just finding rectangles of maximum area? That problem is much easier and can be solved in O(n) - linear time in the number of pixels. The algorithm follows.
Say you want to find the largest rectangle of free (white) pixels. (Sorry, my images use different colors - white there is equivalent to your black, grey to your white.)
You can do this very efficiently with a two-pass, linear O(n)-time algorithm (n being the number of pixels):
1) In the first pass, go by columns, from bottom to top, and for each pixel, record the number of consecutive free pixels up to and including this one.
2) In the second pass, go by rows and read each cell's current_number. For each number k, keep track of the sums of consecutive numbers that were >= k (i.e., potential rectangles of height k). Close the sums (potential rectangles) for k > current_number, and check whether the closed sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all open potential rectangles (for all k).
This way you will obtain all maximal rectangles. It is not the same as the maximum convex area, of course, but it probably gives you some hints (some heuristics) on where to look for maximum convex areas.
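For reference, here is one standard way to realize the two passes in Python (a sketch, not the answerer's exact bookkeeping): pass 1 maintains per-column heights of consecutive free pixels, and pass 2 is the classic stack-based "largest rectangle in a histogram" scan over each row:

```python
import numpy as np

def max_free_rectangle(free):
    """Largest all-free axis-aligned rectangle in a boolean image (True = free)."""
    heights = np.zeros(free.shape[1], dtype=int)
    best = 0
    for row in free:
        # Pass 1, per column: consecutive free pixels ending at this row.
        heights = np.where(row, heights + 1, 0)
        # Pass 2, per row: close candidate rectangles when the height drops.
        stack = []                                   # (start column, height)
        for i, h in enumerate(list(heights) + [0]):  # 0-sentinel flushes all
            start = i
            while stack and stack[-1][1] >= h:
                start, closed_h = stack.pop()
                best = max(best, closed_h * (i - start))
            if h:
                stack.append((start, h))
    return best
```

For a black-is-free image stored as a 2D array img, you would call it as max_free_rectangle(img == 0).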
I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).
The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be formed from p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).
The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by the sign of their cross product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, qi-1, qi, and qj.
The base cases where i = 0 are easy, since the only "polygon" is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k ≤ i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate cross product).
At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.
You could try treating the pixels as vertices and performing a Delaunay triangulation of the point set. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.
If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at: http://en.wikipedia.org/wiki/Connected-component_labeling
I thought of an approach to solve this problem:
Out of the set of all points, generate all possible 3-point subsets. This is the set of all triangles in your space. From this set, remove all triangles that contain another point, and you obtain the set of all empty triangles.
You would then grow each of the empty triangles to its maximum size. That is, for every point outside the polygon, you would insert it between the two closest points of the polygon and check whether there are points within this new triangle. If not, you remember that point and the area it adds. At every step you add the point that maximizes the added area. When no more points can be added, the maximum convex polygon has been constructed. Record the area of each polygon and remember the one with the largest area.
Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.
I think you can reduce b) to a), and then you only need the most efficient method to determine whether a point is within a triangle. The search space can be reduced as follows: take a triangle and extend all its edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Conveniently, only 3 of those subregions can contain points that adhere to the convexity constraint. Thus, for each point that you test, you need to determine whether it is in a convexity-preserving subregion, which again is the question of whether it is in a certain triangle.
As the polygon evolves and approaches the shape of a circle, the regions that still allow convex expansion become smaller and smaller. A point that once lies in a concave region will never again become part of a convexity-preserving region, so you can quickly reduce the number of points to consider for expansion. Additionally, while testing points for expansion, you can cut the list of candidates down further: if a point tests false, it lies in the concave subregion of another point, and then all other points in the concave subregion of the tested point need not be considered, as they are also in the concave subregion of the inner point. You should be able to shrink the list of candidate points very quickly.
Still, you need to do this for every empty triangle, of course.
Unfortunately, I can't guarantee that always adding the region of maximum area yields the maximum polygon overall.

Connected points with-in a grid

Given a collection of random points within a grid, how do you check efficiently that they all lie within a fixed range of the other points? I.e., picking any one point, you can then navigate to any other point in the grid.
To clarify further: if you have a 1000 x 1000 grid and randomly place 100 points in it, how can you prove that any one point is within 100 units of a neighbour and that all points are accessible by walking from one point to another?
I've been writing some code and came across an interesting problem: very occasionally (just once so far) it creates an island of points which exceeds the maximum range from the rest of the points. I need to fix this problem, but brute force doesn't appear to be the answer.
It's being written in Java, but I am good with either pseudo-code or C++.
I like @joel.neely's construction approach, but if you want to ensure a more uniform density, this is more likely to work (though it would probably produce more of a cluster rather than an overall uniform density); a sketch follows the list.
1. Randomly place an initial point P_0 by picking x, y from a uniform distribution within the valid grid.
2. For i = 1:N-1:
2.1. Choose random j uniformly distributed from 0 to i-1, identifying the previously placed point P_j.
2.2. Choose a random point P_i where distance(P_i, P_j) < 100, by repeating the following until a valid P_i is accepted in substep 2.2.4 below:
2.2.1. Choose (dx, dy), each uniformly distributed from -100 to +100.
2.2.2. If dx^2 + dy^2 > 100^2, the distance is too large (this fails 21.5% of the time); go back to substep 2.2.1.
2.2.3. Calculate candidate coords(P_i) = coords(P_j) + (dx, dy).
2.2.4. P_i is valid if it is inside the overall valid grid.
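A small Python sketch of these steps (my own transcription; the constants follow the 1000x1000 grid and 100-unit range from the question):

```python
import random

def place_points(n=100, size=1000, rmax=100):
    """Each new point lies within rmax of a random previously placed point;
    offsets are rejection-sampled from the disc of radius rmax."""
    pts = [(random.uniform(0, size), random.uniform(0, size))]   # P_0
    while len(pts) < n:
        px, py = random.choice(pts)         # P_j, uniform over placed points
        while True:
            dx = random.uniform(-rmax, rmax)
            dy = random.uniform(-rmax, rmax)
            if dx * dx + dy * dy > rmax * rmax:
                continue                    # outside the disc (~21.5% of draws)
            x, y = px + dx, py + dy
            if 0 <= x <= size and 0 <= y <= size:
                pts.append((x, y))          # substep 2.2.4: accepted
                break
    return pts
```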
Just a quick thought: if you divide the grid into 50x50 patches, then when you place the initial points you can also record which patch they belong to. Now, when you want to check whether a new point is within 100 pixels of the others, you can simply check its patch plus the 8 surrounding ones and see whether the point counts match up.
E.g., you know you have 100 random points; if each patch records the number of points it contains, you can simply sum up and see whether it is indeed 100 - which means all points are reachable.
I'm sure there are other ways, though.
EDIT: The distance from the upper-left corner to the lower-right corner of a 50x50 patch is sqrt(50^2 + 50^2) ≈ 70 units, so you'd probably have to choose a smaller patch size. Maybe 35 or 36 will do (solving 50 = sqrt(x^2 + x^2) gives x = 35.355...).
Find the convex hull of the point set, and then use the rotating calipers method. The two most distant points on the convex hull are the two most distant points in the set. Since all other points are contained in the convex hull, they are guaranteed to be closer than the two extremal points.
As far as evaluating existing sets of points, this looks like a type of Euclidean minimum spanning tree problem. The wikipedia page states that this is a subgraph of the Delaunay triangulation; so I would think it would be sufficient to compute the Delaunay triangulation (see prev. reference or google "computational geometry") and then the minimum spanning tree and verify that all edges have length less than 100.
From reading the references, it appears that this is O(N log N); maybe there is a quicker way, but this is sufficient.
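A compact sketch of that check with SciPy (an illustration; it assumes at least three non-collinear points, relies on the EMST being a subgraph of the Delaunay triangulation, and on the bottleneck property that the largest MST edge governs connectivity at a given radius):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import Delaunay

def all_connected(pts, radius=100):
    """True iff the longest edge of the Euclidean MST is <= radius, i.e. all
    points are reachable from each other with hops of at most `radius`."""
    pts = np.asarray(pts, float)
    s = Delaunay(pts).simplices
    e = np.vstack([s[:, [0, 1]], s[:, [1, 2]], s[:, [0, 2]]])
    e = np.unique(np.sort(e, axis=1), axis=0)       # deduplicate shared edges
    w = np.linalg.norm(pts[e[:, 0]] - pts[e[:, 1]], axis=1)
    graph = coo_matrix((w, (e[:, 0], e[:, 1])), shape=(len(pts), len(pts)))
    return bool(minimum_spanning_tree(graph).data.max() <= radius)
```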
A simpler (but probably less efficient) algorithm would be something like the following:
1. Given: the points are in an array from index 0 to N-1.
2. Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
3. Initialize i = 0.
4. Increment i. If i == N, stop with success. (All points can be reached from another within radius R.)
5. Initialize j = i.
6. Decrement j.
7. If j < 0 or P[i].x - P[j].x > R, stop with failure. (There is a gap, and not all points can be reached from each other within radius R.)
8. Otherwise, we get here if P[i].x and P[j].x are within R of each other. Check whether point P[j] is sufficiently close to P[i]: if (P[i].x - P[j].x)^2 + (P[i].y - P[j].y)^2 < R^2, then point P[i] is reachable from one of the previous points within radius R; go back to step 4.
9. Keep trying: go back to step 6.
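A direct Python transcription of these steps (illustrative only):

```python
def reachable_by_sweep(points, R=100):
    """Steps above, verbatim; points are (x, y) tuples."""
    P = sorted(points)                   # step 2: sort by x-coordinate
    for i in range(1, len(P)):           # step 4: next point to justify
        j = i - 1
        while True:                      # steps 6-9: scan leftward from i
            if j < 0 or P[i][0] - P[j][0] > R:
                return False             # step 7: gap, stop with failure
            dx, dy = P[i][0] - P[j][0], P[i][1] - P[j][1]
            if dx * dx + dy * dy < R * R:
                break                    # step 8: P[i] reachable, next i
            j -= 1
    return True                          # step 4: i == N, stop with success
```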
Edit: this could be modified to something that should be O(N log N) but I'm not sure:
Given: the points are in an array from index 0 to N-1.
1. Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
2. Maintain a sorted set YLIST of points in y-coordinate order, initializing YLIST to the set {P[0]}. We'll sweep the x-coordinate from left to right, adding points one by one to YLIST and removing points whose x-coordinate is too far away from the newly added point.
3. Initialize i = 0, j = 0.
4. Loop invariant, always true at this point: all points P[k] where k <= i form a network where they can be reached from each other within radius R; and all points within YLIST have x-coordinates between P[i].x - R and P[i].x.
5. Increment i. If i == N, stop with success.
6. If P[i].x - P[j].x <= R, go to step 10. (This is automatically true if i == j.)
7. Point P[j] is not reachable from point P[i] within radius R. Remove P[j] from YLIST (this is O(log N)).
8. Increment j, go to step 6.
9. At this point, all points P[j] with j < i and x-coordinates between P[i].x - R and P[i].x are in the set YLIST.
10. Add P[i] to YLIST (this is O(log N)), and remember the index k within YLIST where YLIST[k] == P[i].
11. Points YLIST[k-1] and YLIST[k+1] (if they exist; P[i] may be the only element within YLIST or may be at an extreme end) are the closest points in YLIST to P[i].
12. If point YLIST[k-1] exists and is within radius R of P[i], then P[i] is reachable within radius R from at least one of the previous points. Go to step 5.
13. If point YLIST[k+1] exists and is within radius R of P[i], then P[i] is reachable within radius R from at least one of the previous points. Go to step 5.
14. P[i] is not reachable from any of the previous points. Stop with failure.
New and Improved ;-)
Thanks to Guillaume and Jason S for comments that made me think a bit more. That has produced a second proposal whose statistics show a significant improvement.
Guillaume remarked that the earlier strategy I posted would lose uniform density. Of course, he is right, because it's essentially a "drunkard's walk" which tends to orbit the original point. However, uniform random placement of the points yields a significant probability of failing the "path" requirement (all points being connectible by a path with no step greater than 100). Testing for that condition is expensive; generating purely random solutions until one passes is even more so.
Jason S offered a variation, but statistical testing over a large number of simulations leads me to conclude that his variation produces patterns that are just as clustered as those from my first proposal (based on examining mean and std. dev. of coordinate values).
The revised algorithm below produces point sets whose stats are very similar to those of purely (uniform) random placement, but which are guaranteed by construction to satisfy the path requirement. Unfortunately, it's a bit easier to visualize than to explain verbally. In effect, it requires the points to stagger randomly in a vaguely consistent direction (NE, SE, SW, NW), only changing direction when "bouncing off a wall".
Here's the high-level overview:
Pick an initial point at random, set horizontal travel to RIGHT and vertical travel to DOWN.
Repeat for the remaining number of points (e.g. 99 in the original spec):
2.1. Randomly choose dx and dy whose distance is between 50 and 100. (I assumed Euclidean distance -- square root of sums of squares -- in my trial implementation, but "taxicab" distance -- sum of absolute values -- would be even easier to code.)
2.2. Apply dx and dy to the previous point, based on horizontal and vertical travel (RIGHT/DOWN -> add, LEFT/UP -> subtract).
2.3. If either coordinate goes out of bounds (less than 0 or at least 1000), reflect that coordinate around the boundary violated, and replace its travel with the opposite direction. This means four cases (2 coordinates x 2 boundaries):
2.3.1. if x < 0, then x = -x and reverse LEFT/RIGHT horizontal travel.
2.3.2. if 1000 <= x, then x = 1999 - x and reverse LEFT/RIGHT horizontal travel.
2.3.3. if y < 0, then y = -y and reverse UP/DOWN vertical travel.
2.3.4. if 1000 <= y, then y = 1999 - y and reverse UP/DOWN vertical travel.
Note that the reflections under step 2.3 are guaranteed to leave the new point within 100 units of the previous point, so the path requirement is preserved. However, the horizontal and vertical travel constraints force the generation of points to "sweep" randomly across the entire space, producing more total dispersion than the original pure "drunkard's walk" algorithm.
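A Python sketch of this construction (my own transcription of steps 2.1-2.3, using the 0..999 reflection convention above):

```python
import random

def staggered_points(n=100):
    """Stagger in the current travel direction; reflect off the walls and
    reverse travel on each bounce (steps 2.1-2.3)."""
    x, y = random.uniform(0, 999), random.uniform(0, 999)
    hdir, vdir = +1, +1                       # travel RIGHT and DOWN
    pts = [(x, y)]
    for _ in range(n - 1):
        while True:                           # 2.1: 50 <= step length <= 100
            dx, dy = random.uniform(0, 100), random.uniform(0, 100)
            if 2500 <= dx * dx + dy * dy <= 10000:
                break
        x, y = x + hdir * dx, y + vdir * dy   # 2.2: apply along travel
        if x < 0:                             # 2.3.1
            x, hdir = -x, -hdir
        if x >= 1000:                         # 2.3.2
            x, hdir = 1999 - x, -hdir
        if y < 0:                             # 2.3.3
            y, vdir = -y, -vdir
        if y >= 1000:                         # 2.3.4
            y, vdir = 1999 - y, -vdir
        pts.append((x, y))
    return pts
```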
If I understand your problem correctly, given a set of sites, you want to test whether the nearest neighbor (for the L1 distance, i.e. the grid distance) of each site is at distance less than a value K.
This is easily done for the Euclidean distance by computing the Delaunay triangulation of the set of points: the nearest neighbor of a site is one of its neighbors in the Delaunay triangulation. Conveniently, the L1 distance is always at least the Euclidean distance, and at most sqrt(2) times it.
It follows that a way of testing your condition is the following:
compute the Delaunay triangulation of the sites
for each site s, start a breadth-first search from s in the triangulation, so that you discover all the vertices at Euclidean distance less than K from s (the Delaunay triangulation has the property that the set of vertices at distance less than K from a given site is connected in the triangulation)
for each site s, among these vertices at distance less than K from s, check if any of them is at L1 distance less than K from s. If not, the property is not satisfied.
This algorithm can be improved in several ways:
the breadth-first search at step 2 should of course be stopped as soon as a site at L1 distance less than K is found.
during the search for a valid neighbor of s, if a site s' is found to be at L1 distance less than K from s, there is no need to look for a valid neighbor for s': s is obviously one of them.
a complete breadth-first search is not needed: after visiting all triangles incident to s, if none of the neighbors of s in the triangulation is a valid neighbor (i.e. a site at L1 distance less than K), denote the neighbors by (v1,...,vn). There are at most four edges (vi, vi+1) that intersect the horizontal and vertical axes. The search needs to continue only through these four (or fewer) edges. [This follows from the shape of the L1 sphere.]
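A sketch of this test with SciPy's Delaunay triangulation (my own illustration; the breadth-first search stays inside the Euclidean K-ball and stops at the first L1-close neighbor, per the first improvement above):

```python
from collections import deque
import numpy as np
from scipy.spatial import Delaunay

def every_site_has_close_l1_neighbor(pts, K):
    """For each site s, search its Delaunay neighborhood for a site at L1
    distance < K, expanding only through vertices at Euclidean distance < K
    (that set is connected in the triangulation)."""
    pts = np.asarray(pts, float)
    tri = Delaunay(pts)
    indptr, nbrs = tri.vertex_neighbor_vertices
    for s in range(len(pts)):
        seen, queue, found = {s}, deque([s]), False
        while queue and not found:
            v = queue.popleft()
            for w in nbrs[indptr[v]:indptr[v + 1]]:
                if w in seen:
                    continue
                dx, dy = pts[w] - pts[s]
                if abs(dx) + abs(dy) < K:
                    found = True               # improvement 1: stop early
                    break
                if np.hypot(dx, dy) < K:       # stay inside the Euclidean ball
                    seen.add(w)
                    queue.append(w)
        if not found:
            return False
    return True
```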
Force the desired condition by construction. Instead of placing all points solely by drawing random numbers, constrain the coordinates as follows:
Randomly place an initial point.
Repeat for the remaining number of points (e.g. 99):
2.1. Randomly select an x-coordinate within some range (e.g. 90) of the previous point.
2.2. Compute the legal range for the y-coordinate that will make it within 100 units of the previous point.
2.3. Randomly select a y-coordinate within that range.
If you want to completely obscure the origin, sort the points by their coordinate pair.
This will not require much overhead vs. pure randomness, but will guarantee that each point is within 100 units of at least one other point (actually, except for the first and last, each point will be within 100 units of two other points).
As a variation on the above, in step 2, randomly choose any already-generated point and use it as the reference instead of the previous point.
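A Python sketch of this construction (illustrative; the 90-unit x-range is the example value from step 2.1, and swapping pts[-1] for random.choice(pts) gives the variation just described):

```python
import math, random

def chained_points(n=100, size=1000, R=100, xr=90):
    """Steps 2.1-2.3: constrain x near the previous point, then draw y from
    the band that keeps the full distance under R."""
    pts = [(random.uniform(0, size), random.uniform(0, size))]
    for _ in range(n - 1):
        px, py = pts[-1]                            # or random.choice(pts)
        x = min(max(px + random.uniform(-xr, xr), 0), size)
        half = math.sqrt(R * R - (x - px) ** 2)     # legal +/- range for y
        y = min(max(py + random.uniform(-half, half), 0), size)
        pts.append((x, y))
    return pts
```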
