The title is most of the problem. I have a set of circles, each given by a center C and radius r. The distance between two circles is the Euclidean distance between their centers minus both their radii. For circles a and b,
d_ab = |C_a - C_b| - r_a - r_b.
Note this can be negative if the circles overlap.
Then what is the quickest data structure for finding the nearest (minimum distance) neighbor of a given circle in the set?
Adding and deleting circles with "find nearest" queries interleaved in arbitrary order must be supported. Nothing is known about the set's geometric distribution in advance.
This will be the heart of a system where a typical number of circles is 50,000 and tens of thousands of queries, inserts, and deletes will be needed, ideally at user-interaction speed (a second or less) on a high-end tablet device.
The point nearest neighbor has been studied to death, but this version with circles seems somewhat harder.
I have looked at kd-trees, quad trees, r-trees, and some variations of these. Both advice on which of these might be the best to try and also new suggestions would be a terrific help.
Cover trees are another possibility for a proximity structure. They don't support deletes (?), but you can soft delete and rebuild in the background to keep the garbage from piling up, which may be a useful technique for the other structures.
There's a reduction from the 2D circle problem to the 3D point problem with a funky metric that goes like this. (The proximity structures that you named should be adaptable.) Map a circle centered at (x, y) with radius r to the point (x, y, r). Define the length of a vector (dx, dy, dz) to be sqrt(dx**2 + dy**2) + abs(dz). This induces a metric. To find the circle nearest to a center (x, y) (the radius of the query circle is not relevant), do a proximity search at (x, y, R), where R is greater than or equal to the maximum radius of a circle (it may be possible to modify your proximity structure so that it's not necessary to track R).
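As a rough check of the reduction above, here is a minimal brute-force sketch in Python. The names funky_length and nearest_circle are made up for illustration, and the linear scan merely stands in for whatever proximity structure you end up using.

```python
import math

def funky_length(dx, dy, dz):
    # Length of a vector in the lifted (x, y, r) space, as defined above.
    return math.hypot(dx, dy) + abs(dz)

def nearest_circle(circles, x, y):
    """circles: list of (cx, cy, r). Brute force stands in for a real index."""
    R = max(r for _, _, r in circles)  # any upper bound on the radii works
    return min(circles, key=lambda c: funky_length(c[0] - x, c[1] - y, c[2] - R))

circles = [(0.0, 0.0, 1.0), (10.0, 0.0, 8.0), (3.0, 0.0, 0.5)]
print(nearest_circle(circles, 4.0, 0.0))
```

Because R is at least every radius, the lifted length equals (center distance) + R - r, so minimizing it is the same as minimizing center distance minus radius.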
From my experience implementing kd-trees and Voronoi diagrams on points, it will be significantly easier to implement kd-trees from scratch. Even if you reuse someone else's robust geometric primitives (and please do, to save your sanity if you go this route), the degenerate edge cases of Voronoi/point location take time to get right.
I propose the following heuristic using a KD-tree or another structure that allows O(log N) nearest-neighbor queries. Instead of representing a circle by a single point and a radius, use k equidistant points on the circle itself plus the center of the circle (without the center you might have issues with a small circle inside a large one). It's the same idea as using a regular polygon with k vertices to represent a circle. You can then take one vertex and find its nearest neighbor (ignoring the vertices on the same circle) to approximate the closest circle by the closest regular polygon.
The performance is as follows:
Create the KD-Tree: O(kN log kN)
Remove/add Circle to KD-Tree: O(k log kN)
-Add or remove all k points of a circle within the KD-Tree
Nearest Circle query (Circle): O(k log kN)
-This is done by first removing all k points of the circle (O(k log kN)), as it's not terribly useful to find out that the nearest neighbor of a circle is, unsurprisingly, the circle itself. Then, for each of the k points on the circle, find the nearest neighbor (O(k log kN)). Once the nearest neighbors are found, the actual nearest (to within some error) is the one with the smallest distance, after calculating the true distance based on center and radius (O(1) per candidate).
I'd suggest either using k = log(N) if you prefer it to be fast or k = sqrt(N) if you prefer it to be accurate.
Also, it's possible I haven't considered some special case that causes issues, so watch out for them.
If there is a guarantee that the circles don't have large radii, or at least that the maximum radius (R) is significantly smaller than the area where the circles are positioned, then I think this can be covered with standard space partitioning and nearest-neighbour search.
When searching a set for the circle with minimal distance to a given circle, the radius of the given circle doesn't matter (by the distance definition). Because of that, it is the same as comparing only its center (a point) to the set of circles.
So it is enough to store the set of circles in a space-partitioning structure by their centers only (a set of points). Adding and deleting circles is done in the standard way for points. Finding the nearest circle to a given point can be done in two steps:
1) Find the center nearest to the given point P. Say it belongs to circle C with center c and radius r.
2) A circle that is closer to P by your distance must have a center c' and radius r' with d(P,c') - r' < d(P,c) - r. Since r' <= R and c is the nearest center, c' can only lie in the ring around P with inner radius d(P,c) and outer radius d(P,c) + R - r. It is enough to search the partitions that intersect that ring for candidates.
It is possible to optimize the search by combining these two steps. In the first step, some of the partitions of interest are already visited. By storing which partitions were visited, and the circle with minimal distance found in them, the search in the second step can be reduced.
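A minimal sketch of the two-step search, with plain list scans standing in for the space-partitioning structure (an assumption made only to keep the example short). The function name is hypothetical. The candidate bound used here follows from the distance definition: another circle (c', r') beats the circle with the nearest center (c, r) only if d(P,c') < d(P,c) - r + r' <= d(P,c) - r + R.

```python
import math

def nearest_circle_two_step(circles, px, py):
    """circles: list of (cx, cy, r); returns the circle minimizing d(P, c) - r."""
    R = max(r for _, _, r in circles)

    def center_dist(c):
        return math.hypot(c[0] - px, c[1] - py)

    # Step 1: nearest center (a real index would answer this in O(log N)).
    first = min(circles, key=center_dist)
    # Step 2: only centers within this outer radius can belong to a closer circle.
    outer = center_dist(first) + R - first[2]
    candidates = [c for c in circles if center_dist(c) <= outer]
    return min(candidates, key=lambda c: center_dist(c) - c[2])
```

With a real index, step 2 becomes a range query over the ring instead of a filter over the whole list.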
Thanks to @David Eisenstadt for the idea of a 3d search structure. This is part of the best answer, though his strange metric is not needed.
The key is to look in detail at how nearest neighbor search works. I'll show this for quadtrees. Kd-trees with k=3 are similar. Here is pseudocode:
# Let nearest_info be a record containing the current nearest neighbor (or nil
# if none yet) and the distance from the target T to that nearest neighbor.
def find_nearest_neighbor(node, target, nearest_info)
  if node is leaf
    update nearest_info using target and the points found in this leaf
  else
    for each subdivision S of node
      if S contains any point P where dist(P,T) < nearest_info.distance
        find_nearest_neighbor(S, target, nearest_info)
      end
    end
  end
end
When this is done, nearest_info contains the nearest neighbor and its distance.
The key is if S contains any point P where dist(P,T) < nearest_info.distance. In a 3d space, of (x,y,r) triples that describe circles, we have
def dist(P,T)
  return sqrt( (P.x - T.x)^2 + (P.y - T.y)^2 ) - P.r - T.r
end
Here P is an arbitrary point in an octant of an octree cuboid. How to consider all points in the cuboid? Note all components of T are effectively fixed for a given search, so it's clearer if we write the target as a constant point (a, b, c):
def dist(P)
  return sqrt( (P.x - a)^2 + (P.y - b)^2 ) - P.r
end
Where we have left out c = T.r completely because it can be subtracted out of the minimum distance after the algorithm is complete. In other words, the radius of the target does not affect the result.
With this it is pretty easy to see that the P we need to obtain minimum dist to the cuboid is Euclidean closest to the target with respect to x and y and with the max represented radius. This is very easy and quick to compute: a 2d point-rectangle distance and a 1d max operation.
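Here is a small Python sketch of that pruning bound. The box layout (xmin, xmax, ymin, ymax, rmax) and the function name are assumptions for illustration; only rmax matters on the radius axis, so the minimum radius of the cuboid is not needed.

```python
import math

def min_dist_to_box(a, b, box):
    """Lower bound of dist(P) over all (x, y, r) triples inside a cuboid.

    box = (xmin, xmax, ymin, ymax, rmax): x/y extent of the cuboid and the
    largest radius it can hold. (a, b) is the center of the target circle.
    """
    xmin, xmax, ymin, ymax, rmax = box
    dx = max(xmin - a, 0.0, a - xmax)   # 2d point-rectangle distance, x part
    dy = max(ymin - b, 0.0, b - ymax)   # 2d point-rectangle distance, y part
    return math.hypot(dx, dy) - rmax    # best case: closest center, biggest radius
```

A subdivision S can be skipped whenever min_dist_to_box(a, b, S) is not smaller than nearest_info.distance.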
In hindsight all this is obvious, but it took a while to see it from the right point of view. Thanks for the ideas.
Related
I recently came across a problem which is something like this
There are N disjoint (such that they do not touch or intersect) circles given by their center and radius, i.e. center = (x_i, y_i), radius = r_i. Then we have Q queries where a point (x, y) is given. For each query we need to find out the index i of the circle which contains that given point (-1 if no circle).
The constraints are roughly 1 <= N <= 10^5 and 1 <= Q <= 10^5. So an O(Q * log(N)) solution might be needed.
Apart from the straightforward O(Q * N) solution, the only better thing I can think of is keeping the leftmost and rightmost points of the circles as intervals in an array and then doing a binary search to find the intervals which contain the x-coordinate of the point. But the intervals may overlap, so more than one interval (and hence more than one candidate circle) may contain the x-coordinate. So I'm not sure if that's going to work.
Any help would be highly appreciated. Thank you.
This can be solved as a nearest-neighbour query in N+1 dimensions.
Imagine a set of balls of a fixed radius in 3d, such that their intersection with the plane z=0 is exactly your set of circles. (The balls may intersect, it doesn't matter). Now a point that falls into a circle is necessarily closer to the centre of its corresponding ball than to centres of all other balls.
The nearest-neighbour problem is well studied. Space partitioning techniques work well with real life data, though worst-case performance is not so good.
Edit: since the query point is in the fixed plane z=0, the problem can be seen as a 2d nearest-neighbour problem with a non-Euclidean distance function. The effective distance from a query point to the centre of a circle is
D = sqrt(d^2 + R^2 - r^2)
where d is the real distance, R is the ball radius (common to all balls) and r is the circle radius.
Another way to solve this is to build a power diagram of the set of circles. A power diagram is a plane subdivision. There are ways to efficiently answer queries of the form "which cell of a plane subdivision does a given point belong to", for example, using Kirkpatrick's point location data structure.
The two approaches are similar, if not equivalent, because in the power diagram, the power of the point with respect to a circle is the square of D in the formula for distance (with R=0).
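To make the effective distance concrete, here is a brute-force sketch with R = 0, so D^2 is just the power d^2 - r^2. The function name is hypothetical, and the linear scan stands in for a spatial index or a power diagram with point location.

```python
def containing_circle(circles, x, y):
    """circles: list of disjoint (cx, cy, r). Returns the index of the circle
    containing (x, y), or -1 if there is none."""
    def power(c):
        cx, cy, r = c
        return (x - cx) ** 2 + (y - cy) ** 2 - r * r

    # The circle with the smallest power is the only possible container,
    # because the point has positive power w.r.t. every circle it is outside of.
    best = min(range(len(circles)), key=lambda k: power(circles[k]))
    return best if power(circles[best]) <= 0 else -1
```

The final check is still needed: the nearest circle by power always exists, even when the point lies in no circle at all.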
I know this is quite an old question, but for the record:
You can solve it with the matplotlib Circle patch:
from matplotlib.patches import Circle as Cr
.................
self.my_mpl_circle = Cr(origin, radius)
......
def match_pos(self, coords):
    return self.my_mpl_circle.contains_point(coords)
I have a map with tens of millions of dots that stand for people's locations. Given a dot, how can I quickly find the dots (people's locations) within 1 kilometer of it? What is the best algorithm?
You can use a k-d tree to get all dots within a particular distance of a given point. With n total points and k points found in the region, such a query typically costs about O(log n + k) on well-distributed data; the worst case for an orthogonal range query on a 2D k-d tree is O(sqrt(n) + k).
Grids sound like a very practical solution, but there are a number of tree-based data structures for this problem as well. The basic idea is that you arrange your data in a tree and add annotations to the tree such as a bounding box at each node which holds all the points held below that node. Then when you search the tree you can work out that you don't need to look in the descendants of most nodes.
http://en.wikipedia.org/wiki/K-d_tree http://en.wikipedia.org/wiki/Cover_tree
I think a fast way is to use a grid to filter out all dots that are definitely too far away. For the remaining dots you can compute the exact distance.
I'm not sure what your dots look like, but let a dot be a pair of coordinates (x,y).
You can store them sorted (in a database), so it is easy to find all dots (a,b) with x - max_dist < a < x + max_dist and y - max_dist < b < y + max_dist. This gives you a small square containing only some dots. Now you can compute the exact distance between (x,y) and the dots (a,b) in the square.
This should also work with GPS coordinates. Sure, coordinates on a sphere do not form a square, but if the sphere is much larger than the maximum distance it does not matter.
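A minimal in-memory sketch of the grid filter, assuming planar coordinates in the same unit as max_dist (for example metres after projecting the GPS coordinates). The names build_grid and within are made up for this example.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Bucket (x, y) points into square cells with side length `cell`."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
    return grid

def within(grid, cell, x, y, max_dist):
    """All stored points at Euclidean distance <= max_dist from (x, y)."""
    reach = math.ceil(max_dist / cell)          # how many cells to look out
    gx, gy = math.floor(x / cell), math.floor(y / cell)
    found = []
    for i in range(gx - reach, gx + reach + 1):
        for j in range(gy - reach, gy + reach + 1):
            for a, b in grid.get((i, j), ()):   # coarse filter: nearby cells only
                if math.hypot(a - x, b - y) <= max_dist:  # exact check
                    found.append((a, b))
    return found
```

Choosing the cell size close to max_dist keeps the number of inspected cells small (a 3x3 block) while still discarding most of the map.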
http://www.glassdoor.com/Interview/Google-Interview-RVW2382108.htm
I have tried to come up with a solution to this problem, but I have not been successful. Can anyone please give me a hint as to how to proceed?
I will take two pairs of points, that is, I will make 2 chords, and find their perpendicular bisectors. Using those bisectors, I will find the center of the circle.
Then I will come up with the equation of the circle and find the point where the line from the center through M intersects the circle. That should be the closest point. However, that point may or may not exist in the set of N points.
Thanks.
Assuming that the points on the circumference of the circle are "in-order" (i.e. sorted by angle about the circle's center) you could use an angle-based binary search, which should achieve the O(log(n)) bounds.
Calculate the angle A of the point M about the center of the circle - O(1).
Use binary search to find the point I on the circumference with the largest angle less than A - O(log(n)).
Since circles are convex, the closest point to M is either I or I+1 (wrapping around at the ends of the sorted list). Calculate the distance to both and take the minimum - O(1).
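The three steps can be sketched as follows, using Python's bisect module. The sketch assumes the angles (in radians, as returned by math.atan2, sorted ascending) were precomputed once for the points; the function name and argument layout are illustrative only.

```python
import bisect
import math

def closest_on_circle(angles, points, center, m):
    """angles[i] is the atan2 angle of points[i] about center, sorted ascending.
    Returns the point of `points` closest to m in O(log n)."""
    a = math.atan2(m[1] - center[1], m[0] - center[0])
    i = bisect.bisect_left(angles, a)
    # Neighbours on either side of A; index -1 and the modulo handle the
    # wrap-around between the largest and the smallest angle.
    candidates = (points[i - 1], points[i % len(points)])
    return min(candidates, key=lambda p: math.dist(p, m))
```

For example, with points at -130 and 170 degrees and M at -170 degrees, the wrap-around makes the 170-degree point a candidate, which a plain binary search would miss.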
To find a point closest to M, we need to do binary elimination of points based on planar cuts. A little pre-processing of the input points is needed after which we can find a point closest to any given point M in O(lgn) time.
Calculate (if not given) polar representation of points in (r,θ) format where r is the distance from center and θ is the angle from x-axis in the range (-180,180].
Sort all N points in increasing order of their angle from x-axis.
Note that simple binary search of a point closest to M will not work here, e.g.,
if the given points are sorted such that θ = (-130,-100,-90,-23,-15,0,2,14,170), then for a point M with θ = -170, binary search will give -130 (40 degrees away) as the closest point whereas 170 (20 degrees away) is the closest point to M.
if we ignore the sign while sorting (thinking that it will produce correct output), our new sorted array will look like θ = (0,2,14,15,23,90,100,130,170); binary search for a point M with θ = -6 will then yield either 2 or 14, whereas 0 is the closest point to M in this case.
To perform the search operation using planar cuts,
Find planar cut line passing through the center of circle and perpendicular to the line connecting the center of the circle with point M.
Eliminate half of the circular plane, [90+θ,-90+θ) or [-90+θ,90+θ), depending on which half of the plane M lies in.
Make planar cuts parallel to the first cut and passing through the point in the middle of the previous plane and eliminate all points in the half of the plane farther from M until there are no points left in the nearer half of the plane, in which case eliminate the nearer half of the plane.
Keep on cutting planes till we are left with one point. That point is the closest point to M. The total operation takes O(lgn) steps.
In case the data is skewed and not uniformly spread in the circle, we can optimize our planar cuts such that each cut passes through the median (based on angle) of those points which are left in the search operation.
I have a height map of NxN values.
I would like to find, given a point A (the red dot), whose x and y coordinates are given (and z is known from the data, so A is a vertex of the surface) a set of points that lie on the circumference of the circle with center in A and radius R that are a good approximation of a circular "cloth" (in grey) draped on the imaginary surface described by the data points.
The sampling, the reciprocal distances between the set of points that I am trying to find, doesn't need to be uniform, but still I would like to find at least all the points that are an intersection of the edges of the mesh with the circle at distance R from A.
How to find this set of points?
Is this a known problem?
(source: keplero.com)
-- edit
The assumption that Jan is using is right: the samples form a regular rectangular or square grid (in the X-Y plane) aligned with [0,0]. But I would like to take the displacement in the Z direction into account when computing the distance. You can see the height map as a terrain, and the algorithm I am looking for as the instructions to give to an explorer who, traveling only on paths of given latitude or longitude, marks the points that are at distance R from A. That is walking distance, taking into account all the Z displacements so far: the explorer climbs the hills and goes down into the valleys too.
The trivial algorithm for this would be something like this. We know that given R, the maximum displacement on the x and y axis corresponds to a completely flat surface. If there is no slope, the x,y points will all be in the bounding square Ax-R < x < Ax+r and Ay-R
At this point, it would start traveling to the neighboring cells, since if the perimeter enters through the edge of one cell of the grid, it also has to exit that cell.
I reckon this is going to be quite difficult to solve in an exact fashion, so I would suggest trying the straightforward approach of simulating the paths that your explorers would take on the surface.
Given your starting point A and a travel distance d, calculate a circle of points P on the XY plane that are d from A.
For each of the points p in P, intersect the line segment A-p with your grid so that you end up with a sequence of points where the explorer crosses from one grid square to the next, in the order that this would happen if the explorer were travelling from A. These points should then be given a z-coordinate by interpolation from your grid data.
You can thus advance through this point sequence and keep track of the distance travelled so far. Eventually the target distance will be reached - adjust p to be at this point.
P now contains the perimeter that you're looking for. Adjust the sample fidelity (size of P) according to your needs.
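A rough Python sketch of this simulation, under several assumptions: unit grid spacing, grid[y][x] indexing, bilinear interpolation for z, and rays that stay inside the grid. The names height_at, walk and perimeter are made up for the example.

```python
import math

def height_at(grid, x, y):
    """Bilinear interpolation of grid[y][x] at a fractional position."""
    x0 = min(int(math.floor(x)), len(grid[0]) - 2)
    y0 = min(int(math.floor(y)), len(grid) - 2)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def walk(grid, ax, ay, angle, target):
    """Walk from (ax, ay) along `angle` until `target` surface distance is covered."""
    dx, dy = math.cos(angle), math.sin(angle)
    ts = {target}  # ray parameters where the flat segment A-p crosses a grid line
    for origin, step in ((ax, dx), (ay, dy)):
        if abs(step) > 1e-12:
            lo, hi = sorted((origin, origin + step * target))
            for line in range(math.ceil(lo), math.floor(hi) + 1):
                t = (line - origin) / step
                if 0 < t <= target:
                    ts.add(t)
    travelled = 0.0
    px, py, pz = ax, ay, height_at(grid, ax, ay)
    for t in sorted(ts):
        qx, qy = ax + dx * t, ay + dy * t
        qz = height_at(grid, qx, qy)
        seg = math.dist((px, py, pz), (qx, qy, qz))
        if travelled + seg >= target:          # target reached inside this piece
            f = (target - travelled) / seg
            return (px + f * (qx - px), py + f * (qy - py), pz + f * (qz - pz))
        travelled += seg
        px, py, pz = qx, qy, qz
    return (px, py, pz)

def perimeter(grid, ax, ay, target, samples=64):
    """One perimeter point per sampled direction."""
    return [walk(grid, ax, ay, 2 * math.pi * k / samples, target)
            for k in range(samples)]
```

Each piece between two crossings is treated as a straight 3D segment, which is only an approximation of the interpolated surface inside a cell. The surface path is never shorter than its flat projection, so the flat segment of length d always contains the stopping point.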
Just to clarify - You have a triangulated surface in 3d and, for a given starting vertex Vi in the mesh you would like to find the set of vertices U that are reachable via paths along the surface (i.e. geodesics) with length Li <= R.
One approach would be to transform this to a graph-based problem:
Form the weighted, undirected graph G(V,E), where V is the set of vertices in the triangulated surface mesh and E is the set of edges in this mesh. The edge weight should be the Euclidean (3d) length of each edge. This graph is a discrete distance map - the distance "along the surface" between each adjacent vertex in the mesh.
Run a variant of Dijkstra's algorithm from the starting vertex Vi, only expanding paths with length Li that satisfy the constraint Li <= R. The set of vertices visited U, will be those that can be reached by the shortest (geodesic) path with Li <= R.
The accuracy of this approach should be related to the resolution of the surface mesh - as long as the surface curvature within each element is not too high the Euclidean edge length should be a good approximation to the actual geodesic distance, if not, the surface mesh should be refined in that area.
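A compact sketch of the bounded Dijkstra search using Python's heapq. For brevity it assumes the mesh is the regular height-map grid with edges between 4-connected neighbours only (no diagonals of the triangulation), so it overestimates some distances. The function name and the spacing parameter are illustrative.

```python
import heapq
import math

def vertices_within(grid, start, radius, spacing=1.0):
    """grid[r][c] holds heights; start is (r, c). Returns {vertex: distance}
    for every vertex whose shortest edge path from start has length <= radius."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Edge weight: 3d Euclidean length of the mesh edge.
                w = math.hypot(spacing, grid[nr][nc] - grid[r][c])
                nd = d + w
                if nd <= radius and nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist
```

Paths longer than the radius are never pushed onto the queue, so the work is proportional to the size of the reachable region rather than the whole mesh.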
Hope this helps.
I have an image of which this is a small cut-out:
As you can see, it consists of white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.
How can I find the largest convex black area in this image that doesn't contain a white pixel in it?
Here is a small hand-drawn example of what I mean by the largest convex black area:
P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.
Trying to find maximum convex area is a difficult task to do. Wouldn't you just be fine with finding rectangles with maximum area? This problem is much easier and can be solved in O(n) - linear time in number of pixels. The algorithm follows.
Say you want to find largest rectangle of free (white) pixels (Sorry, I have images with different colors - white is equivalent to your black, grey is equivalent to your white).
You can do this very efficiently by two pass linear O(n) time algorithm (n being number of pixels):
1) in a first pass, go by columns, from bottom to top, and for each pixel, denote the number of consecutive pixels available up to this one:
repeat, until:
2) in a second pass, go by rows, read current_number. For each number k keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). Close the sums (potential rectangles) for k > current_number and look if the sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all opened potential rectangles (for all k).
This way you will obtain all maximum rectangles. It is not the same as maximum convex area of course, but probably would give you some hints (some heuristics) on where to look for maximum convex areas.
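For reference, here is a short sketch of a common stack-based variant of the same two-pass idea (per-column run lengths, then a per-row scan). It is not necessarily the exact bookkeeping described above, and it accumulates the column counts from top to bottom, which is equivalent. The function name and the convention that truthy pixels are obstacles are assumptions.

```python
def largest_empty_rectangle(image):
    """image[r][c] is truthy for an obstacle pixel.
    Returns the area of the largest axis-aligned rectangle of free pixels."""
    if not image:
        return 0
    cols = len(image[0])
    heights = [0] * cols       # consecutive free pixels ending at the current row
    best = 0
    for row in image:
        for c in range(cols):
            heights[c] = 0 if row[c] else heights[c] + 1
        stack = []             # column indices with strictly increasing heights
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0   # sentinel closes all open runs
            while stack and heights[stack[-1]] >= h:
                top = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                best = max(best, top * (c - left))
            stack.append(c)
    return best
```

Every column index is pushed and popped at most once per row, so the total work is linear in the number of pixels.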
I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).
The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be formed with p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).
The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by considering the sign of their cross product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, q(i-1), qi, and qj.
The base cases where i = 0 are easy, since the only “polygon” is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k ≤ i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate cross product).
At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.
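The two geometric primitives the dynamic program needs, the angular sort around p and the turn test, can be sketched with the sign of the 2D cross product, the usual way to implement both. This is only the helper layer, not the dynamic program itself, and the function names are made up.

```python
from functools import cmp_to_key

def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def sort_counterclockwise(p, points):
    """Points above p (or level with it and to the right), sorted by the
    counterclockwise angle they make with the x-axis through p."""
    above = [q for q in points
             if q[1] > p[1] or (q[1] == p[1] and q[0] > p[0])]

    def compare(a, b):
        c = cross(p, a, b)
        return -1 if c > 0 else (1 if c < 0 else 0)

    return sorted(above, key=cmp_to_key(compare))

def not_right_turn(a, b, c):
    """True if walking a -> b -> c goes straight or turns left."""
    return cross(a, b, c) >= 0
```

Because all candidate points lie in a half-plane above p, the cross-product comparison is a consistent ordering and no trigonometric functions are needed.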
You could try treating the pixels as vertices and performing Delaunay triangulation of the pointset. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.
If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at: http://en.wikipedia.org/wiki/Connected-component_labeling
I thought of an approach to solve this problem:
Out of the set of all points generate all possible 3-point-subsets. This is a set of all the triangles in your space. From this set remove all triangles that contain another point and you obtain the set of all empty triangles.
For each of the empty triangles you would then grow it to its maximum size. That is, for every point outside the polygon you would insert it between the two closest points of the polygon and check whether there are points within the new triangle. If not, you remember that point and the area it adds. Of all the new points, you add the one that maximizes the added area. When no more points can be added, the maximal convex polygon has been constructed. Record the area of each polygon and remember the one with the largest area.
Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.
I think you can reduce b) to be a problem of a), and then you only need to find the most efficient method to determine whether a point is within a triangle. The reduction of the search space can be achieved as follows: take a triangle and extend all edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Good for us is that only 3 of those subregions can contain points that would adhere to the convexity constraint. Thus for each point that you test you need to determine whether it's in a convex-expanding subregion, which again is the question of whether it's in a certain triangle.
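A minimal sketch of test a) and of the empty-triangle enumeration from the first step, using sign tests on cross products. It is brute force (every triangle is checked against every point), and the function names are hypothetical.

```python
from itertools import combinations

def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)   # all on the same side of every edge

def empty_triangles(points):
    """All non-degenerate triangles that contain none of the other points."""
    result = []
    for a, b, c in combinations(points, 3):
        if cross(a, b, c) == 0:
            continue                   # collinear triple, not a triangle
        if not any(in_triangle(p, a, b, c)
                   for p in points if p not in (a, b, c)):
            result.append((a, b, c))
    return result
```

The same sign test answers the convex-expansion question, since each admissible subregion is bounded by the extended edges of the triangle.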
The whole polygon as it evolves and approaches the shape of a circle will have smaller and smaller regions that still allow convex expansion. A point once in a concave region will not become part of the convex-expanding region again so you can quickly reduce the number of points you'll have to consider for expansion. Additionally while testing points for expansion you can further cut down the list of possible points. If a point is tested false, then it is in the concave subregion of another point and thus all other points in the concave subregion of the tested points need not be considered as they're also in the concave subregion of the inner point. You should be able to cut down to a list of possible points very quickly.
Still you need to do this for every empty triangle of course.
Unfortunately I can't guarantee that by adding always the maximum new region your polygon becomes the maximum polygon possible.