Given a few points and circles, how can I tell which point lies in which circles? - algorithm

Given a small number of points and circles (say under 100), how do I tell which point lies in which circles? The circles can intersect, so one point can lie in multiple circles.
If it's of any relevance, both points and circle centers are aligned on a hexagonal grid, and the radii of the circles are also aligned to the grid.
With a bit of thought, it seems the worst-case scenario would always be quadratic (when each point lies in all circles) ... but there might be some way to make this faster for the average case when there aren't that many intersections?
I'm doing this for an AI simulation and the circle/point locations change all the time, so I can't really pre-compute anything ahead of time.

If the number of points and circles is that small, you probably will get away with brute-forcing it. Circle-point intersections are pretty cheap, and 100 * 100 checks a frame shouldn't harm performance at all.
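As a rough illustration of the brute-force check, here is a minimal Python sketch. The function name and the tuple layouts for points and circles are assumptions made for this example, not something from the question. It compares squared distances so that no square root is needed.

```python
def circles_containing(points, circles):
    """For each point, list the indices of the circles that contain it.

    points  -- list of (x, y) tuples
    circles -- list of (cx, cy, r) tuples
    """
    result = []
    for px, py in points:
        hits = []
        for i, (cx, cy, r) in enumerate(circles):
            dx, dy = px - cx, py - cy
            # Compare squared distances to avoid a square root.
            if dx * dx + dy * dy <= r * r:
                hits.append(i)
        result.append(hits)
    return result


points = [(0, 0), (3, 0), (10, 10)]
circles = [(0, 0, 2), (2, 0, 2)]
print(circles_containing(points, circles))  # [[0, 1], [1], []]
```

With 100 points and 100 circles this is 10,000 cheap tests per frame.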
If you are completely sure that this routine is the bottleneck and needs to be optimized, read on.
You can try using a variation of Bounding Volume Hierarchies.
A bounding volume hierarchy is a tree in which each node covers the entire volume of both (or more if you decide to use a tree with higher degree) of its children. The volumes/objects that have to be tested for intersections are always the leaf nodes of the tree.
Insertion, removal and intersection queries have an amortized average run-time of O(log n). You will however have to update the tree, as your objects are dynamic, which is done by removing and reinserting invalid nodes (nodes which do not contain their leaf nodes fully any more). Updating the full tree takes a worst case time of O(n log n).
Care should be taken during insertion: a node should be inserted into the sub-tree whose volume increases by the least amount.
Here is a good blog post by Randy Gaul which explains dynamic bounding hierarchies well.
You'll have to use circles as the bounding volumes, unless you can find a way to use AABBs in all nodes except leaf nodes, and circles as leaf nodes. AABBs are more accurate and should give you a slightly better constructed tree.

You can build a kd-tree of the points. Then, for each circle center, you retrieve all the points of the kd-tree whose distance is bounded by the circle radius. Given M points and N circles the complexity should be M log M + N log M = O(max(M,N) log M) (if points and circles are "well distributed").
Whether you can gain anything compared to a brute-force pair-wise check depends on the geometric structure of your points and circles. If, for instance, the radii of the circles are big in relation to the distances between the points or between the circle centers, then there is not much to expect, I think.

Rather than going to a full 2D-tree, there is an intermediate possibility based on sorting.
Sort the P points on the abscissas. With a good sorting algorithm (say Heapsort), the cost can be modeled as S.P.Lg(P) (S is the cost of comparisons/moves).
Then, for every circle (C of them), locate its leftmost point (Xc-R) in the sorted list by dichotomy, with a cost D.Lg(P) (D is the cost of a dichotomy step). Then step to the rightmost point (Xc+R) and perform the point/circle test every time.
Doing this, you will spare the comparisons with the points to the left and to the right of the circle. Let F denote the average fraction of the points which fall in the range [Xc-R, Xc+R] for all circles.
Denoting K the cost of a point/circle comparison, the total can be estimated as
S.P.Lg(P) + D.Lg(P).C + F.K.P.C
to be compared to K.P.C.
The ratio is
S/K.Lg(P)/C + D/K.Lg(P)/P + F.
With the unfavorable hypothesis that S=D=K, for P=C=100 we get 6.6% + 6.6% + F. These three terms respectively correspond to the preprocessing time, an acceleration overhead and the reduced workload.
Assuming reasonably small circles, let F = 10%, and you can hope for a speedup of about x4.
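To check the arithmetic, the ratio above can be evaluated directly. This is only a small sketch: the function name is made up for the example, and S, D and K default to 1 to reflect the S=D=K hypothesis.

```python
from math import log2


def cost_ratio(P, C, F, S=1.0, D=1.0, K=1.0):
    """Ratio of the sorted-scan cost to the brute-force cost K*P*C."""
    preprocessing = (S / K) * log2(P) / C  # sorting the P points
    search = (D / K) * log2(P) / P         # one dichotomy per circle
    return preprocessing + search + F      # F is the reduced workload


ratio = cost_ratio(P=100, C=100, F=0.10)
print(f"ratio = {ratio:.3f}, speedup = {1 / ratio:.1f}x")
# ratio = 0.233, speedup = 4.3x
```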
If you are using a bounding box test before the exact point/circle comparison (which is not necessarily an improvement), you can simplify the bounding box test to two Y comparisons, as the X overlap is implicit.
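The sort-and-scan idea could look roughly like this in Python. It is a sketch under my own assumptions: points are (x, y) tuples, circles are (cx, cy, r) tuples, and the standard bisect module plays the role of the dichotomy.

```python
from bisect import bisect_left, bisect_right


def points_in_circles(points, circles):
    """Return {circle index: sorted list of indices of the points inside it}."""
    # Sort point indices by abscissa once (the S.P.Lg(P) term).
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    xs = [points[i][0] for i in order]
    result = {}
    for ci, (cx, cy, r) in enumerate(circles):
        # Dichotomy: locate the slab [cx - r, cx + r] in the sorted list.
        lo = bisect_left(xs, cx - r)
        hi = bisect_right(xs, cx + r)
        inside = []
        for k in range(lo, hi):
            px, py = points[order[k]]
            dx, dy = px - cx, py - cy
            # Exact point/circle test, only for points inside the slab.
            if dx * dx + dy * dy <= r * r:
                inside.append(order[k])
        result[ci] = sorted(inside)
    return result
```

Since the circles move every frame, only the sort of the points has to be redone per frame; the circles themselves are never indexed.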

Related

Optimize bruteforce solution of searching nearest point

I have non empty Set of points scattered on plane, they are given by their coordinates.
Problem is to quickly reply such queries:
Give me the point from your set which is nearest to the point A(x, y)
My current solution pseudocode
query( given_point )
{
    nearest_point = any point from Set
    for each point in Set
        if dist(point, given_point) < dist(nearest_point, given_point)
            nearest_point = point
    return nearest_point
}
But this algorithm is very slow: its complexity is O(N).
The question is, is there any data structure or tricky algorithms with precalculations which will dramatically reduce time complexity? I need at least O(log N)
Update
By distance I mean Euclidean distance
You can get O(log N) time using a kd-tree. This is like a binary search tree, except that it splits points first on the x-dimension, then the y-dimension, then the x-dimension again, and so on.
If your points are homogeneously distributed, you can achieve O(1) look-up by binning the points into evenly-sized boxes and then searching the box in which the query point falls and its eight neighbouring boxes.
It would be difficult to make an efficient solution from Voronoi diagrams since this requires that you solve the problem of figuring out which Voronoi cell the query point falls in. Much of the time this involves building an R*-tree to query the bounding boxes of the Voronoi cells (in O(log N) time) and then performing point-in-polygon checks (O(p) in the number of points in the polygon's perimeter).
You can divide your grid in subsections:
Depending on the number of points and grid size, you choose a useful division. Let's assume a screen of 1000x1000 pixels, filled with random points, evenly distributed over the surface.
You may divide the screen into 10x10 sections and make a map (roughX, roughY) -> List((x, y), ...). For a given point, you look up all points in the same cell and, since the point may be closer to points of a neighbouring cell than to an extreme point in the same cell, also the surrounding cells, maybe even 2 cells away. This would reduce the searching scope to 16 cells.
If you don't find a point in the same cell/layer, expand the search to next layer.
If you happen to find the next neighbor in one of the next layers, you have to expand the searching scope to an additional layer for each layer. If there are too many points, choose a finer grid. If there are too few points, choose a coarser grid. Note that the two green circles, connected to the red one with a line, have the same distance to the red one, but one is in layer 0 (same cell) and the other in layer 2 (next of next cell).
Without preprocessing you definitely need to spend O(N), as you must look at every point before returning the closest.
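Here is a minimal Python sketch of the grid idea with layer-by-layer expansion. The class name GridIndex and its methods are hypothetical names for this example, and the point set is assumed to be non-empty. The search stops once the best distance found so far cannot be beaten by any point in a farther layer.

```python
from collections import defaultdict
from math import floor, hypot, inf


class GridIndex:
    """Hypothetical helper: bins points into square cells of side `cell`."""

    def __init__(self, points, cell):
        self.cell = cell
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self._key(p)].append(p)
        cols = [i for i, _ in self.cells]
        rows = [j for _, j in self.cells]
        self.bounds = (min(cols), max(cols), min(rows), max(rows))

    def _key(self, p):
        return (floor(p[0] / self.cell), floor(p[1] / self.cell))

    def nearest(self, q):
        qi, qj = self._key(q)
        i0, i1, j0, j1 = self.bounds
        # No occupied cell lies beyond this layer.
        max_ring = max(abs(qi - i0), abs(qi - i1), abs(qj - j0), abs(qj - j1))
        best, best_d = None, inf
        for ring in range(max_ring + 1):
            for i in range(qi - ring, qi + ring + 1):
                for j in range(qj - ring, qj + ring + 1):
                    if max(abs(i - qi), abs(j - qj)) != ring:
                        continue  # visit only the border of this layer
                    for p in self.cells.get((i, j), ()):
                        d = hypot(p[0] - q[0], p[1] - q[1])
                        if d < best_d:
                            best, best_d = p, d
            # Any point in a farther layer is more than ring * cell away.
            if best_d <= ring * self.cell:
                break
        return best


index = GridIndex([(1, 1), (5, 5), (9, 1), (2, 8)], cell=2)
print(index.nearest((4.4, 4.9)))  # (5, 5)
```

The index is built once in O(N); for evenly distributed points each query then touches only a few cells.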
You can look here Nearest neighbor search for how to approach this problem.

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, within radius R (i.e. a circle) or within ±R (i.e. a square) of the point for 2 dimensions. I'll refer to it as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius R and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and that it has "O(log(n) + k) worst-case time". Using this, I could get the number of points surrounding each point and choose the point with largest surrounding point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points (n = 10^6) would need to have a range search performed. The worst-case time O(log(n) + k), where k is the number of points returned in the range, is true for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees, which have a worst-case time of O(n)
So, for a group of n points within radius R of all points within the group, it gives complexity of O(n) for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably O(n log(n)) or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs O(log^d(n) + k), for k returned points,
For a range tree with fractional cascading (also known as layered range trees) the complexity is O(log^(d-1)(n) + k),
For 2 dimensions, that is O(log(n) + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log(n)).
I'd perform this on every point - yielding the O(n log(n)) complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case
* O(log(n)^d) time. It can also return the points that are in the rectangle in worst case
* O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above, therefore, all I need to know is whether there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity O(log(n)), so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other method to achieve a range counting query of 2d points in O(log(n))!
I suggest using plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries. (Which is more efficient, simpler, and in case of square neighborhood does not require fractional cascading):
1. Sort points by Y-coordinate into array S.
2. Advance 3 pointers through array S: one (C) for the currently inspected (center) point; another one, A (a little bit ahead), for the nearest point at distance > R below C; and the last one, B (a little bit behind), for the farthest point at distance < R above it.
3. Insert the points pointed to by A into an order statistic tree (ordered by coordinate X) and remove the points pointed to by B from this tree. Use this tree to find the points at distance R to the left/right of C, and use the difference of these points' positions in the tree to get the number of points in the square area around C.
4. Use the results of the previous step to select the "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
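A compact Python sketch of the square-neighbourhood sweep follows. Note the substitution: the standard library has no order statistic tree, so a plain sorted list with bisect stands in for it here. That keeps the counting logic identical, but insertions and removals become O(n) instead of O(log n), so this shows the structure of the sweep rather than the stated complexity. The function name is my own.

```python
from bisect import bisect_left, bisect_right, insort


def densest_point_square(points, R):
    """Return (point, neighbour count) for the square |dx| <= R, |dy| <= R."""
    pts = sorted(points, key=lambda p: p[1])  # array S, sorted by Y
    n = len(pts)
    window_xs = []  # X-coordinates of the points with |y - yc| <= R, sorted
    a = 0           # pointer A: next point to insert (ahead of C)
    b = 0           # pointer B: next point to remove (behind C)
    best, best_count = None, -1
    for x, y in pts:  # pointer C: current center
        while a < n and pts[a][1] <= y + R:
            insort(window_xs, pts[a][0])
            a += 1
        while pts[b][1] < y - R:
            del window_xs[bisect_left(window_xs, pts[b][0])]
            b += 1
        # Difference of positions gives the count; subtract the center itself.
        count = bisect_right(window_xs, x + R) - bisect_left(window_xs, x - R) - 1
        if count > best_count:
            best, best_count = (x, y), count
    return best, best_count


print(densest_point_square([(0, 0), (2, 0), (4, 0), (2, 2), (2, -2)], 2))
# ((2, 0), 4)
```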
For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):
1. Cut the area occupied by points into horizontal slices, sort the slices by Y, then sort the points inside each slice by X.
2. For each point in each slice, assume it to be a "center" point and do step 3.
3. For each nearby slice, use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop the linear search where the slice is completely inside the circle, and count the remaining points by difference of positions in the array.
4. Use the results of the previous step to select the "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a https://en.wikipedia.org/wiki/K-d_tree, where you have a tree with points at the leaves and each node information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
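Something along these lines, as a sketch in Python. The names Node, build, count_within and densest_point are made up for this example, and the leaf size of 4 is an arbitrary choice. Each node keeps a descendant count and a bounding box, and the recursion only descends when the box straddles the circle boundary.

```python
class Node:
    __slots__ = ("count", "xmin", "ymin", "xmax", "ymax", "left", "right", "points")


def build(points, depth=0, leaf_size=4):
    """Build a k-d tree over a non-empty list of (x, y) points."""
    node = Node()
    node.count = len(points)
    node.xmin = min(p[0] for p in points)
    node.xmax = max(p[0] for p in points)
    node.ymin = min(p[1] for p in points)
    node.ymax = max(p[1] for p in points)
    node.left = node.right = node.points = None
    if len(points) <= leaf_size:
        node.points = points
    else:
        axis = depth % 2  # alternate between x and y
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        node.left = build(points[:mid], depth + 1, leaf_size)
        node.right = build(points[mid:], depth + 1, leaf_size)
    return node


def count_within(node, cx, cy, R):
    """Count the points within distance R of (cx, cy)."""
    # Closest possible distance from the center to the bounding box.
    dx = max(node.xmin - cx, 0, cx - node.xmax)
    dy = max(node.ymin - cy, 0, cy - node.ymax)
    if dx * dx + dy * dy > R * R:
        return 0  # box entirely outside the circle
    # Distance to the farthest corner of the bounding box.
    fx = max(cx - node.xmin, node.xmax - cx)
    fy = max(cy - node.ymin, node.ymax - cy)
    if fx * fx + fy * fy <= R * R:
        return node.count  # box entirely inside: use the stored count
    if node.points is not None:
        return sum((x - cx) * (x - cx) + (y - cy) * (y - cy) <= R * R
                   for x, y in node.points)
    return count_within(node.left, cx, cy, R) + count_within(node.right, cx, cy, R)


def densest_point(points, R):
    root = build(list(points))
    # Each point counts itself once, which does not affect the arg-max.
    return max(points, key=lambda p: count_within(root, p[0], p[1], R))
```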
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use a O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
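The estimate-then-verify loop above might be sketched like this in Python. Some assumptions of mine: cells are half-open and keyed by floor(x / R), the estimate is the 3x3 block count minus the point itself, and the "expensive" exact check is simply a scan of that same 3x3 block instead of a real O(log n + k) circle search. heapq is a min-heap, so estimates are stored negated.

```python
import heapq
from collections import defaultdict
from math import floor


def densest_point_grid(points, R):
    """Return (point, exact neighbour count within radius R)."""
    cells = defaultdict(list)
    for p in points:
        cells[(floor(p[0] / R), floor(p[1] / R))].append(p)

    def block(p):
        """Yield the point lists of the 3x3 block of cells around p."""
        i, j = floor(p[0] / R), floor(p[1] / R)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                yield cells.get((i + di, j + dj), ())

    # Estimated neighbour count: an upper bound, since every point within
    # distance R of p must lie in the 3x3 block around p's cell.
    heap = [(-(sum(len(c) for c in block(p)) - 1), idx)
            for idx, p in enumerate(points)]
    heapq.heapify(heap)

    best, best_count = None, -1
    # Stop as soon as the best remaining estimate cannot beat best_count.
    while heap and -heap[0][0] > best_count:
        _, idx = heapq.heappop(heap)
        px, py = points[idx]
        exact = sum((x - px) ** 2 + (y - py) ** 2 <= R * R
                    for c in block(points[idx]) for x, y in c) - 1
        if exact > best_count:
            best, best_count = points[idx], exact
    return best, best_count


print(densest_point_grid([(0, 0), (2, 0), (4, 0), (2, 2), (2, -2), (20, 20)], 2))
# ((2, 0), 4)
```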
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, that answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Find overlapping circles

I have a rectangular area where there are circles with equal radius. I want to find which circles overlap with other circles (the output is a list of 2-element sets of overlapping circles).
I know how to check if two of the circles overlap (the distance between their centers is less than the diameter). I can perform this check for every pair of circles, but I was wondering if there is a better algorithm (faster than O(n^2)).
EDIT
The number of circles is usually about 100 and overlappings won't happen very often.
Here is some context:
The rectangle is a battlefield in a game. The movement of the units is done on small steps and I'm trying to detect collisions between units.
Given the new explanation of the problem statement, I would recommend a different approach.
Overlay a square grid over the battlefield, with a grid step equal to one circle diameter. Every circle can overlap at most four cells. In each cell, keep a list of the overlapping circles (and update it on every move).
Detecting potential collisions will now take about four cell/circle tests per circle, i.e. close to linear time.
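For illustration, a Python sketch of this grid. The function name and the use of a dictionary keyed by cell indices are my own choices. For brevity the grid is rebuilt from scratch on each call rather than updated incrementally on every move, which is still linear in the number of circles.

```python
from collections import defaultdict
from itertools import combinations
from math import floor


def overlapping_pairs(centers, R):
    """Return the set of index pairs (a, b), a < b, of overlapping circles."""
    cell = 2 * R  # grid step equal to one circle diameter
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        # Register the circle in every cell its bounding box touches
        # (at most four cells, since the box is exactly one cell wide).
        for i in range(floor((x - R) / cell), floor((x + R) / cell) + 1):
            for j in range(floor((y - R) / cell), floor((y + R) / cell) + 1):
                grid[(i, j)].append(idx)
    pairs = set()
    for members in grid.values():
        # Only circles sharing a cell can overlap.
        for a, b in combinations(members, 2):
            (ax, ay), (bx, by) = centers[a], centers[b]
            if (ax - bx) ** 2 + (ay - by) ** 2 < cell * cell:
                pairs.add((a, b))
    return pairs


units = [(0, 0), (1.5, 0), (10, 10), (10, 11), (5, 5)]
print(sorted(overlapping_pairs(units, R=1)))  # [(0, 1), (2, 3)]
```

The set removes duplicates when two circles share more than one cell.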
For a simple solution, insert the centers in a 2d-tree and perform circular range queries around every center with a query radius 2R. In good conditions, this can be O(N Log(N)).
Alternatively, just sort the centers on X and try all circles in turn: by dichotomic search, locate the abscissa Xc and scan to Xc-2R and to Xc+2R, then check the 2D distance.
The cost of the dichotomic searches will be O(N Log(N)). If the circles are uniformly spread out in a square of side S, a stripe of width 4R contains 4RN/S circles, hence a total comparison cost of 4RN²/S. This is a good performance if S is large (think that for N tightly packed circles in a square, S~2R√N, hence 2N√N comparisons).
Direct answer: You cannot get better than O(n^2) in general since the circles could potentially all overlap, so you have to generate n^2 answers.
If you get more specific, you might get better answers. For example, if what you are really trying to do is find bounding spheres in a 2D simulation, you can profit from the fact that entities only move so far between frames, if the circles are sparse it's different from when they are tightly packed, etc. So let us know more about what it's all about.
EDIT: You edited your question - you indeed are looking for collision detection in a 2D simulation. If you check out https://en.wikipedia.org/wiki/Collision_detection , they point to several algorithms for exactly your case.
I like the one detailed right on that page where you keep one list of bounding intervals per axis (2 in "2D") and only need to "work hard" when those bounding intervals (which are themselves by definition one-dimensional) change their "overlap state". This removes the O(n²) for well-behaved cases. They don't give an estimate for the complexity of that, but as it basically comes down to sorting, it looks more or less O(n log n) to me, and less when there are only minimal changes between frames.

Finding point with largest weight within a region

My problem is:
We have a set of N points in a 2D space, each of which has a weight. Given any rectangle region R, how to efficiently return the point with the largest weight inside R?
Note that all query regions R have the same shape, i.e. same lengths and widths. And point and rectangle coordinates are float numbers.
My initial idea is to use an R-tree to store the points. For a region R, extract all points in R, and then find the point with the maximum weight. The time complexity is O(logN + V), where V is the number of points in R. Can we do better?
I tried to search for a solution, but without success so far. Any suggestions?
Thanks,
This sounds like a range max query problem in 2D. You have some very good algorithms here.
Simpler would be to just use a 2D segment tree.
Basically, each node of your segment tree will store 2d regions instead of 1d intervals. So each node will have 4 children, reducing to a quad tree, which you can then operate on as you would a classic segment tree. This is described in detail here.
This will be O(log n) per query, where n is the total number of points. And it also allows you to do a lot more operations, such as update a point's weight, update a region's weights etc.
How about adding an additional attribute to each tree node which contains the maximum weight of all the points contained by any of its children.
This will be easy to update while adding points. It'll be a little more work to maintain when you remove a point that changes the maximum. You'll have to traverse the tree backwards and update the max value for all parent nodes.
With this attribute, if you want to retrieve the maximum-weight point then when you query the tree with your query region you only inspect the child node with the maximum weight as you traverse the tree. Note, you may have more than one point with the same maximum weight so you may have more than one child node to inspect.
Only inspecting the child nodes with a maximum weight attribute will improve your query throughput at the expense of more memory and slower time building/modifying the tree.
Look up so called range trees, which in your case you would want to implement in 2-dimensions. This would be a 2-layer "tree of trees", where you first split the set of points based on x-coordinate and then for each set of x-points at one of the nodes in the resulting tree, you build a tree based on y-coordinate for those points at that node in the original tree. You can look up how to adapt a 2-d range tree to return the number of points in the query rectangle in O((log n)^2) time, independent of the number of points. Similarly, instead of storing the count of points for subrectangles in the range tree, you can store the maximum objective value of points within that rectangle. This will give you O(n log n) time guaranteed storage and construction time, and O((log n)^2) query time, regardless of the number of points in the query rectangle.
An adaptation of so-called "fractional cascading" for range-tree "find all points in query rectangle" might even be able to get your query time down to O(log n), but I'm not sure since you are taking max of value of points within the query rectangle.
Hint:
Every point has a "zone of influence", which is the locus of the positions of the (top-left corner of the) rectangle such that this point dominates. The set of the zones of influence defines a partition of the plane. Every edge of the partition occurs at the abscissa [ordinate] of a given point or its abscissa [ordinate] minus the width [height] of the query region.
If you map the coordinate values to their rank (by sorting on both axis), you can represent the partition as a digital image of size 4N². To precompute this image, initialize it with minus infinity, and for every point you fill its zone of influence with its weight, taking the maximum. If the query window size is R² pixels on average, the cost of constructing the image is NR².
A query is made by finding the row and column of the relevant pixel and returning the pixel value. This takes two dichotomic searches, in time Lg(N).
This approach is only realistic for moderate values of N (say up to 1000). Better insight can be gained on this problem by studying the geometry of the partition map.
You can try a weighted Voronoi diagram where the positive weight is subtracted from the Euclidean distance. Sites with a big weight tend to have big cells when nearby sites have small weights. Then sort the cells by the number of sites and compute a minimum bounding box for each cell. Match it with the rectangular search box.

Segment Intersection

Here is a question from CLRS.
A disk consists of a circle plus its interior and is represented by its center point and radius. Two disks intersect if they have any point in common. Give an O(n lg n)-time algorithm to determine whether any two disks in a set of n intersect.
It's not my homework. I think we can take the horizontal diameter of every circle as its representative line segment. If two segments come consecutive in the ordering, then we check the distance between the two centers. If it's less than or equal to the sum of the radii of the circles, then they intersect.
Please let me know if I am correct.
Build a Voronoi diagram for disk centers. This is an O(n log n) job.
Now for each edge of the diagram take the corresponding pair of centers and check whether their disk intersect.
Build a k-d tree with the centres of the circles.
For every circle (p, r), find using the k-d tree the set S of circles whose centres are nearer than 2r from p.
Check if any of the circles in S touches the current circle.
I think the average cost for this algorithm is O(NlogN).
The logic is that we loop over the set O(N), and for every element get a subset of elements near O(NlogN), so, a priori, the complexity is O(N^2 logN). But we also have to consider that the probability of two random circles being less than 2r apart and not touching is less than 3/4 (if they touch we can short-circuit the algorithm).
That means that the average size of S is probabilistically limited to be a small value.
Another approach to solve the problem:
Divide the plane using a grid whose cell size is the diameter of the biggest circle.
Use a hashing algorithm to classify the grid cells in N groups.
For every circle calculate the grid cells it overlaps and the corresponding groups.
Get all the circles in a group and...
Check if the biggest circle touches any other circle in the group.
Recurse applying this algorithm to the remaining circles in the group.
This same algorithm implemented in scala: https://github.com/salva/simplering/blob/master/touching/src/main/scala/org/vesbot/simplering/touching/Circle.scala
