Find furthest point in O(1) time - algorithm

Consider a set S of n points in the plane whose farthest pair is at distance at most 1. I would like to find the point of S farthest from a given query point q (not in S) in O(1) time. How do I pre-process the points in S to achieve the desired query time bound?
Is this possible?

It is not possible stricto sensu. This is a point location problem in a planar straight line graph, which is known to require Ω(log N) query time.
Anyway, it can be addressed approximately by gridding.
Overlay a square grid over the furthest point Voronoi diagram, and for every cell note the regions it covers. Make sure that the number of covered regions is bounded. This can be approximately achieved by taking a grid pitch smaller than the distance of the two closest vertices in the diagram.
For a query point, finding the containing cell is done in constant time. Then finding the region among a bounded number takes constant time as well.
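A minimal sketch of the gridding idea, under assumptions that are mine and not part of the answer: queries fall inside a known box [x0, x1] x [y0, y1], and instead of building the furthest point Voronoi diagram, each cell stores a conservative candidate list derived from the triangle inequality (any site that is farthest from some query in the cell is at most two half-diagonals short of the farthest distance from the cell centre). The names build_farthest_grid and farthest_point are hypothetical. The candidate lists are not guaranteed to be bounded, which is the same caveat as above.

```python
import math

def build_farthest_grid(points, x0, y0, x1, y1, pitch):
    """For each cell, keep every site that can be farthest from some query in that cell."""
    nx = max(1, math.ceil((x1 - x0) / pitch))
    ny = max(1, math.ceil((y1 - y0) / pitch))
    half_diag = pitch * math.sqrt(2) / 2  # max distance from a cell centre to a point of the cell
    cells = {}
    for i in range(nx):
        for j in range(ny):
            cx = x0 + (i + 0.5) * pitch
            cy = y0 + (j + 0.5) * pitch
            dists = [math.hypot(px - cx, py - cy) for px, py in points]
            # Moving the query by at most half_diag changes every distance by at most
            # half_diag, so a site more than 2 * half_diag behind the leader cannot win.
            threshold = max(dists) - 2 * half_diag - 1e-12
            cells[(i, j)] = [p for p, d in zip(points, dists) if d >= threshold]
    return cells, (x0, y0, pitch, nx, ny)

def farthest_point(grid, q):
    """Look up the cell of q (assumed inside the gridded box) and scan its candidates."""
    cells, (x0, y0, pitch, nx, ny) = grid
    i = min(nx - 1, max(0, int((q[0] - x0) // pitch)))
    j = min(ny - 1, max(0, int((q[1] - y0) // pitch)))
    return max(cells[(i, j)], key=lambda p: math.hypot(p[0] - q[0], p[1] - q[1]))
```

The preprocessing here is brute force (number of cells times n); it only illustrates the query side, where the cost is one index computation plus a scan of the cell's candidate list.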

Assuming there is no relation between the points, there is no single operation that will give you the furthest point. So, the only way to do it is to compute it in advance, so you need a simple mapping between each point and the point furthest from it.

Related

Optimize bruteforce solution of searching nearest point

I have a non-empty Set of points scattered on a plane; they are given by their coordinates.
The problem is to quickly answer queries of this form:
Give me the point from your set which is nearest to the point A(x, y)
My current solution pseudocode
query( given_point )
{
    nearest_point = any point from Set
    for each point in Set
        if dist(point, given_point) < dist(nearest_point, given_point)
            nearest_point = point
    return nearest_point
}
But this algorithm is very slow: its complexity is O(N) per query.
The question is, is there any data structure or tricky algorithm with precalculations which will dramatically reduce the time complexity? I need O(log N) or better.
Update
By distance I mean Euclidean distance
You can get O(log N) expected query time using a kd-tree. This is like a binary search tree, except that it splits points first on the x-dimension, then the y-dimension, then the x-dimension again, and so on.
If your points are homogeneously distributed, you can achieve O(1) look-up by binning the points into evenly-sized boxes and then searching the box in which the query point falls and its eight neighbouring boxes.
It would be difficult to make an efficient solution from Voronoi diagrams since this requires that you solve the problem of figuring out which Voronoi cell the query point falls in. Much of the time this involves building an R*-tree to query the bounding boxes of the Voronoi cells (in O(log N) time) and then performing point-in-polygon checks (O(p) in the number of points in the polygon's perimeter).
You can divide your grid in subsections:
Depending on the number of points and grid size, you choose a useful division. Let's assume a screen of 1000x1000 pixels, filled with random points, evenly distributed over the surface.
You may divide the screen into 10x10 sections and make a map (roughX, roughY) -> List((x, y), ...). For a given point, you look up all points in the same cell and, since the point may be closer to points in a neighbouring cell than to an extreme point in its own cell, also in the surrounding cells, maybe even 2 cells away. This would reduce the searching scope to 16 cells.
If you don't find a point in the same cell/layer, expand the search to next layer.
If you happen to find the nearest neighbor in one of the outer layers, you have to expand the searching scope by an additional layer for each layer. If there are too many points, choose a finer grid. If there are too few points, choose a bigger grid. Note that the two green circles, connected to the red one with a line, have the same distance to the red one, but one is in layer 0 (same cell) and the other in layer 2 (next of next cell).
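The layer-by-layer search can be sketched as follows. This is only an illustration under my own assumptions: cells are stored in a dict keyed by integer cell indices, and the names build_grid, ring and nearest are hypothetical. The stopping rule used here is that every point in layer r or beyond is more than (r - 1) * cell away from the query, so the search can stop once the best distance found is no larger than that.

```python
import math

def build_grid(points, cell):
    """Bucket points by integer cell index and remember the occupied index range."""
    cells = {}
    for p in points:
        cells.setdefault((int(p[0] // cell), int(p[1] // cell)), []).append(p)
    xs = [i for i, _ in cells]
    ys = [j for _, j in cells]
    return {"cell": cell, "cells": cells, "bounds": (min(xs), max(xs), min(ys), max(ys))}

def ring(cx, cy, r):
    """Yield the cell indices forming layer r (the square border at index distance r)."""
    if r == 0:
        yield cx, cy
        return
    for i in range(cx - r, cx + r + 1):
        yield i, cy - r
        yield i, cy + r
    for j in range(cy - r + 1, cy + r):
        yield cx - r, j
        yield cx + r, j

def nearest(grid, q):
    cell, cells = grid["cell"], grid["cells"]
    imin, imax, jmin, jmax = grid["bounds"]
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    max_ring = max(cx - imin, imax - cx, cy - jmin, jmax - cy)  # no occupied cell lies further out
    best, best_d = None, math.inf
    for r in range(max_ring + 1):
        if best_d <= (r - 1) * cell:
            break  # nothing in this layer or beyond can be closer
        for key in ring(cx, cy, r):
            for p in cells.get(key, ()):
                d = math.dist(p, q)
                if d < best_d:
                    best, best_d = p, d
    return best
```

With evenly distributed points and a cell size matched to the density, the loop usually ends after two or three layers; with clustered points it can degrade towards scanning most of the grid.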
Without preprocessing you definitely need to spend O(N), as you must look at every point before returning the closest.
You can look here Nearest neighbor search for how to approach this problem.

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, within radius R (i.e. a circle) or within ±R (i.e. a square) of the point for 2 dimensions. I'll refer to it as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further and that it has "O(log(n) + k) worst-case time". Using this, I could get the number of points surrounding each point and choose the point with largest surrounding point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points would need to have a range search performed. The worst-case time O(log(n) + k), where k is the number of points returned in the range, is true for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees, which have a worst-case time of O(n).
So, for a group of n points within radius R of all points within the group, it gives complexity of O(log(n) + n) for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably O(n log(n)) or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs O(log^d(n) + k), for k returned points,
For a range tree with fractional cascading (also known as layered range trees) the complexity is O(log^(d-1)(n) + k),
For 2 dimensions, that is O(log(n) + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log(n)).
I'd perform this on every point - yielding the O(n log(n)) complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case O(log(n)^d) time. It can also return the points that are in the rectangle in worst case O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above therefore, all I need to know is if there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity O(log(n)) so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other methods to achieve a range counting query of 2d points in O(log(n)) in any other way!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries (which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading):
1. Sort points by Y-coordinate into array S.
2. Advance 3 pointers into array S: one (C) for the currently inspected (center) point; another, A (a little bit ahead), for the nearest point at distance > R below C; and the last one, B (a little bit behind), for the farthest point at distance < R above it.
3. Insert the points pointed to by A into an order statistic tree (ordered by X coordinate) and remove the points pointed to by B from this tree. Use this tree to find the points at distance R to the left/right of C, and use the difference of these points' positions in the tree to get the number of points in the square area around C.
4. Use the results of the previous step to select the "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
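A minimal sketch of the sweep for the square neighbourhood, with one substitution that is mine: a plain sorted Python list (via bisect) stands in for the order statistic tree. Positions in the sorted list play the role of ranks in the tree, but list insertion and deletion are O(n), so this version is not O(n log n) in the worst case; a balanced order statistic tree would be needed for that bound. The function name densest_point_square is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort

def densest_point_square(points, R):
    """Return (point, count): the point with the most other points within +-R in x and y."""
    pts = sorted(points, key=lambda p: p[1])   # sweep in order of increasing y
    window = []                                # sorted x values of points with |y - y_c| <= R
    lo = hi = 0                                # pts[lo:hi] are the points currently in the window
    best, best_count = None, -1
    for c in pts:
        # Leading pointer: add points up to y_c + R.
        while hi < len(pts) and pts[hi][1] <= c[1] + R:
            insort(window, pts[hi][0])
            hi += 1
        # Trailing pointer: drop points below y_c - R.
        while pts[lo][1] < c[1] - R:
            del window[bisect_left(window, pts[lo][0])]
            lo += 1
        # Difference of positions gives the number of x values in [x_c - R, x_c + R].
        count = bisect_right(window, c[0] + R) - bisect_left(window, c[0] - R) - 1  # minus c itself
        if count > best_count:
            best, best_count = c, count
    return best, best_count
```

The boundary is treated as closed here (points at exactly distance R count); whether that is wanted depends on the application.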
For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):
1. Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
2. For each point in each slice, assume it to be a "center" point and do step 3.
3. For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
4. Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a https://en.wikipedia.org/wiki/K-d_tree, where you have a tree with points at the leaves and each node holds information about its descendants. At each node I would keep a count of the number of descendants, and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree where the lower levels are close to a regular grid, and I think if the grid is of size A x A then in the worst case R is large enough so that its boundary is a circle that intersects O(A) low level cells, so I think that if you have O(n) points you could expect this to cost about O(n * sqrt(n)).
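Roughly, the three cases could look like this in code. It is a sketch with assumptions of my own: nodes are plain dicts, leaves hold up to 4 points, the split axis simply alternates, and the names build, count_within and densest_point are hypothetical.

```python
def build(points, depth=0, bucket=4):
    """k-d tree node with a descendant count and a bounding box (points must be non-empty)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = {"count": len(points), "box": (min(xs), min(ys), max(xs), max(ys)),
            "points": points, "kids": ()}
    if len(points) > bucket:
        points = sorted(points, key=lambda p: p[depth % 2])
        mid = len(points) // 2
        node["points"] = None
        node["kids"] = (build(points[:mid], depth + 1, bucket),
                        build(points[mid:], depth + 1, bucket))
    return node

def count_within(node, q, R):
    """Number of stored points within distance R of q (q itself included if it is stored)."""
    x0, y0, x1, y1 = node["box"]
    near_dx = max(x0 - q[0], 0, q[0] - x1)
    near_dy = max(y0 - q[1], 0, q[1] - y1)
    if near_dx ** 2 + near_dy ** 2 > R * R:
        return 0                      # whole box is more than R away
    far_dx = max(q[0] - x0, x1 - q[0])
    far_dy = max(q[1] - y0, y1 - q[1])
    if far_dx ** 2 + far_dy ** 2 <= R * R:
        return node["count"]          # whole box is within R
    if node["points"] is not None:    # leaf that straddles the circle: test each point
        return sum(1 for p in node["points"]
                   if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= R * R)
    return sum(count_within(kid, q, R) for kid in node["kids"])

def densest_point(points, R):
    tree = build(list(points))
    best = max(points, key=lambda p: count_within(tree, p, R))
    return best, count_within(tree, best, R) - 1   # do not count the point itself
```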
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use an O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
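The estimate-then-verify loop might be sketched like this. Assumptions that are mine: cells are stored in a Counter keyed by integer cell indices, the expensive exact check is a brute-force scan standing in for the circle search, and the names exact_count and densest_point are hypothetical. Because the cell side is R, every true neighbour lies in the 3x3 block of cells, so the estimate never undercounts and the loop can stop as soon as the best exact count is at least the largest remaining estimate.

```python
import heapq
from collections import Counter

def exact_count(points, c, R):
    """Placeholder for the expensive circle search: neighbours of c within radius R."""
    return sum(1 for p in points if (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 <= R * R) - 1

def densest_point(points, R):
    def cell(p):
        return int(p[0] // R), int(p[1] // R)

    counts = Counter(cell(p) for p in points)

    def estimate(p):
        # Points in the focal cell and its 8 neighbours, excluding p itself.
        cx, cy = cell(p)
        return sum(counts.get((cx + dx, cy + dy), 0)
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)) - 1

    # heapq is a min-heap, so store negated estimates to pop the largest first.
    heap = [(-estimate(p), i) for i, p in enumerate(points)]
    heapq.heapify(heap)
    best_found, best_n = None, -1
    while heap and -heap[0][0] > best_n:
        _, i = heapq.heappop(heap)
        n = exact_count(points, points[i], R)
        if n > best_n:
            best_found, best_n = points[i], n
    return best_found, best_n
```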
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, his answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Two salesmen - one always visits the nearest neighbour, the other the farthest

Consider this question relative to graph theory:
Let G be a complete (every vertex is connected to all the other vertices) undirected graph with an N x N distance matrix. Two "salesmen" travel this way: the first always visits the nearest non-visited vertex, the second the farthest, until they have both visited all the vertices. We must generate a matrix of distances and the starting points for the two salesmen (they can be different) such that:
All the distances are unique (Edit: and positive integers)
The distance from a vertex to itself is always 0.
The difference between the total distance covered by the two salesmen must be a specific number, D.
The distance from A to B is equal to the distance from B to A
What efficient algorithms can be useful to help me? I can only think of backtracking, but I don't see any way to reduce the work to be done by the program.
Geometry is helpful.
Using the distances of points on a circle seems like it would work. Seems like you could adjust D by making the circle radius larger or smaller.
Alternatively, really any 2D shape where the distances are all different could probably be used as well. In this case you should scale the shape up or down to obtain the correct D.
Edit: Now that I think about it, the simplest solution may be to simply pick N random 2D points, say 32 bit integer coordinates to lower the chances of any distances being too close to equal. If two distances are too close, just pick a different point for one of them until it's valid.
Ideally, you'd then just need to work out a formula to determine the relationship between D and the scaling factor, which I'm not sure of offhand. If nothing else, you could also just use binary search or interpolation search or something to search for scaling factor to obtain the required D, but that's a slower method.
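A small simulation may help with the scaling question. This is a sketch under my own assumptions: the points are given as 2D tuples, distances are left as floats (rounding them to the required unique positive integers is not handled), and the names tour_length, distance_matrix and difference are hypothetical. Uniform scaling multiplies every distance by the same factor and so does not change which vertex is nearest or farthest; for such point sets the difference between the two tours should therefore scale linearly with the factor.

```python
import math

def tour_length(dist, start, pick):
    """Total distance when always moving to the unvisited vertex chosen by pick (min or max)."""
    unvisited = set(range(len(dist))) - {start}
    cur, total = start, 0
    while unvisited:
        nxt = pick(unvisited, key=lambda v: dist[cur][v])
        total += dist[cur][nxt]
        unvisited.remove(nxt)
        cur = nxt
    return total

def distance_matrix(points, scale=1.0):
    """Symmetric matrix of Euclidean distances, multiplied by a scaling factor."""
    return [[scale * math.dist(p, q) for q in points] for p in points]

def difference(points, scale, start_near, start_far):
    """Farthest-first tour length minus nearest-first tour length."""
    d = distance_matrix(points, scale)
    return tour_length(d, start_far, max) - tour_length(d, start_near, min)
```

For example, with d1 = difference(points, 1, 0, 0), a target D would correspond to a scale of about D / d1, provided d1 is not zero and has the same sign as D.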

Shortest distance to rectangle caching

I have a list of rectangles that don't have to be parallel to the axes. I also have a master rectangle that is parallel to the axes.
I need an algorithm that can tell which rectangle a given point is closest to (the point must be in the master rectangle). The list of rectangles and the master rectangle won't change, and the algorithm will be called with many points, so some data structure should be created to make the lookup faster.
To be clear: distance from a rectangle to a point is the distance between the closest point in the rectangle to the point.
What algorithm/data structure can be used for this? Memory is the higher priority here; n log n is ok but n^2 is not.
You should be able to do this with a Voronoi diagram with O(n log n) preprocessing time with O(log n) time queries. Because the objects are rectangles, not points, the cells may be curved. Nevertheless, a Voronoi diagram should work fine for your purposes. (See http://en.wikipedia.org/wiki/Voronoi_diagram)
For a quick and dirty solution that you could actually get working within a day, you could do something inspired by locality sensitive hashing. For example, if the rectangles are somewhat well-spaced, you could hash them into square buckets with a few different offsets, and then for each query examine each rectangle that falls in one of the handful of buckets that contain the query point.
You should be able to do this in O(n) time and O(n) memory.
Calculate the closest point on each edge of each rectangle to the point in question. To do this, see my detailed answer in this question. Even though that question has to do with a point inside of the polygon (rather than outside of it), the algorithm can still be applied here.
Calculate the distance from the point in question to each of these closest points on the edges, and so find the closest point on the entire rectangle (for each rectangle). See the link above for more details.
Find the minimum of these distances over all of the rectangles. The rectangle corresponding to the minimum distance is the winner.
If memory is more valuable than speed, use brute force: for a given point S, compute the distance from S to each edge. Choose the rectangle with the shortest distance.
This solution requires no additional memory, while its execution time is in O(n).
Depending on your exact problem specification, you may have to adjust this solution if the rectangles are allowed to overlap with the master rectangle.
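A minimal brute-force sketch, assuming each rectangle is given as its four vertices in order (clockwise or counter-clockwise) and that a point inside a rectangle has distance 0, as the question defines it. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from p to the closest point of segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    # Parameter of the perpendicular foot, clamped so it stays on the segment.
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def inside(p, rect):
    """True if p lies in the convex quadrilateral rect (vertices in order)."""
    crosses = []
    for i in range(4):
        a, b = rect[i], rect[(i + 1) % 4]
        crosses.append((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def point_rect_distance(p, rect):
    if inside(p, rect):
        return 0.0
    return min(point_segment_distance(p, rect[i], rect[(i + 1) % 4]) for i in range(4))

def closest_rectangle(p, rects):
    """Index of the rectangle nearest to p; O(n) time, no extra memory."""
    return min(range(len(rects)), key=lambda i: point_rect_distance(p, rects[i]))
```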
As you described, the distance from a point to a rectangle is the minimum of two kinds of lengths: the perpendicular distances from the point to each of the four edges (where the foot of the perpendicular falls on the edge), and the distances from the point to each of the four vertices of the rectangle.
(My English is not good at describing a math solution, so I think you should think more deeply for understanding my explanation).
For each rectangle, you should store the four vertices and the four edge equations so that the distance to a specific point can be calculated quickly.

Faster way to compare two sets of points in N-dimensional space?

List1 contains a high number (~7^10) of N-dimensional points (N <=10), List2 contains the same or fewer number of N-dimensional points (N <=10).
My task is this: I want to check which point in List2 is closest (Euclidean distance) to a point in List1, for every point in List1, and subsequently perform some operation on it. I have been doing it the simple nested-loop way when I didn't have more than 50 points in List1, but with 7^10 points, this obviously takes up a lot of time.
What is the fastest way to do this? Any concepts from Computational Geometry might help?
EDIT: I have the following in place: I have built a kd-tree out of List2 and am now doing a nearest-neighbour search for each point in List1. As I originally pointed out, List1 has 7^10 points, and hence, though I am saving on the brute-force Euclidean distance computation for every pair, the sheer number of points in List1 still takes a lot of time. Is there any way I can improve this?
Well a good way would be to use something like a kd-tree and perform nearest neighbour searching. Fortunately you do not have to implement this data structure yourself, it has been done before. I recommend this one, but there are others:
http://www.cs.umd.edu/~mount/ANN/
It's not possible to tell you which is the most efficient algorithm without knowing anything about the distribution of points in the two lists. However, for a first guess...
First algorithm doesn't work — for two reasons: (1) a wrong assumption - I assume the bounding hulls are disjoint, and (2) a misreading of the question - it doesn't find the shortest edge for every pair of points.
...compute the convex hull of the two sets: the closest points must be on the hyperface on the two hulls through which the line between the two centres of gravity passes.
You can compute the convex hull by computing the centre point (the centre of gravity, assuming all points have equal mass) and ordering the lists from furthest from the centre to least far. Then take the furthest away point in the list, add this to the convex hull, and then remove all points that are within the so-far computed convex hull (you will need to compute lots of 10d hypertriangles to do this). Repeat until there is nothing left in the list that is not on the convex hull.
Second algorithm: partial
Compute the convex hull for List2. For each point of List1, if the point is outside the convex hull, then find the hyperface as for first algorithm: the nearest point must be on this face. If it is on the face, likewise. If it is inside, you can still find the hyperface by extending the line past the point from List1: the nearest point must be inside the ball that includes the hyperface to List2's centre of gravity: here, though, you need a new algorithm to get the nearest point, perhaps the kd-tree approach.
Performance
When List2 is something like evenly distributed, or normally distributed, through some fairly oblique shape, this will do a good job of reducing the number of points under consideration, and it should be compatible with the kd-tree suggestion.
There are some horrible worst cases, though: if List2 contains only points on the surface of a torus whose geometric centre is the centre of gravity of the list, then the convex hull will be very expensive to calculate, and will not help much in reducing the number of points under consideration.
My evaluation
These kinds of geometric techniques may be a useful complement to the kd-trees approach of other posters, but you need to know a little about the distribution of points before you can determine whether they are worth applying.
kd-tree is pretty fast. I've used the algorithm in this paper and it works well Bentley - K-d trees for semidynamic point sets
I'm sure there are libraries around, but it's nice to know what's going on sometimes - Bentley explains it well.
Basically, there are a number of ways to search a tree: Nearest N neighbors, All neighbors within a given radius, nearest N neighbors within a radius. Sometimes you want to search for bounded objects.
The idea is that the kdTree partitions the space recursively. Each node is split in 2 down the axis in one of the dimensions of the space you are in. Ideally it splits perpendicular to the node's longest dimension. You should keep splitting the space until you have about 4 points in each bucket.
Then for every query point, as you recursively visit nodes, you check the distance from the query point to the partition wall for the particular node you are in. You descend both nodes (the one you are in and its sibling) if the distance to the partition wall is less than the search radius. If the wall is beyond the radius, just search children of the node you are in.
When you get to a bucket (leaf node), you test the points in there to see if they are within the radius.
If you want the closest point, you can start with a massive radius, and pass a pointer or reference to it as you recurse - and in that way you can shrink the search radius as you find close points - and home in on the closest point pretty fast.
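Here is a rough sketch of that search, not Bentley's actual algorithm: the tree is stored as nested tuples, the split axis simply cycles through the dimensions instead of following the longest side, and the names build and nearest are hypothetical. The shared list best plays the role of the radius passed by reference.

```python
import math

def build(points, depth=0, bucket=4):
    """Bucketed kd-tree: leaves hold up to `bucket` points."""
    if len(points) <= bucket:
        return ("leaf", points)
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    # The partition wall sits at the coordinate of the median point.
    return ("node", axis, points[mid][axis],
            build(points[:mid], depth + 1, bucket),
            build(points[mid:], depth + 1, bucket))

def nearest(tree, q, best=None):
    """Return [distance, point]; `best` is shared across the recursion and shrinks as we go."""
    if best is None:
        best = [math.inf, None]          # start with a massive radius
    if tree[0] == "leaf":
        for p in tree[1]:
            d = math.dist(p, q)
            if d < best[0]:
                best[0], best[1] = d, p
        return best
    _, axis, split, left, right = tree
    near, far = (left, right) if q[axis] < split else (right, left)
    nearest(near, q, best)
    # Only cross the wall if it is closer than the current search radius.
    if abs(q[axis] - split) < best[0]:
        nearest(far, q, best)
    return best
```

For the two-list problem, the tree would be built once from List2 and nearest called for each point of List1.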
(A year later) kd trees that quit early, after looking at say 1M of all 200M points, can be much faster in high dimensions. The results are only statistically close to the absolute nearest, depending on the data and metric; there's no free lunch. (Note that sampling 1M points, and kd tree only those 1M, is quite different, worse.)
FLANN does this for image data with dim=128, and is I believe in opencv. A local mod of the fast and solid SciPy cKDTree also has cutoff= .

Resources