Efficient algorithm to find closest line segments for each point

Given a polygonal subdivision S and a set of points P, find the closest line segment in S for each point (in 2-d space).
In my setting, I have hundreds of thousands of segments and a couple thousand points.
Checking every segment against every point would take too long. Is there an efficient algorithm for this?
I was considering several options, but can't figure out which is best:
Build a trapezoidal map and query the face each point lies in. Then go over the edges of that face (in the subdivision) to find the nearest segment.
Build a range tree or segment tree, query a box around the point, and find the closest line segment in it. This only finds something if the box contains at least one segment.
Build a line-segment Voronoi diagram. Each face identifies the nearest segment, but I wouldn't know how to do a point-location query, since the edges can be parabolic arcs.
What is a good high-level approach for this problem?

Nearest Neighbor in PostGIS
PostGIS's approach is to use R-trees with a custom search algorithm. While descending the tree as in a regular query, it keeps track of the minimum and maximum distances to the objects in the bounding regions it encounters. Each encountered branch of the tree is added to an Active Branch List (ABL), which is pruned using these distance metrics.
It then picks a branch's bounding region from the ABL and applies the algorithm recursively. At a leaf (an object such as a line segment), it updates the variable nearest. When returning from the recursion, it uses the updated nearest distance to prune the ABL further, repeating until the ABL is empty.
The theoretical worst case is linear per query, but in practice it performs much better.
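A minimal sketch of this branch-and-bound idea in Python, assuming a simple static hierarchy of bounding boxes (built by recursively halving the segments sorted by box center) in place of PostGIS's actual R-tree. The names `Node` and `nearest` are hypothetical, and only the minimum-distance pruning is shown; the maximum-distance bound mentioned above is omitted.

```python
import math


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def seg_box(seg):
    (ax, ay), (bx, by) = seg
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def box_dist(p, box):
    """Lower bound: distance from p to an axis-aligned box (0 if p is inside)."""
    (px, py), (x0, y0, x1, y1) = p, box
    return math.hypot(max(x0 - px, 0.0, px - x1), max(y0 - py, 0.0, py - y1))


class Node:
    """Hypothetical bounding-box hierarchy over a non-empty list of segments."""

    def __init__(self, segs, leaf_size=4, depth=0):
        boxes = [seg_box(s) for s in segs]
        self.box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes))
        self.segs, self.children = segs, []
        if len(segs) > leaf_size:
            axis = depth % 2  # alternate between x and y
            segs = sorted(segs, key=lambda s: seg_box(s)[axis] + seg_box(s)[axis + 2])
            mid = len(segs) // 2
            self.segs = []
            self.children = [Node(segs[:mid], leaf_size, depth + 1),
                             Node(segs[mid:], leaf_size, depth + 1)]


def nearest(node, p, best=(math.inf, None)):
    """Return (distance, segment) of the segment nearest to p."""
    if not node.children:  # leaf: test the actual segments
        for s in node.segs:
            d = point_segment_dist(p, *s)
            if d < best[0]:
                best = (d, s)
        return best
    # Active branch list, visited in order of increasing lower-bound distance.
    for child in sorted(node.children, key=lambda c: box_dist(p, c.box)):
        if box_dist(p, child.box) >= best[0]:
            break  # this branch and all later ones cannot beat the current best
        best = nearest(child, p, best)
    return best
```

For the question's setting, you would build `Node(segments)` once and call `nearest(root, p)` for each of the few thousand query points.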

Related

Polyline path in 2D - find all nearest passing of landmark points

Given a poly-line path in 2D (like a GPS trace) I am interested in finding all points where the path gets close to existing landmarks. See the diagram below. This could be considered a problem Strava is solving when it reports running time between landmarks.
The landmarks have a small radius and I am only interested in them when the path crosses through that radius, finding the red dot where the path is closest to the landmark.
There are many more landmarks than points on the paths.
Given a line segment and a landmark, it is not difficult to compute the minimum distance using the vector dot product. The problem is to efficiently find the line segments that pass through a landmark's radius.
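For reference, the dot-product computation can be sketched in Python as follows (a minimal sketch; `closest_point_on_segment` is a hypothetical helper name). It returns both the closest point on the segment, i.e. the red dot, and its distance to the landmark, which you would then compare against the landmark's radius.

```python
import math


def closest_point_on_segment(p, a, b):
    """Return (closest point on segment ab to p, distance from p to it)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0:
        t = 0.0  # degenerate segment: a == b
    else:
        # The dot product gives the projection of p onto the infinite line
        # through a and b; clamping to [0, 1] keeps the result on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    q = (ax + t * dx, ay + t * dy)
    return q, math.hypot(px - q[0], py - q[1])
```

For example, `closest_point_on_segment((5, 3), (0, 0), (10, 0))` returns `((5.0, 0.0), 3.0)`.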
I am not looking for code but the general algorithms and data structures to make this efficient - I lack the background in geometry where this problem is located.
The following properties could be exploited:
Using the bounding box of the path, the landmarks to be considered could be cut down. Landmarks could be stored in a quad-tree or 2d-tree for this.
The points of the paths form a sequence. One could walk along the paths only considering the next landmark that could come within reach.
Landmarks are static, paths change.
From a bird's-eye view, several proximity data structures, including quad- and 2d-trees, share the same shape: a tree where
Each leaf has a site (here, a landmark point);
Each interior node has some data that gives a lower bound on the nearest site in its sub-tree.
The lower bound doesn't have to be the minimum; we could just use 0 everywhere (and thereby recover the brute force algorithm). This is the same idea as admissible heuristics in the A* algorithm.
Then to find all sites within some distance r of a query point, we traverse the tree but skip sub-trees where the lower bound is greater than r.
Thing is, this works for query line segments too (and many other geometric primitives besides). All we need is code to compute
The distance from a query primitive to a site,
A lower bound on the distance from a query primitive to sites in a sub-tree.
You already have the first. With quad- or 2d-trees, for the second, you could use a line-segment-to-box distance, or you could implement something simpler (e.g., maximum separation in either the horizontal or vertical dimension).
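A minimal Python sketch of this traversal, assuming landmarks are (x, y) tuples stored in a 2d-tree whose nodes also record the bounding box of their sub-tree. It uses the simpler lower bound suggested above (the larger of the horizontal and vertical separations). The names `build` and `landmarks_near_segment` are hypothetical.

```python
import math


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


class KDNode:
    """2d-tree node: one landmark plus the bounding box of its whole sub-tree."""

    def __init__(self, pt, left, right, box):
        self.pt, self.left, self.right, self.box = pt, left, right, box


def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda q: q[axis])
    mid = len(points) // 2
    box = (min(q[0] for q in points), min(q[1] for q in points),
           max(q[0] for q in points), max(q[1] for q in points))
    return KDNode(points[mid], build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1), box)


def lower_bound(a, b, box):
    """Larger of the x and y gaps between the segment's bounding box and the
    node's box; it never exceeds the true segment-to-box distance."""
    x0, y0, x1, y1 = box
    gap_x = max(x0 - max(a[0], b[0]), min(a[0], b[0]) - x1, 0.0)
    gap_y = max(y0 - max(a[1], b[1]), min(a[1], b[1]) - y1, 0.0)
    return max(gap_x, gap_y)


def landmarks_near_segment(node, a, b, r, out):
    """Append to `out` every landmark within distance r of segment ab."""
    if node is None or lower_bound(a, b, node.box) > r:
        return out  # the whole sub-tree is provably farther than r
    if point_segment_dist(node.pt, a, b) <= r:
        out.append(node.pt)
    landmarks_near_segment(node.left, a, b, r, out)
    landmarks_near_segment(node.right, a, b, r, out)
    return out
```

Walking the path, you would call `landmarks_near_segment(root, a, b, r, [])` for each consecutive pair of path points `a`, `b`, and compute the closest point only for the landmarks it returns.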

Nearest point to set of line segments

I have a point p and n line segments in 2d space. Is there a way I can preprocess the line segments so that I can efficiently (i.e. sublinearly) find the line segment closest (i.e. with the lowest perpendicular distance) to p?
This is a real-world problem we're trying to solve. The best (approximate) answer we have is to preprocess the endpoints of the line segments into a quad tree / 2d kd-tree and find the nearest endpoint. This should give a nearly optimal answer (or maybe even the correct one) in most cases.
Alternatively, one can use MongoDB's geoNear, which also works with points.
Can we do better than this, particularly in terms of accuracy?
If your segments are uniformly spread and not too long, you can think of a gridding approach: choose a cell size and determine for every cell which segments cross it (this is done by "drawing" the segments on the grid). Then for a query point, find the nearest non-empty cell by visiting neighborhoods of increasing size, and compute the exact nearest distance to the segment(s) found there. You need to continue the search as long as the distance between the query point and the next cells does not exceed the shortest distance found so far.
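A minimal Python sketch of the gridding approach, assuming segments are pairs of (x, y) tuples and a cell size `h` that you choose. For brevity it "draws" each segment by registering it in every cell its bounding box overlaps, which is conservative (a superset of the cells actually crossed) but can be wasteful for long diagonal segments. The class and method names are hypothetical.

```python
import math
from collections import defaultdict


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def ring(ci, cj, k):
    """Yield the cells at Chebyshev distance exactly k from cell (ci, cj)."""
    if k == 0:
        yield ci, cj
        return
    for i in range(ci - k, ci + k + 1):
        yield i, cj - k
        yield i, cj + k
    for j in range(cj - k + 1, cj + k):
        yield ci - k, j
        yield ci + k, j


class SegmentGrid:
    """Uniform grid of square cells of side h over a non-empty list of segments."""

    def __init__(self, segs, h):
        self.segs, self.h = segs, h
        self.cells = defaultdict(list)
        for idx, ((ax, ay), (bx, by)) in enumerate(segs):
            # Conservative "drawing": register the segment in every cell its
            # bounding box overlaps, a superset of the cells it really crosses.
            for i in range(self.cell(min(ax, bx)), self.cell(max(ax, bx)) + 1):
                for j in range(self.cell(min(ay, by)), self.cell(max(ay, by)) + 1):
                    self.cells[(i, j)].append(idx)
        self.extent = (min(i for i, _ in self.cells), max(i for i, _ in self.cells),
                       min(j for _, j in self.cells), max(j for _, j in self.cells))

    def cell(self, v):
        return math.floor(v / self.h)

    def nearest(self, p):
        """Return (distance, segment index) of the segment nearest to p."""
        ci, cj = self.cell(p[0]), self.cell(p[1])
        imin, imax, jmin, jmax = self.extent
        kmax = max(abs(ci - imin), abs(ci - imax), abs(cj - jmin), abs(cj - jmax))
        best, best_idx, k = math.inf, None, 0
        # Every cell in ring k is at least (k - 1) * h away from p, so stop
        # once that bound reaches the best distance found so far.
        while k <= kmax and (k - 1) * self.h < best:
            for cell in ring(ci, cj, k):
                for idx in self.cells.get(cell, ()):
                    d = point_segment_dist(p, *self.segs[idx])
                    if d < best:
                        best, best_idx = d, idx
            k += 1
        return best, best_idx
```

The `kmax` limit stops the ring expansion once the whole occupied grid has been covered, which matters for query points far outside the data.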
If the distribution is not uniform, a quad-tree decomposition can be better.
More generally, a suitable strategy is to use any acceleration device that will quickly report a small number of candidate segments, with a guarantee: the nearest segment must be among the candidates.

Find nearest edge in graph

I want to find the nearest edge in a graph. Consider the following example:
Figure 1: yellow: vertices, black: edges, blue: query-point
General Information:
The graph contains about 10 million vertices and about 15 million edges. Every vertex has coordinates. Edges are defined by their two endpoint vertices.
Simplest solution:
I could simply calculate the distance from the query point to every edge in the graph, but that would be horribly slow.
Idea and difficulties:
My idea was to use a spatial index to accelerate the query. I already implemented a kd-tree to find the nearest vertex. But as Figure 1 shows, the edges incident to the nearest vertex are not necessarily the nearest to the query point: I would get edge 3-4 instead of the nearer edge 7-8.
Question:
Is there an algorithm to find the nearest edge in a graph?
A very simple solution (though maybe not the one with the lowest complexity) would be to put all your edges in a quad tree based on their bounding boxes. Then you simply extract the set of edges near your query point and iterate over them to find the closest edge.
The set of edges returned by the quad tree should be many times smaller than your original 15 million edges and therefore much cheaper to iterate through.
A quad tree is a simpler data structure than an R-tree. It is fairly common and should be readily available in many environments. For example, in Java the JTS Topology Suite has a Quadtree class that can easily be wrapped to perform this task.
There are spatial query structures that are appropriate for other types of data than points. The most general is the "R-tree" structure (and its many, many variants), which will allow you to store the bounding rectangles of your line segments. You can then search outward from your query points, examining the segments in the bounding rectangles and stopping when the nearest remaining rectangle is farther away than the closest segment found so far. This could perform poorly when many long line segments overlap, but for a PSLG (planar straight-line graph) such as you seem to have here, that shouldn't happen.
Another option is to use the segments to define a BSP tree, and scan outwards from your point to find all the "visible" lines. This in turn will be problematic if your point can see many edges.
Without proof:
Start with a constrained Delaunay triangulation, i.e. a triangulation that includes the existing edges; CGAL or Triangle, for example, can compute one. For each query point, determine which triangle it belongs to. Then you only have to check the edges touching a corner of that triangle.
I think this should work in most cases, but there are certainly corner cases where it fails, e.g. when there are many vertices without any incident edge, so at the very least you have to remove those isolated vertices.
You can compute the Voronoi diagram and run a query on each Voronoi cell. You can subdivide the Voronoi diagram to get a better result. You can also combine a metric with the Voronoi diagram: http://www.cc.gatech.edu/~phlosoft/voronoi/
You could insert extra vertices into long edges to get an approximation based on the closest vertices.
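To make the last suggestion concrete: if every edge is sampled so that consecutive samples are at most `h` apart, the edge owning the nearest sample is at most `h / 2` farther away than the true nearest edge, because the true closest point lies within `h / 2` of a sample on its own edge. A minimal sketch with hypothetical helper names; the linear scan stands in for the kd-tree over vertices that the question already has.

```python
import math


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def densify(edges, h):
    """Sample every edge so that consecutive samples are at most h apart.
    Returns a list of (sample point, index of the edge it lies on)."""
    samples = []
    for idx, ((ax, ay), (bx, by)) in enumerate(edges):
        n = max(1, math.ceil(math.hypot(bx - ax, by - ay) / h))
        for s in range(n + 1):
            t = s / n
            samples.append(((ax + t * (bx - ax), ay + t * (by - ay)), idx))
    return samples


def approx_nearest_edge(p, samples):
    """Index of the edge owning the sample nearest to p (linear scan here;
    a kd-tree over the samples would replace it in practice)."""
    return min(samples, key=lambda s: math.hypot(s[0][0] - p[0], s[0][1] - p[1]))[1]
```

The trade-off is memory: halving `h` roughly doubles the number of samples for long edges.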

Nearest neighbour search in a constantly changing set of line segments

I have a set of line segments. I want to perform the following operations on them:
Insert a new line segment.
Find all line segments within radius R of a given point.
Find all points within radius R1 of a given point.
Given a line segment, determine whether it intersects any of the existing line segments. The exact intersection point is not necessary (though that probably doesn't simplify anything).
The problem with algorithms such as kd-/bd-trees or BSP trees is that they assume a static set of points, whereas in my case new points will constantly be added, which would require rebuilding the tree.
What data structures/algorithms are best suited to this situation?
Edit: Accepted the answer that is the simplest and solves my problem. Thanks everyone!
Maintaining a dynamic tree might not be as bad as you think.
As you insert new points/lines etc into the collection it's clear that you'd need to refine the current tree, but I can't see why you'd have to re-build the whole tree from scratch every time a new entity was added, as you're suggesting.
With a dynamic tree approach you'd have a valid tree at all times, so you can then just use the fast range searches supported by your tree type to find the geometric entities you've mentioned.
For your particular problem:
You could set up a dynamic geometric tree, where the leaf elements maintain a list of geometric entities (points and lines) associated with that leaf.
When a line is inserted into the collection, it should be pushed onto the lists of all leaf elements that it intersects. You can do this efficiently by traversing, from the root, the subset of the tree that intersects the line.
To find all points/lines within a specified radial halo, you first need to find all leaves in this region. Again, this can be done by traversing, from the root, the subset of the tree that is enclosed by or intersects the halo. Since there may be some overlap, you need to check that the entities associated with this set of leaf elements actually lie within the halo.
Once you've inserted a line into a set of leaf elements, you can find whether it intersects another line by scanning all of the lines associated with the subset of leaf boxes you've just found. You can then do line-line intersection tests on this subset.
A potential dynamic tree refinement algorithm, based on an upper limit to the number of entities associated with each leaf in the tree, might work along these lines:
function insert(entity, x, y)
    leaf = the tree leaf enclosing (x, y), found with whatever fast search algorithm your tree supports
    push the new entity onto the list for leaf
    if (number of entities in leaf > max allowable) do
        refine leaf (typically a bisection or quadrisection, depending on the 2D tree type)
        push all entities from the old leaf's list onto the lists of the new children,
            each onto the child that encloses it (refining a child again if it overflows)
    endif
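The refinement scheme above can be sketched in Python as a point quadtree with a leaf capacity, plus the radius ("halo") query described earlier. This is a minimal sketch under stated assumptions: entities are bare (x, y) points inside the root box, line entities (which would go into every leaf they cross) are not handled, and `max_depth` is an added guard against endlessly splitting on duplicate points. `QuadNode` and its methods are hypothetical names.

```python
class QuadNode:
    """Point quadtree node; a leaf holds at most `capacity` points unless
    `max_depth` (a guard against duplicate points) has been reached."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.box = (x0, y0, x1, y1)
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.items = []        # points stored here while this node is a leaf
        self.children = None   # four QuadNodes once the leaf has been refined

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.box
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        # Index 0..3 = (left/right) + 2 * (bottom/top), matching the split below.
        return self.children[(x >= xm) + 2 * (y >= ym)]

    def insert(self, x, y):
        if self.children is not None:  # interior node: descend to the enclosing leaf
            self._child_for(x, y).insert(x, y)
            return
        self.items.append((x, y))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            x0, y0, x1, y1 = self.box
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            args = (self.capacity, self.depth + 1, self.max_depth)
            self.children = [QuadNode(x0, y0, xm, ym, *args), QuadNode(xm, y0, x1, ym, *args),
                             QuadNode(x0, ym, xm, y1, *args), QuadNode(xm, ym, x1, y1, *args)]
            items, self.items = self.items, []
            for px, py in items:  # push the old entities down; children may split again
                self._child_for(px, py).insert(px, py)

    def within(self, cx, cy, r, out=None):
        """Collect all stored points within distance r of (cx, cy)."""
        out = [] if out is None else out
        x0, y0, x1, y1 = self.box
        dx, dy = max(x0 - cx, 0, cx - x1), max(y0 - cy, 0, cy - y1)
        if dx * dx + dy * dy > r * r:
            return out  # the node's box lies entirely outside the halo
        if self.children is None:
            out.extend(p for p in self.items if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r)
        else:
            for child in self.children:
                child.within(cx, cy, r, out)
        return out
```

Each insertion only touches one root-to-leaf path and, at most, the subtree of the leaf being refined, which is the "local changes only" property the answer relies on.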
This type of refinement strategy only makes local changes to the tree and is thus generally pretty fast in practice. If you're also deleting entities from the collection you can also support dynamic aggregation by imposing a minimum number of entities per leaf, and collapsing leaf elements to their parents when necessary.
I've used this type of approach a number of times with quadtrees/octrees, and I can't at this stage see why a similar approach wouldn't work with kd-trees etc.
Hope this helps.
One possibility is dividing your space into a grid of boxes - perhaps 10 in the y-axis and 10 in the x-axis for a total of 100.
Store these boxes in an array, so it's very easy/fast to determine neighboring boxes. Each box will hold a list (vector) of the line segments that live in that box.
When you look for line segments within R of a given segment, you only need to check the relevant neighboring boxes.
Of course, you can create multiple maps of differing granularities, maybe 100 by 100 smaller boxes. Simply consider space vs. time and maintenance trade-offs in your design.
Updating segment positions is cheap: just integer-divide by the box sizes in the x and y directions. For example, if the box size is 20 in both directions and your new coordinate is (145, 30), then 145/20 == 7 and 30/20 == 1, so it goes into box (7, 1) in a 0-based system.
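A minimal Python sketch of this grid, assuming square boxes of side `size` stored in a dictionary rather than a fixed 10 x 10 array, so that it grows with the data. Because a segment can span several boxes, this sketch registers it in every box its bounding box overlaps; `BoxGrid`, `insert`, and `within` are hypothetical names.

```python
import math
from collections import defaultdict


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


class BoxGrid:
    def __init__(self, size):
        self.size = size
        self.boxes = defaultdict(list)  # (i, j) -> indices of segments touching the box
        self.segs = []

    def box_of(self, x, y):
        # Floor division picks the box: (145, 30) with size 20 -> (7, 1).
        return math.floor(x / self.size), math.floor(y / self.size)

    def insert(self, a, b):
        """Add segment ab and return its index."""
        idx = len(self.segs)
        self.segs.append((a, b))
        i0, j0 = self.box_of(min(a[0], b[0]), min(a[1], b[1]))
        i1, j1 = self.box_of(max(a[0], b[0]), max(a[1], b[1]))
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                self.boxes[(i, j)].append(idx)
        return idx

    def within(self, p, r):
        """Indices of all segments within distance r of point p."""
        # Any such segment has a point inside the square [p - r, p + r]^2,
        # so only the boxes overlapping that square need to be checked.
        i0, j0 = self.box_of(p[0] - r, p[1] - r)
        i1, j1 = self.box_of(p[0] + r, p[1] + r)
        found = set()
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                for idx in self.boxes.get((i, j), ()):
                    if idx not in found and point_segment_dist(p, *self.segs[idx]) <= r:
                        found.add(idx)
        return found
```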
While items 2 & 3 are relatively easy, using a simple linear search with distance checks as each line is inserted, item 4 is a bit more involved.
I'd tend to use a constrained triangulation to solve this, where all the input lines are treated as constraints, and the triangulation is balanced using a nearest neighbour criterion rather than the Delaunay criterion. This is covered pretty well in Triangulations and Applications by Øyvind Hjelle and Morten Dæhlen, and in Joseph O'Rourke's Computational Geometry in C. Both have source code available, including code for finding all intersections.
The approach I've taken to do this dynamically in the past is as follows:
Create an arbitrary triangulation (TIN) consisting of two triangles surrounding the extents (current + future) of your data.
For each new line:
Insert both end points into the TIN. This can be done very quickly by traversing triangles to find the insertion point, and replacing the triangle found with three new triangles based on the new point and the old triangle.
Cut a section through the TIN between the two end points, keeping a list of points where the section cuts any previously inserted lines.
Add the intersection point details to a list stored against both lines, and insert them into the TIN.
Force the inserted line as a constraint.
Balance all pairs of adjacent triangles modified in the above process using a nearest neighbour criterion, and repeat until all triangles have been balanced.
This works better than a grid-based method for poorly distributed data, but is more difficult to implement. Grouping end points and lines into overlapping grids would probably be a good optimization for items 2 & 3.
Note that I think using the term 'nearest neighbour' in your question is misleading, as this is not the same as 'all points within a given distance of a line', or 'all points within a given radius of another point'. Nearest neighbour typically implies a single result, and does not equate to 'within a given point to point or point to line distance'.
Instead of inserting into and deleting from a tree, you can compute a curve that completely fills the plane. Such a curve reduces the 2-d problem to a 1-d one, and you can then search for the nearest neighbor along the curve. Examples include the Z-order curve and the Hilbert curve. Here is a better description of my problem: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
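A minimal Python sketch of the Z-order idea, assuming non-negative integer coordinates below `2**bits`. Each point is mapped to its Morton key (the interleaved bits of x and y) and kept in a sorted list, so an insertion is a sorted insert rather than a tree update. Neighbours along the curve are only candidates: a true nearest neighbour can sit far away in Z-order, so `approx_nearest` is approximate, not exact. The names are hypothetical.

```python
import bisect


def morton_key(x, y, bits=16):
    """Interleave the bits of x and y, with x in the even bit positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


class ZOrderIndex:
    def __init__(self, bits=16):
        self.bits = bits
        self.entries = []  # kept sorted by Morton key: (key, x, y)

    def insert(self, x, y):
        bisect.insort(self.entries, (morton_key(x, y, self.bits), x, y))

    def approx_nearest(self, x, y, window=4):
        """Best of the `window` entries on each side of (x, y) along the curve.
        Assumes the index is non-empty."""
        pos = bisect.bisect_left(self.entries, (morton_key(x, y, self.bits),))
        candidates = self.entries[max(0, pos - window):pos + window]
        best = min(candidates, key=lambda e: (e[1] - x) ** 2 + (e[2] - y) ** 2)
        return best[1:]
```

For example, `morton_key(3, 5)` is `39` (binary `100111`: the bits of y and x alternate, starting with x in the lowest bit).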

Nearest neighbor search with periodic boundary conditions

In a cubic box I have a large collection of points in R^3. I'd like to find the k nearest neighbors for each point. Normally I'd think to use something like a k-d tree, but in this case I have periodic boundary conditions. As I understand it, a k-d tree works by partitioning the space with hyperplanes of one less dimension, i.e. in 3D we would split the space by drawing 2D planes. For any given point, it is either on the plane, above it, or below it. However, when you split the space with periodic boundary conditions, a point could be considered to be on either side!
What's the most efficient method of finding and maintaining a list of nearest neighbors with periodic boundary conditions in R^3?
Approximations are not sufficient, and the points will only be moved one at a time (think Monte Carlo not N-body simulation).
Even in the Euclidean case, a point and its nearest neighbor may be on opposite sides of a hyperplane. The core of nearest-neighbor search in a k-d tree is a primitive that determines the distance between a point and a box; the only modification necessary for your case is to take the possibility of wraparound into account.
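As an illustration of that modification, here is a minimal Python sketch of the periodic point-to-box distance (a lower bound usable in the k-d tree's pruning test) together with the minimum-image point-to-point distance, assuming a cubic box of side `L` with coordinates in [0, L). The function names are hypothetical.

```python
import math


def periodic_axis_gap(p, lo, hi, L):
    """Shortest distance along one periodic axis (period L) from coordinate p
    to the interval [lo, hi], assuming 0 <= lo <= hi <= L and 0 <= p < L."""
    if lo <= p <= hi:
        return 0.0
    # Either walk "up" from p to lo or "down" from p to hi, wrapping around.
    return min((lo - p) % L, (p - hi) % L)


def periodic_point_box_dist(p, box_lo, box_hi, L):
    """Lower bound on the periodic distance from p to any point in the box."""
    return math.sqrt(sum(periodic_axis_gap(pi, lo, hi, L) ** 2
                         for pi, lo, hi in zip(p, box_lo, box_hi)))


def periodic_point_dist(a, b, L):
    """Minimum-image distance between two points in a periodic cube of side L."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % L
        total += min(d, L - d) ** 2
    return math.sqrt(total)
```

Using these two functions in place of the ordinary Euclidean ones is the only change to the standard k-d tree search; the splitting planes themselves stay as they are.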
Alternatively, you could implement cover trees, which work on any metric.
(I'm posting this answer even though I'm not fully sure it works. Intuitively it seems right, but there might be an edge case I haven't considered)
If you're working with periodic boundary conditions, then you can think of space as being cut into a series of blocks of some fixed size that are all superimposed on top of one another. Suppose that we're in R^2. Then one option would be to replicate that block nine times and arrange the copies into a 3x3 grid. Given this, if we find the nearest neighbor of any single node in the central square, then either
The nearest neighbor is inside the central square, in which case the neighbor is a nearest neighbor, or
The nearest neighbor is in a square other than the central square. In that case, if we find the point in the central square that the neighbor corresponds to, that point should be the nearest neighbor of the original test point under the periodic boundary condition.
In other words, we just replicate the elements enough times so that the Euclidean distance between points lets us find the corresponding distance in the modulo space.
In n dimensions, you would need to make 3^n copies of all the points, which sounds like a lot, but for R^3 is only a 27x increase over the original data size. This is certainly a huge increase, but if it's within acceptable limits you should be able to use this trick to harness a standard kd-tree (or other spatial tree).
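A minimal Python sketch of the replication trick, assuming points are coordinate tuples in [0, L) per axis. A linear scan over the copies stands in for the standard kd-tree you would build over them; each copy remembers the index of its original point, so the answer maps back to the central block. The function names are hypothetical.

```python
import itertools
import math


def replicate(points, L):
    """Make 3**d shifted copies of every point (offsets -L, 0, +L per axis),
    remembering which original point each copy came from."""
    d = len(points[0])
    copies = []
    for shift in itertools.product((-L, 0.0, L), repeat=d):
        for idx, p in enumerate(points):
            copies.append((tuple(x + s for x, s in zip(p, shift)), idx))
    return copies


def periodic_nearest(i, points, copies):
    """(index, distance) of the periodic nearest neighbour of points[i],
    found by an ordinary Euclidean search over the copies."""
    p = points[i]
    best = min((c for c in copies if c[1] != i), key=lambda c: math.dist(p, c[0]))
    return best[1], math.dist(p, best[0])
```

For example, with `L = 10.0` the points `(0.5, 5.0, 5.0)` and `(9.5, 5.0, 5.0)` are nearest neighbours at distance `1.0`, found through the copy of the second point shifted by `-L` along x.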
Hope this helps! (And hope this is correct!)

Resources