Nearest point to set of line segments - data-structures

I have a point p, and n line segments in 2D space. Is there a way I can preprocess the line segments so that I can efficiently (i.e. sublinearly) find the line segment closest (i.e. with the lowest perpendicular distance) to p?
This is a real-world problem we're trying to solve. The best (approximate) answer we have is to preprocess the endpoints of the line segments into a quad tree / 2D k-d tree, and find the nearest endpoint. This should lead to a nearly optimal (or maybe even correct) answer in most cases.
Alternatively, one can use MongoDB's geoNear, which works with points as well.
Can we do better than this, particularly in terms of accuracy?

If your segments are uniformly spread and not too long, you can think of a gridding approach: choose a cell size and determine, for every cell, which segments cross it (this is done by "drawing" the segments on the grid). Then, for a query point, find the nearest non-empty cell by visiting neighborhoods of increasing size, and compute the exact nearest distance to the segment(s) so found. You need to continue the search as long as the distance between the query point and the next cells does not exceed the shortest distance found so far (a sketch follows below).
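In Python, a minimal sketch of this gridding approach might look as follows (the names and the bounding-box "drawing" shortcut are my own illustration; a real implementation would rasterize each segment exactly):

```python
import math
from collections import defaultdict

def seg_point_dist(px, py, ax, ay, bx, by):
    """Exact distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    L2 = dx * dx + dy * dy
    t = 0.0 if L2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / L2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

class SegmentGrid:
    def __init__(self, segments, cell):
        self.cell, self.segments = cell, segments
        self.cells = defaultdict(list)           # (i, j) -> segment indices
        for k, (ax, ay, bx, by) in enumerate(segments):
            # crude "drawing": register the segment in every cell its
            # bounding box overlaps (a superset of the crossed cells)
            i0, i1 = sorted((int(ax // cell), int(bx // cell)))
            j0, j1 = sorted((int(ay // cell), int(by // cell)))
            for i in range(i0, i1 + 1):
                for j in range(j0, j1 + 1):
                    self.cells[(i, j)].append(k)

    def nearest(self, px, py, max_rings=10000):
        ci, cj = int(px // self.cell), int(py // self.cell)
        best, best_d = None, math.inf
        for r in range(max_rings):
            # stop once the next ring of cells is provably farther away
            if (r - 1) * self.cell > best_d:
                break
            for i in range(ci - r, ci + r + 1):
                for j in range(cj - r, cj + r + 1):
                    if max(abs(i - ci), abs(j - cj)) != r:
                        continue                 # visit only the ring border
                    for k in self.cells.get((i, j), ()):
                        d = seg_point_dist(px, py, *self.segments[k])
                        if d < best_d:
                            best, best_d = k, d
        return best, best_d
```

Registering a segment in all cells its bounding box overlaps is conservative: it may produce extra candidates but never misses a crossed cell, which preserves the guarantee mentioned below.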
If the distribution is not uniform, a quad-tree decomposition can be better.
More generally, a suitable strategy is to use any acceleration structure that will quickly report a small number of candidate segments, with a guarantee: the nearest segment must be among the candidates.

Related

Efficient algorithm to find closest line segments for each point

Given a polygonal subdivision S and a set of points P, find the closest line segment in S for each point (in 2D space).
In my setting, I have hundreds of thousands of segments and a couple thousand points.
Checking each line for each point would take too long. Is there an efficient algorithm for this?
I was considering multiple options, but can't figure out which is best.
Build a trapezoidal map and query the face each point is in. Then go over the edges of the face (in the subdivision) to find the nearest line.
Build a range tree or segment tree. Query a box around the point and find the closest line segment in it. There has to be a segment in the box for this to find anything.
Build a line segment Voronoi diagram. Each face describes the nearest segment, but I wouldn't know how to do a point query, since the edges can be parabolic arcs.
What is a good high-level approach for this problem?
Nearest Neighbor in PostGIS
The approach of PostGIS is to use R-trees with a custom search algorithm. While descending the tree as in a regular query, they keep track of the minimum and maximum distance to objects in the bounding regions they encounter. Each encountered branch of the tree is added to an Active Branch List (ABL), which is pruned using these distance metrics.
They pick a branch's bounding region from the ABL and apply the algorithm recursively. At a leaf (an object such as a line segment), they update the variable nearest. When returning from the recursion, they use the nearest variable to prune the ABL further, repeating until the ABL is empty.
The theoretical worst case is linear per query, but it has much better results in practice.
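A minimal sketch of this branch-and-bound search on a toy, hand-built R-tree (the node layout and names are illustrative, not PostGIS internals):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    box: tuple                                    # (x0, y0, x1, y1)
    children: list = field(default_factory=list)  # internal node: child Nodes
    segments: list = field(default_factory=list)  # leaf: ((ax,ay),(bx,by))

def mindist(box, p):
    """Optimistic lower bound on the distance from p to anything in box."""
    x0, y0, x1, y1 = box
    return math.hypot(max(x0 - p[0], 0.0, p[0] - x1),
                      max(y0 - p[1], 0.0, p[1] - y1))

def seg_dist(seg, p):
    """Exact distance from point p to a line segment."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    L2 = dx * dx + dy * dy
    t = 0.0 if L2 == 0 else max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / L2))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))

def nn_search(node, p, best=(None, math.inf)):
    if node.segments:                             # leaf: update 'nearest'
        for seg in node.segments:
            d = seg_dist(seg, p)
            if d < best[1]:
                best = (seg, d)
        return best
    # Active Branch List: children ordered by optimistic distance,
    # then pruned against the best exact distance found so far
    abl = sorted(node.children, key=lambda c: mindist(c.box, p))
    for child in abl:
        if mindist(child.box, p) >= best[1]:
            break                                 # no closer object possible
        best = nn_search(child, p, best)
    return best
```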

Approximate a curve with a limited number of line segments and arcs of circles

Is there any algorithm that would allow approximating a path in the x-y plane (i.e. an ordered sequence of points defined by x and y) with a limited number of line segments and arcs of circles (constant curvature)? The resulting curve needs to be C1 (continuity of slope).
The maximum number of segments and arcs could be a parameter. An additional interesting constraint would be to prevent two consecutive circular arcs without an intermediate line segment joining them.
I do not see any way to do this, and I do not think that there exists a method for it, but any hint towards this objective is welcome.
Example:
Sample file available here
Consider this path. It looks like a line, but is actually an ordered sequence of very close points. There is no noise, and the order of the sequence of points is well known.
I would like to approximate this curve with a minimal succession of line segments and circular arcs (say, 10 line segments and 10 circular arcs) with C1 continuity. The number of segments/arcs is not an objective in itself, but I need some parameter that allows reducing/increasing this number to attain a certain simplicity of the parametrization, at the cost of accuracy loss.
Solution:
Here is my solution, based on Spektre's answer. Red curve is original data. Black lines are segments and blue curves are circle arcs. Green crosses are arc centers with radii shown and blue ones are points where segments potentially join.
Detect line segments, based on maximum slope deviation and minimal segment length as parameters. The slope of each new step is compared with the average step of the existing segment. I would prefer an optimization-based method, but I do not think one exists for disjoint segments of unknown number, position and length.
Join segments with tangent arcs. To close the system, the radius is chosen such that the segment extremities are moved as little as possible. A minimum radius constraint has been added for my purposes. I believe there will be some special cases to treat when the inflexion points are far away (e.g. the lines are nearly parallel) and interact with neighboring segments.
So you got a point cloud ... for such input, usually points close together are considered connected, so:
1. add info about which points are close to which ones
points close to only 2 neighbors signal the interior of a curve/line. Only one neighbor means an endpoint of a curve/line, and more than 2 means an intersection or too close / almost parallel lines/curves. No neighbors means either noise or just a dot.
2. group path segments together
This is called connected component analysis, so you need to form polylines from your neighbor info table (see the sketch after this list).
3. detect linear path chunks
these have the same slope among neighboring segments, so you can join them into a single line.
4. fit the rest with curves
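A minimal sketch of steps 1 and 2, assuming SciPy is at hand and that "close" means within a hand-picked radius eps:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def group_points(pts, eps):
    """Neighbor table (step 1) + connected component analysis (step 2):
    labels[i] tells which polyline/cluster point i belongs to."""
    pts = np.asarray(pts, dtype=float)
    pairs = cKDTree(pts).query_pairs(eps, output_type='ndarray')
    n = len(pts)
    adj = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                     shape=(n, n))
    ncomp, labels = connected_components(adj, directed=False)
    return ncomp, labels
```

The per-point neighbor counts (1 = endpoint, 2 = interior, more = intersection) follow directly from the same pair list, e.g. via np.bincount(pairs.ravel(), minlength=n).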
Here are related QAs:
Finding holes in 2d point sets?
Algorithms: Ellipse matching
How approximation search works (see the sublinks; there are quite a few examples of fitting there)
Trace a shape into a polygon of max n sides
[Edit1] simple line detection from #3 on your data
I used a 5.0 deg angle change as the threshold for lines, and also a minimal size of 50 samples for a detected line (too lazy to compute length, assuming constant point density). The result looks like this:
dots are detected line endpoints, green lines are the detected lines and white "lines" are the curves, so I do not see any problem with this approach for now.
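A minimal sketch of such a detection pass, assuming an ordered polyline pts with roughly constant point density (the 5 degree threshold and 50-sample minimum mirror the values above):

```python
import math

def detect_lines(pts, max_dev_deg=5.0, min_samples=50):
    """Return (start, end) index pairs of runs whose step direction stays
    within max_dev_deg of the run's running average direction."""
    lines, i = [], 0
    while i + 1 < len(pts):
        j = i + 1
        avg = math.atan2(pts[j][1] - pts[i][1], pts[j][0] - pts[i][0])
        n = 1                                    # steps in the current run
        while j + 1 < len(pts):
            a = math.atan2(pts[j + 1][1] - pts[j][1],
                           pts[j + 1][0] - pts[j][0])
            # compare the new step's slope with the run's average slope
            diff = (a - avg + math.pi) % (2 * math.pi) - math.pi
            if abs(diff) > math.radians(max_dev_deg):
                break
            n += 1
            avg += diff / n                      # incremental mean angle
            j += 1
        if j - i >= min_samples:
            lines.append((i, j))                 # detected line endpoints
        i = j
    return lines
```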
Now the problem is the points left over (the curves). I think there should also be a geometric approach for this, as it is just circular arcs, so something like this:
Formula to draw arcs ending in straight lines, Y as a function of X, starting slope, ending slope, starting point and arc radius?
And this might help too:
Circular approximation of polygon (or its part)
The C1 requirement demands that you have alternating straights and arcs. Also realize that if you permit a sufficient number of segments, you can trivially fit every pair of points with a straight and use a tiny arc to satisfy slope continuity.
I'd suggest this algorithm:
1. Best fit with a set of (a specified number N of) straight segments. (Surely there are well-developed algorithms for that.)
2. Consider the straight segments fixed, and at each joint place an arc. Treating each joint individually, I think you have a tractable problem: find the optimum arc center/radius that satisfies continuity and improves the fit (see the sketch below).
3. Now that you are pretty close, attempt to treat all arc centers and radii (the segments being defined by tangency) as a global optimization problem. This of course blows up if N is large.
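For step 2, here is a minimal sketch of the classic tangent-arc (fillet) construction at a single joint for a given radius; choosing the radius that best improves the fit is then a one-dimensional optimization per joint. The helper names are my own:

```python
import math

def fillet(corner, u_in, u_out, r):
    """Arc of radius r tangent to the two lines through 'corner' whose
    unit direction vectors u_in, u_out point away from the corner.
    Assumes a genuine corner (opening angle strictly between 0 and pi).
    Returns (center, tangency_in, tangency_out)."""
    cx, cy = corner
    dot = u_in[0] * u_out[0] + u_in[1] * u_out[1]
    theta = math.acos(max(-1.0, min(1.0, dot)))   # opening angle at corner
    t = r / math.tan(theta / 2)                   # corner -> tangency point
    d = r / math.sin(theta / 2)                   # corner -> arc center
    bx, by = u_in[0] + u_out[0], u_in[1] + u_out[1]
    nb = math.hypot(bx, by)                       # interior bisector length
    center = (cx + d * bx / nb, cy + d * by / nb)
    p_in = (cx + t * u_in[0], cy + t * u_in[1])
    p_out = (cx + t * u_out[0], cy + t * u_out[1])
    return center, p_in, p_out
```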
A typical constraint when approximating a given curve by some other curve is to bound the approximating curve to an epsilon-hose around the original curve (in terms of the Minkowski sum with a disk of fixed radius epsilon).
For G1- or C2-continuous approximation (which people from CNC/CAD like) with biarcs (where a straight-line segment can be seen as an arc of infinite radius), former colleagues of mine developed an algorithm; example solutions are shown on the project website: https://www.cosy.sbg.ac.at/~held/projects/apx/apx.html
The algorithm is fast (it runs in O(n log n) time) and is based on the generalized Voronoi diagram. However, it does not give an approximation with the exact minimum number of elements. If you look for the theoretical optimum, I would refer you to the paper by Drysdale et al., "Approximation of an Open Polygonal Curve with a Minimum Number of Circular Arcs and Biarcs", CGTA, 2008.

Remove points to maximize shortest nearest neighbor distance

I have a set of N points in 2D space, defined by vectors X and Y of their locations. What is an efficient algorithm that will:
1. Select a fixed number (M) of points to remove, so as to maximize the shortest nearest-neighbor distance among the remaining points.
2. Remove a minimum number of points, so that the shortest nearest-neighbor distance among the remaining points is greater than a fixed distance (D).
Sorting the points by their shortest nearest-neighbor distance and removing those with the smallest values does not give the correct answer: you would then remove both points of each close pair, while you may only need to remove one of them.
For my case, I am usually dealing with 1,000-10,000 points, and I may remove 50-90% of points.
You shouldn't need to store (or compute) the entire distance matrix: a Delaunay triangulation should efficiently (O(n log n) worst case) give you the closest neighbors of your point set. You should also be able to update it efficiently as you delete points.
For most close pairs, you should be able to check which member of the pair would be farthest from its neighbors if the other were removed. This is not an exact solution; especially if you remove a large proportion of points, removing a locally optimal point may exclude the globally optimal solution. You will also need to deal with clusters of 3 or more locally close points. However, if you are only removing a small proportion of points from a randomly distributed set, both of these cases may be relatively rare.
There may or may not be a better way (i.e., an exact and efficient algorithm) to solve your problem, but the above suggestions should lead to an approximate and/or combinatorial approach which works best when the points that need deleting are sparsely distributed.
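A minimal greedy sketch of these suggestions for variant 2 (the fixed threshold D), assuming SciPy; it drops the member of the current closest pair whose remaining neighbors are nearer, rebuilding the KD-tree each round (fine for a few thousand points). For variant 1, stop after M removals instead:

```python
import numpy as np
from scipy.spatial import cKDTree

def thin_points(pts, D):
    """Greedily remove points until the minimum NN distance exceeds D."""
    pts = np.asarray(pts, dtype=float)
    alive = np.ones(len(pts), dtype=bool)
    while alive.sum() > 1:
        idx = np.flatnonzero(alive)
        tree = cKDTree(pts[idx])
        dist, nn = tree.query(pts[idx], k=2)     # nearest other point
        closest = int(np.argmin(dist[:, 1]))
        if dist[closest, 1] > D:
            break                                # all pairs far enough apart
        a, b = closest, int(nn[closest, 1])
        # drop whichever member of the pair has the nearer second neighbor
        da = tree.query(pts[idx[a]], k=3)[0][2]
        db = tree.query(pts[idx[b]], k=3)[0][2]
        alive[idx[a if da < db else b]] = False
    return pts[alive]
```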
Noam
One method is to break your 2D space into N partitions. Within each partition, compute the average position of its points. Then run the nearest-neighbor algorithm on the averaged points, and finally repeat the nearest-neighbor test on the full point sets of the partitions that matched.
Here's the catch: the larger the partitions, the fewer averaged points you have to process, but the less accurate the result; the smaller the partitions, the more accurate the result, but the more points to process.
I can't think of anything other than a brute force approach. But you can probably shorten the data set you are looking at significantly before any analysis.
So here is what I would do. First work out the nearest-neighbour distance for each point; let's call that P_in. Then work out the maximum distance of each point to its M nearest neighbours; call it P_iM. If P_in is greater than P_iM for some other point, then the point can be excluded from the analysis. Basically, if you have one point that is at a distance of 10 from every other point, and another point that is at a distance of 9 from its nearest M points, then you can exclude the first point from the analysis.
Depending on the level of clustering or how big M is, this might reduce your data set quite a bit.
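A minimal sketch of one possible reading of this pruning rule, assuming SciPy (the exact comparison intended above is ambiguous, so treat this purely as illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

def exclusion_mask(pts, M):
    """True for points whose nearest-neighbor distance P_in exceeds the
    smallest M-th-neighbor distance P_iM seen anywhere, i.e. points so
    isolated they can be excluded from the removal analysis."""
    pts = np.asarray(pts, dtype=float)
    d, _ = cKDTree(pts).query(pts, k=M + 1)  # column 0 is the point itself
    p_n = d[:, 1]                            # P_in: nearest-neighbor distance
    p_M = d[:, M]                            # P_iM: distance to M-th neighbor
    return p_n > p_M.min()
```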

Nearest vertex search

I'm looking for an effective algorithm to find the vertex nearest to a point P(x, y, z). The set of vertices is fixed; each request comes with a new point P. I tried k-d trees and other known methods, and I hit the same problem everywhere: if P is close to the data, all is fine and the search visits only a few tree nodes. But if P is far enough away, more and more nodes must be scanned, and eventually the speed becomes unacceptably slow. In my task I have no way to specify a small search radius. What are the solutions for such a case?
Thanks
Igor
One possible way to speed up your search would be to discretize space into a large number of rectangular prisms spaced apart at regular intervals. For example, you could split space up into lots of 1 × 1 × 1 unit cubes. You then distribute the points in space into these volumes. This gives you a sort of "hash function" for points that distributes points into the volume that contains them.
Once you have done this, do a quick precomputation step and find, for each of these volumes, the closest nonempty volumes. You could do this by checking all volumes one step away from the volume, then two steps away, etc.
Now, to do a nearest neighbor search, you can do the following. Start off by hashing your point in space to the volume that contains it. If that volume contains any points, iterate over all of them to find which one is closest. Then, for each of the volumes that you found in the first step of this process, iterate over those points to see if any of them are closer. The resulting closest point is the nearest neighbor to your test point.
If your volumes end up containing too many points, you can refine this approach by subdividing those volumes into even smaller volumes and repeating this same process. You could alternatively create a bunch of smaller k-d trees, one for each volume, to do the nearest-neighbor search. In this way, each k-d tree holds a much smaller number of points than your original k-d tree, and the points within each volume are all reasonable candidates for a nearest neighbor. Therefore, the search should be much, much faster.
This setup is similar in spirit to an octree, except that you divide space into a bunch of smaller regions rather than just eight.
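A minimal sketch of this scheme for 3-D points (assuming a non-empty point set; instead of precomputing the closest non-empty volumes, this version searches outward in shells at query time):

```python
import math
from collections import defaultdict

class CubeHash:
    def __init__(self, points, cell=1.0):
        self.cell = cell
        self.vols = defaultdict(list)         # cube key -> points inside it
        for p in points:
            self.vols[self.key(p)].append(p)

    def key(self, p):
        return tuple(int(math.floor(c / self.cell)) for c in p)

    def nearest(self, q):
        """Assumes at least one point was inserted."""
        ci, cj, ck = self.key(q)
        best, best_d, r = None, math.inf, 0
        # expand in shells of cubes; stop once the next shell is
        # provably farther away than the best point found so far
        while best is None or (r - 1) * self.cell <= best_d:
            for i in range(ci - r, ci + r + 1):
                for j in range(cj - r, cj + r + 1):
                    for k in range(ck - r, ck + r + 1):
                        if max(abs(i - ci), abs(j - cj), abs(k - ck)) != r:
                            continue          # visit only the shell surface
                        for p in self.vols.get((i, j, k), ()):
                            d = math.dist(p, q)
                            if d < best_d:
                                best, best_d = p, d
            r += 1
        return best, best_d
```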
Hope this helps!
Well, this is not an issue of the index structures used, but of your query:
the nearest neighbor just becomes much fuzzier the further you are from your data set.
So I doubt that any other index will help you much.
However, you may be able to plug in a threshold in your search. I.e. "find nearest neighbor, but only when within a maximum distance x".
For static, in-memory, 3-D double-vector point data with Euclidean distance, the k-d tree is actually hard to beat. It just splits the data very, very fast. An octree may sometimes be faster, but mostly for window queries, I guess.
Now if you really have very few objects but millions of queries, you could try a hybrid approach. Roughly something like this: compute all points on the convex hull of your data set. Compute the center and radius. Whenever a query point is x times further away (you need to do the 3D math yourself to figure out the correct x), its nearest neighbor must be one of the convex hull points. Then again use a k-d tree, but one containing the hull points only.
Or even simpler: find the min/max point in each dimension (6 candidates in 3D). Maybe add some additional extremes (in x+y, x-y, x+z, x-z, y+z, y-z, etc.) to get a somewhat larger candidate set, but let's for now assume it is 6 points. Precompute the center and the distances of these 6 points from it, and let m be the maximum distance from the center to these 6 points. For a query, compute the distance to the center. If this is larger than m, compute the closest of the 6 candidates first. Then query the k-d tree, but bound the search to that distance. This costs you 1 (for near queries) and 7 (for far queries) distance computations, and may significantly speed up your search by giving a good candidate early. For further speedups, organize these 6-26 candidates in a k-d tree too, to find the best bound quickly.
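A minimal sketch of that simpler variant, assuming SciPy; the class layout and the fallback handling are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

class BoundedNN:
    def __init__(self, pts):
        self.pts = np.asarray(pts, dtype=float)
        self.tree = cKDTree(self.pts)
        # indices of the 6 axis-extreme points
        self.ext = [int(np.argmin(self.pts[:, d])) for d in range(3)] + \
                   [int(np.argmax(self.pts[:, d])) for d in range(3)]
        self.center = self.pts.mean(axis=0)
        self.m = np.linalg.norm(self.pts[self.ext] - self.center,
                                axis=1).max()

    def query(self, q):
        q = np.asarray(q, dtype=float)
        if np.linalg.norm(q - self.center) <= self.m:
            return self.tree.query(q)        # near query: plain search
        # far query: the closest extreme gives a good early search bound
        cd = np.linalg.norm(self.pts[self.ext] - q, axis=1)
        j = int(np.argmin(cd))
        d, i = self.tree.query(q, distance_upper_bound=cd[j])
        if np.isinf(d):                      # bound excluded everything:
            return cd[j], self.ext[j]        # the extreme itself is nearest
        return d, i
```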

Nearest neighbor search with periodic boundary conditions

In a cubic box I have a large collection of points in R^3. I'd like to find the k nearest neighbors of each point. Normally I'd think to use something like a k-d tree, but in this case I have periodic boundary conditions. As I understand it, a k-d tree works by partitioning the space with hyperplanes of one lower dimension, i.e. in 3D we split the space with 2D planes. For any given point, it is either on the plane, above it, or below it. However, when you split the space with periodic boundary conditions, a point could be considered to be on either side!
What's the most efficient method of finding and maintaining a list of nearest neighbors with periodic boundary conditions in R^3?
Approximations are not sufficient, and the points will only be moved one at a time (think Monte Carlo not N-body simulation).
Even in the Euclidean case, a point and its nearest neighbor may be on opposite sides of a hyperplane. The core of nearest-neighbor search in a k-d tree is a primitive that determines the distance between a point and a box; the only modification necessary for your case is to take the possibility of wraparound into account.
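A minimal sketch of that modified primitive, assuming a cubic domain [0, L)^3 that contains all points and boxes; plug it into the tree's pruning test in place of the usual point-to-box distance:

```python
def axis_gap(p, lo, hi, L):
    """Smallest 1-D distance from coordinate p to the interval [lo, hi],
    allowing wraparound with period L."""
    # try p as-is and shifted by one period in either direction
    return min(max(lo - q, 0.0, q - hi) for q in (p, p + L, p - L))

def mindist_periodic(p, box_lo, box_hi, L):
    """Lower bound on the distance from point p to an axis-aligned box
    under periodic boundary conditions."""
    return sum(axis_gap(c, lo, hi, L) ** 2
               for c, lo, hi in zip(p, box_lo, box_hi)) ** 0.5
```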
Alternatively, you could implement cover trees, which work on any metric.
(I'm posting this answer even though I'm not fully sure it works. Intuitively it seems right, but there might be an edge case I haven't considered)
If you're working with periodic boundary conditions, then you can think of space as being cut into a series of blocks of some fixed size that are all superimposed on top of one another. Suppose that we're in R^2. Then one option would be to replicate the block nine times and arrange the copies into a 3x3 grid of duplicates of the block. Given this, if we find the nearest neighbor of any single node in the central square, then either
The nearest neighbor is inside the central square, in which case the neighbor is a nearest neighbor, or
The nearest neighbor is in a square other than the central square. In that case, if we find the point in the central square that the neighbor corresponds to, that point should be the nearest neighbor of the original test point under the periodic boundary condition.
In other words, we just replicate the elements enough times so that the Euclidean distance between points lets us find the corresponding distance in the modulo space.
In n dimensions, you would need to make 3^n copies of all the points, which sounds like a lot, but for R^3 it is only a 27x increase over the original data size. This is certainly a big increase, but if it's within acceptable limits, you should be able to use this trick to harness a standard k-d tree (or other spatial tree).
Hope this helps! (And hope this is correct!)
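A minimal sketch of the replication trick in 2-D, assuming SciPy and a square domain of side L (the 3-D version tiles 27 copies analogously):

```python
import numpy as np
from scipy.spatial import cKDTree

def periodic_nn(points, queries, L):
    """Nearest neighbors under periodic boundaries via 3x3 tiling."""
    pts = np.asarray(points, dtype=float)
    shifts = [(dx, dy) for dx in (-L, 0.0, L) for dy in (-L, 0.0, L)]
    tiled = np.concatenate([pts + s for s in shifts])   # 9 copies
    dist, idx = cKDTree(tiled).query(np.asarray(queries, dtype=float))
    return dist, idx % len(pts)      # map each copy back to its original
```

As an aside, SciPy's cKDTree also accepts a boxsize argument that handles periodic boundaries natively, which avoids the replication entirely.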
