Given a very large set of GPS coordinates, is there a time/computationally efficient way to determine whether an input GPS coordinate is within a given radius of any point in the set? Pre-computation is acceptable. The best I could think of is an O(N) implementation, but I'm wondering if there is a better way to approach this problem.
You should look into range trees. Preprocess the points to build the range tree, then use it to search for all points in a given range. Space complexity is O(n log n); building takes O(n log n), and a query takes O(log² n + k), where k is the number of points in the search range.
The given coordinate can be tested with time complexity slightly above O(log4 N), assuming a radius that is small compared to the space the points spread over (log4 is the logarithm with base 4).
A quadtree works extremely well for this task.
An alternative is the R-Tree.
If you are looking for a Java implementation, you may try one of the KD-Trees I have posted in full here. With that you can find the nearest point to your new GPS input; then you just need to check the actual distance to decide whether it is within the radius you are interested in.
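For illustration, here is a minimal Python sketch of the same KD-tree idea using SciPy's cKDTree (an assumption; any KD-tree library works the same way). The file name is a placeholder, and distances are treated as planar, which is only an approximation for GPS; for accurate geodesic distances you would project the coordinates first or re-check the candidate with a haversine formula.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.loadtxt("coords.txt")   # hypothetical file of (lat, lon) rows
tree = cKDTree(points)              # pre-computation: build once, query many times

def within_radius(tree, query, radius):
    """True if any stored point lies within `radius` of `query` (planar distance)."""
    dist, _ = tree.query(query)     # nearest neighbour only; O(log n) on average
    return dist <= radius

print(within_radius(tree, np.array([48.137, 11.575]), 0.01))
```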
In hill climbing for 1 dimension, I try two neighbors - a small delta to the left and one to the right of my current point - and then keep the one that gives a higher value of the objective function. How do I extend it to an n-dimensional space? How does one define a neighbor in an n-dimensional space? Do I have to try 2^n neighbors (a delta applied to each of the dimensions)?
You don't need to compare each pair of neighbors; you need to compute a set of neighbors, e.g. on a circle (a sphere/hypersphere in higher dimensions) of radius delta, and then take the one with the highest value to "climb up". In any case you will discretize the neighborhood of your current solution and compute the score function for each neighbor. If you can differentiate your function, then gradient ascent/descent based algorithms may solve your problem:
1) Compute the gradient (direction of steepest ascent)
2) Go a small step into the direction of the gradient
3) Stop if solution does not change
A common problem with those algorithms is that you often only find local maxima/minima. You can find a great overview of gradient descent/ascent algorithms here: http://sebastianruder.com/optimizing-gradient-descent/
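As a rough illustration of steps 1-3, here is a minimal numerical gradient-ascent sketch in Python; the objective, step size, and tolerances are placeholders, and a central-difference estimate stands in for an analytic gradient.

```python
import numpy as np

def gradient_ascent(f, x0, step=0.01, h=1e-6, tol=1e-9, max_iter=10000):
    """Repeat: estimate the gradient, step along it, stop when the solution stalls."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # 1) central-difference estimate of the gradient (direction of steepest ascent)
        grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])
        x_new = x + step * grad                  # 2) small step in the gradient direction
        if np.linalg.norm(x_new - x) < tol:      # 3) stop if the solution no longer changes
            return x_new
        x = x_new
    return x

# example: maximize -(x^2 + y^2); iterates converge toward the origin
print(gradient_ascent(lambda p: -np.dot(p, p), [3.0, -2.0]))
```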
If you are using IEEE-754 floating point numbers, then the obvious answer is something like (2^52*(log_2(delta)+1023))^(n-1)+1 if delta >= 2^(-1022) (more or less, depending on your search space...), as that is the only way you can be certain there are no more neighboring solutions within a distance of delta.
Even assuming you instead take a random fixed-size sample of all points within a given distance delta, let's say delta = 0.1, you would still have the problem that if the distance from the local optimum were 0.0001, the probability of finding an improvement in just one dimension would be less than 0.0001/0.1/2 = 0.05%, so you would need to take more and more random samples as you get closer to the local optimum (whose value you don't know...).
Obviously hill climbing is not intended for the real number space or theoretical graph spaces with infinite degree. You should instead be using a global search algorithm.
One example of a multidimensional search algorithm which needs only O(n) neighbours instead of O(2^n) neighbours is the Torczon simplex method described in Multidirectional search: A direct search algorithm for parallel machines (1989). I chose this over the more widely known Nelder-Mead method because the Torczon simplex method has a convergence proof (convergence to a local optimum given some reasonable conditions).
Is there an algorithm that, for a given 2D position, finds the closest point on a 2D polyline consisting of n - 1 line segments (n vertices) in constant time? The naive solution is to traverse all segments, test each segment's minimum distance to the given position, and then, for the closest segment, calculate the exact closest point to the given position, which has a complexity of O(n). Unfortunately, hardware constraints prevent me from using any type of loop or pointers, which also rules out optimizations like quadtrees for a hierarchical O(log n) lookup of the closest segment.
I have theoretically unlimited time to pre-calculate any data structure that can be used for a lookup, and this pre-calculation can be arbitrarily complex; only the lookup at runtime itself needs to be O(1). However, the second constraint of the hardware is that I only have very limited memory, so it is not feasible to find the closest point on the line for every numerically possible position of the domain and store the results in a huge array. In other words, the memory consumption should be polynomial in n, i.e. O(n^x).
So it comes down to the question of how to find the closest segment of a polyline (or its index), given a 2D position, without any loops. Is this possible?
Edit: About the given position … it can be quite arbitrary, but it is reasonable to consider only positions in the close neighborhood of the line, bounded by a constant maximum distance.
Create a single axis-aligned box that contains all of your line segments with some padding. Discretize it into a WxH grid of integer indexes. For each grid cell, compute the nearest line segment, and store its index in that grid cell.
To query a point, compute in O(1) time which grid cell it falls in. Look up the index of the nearest line segment. Then run the standard O(1) computation of the exact nearest point on that segment.
This is an O(1) almost-exact algorithm that will take O(WH) space, where WH is the number of cells in the grid.
For example, here is the subdivision of the space imposed by some line segments:
Here is a 9x7 tiling of the space, where each color corresponds to an edge index: red (0), green (1), blue (2), purple (3). Notice how discretizing the space introduces some error. You would of course use a much finer subdivision of the space to reduce that error to as little as you want, at the cost of having to store a larger grid. This coarse tiling is meant for illustration only.
You can keep your algorithm O(1) and make it even more almost-exact by taking your query point, identifying what cell it lies in, and then looking at the 8 neighboring cells in addition to that cell. Determine the set of edges that those 9 cells identify. (The set contains at most 9 edges.) Then for each edge find the closest point. Then keep the closest among those (at most 9) closest points.
In any case, this approach will always fail for some pathological case, so you'll have to factor that into deciding whether you want to use this.
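Here is a minimal Python sketch of the single-cell variant of this approach; the names, grid resolution, and padding are arbitrary, and loops appear only in the offline pre-computation, never in the O(1) runtime lookup.

```python
import numpy as np

def seg_point_dist(p, a, b):
    """Distance from point p to segment ab, plus the closest point on the segment."""
    ab, ap = b - a, p - a
    denom = np.dot(ab, ab)
    t = 0.0 if denom == 0 else np.clip(np.dot(ap, ab) / denom, 0.0, 1.0)
    q = a + t * ab
    return np.linalg.norm(p - q), q

def build_grid(vertices, W, H, pad=1.0):
    """Offline: for every cell centre, store the index of its nearest polyline segment."""
    v = np.asarray(vertices, dtype=float)
    lo, hi = v.min(axis=0) - pad, v.max(axis=0) + pad
    grid = np.zeros((H, W), dtype=int)
    for j in range(H):
        for i in range(W):
            c = lo + (np.array([i + 0.5, j + 0.5]) / np.array([W, H])) * (hi - lo)
            dists = [seg_point_dist(c, v[k], v[k + 1])[0] for k in range(len(v) - 1)]
            grid[j, i] = int(np.argmin(dists))
    return lo, hi, grid

def nearest_point(p, vertices, lo, hi, grid):
    """Runtime, O(1): map p to its cell, take the precomputed segment, project onto it."""
    H, W = grid.shape
    p = np.asarray(p, dtype=float)
    cell = np.clip(((p - lo) / (hi - lo) * np.array([W, H])).astype(int), 0, [W - 1, H - 1])
    k = grid[cell[1], cell[0]]
    v = np.asarray(vertices, dtype=float)
    return seg_point_dist(p, v[k], v[k + 1])[1]

# example usage
verts = [(0, 0), (4, 1), (5, 4)]
lo, hi, grid = build_grid(verts, W=64, H=64)
print(nearest_point((3.0, 2.0), verts, lo, hi, grid))
```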
You can find the closest geometric point on a line in O(1) time, but that won't tell you which of the given vertices is closest to it. The best you can do for that is a binary search, which is O(log n), but of course requires a loop or recursion.
If you're designing VLSI or FPGA, you can evaluate all the vertices in parallel. Then, you can compare neighbors, and do a big wired-or to encode the index of the segment that straddles the closest geometric point. You'll technically get some sort of O(log n) delay based on the number of elements in the wired-or, but that kind of thing is usually treated as near-constant.
You can optimize this type of search using an R-Tree, a general-purpose spatial data structure that supports fast searches. It's not a constant-time algorithm; its average case is O(log n).
You said that you can pre-calculate the data structure but cannot use any loops. Is there some limitation that rules out loops entirely? Arbitrary searches are not likely to hit an existing data point, so a lookup must at least look left and right in a tree.
This SO answer contains some links to libraries:
Java commercial-friendly R-tree implementation?
Suppose there is a point cloud of 50,000 points in x-y-z 3D space. For every point in this cloud, what algorithms or data structures should be implemented to find the k neighbours of a given point that lie within a distance range of [R,r]? The naive way is to go through each of the 49,999 other points for each of the 50,000 points and do a metric test, but this approach takes a long time. Just as there is the k-d tree for finding a nearest neighbour quickly, is there some real-time data structure/algorithm implementation out there to pre-process the point cloud and achieve this goal in the shortest time?
Your problem is part of the topic of Nearest Neighbor Search, or more precisely, k-Nearest Neighbor Search. The answer to your question depends on the data structure you are using to store the points. If you use R-trees or variants like R*-trees, and you are doing multiple searches on your database, you will likely find a substantial performance improvement in two- or three-dimensional space compared with a naive linear search. In higher dimensions, space-partitioning schemes tend to underperform linear search.
As some answers already suggest, for NN search you could use a tree structure like a k-d tree. Implementations are available for most programming languages.
If your description [R,r] means a hollow sphere, you should compare one-pass testing (distance within the interval) against two stages (test against the outer radius, then remove the samples that also pass the test against the inner radius).
You also did not mention your performance requirements (timing or frame rate?) or your intended application (which determines what approach is feasible).
If you are using an ordinary Euclidean metric, you could go through the list three times and extract those points that are within R in each dimension, essentially extracting the enclosing cube. Searching the resulting list would still be O(n^2), but on a much smaller n.
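A minimal NumPy sketch of that axis-wise pre-filter; the data, query point, and radius are placeholders.

```python
import numpy as np

points = np.random.rand(50000, 3)    # placeholder point cloud
q, R = points[0], 0.1                # query point and outer radius

# keep only points inside the axis-aligned cube of half-width R around q
cube = points[np.all(np.abs(points - q) <= R, axis=1)]
# exact metric test only on the (much smaller) candidate set
within_R = cube[np.linalg.norm(cube - q, axis=1) <= R]
```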
There are efficient algorithms (on average, for random data); see Nearest neighbor search.
Your approach is simple, but not efficient.
Please read through, check your requirements, and get back so we can help.
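Building on the k-d-tree and two-stage suggestions above, here is a minimal sketch using SciPy's cKDTree (an assumption; the point cloud and radii are placeholders). It queries the outer ball and subtracts the inner ball, leaving the hollow shell.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(50000, 3)        # placeholder point cloud
tree = cKDTree(points)                   # one-time pre-processing

def shell_neighbours(tree, q, r_inner, r_outer):
    """Indices of points whose distance to q lies in (r_inner, r_outer]."""
    outer = set(tree.query_ball_point(q, r_outer))   # everything within the outer radius
    inner = set(tree.query_ball_point(q, r_inner))   # everything within the inner radius
    return sorted(outer - inner)                     # the hollow shell that remains

idx = shell_neighbours(tree, points[0], 0.05, 0.10)
```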
I'm asking this question out of curiosity, since my quick-and-dirty implementation seems to be good enough. However, I'm curious what a better implementation would be.
I have a graph of real-world data. There are no duplicate X values and the X value increments at a consistent rate across the graph, but the Y data is based on real-world output. I want to find the point on the graph nearest to an arbitrary given point P programmatically. I'm trying to find an efficient (i.e. fast) algorithm for doing this. I don't need the exact closest point; I can settle for a point that is 'nearly' the closest.
The obvious lazy solution is to iterate through every single point in the graph, calculate the distance, and then take the minimum. This, however, could theoretically be slow for large graphs; too slow for what I want.
Since I only need an approximate closest point I imagine the ideal fastest equation would involve generating a best fit line and using that line to calculate where the point should be in real time; but that sounds like a potential mathematical headache I'm not about to take on.
My solution is a hack which works only because I assume my point P isn't arbitrary; namely, I assume that P will usually be close to my graph line, and when that happens I can rule out the distant X values from consideration. I calculate how close P is to the point on the line that shares P's X coordinate, and use that distance to bound the largest/smallest X values that could possibly contain a closer point.
I can't help but feel there should be a faster algorithm than my solution (which is only useful because I assume that 99% of the time my point P will already be close to the line). I tried googling for better algorithms but found so many that didn't quite fit that it was hard to find what I was looking for amongst all the clutter of inappropriate algorithms. So, does anyone here have a suggested algorithm that would be more efficient? Keep in mind I don't need a full algorithm, since what I have works for my needs; I'm just curious what the proper solution would have been.
If you store the [x,y] points in a quadtree you'll be able to find the closest one quickly (something like O(log n)). I think that's the best you can do without making assumptions about where the point is going to be. Rather than repeat the algorithm here, have a look at this link.
Your solution is pretty good. By examining how the points vary in y, couldn't you calculate a bound on the number of points along the x axis you need to examine, instead of using an arbitrary one?
Let's say your point is P = (x, y) and your real-world data is a function y = f(x).
Step 1: Calculate r = |f(x) - y|.
Step 2: Find the points in the interval I = (x - r, x + r).
Step 3: Find the closest point in I to P.
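A minimal Python sketch of those three steps, assuming the x samples are sorted and uniformly spaced; the names are placeholders. Any point with |x' - x| > r must be farther from P than the vertical distance r, so only the window needs to be scanned.

```python
import bisect

def nearly_closest(xs, ys, px, py):
    """xs: sorted, uniformly spaced samples; ys: measured f(xs). Returns a near-closest point to P."""
    dx = xs[1] - xs[0]
    i = max(0, min(len(xs) - 1, int(round((px - xs[0]) / dx))))  # sample (roughly) sharing P's x
    r = abs(ys[i] - py)                                          # step 1: r = |f(x) - y|
    lo = bisect.bisect_left(xs, px - r)                          # step 2: window (x - r, x + r)
    hi = bisect.bisect_right(xs, px + r)
    window = range(lo, hi) if lo < hi else [i]
    j = min(window, key=lambda k: (xs[k] - px) ** 2 + (ys[k] - py) ** 2)  # step 3: scan the window
    return xs[j], ys[j]
```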
If you can use a data structure, some common data structures for spatial searching (including nearest neighbour) are...
quad-tree (and octree etc).
kd-tree
bsp tree (only practical for a static set of points).
r-tree
The r-tree comes in a number of variants. It's very closely related to the B+ tree, but with (depending on the variant) different orderings on the items (points) in the leaf nodes.
The Hilbert R tree uses a strict ordering of points based on the Hilbert curve. The Hilbert curve (or rather a generalization of it) is very good at ordering multi-dimensional data so that nearby points in space are usually nearby in the linear ordering.
In principle, the Hilbert ordering could be applied by sorting a simple array of points. The natural clustering in this would mean that a search would usually only need to search a few fairly-short spans in the array - with the complication being that you need to work out which spans they are.
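For illustration, here is a minimal Python sketch of the classic "xy2d" Hilbert-index calculation for integer coordinates on a 2^order by 2^order grid, which could serve as the sort key for such an array. This is a simplified sketch of the general idea, not the method from the paper mentioned below.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y), with 0 <= x, y < 2**order, along a Hilbert curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/reflect the quadrant so each sub-curve keeps a consistent orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# sorting quantized points by this key keeps spatially close points close in the array
pts = [(3, 5), (12, 1), (4, 5), (3, 6)]
pts.sort(key=lambda p: hilbert_index(4, p[0], p[1]))
```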
I used to have a link for a good paper on doing the Hilbert curve ordering calculations, but I've lost it. An ordering based on Gray codes would be simpler, but not quite as efficient at clustering. In fact, there's a deep connection between Gray codes and Hilbert curves - that paper I've lost uses Gray code related functions quite a bit.
EDIT - I found that link - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.7490
I have a set of N objects, and I'd like to compute a NxN distance matrix. Sometimes my set of N objects is very large, and I'd like to compute an approximation to the NxN distance matrix by only computing a subset of the distance comparisons.
Can anyone point me in the direction of something that calculates approximations to a full distance matrix? I have some ideas in mind, but I'd like to avoid re-inventing the wheel.
Edit: An example of the type of algorithm would take advantage of the fact that if there is a very small distance between object A and object B, and there is a very small distance between object B and object C, there has to be a somewhat short distance between objects A and C.
I had this same question and ended up writing Python code for it:
https://github.com/jpeterbaker/lazyDistance
README.md explains how the triangle inequality can be used to update upper and lower bounds for each distance.
Just run the Python file as a script for an example in 2-dimensional space. The plotted lines are the only distances that were actually calculated.
In my version, the time savings aren't about having a large number of objects. As I've written it, it's an O(n^4) algorithm, so it's actually worse than just calculating all distances if the number of objects is large. But my method will save time when you have a modest number of objects and the distance function is very expensive to calculate. It assumes that it is faster to do several O(n^2) operations than to take a single distance measurement.
If n is large, you could look for cheaper methods of deciding which distance to calculate next (methods that don't involve arithmetic over the n^2 entries of the distance-bound matrices). You also may not need to update all 2*n^2 bounds every time this code does.
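Not the repository's actual code, just a sketch of the bound-update idea the README describes: after each measured distance, the triangle inequality tightens the upper and lower bounds of every pair that shares an endpoint with it.

```python
import numpy as np

def init_bounds(n):
    lower = np.zeros((n, n))                 # every distance is at least 0
    upper = np.full((n, n), np.inf)          # no upper bound known yet
    np.fill_diagonal(upper, 0.0)
    return lower, upper

def record_distance(lower, upper, a, b, dist):
    """Store the measured d(a, b) and tighten bounds on pairs sharing an endpoint."""
    lower[a, b] = lower[b, a] = upper[a, b] = upper[b, a] = dist
    n = lower.shape[0]
    for u, v in ((a, b), (b, a)):
        for c in range(n):
            # d(u, c) <= d(u, v) + d(v, c);  d(u, c) >= d(u, v) - d(v, c) and d(v, c) - d(u, v)
            upper[u, c] = upper[c, u] = min(upper[u, c], dist + upper[v, c])
            lower[u, c] = lower[c, u] = max(lower[u, c], dist - upper[v, c], lower[v, c] - dist)

lower, upper = init_bounds(4)
record_distance(lower, upper, 0, 1, 2.5)
record_distance(lower, upper, 1, 2, 1.0)
print(lower[0, 2], upper[0, 2])   # bounds on d(0, 2) without measuring it: 1.5 and 3.5
```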
Honestly, I think it depends on how close you want your approximation to be and how big your subset is. If you just want some overall feel of what the matrix will look like, you can do simple linear interpolation on a random subset (including the maximal and minimal nodes) and get pretty accurate (tm) results.
I think the real trick here is figuring out the heuristic (linear, quadratic, etc. interpolation) and the subset size. You could also compute the distance matrices of various subsets and then interpolate those matrices with some method (linear, spherical linear, cubic).
Depending on your initial sample, it's pretty much heuristic trial and error until you go "oh, that's good enough for what I need".
Are your "objects" on a network? If the objects are in a network, you can use this or this that yields the all-pairs shortest paths. If not, you're pretty much stuck with calculated all the n x n distances, I think.
The solution you require is similar to what we commonly see in a graph: you can use all-pairs shortest paths to find the distances; you could also look at Johnson's algorithm.