I have a set of objects that have latitude and longitude coordinates and I need to be able to match another set to the closest item. Simple nearest-neighbor stuff. My best thought is to convert the lat/lng coordinates to 3D coordinates and then store in either a K-d tree or an octree for later lookup. It should work and be good enough for all practical purposes (see: kdtree for geospatial point search), but something about this feels off. I think it's the part where it's really just a 2D manifold in 3D space.
Is there a more appropriate structure to use or am I simply overthinking this?
Even Wikipedia says:
a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
so go ahead and use your 2D points (without projecting them to 3D, or something similar).
Insert them into the tree, and then query the tree in logarithmic time - and there, all happy!
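A minimal pure-Python sketch of this approach, treating (lat, lng) pairs as plain 2D Euclidean points exactly as suggested above (the coordinate values are made up for illustration):

```python
def build_kdtree(points, depth=0):
    """Recursively build a 2D k-d tree, alternating the split axis by depth."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point = node["point"]
    if best is None or dist2(point, target) < dist2(best, target):
        best = point
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the best match so far.
    if diff * diff < dist2(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

With `tree = build_kdtree([(40.7, -74.0), (25.8, -80.2)])`, a call like `nearest(tree, (40.0, -75.0))` returns the closest stored pair. Note this ignores longitude wrap-around at ±180°, which is fine as long as your data stays clear of the antimeridian.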
Related
I'm reading about image search and I've gotten to the point where I have a basic understanding of feature vectors, and a very basic (definitely incomplete) understanding of rotation-invariant and scale-invariant features: how you can look at multi-sampled images for scale invariance and at corners for rotational invariance.
To search a billion images, though, there is no way you could do a linear search. Most of my reading seems to imply that a K-d tree is used as a partitioning data structure to improve the lookup times.
What metric is the K-d tree split on? If you use descriptors like SIFT, SURF, or ORB, there is no guarantee your similar keypoints line up in the feature vectors, so I'm confused how you determine 'left' or 'right', since with features like these you need the split to be based on similarity. My guess is that it's the Euclidean distance from some 'standard', and then you do a robust nearest-neighbor search, but I'd like some input on how the initial query into the K-d tree is handled before the nearest-neighbor search. I would think a K-d tree needs to compare similar features in each dimension, but I don't see how that happens with many keypoints.
I can find a lot of papers on the nearest-neighbor search itself, but most seem to assume you know how this part is handled, so I'm missing something here.
It's quite simple. All these feature descriptors represent an image as a point in a multidimensional space. Just for the sake of simplicity, let's assume that your descriptor dimension is 2. Then all your images would be mapped onto a two-dimensional plane, and the kd-tree would split this plane into rectangular areas. Any images that fall within the same area would be considered similar.
That means, by the way, that two images that lie really close to each other but in different areas (leaves of the kd-tree) will not be considered similar.
To overcome this issue, cosine similarity can be used instead of Euclidean distance. You can read more about the subject on Wikipedia.
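A toy sketch of the difference (descriptor values invented): two descriptors that differ only in scale are far apart in Euclidean distance yet identical under cosine similarity.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude (e.g., the same scene at two exposures):
a = [1.0, 2.0]
b = [2.0, 4.0]
# euclidean(a, b) is sqrt(5) ~ 2.24, yet cosine_similarity(a, b) is exactly 1.0
```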
I'm solving this problem, and I don't know which data structure to use.
I have multiple objects (convex polygons and circles) on a 2D plane, and for a given point, I have to calculate the objects the point lies within (they can overlap).
I've been reading about K-D trees, but I don't know how to "bend" it for this kind of objects. I've been also reading about bounding volume hierarchy, but I don't know if it would be optimal.
So, what do you think would be the best data structure for this problem? (Time performance is more important than memory usage.)
Thanks!
For the most part, the "efficiency" of space-partitioning schemes like BVH, kd-tree, R-tree, etc. comes from smart tree construction. As long as you can build your tree well, you will have fast performance. For your case, I would say a kd-tree is fine: it's very common, with lots of source code available. So are R-trees. I don't understand what you mean by "bend" it for your objects. For a kd-tree, all you have to decide, given an axis-aligned line (in the 2D case either x = c or y = c), is whether the circle (or polygon) lies to one side of it or straddles it. A rather trivial problem.
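The side-or-straddles decision mentioned above can be sketched like this (hypothetical helper names; `axis` 0 means the line x = c, `axis` 1 means y = c):

```python
def classify_circle(cx, cy, r, axis, c):
    """Classify a circle against the splitting line: 'left', 'right', or 'straddles'."""
    center = cx if axis == 0 else cy
    if center + r < c:
        return "left"
    if center - r > c:
        return "right"
    return "straddles"

def classify_polygon(vertices, axis, c):
    """Classify a convex polygon by checking which side its vertices fall on."""
    coords = [v[axis] for v in vertices]
    if max(coords) < c:
        return "left"
    if min(coords) > c:
        return "right"
    return "straddles"
```

For example, `classify_circle(1.5, 0, 1, 0, 2)` reports "straddles" because the circle reaches from x = 0.5 to x = 2.5 across the line x = 2.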
I have some objects that are geo-localized (I have for each object the latitude + longitude).
My App needs to display the objects that are 3 kilometers around the GPS position of the mobile device.
I have several thousand objects spread over a large area (for example, several US states or several small countries), meaning my list can contain one object located in NYC and another in Miami, but also objects that are very close together (a few meters apart).
Currently, my app performs an iterative search: for each object I compute the distance to the GPS position, and if the distance is <= 3 km I keep the object, otherwise I ignore it. This algorithm is not very efficient, and I'm looking for one that will give better performance.
I suppose there's a way to sort my objects using their geo coordinates so as to find the objects located around the GPS position more quickly.
My current idea is simply to compute a rectangle from the "extreme points" (North/South/East/West, 3 km from the GPS position) to limit the search zone, and then to compute the distance only for the objects inside this box.
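That bounding-box idea is sound: a cheap rectangle test eliminates most objects, and the exact great-circle distance is computed only for the survivors. A sketch (pure Python; `within_radius` is a made-up helper name, and the widening of the box by 1/cos(lat) assumes you're not near the poles):

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(objects, lat, lon, radius_km=3.0):
    """Cheap bounding-box test first, exact haversine only for the survivors."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with latitude, so the box widens away from the equator.
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-9)
    return [
        (olat, olon) for olat, olon in objects
        if abs(olat - lat) <= dlat and abs(olon - lon) <= dlon
        and haversine_km(lat, lon, olat, olon) <= radius_km
    ]
```

For a query at (48.8566, 2.3522), an object in New York fails the box test immediately and never reaches the trigonometric distance computation.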
I think something better could be done but I don't have the idea...
Any proposal will be appreciated ;-)
Thanks,
Séb.
Sounds like a nearest neighbor search, but not with a maximum number of neighbors (as in kNN), but with a maximum distance threshold.
A common approach is to put the objects into a special data structure to allow ruling out large parts of the search space fast.
However, these are usually designed with Euclidean spaces in mind, not the spherical lat/lon surface (which has wrap-around issues).
Therefore, you'd probably need to convert your coordinates to 3D coordinates in a Cartesian system relative to the center of the sphere before you can apply one of the following data structures to search for your objects efficiently:
Octree
kd-tree
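The conversion itself is one line per axis; a sketch under a spherical-Earth assumption, with the radius in kilometers:

```python
import math

def latlon_to_xyz(lat_deg, lon_deg, radius=6371.0):
    """Project a lat/lon pair onto a sphere of the given radius, centered at the origin."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )
```

Straight-line (chord) distance between the resulting 3D points is monotonic in great-circle distance, so nearest-neighbor ordering is preserved even though the chord itself cuts through the Earth.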
The other answers mentioning spatial indexes are correct, but not necessarily the easiest solution for you.
I would consider something simpler:
Group the items by country, then by state, region, city, and finally by a few landmarks in dense cities (where you have a lot of objects).
Then you would only need to perform a few queries (check which country am I in, which state, region, etc.) to limit yourself to a very small set of objects, without implementing advanced data structures in your mobile app.
One way of doing this without a specialized data structure would be to sort two copies of your data: one by longitude, one by latitude. Anything that a binary search narrows down to as close on both latitude and longitude is close.
Similarly, you could use your usual treap (fast) or red-black tree (low variability).
But there are probably advantages to using an r-tree or kd-tree. What I've described is probably only for avoiding taking new dependencies or avoiding coding a new datastructure from scratch.
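The two-sorted-copies idea above can be sketched with the stdlib `bisect` module (`near_by_box` and the band half-widths `dlat`/`dlon` are illustrative names; in a real app you would sort the two copies once up front rather than on every query):

```python
import bisect

def near_by_box(points, lat, lon, dlat, dlon):
    """Candidates close in BOTH latitude and longitude, via two sorted copies."""
    by_lat = sorted(points, key=lambda p: p[0])
    by_lon = sorted(points, key=lambda p: p[1])
    lats = [p[0] for p in by_lat]
    lons = [p[1] for p in by_lon]
    # Binary-search each sorted copy for the band of candidates, then intersect.
    lat_band = set(by_lat[bisect.bisect_left(lats, lat - dlat):bisect.bisect_right(lats, lat + dlat)])
    lon_band = set(by_lon[bisect.bisect_left(lons, lon - dlon):bisect.bisect_right(lons, lon + dlon)])
    return lat_band & lon_band
```

The survivors still need an exact distance check, since a point can sit in the corner of the box yet outside the circle.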
I have a collection of objects. Each object represents a coordinate range (i.e., a block). What I want is to find the object nearest to another coordinate in a given direction.
Is there a way to do this without traversing the whole collection all the time?
You may want to look into Binary Space Partitioning and similar algorithms (a quadtree comes to mind, along with variations on plane sweep algorithms).
While inserting the objects, sort them by their coordinates, then use a divide-and-conquer algorithm to search for your nearest candidate.
I just finished implementing a kd-tree for doing fast nearest neighbor searches. I'm interested in playing around with different distance metrics other than the Euclidean distance. My understanding of the kd-tree is that the speedy kd-tree search is not guaranteed to give exact searches if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics for my search.
I have two questions:
Does using a kd-tree permanently tie me to the Euclidean distance?
If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm thinking about include cover trees and vp-trees.
The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometrical object for the given metric, and test each hyperplane for crossings with this object.
Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of all differences in vector components), your hypersphere would become a (multidimensional) diamond. (This is easiest to visualise in 2D -- if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must intersect a diamond shape that has width and height 2x and is centred on p). This might make the hyperplane-crossing test more difficult to code or slower to run, however the general principle still applies.
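For axis-aligned splitting hyperplanes the crossing test actually ends up looking the same as in the Euclidean case: the diamond of L1-radius `best_dist` around the query crosses the plane exactly when the gap along the split axis is smaller than `best_dist`, because the closest point of the plane under either metric lies straight along that axis. A sketch (hypothetical function names):

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def far_side_can_contain_closer(query, split_axis, split_value, best_dist):
    """True if the L1 ball (a diamond) of radius best_dist around the query
    crosses the splitting hyperplane, i.e. the far subtree must be searched."""
    return abs(query[split_axis] - split_value) < best_dist
```

So the pruning rule of the standard k-d tree search carries over unchanged; only the distance function fed into it differs.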
I don't think you're tied to Euclidean distance (as j_random_hacker says, you can probably use Manhattan distance), but I'm pretty sure you're tied to geometries that can be represented in Cartesian coordinates. So you couldn't use a kd-tree to index an arbitrary metric space, for example.