I am looking for an efficient way to represent a connected graph, where the nodes are spatially located in a 3D Euclidian space and each node may have 6 edges (4 directions on its respective 2D plance and up and down), but have not yet found any examples, perhaps because I am not using the correct keywords.
Any guidance would be much appreciated.
Is there any library for such a structure?
Maybe you are looking for a 'spatial' data structure.
A simple example is an oct-tree (three dimensions), which is easy to implement, there are also plenty of implementation on the web.
Is the grid extended one node at a time? Or planes at a time? Or are you adding cubes (for example 10x10x10) of nodes?
I wrote my own multi-dimensional structure a while ago, called PH-Tree. If you add individual nodes, you could add them one by one. If you add cubes of nodes, maybe it's best to store these cubes in 3D arrays, then you add these arrays to the ph-tree, with their position in space as key.
The PH-Tree is somewhat complex to implement, but it's faster and more space efficient than octtrees, at least for large datasets.
The PH-Tree sources are in Java.
Other key-words to look up: R-Trees (R*-Tree, R+Tree, X-Tree) and kd-Trees.
Related
I have many points in space moving over time. They move in space full of AABB bounding boxes (including nested ones, there are less BBs than points) I wonder if there is a data structure that would help with organization of points getting into bounding boxes detection.
Currently I thought of a kd-tree based on boxes centers to perform ANN on points movement, with boxes intersection /nesting hierarchy (who is inside /beside whom) for box detection.
Yet this is slow for so many points so I wonder if there is some specialized algorithm/data structure for such case? A way to make such query for many points at the same time?
I would suggest using some kind if quadtree.
Basic quadtrees are already quite good with fast deletion/insertion, but there are special variants, that are better:
The Quadtree proposed by Samet et al uses overlapping regions, which allow moving objects to stay in the same node for longer, before requiring reinsertion in a different node.
The PH-Tree as technically a 'bit-level trie'. It basically looks like a quadtree, but has limited depth (64 for 64bit values), no reinsertion/rebalancing ever and is guaranteed to modify at most two nodes for every insertion or removal. At the same time, node size is limited by dimensionality, so maximum 8 points per node for 3D points or 64 entries when storing points and rectangles (boxes). Important for your case: Very good update performance (better than R-trees or KD-trees) and very good window-query performance. Window-queries would represent your AABBs when you look for points that overlap with a AABB. Moreover, you could also store th AABBs in the tree, then any window-query would return all points and other AABBs that it overlaps with (if that is useful for you).
Unfortunately, I'm not aware of any free implementation of Samet's quadtree. For the PH-Tree, you can find a Java version on the link given above and a C++ version here. You can also look at my index collection for Java implementations of various multidimensional indexes.
I'm reading about image search and I've gotten to the point where I have a basic understanding of feature vectors and have a very basic (definitely incomplete) understanding of rotation invariant and scale invariant features. How you can look at multi-sampled images for scale invariance and corners for rotational invariance.
To search a billion images though there is no way you could do a linear search. Most of my reading seems to imply a K-d tree is used as a partitioning data structure to improve the lookup times.
What metric is the K-d tree split on? If you use descriptors like SIFT,SURF, or ORB there is no guarantee your similar keypoints line up in the feature vectors so I'm confused how you determine 'left' or 'right' since with features like this you need the split to be based on similarity. My guess is in euclidean distance from a 'standard' then you do a robust nearest neighbor search, but would like some input on how the inital query into the KD tree is handled before the nearest neighbor search. I would think a KD tree needs to be comparing similar features in each dimension, but I don't see how that happens with many key points.
I can find a lot of papers on the nearest neighbor search, but most seem to assume you know how this is handled so I'm missing something here.
It's quite simple. All that feature descriptors present image as a point in multidimensional space. Just for the sake of simplicity, let's assume that your descriptor dimension is 2. Than all your images would be mapped onto two dimesional plane. Then, kd-tree will split this plane into rectangular areas. Any images that fall within same area would be considered as similar.
That means, btw, two images that lie really close to each other, but in different areas (leafs of the kd-tree) will not be considered as similar.
To overcome this issue, cosine similarity can be used instead of euclidian distance. You can read more about the subject in wiki.
I'm solving this problem, and I don't know which data structure to use.
I have multiple objects (convex polygons and circles) on a 2D plane, and for a given point, I have to calculate the objects the point lies within (they can overlap).
I've been reading about K-D trees, but I don't know how to "bend" it for this kind of objects. I've been also reading about bounding volume hierarchy, but I don't know if it would be optimal.
So, what do you think would be the best data structure for this problem? Time performance is more important than memory usage).
Thanks!
For most part, the "efficiency" of space partitioning schemes like BVH, kd-tree, R-tree etc, comes from smart tree construction. As long as you can build your tree well, you will have fast performance. For you case, I would say kd-tree is fine - it's very common with lots of source code available. So are R-trees. I don't understand what you mean by "bend" it for your objects. For Kd-Tree, all you have to decide, is given an axis aligned plane - for 2D case it would be either x = c or y = c, if the circle (or poly) lies to one side, or straddles. Rather trivial problem.
I have a set of very many axis-aligned rectangles which maybe nested and intersecting. I want to be able to find all the rectangles that enclose/bound a query point. What would be a good approach for this?
EDIT : Additional information-
1. By very many I meant ~100 million or more.
2. The rectangles are distributed across a huge span (span of a country). There is no restriction on the sizes.
3. Yes the rectangles can be pre-processed and stored in a tree structure.
4. No real-time insertions and deletions are required.
5. I only need to find all the rectangles enclosing/bounding a given query point. I do not need the Nearest Neighbors.
As you might have guessed, this is for a real-time geo-fencing application on a mobile unit and hence -
6. The search need not be repeated for rectangles sufficiently far from the point.
I've tried KD trees and Quad-Trees by approximating each Rectangle to a point. They've given me variable performances depending on the size of the rectangles.
Is there a more direct way of doing it ? How about r trees?
I would consider using a quadtree. (Posting from mobile so it's too much effort to link, but Wikipedia has a decent explanation.) You can split at the left, right, top, and bottom bound of any rectangle, and store each rectangle in the node representing the smallest region that contains the rectangle. To search for a point, you go down the quadtree towards the point and check every rectangle that you encounter along that path.
This will work well for small rectangles, but if many rectangles cover almost the entire region you'll still have to check all of those.
You need to look at the R*-tree data structure.
In contrast to many other structures, the R*-tree is well capable of storing (overlapping) rectangles. It is not limited to point data. You will be able to find many publications on how to best approximate polygons before putting them into the index, too. Also, it scales up to pretty large data, as it can operate on disk, too.
R*-trees are faster when bulk loaded; as this can be used to reduce overlap of index pages and ensure a near-perfectly balanced tree, whereas dynamic insertions only guarantee each page to be at least half full or so. I.e. a bulk loaded tree will often use only half as much memory / storage.
For 2d data, and your type of queries, a quadtree or grid may however work just well enough. It depends on how much local data density varies.
I'm considering trying to make a game that takes place on an essentially infinite grid.
The grid is very sparse. Certain small regions of relatively high density. Relatively few isolated nonempty cells.
The amount of the grid in use is too large to implement naively but probably smallish by "big data" standards (I'm not trying to map the Internet or anything like that)
This needs to be easy to persist.
Here are the operations I may want to perform (reasonably efficiently) on this grid:
Ask for some small rectangular region of cells and all their contents (a player's current neighborhood)
Set individual cells or blit small regions (the player is making a move)
Ask for the rough shape or outline/silhouette of some larger rectangular regions (a world map or region preview)
Find some regions with approximately a given density (player spawning location)
Approximate shortest path through gaps of at most some small constant empty spaces per hop (it's OK to be a bad approximation often, but not OK to keep heading the wrong direction searching)
Approximate convex hull for a region
Here's the catch: I want to do this in a web app. That is, I would prefer to use existing data storage (perhaps in the form of a relational database) and relatively little external dependency (preferably avoiding the need for a persistent process).
Guys, what advice can you give me on actually implementing this? How would you do this if the web-app restrictions weren't in place? How would you modify that if they were?
Thanks a lot, everyone!
I think you can do everything using quadtrees, as others have suggested, and maybe a few additional data structures. Here's a bit more detail:
Asking for cell contents, setting cell contents: these are the basic quadtree operations.
Rough shape/outline: Given a rectangle, go down sufficiently many steps within the quadtree that most cells are empty, and make the nonempty subcells at that level black, the others white.
Region with approximately given density: if the density you're looking for is high, then I would maintain a separate index of all objects in your map. Take a random object and check the density around that object in the quadtree. Most objects will be near high density areas, simply because high-density areas have many objects. If the density near the object you picked is not the one you were looking for, pick another one.
If you're looking for low-density, then just pick random locations on the map - given that it's a sparse map, that should typically give you low density spots. Again, if it doesn't work right try again.
Approximate shortest path: if this is a not-too-frequent operation, then create a rough graph of the area "between" the starting point A and end point B, for some suitable definition of between (maybe the square containing the circle with the midpoint of AB as center and 1.5*AB as diameter, except if that diameter is less than a certain minimum, in which case... experiment). Make the same type of grid that you would use for the rough shape / outline, then create (say) a Delaunay triangulation of the black points. Do a shortest path on this graph, then overlay that on the actual map and refine the path to one that makes sense given the actual map. You may have to redo this at a few different levels of refinement - start with a very rough graph, then "zoom in" taking two points that you got from the higher level as start and end point, and iterate.
If you need to do this very frequently, you'll want to maintain this type of graph for the entire map instead of reconstructing it every time. This could be expensive, though.
Approx convex hull: again start from something like the rough shape, then take the convex hull of the black points in that.
I'm not sure if this would be easy to put into a relational database; a file-based storage could work but it would be impractical to have a write operation be concurrent with anything else, which you would probably want if you want to allow this to grow to a reasonable number of players (per world / map, if there are multiple worlds / maps). I think in that case you are probably best off keeping a separate process alive... and even then making this properly respect multithreading is going to be a headache.
A kd tree or a quadtree is a good data structure to solve your problem. Especially the latter it's a clever way to address the grid and to reduce the 2d complexity to a 1d complexity. Quadtrees is also used in many maps application like bing and google maps. Here is a good start: Nick quadtree spatial index hilbert curve blog.