Anything better than bounding boxes? - algorithm

I have a scenario where I have x million longitude/latitude points.
When a new long/lat point is added, I want to efficiently find which other points are within a user-configured distance, so I can add them to a list.
Is there anything better than bounding boxes?
I would love to see algorithms, references and a few implementations ;) Thank you kindly!

There are quite a few options that are better, mostly based around space partitioning.
A common and often very good option, which isn't too tough to implement, is a k-d tree. Quadtrees are easier to implement, but are often slower to search. Depending on the distribution of your data and your requirements, other space-partitioning structures may perform better or need less memory.
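As a rough illustration of the k-d tree option, here is a minimal sketch of a radius search. It assumes plain Euclidean coordinates (for long/lat data you would first project the points or map them into 3D), and the names build_kdtree and within_radius are made up for this example.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples: (point, left, right) or None."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median becomes the split point
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def within_radius(node, target, radius, depth=0, found=None):
    """Collect every stored point whose distance to target is <= radius."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    if dist(point, target) <= radius:
        found.append(point)
    axis = depth % len(target)
    delta = target[axis] - point[axis]
    # Always search the side that contains the target; search the other
    # side only if the splitting plane is no further away than the radius.
    near, far = (left, right) if delta < 0 else (right, left)
    within_radius(near, target, radius, depth + 1, found)
    if abs(delta) <= radius:
        within_radius(far, target, radius, depth + 1, found)
    return found
```

This version builds the tree once from a fixed list; supporting inserts of new points without degrading the balance needs extra work that is not shown here.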

A colleague told me that he had good experience using Morton codes (Z-order) as a spatial index on GIS data; that may be worth investigating.
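For reference, a Morton code interleaves the bits of the quantized coordinates into a single integer that can be stored in any ordered index. The sketch below assumes 16 bits per axis and a simple linear quantization of longitude and latitude; both choices, and the function names, are illustrative only.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave two 16-bit integers: x in the even bits, y in the odd bits."""
    return part1by1(x) | (part1by1(y) << 1)

def latlon_to_morton(lat, lon):
    """Quantize lat/lon to 16 bits per axis (an assumption) and interleave."""
    x = min(int((lon + 180.0) / 360.0 * 65536), 65535)
    y = min(int((lat + 90.0) / 180.0 * 65536), 65535)
    return morton2d(x, y)
```

Points that are close together usually get close codes, but not always (the curve jumps at cell boundaries), so a radius query generally has to scan several code ranges and then filter by true distance.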

This quick-and-dirty approach may save you some grief: divide the surface of the earth into 1-degree boxes. You then have a 180x360 element array, and you only need to search a small number of boxes: the box containing the new point plus the surrounding boxes that have a corner within the user-specified distance. There are some tricks you can use to quickly work out which boxes to search without considering them all. Just don't forget longitude wrap-around and the special case at the poles.
If you "only" have millions of points, and they aren't clustered into hot-spots, that might get you through.
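A minimal sketch of the 1-degree box idea follows. It assumes a spherical earth with a mean radius of 6371 km, stores only the occupied boxes in a dictionary rather than a full 180x360 array, and picks candidate boxes from the latitude/longitude extent of the search circle instead of testing box corners. The class name DegreeGrid and its methods are invented for this example.

```python
from collections import defaultdict
from math import radians, degrees, sin, cos, asin, sqrt, floor

EARTH_RADIUS_KM = 6371.0  # assumption: spherical earth, mean radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

class DegreeGrid:
    def __init__(self):
        self.cells = defaultdict(list)   # (row, col) -> list of (lat, lon)

    @staticmethod
    def _cell(lat, lon):
        row = min(int(floor(lat)) + 90, 179)    # lat == 90 goes in the top row
        col = (int(floor(lon)) + 180) % 360     # lon == 180 wraps to column 0
        return row, col

    def add(self, lat, lon):
        self.cells[self._cell(lat, lon)].append((lat, lon))

    def neighbors(self, lat, lon, radius_km):
        r = radius_km / EARTH_RADIUS_KM         # angular radius in radians
        dlat = degrees(r)
        row_lo = self._cell(max(lat - dlat, -90.0), lon)[0]
        row_hi = self._cell(min(lat + dlat, 90.0), lon)[0]
        if abs(lat) + dlat >= 90.0:
            cols = range(360)                   # circle reaches a pole
        else:
            # Largest longitude offset reached by a circle of radius r.
            dlon = degrees(asin(sin(r) / cos(radians(lat))))
            first = int(floor(lon - dlon)) + 180
            last = int(floor(lon + dlon)) + 180
            cols = {c % 360 for c in range(first, last + 1)}  # wrap-around
        hits = []
        for row in range(row_lo, row_hi + 1):
            for col in cols:
                for p in self.cells.get((row, col), ()):
                    if haversine_km(lat, lon, p[0], p[1]) <= radius_km:
                        hits.append(p)
        return hits
```

Every candidate is still checked with the exact great-circle distance, so the boxes only serve to cut down how many points are examined.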
A theoretically superior way: map each point into three-dimensional space and store the points in an octree, which lets you quickly find nearby points within an arbitrary distance. The straight-line distance in three-dimensional space is slightly shorter than the great-circle distance on the globe, so you will have to convert the search radius first: on a sphere of radius R, a great-circle distance d corresponds to a chord of length 2R sin(d / 2R). You don't mention an implementation language, but there is almost certainly a well-tested octree implementation for whatever language you are working in. If you don't mind pulling in third-party code, this solution is the way to go.
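The mapping into 3D and the radius conversion are short enough to sketch. This assumes a spherical earth with a mean radius of 6371 km; to_xyz and chord_for_arc are hypothetical helper names, and the octree itself is left to whatever library you choose.

```python
from math import radians, sin, cos

EARTH_RADIUS_KM = 6371.0  # assumption: spherical earth, mean radius

def to_xyz(lat, lon, radius=EARTH_RADIUS_KM):
    """Map a latitude/longitude in degrees to Cartesian coordinates."""
    phi, lam = radians(lat), radians(lon)
    return (radius * cos(phi) * cos(lam),
            radius * cos(phi) * sin(lam),
            radius * sin(phi))

def chord_for_arc(arc_km, radius=EARTH_RADIUS_KM):
    """Straight-line (chord) length between two surface points that are
    arc_km apart along the great circle. Valid for arcs up to half the
    circumference."""
    return 2.0 * radius * sin(arc_km / (2.0 * radius))
```

To search within a great-circle distance d, convert every point with to_xyz once, then query the 3D structure with a Euclidean radius of chord_for_arc(d).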

Related

What is the best data structure for an AABB collision checking physics engine?

I need an engine which consists of a world populated with axis-aligned bounding boxes (AABBs). A continuous loop will be executed, doing the following:
for box_a in world
    box_a = do_something(box_a)
    for box_b in world
        if (box_a!=box_b and collides(box_a, box_b))
            collide(box_a, box_b)
            collide(box_b, box_a)
The problem with that is, obviously, that this is O(n^2). I have managed to make this loop much faster by partitioning the space into chunks, so it became:
for box_a in world
    box_a = do_something(box_a)
    for chunk in box_a.neighbor_chunks
        for box_b in chunk
            if (box_a!=box_b and collides(box_a, box_b))
                collide(box_a, box_b)
                collide(box_b, box_a)
This is much faster but a little crude. Given that such a large speed-up took so little effort, I'd bet there is a data structure I'm not aware of that generalizes what I've done here and scales much better.
So, my question is: what is the name of this problem and what are the optimal algorithms and data-structures to implement it?
This is indeed a generic problem in computer science: space partitioning.
It's used in ray tracing, path tracing, raster rendering, physics, AI, games, and almost certainly in HPC, databases, matrix maths, various sciences (molecules, pharmacy, ...), and I bet thousands of other things.
There is no single best structure. I have a friend who did his master's on an algorithm to tessellate a point cloud coming out of a laser scanner (billions of points), and in his case the best data structure was a mix of a collection of uniform 3D grids with some octrees.
For some people a k-d tree is best; for others, BVH trees are best.
I like the grid system, but a dense grid cannot work if the space is too large, because all cells have to exist.
I once implemented a sparse grid using a hash map. It worked, but I didn't profile it or investigate its performance, so I can't say whether it's an excellent way; I only know it's one way.
To do that, you make a key class, which is basically a hasher for a 3D position vector. First you apply an integer division to the coordinates, with the divisor defining the size of one grid cell. Then you naively combine all the coordinates into one hash and provide a hash_value method or friend function, plus an equality operator, and then it's usable in a hash map.
You can use google::sparse_hash_map or something along those lines. I personally used boost::unordered_map and it was enough in my case.
Then the thing to consider is that an AABB can lie in more than one cell. You can store a reference in every cell covered by your AABB; it's just something to be aware of in every algorithm: "there is no 1-1 relationship between cell references and AABBs." That's all.
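The sparse grid described above can be sketched in a few lines of Python, where a tuple of integer cell coordinates plays the role of the key class and a dict is the hash map. SparseGrid and its method names are invented for this sketch; it is not the C++ implementation the answer refers to.

```python
from collections import defaultdict
from itertools import product

class SparseGrid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (ix, iy, iz) -> ids of boxes

    def _covered_cells(self, lo, hi):
        """Yield the integer key of every cell the AABB [lo, hi] touches."""
        axes = [range(int(l // self.cell_size), int(h // self.cell_size) + 1)
                for l, h in zip(lo, hi)]   # floor division handles negatives
        return product(*axes)

    def insert(self, box_id, lo, hi):
        # One AABB may be referenced from several cells.
        for key in self._covered_cells(lo, hi):
            self.cells[key].add(box_id)

    def candidates(self, lo, hi):
        """Ids of boxes sharing at least one cell with the query AABB."""
        found = set()
        for key in self._covered_cells(lo, hi):
            found |= self.cells.get(key, set())
        return found
```

Only occupied cells consume memory, and returning a set from candidates takes care of boxes that are referenced from more than one cell. The exact overlap test still has to be run on each candidate.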
Good luck.

Subdividing 3D Grid into rectangular pieces

This seems like it should be a relatively simple, solved problem, but I'm having difficulty finding a solution. I'm trying to divide an integer-width, 3-dimensional cubic space into a given number of integer-width rectangular subdivisions. The blocks don't have to be the same size (as this isn't always possible), but the goal is that the volume of the largest subdivision is as small as possible (so it's as fairly distributed as possible). On top of that, the surface area of the subdivisions should be as small as possible (which is to say, a 2x2x2 subdivision is preferred over a 1x2x4).
This is used to divide a space for distributed computing, so the purpose of these two requirements is to distribute load fairly and reduce the communication required between processors. Anyway, I would appreciate any nudge in the right direction for this problem.
The title of your question is not very clear, but you seem to be looking for either a space partitioning algorithm or a space filling curve, possibly both.
More details:
The above two subjects are deeply interconnected, but depending on your exact problem, one of them might be useless.
Space partitioning algorithms and data structures are generally used in applications that manage a fixed space and need to find which (possibly moving) objects are in which part of this space. So you split your volume recursively to allow fast retrieval (and update) of the list of objects in a section of the volume. You can even find algorithms that try to balance the number of objects per partition.
Space-filling curves are all about imposing an order on the way someone scans an immutable list of multidimensional coordinates. In that case, the properties of the order (locality, for example) are critical. And once you have a linear order, you can distribute the work by splitting it into as many chunks as you want; each chunk will still have good properties.
Both problems are encountered in a lot of scientific applications, so they have been researched in depth by very smart people. I'm definitely not an expert on those subjects; this is just the way I understand them. Hope it helps.
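To make the space-filling-curve idea concrete, the sketch below orders the cells of an n x n x n grid along a 3D Morton (Z-order) curve and cuts that order into nearly equal chunks. The function names are made up, and note the limitation: the chunks are balanced in volume and reasonably compact, but in general they are not rectangular blocks, so this only approximates what the question asks for.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y and z into one integer."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def split_along_curve(n, parts):
    """Split the cells of an n*n*n grid into `parts` contiguous runs of
    the Morton order; run sizes differ by at most one cell."""
    cells = sorted(((x, y, z)
                    for x in range(n) for y in range(n) for z in range(n)),
                   key=lambda c: morton3d(*c))
    base, extra = divmod(len(cells), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(cells[start:start + size])
        start += size
    return chunks
```

When n is a power of two and the number of parts is a power of eight, each chunk happens to be a cube (for example, split_along_curve(4, 8) gives eight 2x2x2 blocks); for other sizes the pieces are irregular.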

Ask for resource about fast ray-tracing algorithm

First, I am sorry for this rough question, but I don't want to go into too much detail, so I am just asking for related resources such as articles, libraries or tips.
My program needs to do intensive computation of ray-triangle intersections (there are millions of rays and triangles), and my goal is to make it as fast as I can.
What I have done is:
Use the fastest ray-triangle algorithm that I know.
Use an octree (from Game Programming Gems 1, sections 4.10 and 4.11).
Use "An Efficient and Robust Ray–Box Intersection Algorithm", which is used in the octree traversal.
It is faster than before I applied those better algorithms, but I believe it could be faster. Could you please shed some light on any places where it could be made faster?
Thanks.
The place to ask these questions is ompf2.com, a forum with topics about realtime (and also non-realtime) ray tracing.
OMPF forum is the right place for this question, but since I'm here today...
Don't use a ray/box intersection for octree traversal. You may use it for the root node of the tree, but that's it. Once you know the distances to the entry and exit of the root box, you can calculate the distances to the x, y and z partition planes, that is, the planes that subdivide the box. If the distances to the front and back are f and b respectively, then you can determine which child nodes of the box are hit by analyzing the f, b, x, y, z distances. You can also determine the order in which to traverse the child nodes and completely reject many of them.
At most 4 of the children can be hit since the ray starts in one octant and only changes octants when it crosses one of the 3 partition planes.
Also, because the traversal is recursive, you'll need the entry and exit distances for the child nodes. These distances are chosen from the set (f, b, x, y, z), which you've already computed.
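A minimal sketch of that idea, for a single node, might look like the following. It assumes the entry and exit distances f and b of the node are already known, numbers the octants with one bit per axis (bit set when the coordinate is on the upper side of the partition plane), and classifies each segment by its midpoint. That is a simple way to illustrate the technique, not the optimized traversal the answer alludes to, and child_segments is a hypothetical name.

```python
def child_segments(origin, direction, center, f, b):
    """Return [(octant, t_in, t_out), ...] in front-to-back order for a ray
    that is inside the node for distances f..b along its direction."""
    # Distances along the ray to the three partition planes through `center`.
    cuts = []
    for o, d, c in zip(origin, direction, center):
        if d != 0.0:                      # ray parallel to plane: no crossing
            t = (c - o) / d
            if f < t < b:                 # only crossings inside the node
                cuts.append(t)
    bounds = [f] + sorted(cuts) + [b]
    segments = []
    for t_in, t_out in zip(bounds, bounds[1:]):
        if t_out <= t_in:                 # several planes crossed at once
            continue
        mid = 0.5 * (t_in + t_out)        # a point strictly inside one child
        octant = 0
        for axis, (o, d, c) in enumerate(zip(origin, direction, center)):
            if o + mid * d >= c:
                octant |= 1 << axis
        segments.append((octant, t_in, t_out))
    return segments
```

Each returned t_in and t_out is taken from the set (f, b, x, y, z), so the recursion into a child can reuse them directly as that child's entry and exit distances, and the list never has more than four entries.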
I have been optimizing this for a very long time, and can safely say you have about an order of magnitude performance still on the table for trees many levels deep. I started right where you are now.
There are several optimizations you can do, but all of them depend on the exact domain of your problem. As far as general algorithms go, you are on the right track. Depending on the domain, you could:
Introduce a portal system
Move the calculations to a GPU and take advantage of parallel computation
A quite popular trend in raytracing recently is Bounding Volume Hierarchies
You've already gotten a good start using a spatial sort coupled with fast intersection algorithms. For tracing single rays at a time, one of the best structures out there (for static scenes) is a K-d tree built using the Surface Area Heuristic.
However, for truly high-speed ray tracing you need to take advantage of:
Coherent packets of rays
Frusta
SIMD
I would suggest you start with "Ray Tracing Animated Scenes using Coherent Grid Traversal". It gives an easy-to-follow example of such a modern approach. You can also follow the references to see how these ideas are applied to K-d trees and BVHs.
On the same page, also check out "State of the Art in Ray Tracing Animated Scenes".
Another great set of resources are all the SIGGRAPH publications over the years. This is a very competitive conference, so these papers tend to be top-notch.
Finally, if you're willing to use existing code, check out the project page for OpenRT.
A useful resource I've seen is the Journal of Graphics Tools. Depending on your scenes, another kind of BVH might be more appropriate than an octree.
Also, if you haven't looked at your performance with a profiler, then you should. Shark is great on OS X, and I've gotten good results with Very Sleepy on Windows.

How should I index for a simple world of rectangles?

The world consists of many (1k-10k) rectangles of similar sizes, and I need to be able to quickly determine potential overlaps when trying to add a new rectangle. Rectangles will be added and removed dynamically. Are R-Trees appropriate here? If so, are there any good libraries I should consider? (I'm open to suggestions in any language).
R-Trees would be suitable, yes.
Quadtrees are also a good data structure for quickly finding objects in a region of 2D space. Loosely speaking, they are a more uniform version of R-trees. Using these structures you can quickly zero in on a small region of space with very few tests, even with massive data sets.
There is a C# implementation here, though I have not looked at it.
This kind of data structure (and its 3D version, the octree) is often used in games to manage large sets of objects that need to know whether they are near any other objects, for collision testing and all kinds of other fun reasons.
You should be able to find lots of articles and examples of these kinds of data structures on games industry sites, like Gamasutra and opengl.org.
You can also look into k-d trees.
I don't know of any implementation, but in 3D at least they are usually considered to perform better than octrees. For example, here is a write-up of someone's experience that I just found by searching.
You may want to consider alternatives to quadtrees if you ever run into performance problems.
However it should be noted that kd-trees are hard to rebalance...

Datastructure for googlemap like application?

I am writing a map routing application. Several people have suggested that I use a data structure that splits the map into a grid. In theory it sounds really good, but I am not too sure, because of the bad performance I get when I implement it.
In the worst case you have to draw every road. If you divide the map into a grid, the sum of the roads in all the cells will be much larger than if you put all roads in one list (a road that passes through several cells has to be stored in each of them).
If I have to zoom in, I can see some smartness in using a grid, but if I keep the roads in a list I can just decrease the number of roads each time I zoom in.
As it is now (using the list) it is not really fast, so I am all for making it faster. But in practice, dividing the map into a grid makes it slower for me.
Any suggestions for which data structure I should be using and/or what I might be doing wrong?
See this question for related information:
What algorithms compute directions from point A to point B on a map?
Somebody who writes this kind of software for a living has answered it.
Also for rendering see:
What is the best way to read, represent and render map data?
I'm not quite sure whether you're trying to make routing fast or rendering fast!
If you want routing to be quick, you might be better off organizing your roads into major and minor roads.
Use the list of minor roads to find a route to the nearest major road.
Use the major roads to get you near the destination.
Then go back to the minor roads to complete the route.
Without a split like this, there are a heck of a lot of roads to search, most of which are quite slow routes.
Google does not draw each road every time the screen is refreshed. They use pre-drawn tiles of the map, which they can redraw as needed, e.g. when there is a map update. They even use transparent overlays, stacks of tiles, to add and remove layers of detail.
Very clever, but very simple.
You may want to look at the OpenLayers JavaScript library. It's free and can do just about anything you need to do with a map.
Mapstraction JS is also available, although it's not as complete as OpenLayers.
A quadtree might be a better spatial data structure than a grid, because it breaks the map down hierarchically, so lookups take roughly logarithmic time. And from studying the source, my guesstimate is that Google uses that or a similar data structure.
As for getting directions, you might want to look into hierarchical path finding to approximate the direction at first and to speed up the process; generic path finding algorithms tend to be quite slow at that level of complexity.
