I was wondering if there is a good data structure to hold a list of axis-aligned, non-overlapping rectangles in discrete space. Each rectangle could be stored as the integers x, y, width, and height. It would be easy to just store such a list, but I also want to be able to query whether a given x,y coordinate is inside any of the rectangles.
One easy solution would be to create a hash table and fill it with the hashed lower-left coordinates of each rectangle. This would not let me test an arbitrary x,y coordinate, because a point in the middle of a rectangle would hit an empty slot. Another option is to insert an entry into the hash table for every unit square the rectangle covers, but that would create far too many entries for a rectangle of, say, 100 by 100.
An R-tree can be used. R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. All the rectangles can be stored in the tree under nested bounding boxes, so a point query only has to descend into the branches whose boxes contain the point.
The Wikipedia page, a short PPT and the research paper will help you understand the concept.
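As a rough illustration of the R-tree idea for this question, here is a minimal static R-tree in Python. It is a sketch under assumptions not stated in the answer: the tree is bulk-loaded once with Sort-Tile-Recursive (STR) packing instead of the original paper's incremental insertion, each rectangle (x, y, w, h) covers the discrete cells x <= px < x + w and y <= py < y + h, and the names StaticRTree, _str_groups and find are hypothetical.

```python
import math

class _Node:
    __slots__ = ("box", "items", "leaf")

    def __init__(self, box, items, leaf):
        self.box, self.items, self.leaf = box, items, leaf  # box = (x0, y0, x1, y1)

def _cover(boxes):
    """Smallest box (x0, y0, x1, y1) enclosing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def _str_groups(items, box_of, capacity):
    """Sort-Tile-Recursive packing: split items into groups of at most `capacity`."""
    if not items:
        return []
    n_slices = math.ceil(math.sqrt(math.ceil(len(items) / capacity)))
    per_slice = n_slices * capacity
    # Sort by box centre x, cut into vertical slices, then sort each slice by centre y.
    items = sorted(items, key=lambda it: box_of(it)[0] + box_of(it)[2])
    groups = []
    for i in range(0, len(items), per_slice):
        vslice = sorted(items[i:i + per_slice], key=lambda it: box_of(it)[1] + box_of(it)[3])
        groups.extend(vslice[j:j + capacity] for j in range(0, len(vslice), capacity))
    return groups

class StaticRTree:
    def __init__(self, rects, capacity=8):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        # Each rect (x, y, w, h) covers the cells x <= px < x + w and y <= py < y + h.
        entries = [((x, y, x + w, y + h), (x, y, w, h)) for x, y, w, h in rects]
        level = [_Node(_cover([e[0] for e in g]), g, True)
                 for g in _str_groups(entries, lambda e: e[0], capacity)]
        while len(level) > 1:  # pack nodes into parents until a single root is left
            level = [_Node(_cover([n.box for n in g]), g, False)
                     for g in _str_groups(level, lambda n: n.box, capacity)]
        self.root = level[0] if level else None

    def find(self, px, py):
        """Return the rectangle containing cell (px, py), or None."""
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            x0, y0, x1, y1 = node.box
            if not (x0 <= px < x1 and y0 <= py < y1):
                continue  # prune: the point is outside this whole subtree
            if node.leaf:
                for (bx0, by0, bx1, by1), rect in node.items:
                    if bx0 <= px < bx1 and by0 <= py < by1:
                        return rect
            else:
                stack.extend(node.items)
        return None
```

For example, `StaticRTree([(0, 0, 10, 10), (10, 0, 5, 5)]).find(12, 3)` returns `(10, 0, 5, 5)`. Because the rectangles do not overlap, the search can stop at the first hit; a dynamic R-tree would be needed if rectangles are added or removed often.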
Related
For a game I am writing, I am using a quadtree on a non-square map. The quadtree is used to look up neighboring units for collision detection, enemies to attack, nearest bases etc. within a given max. radius (circle).
What I wonder is whether there is a performance cost to having a quadtree made of rectangles rather than squares. Instead of dividing a square map into squares, a rectangular map is divided into rectangles of equal size in the quadtree.
Square Quadtree on Rectangular Map: a quadtree is created that fills the whole map, but with empty/unused areas to the left or bottom depending on the orientation of the map (horizontal vs. vertical). This will require more squares for padding (?) and might also have an impact on search performance?
Rectangular Quadtree matching the Rectangular Map: the quadtree perfectly fills the map. However, will performance be impacted by doing so? Given that the search uses a radius, which fits into a square rather than a rectangle, might it result in slower searches? Also, both width and height have to be stored in each quadtree node, as the nodes are non-square.
Question:
Is it better to convert the quadtree to square form? I think using a rectangular quadtree might be OK, but I am not sure.
Screenshot (Rectangular Quadtree):
I'm sure both options are okay. From your example it also looks like your data set is rather small, only a few dozen entries, maybe 100?
Some things to consider:
As you mentioned: rectangles require a separate 'length' for x and y. The effect may be small, but every additional bit of information slows down the structure, because more data has to be moved to and through the CPU.
If you are storing objects in the quadtree that are (often) directly on rectangle borders, you need to be careful to implement the quadtree correctly:
Insertion: when an item lies on the corner of four quadrants, into which one does it get inserted?
Queries/lookup: inversely to insertion, any search that ends on a border may (unnecessarily) search all bordering quadrants, which can be expensive.
In summary, the question is probably less about square vs. rectangular quadtrees than about being careful when data often lies on quadrant borders.
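To make the border issue concrete, here is a minimal Python sketch of a rectangular quadtree for point data. Assumptions not in the answer: nodes store width and height separately, splits are half-open so a point exactly on a midline always goes to the right/upper child (so a point on the corner of four quadrants lands in exactly one of them), a leaf holds up to 4 points, depth is capped at 16, and the radius query prunes with a circle-rectangle distance test. QuadNode and query_radius are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class QuadNode:
    MAX_DEPTH = 16  # class constant (not a dataclass field): stops endless splitting
    # Rectangular bounds [x, x + w) x [y, y + h): width and height stored separately.
    x: float
    y: float
    w: float
    h: float
    capacity: int = 4
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)  # [] for a leaf, else 4 QuadNodes

    def _child_for(self, px, py):
        # Half-open split: a point exactly on a midline goes to the right/upper child,
        # so a point on the corner of four quadrants lands in exactly one of them.
        right = px >= self.x + self.w / 2
        upper = py >= self.y + self.h / 2
        return self.children[2 * upper + right]

    def insert(self, px, py):
        if self.children:
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.capacity and self.depth < self.MAX_DEPTH:
            hw, hh = self.w / 2, self.h / 2
            self.children = [QuadNode(self.x + dx * hw, self.y + dy * hh, hw, hh,
                                      self.capacity, self.depth + 1)
                             for dy in (0, 1) for dx in (0, 1)]
            for p in self.points:
                self._child_for(*p).insert(*p)
            self.points = []

    def query_radius(self, cx, cy, r, out=None):
        out = [] if out is None else out
        # Closest point of this node's rectangle to the circle centre.
        nx = min(max(cx, self.x), self.x + self.w)
        ny = min(max(cy, self.y), self.y + self.h)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return out  # the circle misses this node entirely: prune it
        out.extend(p for p in self.points if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r)
        for child in self.children:
            child.query_radius(cx, cy, r, out)
        return out
```

Note that the pruning test works the same way for square and rectangular nodes; the only extra cost of the rectangular version is the second size field per node.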
Task with example
I'm working with geodata (country-size) from openstreetmap. Buildings are often polygons without housenumbers and a single point with the housenumber is placed within the polygon of the building. Buildings may have multiple housenumbers.
I want to match the housenumbers to the polygons of the buildings.
Simple solution
For each housenumber, perform a point-in-polygon test against each building polygon.
Problem
Way too slow for about 50,000,000 buildings and 10,000,000 address-points.
Idea
Build an index for the building polygons to accelerate the search for the surrounding polygon of each housenumber point.
Question
What index or strategy would you recommend for this polygon structure? The polygons never overlap and the area is sparsely covered.
This question is also posted on gis.stackexchange.com; it was recommended that I post it there.
Since it sounds like you have well-formed polygons to test against, I'd use a spatial hash with an AABB check, and then finally the full point-in-polygon test. Hopefully at that point you'll be averaging three or fewer point-in-polygon tests per address.
Break the area your data covers into a simple grid where each grid cell is a small multiple (2 to 4) of the median building size. (Maybe 100-200 meters?)
Compute the axis aligned bounding box of every polygon, add it (with its bounding box) to each grid location which the bounding box intersects. (It's pretty simple to figure out where an axis aligned bounding box overlaps regular axis aligned grid cells. I wouldn't store the grid in a simple 2D array -- I'd use a hash table that maps 2D integer grid coordinates, e.g. (1023, 301), to a list of polygons)
Then go through all your address points. Look up in your hash table what cell that point is in. Go through all the polygons in that cell and if the point is within any polygon's axis aligned bounding box do the full point-in-polygon test.
This has several advantages:
The data structures are simple -- no fancy libraries needed (other than handling polygons). With C++, your polygon library, and the std namespace this could be implemented in less than an hour.
Spatial structure isn't hierarchical -- when you're looking up the points you only have to do one O(1) lookup in the hash table.
And of course, the usual disadvantage of grids as a spatial structure:
Doesn't handle wildly varying sized polygons particularly well. However, I'm hoping since you're using map data the sizes are almost always within an order of magnitude, and probably much less.
Assuming you end up with at most N polygons in each grid cell, each polygon has P points, and you've got B buildings and A addresses, you're looking at O(B*P + N*A). Since P and N are likely relatively small, especially on average, you could consider this O(B + A) -- pretty much linear.
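The grid procedure described in this answer can be sketched in Python as follows. This is a minimal sketch under assumptions not in the answer: polygons are lists of (x, y) vertices in a projected, metre-based coordinate system, the full point-in-polygon test is a plain even-odd ray cast, and GridIndex, bounding_box and point_in_polygon are hypothetical names. A real OSM pipeline would plug in its own polygon library instead.

```python
import math
from collections import defaultdict

def bounding_box(poly):
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def point_in_polygon(px, py, poly):
    """Even-odd ray casting test for a simple polygon [(x, y), ...]."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

class GridIndex:
    def __init__(self, polygons, cell_size):
        self.cell = cell_size
        self.polygons = polygons
        self.boxes = [bounding_box(p) for p in polygons]
        self.grid = defaultdict(list)  # (ix, iy) -> indices of polygons whose box touches the cell
        for idx, (x0, y0, x1, y1) in enumerate(self.boxes):
            for ix in range(math.floor(x0 / cell_size), math.floor(x1 / cell_size) + 1):
                for iy in range(math.floor(y0 / cell_size), math.floor(y1 / cell_size) + 1):
                    self.grid[(ix, iy)].append(idx)

    def find(self, px, py):
        """Index of the polygon containing (px, py), or None."""
        key = (math.floor(px / self.cell), math.floor(py / self.cell))
        for idx in self.grid.get(key, ()):  # one hash lookup per address point
            x0, y0, x1, y1 = self.boxes[idx]
            # Cheap AABB check first, full point-in-polygon test only if it passes.
            if x0 <= px <= x1 and y0 <= py <= y1 and point_in_polygon(px, py, self.polygons[idx]):
                return idx
        return None
```

Since buildings never overlap, returning the first match is enough; a building with several housenumbers simply matches several address points.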
I have a set of rectangles of various sizes in 2D space. The number of rectangles may change dynamically from 10 to 100,000, and their positions and sizes are updated often.
Which spatial structure would you recommend for finding the rectangle at a given point (x,y)? Assume that the search operation is also performed very often (on mouse move, for example). If you could give a reference to a comparison of various spatial indexing algorithms, or compare their search/build/update performance here, that would be lovely.
I would suggest an R-tree. It is primarily designed for rectangles (or N-dimensional axis-aligned boxes).
Use a quadtree (http://en.wikipedia.org/wiki/Quadtree).
Determine all possible X and Y values at which rectangles start and end. Then build a quadtree upon these values. In each leaf of the quadtree, store which rectangles overlap with the coordinate-ranges of the leaf. Finding which rectangles overlap is then just a matter of finding the leaf containing the coordinate.
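One possible reading of this answer, sketched in Python: the distinct start/end coordinates split the plane into elementary cells, and the quadtree halves ranges of cell indices rather than ranges of raw coordinates. Assumptions not in the answer: rectangles are half-open (x0, y0, x1, y1), a node stops splitting once it holds at most 4 rectangles or a single cell, a node with only one cell along an axis splits along the other axis only, and CoordinateQuadtree is a hypothetical name.

```python
import bisect

class CoordinateQuadtree:
    LEAF_SIZE = 4  # stop splitting once a node holds this few rectangles

    def __init__(self, rects):
        # rects: (x0, y0, x1, y1), each covering x0 <= x < x1 and y0 <= y < y1
        self.xs = sorted({v for r in rects for v in (r[0], r[2])})
        self.ys = sorted({v for r in rects for v in (r[1], r[3])})
        self.root = (self._build(list(rects), 0, len(self.xs) - 1, 0, len(self.ys) - 1)
                     if rects else [])

    def _build(self, rects, i0, i1, j0, j1):
        # This node covers elementary cells i0..i1-1 along x and j0..j1-1 along y.
        x0, x1, y0, y1 = self.xs[i0], self.xs[i1], self.ys[j0], self.ys[j1]
        here = [r for r in rects if r[0] < x1 and r[2] > x0 and r[1] < y1 and r[3] > y0]
        if len(here) <= self.LEAF_SIZE or (i1 - i0 <= 1 and j1 - j0 <= 1):
            return here  # leaf: a plain list of the rectangles overlapping it
        im = (i0 + i1) // 2 if i1 - i0 > 1 else None
        jm = (j0 + j1) // 2 if j1 - j0 > 1 else None
        x_halves = [(i0, im), (im, i1)] if im is not None else [(i0, i1)]
        y_halves = [(j0, jm), (jm, j1)] if jm is not None else [(j0, j1)]
        kids = [[self._build(here, a, b, c, d) for c, d in y_halves] for a, b in x_halves]
        return (im, jm, kids)

    def query(self, x, y):
        """All rectangles containing the point (x, y)."""
        i = bisect.bisect_right(self.xs, x) - 1  # elementary cell index along x
        j = bisect.bisect_right(self.ys, y) - 1
        if not (0 <= i < len(self.xs) - 1 and 0 <= j < len(self.ys) - 1):
            return []
        node = self.root
        while not isinstance(node, list):  # descend to the leaf holding cell (i, j)
            im, jm, kids = node
            node = kids[0 if im is None or i < im else 1][0 if jm is None or j < jm else 1]
        return [r for r in node if r[0] <= x < r[2] and r[1] <= y < r[3]]
```

Because every rectangle edge is a cell boundary, the final check only matters for leaves that still span several cells. The structure is rebuilt rather than updated, which fits poorly with the frequent moves described in the question.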
In a multi-dimensional space, I have a collection of rectangles, all of which are aligned to the grid. (I am using the word "rectangles" loosely - in a three dimensional space, they would be rectangular prisms.)
I want to query this collection for all rectangles that overlap an input rectangle.
What is the best data structure for holding the collection of rectangles? I will be adding rectangles to and removing rectangles from the collection from time to time, but these operations will be infrequent. The operation I want to be fast is the query.
One solution is to keep the corners of the rectangles in a list, and do a linear scan over the list, finding which rectangles overlap the query rectangle and skipping over the ones that don't.
However, I want the query operation to be faster than linear.
I've looked at the R-tree data structure, but it holds a collection of points, not a collection of rectangles, and I don't see any obvious way to generalize it.
The coordinates of my rectangles are discrete, in case you find that helpful.
I am interested in the general solution, but I will also tell you the properties of my specific problem: my problem space has three dimensions, and the number of possible values in each varies wildly. The first dimension has two possible values, the second dimension has 87 values, and the third dimension has 1.8 million values.
You can probably use k-d trees, which, according to the Wikipedia page, can also be used for rectangles:
Variations
Instead of points
Instead of points, a kd-tree can also contain rectangles or hyperrectangles[5]. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves. In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect with the search rectangle and so can be pruned. Otherwise both branches should be traversed. See also interval tree, which is a 1-dimensional special case.
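The quoted scheme can be sketched in Python as below, for the 2D case. Assumptions beyond the quote: rectangles are closed, split dimensions cycle through (xlow, xhigh, ylow, yhigh), a leaf holds at most 2 rectangles, and the pruning rule for a split on a low coordinate is the symmetric counterpart of the quoted xhigh rule (the quote only spells out the high case). build_kd and search_kd are hypothetical names.

```python
# A rectangle is a 4-D point (xlow, xhigh, ylow, yhigh), closed on all sides.
HIGH_DIMS = (1, 3)  # xhigh and yhigh

def build_kd(rects, depth=0, leaf_size=2):
    """k-d tree over 4-D points, cycling through the dimensions; rectangles live at the leaves."""
    if len(rects) <= leaf_size:
        return ("leaf", rects)
    d = depth % 4
    rects = sorted(rects, key=lambda r: r[d])
    mid = len(rects) // 2
    median = rects[mid][d]  # left keys <= median <= right keys
    return ("split", d, median,
            build_kd(rects[:mid], depth + 1, leaf_size),
            build_kd(rects[mid:], depth + 1, leaf_size))

def intersects(r, q):
    return r[0] <= q[1] and q[0] <= r[1] and r[2] <= q[3] and q[2] <= r[3]

def search_kd(node, q, out=None):
    """Collect every stored rectangle intersecting the query rectangle q."""
    out = [] if out is None else out
    if node[0] == "leaf":
        out.extend(r for r in node[1] if intersects(r, q))
        return out
    _, d, median, left, right = node
    if d in HIGH_DIMS:
        # Left branch: high <= median. If median < the query's low side, prune it.
        if median >= q[d - 1]:
            search_kd(left, q, out)
        search_kd(right, q, out)
    else:
        # Symmetric case: right branch has low >= median. If median > the query's high side, prune it.
        search_kd(left, q, out)
        if median <= q[d + 1]:
            search_kd(right, q, out)
    return out
```

For the 3D problem in the question, the same idea uses 6-D points and cycles through six dimensions.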
Let's call the original problem PN, where N is the number of dimensions.
Suppose we know the solution for P1, the 1-dimensional problem: determine whether a new interval overlaps any interval in a given collection.
Once we know how to solve it, we can check whether the new rectangle overlaps the collection of rectangles in each of the x/y/z projections.
So the solution of P3 is equivalent to P1_x AND P1_y AND P1_z.
To solve P1 efficiently we can use a sorted list. Each node of the list holds a coordinate and the number of intervals that are open just after that coordinate.
Suppose we have the following intervals:
[1,5]
[2,9]
[3,7]
[0,2]
then the list will look as follows:
{0,1} , {1,2} , {2,2}, {3,3}, {5,2}, {7,1}, {9,0}
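If we receive a new interval, say [6,7], we find the largest item in the list that is smaller than 6, {5,2}, and the smallest item that is greater than 7, {9,0}.
Since the count at 5 is 2, two intervals are still open after 5 and extend at least to the next coordinate in the list, so it is easy to say that the new interval does overlap the existing ones.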
And the search in the sorted list is faster than linear :)
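The sorted-list scheme for P1 can be sketched in Python with the standard bisect module. Assumptions: intervals are closed [a, b], the list is kept as two parallel lists (coordinates and open counts) so the lookup is a binary search, and build_profile and overlaps are hypothetical names. The query reports an overlap if some coordinate falls inside [lo, hi] or if intervals are still open just before lo.

```python
import bisect
from collections import Counter

def build_profile(intervals):
    """Sorted coordinates plus, for each, the number of closed intervals open just after it."""
    delta = Counter()
    for a, b in intervals:
        delta[a] += 1  # an interval opens at a ...
        delta[b] -= 1  # ... and closes after b
    coords = sorted(delta)
    counts, open_now = [], 0
    for c in coords:
        open_now += delta[c]
        counts.append(open_now)
    return coords, counts

def overlaps(coords, counts, lo, hi):
    """Does the closed interval [lo, hi] overlap any interval in the profile?"""
    k = bisect.bisect_left(coords, lo)  # first coordinate >= lo
    if k < len(coords) and coords[k] <= hi:
        return True  # some interval starts or ends inside [lo, hi]
    return k > 0 and counts[k - 1] > 0  # intervals still open just before lo
```

With the four intervals from the example, `build_profile` reproduces the list {0,1}, {1,2}, {2,2}, {3,3}, {5,2}, {7,1}, {9,0}, and `overlaps(coords, counts, 6, 7)` is true. One caution when combining axes: the x, y and z checks may be satisfied by different rectangles, so an AND of the per-axis answers can report an overlap that no single rectangle produces.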
You have to use some sort of a partitioning technique. However, because your problem is constrained (you use only rectangles), the data-structure can be a little simplified. I haven't thought this through in detail, but something like this should work ;)
Using the discrete value constraint - you can create a secondary table-like data-structure where you store the discrete values of second dimension (the 87 possible values). Assume that these values represent planes perpendicular to this dimension. For each of these planes you can store, in this secondary table, the rectangles that intersect these planes.
Similarly, for the third dimension you can use another table with as many equally spaced values as you need (1.8 million is too many, so you would probably want this to be at least a couple of orders of magnitude smaller), and map each interval between two consecutive chosen values to the rectangles that overlap it.
Given a query rectangle, you can query the first table in constant time to determine a set of rectangles that possibly intersect the query. Then you do another query on the second table and intersect the two result sets. This should narrow down the number of actual intersection tests that you have to perform.
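A minimal Python sketch of the two-table idea, under assumptions the answer leaves open: boxes are inclusive integer ranges per dimension, dimension 1 (87 values) gets one table entry per value, dimension 2 (1.8 million values) is coarsened into buckets of a chosen width, and dimension 0 (two values) is only checked in the final exact test. TableIndex, add and query are hypothetical names.

```python
from collections import defaultdict

class TableIndex:
    # Box layout: ((lo0, hi0), (lo1, hi1), (lo2, hi2)), inclusive discrete ranges, where
    # dimension 1 is the small one (87 values) and dimension 2 the large one (1.8 million).

    def __init__(self, bucket_width):
        self.bw = bucket_width             # coarsening factor for dimension 2
        self.by_value = defaultdict(set)   # each value of dimension 1 -> ids crossing it
        self.by_bucket = defaultdict(set)  # each bucket of dimension 2 -> ids overlapping it
        self.boxes = {}

    def add(self, rid, box):
        self.boxes[rid] = box
        (_, _), (a1, b1), (a2, b2) = box
        for v in range(a1, b1 + 1):
            self.by_value[v].add(rid)
        for b in range(a2 // self.bw, b2 // self.bw + 1):
            self.by_bucket[b].add(rid)

    def query(self, q):
        (_, _), (a1, b1), (a2, b2) = q
        first = set().union(*(self.by_value.get(v, ()) for v in range(a1, b1 + 1)))
        second = set().union(*(self.by_bucket.get(b, ())
                               for b in range(a2 // self.bw, b2 // self.bw + 1)))
        # Only ids found by both tables need the exact overlap test on every dimension.
        return sorted(rid for rid in first & second
                      if all(lo <= qhi and qlo <= hi
                             for (lo, hi), (qlo, qhi) in zip(self.boxes[rid], q)))
```

A wide box along dimension 2 is entered into many buckets, so the bucket width trades memory on insertion against the number of candidates per query.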
I'm looking for a data structure that provides indexing for Rectangles. I need the insert algorithm to be as fast as possible since the rectangles will be moving around the screen (think of dragging a rectangle with your mouse to a new position).
I've looked into R-Trees, R+Trees, kD-Trees, Quad-Trees and B-Trees, but from my understanding inserts are usually slow. I'd prefer to have inserts with sub-linear time complexity, so maybe someone can prove me wrong about any of the listed data structures.
I should be able to query the data structure for what rectangles are at point(x, y) or what rectangles intersect rectangle(x, y, width, height).
EDIT: The reason I want insert so fast is because if you think of a rectangle being moved around the screen, they're going to have to be removed and then re-inserted.
I'd use a multiscale grid approach (equivalent to quad-trees in some form).
I'm assuming you're using integer coordinates (i.e. pixels) and have plenty of space to hold all the pixels.
Have an array of lists of rectangles, one for each pixel. Then, bin two-by-two and do it again. And again, and again, and again, until you have one pixel that covers everything.
Now, the key is that you insert your rectangles at the level that is a good match for the size of the rectangle. This will be something like (pixel size) ~= min(height,width)/2. Now for each rectangle you have only a handful of inserts to do into the lists (you could bound it above by a constant, e.g. pick something that has between 4 and 16 pixels).
If you want to search for all rectangles at x,y, you look in the list of the smallest pixel, then in the list of the 2x2 binned pixel that contains it, then in the 4x4, etc.; you have one step per level, i.e. about log2 of the display's larger side in pixels. (For larger pixels, you then have to check whether (x,y) was really in the rectangle; you expect about half of them to be successful on borders, and all of them to be successful inside the rectangle, so you'd expect no worse than 2x more work than if you looked up the pixel directly.)
Now, what about insert? That's very inexpensive--O(1) to stick yourself on the front of a list.
What about delete? That's more expensive; you have to look through and heal each list for each pixel you're entered in. That's approximately O(n) in the number of rectangles overlapping at that position in space and of approximately the same size. If you have really large numbers of rectangles, then you should use some other data structure to hold them (hash set, RB tree, etc.).
(Note that if your smallest rectangle must be larger than a pixel, you don't need to actually form the multiscale structure all the way to the pixel level; just go down until the smallest rectangle won't get hopelessly lost inside your binned pixel.)
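A minimal Python sketch of this multiscale grid, assuming integer pixel coordinates, rectangles given as (x, y, w, h) with w, h >= 1, 16 levels, and one dictionary per level instead of a full array of lists (so empty pixels cost nothing). The level choice follows the answer's rule of thumb, cell size of roughly min(w, h)/2. MultiscaleGrid and its method names are hypothetical.

```python
class MultiscaleGrid:
    def __init__(self, levels=16):
        # levels[k] maps a cell of 2**k x 2**k pixels, keyed (x >> k, y >> k), to rect ids.
        self.levels = [{} for _ in range(levels)]
        self.rects = {}

    def _level_for(self, w, h):
        # Cell size of roughly min(w, h) / 2: k = floor(log2(min(w, h) / 2)), clamped.
        k = max(0, (min(w, h) // 2).bit_length() - 1)
        return min(k, len(self.levels) - 1)

    def _cells(self, rid):
        x, y, w, h = self.rects[rid]
        k = self._level_for(w, h)
        for cx in range(x >> k, ((x + w - 1) >> k) + 1):
            for cy in range(y >> k, ((y + h - 1) >> k) + 1):
                yield k, (cx, cy)

    def insert(self, rid, rect):
        self.rects[rid] = rect
        for k, cell in self._cells(rid):
            self.levels[k].setdefault(cell, []).append(rid)  # O(1) per cell

    def remove(self, rid):
        for k, cell in self._cells(rid):
            bucket = self.levels[k][cell]
            bucket.remove(rid)  # linear in the bucket length, as the answer notes
            if not bucket:
                del self.levels[k][cell]
        del self.rects[rid]

    def query(self, px, py):
        hits = []
        for k, level in enumerate(self.levels):  # one lookup per level
            for rid in level.get((px >> k, py >> k), ()):
                x, y, w, h = self.rects[rid]
                if x <= px < x + w and y <= py < y + h:  # coarse cells need an exact check
                    hits.append(rid)
        return hits
```

Dragging a rectangle is then a `remove` followed by an `insert` at the new position. Long, thin rectangles are placed by their short side and therefore occupy many cells; the answer's suggestion of bounding the number of cells would need a different level choice for them.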
The data structures you mention are quite a mixed bag: in particular B-Trees should be fast (cost to insert grows with the logarithm of the number of items present) but won't speed up your intersection queries.
Ignoring that - and hoping for the best - the spatial data structures come in two parts. The first part tells you how to build a tree structure from the data. The second part tells you how to keep track of information at each node that describes the items stored below that node, and how to use it to speed up queries.
You can usually pinch the ideas about keeping track of information at each node without using the (expensive) ideas about exactly how the tree should be built. For instance, you could create a key for each rectangle by bit-interleaving the co-ordinates of its points and then use a perfectly ordinary tree structure (such as a B-tree or an AVL tree or a Red-Black tree) to store it, while still keeping information at each node. This might, in practice, speed up your queries enough - although you wouldn't be able to tell that until you implemented and tested it on real data. The purpose of the tree-building instructions in most schemes is to provide performance guarantees.
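A much-simplified Python sketch of the bit-interleaving idea. Assumptions not in the answer: the key is the Morton (Z-order) code of each rectangle's centre, the "perfectly ordinary" ordered structure is flattened to a sorted list cut into fixed-size blocks, and the per-node information is just each block's bounding box. A real implementation would keep these summaries in a B-tree or red-black tree so that inserts stay logarithmic. morton and MortonBlocks are hypothetical names.

```python
def morton(x, y, bits=16):
    """Interleave the bits of non-negative ints x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key

class MortonBlocks:
    def __init__(self, rects, block=16):
        # rects: half-open (x0, y0, x1, y1); order them by the Morton key of their centres.
        order = sorted(rects, key=lambda r: morton((r[0] + r[2]) // 2, (r[1] + r[3]) // 2))
        self.blocks = []
        for i in range(0, len(order), block):
            chunk = order[i:i + block]
            # Per-node summary: the bounding box of everything stored in this block.
            box = (min(r[0] for r in chunk), min(r[1] for r in chunk),
                   max(r[2] for r in chunk), max(r[3] for r in chunk))
            self.blocks.append((box, chunk))

    def intersecting(self, q):
        def hit(a):
            return a[0] < q[2] and q[0] < a[2] and a[1] < q[3] and q[1] < a[3]
        # Skip whole blocks whose summary box misses the query rectangle.
        return [r for box, chunk in self.blocks if hit(box) for r in chunk if hit(r)]
```

Because nearby centres tend to get nearby keys, each block's bounding box tends to stay compact, which is what makes the summaries useful for pruning. As the answer says, whether this is fast enough can only be told by testing on real data.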
Two postscripts:
1) I like Patricia trees for this - they are reasonably easy to implement, and adding or deleting entries does not disturb the tree structure much, so you won't have too much work to do updating information stored at nodes.
2) Last time I looked at a window system, it didn't bother about any of this clever stuff at all - it just kept a linear list of items and searched all the way through it when it needed to: that was fast enough.
This is perhaps an extended comment rather than an answer.
I'm a bit puzzled about what you really want. I could guess that you want a data structure to support quick answers to questions such as 'Given the ID of a rectangle, return its current coordinates'. Is that right?
Or do you want to answer 'what rectangle is at position (x,y)' ? In that case an array with dimensions matching the height and width of your display might suffice, with each element in the array being a (presumably short) list of the rectangles on that pixel.
But then you state that you need an insert algorithm to be as fast as possible to cope with rectangles moving constantly. If you had only, say, 10 rectangles on screen, you could simply have a 10-element array containing the coordinates of each of the rectangles. Updating their positions would not then require any inserts into the data structure.
How many rectangles? How quickly are they created? And destroyed? How do you want to cope with overlaps? Is a rectangle just a boundary, or does it include the interior?