What is a coarse and fine grid search? - performance

I was reading this answer
Efficient (and well explained) implementation of a Quadtree for 2D collision detection
and encountered this paragraph
All right, so actually quadtrees are not my favorite data structure for this purpose. I tend to prefer a grid hierarchy, like a coarse grid for the world, a finer grid for a region, and an even finer grid for a sub-region (3 fixed levels of dense grids, and no trees involved), with row-based optimizations so that a row that has no entities in it will be deallocated and turned into a null pointer, and likewise completely empty regions or sub-regions turned into nulls. While this simple implementation of the quadtree running in one thread can handle 100k agents on my i7 at 60+ FPS, I've implemented grids that can handle a couple million agents bouncing off each other every frame on older hardware (an i3). Also I always liked how grids made it very easy to predict how much memory they'll require, since they don't subdivide cells. But I'll try to cover how to implement a reasonably efficient quadtree.
This type of grid seems intuitive; it sort of sounds like an "N-order" grid, where instead of 4 child nodes, you have N child nodes per parent. N^3 can go much further than 4^3, which allows better precision with potentially (I guess) less branching (since there are many fewer nodes to branch).
I'm a little puzzled because I would intuitively use a single, or maybe 3 std::map with the proper < operator(), to reduce its memory footprint, but I'm not sure it would be so fast, since querying an AABB would mean stacking several accesses that are O(log n).
What exactly are those row-based optimizations he is talking about? Is this type of grid search common?
I have some understanding of a z order curve, and I'm not entirely satisfied with a quadtree.

It's my own quote. But that's based on a common pattern I encountered in my personal experience. Also, terms like "parent" and "child" are ones I'd largely discard when talking about grids. You just got a big 2-dimensional or N-dimensional table/matrix storing stuff. There's not really a hierarchy involved whatsoever -- these data structures are more comparable to arrays than trees.
"Coarse" and "Fine"
On "coarse" and "fine", what I meant there is that "coarse" search queries tend to be cheaper but give more false positives. A coarser grid would be one that is lower in grid resolution (fewer, larger cells). Coarse searches may involve traversing/searching fewer and larger grid cells. For example, say we want to see if an element intersects a point/dot in a gigantic cell (imagine just a 1x1 grid storing everything in the simulation). If the dot intersects the cell, we may get a whole lot of elements returned in that cell but maybe only one or none of them actually intersect the dot.
So a "coarse" query is broad and simple but not very precise at narrowing down the list of candidates (or "suspects"). It may return too many results and still leave a whole lot of processing to do to narrow down what actually intersects the search parameter*.
It's like in those detective shows when they search a database for a possible killer: putting in "white male" might not require much processing to list the results but might give way too many results to properly narrow down the suspects. "Fine" would be the opposite and might require more processing of the database but narrow down the result to just one suspect.
This is a crude and flawed analogy but I hope it helps.
Often the key to broadly optimizing spatial indices before we get into things like memory optimizations whether we're talking spatial hashes or quadtrees is to find a nice balance between "coarse" and "fine". Too "fine" and we might spend too much time traversing the data structure (searching many small cells in a uniform grid, or spending too much time in tree traversal for adaptive grids like quadtrees). Too "coarse" and the spatial index might give back too many results to significantly reduce the amount of time required for further processing. For spatial hashes (a data structure I don't personally like very much but they're very popular in gamedev), there's often a lot of thought and experimentation and measuring involved in determining an optimal cell size to use.
With uniform NxM grids, how "coarse" or "fine" they are (big or small cells and high or low grid resolution) not only impacts search times for a query but can also impact insertion and removal times if the elements are larger than a point. If the grid is too fine, a single large or medium-sized element may have to be inserted into many tiny cells and removed from many tiny cells, using lots of extra memory and processing. If it's too coarse, the element may only have to be inserted and removed to/from one large cell but at costs to the data structure's ability to narrow down the number of candidates returned from a search query to a minimum. Without care, going too "fine/granular" can become very bottlenecky in practice and a developer might find his grid structure using gigabytes of RAM for a modest input size. With tree variants like quadtrees, a similar thing can happen if the maximum allowed tree depth is too high a value causing explosive memory use and processing when the leaf nodes of the quadtree store the tiniest cell sizes (we can even start running into floating-point precision bugs that wreck performance if the cells are allowed to be subdivided to too small a size in the tree).
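To put rough numbers on that trade-off, here is a minimal sketch that counts how many cells a single element's bounding box covers at different grid resolutions. The `Aabb` and `UniformGrid` types, the square world, and the member names are assumptions made for illustration; they don't come from any particular implementation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical axis-aligned bounding box in world units.
struct Aabb { float min_x, min_y, max_x, max_y; };

// Hypothetical uniform grid covering the square world [0, world_size).
struct UniformGrid
{
    float world_size;    // side length of the world
    int cells_per_side;  // grid resolution (N for an NxN grid)

    // Maps a world coordinate to a cell coordinate, clamped to the grid.
    int to_cell(float v) const
    {
        const int c = static_cast<int>(v * cells_per_side / world_size);
        return std::max(0, std::min(c, cells_per_side - 1));
    }

    // Number of cells an element with this box has to be inserted into
    // (and later removed from) if every overlapped cell stores it.
    std::int64_t cells_covered(const Aabb& box) const
    {
        const std::int64_t w = to_cell(box.max_x) - to_cell(box.min_x) + 1;
        const std::int64_t h = to_cell(box.max_y) - to_cell(box.min_y) + 1;
        return w * h;
    }
};
```

For a 50x50 element in a 1000x1000 world, a 10x10 grid touches a single cell, a 100x100 grid touches 36 cells, and a 1000x1000 grid touches 2601 cells for that one element, which is where the extra memory and insertion/removal work of an overly fine grid comes from.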
The essence of accelerating performance with spatial indices is often this sort of balancing act. For example, we typically don't want to apply frustum culling to individual polygons being rendered in computer graphics because that's typically not only redundant with what the hardware already does at the clipping stage, but it's also too "fine/granular" and requires too much processing on its own compared to the time required to just request to render one or more polygons. But we might net huge performance improvements with something a bit "coarser", like applying frustum culling to an entire creature or space ship (an entire model/mesh), allowing us to avoid requesting to render many polygons at once with a single test. So I often use the terms, "coarse" and "fine/granular" frequently in these sorts of discussions (until I find better terminology that more people can easily understand).
Uniform vs. Adaptive Grid
You can think of a quadtree as an "adaptive" grid with adaptive grid cell sizes arranged hierarchically (working from "coarse" to "fine" as we drill down from root to leaf in a single smart and adaptive data structure) as opposed to a simple NxM "uniform" grid.
The adaptive nature of the tree-based structures is very smart and can handle a broad range of use cases (although typically requiring some fiddling of maximum tree depth and/or minimum cell size allowed and possibly how many maximum elements are stored in a cell/node before it subdivides). However, it can be more difficult to optimize tree data structures because the hierarchical nature doesn't lend itself so easily to the kind of contiguous memory layout that our hardware and memory hierarchy is so well-suited to traverse. So very often I find data structures that don't involve trees to be easier to optimize in the same sense that optimizing a hash table might be simpler than optimizing a red-black tree, especially when we can anticipate a lot about the type of data we're going to be storing in advance.
Another reason I tend to favor simpler, more contiguous data structures in lots of contexts is that the performance requirements of a realtime simulation often want not just fast frame rates, but consistent and predictable frame rates. The consistency is important because even if a video game has very high frame rates for most of the game but some part of the game causes the frame rates to drop substantially for even a brief period of time, the player may die and game over as a result of it. It was often very important in my case to avoid these types of scenarios and have data structures largely absent pathological worst-case performance scenarios. In general, I find it easier to predict the performance characteristics of lots of simpler data structures that don't involve an adaptive hierarchy and are kind of on the "dumber" side. Very often I find the consistency and predictability of frame rates to be roughly tied to how easily I can predict the data structure's overall memory usage and how stable that is. If the memory usage is wildly unpredictable and sporadic, I often (not always, but often) find the frame rates will likewise be sporadic.
So I often find better results using grids personally, but if it's tricky to determine a single optimal cell size to use for the grid in a particular simulation context, I just use multiple instances of them: one instance with larger cell sizes for "coarse" searches (say 10x10), one with smaller ones for "finer" searches (say 100x100), and maybe even one with even smaller cells for the "finest" searches (say 1000x1000). If no results are returned in the coarse search, then I don't proceed with the finer searches. I get some balance of the benefits of quadtrees and grids this way.
What I did when I used these types of representations in the past is not to store a single element in all three grid instances. That would triple the memory use of an element entry/node into these structures. Instead, what I did was insert the indices of the occupied cells of the finer grids into the coarser grids, as there are typically far fewer occupied cells than there are a total number of elements in the simulation. Only the finest, highest-resolution grid with the smallest cell sizes would store the element. The cells in the finest grid are analogous to the leaf nodes of a quadtree.
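A rough sketch of that arrangement with just two levels, to make the idea concrete. Everything here is an assumption for illustration: the class name `TwoLevelGrid`, point-sized elements identified by an `int`, a fine resolution that is a multiple of the coarse one, and a `std::vector` per cell (which is convenient for a sketch but not the compact layout discussed elsewhere in this answer). Removal is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical two-level grid over the square world [0, world_size).
// Elements live only in the fine grid; each coarse cell lists the
// indices of the occupied fine cells underneath it.
class TwoLevelGrid
{
public:
    // Assumes fine_res is a multiple of coarse_res.
    TwoLevelGrid(float world_size, int coarse_res, int fine_res)
        : world_size_(world_size), coarse_res_(coarse_res), fine_res_(fine_res),
          coarse_(static_cast<std::size_t>(coarse_res) * coarse_res),
          fine_(static_cast<std::size_t>(fine_res) * fine_res) {}

    void insert(int id, float x, float y)
    {
        const int fx = cell(x), fy = cell(y);
        const int fine_index = fy * fine_res_ + fx;
        std::vector<int>& elements = fine_[fine_index];
        if (elements.empty())
        {
            // First element in this fine cell: register the cell (not the
            // element) with the coarse cell that contains it.
            const int ratio = fine_res_ / coarse_res_;
            coarse_[(fy / ratio) * coarse_res_ + fx / ratio].push_back(fine_index);
        }
        elements.push_back(id);
    }

    // Returns the ids stored in fine cells overlapping the query rectangle.
    // These are candidates only; a precise test is left to the caller.
    std::vector<int> query(float min_x, float min_y, float max_x, float max_y) const
    {
        std::vector<int> result;
        const int fx0 = cell(min_x), fx1 = cell(max_x);
        const int fy0 = cell(min_y), fy1 = cell(max_y);
        const int ratio = fine_res_ / coarse_res_;
        for (int cy = fy0 / ratio; cy <= fy1 / ratio; ++cy)
            for (int cx = fx0 / ratio; cx <= fx1 / ratio; ++cx)
                // An empty coarse cell ends the search here without
                // touching the fine grid at all.
                for (int fine_index : coarse_[cy * coarse_res_ + cx])
                {
                    const int fx = fine_index % fine_res_;
                    const int fy = fine_index / fine_res_;
                    if (fx >= fx0 && fx <= fx1 && fy >= fy0 && fy <= fy1)
                        result.insert(result.end(), fine_[fine_index].begin(),
                                      fine_[fine_index].end());
                }
        return result;
    }

private:
    // Maps a world coordinate to a fine cell coordinate, clamped to the grid.
    int cell(float v) const
    {
        const int c = static_cast<int>(v * fine_res_ / world_size_);
        return std::max(0, std::min(c, fine_res_ - 1));
    }

    float world_size_;
    int coarse_res_, fine_res_;
    std::vector<std::vector<int>> coarse_;  // occupied fine-cell indices
    std::vector<std::vector<int>> fine_;    // element ids
};
```

With `TwoLevelGrid grid(1000.0f, 10, 100);`, a query over an empty part of the world only visits a handful of empty coarse cells and returns immediately, while the memory cost of the coarse level is proportional to the number of occupied fine cells rather than the number of elements.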
The "loose-tight double grid" as I'm calling it in one of the answers to that question is an expansion on this multi-grid idea. The difference is that the finer grid is actually loose and has cell sizes that expand and shrink based on the elements inserted to it, always guaranteeing that a single element, no matter how large or small, needs only be inserted to one cell in the grid. The coarser grid stores the occupied cells of the finer grid leading to two constant-time queries (one in the coarser tight grid, another into the finer loose grid) to return an element list of potential candidates matching the search parameter. It also has the most stable and predictable memory use (not necessarily the minimal memory use because the fine/loose cells require storing an axis-aligned bounding box that expands/shrinks which adds another 16 bytes or so to a cell) and corresponding stable frame rates because one element is always inserted to one cell and doesn't take any additional memory required to store it besides its own element data with the exception of when its insertion causes a loose cell to expand to the point where it has to be inserted to additional cells in the coarser grid (which should be a fairly rare-case scenario).
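The following is only a sketch of how the description above could be read, not the implementation from that answer. It assumes a square world, picks the loose cell from the element's center, uses a `std::vector` per cell for brevity, and leaves out removal and shrinking of the loose bounds. All type and member names (`Rect`, `LooseTightGrid`, `Item`, `LooseCell`) are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { float min_x, min_y, max_x, max_y; };

inline bool overlaps(const Rect& a, const Rect& b)
{
    return a.min_x <= b.max_x && b.min_x <= a.max_x &&
           a.min_y <= b.max_y && b.min_y <= a.max_y;
}

class LooseTightGrid
{
public:
    LooseTightGrid(float world_size, int loose_res, int tight_res)
        : world_size_(world_size), loose_res_(loose_res), tight_res_(tight_res),
          loose_(static_cast<std::size_t>(loose_res) * loose_res),
          tight_(static_cast<std::size_t>(tight_res) * tight_res) {}

    void insert(int id, const Rect& box)
    {
        // The element goes into exactly one loose cell, chosen by its center.
        const int lx = cell((box.min_x + box.max_x) * 0.5f, loose_res_);
        const int ly = cell((box.min_y + box.max_y) * 0.5f, loose_res_);
        const int loose_index = ly * loose_res_ + lx;
        LooseCell& lc = loose_[loose_index];
        if (lc.items.empty())
            lc.bounds = box;
        else
        {
            // Expand the loose cell's bounds to enclose the new element.
            lc.bounds.min_x = std::min(lc.bounds.min_x, box.min_x);
            lc.bounds.min_y = std::min(lc.bounds.min_y, box.min_y);
            lc.bounds.max_x = std::max(lc.bounds.max_x, box.max_x);
            lc.bounds.max_y = std::max(lc.bounds.max_y, box.max_y);
        }
        lc.items.push_back(Item{id, box});

        // Register the loose cell in every tight cell its bounds overlap,
        // skipping tight cells that already reference it.
        const int tx0 = cell(lc.bounds.min_x, tight_res_);
        const int tx1 = cell(lc.bounds.max_x, tight_res_);
        const int ty0 = cell(lc.bounds.min_y, tight_res_);
        const int ty1 = cell(lc.bounds.max_y, tight_res_);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
            {
                std::vector<int>& list = tight_[ty * tight_res_ + tx];
                if (std::find(list.begin(), list.end(), loose_index) == list.end())
                    list.push_back(loose_index);
            }
    }

    // Returns ids of elements whose boxes overlap the query rectangle.
    std::vector<int> query(const Rect& q) const
    {
        std::vector<int> ids;
        std::vector<char> visited(loose_.size(), 0);  // avoids duplicate visits
        const int tx0 = cell(q.min_x, tight_res_), tx1 = cell(q.max_x, tight_res_);
        const int ty0 = cell(q.min_y, tight_res_), ty1 = cell(q.max_y, tight_res_);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                for (int li : tight_[ty * tight_res_ + tx])
                {
                    if (visited[li]) continue;
                    visited[li] = 1;
                    const LooseCell& lc = loose_[li];
                    if (!overlaps(lc.bounds, q)) continue;
                    for (const Item& item : lc.items)
                        if (overlaps(item.box, q)) ids.push_back(item.id);
                }
        return ids;
    }

private:
    struct Item { int id; Rect box; };
    struct LooseCell { Rect bounds{}; std::vector<Item> items; };

    // Maps a world coordinate to a cell coordinate at the given resolution.
    int cell(float v, int res) const
    {
        const int c = static_cast<int>(v * res / world_size_);
        return std::max(0, std::min(c, res - 1));
    }

    float world_size_;
    int loose_res_, tight_res_;
    std::vector<LooseCell> loose_;         // fine, loose grid: owns the elements
    std::vector<std::vector<int>> tight_;  // coarse, tight grid: loose-cell indices
};
```

The property to notice is that `insert` stores the element itself exactly once, no matter how large its box is; only the loose cell's index is ever duplicated, and only when the cell's bounds grow into additional tight cells.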
Multiple Grids For Other Purposes
I'm a little puzzled because I would intuitively use a single, or maybe 3 std::map with the proper < operator(), to reduce its memory footprint, but I'm not sure it would be so fast, since querying an AABB would mean stacking several accesses that are O(log n).
I think that's an intuition many of us have and also probably a subconscious desire to just lean on one solution for everything because programming can get quite tedious and repetitive and it'd be ideal to just implement something once and reuse it for everything: a "one-size-fits-all" t-shirt. But a one-size-fits-all shirt can be poorly tailored to fit our very broad and muscular programmer bodies*. So sometimes it helps to use the analogy of a small, medium, and large size.
This is a very possibly poor attempt at humor on my part to make my long-winded texts less boring to read.
For example, if you are using std::map for something like a spatial hash, then there can be a lot of thought and fiddling around trying to find an optimal cell size. In a video game, one might compromise with something like making the cell size relative to the size of your average human in the game (perhaps a bit smaller or bigger), since lots of the models/sprites in the game might be designed for human use. But it might get very fiddly and be very sub-optimal for teeny things and very sub-optimal for gigantic things. In that case, we might do well to resist our intuitions and desires to just use one solution and use multiple (it could still be the same code but just multiple instances of the same class for the data structure constructed with varying parameters).
As for the overhead of searching multiple data structures instead of a single one, that's something best measured and it's worth remembering that the input sizes of each container will be smaller as a result, reducing the cost of each search and very possibly improving locality of reference. The overhead might exceed the benefits in a hierarchical structure that requires logarithmic search times like std::map (or not, best to just measure and compare), but I tend to use more data structures which do this in constant-time (grids or hash tables). In my cases, I find the benefits far exceeding the additional cost of requiring multiple searches to do a single query, especially when the element sizes vary radically or I want some basic thing resembling a hierarchy with 2 or more NxM grids that range from "coarse" to "fine".
Row-Based Optimizations
As for "row-based optimizations", that's very specific to uniform fixed-sized grids and not trees. It refers to using a separate variable-sized list/container per row instead of a single one for the entire grid. Aside from potentially reducing memory use for empty rows that just turn into nulls without requiring an allocated memory block, it can save on lots of processing and improve memory access patterns.
If we store a single list for the entire grid, then we have to constantly insert and remove from that one shared list as elements move around, particles are born, etc. That could lead to more heap allocations/deallocations growing and shrinking the container but also increases the average memory stride to get from one element in that list to the next which will tend to translate to more cache misses as a result of more irrelevant data being loaded into a cache line. Also these days we have so many cores so having a single shared container for the entire grid may reduce the ability to process the grid in parallel (ex: searching one row while simultaneously inserting to another). It can also lead to more net memory use for the structure since if we use a contiguous sequence like std::vector or ArrayList, those can often store the memory capacity of as many as twice the elements required to reduce the time of insertions to amortized constant time by minimizing the need to reallocate and copy the former elements in linear-time by keeping excess capacity.
By associating a separate medium-sized container per grid row or per column instead of a gigantic one for the entire grid, we can mitigate these costs in some cases*.
This is the type of thing you definitely measure before and after though to make sure it actually improves overall frame rates, and probably attempt in response to a first attempt storing a single list for the entire grid revealing many non-compulsory cache misses in the profiler.
This might beg the question of why we don't use a separate teeny list container for every single cell in the grid. It's a balancing act. If we store that many containers (ex: a million instances of std::vector for a 1000x1000 grid possibly storing very few or no elements each), it would allow maximum parallelism and minimize the stride to get from one element in a cell to the next one in the cell, but that can be quite explosive in memory use and introduce a lot of extra processing and heap overhead.
Especially in my case, my finest grids might store a million cells or more, but I only require 4 bytes per cell. A variable-sized sequence per cell would typically explode to at least something like 24 bytes or more (typically far more) per cell on 64-bit architectures to store the container data (typically a pointer and a couple of extra integers, or three pointers on top of the heap-allocated memory block), but on top of that, every single element inserted to an empty cell may require a heap allocation and compulsory cache miss and page fault and very frequently due to the lack of temporal locality. So I find the balance and sweet spot to be one list container per row typically among my best-measured implementations.
I use what I call a "singly-linked array list" to store elements in a grid row and allow constant-time insertions and removals while still allowing some degree of spatial locality with lots of elements being contiguous. It can be described like this:
struct GridRow
{
    struct Node
    {
        // Element data
        ...

        // Stores the index into the 'nodes' array of the next element
        // in the cell or the next removed element. -1 indicates end of
        // list.
        int next = -1;
    };

    // Stores all the nodes in the row.
    std::vector<Node> nodes;

    // Stores the index of the first removed element or -1
    // if there are no removed elements.
    int free_element = -1;
};
This combines some of the benefits of a linked list using a free list allocator but without the need to manage separate allocator and data structure implementations which I find to be too generic and unwieldy for my purposes. Furthermore, doing it this way allows us to halve the size of a pointer down to a 32-bit array index on 64-bit architectures which I find to be a big measured win in my use cases when the alignment requirements of the element data combined with the 32-bit index don't require an additional 32-bits of padding for the class/struct which is frequently the case for me since I often use 32-bit or smaller integers and 32-bit single-precision floating-point or 16-bit half-floats.
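To show how insertion and removal could work on such a row, here is a sketch that fills in the pieces the outline leaves open. The `int element` payload, the per-row `heads` array holding the first node of each cell, and the `insert`/`remove` signatures are all assumptions added for the example. Removal here searches the cell's list for the element, so only the unlinking step itself is constant-time.

```cpp
#include <vector>

// Sketch of one grid row: every cell in the row shares a single node array.
struct GridRow
{
    struct Node
    {
        int element = -1;  // placeholder element data (e.g. an element id)
        int next = -1;     // next node in the cell, or next removed node;
                           // -1 indicates end of list
    };

    explicit GridRow(int num_cells) : heads(num_cells, -1) {}

    // Pushes 'element' to the front of the cell's list. Constant time,
    // amortized when the vector has to grow.
    void insert(int cell, int element)
    {
        int index;
        if (free_element != -1)
        {
            // Reuse the slot of a previously removed node.
            index = free_element;
            free_element = nodes[index].next;
        }
        else
        {
            index = static_cast<int>(nodes.size());
            nodes.push_back(Node{});
        }
        nodes[index].element = element;
        nodes[index].next = heads[cell];
        heads[cell] = index;
    }

    // Unlinks the first node holding 'element' from the cell and pushes
    // its slot onto the free list. Returns false if it was not found.
    bool remove(int cell, int element)
    {
        int* link = &heads[cell];
        while (*link != -1)
        {
            const int index = *link;
            if (nodes[index].element == element)
            {
                *link = nodes[index].next;
                nodes[index].next = free_element;
                free_element = index;
                return true;
            }
            link = &nodes[index].next;
        }
        return false;
    }

    std::vector<int> heads;   // index of the first node per cell, -1 if empty
    std::vector<Node> nodes;  // all the nodes in the row
    int free_element = -1;    // first removed node, -1 if there are none
};
```

Removed slots are recycled before the vector grows, so a row whose population stays roughly constant stops allocating once it has warmed up, and the 4 bytes per cell in `heads` are the only per-cell cost.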
Unorthodox?
On this question:
Is this type of grid search common?
I am not sure! I tend to struggle a bit with terminology and I'll have to ask people's forgiveness and patience in communication. I started programming from early childhood in the 1980s before the internet was widespread, so I came to rely on inventing a lot of my own techniques and using my own crude terminology as a result. I got my degree in computer science about a decade and a half later when I reached my 20s and corrected some of my terminology and misconceptions but I've had many years just rolling my own solutions. So I am often not sure if other people have come across some of the same solutions or not, and if there are formal names and terms for them or not.
That makes communication with other programmers difficult and very frustrating for both of us at times and I have to ask for a lot of patience to explain what I have in mind. I've made it a habit in meetings to always start off showing something with very promising results which tends to make people more patient with my crude and long-winded explanations of how they work. They tend to give my ideas much more of a chance if I start off by showing results, but I'm often very frustrated with the orthodoxy and dogmatism that can be prevalent in this industry that can sometimes prioritize concepts far more than execution and actual results. I'm a pragmatist at heart so I don't think in terms of "what is the best data structure?" I think in terms of what we can effectively implement personally given our strengths and weaknesses and what is intuitive and counter-intuitive to us and I'm willing to endlessly compromise on the purity of concepts in favor of a simpler and less problematic execution. I just like good, reliable solutions that roll naturally off our fingertips no matter how orthodox or unorthodox they may be, but a lot of my methods may be unorthodox as a result (or not and I might just have yet to find people who have done the same things). I've found this site useful at rare times in finding peers who are like, "Oh, I've done that too! I found the best results if we do this [...]" or someone pointing out like, "What you are proposing is called [...]."
In performance-critical contexts, I kind of let the profiler come up with the data structure for me, crudely speaking. That is to say, I'll come up with some quick first draft (typically very orthodox) and measure it with the profiler and let the profiler's results give me ideas for a second draft until I converge to something both simple and performant and appropriately scalable for the requirements (which may become pretty unorthodox along the way). I'm very happy to abandon lots of ideas since I figure we have to weed through a lot of bad ideas in a process of elimination to come up with a good one, so I tend to cycle through lots of implementations and ideas and have come to become a really rapid prototyper (I have a psychological tendency to stubbornly fall in love with solutions I spent lots of time on, so to counter that I've learned to spend the absolute minimal time on a solution until it's very, very promising).
You can see my exact methodology at work in the very answers to that question where I iteratively converged through lots of profiling and measuring over the course of a few days and prototyping from a fairly orthodox quadtree to that unorthodox "loose-tight double grid" solution that handled the largest number of agents at the most stable frame rates and was, for me anyway, much faster and simpler to implement than all the structures before it. I had to go through lots of orthodox solutions and measure them though to generate the final idea for the unusual loose-tight variant. I always start off with and favor the orthodox solutions and start off inside the box because they're well-documented and understood and just very gently and timidly venture outside, but I do often find myself a bit outside the box when the requirements are steep enough. I'm no stranger to the steepest requirements since in my industry and as a fairly low-level type working on engines, the ability to handle more data at good frame rates often translates not only to greater interactivity for the user but also allows artists to create more detailed content of higher visual quality than ever before. We're always chasing higher and higher visual quality at good frame rates, and that often boils down to a combination of both performance and getting away with crude approximations whenever possible. This inevitably leads to some degree of unorthodoxy with lots of in-house solutions very unique to a particular engine, and each engine tends to have its own very unique strengths and weaknesses as you find comparing something like CryEngine to Unreal Engine to Frostbite to Unity.
For example, I have this data structure I've been using since childhood and I don't know the name of it. It's a straightforward concept and it's just a hierarchical bitset that allows set intersections of potentially millions of elements to be found in as little as a few iterations of simple work as well as traverse millions of elements occupying the set with just a few iterations (less than linear-time requirements to traverse everything in the set just through the data structure itself which returns ranges of occupied elements/set bits instead of individual elements/bit indices). But I have no idea what the name is since it's just something I rolled and I've never encountered anyone talking about it in computer science. I tend to refer to it as a "hierarchical bitset". Originally I called it a "sparse bitset tree" but that seems a tad verbose. It's not a particularly clever concept at all and I wouldn't be surprised or disappointed (actually quite happy) to find someone else discovering the same solution before me but just one I don't find people using or talking about ever. It just expands on the strengths of a regular, flat bitset in rapidly finding set intersections with bitwise AND and rapidly traversing all set bits using FFZ/FFS but reducing the linear-time requirements of both down to logarithmic (with the logarithm base being a number much larger than 2).
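As a rough illustration of the general idea (not the actual structure described above, whose details aren't given here), this sketch uses just two levels: a summary word per 64 leaf words, so one bitwise AND of two summary words can rule out 4096 bits at once. The class name `TwoLevelBitset`, the `for_each_common` function, and the portable `lowest_set_bit` helper standing in for an FFS instruction are all assumptions; a real implementation would likely use more levels and a hardware bit-scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for an FFS / count-trailing-zeros instruction.
// 'word' must be nonzero.
inline int lowest_set_bit(std::uint64_t word)
{
    int index = 0;
    while ((word & 1) == 0) { word >>= 1; ++index; }
    return index;
}

// Hypothetical two-level bitset. Bit i of summary_[s] is set when
// leaves_[s * 64 + i] has at least one bit set.
class TwoLevelBitset
{
public:
    explicit TwoLevelBitset(std::size_t num_bits)
        : leaves_((num_bits + 63) / 64, 0),
          summary_((leaves_.size() + 63) / 64, 0) {}

    void set(std::size_t bit)
    {
        const std::size_t leaf = bit / 64;
        leaves_[leaf] |= std::uint64_t{1} << (bit % 64);
        summary_[leaf / 64] |= std::uint64_t{1} << (leaf % 64);
    }

    // Calls visit(index) for every bit set in both sets, in increasing
    // order, skipping whole blocks whose summary bits don't overlap.
    template <class Visit>
    static void for_each_common(const TwoLevelBitset& a, const TwoLevelBitset& b,
                                Visit visit)
    {
        const std::size_t n = std::min(a.summary_.size(), b.summary_.size());
        for (std::size_t s = 0; s < n; ++s)
        {
            // AND of the summaries: only leaf words occupied in both remain.
            std::uint64_t top = a.summary_[s] & b.summary_[s];
            while (top != 0)
            {
                const std::size_t leaf = s * 64 + lowest_set_bit(top);
                top &= top - 1;  // clear the lowest set bit
                std::uint64_t word = a.leaves_[leaf] & b.leaves_[leaf];
                while (word != 0)
                {
                    visit(leaf * 64 + lowest_set_bit(word));
                    word &= word - 1;
                }
            }
        }
    }

private:
    std::vector<std::uint64_t> leaves_;   // one bit per element
    std::vector<std::uint64_t> summary_;  // one bit per leaf word
};
```

For two sparse sets of a million bits each, the loop above reads 245 summary words per set and only touches leaf words that are occupied in both, instead of scanning all 15625 leaf words of a flat bitset.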
Anyway, I wouldn't be surprised if some of these solutions are very unorthodox, but also wouldn't be surprised if they are reasonably orthodox and I've just failed to find the proper name and terminology for these techniques. A lot of the appeal of sites like this for me is a lonely search for someone else who has used similar techniques and to try to find proper names and terms for them often to end in frustration. I'm also hoping to improve on my ability to explain them although I've always been so bad and long-winded here. I find using pictures helps me a lot because I find human language to be incredibly riddled with ambiguities. I'm also fond of deliberately imprecise figurative language which embraces and celebrates the ambiguities such as metaphor and analogy and humorous hyperbole, but I've not found it's the type of thing programmers tend to appreciate so much due to its lack of precision. But I've never found precision to be that important so long as we can convey the meaty stuff and what is "cool" about an idea while they can draw their own interpretations of the rest. Apologies for the whole explanation but hopefully that clears some things up about my crude terminology and the overall methodology I use to arrive at these techniques. English is also not my first language so that adds another layer of convolution where I have to sort of translate my thoughts into English words and struggle a lot with that.

Related

Spatial partition data structure that is better suited for a placement system than a quadtree

I want to know if there is a spatial partition data structure that is better suited for a placement system than a quadtree. By better suited I mean for the data structure to have O(log n) time complexity or less when search querying it and to use less memory. I want to know what data structure can organize my data in such a way that querying it is faster than a quadtree. It's all 2D and it's all rectangles which should never overlap. I currently have a quadtree done and it works great and it's fast, I am just curious to know if there is a data structure that uses fewer resources and is faster than a quadtree for this case.
The fastest is probably brute forcing it on a GPU.
Also, it is really worth trying out different implementations, I found performance differences between implementations to be absolutely wild.
Another tip: measure performance with realistic data (potentially multiple scenarios), data and usage characteristics can have enormous influence on index performance.
Some of these characteristics are (you already mentioned "rectangle data" and "2D"):
How big is your dataset?
How much overlap do you have between rectangles?
Do you need to update data often?
Do you have a large variance between small and large rectangles?
Do you have dense clusters of rectangles?
How large is the area you cover?
Are your coordinates integers or floats?
Is it okay if the execution time of operations varies or should it be consistent?
Can you pre-load data? Do you need to update the index?
Quadtrees can be a good initial choice. However they have some problems, e.g.:
They can get very deep (and inefficient) with dense clusters
They don't work very well when there is a lot of overlap between rectangles
Update operations may take longer if nodes are merged or split.
Another popular choice are R-Trees (I found R-star-Trees to be the best). Some properties:
Balanced (good for predictable search time but bad because update times can be very unpredictable due to rebalancing)
Quite complex to implement.
R-Trees can also be preloaded (takes longer but allows queries to be faster), this is called STR-Tree (Sort-tile-recurse-R-Tree)
It may be worth looking at the PH-Tree (disclaimer: self advertisement):
Similar to a quadtree but depth is limited to the bit-width of the data (usually 32 or 64 (bits)).
No rebalancing. Merging or splitting is guaranteed to move only one entry (=cheap)
Prefers integer coordinates but works reasonably well with floating point data as well.
Implementations can be quite space efficient (they don't need to store all bits of the coordinates). However, not all implementations support that. Also, the effect varies and is strongest with integer coordinates.
I made some measurements here. The measurements include a 2D dataset where I store line segments from OpenStreetMap as boxes, the relevant diagrams are labeled with "OSM-R" (R for rectangles).
Fig. 3a shows timings for inserting a given amount of data into a tree
Fig. 9a shows memory usage
Fig. 15a shows query times for queries that return on average 1000 entries
Fig. 17a shows how query performance changes when varying the query window size (on an index with 1M entries)
Fig. 41a shows average times for updating an index with 1M entries
PH/PHM is the PH-Tree, PHM has coordinates converted to integer before storing them
RSZ/RSS are two different R-Tree implementations
STR is an STR-Tree
Q(T)Z is a quadtree
In case you are using Java, have a look at my spatial index collection.
Similar collections exist for other programming languages.

Quadtree performance issue

The problem I have is that a game I work on uses a quadtree for fast proximity detection, used for range checks when weapons are firing. I'm using the classic "4 wide" quadtree, meaning that I subdivide when I attempt to add a 5th child node to an already full parent node.
Initially the set of available targets was fairly evenly spread out, so the quadtree worked very well. Due to design changes, we now get clusters of large numbers of enemies in a relatively small space, leading to performance problems because the quadtree becomes significantly unbalanced.
There are two possible solutions that occur to me, either modify the quadtree to handle this, or switch to an alternative representation.
The only other representation I'm familiar with is a spatial hash, and not very familiar at that. From what I understand, this risks suffering the same problem since the cluster would wind up in a relatively small number of hash buckets. From what I know of it, a BSP is a possible solution that will deal with the uneven distribution better than a quadtree or spatial hash.
No fair, I know, there are actually three questions now.
Are there any modifications I can make to the quadtree, e.g. increasing the "width" of nodes that would help deal with this?
Is it even worth my time to consider a spatial hash?
Is a BSP, or some other data structure a better bet to deal with the uneven distribution?
I usually use quadtree with at least 10 entries per node, but you'll have to try it out.
I have no experience with spatial hashing.
Other structures you could look into are:
KD-Trees: They are quite simple to implement and are also good for neighbour search, but get slower with strong clustering. They are a bit slow to update and may get imbalanced.
R*Tree: More complex, very good with neighbour search, but even slower to update than KD-Trees. They won't get imbalanced because of automatic rebalancing. Rebalancing is mostly fast, but in extreme cases it can occasionally slow things down further.
PH-Tree: Quite complex to implement. Good neighbour search. Has very good update speed (like quadtree), maximum depth is limited by the bit width of your coordinates (usually 32 or 64bit), so they can't really get imbalanced. Scales very well with large datasets (1 million and more), I have little experience with small datasets.
If you are using Java, I have Apache licensed versions available here (R*Tree) and here (PH-Tree).

Proper Data Structure Choice for Collision System

I am looking to implement a 2D top-down collision system, and was hoping for some input as to the likely performance between a few different ideas. For reference I expect the number of moving collision objects to be in the dozens, and the static collision objects to be in the hundreds.
The first idea is border-line brute force (or maybe not so border-line). I would store two lists of collision objects in a collision system. One list would be dynamic objects, the other would include both dynamic and static objects (each dynamic would be in both lists). Each frame I would loop through the dynamic list and pass each object the larger list, so it could find anything it may run into. This will involve a lot of unnecessary calculations for any reasonably sized loaded area but I am using it as a sort of baseline because it would be very easy to implement.
The second idea is to have a single list of all collision objects, and a 2D array of either ints or floats representing the loaded area. Each element in the array would represent a physical location, and each object would have a size value. Each time an object moved, it would subtract its size value from its old location and add it to its new location. The objects would have to access elements in the array before they moved to make sure there was room in their new location, but that would be fairly simple to do. Besides the fact that I have a very public, very large array, I think it would perform fairly well. I could also implement with a boolean array, simply storing if a location is full or not, but I don't see any advantage to this over the numeric storage.
The third idea I had was less well formed. A month or two ago I read about a two-dimensional, rectangle-based data structure (may have been a tree, I don't remember) that would be able to keep elements sorted by position. Then I would only have to pass the dynamic objects their small neighborhood of objects for update. I was wondering if anyone had any idea what this data structure might be, so I could look more into it, and if so, how the per-frame sorting of it would affect performance relative to the other methods.
Really I am just looking for ideas on how these would perform, and any pitfalls I am likely overlooking in any of these. I am not so much worried about the actual detection, as the most efficient way to make the objects talk to one another.
You're not talking about a lot of objects in this case. Honestly, you could probably brute force it and probably be fine for your application, even in mobile game development. With that in mind, I'd recommend you keep it simple but throw a bit of optimization on top for gravy. Spatial hashing with a reasonable cell size is the way I'd go here -- relatively reasonable memory use, decent speedup, and not that bad as far as complexity of implementation goes. More on that in a moment!
You haven't said what the representation of your objects is, but in any case you're likely going to end up with a typical "broad phase" and "narrow phase" (like a physics engine) -- the "broad phase" consisting of a false-positives "what could be intersecting?" query and the "narrow phase" brute forcing out the resulting potential intersections. Unless you're using things like binary space partitioning trees for polygonal shapes, you're not going to end up with a one-phase solution.
As mentioned above, for the broad phase I'd use spatial hashing. Basically, you establish a grid and mark down what's in touch with each grid. (It doesn't have to be perfect -- it could be what axis-aligned bounding boxes are in each grid, even.) Then, later you go through the relevant cells of the grid and check if everything in each relevant cell is actually intersecting with anything else in the cell.
Trick is, instead of having an array, have a hash table keyed by grid cell. That way you're only taking up space for cells that actually have something in them. (This is not a substitute for a sensibly sized grid -- you want your grid to be coarse enough to not have an object in a ridiculous number of cells because that takes memory, but you want it to be fine enough to not have all objects in a few cells because that doesn't save much time.) Chances are by visual inspection, you'll be able to figure out what a good grid size is.
One additional step to spatial hashing... if you want to save memory, throw away the indices that you'd normally verify in a hash table. False positives only cost CPU time, and if you're hashing correctly, it's not going to turn out to be much, but it can save you a lot of memory.
So:
When you update objects, update which grids they're probably in. (Again, it's good enough to just use a bounding box -- e.g. a square or rectangle around the object.) Add the object to the hash table for each cell it's in. (E.g. If you're in cell 5,4, that hashes to the 17th entry of the hash table. Add it to that entry of the hash table and throw away the 5,4 data.) Then, to test collisions, go through the relevant cells in the hash table (e.g. the entire screen's worth of cells if that's what you're interested in) and see what objects inside of each cell collide with other objects inside of each cell.
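The steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's implementation: `CELL_SIZE`, `NUM_BUCKETS`, the hash multipliers (arbitrary large primes) and all function names are assumptions, and objects are assumed to be axis-aligned boxes given as `(min_x, min_y, max_x, max_y)`. The cell coordinates are discarded after hashing, so two distant cells may share a bucket; the box test afterwards filters out those false positives.

```python
from collections import defaultdict
from itertools import combinations

CELL_SIZE = 64.0      # assumed cell size; tune it for your object sizes
NUM_BUCKETS = 1024    # assumed hash table size


def cells_for(box, cell_size=CELL_SIZE):
    """Yield every (cx, cy) cell overlapped by a box (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    for cx in range(int(min_x // cell_size), int(max_x // cell_size) + 1):
        for cy in range(int(min_y // cell_size), int(max_y // cell_size) + 1):
            yield cx, cy


def bucket_index(cx, cy, num_buckets=NUM_BUCKETS):
    """Hash a cell to a bucket; the cell coordinates themselves are not kept."""
    return ((cx * 73856093) ^ (cy * 19349663)) % num_buckets


def overlaps(a, b):
    """Axis-aligned box overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def colliding_pairs(boxes):
    """Return the set of index pairs (i, j), i < j, whose boxes overlap."""
    buckets = defaultdict(list)
    for i, box in enumerate(boxes):
        seen = set()  # avoid adding an object twice to the same bucket
        for cx, cy in cells_for(box):
            b = bucket_index(cx, cy)
            if b not in seen:
                seen.add(b)
                buckets[b].append(i)

    pairs = set()
    for members in buckets.values():
        # Only objects sharing a bucket are compared with each other.
        for i, j in combinations(members, 2):
            if overlaps(boxes[i], boxes[j]):
                pairs.add((i, j))
    return pairs
```

In a real game the box test would be followed by whatever exact narrow-phase test the object shapes need.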
Compared to the solutions above:
Not brute forcing, so it takes less time.
This has some commonality with the "2D array" method mentioned because, after all, we're imposing a "grid" (or 2D array) over the represented space, however we're doing it in a way less prone to accuracy errors (since it's only used for a broad-phase that is conservative). Additionally, the memory requirements are lessened by the zealous data reduction in hash tables.
kd, sphere, X, BSP, R, and other "TLA"-trees are almost always quite nontrivial to implement correctly and test and, even after all that effort, can end up being much slower than you'd expect. You don't need that sort of complexity for a few hundred objects normally.
Implementation note:
Each node in the spatial hash table will ultimately be a linked list. I recommend writing your own linked list with careful allocations. Each node needn't take up more than 8 bytes (if you're using C/C++) and should use a pooled allocation scheme so you're almost never allocating or freeing memory. Relying on the built-in allocator will likely cripple performance.
First thing, I am but a noob, I am working my way through the 3dbuzz xna extreme 101 videos, and we are just now covering a system that uses static lists of each different type of object, when updating an object you only check against the list/s of things it is supposed to collide with.
So you only check enemy collisions against the player or the player's bullets, not other enemies etc.
So there is a static list of each type of game object, then each gamenode has its own collision list (edit: a list of nodes) containing only the types it can hit.
Sorry if it's not clear what I mean, I'm still finding my feet.
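A rough sketch of the per-type list idea described above, with made-up names: the `COLLIDES_WITH` table, the type names and the dictionary layout of the objects are all assumptions for illustration, not something taken from the videos mentioned.

```python
from collections import defaultdict

# Hypothetical rule table: each type lists the types it is tested against.
# Types that are missing as keys (e.g. "enemy") never start a test themselves.
COLLIDES_WITH = {
    "player": ["enemy", "enemy_bullet"],
    "player_bullet": ["enemy"],
}


def overlaps(a, b):
    """Axis-aligned box overlap test on (min_x, min_y, max_x, max_y) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_hits(objects):
    """Return (name, other_name) for every overlapping pair the rules allow."""
    by_type = defaultdict(list)  # one list per object type
    for obj in objects:
        by_type[obj["type"]].append(obj)

    hits = []
    for kind, targets in COLLIDES_WITH.items():
        for obj in by_type[kind]:
            for target_kind in targets:
                for other in by_type[target_kind]:
                    if overlaps(obj["box"], other["box"]):
                        hits.append((obj["name"], other["name"]))
    return hits
```

Enemy against enemy pairs are never tested here, because no rule asks for them.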

How to subdivide a 2d game world for better collision detection

I'm developing a game which features a sizeable square 2d playing area. The gaming area is tileless with bounded sides (no wrapping around). I am trying to figure out how I can best divide up this world to increase the performance of collision detection. Rather than checking each entity for collision with all other entities I want to only check nearby entities for collision and obstacle avoidance.
I have a few special concerns for this game world...
I want to be able to use a large number of entities in the game world at once. However, a percentage of entities won't collide with entities of the same type. For example, projectiles won't collide with other projectiles.
I want to be able to use a large range of entity sizes. I want there to be a very large size difference between the smallest entities and the largest.
There are very few static or non-moving entities in the game world.
I'm interested in using something similar to what's described in the answer here: Quadtree vs Red-Black tree for a game in C++?
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
I'm mostly looking for any good algorithms or ideas that will help reduce the number of collision detection and obstacle avoidance calculations.
If I were you I'd start off by implementing a simple BSP (binary space partition) tree. Since you are working in 2D, bounding box checks are really fast. You basically need three classes: CBspTree, CBspNode and CBspCut (not really needed)
CBspTree has one root node instance of class CBspNode
CBspNode has an instance of CBspCut
CBspCut represents how you cut a set into two disjoint sets. This can neatly be solved by introducing polymorphism (e.g. CBspCutX or CBspCutY or some other cutting line). CBspCut also has two CBspNode instances
The interface towards the divided world will be through the tree class and it can be a really good idea to create one more layer on top of that, in case you would like to replace the BSP solution with e.g. a quad tree. Once you're getting the hang of it. But in my experience, a BSP will do just fine.
There are different strategies for storing your items in the tree. What I mean by that is that you can choose to have e.g. some kind of container in each node that contains references to the objects occupying that area. This means though (as you are asking yourself) that large items will occupy many leaves, i.e. there will be many references to large objects, while very small items will show up in single leaves.
In my experience this doesn't have that large impact. Of course it matters, but you'd have to do some testing to check if it's really an issue or not. You would be able to get around this by simply leaving those items at branched nodes in the tree, i.e. you will not store them on "leaf level". This means you will find those objects quick while traversing down the tree.
When it comes to your first question: if you are only going to use this subdivision for collision testing and nothing else, I suggest that things that can never collide are never inserted into the tree. A missile, for example, as you say, can't collide with another missile, which would mean that you don't even have to store the missile in the tree.
However, you might want to use the BSP for other things as well, you didn't specify that but keep that in mind (for picking objects with e.g. the mouse). Otherwise I propose that you store everything in the BSP, and resolve the collision later on. Just ask the BSP for a list of objects in a certain area to get a limited set of possible collision candidates and perform the check after that (assuming objects know what they can collide with, or some other external mechanism).
If you want to speed up things, you also need to take care of merge and split, i.e. when things are removed from the tree, a lot of nodes will become empty or the number of items below some node level will decrease below some merge threshold. Then you want to merge two subtrees into one node containing all items. Splitting happens when you insert items into the world. So when the number of items exceed some splitting threshold you introduce a new cut, which splits the world in two. These merge and split thresholds should be two constants that you can use to tune the efficiency of the tree.
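To make the split and merge thresholds concrete, here is a small sketch under simplifying assumptions: items are plain `(x, y)` points, cuts alternate between the x and y axis at the median, and `SPLIT_THRESHOLD`, `MERGE_THRESHOLD` and the `Node` class are invented for the example (they are not the CBsp* classes named above). Counting items on every removal is done the naive way to keep the sketch short.

```python
SPLIT_THRESHOLD = 4   # assumed: split a leaf holding more items than this
MERGE_THRESHOLD = 2   # assumed: merge a subtree holding fewer items than this


class Node:
    def __init__(self, depth=0):
        self.depth = depth
        self.items = []               # points stored while this node is a leaf
        self.cut = None               # (axis, value) once the node is split
        self.low = self.high = None   # the two halves created by the cut

    def is_leaf(self):
        return self.cut is None

    def count(self):
        if self.is_leaf():
            return len(self.items)
        return self.low.count() + self.high.count()

    def _child_for(self, p):
        axis, value = self.cut
        return self.low if p[axis] < value else self.high

    def insert(self, p):
        if self.is_leaf():
            self.items.append(p)
            if len(self.items) > SPLIT_THRESHOLD:
                self._split()
        else:
            self._child_for(p).insert(p)

    def remove(self, p):
        if self.is_leaf():
            self.items.remove(p)
            return
        self._child_for(p).remove(p)
        if self.count() < MERGE_THRESHOLD:
            self._merge()

    def _split(self):
        axis = self.depth % 2                    # alternate x and y cuts
        values = sorted(p[axis] for p in self.items)
        value = values[len(values) // 2]         # cut at the median
        if value == values[0]:
            return                               # cut would not separate anything
        items, self.items = self.items, []
        self.cut = (axis, value)
        self.low, self.high = Node(self.depth + 1), Node(self.depth + 1)
        for p in items:
            self._child_for(p).items.append(p)

    def _collect(self):
        if self.is_leaf():
            return list(self.items)
        return self.low._collect() + self.high._collect()

    def _merge(self):
        self.items = self._collect()             # gather before dropping the cut
        self.cut = None
        self.low = self.high = None
```

Keeping the merge threshold well below the split threshold avoids a node flipping between the two states when its item count hovers around a single value.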
Merge and split are mainly used to keep the tree balanced and to make sure that it works as efficiently as it can according to its specifications. This is really what you need to worry about. Moving things from one location to another, and thus updating the tree, is fast in my opinion. But merging and splitting might become expensive if you do it too often.
This can be avoided by introducing some kind of lazy merge and split system, i.e. you have some kind of dirty flagging or modify count. Batch up all operations that can be batched, i.e. moving 10 objects and inserting 5 might be one batch. Once that batch of operations is finished, you check if the tree is dirty and then you do the needed merge and/or split operations.
Post some comments if you want me to explain further.
Cheers !
Edit
There are many things that can be optimized in the tree. But as you know, premature optimization is the root of all evil. So start off simple. For example, you might create some generic callback system that you can use while traversing the tree. This way you don't have to query the tree to get a list of objects that matched the bounding box "question"; instead you can just traverse down the tree and execute that callback each time you hit something. "If this bounding box I'm providing intersects you, then execute this callback with these parameters"
You most definitely want to check this list of collision detection resources from gamedev.net out. It's full of resources with game development conventions.
For other than collision detection only, check their entire list of articles and resources.
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
Use a quad tree. For objects that exist in multiple areas you have a few options:
Store the object in both branches, all the way down. Everything ends up in leaf nodes but you may end up with a significant number of extra pointers. May be appropriate for static things.
Split the object on the zone border and insert each part in their respective locations. Creates a lot of pain and isn't well defined for a lot of objects.
Store the object at the lowest point in the tree you can. Sets of objects now exist in leaf and non-leaf nodes, but each object has one pointer to it in the tree. Probably best for objects that are going to move.
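The third option can be sketched like this. It is an illustration under assumptions rather than a reference implementation: `MAX_DEPTH`, the `QuadNode` class and its methods are hypothetical names, and boxes are `(min_x, min_y, max_x, max_y)` tuples. An object sinks down the tree only as long as one child quadrant contains it completely, so a large object, or one straddling a border, stays in an inner node with a single reference.

```python
MAX_DEPTH = 6  # assumed depth limit


def overlaps(a, b):
    """Axis-aligned box overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class QuadNode:
    def __init__(self, x, y, size, depth=0):
        self.x, self.y, self.size, self.depth = x, y, size, depth
        self.objects = []      # (obj, box) pairs stored at this level
        self.children = None   # created lazily, four slots

    def _child_containing(self, box):
        """Return (index, (x, y, size)) of the quadrant fully containing box."""
        half = self.size / 2
        min_x, min_y, max_x, max_y = box
        for i in range(4):
            cx = self.x + (i % 2) * half
            cy = self.y + (i // 2) * half
            if (cx <= min_x and max_x <= cx + half
                    and cy <= min_y and max_y <= cy + half):
                return i, (cx, cy, half)
        return None

    def insert(self, obj, box):
        """Store obj at the lowest node that contains box; return that node."""
        if self.depth < MAX_DEPTH:
            found = self._child_containing(box)
            if found is not None:
                i, (cx, cy, half) = found
                if self.children is None:
                    self.children = [None] * 4
                if self.children[i] is None:
                    self.children[i] = QuadNode(cx, cy, half, self.depth + 1)
                return self.children[i].insert(obj, box)
        self.objects.append((obj, box))
        return self

    def query(self, box, out=None):
        """Collect every stored object whose box overlaps the query box."""
        out = [] if out is None else out
        for obj, obj_box in self.objects:
            if overlaps(box, obj_box):
                out.append(obj)
        for child in self.children or []:
            if child is not None and overlaps(
                    box, (child.x, child.y,
                          child.x + child.size, child.y + child.size)):
                child.query(box, out)
        return out
```

Because `insert` returns the node that holds the object, a moving entity can remember its node and be removed from it directly before being re-inserted.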
By the way, the reason you're using a quad tree is because it's really really easy to work with. You don't have any heuristic based creation like you might with some BSP implementations. It's simple and it gets the job done.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
There will be overhead to keeping your entities in the correct spots in the tree every time they move, yes, and it can be significant. But the whole point is that you're doing much much less work in your collision code. Even though you're adding some overhead with the tree traversal and update it should be much smaller than the overhead you just removed by using the tree at all.
Obviously depending on the number of objects, size of game world, etc etc the trade off might not be worth it. Usually it turns out to be a win, but it's hard to know without doing it.
There are lots of approaches. I'd recommend setting some specific goals (e.g., x collision tests per second with a ratio of y between smallest and largest entities), and doing some prototyping to find the simplest approach that achieves those goals. You might be surprised how little work you have to do to get what you need. (Or it might be a ton of work, depending on your particulars.)
Many acceleration structures (e.g., a good BSP) can take a while to set up and thus are generally inappropriate for rapid animation.
There's a lot of literature out there on this topic, so spend some time searching and researching to come up with a list of candidate approaches. Mock them up and profile.
I'd be tempted just to overlay a coarse grid over the play area to form a 2D hash. If the grid is at least the size of the largest entity then you only ever have 9 grid squares to check for collisions and it's a lot simpler than managing quad-trees or arbitrary BSP trees. The overhead of determining which coarse grid square you're in is typically just 2 arithmetic operations and when a change is detected the grid just has to remove one reference/ID/pointer from one square's list and add the same to another square.
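A minimal sketch of that coarse grid, assuming each entity is tracked by its centre point and that the cell size is at least the size of the largest entity. `CoarseGrid`, `move` and `nearby` are made-up names for the example.

```python
from collections import defaultdict


class CoarseGrid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (cx, cy) -> entities in that square
        self.where = {}                 # entity -> its current (cx, cy)

    def _cell(self, x, y):
        # Two arithmetic operations decide which square a position is in.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def move(self, entity, x, y):
        """Insert the entity or update its square after it has moved."""
        new = self._cell(x, y)
        old = self.where.get(entity)
        if old != new:
            if old is not None:
                self.cells[old].discard(entity)   # drop from the old square
            self.cells[new].add(entity)           # add to the new square
            self.where[entity] = new

    def nearby(self, entity):
        """Entities in the 3x3 block of squares around the entity."""
        cx, cy = self.where[entity]
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        found.discard(entity)
        return found
```

The set returned by `nearby` is only a candidate list; the actual overlap test still has to be run against each candidate.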
Further gains can be had from keeping the projectiles out of the grid/tree/etc lookup system - since you can quickly determine where the projectile would be in the grid, you know which grid squares to query for potential collidees. If you check collisions against the environment for each projectile in turn, there's no need for the other entities to then check for collisions against the projectiles in reverse.

Efficient reordering of large dataset to maximize memory cache effectiveness

I've been working on a problem which I thought people might find interesting (and perhaps someone is aware of a pre-existing solution).
I have a large dataset consisting of a long list of pairs of pointers to objects, something like this:
[
(a8576, b3295),
(a7856, b2365),
(a3566, b5464),
...
]
There are way too many objects to keep in memory at any one time (potentially hundreds of gigabytes), so they need to be stored on disk, but can be cached in memory (probably using an LRU cache).
I need to run through this list processing every pair, which requires that both objects in the pair be loaded into memory (if they aren't already cached there).
So, the question: is there a way to reorder the pairs in the list to maximize the effectiveness of an in-memory cache (in other words: minimize the number of cache misses)?
Notes
Obviously, the re-ordering algorithm should be as fast as possible, and shouldn't depend on being able to have the entire list in memory at once (since we don't have enough RAM for that) - but it could iterate over the list several times if necessary.
If we were dealing with individual objects, not pairs, then the simple answer would be to sort them. This obviously won't work in this situation because you need to consider both elements in the pair.
The problem may be related to that of finding a minimum graph cut, but even if the problems are equivalent, I don't think solutions to min-cut meet
My assumption is that the heuristic would stream the data off the disk, and write it back in chunks in a better order. It may need to iterate over this several times.
Actually it may not just be pairs, it could be triplets, quadruplets, or more. I'm hoping that an algorithm that does this for pairs can be easily generalized.
Your problem is related to a similar one for computer graphics hardware:
When rendering indexed vertices in a triangle mesh, typically the hardware has a cache of most recently transformed vertices (~128 the last time I had to worry about it, but suspect the number is larger these days). Vertices not cached need a relatively expensive transform operation to calculate. "Mesh optimisation" to restructure triangle meshes to optimise cache usage used to be a pretty hot research topic. Googling "vertex cache optimisation" (or optimization :^) might find you some interesting material relevant to your problem. As other posters suggest, I suspect doing this effectively will depend on exploiting any inherent coherence in your data.
Another thing to bear in mind: as an LRU cache becomes overloaded it can be well worth changing to an MRU replacement strategy to at least hold some of the items in memory (rather than turning over the entire cache each pass). I seem to remember John Carmack has written some good material on this subject in connection with Direct3D texture caching strategies.
For a start, you could mmap the list. That works if there's enough address space, not memory, e.g. on 64-bit CPUs. This makes it easier to access the elements in order.
You could sort that list according to a minimum distance in cache which considers both elements, which works well if the objects are in a contiguous space. The sorting function could be something like: compare (a, b) to (c, d) = |a - c| + |b - d| (which is a Manhattan distance). Then you pull in slices of the object store and process according to the list.
EDIT: fixed a mistake in the distance.
Even though you're not just sorting this list, the general pattern of a multiway merge sort might be applicable - that is, consider some kind of (possibly recursive) breakdown of the set into smaller sets that can be dealt with in memory separately, and then a second phase where small chunks of the previously dealt-with sets can all be combined together. Even not knowing the specific nature of what you're doing with the pairs, it's safe to say that many algorithmic problems are made much more straightforward when you're dealing with sorted data (including graph problems, which might be what you have on your hands here).
I think the answer to this question is going to depend very heavily on exactly the access pattern of the pair of objects. As you said, just sorting the pointers would be best in a simple, non-paired case. In a more complex case it may still make sense to sort by one of the halves of the pair if the pattern is such that locality for those values is more important (if, for example, these are key/value pairs and you are doing a lot of searches, locality for the keys is infinitely more important than for the values).
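Since the outcome depends on the access pattern, one way to compare candidate orderings is to replay them against a simulated LRU cache and count the misses. The sketch below does only that; `count_misses` is a hypothetical helper and the tiny pair list is made-up data, so it says nothing about how a real dataset would behave.

```python
from collections import OrderedDict


def count_misses(pairs, cache_size):
    """Replay the pairs in order against an LRU cache and count the misses."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    misses = 0
    for pair in pairs:      # also works for triplets or longer tuples
        for key in pair:
            if key in cache:
                cache.move_to_end(key)         # mark as most recently used
            else:
                misses += 1                    # object must be loaded from disk
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used
    return misses


# Made-up example: the same four pairs in two different orders.
pairs = [(1, "x"), (2, "y"), (1, "x"), (2, "y")]
by_first = sorted(pairs, key=lambda p: p[0])
print(count_misses(pairs, 2), count_misses(by_first, 2))
```

The same function can score any reordering heuristic on a sample of the real list before committing to a full pass over the data on disk.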
So, really, my answer is that this question can't be answered in a general case.
For storing your structure, what you actually want is probably a B-tree. These are designed for what you're talking about--keeping track of large collections where you don't want to (or can't) keep the whole thing in memory.

Resources