Pattern name for flippable data structure? - algorithm

I'm trying to think of a naming convention that accurately conveys what's going on within a class I'm designing. On a secondary note, I'm trying to decide between two almost-equivalent user APIs.
Here's the situation:
I'm building a scientific application, where one of the central data structures has three phases: 1) accumulation, 2) analysis, and 3) query execution.
In my case, it's a spatial modeling structure, internally using a KDTree to partition a collection of points in 3-dimensional space. Each point describes one or more attributes of the surrounding environment, with a certain level of confidence about the measurement itself.
After adding (a potentially large number of) measurements to the collection, the owner of the object will query it to obtain an interpolated measurement at a new data point somewhere within the applicable field.
The API will look something like this (the code is in Java, but that's not really important; the code is divided into three sections, for clarity):
// SECTION 1:
// Create the aggregation object, and get the zillion objects to insert...
ContinuousScalarField field = new ContinuousScalarField();
Collection<Measurement> measurements = getMeasurementsFromSomewhere();
// SECTION 2:
// Add all of the zillion objects to the aggregation object...
// Each measurement contains its xyz location, the quantity being measured,
// and a numeric value for the measurement. For example, something like
// "68 degrees F, plus or minus 0.5, at point 1.23, 2.34, 3.45"
for (Measurement m : measurements) {
    field.add(m);
}
// SECTION 3:
// Now the user wants to ask the model questions about the interpolated
// state of the model. For example, "what's the interpolated temperature
// at point (3, 4, 5)?"
Point3d p = new Point3d(3, 4, 5);
Measurement result = field.interpolateAt(p);
For my particular problem domain, it will be possible to perform a small amount of incremental work (partitioning the points into a balanced KDTree) during SECTION 2.
And there will be a small amount of work (performing some linear interpolations) that can occur during SECTION 3.
But there's a huge amount of work (constructing a kernel density estimator and performing a Fast Gauss Transform, using Taylor series and Hermite functions, but that's totally beside the point) that must be performed between sections 2 and 3.
Sometimes in the past, I've just used lazy-evaluation to construct the data structures (in this case, it'd be on the first invocation of the "interpolateAt" method), but then if the user calls the "field.add()" method again, I have to completely discard those data structures and start over from scratch.
In other projects, I've required the user to explicitly call an "object.flip()" method, to switch from "append mode" into "query mode". The nice thing about a design like this is that the user has better control over the exact moment when the hard-core computation starts. But it can be a nuisance for the API consumer to keep track of the object's current mode. And besides, in the standard use case, the caller never adds another value to the collection after starting to issue queries; data-aggregation almost always fully precedes query preparation.
How have you guys handled designing a data structure like this?
Do you prefer to let an object lazily perform its heavy-duty analysis, throwing away the intermediate data structures when new data comes into the collection? Or do you require the programmer to explicitly flip the data structure from append-mode into query-mode?
And do you know of any naming convention for objects like this? Is there a pattern I'm not thinking of?
ON EDIT:
There seems to be some confusion and curiosity about the class I used in my example, named "ContinuousScalarField".
You can get a pretty good idea for what I'm talking about by reading these wikipedia pages:
http://en.wikipedia.org/wiki/Scalar_field
http://en.wikipedia.org/wiki/Vector_field
Let's say you wanted to create a topographical map (this is not my exact problem, but it's conceptually very similar). So you take a thousand altitude measurements over an area of one square mile, but your survey equipment has a margin of error of plus-or-minus 10 meters in elevation.
Once you've gathered all the data points, you feed them into a model which not only interpolates the values, but also takes into account the error of each measurement.
To draw your topo map, you query the model for the elevation of each point where you want to draw a pixel.
As for the question of whether a single class should be responsible for both appending and handling queries, I'm not 100% sure, but I think so.
Here's a similar example: HashMap and TreeMap classes allow objects to be both added and queried. There aren't separate interfaces for adding and querying.
Both classes are also similar to my example, because the internal data structures have to be maintained on an ongoing basis in order to support the query mechanism. The HashMap class has to periodically allocate new memory, re-hash all objects, and move objects from the old memory to the new memory. A TreeMap has to continually maintain tree balance, using the red-black-tree data structure.
The only difference is that my class will perform optimally if it can perform all of its calculations once it knows the data set is closed.

If an object has two modes like this, I would suggest exposing two interfaces to the client. If the object is in append mode, then you make sure that the client can only ever use the IAppendable implementation. To flip to query mode, you add a method to IAppendable such as AsQueryable. To flip back, call IQueryable.AsAppendable.
You can implement IAppendable and IQueryable on the same object, and keep track of the state in the same way internally, but having two interfaces makes it clear to the client what state the object is in, and forces the client to deliberately make the (expensive) switch.
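A minimal Java sketch of this two-interface idea follows. The names AppendableField, QueryableField, and SimpleField are hypothetical, a measurement is reduced to a single double, and the "expensive" step is just a mean standing in for the real analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-mode view: only add() and the explicit flip are visible.
interface AppendableField {
    void add(double value);
    QueryableField asQueryable();   // performs the expensive switch
}

// Hypothetical query-mode view: no add() method is visible here.
interface QueryableField {
    double query();
    AppendableField asAppendable(); // flips back; the next flip recomputes
}

// One object implements both views and tracks its mode internally.
class SimpleField implements AppendableField, QueryableField {
    private final List<Double> values = new ArrayList<>();
    private boolean queryMode = false;
    private double mean;            // placeholder for the real analysis result

    @Override
    public void add(double value) {
        if (queryMode) throw new IllegalStateException("field is in query mode");
        values.add(value);
    }

    @Override
    public QueryableField asQueryable() {
        // Stand-in for the heavy computation between sections 2 and 3.
        mean = values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
        queryMode = true;
        return this;
    }

    @Override
    public double query() {
        if (!queryMode) throw new IllegalStateException("field is in append mode");
        return mean;
    }

    @Override
    public AppendableField asAppendable() {
        queryMode = false;
        return this;
    }
}
```

The static types only restrict what a caller can see through a given reference. A caller that kept the old AppendableField reference could still call add(), which is why the sketch also keeps a runtime guard.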

I generally prefer to have an explicit change, rather than lazily recomputing the result. This approach makes the performance of the utility more predictable, and it reduces the amount of work I have to do to provide a good user experience. For example, if this occurs in a UI, where do I have to worry about popping up an hourglass, etc.? Which operations are going to block for a variable amount of time, and need to be performed in a background thread?
That said, rather than explicitly changing the state of one instance, I would recommend the Builder Pattern to produce a new object. For example, you might have an aggregator object that does a small amount of work as you add each sample. Then instead of your proposed void flip() method, I'd have an Interpolator interpolator() method that gets a copy of the current aggregation and performs all your heavy-duty math. Your interpolateAt method would be on this new Interpolator object.
If your usage patterns warrant, you could do simple caching by keeping a reference to the interpolator you create, and return it to multiple callers, only clearing it when the aggregator is modified.
This separation of responsibilities can help yield more maintainable and reusable object-oriented programs. An object that can return a Measurement at a requested Point is very abstract, and perhaps a lot of clients could use your Interpolator as one strategy implementing a more general interface.
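As a rough illustration of the aggregator and interpolator split with the simple caching described above, here is a sketch in Java. The class names Aggregator and Interpolator follow the wording of this answer but are otherwise hypothetical, samples are plain doubles, and the heavy math is replaced by a mean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable query object, built once from a snapshot of the samples.
final class Interpolator {
    private final double mean; // placeholder for the real heavy-duty model

    Interpolator(List<Double> snapshot) {
        // Stand-in for the expensive analysis.
        this.mean = snapshot.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    double interpolateAt(double x) {
        return mean; // a real model would depend on x
    }
}

// Hypothetical aggregator: cheap adds, plus a factory method for the interpolator.
final class Aggregator {
    private final List<Double> samples = new ArrayList<>();
    private Interpolator cached; // cleared whenever the samples change

    void add(double sample) {
        samples.add(sample);
        cached = null;
    }

    Interpolator interpolator() {
        if (cached == null) {
            // Copy the current samples, then run the analysis on the copy.
            cached = new Interpolator(new ArrayList<>(samples));
        }
        return cached;
    }
}
```

An Interpolator obtained before a later add() keeps answering from its own snapshot, so callers never observe a half-updated model.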
I think that the analogy you added is misleading. Consider an alternative analogy:
Key[] data = new Key[...];
data[idx++] = new Key(...); /* Fast! */
...
Arrays.sort(data); /* Slow! */
...
boolean contains = Arrays.binarySearch(data, datum) >= 0; /* Fast! */
This can work like a set, and actually, it gives better performance than Set implementations (which are implemented with hash tables or balanced trees).
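The array analogy can be made runnable along these lines. This is only a sketch: SortedArraySet is a hypothetical name, the keys are plain ints, the capacity is fixed, and duplicates are not removed, so it only answers membership queries.

```java
import java.util.Arrays;

// Hypothetical fixed-capacity set: fast appends, one slow sort, then fast lookups.
class SortedArraySet {
    private final int[] data;
    private int size = 0;
    private boolean sorted = false;

    SortedArraySet(int capacity) {
        data = new int[capacity];
    }

    void add(int key) {                 // fast: O(1)
        if (sorted) throw new IllegalStateException("already sorted");
        data[size++] = key;
    }

    void sort() {                       // slow: O(n log n), paid once
        Arrays.sort(data, 0, size);
        sorted = true;
    }

    boolean contains(int key) {         // fast: O(log n)
        if (!sorted) throw new IllegalStateException("call sort() first");
        return Arrays.binarySearch(data, 0, size, key) >= 0;
    }
}
```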
A balanced tree can be seen as an efficient implementation of insertion sort. After every insertion, the tree is in a sorted state. The predictable time requirements of a balanced tree are due to the fact the cost of sorting is spread over each insertion, rather than happening on some queries and not others.
The rehashing of hash tables does result in less consistent performance, and because of that, hash tables aren't appropriate for certain applications (perhaps a real-time microcontroller). But even the rehashing operation depends only on the load factor of the table, not the pattern of insertion and query operations.
For your analogy to hold strictly, you would have to "sort" (do the hairy math) your aggregator with each point you add. But it sounds like that would be cost prohibitive, and that leads to the builder or factory method patterns. This makes it clear to your clients when they need to be prepared for the lengthy "sort" operation.

Your objects should have one role and responsibility. In your case should the ContinuousScalarField be responsible for interpolating?
Perhaps you might be better off doing something like:
IInterpolator interpolator = field.GetInterpolator();
Measurement measurement = interpolator.InterpolateAt(...);
I hope this makes sense, but without fully understanding your problem domain it's hard to give you a more coherent answer.

"I've just used lazy-evaluation to construct the data structures" -- Good
"if the user calls the "field.add()" method again, I have to completely discard those data structures and start over from scratch." -- Interesting
"in the standard use case, the caller never adds another value to the collection after starting to issue queries" -- Whoops, false alarm, actually not interesting.
Since lazy eval fits your use case, stick with it. That's a very heavily used model because it is so delightfully reliable and fits most use cases very well.
The only reasons for rethinking this are (a) a change in the use case (mixed adding and interpolation), or (b) performance optimization.
Since use case changes are unlikely, you might consider the performance implications of breaking up interpolation. For example, during idle time, can you precompute some values? Or with each add is there a summary you can update?
Also, a highly stateful (and not very meaningful) flip method isn't so useful to clients of your class. However, breaking interpolation into two parts might still be helpful to them -- and help you with optimization and state management.
You could, for example, break interpolation into two methods.
public void interpolateAt( Point3d p );
public Measurement interpolatedMeasurement();
This borrows the relational database Open and Fetch paradigm. Opening a cursor can do a lot of preliminary work, and may start executing the query, you don't know. Fetching the first row may do all the work, or execute the prepared query, or simply fetch the first buffered row. You don't really know. You only know that it's a two part operation. The RDBMS developers are free to optimize as they see fit.

Do you prefer to let an object lazily perform its heavy-duty analysis,
throwing away the intermediate data structures when new data comes
into the collection? Or do you require the programmer to explicitly
flip the data structure from append-mode into query-mode?
I prefer using data structures that allow me to incrementally add to it with "a little more work" per addition, and to incrementally pull the data I need with "a little more work" per extraction.
Perhaps if you do some "interpolate_at()" call in the upper-right corner of your region, you only need to do calculations involving the points in that upper-right corner,
and it doesn't hurt anything to leave the other 3 quadrants "open" to new additions.
(And so on down the recursive KDTree).
Alas, that's not always possible -- sometimes the only way to add more data is to throw away all the previous intermediate and final results, and re-calculate everything again from scratch.
The people who use the interfaces I design -- in particular, me -- are human and fallible.
So I don't like using objects where those people must remember to do things in a certain way, or else things go wrong -- because I'm always forgetting those things.
If an object must be in the "post-calculation state" before getting data out of it,
i.e. some "do_calculations()" function must be run before the interpolateAt() function gets valid data,
I much prefer letting the interpolateAt() function check if it's already in that state,
running "do_calculations()" and updating the state of the object if necessary,
and then returning the results I expected.
Sometimes I hear people describe this kind of transition as "freezing" the data, "crystallizing" it, "compiling" it, or "putting the data into an immutable data structure".
One example is converting a (mutable) StringBuilder or StringBuffer into an (immutable) String.
I can imagine that for some kinds of analysis, you expect to have all the data ahead of time,
and pulling out some interpolated value before all the data has been put in would give wrong results.
In that case,
I'd prefer to set things up such that the "add_data()" function fails or throws an exception
if it (incorrectly) gets called after any interpolateAt() call.
I would consider defining a lazily-evaluated "interpolated_point" object that doesn't really evaluate the data right away, but only tells the program that sometime in the future the data at that point will be required.
The collection isn't actually frozen, so it's OK to continue adding more data to it,
up until the point something actually extracts the first real value from some "interpolated_point" object,
which internally triggers the "do_calculations()" function and freezes the object.
It might speed things up if you know not only all the data, but also all the points that need to be interpolated, all ahead of time.
Then you can throw away data that is "far away" from the interpolated points,
and only do the heavy-duty calculations in regions "near" the interpolated points.
For other kinds of analysis, you do the best you can with the data you have, but when more data comes in later, you want to use that new data in your later analysis.
If the only way to do that is to throw away all the intermediate results and recalculate everything from scratch, then that's what you have to do.
(And it's best if the object automatically handled this, rather than requiring people to remember to call some "clear_cache()" and "do_calculations()" function every time).

You could have a state variable. Have a method for starting the high level processing, which will only work if the STATE is in SECTION-1. It will set the state to SECTION-2, and then to SECTION-3 when it is done computing. If there's a request to the program to interpolate a given point, it will check if the state is SECTION-3. If not, it will request the computations to begin, and then interpolate the given data.
This way, you accomplish both - the program will perform its computations at the first request to interpolate a point, but can also be requested to do so earlier. This would be convenient if you wanted to run the computations overnight, for example, without needing to request an interpolation.
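A small Java sketch of this state-variable approach is shown below. The names StatefulField, State, and compute() are hypothetical, the state names replace SECTION-1/2/3, and the sketch assumes single-threaded use. Rejecting add() once computation has started is also an assumption, since the answer does not say what should happen then.

```java
import java.util.ArrayList;
import java.util.List;

class StatefulField {
    enum State { ACCUMULATING, COMPUTING, READY }

    private final List<Double> data = new ArrayList<>();
    private State state = State.ACCUMULATING;
    private double model;         // placeholder for the expensive analysis result
    private int computeCount = 0; // only here to show the work happens once

    void add(double value) {
        // Assumption: additions are rejected once computation has started.
        if (state != State.ACCUMULATING) throw new IllegalStateException("not accumulating");
        data.add(value);
    }

    // May be called early (for example overnight); does nothing if already done.
    void compute() {
        if (state != State.ACCUMULATING) return;
        state = State.COMPUTING;
        double sum = 0;
        for (double d : data) sum += d;
        model = data.isEmpty() ? Double.NaN : sum / data.size();
        computeCount++;
        state = State.READY;
    }

    double interpolateAt(double x) {
        if (state != State.READY) compute(); // lazy fallback on first query
        return model; // a real model would depend on x
    }

    State state() { return state; }
    int computeCount() { return computeCount; }
}
```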

Related

Best practice about detecting DOM element exist in D3

All:
In D3, one often uses .data().enter().append() to reuse existing elements rather than removing everything and adding it again. On the other hand, when the DOM structure is very deep, this involves a lot of such detection (one check for every level). Is there a good way to detect at which level I need to start using .enter(), rather than starting from the top level?
Thanks
The way I understand your question, you could be asking about one of two possible things. Either:
1. you're asking about how to use d3's .data() binding method to compute the three sets (enter, update, exit) at multiple levels of a dom hierarchy; or
2. you already know how to do #1, and are asking about how to NOT do it (i.e. skip calling .data()) in certain cases in order to really optimize performance.
If the question is #1, then check out this tutorial on working with nested selection by passing a function into the first argument of .data().
If the question is #2, then you're taking a risk. By that I mean that you're risking spending a whole lot of time and effort to optimize an aspect of your code that's probably far from being the slowest part of the program. Usually, it's the browser's rendering that's the slowest, while the data binding is quite fast. In fact, following the nested selections pattern from #1 is likely the most effective way to optimize, because it eliminates unnecessary appending to - and re-rendering of - the DOM.
If you really want to do #2 anyway, then I think the way to start is by implementing it using nested selections from #1, and then adding some sort of if statement at every level of the hierarchy that decides whether it's ok to skip calling the .data() method. For that, you have to examine the incoming data vs the outgoing data and decide whether they're still equal or not. However, since deciding whether things are still equal is roughly what d3's .data() method does, your optimization of it would have to do even less. Perhaps one way to achieve that level of optimization would involve using immutable data structures, because that's a way to quickly test equality of two nested data structures (that's basically how things work in React.js). It sounds complicated though. That's why I say it's a risk....
There may be another approach, in which you analyze the incoming vs outgoing data and determine which branches of the data hierarchy have changed and then pinpoint the equivalent location in the DOM and use d3 .data() locally within those changed DOM nodes. That sounds even more complex and ambiguous. So to get more help with that one, you'd have to create something like a jsFiddle that recreates your specific scenario.

Reflection.Emit Performance

Here's a simple question.
Let's say we want to unroll a looping method such as:
public int DoSum1(int n)
{
    int result = 0;
    for (int i = 1; i <= n; i++)
    {
        result += i;
    }
    return result;
}
Into a method performing simple additions only:
public int DoSum2()
{
    return 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19+20;
}
http://etutorials.org/Programming/Programming+C.Sharp/Part+III+The+CLR+and+the+.NET+Framework/Chapter+18.+Attributes+and+Reflection/18.3+Reflection+Emit/
Logically, we're going to need code to create DoSum2 in IL at some point.
In this IL generation code we will perform an actual loop with the same iteration count as the unoptimized method.
What's the point of creating a super fast dynamic method if the code required to generate it will use a similar amount of time to execute???
Perhaps you can give an example where it is worth using Emit in a similar case?
What's the point of creating a super fast dynamic method if the code required to generate it will use a similar amount of time to execute
This isn't really specific to Reflection.Emit, but to runtime code generation in general, so I will answer accordingly.
First, I do not recommend using code generation simply to perform micro-optimizations that compilers normally perform, like loop unrolling. Let the JIT compiler do its job.
Second, you are right in that there is usually little point in generating code that will only execute once. The time required to emit and JIT compile the IL is not insubstantial. You should only bother generating code if it will be executed many times.
Now, there definitely are cases where runtime code generation can prove beneficial. In fact, it's a technique I leverage heavily. I work in an electronic trading environment where it is necessary to process very high volumes of dynamic data. This introduces several concerns, the most significant being memory usage and throughput.
Our trading application needs to keep a lot of data in memory, so the footprint of each record is critical. Dynamic data structures like maps/dictionaries are less efficient than "POCO" classes with optimized field layouts and, depending on the design, may require boxing some values. I avoid this overhead by generating client-side storage classes once the shape of the data is known. In effect, the memory layout is as it would have been had I known the shape of the data at compile time.
Throughput is a major issue as well; (de)serializing dynamic data often involves some additional introspection and extra layers of indirection. Need to serialize a record? OK, first you need to query what the fields are. Then, for each field, you need to determine its type, then select a serializer for that type, and then invoke the serializer. If your data structure has optional fields, you may need to do some additional pre-processing, like figuring out the size of a presence map, and which bits in the presence map correspond to which fields. If you need to process a ton of data, all that overhead becomes a real problem. I avoid this overhead by generating specialized (de)serializers on both the server side and client side. Since the serializers are generated on demand, they can know the exact shape of the data, and read/write that data as efficiently as a hand-optimized serializer. When you have a high volume of data updating at very high frequencies, this can make a huge difference.
Now, keep in mind that we're something of an edge case. Most applications do not have the aggressive memory and throughput requirements that ours has, so runtime code generation isn't necessary. You should only go that route if you really need it, and you have exhausted all other possibilities. Although it can help with performance, generated code can be very difficult to debug and maintain.

How to use BDD to code complex data structures / data layers

I'm new to behavior-driven development and I can't find any examples or guidelines that parallel my current problem.
My current project involves a massive 3D grid with an arbitrary number of slots in each of the discrete cells. The entities stored in these slots have their own slots and, thus, an arbitrary nesting of entities can exist. The final implementation of the object(s) used will need to be backed by some kind of persistent data store, which complicates the API a bit (i.e. using words like load/store instead of get/set and making sure modifying returned items doesn't modify the corresponding items in the data store itself). Don't worry, my first implementation will simply exist in-memory, but the API is what I'm supposed to be defining behavior against, so the actual implementation doesn't matter right now.
The thing I'm stuck on is the fact that BDD literature focuses on the interactions between objects and how mock objects can help with that. That doesn't seem to apply at all here. My abstract data store's only real "behavior" involves loading and storing data from entities outside those represented by the programming language itself; I can't define or test those behaviors since they're implementation-dependent.
So what can I define/test? The natural alternative is state. Store something. Make sure it loads. Modify the thing I loaded and make sure after I reload it's unmodified. Etc. But I'm under the impression that this is a common pitfall for new BDD developers, so I'm wondering if there's a better way that avoids it.
If I do take the state-testing route, a couple other questions arise. Obviously I can test an empty grid first, then an empty entity at one location, but what next? Two entities in different locations? Two entities in the same location? Nested entities? How deep should I test the nesting? Do I test the Cartesian product of these non-exclusive cases, i.e. two entities in the same location AND with nested entities each? The list goes on forever and I wouldn't know where to stop.
The difference between TDD and BDD is about language. Specifically, BDD focuses on function/object/system behavior to improve design and test readability.
Often when we think about behavior we think in terms of object interaction and collaboration and therefore need mocks to unit test. However, there is nothing wrong with an object whose behavior is to modify the state of a grid, if that is appropriate. State or mock based testing can be used in TDD/BDD alike.
However, for testing complex data structures, you should use matchers (e.g. Hamcrest in Java) to test only the part of the state you are interested in. You should also consider whether you can decompose the complex data into objects that collaborate (but only if that makes sense from an algorithmic/design standpoint).
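To make the state-based route concrete, here is a sketch of one behavior from the question ("modify the thing I loaded and make sure after I reload it's unmodified") written as a plain Java check without any mocking or matcher library. Entity, InMemoryGridStore, and GridStoreSpec are hypothetical names, and the store is simplified to one entity per cell with no nesting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity with a single mutable attribute.
final class Entity {
    String name;

    Entity(String name) { this.name = name; }

    Entity copy() { return new Entity(name); }
}

// Hypothetical in-memory store; copies on store and on load, so callers
// never hold a reference to the stored instance.
class InMemoryGridStore {
    private final Map<String, Entity> cells = new HashMap<>();

    private static String key(int x, int y, int z) {
        return x + "," + y + "," + z;
    }

    void store(int x, int y, int z, Entity e) {
        cells.put(key(x, y, z), e.copy());
    }

    Entity load(int x, int y, int z) {
        Entity e = cells.get(key(x, y, z));
        return e == null ? null : e.copy();
    }
}

class GridStoreSpec {
    // Given a stored entity, when the loaded copy is modified,
    // then a fresh load still returns the original state.
    static boolean modifyingLoadedEntityDoesNotChangeStore() {
        InMemoryGridStore store = new InMemoryGridStore();
        store.store(1, 2, 3, new Entity("rock"));
        Entity loaded = store.load(1, 2, 3);
        loaded.name = "sand";
        return "rock".equals(store.load(1, 2, 3).name);
    }
}
```

The check only looks at the one attribute the behavior is about, which is the same idea as using a matcher to test part of the state.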

Method for runtime comparison of two programs' objects

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?
In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.
The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):
1. For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
2. Let Na = # of unique object names encountered in A, similarly for Nb and # of objects in B.
3. Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
4. For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
5. Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
6. Find discrepancies in the objects between A and B based on when the sequences of hash values diverge for any objects with divergent sequences.
Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge, though the impact is insignificant if the discrepancies are approximately at the machine tolerance level.
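One way to sidestep the precision problem, sketched below in Java, is to record the numeric value itself and compare within a tolerance instead of comparing hashes. This swaps hash equality for a tolerance test and is not the hash-based method described above. It assumes one column from TableA and one from TableB have already been matched and aligned row by row, that the values are scalars, and that null marks "no observation at this step". TraceDiff, locf, and firstDivergence are hypothetical names.

```java
class TraceDiff {
    // LOCF: replace null ("no change at this step") with the last observed value.
    // Leading nulls (object not yet observed) stay null.
    static Double[] locf(Double[] trace) {
        Double[] out = new Double[trace.length];
        Double last = null;
        for (int i = 0; i < trace.length; i++) {
            if (trace[i] != null) last = trace[i];
            out[i] = last;
        }
        return out;
    }

    // Index of the first row where the traces differ by more than tol,
    // or where only one of them has an observation; -1 if they agree.
    static int firstDivergence(Double[] a, Double[] b, double tol) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            boolean aMissing = a[i] == null;
            boolean bMissing = b[i] == null;
            if (aMissing || bMissing) {
                if (aMissing != bMissing) return i;
                continue;
            }
            if (Math.abs(a[i] - b[i]) > tol) return i;
        }
        // Traces of different length diverge where the shorter one ends.
        return a.length == b.length ? -1 : n;
    }
}
```

Choosing the tolerance is left to the caller; a value near machine precision treats rounding noise as agreement while still flagging real discrepancies.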
First: What is a name for such types of testing methods and concepts? An answer need not necessarily be the method above, but reflects the class of methods for comparing objects from two (or more) different programs.
Second: What standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects - after all, two objects cannot be the same if they are massively different in size.
In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.
Edit 1: This paper is related in terms of comparing the execution traces; it mentions "code comparison", which is related to my interest, though I'm more concerned with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review it more carefully for methodology. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in a wholly unrelated area of security testing.
Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.
Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.
(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)
The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).
Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.
[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]
Note 1: Henrik Nilsson and Peter Fritzson have a number of papers on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but may be of interest to anyone facing this same challenge, albeit in a different programming context.

Proper Data Structure Choice for Collision System

I am looking to implement a 2D top-down collision system, and was hoping for some input as to the likely performance between a few different ideas. For reference I expect the number of moving collision objects to be in the dozens, and the static collision objects to be in the hundreds.
The first idea is border-line brute force (or maybe not so border-line). I would store two lists of collision objects in a collision system. One list would be dynamic objects, the other would include both dynamic and static objects (each dynamic would be in both lists). Each frame I would loop through the dynamic list and pass each object the larger list, so it could find anything it may run into. This will involve a lot of unnecessary calculations for any reasonably sized loaded area but I am using it as a sort of baseline because it would be very easy to implement.
The second idea is to have a single list of all collision objects, and a 2D array of either ints or floats representing the loaded area. Each element in the array would represent a physical location, and each object would have a size value. Each time an object moved, it would subtract its size value from its old location and add it to its new location. The objects would have to access elements in the array before they moved to make sure there was room in their new location, but that would be fairly simple to do. Besides the fact that I have a very public, very large array, I think it would perform fairly well. I could also implement with a boolean array, simply storing if a location is full or not, but I don't see any advantage to this over the numeric storage.
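A minimal sketch of this occupancy-grid idea, assuming integer size values and a per-cell `capacity` limit. The limit, the class name, and the method names are all assumptions made for illustration:

```java
final class OccupancyGrid {
    private final int[][] load;    // summed size of whatever occupies each cell
    private final int capacity;    // hypothetical per-cell limit (an assumption)

    OccupancyGrid(int cols, int rows, int capacity) {
        this.load = new int[cols][rows];
        this.capacity = capacity;
    }

    boolean hasRoom(int col, int row, int size) {
        return inBounds(col, row) && load[col][row] + size <= capacity;
    }

    void place(int col, int row, int size) {
        load[col][row] += size;
    }

    /** Moves an object only if the target cell has room; returns whether it moved. */
    boolean tryMove(int fromCol, int fromRow, int toCol, int toRow, int size) {
        if (fromCol == toCol && fromRow == toRow) return true;   // staying put
        if (!hasRoom(toCol, toRow, size)) return false;
        load[fromCol][fromRow] -= size;   // subtract from the old location
        load[toCol][toRow] += size;       // add to the new location
        return true;
    }

    int loadAt(int col, int row) {
        return load[col][row];
    }

    private boolean inBounds(int col, int row) {
        return col >= 0 && col < load.length && row >= 0 && row < load[0].length;
    }
}
```

Note that the array only records how full a cell is, not which objects are in it, so it can block a move but cannot say what was hit.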
The third idea I had is less well formed. A month or two ago I read about a two-dimensional, rectangle-based data structure (it may have been a tree, I don't remember) that would keep elements sorted by position. Then I would only have to pass the dynamic objects their small neighborhood of objects for update. I was wondering if anyone had any idea what this data structure might be, so I could look into it further, and if so, how the per-frame sorting of it would affect performance relative to the other methods.
Really I am just looking for ideas on how these would perform, and any pitfalls I am likely overlooking in any of these. I am not so much worried about the actual detection, as the most efficient way to make the objects talk to one another.
You're not talking about a lot of objects in this case. Honestly, you could probably brute force it and probably be fine for your application, even in mobile game development. With that in mind, I'd recommend you keep it simple but throw a bit of optimization on top for gravy. Spatial hashing with a reasonable cell size is the way I'd go here -- relatively reasonable memory use, decent speedup, and not that bad as far as complexity of implementation goes. More on that in a moment!
You haven't said what the representation of your objects is, but in any case you're likely going to end up with a typical "broad phase" and "narrow phase" (like a physics engine) -- the "broad phase" consisting of a false-positives "what could be intersecting?" query and the "narrow phase" brute forcing out the resulting potential intersections. Unless you're using things like binary space partitioning trees for polygonal shapes, you're not going to end up with a one-phase solution.
As mentioned above, for the broad phase I'd use spatial hashing. Basically, you establish a grid and record which objects touch each grid cell. (It doesn't have to be perfect -- it could even be just which axis-aligned bounding boxes overlap each cell.) Then, later, you go through the relevant cells of the grid and check whether the objects in each relevant cell actually intersect anything else in that cell.
The trick is, instead of an array with one slot per grid cell, to use a hash table keyed by cell. That way you're only taking up space for cells that actually have something in them. (This is not a substitute for a badly sized grid -- you want the grid coarse enough that an object doesn't land in a ridiculous number of cells, because that takes memory, but fine enough that all the objects don't end up in a few cells, because that doesn't save much time.) Chances are you'll be able to figure out a good grid size by visual inspection.
One additional step to spatial hashing... if you want to save memory, throw away the indices that you'd normally verify in a hash table. False positives only cost CPU time, and if you're hashing correctly, it's not going to turn out to be much, but it can save you a lot of memory.
So:
When you update objects, update which cells they're probably in. (Again, it's good enough to just use a bounding box -- e.g. a square or rectangle around the object.) Add the object to the hash table entry for each cell it's in. (E.g., if you're in cell 5,4 and that hashes to the 17th entry of the hash table, add the object to that entry and throw away the 5,4 data.) Then, to test collisions, go through the relevant cells in the hash table (e.g. the entire screen's worth of cells, if that's what you're interested in) and see which objects inside each cell collide with other objects inside the same cell.
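The steps above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the class name, the fixed bucket count, the mixing constants in `bucketOf`, and the use of non-negative integer ids for objects are all assumptions. Cell coordinates are hashed to a bucket and then discarded, so the result may contain false positives that the narrow phase has to reject.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SpatialHash {
    private final double cellSize;
    private final List<List<Integer>> buckets = new ArrayList<>();

    SpatialHash(double cellSize, int bucketCount) {
        this.cellSize = cellSize;
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    // Maps a cell coordinate to a bucket; the coordinate itself is not stored.
    private int bucketOf(int cx, int cy) {
        int h = cx * 73856093 ^ cy * 19349663;   // arbitrary large odd constants (an assumption)
        return Math.floorMod(h, buckets.size());
    }

    /** Empties every bucket, e.g. before re-inserting objects each frame. */
    void clear() {
        for (List<Integer> b : buckets) b.clear();
    }

    /** Registers object `id` (non-negative) in every cell its bounding box touches. */
    void insert(int id, double minX, double minY, double maxX, double maxY) {
        int x0 = (int) Math.floor(minX / cellSize), x1 = (int) Math.floor(maxX / cellSize);
        int y0 = (int) Math.floor(minY / cellSize), y1 = (int) Math.floor(maxY / cellSize);
        for (int cx = x0; cx <= x1; cx++) {
            for (int cy = y0; cy <= y1; cy++) {
                List<Integer> b = buckets.get(bucketOf(cx, cy));
                if (!b.contains(id)) b.add(id);   // two cells may share a bucket
            }
        }
    }

    /** Broad phase: id pairs sharing a bucket, encoded as (lower id << 32) | higher id. */
    Set<Long> candidatePairs() {
        Set<Long> pairs = new HashSet<>();   // a set, since two objects can share several cells
        for (List<Integer> b : buckets) {
            for (int i = 0; i < b.size(); i++) {
                for (int j = i + 1; j < b.size(); j++) {
                    int lo = Math.min(b.get(i), b.get(j));
                    int hi = Math.max(b.get(i), b.get(j));
                    pairs.add(((long) lo << 32) | hi);
                }
            }
        }
        return pairs;
    }
}
```

Each frame you would call `clear()`, `insert` every object by its bounding box, and then run the exact intersection test only on the pairs that `candidatePairs()` returns.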
Compared to the solutions above:
Not brute forcing, so it takes less time.
This has some commonality with the "2D array" method mentioned because, after all, we're imposing a "grid" (or 2D array) over the represented space, however we're doing it in a way less prone to accuracy errors (since it's only used for a broad-phase that is conservative). Additionally, the memory requirements are lessened by the zealous data reduction in hash tables.
kd, sphere, X, BSP, R, and other "TLA"-trees are almost always quite nontrivial to implement correctly and test and, even after all that effort, can end up being much slower than you'd expect. You don't normally need that sort of complexity for a few hundred objects.
Implementation note:
Each node in the spatial hash table will ultimately be a linked list. I recommend writing your own linked list with careful allocations. Each node need not take up more than 8 bytes (if you're using C/C++) and should use a pooled allocation scheme so you're almost never allocating or freeing memory. Relying on the built-in allocator will likely cripple performance.
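The note above is aimed at C/C++, but the same idea can be approximated in Java with an index-based pool: nodes live in two preallocated `int` arrays (8 bytes per node), and "pointers" are array indices. A minimal sketch, with all names hypothetical:

```java
final class NodePool {
    private final int[] item;   // object id stored in each node
    private final int[] next;   // index of the next node, or -1 for end of list
    private int freeHead;       // first unused node, or -1 if the pool is full

    NodePool(int capacity) {
        item = new int[capacity];
        next = new int[capacity];
        // Chain every node into the free list up front.
        for (int i = 0; i < capacity; i++) next[i] = (i + 1 < capacity) ? i + 1 : -1;
        freeHead = capacity > 0 ? 0 : -1;
    }

    /** Pushes `id` onto the list starting at `head` (-1 if empty); returns the new head. */
    int push(int head, int id) {
        if (freeHead == -1) throw new IllegalStateException("node pool exhausted");
        int n = freeHead;
        freeHead = next[n];
        item[n] = id;
        next[n] = head;
        return n;
    }

    /** Returns every node of the list starting at `head` to the free list. */
    void release(int head) {
        while (head != -1) {
            int following = next[head];
            next[head] = freeHead;
            freeHead = head;
            head = following;
        }
    }

    int itemAt(int node) { return item[node]; }

    int nextOf(int node) { return next[node]; }
}
```

Each hash-table entry then only needs to store the head index of its list, and nothing is allocated after the pool is constructed.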
First thing: I am but a noob. I am working my way through the 3dbuzz XNA Extreme 101 videos, and we are just now covering a system that uses static lists of each different type of object; when updating an object, you only check against the list(s) of things it is supposed to collide with.
So you only check enemy collisions against the player or the player's bullets, not other enemies, etc.
So there is a static list of each type of game object, and each game node has its own collision list (edit: a list of nodes) containing only the types it can hit.
Sorry if it's not clear what I mean, I'm still finding my feet.
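One way to picture the per-type lists described above is the sketch below. It is not taken from the 3dbuzz material: `Kind`, `GameNode`, and `CollisionRegistry` are hypothetical names, and the lists are held in a registry instance rather than in static fields.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

enum Kind { PLAYER, PLAYER_BULLET, ENEMY }

final class GameNode {
    final Kind kind;
    final String name;

    GameNode(Kind kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

final class CollisionRegistry {
    private final Map<Kind, List<GameNode>> byKind = new EnumMap<>(Kind.class);
    private final Map<Kind, EnumSet<Kind>> canHit = new EnumMap<>(Kind.class);

    CollisionRegistry() {
        for (Kind k : Kind.values()) {
            byKind.put(k, new ArrayList<>());
            canHit.put(k, EnumSet.noneOf(Kind.class));
        }
        // Enemies are tested against the player and the player's bullets, not other enemies.
        canHit.get(Kind.ENEMY).add(Kind.PLAYER);
        canHit.get(Kind.ENEMY).add(Kind.PLAYER_BULLET);
    }

    void register(GameNode node) {
        byKind.get(node.kind).add(node);
    }

    /** Returns only the nodes whose type this node is supposed to collide with. */
    List<GameNode> candidatesFor(GameNode node) {
        List<GameNode> out = new ArrayList<>();
        for (Kind k : canHit.get(node.kind)) out.addAll(byKind.get(k));
        return out;
    }
}
```

Filtering by type like this can be combined with any of the spatial approaches discussed earlier, since it only decides which pairs are worth testing at all.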

Resources