I'm new to behavior-driven development and I can't find any examples or guidelines that parallel my current problem.
My current project involves a massive 3D grid with an arbitrary number of slots in each of its discrete cells. The entities stored in these slots have their own slots and, thus, an arbitrary nesting of entities can exist. The final implementation of the object(s) used will need to be backed by some kind of persistent data store, which complicates the API a bit (e.g. using words like load/store instead of get/set, and making sure that modifying returned items doesn't modify the corresponding items in the data store itself). Don't worry, my first implementation will simply exist in memory, but the API is what I'm supposed to be defining behavior against, so the actual implementation doesn't matter right now.
The thing I'm stuck on is the fact that BDD literature focuses on the interactions between objects and how mock objects can help with that. That doesn't seem to apply at all here. My abstract data store's only real "behavior" involves loading and storing data from entities outside those represented by the programming language itself; I can't define or test those behaviors since they're implementation-dependent.
So what can I define/test? The natural alternative is state. Store something. Make sure it loads. Modify the thing I loaded and make sure that, after I reload, the stored copy is unmodified. Etc. But I'm under the impression that this is a common pitfall for new BDD developers, so I'm wondering if there's a better way that avoids it.
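For illustration, here's roughly what those state-based specs might look like as JUnit tests; Grid, GridLocation, and Entity are hypothetical stand-ins for whatever my real API ends up being:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Sketch only: Grid, GridLocation, and Entity are invented names.
public class GridSpec {

    @Test
    public void storedEntityCanBeLoadedBack() {
        Grid grid = new Grid(10, 10, 10);
        grid.store(new GridLocation(1, 2, 3), new Entity("barrel"));

        Entity loaded = grid.load(new GridLocation(1, 2, 3));
        assertEquals("barrel", loaded.getName());
    }

    @Test
    public void modifyingALoadedEntityDoesNotAffectTheStore() {
        Grid grid = new Grid(10, 10, 10);
        grid.store(new GridLocation(1, 2, 3), new Entity("barrel"));

        Entity loaded = grid.load(new GridLocation(1, 2, 3));
        loaded.setName("crate");                      // mutate the returned copy

        Entity reloaded = grid.load(new GridLocation(1, 2, 3));
        assertEquals("barrel", reloaded.getName());   // the store itself is untouched
    }
}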
If I do take the state-testing route, a couple other questions arise. Obviously I can test an empty grid first, then an empty entity at one location, but what next? Two entities in different locations? Two entities in the same location? Nested entities? How deep should I test the nesting? Do I test the Cartesian product of these non-exclusive cases, i.e. two entities in the same location AND with nested entities each? The list goes on forever and I wouldn't know where to stop.
The difference between TDD and BDD is about language. Specifically, BDD focuses on function/object/system behavior to improve design and test readability.
Often when we think about behavior we think in terms of object interaction and collaboration, and therefore need mocks to unit test. However, there is nothing wrong with an object whose behavior is to modify the state of a grid, if that is appropriate. State-based or mock-based testing can be used in TDD and BDD alike.
However, for testing complex data structures, you should use matchers (e.g. Hamcrest in Java) to test only the part of the state you are interested in. You should also consider whether you can decompose the complex data into objects that collaborate (but only if that makes sense from an algorithmic/design standpoint).
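For example, a Hamcrest-style assertion lets a spec pin down just the slice of grid state it cares about instead of deep-comparing the whole structure (Grid, GridLocation, and Entity are again hypothetical names):

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.empty;
import static org.hamcrest.Matchers.hasProperty;
import static org.hamcrest.Matchers.hasSize;
import static org.hamcrest.Matchers.is;
import org.junit.Test;

public class GridMatcherSpec {

    @Test
    public void storingOneEntityLeavesNeighbouringCellsEmpty() {
        Grid grid = new Grid(10, 10, 10);
        grid.store(new GridLocation(1, 2, 3), new Entity("barrel"));

        // Assert only on the parts of the state this spec is about.
        // (hasProperty assumes Entity exposes a getName() accessor.)
        assertThat(grid.load(new GridLocation(1, 2, 3)), hasProperty("name", is("barrel")));
        assertThat(grid.entitiesAt(new GridLocation(1, 2, 3)), hasSize(1));
        assertThat(grid.entitiesAt(new GridLocation(1, 2, 4)), empty());
    }
}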
I am simulating a NAO robot that has published physical properties for its links and joints (such as dimensions, link mass, center of mass, mass moment of inertia about that COM, etc).
The upper torso will be static, and I would like to get the lumped physical properties for the static upper torso. I have the math lined out (inertia tensors with rotation and parallel axis theorem), but I am wondering what the best method is to structure the data.
Currently, I am just defining everything as rules, a method I got from looking at data Import[]'d from structs in a MAT file. I refer to attributes/properties with strings so that I don't have to worry about symbols being defined. Additionally, it makes it easier to generate the names for the different degrees of freedom.
Here is an example of how I am defining this:
http://pastebin.com/VNBwAVbX
I am also considering using some OOP package for Mathematica, but I am unsure of how to easily define it.
For this task, I had taken a look at Lichtblau's slides but could not figure out a simple way to apply them toward defining my data structure. I ended up using MathOO, which got the job done since efficiency was not much of a concern and it was more or less a one-time deal.
Hey guys! My physics engine is coming along quite nicely (thanks for asking!) and I'm ready to start working on some even more advanced junk. Case in point, I'm trying to set up my collision engine so that an arbitrary delegate can be notified when a collision occurs. Let me set up a scenario for you:
Say we have object A, object B, and object C in the physics simulation. I want to be able to inform a delegate about a collision between A and B, AND inform a potentially DIFFERENT delegate about a collision between A and C.
A little background information: I have a known interface for the delegate, I have the potential of storing state for my collision detector (but don't atm), and have the ability to store state in the objects themselves. Similarly, I use this delegate model to handle collision resolution, simply setting the physics engine as the delegate for all objects by default, allowing the user to change the delegate if desired.
Now, I already tried having each object store its own collision delegate that would be informed when a collision occurred. This didn't work because when the objects had the same collision delegate, the same collision was handled twice. When I switched to only using the delegate of the first object (however that was decided), the order of simulation became an issue. I want to use a dictionary, but that introduces a significant amount of overhead. However, that seems like the direction I need to be heading.
So here's the question: fight to the death over a suitable solution. How would YOU solve this problem?
I must say that it's a bit odd that two objects can have different delegates for a collision, and yet it's a problem when two identical delegates both fire for the same collision. It seems like either both should always fire, or only one of them should. The inconsistency is what bothers me here.
Explaining that would help some more.
Second, if you use the naive version of holding a delegate for each object and then guarding its invocation ("if (!someBooleanIndicatingThisDelegateWasAlreadyFired) { doSomething(); }"), this could be solved with very little overhead.
It works, but I don't like this kind of code.
My suggestion (a bit complex, so think about it before working on it) is to focus on a manager object which would go over all the delegates and invoke the two that were relevant to the collision.
For instance, A and B collide, and the manager is invoked with them as parameters. You now can cycle through all the delegates known to the system (assuming they are few) and fire the ones that match "delegate == a.del or delegate == b.del".
This comes at a greater overhead, but if we are talking about a small number of delegates, it will make very little difference. On the other hand, this will allow you to extend your collision detection engine further in this area in the future (for example, allowing more than one delegate per object).
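If it helps, here is a rough Java-flavored sketch of that manager idea; CollisionDelegate and Body are placeholder names, and instead of scanning every registered delegate it simply collects the pair's delegates into a set so that a shared delegate only fires once:

import java.util.LinkedHashSet;
import java.util.Set;

// Placeholder types: adapt to whatever your engine actually uses.
interface CollisionDelegate {
    void onCollision(Body a, Body b);
}

class Body {
    CollisionDelegate delegate;   // defaults to the physics engine itself
}

class CollisionManager {

    // Called once per detected collision pair.
    void notifyCollision(Body a, Body b) {
        // The set de-duplicates: if both bodies share a delegate,
        // that delegate is notified exactly once for this collision.
        Set<CollisionDelegate> toNotify = new LinkedHashSet<>();
        if (a.delegate != null) toNotify.add(a.delegate);
        if (b.delegate != null) toNotify.add(b.delegate);

        for (CollisionDelegate d : toNotify) {
            d.onCollision(a, b);
        }
    }
}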
I am currently developing a charting application (for the iPhone, although that is largely irrelevant) using the MVC pattern.
One aspect of the application is that you can overlay a number of statistics on the charts. I am a little unsure how I am going to structure these classes.
For each statistic there will be two aspects.
1. The calculation. The function which will take the data and calculate the relevant statistical figures.
2. The display. The statistics then need to be drawn over the top of the graph.
Obviously I want the code to comply with the MVC pattern as closely as possible, but I am planning to develop possibly hundreds of these statistics.
I could create three classes: one for the graphics, one for the logic, and a factory class to tie the two together. This would then fit with the pattern, but it seems to be a huge extra overhead in terms of the number of classes in the system and additional complexity which I don't feel is necessary.
So, I am very tempted to create a single class for each statistic. But that would mean each class would have logic and graphics mixed in together, which is heavily frowned upon.
Are there any other suggestions as to how I can lay these out in a structured, reusable way without adding unnecessary complexity?
EDIT
Thanks for the answers. Most useful, but has raised more questions!
MVC does fit the rest of the application perfectly. Also, as it's for the iPhone, I seem to be pushed along this path anyway. This is the only reason I am considering MVC for these statistics.
However, the user will not interact with these statistics; they are purely for display. The statistics are painted as various lines and symbols directly onto the view canvas. Each statistic paints its information in its own way. There is very little that can be shared between them, and each piece of data can only be usefully represented in one way. I can think of no other useful way that I would want to represent the information.
So it seems MVC is out for these, but I am unsure now what pattern would fit other than my newly invented "Mix logic and graphics" pattern which just feels wrong due to the Single Responsibility Principle (thanks for that link).
First of all, will the user interact with the statistics? If not, then you don't need MVC. (The Controller in MVC deals with user interaction).
You want to keep the number of classes to a minimum, which is good. Let's consider both the calculation and display separately.
How will you be displaying the statistics? Will they typically be text labels, or will there be other graph elements (error bars or things like that)? Try to figure out the different ways you will want to display your statistics.
How many different calculations will you have? Does each calculation map directly to a single graph element, or might it be drawn in a number of different ways? Try to figure out how the calculations relate to the graph elements.
As a concrete example, suppose you have a set of data points that you have plotted. You want to display the mean, the median, and the mode. You could display each of these as separate horizontal lines that cut through the chart at the appropriate Y value. The calculations are all independent, but the display logic can be shared. Or, perhaps you want to display the mean as both a line and as a text label. Here, there is only one calculation, but two different display methods.
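To make that split concrete, here is one possible shape for it (Java is used purely for illustration; StatisticCalculation, StatisticRenderer, ChartCanvas, and the rest are made-up names, not part of any framework):

import java.util.List;

// Hypothetical drawing surface.
interface ChartCanvas {
    void drawHorizontalLine(double yValue);
}

// The calculation side: data in, number out.
interface StatisticCalculation {
    double compute(List<Double> dataPoints);
}

// The display side: number in, pixels out.
interface StatisticRenderer {
    void render(double value, ChartCanvas canvas);
}

class MeanCalculation implements StatisticCalculation {
    public double compute(List<Double> dataPoints) {
        double sum = 0;
        for (double d : dataPoints) sum += d;
        return dataPoints.isEmpty() ? 0 : sum / dataPoints.size();
    }
}

// One renderer can be shared by mean, median, and mode alike.
class HorizontalLineRenderer implements StatisticRenderer {
    public void render(double value, ChartCanvas canvas) {
        canvas.drawHorizontalLine(value);
    }
}

// A statistic overlay is just a pairing of a calculation with a renderer.
class StatisticOverlay {
    private final StatisticCalculation calculation;
    private final StatisticRenderer renderer;

    StatisticOverlay(StatisticCalculation calculation, StatisticRenderer renderer) {
        this.calculation = calculation;
        this.renderer = renderer;
    }

    void draw(List<Double> dataPoints, ChartCanvas canvas) {
        renderer.render(calculation.compute(dataPoints), canvas);
    }
}

Displaying the mean as both a line and a text label would then just be two overlays sharing the same MeanCalculation.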
MVC design is about separating the underlying data from its presentation. By doing this, you can re-use bits of presentation logic for many different pieces of data. Also, you can display a single piece of data in multiple ways, and they will all stay in sync.
It depends on what you call complex. Most people consider methods or classes that are responsible for multiple things complex, and classes or methods that are responsible for one thing simple. This is also known as the SRP (Single Responsibility Principle).
When you use MVC, the view contains the display logic, the model contains the business logic (the calculation in your case), and the controller ties them together. You can implement MVC in different ways. Complex applications need to separate the domain model and map it to a view model, but most simple applications can use a single model.
If you define complexity in a different way, MVC might not be what you like, and you should try a different approach.
I'm trying to think of a naming convention that accurately conveys what's going on within a class I'm designing. On a secondary note, I'm trying to decide between two almost-equivalent user APIs.
Here's the situation:
I'm building a scientific application, where one of the central data structures has three phases: 1) accumulation, 2) analysis, and 3) query execution.
In my case, it's a spatial modeling structure, internally using a KDTree to partition a collection of points in 3-dimensional space. Each point describes one or more attributes of the surrounding environment, with a certain level of confidence about the measurement itself.
After adding (a potentially large number of) measurements to the collection, the owner of the object will query it to obtain an interpolated measurement at a new data point somewhere within the applicable field.
The API will look something like this (the code is in Java, but that's not really important; the code is divided into three sections, for clarity):
// SECTION 1:
// Create the aggregation object, and get the zillion objects to insert...
ContinuousScalarField field = new ContinuousScalarField();
Collection<Measurement> measurements = getMeasurementsFromSomewhere();
// SECTION 2:
// Add all of the zillion objects to the aggregation object...
// Each measurement contains its xyz location, the quantity being measured,
// and a numeric value for the measurement. For example, something like
// "68 degrees F, plus or minus 0.5, at point 1.23, 2.34, 3.45"
for (Measurement m : measurements) {
    field.add(m);
}
// SECTION 3:
// Now the user wants to ask the model questions about the interpolated
// state of the model. For example, "what's the interpolated temperature
// at point (3, 4, 5)?"
Point3d p = new Point3d(3, 4, 5);
Measurement result = field.interpolateAt(p);
For my particular problem domain, it will be possible to perform a small amount of incremental work (partitioning the points into a balanced KDTree) during SECTION 2.
And there will be a small amount of work (performing some linear interpolations) that can occur during SECTION 3.
But there's a huge amount of work (constructing a kernel density estimator and performing a Fast Gauss Transform, using Taylor series and Hermite functions, but that's totally beside the point) that must be performed between sections 2 and 3.
Sometimes in the past, I've just used lazy-evaluation to construct the data structures (in this case, it'd be on the first invocation of the "interpolateAt" method), but then if the user calls the "field.add()" method again, I have to completely discard those data structures and start over from scratch.
In other projects, I've required the user to explicitly call an "object.flip()" method, to switch from "append mode" into "query mode". The nice thing about a design like this is that the user has better control over the exact moment when the hard-core computation starts. But it can be a nuisance for the API consumer to keep track of the object's current mode. And besides, in the standard use case, the caller never adds another value to the collection after starting to issue queries; data-aggregation almost always fully precedes query preparation.
How have you guys handled designing a data structure like this?
Do you prefer to let an object lazily perform its heavy-duty analysis, throwing away the intermediate data structures when new data comes into the collection? Or do you require the programmer to explicitly flip the data structure from append-mode into query-mode?
And do you know of any naming convention for objects like this? Is there a pattern I'm not thinking of?
ON EDIT:
There seems to be some confusion and curiosity about the class I used in my example, named "ContinuousScalarField".
You can get a pretty good idea for what I'm talking about by reading these wikipedia pages:
http://en.wikipedia.org/wiki/Scalar_field
http://en.wikipedia.org/wiki/Vector_field
Let's say you wanted to create a topographical map (this is not my exact problem, but it's conceptually very similar). So you take a thousand altitude measurements over an area of one square mile, but your survey equipment has a margin of error of plus-or-minus 10 meters in elevation.
Once you've gathered all the data points, you feed them into a model which not only interpolates the values, but also takes into account the error of each measurement.
To draw your topo map, you query the model for the elevation of each point where you want to draw a pixel.
As for the question of whether a single class should be responsible for both appending and handling queries, I'm not 100% sure, but I think so.
Here's a similar example: HashMap and TreeMap classes allow objects to be both added and queried. There aren't separate interfaces for adding and querying.
Both classes are also similar to my example, because the internal data structures have to be maintained on an ongoing basis in order to support the query mechanism. The HashMap class has to periodically allocate new memory, re-hash all objects, and move objects from the old memory to the new memory. A TreeMap has to continually maintain tree balance, using the red-black-tree data structure.
The only difference is that my class will perform optimally if it can perform all of its calculations once it knows the data set is closed.
If an object has two modes like this, I would suggest exposing two interfaces to the client. If the object is in append mode, then you make sure that the client can only ever use the IAppendable implementation. To flip to query mode, you add a method to IAppendable such as AsQueryable. To flip back, call IQueryable.AsAppendable.
You can implement IAppendable and IQueryable on the same object, and keep track of the state in the same way internally, but having two interfaces makes it clear to the client what state the object is in, and forces the client to deliberately make the (expensive) switch.
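A rough sketch of what those two interfaces could look like for your field; only Measurement and Point3d come from your example, everything else is an invented name:

// Sketch only: the "flip" is the asQueryable()/asAppendable() boundary.
interface IAppendable {
    void add(Measurement m);
    IQueryable asQueryable();      // performs the expensive analysis once
}

interface IQueryable {
    Measurement interpolateAt(Point3d p);
    IAppendable asAppendable();    // invalidates the derived structures
}

class ContinuousScalarField implements IAppendable, IQueryable {

    public void add(Measurement m) {
        // cheap incremental work (e.g. KD-tree partitioning)
    }

    public IQueryable asQueryable() {
        // the heavy-duty math happens here, exactly once per flip
        return this;
    }

    public Measurement interpolateAt(Point3d p) {
        // cheap interpolation against the prepared structures
        return null; // placeholder
    }

    public IAppendable asAppendable() {
        // mark the derived structures stale so they are rebuilt on the next flip
        return this;
    }
}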
I generally prefer to have an explicit change, rather than lazily recomputing the result. This approach makes the performance of the utility more predictable, and it reduces the amount of work I have to do to provide a good user experience. For example, if this occurs in a UI, where do I have to worry about popping up an hourglass, etc.? Which operations are going to block for a variable amount of time, and need to be performed in a background thread?
That said, rather than explicitly changing the state of one instance, I would recommend the Builder Pattern to produce a new object. For example, you might have an aggregator object that does a small amount of work as you add each sample. Then, instead of your proposed void flip() method, I'd have an Interpolator interpolator() method that gets a copy of the current aggregation and performs all your heavy-duty math. Your interpolateAt method would be on this new Interpolator object.
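Sketched out, that could look something like this (illustrative only; Measurement and Point3d are the types from your example, the rest is made up):

import java.util.ArrayList;
import java.util.List;

// The aggregator stays cheap per add(); all the heavy math is deferred
// to interpolator(), which hands back a separate query object.
class MeasurementAggregator {
    private final List<Measurement> samples = new ArrayList<>();

    public void add(Measurement m) {
        samples.add(m);   // plus any cheap incremental partitioning
    }

    public Interpolator interpolator() {
        // copy the current samples and do the expensive analysis once
        return new Interpolator(new ArrayList<>(samples));
    }
}

class Interpolator {
    Interpolator(List<Measurement> frozenSamples) {
        // build the kernel density estimator, run the Fast Gauss Transform, etc.
    }

    public Measurement interpolateAt(Point3d p) {
        // cheap lookup/interpolation against the prepared structures
        return null; // placeholder
    }
}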
If your usage patterns warrant, you could do simple caching by keeping a reference to the interpolator you create, and return it to multiple callers, only clearing it when the aggregator is modified.
This separation of responsibilities can help yield more maintainable and reusable object-oriented programs. An object that can return a Measurement at a requested Point is very abstract, and perhaps a lot of clients could use your Interpolator as one strategy implementing a more general interface.
I think that the analogy you added is misleading. Consider an alternative analogy:
Key[] data = new Key[...];
data[idx++] = new Key(...); /* Fast! */
...
Arrays.sort(data); /* Slow! */
...
boolean contains = Arrays.binarySearch(data, datum) >= 0; /* Fast! */
This can work like a set, and actually, it gives better performance than Set implementations (which are implemented with hash tables or balanced trees).
A balanced tree can be seen as an efficient implementation of insertion sort. After every insertion, the tree is in a sorted state. The predictable time requirements of a balanced tree are due to the fact the cost of sorting is spread over each insertion, rather than happening on some queries and not others.
The rehashing of hash tables does result in less consistent performance, and because of that, hash tables aren't appropriate for certain applications (perhaps a real-time microcontroller). But even the rehashing operation depends only on the load factor of the table, not the pattern of insertion and query operations.
For your analogy to hold strictly, you would have to "sort" (do the hairy math) your aggregator with each point you add. But it sounds like that would be cost prohibitive, and that leads to the builder or factory method patterns. This makes it clear to your clients when they need to be prepared for the lengthy "sort" operation.
Your objects should have one role and responsibility. In your case should the ContinuousScalarField be responsible for interpolating?
Perhaps you might be better off doing something like:
IInterpolator interpolator = field.GetInterpolator();
Measurement measurement = interpolator.InterpolateAt(...);
I hope this makes sense, but without fully understanding your problem domain it's hard to give you a more coherent answer.
"I've just used lazy-evaluation to construct the data structures" -- Good
"if the user calls the "field.add()" method again, I have to completely discard those data structures and start over from scratch." -- Interesting
"in the standard use case, the caller never adds another value to the collection after starting to issue queries" -- Whoops, false alarm, actually not interesting.
Since lazy eval fits your use case, stick with it. That's a very heavily used model because it is so delightfully reliable and fits most use cases very well.
The only reason for rethinking this is (a) a use case change (mixed adding and interpolation), or (b) performance optimization.
Since use case changes are unlikely, you might consider the performance implications of breaking up interpolation. For example, during idle time, can you precompute some values? Or with each add is there a summary you can update?
Also, a highly stateful (and not very meaningful) flip method isn't so useful to clients of your class. However, breaking interpolation into two parts might still be helpful to them -- and help you with optimization and state management.
You could, for example, break interpolation into two methods.
public void interpolateAt(Point3d p);
public Measurement interpolatedMeasurement();
This borrows the relational database Open and Fetch paradigm. Opening a cursor can do a lot of preliminary work and may even start executing the query; you don't know. Fetching the first row may do all the work, or execute the prepared query, or simply fetch the first buffered row. You don't really know. You only know that it's a two-part operation. The RDBMS developers are free to optimize as they see fit.
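From the caller's side, that two-part usage would read something like this (using the field object from your example):

// "Open": may do a little work or a lot; the caller doesn't need to know.
field.interpolateAt(new Point3d(3, 4, 5));
// "Fetch": whatever work is left happens here.
Measurement m = field.interpolatedMeasurement();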
Do you prefer to let an object lazily perform its heavy-duty analysis,
throwing away the intermediate data structures when new data comes
into the collection? Or do you require the programmer to explicitly
flip the data structure from append-mode into query-mode?
I prefer using data structures that allow me to incrementally add to them with "a little more work" per addition, and to incrementally pull the data I need with "a little more work" per extraction.
Perhaps if you do some "interpolate_at()" call in the upper-right corner of your region, you only need to do calculations involving the points in that upper-right corner, and it doesn't hurt anything to leave the other 3 quadrants "open" to new additions (and so on down the recursive KDTree).
Alas, that's not always possible -- sometimes the only way to add more data is to throw away all the previous intermediate and final results, and re-calculate everything again from scratch.
The people who use the interfaces I design -- in particular, me -- are human and fallible.
So I don't like using objects where those people must remember to do things in a certain way, or else things go wrong -- because I'm always forgetting those things.
If an object must be in the "post-calculation state" before getting data out of it, i.e. some "do_calculations()" function must be run before the interpolateAt() function gets valid data, I much prefer letting the interpolateAt() function check whether it's already in that state, running "do_calculations()" and updating the state of the object if necessary, and then returning the results I expected.
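As a sketch (the field and method names here are placeholders, not your actual API):

class ContinuousScalarField {
    private boolean calculationsUpToDate = false;

    public void add(Measurement m) {
        // store the sample ...
        calculationsUpToDate = false;   // any new data invalidates the analysis
    }

    public Measurement interpolateAt(Point3d p) {
        if (!calculationsUpToDate) {
            doCalculations();           // the heavy work happens lazily, at most once
            calculationsUpToDate = true;
        }
        return lookupInterpolatedValue(p);
    }

    private void doCalculations() {
        // the expensive analysis
    }

    private Measurement lookupInterpolatedValue(Point3d p) {
        return null; // placeholder for the cheap interpolation step
    }
}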
Sometimes I hear people describe such an operation as "freezing" the data, "crystallizing" the data, "compiling" it, or "putting the data into an immutable data structure".
One example is converting a (mutable) StringBuilder or StringBuffer into an (immutable) String.
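In Java that conversion looks like:

StringBuilder builder = new StringBuilder();   // mutable: "append mode"
builder.append("freeze ").append("me");
String frozen = builder.toString();            // immutable snapshot: "query mode"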
I can imagine that for some kinds of analysis, you expect to have all the data ahead of time, and pulling out some interpolated value before all the data has been put in would give wrong results. In that case, I'd prefer to set things up such that the "add_data()" function fails or throws an exception if it (incorrectly) gets called after any interpolateAt() call.
I would consider defining a lazily evaluated "interpolated_point" object that doesn't really evaluate the data right away, but only tells the program that sometime in the future the data at that point will be required. The collection isn't actually frozen, so it's OK to continue adding more data to it, up until the point when something actually extracts the first real value from some "interpolated_point" object, which internally triggers the "do_calculations()" function and freezes the object.
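Continuing the sketch above (again, all names are invented):

// A lazy promise: creating it costs nothing; the heavy analysis (and the
// freeze) only happens when someone actually asks for the value.
class InterpolatedPoint {
    private final ContinuousScalarField field;
    private final Point3d location;

    InterpolatedPoint(ContinuousScalarField field, Point3d location) {
        this.field = field;
        this.location = location;
    }

    public Measurement value() {
        return field.interpolateAt(location);   // lazily runs do_calculations() inside
    }
}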
It might speed things up if you know not only all the data, but also all the points that need to be interpolated, all ahead of time. Then you can throw away data that is "far away" from the interpolated points, and only do the heavy-duty calculations in regions "near" the interpolated points.
For other kinds of analysis, you do the best you can with the data you have, but when more data comes in later, you want to use that new data in your later analysis.
If the only way to do that is to throw away all the intermediate results and recalculate everything from scratch, then that's what you have to do.
(And it's best if the object automatically handles this, rather than requiring people to remember to call some "clear_cache()" and "do_calculations()" function every time.)
You could have a state variable. Have a method for starting the high-level processing, which will only work if the state is SECTION-1. It will set the state to SECTION-2, and then to SECTION-3 when it is done computing. If there's a request to interpolate a given point, it will check whether the state is SECTION-3. If not, it will request the computations to begin, and then interpolate the given data.
This way, you accomplish both: the program will perform its computations at the first request to interpolate a point, but can also be requested to do so earlier. This would be convenient if you wanted to run the computations overnight, for example, without needing to request an interpolation.
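As a sketch, mapping your three sections onto an explicit state (the enum and method names are invented):

// SECTION-1 = ACCUMULATING, SECTION-2 = ANALYZING, SECTION-3 = QUERYABLE
class ContinuousScalarField {
    enum State { ACCUMULATING, ANALYZING, QUERYABLE }

    private State state = State.ACCUMULATING;

    public void add(Measurement m) {
        // store the sample ...
        state = State.ACCUMULATING;    // new data drops us back to section 1
    }

    // Can be called explicitly (e.g. overnight) or implicitly by interpolateAt().
    public void beginAnalysis() {
        if (state != State.ACCUMULATING) return;   // only starts from SECTION-1
        state = State.ANALYZING;
        // ... the heavy computation ...
        state = State.QUERYABLE;
    }

    public Measurement interpolateAt(Point3d p) {
        if (state != State.QUERYABLE) {
            beginAnalysis();           // kick off the computations on demand
        }
        // ... interpolate using the prepared structures ...
        return null; // placeholder
    }
}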