How do you go about splitting a class with TDD? - refactoring

I feel pretty skilled in TDD, and I'm even considered the "TDD expert" in my company, but nevertheless there are some cases that I don't know how to handle properly, so I would like to hear others' opinions.
My problem is as follows:
Even though in general TDD helps me think of the core responsibility of a class and extract every other responsibility into dependent classes, there are cases where, after some time, I realize that one of the classes has multiple responsibilities and needs to be refactored and split into two classes. This conclusion often comes because the tests of that class start to become complicated or repetitive. I can pretty easily refactor to split the class into the design I want (and I do it in small steps, staying on the green bar). My problem is that I end up with the same complicated and repetitive tests, which now test the two classes together, while I would like to have separate tests for each class.
The only (more-or-less safe) way I could think of to do that is the following, for each test, after I have completed the refactoring of the production code (a rough sketch follows the list):
Duplicate the test case
Change one copy of the test to use a mock instead of the 1st class, and the other copy of the test to use a mock instead of the 2nd class.
Then if I see that an identical test already exists for one of the copies, I delete it.
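For illustration, here is a minimal sketch of what step 2 might look like after the split, using JUnit and Mockito; the class names (ReportGenerator, DataSource) and the testing libraries are my own assumptions, not part of the question:

```java
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class ReportGeneratorTest {

    // Hypothetical classes produced by the split (normally they live in production code).
    interface DataSource {
        String aggregate();
    }

    static class ReportGenerator {
        private final DataSource source;
        ReportGenerator(DataSource source) { this.source = source; }
        String render() { return "Report: " + source.aggregate(); }
    }

    // Copy of the original test with the extracted collaborator replaced by a mock,
    // so only ReportGenerator's own behaviour is exercised.
    @Test
    public void rendersReportFromAggregatedData() {
        DataSource source = mock(DataSource.class);
        when(source.aggregate()).thenReturn("42 mm of rain");

        ReportGenerator generator = new ReportGenerator(source);

        assertEquals("Report: 42 mm of rain", generator.render());
    }
}
```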
I think that sometimes it's possible to do the following:
Start by creating the two classes from scratch (using TDD, of course)
Change the old tests to use the new classes instead of the old one
Delete the old class
Delete the old tests
Both of these techniques seem pretty cumbersome and time-consuming, so I wonder: how do the "real experts" go about this issue?

Without an actual example I can't be sure I know what exactly you mean. But it sounds like you try to test every class (and maybe even every method) in isolation.
When I get to a point where I want to/have to split a class into multiple classes, I tend to still view the resulting collection of classes as a unit and test it as a whole. Only when they stop forming a functional whole and start to become independent units do I test them independently of each other.

I can pretty easily refactor to split the class into the design I want (and I do it in small steps, staying on the green bar). My problem is that I end up with the same complicated and repetitive tests, which now test the two classes together, while I would like to have separate tests for each class.
I've gotten to this point as well. Here I start refactoring the tests, using the same techniques as for the non-test code: convert variable to field, move field, extract method, move variable, and so on. Naming is, of course, very important and provides a lot of design guidance.
eg http://www.kdgregory.com/index.php?page=junit.refactoring
eg http://www.natpryce.com/articles/000686.html
eg http://www-public.it-sudparis.eu/~gibson/Teaching/CSC7302/ReadingMaterial/vanDeursenMdenBK01.pdf
That last article has some example smells and refactorings common to refactoring tests specifically.
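To make the test-refactoring idea concrete, here is a minimal sketch of 'extract method' applied inside a test class (JUnit, with an invented WeatherStation class standing in for the code under test; none of the names come from the answer):

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class WeatherReportTest {

    // Minimal stand-in for the class under test, just so the sketch compiles.
    static class WeatherStation {
        private String condition;
        void record(String condition, int millimetres) { this.condition = condition; }
        String generateReport() { return "Today: " + condition; }
    }

    // Before the refactoring, every test repeated the same setup lines inline.
    // After 'extract method', the repetition lives in one well-named helper,
    // exactly as you would extract a method from production code.

    @Test
    public void reportMentionsRainWhenRainfallRecorded() {
        assertTrue(reportFor("rain", 12).contains("rain"));
    }

    @Test
    public void reportMentionsSunWhenNoRainfallRecorded() {
        assertTrue(reportFor("sun", 0).contains("sun"));
    }

    // Extracted setup method: the name documents what the fixture represents.
    private String reportFor(String condition, int millimetres) {
        WeatherStation station = new WeatherStation();
        station.record(condition, millimetres);
        return station.generateReport();
    }
}
```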

I start by asking myself (as you have) what the responsibilities of the class are. Let's say, for example, that your class is responsible for aggregating weather data and generating a weather report.
At this point I make three (3) lists:
Data aggregation members (attributes, behaviors)
Report generation members
Common members
The first two are easy: the members that belong exclusively to one responsibility become part of one of the two new classes. I will keep the original dual-responsibility class as a facade, whose members are a pass-through to the new classes, so that tests and functionality are not broken while refactoring. Depending on circumstances, I may eventually remove the facade and refactor the tests and dependent objects to use the new classes.
As for the members that are common to both responsibilities, I will move them to a helper class (usually scoped as internal) that the new classes (and any others) may use. The functionality has proven to be reused, and may be reused again. Note that the common members might not necessarily all land in one helper class; the helping functionality might be added to one new class or to several classes (depending, of course, on the responsibilities), and some functionality may be added to existing helper classes, if one fits the bill.
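A minimal sketch of the facade idea, continuing the weather example above; the class and member names are my own illustration, not something the answer prescribes:

```java
// New single-responsibility classes extracted from the original dual-responsibility class.
class WeatherDataAggregator {
    double averageTemperature() {
        // ...aggregation logic elided for the sketch...
        return 0.0;
    }
}

class WeatherReportGenerator {
    String generate(double averageTemperature) {
        return "Average temperature: " + averageTemperature;
    }
}

// The original class kept as a facade: its members simply pass through to the
// new classes, so existing tests and callers keep working during the refactoring.
class WeatherService {
    private final WeatherDataAggregator aggregator = new WeatherDataAggregator();
    private final WeatherReportGenerator reporter = new WeatherReportGenerator();

    double averageTemperature() {
        return aggregator.averageTemperature();
    }

    String generateReport() {
        return reporter.generate(aggregator.averageTemperature());
    }
}
```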

I wondered about this a while back, and couldn't really find a satisfactory answer. Here are some discussions I found on the topic:
http://tech.groups.yahoo.com/group/testdrivendevelopment/message/27199
and
http://tech.groups.yahoo.com/group/testdrivendevelopment/message/16227
Personally, I've adopted a "hair-trigger" approach to moving responsibilities into dependencies, and while "spinning off" a new dependency before there is a clear need for it smacks of YAGNI, I've found that re-absorbing a dependency that turned out to be too anemic to warrant being a separate class is much easier than the rigmarole involved in splitting a separate class out of a class that already has a significant battery of tests written for it.
Edit:
Oh - and I should probably point out that I'm not at all a "real expert" ;)


Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies?

E.g. let’s consider I have the following classes:
Item
ItemProperty which would include objects such as Colour and Size. There's a relation-property of the Item class which lists all of the ItemProperty objects applicable to this Item (i.e. for one item you might need to specify the Colour and for another you might want to specify the Size).
ItemPropertyOption would include objects such as Red, Green (for Colour) and Big, Small (for Size).
Then an Item Object would relate to an ItemProperty, whereas an ItemChoice Object would relate to an ItemPropertyOption (and the ItemProperty which the ItemPropertyOption refers to could be inferred).
The reason for this is so I could then make use of queries much more effectively. i.e. give me all item-choices which are Red. It would also allow me to use the Parse Dashboard to quickly add elements to the site as I could easily specify more ItemPropertys and ItemPropertyOptions, rather than having to add them in the codebase.
This is just a small example and there's many more instances where I'd like to use classes so that 'options' for various drop-downs in forms are in the database and can easily be added and edited by me, rather than hard-coded.
1) I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
2) there could be hundreds of nested properties that I want to access via ‘inverse querying’
So, I can think of 2 potential causes of inefficiency and wanted to know if they’re founded:
Is having lots of classes inefficient?
Is back-querying against nested classes inefficient?
The other option I can think of, if 'class-bloat' really is a problem, is to make fields on parent classes that, instead of being nested across other classes (that represent further properties, as above), just represent them as a nested JSON property directly.
The job of designing is to render in object descriptions truths about the world that are relevant to the system's requirements. In the world of the OP's "items", it's a fact that items have color, and it's a relevant fact because users care about an item's color. You'd only call a system inefficient if it consumes computing resources that it doesn't need to consume.
So, for something like a configurator, the fact that we have items, and that those items have properties, and those properties have an enumerable set of possible values sounds like a perfectly rational design.
Is it inefficient or "bloated"? The only place I'd raise doubt is in the explicit assertion that items have properties. Of course they do, but that's natively true of JavaScript objects and Parse entities.
In other words, you might be able to get along with just item and several flavors of propertyOptions: e.g. Item has an attribute called "colorProperty" that is a pointer to an instance of "ColorProperty" (whose instances have a name property like 'red', 'green', etc. and maybe describe other pertinent facts, like a more precise description in RGB form).
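A small sketch of that flattened shape, assuming the Parse Android (Java) SDK and that Parse has already been initialized at app startup; the class and field names ("Item", "ColorProperty", "colorProperty", "name") are only illustrative:

```java
import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;

import java.util.List;

public class ColorQueryExample {
    public static void main(String[] args) throws ParseException {
        // Assumes Parse.initialize(...) has already been called elsewhere.

        // One ColorProperty instance per option, instead of a generic ItemProperty layer.
        ParseObject red = new ParseObject("ColorProperty");
        red.put("name", "Red");
        red.save();

        ParseObject item = new ParseObject("Item");
        item.put("colorProperty", red);   // pointer to the ColorProperty instance
        item.save();

        // "Give me all items which are Red" via an inner query on the pointer field.
        ParseQuery<ParseObject> redColors = ParseQuery.getQuery("ColorProperty");
        redColors.whereEqualTo("name", "Red");

        ParseQuery<ParseObject> redItems = ParseQuery.getQuery("Item");
        redItems.whereMatchesQuery("colorProperty", redColors);
        List<ParseObject> results = redItems.find();
        System.out.println(results.size() + " red items");
    }
}
```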
There's nothing wrong with lots of classes if they represent relevant truth. Do that first. You might discover empirically that your design is too resource consumptive (I doubt you will in this case), at which point we'd start looking for cheats to be somehow skinnier. But do it the right way first, cheat later only if you must.
Is having lots of classes inefficient?
It's certainly inefficient for poor humans who have to remember what all those classes do and how they're related to each other. It takes time to write all those classes in the first place, and every line that you write is a line that has to be maintained.
Beyond that, there's certainly some cost for each class in any OOP language, and creating more classes than you really need will mean that you're paying more than you need to for the work that you're doing, which is pretty much the definition of inefficient.
I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
Maybe you could spend some time thinking about the similarity between these cases and come up with a single set of more flexible classes that you can use in all those cases. Writing general code is harder than writing very specific code, but if you do a good job you'll recoup the extra effort many times over through reuse.
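For example, a single generic pair of classes might cover all of the drop-down-style cases instead of one bespoke class structure per case. A rough plain-Java sketch with invented names:

```java
import java.util.List;

// One definition class covers Colour, Size, and any future "property" added via the dashboard.
class PropertyDefinition {
    String name;                       // e.g. "Colour" or "Size"
    List<PropertyOption> options;      // the allowed values for this property
}

// One option class covers Red, Green, Big, Small, ...
class PropertyOption {
    String name;                       // e.g. "Red"
    PropertyDefinition definition;     // back-reference to the property it belongs to
}

// An item just holds whichever options were chosen for it.
class Item {
    String name;
    List<PropertyOption> chosenOptions;
}
```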

TDD vs Defensive Programming

Uncle Bob says:
"Defensive programming, in non-public APIs, is a smell, and a symptom, of teams that don't do TDD."
I am wondering how TDD can prevent an (internal) function from being used in an unintended way. I think TDD can't prevent it. It merely shows that the function is used correctly, because a calling function is covered by its passing unit tests.
When a new feature that uses the (undefensive) function is developed, that feature is also developed with TDD. So unintended use of the function will fail the new feature's tests.
So using TDD to drive new features will force you to correctly use (internal) functions.
Do you think that is what is meant by Uncle Bob's tweet?
So using TDD to drive new features will force you to correctly use (internal) functions.
Exactly. But keep in mind the subtle "gap" here: you should use TDD to write (unit) tests that test the contract of your public methods. You do not care about the implementation of these methods - that is all internal implementation detail.
Therefore: if your "new" code uses an existing method in an unintended way you are "told" because an exception is thrown or you receive an unexpected result.
That is what I mean by "gap": you see, the above describes a black box testing approach. You have a public method X, and you verify its public contract. Compare that to white box testing, where you write tests to cover all paths taken within X. When doing that, you could notice: "OK, to test that one condition in my internal method, I would have to drive in this special data".
But as I said, I think you should go for black box testing; white box tests might break easily when refactoring internal methods.
And there is an additional dimension here: keep in mind that ideally you change code in order to implement new features. This means that adding new features takes place only by writing new classes and methods, and therefore your new code has no chance of using private internal methods, because you are within a new class. In other words: if you regularly run into situations where your internal methods are used in many different ways, then you are probably doing something wrong.
The ideal path is: you implement a new requirement by creating a set of new classes. Later on, you have to add other requirements - by writing more classes.
In that ideal path - there is no need for defensive programming within internal methods. Because you exactly understand each use case for such internal methods!
Thus, the conclusion is: avoid defensive programming in internal methods. Make sure that your public APIs check all pre-conditions, so they fail (as fast as possible) if there is a problem. Try to avoid these internal consistency checks - as they tend to bloat your code - and rest assured: in 5 weeks or 5 months you will not remember if you really needed that check, or if it is just "defensive".
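A small sketch of that division of labour (a hypothetical class of my own, not from the answer): the public method checks its pre-conditions and fails fast, while the internal method trusts its callers.

```java
import java.util.Objects;

public class OrderService {

    // Public API: validate everything crossing the boundary and fail as fast as possible.
    public void placeOrder(String customerId, int quantity) {
        Objects.requireNonNull(customerId, "customerId must not be null");
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive, was " + quantity);
        }
        reserveStock(customerId, quantity);
    }

    // Internal method: no defensive re-checks. Its only callers are inside this class
    // and are themselves covered by the tests of the public contract.
    private void reserveStock(String customerId, int quantity) {
        // ...actual work elided for the sketch...
    }
}
```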
One way to answer this is to look at what else Uncle Bob has had to say on the topic. For example:
In a system with meager code coverage, few tests, and lots of tangled legacy code, defensive programming should be the rule.
In a system born of TDD, with 90+% coverage and highly reliable, well-maintained unit tests, defensive programming should be the exception.
From this, we can infer his main argument -- if the defensive checks are actually providing a benefit, then that is a hint that we are missing some constraints. If we are missing some constraints, and all the tests are passing, then we must also be missing some tests.
Or, to express the same idea in a slightly different way -- the constraints implied by the defensive patterns in your implementation belong closer to the boundary (ie, in the public API).
If there are constraints, for example, to limit what data is allowed to pass through the boundary, then there should be tests to ensure that the boundary actually implements the constraints.
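For example, such a boundary constraint would itself be pinned down by a test. A JUnit 5 sketch, reusing the hypothetical OrderService from the sketch in the previous answer:

```java
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class OrderServiceBoundaryTest {

    // The constraint "quantity must be positive" lives at the public boundary,
    // so a test proves the boundary actually enforces it.
    @Test
    void rejectsNonPositiveQuantity() {
        OrderService service = new OrderService();
        assertThrows(IllegalArgumentException.class,
                () -> service.placeOrder("customer-1", 0));
    }
}
```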
When you use TDD properly, you cover all the possible cases and assert that your public functions, which call the private ones, respond as expected not only for the happy scenario but for all the different possible scenarios. When you use defensive programming in your private methods, you are actually getting yourself ready for those same scenarios again.
Personally, I do not think defensive programming is bad even if it is in private methods; however, based on my description above, I see it as unnecessary double effort. It also undermines the point of TDD, because you are handling these special cases by complicating the code instead of writing it in a way that is proof against them.

TDD workflow when you find a Single Responsibility Principle violation

I am trying to follow TDD. So here is my problem.
I have interface Risk with method
boolean check(...)
Risk1 and Risk2 are implementations developed test-first, so they are now fully covered.
I decided that the unit that checks all risks (CompositeRisk) could also implement Risk.
CompositeRisk applies OR to the results of Risk1 and Risk2 (if one risk is true then the whole thing is risky). Still, everything is test-first.
Now I am looking at one of the risks and thinking: this one has the word "AND" in it and checks different fields. It seems that I can split it into two objects and create one more class, CompositeAndRisk, which would apply AND to both split risks. This way I could construct a DSL for a risk decision tree (which seems nice, because the risk rules could change a lot).
So what should I do with the tests of the risk I split? Should I rename them to CompositeAndRiskTest? Should I delete them? Should I write tests for the split classes?
First of all, I suggest that you turn the CompositeRisk class into an interface, and have two separate subclasses of it: CompositeOrRisk and CompositeAndRisk. This is just about the design though.
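In code, that suggested shape might look roughly like the sketch below (Java). Since the question elides the parameters of check(...), a single placeholder parameter is assumed, and CompositeRisk is shown here as an abstract base rather than an interface; treat all of it as illustration only.

```java
import java.util.Arrays;
import java.util.List;

// The existing interface from the question; the parameter is a placeholder,
// since the real signature is not shown.
interface Risk {
    boolean check(Object input);
}

// CompositeRisk turned into a common base holding the child risks.
abstract class CompositeRisk implements Risk {
    protected final List<Risk> risks;
    protected CompositeRisk(Risk... risks) { this.risks = Arrays.asList(risks); }
}

// OR-composite: risky if any child risk is risky.
class CompositeOrRisk extends CompositeRisk {
    CompositeOrRisk(Risk... risks) { super(risks); }
    public boolean check(Object input) {
        return risks.stream().anyMatch(r -> r.check(input));
    }
}

// AND-composite: risky only if all child risks are risky.
class CompositeAndRisk extends CompositeRisk {
    CompositeAndRisk(Risk... risks) { super(risks); }
    public boolean check(Object input) {
        return risks.stream().allMatch(r -> r.check(input));
    }
}
```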
Regarding your question, I believe there's no single right answer, so let me share how I see it.
As you know, in TDD there are concrete steps you follow (that comprise the TDD cycle), and there's a specific state the tests should be at in between each of them. Here's what I mean:
[State = No tests]
1. Write a test that fails
[State = Test fails]
2. Write as little code as possible in order for the test to pass
[State = Test passes]
3. Refactor
[State = Test still passes]
Given that this is what we aim for in TDD, I would do the changes you're talking about in the refactoring phase, including refactoring the tests accordingly.
This means that if I'm splitting a class, I'll be splitting the relevant test as well. At no point should the tests fail, as I'm only changing the structure of the code, not what it does (this is the meaning of refactoring after all).
If you have a larger change to do though, I would go about creating a new class from scratch (TDD of course), and later on, remove the no longer needed functionality from the old class, as well as the now redundant test cases.
The approach I'd take in this case is "play it innocent" -- when you discover a new requirement, just write a test and the implementation for it, pretending to ignore the relationship with previous requirements at first.
The "And" case here is clearly new functionality. No need to modify the contents of the existing test at that point, just create another test with a name that reflects the new requirement, such as CompositeAndRiskTest and create the corresponding implementation.
Then, during the Refactor step, "realize" that the two previous objects are 2 sides of the same coin and refactor them accordingly. That could just mean renaming CompositeRisk to CompositeOrRisk, or more complex things.
Once the 2 sorts of Risks are identified, tested and implemented, you could go on and create new tests for combinations of them.

Refactoring methods in existing code base with huge number of parameters

I have inherited an existing code base where the "features" are as follows:
huge monolithic classes with (literally) hundreds of member variables, and methods that go on for pages (er, screens)
public and private methods with a large number of arguments.
I am trying to clean up and refactor the code, to leave it a little better than I found it. So my questions are:
Is it worth it (or do you) to refactor methods with 10 or so arguments so that they are more readable?
Are there best practices on how long methods should be? How long do you usually keep them?
Are monolithic classes bad?
Is it worth it (or do you) to refactor methods with 10 or so arguments so that they are more readable?
Yes, it is worth it. It is typically more important to refactor methods that are not "reasonable" than ones that already are nice, short, and have a small argument list.
Typically, if you have many arguments, it's because a method does too much; most likely, it should be a class of its own, not a method.
That being said, in those cases when many parameters are required, it's best to encapsulate the parameters into a single class (e.g. SpecificAlgorithmOptions) and pass one instance of that class. This way, you can provide clean defaults, and it's very obvious which values are essential vs. optional (based on what is required to construct the options class).
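A small sketch of that parameter-object idea; the names, including SpecificAlgorithmOptions from the answer, are purely illustrative:

```java
// Parameter object: required values go through the constructor, optional ones
// have clean defaults and dedicated setters, so call sites show intent.
class SpecificAlgorithmOptions {
    private final String inputPath;        // essential: must be supplied
    private int maxIterations = 100;       // optional: sensible default
    private boolean verbose = false;       // optional: sensible default

    SpecificAlgorithmOptions(String inputPath) {
        this.inputPath = inputPath;
    }

    SpecificAlgorithmOptions withMaxIterations(int maxIterations) {
        this.maxIterations = maxIterations;
        return this;
    }

    SpecificAlgorithmOptions withVerbose(boolean verbose) {
        this.verbose = verbose;
        return this;
    }

    String inputPath() { return inputPath; }
    int maxIterations() { return maxIterations; }
    boolean verbose() { return verbose; }
}

class Algorithm {
    // One options argument replaces a ten-parameter signature.
    // Usage: new Algorithm().run(new SpecificAlgorithmOptions("input.csv").withVerbose(true));
    void run(SpecificAlgorithmOptions options) {
        // ...use options.inputPath(), options.maxIterations(), options.verbose()...
    }
}
```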
Are there best practices on how long methods should be? How long do you usually keep them?
A method should be as short as possible. It should have one purpose, and be used for one task, whenever possible. If it's possible to split it into separate methods, where each has a real, qualitative "task", then do so when refactoring.
Are monolithic classes bad?
Yes.
If the code is working and there is no need to touch it, I wouldn't refactor. I only refactor very problematic cases if I have to touch them anyway (either to extend their functionality or for bug-fixing). I favor the pragmatic way: only (in 95% of cases) touch what you change.
Some first thoughts on your specific problem (though in detail it is difficult without knowing the code):
Start by grouping instance variables; these groups will then be targets for 'extract class' (a small sketch follows this list).
Once you have grouped these variables, you can hopefully group some methods as well, which can also be moved when doing 'extract class'.
Often there are many methods which aren't using any fields; make them static (they are most likely helper methods, which can be extracted to helper classes).
In case unrelated instance fields are mixed in many methods, do lots of 'extract method'.
Use automated refactoring tools as much as possible, because you most likely have no tests in place and automation is safer.
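A tiny before/after illustration of the first two points, grouping related instance variables and then extracting a class; the names are invented:

```java
// Before: address-related fields and behaviour are tangled into the monolith.
class CustomerBefore {
    String name;
    String addressStreet;
    String addressCity;
    String addressPostcode;

    String formatAddressLabel() {
        return addressStreet + "\n" + addressPostcode + " " + addressCity;
    }
}

// After 'extract class': the grouped fields and the methods that use them move together.
class Address {
    String street;
    String city;
    String postcode;

    String formatLabel() {
        return street + "\n" + postcode + " " + city;
    }
}

class Customer {
    String name;
    Address address = new Address();
}
```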
Regarding your other concrete questions.
Is it worth it (or do you) to refactor methods with 10 or so arguments so that they are more readable?
Definitely. 10 parameters are too many for us humans to grasp. Most likely the method is doing too much.
Are there best practices on how long methods should be? How long do you usually keep them?
It depends... on preferences. I stated some things on this thread (though the question was about PHP). Still, I would apply those numbers/metrics to any language.
Are monolithic classes bad?
It depends on what you mean by monolithic. If you mean many instance variables, endless methods, and a lot of if/else complexity, then yes.
Also have a look at a real gem (to me, a must-have for every developer): Working Effectively with Legacy Code.
Assuming the code is functioning I would suggest you think about these questions first:
is the code well documented?
do you understand the code?
how often are new features being added?
how often are bugs reported and fixed?
how difficult is it to modify and fix the code?
what is the expected life of the code?
how many versions of the compiler are you behind (if at all)?
is the OS it runs on expected to change during its lifetime?
If the system will be replaced in five years, is documented well, will undergo few changes, and bugs are easy to fix - leave it alone regardless of the size of the classes and the number of parameters. If you are determined to refactor make a list of your refactoring proposals in the order of maximum benefit with minimum changes and attack it incrementally.

How do YOU factor your Domain (namespaces), in Domain Driven Design?

How do YOU factor your Domain (namespaces), in Domain Driven Design?
I have been moving to the following concept:
Project.Entity
Project.Entity.Abstracts
Project.Entity.Entities
Project.Entity.Extensions
Project.Entity.Immutables
Project.Entity.Interfaces
Project.Entity.Repositories
For example, I have an entity in a CMS called "Content". So, I would create a project called Project.Content, and factor the classes to look like:
interface IContent
class Content : IContent
interface IContentRepository
class ContentRepository : IContentRepository
This "Content" Entity model would have its own namespace.
But I am finding that it does not scale well in a large enterprise environment with well over a dozen projects (try 18) of "Entity" models. I end up with a solution with over a dozen projects, some of which only have 2 or 3 classes (e.g. UrlRewriter). Also, I find myself referencing other projects just for their interfaces. I feel like this is polluting my domain; while these are not concrete references, it is sometimes difficult to keep from circular references.
So, I fall back to the "Layer" concept at times...
I am wanting to know how other DDD experts are factoring Enterprise-size applications. Please feel free to recommend books and articles.
And thanks in advance!
One thing that I do is to add something that identifies the bounded context to the namespace.
P.S. To make sure it is clear why, check both of these links on bounded contexts:
http://dddcommunity.org/discussion/messageboardarchive/BoundedContext.html, http://devlicio.us/blogs/casey/archive/2009/02/11/ddd-bounded-contexts.aspx
I follow the .NET guidelines. I find them very intuitive, and they allow you to set up namespaces such that you don't need to import anything you don't need.
I would never impose a strict naming convention for the feature level. The design of each different project should guide that.
Similarly to you, I have found that having loads of projects becomes a pain to manage.
I prefer the
Project.Domain
Project.DataAccess
Project.Presentation (presenters and such)
Project.Gui (in case of a winforms app)
setup.
In a way making things simple helps a lot when things go bad.
The question is: what do you gain when you create another project? (It is very easy to do so, almost too easy.)
Will you ever want to use that project independently or not? You might end up with the resulting .dlls so coupled that you can't even deploy them without them being exactly the same versions, etc.; in that case there is little reason for splitting things up and cluttering your IDE.
You can always move things to a new project later if the need arises. It is somewhat painful, but by that time you will have a good reason to do it, beyond just the feeling that that is the way it is done.
