Unit testing: is "good enough" good enough? - tdd

I have a unit test that can spontaneously fail roughly 1 time in 1,000,000 (a guesstimate) even when there is no fault in the code. Is this an acceptable tolerance, or does the TDD manifesto require iron-fisted absoluteness?
Just for those who are interested, it goes something like this:
stuff = randomCrap.get()
stuff2 = randomCrap.get()
assert(stuff != stuff2)

Well, that really depends on the source of the failure. Do you know why it fails? If so, have you tried to isolate that fault so that it doesn't trip up the unit test?
Personally, I'd say that if it's genuinely 1 in a million and you know why it's happening, then add a comment to that effect and don't worry about it. It's not likely to bother people significantly in a continuous build, after all. Of course, if it's really one in ten, or something like that, that's a very different matter.
I would at least try to remove the source of incorrectness though. For one thing, it suggests your test isn't repeatable. Sometimes that's okay - there are some sources of randomness which are very difficult to extract out - but I would try not to do it. If you've tried and reached a block, then the pragmatic thing to do is document it and move on, IMO.

The question isn't "does TDD permit me to allow an occasionally failing test?" but, rather: "do my system's requirements permit occasional failures?" And that's a question only you can answer.
From the TDD perspective, it's clumsy to have tests which fail occasionally - you don't know, when they fail, whether it's because this is one of those rare permissible failures, or whether it's because your code is broken in an unacceptable way. So an occasionally failing test is significantly less useful to you than one which always passes.
If your requirement is to have a different behavior one time out of a million, then you should test to that requirement. Test the general case, not with a random number, but with a meaningful subset of valid inputs. Test the special case with the value that should bring about the special behavior.

How often does your requirement allow this error to happen?
Are you, and your functional users, comfortable with what is happening?
You may want to make certain it is written down that your random generation will sometimes produce duplicate values, possibly within the same request.
To me the more troubling part is that it can happen at all, as you showed above, and I would think that is something to look into.

A unit test cannot fail if there is no fault in the code. There is a fault, be it related to timing, networking, etc... you just haven't figured it out yet. Once you do figure it out, you just learned something that you didn't know. +1 to you.
The real problem if it is not fixed is psychological. That method will tend to get blamed whenever there is something strange/random/unexplained that happens in the system. Better to fix the red herring now when you are thinking about it.
And FYI, randomness does not imply uniqueness.

Do you absolutely have to use random data? Are you testing a system that is made to return two distinct random values?
Otherwise create a stub.
Unit tests should be 100% repeatable. That is hard to do when, for example, you are dealing with threads or file-system stuff, but that is what stubs are for.
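For example, a minimal sketch in C#, assuming the random source is reached through an interface (IRandomSource, StubRandomSource, and TokenGenerator are all hypothetical names, not anything from the question):
// requires: using System.Collections.Generic;
public interface IRandomSource
{
    string Next();
}

// Deterministic stand-in for the real random source.
public class StubRandomSource : IRandomSource
{
    private readonly Queue<string> values;
    public StubRandomSource(params string[] values) { this.values = new Queue<string>(values); }
    public string Next() { return values.Dequeue(); }
}

[TestMethod]
public void TokenGenerator_UsesTheValuesFromTheRandomSource()
{
    // The "random" values are now fixed, so this test is 100% repeatable.
    var generator = new TokenGenerator(new StubRandomSource("abc", "xyz"));

    Assert.AreEqual("abc", generator.NextToken());
    Assert.AreEqual("xyz", generator.NextToken());
}
The point of the stub is that the test describes the consumer's behaviour for known inputs, instead of hoping the real randomness cooperates.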

Your test asserts that stuff is never equal to stuff2, but seeing the test fail occasionally means that this is not true.
You might instead assert that stuff is only occasionally equal to stuff2, by taking a million samples and asserting that they are equal fewer than, say, 10 times. This will still fail occasionally, but perhaps much less often, which might be acceptable.
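A rough sketch of that sampling idea (randomCrap here stands in for whatever random source the question is using, and the threshold of 10 is arbitrary):
[TestMethod]
public void Get_RarelyReturnsTheSameValueTwiceInARow()
{
    const int samples = 1_000_000;
    int duplicates = 0;

    for (int i = 0; i < samples; i++)
    {
        if (randomCrap.Get() == randomCrap.Get())
        {
            duplicates++;
        }
    }

    // Still probabilistic, just with a far smaller chance of a spurious failure.
    Assert.IsTrue(duplicates < 10, $"Saw {duplicates} duplicate pairs in {samples} samples.");
}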
You might be better off with:
stuff = 4
stuff2 = 5
assert(stuff != stuff2)
You can be pretty certain that the above code will behave the same as your original code in all but about one in a million runs, and you can be completely certain that this code will pass every time!

Related

What failure modes can TDD leave behind?

Please note I have not yet 'seen the light' on TDD nor truly got why it has all of the benefits evangelised by its main proponents. I'm not dismissing it - I just have my reservations which are probably born of ignorance. So by all means laugh at the questions below, so long as you can correct me :-)
Can using TDD leave yourself open to unintended side-effects of your implementation? The concept of "the least amount of code to satisfy a test" suggests thinking in the narrowest terms about a particular problem without necessarily contemplating the bigger picture.
I'm thinking of objects that hold or depend upon state (e.g. internal field values). If you have tests which instantiate an object in isolation, initialise that object and then call the method under test, how would you spot that a different method has left behind an invalid state that would adversely affect the behaviour of the first method? If I have understood matters correctly, then you shouldn't rely on order of test execution.
Other failures I can imagine cover the non-closure of streams, non-disposal of GDI+ objects and the like.
Is this even TDD's problem domain, or should integration and system testing catch such issues?
Thanks in anticipation....
Some of this is in the domain of TDD.
Dan North says there is no such thing as test-driven development; that what we're really doing is example-driven development, and the examples become regression tests only once the system under test has been implemented.
This means that as you are designing a piece of code, you consider example scenarios and set up tests for each of those cases. Those cases should include the possibility that data is not valid, without considering why the data might be invalid.
Something like closing a stream can and should absolutely be covered when practicing TDD.
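For example, a hedged sketch (FileProcessor is a made-up class under test) showing how "did we dispose the stream?" can be pinned down in a plain unit test:
// requires: using System.IO;
[TestMethod]
public void Process_DisposesTheStreamItWasGiven()
{
    var stream = new MemoryStream(new byte[] { 1, 2, 3 });
    var processor = new FileProcessor();

    processor.Process(stream);

    // A MemoryStream that has been disposed reports CanRead == false.
    Assert.IsFalse(stream.CanRead, "Process should dispose the stream when it is finished.");
}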
We use constructs like functions not only to reduce duplication but to encapsulate functionality. We reduce side effects by maintaining that encapsulation. I'd argue that we consider the bigger picture from a design perspective, but when it comes to implementing a method, we should be able to narrow our focus to that scope -- that unit of functionality. When we start juggling externalities is when we are likely to introduce defects.
That's my take, anyway; others may see it differently.
TDD is not a replacement for being smart. The best programmers become even better with TDD. The worst programmers are still terrible.
The fact that you are asking these questions is a good sign: it means you're serious about doing programming well.
The concept of "the least amount of
code to satisfy a test" suggests
thinking in the narrowest terms about
a particular problem without
necessarily contemplating the bigger
picture.
It's easy to take that attitude, just like "I don't need to test this; I'm sure it just works." Both are naive.
This is really about taking small steps, not about calling it quits early. You're still going after a great final result, but along the way you are careful to justify and verify each bit of code you write, with a test.
The immediate goal of TDD is pretty narrow: "how can I be sure that the code I'm writing does what I intend it to do?" If you have other questions you want to answer (like, "will this go over well in Ghana?" and "is my program fast enough?") then you'll need different approaches to answer them.
I'm thinking of objects that hold or depend upon state.
how would you spot that a different method has left behind an invalid state?
Dependencies and state are troublesome. They make for subtle bugs that appear at the worst times. They make refactoring and future enhancement harder. And they make unit testing infeasible.
Luckily, TDD is great at helping you produce code that isolates your logic from dependencies and state. That's the second "D" in "TDD".
The concept of "the least amount of
code to satisfy a test" suggests
thinking in the narrowest terms about
a particular problem without
necessarily contemplating the bigger
picture.
It suggests that, but that isn't what it means. What it means is powerful blinders for the moment. The bigger picture is there, but interferes with the immediate task at hand - so focus entirely on that immediate task, and then worry about what comes next. The big picture is present, is accounted for in TDD, but we suspend attention to it during the Red phase. So long as there is a failing test, our job is to get that test to pass. Once it, and all the other tests, are passing, then it's time to think about the big picture, to look at shortcomings, to anticipate new failure modes, new inputs - and write a test to express them. That puts us back into Red, and re-narrows our focus. Get the new test to pass, then set aside the blinders for the next step forward.
Yes, TDD gives us blinders. But it doesn't blind us.
Good questions.
Here's my two cents, based on my personal experience:
Can using TDD leave yourself open to unintended side-effects of your implementation?
Yes, it can. TDD is not a fully-fledged solution on its own. It should be used along with other techniques, and you should definitely bear in mind the big picture (whether you are responsible for it or not).
I'm thinking of objects that hold or depend upon state (e.g. internal field values). If you have tests which instantiate an object in isolation, initialise that object and then call the method under test, how would you spot that a different method has left behind an invalid state that would adversely affect the behaviour of the first method? If I have understood matters correctly, then you shouldn't rely on order of test execution.
Every test method should execute with no regard to what was executed before it or what will be executed after it. If that's not the case then something's wrong (from a TDD perspective on things).
Talking about your example, when you write a test you should know in reasonable detail what your inputs will be and what the expected outputs are. You start from a defined input, in a defined state, and you check for a desired output. You're not 100% guaranteed that the same method in another state will do its job without errors. However, the "unexpected" should be reduced to a minimum.
If you design the class you should definitely know if two methods can change some shared internal state and how; and more importantly, whether this should really happen at all, or whether there is a problem of low cohesion.
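A minimal sketch of that independence, assuming a made-up Account class: each test builds its own instance, so no state leaks between tests and the execution order is irrelevant.
[TestClass]
public class AccountTests
{
    // Fresh fixture: every test gets its own object.
    private static Account NewAccount() { return new Account(openingBalance: 100m); }

    [TestMethod]
    public void Withdraw_ReducesBalance()
    {
        var account = NewAccount();
        account.Withdraw(30m);
        Assert.AreEqual(70m, account.Balance);
    }

    [TestMethod]
    public void Deposit_IncreasesBalance()
    {
        var account = NewAccount();
        account.Deposit(50m);
        Assert.AreEqual(150m, account.Balance);
    }
}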
Anyway, a good design at the "TDD" level doesn't necessarily mean that your software is well built; you need more, as Uncle Bob explains well here:
http://blog.objectmentor.com/articles/2007/10/17/tdd-with-acceptance-tests-and-unit-tests
Martin Fowler wrote an interesting article about mocks vs. stubs which covers some of the topics you are talking about:
http://martinfowler.com/articles/mocksArentStubs.html#ClassicalAndMockistTesting

How to develop complex methods with TDD

A few weeks ago I started my first project with TDD. Up to now, I have only read one book about it.
My main concern: How to write tests for complex methods/classes. I wrote a class that calculates a binomial distribution. Thus, a method of this class takes n, k, and p as input, and calculates the resp. probability. (In fact it does a bit more, that's why I had to write it myself, but let's stick to this description of the class, for ease of the argument.)
What I did to test this method was: copy some tables with different n that I found on the web into my code, randomly pick an entry in the table, feed the resp. values for n, k, and p into my function, and check whether the result was near the value in the table. I repeat this a number of times for every table.
This all works well now, but after writing the test, I had to work for a few hours to really code the functionality. From reading the book, I had the impression that I should not code longer than a few minutes until the test shows green again. What did I do wrong here? Of course I have broken this task down into a lot of methods, but they are all private.
A related question: Was it a bad idea to pick randomly numbers from the table? In case of an error, I will display the random-seed used by this run, so that I can reproduce the bug.
I don't agree with people saying that it's ok to test private code, even if you make them into separate classes. You should test entry points to your application (or your library, if it's a library you're coding). When you test private code, you limit your re-factoring possibilities for later (because refactoring your private classes means refactoring your test code, which you should refrain from doing). If you end up re-using this private code elsewhere, then sure, create separate classes and test them, but until you do, assume that You Ain't Gonna Need It.
To answer your question, I think that yes, in some cases, it's not a "2 minutes until you go green" situation. In those cases, I think it's ok for the tests to take a long time to go green. But most situations are "2 minutes until you go green" situations. In your case (I don't know squat about binomial distribution), you wrote you have 3 arguments, n, k and p. If you keep k and p constant, is your function any simpler to implement? If yes, you should start by creating tests that always have constant k and p. When your tests pass, introduce a new value for k, and then for p.
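For illustration, a first baby-step test along those lines (BinomialDistribution.Probability is a hypothetical signature standing in for the method described in the question):
[TestMethod]
public void Probability_WithOneTrialAndOneSuccess_IsJustP()
{
    // Simplest case first: for n = 1 and k = 1 the probability is simply p.
    double result = BinomialDistribution.Probability(n: 1, k: 1, p: 0.3);

    Assert.AreEqual(0.3, result, delta: 1e-9);
}
Once that passes, a second test can vary k, then p, and so on, keeping each red-to-green step small.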
"I had the impression that I should not code longer than a few minutes, until the test shows green again. What did I do wrong here?"
Westphal is correct up to a point.
Some functionality starts simple and can be tested simply and coded simply.
Some functionality does not start out simple. Simple is hard to achieve. EWD says that simplicity is not valued because it is so difficult to achieve.
If your function body is hard to write, it isn't simple. This means you have to work much harder to reduce it to something simple.
After you eventually achieve simplicity, you, too, can write a book showing how simple it is.
Until you achieve simplicity, it will take a long time to write things.
"Was it a bad idea to pick randomly numbers from the table?"
Yes. If you have sample data, run your test against all the sample data. Use a loop or something, and test everything you can possibly test.
Don't select one row -- randomly or otherwise, select all rows.
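A sketch of that loop, again assuming the hypothetical BinomialDistribution.Probability named above; the rows are standard binomial values (e.g. C(5,2) * 0.5^5 = 0.3125) rather than a copy of any particular published table:
[TestMethod]
public void Probability_MatchesKnownValues()
{
    var rows = new[]
    {
        (n: 5,  k: 2, p: 0.5, expected: 0.3125),
        (n: 10, k: 0, p: 0.1, expected: 0.3487),
        (n: 1,  k: 1, p: 0.3, expected: 0.3000),
    };

    foreach (var row in rows)
    {
        double actual = BinomialDistribution.Probability(row.n, row.k, row.p);
        Assert.AreEqual(row.expected, actual, 1e-3, $"n={row.n}, k={row.k}, p={row.p}");
    }
}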
You should TDD using baby steps. Try thinking of tests that will require less code to be written. Then write the code. Then write another test, and so on.
Try to break your problem into smaller problems (you probably used some other methods to have your code completed). You could TDD these smaller methods.
--EDIT - based on the comments
Testing private methods is not necessarily a bad stuff. They sometimes really contain implementation details, but sometimes they might also act like an interface (in this case, you could follow my suggestion next paragraph).
One other option is to create other classes (implemented with interfaces that are injected) to take some of the responsibilities (maybe some of those smaller methods), and test them separately, and mock them when testing your main class.
Finally, I don't see spending more time coding as a really big problem. Some problems are really more complex to implement than to test, and require much thinking time.
You are correct about short quick refactors, I rarely go more than a few minutes between rebuild/test no matter how complicated the change. It takes a little practice.
The test you described is more of a system test than a unit test though. A unit test tries never to test more than a single method--in order to reduce complexity you should probably break your problem down into quite a few methods.
The system test should probably be done after you have built up your functionality with small unit tests on small straight-forward methods.
Even if the methods are just taking a part of the formula out of a longer method, you get the advantage of readability (the method name should be more readable than the formula part it replaces) and if the methods are final the JIT should inline them so you don't lose any speed.
On the other hand, if your formula isn't that big, maybe you just write it all in one method and test it like you did and take the downtime--rules are made to be broken.
It's difficult to answer your question without knowing a little bit more about the things you wanted to implement. It sounds like they were not easily partitionable into testable parts. Either the functionality works as a whole or it doesn't. If this is the case, it's no wonder it took you hours to implement it.
As to your second question: Yes, I think it's a bad idea to make the test fixture random. Why did you do this in the first place? Changing the fixture changes the test.
Avoid developing complex methods with TDD until you have developed simple methods as building blocks for the more complex methods. TDD would typically be used to create a quantity of simple functionality which could be combined to produce more complex behaviour. Complex methods/classes should always be able to be broken down into simpler parts, but it is not always obvious how and is often problem specific. The test you have written sounds like it might be more of an integration test to make sure all the components work together correctly, although the complexity of the problem you describe only borders on the edge of requiring a set of components to solve it. The situation you describe sounds like this:
class A {
public doLotsOfStuff() // Call doTask1..n
private doTask1()
private doTask2()
private doTask3()
}
You will find it quite hard to develop with TDD if you start by writing a test for the greatest unit of functionality (i.e. doLotsOfStuff()). By breaking the problem down into more manageable chunks and approaching it from the simplest functionality first, you will also be able to create more discrete tests (much more useful than tests that check for everything!). Perhaps your potential solution could be reformulated like this:
class A{
public doLotsOfStuff() // Call doTask1..n
public doTask1()
public doTask2()
public doTask3()
}
Whilst your private methods may be implementation detail, that is not a reason to avoid testing them in isolation. Just like many problems, a divide-and-conquer approach would prove effective here. The real question is what size is a suitably testable and maintainable chunk of functionality? Only you can answer that, based on your knowledge of the problem and your own judgement of applying your abilities to the task.
I think the style of testing you have is totally appropriate for code that's primarily a computation. Rather than pick a random row from your known results table, it'd be better to just hardcode the significant edge cases. This way your tests are consistently verifying the same thing, and when one breaks you know what it was.
Yes, TDD prescribes short spans from test to implementation, but what you've done is still well beyond standards you'll find in the industry. You can now rely on the code to calculate the way it should, and you can refactor / extend the code with a degree of certainty that you aren't breaking it.
As you learn more testing techniques you may find a different approach that shortens the red/green cycle. In the meantime, don't feel bad about it. It's a means to an end, not an end in itself.

What do you think about the omnipresent "Test, Test, Test!" principle?

In the old days programming used to involve less guesswork. I would write some lines of code and be 100% certain about what the code does and what it does not at a glance. Errors were mostly typos, but not about the functionality.
In recent years I believe there has been a trend toward this "trial-and-error" programming: write the code (as if in draft), and then debug iteratively until the program's behavior appears to comply with the requirements. Test, and test again, and then again.
Funny thing is, in my Visual Studio the "Run" button has been replaced by a button labelled "Debug" (= I know you have some bugs!). I have to admit that in several apps that I write I cannot guarantee a bug-free code.
What do you think? Or maybe our systems are now overly complicated (browser/OS/Service Pack compatibilities, etc.) and this justifies testing on all types of environments.
I've experienced the opposite, actually. Whereas it used to be a case of running until it worked, I now unit test until the tests pass... and this seems to be at least a reasonably common transition, as far as I can see.
I have to say that code which worked first time with only typos has never been the norm in my experience. The difference is that now I can find the problems much more quickly, and also spot if old problems come back. I can sometimes manage pretty short and simple bits of code with no errors (and posting on Stack Overflow has improved that ability) but large, complex systems? Heck no.
To answer the title of your post - the "test, test, test" principle is a good one, in my view... but I don't associate that with running the whole program repeatedly. I associate it with running unit tests frequently. I rarely need to use the debugger for unit tests - usually a failure makes the cause suitably obvious by inspection, because only a small amount of code is being tested.
The one word answer is "Complexity". The real answer is "Unnecessary Complexity"!
Accounting principles have not changed for the past 30 years. Why, then, is writing an accounting system so much more difficult today? It is good to have a graphical user interface, but do we have to go overboard?
Software development has been caught in a vicious circle for many years. The complexity is feeding itself and instead of reducing it we simply hide it under layers and layers of wrappers. Eventually something is going to give.
When we favor form over function, we have to pay the price.
Could it be that in later years developers have come to the realization that the "100% certainty" might not actually be correct? Developing software is very complex, and even though the tools have evolved over the years, so has our realization that writing good code is hard. True, debugging and automated unit tests have made us more productive, but we still produce bugs, just as we did back then, only now we have different tools to catch them with.
You may write code that you think you know 100% what it does and does not do, but there is always that edge case that you haven't thought of or the odd exception thrown that you don't expect. Sometimes trial-and-error programming can be a helpful tool to narrow down a problem, with the debugger's help.
It's important to know what tools are available to you to help produce code with minimal bugs.
I have found that the "Test, Test, Test" approach helps me design the code. Sometimes the work that has to be done is too complex to do all in one go. Testing forces me to split it into smaller parts, and as I solve these I am able to put them together into a larger whole.
I think the advantage comes in an indirect way: When you embrace tests and unit tests, you have to write your application in such a way that you can actually write tests:
Classes need to be written in such a way that you can instantiate a single object without the whole application and OS around it, but just a few helper objects. This means you need to minimize the dependencies, and make all communication to the surrounding system explicit.
Implementing the test cases means that you have to find a minimum sequence of commands and calls that makes your class do something meaningful. This often points to awkward design decisions, or shows you that classes are very difficult to use for certain purposes.
All in all, when you embrace tests, you end up with a system that has a minimum of interdependencies between its components, and the test cases serve as documentation of how to use your components.
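A small before/after sketch of what that tends to look like in practice (the class names and the IClock interface here are made up for illustration):
// Before: hidden dependency on the system clock -- hard to test without
// waiting for 6 a.m.
public class ReportScheduler
{
    public bool IsDue() { return DateTime.Now.Hour == 6; }
}

// After: the dependency is explicit and can be replaced by a fake in a test.
public interface IClock
{
    DateTime Now { get; }
}

public class TestableReportScheduler
{
    private readonly IClock clock;

    public TestableReportScheduler(IClock clock) { this.clock = clock; }

    public bool IsDue() { return clock.Now.Hour == 6; }
}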
Testing (executing your system) tells you something about "the presence of bugs but NOT about the absence of them" (AFAIK this observation was coined by Dijkstra). It points in the direction that the strength of your test suite is the key to testing: "You have so many test cases that you can say that many bugs do not exist. This implies that big parts of your software work as expected."
Some examples for having a strong/mighty test-suite:
A lot of code is executed by your unit tests (the traditional coverage term)
You have no false-negative tests (tests which show green but in fact should be red). False-negative tests are evil, because they give you a wrong sense of test-case quality. For details of good test asserts and false negatives see also blog-entry#1 and blog-entry#2.
The requirements are well understood (I have seen a lot of cases where an automated test was testing the wrong thing and the developer misunderstood the requirement from business). For the developer it was green, but for business the system was not working as expected (another kind of false-negative example, but on a higher level).
In a sense the correctness of a program is only proven when it is done with mathematical proofs (which only pays off for life-critical and money-intensive systems). Still, you can achieve a lot with automated testing (apart from unit testing, automated integration testing has always helped a lot).
Regarding debugging: I don't use the debugger as often as I used to, but sometimes when adding new functionality to code (my new test case shows green) I break other test cases. From the assert I instantly see that something went wrong, but I still haven't located the bug. For locating the bug, debugging is still helpful (with the red test case I execute the problematic code paths; with the debugger I locate the bug).
If you're interested in test automation, have a look at the masterpiece xUnit Test Patterns.
I've read one book ("TDD by Example" by Kent Beck) which indeed seems to take that "trial and error" approach to an extreme, though it's more like "make the unit tests work". Still, I couldn't get myself to finish this book - a rare occurrence, especially since I really hoped to get a better understanding. Still, committing obviously imbecile code to be improved later makes me shiver.
Science: Automated tests have their advantages. However, they are not the silver bullet they are claimed to be. No single test method is sufficient to find enough defects, and other methods have a better detection rate.
Gut feel: Our problems are facets of ever-increasing complexity. Complexity highly correlates with the amount of code we have to manage. In this light, TDD attempts to solve the problem of too much code by writing even more code.
Advantages: We now have an established formalism to make testing repeatable, accountable and immediately documented. It is definitely a way out of the "works on my machine" and "strange, it worked yesterday, I'll give you the latest DLL" trap.
I currently practice Test Driven Development (TDD), or at least write many unit tests to verify that most/all of my code behaves the way I expect it to behave. Taking this approach forces me to look at my program from the perspective of the consumer. Also, as I write tests, I often think of boundary limits, additional scenarios that I didn't originally envision, etc.
I've now come to the point where I'm reluctant to make changes to older programs, as I'm afraid that I'll break something. Regression testing them is onerous compared with running a suite of unit tests.

Adversarial/Naive Pairing with TDD: How effective is it?

A friend of mine was explaining how they do ping-pong pairing with TDD at his workplace and he said that they take an "adversarial" approach. That is, when the test writing person hands the keyboard over to the implementer, the implementer tries to do the bare simplest (and sometimes wrong thing) to make the test pass.
For example, if they're testing a GetName() method and the test checks for "Sally", the implementation of the GetName method would simply be:
public string GetName() {
    return "Sally";
}
Which would, of course, pass the test (naively).
He explains that this helps eliminate naive tests that check for specific canned values rather than testing the actual behavior or expected state of components. It also helps drive the creation of more tests and ultimately better design and fewer bugs.
It sounded good, but in a short session with him, it seemed like it took a lot longer to get through a single round of tests than otherwise and I didn't feel that a lot of extra value was gained.
Do you use this approach, and if so, have you seen it pay off?
It can be very effective.
It forces you to think more about what test you have to write to get the other programmer to write the correct functionality you require.
You build up the code piece by piece, passing the keyboard frequently.
It can be quite tiring and time-consuming, but I have found that it's rare that I have had to come back and fix a bug in any code that has been written like this.
I've used this approach. It doesn't work with all pairs; some people are just naturally resistant and won't give it an honest chance. However, it helps you do TDD and XP properly. You want to try and add features to your codebase slowly. You don't want to write a huge monolithic test that will take lots of code to satisfy. You want a bunch of simple tests. You also want to make sure you're passing the keyboard back and forth between your pairs regularly so that both pairs are engaged. With adversarial pairing, you're doing both. Simple tests lead to simple implementations, the code is built slowly, and both people are involved throughout the whole process.
I like it some of the time - but don't use that style the entire time. Acts as a nice change of pace at times. I don't think I'd like to use the style all of the time.
I've found it a useful tool with beginners to introduce how the tests can drive the implementation though.
(First off, adversarial TDD should be fun. It should be an opportunity for teaching. It shouldn't be an opportunity for human dominance rituals. If there isn't the space for a bit of humor then leave the team. Sorry. Life is too short to waste in a negative environment.)
The problem here is badly named tests. If the test looked like this:
foo = new Thing("Sally")
assertEquals("Sally", foo.getName())
Then I bet it was named "testGetNameReturnsNameField". This is a bad name, but not immediately obviously so. The proper name for this test is "testGetNameReturnsSally". That is what it does. Any other name is lulling you into a false sense of security. So the test is badly named. The problem is not the code. The problem is not even the test. The problem is the name of the test.
If, instead, the tester had named the test "testGetNameReturnsSally", then it would have been immediately obvious that this is probably not testing what we want.
It is therefore the duty of the implementor to demonstrate the poor choice of the tester. It is also the duty of the implementor to write only what the tests demand of them.
So many bugs in production occur not because the code did less than expected, but because it did more. Yes, there were unit tests for all the expected cases, but there were no tests for all the special edge cases the code handled because the programmer thought "I'd better just do this too, we'll probably need that" and then forgot about it. That is why TDD works better than test-after. That is why we throw code away after a spike. The code might do all the things you want, but it probably also does some things you thought you needed, and then forgot about.
Force the test writer to test what they really want. Only write code to make tests pass and no more.
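In the GetName example, that might mean a test like the following (Thing is the hypothetical class from the earlier snippet); a hard-coded return "Sally"; cannot satisfy both assertions:
[TestMethod]
public void GetName_ReturnsWhateverNameTheObjectWasConstructedWith()
{
    Assert.AreEqual("Sally", new Thing("Sally").GetName());
    Assert.AreEqual("Bob", new Thing("Bob").GetName());
}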
RandomStringUtils is your friend.
It is based on the team's personality. Every team has a personality that is the sum of its members. You have to be careful not to practice passive-aggressive implementations done with an air of superiority. Some developers are frustrated by implementations like
return "Sally";
This frustration will lead to an unsuccessful team. I was among the frustrated and did not see it pay off. I think a better approach is more oral communication making suggestions about how a test might be better implemented.

How do you unit test a unit test? [closed]

I was watching Rob Connery's webcasts on the MVCStoreFront App, and I noticed he was unit testing even the most mundane things, things like:
public Decimal DiscountPrice
{
    get
    {
        return this.Price - this.Discount;
    }
}
Would have a test like:
[TestMethod]
public void Test_DiscountPrice()
{
    Product p = new Product();
    p.Price = 100;
    p.Discount = 20;
    Assert.AreEqual(80, p.DiscountPrice);
}
While I am all for unit testing, I sometimes wonder if this form of test-first development is really beneficial. For example, in a real process, you have 3-4 layers above your code (Business Request, Requirements Document, Architecture Document), where the actual defined business rule (Discount Price is Price - Discount) could be misdefined.
If that's the situation, your unit test means nothing to you.
Additionally, your unit test is another point of failure:
[TestMethod]
public void Test_DiscountPrice()
{
    Product p = new Product();
    p.Price = 100;
    p.Discount = 20;
    Assert.AreEqual(90, p.DiscountPrice);
}
Now the test is flawed. Obviously in a simple test, it's no big deal, but say we were testing a complicated business rule. What do we gain here?
Fast forward two years into the application's life, when maintenance developers are maintaining it. Now the business changes its rule, and the test breaks; some rookie developer then fixes the test incorrectly... we now have another point of failure.
All I see is more possible points of failure, with no real beneficial return. If the discount price is wrong, the test team will still find the issue, so how did unit testing save any work?
What am I missing here? Please teach me to love TDD, as I'm having a hard time accepting it as useful so far. I want to, because I want to stay progressive, but it just doesn't make sense to me.
EDIT: A couple of people have mentioned that testing helps enforce the spec. It has been my experience that the spec has been wrong as well, more often than not, but maybe I'm doomed to work in an organization where the specs are written by people who shouldn't be writing specs.
First, testing is like security -- you can never be 100% sure you've got it, but each layer adds more confidence and a framework for more easily fixing the problems that remain.
Second, you can break tests into subroutines which themselves can then be tested. When you have 20 similar tests, making a (tested) subroutine means your main test is 20 simple invocations of the subroutine which is much more likely to be correct.
Third, some would argue that TDD addresses this concern. That is, if you just write 20 tests and they pass, you're not completely confident that they are actually testing anything. But if each test you wrote initially failed, and then you fixed it, then you're much more confident that it's really testing your code. IMHO this back-and-forth takes more time than it's worth, but it is a process that tries to address your concern.
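To illustrate the second point, here is a sketch of such a small, easy-to-inspect subroutine, reusing the Product example from the question:
// One small helper carries the shared arrange/assert logic...
private static void AssertDiscountPrice(decimal price, decimal discount, decimal expected)
{
    var p = new Product { Price = price, Discount = discount };
    Assert.AreEqual(expected, p.DiscountPrice);
}

// ...and each case becomes a one-line, hard-to-get-wrong invocation.
[TestMethod]
public void Test_DiscountPrice_Cases()
{
    AssertDiscountPrice(100m, 20m, expected: 80m);
    AssertDiscountPrice(100m, 0m, expected: 100m);
    AssertDiscountPrice(19.99m, 5m, expected: 14.99m);
}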
A test being wrong is unlikely to break your production code. At least, not any worse than having no test at all. So it's not a "point of failure": the tests don't have to be correct in order for the product to actually work. They might have to be correct before it's signed off as working, but the process of fixing any broken tests does not endanger your implementation code.
You can think of tests, even trivial tests like these, as being a second opinion what the code is supposed to do. One opinion is the test, the other is the implementation. If they don't agree, then you know you have a problem and you look closer.
It's also useful if someone in future wants to implement the same interface from scratch. They shouldn't have to read the first implementation in order to know what Discount means, and the tests act as an unambiguous back-up to any written description of the interface you may have.
That said, you're trading off time. If there are other tests you could be writing using the time you save skipping these trivial tests, maybe they would be more valuable. It depends on your test setup and the nature of the application, really. If the Discount is important to the app, then you're going to catch any bugs in this method in functional testing anyway. All unit testing does is let you catch them at the point you're testing this unit, when the location of the error will be immediately obvious, instead of waiting until the app is integrated together and the location of the error might be less obvious.
By the way, personally I wouldn't use 100 as the price in the test case (or rather, if I did then I'd add another test with another price). The reason is that someone in future might think that Discount is supposed to be a percentage. One purpose of trivial tests like this is to ensure that mistakes in reading the specification are corrected.
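For instance, a second case that distinguishes the two readings; with Price = 50 and Discount = 20, an absolute discount gives 30 while a percentage reading would give 40:
[TestMethod]
public void Test_DiscountPrice_TreatsDiscountAsAnAbsoluteAmount()
{
    var p = new Product { Price = 50m, Discount = 20m };

    // 50 - 20 = 30; a percentage interpretation would have produced 40.
    Assert.AreEqual(30m, p.DiscountPrice);
}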
[Concerning the edit: I think it's inevitable that an incorrect specification is a point of failure. If you don't know what the app is supposed to do, then chances are it won't do it. But writing tests to reflect the spec doesn't magnify this problem, it merely fails to solve it. So you aren't adding new points of failure, you're just representing the existing faults in code instead of waffle documentation.]
All I see is more possible points of failure, with no real beneficial return. If the discount price is wrong, the test team will still find the issue, so how did unit testing save any work?
Unit testing isn't really supposed to save work, it's supposed to help you find and prevent bugs. It's more work, but it's the right kind of work. It's thinking about your code at the lowest levels of granularity and writing test cases that prove that it works under expected conditions, for a given set of inputs. It's isolating variables so you can save time by looking in the right place when a bug does present itself. It's saving that suite of tests so that you can use them again and again when you have to make a change down the road.
I personally think that most methodologies are not many steps removed from cargo cult software engineering, TDD included, but you don't have to adhere to strict TDD to reap the benefits of unit testing. Keep the good parts and throw out the parts that yield little benefit.
Finally, the answer to your titular question "How do you unit test a unit test?" is that you shouldn't have to. Each unit test should be brain-dead simple. Call a method with a specific input and compare it to its expected output. If the specification for a method changes then you can expect that some of the unit tests for that method will need to change as well. That's one of the reasons that you do unit testing at such a low level of granularity, so only some of the unit tests have to change. If you find that tests for many different methods are changing for one change in a requirement, then you may not be testing at a fine enough level of granularity.
Unit tests are there so that your units (methods) do what you expect. Writing the test first forces you to think about what you expect before you write the code. Thinking before doing is always a good idea.
Unit tests should reflect the business rules. Granted, there can be errors in the code, but writing the test first allows you to write it from the perspective of the business rule before any code has been written. Writing the test afterwards, I think, is more likely to lead to the error you describe because you know how the code implements it and are tempted just to make sure that the implementation is correct -- not that the intent is correct.
Also, unit tests are only one form -- and the lowest, at that -- of tests that you should be writing. Integration tests and acceptance tests should also be written, the latter by the customer, if possible, to make sure that the system operates the way it is expected. If you find errors during this testing, go back and write unit tests (that fail) to test the change in functionality to make it work correctly, then change your code to make the test pass. Now you have regression tests that capture your bug fixes.
[EDIT]
Another thing that I have found with doing TDD. It almost forces good design by default. This is because highly coupled designs are nearly impossible to unit test in isolation. It doesn't take very long using TDD to figure out that using interfaces, inversion of control, and dependency injection -- all patterns that will improve your design and reduce coupling -- are really important for testable code.
How does one test a test? Mutation testing is a valuable technique that I have personally used to surprisingly good effect. Read the linked article for more details, and links to even more academic references, but in general it "tests your tests" by modifying your source code (changing "x += 1" to "x -= 1" for example) and then rerunning your tests, ensuring that at least one test fails. Any mutations that don't cause test failures are flagged for later investigation.
You'd be surprised at how you can have 100% line and branch coverage with a set of tests that look comprehensive, and yet you can fundamentally change or even comment out a line in your source without any of the tests complaining. Often this comes down to not testing with the right inputs to cover all boundary cases, sometimes it's more subtle, but in all cases I was impressed with how much came out of it.
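A minimal illustration of the idea, using the Product example from the question (a mutation-testing tool would make a change like the one described in the comment automatically and rerun the suite):
public class Product
{
    public decimal Price { get; set; }
    public decimal Discount { get; set; }

    // A mutation-testing tool would flip the "-" below to "+" (or remove the
    // line entirely) and rerun the tests; if nothing fails, no test is really
    // exercising this rule.
    public decimal DiscountPrice
    {
        get { return this.Price - this.Discount; }
    }
}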
When applying Test-Driven Development (TDD), one begins with a failing test. This step, which might seem unnecessary, is actually there to verify that the unit test is testing something. Indeed, if the test never fails, it brings no value and, worse, leads to false confidence, as you'll rely on a positive result that is not proving anything.
When following this process strictly, all "units" are protected by the safety net the unit tests provide, even the most mundane.
Assert.AreEqual(90, p.DiscountPrice);
There is no reason the test evolves in that direction - or I'm missing something in your reasoning. When the price is 100 and the discount 20, the discount price is 80. This is like an invariant.
Now imagine your software needs to support another kind of discount based on percentage, perhaps depending on the volume bought, your Product::DiscountPrice() method may become more complicated. And it is possible that introducing those changes breaks the simple discount rule we had initially. Then you'll see the value of this test which will detect the regression immediately.
Red - Green - Refactor - this is to remember the essence of the TDD process.
Red refers to the JUnit red bar when a test fails.
Green is the color of the JUnit progress bar when all tests pass.
Refactor under the green condition: remove any duplication, improve readability.
Now to address your point about the "3-4 layers above the code": this is true in a traditional (waterfall-like) process, not when the development process is agile. And agile is the world TDD comes from; TDD is the cornerstone of eXtreme Programming.
Agile is about direct communication rather than thrown-over-the-wall requirement documents.
While I am all for unit testing, I sometimes wonder if this form of test-first development is really beneficial...
Small, trivial tests like this can be the "canary in the coalmine" for your codebase, alerting of danger before it's too late. The trivial tests are useful to keep around because they help you get the interactions right.
For example, think about a trivial test put in place to probe how to use an API you're unfamiliar with. If that test has any relevance to what you're doing in the code that uses the API "for real", it's useful to keep that test around. When the API releases a new version and you need to upgrade, you now have your assumptions about how you expect the API to behave recorded in an executable format that you can use to catch regressions.
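A concrete (if tiny) example of such a "learning test", recording an assumption about a framework call the code relies on:
// requires: using System.Globalization;
[TestMethod]
public void Learning_InvariantCultureParsesDotAsDecimalSeparator()
{
    // Recorded assumption about the framework; if a future upgrade changes
    // this behaviour, this test flags it immediately.
    decimal value = decimal.Parse("19.99", CultureInfo.InvariantCulture);

    Assert.AreEqual(19.99m, value);
}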
...[I]n a real process, you have 3-4 layers above your code (Business Request, Requirements Document, Architecture Document), where the actual defined business rule (Discount Price is Price - Discount) could be misdefined. If that's the situation, your unit test means nothing to you.
If you've been coding for years without writing tests it may not be immediately obvious to you that there is any value. But if you are of the mindset that the best way to work is "release early, release often" or "agile" in that you want the ability to deploy rapidly/continuously, then your test definitely means something. The only way to do this is by legitimizing every change you make to the code with a test. No matter how small the test, once you have a green test suite you're theoretically OK to deploy. See also "continuous production" and "perpetual beta."
You don't have to be "test first" to be of this mindset, either, but that generally is the most efficient way to get there. When you do TDD, you lock yourself into a small two-to-three-minute Red-Green-Refactor cycle. At no point are you left with a complete mess on your hands that will take an hour to debug and put back together.
Additionally, your unit test is another point of failure...
A successful test is one that demonstrates a failure in the system. A failing test will alert you to an error in the logic of the test or in the logic of your system. The goal of your tests is to break your code or prove one scenario works.
If you're writing tests after the code, you run the risk of writing a test that is "bad" because in order to see that your test truly works, you need to see it both broken and working. When you're writing tests after the code, this means you have to "spring the trap" and introduce a bug into the code to see the test fail. Most developers are not only uneasy about this, but would argue it is a waste of time.
What do we gain here?
There is definitely a benefit to doing things this way. Michael Feathers defines "legacy code" as "untested code." When you take this approach, you legitimize every change you make to your codebase. It's more rigorous than not using tests, but when it comes to maintaining a large codebase, it pays for itself.
Speaking of Feathers, there are two great resources you should check out in regard to this:
Working Effectively with Legacy Code
Brownfield Application Development in .NET
Both of these explain how to work these types of practices and disciplines into projects that aren't "Greenfield." They provide techniques for writing tests around tightly coupled components, hard wired dependencies, and things that you don't necessarily have control over. It's all about finding "seams" and testing around those.
[I]f the discount price is wrong, the test team will still find the issue, how did unit testing save any work?
Habits like these are like an investment. Returns aren't immediate; they build up over time. The alternative, not testing, is essentially taking on the debt of not being able to catch regressions, introduce code without fear of integration errors, or drive design decisions. The beauty is you legitimize every change introduced into your codebase.
What am I missing here? Please teach me to love TDD, as I'm having a hard time accepting it as useful so far. I want to, because I want to stay progressive, but it just doesn't make sense to me.
I look at it as a professional responsibility. It's an ideal to strive toward. But it is very hard to follow and tedious. If you care about it, and feel you shouldn't produce code that is not tested, you'll be able to find the will power to learn good testing habits. One thing that I do a lot now (as do others) is timebox myself an hour to write code without any tests at all, then have the discipline to throw it away. This may seem wasteful, but it's not really. It's not like that exercise cost a company physical materials. It helped me to understand the problem and how to write code in such a way that it is both of higher quality and testable.
My advice would ultimately be that if you really don't have a desire to be good at it, then don't do it at all. Poor tests that aren't maintained, don't perform well, etc. can be worse than not having any tests. It's hard to learn on your own, and you probably won't love it, but it is going to be next to impossible to learn if you don't have a desire to do it, or can't see enough value in it to warrant the time investment.
A couple of people have mentioned that testing helps enforce the spec. It has been my experience that the spec has been wrong as well, more often than not...
A developer's keyboard is where the rubber meets the road. If the spec is wrong and you don't raise the flag on it, then it's highly probable you'll get blamed for it. Or at least your code will. The discipline and rigor involved in testing is difficult to adhere to. It's not at all easy. It takes practice, a lot of learning and a lot of mistakes. But eventually it does pay off. On a fast-paced, quickly changing project, it's the only way you can sleep at night, no matter if it slows you down.
Another thing to think about here is that techniques that are fundamentally the same as testing have been proven to work in the past: "clean room" and "design by contract" both tend to produce the same types of "meta"-code constructs that tests do, and enforce those at different points. None of these techniques are silver bullets, and rigor is going to cost you ultimately in the scope of features you can deliver in terms of time to market. But that's not what it's about. It's about being able to maintain what you do deliver. And that's very important for most projects.
Unit testing works very much like double-entry bookkeeping. You state the same thing (business rule) in two quite different ways (as programmed rules in your production code, and as simple, representative examples in your tests). It's very unlikely that you make the same mistake in both, so if they both agree with each other, it's rather unlikely that you got it wrong.
How is testing going to be worth the effort? In my experience in at least four ways, at least when doing test driven development:
it helps you come up with a well decoupled design. You can only unit test code that is well decoupled;
it helps you determine when you are done. Having to specify the needed behavior in tests helps to not build functionality that you don't actually need, and determine when the functionality is complete;
it gives you a safety net for refactorings, which makes the code much more amenable to changes; and
it saves you a lot of debugging time, which is horribly costly (I've heard estimates that traditionally, developers spend up to 80% of their time debugging).
Most unit tests test assumptions. In this case, the discount price should be the price minus the discount. If your assumptions are wrong, I bet your code is also wrong. And if you make a silly mistake, the test will fail and you will correct it.
If the rules change, the test will fail and that is a good thing. So you have to change the test too in this case.
As a general rule, if a test fails right away (and you don't use test-first design), either the test or the code is wrong (or both, if you are having a bad day). You use common sense (and possibly the specs) to correct the offending code and rerun the test.
Like Jason said, testing is security. And yes, sometimes they introduce extra work because of faulty tests. But most of the time they are huge time savers. (And you have the perfect opportunity to punish the guy who breaks the test (we are talking rubber chicken)).
Test everything you can. Even trivial mistakes, like forgetting to convert meters to feet can have very expensive side effects. Write a test, write the code for it to check, get it to pass, move on. Who knows at some point in the future, someone may change the discount code. A test can detect the problem.
I see unit tests and production code as having a symbiotic relationship. Simply put: one tests the other. And both test the developer.
Remember that the cost of fixing defects increases (exponentially) as the defects live through the development cycle. Yes, the testing team might catch the defect, but it will (usually) take more work to isolate and fix the defect from that point than if a unit test had failed, and it will be easier to introduce other defects while fixing it if you don't have unit tests to run.
That's usually easier to see with something more than a trivial example ... and with trivial examples, well, if you somehow mess up the unit test, the person reviewing it will catch the error in the test or the error in the code, or both. (They are being reviewed, right?) As tvanfosson points out, unit testing is just one part of an SQA plan.
In a sense, unit tests are insurance. They're no guarantee that you'll catch every defect, and it may seem at times like you're spending a lot of resources on them, but when they do catch defects that you can fix, you'll be spending a lot less than if you'd had no tests at all and had to fix all defects downstream.
I see your point, but it's clearly overstated.
Your argument is basically: Tests introduce failure. Therefore tests are bad/waste of time.
While that may be true in some cases, it's hardly the majority.
TDD assumes: More Tests = Less Failure.
Tests are more likely to catch points of failure than introduce them.
Even more automation can help here!
Yes, writing unit tests can be a lot of work, so use some tools to help you out.
Have a look at something like Pex, from Microsoft, if you're using .Net
It will automatically create suites of unit tests for you by examining your code. It will come up with tests which give good coverage, trying to cover all paths through your code.
Of course, just by looking at your code it can't know what you were actually trying to do, so it doesn't know if it's correct or not. But it will generate interesting test cases for you, and you can then examine them and see if the code is behaving as you expect.
If you then go further and write parameterized unit tests (you can think of these as contracts, really), it will generate specific test cases from these, and this time it can know if something's wrong, because your assertions in your tests will fail.
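This is not Pex itself, but a hand-written parameterized test in MSTest v2 DataRow style shows the general shape: the rule is stated once and fed several concrete cases.
[DataTestMethod]
[DataRow(100, 20, 80)]
[DataRow(50, 0, 50)]
[DataRow(250, 75, 175)]
public void DiscountPrice_IsPriceMinusDiscount(int price, int discount, int expected)
{
    var p = new Product { Price = price, Discount = discount };

    Assert.AreEqual((decimal)expected, p.DiscountPrice);
}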
I've thought a bit about a good way to respond to this question, and would like to draw a parallel to the scientific method. IMO, you could rephrase this question, "How do you experiment an experiment?"
Experiments verify empirical assumptions (hypotheses) about the physical universe. Unit tests will test assumptions about the state or behavior of the code they call. We can talk about the validity of an experiment, but that's because we know, through numerous other experiments, that something doesn't fit. It doesn't have both convergent validity and empirical evidence. We don't design a new experiment to test or verify the validity of an experiment, but we may design a completely new experiment.
So like experiments, we don't describe the validity of a unit test based on whether or not it passes a unit test itself. Along with other unit tests, it describes the assumptions we make about the system it is testing. Also, like experiments, we try to remove as much complexity as we can from what we are testing. "As simple as possible, but no simpler."
Unlike experiments, we have a trick up our sleeve to verify our tests are valid other than just convergent validity. We can cleverly introduce a bug we know should be caught by the test, and see if the test does indeed fail. (If only we could do that in the real world, we'd depend much less on this convergent validity thing!) A more efficient way to do this is watch your test fail before implementing it (the red step in Red, Green, Refactor).
You need to use the correct paradigm when writing tests.
Start by first writing your tests.
Make sure they fail to start off with.
Get them to pass.
Code review before you check in your code (make sure the tests are reviewed).
You can't always be sure, but these steps improve your tests overall.
Even if you do not test your code, it will surely be tested in production by your users. Users are very creative in trying to crash your software and finding even non-critical errors.
Fixing bugs in production is much more costly than resolving issues in the development phase.
As a side effect, you will lose income because of an exodus of customers. You can count on 11 lost or never-gained customers for every 1 angry customer.
