TDD and Test Data

I'm new to TDD and was wondering where to begin. I've read about as much as I think I can without throwing up and am still confused about what to test and what not to test. For example, I know from reading that we should not run tests against databases; thus enters the mocking framework. We mock our repositories so they will return fake data. My question is: do we test requirements for data constants? For example, a requirement may state that a person should always have a name, thus:
Assert.IsNotNull(personObject.Name);
Should always be true, but how do I test it without having "fake" data? Do I care to test that type of requirement?

Let's take your requirement "a person should always have a name". Where could we start?
First, we need some clarity. I believe that when you say "should always have a name", you mean "should never have a null or empty string" as a name.
What we probably really mean here is "when a person is stored in our database, its name cannot be null or empty". The first line of attack would be to enforce that with a constraint in your database; however, nothing protects you against a rogue DBA removing that constraint, so you may want a failing test to flag it when what you think is true about the system changes. For this purpose you would write a test like "when my app sends a Person with a null name to be saved to the DB, it should fail miserably". That's not a unit test, it is more of an integration test - and it is more complicated to write than a good old unit test.
However, this still doesn't cover the scenario of a rogue DBA removing the constraint and directly creating records which have null names. In other words, your app cannot trust the data it gets back to be correct, so the question becomes: how do you want your domain to deal with the possibility of persons with null names?
A very literal approach could be to enforce that Person can only be constructed with non-null Name, and throws otherwise. That would be easy to unit test and enforce, but will probably make development painful. A more pleasant approach would be to have no constraints on Person, but a Validator class, which can Validate a Person for broken rules. This is more palatable because you could now do anything you wish to a Person (figuratively) and then Validate whether that Person is still in a Valid state.
This has the benefit of
1) being very easily testable: creating a Unit Test for such a validator is a piece of cake,
2) addressing the rogue DBA issue: you can now Validate everything that comes from or goes to the outside of the app, by applying the Validator.
So, being the lazy developer that I am, this is where I would start: go with the Validator, because it addresses my problem, while being much easier to test than something involving the data. In other words, in general I tend to stick as much as possible with Unit Tests (i.e. completely in memory), and to have as much as I can of my business logic in the code / domain, because it's easier to have all in one place, and easier to test.
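A minimal sketch of that Validator idea, in Python for illustration (the Person and PersonValidator names are made up, not from the original answer):

```python
class Person:
    """Deliberately unconstrained: any state is constructible."""

    def __init__(self, name=None):
        self.name = name


class PersonValidator:
    """Collects broken rules instead of throwing on construction."""

    def validate(self, person):
        errors = []
        if not person.name:  # rejects both None and the empty string
            errors.append("name must not be null or empty")
        return errors


# The unit test really is a piece of cake -- no database, all in memory:
assert PersonValidator().validate(Person("Alice")) == []
assert PersonValidator().validate(Person(None)) == ["name must not be null or empty"]
```

Because the validator is a plain in-memory object, the same checks can be applied to anything entering or leaving the app, which is exactly the rogue-DBA defence described above.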

You can use acceptance or integration testing to cover your database and constraints.
For instance, on one project, we had a separate "integration" package with tests which just checked our NHibernate bindings. If you're not using NHibernate, you could just do this with your repositories, keeping the database connections in place. You can verify constraints, referential integrity, etc. from this level.
If you're still looking for guidance on other aspects of TDD, try Dan North's post on BDD, which starts:
"I had a problem... programmers wanted to know where to start, what to test and what not to test, how much to test in one go, what to call their tests, and how to understand why a test fails."


PHPUnit test dependencies

A similar question has been asked before but I don't quite understand the answer. My specific case is that I have a unit test which tests the registration of a user via a REST API endpoint. User registration however depends on a few records which must exist in the database, otherwise it will fail. Inserting these records into the database is most definitely a test case by itself too. So my question is, should I execute my tests in a specific order in order for the records to exist, or should I explicitly insert the records again in every testcase that depends on it?
It might be somewhat irrelevant but I'm using Laravel 5, so testing is done in PHPUnit.
should I execute my tests in a specific order in order for the records to exist, or should I explicitly insert the records again in every testcase that depends on it?
I think the correct answer here is that you should not do either (but please read on, it might still be ok to do the latter, though not perfect).
You say registering the user is a test case in itself. Very well then: write that test, and let's assume you have it in what follows.
Creating tests so that they run in order
Let's deal with the first option: creating those rows once and then running multiple tests against them.
I think this is a very flawed approach no matter the circumstances. All of a sudden all tests depend on one another.
Say you run tests A, B and C against those rows. Maybe it's even the case that right now none of them alters the rows. But there is no way you can be sure that no bug is ever introduced into B that alters the data (it needn't even be a bug; it could just be that the underlying functionality changed).
Now you're in a situation where test C might pass, but only if B did not run before. This is an entirely unacceptable situation, especially when the reverse is true, C only passing if B ran.
This could show up in, say, a fresh installation of your app throwing errors in real life, while your development setup containing a bunch of data works, and so do the tests, because B created a certain state in your database (one that maybe also exists randomly in your dev database).
Then you give it out to some poor customer and all of a sudden "option X" is not set, or the initial admin user does not exist or whatever :)
=> bad plan
Running the Setup for Every Test that depends on it
This is a significantly better plan. Now you at least have full control of your database state in every test, and they all run independently of one another.
The order of them running will not affect outcome
=> good
Also, this is a relatively standard thing to do for a subset of tests. Just subclass your main UnittestCase class and make all tests that depend on that setup subclasses of it, like so:
abstract class NeedsDbSetupTestCase extends MyAppMainTestCase {

    function setUp() {
        parent::setUp();
        $this->setupDb();
    }

    private function setupDb() {
        // add your rows and tables and such
    }
}
=> acceptable idea
The Optimal Approach
The above still comes with some drawbacks. For one, it isn't really a unit test anymore once it depends on very specific database interactions, which makes it less valuable in exactly pinpointing an issue. Admittedly though, this is in many cases more a theoretical than a practical issue :)
What will much more likely become a practical issue is performance. You are adding a bunch of database writes that might need to be run hundreds of times once your test suite grows. At the beginning of your project this might mean that it takes 4s to run instead of 2s :P ... once the project grows, you might find yourself losing a lot of time because of this.
One last issue you might face is that your test suite becomes dependent on the database it's run against. Maybe it passes running against MySQL 5.5 and fails against 5.6 (academic example, I guess :P) => you might see all kinds of strange behavior, with tests passing locally but failing in CI and whatnot (somewhat likely, depending on your setup).
Since you are interested in this in a more generic sense, let me outline the proper way of handling it generically too :)
What it will always come down to is that a situation like this causes you trouble:
class User {

    private $id;

    public function get_data() {
        return make_a_sql_call_and_return_row_as_array(
            "SELECT property_a, property_b FROM users WHERE id = " . $this->id
        );
    }
}
Now some other method is to be tested that actually uses the return of get_data() and you need that data in the db :) ... or you just mock your User object!
Assuming you have some method in another class that uses that User object.
And your test looks a little something like this:
// run this in the context of the class that sets up the DB for you
$user = new User($user_id);
$this->assertTrue(some_method_or_function($user));
All you need here from $user is that it return, say, the array [1, 2]. Instead of inserting this into the DB and then using an instance of User, just create the mock:
// this one doesn't do anything yet; it returns null on every method.
$user = $this->getMockBuilder('User')
             ->disableOriginalConstructor()
             ->getMock();

// now just make it return what you want it to return
$user->method('get_data')->willReturn(array(1, 2));

// and run your test lightning fast, without ever touching the database
// but getting the same result :)
$this->assertTrue(some_method_or_function($user));
Another hidden (but valuable) benefit of this approach is that setting up the mocks actually forces you to think about the details that go into every class's behavior, giving you a significantly more detailed understanding of your app in the end.
Obviously the downside is that it (not always but often) requires a lot more work to code your tests this way and the benefit might not be worth the trouble.
Especially when working with frameworks like WordPress that your code depends on, it might be somewhat unfeasible to really mock all DB interaction, while existing libraries provide slower but trivial-to-implement database testing capabilities for your code :)
But in general option 3 is the way to go, option one is just wrong and option two might be what everyone eventually does in real life :D

Should I protect my database from invalid data?

I always tend to "protect" my persistence layer from violations via the service layer. However, I am beginning to wonder if it's really necessary. What's the point in taking the time to make my database robust, building relationships & data integrity, when it never actually comes into play?
For example, consider a User table with a unique constraint on the Email field. I would naturally want to write blocker code in my service layer to ensure the email being added isn't already in the database before attempting to add anything. In the past I have never really seen anything wrong with it; however, as I have been exposed to more & more best practices/design principles, I feel that this approach isn't very DRY.
So, is it correct to always ensure data going to the persistence layer is indeed "valid", or is it more natural to let the invalid data get to the database and handle the error?
Please don't do that.
Implementing even "simple" constraints such as keys is decidedly non-trivial in a concurrent environment. For example, it is not enough to query the database in one step and allow the insertion in another only if the first step returned empty result - what if a concurrent transaction inserted the same value you are trying to insert (and committed) in between your steps one and two? You have a race condition that could lead to duplicated data. Probably the simplest solution for this is to have a global lock to serialize transactions, but then scalability goes out of the window...
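The check-then-insert race described above can be made concrete with a small sketch (Python with an in-memory SQLite database for illustration; the schema is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT UNIQUE)")

email = "a@example.com"

# Step 1 of the naive approach: check that the email is free.
assert db.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone() is None

# ...meanwhile, a concurrent transaction inserts and commits the same value:
db.execute("INSERT INTO users (email) VALUES (?)", (email,))

# Step 2: our own insert now runs. The check in step 1 said "free", so only
# the declarative UNIQUE constraint stands between us and duplicated data:
try:
    db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    raise AssertionError("duplicate slipped through")
except sqlite3.IntegrityError:
    pass  # the database did the heavy lifting
```

The application-level check alone would have happily produced a duplicate; the constraint is what actually holds the line.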
Similar considerations exist for other combinations of INSERT / UPDATE / DELETE operations on keys, as well as other kinds of constraints such as foreign keys and even CHECKs in some cases.
DBMSes have devised very clever ways over the decades to be both correct and performant in situations like these, yet allow you to easily define constraints in declarative manner, minimizing the chance for mistakes. And all the applications accessing the same database will automatically benefit from these centralized constraints.
If you absolutely must choose which layer of code shouldn't validate the data, the database should be your last choice.
So, is it correct to always ensure data going to the persistence layer is indeed "valid" (service layer), or is it more natural to let the invalid data get to the database and handle the error?
Never assume correct data and always validate at the database level, as much as you can.
Whether to also validate in upper layers of code depends on a situation, but in the case of key violations, I'd let the database do the heavy lifting.
Even though there isn't a conclusive answer, I think it's a great question.
First, I am a big proponent of including at least basic validation in the database and letting the database do what it is good at. At minimum, this means foreign keys, NOT NULL where appropriate, strongly typed fields wherever possible (e.g. don't put a text field where an integer belongs), unique constraints, etc. Letting the database handle concurrency is also paramount (as @Branko Dimitrijevic pointed out), and transaction atomicity should be owned by the database.
If this is moderately redundant, then so be it. Better too much validation than too little.
However, I am of the opinion that the business tier should be aware of the validation it is performing even if the logic lives in the database.
It may be easier to distinguish between exceptions and validation errors. In most languages, a failed data operation will probably manifest as some sort of exception. Most people (me included) are of the opinion that it is bad to use exceptions for regular program flow, and I would argue that email validation failure (for example) is not an "exceptional" case.
Taking it to a more ridiculous level, imagine hitting the database just to determine if a user had filled out all required fields on a form.
In other words, I'd rather call a method IsEmailValid() and receive a boolean than try to have to determine if the database error which was thrown meant that the email was already in use by someone else.
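One way to get that boolean-style API while still letting the database constraint be the final guard might look like this (an illustrative Python sketch with SQLite; UserService and its method names are made up):

```python
import sqlite3


class UserService:
    """Friendly pre-check for ordinary program flow; the database's UNIQUE
    constraint stays in place as the authoritative, race-proof guard."""

    def __init__(self, db):
        self.db = db

    def is_email_available(self, email):
        row = self.db.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
        return row is None

    def register(self, email):
        if not self.is_email_available(email):
            return False  # a boolean, not an exception, for a normal case
        try:
            self.db.execute("INSERT INTO users (email) VALUES (?)", (email,))
            return True
        except sqlite3.IntegrityError:
            return False  # the constraint caught a race the check missed


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT UNIQUE)")
svc = UserService(db)
assert svc.register("a@example.com") is True   # first registration succeeds
assert svc.register("a@example.com") is False  # duplicate reported as a boolean
```

The calling code never has to decode a database error; it just gets a yes/no answer, while the unique constraint still provides the final protection mentioned below.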
This approach may also perform better, and avoid annoyances like skipped IDs because an INSERT failed (speaking from a SQL Server perspective).
The logic for validating the email might very well live in a reusable stored procedure if it is more complicated than simply a unique constraint.
And ultimately, that simple unique constraint provides final protection in case the business tier makes a mistake.
Some validation simply doesn't need to make a database call to succeed, even though the database could easily handle it.
Some validation is more complicated than can be expressed using database constructs/functions alone.
Business rules across applications may differ even against the same (completely valid) data.
Some validation is so critical or expensive that it should happen prior to data access.
Some simple constraints like field type/length can be automated (anything run through an ORM probably has some level of automation available).
Two reasons to do it. First, the DB may be accessed from another application.
Second, you might make a wee error in your code and put bad data in the DB, which, because your service layer operates on the assumption that this could never happen, makes it fall over if you are lucky - silent data corruption being the worst case.
I've always looked at rules in the DB as backstop for that exceptionally rare occasion when I make a mistake in the code. :)
The thing to remember is: if you need to, you can always relax a constraint; tightening one after your users have spent a lot of effort entering data will be far more problematic.
Be really wary of that word "never" - in IT, it means much sooner than you wished.

page object model: why not include assertions in page methods?

First-time poster. I've been working in UI automation for many years, but was only recently introduced to/instructed to work with the Page Object Model. Most of it is common sense and includes techniques I've been using already, but there's a particular fine point which I haven't been able to justify in my own mind, despite searching extensively for a well-reasoned explanation. I'm hoping someone here might enlighten me, as this question has caused some consternation as I try to integrate the POM with my own best practices.
From http://code.google.com/p/selenium/wiki/PageObjects:
The code presented above shows an important point: the tests, not the PageObjects, should be responsible for making assertions about the state of a page.... Of course, as with every guideline there are exceptions...
From http://seleniumhq.org/docs/06_test_design_considerations.html#chapter06-reference:
There is a lot of flexibility in how the page objects may be designed, but there are a few basic rules for getting the desired maintainability of your test code. Page objects themselves should never make verifications or assertions. This is part of your test and should always be within the test's code, never in a page object. The page object will contain the representation of the page, and the services the page provides via methods, but no code related to what is being tested should be within the page object.
There is one single verification which can, and should, be within the page object, and that is to verify that the page, and possibly critical elements on the page, were loaded correctly. This verification should be done while instantiating the page object.
Both of these "guidelines" allow for potential exceptions, but I couldn't disagree more with the basic premise. I'm accustomed to doing a considerable amount of verification within "page methods", and I think the presence of verification there is a powerful technique for finding issues in a variety of contexts (i.e., verification occurs every time the method is called) rather than only occurring in the limited context of particular tests.
For example, let's imagine that when you login to your AUT, some text appears that says "logged in as USER". It's appropriate to have a single test validate this specifically, but why wouldn't you want to verify it every time login is called? This artifact is not directly related to whether the page "loaded correctly" or not, and it's not related to "what is being tested" in general, so according to the POM guidelines above, it clearly SHOULDN'T be in a page method... but it seems to me that it clearly SHOULD be there, to maximize the power of automation by verifying important artifacts as often as possible, with as little forethought as possible. Putting verification code in page methods multiplies the power of automation by allowing you to get a lot of verification "for free", without having to worry about it in your tests, and such frequent verification in different contexts often finds issues which you would NOT find if the verification were limited to, say, a single test for that artifact.
In other words, I tend to distinguish between test-specific verification and "general" verification, and I think it's perfectly appropriate/desirable for the latter to be included - extensively - in page methods. This promotes thinner tests and thicker page objects, which generally increases test maintainability by reusing more code - despite the opposite contention in these guidelines. Am I missing the point? What's the real rationale for NOT wanting verification in page methods? Is the situation I've described actually one of the 'exceptions' described in these guidelines, and therefore actually NOT inconsistent with the POM? Thanks in advance for your thoughts. -jn-
As a guideline, assertions should be done in tests and not in page objects. Of course, there are times when this isn't a pragmatic approach, but those times are infrequent enough for the above guideline to be right. Here are the reasons why I dislike having assertions in page objects:
It is quite frustrating to read a test that just calls verify methods where assertions are buried elsewhere in page objects. Where possible, it should be obvious what a test is asserting; this is best achieved when assertions are directly in a test. By hiding the assertions somewhere outside of a test, the intent of the test is not so clear.
Assertions in browser tests can be expensive - they can really slow your tests down. When you have hundreds or thousands of tests, minutes/hours can be added to your test execution time; this is A Bad Thing. If you move the assertions to just the tests that care about those particular assertions you'll find that you'll have much quicker tests and you will still catch the relevant defects. The question included the following:
Putting verification code in page methods multiplies the power of automation by allowing you to get a lot of verification "for free"
Well, "Freedom Isn't Free" :) What you're actually multiplying is your test execution time.
Having assertions all over the place violates another good guideline; "One Assertion Per Test" ( http://blog.jayfields.com/2007/06/testing-one-assertion-per-test.html ). I don't stick religiously to it, but I try to follow the principle. Where possible, a test should be interested in one thing only.
The value of tests is reduced because one bug will cause loads of tests to fail thus preventing them from testing what they should be testing.
For example, let's imagine that when you login to your AUT, some text appears that says "logged in as USER". It's appropriate to have a single test validate this specifically, but why wouldn't you want to verify it every time login is called?
If you have the assertion in the page object class and the expected text changes, all tests that log in will fail. If instead the assertion is in the test then only one test will fail - the one that specifically tests for the correct message - leaving all the other tests to continue running to find other bugs. You don't need 5,000 tests to tell you that the login message is wrong; 1 test will do ;)
Having a class do more than one thing violates 'S' in SOLID, ie: 'Single Responsibility Principle' (SRP). A class should be responsible for one thing, and one thing only. In this instance a page-object class should be responsible for modelling a page (or section thereof) and nothing more. If it does any more than that (eg: including assertions) then you're violating SRP.
I too have struggled at times with this recommendation. I believe the reason behind this guideline is to keep your page objects reusable, and putting asserts inside your page objects could possibly limit their ability to be reused by a large number of unrelated tests. That said, I have put certain verification methods on my page objects like testing the caption for a header - in my experience, that is a better way to encapsulate test logic for elements of a page that don't change.
Another note - I have seen MVC applications that have domain models reused as page objects. When done correctly, this can significantly reduce redundant code in your testing library. With this pattern, the view models have no reference to a testing framework, so obviously, you could not put any asserts in them.
Your page object shouldn't perform an assertion because then the page object has to know about your test framework (unless you're using built-in language assertions). But your page does need to know its state to locate elements and perform actions.
The key is in the statement "Of course, as with every guideline there are exceptions..."
Your page should throw exceptions, not perform assertions. That way your test can catch the exception and bail or act accordingly. For instance:
page = ProfilePage.open
try
    page.ChangePassword(old, new)
catch notLoggedIn
    page.Login(user, pass)
assert page.contains "your password has been updated"
In this limited example you'd have to check again (and again) so it might not be the best way, but you get the idea. You could also just check state (twice)
if page.hasLoginDialog
    page.Login
if page.hasLoginDialog //(again!)
    assert.fail("can't login")
You could also just check that you have a profile page
try
    page = site.OpenProfilePage
catch notOnProfilePage
or has the elements you need
try
    profilepage.changePassword(old, new)
catch elementNotFound
or without throwing an exception
page = site.OpenProfilePage
if ! page instanceof ProfilePage
or with complex checking
assert page.looksLikeAProfilePage
It's not how you do it that matters. You want to keep logic in your tests to a minimum, but you don't want your page objects to be tied to your test framework -- after all, you might use the same objects for scraping or data generation -- or with a different test framework that has its own assertions.
If you feel a need you can push your assertions out of your test case into test helper methods.
page = site.GoToProfilePage
validate.looksLikeProfilePage(page)
which is a great opportunity for a mixin if your language supports them, so you can have your clean page objects -- and mix in your sanity checks.
This perplexes me when I see that the same assertion could be used across multiple test methods. For example, writing an assertion-specific method:
public PaymentPage verifyOrderAmount(BigDecimal orderAmount) {
    Assertion.assertEquals(lblOrderAmount.getText(), orderAmount,
        "Order Amount is wrong on Payment details page");
    return this;
}
Now I can reuse it in all tests where I need it, instead of repeating the same assertion statement in multiple tests dealing with multiple scenarios. Needless to say, I can chain multiple assertions in a method depending on the test, e.g.:
.verifyOrderAmount(itemPrice)
.verifyBankAmount(discountedItemPrice)
.verifyCouponCode(flatDiscountCode.getCouponCode())
If a page object is supposed to represent the services offered by a page, isn't an assertion point also a service provided by the page?
@Matt: reusing domain models in page objects might save you time, but isn't that a test smell? Test logic should be kept well clear of the domain model (depending on what you are trying to achieve).
Back to the original question: if you really must do assertions in the page object, why not use Selenium's LoadableComponent, where you can use the isLoaded() method or include your custom assertion in your LoadableComponent subclass? This will keep your page object free of assertions, but you still get to do assertions in the loadable component. See the link below:
https://github.com/SeleniumHQ/selenium/wiki/LoadableComponent
I couldn't agree more with the author.
Adding assertions in test methods helps you to 'fail early'. By assertions, I mean checking if a certain page is loaded after clicking button, etc. (the so called general assertions).
I really don't believe it increases the execution time that much. UI automation is slow by default; adding a few millisecond-level checks does not make much of a difference, but it can ease your troubleshooting, report a failure early, and make your code more reusable.
However, it also depends on the type of UI tests. For instance, if you are implementing end-to-end tests with mostly positive paths, it makes sense to check inside the test method that clicking a button actually results in opening a page. However, if you are writing a bunch of negative scenarios, that's not always the case.

Test Driven Development initial implementation

A common practice of TDD is that you make tiny steps. But one thing which is bugging me is something I've seen a few people do, whereby they just hardcode values/options, and then refactor later to make it work properly. For example…
describe Calculator
    it should multiply
        assert Calculator.multiply(4, 2) == 8
Then you do the least possible to make it pass:
class Calculator
    def self.multiply(a, b)
        return 8
And it does!
Why do people do this? Is it to ensure they're actually implementing the method in the right class or something? Cause it just seems like a sure-fire way to introduce bugs and give false-confidence if you forget something. Is it a good practice?
This practice is known as "Fake it 'til you make it." In other words, put fake implementations in until such time as it becomes simpler to put in a real implementation. You ask why we do this.
I do this for a number of reasons. One is simply to ensure that my test is being run. It's possible to be configured wrong so that when I hit my magic "run tests" key I'm actually not running the tests I think I'm running. If I press the button and it's red, then put in the fake implementation and it's green, I know I'm really running my tests.
Another reason for this practice is to keep a quick red/green/refactor rhythm going. That is the heartbeat that drives TDD, and it's important that it have a quick cycle. Important so you feel the progress, important so you know where you're at. Some problems (not this one, obviously) can't be solved in a quick heartbeat, but we must advance on them in a heartbeat. Fake it 'til you make it is a way to ensure that timely progress. See also flow.
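In miniature, that rhythm might look like this (an illustrative Python sketch; the second test case "triangulates" and forces the fake into a real implementation):

```python
# Step 1 (red -> green): the fake implementation passes the only test so far.
class Calculator:
    @staticmethod
    def multiply(a, b):
        return 8  # fake it 'til you make it

assert Calculator.multiply(4, 2) == 8

# Step 2: a second test case makes the fake untenable, so the quickest way
# back to green is now the real implementation:
class Calculator:
    @staticmethod
    def multiply(a, b):
        return a * b

assert Calculator.multiply(4, 2) == 8
assert Calculator.multiply(3, 3) == 9
```

Each step keeps the bar green within seconds, which is the whole point of the heartbeat.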
There is a school of thought, which can be useful in training programmers to use TDD, that says you should not have any lines of source code that were not originally part of a unit test. By first coding the algorithm that passes the test into the test, you verify that your core logic works. Then, you refactor it out into something your production code can use, and write integration tests to define the interaction and thus the object structure containing this logic.
Also, religious TDD adherence would tell you that there should be no logic coded that a requirement, verified by an assertion in a unit test, does not specifically state. Case in point; at this time, the only test for multiplication in the system is asserting that the answer must be 8. So, at this time, the answer is ALWAYS 8, because the requirements tell you nothing different.
This seems very strict, and in the context of a simple case like this, nonsensical; to verify correct functionality in the general case, you would need an infinite number of unit tests, when you as an intelligent human being "know" how multiplication is supposed to work and could easily set up a test that generated and tested a multiplication table up to some limit that would make you confident it would work in all necessary cases. However, in more complex scenarios with more involved algorithms, this becomes a useful study in the benefits of YAGNI. If the requirement states that you need to be able to save record A to the DB, and the ability to save record B is omitted, then you must conclude "you ain't gonna need" the ability to save record B, until a requirement comes in that states this. If you implement the ability to save record B before you know you need to, then if it turns out you never need to then you have wasted time and effort building that into the system; you have code with no business purpose, that regardless can still "break" your system and thus requires maintenance.
Even in the simpler cases, you may end up coding more than you need if you code beyond requirements that you "know" are too light or specific. Let's say you were implementing some sort of parser for string codes. The requirements state that the string code "AA" = 1, and "AB" = 2, and that's the limit of the requirements. But, you know the full library of codes in this system includes 20 others, so you include logic and tests that parse the full library. You go back to the client, expecting your payment for time and materials, and the client says "we didn't ask for that; we only ever use the two codes we specified in the tests, so we're not paying you for the extra work". And they would be exactly right; you've technically tried to bilk them by charging for code they didn't ask for and don't need.

Is it bad practice to run tests on a database instead of on fake repositories?

I know what the advantages are and I use fake data when I am working with more complex systems.
What if I am developing something simple, where I can easily set up my environment in a real database, the data being accessed is so small that access time is not a factor, and I am only running a few tests?
Is it still important to create fake data or can I forget the extra coding and skip right to the real thing?
When I said real database I do not mean a production database, I mean a test database, but using a real live DBMS and the same schema as the real database.
The reasons to use fake data instead of a real DB are:
Speed. If your tests are slow you aren't going to run them. Mocking the DB can make your tests run much faster than they otherwise might.
Control. Your tests need to be the sole source of your test data. When you use fake data, your tests choose which fakes you will be using. So there is no chance that your tests are spoiled because someone left the DB in an unexpected state.
Order Independence. We want our tests to be runnable in any order at all. The input of one test should not depend on the output of another. When your tests control the test data, the tests can be independent of each other.
Environment Independence. Your tests should be runnable in any environment. You should be able to run them while on the train, or in a plane, or at home, or at work. They should not depend on external services. When you use fake data, you don't need an external DB.
Now, if you are building a small little application, and by using a real DB (like MySQL) you can achieve the above goals, then by all means use the DB. I do. But make no mistake, as your application grows you will eventually be faced with the need to mock out the DB. That's OK, do it when you need to. YAGNI. Just make sure you DO do it WHEN you need to. If you let it go, you'll pay.
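The "mocking out the DB" in the answer above can be as simple as an in-memory fake behind a repository interface (the types here are hypothetical, not from the question's codebase):

```csharp
using System.Collections.Generic;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IPersonRepository
{
    Person GetById(int id);
    void Save(Person person);
}

// In-memory fake: fast, fully controlled by the test, and independent
// of any external DB or of other tests' leftovers.
public class FakePersonRepository : IPersonRepository
{
    private readonly Dictionary<int, Person> _store = new Dictionary<int, Person>();

    public Person GetById(int id) => _store.TryGetValue(id, out var p) ? p : null;

    public void Save(Person person) => _store[person.Id] = person;
}
```

Each test news up its own `FakePersonRepository`, which gives you the speed, control, order independence, and environment independence listed above for free.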
It sort of depends what you want to test. Often you want to test the actual logic in your code not the data in the database, so setting up a complete database just to run your tests is a waste of time.
Also consider the amount of work that goes into maintaining your tests and your test database. Testing your code with a database often means you are testing your application as a whole instead of its parts in isolation, which often results in a lot of work keeping both the database and the tests in sync.
And the last problem is that the test should run in isolation so each test should either run on its own version of the database or leave it in exactly the same state as it was before the test ran. This includes the state after a failed test.
Having said that, if you really want to test on your database you can. There are tools that help setting up and tearing down a database, like dbunit.
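One common pattern for the isolation requirement above, besides dbunit-style setup and teardown, is to wrap each test in an ambient transaction that is never committed. This is a sketch, assuming an NUnit-style test and a hypothetical `PersonRepository` backed by ADO.NET; the key mechanism, `System.Transactions.TransactionScope`, rolls everything back when the scope is disposed without `Complete()` being called, even if the test fails:

```csharp
using System.Transactions;

[Test]
public void Saving_a_person_assigns_an_id()
{
    using (new TransactionScope())
    {
        var repository = new PersonRepository(connectionString); // hypothetical
        var person = new Person { Name = "Alice" };

        repository.Save(person);

        Assert.That(person.Id, Is.GreaterThan(0));
    } // scope disposed without Complete() => automatic rollback
}
```

This leaves the database in exactly the same state as before the test ran, including after a failed test.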
I've seen people trying to create unit tests like this, but it almost always turns out to be much more work than it is actually worth. Most abandoned it halfway through the project, and many abandoned TDD completely, assuming the bad experience transfers to unit testing in general.
So I would recommend keeping tests simple and isolated, and encapsulating your code well enough that it becomes possible to test it in isolation.
As long as the real DB does not get in your way, and you can go faster that way, I would be pragmatic and go for it.
In unit-test, the "test" is more important than the "unit".
I think it depends on whether your queries are fixed inside the repository (the better option, IMO), or whether the repository exposes composable queries; for example - if you have a repository method:
IQueryable<Customer> GetCustomers() {...}
Then your UI could request:
var foo = GetCustomers().Where(x => SomeUnmappedFunction(x));

bool SomeUnmappedFunction(Customer customer) {
    return customer.RegionId == 12345 && customer.Name.StartsWith("foo");
}
This will pass for an object-based fake repo, but will fail for actual db implementations. Of course, you can nullify this by having the repository handle all queries internally (no external composition); for example:
Customer[] GetCustomers(int? regionId, string nameStartsWith, ...) {...}
Because this can't be composed, you can check the DB and the UI independently. With composable queries, you are forced to use integration tests throughout if you want it to be useful.
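To make the failure mode concrete: an in-memory fake typically exposes `IQueryable<T>` over a list, so LINQ-to-Objects will happily execute `SomeUnmappedFunction`, while a real LINQ-to-SQL or Entity Framework provider would try to translate the method call to SQL and throw at runtime. A sketch with hypothetical data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int RegionId { get; set; }
    public string Name { get; set; }
}

public static class FakeRepo
{
    private static readonly List<Customer> Customers = new List<Customer>
    {
        new Customer { RegionId = 12345, Name = "foo corp" },
        new Customer { RegionId = 99999, Name = "bar ltd" },
    };

    // LINQ-to-Objects: any .NET code in a predicate just runs.
    public static IQueryable<Customer> GetCustomers() => Customers.AsQueryable();
}

public static class Program
{
    public static bool SomeUnmappedFunction(Customer customer) =>
        customer.RegionId == 12345 && customer.Name.StartsWith("foo");

    public static void Main()
    {
        // Executes fine against the fake; the same line against a
        // SQL-backed IQueryable would fail, because the provider has
        // no SQL translation for SomeUnmappedFunction.
        var matches = FakeRepo.GetCustomers()
                              .Where(x => SomeUnmappedFunction(x))
                              .Count();
        Console.WriteLine(matches); // 1
    }
}
```

So a green test against the fake tells you nothing about whether the composed query is actually translatable, which is why integration tests become unavoidable with composable repositories.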
It rather depends on whether the DB is automatically set up by the test, also whether the database is isolated from other developers.
At the moment it may not be a problem (e.g. only one developer). However, with manual database setup, setting up the database is an extra impediment to running tests, and that is a very bad thing.
If you're just writing a simple one-off application that you absolutely know will not grow, I think a lot of "best practices" just go right out the window.
You don't need to use DI/IOC or have unit tests or mock out your db access if all you're writing is a simple "Contact Us" form. However, where to draw the line between a "simple" app and a "complex" one is difficult.
In other words, use your best judgment, as there is no hard-and-fast answer to this.
It is OK to do that for this scenario, as long as you don't see them as "unit" tests; those would be integration tests. You also want to consider whether you will be manually testing through the UI again and again, as you might just automate your smoke tests instead. Given that, you might even consider not doing the integration tests at all, and just working at the functional/UI test level (as those will already be covering the integration).
As others have pointed out, it is hard to draw the line between complex and non-complex, and you would usually only know when it is too late :(. If you are already used to doing them, I am sure you won't get much overhead. If that is not the case, you could learn from it :)
Assuming that you want to automate this, the most important thing is that you can programmatically generate your initial conditions. It sounds like that's the case, and even better, you're testing against real-world data.
However, there are a few drawbacks:
Your real database might not cover certain conditions in your code. With fake data, you can force those conditions to occur.
And as you point out, you have a simple application; when it becomes less simple, you'll want to have tests that you can categorize as unit tests and system tests. The unit tests should target a simple piece of functionality, which will be much easier to do with fake data.
One advantage of fake repositories is that your regression / unit testing is consistent since you can expect the same results for the same queries. This makes it easier to build certain unit tests.
There are several disadvantages if your code (if not read-query only) modifies data:
- If you have an error in your code (which is probably why you're testing), you could end up breaking the production database; even if you don't break it outright, you are still writing test data into it.
- if the production database changes over time and especially while your code is executing, you may lose track of the test materials that you added and have a hard time later cleaning it out of the database.
- Production queries from other systems accessing the database may treat your test data as real data and this can corrupt results of important business processes somewhere down the road. For example, even if you marked your data with a certain flag or prefix, can you assure that anyone accessing the database will adhere to this schema?
Also, some databases are regulated by privacy laws, so depending on your contract and who owns the main DB, you may or may not be legally allowed to access real data.
If you need to run on a production database, I would recommend running on a copy, which you can easily create during off-peak hours.
If it's a really simple application and you can't see it growing, I see no problem running your tests on a real DB. If, however, you think the application will grow, it's important to account for that in your tests.
Keep everything as simple as you can, and if you require more flexible testing later on, make it so. Plan ahead though, because you don't want to have a huge application in 3 years that relies on old and hacky (for a large application) tests.
The downsides to running tests against your database are the lack of speed and the complexity of setting up your database state before running the tests.
If you have control over this there is no problem in running the tests directly against the database; it's actually a good approach because it simulates your final product better than running against fake data. The key is to have a pragmatic approach and see best practice as guidelines and not rules.

Resources