Should a Unit-test replicate functionality or Test output? - ruby

I've run into this dilemma several times. Should my unit-tests duplicate the functionality of the method they are testing to verify its integrity? OR Should unit tests strive to test the method with numerous manually created instances of inputs and expected outputs?
I'm mainly asking the question for situations where the method you are testing is reasonably simple and its proper operation can be verified by glancing at the code for a minute.
Simplified example (in ruby):
def concat_strings(str1, str2)
  return str1 + " AND " + str2
end
Simplified functionality-replicating test for the above method:
def test_concat_strings
  10.times do
    str1 = random_string_generator
    str2 = random_string_generator
    assert_equal str1 + " AND " + str2, concat_strings(str1, str2)
  end
end
I understand that most times the method you are testing won't be simple enough to justify doing it this way. But my question remains; is this a valid methodology in some circumstances (why or why not)?

Testing the functionality by using the same implementation doesn't test anything. If one has a bug in it, the other will as well.
But testing by comparing with an alternative implementation is a valid approach. For example, you might test an iterative (fast) method of calculating Fibonacci numbers by comparing it with a trivial recursive, yet slow, implementation of the same method.
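A minimal sketch of that idea in Ruby (the method names and the range of inputs are only illustrative):
def fib_fast(n)
  # Fast implementation under test: iterative Fibonacci.
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

def fib_slow(n)
  # Trivial recursive reference implementation: slow, but obviously correct.
  n < 2 ? n : fib_slow(n - 1) + fib_slow(n - 2)
end

def test_fib_fast_matches_reference
  (0..20).each do |n|
    assert_equal fib_slow(n), fib_fast(n)
  end
end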
A variation of this is using an implementation, that only works for special cases. Of course in that case you can use it only for such special cases.
When choosing input values, using random values most of the time isn't very effective. I'd prefer carefully chosen values anytime. In the example you gave, null values and extremely long values which won't fit into a String when concatenated come to mind.
If you use random values, make sure you have a way to recreate the exact run with the same random values, for example by logging the seed value and having a way to set that value at start time.
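A minimal sketch of that in Ruby (the environment variable and file names are only examples):
# Pick a seed (or take one from the environment to replay a failing run),
# log it, and seed Ruby's RNG with it.
seed = (ENV['TEST_SEED'] || Random.new_seed).to_i
puts "Using random seed: #{seed}"
srand(seed)

# A failing run can then be reproduced with, e.g.:
#   TEST_SEED=123456 ruby concat_strings_test.rb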

It's a controversial stance, but I believe that unit testing using Derived Values is far superior to using arbitrary hard-coded input and output.
The issue is that as an algorithm becomes even slightly complex, the relationship between input and output becomes obscure if represented by hard-coded values. The unit test ends up being a postulate. It may work technically, but it hurts test maintainability because it leads to Obscure Tests.
Using Derived Values to test against the result establishes a much clearer relationship between test input and expected output.
The argument that this doesn't test anything is simply not true, because any test case will exercise only a part of a path through the SUT, so no single test case will reproduce the entire algorithm being tested, but the combination of tests will do so.
An additional benefit is that you can use fewer unit tests to cover the desired functionality, and even make them more communicative at the same time. The end result is terser and more maintainable unit tests.
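As an illustration of the difference, using a hypothetical apply_discount(price, rate) method (not from the question):
# Hard-coded: the reader has to reverse-engineer why 90 is expected.
def test_discount_hard_coded
  assert_equal 90, apply_discount(100, 0.10)
end

# Derived Values: the relationship between input and expected output is explicit.
def test_discount_derived
  price, rate = 100, 0.10
  expected = price - (price * rate)
  assert_equal expected, apply_discount(price, rate)
end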

In unit testing you should definitely manually come up with test cases (so input, output and what side-effects you are expecting - these will be expectations on your mock objects). You come up with these test cases in a way so that they cover all functionality of your class (e.g. all methods are covered, all branches of all if statements, etc.). Think about it more along the lines of creating documentation of your class by showing all possible usages.
Reimplementing the class is not a good idea, because not only do you get obvious code/functionality duplication, but it is also likely that you will introduce the same bugs into the new implementation.

To test the functionality of a method, I'd use input and output pairs wherever possible. Otherwise you might be copying and pasting the functionality, along with the errors in its implementation. What are you testing then? You would be testing whether the functionality (including all of its errors) has changed over time, but you wouldn't be testing the correctness of the implementation.
Testing that the functionality hasn't changed over time might (temporarily) be useful during refactoring. But how often do you refactor such small methods?
Unit tests can also be seen as documentation and as a specification of a method's inputs and expected outputs. Both should be as simple as possible so others can easily read and comprehend them. As soon as you introduce additional code/logic into a test, it becomes harder to read.
Your test actually looks like a fuzz test. Fuzz tests can be very useful, but in unit tests randomness should be avoided for the sake of reproducibility.

A unit test should exercise your code, not something that is part of the language you are using.
If the code's logic is to concatenate strings in a special way, you should be testing for that - otherwise you need to rely on your language/framework.
Finally, you should create your unit tests to fail first "with meaning". In other words, random values shouldn't be used (unless you're testing your random number generator isn't returning the same set of random values!)

Yes. It bothers me too... although I'd say that it is more prevalent with non-trivial computations. In order to avoid updating the test when the code changes, some programmers write an IsX=X test, which always succeeds irrespective of the SUT.
About duplicating functionality
You don't have to. Your test can state what the expected output is, not how you derived it.
Although in some non-trivial cases, showing how you derived the expected value may make your test more readable - the test as a spec. You shouldn't refactor away this duplication:
def doubler(x); x * 2; end
def test_doubler
  input, expected = 10, doubler(10)
  assert_equal expected, doubler(input)
end
Now if I change doubler(x) to be a tripler, the above test won't fail.
def doubler(x); x * 3; end
However this one would:
def test_doubler
  assert_equal(20, doubler(10))
end
randomness in unit tests - Don't.
Instead of random datasets, choose static representative data points for testing and use an xUnit RowTest/TestCase to run the test with different data inputs. If n input-sets are identical for the unit, choose one.
The test in the OP could be used as an exploratory test, or to determine additional representative input-sets. Unit tests need to be repeatable (see q#61400); using random values defeats this objective.
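In plain Ruby test code, that row style could look something like this, reusing the concat_strings example from the question (the chosen cases are only illustrative):
# Each row is one representative input-set with its expected output.
CONCAT_CASES = {
  'two plain words' => [['foo', 'bar'], 'foo AND bar'],
  'empty first arg' => [['', 'bar'],    ' AND bar'],
  'both args empty' => [['', ''],       ' AND ']
}

def test_concat_strings_rows
  CONCAT_CASES.each do |label, (args, expected)|
    assert_equal expected, concat_strings(*args), "failed case: #{label}"
  end
end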

Never use random data for input. If your test reports a failure, how are you ever going to be able to duplicate it? And don't use the same function to generate the expected result. If you have a bug in your method you're likely to put the same bug in your test. Compute the expected results by some other method.
Hard-coded values are perfectly fine, and make sure inputs are picked to represent all of the normal and edge cases. At the very least test the expected inputs as well as inputs in the wrong format or wrong size (eg: null values).
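For the question's concat_strings, such edge-case checks might look like this (a sketch assuming plain Ruby String#+ behaviour):
def test_concat_strings_rejects_bad_input
  assert_raises(NoMethodError) { concat_strings(nil, 'b') } # nil doesn't support +
  assert_raises(TypeError)     { concat_strings('a', 42) }  # String#+ rejects non-strings
end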
It's really quite simple -- a unit test must test whether the function works or not. That means you need to give a range of known inputs that have known outputs and test against that. There is no universal right way to do that. However, using the same algorithm for the method and the verification proves nothing but that you're adept at copy/paste.

Related

Tdd workflow when you find Single Responsibility Principle violation

I am trying to follow TDD. So here is my problem:
I have an interface Risk with a method
boolean check(...)
Risk1 and Risk2 are implementations developed test-first, so they are now fully covered.
I decided that the unit that checks all risks (CompositeRisk) could also implement Risk.
CompositeRisk applies OR to the results of Risk1 and Risk2 (if one risk is true, then the whole thing is risky). Everything is still test-first.
Now I am looking at one of the risks and thinking: this one has the word "AND" in it and checks different fields. It seems that I could split it into two objects and create one more CompositeAndRisk which would apply AND to both split risks. This way I could construct a DSL for a risk decision tree (which seems nice, because the risk rules could change a lot).
So what should I do with the tests of the risk I split? Should I rename them to CompositeAndRiskTest? Should I delete them? Should I write tests for the split classes?
First of all, I suggest that you turn the CompositeRisk class into an interface, and have two separate subclasses of it: CompositeOrRisk and CompositeAndRisk. This is just about the design though.
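In Ruby terms, a rough sketch of that design could look like this (class names and the check signature are only illustrative):
# Every risk responds to #check(data) and returns true or false.
class CompositeOrRisk
  def initialize(*risks)
    @risks = risks
  end

  def check(data)
    @risks.any? { |risk| risk.check(data) } # risky if ANY child risk fires
  end
end

class CompositeAndRisk
  def initialize(*risks)
    @risks = risks
  end

  def check(data)
    @risks.all? { |risk| risk.check(data) } # risky only if ALL child risks fire
  end
end

# Composites nest, so a small decision tree can be composed:
#   CompositeOrRisk.new(risk1, CompositeAndRisk.new(risk2a, risk2b))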
Regarding your question, I believe there's no single right answer, so let me share how I see it.
As you know, in TDD there are concrete steps you follow (that comprise the TDD cycle), and there's a specific state the tests should be at in between each of them. Here's what I mean:
[State = No tests]
1. Write a test that fails
[State = Test fails]
2. Write as little code as possible in order for the test to pass
[State = Test passes]
3. Refactor
[State = Test still passes]
Given that this is what we aim for in TDD, I would do the changes you're talking about in the refactoring phase, including refactoring the tests accordingly.
This means that if I'm splitting a class, I'll be splitting the relevant test as well. At no point should the tests fail, as I'm only changing the structure of the code, not what it does (this is the meaning of refactoring after all).
If you have a larger change to do though, I would go about creating a new class from scratch (TDD of course), and later on, remove the no longer needed functionality from the old class, as well as the now redundant test cases.
The approach I'd take in this case is "play it innocent" -- when you discover a new requirement, just write a test and the implementation for it, pretending to ignore the relationship with previous requirements at first.
The "And" case here is clearly new functionality. No need to modify the contents of the existing test at that point, just create another test with a name that reflects the new requirement, such as CompositeAndRiskTest and create the corresponding implementation.
Then, during the Refactor step, "realize" that the two previous objects are 2 sides of the same coin and refactor them accordingly. That could just mean renaming CompositeRisk to CompositeOrRisk, or more complex things.
Once the 2 sorts of Risks are identified, tested and implemented, you could go on and create new tests for combinations of them.

Complex RSpec Argument Testing

I have a method I want to test that is being called with a particular object. However, identifying this object is somewhat complex because I'm not interested in a particular instance of the object, but one that conforms to certain conditions.
For example, my code might be like:
some_complex_object = ComplexObject.generate(params)
my_function(some_complex_object)
And in my test I want to check that
test_complex_object = ComplexObject.generate(test_params)
subject.should_receive(:my_function).with(test_complex_object)
But I definitely know that ComplexObject#== will return false when comparing some_complex_object and test_complex_object because that is desired behavior (elsewhere I rely on standard == behavior and so don't want to overload it for ComplexObject).
I would argue that needing to write a test such as this is itself a problem, and I would prefer to restructure the code so that I don't need to write it. Unfortunately, that's a much larger task requiring a rewrite of a lot of existing code, so it is a longer-term fix that I want to get to but can't do right now within time constraints.
Is there a way with Rspec to be able to do a more complex comparison between arguments in a test? Ideally I'd like something where I could use a block so I can write arbitrary comparison code.
See https://www.relishapp.com/rspec/rspec-mocks/v/2-7/docs/argument-matchers for information on how you can provide a block to do arbitrary analysis of the arguments passed to a method, as in:
expect(subject).to receive(:my_function) do |test_complex_object|
  # code setting expectations on test_complex_object
end
You can also define a custom matcher which will let you evaluate objects to see if that satisfy the condition, as described in https://www.relishapp.com/rspec/rspec-expectations/v/2-3/docs/custom-matchers/define-matcher
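A sketch of such a custom matcher (the matcher name and the params accessor are made up for illustration; it assumes rspec-mocks accepts rspec-expectations matchers in with):
RSpec::Matchers.define :a_complex_object_built_from do |expected_params|
  match do |actual|
    actual.is_a?(ComplexObject) && actual.params == expected_params
  end
end

# Then in the example:
# subject.should_receive(:my_function).with(a_complex_object_built_from(test_params))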

Jasmine: Why toBeUndefined and not.toBeDefined?

I'm just learning the Jasmine library, and I've noticed that Jasmine has a very limited number of built-in assertions. I've also noticed that, despite having such a limited number, two of its assertions appear to be redundant: toBeDefined/toBeUndefined.
In other words, both of these would seem to check for the same exact thing:
expect(undefined).toBeUndefined();
expect(undefined).not.toBeDefined();
Is there some reason for this, like a case where toBeDefined isn't the same as toBeUndefined? Or is this just the one assertion in Jasmine that has two perfectly equal ways of being invoked?
One might assume the same for toBeTruthy and toBeFalsy, or toBeLessThan and toBeGreaterThan (although I guess the missing assert from the last two is toEqual). In the end it comes down to readability and user preference.
To give you a more complete answer, it might be useful to take a look at the code that is invoked for these functions. The code that is executed goes through separate paths (so toBeUndefined is not simply !toBeDefined). The only real answer that makes sense is readability (or giving in to annoying feature requests). https://github.com/jasmine/jasmine/tree/main/src/core/matchers

Can I ensure all tests contain an assertion in test/unit?

With test/unit, and minitest, is it possible to fail any test that doesn't contain an assertion, or would monkey-patching be required (for example, checking if the assertion count increased after each test was executed)?
Background: I shouldn't write unit tests without assertions - at a minimum, I should use assert_nothing_raised if I'm smoke testing to indicate that I'm smoke testing.
Usually I write tests that fail first, but I'm writing some regression tests. Alternatively, I could supply an incorrect expected value to see if the test is comparing the expected and actual value.
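A minimal sketch of that monkey-patching idea with minitest (it relies on Minitest::Test's per-test assertions counter; the module and class names are made up):
require 'minitest/autorun'

# Flunk any test that finishes without having recorded a single assertion.
module AssertionGuard
  def teardown
    super
    flunk "#{name} made no assertions" if assertions.zero?
  end
end

class WidgetTest < Minitest::Test
  include AssertionGuard

  def test_with_assertion
    assert_equal 4, 2 + 2 # counted, so the guard stays quiet
  end

  def test_without_assertion
    2 + 2 # no assertion, so the guard flunks this test in teardown
  end
end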
To ensure unit tests actually verify anything, a technique called Mutation testing is used.
For Ruby, you can take a look at Mutant.
As PK's link points out too, the presence of assertions in itself doesn't mean the unit test is meaningful and correct. I believe there is no automatic replacement for careful thinking and awareness.
Ensuring the tests fail first is a good practice, which should be made into a habit.
Apart from the things you mention, I often set wrong values in asserts in new tests, to check that the test really runs and fails. (Then I fix it of course :-) This is less obtrusive than editing the production code.
I don't really think that forcing the test to fail without an assert is really helpful. Having an assert in a test is not a goal in itself - the goal is to write useful tests.
The missing assert is just an indication that the test may not be useful. The interesting question is: Will the test fail if something breaks?. If it doesn't, it's obviously useless.
If all you're testing for is that the code doesn't crash, then assert_nothing_raised around it is just a kind of comment. But testing for "no explosions" probably indicates a weak test in itself. In most cases, it doesn't give you any useful information about your code (because "no crash != correct"), so why did you write the test in the first place? Plus I rather prefer a method that explodes properly to one that just returns a wrong result.
I've found the best regression tests come from the field: bang on your app (or have your tester do it), and for each problem you find, write a test that fails. Fix it, and have the test pass.
Otherwise I'd test the behavior, not the absence of crashes. In the case that I have "empty" tests (meaning that I didn't write the test code yet), I usually put a #flunk inside to remind me.

Is it a bad practice to randomly-generate test data?

Since I've started using rspec, I've had a problem with the notion of fixtures. My primary concerns are this:
(1) I use testing to reveal surprising behavior. I'm not always clever enough to enumerate every possible edge case for the examples I'm testing. Using hard-coded fixtures seems limiting because it only tests my code with the very specific cases that I've imagined. (Admittedly, my imagination also limits which cases I test.)
(2) I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate. For example:
describe Item do
  describe '#most_expensive' do
    it 'should return the most expensive item' do
      Item.most_expensive.price.should == 100
      # OR
      # Item.most_expensive.price.should == Item.find(:expensive).price
      # OR
      # Item.most_expensive.id.should == Item.find(:expensive).id
    end
  end
end
Using the first method gives the reader no indication what the most expensive item is, only that its price is 100. All three methods ask the reader to take it on faith that the fixture :expensive is the most expensive one listed in fixtures/items.yml. A careless programmer could break tests by creating an Item in before(:all), or by inserting another fixture into fixtures/items.yml. If that is a large file, it could take a long time to figure out what the problem is.
One thing I've started to do is add a #generate_random method to all of my models. This method is only available when I am running my specs. For example:
class Item
  def self.generate_random(params={})
    Item.create(
      :name  => params[:name]  || String.generate_random,
      :price => params[:price] || rand(100)
    )
  end
end
(The specific details of how I do this are actually a bit cleaner. I have a class that handles the generation and cleanup of all models, but this code is clear enough for my example.) So in the above example, I might test as follows. A warning for the faint of heart: my code relies heavily on the use of before(:all):
describe Item do
  describe '#most_expensive' do
    before(:all) do
      @items = []
      3.times { @items << Item.generate_random }
      @items << Item.generate_random({:price => 50})
    end

    it 'should return the most expensive item' do
      sorted = @items.sort { |a, b| b.price <=> a.price }
      expensive = Item.most_expensive
      expensive.should be(sorted[0])
      expensive.price.should >= 50
    end
  end
end
This way, my tests better reveal surprising behavior. When I generate data this way, I occasionally stumble upon an edge case where my code does not behave as expected, but which I wouldn't have caught if I were only using fixtures. For example, in the case of #most_expensive, if I forgot to handle the special case where multiple items share the most expensive price, my test would occasionally fail at the first should. Seeing the non-deterministic failures in AutoSpec would clue me in that something was wrong. If I were only using fixtures, it might take much longer to discover such a bug.
My tests also do a slightly better job of demonstrating in code what the expected behavior is. My test makes it clear that sorted is an array of items sorted in descending order by price. Since I expect #most_expensive to be equal to the first element of that array, it's even more obvious what the expected behavior of most_expensive is.
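For comparison, the tie case hinted at above can also be pinned down deterministically in the same style (a sketch that reuses the generate_random helper; the prices are arbitrary):
it 'should cope with several items sharing the top price' do
  2.times { Item.generate_random({:price => 1000}) }
  Item.most_expensive.price.should == 1000
end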
So, is this a bad practice? Is my fear of fixtures an irrational one? Is writing a generate_random method for each Model too much work? Or does this work?
I'm surprised no one in this topic or in the one Jason Baker linked to mentioned Monte Carlo Testing. That's the only time I've extensively used randomized test inputs. However, it was very important to make the test reproducible, by having a constant seed for the random number generator for each test case.
This is an answer to your second point:
(2) I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate.
I agree. Ideally spec examples should be understandable by themselves. Using fixtures is problematic, because it splits the pre-conditions of the example from its expected results.
Because of this, many RSpec users have stopped using fixtures altogether. Instead, construct the needed objects in the spec example itself.
describe Item, "#most_expensive" do
it 'should return the most expensive item' do
items = [
Item.create!(:price => 100),
Item.create!(:price => 50)
]
Item.most_expensive.price.should == 100
end
end
If you end up with lots of boilerplate code for object creation, you should take a look at some of the many test object factory libraries, such as factory_girl, Machinist, or FixtureReplacement.
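For example, with factory_girl the object creation above might shrink to something like this (a sketch assuming a version that provides the FactoryGirl.define API):
# In a factory definition file:
FactoryGirl.define do
  factory :item do
    name  'Widget'
    price 50
  end
end

# In the spec example itself:
it 'should return the most expensive item' do
  FactoryGirl.create(:item, :price => 100)
  FactoryGirl.create(:item, :price => 50)
  Item.most_expensive.price.should == 100
end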
We thought about this a lot on a recent project of mine. In the end, we settled on two points:
Repeatability of test cases is of paramount importance. If you must write a random test, be prepared to document it extensively, because if/when it fails, you will need to know exactly why.
Using randomness as a crutch for code coverage means you either don't have good coverage or you don't understand the domain enough to know what constitutes representative test cases. Figure out which is true and fix it accordingly.
In sum, randomness can often be more trouble than it's worth. Consider carefully whether you're going to be using it correctly before you pull the trigger. We ultimately decided that random test cases were a bad idea in general and to be used sparingly, if at all.
Lots of good information has already been posted, but see also: Fuzz Testing. Word on the street is that Microsoft uses this approach on a lot of their projects.
My experience with testing is mostly with simple programs written in C/Python/Java, so I'm not sure if this is entirely applicable, but whenever I have a program that can accept any sort of user input, I always include a test with random input data, or at least input data generated by the computer in an unpredictable way, because you can never make assumptions about what users will enter. Or, well, you can, but if you do then some hacker who doesn't make that assumption may well find a bug that you totally overlooked. Machine-generated input is the best (only?) way I know of to keep human bias completely out of the testing procedures. Of course, in order to reproduce a failed test you have to do something like saving the test input to a file or printing it out (if it's text) before running the test.
Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining the expected outcome of your software given its input.
If you solved the oracle problem, you can get one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than with simple random.
You then switch from random testing to statistical testing.
if (a > 0)
  // Do Foo
else if (b < 0)
  // Do Bar
else
  // Do Foobar
If you select a and b randomly in int range, you exercise Foo 50% of the time, Bar 25% of the time and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.
If you select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with your first distribution. Indeed the three branches get exercised each 33.33% of the time.
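A small Ruby sketch of skewing the input distribution that way (the ranges are only illustrative):
# Draw a so that it is positive roughly a third of the time, making the three
# branches above roughly equally likely; b stays uniform.
def pick_a
  rand(3).zero? ? rand(1..1000) : -rand(0..1000)
end

def pick_b
  rand(-1000..1000)
end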
Of course, if your observed outcome is different than your expected outcome, you have to log everything that can be useful to reproduce the bug.
I would suggest having a look at Machinist:
http://github.com/notahat/machinist/tree/master
Machinist will generate data for you, but it is repeatable, so each test-run has the same random data.
You could do something similar by seeding the random number generator consistently.
Use of random test data is an excellent practice -- hard-coded test data only tests the cases you explicitly thought of, whereas random data flushes out your implicit assumptions that might be wrong.
I highly recommend using Factory Girl and ffaker for this. (Never use Rails fixtures for anything under any circumstances.)
One problem with randomly generated test cases is that the expected answer has to be computed by code, and you can't be sure that code doesn't have bugs :)
You might also see this topic: Testing with random inputs best practices.
The effectiveness of such testing largely depends on the quality of the random number generator you use and on how correct the code is that translates the RNG's output into test data.
If the RNG never produces values causing your code to get into some edge-case condition, you will not have that case covered. If the code that translates the RNG's output into the input of the code you test is defective, it may happen that even with a good generator you still don't hit all the edge cases.
How will you test for that?
The problem with randomness in test cases is that the output is, well, random.
The idea behind tests (especially regression tests) is to check that nothing is broken.
If you find something that is broken, you need to include that test every time from then on, otherwise you won't have a consistent set of tests. Also, if you run a random test that works, then you need to include that test, because it's possible that you may later break the code so that the test fails.
In other words, if you have a test which uses random data generated on the fly, I think this is a bad idea. If, however, you use a set of random data WHICH YOU THEN STORE AND REUSE, this may be a good idea. This could take the form of a set of seeds for a random number generator.
This storing of the generated data allows you to find the 'correct' response to this data.
So, I would recommend using random data to explore your system, but use defined data in your tests (which may originally have been randomly generated data).
