ruby tdd best practice in dealing with obsolete tests - ruby

I'm running through a very basic challenge at Code Wars. The challenge is to test-drive a method that returns the sum of an array of squares.
So far, my tests are:
describe "square method" do
it "should return the square of a number" do
Test.assert_equals(squareSum(4), [16])
end
it "should return the square of multiple numbers" do
Test.assert_equals(squareSum(4, 2, 3), [16, 4, 9])
end
end
and my code is:
def squareSum(*numbers)
numbers.map { |num| num ** 2 }
end
Now I'm at the point where I need to change it so that it adds the sum. Which, in my mind, necessarily negates the two previous tests. As far as TDD best practices go, was I being ridiculous testing those first two scenarios, given that they aren't what I'm trying to get the method to do? How should I proceed with my next test?
Should I:
delete the previous two tests, since they will fail once I change the method?
find a way to make it so that the two previous tests don't fail even once I've changed it?
In approaching this problem, should I have not worried about the first two tests? I am having a fair amount of difficulty phrasing this question. Basically, what I know I want to end up with is:
describe "squareSum method" do
it "should return the sum of the squares of the numbers passed" do
Test.assert_equals(squareSum(1, 2, 2), 9)
end
end
with the code to make it work. I'm just wondering what the best practices are in regards to test-driving this particular kind of problem, given that I wanted to test that I could return squares for multiple numbers before returning the sum. My "final" code will render the initial tests obsolete. This is a, "How much of my work should be present in the final solution?", picky and kind of anal-retentive question, I think. But I am curious.

Since the tests are specifications for the software you intend to write, the question is why you wrote a specification for something you didn't want (since the task of the challenge was not write a function that squares its arguments)?
You should have written a specification for "a method that returns the sum of an array of squares" in the first place. This test starts out being red.
Then you may decide that you need a function that squares its arguments (or the elements of a given array) as an intermediate step. Write a test for this function. Make it green by implementing such a function.
Eventually put all the things together and make your initial test green (your main function could use the helper function and sum-up its return values).
No you can refactor. If during refactoring you decide that you no longer need the helper function: delete the function and delete its tests.
Also when your specifications are changing, you need to rewrite your tests, write new ones or even delete some of them.
But the general rule is: The tests are always a specification for the current state of your software. They should specify exactly what your software is intended to to. Nothing more, nothing less.

Related

How to use RSpec to test negamax?

I'm working on a Tic Tac Toe and I have a #negamax method that returns the best position to computer to move to, and a #winner method that returns 1 (computer wins) or -1 (user wins). How can I test #negamax so that I guarantee that its implementation is right and that user never wins?
I have a few test cases in places, to test that it returns the best position, and it does, but it does not cover all possible cases. Right now, this is what I have (besides the test cases for the best choice):
it 'never allows user to win' do
until game_over?
unless is_ai?
pos = empty_positions.sample
move(pos, user)
else
pos = negamax(0, 1, -100, 100)
move(pos, computer)
end
end
if game.won?
expect(winner).to eq(1)
else
expect(winner).to be_nil
end
end
It does not seem very effective to just 'hope' that the test will never fail. What would be a better way to accomplish it?
but it does not cover all possible cases.
Don't worry, this is normal, it's nearly impossible to simulate all the ways an application will be used. Testing some things can lead to huge increases in results. Testing “everything” is a waste of time. That’s because “everything” doesn’t matter. Only certain things matter. The right things matter.

Even when using the same randomseed in Lua, get different results?

I have a large, rather complicated procedural content generation lua project. One thing I want to be able to do, for debugging purposes, is use a random seed so that I can re-run the system & get the same results.
To the end, I print out the seed at the start of a run. The problem is, I still get completely different results each time I run it. Assuming the seed doesn't change anywhere else, this shouldn't be possible, right?
My question is, what other ways are there to influence the output of lua's math.random()? I've searched through all the code in the project, and there's only one place where I call math.randomseed(), and I do that before I do anything else. I don't use the time or date for any calculations, so that wouldn't be influencing the results... What else could I be missing?
Updated on 2/22/16 monkey patching math.random & math.randomseed has, oftentimes (but not always) output the same sequence of random numbers. But still not the same results – so I guess the real question is now: what behavior in lua is indeterminate, and could result in different output when the same code is run in sequence? Noting where it diverges, when it does, is helping me narrow it down, but I still haven't found it. (this code does NOT use coroutines, so I don't think it's a threading / race condition issue)
randomseed is using srandom/srand function, which "sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random()".
I can offer several possible explanations:
you think you call randomseed, but you do not (random will initialize the sequence for you in this case).
you think you call randomseed once, but you call it multiple times (or some other part of the code calls randomseed as well, possibly at different times in your sequence).
some other part of the code calls random (some number of times), which generates different results for your part of the code.
there is nothing wrong with the generated sequence, but you are misinterpreting the results.
your version of Lua has a bug in srandom/random processing.
there is something wrong with srandom or random function in your system.
Having some information about your version of Lua and your system (in addition to the small example demonstrating the issue) would help in figuring out what's causing this.
Updated on 2016/2/22: It should be fairly easy to check; monkeypatch both math.randomseed and math.random and log all the calls and the values returned by the functions for two subsequent runs. Compare the results. If the results differ, you should be able to isolate why they differ and reproduce on a smaller example. You can also look at where the functions are called from using debug.traceback.
Correct, as stated in the documentation, 'equal seeds produce equal sequences of numbers.'
Immediately after setting the seed to a known constant value, output a call to rand - if this varies across runs, you know something is seriously wrong (corrupt library download, whack install, gamma ray hit your drive, etc).
Assuming that the first value matches across runs, add another output midway through the code. From there, you can use a binary search to zero in on where things go wrong (I.E. first half or second half of the code block in question).
While you can & should use some intuition to find the error as you go, keep in mind that if intuition alone was enough, you would have already found it, thus a bit of systematic elimination is warranted.
Revision to cover comment regarding array order:
If possible, use debugging tools. This SO post on detecting when the value of a Lua variable changes might help.
In the absence of tools, here's one way to roll your own for this problem:
A full debugging dump of any sizable array quickly becomes a mess that makes it tough to spot changes. Instead, I'd use a few extra variables & a test function to keep things concise.
Make two deep copies of the array. Let's call them debug01 & debug02 & call the original array original. Next, deliberately swap the order of two elements in debug02.
Next, build a function to compare two arrays & test if their elements match up & return / print the index of the first mismatch if they do not. Immediately after initializing the arrays, test them to ensure:
original & debug01 match
original & debug02 do not match
original & debug02 mismatch where you changed them
I cannot stress enough the insanity of using an unverified (and thus, potentially bugged) test function to track down bugs.
Once you've verified the function works, you can again use a binary search to zero in on where things go off the rails. As before, balance the use of a systematic search with your intuition.

RSpec Example - Tennis Soccer class

In the book "Programming Ruby 1.9/2.0" the author gives an example of a Tennis Scorer class that will be developed by writing some RSpec tests before the actual code.
The author introduces 4 tests:
it "should start with a score of 0-0"
it "should be 15-0 if the server wins a point"
it "should be 0-15 if the receiver wins a point"
it "should be 15-15 after they both win a point"
and then the author suggests that the reader should go ahead and complete the class by writing tests like this:
it "should be 40-0 after the server wins three points"
it "should be W-L after the server wins four points"
it "should be L-W after the receiver wins four points"
it "should be Deuce after each wins three points"
it "should be A-server after each wins three points and the server gets one more"
(The actual TennisScorer Class adds scores for each player and returns them in a format like "15-15").
Does the author assume that the code will work 100% for scores like 30-15, 15-30, 0-30, 30-0, and so forth as long as the test succeeds for 15-0, 0-15, and 15-15? In other words, it's not necessary to test for each possible score explicitly?
The author suggests a 40-0 test, which makes sense because 40 breaks the 0-15-30 convention (score * 15), so does a 40-0 test suffice to show that 40-30, 15-40, etc will work as well?
Also, maybe I'm overcomplicating this, but wouldn't it make more sense to have a "random game" in my test where I add random scores 100000 times and compare the outcome dynamically? (but I guess then my test could contain some bugs easily..?).
I figure that this would be the way to go if I would write a test for a multiplication method for example (or would I then just check if 1*2 = 2 and assume that everything works fine?)
The point with tdd is to have your specs and code grow over time in small increments. So you are supposed to start out with some simple things exactly as outlined above. However as your spec suite grows as well as your code you will feel the need to refactor both the code and the specs. This is natural and as it should be. I would expect the code inside one of your its to be a one line call to a generic method that takes input to the method under test and the expected outcome. At least that is the place I often end up at.
With the spec above the code may not work with 30-15 etc as you are pointing out. It depends on how the the implementation turns out. It would make sense to add some more specs here and reuse the test code beneath.
I would recommend against having randomized specs in most cases because you can't guarantee the outcome. If the code itself has random behavior it may make sense though. I would try to isolate the randomness to one place so that the rest can be tested deterministically.

Ruby: Using rand() in code but writing tests to verify probabilities

I have some code which delivers things based on weighted random. Things with more weight are more likely to be randomly chosen. Now being a good rubyist I of couse want to cover all this code with tests. And I want to test that things are getting fetched according the correct probabilities.
So how do I test this? Creating tests for something that should be random make it very hard to compare actual vs expected. A few ideas I have, and why they wont work great:
Stub Kernel.rand in my tests to return fixed values. This is cool, but rand() gets called multiple times and I'm not sure I can rig this with enough control to test what I need to.
Fetch a random item a HUGE number of times and compare the actual ratio vs the expected ratio. But unless I can run it an infinite number of times, this will never be perfect and could intermittently fail if I get some bad luck in the RNG.
Use a consistent random seed. This makes the RNG repeatable but it still doesn't give me any verification that item A will happen 80% of the time (for example).
So what kind of approach can I use to write test coverage for random probabilities?
I think you should separate your goals. One is to stub Kernel.rand as you mention. With rspec for example, you can do something like this:
test_values = [1, 2, 3]
Kernel.stub!(:rand).and_return( *test_values )
Note that this stub won't work unless you call rand with Kernel as the receiver. If you just call "rand" then the current "self" will receive the message, and you'll actually get a random number instead of the test_values.
The second goal is to do something like a field test where you actually generate random numbers. You'd then use some kind of tolerance to ensure you get close to the desired percentage. This is never going to be perfect though, and will probably need a human to evaluate the results. But it still is useful to do because you might realize that another random number generator might be better, like reading from /dev/random. Also, it's good to have this kind of test because let's say you decide to migrate to a new kind of platform whose system libraries aren't as good at generating randomness, or there's some bug in a certain version. The test could be a warning sign.
It really depends on your goals. Do you only want to test your weighting algorithm, or also the randomness?
It's best to stub Kernel.rand to return fixed values.
Kernel.rand is not your code. You should assume it works, rather than trying to write tests that test it rather than your code. And using a fixed set of values that you've chosen and explicitly coded in is better than adding a dependency on what rand produces for a specific seed.
If you wanna go down the consistent seed route, look at Kernel#srand:
http://www.ruby-doc.org/core/classes/Kernel.html#M001387
To quote the docs (emphasis added):
Seeds the pseudorandom number
generator to the value of number. If
number is omitted or zero, seeds the
generator using a combination of the
time, the process id, and a sequence
number. (This is also the behavior if
Kernel::rand is called without
previously calling srand, but without
the sequence.) By setting the seed
to a known value, scripts can be made
deterministic during testing. The
previous seed value is returned. Also
see Kernel::rand.
For testing, stub Kernel.rand with the following simple but perfectly reasonable LCPRNG:
##q = 0
def r
##q = 1_103_515_245 * ##q + 12_345 & 0xffff_ffff
(##q >> 2) / 0x3fff_ffff.to_f
end
You might want to skip the division and use the integer result directly if your code is compatible, as all bits of the result would then be repeatable instead of just "most of them". This isolates your test from "improvements" to Kernel.rand and should allow you to test your distribution curve.
My suggestion: Combine #2 and #3. Set a random seed, then run your tests a very large number of times.
I do not like #1, because it means your test is super-tightly coupled to your implementation. If you change how you are using the output of rand(), the test will break, even if the result is correct. The point of a unit test is that you can refactor the method and rely on the test to verify that it still works.
Option #3, by itself, has the same problem as #1. If you change how you use rand(), you will get different results.
Option #2 is the only way to have a true black box solution that does not rely on knowing your internals. If you run it a sufficiently high number of times, the chance of random failure is negligible. (You can dig up a stats teacher to help you calculate "sufficiently high," or you can just pick a really big number.)
But if you're hyper-picky and "negligible" isn't good enough, a combination of #2 and #3 will ensure that once the test starts passing, it will keep passing. Even that negligible risk of failure only crops up when you touch the code under test; as long as you leave the code alone, you are guaranteed that the test will always work correctly.
Pretty often when I need predictable results from something that is derived from a random number I usually want control of the RNG, which means that the easiest is to make it injectable. Although overriding/stubbing rand can be done, Ruby provides a fine way to pass your code a RNG that is seeded with some value:
def compute_random_based_value(input_value, random: Random.new)
# ....
end
and then inject a Random object I make on the spot in the test, with a known seed:
rng = Random.new(782199) # Scientific dice roll
compute_random_based_value(your_input, random: rng)

How to use TDD correctly to implement a numerical method?

I am trying to use Test Driven Development to implement my signal processing library. But I have a little doubt: Assume I am trying to implement a sine method (I'm not):
Write the test (pseudo-code)
assertEqual(0, sine(0))
Write the first implementation
function sine(radians)
return 0
Second test
assertEqual(1, sine(pi))
At this point, should I:
implement a smart code that will work for pi and other values, or
implement the dumbest code that will work only for 0 and pi?
If you choose the second option, when can I jump to the first option? I will have to do it eventually...
At this point, should I:
implement real code that will work outside the two simple tests?
implement more dumbest code that will work only for the two simple tests?
Neither. I'm not sure where you got the "write just one test at a time" approach from, but it sure is a slow way to go.
The point is to write clear tests and use that clear testing to design your program.
So, write enough tests to actually validate a sine function. Two tests are clearly inadequate.
In the case of a continuous function, you have to provide a table of known good values eventually. Why wait?
However, testing continuous functions has some problems. You can't follow a dumb TDD procedure.
You can't test all floating-point values between 0 and 2*pi. You can't test a few random values.
In the case of continuous functions, a "strict, unthinking TDD" doesn't work. The issue here is that you know your sine function implementation will be based on a bunch of symmetries. You have to test based on those symmetry rules you're using. Bugs hide in cracks and corners. Edge cases and corner cases are part of the implementation and if you unthinkingly follow TDD you can't test that.
However, for continuous functions, you must test the edge and corner cases of the implementation.
This doesn't mean TDD is broken or inadequate. It says that slavish devotion to a "test first" can't work without some thinking about what you real goal is.
In kind of the strict baby-step TDD, you might implement the dumb method to get back to green, and then refactor the duplication inherent in the dumb code (testing for the input value is a kind of duplication between the test and the code) by producing a real algorithm. The hard part about getting a feel for TDD with such an algorithm is that your acceptance tests are really sitting right next to you (the table S. Lott suggests), so you kind of keep an eye on them the whole time. In more typical TDD, the unit is divorced enough from the whole that the acceptance tests can't just be plugged in right there, so you don't start thinking about testing for all scenarios, because all scenarios are not obvious.
Typically, you might have a real algorithm after one or two cases. The important thing about TDD is that it is driving design, not the algorithm. Once you have enough cases to satisfy the design needs, the value in TDD drops significantly. Then the tests more convert into covering corner cases to ensure your algorithm is correct in all aspects you can think of. So, if you are confident in how to build the algorithm, go for it. The kinds of baby steps you are talking about are only appropriate when you are uncertain. By taking such baby steps you start to build out the boundaries of what your code has to cover, even though your implementation isn't actually real yet. But as I said, that is more for when you are uncertain about how to build the algorithm.
Write tests that verify Identities.
For the sin(x) example, think about double-angle formula and half-angle formula.
Open a signal-processing textbook. Find the relevant chapters and implement every single one of those theorems/corollaries as test code applicable for your function. For most signal-processing functions there are identities that must be uphold for the inputs and the outputs. Write tests that verify those identities, regardless of what those inputs might be.
Then think about the inputs.
Divide the implementation process into separate stages. Each stage should have a Goal. The tests for each stage would be to verify that Goal. (Note 1)
The goal of the first stage is to be "roughly correct". For the sin(x) example, this would be like a naive implementation using binary search and some mathematical identities.
The goal of the second stage is to be "accurate enough". You will try different ways of computing the same function and see which one gets better result.
The goal of the third stage is to be "efficient".
(Note 1) Make it work, make it correct, make it fast, make it cheap. - attributed to Alan Kay
I believe the step when you jump to the first option is when you see there are too many "ifs" in your code "just to pass the tests". That wouldn't be the case yet, just with 0 and pi.
You'll feel the code is beginning to smell, and will be willing to refactor it asap. I'm not sure if that's what pure TDD says, but IMHO you do it in the refactor phase (test fail, test pass, refactor cycle). I mean, unless your failing tests ask for a different implementation.
Note that (in NUnit) you can also do
Assert.That(2.1 + 1.2, Is.EqualTo(3.3).Within(0.0005);
when you're dealing with floating-point equality.
One piece of advice I remember reading was to try to refactor out the magic numbers from your implementations.
You should code up all your unit tests in one hit (in my opinion). While the idea of only creating tests specifically covering what has to be tested is correct, your particular specification calls for a functioning sine() function, not a sine() function that works for 0 and PI.
Find a source you trust enough (a mathematician friend, tables at the back of a math book or another program that already has the sine function implemented).
I opted for bash/bc because I'm too lazy to type it all in by hand :-). If it were a sine() function, I'd just run the following program and paste it into the test code. I'd also put a copy of this script in there as a comment as well so I can re-use it if something changes (such as the desired resolution if more than 20 degrees in this case, or the value of PI you want to use).
#!/bin/bash
d=0
while [[ ${d} -le 400 ]] ; do
r=$(echo "3.141592653589 * ${d} / 180" | bc -l)
s=$(echo "s(${r})" | bc -l)
echo "assertNear(${s},sine(${r})); // ${d} deg."
d=$(expr ${d} + 20)
done
This outputs:
assertNear(0,sine(0)); // 0 deg.
assertNear(.34202014332558591077,sine(.34906585039877777777)); // 20 deg.
assertNear(.64278760968640429167,sine(.69813170079755555555)); // 40 deg.
assertNear(.86602540378430644035,sine(1.04719755119633333333)); // 60 deg.
assertNear(.98480775301214683962,sine(1.39626340159511111111)); // 80 deg.
assertNear(.98480775301228458404,sine(1.74532925199388888888)); // 100 deg.
assertNear(.86602540378470305958,sine(2.09439510239266666666)); // 120 deg.
assertNear(.64278760968701194759,sine(2.44346095279144444444)); // 140 deg.
assertNear(.34202014332633131111,sine(2.79252680319022222222)); // 160 deg.
assertNear(.00000000000079323846,sine(3.14159265358900000000)); // 180 deg.
assertNear(-.34202014332484051044,sine(3.49065850398777777777)); // 200 deg.
assertNear(-.64278760968579663575,sine(3.83972435438655555555)); // 220 deg.
assertNear(-.86602540378390982112,sine(4.18879020478533333333)); // 240 deg.
assertNear(-.98480775301200909521,sine(4.53785605518411111111)); // 260 deg.
assertNear(-.98480775301242232845,sine(4.88692190558288888888)); // 280 deg.
assertNear(-.86602540378509967881,sine(5.23598775598166666666)); // 300 deg.
assertNear(-.64278760968761960351,sine(5.58505360638044444444)); // 320 deg.
assertNear(-.34202014332707671144,sine(5.93411945677922222222)); // 340 deg.
assertNear(-.00000000000158647692,sine(6.28318530717800000000)); // 360 deg.
assertNear(.34202014332409511011,sine(6.63225115757677777777)); // 380 deg.
assertNear(.64278760968518897983,sine(6.98131700797555555555)); // 400 deg.
Obviously you will need to map this answer to what your real function is meant to do. My point is that the test should fully validate the behavior of the code in this iteration. If this iteration was to produce a sine() function that only works for 0 and PI, then that's fine. But that would be a serious waste of an iteration in my opinion.
It may be that your function is so complex that it must be done over several iterations. Then your approach two is correct and the tests should be updated in the next iteration where you add the extra functionality. Otherwise, find a way to add all the tests for this iteration quickly, then you won't have to worry about switching between real code and test code frequently.
Strictly following TDD, you can first implement the dumbest code that will work. In order to jump to the first option (to implement the real code), add more tests:
assertEqual(tan(x), sin(x)/cos(x))
If you implement more than what is absolutely required by your tests, then your tests will not completely cover your implementation. For example, if you implemented the whole sin() function with just the two tests above, you could accidentally "break" it by returning a triangle function (that almost looks like a sine function) and your tests would not be able to detect the error.
The other thing you will have to worry about for numeric functions is the notion of "equality" and having to deal with the inherent loss of precision in floating point calculations. That's what I thought your question was going to be about after reading just the title. :)
I don't know what language you are using, but when I am dealing with a numeric method, I typically write a simple test like yours first to make sure the outline is correct, and then I feed more values to cover cases where I suspect things might go wrong. In .NET, NUnit 2.5 has a nice feature for this, called [TestCase], where you can feed multiple input values to the same test like this:
[TestCase(1,2,Result=3)]
[TestCase(1,1,Result=2)]
public int CheckAddition(int a, int b)
{
return a+b;
}
Short answer.
Write one test at a time.
Once it fails, Get back to green first. If that means doing the simplest thing that can work, do it. (Option 2)
Once you're in the green, you can look at the code and choose to cleanup (option1). Or you can say that the code still doesn't smell that much and write subsequent tests that put the spotlight on the smells.
Another question you seem to have, is how many tests should you write. You need to test till fear (the function may not work) turns into boredom. So once you've tested for all the interesting input-output combinations, you're done.

Resources