I'm working on a Tic Tac Toe game and I have a #negamax method that returns the best position for the computer to move to, and a #winner method that returns 1 (computer wins) or -1 (user wins). How can I test #negamax so that I can guarantee that its implementation is right and that the user never wins?
I have a few test cases in place to test that it returns the best position, and it does, but it does not cover all possible cases. Right now, this is what I have (besides the test cases for the best choice):
it 'never allows user to win' do
  until game_over?
    if is_ai?
      pos = negamax(0, 1, -100, 100)
      move(pos, computer)
    else
      pos = empty_positions.sample
      move(pos, user)
    end
  end
  if game.won?
    expect(winner).to eq(1)
  else
    expect(winner).to be_nil
  end
end
It does not seem very effective to just 'hope' that the test will never fail. What would be a better way to accomplish it?
"but it does not cover all possible cases."
Don't worry, this is normal; it's nearly impossible to simulate all the ways an application will be used. Testing a few well-chosen things can bring large gains. Trying to test “everything” is a waste of time, because not everything matters equally. Only certain things matter, and those are the ones to test.
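That said, a Tic Tac Toe board is small enough that the random opponent in the question could be replaced by a walk over every possible user move. The following is only a sketch under assumptions: the board representation, the helper names play and user_wins, and the negamax signature here are hypothetical and differ from the question's negamax(depth, player, alpha, beta).

```ruby
# Hypothetical minimal game model: a board is an array of 9 cells holding
# 1 (computer), -1 (user) or nil (empty). Boards are never mutated in place.
LINES = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
         [0, 3, 6], [1, 4, 7], [2, 5, 8],
         [0, 4, 8], [2, 4, 6]].freeze

# Returns 1 or -1 for a completed line, nil otherwise.
def winner(board)
  LINES.each do |a, b, c|
    return board[a] if board[a] && board[a] == board[b] && board[b] == board[c]
  end
  nil
end

def empty_positions(board)
  (0...9).select { |i| board[i].nil? }
end

# Returns a copy of the board with `player` placed at `pos`.
def play(board, pos, player)
  board.dup.tap { |b| b[pos] = player }
end

# Plain negamax without pruning, memoized per (board, player).
# Returns [score, best_position] from the point of view of `player`.
def negamax(board, player, memo)
  memo[[board, player]] ||=
    if (won = winner(board))
      [won * player, nil]
    elsif empty_positions(board).empty?
      [0, nil]
    else
      empty_positions(board).map { |pos|
        score, = negamax(play(board, pos, player), -player, memo)
        [-score, pos]
      }.max_by(&:first)
    end
end

# Counts finished games won by the user when the computer always plays its
# negamax move and the user tries every legal move at every turn.
def user_wins(board, player, memo = {})
  return (winner(board) == -1 ? 1 : 0) if winner(board) || empty_positions(board).empty?

  if player == 1
    _score, pos = negamax(board, 1, memo)
    user_wins(play(board, pos, 1), -1, memo)
  else
    empty_positions(board).sum { |pos| user_wins(play(board, pos, -1), 1, memo) }
  end
end

puts user_wins(Array.new(9), 1)   # computer moves first
puts user_wins(Array.new(9), -1)  # user moves first
```

If the search is correct, both counts should come out as 0, and a spec could assert exactly that instead of hoping a random game happens to hit the bad case.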
Related
I'm trying to write something that simulates the Martingale betting system. If you're not familiar with this, it's a "sure thing!" (not a sure thing) betting system for coin-toss games where you double your bet each time you lose, hoping to win back all your lost money upon the first win.
So your bets would go $10 -> loss -> $20 -> loss -> $40 -> loss -> $80 -> win! -> $10...
Simple, right? I figure the logic will be:
Have a wallet variable that starts at $1,000.
Make a bet.
Flip a coin with rand(0..1). 0 will be a loss and 1 a win.
If I win, add the bet to my wallet. If I lose, subtract the bet from my wallet, and then issue a new bet for twice the previous.
I write this as:
def flip(bet)
  if rand(0..1) == 1 # 1 is a win, 0 a loss
    $balance += bet
  else
    $balance -= bet
    flip(bet * 2)
  end
end
Then I run flip(10) a thousand times just to see how effective this betting system is.
The problem is that I always get the exact same results. I'll run the program ten times, and the first five results will always be 1010, 1020, 1030, 1040, 1050... So something's wrong. But I can't really see what; the logic seems fine to me.
Just to test things out, I removed the recursive call, the line flip(bet*2). Instead, I just ran a thousand regular bets. And that behaves the way you'd expect, different results every time.
So what's going on here?
Looking at the logic it looks as if it will recursively bet until you win. So it looks like your balance is going up by 10 every time, hence the "1010, 1020, 1030, 1040, 1050".
If you put a puts $balance before the flip(bet*2) line you can see the balance going up and down.
I guess that's the point of the betting system. I don't think there is anything wrong with the random part of the method.
Your result is exactly what you would expect from "sure thing" betting, because you allow $balance to go negative, so the bettor is not limited in any way (effectively they have infinite resources). Each run of the strategy always ends $10 up on the previous balance: lose, say, $10, $20 and $40, then win $80. Because a negative balance is allowed, the bettor can always keep doubling. A more realistic model would notice that after losing 6 games in a row (a 1 in 64 chance) a bettor who started with $1,000 is down to $370 and cannot place the next bet of $640.
Add something to catch running out of money, and you should see a difference in how many bets it takes before that happens, or in what the losing value of $balance is. That way you can demonstrate that the "sure thing" strategy is flawed: for every 63 wins of $10 there is, on average, a single loss of $630 that exactly balances them.
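As a rough sketch of what catching "running out of money" could look like: the method names play_run and simulate, the keyword arguments, and the idea of passing the RNG in are all assumptions made for illustration, not part of the original program.

```ruby
# Plays one Martingale run: bet, double after every loss, stop at the first win.
# `rng` is anything that responds to rand(range); 1 counts as a win, 0 as a loss.
# Returns [balance, busted], where busted means the next bet could not be covered.
def play_run(balance, bet, rng)
  loop do
    return [balance, true] if bet > balance
    return [balance + bet, false] if rng.rand(0..1) == 1

    balance -= bet
    bet *= 2
  end
end

# Repeats runs until the bankroll cannot cover a bet or `max_runs` is reached.
def simulate(balance: 1_000, base_bet: 10, max_runs: 1_000, rng: Random.new)
  max_runs.times do |completed|
    balance, busted = play_run(balance, base_bet, rng)
    return { runs_won: completed, balance: balance, busted: true } if busted
  end
  { runs_won: max_runs, balance: balance, busted: false }
end

p simulate # the numbers differ from run to run
```

With the bankroll check in place the balance no longer climbs by $10 forever; each simulation stops at whatever point a losing streak outgrows the money that is left.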
I'm running through a very basic challenge at Code Wars. The challenge is to test-drive a method that returns the sum of an array of squares.
So far, my tests are:
describe "square method" do
  it "should return the square of a number" do
    Test.assert_equals(squareSum(4), [16])
  end
  it "should return the square of multiple numbers" do
    Test.assert_equals(squareSum(4, 2, 3), [16, 4, 9])
  end
end
and my code is:
def squareSum(*numbers)
  numbers.map { |num| num ** 2 }
end
Now I'm at the point where I need to change it so that it adds the sum. Which, in my mind, necessarily negates the two previous tests. As far as TDD best practices go, was I being ridiculous testing those first two scenarios, given that they aren't what I'm trying to get the method to do? How should I proceed with my next test?
Should I:
delete the previous two tests, since they will fail once I change the method?
find a way to make it so that the two previous tests don't fail even once I've changed it?
In approaching this problem, should I have not worried about the first two tests? I am having a fair amount of difficulty phrasing this question. Basically, what I know I want to end up with is:
describe "squareSum method" do
  it "should return the sum of the squares of the numbers passed" do
    Test.assert_equals(squareSum(1, 2, 2), 9)
  end
end
with the code to make it work. I'm just wondering what the best practices are in regards to test-driving this particular kind of problem, given that I wanted to test that I could return squares for multiple numbers before returning the sum. My "final" code will render the initial tests obsolete. This is a, "How much of my work should be present in the final solution?", picky and kind of anal-retentive question, I think. But I am curious.
Since the tests are specifications for the software you intend to write, the question is why you wrote a specification for something you didn't want (the task of the challenge was not to write a function that squares its arguments)?
You should have written a specification for "a method that returns the sum of an array of squares" in the first place. This test starts out being red.
Then you may decide that you need a function that squares its arguments (or the elements of a given array) as an intermediate step. Write a test for this function. Make it green by implementing such a function.
Eventually put all the things together and make your initial test green (your main function could use the helper function and sum-up its return values).
Now you can refactor. If during refactoring you decide that you no longer need the helper function: delete the function and delete its tests.
Also when your specifications are changing, you need to rewrite your tests, write new ones or even delete some of them.
But the general rule is: The tests are always a specification for the current state of your software. They should specify exactly what your software is intended to do. Nothing more, nothing less.
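A minimal sketch of where those steps might end up, assuming you keep the intermediate step as a helper (the name squares is hypothetical; squareSum is the method from the question):

```ruby
# Hypothetical helper kept from the intermediate step; it has its own test.
def squares(*numbers)
  numbers.map { |num| num**2 }
end

# The method the challenge actually asks for.
def squareSum(*numbers)
  squares(*numbers).sum
end

puts squareSum(1, 2, 2) # prints 9
```

The two original array-returning tests would then move to squares, and they go away together with the helper if a later refactoring inlines it.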
In the book "Programming Ruby 1.9/2.0" the author gives an example of a Tennis Scorer class that will be developed by writing some RSpec tests before the actual code.
The author introduces 4 tests:
it "should start with a score of 0-0"
it "should be 15-0 if the server wins a point"
it "should be 0-15 if the receiver wins a point"
it "should be 15-15 after they both win a point"
and then the author suggests that the reader should go ahead and complete the class by writing tests like this:
it "should be 40-0 after the server wins three points"
it "should be W-L after the server wins four points"
it "should be L-W after the receiver wins four points"
it "should be Deuce after each wins three points"
it "should be A-server after each wins three points and the server gets one more"
(The actual TennisScorer Class adds scores for each player and returns them in a format like "15-15").
Does the author assume that the code will work 100% for scores like 30-15, 15-30, 0-30, 30-0, and so forth as long as the test succeeds for 15-0, 0-15, and 15-15? In other words, it's not necessary to test for each possible score explicitly?
The author suggests a 40-0 test, which makes sense because 40 breaks the 0-15-30 convention (score * 15), so does a 40-0 test suffice to show that 40-30, 15-40, etc will work as well?
Also, maybe I'm overcomplicating this, but wouldn't it make more sense to have a "random game" in my test where I add random scores 100000 times and compare the outcome dynamically? (but I guess then my test could contain some bugs easily..?).
I figure that this would be the way to go if I would write a test for a multiplication method for example (or would I then just check if 1*2 = 2 and assume that everything works fine?)
The point of TDD is to have your specs and code grow over time in small increments, so you are supposed to start out with some simple things, exactly as outlined above. However, as your spec suite and your code grow, you will feel the need to refactor both the code and the specs. This is natural and as it should be. I would expect the code inside each of your it blocks to end up as a one-line call to a generic method that takes the input for the method under test and the expected outcome. At least that is where I often end up.
With the spec above, the code may not work with 30-15 etc., as you are pointing out. It depends on how the implementation turns out. It would make sense to add some more specs here and reuse the test code beneath.
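To illustrate the "generic method plus a list of inputs and expected outcomes" idea without RSpec, here is a small sketch. Everything in it is an assumption made for illustration: score_label is a hypothetical stand-in for the book's TennisScorer and only handles scores up to 40, and failing_cases plays the role of the reusable check.

```ruby
# Hypothetical stand-in for the book's TennisScorer, covering only scores
# up to 40 (no deuce, advantage or win handling).
POINT_NAMES = %w[0 15 30 40].freeze

def score_label(server_points, receiver_points)
  "#{POINT_NAMES.fetch(server_points)}-#{POINT_NAMES.fetch(receiver_points)}"
end

# One table of cases: [server points, receiver points] => expected label.
EXPECTED = {
  [0, 0] => "0-0",
  [1, 0] => "15-0",
  [0, 1] => "0-15",
  [1, 1] => "15-15",
  [2, 1] => "30-15",
  [3, 0] => "40-0",
  [3, 2] => "40-30"
}.freeze

# Returns the cases whose actual label differs from the expected one.
def failing_cases(expected)
  expected.reject { |(server, receiver), label| score_label(server, receiver) == label }
end

p failing_cases(EXPECTED) # an empty hash means every case passed
```

Adding 30-15 or 40-30 is then one more line in the table rather than a new hand-written spec.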
I would recommend against having randomized specs in most cases because you can't guarantee the outcome. If the code itself has random behavior it may make sense though. I would try to isolate the randomness to one place so that the rest can be tested deterministically.
Consider the following two pieces of Ruby code
Example 1
name = user.first_name
round_number = rounds.count
users.each do |u|
  puts "#{name} beat #{u.first_name} in round #{round_number}"
end
Example 2
users.each do |u|
  puts "#{user.first_name} beat #{u.first_name} in round #{rounds.count}"
end
For both pieces of code imagine
# user.rb
def first_name
  name.split.first
end
So in a classical analysis of algorithms, the first piece of code would be more efficient. However, in most modern compiled languages, the compiler would optimize the second piece of code to look like the first, eliminating the need to optimize code in such a manner.
Will Ruby optimize or cache values for this code before execution? Should my Ruby code look like Example 1 or Example 2?
Example 1 will run faster, as first_name() is only called once and its value is stored in the variable.
In Example 2 Ruby will not memoize this value automatically, since the value could have changed between iterations of the each() loop.
Therefore, expensive-to-calculate methods should be explicitly memoized if they are expected to be used more than once without the return value changing.
Making use of Ruby's Benchmark Module can be useful when making decisions like this. It will likely only be worth memoizing if there are a lot of values in users, or if first_name() is expensive to calculate.
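As a small sketch of explicit memoization: the User class below is a hypothetical stand-in for the one in the question, and the calls counter exists only to make the difference visible.

```ruby
# Hypothetical User class; `calls` counts how often the name is actually split.
class User
  attr_reader :calls

  def initialize(name)
    @name = name
    @calls = 0
  end

  # Recomputed on every call, as in the question.
  def first_name
    @calls += 1
    @name.split.first
  end

  # Explicitly memoized: computed once, then served from the instance variable.
  def memoized_first_name
    @memoized_first_name ||= first_name
  end
end

user = User.new("Ada Lovelace")
3.times { user.memoized_first_name }
puts user.calls # prints 1
```

The trade-off is the one mentioned above: the cached value goes stale if the name changes later, so this only fits values that are not expected to change between uses.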
A compiler can only perform this optimization if it can prove that the method has no side effects. This is even more difficult in Ruby than most languages, as everything is mutable and can be overridden at runtime. Whether it happens or not is implementation dependent, but since it's hard to do in Ruby, most do not. I actually don't know of any that do at the time of this posting.
I have some code which delivers things based on weighted random selection. Things with more weight are more likely to be randomly chosen. Now, being a good Rubyist, I of course want to cover all this code with tests. And I want to test that things are getting fetched according to the correct probabilities.
So how do I test this? Creating tests for something that should be random makes it very hard to compare actual vs expected. A few ideas I have, and why they won't work great:
Stub Kernel.rand in my tests to return fixed values. This is cool, but rand() gets called multiple times and I'm not sure I can rig this with enough control to test what I need to.
Fetch a random item a HUGE number of times and compare the actual ratio vs the expected ratio. But unless I can run it an infinite number of times, this will never be perfect and could intermittently fail if I get some bad luck in the RNG.
Use a consistent random seed. This makes the RNG repeatable but it still doesn't give me any verification that item A will happen 80% of the time (for example).
So what kind of approach can I use to write test coverage for random probabilities?
I think you should separate your goals. One is to stub Kernel.rand as you mention. With rspec for example, you can do something like this:
test_values = [1, 2, 3]
# RSpec 3 syntax; older RSpec versions used Kernel.stub!(:rand)
allow(Kernel).to receive(:rand).and_return(*test_values)
Note that this stub won't work unless you call rand with Kernel as the receiver. If you just call "rand" then the current "self" will receive the message, and you'll actually get a random number instead of the test_values.
The second goal is to do something like a field test where you actually generate random numbers. You'd then use some kind of tolerance to ensure you get close to the desired percentage. This is never going to be perfect though, and will probably need a human to evaluate the results. But it still is useful to do because you might realize that another random number generator might be better, like reading from /dev/random. Also, it's good to have this kind of test because let's say you decide to migrate to a new kind of platform whose system libraries aren't as good at generating randomness, or there's some bug in a certain version. The test could be a warning sign.
It really depends on your goals. Do you only want to test your weighting algorithm, or also the randomness?
It's best to stub Kernel.rand to return fixed values.
Kernel.rand is not your code. You should assume it works, rather than trying to write tests that test it rather than your code. And using a fixed set of values that you've chosen and explicitly coded in is better than adding a dependency on what rand produces for a specific seed.
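The same "fixed values you chose yourself" idea can be sketched without RSpec by handing the code a stand-in that replays scripted numbers. Both weighted_pick and ScriptedRandom are hypothetical names invented for this illustration; your own picker will look different.

```ruby
# Hypothetical weighted picker. `rng` only needs to respond to `rand`
# and return a float in [0, 1), like Kernel.rand or Random#rand.
def weighted_pick(weights, rng)
  target = rng.rand * weights.values.sum
  weights.each do |item, weight|
    return item if target < weight

    target -= weight
  end
  weights.keys.last # only reached through floating-point rounding
end

# Test double that replays a fixed list of values instead of random ones.
class ScriptedRandom
  def initialize(*values)
    @values = values
  end

  def rand
    raise "no scripted values left" if @values.empty?

    @values.shift
  end
end

weights = { a: 80, b: 20 }
rng = ScriptedRandom.new(0.0, 0.75, 0.875)
picks = Array.new(3) { weighted_pick(weights, rng) }
p picks # prints [:a, :a, :b]
```

Each scripted value exercises one region of the weight table, so the test checks your weighting logic and nothing about the quality of the random numbers.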
If you wanna go down the consistent seed route, look at Kernel#srand:
http://www.ruby-doc.org/core/classes/Kernel.html#M001387
To quote the docs (emphasis added):
Seeds the pseudorandom number generator to the value of number. If number is omitted or zero, seeds the generator using a combination of the time, the process id, and a sequence number. (This is also the behavior if Kernel::rand is called without previously calling srand, but without the sequence.) By setting the seed to a known value, scripts can be made deterministic during testing. The previous seed value is returned. Also see Kernel::rand.
For testing, stub Kernel.rand with the following simple but perfectly reasonable LCPRNG:
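$q = 0 # generator state; a global so the top-level method can reach it
def r
  $q = (1_103_515_245 * $q + 12_345) & 0xffff_ffff # 32-bit linear congruential step
  ($q >> 2) / 0x4000_0000.to_f # top 30 bits scaled into [0, 1)
end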
You might want to skip the division and use the integer result directly if your code is compatible, as all bits of the result would then be repeatable instead of just "most of them". This isolates your test from "improvements" to Kernel.rand and should allow you to test your distribution curve.
My suggestion: Combine #2 and #3. Set a random seed, then run your tests a very large number of times.
I do not like #1, because it means your test is super-tightly coupled to your implementation. If you change how you are using the output of rand(), the test will break, even if the result is correct. The point of a unit test is that you can refactor the method and rely on the test to verify that it still works.
Option #3, by itself, has the same problem as #1. If you change how you use rand(), you will get different results.
Option #2 is the only way to have a true black box solution that does not rely on knowing your internals. If you run it a sufficiently high number of times, the chance of random failure is negligible. (You can dig up a stats teacher to help you calculate "sufficiently high," or you can just pick a really big number.)
But if you're hyper-picky and "negligible" isn't good enough, a combination of #2 and #3 will ensure that once the test starts passing, it will keep passing. Even that negligible risk of failure only crops up when you touch the code under test; as long as you leave the code alone, you are guaranteed that the test will always work correctly.
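A sketch of the "#2 plus #3" combination might look like this. The picker, the helper name observed_share, the seed 42, the 100,000 runs and the 0.01 tolerance are all assumptions chosen for illustration, not values from the question.

```ruby
# Hypothetical weighted picker built on Kernel#rand.
def weighted_pick(weights)
  target = rand * weights.values.sum
  weights.each do |item, weight|
    return item if target < weight

    target -= weight
  end
  weights.keys.last # only reached through floating-point rounding
end

# Share of `runs` picks that returned `item`, with the global RNG seeded first
# so that repeated test runs see the same sequence.
def observed_share(weights, item, runs:, seed:)
  srand(seed)
  hits = runs.times.count { weighted_pick(weights) == item }
  hits / runs.to_f
end

share = observed_share({ a: 80, b: 20 }, :a, runs: 100_000, seed: 42)
within_tolerance = (share - 0.8).abs < 0.01
puts within_tolerance
```

With this many runs the standard deviation of the observed share is roughly 0.0013, so a tolerance of 0.01 leaves a wide margin, and the fixed seed means a passing test keeps passing until the code under test changes.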
When I need predictable results from something derived from a random number, I usually want control of the RNG, and the easiest way to get that is to make it injectable. Although overriding or stubbing rand can be done, Ruby provides a fine way to pass your code an RNG that is seeded with some value:
def compute_random_based_value(input_value, random: Random.new)
  # ....
end
and then inject a Random object I make on the spot in the test, with a known seed:
rng = Random.new(782199) # Scientific dice roll
compute_random_based_value(your_input, random: rng)
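To show what the injection buys you, here is a tiny self-contained sketch. roll_dice is a hypothetical method made up for this example; it is not the body of compute_random_based_value.

```ruby
# Hypothetical method that takes its RNG as a keyword argument.
def roll_dice(count, random: Random.new)
  Array.new(count) { random.rand(1..6) }
end

# Two generators built from the same seed produce the same sequence.
first  = roll_dice(5, random: Random.new(782199))
second = roll_dice(5, random: Random.new(782199))
puts first == second # prints true
```

In production code the default Random.new keeps the behavior random, while the test passes a seeded instance and can compare results across runs.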