Ruby: Using rand() in code but writing tests to verify probabilities

I have some code which delivers things based on weighted randomness: things with more weight are more likely to be randomly chosen. Now, being a good Rubyist, I of course want to cover all this code with tests. And I want to test that things are fetched according to the correct probabilities.
So how do I test this? Creating tests for something that should be random makes it very hard to compare actual vs. expected. A few ideas I have, and why they won't work well:
Stub Kernel.rand in my tests to return fixed values. This is cool, but rand() gets called multiple times and I'm not sure I can rig this with enough control to test what I need to.
Fetch a random item a HUGE number of times and compare the actual ratio vs the expected ratio. But unless I can run it an infinite number of times, this will never be perfect and could intermittently fail if I get some bad luck in the RNG.
Use a consistent random seed. This makes the RNG repeatable but it still doesn't give me any verification that item A will happen 80% of the time (for example).
So what kind of approach can I use to write test coverage for random probabilities?

I think you should separate your goals. One is to stub Kernel.rand, as you mention. With RSpec, for example, you can do something like this:
test_values = [1, 2, 3]
# Old RSpec syntax; with current RSpec this would be:
#   allow(Kernel).to receive(:rand).and_return(*test_values)
Kernel.stub!(:rand).and_return(*test_values)
Note that this stub won't work unless you call rand with Kernel as the receiver. If you just call rand, then the current self will receive the message, and you'll actually get a random number instead of the test_values.
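To make the receiver issue concrete, here is a minimal sketch (inside a spec, reusing the old-style stub above):
Kernel.stub!(:rand).and_return(0.42)
Kernel.rand  #=> 0.42: the stub is hit, because Kernel is the receiver
rand         #=> e.g. 0.7371: the stub is bypassed; self, not Kernel, receives the call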
The second goal is to do something like a field test where you actually generate random numbers. You'd then use some kind of tolerance to ensure you get close to the desired percentage. This is never going to be perfect, though, and will probably need a human to evaluate the results. But it is still useful to do, because you might realize that another random number generator would be better, like reading from /dev/random. Also, it's good to have this kind of test because, say, you migrate to a new platform whose system libraries aren't as good at generating randomness, or a certain version has a bug: the test could be a warning sign.
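A rough sketch of such a field test (pick_weighted_item, :item_a, the sample size, and the tolerance are all made up here; the looser the tolerance, the less likely an intermittent failure):
SAMPLES = 100_000
counts = Hash.new(0)
SAMPLES.times { counts[pick_weighted_item] += 1 }  # pick_weighted_item is your code under test

actual = counts[:item_a] / SAMPLES.to_f
raise "got #{actual}, expected ~0.80" unless (actual - 0.80).abs < 0.01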
It really depends on your goals. Do you only want to test your weighting algorithm, or also the randomness?

It's best to stub Kernel.rand to return fixed values.
Kernel.rand is not your code. You should assume it works rather than writing tests that end up testing it instead of your code. And using a fixed set of values that you've chosen and explicitly coded in is better than adding a dependency on what rand produces for a specific seed.

If you want to go down the consistent-seed route, look at Kernel#srand:
http://www.ruby-doc.org/core/classes/Kernel.html#M001387
To quote the docs (emphasis added):
Seeds the pseudorandom number generator to the value of number. If number is omitted or zero, seeds the generator using a combination of the time, the process id, and a sequence number. (This is also the behavior if Kernel::rand is called without previously calling srand, but without the sequence.) By setting the seed to a known value, scripts can be made deterministic during testing. The previous seed value is returned. Also see Kernel::rand.
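A quick irb check of that determinism (the seed value is arbitrary):
srand(1234)
a = Array.new(3) { rand }
srand(1234)
b = Array.new(3) { rand }
a == b  #=> true: same seed, same sequence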

For testing, stub Kernel.rand with the following simple but perfectly reasonable LCPRNG:
@@q = 0  # generator state (Ruby 3 forbids class variables at the top level, so wrap this in a class there)
def r
  # classic linear congruential step, truncated to 32 bits
  @@q = (1_103_515_245 * @@q + 12_345) & 0xffff_ffff
  # shift off the two weakest low bits and scale to a float in 0.0..1.0
  (@@q >> 2) / 0x3fff_ffff.to_f
end
You might want to skip the division and use the integer result directly if your code is compatible, as all bits of the result would then be repeatable instead of just "most of them". This isolates your test from "improvements" to Kernel.rand and should allow you to test your distribution curve.
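Wiring it into a spec might look like this (modern RSpec syntax; the old API equivalent would be Kernel.stub!(:rand) { r }):
allow(Kernel).to receive(:rand) { r }  # every Kernel.rand call now steps the LCG above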

My suggestion: Combine #2 and #3. Set a random seed, then run your tests a very large number of times.
I do not like #1, because it means your test is super-tightly coupled to your implementation. If you change how you are using the output of rand(), the test will break, even if the result is correct. The point of a unit test is that you can refactor the method and rely on the test to verify that it still works.
Option #3, by itself, has the same problem as #1. If you change how you use rand(), you will get different results.
Option #2 is the only way to have a true black box solution that does not rely on knowing your internals. If you run it a sufficiently high number of times, the chance of random failure is negligible. (You can dig up a stats teacher to help you calculate "sufficiently high," or you can just pick a really big number.)
But if you're hyper-picky and "negligible" isn't good enough, a combination of #2 and #3 will ensure that once the test starts passing, it will keep passing. Even that negligible risk of failure only crops up when you touch the code under test; as long as you leave the code alone, you are guaranteed that the test will always work correctly.
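A sketch of that combination (names and numbers are illustrative; with 100,000 draws of an 80%-weighted item, the observed ratio has a standard deviation of sqrt(0.8 * 0.2 / 100_000), about 0.0013, so a 1% tolerance is roughly eight sigma and will essentially never fail by chance):
srand(42)     # option 3: fixed seed makes the run repeatable
n = 100_000   # option 2: large sample
hits = n.times.count { pick_weighted_item == :item_a }  # pick_weighted_item is your method under test
ratio = hits / n.to_f
raise "expected ~0.80, got #{ratio}" unless (ratio - 0.80).abs < 0.01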

Pretty often, when I need predictable results from something derived from a random number, I want control of the RNG, which means the easiest thing is to make it injectable. Although overriding/stubbing rand can be done, Ruby provides a fine way to pass your code an RNG that is seeded with some value:
def compute_random_based_value(input_value, random: Random.new)
  # ....
end
and then inject a Random object I make on the spot in the test, with a known seed:
rng = Random.new(782199) # Scientific dice roll
compute_random_based_value(your_input, random: rng)
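Since equal seeds produce equal sequences, a spec can then assert exact, repeatable results, for example (names illustrative):
expect(compute_random_based_value(your_input, random: Random.new(782199)))
  .to eq(compute_random_based_value(your_input, random: Random.new(782199)))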

Related

What is the meaning of UNPREDICTABLE in a random function?

Linguistically, I understand the meaning of unpredictable. But I often come across the words predictable and unpredictable when reading about certain topics, for example:
Math.random vs crypto.getRandomValues in Javascript
Random vs Secure Random numbers
Etc
So what exactly does unpredictable mean in random functions? And what are the conditions for a random function to be called an "unpredictable random function"?
If a value is random, then it means that knowing the previous values in the sequence provides you no information about the next value.
If a value is unpredictable, then there is no "practical" means of determining the next value. It is generally a stronger claim than random.
(The word "practical" here is doing some work. It generally means "within some set of rules about what the attacker may do." If the attacker has full access to the CPU and RAM, then nothing is "unpredictable," but we are generally interested in cases where they do not have this.)
As an example of the difference, the digits of pi are believed to be random (we don't actually know this, but it appears to be true). That means that there is no way to guess, better than chance, the 10,000th digit of pi. It's random. But it's perfectly predictable. Anyone can easily determine its value. So the digits of pi are a perfectly good random sequence, and could even be used effectively to drive a game's behavior where randomness is sufficient, but it won't be a secure random sequence and is useless for cryptographic purposes.
If I went to random.org (which provides very good random numbers), and generated a value, but then used it repeatedly, it would be a random value but also completely predictable.
This predictability can occur when producing the seed of a PRNG. While the PRNG may generate excellent random values, if its seed is predictable then the entire sequence will be known. ("Predictable" here doesn't mean with 100% certainty; any level of certainty better than chance is sufficient.)
As an example of this problem, networking gear has a significant challenge generating an unpredictable seed when first booted, particularly if the networking gear nearby is rebooted at the same time. Whatever process you use to create a random value can easily fall into a small set of likely values ("small" compared to all the possible values; it may still be in the millions, but that's not many values in cryptography). This is a problem that can require significant effort to resolve in high-security systems.
Most cryptographic systems do not define how these initial, unpredictable values are to be generated. They're just an assumed input to the system.
Predictable is when the seed itself comes from something that can be predicted, like the time, for example with Python's random library:
import random, time
random.seed(int(time.time()))  # whole-second resolution, easy to guess
r1 = random.randrange(10**49, 10**50 - 1)
random.seed(int(time.time()))  # almost certainly still the same second
r2 = random.randrange(10**49, 10**50 - 1)
print(r1)
print(r2)
The output here will be the same, since both calls seed with the same value (note that randrange needs integer bounds, hence 10**49 rather than 1e49).
Unpredictable would be when a random number has so much entropy behind it that no one could realistically recover the initial seed and reconstruct the state of the algorithm that was used.

How to use tf.session.run() for testing (not updating network's parameters)?

Normally I use tf.Session.run() for training my networks and eval() for getting test accuracy or loss. But I see people also use session.run() for getting test results, which is very strange to me. I thought tf.Session.run() was only for training, not testing.
Is there any secret under tf.session.run() that I didn't know?
Thank you very much!
tf.Session.run() is meant to run one or several TF operations, or evaluate TF tensors, possibly even a mix of the two.
When called on a tensor, it will basically evaluate it just like eval(). You can use it for training, with sess.run(train_op, feed_dict=train_data), which will update your variable values, because that is what train_op does. However, if you call sess.run(accuracy, feed_dict=data), it will evaluate the value of the accuracy tensor for the input values given by data, regardless of whether that is training, validation, or test data, and it will not change the variable values, since you're just evaluating a tensor, not running an operation that changes variables.
So tf.Session.run() is indeed much broader than you thought: it is the usual way to run inference or evaluate your performance, even at test time!
You can even do multiple things at once, which avoids doing the forward pass multiple times on the same input:
_, loss_value, accuracy_value = sess.run([train_op, loss_tensor, accuracy_tensor], feed_dict=data)
print('Loss value: %f' % loss_value)

Even when using the same random seed in Lua, I get different results?

I have a large, rather complicated procedural content generation Lua project. One thing I want to be able to do, for debugging purposes, is use a fixed random seed so that I can re-run the system and get the same results.
To that end, I print out the seed at the start of a run. The problem is, I still get completely different results each time I run it. Assuming the seed doesn't change anywhere else, this shouldn't be possible, right?
My question is, what other ways are there to influence the output of lua's math.random()? I've searched through all the code in the project, and there's only one place where I call math.randomseed(), and I do that before I do anything else. I don't use the time or date for any calculations, so that wouldn't be influencing the results... What else could I be missing?
Update (2016-02-22): monkey patching math.random and math.randomseed shows that the runs often (but not always) output the same sequence of random numbers, but still not the same results. So I guess the real question is now: what behavior in Lua is nondeterministic and could result in different output when the same code is run in sequence? Noting where it diverges, when it does, is helping me narrow it down, but I still haven't found it. (This code does NOT use coroutines, so I don't think it's a threading / race condition issue.)
math.randomseed uses the C srandom/srand function, which "sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random()".
I can offer several possible explanations:
you think you call randomseed, but you do not (random will initialize the sequence for you in this case).
you think you call randomseed once, but you call it multiple times (or some other part of the code calls randomseed as well, possibly at different times in your sequence).
some other part of the code calls random (some number of times), which generates different results for your part of the code.
there is nothing wrong with the generated sequence, but you are misinterpreting the results.
your version of Lua has a bug in srandom/random processing.
there is something wrong with srandom or random function in your system.
Having some information about your version of Lua and your system (in addition to the small example demonstrating the issue) would help in figuring out what's causing this.
Update (2016-02-22): It should be fairly easy to check; monkey patch both math.randomseed and math.random, and log all the calls and the values returned by the functions for two subsequent runs. Compare the results. If they differ, you should be able to isolate why they differ and reproduce it in a smaller example. You can also look at where the functions are called from using debug.traceback.
Correct, as stated in the documentation, 'equal seeds produce equal sequences of numbers.'
Immediately after setting the seed to a known constant value, output the result of a call to math.random; if this varies across runs, you know something is seriously wrong (corrupt library download, broken install, gamma ray hit your drive, etc.).
Assuming that the first value matches across runs, add another output midway through the code. From there, you can use a binary search to zero in on where things go wrong (i.e., the first half or second half of the code block in question).
While you can & should use some intuition to find the error as you go, keep in mind that if intuition alone was enough, you would have already found it, thus a bit of systematic elimination is warranted.
Revision to cover comment regarding array order:
If possible, use debugging tools. This SO post on detecting when the value of a Lua variable changes might help.
In the absence of tools, here's one way to roll your own for this problem:
A full debugging dump of any sizable array quickly becomes a mess that makes it tough to spot changes. Instead, I'd use a few extra variables & a test function to keep things concise.
Make two deep copies of the array. Let's call them debug01 & debug02 & call the original array original. Next, deliberately swap the order of two elements in debug02.
Next, build a function to compare two arrays & test if their elements match up & return / print the index of the first mismatch if they do not. Immediately after initializing the arrays, test them to ensure:
original & debug01 match
original & debug02 do not match
original & debug02 mismatch where you changed them
I cannot stress enough the insanity of using an unverified (and thus, potentially bugged) test function to track down bugs.
Once you've verified the function works, you can again use a binary search to zero in on where things go off the rails. As before, balance the use of a systematic search with your intuition.

Ruby TDD best practice in dealing with obsolete tests

I'm running through a very basic challenge at Code Wars. The challenge is to test-drive a method that returns the sum of an array of squares.
So far, my tests are:
describe "square method" do
it "should return the square of a number" do
Test.assert_equals(squareSum(4), [16])
end
it "should return the square of multiple numbers" do
Test.assert_equals(squareSum(4, 2, 3), [16, 4, 9])
end
end
and my code is:
def squareSum(*numbers)
  numbers.map { |num| num ** 2 }
end
Now I'm at the point where I need to change it so that it adds the sum. Which, in my mind, necessarily negates the two previous tests. As far as TDD best practices go, was I being ridiculous testing those first two scenarios, given that they aren't what I'm trying to get the method to do? How should I proceed with my next test?
Should I:
delete the previous two tests, since they will fail once I change the method?
find a way to make it so that the two previous tests don't fail even once I've changed it?
In approaching this problem, should I have not worried about the first two tests? I am having a fair amount of difficulty phrasing this question. Basically, what I know I want to end up with is:
describe "squareSum method" do
it "should return the sum of the squares of the numbers passed" do
Test.assert_equals(squareSum(1, 2, 2), 9)
end
end
with the code to make it work. I'm just wondering what the best practices are for test-driving this particular kind of problem, given that I wanted to test that I could return squares for multiple numbers before returning the sum. My "final" code will render the initial tests obsolete. This is a picky, "How much of my work should be present in the final solution?" kind of question, I think. But I am curious.
Since the tests are specifications for the software you intend to write, the question is why you wrote a specification for something you didn't want (the task of the challenge was not to write a function that squares its arguments).
You should have written a specification for "a method that returns the sum of an array of squares" in the first place. This test starts out being red.
Then you may decide that you need a function that squares its arguments (or the elements of a given array) as an intermediate step. Write a test for this function. Make it green by implementing such a function.
Eventually, put all the pieces together and make your initial test green (your main function could use the helper function and sum up its return values).
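For instance, a minimal sketch of that final state (keeping a helper is my choice here; .sum needs Ruby 2.4+, use reduce(:+) on older versions):
def square(n)
  n ** 2
end

def squareSum(*numbers)
  numbers.map { |num| square(num) }.sum
end

Test.assert_equals(squareSum(1, 2, 2), 9)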
Now you can refactor. If during refactoring you decide that you no longer need the helper function, delete the function and delete its tests.
Also, when your specifications change, you need to rewrite your tests, write new ones, or even delete some of them.
But the general rule is: the tests are always a specification for the current state of your software. They should specify exactly what your software is intended to do. Nothing more, nothing less.

Does Kernel::srand have a maximum input value?

I'm trying to seed a random number generator with the output of a hash. Currently I'm computing a SHA-1 hash, converting it to a giant integer, and feeding it to srand to initialize the RNG. This is so that I can get a predictable set of random numbers for a set of infinite Cartesian coordinates (I'm hashing the coordinates).
I'm wondering whether Kernel::srand actually has a maximum value that it'll take, after which it truncates it in some way. The docs don't really make this obvious - they just say "a number".
I'll try to figure it out myself, but I'm assuming somebody out there has run into this already.
Knowing what programmers are like, it probably just calls libc's srand(). Either way, it's probably limited to 2^32-1, 2^31-1, 2^16-1, or 2^15-1.
There's also a danger that the value is clipped when cast from a biginteger to a C int/long, instead of only taking the low-order bits.
An easy test is to seed with 1 and take the first output. Then, seed with 2^i + 1 for i in 1..64 or so, take the first output of each, and compare. If you get a match for some i = n and all greater i, then it's probably doing arithmetic modulo 2^n.
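A throwaway Ruby version of that probe might look like this (the output range is chosen arbitrarily):
Kernel.srand(1)
base = Kernel.rand(2**32)  # first output for seed 1
(1..64).each do |i|
  Kernel.srand(2**i + 1)
  # a collision here suggests the seed is being reduced modulo 2**i
  puts "seed 2**#{i}+1 matches seed 1" if Kernel.rand(2**32) == base
end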
Note that the random number generator is almost certainly limited to 32 or 48 bits of entropy anyway, so there's little point seeding it with a huge value, and an attacker can reasonably easily predict future outputs given past outputs (and an "attacker" could simply be a player on a public nethack server).
EDIT: So I was wrong.
According to the docs for Kernel::rand(),
Ruby currently uses a modified Mersenne Twister with a period of 2**19937-1.
This means it's not just a call to libc's rand(). The Mersenne Twister is statistically superior (but not cryptographically secure). But anyway.
Testing using Kernel::srand(0); Kernel::sprintf("%x", Kernel::rand(2**32)) for various output sizes (2**16, 2**32, 2**36, 2**60, 2**64, 2**32+1, 2**35, 2**34+1), a few things are evident:
It figures out how many bits it needs (number of bits in max-1).
It generates output in groups of 32 bits, most-significant-bits-first, and drops the top bits (i.e. 0x[r0][r1][r2][r3][r4] with the top bits masked off).
If it's not less than max, it does some sort of retry. It's not obvious what this is from the output.
If it is less than max, it outputs the result.
I'm not sure why 2**32+1 and 2**64+1 are special (they produce the same output from Kernel::rand(2**1024), so they probably have the exact same state); I haven't found another collision.
The good news is that it doesn't simply clip to some arbitrary maximum (i.e. passing in huge numbers isn't equivalent to passing in 2**31-1), which is the most obvious thing that can go wrong. Kernel::srand() also returns the previous seed, which appears to be 128-bit, so it seems likely to be safe to pass in something large.
EDIT 2: Of course, there's no guarantee that the output will be reproducible between different Ruby versions (the docs merely say what it "currently uses"; apparently this was initially committed in 2002). Java has several portable deterministic PRNGs (SecureRandom.getInstance("SHA1PRNG","SUN"), albeit slow); I'm not aware of something similar for Ruby.
