How should OpenAI environments (gyms) use env.seed(0)?

I've created a very simple OpenAI gym (banana-gym) and wonder if / how I should implement env.seed(0).
See https://github.com/openai/gym/issues/250#issuecomment-234126816 for example.

In a recent merge, the developers of OpenAI gym changed the behavior of env.seed() so that it no longer calls the method env._seed(). Instead, the method now just issues a warning and returns. I think that if you want to use this method to set the seed of your environment, you should simply override it now.
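If you go that route, a minimal sketch of overriding seed() in a custom environment could look like the following (the BananaEnv class name is made up for illustration, and this assumes the gym.utils.seeding helper that ships with gym):

import gym
from gym.utils import seeding

class BananaEnv(gym.Env):  # hypothetical environment name
    def seed(self, seed=None):
        # keep a dedicated numpy RNG on the env instead of touching global state
        self.np_random, seed = seeding.np_random(seed)
        return [seed]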

The docstring of the env.seed() function (which can be found in this file) describes what the function is supposed to do:
Sets the seed for this env's random number generator(s).

Note:
    Some environments use multiple pseudorandom number generators.
    We want to capture all such seeds used in order to ensure that
    there aren't accidental correlations between multiple generators.

Returns:
    list<bigint>: Returns the list of seeds used in this env's random
        number generators. The first value in the list should be the
        "main" seed, or the value which a reproducer should pass to
        'seed'. Often, the main seed equals the provided 'seed', but
        this won't be true if seed=None, for example.
Note that, unlike what the documentation and the comments in the issue you linked to seem to imply, it doesn't seem (to me) like env.seed() is supposed to be overridden by custom environments. env.seed() has a very simple implementation, where it only calls and returns the return value of env._seed(), and it seems to me like that is the function which should be overridden by custom environments.
For example, OpenAI gym's atari environments have a custom _seed() implementation which sets the seed used internally by the (C++-based) Arcade Learning Environment.
Since you have a random.random() call in your custom environment, you should probably implement _seed() to call random.seed(). In that way, users of your environments can reproduce experiments by making sure to call seed() on your environment with the same argument.
Note: Messing around with the global random seed like this may be unexpected, though. It may be better to create a dedicated random object when your environment is initialized, seed that object, and always obtain any random numbers the environment needs from that object (a sketch of this follows below).
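A minimal sketch of that idea, assuming the old-style _seed()/_step() hooks and the made-up BananaEnv class from above:

import random

import gym

class BananaEnv(gym.Env):  # hypothetical environment name
    def __init__(self):
        self.rng = random.Random()  # dedicated RNG instead of the global module state

    def _seed(self, seed=None):
        self.rng.seed(seed)
        return [seed]

    def _step(self, action):
        noise = self.rng.random()  # always draw randomness from the dedicated RNG
        ...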

env.seed(seed) works like a charm. The key is to seed the env not just at the beginning, but EVERY time the reset() function is called. Since we invariably end up playing multiple games during training, this seeding call should be inside one of the loops and will be executed multiple times. Possibly this is the reason why it is now deprecated in favor of env.reset(seed=seed)
Needless to say, if you are using randomness in the agent, you need to seed that as well. In that case, seeding once at the start of training is fine. You may want to seed the NN framework as well. A typical seeding function (PyTorch) would be:
import os
import random

import numpy as np
import torch

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    env.seed(seed)  # assumes env has already been created, e.g. with gym.make

# One call at the beginning is enough
seed_everything(SEED)
However, remember to call env.seed every time you reset the env (seed before the reset so it takes effect for that episode):
env.seed(SEED)
curr_state = env.reset()
or simply use the new API: env.reset(seed=seed)
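A rough sketch of that newer usage, assuming Gym >= 0.26 / Gymnasium, where reset() accepts a seed and returns an (observation, info) pair and step() returns a 5-tuple:

for episode in range(10):
    obs, info = env.reset(seed=SEED)  # fixed seed: only sensible while debugging
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder policy
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated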
BUT - while this determinism may be useful in early training to debug your code, it is recommended not to use the same fixed seed (i.e. env.seed(SEED)) in your final training. This is because, by nature, the start position of the environment is supposed to be random, and your RL code is expected to function given that randomness. If you make the start position deterministic, your model will not perform well in the live environment.

Related

What is the easiest way to replace Rust's thread_rng() generator with a seeded generator for tests?

I have a large amount of code that uses rand::thread_rng(). I would like to make this code behave deterministically in tests using something like StdRng::seed_from_u64(...). I'm aware that the 'correct' solution to this problem is to pass around a random number generator to all of my code which I can change at runtime, but changing hundreds of functions to take a new parameter isn't really feasible for me right now. Is there a simple way to just configure thread_rng to be deterministic or make my own version of it?

Is there a difference between fun(n::Integer) and fun(n::T) where T<:Integer in performance/code generation?

In Julia, I most often see code written like fun(n::T) where T<:Integer, when the function works for all subtypes of Integer. But sometimes, I also see fun(n::Integer), which some guides claim is equivalent to the above, whereas others say it's less efficient because Julia doesn't specialize on the specific subtype unless the subtype T is explicitly referred to.
The latter form is obviously more convenient, and I'd like to be able to use that if possible, but are the two forms equivalent? If not, what are the practical differences between them?
Yes, Bogumił Kamiński is correct in his comment: f(n::T) where T<:Integer and f(n::Integer) will behave exactly the same, with the exception that the former method will have the name T already defined in its body. Of course, in the latter case you can just explicitly assign T = typeof(n) and it'll be computed at compile time.
There are a few other cases where using a TypeVar like this is crucially important, though, and it's probably worth calling them out:
f(::Array{T}) where T<:Integer is indeed very different from f(::Array{Integer}). This is the common parametric invariance gotcha (docs and another SO question about it).
f(::Type) will generate just one specialization for all DataTypes. Because types are so important to Julia, the Type type itself is special and allows parameterization like Type{Integer} to allow you to specify just the Integer type. You can use f(::Type{T}) where T<:Integer to require Julia to specialize on the exact type of Type it gets as an argument, allowing Integer or any subtypes thereof.
Both definitions are equivalent. Normally you will use the fun(n::Integer) form and apply fun(n::T) where T<:Integer only if you need to use the specific type T directly in your code. For example, consider the following definitions from Base (all following definitions are also from Base) where it has a natural use:
zero(::Type{T}) where {T<:Number} = convert(T,0)
or
(+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)
And even if you need type information in many cases it is enough to use typeof function. Again an example definition is:
oftype(x, y) = convert(typeof(x), y)
Even if you are using a parametric type you can often avoid using a where clause (which is a bit verbose), as in:
median(r::AbstractRange{<:Real}) = mean(r)
because you do not care about the actual value of the parameter in the body of the function.
Now - if you are a Julia user like me - the question is how to convince yourself that this works as expected. There are the following methods:
you can check that one definition overwrites the other in the methods table (i.e. after evaluating both definitions only one method is present for this function);
you can check the code generated by both functions using @code_typed, @code_warntype, @code_llvm or @code_native etc. and find out that it is the same;
finally, you can benchmark the code for performance using BenchmarkTools.
A nice diagram explaining what Julia does with your code is here: http://slides.com/valentinchuravy/julia-parallelism#/1/1 (I also recommend the whole presentation to any Julia user - it is excellent). You can see in it that, after lowering the AST, Julia applies a type-inference step to specialize the function call before the LLVM codegen step.
You can hint to the Julia compiler to avoid specialization. This is done using the @nospecialize macro on Julia 0.7 (it is only a hint, though).

How to seed the pcg random number generator?

In the samples for PCG they only seed one way which I assume is best/preferred practice:
pcg32 rng(pcg_extras::seed_seq_from<std::random_device>{});
or
// Seed with a real random value, if available
pcg_extras::seed_seq_from<std::random_device> seed_source;
// Make a random number engine
pcg32 rng(seed_source);
However, running this on my machine just produces the same seed every time. It is no better than if I just typed in some integer to seed with myself. What would be a good method to seed with if trying it this way doesn't work?
pcg_extras::seed_seq_from is supposed to be the recommended way, but it delegates the actual seed generation to the generator specified in the template parameter.
MinGW has a broken implementation of std::random_device. So at this moment, if you want to target MinGW, you must not use std::random_device.
Some potential alternatives:
boost::random_device
randutils, by the author of PCG, M.E. O'Neill
seed11::seed_device, drop-in replacement for std::random_device (disclaimer: it's my own library)
More info about seeding in this blog post by M.E. O'Neill.

What is the best way to manage a large quantity of constants

I am currently working on a very complex program that processes rows from an input table and has a huge number of possible outcomes for each record. Because of this I have a very large number of constants defined for the outcome messages. There is one success message for the record, but a multitude of possible warnings and errors.
My first thought was to define all of my constants for these messages at the package body level, but then I decided to move each constant to the procedure where it is used. I'm now second guessing that decision and thinking of moving everything back to package body level. What is the best way to define this many constants? Ease of maintainability is my ultimate goal for this program since it is so complex.
I think this is a matter of taste. In my application I put all error codes into an Error-Package. I put all main and commonly used constants into a separate package (without a package body).
Again, a matter of taste, but I tend to put a list of named constants at the package spec level rather than the package body so that they can be referenced by any portion of the application. If I ever want to change the error code that c_err_for_specific_reason_x uses, it becomes a single place to do so.
If I wanted to hide the codes and put them within the body I would have a get_error_code(p_get_error_name varchar) function that did the translation based on you passing a valid constant name.
I've done both on different projects, but tend towards the list over the function most times. I tend to use the function if it is a table-driven source of the data.
It ... wait for it ... depends!
Since you currently define your constants in the package body, you don't need them to be publicly accessible outside the package. So defining them in a spec really doesn't buy you anything.
Here is the rule I follow: define constants within the smallest scope needed. So if a constant is used only within one procedure, define it in that procedure. If it is used within more than one procedure, define it in the body. If it is used elsewhere by code in other packages (or non-packaged SPs) but only when using a particular package, define it in the spec of that package. If it is used by other code for general use, put it in a separate spec of such general constants.

Should a Unit-test replicate functionality or Test output?

I've run into this dilemma several times. Should my unit-tests duplicate the functionality of the method they are testing to verify its integrity? OR Should unit tests strive to test the method with numerous manually created instances of inputs and expected outputs?
I'm mainly asking the question for situations where the method you are testing is reasonably simple and its proper operation can be verified by glancing at the code for a minute.
Simplified example (in ruby):
def concat_strings(str1, str2)
  return str1 + " AND " + str2
end
Simplified functionality-replicating test for the above method:
def test_concat_strings
  10.times do
    str1 = random_string_generator
    str2 = random_string_generator
    assert_equal (str1 + " AND " + str2), concat_strings(str1, str2)
  end
end
I understand that most times the method you are testing won't be simple enough to justify doing it this way. But my question remains; is this a valid methodology in some circumstances (why or why not)?
Testing the functionality by using the same implementation doesn't test anything. If one has a bug in it, the other will as well.
But testing by comparing with an alternative implementation is a valid approach. For example, you might test an iterative (fast) method of calculating Fibonacci numbers by comparing it with a trivial recursive, yet slow, implementation of the same method.
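A Python sketch of that idea (the function names here are made up; the fast version is tested against a slow but obviously correct reference):

def fib_fast(n):
    # iterative implementation under test
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_reference(n):
    # trivial recursive version: slow, but easy to verify by inspection
    return n if n < 2 else fib_reference(n - 1) + fib_reference(n - 2)

def test_fib_fast_matches_reference():
    for n in range(20):
        assert fib_fast(n) == fib_reference(n)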
A variation of this is using an implementation that only works for special cases. Of course, in that case you can use it only for those special cases.
When choosing input values, using random values most of the time isn't very effective. I'd prefer carefully chosen values any time. In the example you gave, null values and extremely long values which won't fit into a String when concatenated come to mind.
If you use random values, make sure you have a way to recreate the exact run with the same random values, for example by logging the seed value and having a way to set that value at start time (a sketch of this follows below).
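One possible Python sketch of that seed-logging idea (the RANDOM_TEST_SEED variable name is made up for illustration):

import os
import random
import time

def test_with_reproducible_randomness():
    # use a seed from the environment if provided, otherwise pick one and log it
    seed = int(os.environ.get("RANDOM_TEST_SEED", str(int(time.time()))))
    print(f"random test seed: {seed}")  # log it so a failing run can be replayed
    rng = random.Random(seed)

    data = list(range(50))
    shuffled = data[:]
    rng.shuffle(shuffled)
    assert sorted(shuffled) == data  # a property that should hold for any seed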
It's a controversial stance, but I believe that unit testing using Derived Values is far superior to using arbitrary hard-coded input and output.
The issue is that as an algorithm becomes even slightly complex, the relationship between input and output becomes obscure if represented by hard-coded values. The unit test ends up being a postulate. It may work technically, but it hurts test maintainability because it leads to Obscure Tests.
Using Derived Values to test against the result establishes a much clearer relationship between test input and expected output.
The argument that this doesn't test anything is simply not true, because any test case will exercise only a part of a path through the SUT, so no single test case will reproduce the entire algorithm being tested, but the combination of tests will do so.
An additional benefit is that you can use fewer unit tests to cover the desired functionality, and even make them more communicative at the same time. The end result is terser and more maintainable unit tests.
In unit testing you should definitely manually come up with test cases (so input, output and what side-effects you are expecting - these will be expectations on your mock objects). You come up with these test cases in a way so that they cover all functionality of your class (e.g. all methods are covered, all branches of all if statements, etc.). Think about it more along the lines of creating documentation of your class by showing all possible usages.
Reimplementing the class is not a good idea, because not only do you get obvious code/functionality duplication, but it is also likely that you will introduce the same bugs in this new implementation.
To test the functionality of a method I'd use input and output pairs wherever possible. Otherwise you might be copy-and-pasting the functionality as well as the errors in its implementation. What are you testing then? You would be testing that the functionality (including all of its errors) hasn't changed over time, but you wouldn't be testing the correctness of the implementation.
Testing that the functionality hasn't changed over time might (temporarily) be useful during refactoring. But how often do you refactor such small methods?
Also, unit tests can be seen as documentation and as a specification of a method's inputs and expected outputs. Both should be as simple as possible so others can easily read and comprehend them. As soon as you introduce additional code/logic into a test, it becomes harder to read.
Your test actually looks like a fuzz test. Fuzz tests can be very useful, but in unit tests randomness should be avoided for the sake of reproducibility.
A Unit-Test should exercise your code, not something as part of the language you are using.
If the code's logic is to concatenate strings in a special way, you should be testing for that - otherwise you need to rely on your language/framework.
Finally, you should create your unit tests to fail first "with meaning". In other words, random values shouldn't be used (unless you're testing that your random number generator isn't returning the same set of random values!)
Yes. It bothers me too, although I'd say that it is more prevalent with non-trivial computations. In order to avoid updating the test when the code changes, some programmers write an IsX=X test, which always succeeds irrespective of the SUT.
About duplicating functionality
You don't have to. Your test can state what the expected output is, not how you derived it.
Although in some non-trivial cases, showing how you derived the expected value may make your test more readable - the test as a spec. You shouldn't refactor away this duplication.
def doubler(x); x * 2; end
def test_doubler()
  input, expected = 10, doubler(10)
  assert_equal expected, doubler(10)
end
Now if I change doubler(x) to be a tripler, the above test won't fail.
def doubler(x); x * 3; end
However this one would:
def test_doubler()
  assert_equal(20, doubler(10))
end
randomness in unit tests - Don't.
Instead of random datasets, choose static, representative data points for testing and use an xUnit RowTest/TestCase to run the test with different data inputs (a sketch follows after this answer). If n input-sets are identical for the unit, choose 1.
The test in the OP could be used as an exploratory test, or to determine additional representative input-sets. Unit tests need to be repeatable (see q#61400) - using random values defeats this objective.
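A pytest-flavored sketch of that row-test idea (assuming the Ruby method above were ported to a Python concat_strings):

import pytest

def concat_strings(str1, str2):
    return str1 + " AND " + str2

@pytest.mark.parametrize("str1, str2, expected", [
    ("a", "b", "a AND b"),                      # normal case
    ("", "", " AND "),                          # empty inputs
    ("x" * 1000, "y", "x" * 1000 + " AND y"),   # very long input
])
def test_concat_strings(str1, str2, expected):
    assert concat_strings(str1, str2) == expected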
Never use random data for input. If your test reports a failure, how are you ever going to be able to duplicate it? And don't use the same function to generate the expected result. If you have a bug in your method you're likely to put the same bug in your test. Compute the expected results by some other method.
Hard-coded values are perfectly fine, and make sure inputs are picked to represent all of the normal and edge cases. At the very least test the expected inputs as well as inputs in the wrong format or wrong size (eg: null values).
It's really quite simple -- a unit test must test whether the function works or not. That means you need to give a range of known inputs that have known outputs and test against that. There is no universal right way to do that. However, using the same algorithm for the method and the verification proves nothing but that you're adept at copy/paste.
