Sampling from user-provided target densities in PyMC3

Is it possible to sample from a user-provided target measure in PyMC3 in an easy way? I.e., I want to be able to provide black-box functions logposterior(theta) and grad_logposterior(theta) and sample using those instead of specifying a model in PyMC3's modeling language.

This is a bit clunky. You'd need to create a new Theano Op. Here are a few examples: https://github.com/Theano/Theano/blob/master/theano/tensor/slinalg.py#L32
You then need to create a distribution class that evaluates the logp via your new Op, for example: https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/continuous.py#L70
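A minimal sketch of the whole pattern, assuming black-box logposterior(theta) and grad_logposterior(theta) as in the question (the Gaussian bodies below are stand-ins), and using pm.Potential rather than a full distribution class:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Hypothetical black-box callables; the Gaussian bodies are placeholders.
def logposterior(theta):
    return -0.5 * np.sum(theta ** 2)

def grad_logposterior(theta):
    return -theta

class GradLogPost(tt.Op):
    itypes = [tt.dvector]  # theta
    otypes = [tt.dvector]  # d logp / d theta

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.asarray(grad_logposterior(theta))

class LogPost(tt.Op):
    itypes = [tt.dvector]  # theta
    otypes = [tt.dscalar]  # scalar log-density

    def __init__(self):
        self.grad_op = GradLogPost()

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.asarray(logposterior(theta))

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        # Chain rule: scale the black-box gradient by the upstream gradient.
        return [output_grads[0] * self.grad_op(theta)]

logp_op = LogPost()

with pm.Model():
    theta = pm.Flat("theta", shape=2)   # improper flat prior; logp comes from the Op
    pm.Potential("logp", logp_op(theta))
    trace = pm.sample(1000)

Because the Op implements grad(), gradient-based samplers like NUTS can use grad_logposterior; without it you would be limited to gradient-free steppers such as Metropolis.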

Related

Optimize other parameters than the predefined, using step-wise algorithm in optuna.integration.lightgbm

As far as I understand, the LightGBM integration in Optuna uses a step-wise algorithm to optimize hyper-parameters such as lambda_l1, lambda_l2, etc.
Although it is great, I would very much like to add additional parameters such as learning_rates.
I know I can just use Optuna the "regular" way, but since the integrated LightGBM part should be much faster, I would prefer using that.
Is there a way to add additional parameters to optimize, or are we forced to use all of (and only) the specified parameters? I can see there's e.g. a parameter called learning_rates, but the docs do not specify what it does or how to use it (I think it's the learning rate for each tree). Setting it in lgb.train like this:
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
)
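For reference, a hedged sketch of passing learning_rates to lgb.train (older LightGBM releases accepted this argument as a list with one value per boosting round, or a callable from the round index to a learning rate; the decay schedule below is purely illustrative):

import lightgbm as lgb

# params, dtrain and dval as in the snippet above.
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    # One learning rate per boosting round, here an illustrative decay.
    learning_rates=lambda round_idx: 0.1 * (0.99 ** round_idx),
)

Newer LightGBM versions dropped this argument in favor of the lgb.reset_parameter callback, so check which API your installed version exposes.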

Seaborn global hue order

I very often use the hue argument to distinguish between categories, but seaborn doesn't seem to be consistent in how it matches hues to categories (from what I've read, it depends on the plotted data, in particular its order). I would like to avoid passing the hue_order argument everywhere, because I know I will forget it at some point and not notice (which will lead to misinterpretations, because I will assume the hues are correct).
Is there a way to set the hue_order globally (a fixed order for all plots)?
Even better, would it be possible to make categorical indexes all behave the same way (e.g., alphanumeric order)?
For now I use the following ugly strategy:
SNS_SETTINGS = dict(hue_order=[...])
sns.displot(df, **SNS_SETTINGS, x="time", kind="ecdf", hue="algorithm")
A very practical solution is to add the hue parameter to the SNS_SETTINGS dictionary. This coupling will ensure the needed consistency across your plots.
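For instance, reusing the snippet from the question:

SNS_SETTINGS = dict(hue="algorithm", hue_order=[...])
sns.displot(df, **SNS_SETTINGS, x="time", kind="ecdf")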
Another solution, that may or may not be adequate in your specific case, would be to define custom functions with functools.partial, defining the parameters once to have shorter function calls:
from functools import partial
displot_by_algorithm = partial(sns.displot, hue="algorithm", hue_order=[...])
This way, you can later call
displot_by_algorithm(df, x="time", kind="ecdf")
Of course, you will have to define such a function for each of the different plotting functions you want to use, so the trade-off might not be worth it.
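Another angle on the "global order" wish, worth verifying against your seaborn version: make the column an ordered pandas Categorical, since seaborn generally takes category order from the categorical dtype rather than from order of appearance:

import pandas as pd

# Make the order part of the data itself: an ordered Categorical.
# Seaborn then generally uses the dtype's category order instead of
# order of appearance (verify with your seaborn version).
algorithms = sorted(df["algorithm"].unique())
df["algorithm"] = pd.Categorical(df["algorithm"], categories=algorithms, ordered=True)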

How should OpenAI environments (gyms) use env.seed(0)?

I've created a very simple OpenAI gym (banana-gym) and wonder if / how I should implement env.seed(0).
See https://github.com/openai/gym/issues/250#issuecomment-234126816 for example.
In a recent merge, the developers of OpenAI gym changed the behavior of env.seed() so that it no longer calls env._seed(). Instead, the method now just issues a warning and returns. I think that if you want to use this method to set the seed of your environment, you should just override it now.
The docstring of the env.seed() function (which can be found in this file) provides the following documentation on what the function should be implemented to do:
Sets the seed for this env's random number generator(s).
Note:
Some environments use multiple pseudorandom number generators.
We want to capture all such seeds used in order to ensure that
there aren't accidental correlations between multiple generators.
Returns:
list<bigint>: Returns the list of seeds used in this env's random
number generators. The first value in the list should be the
"main" seed, or the value which a reproducer should pass to
'seed'. Often, the main seed equals the provided 'seed', but
this won't be true if seed=None, for example.
Note that, unlike what the documentation and the comments in the issue you linked to seem to imply, it doesn't seem (to me) like env.seed() is supposed to be overridden by custom environments. env.seed() has a very simple implementation, where it only calls and returns the return value of env._seed(), and it seems to me like that is the function which should be overridden by custom environments.
For example, OpenAI gym's atari environments have a custom _seed() implementation which sets the seed used internally by the (C++-based) Arcade Learning Environment.
Since you have a random.random() call in your custom environment, you should probably implement _seed() to call random.seed(). In that way, users of your environments can reproduce experiments by making sure to call seed() on your environment with the same argument.
Note: messing around with the global random seed like this may be unexpected, though. It may be better to create a dedicated random object when your environment gets initialized, seed that object, and make sure to always obtain the random numbers you need in the environment from that object.
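A minimal sketch of that dedicated-generator approach (the class name echoes the banana-gym from the question; gym.utils.seeding is the helper older gym versions provide for exactly this):

import gym
from gym.utils import seeding

class BananaEnv(gym.Env):
    def __init__(self):
        self.np_random = None
        self._seed()

    def _seed(self, seed=None):
        # A dedicated generator instead of the global `random` module;
        # return the list of seeds used, as the docstring asks.
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):
        # Draw all randomness from self.np_random, never from `random`:
        noise = self.np_random.uniform(0, 1)
        ...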
env.seed(seed) works like a charm. The key is to seed the env not just at the beginning, but EVERY time the reset() function is called. Since we invariably end up playing multiple games during training, this seeding call should be inside one of the loops and will be executed multiple times. Possibly this is the reason why it is now deprecated in favor of env.reset(seed=seed).
Needless to say, if you are using randomness in the agent, you need to seed that as well. In that case, seeding once at the start of training is fine. You may want to seed the NN framework as well. A typical seeding function (PyTorch) would be:
import os
import random

import numpy as np
import torch

def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    env.seed(seed)  # assumes `env` has already been created

# One call at the beginning is enough
seed_everything(SEED)
However, remember to call env.seed every time you reset the env:
curr_state = env.reset()
env.seed(SEED)
or simply use the new API: env.reset(seed=seed)
BUT: while this determinism may be useful in early training to debug your code, it is recommended not to use the same seed (i.e., env.seed(SEED)) in your final training. This is because, by nature, the start position of the environment is supposed to be random, and your RL code is expected to function given that randomness. If you make your start position deterministic, your model will not perform well in the live environment.
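To make the loop placement concrete, a debugging-only sketch (the agent object and num_episodes are hypothetical):

for episode in range(num_episodes):
    env.seed(SEED)  # old API: re-seed on every reset while debugging
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)  # hypothetical agent interface
        state, reward, done, info = env.step(action)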

Why doesn't TensorFlow in Go provide the optimizers that Python has?

I am a newbie to TensorFlow in Go.
I have some doubts from my first training demo. I can find only one optimizer in Go's wrappers.go.
But when I study the Python demos, they have several optimizers, like:
GradientDescentOptimizer
AdagradOptimizer
AdagradDAOptimizer
MomentumOptimizer
AdamOptimizer
FtrlOptimizer
RMSPropOptimizer
There are also funcs with a similar prefix, ResourceApply...:
GradientDescent
Adagrad
AdagradDA
Momentum
Adam
Ftrl
RMSProp
They return an option, and I don't know what their purpose is. I can't find the relation between them and the optimizers.
Also, how can I train a model in Go with TensorFlow?
What should I use in Go in place of Python's tf.Variable?
You can't train a TensorFlow model using Go.
The only thing you can do is load a pre-trained model and use it for inference.
You can't train because the Go implementation lacks Variable support; therefore it's impossible to train anything at the moment.

Bloom Filter class to include in my OMNET++ project

I'm doing a project using OMNET++. All I need is a decent Bloom filter class (hash functions that can take in a string, plus simple control functions: add/reset/check), so I can create an object of it and use it in my network.
I tried creating one myself, but the hash function part gave me trouble. I'm running my project simulation on a plain MacBook Pro.
Any recommendations?
You can basically use any available Bloom filter implementation for C++, e.g., https://code.google.com/p/bloom/
