What meta learners are used by h2o.automl() to build the ensembles?

I am wondering what meta learners are used by h2o.automl() to build its ensembles. So far, every ensemble I've seen used a GLM. Is it because h2o.automl() only uses GLM as the metalearner, or is it that, given the limited number of base models in my setup (25-50), GLM always happens to be the best choice?
Thank you.

H2OAutoML uses GLM as the default metalearner algorithm, and we're not currently trying multiple metalearners to find the best one (this may change in future releases).
For now, you can train a different ensemble yourself, using the AutoML models as base models:
aml = H2OAutoML(project_name="my_aml",
                ...,
                keep_cross_validation_predictions=True)  # important if you want to stack the models later
aml.train(...)

# train another ensemble, using GBM as the metalearner algorithm
lb = aml.leaderboard
base_models = [m for m in [lb[i, 0] for i in range(lb.nrows)]
               if 'StackedEnsemble' not in m]
se = h2o.estimators.H2OStackedEnsembleEstimator(
    base_models=base_models,
    metalearner_algorithm='gbm',
    ...
)
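The new ensemble then still needs to be trained and evaluated itself; a minimal sketch of that last step, where x, y, train and test are placeholder names for your feature columns, target column and frames:

# train the new ensemble on the same training frame the base models were built on
se.train(x=x, y=y, training_frame=train)

# compare it against the AutoML leader on a held-out frame (binomial model)
print(se.model_performance(test).auc())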

Related

What's the difference between RobertaModel and RobertaForSequenceClassification (Hugging Face)?

I am trying to use the Hugging Face transformers API.
While importing the library, some questions came up. If anyone knows the answers, please share your knowledge.
The transformers library has several pretrained models. It provides not only bare models like BertModel and RobertaModel, but also models with convenient heads like ModelForMultipleChoice, ModelForSequenceClassification, ModelForTokenClassification, and ModelForQuestionAnswering.
I wonder what the difference is between taking the bare model and adding a new linear transformation myself versus using ModelForSequenceClassification:
What is the difference between a custom model (a pretrained model with a randomly initialized linear layer) and transformers' ModelForSequenceClassification?
Is ModelForSequenceClassification trained on GLUE data?
I look forward to someone's reply. Thanks.
I think it's easiest to understand if we have a look at the actual implementation, where I randomly chose RobertaModel and RobertaForSequenceClassification as an example. However, the conclusion is valid for all other models, too.
You can find the implementation for RobertaForSequenceClassification here, which looks roughly like this:
class RobertaForSequenceClassification(RobertaPreTrainedModel):
    authorized_missing_keys = [r"position_ids"]

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.roberta = RobertaModel(config, add_pooling_layer=False)
        self.classifier = RobertaClassificationHead(config)

        self.init_weights()

    [...]

    def forward([...]):
        [...]
As we can see, there is no indication of any pretraining here; it simply adds another classification head on top (the implementation of RobertaClassificationHead can be found a bit further down, namely here):
class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
So, to answer your question: These models come without any pretrained additional layers on top, and you could easily implement them yourself*.
Now for the asterisk: while it would be easy to wrap this yourself, also note that the class inherits from RobertaPreTrainedModel. This has several advantages, the most important one being a consistent design across the different implementations (sequence classification model, sequence tagging model, etc.). Further, they provide some neat functionality, such as a forward() call with extensive parameters (padding, masking, attention outputs, ...), which would take quite some time to implement yourself.
Last but not least, there are existing trained models based on these specific implementations, which you can search for on the Hugging Face Model Hub. There, you might find models that are fine-tuned on a sequence classification task (e.g., this one), and then load their weights directly into a RobertaForSequenceClassification model. If you had your own implementation of a sequence classification model, loading and aligning these pretrained weights would be considerably more complicated.
I hope this answers your main concern, but feel free to elaborate (either as comment or new question) on any points that have not been addressed!
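To make the contrast concrete, here is a minimal sketch (not from the original answer) of the two options, assuming a recent 4.x version of transformers; the class name MyRobertaClassifier and the label count are placeholders:

import torch
from torch import nn
from transformers import RobertaModel, RobertaForSequenceClassification

# Option 1: bare encoder plus a hand-rolled, randomly initialized head.
class MyRobertaClassifier(nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.head = nn.Linear(self.roberta.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.roberta(input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0, :]  # <s> token, equivalent to [CLS]
        return self.head(cls)

# Option 2: the built-in head. The encoder weights are pretrained, but the
# classification head is still randomly initialized until you fine-tune it.
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

In both cases the classifier weights start out random; the practical difference lies in the consistent API, the ready-made forward() handling, and the ability to load fine-tuned checkpoints from the Model Hub directly into the second variant.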

Updating Weights from Caffe and DIGITS

I've built a model using DIGITS by NVIDIA, but when I try to run it using Caffe, I don't know where the weights are. Any idea how I'd find them? I have the architecture, because that is provided right on the output model screen.
The weights are not shown on any of the output model screens in the DIGITS UI, but they are accessible!
I use NVIDIA's DGX, which can run Python code. To pull the weights on that platform (from the location where I route the models to be saved), I use this bit of code:
import caffe

# load the architecture (deploy.prototxt) and the trained weights (.caffemodel)
net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt',
                '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

params = ['fc6', 'fc7', 'fc8']  # fully connected layers to inspect
# net.params[layer][0] holds the weights, net.params[layer][1] the biases
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))
Good Luck!
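If you want to keep the weights around outside of Caffe after pulling them this way, a small follow-up sketch (the file names are just placeholders) can dump them to NumPy arrays:

import numpy as np

# save each layer's weights and biases as .npy files for later use
for fc in params:
    np.save('{}_weights.npy'.format(fc), fc_params[fc][0])
    np.save('{}_biases.npy'.format(fc), fc_params[fc][1])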

What problem does a reinitializable iterator solve?

From the tf.data documentation:
A reinitializable iterator can be initialized from multiple different Dataset objects. For example, you might have a training input pipeline that uses random perturbations to the input images to improve generalization, and a validation input pipeline that evaluates predictions on unmodified data. These pipelines will typically use different Dataset objects that have the same structure (i.e. the same types and compatible shapes for each component).
The following example was given:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
    lambda x: x + tf.random_uniform([], -10, 10, tf.int64))
validation_dataset = tf.data.Dataset.range(50)

# A reinitializable iterator is defined by its structure. We could use the
# `output_types` and `output_shapes` properties of either `training_dataset`
# or `validation_dataset` here, because they are compatible.
iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
                                           training_dataset.output_shapes)
next_element = iterator.get_next()

training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)

# Run 20 epochs in which the training dataset is traversed, followed by the
# validation dataset.
for _ in range(20):
    # Initialize an iterator over the training dataset.
    sess.run(training_init_op)
    for _ in range(100):
        sess.run(next_element)

    # Initialize an iterator over the validation dataset.
    sess.run(validation_init_op)
    for _ in range(50):
        sess.run(next_element)
It is unclear what the benefit of this complexity is.
Why not simply create 2 different iterators?
The original motivation for reinitializable iterators was as follows:
1. The user's input data is in two or more tf.data.Dataset objects with the same structure but different pipeline definitions. For example, you might have a training data pipeline with augmentations in a Dataset.map(), and an evaluation data pipeline that produced raw examples, but they would both produce batches with the same structure (in terms of the number of tensors, their element types, shapes, etc.).
2. The user would define a single training graph that took input from a tf.data.Iterator, created using Iterator.from_structure().
3. The user could then switch between the different input data sources by reinitializing the iterator from one of the datasets.
In hindsight, reinitializable iterators have turned out to be quite hard to use for their intended purpose. In TensorFlow 2.0 (or 1.x with eager execution enabled), it is much easier to create iterators over different datasets using idiomatic Python for loops and high-level training APIs:
tf.enable_eager_execution()

model = ...       # A `tf.keras.Model`, or some other class exposing `fit()` and `evaluate()` methods.
train_data = ...  # A `tf.data.Dataset`.
eval_data = ...   # A `tf.data.Dataset`.

for i in range(NUM_EPOCHS):
    model.fit(train_data, ...)

    # Evaluate every 5 epochs.
    if i % 5 == 0:
        model.evaluate(eval_data, ...)
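If you are not using a Keras-style fit()/evaluate() API, the same idea works with plain Python iteration over the datasets in eager mode; a minimal sketch, where train_step and eval_step are placeholder functions for your own training and evaluation logic:

for epoch in range(NUM_EPOCHS):
    for batch in train_data:   # each `batch` is a tensor (or nest of tensors)
        train_step(batch)

    # Evaluate every 5 epochs.
    if epoch % 5 == 0:
        for batch in eval_data:
            eval_step(batch)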

ROC on multiple test sets in h2o (python)

I had a use-case that I thought was really simple but couldn't find a way to do it with h2o. I thought you might know.
I want to train my model once, and then evaluate its ROC on a few different test sets (e.g. a validation set and a test set, though in reality I have more than 2) without having to retrain the model. The way I know to do it now requires retraining the model each time:
train, valid, test = fr.split_frame([0.2, 0.25], seed=1234)
rf_v1 = H2ORandomForestEstimator( ... )
rf_v1.train(features, var_y, training_frame=train, validation_frame=valid)
roc = rf_v1.roc(valid=1)
rf_v1.train(features, var_y, training_frame=train, validation_frame=test) # training again with the same training set - can I avoid this?
roc2 = rf_v1.roc(valid=1)
I can also use model_performance(), which gives me some metrics on an arbitrary test set without retraining, but not the ROC. Is there a way to get the ROC out of the H2OModelMetrics object?
Thanks!
You can use H2O Flow to inspect the model's performance. Simply go to http://localhost:54321/flow/index.html (if you changed the default port, adjust it in the link), type getModel "rf_v1" in a cell, and it will show you all of the model's metrics across multiple cells in the Flow. It's quite handy.
If you are using Python, you can compute the performance on an arbitrary test frame like this:
rf_perf1 = rf_v1.model_performance(test)
and then print the AUC (the area under the ROC curve) like this:
print(rf_perf1.auc())
Yes, indirectly. Get the TPRs and FPRs from the H2OModelMetrics object:
out = rf_v1.model_performance(test)
fprs = out.fprs
tprs = out.tprs
roc = zip(fprs, tprs)
(By the way, my H2ORandomForestEstimator object does not seem to have an roc() method at all, so I'm not 100% sure that this output is in the exact same format. I'm using h2o version 3.10.4.7.)
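If you want an actual ROC curve rather than the raw (FPR, TPR) pairs, a minimal matplotlib sketch on top of that output (assuming the fprs and tprs lists from above) would be:

import matplotlib.pyplot as plt

plt.plot(fprs, tprs, label='test set (AUC = {:.3f})'.format(out.auc()))
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()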

How to implement custom Gibbs sampling scheme in pymc

I have a hidden Markov stochastic volatility model (represented as a linear state space model). I am using a hand-written Gibbs sampling scheme to estimate parameters for the model. The actual sampler requires some fairly sophisticated update rules that I believe I need to write by hand. You can see an example of a Julia version of these update rules here.
My question is the following: how can I specify the model in a custom way and then hand the job of running the sampler and collecting the samples to pymc? In other words, I am happy to provide code to do all the heavy lifting (how to update each block of parameters on each scan -- utilizing full conditionals within each block), but I want to let pymc handle the "accounting" for me.
I realize that I will probably need to provide more information so that others can answer this question. The problem is I am not sure exactly what information will be useful. So, if you feel you can help me out with this, but need more information -- please let me know in a comment and I will update the question.
Here is an example of a custom sampler in PyMC2:
class BDSTMetropolis(mc.Metropolis):
    def __init__(self, stochastic):
        mc.Metropolis.__init__(self, stochastic, scale=1., proposal_sd='custom',
                               proposal_distribution='custom', verbose=None, tally=False)

    def propose(self):
        T = self.stochastic.value
        T.u_new, T.v_new = T.edges()[0]
        while T.has_edge(T.u_new, T.v_new):
            T.u_new, T.v_new = random.choice(T.base_graph.edges())

        T.path = nx.shortest_path(T, T.u_new, T.v_new)
        i = random.randrange(len(T.path) - 1)
        T.u_old, T.v_old = T.path[i], T.path[i + 1]

        T.remove_edge(T.u_old, T.v_old)
        T.add_edge(T.u_new, T.v_new)
        self.stochastic.value = T

    def reject(self):
        T = self.stochastic.value
        T.add_edge(T.u_old, T.v_old)
        T.remove_edge(T.u_new, T.v_new)
        self.stochastic.value = T
It's quite different from your model, but it should demonstrate all the parts. Does that give you enough to go on?
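For completeness, a minimal sketch (not from the original answer) of how such a custom step method is typically attached to a PyMC2 model; the names model_vars and T are placeholders for your own stochastics:

import pymc as mc

# assume `T` is the stochastic the custom sampler should update,
# and `model_vars` collects all the stochastics in the model
mcmc = mc.MCMC(model_vars)
mcmc.use_step_method(BDSTMetropolis, T)  # attach the custom sampler to T
mcmc.sample(iter=10000, burn=1000)       # PyMC handles the bookkeeping and traces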
