Suppose I use a validation split of 0.3 when fitting a Sequential model. Which samples will be used for validation, the first 30% or the last 30%?
Secondly, checkpointing the best model saves the best model weights in the .hdf5 file format. Does this mean that, for a given experiment, the saved model is the best-tuned model?
For your first question, the last 30% of the samples will be used for validation.
From Keras documentation:
validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling
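To make that concrete, here is a minimal sketch (model, x_train, and y_train are placeholders for a compiled Sequential model and its NumPy training arrays); with validation_split=0.3, the last 30% of the arrays passed to fit() is set aside before any shuffling:

history = model.fit(x_train, y_train,
                    validation_split=0.3,  # last 30% of x_train/y_train becomes the validation set
                    epochs=20,
                    batch_size=32)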
For your second question, I assume that you're talking about ModelCheckpoint with save_best_only=True. In this case, this callback saves the weights of a given epoch only if monitor ('val_loss', by default) is better than the best monitored value. Concretely, this happens here. If monitor is 'val_loss', this should be the tuned model for a particular setting of hyperparameters, according to the validation loss.
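A hedged sketch of that setup (the import path may be tensorflow.keras.callbacks depending on your installation; the file name and hyperparameters are illustrative):

from keras.callbacks import ModelCheckpoint

# overwrite best_model.hdf5 only when 'val_loss' improves on the best value seen so far
checkpoint = ModelCheckpoint('best_model.hdf5',
                             monitor='val_loss',
                             save_best_only=True,
                             mode='min',
                             verbose=1)

model.fit(x_train, y_train,
          validation_split=0.3,
          epochs=20,
          callbacks=[checkpoint])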
I'm finetuning a t5-base model following this notebook.
However, the loss of both validation set and training set decreases very slowly. I changed the learning_rate to a larger number, but it did not help. Eventually, the bleu score on the validation set was low (around 13.7), and the translation quality was low as well.
***** Running Evaluation *****
Num examples = 1000
Batch size = 32
{'eval_loss': 1.06500244140625, 'eval_bleu': 13.7229, 'eval_gen_len': 17.564, 'eval_runtime': 16.7915, 'eval_samples_per_second': 59.554, 'eval_steps_per_second': 1.906, 'epoch': 5.0}
If I use the "Helsinki-NLP/opus-mt-en-ro" model, the loss decreases properly, and at the end, the finetuned model works pretty well.
How do I fine-tune t5-base properly? Did I miss something?
I think the metrics shown in the tutorial are for the already-trained EN>RO opus-mt model, which was then fine-tuned. I don't see a before-and-after comparison of its metrics, so it is hard to tell how much of a difference that fine-tuning really made.
You generally shouldn't expect the same results from fine-tuning T5, which is not a (pure) machine-translation model. What matters more is the difference in metrics before and after fine-tuning.
Two things I could imagine having gone wrong with your training:
Did you add the proper T5 prefix to the input sequences ("translate English to Romanian: ") for both your training and your evaluation? If you did not, you might have been training a new task from scratch instead of using the pre-training the model already did on translation to Romanian (and German, and perhaps some other languages). You can see how that affects the model's behavior, for example, in this inference demo: Language used during pretraining and Language not used during pretraining. (A sketch of both points is shown after this list.)
If you chose a relatively small model like t5-base but stuck with num_train_epochs=1 from the tutorial, your epoch count is probably far too low to make a noticeable difference. Try increasing the epochs for as long as you get significant performance boosts from it; in the example, this is probably the case for at least the first 5 to 10 epochs.
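As a rough sketch of both points (assuming the Hugging Face transformers/datasets setup from the notebook; the column names "en"/"ro", the sequence lengths, and the hyperparameter values are illustrative, not the notebook's exact ones):

from transformers import AutoTokenizer, Seq2SeqTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("t5-base")
prefix = "translate English to Romanian: "  # T5 task prefix

def preprocess(examples):
    # prepend the prefix to every source sentence before tokenizing
    inputs = [prefix + text for text in examples["en"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)
    # text_target= needs a recent transformers release; older ones use as_target_tokenizer()
    labels = tokenizer(text_target=examples["ro"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-base-en-ro",
    num_train_epochs=10,              # raised well above the tutorial's 1 epoch
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    predict_with_generate=True,
)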
I actually did something very similar to what you are doing, but for EN>DE (German). I fine-tuned both opus-mt-en-de and t5-base on a custom dataset of 30,000 samples for 10 epochs. The opus-mt-en-de BLEU increased from 0.256 to 0.388 and t5-base from 0.166 to 0.340, just to give you an idea of what to expect. Romanian (or the dataset you use) might be more of a challenge for the model and result in different scores, though.
Just started with H2O AutoML so apologies in advance if I have missed something basic.
I have a binary classification problem where the data are observations from K years. I want to train on the first K-1 years, and tune the models and select the best one explicitly based on the remaining Kth year.
If I switch off cross-validation (with nfolds=0) to avoid randomly blending years into the N folds, and define the year-K data as the validation_frame, then no ensemble is created (as expected, according to the documentation), which I in fact need.
If I train with cross-validation (default nfolds) and define a validation frame to be the Kth-year data,
aml = H2OAutoML(max_runtime_secs=3600, seed=1)
aml.train(x=x, y=y, training_frame=k_minus_1_years, validation_frame=k_year)
then according to
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
the validation_frame is ignored
"...By default and when nfolds > 1, cross-validation metrics will be used for early stopping and thus validation_frame will be ignored."
Is there a way to tune the models and select the best one (ensemble or not) based on the Kth-year data only, while the ensemble of models is still available in the output?
Thanks a lot!
You don't want to use cross-validation (CV) if you are dealing with time-series (non-IID) data, since you don't want folds from the future predicting the past.
I would explicitly add nfolds=0 so that CV is disabled in AutoML:
aml = H2OAutoML(max_runtime_secs=3600, seed=1, nfolds=0)
aml.train(x=x, y=y, training_frame=k_minus_1_years, validation_frame=k_year)
To have an ensemble, add a blending_frame which also applies to time-series. See more info here.
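A rough sketch of how that could look (the frame names match your snippet; the 80/20 split ratio is just an example, and for time-ordered data you would probably want to carve off the blending slice by time rather than randomly):

from h2o.automl import H2OAutoML

# hold back part of the first K-1 years so a Stacked Ensemble can be blended without CV
train_part, blend_part = k_minus_1_years.split_frame(ratios=[0.8], seed=1)

aml = H2OAutoML(max_runtime_secs=3600, seed=1, nfolds=0)
aml.train(x=x, y=y,
          training_frame=train_part,
          blending_frame=blend_part,
          validation_frame=k_year)  # Kth year used for early stopping and model tuning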
Additionally, since you are dealing with time-series data, I would recommend adding time-series transformations (e.g., lags) so that your model gets information from previous years and their aggregates (e.g., a weighted moving average).
(1) I am using the same preprocessing steps for the training and validation set.
(2) Passing the same dataset as the training and validation set.
(3) Having the same number of steps per epoch (steps per epoch = dataset length / batch size) for the training and validation passes.
The training loss goes down as expected, but the validation loss (on the same dataset used for training) is fluctuating wildly.
My intent is to use a held-out dataset for validation, but I saw this same behavior on the held-out validation set. So I thought I'd pass the training dataset as validation (for testing purposes), and I still see the same behavior.
What can be going on?
I am dealing with a discrete-time system with a sampling time of 300 s.
My question is how to express the state equation or output equation, such as
x(k+1)=A*x(k)+B*u(k)
y(k)=C*x(k)
where x(k) is the state and y(k) is the output. I have all the values of the A, B, and C matrices.
I found some information about discrete-time systems on the webpage https://apmonitor.com/wiki/index.php/Apps/DiscreteStateSpace
I want to know whether there is another way to express the state equations other than
x,y,u = m.state_space(A,B,C,D=None,discrete=True)
The discrete state space model is the preferred way to pose your model. You could also convert your equations to a discrete time-series form or to a continuous state space form; these are all equivalent. Another way to write your model is to use IMODE=2 (algebraic equations), but this is much more complicated. Here is an example of MIMO identification where we estimate ARX parameters with IMODE=2. I recommend the m.state_space model, used with IMODE>=4.
Here is a pendulum state space model example.
and a flight control state space model.
These both use continuous state space models but the methods are similar to what is needed for your application.
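For reference, a minimal discrete simulation sketch (the A, B, and C values here are made up; only the structure matters for your case):

import numpy as np
from gekko import GEKKO

# illustrative 2-state, 1-input, 1-output system
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
C = np.array([[1.0, 0.0]])

m = GEKKO(remote=False)
x, y, u = m.state_space(A, B, C, D=None, discrete=True)

m.time = np.arange(0, 3001, 300)   # uniform 300 s sampling, as in the question
u[0].value = np.ones(len(m.time))  # step input trajectory
m.options.IMODE = 4                # dynamic simulation (IMODE>=4 as recommended)
m.options.NODES = 2                # discrete models use NODES=2
m.solve(disp=False)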
I'm trying to calculate performance in a different way than what is currently built into the models.
I would like to access raw predictions during cross-validation, so I can calculate performance on my own.
g = h2o.get_grid(grid_id)
rrc = {}
for m in g.models:
    print("Model %s" % m.model_id)
    rrc[m.model_id] = m.cross_validation_holdout_predictions()
I could just run predictions with a model on my dataset, but I think that test might be biased because the model has already seen this data, or would it not be? Can I take new predictions made on the same dataset and use them to calculate performance?
I would like to access raw predictions during cross-validation, so I can calculate performance on my own.
If you want to calculate a custom metric on the cross-validated predictions, then set keep_cross_validation_predictions = True and you can access the raw predicted values using the .cross_validation_holdout_predictions() method like you have above.
Can I take new predictions made on the same data set and use it to calculate performance?
It sounds like you're asking if you can use only training data to estimate model performance? Yes, using cross-validation. If you set nfolds > 1, H2O will do cross-validation and compute a handful of cross-validated performance metrics for you. Also, if you tell H2O to save the cross-validated predictions, you can compute "cross-validated metrics" of your own.
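For example, a hedged sketch with a single GBM (for a grid, you would set the same flags on the estimator you pass to H2OGridSearch; x, y, and train stand in for your columns and training frame):

from h2o.estimators.gbm import H2OGradientBoostingEstimator

model = H2OGradientBoostingEstimator(nfolds=5,
                                     keep_cross_validation_predictions=True,
                                     seed=1)
model.train(x=x, y=y, training_frame=train)

cv_preds = model.cross_validation_holdout_predictions()  # H2OFrame of raw CV predictions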