continueEpochs Parameter in pybrain

In the pybrain documentation I found the following description of trainUntilConvergence:
trainUntilConvergence(dataset=None, maxEpochs=None, verbose=None,
continueEpochs=10, validationProportion=0.25)
Train the module on the dataset until it converges.
Return the module with the parameters that gave the minimal validation error.
If no dataset is given, the dataset passed during Trainer initialization is used. validationProportion is the ratio of the
dataset that is used for the validation dataset.
If maxEpochs is given, at most that many epochs are trained. Each time validation error hits a minimum, try for continueEpochs epochs to
find a better one.
But the documentation doesn't explain what the continueEpochs and verbose parameters actually do. Does anyone have an idea?

verbose is almost self-explanatory: it simply prints the current loss to stdout after each iteration.
continueEpochs is explained in the excerpt you provided:
Each time validation error hits a minimum, try for continueEpochs epochs to find a better one.

If verbose is set to True, it prints out the loss of each epoch during training.
The trainUntilConvergence method trains on your data set until the error on the validation set has not decreased for a certain number of epochs. You can vary the number of epochs the trainer considers before stopping by changing the continueEpochs parameter, which defaults to 10. In other words, if the error on the validation set does not improve for 10 consecutive epochs, training is terminated. This is also known as early stopping and it is widely used in training neural nets.
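For reference, here is a minimal sketch of how those arguments are passed in practice. The tiny XOR dataset and epoch limit are only there to make the example self-contained; if I recall correctly, trainUntilConvergence returns the per-epoch training and validation errors as two lists.

from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

# toy XOR dataset, purely for illustration
ds = SupervisedDataSet(2, 1)
ds.addSample((0, 0), (0,))
ds.addSample((0, 1), (1,))
ds.addSample((1, 0), (1,))
ds.addSample((1, 1), (0,))

net = buildNetwork(2, 3, 1)
trainer = BackpropTrainer(net, ds)

# verbose=True prints the loss after each epoch;
# training stops once the validation error has not improved for 10 consecutive epochs
train_errors, val_errors = trainer.trainUntilConvergence(
    maxEpochs=1000, verbose=True, continueEpochs=10, validationProportion=0.25)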

Related

Training loss goes down, but validation loss fluctuates wildly, when the same dataset is passed as training and validation dataset in Keras

(1) I am using the same preprocessing steps for the training and validation set.
(2) Passing the same dataset as both the training and validation set.
(3) Using the same number of steps per epoch (steps per epoch = dataset length / batch size) for training and validation.
The training loss goes down as expected, but the validation loss (on the same dataset used for training) fluctuates wildly.
My intent is to use a held-out dataset for validation, but I saw similar behavior on a held-out validation dataset. So I thought I'd pass the training dataset as the validation set (for testing purposes), and I still see the same behavior.
What can be going on?
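For what it's worth, here is a minimal sketch of the setup being described, with a placeholder model and random stand-in data; nothing below comes from the original code:

import numpy as np
from tensorflow import keras

# random stand-in data, only to make the sketch runnable
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# deliberately pass the training data as the validation set as well,
# as the question describes
history = model.fit(x, y, epochs=10, batch_size=32, validation_data=(x, y))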

Validation Split and Checkpoint Best Model in Keras

Let us use a validation split of 0.3 when fitting a Sequential model. What will be used for validation, the first or the last 30% of the samples?
Secondly, checkpointing the best model saves the best model weights in .hdf5 file format. Does this mean that, for a certain experiment, the saved model is the best tuned model?
For your first question, the last 30% of the samples will be used for validation.
From Keras documentation:
validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling
For your second question, I assume you're talking about ModelCheckpoint with save_best_only=True. In this case, the callback saves the weights of a given epoch only if the monitored quantity ('val_loss' by default) is better than the best value seen so far. If monitor is 'val_loss', the saved model should be the best-tuned model for that particular setting of hyperparameters, according to the validation loss.
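As an illustration only, here is a hedged sketch of both pieces together, using stand-in data and an arbitrary checkpoint filename; depending on your Keras version, the checkpoint extension may need to be .keras or .h5 instead of .hdf5:

import numpy as np
from tensorflow import keras

# stand-in data and model, only to make the sketch self-contained
x = np.random.rand(500, 10).astype("float32")
y = (x.mean(axis=1) > 0.5).astype("float32")
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# save the model only when the monitored validation loss improves
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_model.hdf5", monitor="val_loss", save_best_only=True)

# validation_split=0.3: the last 30% of the samples (before shuffling) are held out
model.fit(x, y, epochs=20, validation_split=0.3, callbacks=[checkpoint])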

Does Weka test results on a separate holdout set with 10CV?

I used 10-fold cross validation in Weka.
I know this usually means that the data is split into 10 parts: 90% training, 10% test, rotated 10 times.
I am wondering what Weka calculates the resulting AUC on. Is it the average over all 10 test sets? Or (and I hope this is true) does it use a holdout test set? I can't seem to find a description of this in the Weka book.
Weka averages the test results, and this is a better approach than a holdout set; I don't understand why you would hope for such an approach. If you hold out a test set (of what size?), your test would not be statistically significant: it would only show that, for the best parameters chosen on the training data, you achieved some score on an arbitrarily small part of the data. The whole point of cross validation (as an evaluation technique) is to use all the data for training and for testing in turns, so the resulting metric is an approximation of the expected value of the true evaluation measure. If you used a holdout test set it would not converge to that expected value (at least not in a reasonable time), and, even more importantly, you would have to choose another constant (how big a holdout set, and why?) and reduce the number of samples used for training, while cross validation was developed precisely because of the problem of datasets that are too small for both training and testing.
I performed cross validation on my own (made my own random folds and trained 10 classifiers) and checked the average AUC. I also checked whether the entire dataset was used to report the AUC (similar to when Weka outputs a decision tree under 10-fold).
The AUC for the credit dataset with a naive Bayes classifier as found by...
10-fold weka = 0.89559
10-fold mine = 0.89509
original train = 0.90281
There is a slight discrepancy between my average AUC and Weka's, but this could come from a failure to replicate the folds exactly (although I did try to control the seeds).
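Outside Weka, the same "average the per-fold AUC" idea can be sketched with scikit-learn; this uses a naive Bayes classifier on a stand-in dataset, not the credit data from the question:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)   # stand-in binary dataset

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=1).split(X, y):
    # train on 90% of the data, score on the remaining 10%
    clf = GaussianNB().fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))

# the reported figure is simply the mean of the 10 per-fold AUCs
print("mean AUC over 10 folds:", np.mean(aucs))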

net.trainParam.max_fail

Does anyone have any comment on how to choose the validation max_fail number?
As you may know, there is no unique criterion for choosing a particular number. I believe it could depend on the number of samples being used for training/validation.
However, it has a nontrivial role in stopping the training of the neural network.
You're right, this parameter is critical for NN training. In fact, the biggest disadvantage of NNs is the presence of many critical parameters that are strongly problem-dependent, such as the number of neurons and training-algorithm parameters like the learning rate or early stopping criteria (as in this case). In some applications, using a value of 3 or 30 is more or less the same, because after some point the NN's generalization does not improve anymore, so I suggest you try different values, including 0 and inf (i.e. no early stopping), and observe the training/validation error curves. Of course, DO NOT consider only a single run; do at least 5-10 runs for each configuration. At that point you can start to get an idea of the "error landscape".
Use net.trainParam.max_fail.
For the trainlm training function, you could type:
net.trainParam.max_fail = 10
if you want to increase the number of allowed validation failures to 10.
From Matlab Documentation
Maximum Validation Checks (max_fail) function parameter
max_fail is a training function parameter. It must be a strictly positive integer scalar.
max_fail is maximum number of validation checks before training is stopped.
This parameter is used by trainb, trainbfg, trainbr, trainc, traincgb, traincgf, traincgp, traingd, traingda, traingdm,
traingdx, trainlm, trainoss, trainrp, trains and trainscg

pyBrain trainUntilConvergence... help me understand this function please

I have several newbie questions about trainUntilConvergence in pyBrain.
trainUntilConvergence divides the data set into training and validation sets (defaulting to 25% used for validation). Is this correct?
Is the error reported (when verbose=True) after each epoch the error on the validation set or the error on the training set?
Is the network considered converged (thus stopping execution) when the validation set's error stops decreasing, or when the training set's error stops decreasing? (I assume it's the former, else why use a portion for validation?)
Is the section of data chosen for validation contiguous (e.g. the last x% of the data set) or does it choose x% of rows at random from the data?
Thanks!
According to the documentation, trainUntilConvergence takes several parameters, shown below. Yes, it defaults to using 25% of the data as the validation set.
trainUntilConvergence(dataset=None, maxEpochs=None, verbose=None, continueEpochs=10, validationProportion=0.25)
You can change the validationProportion parameter to another value as you see fit. The right validation proportion is debatable and there is no one-value-fits-all choice; you need to experiment to find what fits your case.
The trainUntilConvergence method trains on your data set until the error on the validation set has not decreased for a certain number of epochs. You can vary the number of epochs the trainer considers before stopping by changing the continueEpochs parameter, which defaults to 10. In other words, if the error on the validation set does not improve for 10 consecutive epochs, training is terminated. This is also known as early stopping and it is widely used in training neural nets.
Regarding whether the validation set is contiguous, I'm not sure, but logically it should be picked at random; see the sketch below.
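If I recall correctly, the trainer splits the dataset internally with splitWithProportion, which permutes the rows before splitting, so the validation samples are picked at random rather than as a contiguous block. You can reproduce the split yourself to inspect it; ds below is assumed to be an existing SupervisedDataSet:

# splitWithProportion(p) returns two datasets: roughly a fraction p of the samples
# and the rest, chosen at random (if I recall correctly) rather than as a contiguous slice
val_ds, train_ds = ds.splitWithProportion(0.25)
print(len(train_ds), len(val_ds))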
