Validation loss when using Dropout

I am trying to understand the effect of dropout on validation Mean Absolute Error (non-linear regression problem).
[Figures 1-3: without dropout, with dropout of 0.05, with dropout of 0.075]
Without any dropout, the validation loss is higher than the training loss, as shown in figure 1. My understanding is that for a good fit the validation loss should be only slightly higher than the training loss.
I carefully increased the dropout until the validation loss came close to the training loss, as seen in figure 2. Dropout is applied only during training and not during validation, hence the validation loss is lower than the training loss.
Finally, the dropout was increased further and the validation loss again became higher than the training loss, as shown in figure 3.
Which of these three should be considered a good fit?
Following Marcin Możejko's response, I ran predictions on three test sets, as shown in figure 4. The y-axis shows RMS error instead of MAE. The model without dropout gave the best result.
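For context, a minimal sketch (not the asker's actual code; the layer sizes and toy data are assumptions) of how the three configurations compared above differ only in the dropout rate:

```python
# Hypothetical sketch: a small Keras regressor where the dropout rate is the
# only thing that changes between the three runs discussed above.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(dropout_rate=0.0, n_features=10):
    # Dropout is active only during training; at validation/inference time the
    # full network is used, which is why validation loss can dip below training loss.
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1),  # single continuous output for regression
    ])
    model.compile(optimizer="adam", loss="mae", metrics=["mae"])
    return model

# Toy data just to make the sketch runnable; the real dataset is not shown in the question.
X = np.random.rand(1000, 10)
y = X.sum(axis=1) + 0.1 * np.random.randn(1000)

for rate in (0.0, 0.05, 0.075):  # the three configurations compared above
    model = build_model(dropout_rate=rate)
    model.fit(X, y, validation_split=0.2, epochs=5, verbose=0)
```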

Well, this is a really good question. In my opinion, the lowest validation score (confirmed on a separate test set) is the best fit. Remember that in the end the performance of your model on totally new data is the most crucial thing, and the fact that it performed even better on the training set is not so important.
Moreover, I think your model might generally be underfitting. You could try extending it, e.g. with more layers or neurons, and then prune it a little using dropout in order to prevent it from memorizing examples.
If my hypothesis turns out to be false, remember that it is still possible that certain data patterns are present only in the validation set (this happens relatively often with medium-sized datasets), which causes the divergence of train and test loss. Moreover, even though your loss values seem to have saturated in the case without dropout, there is still room for improvement by simply increasing the number of epochs, as the losses still appear to be trending downward.
Another technique I recommend trying is reducing the learning rate on plateau (for example with Keras' ReduceLROnPlateau callback), as your model seems to need refinement with a lower learning rate.
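A minimal sketch of that suggestion using the Keras ReduceLROnPlateau callback; the factor, patience, and monitored metric below are illustrative defaults, not values from the answer:

```python
# Reduce the learning rate when the validation loss stops improving.
from tensorflow.keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",  # watch the validation loss
    factor=0.5,          # halve the learning rate when it plateaus
    patience=5,          # wait 5 epochs without improvement before reducing
    min_lr=1e-6,
)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, callbacks=[reduce_lr])
```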

Related

How to handle overfitting properly?

I am training a CNN regression model on a dataset of 34,560 samples. I already have a training error rate below 5%, but the validation error rate is over 60%. It seems to be an overfitting problem. I have tried the following four ways to solve it, but none of them works well:
Increase the dataset size
Reduce the model complexity
Add a dropout layer before the output layer
Use L2 regularization / weight decay
Probably I did not use them in the right way. Can someone give some details on these methods? Or are there other ways to solve the overfitting problem?
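For reference, a minimal Keras sketch of points 3 and 4 from the list above, dropout before the output layer plus L2 weight decay; the architecture, input shape, and coefficients are assumptions, not the asker's model:

```python
# Hedged sketch: dropout before the output layer and L2 regularization on the weights.
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                              # assumed input shape
    layers.Conv2D(32, 3, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),     # L2 / weight decay
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),                                         # dropout just before the output layer
    layers.Dense(1),                                             # regression output
])
model.compile(optimizer="adam", loss="mse")
```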

80-20 or 80-10-10 for training machine learning models?

I have a very basic question.
1) When is it recommended to hold out part of the data for validation, and when is it unnecessary? For example, when can we say it is better to have an 80% training, 10% validation and 10% testing split, and when can we say it is enough to have a simple 80% training and 20% testing split?
2) Also, does using k-fold cross-validation go with the simple split (training-testing)?
I find it more valuable to have a training and validation set if I have a limited-size dataset; the validation set is essentially a test set anyway. The reason is that you want your model to extrapolate from having high accuracy on the data it is trained on to also having high accuracy on data it has not seen before, and the validation set allows you to determine whether that is the case. I generally take at least 10% of the dataset and make it a validation set. It is important that you select the validation data randomly so that its probability distribution matches that of the training set.
Next, I monitor the validation loss and save the model with the lowest validation loss. I also use an adjustable learning rate. Keras has two useful callbacks for this purpose, ModelCheckpoint and ReduceLROnPlateau (see the Keras documentation). With a validation set you can monitor the validation loss during training and ascertain whether your model is training properly (training accuracy) and whether it is extrapolating properly (validation loss). The validation loss should, on average, decrease as the model accuracy increases. If the validation loss starts to increase while training accuracy is high, your model is overfitting, and you can take remedial action such as adding dropout layers, adding regularizers, or reducing your model complexity. To see why I use an adjustable learning rate, see the answer to a related Stack Overflow question.
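A minimal sketch of that callback setup (the file path, factor, and patience values are illustrative assumptions): save the model with the lowest validation loss and reduce the learning rate when the validation loss plateaus.

```python
# Save the best-scoring model and adjust the learning rate during training.
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    ModelCheckpoint(
        "best_model.keras",       # where to store the best model (assumed path)
        monitor="val_loss",
        save_best_only=True,      # keep only the lowest-validation-loss weights
    ),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=callbacks)
```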

Can I update weights of keras neural net only if validation improves?

I am training a neural network in Keras and I have hit a classic limit: my training accuracy improves with increasing epochs, but my validation accuracy decreases after 9 epochs (see figure).
I wonder if I can avoid the decrease in validation accuracy by doing the following: have the Keras net accept the weight changes from an epoch only if that epoch led to an improvement in validation accuracy, and otherwise reset to the state before the epoch. I assume the validation accuracy starts to diverge in large part because, after each epoch beyond 9, the weights of the neural net drift further away from anything that fits the validation data.
So, is my suggestion good practice, and can I achieve it in Keras (are there callbacks or options that allow me to update the net only if the validation improved)?
Side question: Does my suggestion maybe violate the principle of "don't use your validation data for training"? Because I am implicitly making the performance of the neural net a function of my validation data.
The point of the validation set is to give you an idea of the generalizability your model achieves by learning from the training data. You don't HAVE to have a validation dataset. If your validation data is a random sample of your training data, then your best bet is probably modifying your architecture.
In short, if you want your model to train based on your validation data, then train the model on the training set, then take the resulting model, and train it on the validation data (i.e. make the validation data the training data). This obviously defeats the point of having a validation set.
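As an aside, and hedged: Keras has no option to reject an epoch's weight updates, but its built-in callbacks can retain the best-scoring weights seen during training, which yields a similar final model without training on the validation data. A minimal sketch:

```python
# Sketch only: keep the best weights seen so far rather than skipping updates.
# The monitored name "val_accuracy" assumes the model was compiled with
# metrics=["accuracy"]; older Keras versions name it "val_acc".
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once validation accuracy has not improved for 5 epochs and roll
    # back to the best weights observed during training.
    EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True),
    # Alternatively (or additionally), keep a copy of the best model on disk.
    ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                    mode="max", save_best_only=True),
]

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=50, callbacks=callbacks)
```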

What is the difference between test and validation specifically in Mask-R-CNN?

I have my own image dataset and use Mask R-CNN for training. There you divide your dataset into train, validation, and test.
I want to know the difference between validation and test.
I know that validation in general is used to see the quality of the NN after each epoch. Based on that you can see how good the NN is and whether overfitting is happening.
But I want to know whether the NN learns from the validation set.
From the training set the NN learns after each image, adjusting its neurons to reduce the loss. And after the NN has finished learning, we use the test set to see how well it really does on new, unseen images.
But what exactly happens in Mask R-CNN with the validation set? Is the validation set only there for seeing the results? Or will some parameters be adjusted based on the validation result to avoid overfitting? And even if that is the case, how much influence does the validation set have on the parameters? Will the neurons themselves be adjusted or not?
If the influence is very small, then I will make the validation set equal to the test set, because I don't have many images (800).
So basically I want to know the difference between test and validation in Mask R-CNN, that is, how and how much the validation set influences the NN.
The model does not learn from the validation set. The validation set is just used to give an approximation of the generalization error at any epoch, but also, crucially, it is used for hyperparameter optimization: I can iterate over several different hyperparameter configurations and evaluate each of them on the validation set.
Then, after we choose the best model based on the validation set accuracies, we can calculate the test error on the test set. Ideally there is not a large difference between test set and validation set accuracies. Sometimes your model can essentially 'overfit' to the validation set if you iterate over lots of different hyperparameters.
Reserving another set, the test set, to evaluate on after this validation set evaluation is a luxury you may have if you have a lot of data. Often you may not have enough labelled data for it even to be worth holding back a separate test set.
Lastly, these things are not specific to Mask R-CNN. Validation sets never affect the training of a model, i.e. the weights or biases. Validation sets, like test sets, are purely for evaluation purposes.
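A generic sketch of that train/validation/test workflow, using plain scikit-learn rather than Mask R-CNN (the toy data and hyperparameter grid are made up):

```python
# Pick hyperparameters on the validation set, then touch the test set exactly once.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = np.random.rand(800, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * np.random.randn(800)

# 60/20/20 split into train, validation, and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_alpha, best_score = None, float("inf")
for alpha in (0.01, 0.1, 1.0, 10.0):                            # hyperparameter search...
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    score = mean_squared_error(y_val, model.predict(X_val))     # ...scored on validation
    if score < best_score:
        best_alpha, best_score = alpha, score

# The test set is used only after the hyperparameters are fixed.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))
```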

Alternatives to validate Multi Linear regression time series

I am using multiple linear regression to do sales quantity forecasting in retail. Due to practical issues, I cannot use ARIMA or neural networks.
I split the historical data into train and validation sets. Using a walk-forward validation method would be computationally quite expensive at this point, so I take the x weeks preceding the current date as my validation set; the time series prior to x is my training set. The problem I notice with this method is that accuracy is far higher during the validation period than for the future prediction. That is, the further we move from the end of the training period, the less accurate the prediction/forecast. How best can I control this problem?
Perhaps a smaller validation period will allow the training period to come closer to the current date and hence provide a more accurate forecast, but this hurts the value of validation.
Another thought is to cheat and use both the training and validation historical data during training. As I am not using neural nets, the selected algorithm should not overfit. Please correct me if this assumption is not right.
Any other thoughts or solutions would be most welcome.
Thanks
Regards,
Adeel
If you're not using ARIMA or a DNN, how about using rolling windows of regressions to train and test on the historical data? A sketch of the idea follows.
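A hedged sketch of that rolling-window idea (the window lengths and toy data below are assumptions): repeatedly fit a linear regression on a fixed-length window of past weeks and evaluate it on the weeks immediately after the window.

```python
# Rolling-window evaluation for a linear regression sales forecaster.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

n_weeks, n_features = 200, 4
X = np.random.rand(n_weeks, n_features)           # weekly predictors (toy data)
y = X @ np.array([2.0, -1.0, 0.5, 1.5]) + np.random.randn(n_weeks)

train_window, test_window = 52, 4                 # fit on 52 weeks, test on the next 4
errors = []
for start in range(0, n_weeks - train_window - test_window, test_window):
    train_idx = slice(start, start + train_window)
    test_idx = slice(start + train_window, start + train_window + test_window)
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

# Averaging the per-window errors gives a validation estimate that reflects
# how the model behaves as it moves further from its training data.
print("mean rolling MAE:", np.mean(errors))
```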
