Can I update the weights of a Keras neural net only if validation improves?

I am training a neural network in Keras and have hit a classic problem: my training accuracy keeps improving with more epochs, but my validation accuracy starts to decrease after 9 epochs (see figure).
I wonder if I can avoid the drop in validation accuracy by doing the following: have the Keras net accept the weight changes from an epoch only if that epoch improved the validation accuracy, and otherwise reset to the state before the epoch. I assume the validation accuracy starts to diverge largely because, after every epoch beyond 9, the weights of the network drift further away from anything that resembles the validation data.
So, is my suggestion good practice, and can I achieve it in Keras (are there callbacks or options that let me update the net only when validation improves)?
Side question: does my suggestion perhaps violate the principle of "don't use your validation data for training"? Because I would implicitly be making the performance of the neural net a function of my validation data.

The point of the validation set is to give you an idea of how well the model you learn from the training data generalizes. You don't HAVE to have a validation dataset. If your validation data is a random sample from the same data as your training set, then your best bet is probably to modify your architecture.
In short, if you want your model to train on your validation data, then train the model on the training set, take the resulting model, and continue training it on the validation data (i.e. make the validation data training data). This obviously defeats the point of having a validation set.
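For completeness: Keras has no callback that rolls the weights back after every epoch that fails to improve, but the built-in ModelCheckpoint (with save_best_only=True) and EarlyStopping (with restore_best_weights=True) callbacks come close, since they keep only the weights from the best validation epoch. Below is a minimal sketch with a toy model and synthetic data; the architecture, monitored metric and patience value are placeholders, not a recommendation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data standing in for the real problem.
rng = np.random.default_rng(0)
x = rng.random((1000, 20)).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Write a checkpoint only when validation accuracy reaches a new best.
    keras.callbacks.ModelCheckpoint("best_model.keras",
                                    monitor="val_accuracy",
                                    save_best_only=True),
    # Stop when validation accuracy stalls and restore the best weights.
    keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5,
                                  restore_best_weights=True),
]

model.fit(x, y, validation_split=0.2, epochs=50, callbacks=callbacks)
```

Note that this still selects a model based on the validation metric, so the side question's caveat about implicitly fitting to the validation data applies here as well.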

Related

80-20 or 80-10-10 for training machine learning models?

I have a very basic question.
1) When is it recommended to hold out part of the data for validation, and when is it unnecessary? For example, when can we say it is better to have an 80% training, 10% validation and 10% testing split, and when can we say it is enough to have a simple 80% training / 20% testing split?
2) Also, does k-fold cross-validation go together with the simple split (training/testing)?
I find it most valuable to have a training and validation set when I have a limited-size data set; the validation set is essentially a test set anyway. The reason is that you want your model not only to reach high accuracy on the data it is trained on but also to extrapolate to high accuracy on data it has not seen before, and the validation set lets you determine whether that is the case. I generally take at least 10% of the data set as a validation set, and it is important to select the validation data randomly so that its probability distribution matches that of the training set.

Next, I monitor the validation loss and save the model with the lowest validation loss. I also use an adjustable learning rate. Keras has two useful callbacks for this purpose, ModelCheckpoint and ReduceLROnPlateau. Documentation is here.

With a validation set you can monitor the validation loss during training and ascertain whether your model is training properly (training accuracy) and whether it is extrapolating properly (validation loss). On average, the validation loss should decrease as the model accuracy increases. If the validation loss starts to increase while the training accuracy is high, your model is overfitting and you can take remedial action such as adding dropout layers, adding regularizers, or reducing your model complexity. Documentation for that is here and here. To see why I use an adjustable learning rate, see the answer to a Stack Overflow question here.
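As a rough sketch of the two callbacks mentioned above (assuming a compiled Keras model named model and training arrays x_train, y_train already exist; the file name, factor and patience values are arbitrary):

```python
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    # Save the model each time the validation loss reaches a new minimum.
    ModelCheckpoint("lowest_val_loss.keras", monitor="val_loss",
                    save_best_only=True),
    # Halve the learning rate when the validation loss stops improving.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, verbose=1),
]

# validation_split takes the *last* 10% of the arrays, so shuffle the data
# beforehand if it has any ordering, to keep the validation sample random.
model.fit(x_train, y_train, validation_split=0.1, epochs=100,
          callbacks=callbacks)
```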

Alternatives to validate Multi Linear regression time series

I am using multiple linear regression to do sales quantity forecasting in retail. Due to practical issues, I cannot use ARIMA or neural networks.
I split the historical data into training and validation sets. Using a walk-forward validation method would be computationally quite expensive at this point, so I take the x weeks preceding the current date as my validation set; the time series prior to that is my training set. The problem I am noticing with this method is that accuracy is far higher during the validation period than for the actual future predictions. That is, the further we move past the end of the training period, the less accurate the forecast becomes. How can I best control this problem?
Perhaps a smaller validation period would allow the training period to extend closer to the current date and hence give a more accurate forecast, but this hurts the value of the validation.
Another thought is to cheat and use both the training and validation portions of the historical data for training. Since I am not using neural nets, the selected algorithm should not overfit; please correct me if this assumption is wrong.
Any other thoughts or solutions would be most welcome.
If you're not using ARIMA or a DNN, how about using rolling windows of regressions to train and test on the historical data?
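A minimal sketch of the rolling-window idea, using scikit-learn's LinearRegression on synthetic weekly data; the window length, horizon and features are placeholders, not a recommendation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic weekly data: X holds the regressors, y the sales quantities.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # 200 weeks, 5 features
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

window = 104   # train on the two preceding years
horizon = 1    # forecast one week ahead

errors = []
for end in range(window, len(y) - horizon + 1):
    # Refit on the most recent `window` weeks, then score the next week.
    model = LinearRegression()
    model.fit(X[end - window:end], y[end - window:end])
    pred = model.predict(X[end:end + horizon])
    errors.append(np.abs(pred - y[end:end + horizon]).mean())

print("mean absolute error over rolling windows:", np.mean(errors))
```

Averaging the error across all windows gives a validation estimate that reflects how the model behaves just past the end of its training period, rather than over a single fixed validation block far from the forecast date.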

Validation loss when using Dropout

I am trying to understand the effect of dropout on validation Mean Absolute Error (non-linear regression problem).
Figures: (1) without dropout, (2) with dropout of 0.05, (3) with dropout of 0.075.
Without any dropout, the validation loss is higher than the training loss, as shown in (1). My understanding is that for a good fit the validation loss should be only slightly higher than the training loss.
I carefully increased the dropout so that the validation loss comes close to the training loss, as seen in (2). Dropout is applied only during training and not during validation, hence the validation loss can end up lower than the training loss.
Finally, the dropout was increased further and the validation loss again became higher than the training loss, as in (3).
Which of these three should be called a good fit?
Following Marcin Możejko's response, I ran predictions on three test sets, as shown in (4). The Y axis shows RMS error instead of MAE. The model without dropout gave the best result.
Well, this is a really good question. In my opinion, the lowest validation score (confirmed on a separate test set) is the best fit. Remember that in the end the performance of your model on totally new data is the most crucial thing, and the fact that it performed even better on the training set is not so important.
Moreover, I think your model might generally be underfitting: you could try extending it, e.g. with more layers or neurons, and then prune it a little using dropout to prevent it from memorizing examples.
If my hypothesis turns out to be false, remember that it might still be that certain data patterns are present only in the validation set (this happens relatively often with medium-sized datasets), which causes the divergence between training and test loss. Moreover, even though your loss values seem to have saturated in the case without dropout, there is still room for improvement from simply increasing the number of epochs, as the losses still appear to be trending downward.
Another technique I recommend you try is reducing the learning rate on plateau (using, for example, this callback), as your model seems to need refinement with a lower learning rate.
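For reference, this is roughly what adding dropout to a small Keras regression model looks like; the layer widths, input dimension and rate of 0.05 are placeholders rather than the asker's actual architecture. Because the Dropout layers are active only during training, the training loss is computed on a handicapped network while the validation loss is not, which is why moderate dropout can push the validation loss below the training loss, as described in (2).

```python
from tensorflow import keras
from tensorflow.keras import layers

dropout_rate = 0.05  # the quantity being tuned in the question

model = keras.Sequential([
    keras.Input(shape=(10,)),            # placeholder input dimension
    layers.Dense(64, activation="relu"),
    layers.Dropout(dropout_rate),        # active during training only
    layers.Dense(64, activation="relu"),
    layers.Dropout(dropout_rate),
    layers.Dense(1),                     # linear output for regression
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```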

Does validation accuracy/loss impact training in caffe

I have a simple question about the validation set in Caffe: does the validation set have any impact on training? I know that the validation set is used to check whether the network is overfitting, and as I understand it the validation set has no impact on the weight updates, but does it have some kind of impact on selecting or modifying hyper-parameters, or is it just for the user to see and estimate how well the network has learned?
No, the results of the validation set are not used by the neural network during training to adjust any hyperparameters. Using the validation set during training is the same as applying the network at some point in time to predict values for the validation set, and then scoring how well it did.
You might decide that you want to run the same network training procedure many times over using different values for the hyperparameters. In its fully exhaustive form, that would mean doing a grid search over the hyperparameter space, with many separate training sessions of separate networks. In practice, a fully exhaustive grid search is not a great idea with neural networks because the number of hyperparameter combinations can be extremely large.
Often with neural networks you can instead tune one hyperparameter at a time until each seems "about right". Of course this might not get you the absolute best result, but it's not a bad first approach.
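A toy sketch of the exhaustive version for comparison; train_and_validate is a hypothetical placeholder for one full training run that returns a validation score, and the grids are invented. Even with only two hyperparameters of three values each this is already nine full training runs, which is why the grid explodes quickly.

```python
from itertools import product

def train_and_validate(lr, hidden_units):
    """Hypothetical stand-in: build the network, train it on the training
    set, and return its score on the validation set."""
    return -abs(lr - 1e-3) * 100 - abs(hidden_units - 128) / 1000.0

learning_rates = [1e-2, 1e-3, 1e-4]   # invented search grid
hidden_units = [64, 128, 256]

best_score, best_params = float("-inf"), None
for lr, units in product(learning_rates, hidden_units):
    score = train_and_validate(lr, units)
    if score > best_score:
        best_score, best_params = score, {"lr": lr, "hidden_units": units}

print("best:", best_params, "score:", best_score)
```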

Is validation set necessary in neural networks while cross validation alone works well in regression based models?

Do we need a validation set in neural networks because neural networks do not always converge to the same answer?
I have never heard of a validation set for models such as regression or ensemble learning; we cross-validate the entire dataset, dividing it into k folds of training and test sets. For neural networks, however, we also need a validation set that we extract from the training set. I now know why we need the validation set in neural networks; what I need to know is why we don't follow the same procedure in, say, logistic regression.
There is a somewhat good discussion on the purpose of the training, testing and validation sets here.
As for the need for a testing set: if you are not modifying any parameters of your model, there probably isn't much need for a second set to test on. A neural network, however, has a large number of adjustable parameters (hidden layers, number of neurons, training runs, epochs, momentum, learning rate, etc.) whose choices can be influenced by the results on the validation set. The additional testing set can then be used to confirm that the model generalises well on unseen data after it has been tuned (further alterations should not occur once the testing set has been used).
I have also used a testing set in the past for ensemble configurations where the model had some adjustable parameters (number of ensemble members, combination parameters), and this likewise verified its ability to estimate unseen data after tuning.
Hope this helps!
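To make the relationship concrete, here is a sketch, with synthetic data and an invented tiny network, of k-fold cross-validation where each training fold additionally gives up a slice as the validation set used during fitting, while the held-out fold provides the final score:

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic binary-classification data (placeholder for a real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(8,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

fold_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = build_model()
    # Part of each training fold is held out as the validation set that
    # guides training; the test fold is never touched until evaluation.
    model.fit(X[train_idx], y[train_idx], validation_split=0.2,
              epochs=20, verbose=0)
    loss, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    fold_scores.append(acc)

print("mean test accuracy across folds:", np.mean(fold_scores))
```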
