How is cross validation implemented?

I'm currently trying to train a neural network using cross validation, but I'm not sure if I'm getting how cross validation works. I understand the concept, but I can't totally see yet how the concept translates to code implementation. The following is a description of what I've got implemented, which is more-or-less guesswork.
I split the entire data set into K-folds, where 1 fold is the validation set, 1 fold is the testing set, and the data in the remaining folds are dumped into the training set.
Then, I loop K times, each time reassigning the validation and testing sets to other folds. Within each loop, I continuously train the network (update the weights) using only the training set until the error produced by the network meets some threshold. However, the error that is used to decide when to stop training is produced using the validation set, not the training set. After training is done, the error is once again produced, but this time using the testing set. This error from the testing set is recorded. Lastly, all the weights are re-initialized (using the same random number generator used to initialize them originally) or reset in some fashion to undo the learning that was done before moving on to the next set of validation, training, and testing sets.
Once all K loops finish, the errors recorded in each iteration of the K-loop are averaged.
I have bolded the parts I'm most confused about. Please let me know if I made any mistakes!

I believe your implementation of Cross Validation is generally correct. To answer your questions:
"However, the error that is used to decide when to stop training is produced using the validation set, not the training set."
You want to use the error on the validation set because it reduces overfitting. This is the reason you always want to have a validation set. If you stopped on the training error instead, you could push the threshold lower and your algorithm would achieve a higher training accuracy than validation accuracy. However, it would then generalize poorly to the unseen examples in the real world, which are what your validation set is supposed to model.
"Lastly, all the weights are re-initialized (using the same random number generator used to initialize them originally) or reset in some fashion to undo the learning that was done before moving on to the next set of validation, training, and testing sets."
The idea behind cross validation is that each iteration is like training the algorithm from scratch. This is desirable since by averaging your validation score, you get a more robust value. It protects against the possibility of a biased validation set.
My only suggestion would be to not use a test set inside your cross validation scheme. Since your validation set already models unseen examples, a separate test set during cross validation is redundant. I would instead split the data into a training set and a test set before you start cross validation, and then not touch the test set until you want an objective score for your algorithm.
You could use your cross validation score as an indication of performance on unseen examples. I assume, however, that you will be choosing parameters based on this score, which optimizes your model for your training set. Again, the possibility arises that this does not generalize well to unseen examples, which is why it is good practice to keep a separate, unseen test set that is only used after you have optimized your algorithm.
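To make the scheme concrete, here is a minimal sketch under some loud assumptions: a toy one-variable linear model trained by gradient descent stands in for the neural network, early stopping uses a patience counter rather than a fixed error threshold, and every name in it (`kfold_indices`, `train_with_early_stopping`, `patience`, the choice of 5 folds) is made up for illustration. It only shows where each data set is used, not how a real network should be trained.

```python
import random

def kfold_indices(n, k, rng):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, val, rng, lr=0.02, max_epochs=1000, patience=10):
    """Fit y = w*x + b by gradient descent; stop once validation error stalls."""
    # Fresh random weights on every call, so each fold starts from scratch.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    best = (mse(w, b, val), w, b)
    stale = 0
    for _ in range(max_epochs):
        # The weights are updated from the training folds only.
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        gb = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * gw, b - lr * gb
        # The validation fold only decides when to stop.
        err = mse(w, b, val)
        if err < best[0]:
            best, stale = (err, w, b), 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best  # (validation error, w, b) at the best epoch

rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in [i / 10 for i in range(50)]]
rng.shuffle(data)
test, dev = data[:10], data[10:]   # the test set is put aside before CV starts

folds = kfold_indices(len(dev), 5, rng)
val_errors = []
for val_idx in folds:
    held_out = set(val_idx)
    train = [dev[j] for j in range(len(dev)) if j not in held_out]
    val = [dev[j] for j in val_idx]
    err, w, b = train_with_early_stopping(train, val, rng)
    val_errors.append(err)

cv_score = sum(val_errors) / len(val_errors)   # averaged over the K folds
```

The `test` list is deliberately never read here; it would only be scored once, after all tuning based on `cv_score` is finished.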

Related

What is the difference between test and validation specifically in Mask-R-CNN?

I have my own image dataset and use Mask-R-CNN for training. There you divide your dataset into train, validation and test.
I want to know the difference between validation and test.
I know that validation in general is used to see the quality of the NN after each epoch. Based on that you can see how good the NN is and if overfitting is happening.
But I want to know if the NN learns based on the validation set.
Based on the training set, the NN learns after each image and adjusts each neuron to reduce the loss. And after the NN has finished learning, we use the test set to see how good our NN really is with new, unseen images.
But what exactly happens in Mask-R-CNN based on the validation set? Is the validation set only there for seeing the results? Or will some parameters be adjusted based on the validation result to avoid overfitting? And even if this is the case, how much influence does the validation set have on the parameters? Will the neurons themselves be adjusted or not?
If the influence is very, very small, then I will choose the validation set equal to the test set, because I don't have many images (800).
So basically I want to know the difference between test and validation in Mask-R-CNN, that is, how and how much the validation set influences the NN.
The model does not learn from the validation set. The validation set is just used to give an approximation of the generalization error at any epoch but also, crucially, for hyperparameter optimization. So I can iterate over several different hyperparameter configurations and evaluate the accuracy of each on the validation set.
Then after we choose the best model based on the validation set accuracies we can then calculate the test error based on the test set. Ideally there is not a large difference between test set and validation set accuracies. Sometimes your model can essentially 'overfit' to the validation set if you iterate over lots of different hyperparameters.
Reserving another set, the test set, to evaluate on after this validation set evaluation is a luxury you may have if you have a lot of data. Lots of times you may be lacking enough labelled data for it even to be worth having a separate test set held back.
Lastly, these things are not specific to Mask R-CNN. Validation sets are never used to update the parameters of a model, i.e. the weights or biases. Validation sets, like test sets, are purely for evaluation purposes.
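As a rough illustration of that workflow (not Mask R-CNN itself), the sketch below assumes a toy k-nearest-neighbour regressor whose only hyperparameter is `k`. The data, the candidate values and the function names are all invented for the example. The training set is the only data the model is built from, the validation set picks `k`, and the test set is scored once at the end.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Error on `data` of a k-NN model built from `train` only."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)
points = [(x, x * x + rng.gauss(0, 0.5))
          for x in (rng.uniform(-3, 3) for _ in range(120))]
train, val, test = points[:80], points[80:100], points[100:]

# Hyperparameter selection: every candidate is scored on the validation set.
candidates = [1, 3, 5, 9, 15]
val_scores = {k: mse(train, val, k) for k in candidates}
best_k = min(val_scores, key=val_scores.get)

# The test set is used once, after the choice has been made.
test_error = mse(train, test, best_k)
```

Because `best_k` was chosen to look good on `val`, `val_scores[best_k]` tends to be optimistic, which is the "overfitting to the validation set" effect mentioned above; `test_error` is not affected by that choice.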

Which model to pick from K fold Cross Validation

I was reading about cross validation and about how it is used to select the best model and estimate parameters, but I did not really understand the meaning of it.
Suppose I build a linear regression model and go for 10-fold cross validation. I think each of the 10 models will have different coefficient values. Now, from these 10 different models, which should I pick as my final model or parameter estimates?
Or do we use cross validation only for the purpose of finding an average error (the average of 10 models in our case) and comparing it against another model?
If you build a linear regression model and go for 10-fold cross validation, indeed each of the 10 models will have different coefficient values. The reason why you use cross validation is that you get a robust idea of the error of your linear model, rather than evaluating it on a single train/test split, which could be unlucky or too lucky. CV is more robust because it is very unlikely that all ten splits are lucky or all ten are unlucky.
Your final model is then trained on the whole training set - this is where your final coefficients come from.
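A small sketch of this, assuming synthetic one-variable data and a hand-written least-squares fit (the names `fit_line`, `fold_models` and `final_model` are only illustrative): the ten fold models are used to estimate the error and then discarded, and the coefficients you keep come from one last fit on all the training data.

```python
import random

def fit_line(data):
    """Closed-form least squares for y = a*x + b; returns (a, b)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx
    return a, my - a * mx

def mse(model, data):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
data = [(x, 3.0 * x - 2.0 + rng.gauss(0, 1.0))
        for x in (rng.uniform(0, 10) for _ in range(100))]

k = 10
folds = [data[i::k] for i in range(k)]
fold_models, fold_errors = [], []
for i in range(k):
    # Train on the other nine folds, evaluate on fold i.
    train = [p for j, f in enumerate(folds) if j != i for p in f]
    model = fit_line(train)
    fold_models.append(model)            # ten slightly different (a, b) pairs
    fold_errors.append(mse(model, folds[i]))

cv_error = sum(fold_errors) / k          # the robust error estimate
final_model = fit_line(data)             # the coefficients you actually keep
```

None of the entries in `fold_models` is "the" model; they exist only to produce `cv_error`.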
Cross-validation is used to see how good your model's predictions are. It's a pretty smart way of running multiple tests on the same data by splitting it, as you probably know (i.e. it is good to use if you don't have much training data).
As an example, it might be used to make sure you aren't overfitting the function. So basically, when you have finished your function you try it with cross-validation, and if you see that the error grows a lot somewhere, you go back to tweaking the parameters.
Edit:
Read the wikipedia for deeper understanding of how it works: https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
You are basically confusing grid search with cross-validation. The idea behind cross-validation is to check how well a model will perform in, say, a real-world application. So we repeatedly split the data into different train/validation partitions and validate its performance on each. It should be noted that the hyperparameters of the model remain the same throughout the cross-validation process.
In grid search we try to find the best possible parameters, i.e. those that give the best results over a specific split of the data (say 70% train and 30% test). So in this case, for different parameter combinations of the same model, the dataset split remains constant.
Cross Validation is mainly used for the comparison of different models.
For each model, you can compute the average generalization error over the k validation sets. Then you can choose the model with the lowest average generalization error as your optimal model.
Cross-Validation or CV allows us to compare different machine learning methods and get a sense of how well they will work in practice.
Scenario-1 (Directly related to the question)
Yes, CV can be used to find out which method (SVM, Random Forest, etc.) will perform best, and we can pick that method to work with further.
(For each method, a model is trained and evaluated on every fold, the metric is averaged over the folds, and the method with the best average metric is selected.)
After finding the best method or the best parameters, we can train (or retrain) our model on the whole training dataset.
Parameters or coefficients can be determined by grid search techniques.
Scenario-2:
Suppose you have a small amount of data and you want to perform training, validation and testing on it. Dividing such a small amount of data into three sets reduces the number of training samples drastically, and the result will depend on the particular choice of training and validation sets.
CV will come to the rescue here. In this case, we don't need the validation set but we still need to hold the test data.
A model will be trained on k-1 folds of the training data and the remaining fold will be used for validating the model. The mean and standard deviation of the metric over the folds are then computed to see how well the model will perform in practice.
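A sketch of both scenarios, under the assumption of two toy "methods" (a constant mean predictor and a straight-line fit) standing in for something like SVM versus Random Forest. The data and all names are illustrative. Each method is scored on the same k folds, and the mean and standard deviation of the fold errors are reported.

```python
import random
import statistics

def fit_mean(train):
    """Baseline method: always predict the mean target."""
    m = statistics.mean(y for _, y in train)
    return lambda x: m

def fit_line(train):
    """Least-squares straight line."""
    mx = statistics.mean(x for x, _ in train)
    my = statistics.mean(y for _, y in train)
    a = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: a * (x - mx) + my

def cv_errors(fit, data, k):
    """Train on k-1 folds, validate on the remaining one, for each fold."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i, val in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        predict = fit(train)
        errors.append(statistics.mean((predict(x) - y) ** 2 for x, y in val))
    return errors

rng = random.Random(7)
data = [(x, 0.5 * x + rng.gauss(0, 1.0))
        for x in (rng.uniform(0, 20) for _ in range(60))]

methods = {"mean": fit_mean, "line": fit_line}
summary = {}
for name, fit in methods.items():
    errs = cv_errors(fit, data, k=5)
    summary[name] = (statistics.mean(errs), statistics.stdev(errs))

best_method = min(summary, key=lambda name: summary[name][0])
```

The standard deviation in `summary` gives a feel for how much the score depends on the particular split, which a single train/validation split cannot show.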

Does validation accuracy/loss impact training in caffe

I have a simple question about the validation set in Caffe: does the validation set have any impact on training? I know that you use a validation set to check whether the network is overfitting, and as I understand it the validation set has no impact on the weight updates. But does it have some kind of impact on selecting or modifying hyperparameters, or is it just for the user to see and estimate how well the network has learned?
No, the results of the validation set are not used by the neural network during training to adjust any hyperparameters. Using the validation set during training is the same as applying the network at some point in time to predict values for the validation set, and then scoring how well it did.
You might decide that you want to run the same network training procedure many times over using different values for the hyperparameters. In its fully exhaustive form, that would mean doing a grid search over the hyperparameter space, with many different training sessions of separate networks. In practice, it's not a great idea to do a fully exhaustive grid search with neural networks, because the number of hyperparameter combinations can be extremely large.
Often with neural networks you can tune one parameter at a time until they each seem "about right". Of course this might not get you the absolute best result, but it's not a bad first approach.
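The one-at-a-time approach can be sketched as follows. Note that `toy_validation_error` is a made-up stand-in: in practice it would train a network with the given hyperparameters and return its validation error. The hyperparameter names and grids are likewise just examples and are not Caffe settings.

```python
import math

def toy_validation_error(p):
    """Hypothetical stand-in for 'train with p, then score on validation'."""
    return ((math.log10(p["lr"]) + 2) ** 2
            + ((p["hidden"] - 64) / 64) ** 2
            + (p["dropout"] - 0.2) ** 2)

def tune_one_at_a_time(grid, start, score, sweeps=2):
    """Vary one hyperparameter at a time, keeping the others fixed."""
    best = dict(start)
    best_err = score(best)
    for _ in range(sweeps):
        for name, values in grid.items():
            for v in values:
                trial = dict(best, **{name: v})   # change only this one setting
                err = score(trial)
                if err < best_err:
                    best, best_err = trial, err
    return best, best_err

grid = {
    "lr": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden": [16, 32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
}
start = {"lr": 1e-4, "hidden": 16, "dropout": 0.0}
best, best_err = tune_one_at_a_time(grid, start, toy_validation_error)
```

With these grids the sketch needs 1 + 2 * (4 + 4 + 3) = 23 evaluations, against 4 * 4 * 3 = 48 for the exhaustive grid. It finds the best combination here only because the toy error has no interaction between hyperparameters; with real networks it may settle on a worse one.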

Should I split my data into training/testing/validation sets with k-fold-cross validation?

When evaluating a recommender system, one could split his data into three pieces: training, validation and testing sets. In such case, the training set would be used to learn the recommendation model from data and the validation set would be used to choose the best model or parameters to use. Then, using the chosen model, the user could evaluate the performance of his algorithm using the testing set.
I have found a documentation page for the scikit-learn cross validation (http://scikit-learn.org/stable/modules/cross_validation.html) where it says that is not necessary to split the data into three pieces when using k-fold-cross validation, but only into two: training and testing.
"A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles)."
I am wondering if this would be a good approach. And if so, could someone point me to an article or book backing this theory up?
Cross validation does not avoid the validation set, it simply uses many. In other words, instead of one split into three parts, you have one split into two, and what you now call "training" is actually what previously was training plus validation. CV is simply about repeated splits (done in a slightly smarter manner than just randomly) into train and test, and then averaging the results. Theory backing it up is widely available in pretty much any good ML book. The crucial bit is "should I use it?", and the answer is surprisingly simple: only if you do not have enough data to do one split. CV is used when you do not have enough data for each of the splits to be representative of the distribution you are interested in; doing repeated splits then simply reduces the variance. Furthermore, for really small datasets one does nested CV: an outer one for the [train+val][test] split and an inner one for [train][val], so that the variance of both model selection and final evaluation is reduced.
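Nested CV can be sketched like this, assuming a toy k-nearest-neighbour regressor in place of a recommender model and invented names throughout (`inner_cv_select`, `split_folds`, the candidate values for `k`). The outer loop plays the role of the [train+val][test] split and the inner loop the [train][val] split.

```python
import random

def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def split_folds(data, n_folds):
    return [data[i::n_folds] for i in range(n_folds)]

def without(folds, i):
    """All points except those in fold i."""
    return [p for j, f in enumerate(folds) if j != i for p in f]

def inner_cv_select(train_val, candidates, n_folds):
    """Inner loop: pick the hyperparameter with the lowest average val error."""
    folds = split_folds(train_val, n_folds)
    def avg_error(k):
        return sum(mse(without(folds, i), folds[i], k)
                   for i in range(n_folds)) / n_folds
    return min(candidates, key=avg_error)

rng = random.Random(3)
data = [(x, x * x + rng.gauss(0, 0.5))
        for x in (rng.uniform(-3, 3) for _ in range(60))]

outer = split_folds(data, 5)
outer_errors, chosen = [], []
for i, test in enumerate(outer):
    train_val = without(outer, i)                  # [train+val] part
    k = inner_cv_select(train_val, [1, 3, 5, 9], n_folds=4)
    chosen.append(k)
    # The outer test fold was never seen while k was being selected.
    outer_errors.append(mse(train_val, test, k))

nested_cv_error = sum(outer_errors) / len(outer_errors)
```

`chosen` may contain different values of `k` for different outer folds; `nested_cv_error` estimates the performance of the whole procedure (selection plus fitting), not of one fixed setting.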

Training and Validating Correctly With Encog

I think I'm doing something wrong with Encog. In all of the examples I've seen, they simply TRAIN until a certain training error is reached and then print the results. When is the gradient calculated and the weights of the hidden layers updated? Is this all contained within the training.iteration() function? This makes no sense because even though my TRAINING error keeps decreasing in my program, which seems to imply that the weights are changing, I have not yet run a validation set through the network (which I broke off and separated from the training set when building the data at the beginning) in order to determine if the validation error is still decreasing with the training error.
I have also loaded the validation set into a trainer and run it through the network with compute(), but the validation error is always similar to the training error, so it's hard to tell if it's the same error from training. Meanwhile, the testing hit rate is less than 50% (expected if not learning).
I know there are a lot of different types of backpropagation techniques, particularly the common one using gradient descent as well as resilient backpropagation. What part of the network are we expected to update manually ourselves?
In Encog, weights are updated during the Train.iteration method call. This includes all weights. If you are using a gradient-descent-type trainer (e.g. backprop, rprop, quickprop), then your neural network is updated at the end of each iteration call. If you are using a population-based trainer (e.g. a genetic algorithm), then you must call finishTraining so that the best population member can be copied back to the actual neural network that you passed to the trainer's constructor. Actually, it's always a good idea to call finishTraining after your iterations. Some trainers need it, others do not.
Another thing to keep in mind is that some trainers report the current error at the beginning of the call to iteration, others at the end of the iteration (the improved error). This is for efficiency, to keep some of the trainers from having to iterate over the data twice.
Keeping a validation set to test your training is a good idea. A few methods that might be helpful to you:
BasicNetwork.dumpWeights - Displays the weights for your neural network. This allows you to see if they have changed.
BasicNetwork.calculateError - Pass a training set to this and it will give you the error.

Resources