Training and Validating Correctly With Encog

I think I'm doing something wrong with Encog. In all of the examples I've seen, they simply TRAIN until a certain training error is reached and then print the results. When is the gradient calculated and the weights of the hidden layers updated? Is this all contained within the training.iteration() function? This makes no sense to me, because even though my TRAINING error keeps decreasing in my program (which seems to imply that the weights are changing), I have not yet run a validation set through the network (which I split off from the training set when building the data at the beginning) to determine whether the validation error is still decreasing along with the training error.
I have also loaded the validation set into a trainer and run it through the network with compute(), but the validation error is always similar to the training error, so it's hard to tell whether it is the same error as in training. Meanwhile, the testing hit rate is less than 50% (expected if the network is not learning).
I know there are a lot of different backpropagation techniques, in particular the common one using gradient descent as well as resilient backpropagation. What part of the network are we expected to update manually ourselves?

In Encog, weights are updated during the Train.iteration method call. This includes all weights. If you are using a gradient-descent type trainer (e.g. backprop, rprop, quickprop), then your neural network is updated at the end of each iteration call. If you are using a population-based trainer (e.g. a genetic algorithm), then you must call finishTraining so that the best population member can be copied back to the actual neural network that you passed to the trainer's constructor. Actually, it's always a good idea to call finishTraining after your iterations; some trainers need it, others do not.
Another thing to keep in mind is that some trainers report the current error at the beginning of the call to iteration, others at the end of the iteration (the improved error). This is for efficiency, to keep some of the trainers from having to iterate over the data twice.
Keeping a validation set to test your training is a good idea. A few methods that might be helpful to you:
BasicNetwork.dumpWeights - Displays the weights for your neural network. This allows you to see if they have changed.
BasicNetwork.calculateError - Pass a data set (for example, your validation set) to it and it will give you the error for that set.
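Putting that together, a minimal sketch of the train/validate pattern might look like the following. It follows the structure of Encog 3's XOR example (class and package names are taken from Encog 3 and may differ in other versions), with tiny XOR-style arrays standing in for your own training/validation split; the point is that iteration() performs all weight updates, while the validation error comes from calculateError() on the network itself:

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class TrainWithValidation {
    public static void main(String[] args) {
        // Tiny stand-in data sets; replace with your own training/validation split.
        double[][] trainIn  = { {0,0}, {0,1}, {1,0}, {1,1} };
        double[][] trainOut = { {0},   {1},   {1},   {0}   };
        double[][] valIn    = { {0,1}, {1,0} };
        double[][] valOut   = { {1},   {1}   };

        MLDataSet trainingSet   = new BasicMLDataSet(trainIn, trainOut);
        MLDataSet validationSet = new BasicMLDataSet(valIn, valOut);

        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 2));                     // input
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 4));  // hidden
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1)); // output
        network.getStructure().finalizeStructure();
        network.reset();  // randomize the weights

        ResilientPropagation train = new ResilientPropagation(network, trainingSet);

        int epoch = 0;
        do {
            train.iteration();  // gradients are computed and ALL weights updated inside this call
            epoch++;
            // Validation error comes from the network directly, not from the trainer.
            double valError = network.calculateError(validationSet);
            System.out.println("Epoch " + epoch
                    + " train error=" + train.getError()
                    + " validation error=" + valError);
        } while (train.getError() > 0.01 && epoch < 5000);
        train.finishTraining();  // harmless for RPROP, required for population-based trainers

        System.out.println(network.dumpWeights());  // inspect the weights to confirm they changed
    }
}
```

If dumpWeights shows the weights still changing while the validation error has stopped improving, the network is still fitting the training set but starting to overfit; that is the point where you would stop.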

Related

Regularization vs. Validation

What I always see in papers and articles about under- and overfitting is a falling curve for the training error and a U-shaped curve for the testing error, with the claim that the region to the left of the bottom of the U is subject to underfitting and the region to the right of it is subject to overfitting.
To find the best model, we can test each configuration (e.g. changing the number of nodes and layers) and compare the test error values to find the minimum point (typically via cross-validation). That looks straightforward and perfect.
Do we need a regularizer to achieve this point? This is the part of the topic I'm not sure I have understood well. To me, it seems that we don't need a regularizer if we can test different model configurations. The only case where a regularizer comes into play is when we have a fixed model configuration (e.g. a fixed number of nodes and layers) and don't want to try other configurations, so we use a regularizer to limit the model complexity by forcing the other model parameters (e.g. the network weights) to low values. Is this view right?
But if it is right, what is the intuition behind it? First of all, when using a regularizer we don't know in advance whether this network configuration/complexity puts us to the right or to the left of the minimum of the test error curve; it may already be underfit, overfit, or just right. Putting the math aside, why does forcing the weights to lower values make the network generalize better and overfit less? Is there any analogy between this method and the previous method of moving along the test loss curve to find its minimum? Also, a regularizer does its job during training; it cannot do anything with the test data. How can it help move toward the minimum test error?
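For concreteness, the kind of regularizer being discussed here is typically an L2 (weight-decay) penalty added to the training objective, so that training trades off data fit against weight size. This standard form is added for reference only and is not taken from the question itself:

```latex
% Regularized training objective with an L2 (weight-decay) penalty;
% \lambda >= 0 controls the trade-off and is itself usually chosen on a validation set.
L(w) = \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(x_i; w),\, y_i\big) + \lambda \lVert w \rVert_2^2
```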

What is the difference between test and validation specifically in Mask-R-CNN?

I have my own image dataset and use Mask R-CNN for training. There you divide your dataset into train, validation and test.
I want to know the difference between validation and test.
I know that validation in general is used to see the quality of the NN after each epoch. Based on that you can see how good the NN is and whether overfitting is happening.
But I want to know if the NN learns based on the validation set.
Based on the train set, the NN learns from each image and adjusts its neurons to reduce the loss. And after the NN has finished learning, we use the test set to see how good our NN really is with new, unseen images.
But what exactly happens in Mask R-CNN based on the validation set? Is the validation set only there for seeing the results? Or will some parameters be adjusted based on the validation result to avoid overfitting? And even if this is the case, how much influence does the validation set have on the parameters? Will the neurons themselves be adjusted or not?
If the influence is very, very small, then I will choose the validation set equal to the test set, because I don't have many images (800).
So basically I want to know the difference between test and validation in Mask R-CNN, that is, how and how much the validation set influences the NN.
The model does not learn from the validation set. The validation set is just used to give an approximation of the generalization error at any epoch but also, crucially, for hyperparameter optimization: you can iterate over several different hyperparameter configurations and evaluate each of them on the validation set.
Then, after we choose the best model based on the validation set accuracies, we can calculate the test error on the test set. Ideally there is not a large difference between test set and validation set accuracies. Sometimes your model can essentially 'overfit' to the validation set if you iterate over lots of different hyperparameters.
Reserving another set, the test set, to evaluate on after this validation-set evaluation is a luxury you may have if you have a lot of data. Often you may lack enough labelled data for it even to be worth holding back a separate test set.
Lastly, these things are not specific to Mask R-CNN. Validation sets never affect the training of a model, i.e. the weights or biases. Validation sets, like test sets, are purely for evaluation purposes.
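As a concrete (and deliberately non-Mask-R-CNN) illustration of that data flow, here is a rough sketch where a toy k-NN classifier on synthetic blobs stands in for the detector: each candidate hyperparameter (k) is scored on the validation set only, and the test set is touched exactly once at the end. All data and names here are invented for illustration.

```java
import java.util.Random;

/** Toy k-NN: pick the hyperparameter k on the validation set, report accuracy on the test set. */
public class ValidateThenTest {

    static int knnPredict(double[][] trainX, int[] trainY, double[] x, int k) {
        // majority vote over the k nearest training points (simple repeated selection)
        boolean[] used = new boolean[trainX.length];
        int votes = 0;
        for (int n = 0; n < k; n++) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < trainX.length; i++) {
                if (used[i]) continue;
                double dx = trainX[i][0] - x[0], dy = trainX[i][1] - x[1];
                double d = dx * dx + dy * dy;
                if (d < bestDist) { bestDist = d; best = i; }
            }
            used[best] = true;
            votes += trainY[best];
        }
        return votes * 2 > k ? 1 : 0;
    }

    static double accuracy(double[][] trainX, int[] trainY, double[][] x, int[] y, int k) {
        int correct = 0;
        for (int i = 0; i < x.length; i++)
            if (knnPredict(trainX, trainY, x[i], k) == y[i]) correct++;
        return (double) correct / x.length;
    }

    /** Two Gaussian blobs as stand-in data. */
    static void makeBlobs(double[][] x, int[] y, Random rnd) {
        for (int i = 0; i < x.length; i++) {
            y[i] = i % 2;
            x[i][0] = rnd.nextGaussian() + (y[i] == 0 ? -1.5 : 1.5);
            x[i][1] = rnd.nextGaussian();
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] trainX = new double[200][2]; int[] trainY = new int[200];
        double[][] valX   = new double[100][2]; int[] valY   = new int[100];
        double[][] testX  = new double[100][2]; int[] testY  = new int[100];
        makeBlobs(trainX, trainY, rnd);
        makeBlobs(valX,   valY,   rnd);
        makeBlobs(testX,  testY,  rnd);

        // Hyperparameter search: evaluate each candidate k on the VALIDATION set only.
        int[] candidates = {1, 3, 5, 9, 15};
        int bestK = candidates[0];
        double bestValAcc = -1;
        for (int k : candidates) {
            double acc = accuracy(trainX, trainY, valX, valY, k);
            System.out.println("k=" + k + " validation accuracy=" + acc);
            if (acc > bestValAcc) { bestValAcc = acc; bestK = k; }
        }

        // The TEST set is used exactly once, after the hyperparameter is fixed.
        System.out.println("chosen k=" + bestK
                + " test accuracy=" + accuracy(trainX, trainY, testX, testY, bestK));
    }
}
```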

Does validation accuracy/loss impact training in caffe

I have a simple question about the validation set in Caffe: does the validation set have any impact on training? I know that the validation set is used to check that the network isn't overfitting, and as I understand it the validation set has no impact on the weight updates, but does it have some kind of impact on selecting or modifying hyper-parameters, or is it just there for the user to see and estimate how well the network has learned?
No, the results of the validation set are not used by the neural network during training to adjust any hyperparameters. Using the validation set during training is the same as applying the network at some point in time to predict values for the validation set, and then scoring how well it did.
You might decide that you want to run the same network training procedure many times over using different values for the hyperparameters. In its fully exhaustive form, that would mean doing a grid search over the hyperparameter space with many separate training sessions of separate networks. In practice, a fully exhaustive grid search is not a great idea with neural networks because the number of hyperparameter combinations can be extremely large.
Often with neural networks you can tune one parameter at a time until they each seem "about right". Of course this might not get you the absolute best result, but it's not a bad first approach.
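To make the difference in cost concrete, here is a small, purely illustrative sketch. The evaluate() function is a hypothetical stand-in for "train a network with these settings and return its validation score"; in reality each call is a full training run, which is why the number of calls is what matters when comparing a full grid search with one-parameter-at-a-time tuning.

```java
/** Contrast an exhaustive grid search with tuning one hyperparameter at a time. */
public class TuningStrategies {

    // Hypothetical score surface standing in for validation accuracy; replace with real training.
    static double evaluate(double learningRate, int hiddenUnits) {
        return -Math.abs(Math.log10(learningRate) + 2) - Math.abs(hiddenUnits - 64) / 64.0;
    }

    public static void main(String[] args) {
        double[] learningRates = {1e-4, 1e-3, 1e-2, 1e-1};
        int[] hiddenUnits = {16, 32, 64, 128};

        // Exhaustive grid search: |learningRates| * |hiddenUnits| training runs.
        double bestScore = Double.NEGATIVE_INFINITY;
        double bestLr = 0; int bestH = 0; int runs = 0;
        for (double lr : learningRates)
            for (int h : hiddenUnits) {
                runs++;
                double s = evaluate(lr, h);
                if (s > bestScore) { bestScore = s; bestLr = lr; bestH = h; }
            }
        System.out.println("grid search: " + runs + " runs, best lr=" + bestLr + " hidden=" + bestH);

        // One parameter at a time: roughly |learningRates| + |hiddenUnits| training runs.
        runs = 0;
        double lrPick = learningRates[0], lrPickScore = Double.NEGATIVE_INFINITY;
        for (double lr : learningRates) {
            runs++;
            double s = evaluate(lr, 64);        // hold hiddenUnits at an initial guess
            if (s > lrPickScore) { lrPickScore = s; lrPick = lr; }
        }
        int hPick = hiddenUnits[0]; double hPickScore = Double.NEGATIVE_INFINITY;
        for (int h : hiddenUnits) {
            runs++;
            double s = evaluate(lrPick, h);     // then tune hiddenUnits with the chosen rate
            if (s > hPickScore) { hPickScore = s; hPick = h; }
        }
        System.out.println("one-at-a-time: " + runs + " runs, best lr=" + lrPick + " hidden=" + hPick);
    }
}
```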

How is cross validation implemented?

I'm currently trying to train a neural network using cross validation, but I'm not sure if I'm getting how cross validation works. I understand the concept, but I can't totally see yet how the concept translates to code implementation. The following is a description of what I've got implemented, which is more-or-less guesswork.
I split the entire data set into K-folds, where 1 fold is the validation set, 1 fold is the testing set, and the data in the remaining folds are dumped into the training set.
Then, I loop K times, each time reassigning the validation and testing sets to other folds. Within each loop, I continuously train the network (update the weights) using only the training set until the error produced by the network meets some threshold. However, the error that is used to decide when to stop training is produced using the validation set, not the training set. After training is done, the error is once again produced, but this time using the testing set. This error from the testing set is recorded. Lastly, all the weights are re-initialized (using the same random number generator used to initialize them originally) or reset in some fashion to undo the learning that was done before moving on to the next set of validation, training, and testing sets.
Once all K loops finish, the errors recorded in each iteration of the K-loop are averaged.
I have bolded the parts I'm most confused about. Please let me know if I made any mistakes!
I believe your implementation of Cross Validation is generally correct. To answer your questions:
However, the error that is used to decide when to stop training is produced using the validation set, not the training set.
You want to use the error on the validation set because it reduces overfitting; this is the reason you always want to have a validation set. If you instead stopped based on the training error, you could keep pushing the threshold lower and your algorithm would achieve a higher training accuracy than validation accuracy. However, this would generalize poorly to the unseen examples in the real world, which is exactly what your validation set is supposed to model.
Lastly, all the weights are re-initialized (using the same random number generator used to initialize them originally) or reset in some fashion to undo the learning that was done before moving on to the next set of validation, training, and testing sets.
The idea behind cross validation is that each iteration is like training the algorithm from scratch. This is desirable since by averaging your validation score, you get a more robust value. It protects against the possibility of a biased validation set.
My only suggestion would be to not use a test set inside your cross-validation scheme: since your validation set already models unseen examples, a separate test set during cross-validation is redundant. I would instead split the data into a training and a test set before you start cross-validation, and then not touch the test set until you want to obtain an objective score for your algorithm.
You could use your cross-validation score as an indication of performance on unseen examples. I assume, however, that you will be choosing parameters based on this score, optimizing your model for your training set. Again, the possibility arises that this does not generalize well to unseen examples, which is why it is good practice to keep a separate, unseen test set that is only used after you have optimized your algorithm.
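As a sketch of the suggested scheme (hold a test set back first, then run K-fold cross-validation with early stopping on each fold's validation part, and average the fold scores), here is some illustrative code where a one-variable linear model trained by gradient descent stands in for the neural network; all data and constants are invented.

```java
import java.util.Random;

/** Hold out a test set first, then K-fold cross-validation on the remaining data. */
public class KFoldSketch {

    /** Train w*x+b on the training fold, early-stopping on the validation fold's MSE.
     *  Weights are re-initialized here, so every fold starts "from scratch". */
    static double trainOneFold(double[] trainX, double[] trainY, double[] valX, double[] valY) {
        double w = 0.0, b = 0.0, lr = 0.01;   // fresh weights for every fold
        double bestValMse = Double.MAX_VALUE;
        int badEpochs = 0;
        for (int epoch = 0; epoch < 10000 && badEpochs < 20; epoch++) {
            // one gradient-descent pass over the training fold
            for (int i = 0; i < trainX.length; i++) {
                double err = (w * trainX[i] + b) - trainY[i];
                w -= lr * err * trainX[i];
                b -= lr * err;
            }
            // stopping decision uses the VALIDATION fold, not the training fold
            double valMse = mse(w, b, valX, valY);
            if (valMse < bestValMse - 1e-9) { bestValMse = valMse; badEpochs = 0; }
            else badEpochs++;
        }
        return bestValMse;   // this fold's recorded score
    }

    static double mse(double w, double b, double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) { double e = (w * x[i] + b) - y[i]; s += e * e; }
        return s / x.length;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        int n = 120, k = 5;
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = rnd.nextDouble() * 4 - 2;
            y[i] = 3 * x[i] + 1 + 0.3 * rnd.nextGaussian();
        }

        // Hold back a test set BEFORE cross-validation (here: the last 20 points).
        int cvSize = 100;
        int foldSize = cvSize / k;
        double sumValMse = 0;
        for (int fold = 0; fold < k; fold++) {
            int valStart = fold * foldSize, valEnd = valStart + foldSize;
            double[] trX = new double[cvSize - foldSize], trY = new double[cvSize - foldSize];
            double[] vaX = new double[foldSize],          vaY = new double[foldSize];
            for (int i = 0, t = 0, v = 0; i < cvSize; i++) {
                if (i >= valStart && i < valEnd) { vaX[v] = x[i]; vaY[v++] = y[i]; }
                else                             { trX[t] = x[i]; trY[t++] = y[i]; }
            }
            double foldMse = trainOneFold(trX, trY, vaX, vaY);
            System.out.println("fold " + fold + " validation MSE = " + foldMse);
            sumValMse += foldMse;
        }
        System.out.println("mean cross-validation MSE = " + sumValMse / k);
        // The held-back test points (indices 100..119) would only be used after all
        // model and parameter choices are final, to report one objective score.
    }
}
```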

Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from a sensor. I have managed to get a baseline using the median of a window, and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline.
The problem is that I don't know which method to use.
There are several approaches I have thought of:
sum up the values in a window; if the sum is above a threshold, the class should be EVENT ('integrate and dump')
sum up the differences of the values in a window and take the mean (which gives something like the first derivative); if the value is positive and above a threshold, set class EVENT, otherwise set class NO-EVENT
a combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
use an SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at, e.g. the median/mean of a window? the integral? the first derivative? ...)
What would you suggest? Are there better/simpler methods to get this task done?
I know there exist a lot of sophisticated algorithms, but I'm confused about what the best way could be; please have a little patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a Hidden Markov Model (HMM) approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward (Baum-Welch) training method to train the parameters (e.g. the mean and variance of a Gaussian) associated with the output of each state.
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
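In that simple case, the threshold test is just a per-sample comparison on the baseline-subtracted signal. A minimal sketch, with the threshold and the sample values invented for illustration rather than taken from any real sensor:

```java
/** Minimal "simple threshold test": the event level is assumed to be well above
 *  baseline + noise, so each baseline-subtracted sample can be classified on its own. */
public class ThresholdTest {
    public static void main(String[] args) {
        double threshold = 0.5;  // assumed known a priori
        double[] samples = {0.02, 0.05, 0.01, 0.9, 1.1, 0.95, 0.04, 0.03};  // |value - baseline|
        for (int i = 0; i < samples.length; i++) {
            String label = samples[i] > threshold ? "EVENT" : "NO-EVENT";
            System.out.println("t=" + i + " value=" + samples[i] + " -> " + label);
        }
    }
}
```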
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
From your description, a correctly parameterized moving average might be sufficient (a sketch follows after the tips at the end of this answer).
Try to understand the sensor and its output. Make a model and build a simulator that provides mock data covering the expected data, including noise and the like.
Get lots of real sensor data recorded
visualize the data and verify your assumptions and model
annotate your sensor data, i.e. generate ground truth (your simulator should do that for the mock data)
from what you have learned so far, propose one or more algorithms
make a test system that can verify your algorithms against ground truth and do regression against previous runs
implement your proposed algorithms and run them against ground truth
try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them)
adapt your algorithm(s)
some other tips
you may implement hysteresis on thresholds to avoid bouncing
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low pass filters
you may implement multiple algorithms and voting
for testing relative improvements you may run regression tests on large amounts of unannotated data; then you only inspect the detections that flipped, to find performance increases/decreases
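As mentioned above, here is a minimal sketch combining the moving-average suggestion with the hysteresis tip; the window size and both thresholds are invented for illustration and are exactly the kind of parameters you would tune against your annotated ground truth.

```java
/** Smooth the baseline-subtracted signal with a sliding-window average, then switch to EVENT
 *  only above a high threshold and back to NO-EVENT only below a lower one (hysteresis),
 *  which avoids bouncing between the two classes near a single threshold. */
public class MovingAverageDetector {
    public static void main(String[] args) {
        int window = 4;
        double enter = 0.6, exit = 0.3;   // hysteresis: enter > exit
        double[] samples = {0.05, 0.1, 0.08, 0.7, 0.9, 0.85, 0.5, 0.4, 0.25, 0.1, 0.05, 0.04};

        double[] buf = new double[window];
        double sum = 0;
        boolean inEvent = false;
        for (int i = 0; i < samples.length; i++) {
            // maintain a running sum of the last `window` samples
            sum += samples[i] - buf[i % window];
            buf[i % window] = samples[i];
            double avg = sum / Math.min(i + 1, window);

            // hysteresis: different thresholds for switching on and off
            if (!inEvent && avg > enter) inEvent = true;
            else if (inEvent && avg < exit) inEvent = false;

            System.out.println("t=" + i + " avg=" + String.format("%.3f", avg)
                    + " -> " + (inEvent ? "EVENT" : "NO-EVENT"));
        }
    }
}
```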
