I'm training a YOLOv3 neural network (https://github.com/ultralytics/yolov3/) to recognize objects in an image and was able to get some metrics out.
I was just wondering if anyone knows how to interpret the following metrics (i.e. a definition of what these metrics measure):
Objectness
Classification
[Figure: YOLOv3 training metrics plots]
I'm assuming the val Objectness and val Classification are the scores for the validation set.
Thanks!
Sorry for the late reply. Anyway, I hope this is useful for somebody.
Objectness: measures how well the model identifies that an object exists in a proposed region of interest.
Classification: measures how well the model labels those objects with their corresponding class.
Both are usually computed with nn.BCEWithLogitsLoss, since both are classification tasks.
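For intuition, here is a minimal PyTorch sketch of how these two terms are typically computed with nn.BCEWithLogitsLoss; the tensor shapes and targets are illustrative, not the exact ultralytics implementation:

    import torch
    import torch.nn as nn

    # BCEWithLogitsLoss applies a sigmoid internally, so raw logits are passed in.
    bce = nn.BCEWithLogitsLoss()

    # Objectness: one logit per predicted box, target 1 if the box is matched
    # to a ground-truth object, 0 otherwise.
    obj_logits = torch.randn(8)  # 8 predicted boxes
    obj_targets = torch.tensor([1., 0., 1., 0., 0., 1., 0., 0.])
    objectness_loss = bce(obj_logits, obj_targets)

    # Classification: one logit per class for each matched box, target is a
    # one-hot (or multi-label) vector for the box's class.
    num_classes = 3
    cls_logits = torch.randn(8, num_classes)
    cls_targets = torch.zeros(8, num_classes)
    cls_targets[torch.arange(8), torch.randint(0, num_classes, (8,))] = 1.0
    classification_loss = bce(cls_logits, cls_targets)

    print(objectness_loss.item(), classification_loss.item())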
Related
I am doing research on object detection using YOLO, although I come from a civil engineering background and am not familiar with computer science. My advisor is asking me to validate my YOLO detection model trained on a custom dataset, but my problem is that I really don't know how to validate it. Please kindly point me to how I should validate my model.
Thanks in advance.
I think first you need to make sure that all the cases you are interested in (location of objects, their size, general view of the scene, etc.) are represented in your custom dataset - in other words, that the collected data reflects your task. You can discuss this with your advisor. The main rule: label the data, with the same quality and in the same manner, as you want to see it in the output. More information can be found here.
This is really important - garbage in, garbage out: the quality of your trained model's output is determined by the quality of the input (labelled data).
Once this is done, it is common practice to split your data into training and test sets. During model training only the train set is used, and you can later validate the quality (generalizing ability, robustness, etc.) on data that the model has not seen - the test set. It is also important that these two subsets don't overlap; otherwise the model has effectively seen the test data, the evaluation is overly optimistic, and it will not tell you how the model performs on new data.
Then you can train a few different models (with some architectural changes, for example) on the same train set and validate them on the same test set - this is a regular validation process.
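For example, a minimal Python sketch of such a split (the directory layout, file names, and 80/20 ratio are assumptions to adapt to your data):

    import glob
    import random

    # Split a custom detection dataset into train and test image lists.
    random.seed(0)

    image_paths = sorted(glob.glob("dataset/images/*.jpg"))
    random.shuffle(image_paths)

    split = int(0.8 * len(image_paths))
    train_images, test_images = image_paths[:split], image_paths[split:]

    # Many YOLO training setups read plain text files listing image paths,
    # one per line; writing the two lists out keeps the split reproducible.
    with open("train.txt", "w") as f:
        f.write("\n".join(train_images))
    with open("test.txt", "w") as f:
        f.write("\n".join(test_images))

Train only on the images listed in train.txt and report your detection metrics (precision, recall, mAP) on test.txt, which the model never sees during training.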
We are trying to understand the underlying model of Rasa - the forums there still didn't get us an answer - on two main questions:
1. We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any data set (e.g. Wikipedia, etc.)?
2. Then, if we understand correctly, the intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
appreciate any insights!
thanks
Lior
The transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformer layers, which is not as data-hungry as the deeper stacks used in large pre-trained language models.
Having said that, there isn't an exact number of data points that will be sufficient for training your assistant, as it varies by domain and problem. Usually a good estimate is 30-40 examples per intent.
I am using a sentiment analysis API and want to know how the AI bias that gets in through the training data set, and other biases, can be quantified. Any help would be appreciated.
There are several tools developed to deal with this:
Fair Learn https://fairlearn.github.io/
Interpretability Toolkit https://learn.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
With Fairlearn you can see how biased an ML model is after it has been trained on your data set, and you can then choose a possibly less accurate model that performs better with respect to bias. Explainable ML models expose how inputs correlate with outputs, and combined with Fairlearn this can give an idea of the overall health of the ML model.
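As a rough illustration, here is a minimal sketch using Fairlearn's MetricFrame, which breaks a metric down per sensitive group - one way to quantify bias. The labels, predictions, and group column below are made up:

    from sklearn.metrics import accuracy_score
    from fairlearn.metrics import MetricFrame, selection_rate

    # Hypothetical labels, model predictions, and a sensitive attribute
    # (e.g. a demographic group) for a binary sentiment task.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    groups = ["A", "A", "A", "B", "B", "B", "B", "A"]

    # MetricFrame computes each metric separately for every group.
    mf = MetricFrame(
        metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=groups,
    )

    print(mf.by_group)      # metric values for each group
    print(mf.difference())  # largest gap between groups, a simple bias measure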
I am new to deep learning and I hope you guys can help me.
The following site uses CNN features for multi-class classification:
https://www.mathworks.com/help/deeplearning/examples/feature-extraction-using-alexnet.html
This example extracts features from a fully connected layer, and the extracted features are fed to an ECOC classifier.
In this example, the whole dataset has 15 samples in each category in total, and the training dataset has 11 samples in each category.
My questions are related to the dataset size: if I want to use CNN features for ECOC classification as in the above example, is it required that the number of samples in each category be the same?
If so, could you explain why?
If not, could you point me to reference papers which have used different numbers?
Thank you.
You may want to have a balanced dataset to prevent your model from learning a wrong probability distribution. If one category represents 95% of your dataset, a model that classifies everything as part of that category will have an accuracy of 95%.
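To make that point concrete, here is a small scikit-learn sketch on toy data showing that a classifier which always predicts the majority class scores 95% accuracy on a 95/5 split while learning nothing:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    # Toy illustration of the pitfall above: 95% of samples belong to class 0.
    y = np.array([0] * 95 + [1] * 5)
    X = np.zeros((100, 1))  # features don't matter for this demonstration

    clf = DummyClassifier(strategy="most_frequent").fit(X, y)
    y_pred = clf.predict(X)

    print(accuracy_score(y, y_pred))           # 0.95 despite learning nothing
    print(balanced_accuracy_score(y, y_pred))  # 0.5 reveals the problem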
My problem is that I obtain a model with very good results (in training and cross-validation), but when I test it again (with a different data set) the results are poor.
I have a model which has been trained and tested with cross-validation. The model shows AUC=0.933, TPR=0.90 and FPR=0.04.
I guess there is no overfitting, looking at the pictures corresponding to the learning curve (error), learning curve (score), and deviance curve:
The problem is that when I test this model with a different test data set, I obtain poor results, nothing like my previous ones: AUC=0.52, TPR=0.165 and FPR=0.105.
I used a Gradient Boosting Classifier to train my model, with learning_rate=0.01, max_depth=12, max_features='auto', min_samples_leaf=3, n_estimators=750.
I used SMOTE to balance the classes. It is a binary model. I vectorized my categorical attributes. I used 75% of my data set to train and 25% to test. My model has a very low training error and a low test error, so I guess it is not overfitted. The training error is very low, so there are no outliers in the training and cv-test data sets. What can I do from now on to find the problem? Thanks
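For reference, a minimal sketch of my setup on synthetic placeholder data (the real vectorized features and labels are not shown; max_features='auto' is omitted because recent scikit-learn versions no longer accept it):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from imblearn.over_sampling import SMOTE

    # Placeholder features and an imbalanced binary target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = (rng.random(1000) < 0.1).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )

    # Oversample only the training split, so the held-out data stays untouched.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

    model = GradientBoostingClassifier(
        learning_rate=0.01,
        max_depth=12,
        min_samples_leaf=3,
        n_estimators=750,
    )
    model.fit(X_res, y_res)

    print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))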
If the process generating your datasets is non-stationary, it could cause the behavior you describe.
In that case, the distribution of the dataset you're using for testing was not represented in the data used for training.
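One quick way to check for such a shift is "adversarial validation": train a classifier to distinguish the original training data from the new test data; if it separates them easily (AUC well above 0.5), the two sets come from different distributions. A minimal sketch, with placeholder arrays standing in for the two feature sets:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # X_old: features the model was trained/cross-validated on.
    # X_new: features of the new test set that gave poor results.
    # Placeholder arrays here; substitute your real (vectorized) features.
    X_old = np.random.normal(0.0, 1.0, size=(500, 20))
    X_new = np.random.normal(0.5, 1.0, size=(500, 20))  # shifted on purpose

    X = np.vstack([X_old, X_new])
    z = np.concatenate([np.zeros(len(X_old)), np.ones(len(X_new))])

    # AUC close to 0.5: the two sets look alike. AUC much higher: their
    # distributions differ, and the performance drop is to be expected.
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, z,
                          cv=5, scoring="roc_auc").mean()
    print(auc)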