This question pertains to AutoML for Video Intelligence (custom labels).
When setting up training data, you are instructed to label only videos that contain your custom labels (and not videos that lack them). How does the model learn to identify true negatives for custom labels?
After applying the score threshold, the predictions made by your model will fall into one of four categories: true positive, false positive, true negative, or false negative.
We can use these categories to calculate precision and recall, metrics that help us gauge the effectiveness of our model.
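As a concrete illustration (plain Python, with hypothetical counts rather than values from any real model), precision and recall are derived from those four categories like this:

# Counts for one label after applying the score threshold (hypothetical numbers).
tp = 40  # true positives: label predicted and actually present
fp = 10  # false positives: label predicted but absent
fn = 5   # false negatives: label present but not predicted
tn = 45  # true negatives: label absent and not predicted

precision = tp / (tp + fp)  # of the predictions made, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.80, recall=0.89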
I want to perform classification on several different image classes (car, pen, table) and then use the trained weights to create filters specific to car, pen, and table. Finally, if I apply the car filter to an image, it should give me a similarity percentage.
I am new to deep learning and will be grateful if you guide me.
I want to build filters for each class.
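One possible reading of this, and it is an assumption rather than anything stated above: train an ordinary softmax classifier and treat each per-class probability as the "similarity percentage" for that class's filter. A minimal Keras sketch with hypothetical shapes and data names:

from tensorflow.keras import layers, models

# Hypothetical 3-class classifier (car, pen, table); the input size is assumed.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
    layers.Dense(3, activation='softmax'),  # one probability per class
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# model.fit(train_images, train_labels, epochs=10)    # hypothetical arrays

class_names = ['car', 'pen', 'table']
# probs = model.predict(image[None])[0]               # image: (64, 64, 3) array
# print(f"car similarity: {probs[class_names.index('car')]:.0%}")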
I want to detect whether or not an image contains a specific (custom) object. I tried to go through the documentation of Google Cloud Vertex AI, but I am confused; I am not an AI or ML engineer.
They provide the following services for images:
Classification (Single Label)
Classification (Multi Label)
Image Object Detection
Image Segmentation
Almost all of these features require at least two labels, and at least 10 images must be assigned to each label for the features to work.
Now, suppose I have 10 cat images, and one of my labels is named cat. Then I will have to create another label named non_cat, right? But there are infinite possibilities for an image not containing a cat. Does that mean I upload 10 cat photos and 10 random junk photos under the non_cat label?
Currently I have chosen image object detection. It detects multiple attributes of the custom object, each with a confidence score. Should I use these scores to identify the custom object in my backend application? Am I going in the right direction?
As per your explanation in the comments, you're right to go with an object detection model in this case.
Refer to the Google documentation on how to prepare the data for an object detection model.
As per the documentation, the dataset can have a minimum of 1 label and a maximum of 1,000 labels for an AutoML or custom-trained model.
Yes. After checking the accuracy of your model, you can utilize the confidence score to identify the object in your application.
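For the backend side, here is a minimal sketch of such a confidence check. The prediction structure below (a list of dicts with displayName and confidence) is an assumption; check the actual response format of your Vertex AI endpoint:

CONFIDENCE_THRESHOLD = 0.7  # tune against your model's evaluation metrics

def contains_object(predictions, target_label):
    """Return True if target_label was detected above the threshold."""
    return any(
        p["displayName"] == target_label and p["confidence"] >= CONFIDENCE_THRESHOLD
        for p in predictions
    )

# Example usage with a hypothetical response:
preds = [{"displayName": "my_object", "confidence": 0.91},
         {"displayName": "other", "confidence": 0.42}]
print(contains_object(preds, "my_object"))  # True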
The dataset I used to train my model for facial emotion recognition contains 48x48 images. The images I want to make predictions on are mostly much smaller (20x15, 18x12, etc.), and I suspect the predictions will be wrong because they are much less detailed than the images the model was trained on. Do you think I should eliminate pictures whose size is below a certain level? What should my criteria be here?
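If you do decide to filter by size, a minimal sketch using Pillow follows. The 24-pixel cutoff is purely a placeholder; a sensible criterion would be found empirically, e.g. by checking accuracy on downscaled copies of your validation images:

from pathlib import Path
from PIL import Image

MIN_SIDE = 24  # placeholder cutoff, roughly half the 48x48 training size

def usable(path):
    """Keep an image only if its shorter side meets the minimum."""
    with Image.open(path) as img:
        return min(img.size) >= MIN_SIDE  # img.size is (width, height)

candidates = [p for p in Path("faces").glob("*.png") if usable(p)]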
I am trying to build a Keras model to implement the approach explained in this paper.
Context of my implementation:
I have two different kinds of data representing the same set of classes (labels) that need to be classified. The first kind is image data, and the second kind is EEG data (a time-series sequence).
I know that to classify image data we can use CNN models like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, Flatten, Dense, Dropout, BatchNormalization

model = Sequential()
# Input shape assumed here (227x227 RGB, AlexNet-style)
model.add(Conv2D(filters=256, kernel_size=(11, 11), strides=(1, 1), padding='valid',
                 input_shape=(227, 227, 3)))
model.add(Activation('relu'))
# Flatten the feature maps before the fully connected layers
model.add(Flatten())
model.add(Dense(1000))
model.add(Activation('relu'))
model.add(Dropout(0.4))
# Batch normalisation
model.add(BatchNormalization())
# Output layer: 40 classes
model.add(Dense(40))
model.add(Activation('softmax'))
And to classify sequence data we can use LSTM models like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Flatten, Dense

model = Sequential()
# Input shape assumed: (timesteps, channels) of the EEG sequences
model.add(LSTM(units=50, return_sequences=True, input_shape=(440, 128)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(40, activation='softmax'))
But the approach of the paper above shows that image features can be mapped to EEG feature vectors through regression, like this:
The first approach is to train a CNN to map images to corresponding EEG feature vectors. Typically, the first layers of CNN attempt to learn the general (global) features of the images, which are common between many tasks, thus we initialize the weights of these layers using pre-trained models, and then learn the weights of the last layers from scratch in an end-to-end setting. In particular, we used the pre-trained AlexNet CNN, and modified it by replacing the softmax classification layer with a regression layer (containing as many neurons as the dimensionality of the EEG feature vectors), using Euclidean loss as the objective function.
The second approach consists of extracting image features using pre-trained CNN models and then employ regression methods to map image features to EEG feature vectors. We used our fine-tuned AlexNet as feature extractors by reading the output of the last fully connected layer, and then applied several regression methods (namely, k-NN regression, ridge regression, random forest regression) to obtain the predicted feature vectors.
I am not able to comprehend how to code the above two approaches. I have never used a regressor for feature mapping followed by classification. Any leads on this are much appreciated.
In my understanding, the training data consists of (eeg_signal, image, class_label) triplets.
1. Train the LSTM model with input=eeg_signal, output=class_label. The loss is cross-entropy.
2. Peel off the last layer of the LSTM model. Let's say the pre-last layer's output is a vector of size 20. Let's call it eeg_representation.
3. Run this truncated model on all your eeg_signal inputs and save the eeg_representation outputs. You will get a tensor of shape [batch, 20].
4. Take the AlexNet mentioned in the paper (or any other image classifier) and peel off its last layer. Let's say the pre-last layer's output is a vector of size 30. Let's call it image_representation.
5. Stitch a linear layer onto the end of the previous model. This layer converts image_representation to eeg_representation; it has a 20 x 30 weight matrix.
6. Train the stitched model on (image, eeg_representation) pairs. The loss is the Euclidean distance.
7. And now the fun part: stitch together the model trained in step 6 and the peeled-off part of the model trained in step 1. If you input an image, you will get class predictions.
This sounds like no big deal (because we do image classification all the time), but if this really works, it means that this is a "prediction that is running through our brains" :)
Thank you for bringing up this question and linking the paper.
I feel I just repeated what's in your question and in the paper.
It would be beneficial to have a toy dataset to be able to provide code examples.
Here's a TensorFlow tutorial on how to "peel off" the last layer of a pretrained image classification model.
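In the absence of a toy dataset, here is a minimal, untested Keras sketch of steps 1-7. All shapes, layer sizes, and array names (eeg_signals, class_labels, images) are assumptions, and ResNet50 stands in for AlexNet, which keras.applications does not provide:

import tensorflow as tf
from tensorflow.keras import layers, Model

# Step 1: train an LSTM classifier on (eeg_signal, class_label) pairs.
# The EEG input shape (440 timesteps, 128 channels) is an assumption.
eeg_in = layers.Input(shape=(440, 128))
h = layers.LSTM(50, return_sequences=True)(eeg_in)
h = layers.Flatten()(h)
eeg_repr = layers.Dense(20, activation='relu', name='eeg_representation')(h)
eeg_out = layers.Dense(40, activation='softmax', name='classifier_head')(eeg_repr)
lstm_model = Model(eeg_in, eeg_out)
lstm_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# lstm_model.fit(eeg_signals, class_labels, epochs=10)    # hypothetical arrays

# Steps 2-3: peel off the last layer and compute the regression targets.
encoder = Model(eeg_in, lstm_model.get_layer('eeg_representation').output)
# eeg_targets = encoder.predict(eeg_signals)              # shape [batch, 20]

# Steps 4-5: a pretrained image backbone plus a linear layer mapping
# image_representation into the 20-dim EEG space.
backbone = tf.keras.applications.ResNet50(include_top=False, pooling='avg',
                                          input_shape=(224, 224, 3))
mapped = layers.Dense(20, name='to_eeg_space')(backbone.output)  # linear map
regressor = Model(backbone.input, mapped)
regressor.compile(optimizer='adam', loss='mse')  # Euclidean-style loss

# Step 6: train the stitched model on (image, eeg_representation) pairs.
# regressor.fit(images, eeg_targets, epochs=10)           # hypothetical arrays

# Step 7: reuse the trained classification head on top of the regressor output.
head = lstm_model.get_layer('classifier_head')
image_classifier = Model(backbone.input, head(mapped))
# class_probs = image_classifier.predict(some_images)

The key trick is step 7: because the regressor was trained to land in the same 20-dimensional space the classification head expects, the head's trained weights can be reused unchanged.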
I want to use the Google AutoML Vision API for image classification, but with an incremental learning setup: more specifically, I should be able to incrementally provide new training data with possibly brand new (and previously unknown) class labels. For example, let's say I train the network today for three labels: A, B and C. Now, after a week, I want to add some new data labeled with a brand new class D. And then after another week, I want to add even newer data labeled with a brand new class E. At this point, the model should be able to classify an input image into any of those five classes, with each incremental addition to the model causing very little accuracy drop.
Is that possible with the Google AutoML Vision API?
Currently you can keep importing new data into the existing AutoML dataset and train a new model each week. There is an import API and a train API.
The assumption of very little accuracy drop may be unrealistic. There are valid cases where adding a new label will make the accuracy go down, e.g. adding labels that are hard to distinguish from previous labels, or adding labels without performing data cleanup (adding a label but not applying it to existing images in which objects with that label are visible).
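A minimal sketch of that weekly loop with the google-cloud-automl Python client follows; the project ID, dataset ID, and CSV path are placeholders, and the exact message fields may differ by client version:

from google.cloud import automl

project_id = "my-project"      # placeholder
dataset_id = "ICN1234567890"   # placeholder
parent = f"projects/{project_id}/locations/us-central1"

client = automl.AutoMlClient()

# Import this week's data (including rows for the new label, e.g. D or E).
import_op = client.import_data(
    name=f"{parent}/datasets/{dataset_id}",
    input_config=automl.InputConfig(
        gcs_source=automl.GcsSource(input_uris=["gs://my-bucket/week2_labels.csv"])
    ),
)
import_op.result()  # blocks until the import finishes

# Train a fresh model on the updated dataset.
train_op = client.create_model(
    parent=parent,
    model=automl.Model(
        display_name="classifier-week2",
        dataset_id=dataset_id,
        image_classification_model_metadata=automl.ImageClassificationModelMetadata(),
    ),
)
print("Training started:", train_op.operation.name)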