I want to train an object tracking model in Vertex AI for one type of object. The "Train New Model" button says "To train a model, you must have at least two labels and each label included in training must have at least 15 videos assigned to it." I do not find any explanation of this requirement in the documentation. Does anyone know why I must have two labels?
The minimum condition you mentioned is required for Vertex AI to know what object to look for. The model learns the patterns to track from the bounding boxes and labels you set for the object; generally, having more labeled videos produces a better training outcome. For more details, please see the article here.
Also, I believe more than one label is needed so the model can identify an object by having a reference comparison from the second label. This comes in handy when you evaluate and test your model, as you can tune the score threshold and prediction outcome for a more precise model.
Related
I want to detect whether or not an image contains a specific (custom) object. I tried to go through the documentation of Google Cloud Vertex AI, but I am confused. I am not an AI or ML engineer.
They provide the following services for image
Classification (Single Label)
Classification (Multi Label)
Image Object Detection
Image segmentation
Almost all of these features require at least two labels, and at least 10 images must be assigned to each label for the features to work.
Now, suppose I have 10 cat images. One of my labels is named cat. Do I then have to create another label named non_cat? There can be infinite possibilities of an image not having a cat. Does that mean I upload 10 cat photos and 10 random junk photos under the non_cat label?
Currently I have chosen image object detection. It detects multiple attributes of that custom object with a confidence score. Should I use these scores to identify the custom object in my backend application? Am I going in the right direction?
As per your explanation in the comments, you're right to go with an Object Detection model in this case.
Refer to the Google documentation on how to prepare the data for an object detection model.
As per the documentation, the dataset can have a minimum of 1 label and a maximum of 1,000 labels for an AutoML or custom-trained model.
Yes. After checking the accuracy of your model, you can use the confidence score to identify the object in your application.
I have been reading through this blog in order to find out what mAP is. Under the AP subheading, they give an example of 5 apple images and work out the average precision. As far as I understand, a false positive is when the object is localized and classified but IoU < 0.5 (in the blog), and a false negative is when the model fails to identify an object at all. So what about objects that are misclassified, don't they belong to false positives?
Also, what does the table in the blog really represent? Is the 'correct?' column for one particular example or for all 5 examples together? Could you briefly explain what is going on in your own terms, or just what the blog says?
What is mAP in object detection?
mAP is just mean average precision, which is the mean of the APs over all the object classes. For example, if you had 5 object classes, each of them would have an average precision (AP), and the mAP would be the sum of those APs divided by 5.
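As a quick sketch in Python (the class names and AP values below are made up purely for illustration):

```python
# Illustrative per-class average precisions (made-up values).
aps = {"apple": 0.83, "orange": 0.76, "banana": 0.91, "car": 0.64, "dog": 0.70}

# mAP is simply the mean of the per-class APs.
mAP = sum(aps.values()) / len(aps)
print(round(mAP, 3))  # 0.768
```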
false positive is when the object is localized and classified but IOU < 0.5
In object detection, we can have multiple classes of objects. The background is also a class, but it is implicit. So, for example, if we had 3 classes of objects (e.g. apple, orange, banana), the network treats it as 4 classes (apple, orange, banana, background). The only difference is that, in the results, the program doesn't draw bounding boxes around background regions.
False Positive means the object detection model has reported a part of the image as an object of a specific class (e.g. apple) when there is no apple in that part of the image. There is either another fruit, like an orange (misclassification), or no fruit at all (background). Both cases are the same in the eyes of the network, and we consider this a false positive: the network is treating that part as a positive sample for a specific class by mistake. The IoU can have any value in this case (it does not matter). Misclassified objects are also included in the false positive count, because they are reported as positive (for a specific class) when in fact they are negative (they belong to another class or to the background).
False Negative means the model has predicted a part of the image as background when it is actually an object. In other words, the network has failed to detect an object and has reported it as background by mistake.
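To make the definitions above concrete, here is a small sketch in Python (the class names and the 0.5 IoU threshold are illustrative) that labels a single detection, assuming each detection has already been matched against the ground truth:

```python
def judge(pred_class, matched_gt_class, iou, iou_thresh=0.5):
    """Label one detection. matched_gt_class is None when the detection
    overlaps no ground-truth object (i.e. it fired on background)."""
    if matched_gt_class == pred_class and iou >= iou_thresh:
        return "TP"
    # Misclassifications and background hits both count as false positives.
    return "FP"

print(judge("apple", "apple", 0.7))   # TP: right class, enough overlap
print(judge("apple", "orange", 0.8))  # FP: misclassified
print(judge("apple", None, 0.0))      # FP: background reported as object
```

Any ground-truth object that ends up matched to no detection is then counted as a false negative.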
what does the table in the blog really represent?
The IoU (Intersection over Union) referred to in the blog, which is used to decide correct, is calculated by dividing the area of the intersection between the detected box and the ground truth (the box drawn by a human as the correct box) by the area of their union.
So if the IoU is more than 0.5, it means the network has predicted the apple's position correctly. In the table, correct is judged for each apple individually, and the precision is the number of correct predictions divided by all predictions.
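That calculation can be sketched in Python for axis-aligned boxes given as (x1, y1, x2, y2) (the coordinate values below are made up):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes don't overlap).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
```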
I'm developing a simple game where the user can place different but modular objects (for instance: tracks, roads, etc.).
My question is: how do I match and place objects when one is placed near another?
My first approach is to create a hidden child object (a box) for each modular object and put it on the border where another object can be placed (see my image example), so I can use those coordinates (x, y, z) to align the other object.
But I don't know if this is the best approach.
Thanks
Summary:
1. Define what a "snapping point" is
2. Define your threshold
3. Update the new game object's position
Little Explanation
1. I suppose you need a way to define which parts of the object are the "snapping points".
They can be obvious in some examples, like a cube, where every vertex could be a snapping point, but it's hard to define that for amorphous objects.
A simple solution could be the one proposed by @PierreBaret, which consists of defining on your transform component which points are the "snapping points".
The other is the one you propose: creating empty game objects that act as snapping point locations on the game object.
2. After having those snapping points, when you drop your new GameObject you need to define a threshold, since you don't want every object to always snap to the nearest game object.
3. So you define a minimum distance between snapping points; if your snapping point is under that threshold, you update its position to adjust it to the snapped point.
Visual Representation:
Note: the threshold distance is showing just ONE of the 4 current threshold checks on the 4 vertices of the square, but this dark blue circle should be replicated 3 more times, one for each green snapping point of the red square.
Of course, this method can be expensive, so you can make some improvements, like setting a first coarse threshold between GameObjects and, only if the GameObject is inside that threshold, checking the snapping threshold distance.
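The three steps above can be sketched as follows (Python for illustration, since Unity itself would use C#; the function name, the anchor points, and the 0.5 threshold are all invented):

```python
import math

def snap(point, snapping_points, threshold=0.5):
    """Move `point` onto the nearest snapping point if it lies
    within `threshold`; otherwise leave it unchanged."""
    best, best_dist = None, threshold
    for candidate in snapping_points:
        d = math.dist(point, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best if best is not None else point

anchors = [(0.0, 0.0), (1.0, 0.0)]
print(snap((0.9, 0.1), anchors))  # close enough: snaps to (1.0, 0.0)
print(snap((3.0, 3.0), anchors))  # too far: stays at (3.0, 3.0)
```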
Hope it helps!
Approach for arbitrary objects/models and deformable models.
[A] A physical approach would consider all the surfaces of the 2 objects, and you might need to check that the objects don't overlap, using dot products between surfaces. That's a bit more computationally expensive, but nothing nasty. If there is no matching involved here, you'll be able to add matching features (see [B]). However, that's the only way to work with non-predefined models or deformable models.
Approaches for matching simple and complex models
[B] Snapping points are a good thing, but they are not sufficient alone. I think you need to make an object have:
a sparse representation (e.g., from a complex oriented sphere down to a cube),
and place key snapping points,
tagged by polarity or color, and possibly orientation (that makes them oriented snapping points); e.g., in the case of rails, you'll want rails to snap {+} with {+} and forbid {+} with {-}. In the case of a more complex object, or when you have several orientations (e.g., 2 faces of a surface, but only one is a candidate for a matching pair of objects), you'll need more than 2 polarities: 3 different ones per matching candidate surface or feature, hence the colors (or any enumeration). You need 3 different colors to make sure there is a unique 3D space configuration. You create something that in chemistry is called an enantiomer.
You can also use point pair features that describe the relative position and orientation of two oriented points, when an oriented surface is not appropriate.
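The rail polarity rule can be sketched as a tiny compatibility check (Python for illustration; the sketch assumes like polarities snap and unlike ones don't, per the {+}/{+} vs. {+}/{-} rule above):

```python
def can_snap(polarity_a, polarity_b):
    # Rails snap {+} with {+}; snapping {+} with {-} is forbidden.
    # Assumption for this sketch: like polarities snap, unlike ones don't.
    return polarity_a == polarity_b

print(can_snap("+", "+"))  # True: allowed
print(can_snap("+", "-"))  # False: forbidden
```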
References
Some are computer vision papers or book extracts, but they expose algorithms and concepts to achieve what I developed in my answer.
Model Globally, Match Locally: Efficient and Robust 3D Object Recognition, Drost et al.
3D Models and Matching
I trained a classifier on my training set, which contains images of three types, following this guide: https://ch.mathworks.com/help/vision/examples/image-category-classification-using-bag-of-features.html
Now I want to use this classifier to classify the images of another dataset. Outputs are supposed to give me the predicted types of the images and corresponding probabilities.
I found the function "predict" to do the prediction.
Link: https://ch.mathworks.com/help/vision/ref/imagecategoryclassifier.predict.html
However, I have two questions.
First, it says:
[labelIdx,score] = predict(categoryClassifier,imds) returns the predicted label index and score for the images specified in imds.
I don't understand this "score". It says: "The score provides a negated average binary loss per class", and the outputs of "score" are negative. So is there any way I can obtain a probability (which should be in [0, 1]) from this "score"?
Second, my testing dataset contains images of 6 types, that is, 3 more types than my classifier knows. But the function "predict" will assign a label from one of the three known types to each image. How can I add an extra label to mark the images that cannot be classified into any of the three types?
I think this could be solved if I can get the probabilities from my first question; at least I could then set a threshold and change the labels manually.
Any suggestions that could help solve these problems? Thanks a lot!
I want to identify a ball in a picture. I am thinking of using the Sobel edge detection algorithm; with this I can detect the round objects in the image.
But how do I differentiate between different objects? For example, one picture contains a football and another contains the moon. How do I differentiate which object has been detected?
When I use my algorithm I get a ball in both cases. Any ideas?
Well, if all the objects you would like to differentiate are round, you could even use a Hough transform for circles. This is a very good way of distinguishing round objects.
But your basic problem seems to be classification: sorting the objects in your image into different classes.
For this you don't really need a neural network; you could simply try a nearest-neighbor match. Its functionality is a bit like a neural network's, since you give it several reference pictures where you tell the system what can be seen, and it optimizes itself to the best average values for each attribute you detected. This gives you a dictionary of clusters for the different types of objects.
But for this you'll of course first need something that distinguishes a ball from a moon.
Since they are all round objects (which appear as circles), it will be useless to compare circularity, circumference, diameter, or area (unless your camera is fixed and you know the moon will always have the same size in your images, unlike a ball).
So basically you need to look inside the objects themselves, and you can try to compare their mean color or grayscale value, or the contrast inside the object (the moon will mostly have mid-gray values, whereas a soccer ball consists of black and white parts).
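Those two attributes can be sketched in Python, assuming the object is already segmented into a 0/1 mask over grayscale pixels (the toy pixel values below are invented):

```python
def mean_and_contrast(pixels, mask):
    """pixels: 2D list of grayscale values; mask: 2D list of 0/1 marking
    the segmented object. Returns (mean, std) over the masked pixels."""
    vals = [p for prow, mrow in zip(pixels, mask)
            for p, m in zip(prow, mrow) if m]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var ** 0.5

# A toy "ball": black and white patches, so the contrast is high.
ball = [[0, 255], [255, 0]]
mask = [[1, 1], [1, 1]]
print(mean_and_contrast(ball, mask))  # (127.5, 127.5)
```

A mostly mid-gray moon would give a similar mean but a much lower contrast value.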
You could also run edge filters on the segmented objects just to determine which is more "edgy" in its texture. But for this there are better methods I guess...
So basically what you need to do first:
Find several attributes that help you distinguish the different round objects (assuming they are already separated).
Implement something to get these values out of a picture of a round object (already segmented, of course, so it has a background of 0).
Build a supervised learning system that you feed several images and their classes, giving it several images of each type (there are many implementations of this online).
Now your system is running and you can give it other objects to classify.
For this you need to segment the objects in the image, e.g. with edge filters or a Hough transform.
For each segmented object in an image, run it through your classification system and it should tell you which class (type of object) it belongs to...
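The whole pipeline can be sketched with a nearest-neighbor lookup over hand-made feature vectors (Python for illustration; the (mean grayscale, contrast) reference values are invented):

```python
import math

# Reference feature vectors: (mean grayscale, contrast inside the object).
references = [
    ((127.0, 120.0), "ball"),  # black-and-white ball: high contrast
    ((128.0, 15.0), "moon"),   # mostly mid-gray moon: low contrast
]

def classify(features):
    # Pick the class of the closest reference vector.
    return min(references, key=lambda ref: math.dist(ref[0], features))[1]

print(classify((120.0, 110.0)))  # near the ball reference
print(classify((130.0, 20.0)))   # near the moon reference
```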
Hope that helps... if not, please keep asking...
When you apply an edge detection algorithm, you lose information.
Thus the moon and the ball look the same.
The moon has a different color, a different texture, ... you can use this information to differentiate which object has been detected.
That's a question in AI.
If you think about it, the reason you know it's a ball and not a moon is that you've seen a lot of balls and moons in your life.
So you need to teach the program what a ball is and what a moon is. Give it some kind of dictionary or something.
The problem with a dictionary, of course, is that matching the object against all the objects in the dictionary would take time.
So the best solution would probably be using neural networks. I don't know what programming language you're using, but there are neural network implementations for most languages I've encountered.
You'll have to read a bit about it and decide what kind of neural network to use, and its architecture.
After you have it implemented, it gets easy. You just give it a lot of pictures to learn from (neural networks take a vector as input, so you can give it the whole picture).
For each picture you give it, you tell it what it is. So you give it, say, 20 different moon pictures and 20 different ball pictures. After that, you tell it to learn (usually a built-in function).
The neural network will go over the data you gave it and learn how to differentiate the 2 objects.
Later you can use the network you trained: give it a picture, and it will give you a score for what it thinks it is, like 30% ball or 85% moon.
This has been discussed before. Have a look at this question. More info here and here.