I want to use bag of words for content-based image retrieval.
I'm confused as to how to apply bag-of-words to content-based image retrieval.
To clarify:
I've trained my program using SURF features and extracted the BoW descriptors. I feed these to a support vector machine as training data. Then, given a query image, the SVM can predict which class the image belongs to.
In other words, given a query image it can find a class. For example, given a query image of a car, the program would return 'car'. How would one find similar images?
Given the class, would I return images from the training set? Or would the program, given a query image, also return the subset of a test set for which the SVM predicts the same class?
The title only mentions BoW, but in your text you also use SVMs.
I think the core idea of CBIR is to find the most similar image according to some distance measure. You can do this with BoW features; the SVM is not necessary.
The main purpose of the additional classification step is to speed up the process: once you have obtained a class label for your test image, you only need to search that subgroup of your images for the best match. And of course, if the SVM is better at distinguishing certain classes than your distance measure, it may also help to reduce errors.
So the standard workflow would be (a short code sketch follows the list):
obtain the class
return the best match from the training samples of this class
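A minimal sketch of that workflow (my own illustration; the BoW histograms, labels and trained SVM are assumed to already exist as NumPy arrays and a scikit-learn estimator). If you drop the class filter, the same nearest-neighbour search over all training images is plain CBIR without the SVM, as described above.

import numpy as np

def retrieve(query_bow, train_bows, train_labels, svm, top_k=5):
    # 1. obtain the class of the query image
    predicted = svm.predict(query_bow.reshape(1, -1))[0]
    # 2. search only the training samples of that class for the closest BoW histograms
    candidates = np.where(train_labels == predicted)[0]
    dists = np.linalg.norm(train_bows[candidates] - query_bow, axis=1)
    return candidates[np.argsort(dists)[:top_k]]  # indices of the most similar images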
Related
I trained a classifier on my training set, which contains images of three types, following this guide: https://ch.mathworks.com/help/vision/examples/image-category-classification-using-bag-of-features.html
Now I want to use this classifier to classify the images of another dataset. The outputs should give me the predicted types of the images and the corresponding probabilities.
I found the function "predict" to do the prediction.
Link: https://ch.mathworks.com/help/vision/ref/imagecategoryclassifier.predict.html
However, I have two questions
First, it says:
[labelIdx,score] = predict(categoryClassifier,imds) returns the predicted label index and score for the images specified in imds.
I don't understand this "score". The documentation says: "The score provides a negated average binary loss per class", and the outputs of "score" are negative. So is there any way I can obtain a probability (which should be in [0, 1]) from this "score"?
Second, my test dataset contains images of 6 types, that is, 3 more types than my classifier was trained on. But the function "predict" will assign one of the three known labels to each image. How can I add an extra label to mark the images that cannot be classified into any of the three types?
I think this one could be solved if I can get the probabilities from my first question. At least I could set a threshold to change the labels manually.
Any suggestions that could help solve these problems? Thanks a lot!
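As an illustration of the thresholding idea in the question, here is a sketch of my own (in Python/NumPy rather than MATLAB; the same logic carries over). Note that the softmax step is only a heuristic way of squashing the negated-loss scores into [0, 1], not a calibrated probability.

import numpy as np

def label_with_reject(scores, threshold=0.6):
    # scores: (n_images, n_classes) array of negated average binary losses
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < threshold] = -1  # -1 marks "none of the known types"
    return labels, probs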
I am trying to come up with a way to classify whether a given image contains a Red Car.
The possible outcomes of the classifier should be:
Image contains a CAR and it is RED. (Desired case)
All other cases: the image contains a CAR but it is NOT RED, or the image does not contain any cars at all.
I know how to implement a convolutional NN that can classify whether an image contains a CAR or not.
But I am having trouble working out how to implement fine-grained image classification for this, where the classifier should only identify Red Cars and ignore all other images, whether they contain cars or not.
I read the following papers, but since my use case is much more limited than the similarity learning proposed in them, I am trying to see if there is a simpler approach to implement this.
Fast Training of Triplet-based Deep Binary Embedding Networks
Learning Fine-grained Image Similarity with Deep Ranking
Thanks for your help.
Just treat it as a classification problem with two classes: "Red car" - "No red car". Label every instance of your training data this way. There is no need to train a "car" classifier first.
I know how to implement a convolutional NN that can classify whether an image contains a CAR or not.
Good. Then this should be done within seconds (+ time for labeling).
I read the following papers, but since my use case is much more limited than the similarity learning proposed in them, I am trying to see if there is a simpler approach to implement this.
Fast Training of Triplet-based Deep Binary Embedding Networks
Learning Fine-grained Image Similarity with Deep Ranking
Yes, simply treat it as a classification problem, as described above. If you need a starting point, have a look at the TensorFlow CIFAR-10 tutorial.
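To make that concrete, here is a minimal sketch of the two-class setup in Keras (my own illustration, not taken from the CIFAR-10 tutorial; the directory layout, image size and network architecture are assumptions):

import tensorflow as tf

# Assumed layout: data/red_car/*.jpg and data/other/*.jpg (hypothetical paths)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(64, 64), batch_size=32, label_mode="binary")

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output: P(red car)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)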
I want to train my svm classifier for image categorization with scikit-learn.
And I want to use opencv-python's SIFT algorithm to extract image features. The situation is as follows:
1. scikit-learn's SVM classifier expects a 2-D array as input, which means each row represents one image and the number of features is the same for every image;
2. opencv-python's SIFT algorithm returns, for each image, a list of keypoints together with a descriptor array of shape (number_of_keypoints, 128), and the number of keypoints varies from image to image.
So my question is:
How can I transform the SIFT features to fit the SVM classifier's expected input? Can you help me?
update1:
Thanks to pyan's advice, I've adapted my proposal as follows (a code sketch follows the list):
1. get SIFT feature vectors from each image
2. perform k-means clustering over all the vectors
3. create a feature dictionary, a.k.a. codebook, based on the cluster centers
4. re-represent each image based on the feature dictionary; of course, the dimensionality of each image's representation is now the same
5. train my SVM classifier and evaluate it
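A minimal sketch of steps 1-5 (my own illustration; image_paths, labels and the codebook size k are assumptions you would replace with your own data and tuning):

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def sift_descriptors(path):
    # one (n_keypoints, 128) array per image; requires a recent opencv-python
    sift = cv2.SIFT_create()
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(gray, None)
    return des if des is not None else np.empty((0, 128), np.float32)

k = 100  # codebook size, a hyperparameter you have to choose
all_des = [sift_descriptors(p) for p in image_paths]            # step 1
kmeans = MiniBatchKMeans(n_clusters=k).fit(np.vstack(all_des))  # steps 2-3

def bow_histogram(des):
    # step 4: assign each descriptor to its nearest cluster centre, count frequencies
    if len(des) == 0:
        return np.zeros(k)
    hist = np.bincount(kmeans.predict(des), minlength=k).astype(float)
    return hist / hist.sum()  # normalise so the number of keypoints does not matter

X = np.array([bow_histogram(d) for d in all_des])
clf = LinearSVC().fit(X, labels)                                # step 5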
update2:
I've gathered all the images' SIFT feature vectors into an array (x * 128), which is very large, and now I need to perform clustering on it.
The problem is:
If I use k-means, the number of clusters has to be set, and I don't know how to choose the best value; if I don't use k-means, which algorithm would be suitable for this?
Note: I want to use scikit-learn to perform the clustering.
My proposal is (a rough code sketch is given below):
1. perform DBSCAN clustering on the vectors; then I can get label_size and the labels;
2. because DBSCAN in scikit-learn cannot be used for prediction, I could train a new classifier A based on the DBSCAN result;
3. classifier A acts like the codebook: I can label every image's SIFT vectors with it, and after that every image can be re-represented;
4. based on the above work, I can train my final classifier B.
Note: to predict on a new image, its SIFT vectors must first be transformed by classifier A into the vector that classifier B takes as input.
Can you give me some advice?
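A rough sketch of the proposal above (my own illustration; all_vectors is the stacked (x, 128) array of SIFT descriptors, and the DBSCAN parameters are guesses you would need to tune):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

db = DBSCAN(eps=0.5, min_samples=5).fit(all_vectors)   # step 1
core = db.labels_ != -1                                # ignore noise points
n_clusters = db.labels_.max() + 1                      # the "label_size"
# step 2: classifier A learns to assign any SIFT vector to a DBSCAN cluster
classifier_A = KNeighborsClassifier(n_neighbors=1).fit(all_vectors[core], db.labels_[core])

def represent(image_descriptors):
    # step 3: label each SIFT vector with classifier A, then build a frequency vector
    words = classifier_A.predict(image_descriptors)
    return np.bincount(words, minlength=n_clusters).astype(float)
# step 4: train classifier B on the represent(...) vectors of the training images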
Image classification can be quite general. In order to define good features, you first need to be clear about what kind of output you want. For example, images can be categorized according to the scenes in them into nature views, city views, indoor views, etc. Different kinds of classification may require different kinds of features.
A common approach used in computer vision for keyword-based image classification is bag of words (bag of features) or dictionary learning. You can do a literature search to familiarize yourself with this topic. In your case, the basic idea would be to group the SIFT features into different clusters. Instead of directly feeding scikit-learn with the raw SIFT features, give it the vector of feature-cluster frequencies as input, so that each image is represented by a single 1-D vector.
A short introduction from Wikipedia: Bag-of-words model in computer vision.
How can I work with my own dataset in scikit-learn?
The scikit-learn tutorials always use the built-in datasets as examples (the digits dataset, the iris flower dataset, ...)
http://scikit-learn.org/stable/datasets/index.html
i.e.: from sklearn.datasets import load_iris
I have my own images and I have no idea how to create a new dataset from them.
In particular, to get started, I use this example I found (I use the OpenCV library):
import cv2
import numpy as np

img = cv2.imread('telamone.jpg')
# Convert the image to grayscale
imgg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# SURF extraction (in recent OpenCV builds SURF lives in the contrib module)
surf = cv2.xfeatures2d.SURF_create()
kp, descriptors = surf.detectAndCompute(imgg, None)
# Setting up samples and responses for kNN: one row per descriptor
samples = np.array(descriptors)
responses = np.arange(len(kp), dtype=np.float32)
I would like to extract features from a set of images, in a way that is useful for implementing a machine learning algorithm!
You would first need to clearly define what you are trying to achieve: "extract features from a set of images, in a way that is useful for implementing a machine learning algorithm" is much too vague to give you any guidance.
Are you trying to do:
image classification of the picture as a whole (e.g. indoor scene vs outdoor scene)?
object recognition (e.g. recognizing several instances of the same object in different pictures) inside sub-parts of a set of pictures, maybe using a scanning procedure with windows of various sizes?
object detection and class-based categorization (e.g. finding all occurrences of cars or pedestrians in pictures, with a bounding box around each occurrence of instances of those classes)?
full picture semantic parsing, a.k.a. segmentation of the pixels plus class categorization of each segment (buildings, roads, people, trees)...
Each of those tasks will require different pipelines (feature extraction + machine learning models combo).
You should probably start by reading a book on the subject, for instance: http://szeliski.org/Book/
Also, as a side note, Stack Overflow is probably not the best place to ask such open-ended questions.
How do reverse image search engines like TinEye work?
I mean, what parameters are required to do an image search?
I don't know whether TinEye uses exactly this one, but SURF is a commonly used algorithm for this purpose.
Here you can see a usage example in Mathematica where partial matching of images is used to compose a landscape:
Database: generally you have a set of images collected from web sites.
For each image you extract key features (SURF, SIFT, whatever) in the form of numerical vectors associated with the image. These vectors are stored in a searchable database.
When you give an image to TinEye, it is processed and its key features are extracted. A matching algorithm compares these features against the feature vectors in the database and finds close matches. The list of images associated with the matched feature vectors is then retrieved and presented as links to web images.
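A minimal sketch of that pipeline (my own illustration, certainly not TinEye's actual implementation; database_paths is an assumed list of image files):

import cv2
import numpy as np

sift = cv2.SIFT_create()

def descriptors(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(gray, None)
    return des

# the "searchable database": one descriptor array per known image
database = {p: descriptors(p) for p in database_paths}

def query(path, top_k=10):
    matcher = cv2.BFMatcher()
    q = descriptors(path)
    scores = {}
    for name, des in database.items():
        if q is None or des is None:
            scores[name] = 0
            continue
        matches = matcher.knnMatch(q, des, k=2)
        # Lowe's ratio test keeps only distinctive matches
        good = [m for m in matches if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
        scores[name] = len(good)
    # images with the most good matches come first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]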
Most likely you want an algorithm with good locality over the image, for example a space-filling curve. Such a curve subdivides the image into smaller tiles, orders them, and reduces the problem to one dimension. You then scan the image in this order and apply a Fourier transform to each tile, because a representation in the frequency domain is easier to store and compare in a database. The result is a fingerprint of your image that you can compare with the fingerprints of other images.
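A simplified sketch of that fingerprint idea (my own illustration: the tiles are visited in plain raster order rather than along a true space-filling curve, and only a few low-frequency FFT coefficients per tile are kept):

import cv2
import numpy as np

def fingerprint(path, size=256, tile=32, keep=4):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (size, size)).astype(np.float32)
    parts = []
    for y in range(0, size, tile):
        for x in range(0, size, tile):
            spectrum = np.abs(np.fft.fft2(gray[y:y + tile, x:x + tile]))
            parts.append(spectrum[:keep, :keep].ravel())  # low frequencies only
    return np.concatenate(parts)

# two images can then be compared by the distance between their fingerprints:
# dist = np.linalg.norm(fingerprint('a.jpg') - fingerprint('b.jpg'))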