How to implement a general image classifier using SIFT and SVM

I want to train my SVM classifier for image categorization with scikit-learn, and I want to use opencv-python's SIFT algorithm to extract the image features. The situation is as follows:
1. scikit-learn's SVM classifier expects a 2-D array as input, which means each row represents one image and the number of features is the same for every image;
2. opencv-python's SIFT algorithm returns a list of keypoints together with a numpy array of descriptors, one per keypoint.
So my question is:
How can I transform the SIFT features so that they fit the SVM classifier's input? Can you help me?
update1:
Thanks to pyan's advice, I've adapted my proposal as follows (a rough code sketch of steps 1-3 appears after the list):
1. get SIFT feature vectors from each image
2. perform k-means clustering over all the vectors
3. create the feature dictionary, a.k.a. codebook, from the cluster centers
4. re-represent each image based on the feature dictionary; of course, the dimensionality is then the same for every image
5. train my SVM classifier and evaluate it
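Here is a minimal sketch of steps 1-3, assuming opencv-python (with SIFT available) and scikit-learn; the image paths and the vocabulary size k are placeholder values that would need tuning:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Placeholders: substitute your own training images and tune the vocabulary size k.
image_paths = ["img001.jpg", "img002.jpg"]   # hypothetical file names
k = 100

sift = cv2.SIFT_create()   # on older OpenCV builds: cv2.xfeatures2d.SIFT_create()

def sift_descriptors(path):
    """Return the (n_keypoints, 128) SIFT descriptor array of one image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors            # may be None if no keypoints were found

# Step 1: SIFT feature vectors per image.
per_image = [d for d in (sift_descriptors(p) for p in image_paths) if d is not None]

# Step 2: k-means over all vectors; MiniBatchKMeans copes better with a large x * 128 array.
all_descriptors = np.vstack(per_image)
kmeans = MiniBatchKMeans(n_clusters=k, random_state=0).fit(all_descriptors)

# Step 3: the cluster centers are the feature dictionary (codebook).
codebook = kmeans.cluster_centers_    # shape (k, 128)
```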
update2:
I've gathered all the images' SIFT feature vectors into one array (x * 128), which is very large, and now I need to perform clustering on it.
The problem is:
If I use k-means, the cluster-number parameter has to be set, and I don't know how to choose the best value; if I don't use k-means, which algorithm might be suitable?
note: I want to use scikit-learn to perform the clustering
My proposal is:
1. perform DBSCAN clustering on the vectors, which gives me label_size and labels;
2. because DBSCAN in scikit-learn cannot be used for prediction, I would train a new classifier A based on the DBSCAN result;
3. classifier A then acts just like a codebook: I can label every image's SIFT vectors with it, and after that every image can be re-represented;
4. based on the above, I can train my final classifier B.
note: to predict a new image, its SIFT vectors must first be transformed by classifier A into the vector that classifier B takes as input (a rough sketch of this idea follows)
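One way that DBSCAN idea could look in scikit-learn, reusing all_descriptors from the sketch above; eps and min_samples are placeholder values, and the 1-nearest-neighbour model is just one possible stand-in for the missing predict():

```python
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

# Placeholder parameters: eps and min_samples must be tuned for 128-D SIFT descriptors.
db = DBSCAN(eps=50.0, min_samples=5).fit(all_descriptors)

mask = db.labels_ != -1          # DBSCAN marks noise points with -1
labels = db.labels_[mask]
label_size = labels.max() + 1    # number of "visual words" DBSCAN discovered

# "Classifier A": DBSCAN has no predict(), so fit a 1-nearest-neighbour model on the
# clustered descriptors; it maps any new SIFT vector to one of the discovered words.
classifier_a = KNeighborsClassifier(n_neighbors=1).fit(all_descriptors[mask], labels)

# For a new image, its (n, 128) descriptor array becomes a list of word indices:
# word_indices = classifier_a.predict(new_descriptors)
```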
Can you give me some advice?

Image classification can be quite general. In order to define good features, you first need to be clear about what kind of output you want. For example, images can be categorized according to the scenes in them into nature view, city view, indoor view, etc. Different kinds of classification may require different kinds of features.
A common approach used in computer vision for keyword-based image classification is bag of words (feature bagging) or dictionary learning. You can do a literature search to familiarize yourself with this topic. In your case, the basic idea would be to group the SIFT features into different clusters. Instead of directly feeding scikit-learn with SIFT features, give it the vector of feature-group frequencies as input. Each image is then represented by a 1-D vector.
A short introduction from Wikipedia: Bag-of-words model in computer vision
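A minimal sketch of that re-representation and of the final SVM training (steps 4-5 above), reusing kmeans, k and per_image from the earlier sketch; the class labels and the LinearSVC choice are placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

def bow_vector(descriptors, kmeans, k):
    """Map one image's (n, 128) SIFT array to a k-dimensional word-frequency vector."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)   # frequencies rather than raw counts

# X is a 2-D array with one row per image, exactly what scikit-learn's SVM expects.
X = np.array([bow_vector(d, kmeans, k) for d in per_image])
y = np.array([0, 1])                     # placeholder labels, one per training image

clf = LinearSVC().fit(X, y)
# New images are classified the same way:
# clf.predict(bow_vector(sift_descriptors("new.jpg"), kmeans, k).reshape(1, -1))
```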

Related

Simple but reasonably accurate algorithm to determine if tag / keyword is related to an image

I have a hard problem to solve which is about automatic image keywording. You can assume that I have a database with 100000+ keyworded low quality jpeg images for training (low quality = low resolution about 300x300px + low compression ratio). Each image has about 40 mostly accurate keywords (data may contain slight "noise"). I can also extract some data on keyword correlations.
Given a color image and a keyword, I want to determine the probability that the keyword is related to this image.
I need a creative, understandable solution which I could implement on my own in about a month or less (I plan to use Python). What I have found so far is machine learning, neural networks and genetic algorithms. I was also thinking about generating some kind of signature for each keyword which I could then use to check against not-yet-seen images.
Crazy/novel ideas are appreciated as well if they are practicable. I'm also open to using other python libraries.
My current algorithm is extremely complex and computationally heavy. It suggests keywords instead of calculating probability and 50% of suggested keywords are not accurate.
Given the hard requirements of the application, only gross and brainless solutions can be proposed.
For every image, use some segmentation method and keep, say, the four largest segments. Distinguish one or two of them as being background (those extending to the image borders), and the others as foreground, or item of interest.
Characterize the segments in terms of dominant color (using a very rough classification based on color primaries), and in terms of shape (size relative to the image, circularity, number of holes, dominant orientation and a few others).
Then for every keyword you can build a classifier that decides if a given image has/hasn't this keyword. After training, the classifiers will tell you if the image has/hasn't the keyword(s). If you use a fuzzy classification, you get a "probability".
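A rough sketch of those per-segment features and per-keyword classifiers, assuming scikit-image and scikit-learn; SLIC, the specific shape measures and LogisticRegression are stand-ins chosen for illustration, not an exact recipe:

```python
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops
from sklearn.linear_model import LogisticRegression

def segment_features(image, n_keep=4):
    """Very rough per-segment descriptors: mean color plus a few shape measures.
    `image` is an (H, W, 3) uint8 RGB array; SLIC is just one possible segmentation."""
    segments = slic(image, n_segments=20, start_label=1)
    regions = sorted(regionprops(segments), key=lambda r: r.area, reverse=True)[:n_keep]
    feats = []
    for r in regions:
        mask = segments == r.label
        mean_color = image[mask].mean(axis=0) / 255.0                  # crude dominant color
        rel_size = r.area / float(image.shape[0] * image.shape[1])     # size relative to image
        circularity = 4 * np.pi * r.area / (r.perimeter ** 2 + 1e-6)
        holes = 1 - r.euler_number                                     # rough number of holes
        feats.extend(list(mean_color) + [rel_size, circularity, holes, r.orientation])
    return np.array(feats)

# One binary "has / hasn't this keyword" classifier per keyword (names are hypothetical):
# X = np.array([segment_features(img) for img in training_images])
# classifiers = {kw: LogisticRegression().fit(X, y_kw) for kw, y_kw in keyword_labels.items()}
# classifiers["sunset"].predict_proba(...) then gives the "probability" mentioned above.
```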

Suggest scikit learn algorithms for spam detection-like image classification task

I have a set of "good" and "bad" images, presented as gray-scale arrays. I would like to extract "good" and "bad" features from these images and populate a dictionary.
Here is my high-level algorithm for approaching this task:
Load the images and represent them as a NumPy matrix img_mtx [img_mtx.shape = (10, 255, 255)]
Use image.PatchExtractor on img_mtx to get 1000 patches per image, 10000 7x7-pixel patches in total [patches.shape = (10000, 49)]
Next, I assume my patches matrix is something like a bag of words, and I want to create a sparse matrix of patches for each image and set a "good" or "bad" class for each image.
Now I should have a pretty classic classification task, like spam detection; just by adding more images to the training set I should get a good result.
But I have some problems here:
How do I implement step 3? I've seen examples for text classification, but not for image classification.
When I need to classify a new image, I again split it into patches, but now I need to map the patches from the new image onto my "patch dictionary". What is the best way to do this, keeping in mind that I may never get a 100% match with the dictionary? It looks like I need to compute the closest distance to each of the dictionary's features, but that sounds expensive.
... or did I take a completely wrong approach to this task?
You should first think about what good features for your task would be. Also, you should think about whether your images are always the same shape and aligned.
If you think it is a good idea to describe patches, you may want to look into standard image features like SIFT, SURF or BRIEF - maybe look into scikit-image, opencv or mahotas - though having just the raw patches is a possible first step.
If you want to use patch descriptors and want to throw away the spatial arrangement (which would be the bag-of-words approach), you need to cluster the descriptors and then build histograms over the "words". Then you can train on the histograms and get a single prediction for the whole image. There is a vast amount of literature on this; I am not sure what a good starting point would be, though. Maybe look into the book by Szeliski on Computer Vision.
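A minimal sketch of that clustering + histogram idea applied directly to the raw patches from the question (random data stands in for the real images; the dictionary size and classifier choice are placeholders):

```python
import numpy as np
from sklearn.feature_extraction.image import PatchExtractor
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

# Stand-ins for the question's data: 10 gray-scale images plus their good/bad labels.
img_mtx = np.random.rand(10, 255, 255)
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

n_images, patches_per_image, n_words = img_mtx.shape[0], 1000, 50

extractor = PatchExtractor(patch_size=(7, 7), max_patches=patches_per_image, random_state=0)
patches = extractor.transform(img_mtx).reshape(n_images * patches_per_image, 49)

# Step 3: cluster the patches into a "patch dictionary" and describe each image by a
# histogram of dictionary entries (the bag-of-words trick).
kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(patches)
words = kmeans.predict(patches).reshape(n_images, patches_per_image)
X = np.array([np.bincount(w, minlength=n_words) for w in words])

clf = LinearSVC().fit(X, y)

# A new image goes through exactly the same mapping; kmeans.predict() already assigns each
# patch to its closest dictionary entry, so an exact match is never required:
# new_patches = extractor.transform(new_image[np.newaxis]).reshape(-1, 49)
# clf.predict(np.bincount(kmeans.predict(new_patches), minlength=n_words).reshape(1, -1))
```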

How can I improve my Image comparison Algorithm

I want to compare an image with a set of more than 1000 images. I am generating a photomosaic.
What I have done so far:
I am using the Lab color model to get the L*a*b* value of each image and store these values in a KD-tree.
This is a 3-dimensional tree over the L*, a*, b* values. Then I calculate the Lab value of each grid cell of the image for which I have to generate the photomosaic, and use a nearest-neighbour search with the Euclidean distance metric to find the best match.
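In Python that lookup could be sketched roughly as below (scipy's cKDTree and skimage's rgb2lab are assumptions about tooling, and the tiles are random stand-ins):

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage.color import rgb2lab

# Stand-in tile set: the mean L*a*b* value of every candidate tile image.
tiles = [np.random.rand(32, 32, 3) for _ in range(1000)]
tile_lab = np.array([rgb2lab(t).reshape(-1, 3).mean(axis=0) for t in tiles])

tree = cKDTree(tile_lab)          # 3-dimensional tree over (L*, a*, b*)

def best_tile(grid_cell):
    """Index of the tile whose mean Lab color is closest (Euclidean) to the grid cell."""
    query = rgb2lab(grid_cell).reshape(-1, 3).mean(axis=0)
    _, idx = tree.query(query)
    return idx
```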
I am getting a good result, but I want to improve it. I read about SIFT for image comparison; it looks interesting and I will implement it in the future. For now, can you suggest any other features I could compare, like brightness or background color, or maybe a distance metric that works better than the Euclidean one?
Apart from SIFT, another feature that has been used is color histograms, compared through the Earth Mover's Distance. You can look at papers such as:
The earth mover's distance as a metric for image retrieval
Also, more similar to SIFT is the GIST of an image, which has been used for "semantic" (more or less) retrieval:
Building the gist of a scene: the role of global image features in recognition
which has been used for instance in the paper that does scene completion using millions of photographs:
Scene Completion Using Millions of Photographs
You can also adapt the methods that use SIFT for image warping (for instance SIFT Flow: Dense Correspondence across Scenes and its Applications) to derive a metric for image comparison. Often, standard SIFT matching performs poorly and the resulting metric is not great: getting a good matching will make things better.
In short, as a comment said, it depends on what you are trying to compare and achieve (what do you mean by "good"): do you want to match colors (histograms)? structure (SIFT)? semantics (GIST)? or something else?
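If you want to try the histogram route first, here is a minimal sketch; the cited paper uses the full multi-dimensional EMD on color signatures, whereas this only shows the 1-D special case on hue histograms, which scipy supports directly:

```python
import numpy as np
from scipy.stats import wasserstein_distance
from skimage.color import rgb2hsv

def hue_histogram(image, bins=32):
    """Normalized hue histogram of an RGB image with values in [0, 1]."""
    hue = rgb2hsv(image)[..., 0].ravel()
    hist, edges = np.histogram(hue, bins=bins, range=(0, 1), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, hist

def emd_color_distance(img_a, img_b):
    """1-D Earth Mover's Distance between the hue histograms of two images."""
    ca, ha = hue_histogram(img_a)
    cb, hb = hue_histogram(img_b)
    return wasserstein_distance(ca, cb, u_weights=ha, v_weights=hb)
```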

SVM for image feature classification?

I implemented the Spatial Pyramid Matching algorithm designed by Lazebnik in Matlab, and the last step is to do the SVM classification. At this point I totally don't understand what input I should provide to the svmtrain and svmclassify functions in order to end up with the pairs of feature point coordinates of the train and test images.
I have:
coordinates of SIFT feature points on the train image
coordinates of SIFT feature points on the test image
intersection kernel matrix for train image
intersection kernel matrix for test image.
Which of these should I use?
An SVM classifier expects as input a set of objects (images) represented by tuples, where each tuple is a set of numeric attributes. Some image features (e.g. a gray-level histogram) provide an image representation in the form of a vector of numerical values, which is suitable for training an SVM. However, feature extraction algorithms like SIFT output a set of vectors for each image. So the question is:
How can we convert this set of feature vectors to a unique vector that represents the image?
To solve this problem, you will have to use a technique that is called bag of visual words.
The problem is that the number of points differs from image to image, while the SVM expects the feature vector to be the same size for training and for testing.
coordinates of SIFT feature points on the train image
coordinates of SIFT feature points on the test image
The coordinates won't help for SVM.
I would use:
the number of SIFT feature points found
segmenting the images into small rects and using the presence of a SIFT feature point in a particular rect as a boolean feature value. The feature is then the rect/SIFT-feature-type combination; for N rects and M SIFT feature point types you obtain N*M features.
The second approach requires spatial normalization of the images - same size, same rotation (a rough sketch follows below).
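A sketch of the second idea, with the feature types left out for brevity (OpenCV's SIFT is assumed; quantizing the descriptors, e.g. with k-means, would give the M types and hence the full N*M features):

```python
import cv2
import numpy as np

def grid_presence_features(gray, n_rects=4):
    """Boolean presence of SIFT keypoints in each cell of an n_rects x n_rects grid,
    plus the total keypoint count. Assumes the images are already the same size/rotation."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    h, w = gray.shape
    present = np.zeros((n_rects, n_rects), dtype=float)
    for kp in keypoints:
        x, y = kp.pt
        col = min(int(x / w * n_rects), n_rects - 1)
        row = min(int(y / h * n_rects), n_rects - 1)
        present[row, col] = 1.0
    return np.concatenate([[float(len(keypoints))], present.ravel()])
```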
P.S.: I'm not an expert in ML. I've only done some experiments on cell recognition in microscope images.

Image similarity and k-means clustering

I'm playing a little bit with image similarity. In fact, I'm playing with an image retrieval system. Ideally I want to create some kind of image index that I can query to get similar images.
My current thought is to store some kind of ImageDescriptor in the index, where each descriptor can contain different features, e.g. k-means cluster centroids, histograms, ... And I have a simple weight-based calculation: each feature has a distance function, the result of that function is multiplied by the feature's weight, and the results are summed across all features. The final sum is the distance from my image (a small sketch of this is below). I'm not sure whether this is a good line of thought?
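That weighted per-feature distance could look roughly like this; the feature names, distance functions and weights are hypothetical placeholders:

```python
import numpy as np

def histogram_distance(a, b):
    """L1 distance between two (already normalized) histograms."""
    return np.abs(a - b).sum()

def centroid_distance(a, b):
    """Mean distance from each centroid in `a` to its closest centroid in `b`."""
    return np.mean([np.min(np.linalg.norm(b - c, axis=1)) for c in a])

DISTANCES = {"histogram": histogram_distance, "centroids": centroid_distance}
WEIGHTS = {"histogram": 0.4, "centroids": 0.6}

def image_distance(desc_a, desc_b):
    """Weighted sum of per-feature distances between two ImageDescriptor dicts."""
    return sum(WEIGHTS[name] * DISTANCES[name](desc_a[name], desc_b[name])
               for name in DISTANCES)
```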
So I started to play with histograms. I stored an index of histograms and then queried it for the distance between a given histogram and the stored ones. It gives some kind of similarity, but in most cases it is far from ideal.
Now I'm playing with k-means clustering. I have already implemented segmentation based on RGB distance (I will also try the Lab color space). My index consists of a vector of centroids (from the clustering), and right now I'm just doing a min-distance comparison between centroids. That gives better results, but is still far from good.
My first question: can I do something better with the segments (clusters) than querying for distance? How can I include shape information?
Just as a side note, most images are pictures of everyday objects (different pencils, different glasses, different shoes, ...) and with different textures, on a background of the same color. No natural images, faces, trees, clouds, mountains, ...
Regards
Zaharije
Image similarity is not only pixel-based. There are several dimensions to image similarity. For good similarity you need to extract extra information from the images: low-level features, etc.
