SVM for image feature classification?

I implemented the Spatial Pyramid Matching algorithm designed by
Lazebnik in Matlab, and the last step is the SVM classification. At this
point I don't understand what input I should provide to the svmtrain and
svmclassify functions so that I end up with the matched pairs of feature
point coordinates between the train and test images.
I have:
coordinates of SIFT feature points on the train image
coordinates of SIFT feature points on the test image
the intersection kernel matrix for the train image
the intersection kernel matrix for the test image
Which of these should I use?

An SVM classifier expects as input a set of objects (images), each represented by a tuple of numeric attributes. Some image features (e.g. a gray-level histogram) provide an image representation in the form of a single vector of numerical values, which is suitable for training an SVM. However, feature extraction algorithms like SIFT output a whole set of vectors for each image. So the question is:
How can we convert this set of feature vectors into a single vector that represents the image?
To solve this problem, you will have to use a technique called bag of visual words.
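A minimal sketch of that conversion in Python, assuming you already have a codebook of visual words (e.g. k-means cluster centers computed over the training descriptors):

    import numpy as np

    def bag_of_visual_words(descriptors, codebook):
        # Turn an (n_keypoints, 128) SIFT descriptor set into one fixed-length
        # histogram over the visual words in codebook (shape (n_words, 128)).
        # Assign each descriptor to its nearest visual word (Euclidean distance).
        distances = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
        words = distances.argmin(axis=1)
        # Count word occurrences and normalize so images with different numbers
        # of keypoints stay comparable.
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()

The resulting histogram is the single fixed-length vector you feed to the SVM, one per image.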

The problem is that the number of points differs from image to image, while an SVM expects the feature vector to have the same size for training and testing.

coordinates of SIFT feature points on the train image, coordinates of
SIFT feature points on the test image
The coordinates won't help for SVM.
I would use:
the number of SIFT feature points found
segment the images into small rects and use the presence of a SIFT feature point in a
particular rect as a boolean feature value. The feature is then the rect/SIFT-feature-type
combination. For N rects and M SIFT feature point types you obtain
N*M features (see the sketch below).
The second approach requires spatial normalization of the images: same size, same rotation.
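A minimal sketch of that grid idea, assuming a single SIFT feature type (M = 1) and keypoints given as (x, y) pixel coordinates on images already normalized to a common size:

    import numpy as np

    def grid_occupancy_features(keypoints, image_shape, grid=(8, 8)):
        # Boolean occupancy of SIFT keypoints over an 8x8 grid of rects,
        # flattened into a fixed-length feature vector of N = 64 values.
        h, w = image_shape[:2]
        features = np.zeros(grid, dtype=bool)
        for x, y in keypoints:
            row = min(int(y / h * grid[0]), grid[0] - 1)
            col = min(int(x / w * grid[1]), grid[1] - 1)
            features[row, col] = True
        return features.ravel()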
P.S.: I'm not an expert in ML. I've only done some experiments on cell recognition in microscope images.

Related

how to implement a general image classifier using SIFT and SVM

I want to train my SVM classifier for image categorization with scikit-learn.
And I want to use opencv-python's SIFT algorithm to extract image features. The situation is as follows:
1. scikit-learn's SVM classifier takes a 2-D array as input, which means each row represents one image and the number of features must be the same for every image;
2. opencv-python's SIFT algorithm returns a list of keypoints plus a descriptor array with one 128-dimensional row per keypoint, so its length varies from image to image.
So my question is:
How can I deal with the SIFT features to fit the SVM classifier's input? Can you help me?
update1:
Thanks to pyan's advice, I've adapted my proposal as follows:
1. get SIFT feature vectors from each image
2. perform k-means clustering over all the vectors
3. create a feature dictionary, a.k.a. codebook, based on the cluster centers
4. re-represent each image based on the feature dictionary; of course the dimension of each image's vector is then the same
5. train my SVM classifier and evaluate it
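A minimal sketch of this pipeline with opencv-python (version 4.4 or later, where SIFT_create is available) and scikit-learn; the cluster count of 100 and the variables train_paths, train_labels, test_paths, test_labels are assumptions:

    import cv2
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.svm import SVC

    def sift_descriptors(image_paths):
        # Step 1: extract SIFT descriptors (n_keypoints x 128) for each image.
        sift = cv2.SIFT_create()
        per_image = []
        for path in image_paths:
            gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(gray, None)
            per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
        return per_image

    def to_histograms(per_image, kmeans):
        # Step 4: re-represent each image as a fixed-length word-frequency vector.
        hists = []
        for desc in per_image:
            words = kmeans.predict(desc) if len(desc) else np.array([], dtype=int)
            hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
            hists.append(hist / max(hist.sum(), 1.0))
        return np.vstack(hists)

    # Steps 2-3: cluster all training descriptors into a codebook of visual words.
    train_desc = sift_descriptors(train_paths)
    kmeans = MiniBatchKMeans(n_clusters=100).fit(np.vstack(train_desc))

    # Step 5: train and evaluate the SVM on the histogram representation.
    clf = SVC(kernel="rbf").fit(to_histograms(train_desc, kmeans), train_labels)
    print(clf.score(to_histograms(sift_descriptors(test_paths), kmeans), test_labels))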
update2:
I've gathered all the images' SIFT feature vectors into one array (x * 128), which is very large, and now I need to perform clustering on it.
The problem is:
If I use k-means, the cluster-number parameter has to be set, and I don't know how to choose the best value; if I do not use k-means, which algorithm may be suitable for this?
note: I want to use scikit-learn to perform the clustering
My proposal is:
1. perform DBSCAN clustering on the vectors, then I can get the number of labels and the label of each vector;
2. because DBSCAN in scikit-learn cannot be used for predicting, I could train a new classifier A based on the DBSCAN result;
3. classifier A is just like a codebook: I can use it to label every image's SIFT vectors, and after that every image can be re-represented;
4. based on the above work, I can train my final classifier B.
note: to predict a new image, its SIFT vectors must be transformed by classifier A into the vector that classifier B takes as input.
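A rough sketch of this proposal; the DBSCAN parameters and the variable all_descriptors (the x * 128 array) are assumptions:

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.neighbors import KNeighborsClassifier

    # Cluster all gathered SIFT vectors; DBSCAN marks noise points with label -1.
    labels = DBSCAN(eps=100.0, min_samples=10).fit_predict(all_descriptors)
    keep = labels != -1
    n_words = labels[keep].max() + 1

    # "Classifier A": assigns any new descriptor to its nearest DBSCAN cluster.
    codebook_clf = KNeighborsClassifier(n_neighbors=1).fit(all_descriptors[keep], labels[keep])

    def represent(image_descriptors):
        # Re-represent one image as a word-frequency vector, the input of classifier B.
        words = codebook_clf.predict(image_descriptors)
        hist = np.bincount(words, minlength=n_words).astype(float)
        return hist / max(hist.sum(), 1.0)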
Can you give me some advice?
Image classification can be quite general. In order to define good features, first you need to be clear about what kind of output you want. For example, images can be categorized according to the scenes in them into nature view, city view, indoor view etc. Different kinds of classification may require different kinds of features.
A common approach used in computer vision for keyword-based image classification is bag of words (feature bagging) or dictionary learning. You can do a literature search to familiarize yourself with this topic. In your case, the basic idea would be to group the SIFT features into different clusters. Instead of directly feeding scikit-learn with SIFT features, give it the vector of feature-group frequencies as input. So each image will be represented by a 1-D vector.
A short introduction from Wikipedia: Bag-of-words model in computer vision

A Summary of How SURF Works

I am trying to figure out how SURF feature detection works. I think I have made some progress. I would like to know how off I am from what's really going on.
A template image you have already got stored and a real-world image
are compared on the basis of "key points" or some important features
in the two images.
The smallest Euclidean distance between the same points constitutes a
good match.
What constitutes an important feature or keypoint? A corner
(intersection of edges) or a blob (sharp change in intensity).
SURF uses blobs.
It uses a Hessian matrix for blob detection or feature extraction.
The Hessian matrix is a matrix of second derivatives: this is to
figure out the minima and maxima associated with the intensity of a
given region in the image.
SIFT/SURF etc. have 3 stages:
1. find features/keypoints that are likely to be found again in different images of the same object (SURF uses box filters, afair). Those features should be scale- and rotation-invariant if possible. Corners, blobs etc. are good, and they are most often searched for at multiple scales.
2. find the right "orientation" of that point, so that if the image is rotated according to that orientation, both images are aligned with regard to that single keypoint.
3. compute a "descriptor" that captures what the neighborhood of the keypoint looks like (after orientation) at the right scale.
Now your Euclidean distance computation is done only on the descriptors, not on the keypoint locations!
It is important to know that step 1 isn't fixed for SURF. SURF is in fact steps 2-3, but the authors give a suggestion for how step 1 can be done to get some synergy with steps 2-3: both step 1 and step 3 use integral images to speed things up, so the integral image has to be computed only once.
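A minimal sketch of the matching step with opencv-python; SIFT is used here because SURF is patented and often not built in, and the two image paths are assumptions:

    import cv2

    # Stages 1-3: detect keypoints and compute descriptors for both images.
    sift = cv2.SIFT_create()
    img1 = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)   # stored template
    img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)      # real-world image
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Euclidean (L2) distance is computed on the descriptors only, never on the
    # keypoint coordinates. Lowe's ratio test keeps only distinctive matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]
    print(len(good), "good matches")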

How can I improve my Image comparison Algorithm

I want to compare an image with a set of more than 1000 images. I am generating a photomosaic.
What I have done so far:
I am using the LAB color model to get the L*, a*, b* values of each image and store them in a KD-tree.
This is a 3-dimensional tree over the L*, a*, b* values. Then I calculate the LAB value for each grid cell of the image for which I have to generate the photomosaic, and I use a nearest-neighbour search with the Euclidean distance metric to find the best match.
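A minimal sketch of that lookup with OpenCV and scipy; the list tile_images and the use of the per-image mean Lab value are assumptions:

    import cv2
    import numpy as np
    from scipy.spatial import cKDTree

    # Mean L*a*b* value of every candidate tile image (tile_images: list of BGR arrays).
    tile_labs = np.array([cv2.cvtColor(t, cv2.COLOR_BGR2LAB).reshape(-1, 3).mean(axis=0)
                          for t in tile_images])
    tree = cKDTree(tile_labs)                      # 3-D tree over (L, a, b)

    def best_tile(grid_cell_bgr):
        # Index of the tile whose mean Lab color is nearest in Euclidean distance.
        lab = cv2.cvtColor(grid_cell_bgr, cv2.COLOR_BGR2LAB).reshape(-1, 3).mean(axis=0)
        _, idx = tree.query(lab)
        return idx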
I am getting a good result, but I want to improve it. I read about SIFT for image comparison; it looks interesting and I will implement it in the future. For now, can you suggest any other features I could compare, like brightness or background color, or maybe another distance metric that is better than Euclidean?
Apart from SIFT, another feature that has been used is the comparison of color histograms through the Earth Mover's Distance. You can look at papers such as:
The earth mover's distance as a metric for image retrieval
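OpenCV exposes EMD directly; a minimal sketch comparing two hue histograms, where the bin count and the variables img_a, img_b are assumptions:

    import cv2
    import numpy as np

    def hue_signature(bgr_image, bins=32):
        # Build an EMD "signature": one row of (weight, bin position) per histogram bin.
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180]).flatten()
        hist /= hist.sum()
        return np.column_stack([hist, np.arange(bins)]).astype(np.float32)

    # A lower EMD means more similar color distributions.
    distance, _, _ = cv2.EMD(hue_signature(img_a), hue_signature(img_b), cv2.DIST_L2)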
Also, more similar to SIFT is the GIST of an image, which has been used for "semantic" (more or less) retrieval:
Building the gist of a scene: the role of global image features in recognition
which has been used for instance in the paper that does scene completion using millions of photographs:
Scene Completion Using Millions of Photographs
You can also adapt methods that use SIFT for image warping (for instance SIFT Flow: Dense Correspondence across Scenes and its Applications) to derive a metric for image comparison. Often, standard SIFT matching performs poorly and the resulting metric is not great; being able to obtain a good matching will make things better.
In short, as a comment said, it depends on what you are trying to compare and achieve (what do you mean by "good"): do you want to match colors (histograms)? structure (SIFT)? semantics (GIST)? or something else?

algorithm - warping image to another image and calculate similarity measure

I have a query on calculating the best matching point of one image to another through intensity-based registration. I'd like to have some comments on my algorithm:
1. Compute the warp matrix at this iteration.
2. For every point of image A:
2a. warp the particular image A pixel coordinates with the warp matrix into image B;
2b. perform interpolation to get the corresponding intensity from image B if the warped point coordinate lies inside image B;
2c. calculate the similarity measure value between the warped pixel A intensity and the image B intensity.
3. Cycle through every pixel in image A.
4. Cycle through every possible rotation and translation (a rough sketch follows below).
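A rough sketch of scoring one rotation/translation candidate in Python with OpenCV; the choice of negative mean squared difference over the overlap as the similarity measure is an assumption:

    import cv2
    import numpy as np

    def candidate_score(img_a, img_b, angle_deg, tx, ty):
        # Warp grayscale image A by a rotation + translation and score it against image B.
        # Higher is better; only pixels that land inside image B are compared.
        h, w = img_b.shape[:2]
        warp = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)  # 2x3 affine matrix
        warp[:, 2] += (tx, ty)
        warped = cv2.warpAffine(img_a, warp, (w, h))                    # bilinear interpolation
        mask = cv2.warpAffine(np.ones_like(img_a, np.uint8), warp, (w, h)) > 0
        diff = warped.astype(np.float32) - img_b.astype(np.float32)
        return -np.mean(diff[mask] ** 2)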
Would this be okay? Is there any relevant opencv code we can reference?
Comments on algorithm
Your algorithm appears good although you will have to be careful about:
Edge effects: You need to make sure that the algorithm does not favour matches where most of image A does not overlap image B. e.g. you may wish to compute the average similarity measure and constrain the transformation to make sure that at least 50% of pixels overlap.
Computational complexity. There may be a lot of possible translations and rotations to consider and this algorithm may be too slow in practice.
Type of warp. Depending on your application you may also need to consider perspective/lighting changes as well as translation and rotation.
Acceleration
A similar algorithm is commonly used in video encoders, although most will ignore rotations/perspective changes and just search for translations.
One approach that is quite commonly used is to do a gradient search for the best match. In other words, try tweaking the translation/rotation in a few different ways (e.g. left/right/up/down by 16 pixels) and pick the best match as your new starting point. Then repeat this process several times.
Once you are unable to improve the match, reduce the size of your tweaks and try again.
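A rough sketch of that coarse-to-fine local search, assuming translation only and a score function like the candidate_score sketch above:

    def refine_translation(img_a, img_b, score, start=(0, 0), step=16):
        # Greedy local search: tweak the translation left/right/up/down, keep the best
        # neighbour, and halve the step once no neighbour improves the score.
        tx, ty = start
        best = score(img_a, img_b, 0.0, tx, ty)
        while step >= 1:
            moves = [(tx + step, ty), (tx - step, ty), (tx, ty + step), (tx, ty - step)]
            scored = [(score(img_a, img_b, 0.0, x, y), x, y) for x, y in moves]
            top = max(scored)
            if top[0] > best:
                best, tx, ty = top
            else:
                step //= 2
        return tx, ty, best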
Alternative algorithms
Depending on your application you may want to consider some alternative methods:
Stereo matching. If your 2 images come from stereo camera then you only really need to search in one direction (and OpenCV provides useful methods to do this)
Known patterns. If you are able to place a known pattern (e.g. a chessboard) in both your images then it becomes a lot easier to register them (and OpenCV provides methods to find and register certain types of pattern)
Feature point matching. A common approach to image registration is to search for distinctive points (e.g. types of corner or more general places of interest) and then try to find matching distinctive points in the two images. For example, OpenCV contains functions to detect SURF features. Google has published a great paper on using this kind of approach in order to remove rolling shutter noise that I recommend reading.
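A minimal sketch of that last approach with OpenCV; ORB is used here because it ships with the default opencv-python build (unlike SURF), and img_a, img_b are assumed to be grayscale arrays:

    import cv2
    import numpy as np

    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_a, None)
    kp2, des2 = orb.detectAndCompute(img_b, None)

    # Match binary descriptors with Hamming distance and keep the best matches.
    matches = sorted(cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2),
                     key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Robustly estimate the transform registering A onto B (RANSAC rejects bad matches).
    warp, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)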

Image Warp Filter - Algorithm and Rasterization

I'd like to implement a filter that allows resampling of an image by moving a number of control points that mark edges and tangent directions. The goal is to be able to freely transform an image as in Photoshop when you use "Free Transform" and choose warp mode "Custom". The image is fitted into some kind of spline patch (if that is a valid name) that can be manipulated.
I understand how simple splines (paths) work but how do you connect them to form a patch?
And how can you sample such a patch to render the morphed image? For each pixel in the target I'd need to know what pixel in the source image corresponds. I don't even know where to start searching...
Any helpful info (keywords, links, papers, reference implementations) are greatly appreciated!
This document will get you a good insight into warping: http://www.gson.org/thesis/warping-thesis.pdf
However, this will include filtering out high frequencies, which will make the implementation a lot more complicated but will give a better result.
An easy way to accomplish what you want to do would be to loop through every pixel in your final image, plug the coordinates into your splines and retrieve the pixel in your original image. This pixel might have coordinates 0.4/1.2 so you could bilinearly interpolate between 0/1, 1/1, 0/2 and 1/2.
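A minimal sketch of that backward-mapping loop for a grayscale image; the spline_map function that returns the source coordinates of a target pixel is an assumed placeholder for your patch evaluation:

    import numpy as np

    def bilinear_sample(src, x, y):
        # Sample src at fractional coordinates (x, y) by bilinear interpolation.
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, src.shape[1] - 1), min(y0 + 1, src.shape[0] - 1)
        fx, fy = x - x0, y - y0
        top = (1 - fx) * src[y0, x0] + fx * src[y0, x1]
        bottom = (1 - fx) * src[y1, x0] + fx * src[y1, x1]
        return (1 - fy) * top + fy * bottom

    def warp(src, out_shape, spline_map):
        # Backward mapping: for each target pixel, ask the spline patch where it
        # comes from in the source image, then interpolate there.
        out = np.zeros(out_shape, dtype=src.dtype)
        for ty in range(out_shape[0]):
            for tx in range(out_shape[1]):
                sx, sy = spline_map(tx, ty)
                if 0 <= sx < src.shape[1] and 0 <= sy < src.shape[0]:
                    out[ty, tx] = bilinear_sample(src, sx, sy)
        return out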
As for splines: there are many resources and solutions online for the 1D case. As for 2D it gets a bit trickier to find helpful resources.
A simple example for the 1D case: http://www-users.cselabs.umn.edu/classes/Spring-2009/csci2031/quad_spline.pdf
Here's a great guide for the 2D case: http://en.wikipedia.org/wiki/Bicubic_interpolation
Based upon this you could derive your own spline scheme for the 2D case: define a bivariate polynomial (in x and y) and set up your constraints to solve for the coefficients of the polynomial.
Just keep in mind that the borders of the spline patches have to be consistent (both in value and derivative) to avoid ugly jumps.
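For reference, the bicubic surface on that Wikipedia page has the form

    p(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij} x^i y^j

so each patch has 16 coefficients a_{ij}, determined by the function values and the x-, y- and mixed xy-derivatives at the four corners; matching those values across neighbouring patches is exactly what keeps the borders consistent.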
Good luck!
