Algorithms to find difference/similarity in a sequence of images/frames - algorithm

I am searching for known algorithms (perhaps an OpenCV function) that can output a similarity or difference measure for a given sequence of images/frames (say 2 or 10).
Journal/conference articles are also appreciated. Thank you.

You can do this using the Structural Similarity Index (SSIM).
For a related publication, check out the paper titled Image Quality Assessment: From Error Visibility to Structural Similarity by Zhou Wang et al. in IEEE TRANSACTIONS ON IMAGE PROCESSING.
There isn't any function available in OpenCV (as far as I can remember), though there is one in scikit-image. It is made available by the line from skimage.measure import structural_similarity as ssim (in newer scikit-image releases it lives in skimage.metrics instead).
The good thing about SSIM is that it considers:
the mean of the pixel intensities within a given window,
the variance of the pixel intensities,
and the covariance between the two windows.
The formula given in equation 13 of the linked paper contains the above three.
A sample implementation is provided HERE
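For concreteness, equation 13 of that paper combines the three terms as SSIM(x, y) = ((2*mu_x*mu_y + C1) * (2*sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1) * (sigma_x^2 + sigma_y^2 + C2)). Below is a minimal sketch of scoring two consecutive frames with scikit-image; the frame file names are placeholders, and the import path is the one used by newer scikit-image releases:

    import cv2
    from skimage.metrics import structural_similarity as ssim

    # Load two consecutive frames as grayscale (the file names are placeholders)
    frame_a = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    frame_b = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

    # SSIM is 1.0 for identical images and drops toward 0 as structure diverges,
    # so (1 - score) works as a per-pair difference measure; slide this over a
    # sequence to score 2, 10, or any number of consecutive frames
    score = ssim(frame_a, frame_b, data_range=255)
    print("similarity:", score, "difference:", 1.0 - score)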

Related

Simple but reasonably accurate algorithm to determine if tag / keyword is related to an image

I have a hard problem to solve which is about automatic image keywording. You can assume that I have a database with 100000+ keyworded low quality jpeg images for training (low quality = low resolution about 300x300px + low compression ratio). Each image has about 40 mostly accurate keywords (data may contain slight "noise"). I can also extract some data on keyword correlations.
Given a color image and a keyword, I want to determine the probability that the keyword is related to this image.
I need a creative, understandable solution that I could implement on my own in about a month or less (I plan to use Python). What I have found so far is machine learning, neural networks and genetic algorithms. I was also thinking about generating some kind of signature for each keyword which I could then use to check against not-yet-seen images.
Crazy/novel ideas are appreciated as well if they are practicable. I'm also open to using other python libraries.
My current algorithm is extremely complex and computationally heavy. It suggests keywords instead of calculating probabilities, and 50% of the suggested keywords are not accurate.
Given the hard requirements of the application, only coarse and brainless solutions can be proposed.
For every image, use some segmentation method and keep, say, the four largest segments. Distinguish one or two of them as being background (those extending to the image borders), and the others as foreground, or item of interest.
Characterize the segments in terms of dominant color (using a very rough classification based on color primaries), and in terms of shape (size relative to the image, circularity, number of holes, dominant orientation and a few others).
Then for every keyword you can build a classifier that decides if a given image has/hasn't this keyword. After training, the classifiers will tell you if the image has/hasn't the keyword(s). If you use a fuzzy classification, you get a "probability".
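A hedged sketch of that last step, assuming you have already flattened each image's segment descriptors into one fixed-length vector (the function names here are hypothetical); scikit-learn's LogisticRegression stands in for "a fuzzy classifier", since its predict_proba output plays the role of the probability:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_keyword_classifiers(X, image_keywords, vocabulary):
        # X: (n_images, n_features) matrix of segment descriptors
        # (dominant-color class, relative size, circularity, hole count, ...).
        # image_keywords: one set of keywords per image, aligned with X's rows.
        classifiers = {}
        for kw in vocabulary:
            y = np.array([kw in kws for kws in image_keywords], dtype=int)
            classifiers[kw] = LogisticRegression(max_iter=1000).fit(X, y)
        return classifiers

    def keyword_probability(classifiers, kw, x):
        # predict_proba returns [[P(absent), P(present)]]; take P(present)
        return classifiers[kw].predict_proba(x.reshape(1, -1))[0, 1]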

Algorithms for finding a look alike face?

I'm doing a personal project of trying to find a person's look-alike given a database of photographs of other people all taken in a consistent manner - people looking directly into the camera, neutral expression and no tilt to the head (think passport photo).
I have a system for placing markers at 2D coordinates on the faces, and I was wondering if there are any known approaches for finding a look-alike of that face given this approach?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the two are actually different things in the scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari have kindly suggested would work too. Basically they are suggesting a K-nearest-neighbor-style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of the face images in your database. One possible feature to use is the Local Binary Pattern (LBP). You can find the code by googling it. Do the same for your query image. Now, loop through all the feature vectors, compare them to that of your query image using the Euclidean distance, and return the K nearest ones.
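A minimal sketch of that pipeline with scikit-image's LBP implementation (parameter choices such as P=8, R=1 and the "uniform" variant are assumptions, not prescriptions):

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(gray, P=8, R=1):
        # "uniform" LBP yields P + 2 distinct codes; their normalized
        # histogram is the face's feature vector
        codes = local_binary_pattern(gray, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        return hist

    def k_nearest_faces(query_gray, database_grays, k=5):
        q = lbp_histogram(query_gray)
        feats = np.array([lbp_histogram(g) for g in database_grays])
        dists = np.linalg.norm(feats - q, axis=1)  # Euclidean distance
        return np.argsort(dists)[:k]               # indices of the k closest faces

A more robust variant computes the histogram over a grid of image patches and concatenates them, as the side note further down describes.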
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones, because simple methods fail badly when faces are not aligned (known as the unconstrained pose setting; search for "Labeled Faces in the Wild" to see the state of the art for this problem) or are taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But if you only consider faces that look alike and are in the same pose to be similar (i.e. similar in pose as well as looks), then this shouldn't be a problem.
The website you gave has links to the code for Eigenfaces and Fisherfaces. These are essentially two methods for computing feature vectors for your face images. Faces are identified by doing a K-nearest-neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between face recognition and face verification: for recognition, you need "labels" for the training images in your database, i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces is available in OpenCV, in case you use it.
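For reference, a sketch of the OpenCV API in question; faces, labels and query_face are placeholders, and the cv2.face module only ships with the contrib build of OpenCV:

    import cv2
    import numpy as np

    # Requires opencv-contrib-python; the plain opencv-python wheel
    # does not include the cv2.face module.
    recognizer = cv2.face.EigenFaceRecognizer_create()  # or FisherFaceRecognizer_create()

    # faces: list of equal-sized grayscale images; labels: integer person IDs
    # (both are placeholders you would fill with your own data)
    recognizer.train(faces, np.array(labels))

    # Returns the nearest label and a distance-like confidence (lower = closer)
    label, confidence = recognizer.predict(query_face)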
As a side note:
A feature vector is actually just a vector in the linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic", i.e. a feature vector is a vector containing statistics that characterize the object it represents. For example, for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, you just reshape the 2D array of numbers into an n-by-1 column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram at small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic, and the histograms joined together form the feature vector. Together they describe the "texture" and facial patterns of your face.
Hope this helps.
These two would seem like equivalent problems, but I do not work in the field. You essentially have the following two problems:
Face recognition: Take a face and try to match it to a person.
Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the pictures in your database with whatever approach you are using. You would then have a set of metrics for each picture, against which you can compare a specific picture and statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.

SVM for image feature classification?

I implemented the Spatial Pyramid Matching algorithm designed by Lazebnik in Matlab, and the last step is to do the SVM classification. At this point I totally don't understand what input I should provide to the svmtrain and svmclassify functions to get the pairs of feature point coordinates of the train and test images in the end.
I have:
coordinates of SIFT feature points on the train image
coordinates of SIFT feature points on the test image
intersection kernel matrix for train image
intersection kernel matrix for test image.
Which of these I should use?
An SVM classifier expects as input a set of objects (images) represented by tuples, where each tuple is a fixed-size set of numeric attributes. Some image features (e.g. a gray-level histogram) provide an image representation in the form of a vector of numerical values, which is suitable for training an SVM. However, feature extraction algorithms like SIFT will output for each image a variable-size set of vectors. So the question is:
How can we convert this set of feature vectors to a unique vector that represents the image?
To solve this problem, you will have to use a technique called bag of visual words.
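A rough sketch of bag of visual words with OpenCV SIFT and scikit-learn k-means (the vocabulary size k=200 is an arbitrary choice). Lazebnik's spatial pyramid is an extension of exactly this: it computes such histograms over a pyramid of image subregions and concatenates them.

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    sift = cv2.SIFT_create()  # patent-free in OpenCV >= 4.4

    def sift_descriptors(gray):
        _, desc = sift.detectAndCompute(gray, None)
        return desc if desc is not None else np.empty((0, 128), np.float32)

    # 1. Build the visual vocabulary: cluster all training descriptors
    #    into k "visual words"
    def build_vocabulary(train_grays, k=200):
        all_desc = np.vstack([sift_descriptors(g) for g in train_grays])
        return KMeans(n_clusters=k, n_init=4).fit(all_desc)

    # 2. Represent each image as a fixed-length histogram of word
    #    occurrences, which is exactly the per-image vector an SVM consumes
    def bow_vector(gray, vocab):
        desc = sift_descriptors(gray)
        if len(desc) == 0:
            return np.zeros(vocab.n_clusters)
        words = vocab.predict(desc)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        return hist / hist.sum()   # normalize so image size doesn't matter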
The problem is that the number of points differs between images, while an SVM expects feature vectors of the same size for training and testing.
coordinates of SIFT feature points on the train image
coordinates of SIFT feature points on the test image
The coordinates won't help for SVM.
I would use:
the number of SIFT feature points found;
a segmentation of the image into small rects, using the presence of a SIFT feature point in a particular rect as a boolean feature value. The feature is then the rect/SIFT-feature-type combination; for N rects and M SIFT feature point types you obtain N*M features (see the sketch below).
The second approach requires spatial normalization of the images: same size, same rotation.
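A sketch of that second suggestion, assuming the images are already normalized; it records, per grid cell, whether any SIFT keypoint lands inside (the grid size and the single-type simplification are assumptions — with M keypoint types you would keep one such plane per type, giving N*M features):

    import numpy as np

    def grid_presence_features(keypoint_xy, image_shape, grid=(8, 8)):
        # keypoint_xy: (n, 2) array of SIFT keypoint (x, y) coordinates.
        # Returns a flat boolean vector, one entry per grid cell, marking
        # whether any keypoint falls inside that cell.
        h, w = image_shape[:2]
        rows, cols = grid
        present = np.zeros((rows, cols), dtype=bool)
        for x, y in keypoint_xy:
            r = min(int(y * rows / h), rows - 1)
            c = min(int(x * cols / w), cols - 1)
            present[r, c] = True
        return present.ravel()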
P.S.: I'm not an expert in ML. I've only done some experiments on cell recognition in microscope images.

How can I measure image noise

I've found a few ways of reducing noise in an image, but my task is to measure it.
So I am interested in an algorithm that can give me some number, a noise rating, so that with that number I can say that one image has less noise than another.
From an image processing point of view, you can consult the classic paper "Image quality assessment: From error visibility to structural similarity" published in IEEE Transactions on Image Processing, which has already been cited 3000+ times according to Google Scholar. The basic idea is that the human visual perception system is highly sensitive to structural similarity, and noise (or distortion) often breaks such similarity. Therefore the authors propose an objective measurement of image quality based on this motivation. You can find an implementation in MATLAB here.
To solve my problem I used the following approach:
My noise rating is just the number of pixels that were recognized as noise. To differentiate normal pixels from noise, I calculated the mean value of a pixel's neighbors; if the relative deviation from that mean exceeded some critical value, the pixel was counted as noise.
if ABS(1 - (currentPixel.R + currentPixel.G + currentPixel.B) / (neighborsMean.R + neighborsMean.G + neighborsMean.B)) > criticalValue
then
{
    currentPixelIsNoise = TRUE;
}
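A vectorized rendering of the same heuristic with NumPy/SciPy; the 3x3 window, the inclusion of the center pixel in the local mean, and the default threshold are assumptions:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def noise_rating(rgb, critical_value=0.3):
        # Count pixels whose combined R+G+B value deviates from the local
        # neighborhood mean by more than critical_value (relative deviation)
        intensity = rgb.astype(float).sum(axis=2)      # R + G + B per pixel
        # 3x3 mean including the pixel itself; close enough to a "neighbor
        # mean" for a rating, and much faster than an explicit loop
        neighbor_mean = uniform_filter(intensity, size=3)
        rel_dev = np.abs(1.0 - intensity / np.maximum(neighbor_mean, 1e-9))
        return int((rel_dev > critical_value).sum())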

Explaining the AdaBoost Algorithms to non-technical people

I've been trying to understand the AdaBoost algorithm without much success. I'm struggling with understanding the Viola Jones paper on Face Detection as an example.
Can you explain AdaBoost in layman's terms and present good examples of when it's used?
AdaBoost is an algorithm that combines classifiers with poor performance, aka weak learners, into a combined classifier with much higher performance.
How does it work? In a very simplified manner:
Train a weak learner.
Add it to the set of weak learners trained so far (with an optimal weight)
Increase the importance of samples that are still misclassified.
Go to 1.
There is a broad and detailed theory behind the scenes, but the intuition is just that: let each "dumb" classifier focus on the mistakes the previous ones were not able to fix.
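For the curious, a minimal sketch of that loop (discrete AdaBoost with decision stumps as the weak learners; this is the textbook variant, not the exact Viola-Jones training code):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_train(X, y, rounds=50):
        # Discrete AdaBoost with decision stumps; y must be in {-1, +1}
        n = len(y)
        w = np.full(n, 1.0 / n)          # start with uniform sample weights
        stumps, alphas = [], []
        for _ in range(rounds):
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()
            if err >= 0.5:               # weak learner no better than chance
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # optimal weight
            w *= np.exp(-alpha * y * pred)   # boost weight of misclassified samples
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # Weighted vote of all weak learners
        votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(votes)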
AdaBoost is one of the most used algorithms in the machine learning community. In particular, it is useful when you know how to create simple classifiers (possibly many different ones, using different features), and you want to combine them in an optimal way.
In Viola and Jones, each different type of weak learner is associated with one of the 4 or 5 different Haar features you can have.
AdaBoost uses a number of training sample images (such as faces) to pick a number of good 'features'/'classifiers'. For face detection, a classifier is typically just a rectangle of pixels that has a certain average color value and a relative size. AdaBoost will look at a number of classifiers and find out which one is the best predictor of a face based on the sample images. After it has chosen the best classifier, it will continue to find another and another until some threshold is reached, and those classifiers combined together will provide the end result.
This part you may not want to share with non-technical people :) but it is interesting anyway. There are several mathematical tricks which make AdaBoost fast for face detection, such as the ability to add up all the color values of an image and store them in a two-dimensional array so that the value in any position is the sum of all the pixels up and to the left of that position. This array can be used to very quickly calculate the average color value of any rectangle within the image by combining the values at its four corners (bottom-right minus top-right minus bottom-left plus top-left) and dividing by the number of pixels in the rectangle. Using this trick you can quickly scan over an entire image looking for rectangles of different relative sizes that match or are close to a particular color.
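A short sketch of that summed-area-table trick (function names are mine, not OpenCV's):

    import numpy as np

    def integral_image(gray):
        # ii[r, c] = sum of all pixels above and to the left of (r, c),
        # inclusive; pad with a zero row/column so border rectangles work
        ii = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
        return np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(ii, top, left, height, width):
        # Four lookups replace summing height*width pixels
        b, r = top + height, left + width
        return ii[b, r] - ii[top, r] - ii[b, left] + ii[top, left]

    def rect_mean(ii, top, left, height, width):
        return rect_sum(ii, top, left, height, width) / (height * width)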
Hope this helps.
This is understandable. Most of the papers you can find on the Internet retell the Viola-Jones and Freund-Schapire papers, which are the foundation of AdaBoost as applied to face detection in OpenCV. They mostly consist of difficult formulas and algorithms from several mathematical areas combined.
Here is what can help you (short enough) -
1 - It is used in object detection and, mostly, in face detection/recognition. The most popular and quite good C++ library is OpenCV, originally from Intel. I take face detection in OpenCV as an example.
2 - First, a cascade of boosted classifiers working with sample rectangles ("features") is trained on a set of images with faces (called positives) and without faces (negatives).
From some Googled paper:
"· Boosting refers to a general and provably effective method of producing a very accurate classifier by combining rough and moderately inaccurate rules of thumb.
· It is based on the observation that finding many rough rules of thumb can be a lot easier than finding a single, highly accurate classifier.
· To begin, we define an algorithm for finding the rules of thumb, which we call a weak learner.
· The boosting algorithm repeatedly calls this weak learner, each time feeding it a different distribution over the training data (in AdaBoost).
· Each call generates a weak classifier and we must combine all of these into a single classifier that, hopefully, is much more accurate than any one of the rules."
During this process, the images are scanned to determine the distinctive areas corresponding to certain parts of every face. Complex hypothesis-based calculation algorithms are applied (which are not so difficult to understand once you get the main idea).
This can take a week and the output is an XML file which contains the learned information on how to quickly detect the human face, say, in frontal position on any picture (it can be any object in other case).
3 - After that you supply this file to the OpenCV face detection program, which runs quite fast with up to a 99% detection rate (depending on conditions).
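For illustration, running such a trained cascade from OpenCV's Python bindings looks roughly like this (the image path is a placeholder; the XML is one of the pretrained cascades OpenCV ships):

    import cv2

    # The trained XML from step 2; OpenCV bundles several pretrained ones
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("photo.jpg")                  # placeholder file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor/minNeighbors trade off speed against false positives
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)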
As was mentioned here, the scanning speed can be increased greatly with a technique known as the "integral image".
And finally, these are helpful sources - Object Detection in OpenCV and
Generic Object Detection using AdaBoost from University of California, 2008.
