I'm doing a personal project of trying to find a person's look-alike given a database of photographs of other people all taken in a consistent manner - people looking directly into the camera, neutral expression and no tilt to the head (think passport photo).
I have a system for placing markers for 2d coordinates on the faces and I was wondering if there are any known approaches for finding a look alike of that face given this approach?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the 2 are actually different things in scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari has kindly suggested would work too. Basically they are suggesting a K-nearest neighbor style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of your face images in your database. One possible feature to use is the Local Binary Pattern (LBP). You can find the code by googling it. Do the same for your query image. Now, loop through all the feature vectors and compare them to that of your query image using euclidean distance and return the K nearest ones.
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones because they generally fail badly when faces are not aligned (known as unconstrained pose. Search for "Labelled Faces in the Wild" to see the results for state of the art for this problem.) or taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But, if you only consider faces that look alike and in the same pose to be similar (i.e. similar in pose as well as looks) then this shouldn't be a problem.
The website your gave have links to the code for Eigenfaces and Fisherfaces. These are essentially 2 methods for computing feature vectors for your face images. Faces are identified by doing a K nearest neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between Face Recognition and Face Verification: for recognition, you need to have "labels" your training images in your database i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces are available in OpenCV in case you use it.
As a side note:
A feature vector is actually just a vector in your linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic" i.e. a feature vector is a vector containing statistics that characterizes the object it represents. For e.g., for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, I just reshape the 2D array of numbers into a n rows by 1 column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram at small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic and the histograms joined together is the feature vector. Together they described the "texture" and facial patterns of your face.
Hope this helps.
These two would seem like the equivalent problem, but I do not work in the field. You essentially have the following two problems:
Face recognition: Take a face and try to match it to a person.
Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the picture in your database with whatever approach you are using. You should then have a set of metrics for each picture which you can compare a specific picture with and statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.
Related
I have set of binary images, on which i need to find the cross (examples attached). I use findcontours to extract borders from the binary image. But i can't understand how can i determine is this shape (border) cross or not? Maybe opencv has some built-in methods, which could help to solve this problem. I thought to solve this problem using Machine learning, but i think there is a simpler way to do this. Thanks!
Viola-Jones object detection could be a good start. Though the main usage of the algorithm (AFAIK) is face detection, it was actually designed for any object detection, such as your cross.
The algorithm is Machine-Learning based algorithm (so, you will need a set of classified "crosses" and a set of classified "not crosses"), and you will need to identify the significant "features" (patterns) that will help the algorithm recognize crosses.
The algorithm is implemented in OpenCV as cvHaarDetectObjects()
From the original image, lets say you've extracted sets of polygons that could potentially be your cross. Assuming that all of the cross is visible, to the extent that all edges can be distinguished as having a length, you could try the following.
Reject all polygons that did not have exactly 12 vertices required to
form your polygon.
Re-order the vertices such that the shortest edge length is first.
Create a best fit perspective transformation that maps your vertices onto a cross of uniform size
Examine the residuals generated by using this transformation to project your cross back onto the uniform cross, where the residual for any given point is the distance between the projected point and the corresponding uniform point.
If all the residuals are within your defined tolerance, you've found a cross.
Note that this works primarily due to the simplicity of the geometric shape you're searching for. Your contours will also need to have noise removed for this to work, e.g. each line within the cross needs to be converted to a single simple line.
Depending on your requirements, you could try some local feature detector like SIFT or SURF. Check OpenSURF which is an interesting implementation of the latter.
after some days of struggle, i came to a conclusion that the only robust way here is to use SVM + HOG. That's all.
You could erode each blob and analyze their number of pixels is going down. No mater the rotation scaling of the crosses they should always go down with the same ratio, excepted when you're closing down on the remaining center. Again, when the blob is small enough you should expect it to be in the center of the original blob. You won't need any machine learning algorithm or training data to resolve this.
I have a query on calculation of best matching point of one image to another image through intensity based registration. I'd like to have some comments on my algorithm:
Compute the warp matrix at this iteration
For every point of the image A,
2a. We warp the particular image A pixel coordinates with the warp matrix to image B
2b. Perform interpolation to get the corresponding intensity form image B if warped point coordinate is in image B.
2c. Calculate the similarity measure value between warped pixel A intensity and warped image B intensity
Cycle through every pixel in image A
Cycle through every possible rotation and translation
Would this be okay? Is there any relevant opencv code we can reference?
Comments on algorithm
Your algorithm appears good although you will have to be careful about:
Edge effects: You need to make sure that the algorithm does not favour matches where most of image A does not overlap image B. e.g. you may wish to compute the average similarity measure and constrain the transformation to make sure that at least 50% of pixels overlap.
Computational complexity. There may be a lot of possible translations and rotations to consider and this algorithm may be too slow in practice.
Type of warp. Depending on your application you may also need to consider perspective/lighting changes as well as translation and rotation.
Acceleration
A similar algorithm is commonly used in video encoders, although most will ignore rotations/perspective changes and just search for translations.
One approach that is quite commonly used is to do a gradient search for the best match. In other words, try tweaking the translation/rotation in a few different ways (e.g. left/right/up/down by 16 pixels) and pick the best match as your new starting point. Then repeat this process several times.
Once you are unable to improve the match, reduce the size of your tweaks and try again.
Alternative algorithms
Depending on your application you may want to consider some alternative methods:
Stereo matching. If your 2 images come from stereo camera then you only really need to search in one direction (and OpenCV provides useful methods to do this)
Known patterns. If you are able to place a known pattern (e.g. a chessboard) in both your images then it becomes a lot easier to register them (and OpenCV provides methods to find and register certain types of pattern)
Feature point matching. A common approach to image registration is to search for distinctive points (e.g. types of corner or more general places of interest) and then try to find matching distinctive points in the two images. For example, OpenCV contains functions to detect SURF features. Google has published a great paper on using this kind of approach in order to remove rolling shutter noise that I recommend reading.
I got school task again. This time, my teacher gave me task to create algorithm to count how many ducks on picture.
The picture is similar to this one:
I think I should use pattern recognition for searching how many ducks on it. But I don't know which pattern match for each duck.
I think that you can solve this problem by segmenting the ducks' beaks and counting the number of connected components in the binary image.
To segment the ducks' beaks, first convert the image to HSV color space and then perform a binarization using the hue component. Note that the ducks' beaks hue are different from other parts of the image.
Here's one way:
Hough transform for circles:
Initialize an accumulator array indexed by (x,y,radius)
For each pixel:
calculate an edge (e.g. Sobel operator will provide both magnitude and direction), if magnitude exceeds some threshold then:
increment every accumulator for which this edge could possibly lend evidence (only the (x,y) in the direction of the edge, only radii between min_duck_radius and max_duck_radius)
Now smooth and threshold the accumulator array, and the coordinates of highest accumulators show you where the heads are. The threshold may leap out at you if you histogram the values in the accumulators (there may be a clear difference between "lots of evidence" and "noise").
So that's very terse, but it can get you started.
It might be just because I'm working with SIFT right now, but to me it looks like it could be good for your problem.
It is an algorithm that matches the same object on two different pictures, where the objects can have different orientations, scales and be viewed from different perspectives on the two pictures. It can also work when an object is partially hidden (as your ducks are) by another object.
I'd suggest finding a good clear picture of a rubber ducky ( :D ) and then use some SIFT implementation (VLFeat - C library with SIFT but no visualization, SIFT++ - based on VLFeat, but in C++ , Rob Hess in C with OpenCV...).
You should bear in mind that matching with SIFT (and anything else) is not perfect - so you might not get the exact number of rubber duckies in the picture.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
What's a fast way to sort a given set of images by their similarity to each other.
At the moment I have a system that does histogram analysis between two images, but this is a very expensive operation and seems too overkill.
Optimally I am looking for a algorithm that would give each image a score (for example a integer score, such as the RGB Average) and I can just sort by that score. Identical Scores or scores next to each other are possible duplicates.
0299393
0599483
0499994 <- possible dupe
0499999 <- possible dupe
1002039
4995994
6004994
RGB Average per image sucks, is there something similar?
There has been a lot of research on image searching and similarity measures. It's not an easy problem. In general, a single int won't be enough to determine if images are very similar. You'll have a high false-positive rate.
However, since there has been a lot of research done, you might take a look at some of it. For example, this paper (PDF) gives a compact image fingerprinting algorithm that is suitable for finding duplicate images quickly and without storing much data. It seems like this is the right approach if you want something robust.
If you're looking for something simpler, but definitely more ad-hoc, this SO question has a few decent ideas.
I would recommend considering moving away from just using an RGB histogram.
A better digest of your image can be obtained if you take a 2d Haar wavelet of the image (its a lot easier than it sounds, its just a lot of averaging and some square roots used to weight your coefficients) and just retain the k largest weighted coefficients in the wavelet as a sparse vector, normalize it, and save that to reduce its size. You should rescale R G and B using perceptual weights beforehand at least or I'd recommend switching to YIQ (or YCoCg, to avoid quantization noise) so you can sample chrominance information with reduced importance.
You can now use the dot product of two of these sparse normalized vectors as a measure of similarity. The image pairs with the largest dot products are going to be very similar in structure. This has the benefit of being slightly resistant to resizing, hue shifting and watermarking, and being really easy to implement and compact.
You can trade off storage and accuracy by increasing or decreasing k.
Sorting by a single numeric score is going to be intractable for this sort of classification problem. If you think about it it would require images to only be able to 'change' along one axis, but they don't. This is why you need a vector of features. In the Haar wavelet case its approximately where the sharpest discontinuities in the image occur. You can compute a distance between images pairwise, but since all you have is a distance metric a linear ordering has no way to express a 'triangle' of 3 images that are all equally distant. (i.e. think of an image that is all green, an image that is all red and an image that is all blue.)
That means that any real solution to your problem will need O(n^2) operations in the number of images you have. Whereas if it had been possible to linearize the measure, you could require just O(n log n), or O(n) if the measure was suitable for, say, a radix sort. That said, you don't need to spend O(n^2) since in practice you don't need to sift through the whole set, you just need to find the stuff thats nearer than some threshold. So by applying one of several techniques to partition your sparse vector space you can obtain much faster asymptotics for the 'finding me k of the images that are more similar than a given threshold' problem than naively comparing every image against every image, giving you what you likely need... if not precisely what you asked for.
In any event, I used this a few years ago to good effect personally when trying to minimize the number of different textures I was storing, but there has also been a lot of research noise in this space showing its efficacy (and in this case comparing it to a more sophisticated form of histogram classification):
http://www.cs.princeton.edu/cass/papers/spam_ceas07.pdf
If you need better accuracy in detection, the minHash and tf-idf algorithms can be used with the Haar wavelet (or the histogram) to deal with edits more robustly:
http://cmp.felk.cvut.cz/~chum/papers/chum_bmvc08.pdf
Finally, Stanford has an image search based on a more exotic variant of this kind of approach, based on doing more feature extraction from the wavelets to find rotated or scaled sections of images, etc, but that probably goes way beyond the amount of work you'd want to do.
http://wang14.ist.psu.edu/cgi-bin/zwang/regionsearch_show.cgi
I implemented a very reliable algorithm for this called Fast Multiresolution Image Querying. My (ancient, unmaintained) code for that is here.
What Fast Multiresolution Image Querying does is split the image into 3 pieces based on the YIQ colorspace (better for matching differences than RGB). Then the image is essentially compressed using a wavelet algorithm until only the most prominent features from each colorspace are available. These points are stored in a data structure. Query images go through the same process, and the prominent features in the query image are matched against those in the stored database. The more matches, the more likely the images are similar.
The algorithm is often used for "query by sketch" functionality. My software only allowed entering query images via URL, so there was no user interface. However, I found it worked exceptionally well for matching thumbnails to the large version of that image.
Much more impressive than my software is retrievr which lets you try out the FMIQ algorithm using Flickr images as the source. Very cool! Try it out via sketch or using a source image, and you can see how well it works.
A picture has many features, so unless you narrow yourself to one, like average brightness, you are dealing with an n-dimensional problem space.
If I asked you to assign a single integer to the cities of the world, so I could tell which ones are close, the results wouldn't be great. You might, for example, choose time zone as your single integer and get good results with certain cities. However, a city near the north pole and another city near the south pole can also be in the same time zone, even though they are at opposite ends of the planet. If I let you use two integers, you could get very good results with latitude and longitude. The problem is the same for image similarity.
All that said, there are algorithms that try to cluster similar images together, which is effectively what you're asking for. This is what happens when you do face detection with Picasa. Even before you identify any faces, it clusters similar ones together so that it's easy to go through a set of similar faces and give most of them the same name.
There is also a technique called Principle Component Analysis, which lets you reduce n-dimensional data down to any smaller number of dimensions. So a picture with n features could be reduced to one feature. However, this is still not the best approach for comparing images.
There's a C library ("libphash" - http://phash.org/) that will calculate a "perceptual hash" of an image and allow you to detect similar images by comparing hashes (so you don't have to compare each image directly against every other image) but unfortunately it didn't seem to be very accurate when I tried it.
You have to decide what is "similar." Contrast? Hue?
Is a picture "similar" to the same picture upside-down?
I bet you can find a lot of "close calls" by breaking images up into 4x4 pieces and getting an average color for each grid cell. You'd have sixteen scores per image. To judge similarity, you would just do a sum of squares of differences between images.
I don't think a single hash makes sense, unless it's against a single concept like hue, or brightness, or contrast.
Here's your idea:
0299393
0599483
0499994 <- possible dupe
0499999 <- possible dupe
1002039
4995994
6004994
First of all, I'm going to assume these are decimal numbers that are R*(2^16)+G*(2^8)+B, or something like that. Obviously that's no good because red is weighted inordinately.
Moving into HSV space would be better. You could spread the bits of HSV out into the hash, or you could just settle H or S or V individually, or you could have three hashes per image.
One more thing. If you do weight R, G, and B. Weight green highest, then red, then blue to match human visual sensitivity.
In the age of web services you could try http://tineye.com
The question Good way to identify similar images? seems to provide a solution for your question.
i assumed that other duplicate image search software performs an FFT on the images, and stores the values of the different frequencies as a vectors:
Image1 = (u1, u2, u3, ..., un)
Image2 = (v1, v2, v3, ..., vn)
and then you can compare two images for equalness by computing the distance between the weight vectors of two images:
distance = Sqrt(
(u1-v1)^2 +
(u2-v2)^2 +
(u2-v3)^2 +
...
(un-vn)^2);
One solution is to perform a RMS/RSS comparison on every pair of pictures required to perform a bubble sort. Second, you could perform an FFT on each image and do some axis averaging to retrieve a single integer for each image which you would use as an index to sort by. You may consider doing whatever comparison on a resized (25%, 10%) version of the original depending on how small a difference you choose to ignore and how much speedup you require. Let me know if these solutions are interesting, and we can discuss or I can provide sample code.
Most modern approaches to detect Near duplicate image detection use interesting points detection and descriptors describing area around such points. Often SIFT is used. Then you can quatize descriptors and use clusters as visual word vocabulary.
So if we see on ratio of common visual words of two images to all visual words of these images you estimate similarity between images. There are a lot of interesting articles. One of them is Near Duplicate Image Detection: minHash and tf-idf Weighting
For example using IMMI extension and IMMI you can examine many different ways how to measure similarity between images:
http://spl.utko.feec.vutbr.cz/en/component/content/article/46-image-processing-extension-for-rapidminer-5
By defining some threshold and selecting some method you can measure similarity.
I have two lists containing x-y coordinates (of stars). I could also have magnitudes (brightnesses) attached to each star. Now each star has random position jiggles and there can be a few extra or missing points in each image. My question is, "What is the best 2D point matching algorithm for such a dataset?" I guess both for a simple linear (translation, rotation, scale) and non-linear (say, n-degree polynomials in the coordinates). In the lingo of the point matching field, I'm looking for the algorithms that would win in a shootout between 2D point matching programs with noise and spurious points. There may be a different "winners" depending if the labeling info is used (the magnitudes) and/or the transformation is restricted to being linear.
I am aware that there are many classes of 2D point matching algorithms and many algorithms in each class (literally probably hundreds in total) but I don't know which, if any, is the consider the "best" or the "most standard" by people in the field of computer vision. Sadly, many of the articles to papers I want to read don't have online versions and I can only read the abstract. Before I settle on a particular algorithm to implement it would be good to hear from a few experts to separate the wheat from the chaff.
I have a working matching program that uses triangles but it fails somewhat frequently (~5% of the time) such that the solution transformation has obvious distortions but for no obvious reason. This program was not written by me and is from a paper written almost 20 years ago. I want to write a new implementation that performs most robustly. I am assuming (hoping) that there have been some advances in this area that make this plausible.
If you're interested in star matching, check out the Astrometry.net blind astrometry solver and the paper on it here. They use four point quads to solve star configurations in Flickr pictures of the night sky. Check out this interview.
There is no single "best" algorithm for this. There are lots of different techniques, and each work better than others on specific datasets and types of data.
One thing I'd recommend is to read this introduction to image registration from the tutorials of the Insight Toolkit. ITK supports MANY types of image registration (which is what it sounds like you are attempting), and is very robust in many cases. Most of their users are in the medical field, so you'll have to wade through a lot of medical jargon, but the algorithms and code work with any type of image (including 1,2,3, and n dimensional images, of different types,etc).
You can consider applying your algorithm first only on the N brightest stars, then include progressively the others to refine the result, reducing the search range at the same time.
Using RANSAC for robustness to extra points is also very common.
I'm not sure it would work, but worth a try:
For each star do the circle time ray Fourier transform - centered around it - of all the other stars (note: this is not the standard Fourier transform, which is line times line).
The phase space of circle times ray is integers times line, but since we only have finite accuracy, you just get a matrix; the dimensions of the matrix depend on accuracy. Now try to pair the matrices to one another (e.g. using L_2 norm)
I saw a program on tv a while ago about how researchers were taking pictures of whales and using the spots on them (which are unique for each whale) to id each whale. It used the angles between the spots. By using the angles it didn't matter if the image was rotated or scaled or translated. That sounds similar to what you're doing with your triangles.
I think the "best" (most technical) way would to be to take the Fourier Transform of the original image and of the new linearly modified image. By doing some simple filtering, it should be easy to figure out the orientation and scale of your image with respect to the old one. There is a description of the 2d Fourier Transform here.