I am working on software that will match a captured image (face) with 3/4 images (faces) of the same person. There are 2 possibilities:
1. The captured image (face) is of the same person whose 3/4 images (faces) are already stored in the database
2. The captured image is of a different person
Now I want to get the results for the above 2 scenarios, i.e. matched in case 1 and not matched in case 2. I used 40 Gabor filters so that I can get good results, and I get the results in an array (histogram). But it doesn't seem to work well, and environmental conditions such as lighting also influence the matching process. Can anyone suggest a good and efficient technique to achieve this?
Well, this is basically a face identification problem.
You can use LBP (Local Binary Patterns) to extract features from the images. LBP is a very robust, illumination-invariant method.
You can try the following steps (a rough code sketch follows the list):
Training:
Extract the face region (using an OpenCV Haar cascade)
Resize all the extracted face regions to the same size
Divide each resized face into sub-regions (e.g. 8*9)
Extract LBP features from each sub-region and concatenate them, because the localization of features is very important
Train an SVM on the concatenated features, with a different label for each person's images
Testing:
Take a face image and follow steps 1 to 4
Predict with the SVM (which person's image this is)
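A rough sketch of these steps in Python, assuming OpenCV, scikit-image and scikit-learn are available; the file names, face size and 8*9 grid are placeholders:

```python
# Sketch of the LBP + SVM pipeline above; paths and parameters are placeholders.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_lbp_feature(image_path, size=(128, 144), grid=(8, 9)):
    """Detect a face, resize it, and concatenate per-region LBP histograms."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = faces[0]                      # assumes at least one face is found
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    lbp = local_binary_pattern(face, P=8, R=1, method="uniform")
    region_h, region_w = size[1] // grid[1], size[0] // grid[0]
    hists = []
    for i in range(grid[1]):
        for j in range(grid[0]):
            region = lbp[i * region_h:(i + 1) * region_h,
                         j * region_w:(j + 1) * region_w]
            hist, _ = np.histogram(region, bins=10, range=(0, 10), density=True)
            hists.append(hist)
    return np.concatenate(hists)               # one fixed-length vector per face

# Training: several images per person, one label per person (paths are hypothetical).
X = [face_lbp_feature(p) for p in ["person1_a.jpg", "person1_b.jpg", "person2_a.jpg"]]
y = [0, 0, 1]
clf = SVC(kernel="linear").fit(X, y)

# Testing: predict which person the captured face belongs to.
print(clf.predict([face_lbp_feature("captured.jpg")]))
```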
I'm working on sets of biological images (neuron calcium imaging); each set is composed of a dozen images (the same area at one-day intervals), in each of which we can define a number of sources (neurons, the items of interest).
Pre-processing of the images and source definition work well; the goal now is to follow the evolution of each source (i.e. point) image after image.
Different point matching techniques have been implemented and tried using MATLAB (Iterative Closest Point, Coherent Point Drift, TPS Robust Point Matching), with TPS-RPM giving the best results.
The problem comes from the nature of the data: new points can appear in each new image and some will disappear, making it hard to implement proper point matching.
The previously named techniques have been used and work to some degree, but outliers are still an issue.
Matching points from image n°1 to n°2 yields decent results, with some outliers being identified, but the cumulative effect of appearing/disappearing points over 10 images renders the technique very ineffective, as few points from the first image are correctly matched to the corresponding points in the last.
My question, then, is whether anyone has faced similar issues (from what I've read, most point matching problems deal with feature detection/matching on homogeneous data and not so much on heterogeneous, locally varying images), and whether anyone could advise on a specific approach/library, ideally in Python or MATLAB.
Thank you for reading!
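For concreteness, the frame-to-frame step I am trying to make robust looks roughly like the following gated nearest-neighbour baseline (SciPy's Hungarian solver; the gate distance and the coordinates are arbitrary placeholders). Points left unmatched on either side correspond to disappeared/appeared sources:

```python
# Rough illustration of the frame-to-frame step: a gated nearest-neighbour
# assignment. Points whose best match is farther than `gate` are treated as
# appeared/disappeared sources (outliers).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_frames(points_a, points_b, gate=15.0):
    """Return matched index pairs (i, j) and the unmatched indices of each frame."""
    cost = cdist(points_a, points_b)            # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)    # optimal one-to-one assignment
    matches = [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= gate]
    unmatched_a = set(range(len(points_a))) - {i for i, _ in matches}
    unmatched_b = set(range(len(points_b))) - {j for _, j in matches}
    return matches, unmatched_a, unmatched_b

# Example with synthetic coordinates (sources detected in frames n and n+1).
frame1 = np.array([[10.0, 12.0], [40.0, 42.0], [80.0, 15.0]])
frame2 = np.array([[11.0, 13.0], [79.0, 16.0], [120.0, 60.0]])  # one lost, one new
print(match_frames(frame1, frame2))
```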
I want to identify all occurrences of the images from a list in a screenshot and get the label of each emoji (e.g. "smile").
The list of images holds all emojis (full list):
and so on…
And this is the screenshot (show large):
The screenshots can have different resolutions, and the emojis can occur at different heights.
My ideas were:
Use OpenCV with a variety of filters and iterate over all the emoji images (maybe using template matching)
Use a neural network library like TensorFlow and train my own model on the emojis
How would you do it?
There are several classic ways to approach your problem:
Simple regular correlation: https://en.wikipedia.org/wiki/Cross-correlation.
The simple correlation is used when you have exactly the image you are looking for, with no change in intensity.
Normalized correlation (math behind template matching): https://en.wikipedia.org/wiki/Template_matching.
If you have different intensities between your screenshot and your emoji base picture, you should use normalized correlation.
Both these methods will give you an image with peaks, and your emojis will be localized at the local maxima of this image.
As your emojis can be very similar to one another, you will have to apply a threshold to the correlation image in order to discriminate between the emoji you are testing for and the ones that look nearly like it.
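A minimal OpenCV sketch of normalized template matching with such a threshold (the file names and the 0.9 value are placeholders to adjust):

```python
# Minimal normalized-correlation template matching with a threshold.
import cv2
import numpy as np

screenshot = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
emoji = cv2.imread("smile.png", cv2.IMREAD_GRAYSCALE)

# TM_CCOEFF_NORMED is the normalized correlation mentioned above.
result = cv2.matchTemplate(screenshot, emoji, cv2.TM_CCOEFF_NORMED)

# Keep only peaks above the threshold; each hit is a top-left corner.
ys, xs = np.where(result >= 0.9)
for x, y in zip(xs, ys):
    print("smile found at", (x, y), "score", result[y, x])
```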
This method can be time consuming, but it can easily be sped up by using an image pyramid.
An image pyramid is a set of images where the first one is your image, the second is a subsampling of the first by a factor of 2, and so on:
https://en.wikipedia.org/wiki/Pyramid_(image_processing).
The correlation is then applied on the top level of the pyramid to find an approximate location, then on the top - 1 level around that approximate location, and so on.
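A rough coarse-to-fine sketch of that idea using cv2.pyrDown (the two-level pyramid and the search margin are arbitrary choices for illustration):

```python
# Coarse-to-fine matching over a two-level pyramid built with pyrDown.
import cv2

def coarse_to_fine_match(image, template, margin=16):
    # Level 1: half-resolution images, cheap full search.
    small_img, small_tpl = cv2.pyrDown(image), cv2.pyrDown(template)
    res = cv2.matchTemplate(small_img, small_tpl, cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(res)

    # Level 0: refine in a small window around the up-scaled coarse location.
    x, y = 2 * x, 2 * y
    h, w = template.shape[:2]
    y0, x0 = max(y - margin, 0), max(x - margin, 0)
    window = image[y0:y0 + h + 2 * margin, x0:x0 + w + 2 * margin]
    res = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, (dx, dy) = cv2.minMaxLoc(res)
    return (x0 + dx, y0 + dy), score
```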
As for neural networks or other machine learning methods you might want to try: they are really heavy solutions, and you have a pretty simple problem, so you normally shouldn't need them.
You have the exact picture you are looking for, without rotation, deformation or intensity change, so template matching should be very effective.
Can someone explain to me why we need training or other methods like bag of words for clustering or grouping images based on their SIFT features?
SIFT is an algorithm that is meant to describe feature points so that the descriptor is invariant to image translation, scaling and rotation and to changes in illumination, and robust to local geometric distortion.
In simple words, you can think of SIFT as a way of generating a descriptor for a specific point in an image such that this descriptor won't change if the image is zoomed in, moved around or even rotated. As this descriptor is not affected by image transformations, it can be used to compare features in different images that are taken in different conditions (different view, zoom, lighting).
If you want to compare 2 images, you don't need to do any training or create a knowledge base. Just extract feature points from the two images and compare their descriptors one by one. If the descriptors are the same (or almost the same), you can assume that they belong to the same object in the image. Problems start when there are repetitive patterns.
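As a minimal illustration of that pairwise comparison, assuming OpenCV with SIFT available (the file names are placeholders):

```python
# Pairwise comparison of two images with SIFT descriptors and Lowe's ratio test.
import cv2

img1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor in img1, find its two nearest descriptors in img2 and
# keep the match only if the best one is clearly better than the second best.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "descriptor matches")   # many matches -> probably the same object
```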
If you want to cluster/group images in some specific way, then you need some criteria by which to do that. That's when the knowledge base kicks in. For instance, if you would like to find images that contain human faces, you need a way to tell the computer what a human face looks like.
Of course, these algorithms are not 100% perfect and there are some weak points. For instance, if the image is changed/distorted too much, the descriptors start to differ.
UPDATED:
SIFT is just a method to generate a description for a specific feature in an image. It has nothing to do with image classification by itself.
Bags-of-Words
Bags-of-Words is just a method to simplify the analysis of image content. The main idea is that we can compare two images just by comparing their distinct features and how often they occur. If both images contain roughly the same features, those images are considered to be similar or even equal. It does not matter where those features are located in the image.
As SIFT descriptors are vectors with 128 dimensions, Bags-of-Words greatly simplifies the process. Bags-of-Words can be used both for grouping and for classification/recognition.
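A compact sketch of the Bags-of-Words idea, assuming OpenCV and scikit-learn (the image paths and the vocabulary size of 100 visual words are arbitrary):

```python
# Bags-of-Words sketch: cluster SIFT descriptors into a visual vocabulary with
# k-means, then describe each image as a histogram of visual words.
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(img, None)
    return des

paths = ["img1.jpg", "img2.jpg", "img3.jpg"]            # placeholder image paths
all_des = np.vstack([descriptors(p) for p in paths])
vocab = KMeans(n_clusters=100, n_init=10).fit(all_des)  # visual vocabulary

def bow_histogram(path):
    words = vocab.predict(descriptors(path))            # map descriptors to words
    hist, _ = np.histogram(words, bins=100, range=(0, 100), density=True)
    return hist                                         # fixed-length image descriptor

# Two images are considered similar if their histograms are close, e.g. in L2.
print(np.linalg.norm(bow_histogram(paths[0]) - bow_histogram(paths[1])))
```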
Knowledge base (training)
Whether you need a knowledge base or not depends entirely on how you do the clustering. If you don't use a knowledge base, you can do general clustering using SIFT to group similar images together without knowing what is on them. If you want to cluster by some specific feature, then you need a knowledge base.
Generally speaking, if you want to classify images into known groups, you use a knowledge base. If you want to group similar images together without knowing what each group will contain, you don't need one.
Example:
Imagine you have 5 images, each of which contains one letter (A, B or C) and some background texture (wood, sand, cloth). The background texture takes up most of each image.
1. A - wood
2. B - cloth
3. C - wood
4. A - sand
5. C - cloth
1) Clustering without a knowledge base (grouping)
If we did the clustering without a knowledge base, we would compare every two images to see how similar they are.
We would come up with the following:
Group 1 - 1., 3.
Group 2 - 2., 5.
Group 3 - 4.
You would not be able to tell what each group contains, but you would know that the images in each group are somehow similar. In this case they are most probably similar because of the same background.
2) Clustering with a knowledge base (classification/recognition)
Now imagine we had a knowledge base that contained many images of each letter.
Now, instead of comparing every two images, we could compare the input image against the knowledge base to determine which letter a specific image is most similar to.
Then you would come up with the following:
Group A - 1., 4.
Group B - 2.
Group C - 3., 5.
In this case we know what each group contains, as we have used the knowledge base.
All that said, here is a paper on how object classification is done. In this paper SURF is used instead of SIFT, but it does not change the main idea.
PS. I am sorry if I oversimplified anything, but I hope it makes things easier to understand.
I need a way to determine whether a picture is a photograph or not. I've got a bunch of random image files (paper document scans, logos and of course photographs taken by a camera) and I need to filter out only the photographs for creating a preview.
The solution proposed at Determine if image is photograph or drawing, quickly only works in a limited way (e.g. some logos are completely black with white font, some logos contain only colors and no white areas), and sometimes I've got a scan of a white sheet of paper containing multiple photographs with white space around them. I need to identify those, too, because then I have to key out the white part and save the photographs on the scan to separate files.
Your process to do this should probably be similar to the following:
1. Extract features from the image (pixel values, groups of pixels, HoG, SIFT, GIST, DCT, wavelets, dictionary-learning coefficients, etc., depending on how much time you have)
2. Aggregate these features somehow so that you get a fixed-length vector (histogram, pyramid scheme)
3. Apply a standard classification (SVM, k-NN, neural network, Random Forest) or clustering algorithm (k-means, GMM, etc.) and measure how well it works (F1 score is usually okay; ROC may be better for 2-class problems)
4. Repeat from step 1 with different features if you are unsatisfied with the results from step 3
The solution you reference seems to be pretty reasonable in terms of steps 1 and 2.
A simple next step in extracting and aggregating features would be to create histograms of all the pixel values in the image. If you have a lot of labeled data, feed these features to a standard classifier. Otherwise, run a clustering algorithm on these histogram features and check whether the cluster assignments are correlated with the photograph/non-photograph split.
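A small sketch of that histogram baseline, assuming OpenCV and scikit-learn (the file names and bin count are placeholders):

```python
# Per-channel colour histograms as fixed-length features, clustered into two
# groups with k-means; inspect whether the clusters match photo / non-photo.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def histogram_feature(path, bins=32):
    img = cv2.imread(path)
    chans = [cv2.calcHist([img], [c], None, [bins], [0, 256]).ravel()
             for c in range(3)]
    feat = np.concatenate(chans)
    return feat / feat.sum()          # normalise so image size doesn't matter

paths = ["scan1.png", "logo1.png", "photo1.jpg", "photo2.jpg"]
features = np.array([histogram_feature(p) for p in paths])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(features)
for path, label in zip(paths, labels):
    print(label, path)
```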
Check the following paper:
http://www.vision.ee.ethz.ch/~gallju/projects/houghforest/houghforest.html . They provide source code.
I believe the program accepts an input file with negative and positive images for training. The output of the classification part will be an image voting map (Hough map?). You might need to decide on a threshold value to locate regions of interest, so if there are two logos in the image it will mark out both of them. The algorithm worked very well for me in the past.
Training on 100 positive and 100 negative images should be enough, I believe. Also, don't use big images for training (256x256 should be enough).
I am amazed at how well (and fast) this software works. I hovered my phone's camera over a small area of a book cover in dim light and it only took a couple of seconds for Google Shopper to identify it. It's almost magical. Does anyone know how it works?
I have no idea how Google Shopper actually works. But it could work like this:
Take your image and convert to edges (using an edge filter, preserving color information).
Find points where edges intersect and make a list of them (including colors and perhaps angles of intersecting edges).
Convert to a rotation-independent metric by selecting pairs of high-contrast points and measuring the distance between them. Now the book cover is represented as a bunch of numbers: (edgecolor1a, edgecolor1b, edgecolor2a, edgecolor2b, distance).
Pick pairs of the most notable distance values and take the ratios of those distances.
Send this data as a query string to Google, where it finds the most similar vector (possibly with direct nearest-neighbor computation, or perhaps with an appropriately trained classifier--probably a support vector machine).
Google Shopper could also send the entire picture, at which point Google could use considerably more powerful processors to crunch on the image processing data, which means it could use more sophisticated preprocessing (I've chosen the steps above to be so easy as to be doable on smartphones).
Anyway, the general steps are very likely to be (1) extract scale- and rotation-invariant features, and (2) match that feature vector against a library of pre-computed features.
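As a toy illustration of those two steps, assuming OpenCV (ORB stands in here for the invariant features; the cover and snapshot file names are placeholders):

```python
# Toy version of steps (1) and (2): extract rotation/scale-invariant features
# and match the query against a small pre-computed library of known covers.
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def describe(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des

# (2) Pre-computed library of known covers (would live server-side at scale).
library = {name: describe(name) for name in ["cover_a.jpg", "cover_b.jpg"]}

# (1) Features of the phone snapshot, scored against every library entry with
# a ratio test; the entry with the most good matches wins.
query = describe("snapshot.jpg")
def score(des):
    return sum(m.distance < 0.75 * n.distance
               for m, n in matcher.knnMatch(query, des, k=2))
print(max(library, key=lambda name: score(library[name])))
```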
In any case, Pattern Recognition/Machine Learning methods are often based on:
Extract features from the image that can be described as numbers, for instance using edges (as Rex Kerr explained before), color, texture, etc. A set of numbers that describes or represents an image is called a "feature vector" or sometimes a "descriptor". After extracting the feature vector of an image, it is possible to compare images using a distance or (dis)similarity function.
Extract text from the image. There are several methods to do this, often based on OCR (optical character recognition).
Perform a search in a database using the features and the text in order to find the closest related product.
It is also likely that the image is cut into subimages, since the algorithm often looks for a specific logo in the image.
In my opinion, the image features are sent to different pattern classifiers (algorithms that are able to predict a "class" using a feature vector as input) in order to recognize logos and, afterwards, the product itself.
Using this approach, the system can be local, remote or mixed. If local, all processing is carried out on the device, and just the "feature vector" and "text" are sent to a server where the database is. If remote, the whole image goes to the server. If mixed (I think this is the most probable one), processing is partially done locally and partially at the server.
Another interesting piece of software is Google Goggles, which uses CBIR (content-based image retrieval) to search for other images related to the picture taken by the smartphone. It is related to the problem addressed by Shopper.
Pattern Recognition.