Help to learn Image Search algorithm

I am a beginner in image processing. I want to write an application in C++ or C# for:
1. Searching for an image in a list of images.
2. Searching for a particular feature (e.g. a face) in a list of images.
Can anybody suggest where I should start?
What should I learn before doing this?
Where can I find reliable information about this?

As for the second one, you should start by learning how to solve the decision problem of whether a square patch contains a face (or whatever kind of object you are interested in). For that, I suggest you study a little machine learning, the AdaBoost algorithm, Haar features, and the Viola-Jones detector.
Once you know how to do that, the trick is really just to take a sliding window across your image, feeding the contents of that window into your detector. Then you shrink your main input image and repeat the process until your input image has gotten smaller than the minimum size input for your detector. There are, of course, several clever ways to parallelize the computation and speed it up, but the binary detector is really the interesting part of the process.
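The sliding-window-plus-shrinking loop described above can be sketched in a few lines of C++. This is only a rough illustration of the scan geometry, not a detector. The names `Window` and `slidingWindows` and the `stride` and `shrink` parameters are made up for this sketch, and it tracks window coordinates instead of actually resampling pixels.

```cpp
#include <cassert>
#include <vector>

// One candidate patch, expressed in the coordinates of the original image.
struct Window {
    int x, y, size;
};

// Enumerate square windows at every pyramid level. A detector window of
// `win` pixels on an image shrunk by `scale` covers win * scale pixels in
// the original image, so only the geometry needs to be tracked here.
// Assumptions (not from the answer): `stride` is the step in pixels at each
// level and `shrink` (> 1) is the factor by which the image shrinks per level.
std::vector<Window> slidingWindows(int width, int height, int win,
                                   int stride, double shrink) {
    assert(win > 0 && stride > 0 && shrink > 1.0);
    std::vector<Window> out;
    double scale = 1.0;
    while (true) {
        const int w = static_cast<int>(width / scale);
        const int h = static_cast<int>(height / scale);
        if (w < win || h < win) break;  // image is now smaller than the detector input
        for (int y = 0; y + win <= h; y += stride)
            for (int x = 0; x + win <= w; x += stride)
                out.push_back({static_cast<int>(x * scale),
                               static_cast<int>(y * scale),
                               static_cast<int>(win * scale)});
        scale *= shrink;
    }
    return out;
}
```

Each returned window would then be cropped, resized to the detector's input size, and passed to the binary classifier.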
You may find some of the material linked from the CSE 517: Machine Learning - Syllabus helpful in getting into machine learning and understanding AdaBoost. You will certainly find the Viola-Jones paper of interest.

Related

How can I isolate and recolor specific color range?

Given an image of the region containing the lips and other "noise" (teeth, skin), how can we isolate and recolor only the lips (simulating a "lipstick" effect)?
Attached is a photo describing the lips/mouth states.
What we have tried so far is a three-part process:
1. Color-match the lips using a stable point on the lips (provided by an internal API).
2. Use this color as the base color for isolating the lips.
3. Recolor the lips (lipstick behavior).
We tried a few algorithms such as hue difference, HSV difference, and ∆E after converting to a CIE color space. Unfortunately, each approach either failed or produced artifacts, because the skin is relatively similar in color to the lips and the shadows cast by the nose and mouth cause discoloration.
What are we missing? Is there a better way to approach it?
We are looking for a solution/direction based on a classic Computer Vision color algorithm, not one from the Machine Learning/Deep Learning domain. Thanks!
You probably won't like this answer, but your question is ill-posed: there is no measurably best solution, only people's opinions.
In this case, the best answer you can hope for is usually:
Ask an expert for a large set of examples that would be acceptable in practice.
Your problem can easily be solved by a suitable artist (one you trust to produce usable results) with access to the right tools (for example, Photoshop), but a single artist (or even a group of them) can't possibly scale to millions (or whatever large number you care about) of examples.
To address the shortcoming of the artist-based solution, you can use the following strategy:
1. Collect a sufficiently large set of before and after images created by artists whom you deem trustworthy.
2. Apply your favorite machine learning algorithm to learn a mapping from the before images to the after images. There are many possible choices, and it hardly matters which you choose as long as you know how to use it well.
Note that the two steps above are usually not one-and-done, as most algorithms are. While using the product, you will usually come across pathological or poorly behaved examples for your ML solution. The key is to collect these examples, pass them through the artist, and retrain or update your ML model. Repeat this enough times and you will produce a state-of-the-art solution to your problem.
Whether you have the funding, time, motivation and resources to accomplish this is another matter.
You should try semantic segmentation techniques; they should give you very good results, and the approach generalizes well.

How to make a neural net give position?

I understand how to do classification problems and am starting to understand convolutional networks, which I think are the answer to some extent. I'm a bit confused about how to set up a network to give me a position as output.
Let's say you have the position of the end point of the nose for each face in a data set. To find that point, do you just treat it as a 'classification' type problem where your output layer has something like 64x64 = 4096 units, and if the nose is at row 43 and column 20 of your grid, you set the output to all zeros except for element 43*64 + 20 = 2772, which you set to 1? Then you just map it back to your image dimensions.
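The encoding described in this question can be written down as a small C++ sketch, to make the index arithmetic concrete. The function names `encodePosition` and `decodePosition` are invented for illustration, and decoding to the centre of the winning grid cell is an assumption of this sketch, not something the question specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Encode a grid cell (row, col) as a one-hot target vector with grid*grid
// entries, using the row-major index row * grid + col.
std::vector<float> encodePosition(int row, int col, int grid) {
    assert(row >= 0 && row < grid && col >= 0 && col < grid);
    std::vector<float> target(static_cast<std::size_t>(grid) * grid, 0.0f);
    target[static_cast<std::size_t>(row) * grid + col] = 1.0f;
    return target;
}

// Decode a network output: pick the strongest cell and map its centre back
// to pixel coordinates in an imgW x imgH image. Returns (x, y).
std::pair<double, double> decodePosition(const std::vector<float>& output,
                                         int grid, int imgW, int imgH) {
    assert(output.size() == static_cast<std::size_t>(grid) * grid);
    const auto best = std::max_element(output.begin(), output.end());
    const int index = static_cast<int>(best - output.begin());
    const int row = index / grid;
    const int col = index % grid;
    return {(col + 0.5) * imgW / grid, (row + 0.5) * imgH / grid};
}
```

With a 64x64 grid, `encodePosition(43, 20, 64)` sets element 2772, matching the arithmetic above. The decoded position is only as precise as one grid cell.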
I can't find much info on how this part of identification works, and this is my best guess. I'm working on a project at the moment using this methodology, but it is going to be a lot of work and I want to know if I'm at least on the right track. This seems to be a solved problem, but I just can't seem to find out how people do this.
Although what you describe could feasibly work, generally neural networks (convolutional and otherwise) are not used to determine the position of a feature in an image. In particular, Convolutional Neural Networks (CNNs) are specifically designed to be translation invariant so that they will detect features regardless of their position in the input image - this is sort of the inverse of what you're looking for.
One common and effective solution for the kind of problem you're describing is a cascade classifier. Cascade classifiers have some limitations, but for the kind of application you're describing, one would probably work quite well. In particular, they are designed to perform well thanks to a staged approach in which most sections of the input image are very quickly dismissed by the first couple of stages.
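To show why the staged approach is cheap, here is a toy C++ sketch of cascade evaluation with early rejection. It is not a trained classifier. The `Patch` and `Stage` types, the `cascadeAccepts` function, and the two hand-written stages in `makeToyCascade` are all hypothetical stand-ins for the learned stages a real cascade would use.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Stand-in types for this sketch: a patch is a flat list of intensities in
// [0, 1]; a stage returns true to keep the patch and false to reject it.
using Patch = std::vector<float>;
using Stage = std::function<bool(const Patch&)>;

// Run the stages in order (cheapest first). The first rejection ends the
// evaluation, so most patches cost only one or two cheap tests.
// `stagesRun` reports how many stages were actually evaluated.
bool cascadeAccepts(const std::vector<Stage>& stages, const Patch& patch,
                    int& stagesRun) {
    stagesRun = 0;
    for (const Stage& stage : stages) {
        ++stagesRun;
        if (!stage(patch)) return false;  // early exit: later stages never run
    }
    return true;
}

// Two hand-written stages, for illustration only: a mean-brightness test
// followed by a contrast test. A real cascade learns its stages from data.
std::vector<Stage> makeToyCascade() {
    Stage brightEnough = [](const Patch& p) {
        assert(!p.empty());
        const float mean = std::accumulate(p.begin(), p.end(), 0.0f) / p.size();
        return mean > 0.2f;
    };
    Stage enoughContrast = [](const Patch& p) {
        assert(!p.empty());
        const auto mm = std::minmax_element(p.begin(), p.end());
        return (*mm.second - *mm.first) > 0.5f;
    };
    return {brightEnough, enoughContrast};
}
```

A dark, flat patch is rejected after a single stage, while a patch that passes every stage pays for all of them. That asymmetry is what keeps the scan over a whole image fast.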
Don't get me wrong, it may be interesting to experiment with using the approach you described; just be aware that it may prove difficult to get it to scale well.

OpenCV: Fingerprint Image and Compare Against Database

I have a database of images. When I take a new picture, I want to compare it against the images in this database and receive a similarity score (using OpenCV). This way I want to detect whether the database contains an image that is very similar to the new picture.
Is it possible to create a fingerprint/hash of my database images and match new ones against it?
I'm looking for an algorithm, code snippet, or technical demo, not a commercial solution.
Best,
Stefan
As Paul R has commented, this "fingerprint/hash" is usually a set of feature vectors or feature descriptors. But most feature vectors used in computer vision are too computationally expensive for searching against a database. So this task needs a special kind of feature descriptor, because descriptors such as SURF and SIFT take too much time to search, even with various optimizations.
The only thing that OpenCV has for your task (object categorization) is an implementation of Bag of Visual Words (BoW).
It can compute a special kind of image feature and train a vocabulary of visual words. You can then use this vocabulary to find similar images in your database and compute a similarity score.
Here is the OpenCV documentation for bag of words. OpenCV also has a sample named bagofwords_classification.cpp. It is really big but might be helpful.
Content-based image retrieval systems are still a field of active research: http://citeseerx.ist.psu.edu/search?q=content-based+image+retrieval
First you have to be clear about what constitutes "similar" in your context:
Similar color distribution: Use something like color descriptors for subdivisions of the image; you should get some fairly satisfying results.
Similar objects: Since the computer does not know what an object is, you will not get very far unless you have extensive domain knowledge about the object (or a few object classes). A good overview of the current state of research can be seen here (results) and soon here.
There is no "serve all needs"-algorithm for the problem you described. The more you can share about the specifics of your problem, the better answers you might get. Posting some representative images (if possible) and describing the desired outcome is also very helpful.
This would be a good question for computer-vision.stackexchange.com, if it already existed.
You can use a pHash algorithm and store the hash value in the database, then use this code:
double const mismatch = algo->compare(image1Hash, image2Hash);
Here the 'mismatch' value lets you judge how similar the two images are.
Hash functions:
AverageHash
PHash
MarrHildrethHash
RadialVarianceHash
BlockMeanHash
ColorMomentHash
These functions are good enough to evaluate image similarity in most respects.
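To give a feel for what the simplest of these hashes does, here is a standalone C++ sketch of an average hash compared by Hamming distance. It uses only the standard library and is not OpenCV's implementation. The names `averageHash` and `hammingDistance` are invented for this sketch, and it assumes the image has already been reduced to an 8x8 grayscale thumbnail elsewhere.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average hash of an 8x8 grayscale thumbnail (64 values, row-major).
// Bit i is set when pixel i is brighter than the thumbnail's mean.
std::uint64_t averageHash(const std::vector<std::uint8_t>& thumb8x8) {
    assert(thumb8x8.size() == 64);
    unsigned sum = 0;
    for (std::uint8_t p : thumb8x8) sum += p;
    const double mean = sum / 64.0;
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < 64; ++i)
        if (thumb8x8[i] > mean) hash |= (std::uint64_t{1} << i);
    return hash;
}

// Number of differing bits between two hashes: 0 means identical hashes,
// 64 means every bit differs.
int hammingDistance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```

In this sketch each image's 64-bit hash could be stored in the database once, and a new picture is then compared against the stored hashes with `hammingDistance`; a small distance suggests a near-duplicate.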

What algorithm to use to obtain Objects from an Image

I would like to know which algorithm is used to take an image, find the objects present in it, and process them (give information about them). Also, how is this done?
I agree with Sid Farkus, there is no simple answer to this question.
Maybe you can get started by checking out the Open Computer Vision Library. There is a Wiki page on object detection with links to a How-To and to papers.
You may find other examples and approaches (i.e. algorithms); it's likely that the algorithms differ by application (i.e. depending on what you actually want to detect).
There are many ways to do object detection, and it is still an open problem.
You can start with template matching. It is probably the simplest approach, and consists of convolving the known image (IA) with the new image (IB). The idea is fairly simple because it is like applying a filter to a signal: the filter generates a maximum in the image at the point where it finds the object, as shown in the video. But the technique has several drawbacks: it does not handle variations in scale or rotation, so it has little real-world application.
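Here is a minimal C++ sketch of brute-force template matching, just to make the sliding comparison concrete. Note that it scores each placement by the sum of squared differences, not by the convolution described above, so the best match is a minimum instead of a maximum. The `Gray` and `Match` types and the `findTemplate` function are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Minimal grayscale image: row-major pixels, width * height entries.
struct Gray {
    int width, height;
    std::vector<float> px;
    float at(int x, int y) const { return px[y * width + x]; }
};

struct Match {
    int x, y;      // top-left corner of the best placement
    double score;  // sum of squared differences; 0 is a perfect match
};

// Slide `tmpl` over every position of `image` where it fits entirely and
// keep the placement with the smallest sum of squared differences.
Match findTemplate(const Gray& image, const Gray& tmpl) {
    assert(tmpl.width <= image.width && tmpl.height <= image.height);
    Match best{0, 0, std::numeric_limits<double>::max()};
    for (int y = 0; y + tmpl.height <= image.height; ++y) {
        for (int x = 0; x + tmpl.width <= image.width; ++x) {
            double ssd = 0.0;
            for (int ty = 0; ty < tmpl.height; ++ty) {
                for (int tx = 0; tx < tmpl.width; ++tx) {
                    const double d = image.at(x + tx, y + ty) - tmpl.at(tx, ty);
                    ssd += d * d;
                }
            }
            if (ssd < best.score) best = {x, y, ssd};
        }
    }
    return best;
}
```

Because the template is compared pixel by pixel at a fixed size and orientation, a scaled or rotated copy of the object will not score well, which is the limitation mentioned above.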
Another, more robust option is feature matching, which consists of creating a dataset of features such as SIFT, SURF, or ORB for different objects; with this you can train an SVM to recognize objects.
You can also check out deformable part models. However, state-of-the-art object detection is based on deep learning, with networks such as Faster R-CNN and AlexNet, which learn the features that will be used to detect/recognize the objects.
Well, this is hardly an answerable question, but for most computer vision applications a good starting point is the Hough Transform.

How to design an approximate solution algorithm

I want to write an algorithm that can take parts of a picture and match them to another picture of the same object.
For example, if I gave the computer a picture of a vase and a picture of a scene with the vase in it, I'd expect it to determine where in the image the vase is.
How would I begin to develop an algorithm like this?
The final use for this algorithm will be an application that, given a picture of somebody's face, could tell whether they were in a crowd of people. This algorithm would eventually be applied to video streams.
edit: I'm not expecting an actual solution to this problem as I don't hope to solve it anytime soon. The real question was how do you define something like this to a computer so that you could make an algorithm to do it.
Thanks
A former teacher of mine wrote his doctorate thesis on a similar sort of problem, except his input was a detailed 3D model of something, which he would use to find that object in 2D images. This is a VERY non-trivial problem, there is no single 'answer', certainly nothing that would fit the Stack Overflow format.
My best answer: gather a ton of money and hire a very experienced programmer.
Best of luck to you.
The two problems you describe are quite different.
A major part of each is solved by the numerous machine vision libraries available. You may need a combination of techniques to achieve any success at either task.
For the first one, you would need something that recognizes objects generically. I'd probably use a number of algorithms in concert to identify the foreground object in the model image and then do some kind of weighted comparison against the partitioned target image.
The second case, examining faces, is a much more difficult problem than the general recognizer above. Faces all look the same, or nearly so. The things that a general recognizer would notice aren't likely to be good for differentiating faces. You need an algorithm already tuned to facial recognition. Fortunately this is a rapidly maturing field and you can probably do this as well as the first case, but with a different set of functions.
The simple answer is: find a mathematical way to describe faces that can account for angles and partially missing data, then refine and teach it.
Apparently Apple has done something like this; however, it still makes mistakes and has to be taught as it goes.
I expect it will be more about the math than about the programming.
I think you will find this to be quite a challenge. This is an extremely difficult problem and is one of the many areas of computing that fall under the domain of artificial intelligence (AI). Facial recognition would certainly be the most popular variant of this problem and, in spite of what you may read in the media, any claimed successes are not what they are made out to be. I think the closest solutions involve neural nets, and they usually require very clear and carefully selected images.
You could try reading here though. Good luck!

Resources