Please help me understand this idea from a paper titled "Scene Summarization for Online Image Collections" by Ian Simon, Noah Snavely, and Steven M. Seitz, University of Washington.
Computing the Feature-Image Matrix:
We first transform the set of views into a feature-image
incidence matrix. To do so, we use the SIFT keypoint detector
to find feature points in all of the images in V. The
feature points are represented using the SIFT descriptor.
Then, for each pair of images, we perform feature matching
on the descriptors to extract a set of candidate matches.
We further prune the set of candidates by estimating a fundamental
matrix using RANSAC and removing all inconsistent
matches. After the previous step is complete
for all images,
we organize the matches into tracks,
where a track is a connected component of features. We remove
tracks containing fewer than two features total, or at
least two features in the same image. At this point, we consider
each track as corresponding to a single 3D point in S.
From the set of tracks, it is easy to construct the |S|-by-|V|
feature-image incidence matrix.
The part I'm confused about is the italicized one.
How do we organize matches into tracks?
And how do we construct the feature-image incidence matrix?
Please help me.
Example for 3 images track.
Detect features
Perform matching on the image pairs (1-2, 2-3, 1-3). Now you have correspondences FeatureA_img1 = FeatureB_img2, FeatureC_img2 = FeatureD_img3, FeatureE_img1 = FeatureF_img3.
Check: if FeatureB_img2 and FeatureC_img2 are the same feature, then FeatureA_img1, FeatureB_img2 and FeatureD_img3 are all views of the same point, so you have the same feature in 3 images.
Save it in the array:
img1 img2 img3 ... imgn
FeatureA FeatureB FeatureD ...
Repeat this for all correspondences. The rows in this table are the tracks you are looking for.
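To answer the second question: the feature-image incidence matrix then has one row per track (one 3D point in S) and one column per image in V, and entry (i, j) is 1 if track i contains a feature detected in image j, 0 otherwise. Below is a rough sketch of both steps in Python (my own illustration, not the authors' code); the (image, feature) labels are hypothetical and match the example above.

import numpy as np

def find(parent, x):
    # Union-find "find" with path halving: walk up to the root of x's component.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    # Merge the components containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def build_tracks(matches, num_images):
    # matches: pairwise correspondences ((img_i, feat_i), (img_j, feat_j))
    # that survived the fundamental-matrix / RANSAC pruning.
    parent = {}
    for a, b in matches:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        union(parent, a, b)

    # Each connected component of features is one track.
    groups = {}
    for feat in parent:
        groups.setdefault(find(parent, feat), []).append(feat)
    tracks = list(groups.values())

    # Drop tracks with fewer than two features, or with two features in the same image.
    def valid(track):
        imgs = [img for img, _ in track]
        return len(track) >= 2 and len(imgs) == len(set(imgs))
    tracks = [t for t in tracks if valid(t)]

    # |S|-by-|V| incidence matrix: rows are tracks (3D points), columns are images.
    M = np.zeros((len(tracks), num_images), dtype=np.uint8)
    for row, track in enumerate(tracks):
        for img, _ in track:
            M[row, img] = 1
    return tracks, M

# The correspondences from the example (images 0-indexed here):
matches = [((0, 'A'), (1, 'B')),   # FeatureA_img1 = FeatureB_img2
           ((1, 'B'), (2, 'D')),   # FeatureC_img2 = FeatureD_img3, with FeatureC == FeatureB
           ((0, 'E'), (2, 'F'))]   # FeatureE_img1 = FeatureF_img3
tracks, M = build_tracks(matches, num_images=3)
print(tracks)  # [[(0, 'A'), (1, 'B'), (2, 'D')], [(0, 'E'), (2, 'F')]]
print(M)       # [[1 1 1], [1 0 1]]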
Related
I want to find similar images in a very large dataset (at least 50K+ images, potentially much more).
I already successfully implemented several "distance" functions (hashes compared with L2 or Hamming distance for example, image features with % of similarity, etc) - the result is always a "double" number.
What I want now is "grouping" (clustering?) images by similarity. I already achieved some pretty good results, but the groups are not perfect: some images that could be grouped with others are left aside, so my method is not that good.
I've been looking for a solution these last 3 days, but things are not so clear in my head, and maybe I overlooked a possible method?
I already have image pairs with distances: [image A (index, int), image B (index, int), distance (double)], and a list of duplicates (image X similar to images Y, Z, T; image Y similar to X, T, G, F; etc.).
My problem:
find a suitable and efficient algorithm to group images by similarity from the list of duplicates and the pairwise distances. For me the problem is not really spatial, because the image indexes A and B are NOT coordinates, but there is a 1-n relation between images. One method I found interesting is DBSCAN, or maybe hierarchical solutions would work?
use an efficient structure that is not too memory-hungry, so full matrices of doubles are excluded (50K x 50K, 100K x 100K, or worse 1M x 1M is not reasonable; the more images there are, the more memory the matrix eats, and the matrix would be symmetric anyway because "image A similar to image B" is the same as "B similar to A", so half of the space would be wasted)
I'm coding with C++, using Qt6 for the interface and OpenCV 4.6 for some image functions, some hashing methods, etc.
Any idea/library/structure to propose? Thanks in advance.
EDIT - to better explain what I want to achieve
Images are the yellow circles.
Image 1 is similar to image 4 with a score of 3, and to image 5 with a score of 2,
etc.
The problem is that image 4 is also similar to image 5, so image 4 is more similar to 1 than to 5.
The example I put here is very simple because there are no more than 2 similar images for each image. With a bigger sample, image 4 could be similar to n images... And what about equal scores?
So is there an algorithm to create groups of images, so that no image is listed twice?
The answers to my own question:
about the structure itself: it is called an "undirected weighted graph" --- English is not my native language and I had a hard time finding the right words at first; once I did, the solution was quickly found!
clustering: there are several algorithms associated with graphs, so I'll try some of them
Many thanks to #Similar_Pictures for taking the time to answer me, and for opening my eyes to the fact that the better the similarity algorithm(s), the less need there is for complicated clustering techniques...
I am actually testing how to combine several similarity techniques: each one has its flaws, but together some of them work well, using refined thresholds.
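For what it's worth, here is a tiny sketch of that graph idea (in Python just to keep it short, even though the project is in C++): similarities are stored as adjacency lists, so memory grows with the number of similar pairs instead of n^2, and groups are the connected components over edges whose score passes a threshold, so no image is listed twice. The scores and the threshold below are made up.

from collections import defaultdict, deque

def build_graph(pairs):
    # pairs: iterable of (image_a, image_b, similarity_score).
    # Undirected weighted graph stored as adjacency lists (sparse, no full matrix).
    graph = defaultdict(dict)
    for a, b, score in pairs:
        graph[a][b] = score
        graph[b][a] = score
    return graph

def cluster(graph, min_score):
    # Breadth-first search over edges with score >= min_score;
    # each connected component becomes one group of similar images.
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            group.append(node)
            for neigh, score in graph[node].items():
                if neigh not in seen and score >= min_score:
                    seen.add(neigh)
                    queue.append(neigh)
        groups.append(sorted(group))
    return groups

# Hypothetical scores matching the example above: 1-4 (3), 1-5 (2), 4-5 (1).
graph = build_graph([(1, 4, 3), (1, 5, 2), (4, 5, 1)])
print(cluster(graph, min_score=2))  # [[1, 4, 5]]
print(cluster(graph, min_score=3))  # [[1, 4], [5]]

With real data you could plug DBSCAN or a hierarchical method in place of the plain connected components, but the storage idea stays the same.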
I'm kind of new to this image classification stuff, so this is a somewhat high-level question. I was wondering if it's possible to train an image classifier (i.e. using just TF/Keras or one of the many image recognition libraries and APIs) to identify whether an object is inside another object. For example:
Output: A square
Output: A circle
Output: A circle in a square
Output: A square in a circle in a square
Output: A square in a circle and a square in a square
...and so on
If it's possible, what's the best way to go about it? Do I have to train the model to recognize all the variations example by example (which is unfavorable as there are far too many potential examples), or is there some better way? Thanks :)
You can do it by using simpler computer vision techniques instead of going for machine learning.
For example, if you use OpenCV, it has an inbuilt function called findContours, which returns a hierarchy.
Example:
The matrix on top shows how each shape is related to the others, according to:
[Next, Previous, First_Child, Parent]
For instance, contours 2 and 4 (circle and rectangle) are at the same level. Hence, in the matrix, the Next entry of the second row is 4. You can construct a tree like this to get the output you desire. You just need to make sure that the inner and outer contours of a single shape are not counted as two separate contours, which I did not do here, so it shows 5 and 7 in the output.
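As a rough illustration (not the exact code that produced the picture above), the hierarchy can be read back from findContours and turned into nesting depths like this; the file name and threshold value are placeholders:

import cv2

# Binarize the shapes image and ask findContours for the full nesting tree.
img = cv2.imread('shapes.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# hierarchy[0][i] = [Next, Previous, First_Child, Parent] for contour i.
def depth(i, h):
    # Walk up the Parent links; the depth tells you how deeply nested a shape is,
    # which is exactly what "a square in a circle in a square" describes.
    d = 0
    while h[i][3] != -1:
        i = h[i][3]
        d += 1
    return d

for i in range(len(contours)):
    print('contour', i, 'nesting depth', depth(i, hierarchy[0]))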
I'm doing a personal project of trying to find a person's look-alike, given a database of photographs of other people, all taken in a consistent manner: people looking directly into the camera, neutral expression, and no tilt of the head (think passport photo).
I have a system for placing markers at 2D coordinates on the faces, and I was wondering if there are any known approaches for finding a look-alike of a face given these markers?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the 2 are actually different things in scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari have kindly suggested would work too. Basically, they are suggesting a K-nearest-neighbor-style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of the face images in your database. One possible feature to use is the Local Binary Pattern (LBP); you can find code for it by googling. Do the same for your query image. Now, loop through all the feature vectors, compare them to that of your query image using Euclidean distance, and return the K nearest ones.
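As a sketch of that pipeline (my own illustration, with a single global LBP histogram instead of the concatenated per-patch histograms described later in this answer; scikit-image provides the LBP computation):

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    # LBP code for every pixel, then a normalized histogram of the codes;
    # that histogram is the feature vector for the face.
    codes = local_binary_pattern(gray, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def k_nearest_faces(query, database, k=5):
    # Euclidean distance between the query's feature vector and every database face.
    q = lbp_histogram(query)
    dists = [np.linalg.norm(q - lbp_histogram(face)) for face in database]
    return np.argsort(dists)[:k]

# Hypothetical usage with random images standing in for aligned face crops:
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, (100, 100), dtype=np.uint8) for _ in range(20)]
print(k_nearest_faces(db[3], db, k=5))  # index 3 should come back first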
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones, because simple methods like this generally fail badly when faces are not aligned (the unconstrained-pose setting; search for "Labeled Faces in the Wild" to see the state of the art for this problem) or are taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions, as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But if you only consider faces that look alike and are in the same pose to be similar (i.e. similar in pose as well as looks), then this shouldn't be a problem.
The website you gave has links to the code for Eigenfaces and Fisherfaces. These are essentially 2 methods for computing feature vectors for your face images. Faces are identified by doing a K-nearest-neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between Face Recognition and Face Verification: for recognition, you need "labels" for the training images in your database, i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces is available in OpenCV, in case you use it.
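For instance, with the contrib face module (the opencv-contrib package), an Eigenfaces model can be trained and queried roughly like this; the random images below just stand in for a real database of aligned, equally sized face crops:

import cv2
import numpy as np

# Dummy 64x64 grayscale "faces", two per person, instead of real aligned crops.
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(10)]
labels = np.array([i // 2 for i in range(10)], dtype=np.int32)

model = cv2.face.EigenFaceRecognizer_create()  # PCA-based, from opencv-contrib
model.train(faces, labels)

# predict() projects the query into PCA space and returns the label of the
# nearest training face together with the distance to it.
label, distance = model.predict(faces[0])
print(label, distance)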
As a side note:
A feature vector is actually just a vector in the linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic", i.e. a feature vector is a vector containing statistics that characterize the object it represents. For example, for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, I just reshape the 2D array of numbers into an n-by-1 column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram over small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic, and the histograms joined together form the feature vector. Together they describe the "texture" and facial patterns of your face.
Hope this helps.
These two would seem to be equivalent problems, but I do not work in the field. You essentially have the following two problems:
Face recognition: Take a face and try to match it to a person.
Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the pictures in your database with whatever approach you are using. You should then have a set of metrics for each picture, against which you can compare a specific picture and statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.
Possible Duplicate:
Detecting thin lines in blurry image
So as the title says, I am trying to detect boundaries of patterns. In the images attached, you can basically see three different patterns.
Close stripe lines
One thick L shaped line
The area between 1 & 2
I am trying to separate these three into, say, 3 separate images. Depending on where the answers go, I will upload more images if needed. Either ideas or code will be helpful.
You can solve (for some values of "solve") this problem using morphology. First, to make the image more uniform, remove irrelevant minima. One way to do this is using the h-dome transform for regional minima, which suppresses minima of height < h. Now, we want to join the thin lines. That is accomplished by a morphological opening with a horizontal line of length l. If the lines were merged, then the regional minima of the current image are the background. So we can fill holes to obtain the relevant components. The following code summarizes these tasks:
h = 30;  l = 15;                              % parameter values discussed below
f = rgb2gray(imread('http://i.stack.imgur.com/02X9Z.jpg'));
hm = imhmin(f, h);                            % suppress regional minima of height < h
o = imopen(hm, strel('line', l, 0));          % opening with a horizontal line of length l joins the stripes
result = imfill(~imregionalmin(o), 'holes');  % fill holes in the complemented regional-minima mask
Now, you need to determine h and l. The parameter h is expected to be easier, since it is not related to the scale of the input; in your example, values in the range [10, 30] work fine. To determine l, maybe a granulometry analysis could help. Another way is to check whether the result contains two significant connected components, corresponding to the bigger L shape and the region of the thin lines. There is no need to increase l one by one; you could perform something that resembles a binary search.
Here are the hm, o, and result images with h = 30 and l = 15 (l in [13, 19] works equally well here). This approach gives flexibility in parameter choice, making it easier to find good values.
To calculate the area in the space between the two largest components, we could merge them and simply count the black pixels inside the new connected component.
You can slide a window (10x10 pixels?) over the image and collect features for each window position. The features could be something as simple as the cumulative gradients (edges) within that window. This would distinguish the various areas as long as the window is big enough.
Then, using each window as a data point, you can do some clustering; or, if the patterns don't vary that much, you can use some simple thresholds to determine which data points belong to which pattern (the largest gradient sums belong to the thin stripe lines: many edges; the smallest gradient sums belong to the thickest lines: only one edge; and those in between belong to the other, "in-between" pattern).
Once you have this classification, you can create separate images if need be.
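A rough sketch of that window-plus-threshold idea (the file name and the two thresholds are placeholders that would need tuning on the real images):

import cv2
import numpy as np

gray = cv2.imread('patterns.jpg', cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
mag = cv2.magnitude(gx, gy)

win, low, high = 10, 500.0, 2000.0
labels = np.zeros((gray.shape[0] // win, gray.shape[1] // win), dtype=np.uint8)
for i in range(labels.shape[0]):
    for j in range(labels.shape[1]):
        s = mag[i * win:(i + 1) * win, j * win:(j + 1) * win].sum()
        # 2 = stripe region (many edges), 0 = thick line (one edge), 1 = in between.
        labels[i, j] = 2 if s > high else (0 if s < low else 1)

Each labels value then tells you which of the three patterns a window belongs to, and the separate images can be built from that.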
Just throwing out ideas: you can binarize the image and do connected component labelling, then perform some analysis on the connected components, such as their width, to discriminate between the regions.
I got a school task again. This time, my teacher gave me the task of creating an algorithm to count how many ducks are in a picture.
The picture is similar to this one:
I think I should use pattern recognition to find how many ducks are in it, but I don't know which pattern matches each duck.
I think that you can solve this problem by segmenting the ducks' beaks and counting the number of connected components in the binary image.
To segment the ducks' beaks, first convert the image to HSV color space and then binarize using the hue component. Note that the hue of the ducks' beaks is different from that of the other parts of the image.
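Something along these lines, as a sketch; the file name and the hue/saturation bounds are guesses that would need tuning to the actual picture:

import cv2
import numpy as np

img = cv2.imread('ducks.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep only orange-ish, saturated pixels (the beaks), then clean up the mask.
mask = cv2.inRange(hsv, (5, 100, 100), (25, 255, 255))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

# Each remaining connected component should be one beak, i.e. one duck.
num_labels, _ = cv2.connectedComponents(mask)
print('ducks (approx.):', num_labels - 1)  # label 0 is the background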
Here's one way:
Hough transform for circles:
Initialize an accumulator array indexed by (x,y,radius)
For each pixel:
calculate an edge (e.g. Sobel operator will provide both magnitude and direction), if magnitude exceeds some threshold then:
increment every accumulator for which this edge could possibly lend evidence (only the (x,y) in the direction of the edge, only radii between min_duck_radius and max_duck_radius)
Now smooth and threshold the accumulator array, and the coordinates of highest accumulators show you where the heads are. The threshold may leap out at you if you histogram the values in the accumulators (there may be a clear difference between "lots of evidence" and "noise").
So that's very terse, but it can get you started.
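For reference, OpenCV implements a gradient-based variant of this accumulator scheme as HoughCircles; here is a rough sketch with made-up parameters (the file name, radius range, and thresholds would all need tuning on the actual picture):

import cv2

gray = cv2.imread('ducks.jpg', cv2.IMREAD_GRAYSCALE)
gray = cv2.medianBlur(gray, 5)  # smoothing reduces spurious accumulator peaks

# dp/minDist/param1/param2 and the radius bounds play the roles of the
# thresholds and min_duck_radius/max_duck_radius described above.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=10, maxRadius=40)
print(0 if circles is None else circles.shape[1], 'circle(s) found')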
It might be just because I'm working with SIFT right now, but to me it looks like it could be good for your problem.
It is an algorithm that matches the same object in two different pictures, where the object can have different orientations and scales and be viewed from different perspectives in the two pictures. It can also work when an object is partially hidden by another object (as your ducks are).
I'd suggest finding a good, clear picture of a rubber ducky ( :D ) and then using some SIFT implementation (VLFeat, a C library with SIFT but no visualization; SIFT++, based on VLFeat but in C++; Rob Hess's implementation in C with OpenCV...).
You should bear in mind that matching with SIFT (and anything else) is not perfect - so you might not get the exact number of rubber duckies in the picture.
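If your OpenCV is recent enough (4.4+ ships SIFT directly; older versions need the contrib package), a minimal matching sketch looks roughly like this; the file names and the 0.75 ratio threshold are the usual placeholders:

import cv2

sift = cv2.SIFT_create()
template = cv2.imread('rubber_duck.jpg', cv2.IMREAD_GRAYSCALE)  # clear picture of one duck
scene = cv2.imread('ducks.jpg', cv2.IMREAD_GRAYSCALE)           # picture to count ducks in
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Lowe's ratio test to keep only distinctive matches.
matcher = cv2.BFMatcher()
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(len(good), 'good matches')  # many matches hint at ducks, but not an exact count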