How to detect similar objects in this picture? - algorithm

I want to find patterns in image. Saying "to find patterns" I mean "to detect similar objects", thus these patterns shouldn't be some high-frequency info like noise.
For example, on this image I'd like to get pattern "window" with ROI/ellipse of each object:
I've read advices to use Autocorrelation, FFT, DCT for this problem. As far as I've understood, Autocorrelation and FFT are alternative, not complementary.
First, I don't know if it is possible to get such high-level info in frequency domain?
As I have FFT implemented, I tried to use it. This is spectrogram:
Could you suggest how to further analyze this spectogram to detect objects "window" with their spatial locations?
Is it needed to find the brightest points/lines on spectrogram?
Should the FFT be done for image chunks instead of whole image?
If that't not possible to find such objects with this approach, what would you advice?
Thanks in advance.
P.S. Sorry for large image size.

Beware this is not my cup of tea so read with extreme prejudice. IIRC for such task are usually used SIFT/SURF + RANSAC methods.
Identify key points of interest of image SIFT/SURF
This will get you list of 2D locations in your image with specific features (which you can handle as integer hash code). I think SIFT (Scale Invariant Feature transform) is ideal for this. They work similarly like our human vision works (identify specific change in some feature and "ignore" the rest of the image). So instead of matching all the pixels of the image we cross match only few of them.
sort by occurrence
each of the found SIFT points have some feature list. if we do a histogram of this features (count how many similar or identical feature points there are) then we can group points with the same occurrence. The idea is that if we got n object placings in the image each of its key points should be n times duplicated in the final image.
So if we have many points with some n times occurrence it hints we got n similar objects in the image. From this we select just these key points for the next step.
find object placings
each object can have different scale,position and orientation. Let assume they got the same aspect ratio. So the corresponding key points in each object should have the same relative properties between the objects (like relative angle between key points, normalized distance, etc).
So the task is to regroup our key points into each object so all the objects have the same key points and the same relative properties.
This can be done by brute force (testing all the combination and checking the properties) or by RANSAC or any other method.
Usually we select one first key point (no matter which) and find 2 others that form the same angle and relative distance ratio (in all of the objects)
so angle is the same and |p1-p0| / |p2-p0| is also the same or close. While grouping realize that key points within objects are more likely closer to each other ... so we can augment our search by distance from the first selected key point.... to decide to which object the key point probably belongs to (if try those first we got high probability we found our combination fast). All the other points pi we can add similarly one by one (using p0,p1,pi)
So I would starting by closest 2 key points ... (this sometimes can be fouled by overlapping or touching mirrored images) as the key point from the neighbor object can be sometimes closer that from the own one ...
After such regrouping just check if all the found objects have the same properties (aspect ratio) ... to visualize them you can find the OBB (Oriented Bounding Box) of the key points (which can be also used for the check)

Related

Recognize recurring images into a larger one

Edit: this is not a duplicate of Determine if an image exists within a larger image, and if so, find it, using Python since I do not know the pattern beforehand
Suppose I have a big image (usually a picture taken with a camera so it might be a bit noisy, but let's assume it's not for now) made up of multiple smaller images all equal among themselves, something like
I need to find the contour of each one of those. The first step is recognizing that there's a recurring image (or unknown pattern) in the 2D image. How can I achieve this first step?
I did read around that I might use a FFT of the original image and search for duplicate frequencies, would that be a feasible approach?
To build a bit on the problem: I do not know the image beforehand, nor its size or how many will there be on the big image. The images can be shot from camera so they might be noisy. The images won't overlap.
You can try to use described keypoints (Sift/SURF/ORB/etc.) to find features in the image and try to detect the same features in the image.
You can see such a result in How to find euclidean distance between keypoints of a single image in opencv where 3x the same image is present and features are detected and linked between those subimages automatically.
In your image the result looks like
so you can see that the different occurances of the same pattern is indeed automatically detected and linked.
Next steps would be to group features to objects, so that the "whole" pattern can be extracted. Once you have a candidate for a pattern, you can extract a homography for each occurance of the pattern (with one reference candidate pattern) to verify that it is a pattern. One open problem is how to find such candidates. Maybe it is worth trying to find "parallel features", so keypoint matches that have parallel lines and/or same length lines (see image). Or maybe there is some graph theory approach.
All in all, this whole approach will have some advantages and disadvantes:
Advantages:
real world applicability - Sift and other keypoints are working quite well even with noise and some perspective effects, so chances are increased to find such patterns.
Disadvantages
slow
parametric (define what it means that two features are successfully
matched)
not suitable for all kind of patterns - your pattern must have some extractable keypoints
Those are some thoughts and probably not complete ;)
Unfortunately no full code yet for your concrete task, but I hope the idea is clear.
For such a clean image, it suffices to segment the patterns by blob analysis and to compare the segments or ROI that contain them. The size is a first matching criterion. The SAD, SSD or correlation similarity scores can do finer comparison.
In practice you will face more difficulties such as
not possible to segment the patterns
geometric variations in size/orientation
partial occlusion
...
Handling these is out of the scope of this answer; it makes things much harder than in the "toy" case.
The goal is to find several equal or very similar patterns which are not known before in a picture. As it is this problem is still a bit ill posed.
Are the patterns exactly equal or only similar (added noise maybe)?
Do you want to have the largest possible patterns or are smaller subpatterns okay too or are all possible patterns needed? Reason is that of course each pattern could consist of equal patterns too.
Is the background always that simple (completely white) or can it be much more difficult? What do we know about it?
Are the patterns always equally oriented, equally scaled, non-overlapping?
For the simple case of non-overlapping patterns with simple background, the answer of Yves Daoust using segmentation is well performing but fails if patterns are very close or overlapping.
For other cases the idea of the keypoints by Micka will help but might not perform well if there is noise or might be slow.
I have one alternative: look at correlations of subblocks of the image.
In pseudocode:
Divide the image in overlapping areas of size MxN for a suitable M,N (pixel width and height chosen to be approximately the size of the desired pattern)
Correlate each subblock with the whole image. Look for local maxima in the correlation. The position of these maxima denotes the position of similar regions.
Choose a global threshold on all correlations (smartly somehow) and find sets of equal patterns.
Determine the fine structure of these patterns by shanging the shape from rectangular (bounding box) to a more sophisticaed shape (maybe by looking at the shape of the peaks in the correlation)
In case the approximate size of the desired patterns is not known before, try with large values of M, N and go down to smaller ones.
To speed up the whole process start on a coarse scale (downscaled version of the image) and then process finer scales only where needed. Needs balancing of zooming in and performing correlations.
Sorry, I cannot make this a full Matlab project right now, but I hope this helps you.

Algorithms for finding a look alike face?

I'm doing a personal project of trying to find a person's look-alike given a database of photographs of other people all taken in a consistent manner - people looking directly into the camera, neutral expression and no tilt to the head (think passport photo).
I have a system for placing markers for 2d coordinates on the faces and I was wondering if there are any known approaches for finding a look alike of that face given this approach?
I found the following facial recognition algorithms:
http://www.face-rec.org/algorithms/
But none deal with the specific task of finding a look-alike.
Thanks for your time.
I believe you can also try searching for "Face Verification" rather than just "Face Recognition". This might give you more relevant results.
Strictly speaking, the 2 are actually different things in scientific literature but are sometimes lumped under face recognition. For details on their differences and some sample code, take a look here: http://www.idiap.ch/~marcel/labs/faceverif.php
However, for your purposes, what others such as Edvard and Ari has kindly suggested would work too. Basically they are suggesting a K-nearest neighbor style face recognition classifier.
As a start, you can probably try that. First, compute a feature vector for each of your face images in your database. One possible feature to use is the Local Binary Pattern (LBP). You can find the code by googling it. Do the same for your query image. Now, loop through all the feature vectors and compare them to that of your query image using euclidean distance and return the K nearest ones.
While the above method is easy to code, it will generally not be as robust as some of the more sophisticated ones because they generally fail badly when faces are not aligned (known as unconstrained pose. Search for "Labelled Faces in the Wild" to see the results for state of the art for this problem.) or taken under different environmental conditions. But if the faces in your database are aligned and taken under similar conditions as you mentioned, then it might just work. If they are not aligned, you can use the face key points, which you mentioned you are able to compute, to align the faces. In general, comparing faces which are not aligned is a very difficult problem in computer vision and is still a very active area of research. But, if you only consider faces that look alike and in the same pose to be similar (i.e. similar in pose as well as looks) then this shouldn't be a problem.
The website your gave have links to the code for Eigenfaces and Fisherfaces. These are essentially 2 methods for computing feature vectors for your face images. Faces are identified by doing a K nearest neighbor search for faces in the database with feature vectors (computed using PCA and LDA respectively) closest to that of the query image.
I should probably also mention that in the Fisherfaces method, you will need to have "labels" for the faces in your database to identify the faces. This is because Linear Discriminant Analysis (LDA), the classification method used in Fisherfaces, needs this information to compute a projection matrix that will project feature vectors for similar faces close together and dissimilar ones far apart. Comparison is then performed on these projected vectors. Here lies the difference between Face Recognition and Face Verification: for recognition, you need to have "labels" your training images in your database i.e. you need to identify them.
For verification, you are only trying to tell whether any 2 given faces are of the same person. Often, you don't need the "labelled" data in the traditional sense (although some methods might make use of auxiliary training data to help in the face verification).
The code for computing Eigenfaces and Fisherfaces are available in OpenCV in case you use it.
As a side note:
A feature vector is actually just a vector in your linear algebra sense. It is simply n numbers packed together. The word "feature" refers to something like a "statistic" i.e. a feature vector is a vector containing statistics that characterizes the object it represents. For e.g., for the task of face recognition, the simplest feature vector would be the intensity values of the grayscale image of the face. In that case, I just reshape the 2D array of numbers into a n rows by 1 column vector, each entry containing the value of one pixel. The pixel value here is the "feature", and the n x 1 vector of pixel values is the feature vector. In the LBP case, roughly speaking, it computes a histogram at small patches of pixels in the image and joins these histograms together into one histogram, which is then used as the feature vector. So the Local Binary Pattern is the statistic and the histograms joined together is the feature vector. Together they described the "texture" and facial patterns of your face.
Hope this helps.
These two would seem like the equivalent problem, but I do not work in the field. You essentially have the following two problems:
Face recognition: Take a face and try to match it to a person.
Find similar faces: Take a face and try to find similar faces.
Aren't these equivalent? In (1) you start with a picture that you want to match to the owner and you compare it to a database of reference pictures for each person you know. In (2) you pick a picture in your reference database and run (1) for that picture against the other pictures in the database.
Since the algorithms seem to give you a measure of how likely two pictures belong to the same person, in (2) you just sort the measures in decreasing order and pick the top hits.
I assume you should first analyze all the picture in your database with whatever approach you are using. You should then have a set of metrics for each picture which you can compare a specific picture with and statistically find the closest match.
For example, if you can measure the distance between the eyes, you can find faces that have the same distance. You can then find the face that has the overall closest match and return that.

Counting object on image algorithm

I got school task again. This time, my teacher gave me task to create algorithm to count how many ducks on picture.
The picture is similar to this one:
I think I should use pattern recognition for searching how many ducks on it. But I don't know which pattern match for each duck.
I think that you can solve this problem by segmenting the ducks' beaks and counting the number of connected components in the binary image.
To segment the ducks' beaks, first convert the image to HSV color space and then perform a binarization using the hue component. Note that the ducks' beaks hue are different from other parts of the image.
Here's one way:
Hough transform for circles:
Initialize an accumulator array indexed by (x,y,radius)
For each pixel:
calculate an edge (e.g. Sobel operator will provide both magnitude and direction), if magnitude exceeds some threshold then:
increment every accumulator for which this edge could possibly lend evidence (only the (x,y) in the direction of the edge, only radii between min_duck_radius and max_duck_radius)
Now smooth and threshold the accumulator array, and the coordinates of highest accumulators show you where the heads are. The threshold may leap out at you if you histogram the values in the accumulators (there may be a clear difference between "lots of evidence" and "noise").
So that's very terse, but it can get you started.
It might be just because I'm working with SIFT right now, but to me it looks like it could be good for your problem.
It is an algorithm that matches the same object on two different pictures, where the objects can have different orientations, scales and be viewed from different perspectives on the two pictures. It can also work when an object is partially hidden (as your ducks are) by another object.
I'd suggest finding a good clear picture of a rubber ducky ( :D ) and then use some SIFT implementation (VLFeat - C library with SIFT but no visualization, SIFT++ - based on VLFeat, but in C++ , Rob Hess in C with OpenCV...).
You should bear in mind that matching with SIFT (and anything else) is not perfect - so you might not get the exact number of rubber duckies in the picture.

Best approach for specific Object/Image Recognition task?

I'm searching for an certain object in my photograph:
Object: Outline of a rectangle with an X in the middle. It looks like a rectangular checkbox. That's all. So, no fill, just lines. The rectangle will have the same ratios of length to width but it could be any size or any rotation in the photograph.
I've looked a whole bunch of image recognition approaches. But I'm trying to determine the best for this specific task. Most importantly, the object is made of lines and is not a filled shape. Also, there is no perspective distortion, so the rectangular object will always have right angles in the photograph.
Any ideas? I'm hoping for something that I can implement fairly easily.
Thanks all.
You could try using a corner detector (e.g. Harris) to find the corners of the box, the ends and the intersection of the X. That simplifies the problem to finding points in the right configuration.
Edit (response to comment):
I'm assuming you can find the corner points in your image, the 4 corners of the rectangle, the 4 line endings of the X and the center of the X, plus a few other corners in the image due to noise or objects in the background. That simplifies the problem to finding a set of 9 points in the right configuration, out of a given set of points.
My first try would be to look at each corner point A. Then I'd iterate over the points B close to A. Now if I assume that (e.g.) A is the upper left corner of the rectangle and B is the lower right corner, I can easily calculate, where I would expect the other corner points to be in the image. I'd use some nearest-neighbor search (or a library like FLANN) to see if there are corners where I'd expect them. If I can find a set of points that matches these expected positions, I know where the symbol would be, if it is present in the image.
You have to try if that is good enough for your application. If you have too many false positives (sets of corners of other objects that accidentially form a rectangle + X), you could check if there are lines (i.e. high contrast in the right direction) where you would expect them. And you could check if there is low contrast where there are no lines in the pattern. This should be relatively straightforward once you know the points in the image that correspond to the corners/line endings in the object you're looking for.
I'd suggest the Generalized Hough Transform. It seems you have a fairly simple, fixed shape. The generalized Hough transform should be able to detect that shape at any rotation or scale in the image. You many need to threshold the original image, or pre-process it in some way for this method to be useful though.
You can use local features to identify the object in image. Feature detection wiki
For example, you can calculate features on some referent image which contains only the object you're looking for and save the results, let's say, to a plain text file. After that you can search for the object just by comparing newly calculated features (on images with some complex scenes containing the object) with the referent ones.
Here's some good resource on local features:
Local Invariant Feature Detectors: A Survey

Raytracing (LoS) on 3D hex-like tile maps

Greetings,
I'm working on a game project that uses a 3D variant of hexagonal tile maps. Tiles are actually cubes, not hexes, but are laid out just like hexes (because a square can be turned to a cube to extrapolate from 2D to 3D, but there is no 3D version of a hex). Rather than a verbose description, here goes an example of a 4x4x4 map:
(I have highlighted an arbitrary tile (green) and its adjacent tiles (yellow) to help describe how the whole thing is supposed to work; but the adjacency functions are not the issue, that's already solved.)
I have a struct type to represent tiles, and maps are represented as a 3D array of tiles (wrapped in a Map class to add some utility methods, but that's not very relevant).
Each tile is supposed to represent a perfectly cubic space, and they are all exactly the same size. Also, the offset between adjacent "rows" is exactly half the size of a tile.
That's enough context; my question is:
Given the coordinates of two points A and B, how can I generate a list of the tiles (or, rather, their coordinates) that a straight line between A and B would cross?
That would later be used for a variety of purposes, such as determining Line-of-sight, charge path legality, and so on.
BTW, this may be useful: my maps use the (0,0,0) as a reference position. The 'jagging' of the map can be defined as offsetting each tile ((y+z) mod 2) * tileSize/2.0 to the right from the position it'd have on a "sane" cartesian system. For the non-jagged rows, that yields 0; for rows where (y+z) mod 2 is 1, it yields 0.5 tiles.
I'm working on C#4 targeting the .Net Framework 4.0; but I don't really need specific code, just the algorithm to solve the weird geometric/mathematical problem. I have been trying for several days to solve this at no avail; and trying to draw the whole thing on paper to "visualize" it didn't help either :( .
Thanks in advance for any answer
Until one of the clever SOers turns up, here's my dumb solution. I'll explain it in 2D 'cos that makes it easier to explain, but it will generalise to 3D easily enough. I think any attempt to try to work this entirely in cell index space is doomed to failure (though I'll admit it's just what I think and I look forward to being proved wrong).
So you need to define a function to map from cartesian coordinates to cell indices. This is straightforward, if a little tricky. First, decide whether point(0,0) is the bottom left corner of cell(0,0) or the centre, or some other point. Since it makes the explanations easier, I'll go with bottom-left corner. Observe that any point(x,floor(y)==0) maps to cell(floor(x),0). Indeed, any point(x,even(floor(y))) maps to cell(floor(x),floor(y)).
Here, I invent the boolean function even which returns True if its argument is an even integer. I'll use odd next: any point point(x,odd(floor(y)) maps to cell(floor(x-0.5),floor(y)).
Now you have the basics of the recipe for determining lines-of-sight.
You will also need a function to map from cell(m,n) back to a point in cartesian space. That should be straightforward once you have decided where the origin lies.
Now, unless I've misplaced some brackets, I think you are on your way. You'll need to:
decide where in cell(0,0) you position point(0,0); and adjust the function accordingly;
decide where points along the cell boundaries fall; and
generalise this into 3 dimensions.
Depending on the size of the playing field you could store the cartesian coordinates of the cell boundaries in a lookup table (or other data structure), which would probably speed things up.
Perhaps you can avoid all the complex math if you look at your problem in another way:
I see that you only shift your blocks (alternating) along the first axis by half the blocksize. If you split up your blocks along this axis the above example will become (with shifts) an (9x4x4) simple cartesian coordinate system with regular stacked blocks. Now doing the raytracing becomes much more simple and less error prone.

Resources