Can't understand the SIFT keypoint extraction (not description) algorithm - image

I have read two references about the SIFT algorithm here and here, and I am not really understanding how only some keypoints end up being detected, considering that the algorithm works on differences of Gaussians computed at several resolutions (they call them octaves). Here are the steps of the technique as I understood them from the paper.
Given the input image, blur it with Gaussian filters using different sigmas, resulting in a set of Gaussian-filtered images. In the paper they use 5 Gaussian filters per octave (they say that two adjacent Gaussian-filtered images are produced with filter parameters sigma and k * sigma), and they consider 4 octaves in the algorithm. So there are 20 Gaussian-filtered images in total (5 per octave), but we act on the 5 Gaussian-filtered images of each octave individually.
For each octave, we calculate 4 difference-of-Gaussian (DoG) images from the 5 Gaussian-filtered images by simply subtracting adjacent Gaussian-filtered images. So now we have a total of 16 DoG images, but we consider the 4 DoG images of each octave individually.
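To make sure I follow steps 1 and 2, here is a rough sketch of how I picture them, assuming OpenCV/NumPy; the values of sigma and k are just placeholders, and for simplicity it halves the input image for the next octave (which is exactly what I ask about in question 4 below):

```python
import cv2
import numpy as np

def build_dog_pyramid(image, num_octaves=4, blurs_per_octave=5, sigma=1.6):
    """Sketch of steps 1-2: per octave, blur with increasing sigmas,
    then subtract adjacent blurred images to get the DoG images."""
    k = 2 ** 0.5  # placeholder factor between adjacent sigmas
    img = image.astype(np.float32)
    dog_pyramid = []
    for _ in range(num_octaves):
        blurred = [cv2.GaussianBlur(img, (0, 0), sigma * (k ** i))
                   for i in range(blurs_per_octave)]
        dogs = [blurred[i + 1] - blurred[i]
                for i in range(blurs_per_octave - 1)]   # 4 DoGs per octave
        dog_pyramid.append(dogs)
        # next octave: half resolution (the paper down-samples a Gaussian
        # image instead -- see question 4 below)
        img = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2),
                         interpolation=cv2.INTER_NEAREST)
    return dog_pyramid
```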
Find local extrema (maximum or minimum values) by comparing each pixel of each DoG image with its 26 neighbours. Of these, 8 are at the same scale (in a 3x3 window), 9 are in a 3x3 window at the scale above (the adjacent DoG image of the same octave) and 9 are in a 3x3 window at the scale below.
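In code, I imagine the 26-neighbour test roughly like this (NumPy sketch; dogs would be the list of DoG images of one octave, and ties with a neighbour are accepted here):

```python
import numpy as np

def is_local_extremum(dogs, s, y, x):
    """Compare DoG pixel (y, x) at scale s with its 26 neighbours:
    8 in the same DoG image, 9 in the scale above, 9 in the scale below."""
    value = dogs[s][y, x]
    cube = np.stack([dogs[s - 1][y - 1:y + 2, x - 1:x + 2],
                     dogs[s][y - 1:y + 2, x - 1:x + 2],
                     dogs[s + 1][y - 1:y + 2, x - 1:x + 2]])
    return value >= cube.max() or value <= cube.min()
```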
Once these local extrema have been found in the different octaves, we must refine them, eliminating low-contrast points and weak edge points. Bad candidates are filtered out using a contrast threshold on a Taylor expansion of the DoG function and an eigenvalue-ratio threshold computed from a Hessian matrix.
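If I understand the edge test correctly, a simplified sketch (skipping the Taylor-expansion refinement of the location, and assuming DoG values normalized to [0, 1]) would look something like this:

```python
import numpy as np

def passes_contrast_and_edge_test(dog, y, x, contrast_thresh=0.03, r=10.0):
    """Reject low-contrast points by thresholding |D| and edge-like points
    by the eigenvalue ratio of the 2x2 Hessian of the DoG image."""
    if abs(dog[y, x]) < contrast_thresh:          # assumes values in [0, 1]
        return False
    # Hessian from central finite differences
    dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
    dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:
        return False
    return tr * tr / det < (r + 1) ** 2 / r       # keep if ratio is small enough
```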
(this part I don't understand perfectly): For each interest point that survived (in each octave, I believe), they consider a neighbourhood around it and calculate the gradient magnitude and orientation of each pixel in that region. They build a gradient orientation histogram covering 360 degrees and select the highest peak, as well as any peak higher than 80% of the highest one. They state that the orientation of the keypoint is obtained by fitting a parabola to the 3 histogram values closest to each peak in order to interpolate the peak position (I really don't understand this part).
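Here is how I picture step 5 in code (NumPy sketch; magnitude and angle_deg would be the gradient magnitudes and angles of the pixels around the keypoint, e.g. from cv2.cartToPolar on Sobel derivatives):

```python
import numpy as np

def dominant_orientations(magnitude, angle_deg, num_bins=36, peak_ratio=0.8):
    """Build a 36-bin orientation histogram weighted by gradient magnitude,
    keep every peak above 80% of the highest one, and refine each peak by
    fitting a parabola through the peak bin and its two neighbours."""
    hist, _ = np.histogram(angle_deg.ravel(), bins=num_bins,
                           range=(0, 360), weights=magnitude.ravel())
    bin_width = 360.0 / num_bins
    orientations = []
    for i in range(num_bins):
        left, right = hist[(i - 1) % num_bins], hist[(i + 1) % num_bins]
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
            # vertex of the parabola through (i-1, left), (i, hist[i]), (i+1, right)
            offset = 0.5 * (left - right) / (left - 2 * hist[i] + right)
            orientations.append(((i + offset) * bin_width) % 360)
    return orientations
```

As far as I can tell, the parabola is only there to refine the peak to sub-bin accuracy, so the keypoint orientation is not quantized to 10-degree steps.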
What I am not understanding
1- The tutorial and even the original paper are not clear on how a single keypoint is reported when we are dealing with multiple octaves (image resolutions). For example, suppose I have detected 1000 keypoints in the first octave, 500 in the second, 250 in the third and 125 in the fourth. The SIFT algorithm returns the following data for each keypoint: 1- the (x, y) coordinates, 2- the scale (what is that?), 3- the orientation and 4- the feature vector (which I easily understood how it is built). There are also Python functions in OpenCV that can draw these keypoints on the original image (thus, the first octave), but how is that possible if the keypoints are detected in different octaves and thus the algorithm considers DoG images with different resolutions?
2- I don't understand part 5 of the algorithm very well. It is used for defining the orientation of the keypoint, right? Can somebody explain it to me in other words so that maybe I can understand?
3- To find the local extrema per octave (step 3), they don't explain how to do that in the first and last DoG images. As we are considering 4 DoG images, it is only possible to do that in the second and third DoG images.
4- There is another thing the author wrote that completely messed up my understanding of the approach:
Figure 1: For each octave of scale space, the initial image is
repeatedly convolved with Gaussians to produce the set of scale space
images shown on the left. Adjacent Gaussian images are subtracted to
produce the difference-of-Gaussian images on the right. After each
octave, the Gaussian image is down-sampled by a factor of 2, and the
process repeated.
What? Does he downsample only one Gaussian image? How can the process be repeated by doing that? I mean, the difference of Gaussians is originally obtained by filtering the INPUT IMAGE with different filters. So I believe it is the INPUT IMAGE, and not a Gaussian image, that must be resampled. Or did the author forget to write that THE GAUSSIAN IMAGES of a given octave are downsampled and the process is repeated for the next octave?

Related

Applying Delaunay Triangulation on RGB channels instead of final image

First I applied Delaunay triangulation to an image with 3000 triangles. I measured the similarity (SSIM) to the original image as 0.75 (the higher the value, the more similar).
Then I applied Delaunay triangulation to the image's RGB channels separately, with 1000 triangles each. I then combined the 3 images to form the final image and measured its similarity (SSIM) to the original image as 0.65.
In both cases the points were chosen randomly, and the median value of the pixels inside each triangle was chosen as the triangle's color.
I did lots of trials but none of the trials showed better results.
Isn't this weird? Think about it: I use just 1000 random triangles on one layer, then 1000 more on the second layer, then 1000 more on the third. When these are put on top of each other, they should create more than 3000 unique polygons compared to the single-image triangulation, because they do not coincide.
a) What can be the reason behind this?
b) What advantages can I obtain when I apply Delaunay triangulation to the RGB channels separately instead of applying it to the image itself? It is obvious I cannot get better similarity, but maybe I can do better storage-wise? Or in other areas? What could they be?
When the triangles in each layer don't coincide, it creates a low-pass filtering effect in brightness, because the three triangles that contribute to a pixel's brightness are larger than the single triangle you get in the other case.
It's hard to suggest any 'advantages' to either approach, since we don't really know why you are doing this in the first place.
If you want better similarity, though, then you have to pick better points. I would suggest making the probability of selecting a point proportional to the magnitude of the gradient at that point.
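For illustration, a rough sketch of that suggestion in Python/OpenCV (the function and parameter names are just placeholders):

```python
import cv2
import numpy as np

def sample_points_by_gradient(gray, num_points=3000):
    """Pick triangulation points with probability proportional to the
    gradient magnitude, so points concentrate along edges."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy).ravel()
    prob = mag / mag.sum() if mag.sum() > 0 else None   # None -> uniform
    idx = np.random.choice(mag.size, size=num_points, replace=False, p=prob)
    ys, xs = np.unravel_index(idx, gray.shape)
    return np.column_stack([xs, ys])
```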

Fast algorithm to detect the inclination of the lines in the image

My requirement is to find the inclination of the lines (all 8 lines) surrounding the data matrix, as shown in the edge detected image:
The two main restrictions:
The inclination detected should have a precision of at least 0.1 deg (the best achievable in this image)
Time taken should be less than 30 ms
I am implementing the algorithm on a Blackfin DSP and have used the Blackfin image processing toolbox.
I tried using the Hough transform and contour detection to find the lines and thus their inclinations; however, the time limit is exceeded. Any suggestions for a different algorithm, or ways to optimize this one, would help.
[for my use case, the higher the angle precision the better; I am targeting at least 0.02 - 0.05 deg with a higher-resolution image]
find bounding box
scan all points and find xmin, ymin, xmax, ymax of the set pixels
find the gaps
cast scan lines through half of the bounding box, remembering/measuring the gap sizes. To avoid missing a line (due to holes) you can cast more scan lines or scan with a wider ray.
If you need some examples of ray casting/scanning, see:
How to find horizon line efficiently in a high-altitude photo?
segment the image into regions
just shrink the bounding box by some fraction (50%) of a gap ... something like this:
forming 8 rectangular regions, each containing a single line without noise from the edges.
regress/fit lines
The idea is to make a list of all set pixels for each region separately and fit a line that has the smallest distance to all of them.
I would try to use this:
Given n points on a 2D plane, find the maximum number of points that lie on the same straight line
Or use approximation search and fit something like
Curve fitting with y points on repeated x positions
just ignore the curvature and fit the line equation parameters directly instead of cubics.
After the lines are fitted you can compute their slope directly with atan2(dy,dx).
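For illustration, a rough NumPy sketch of the "regress/fit lines" step (a total-least-squares fit via the principal direction of the set pixels; names are placeholders):

```python
import numpy as np

def region_line_angle_deg(region_mask):
    """Fit a line to all set pixels of one rectangular region and return
    its inclination in degrees via atan2(dy, dx)."""
    ys, xs = np.nonzero(region_mask)
    pts = np.column_stack([xs, ys]).astype(np.float64)
    pts -= pts.mean(axis=0)                      # centre the point cloud
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]                               # principal direction = line direction
    return np.degrees(np.arctan2(dy, dx))
```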
A fast and easy approach would be to scan every row and column for the first and second white pixel, starting from the left, right, top and bottom. Then simply use some robust line-fitting algorithm to get the lines (a rough sketch follows below).
If you have not already tried it, you can also reduce the data for the Hough transform or other algorithms by cropping the image down to the DMC size.
The required angle accuracy cannot be achieved because you don't have enough resolution. And even if you had, your results would suffer from any noise and outliers.
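A rough sketch of the row-scanning idea above, assuming OpenCV is available (only the left border is shown; the other three sides work the same way):

```python
import cv2
import numpy as np

def left_border_angle_deg(binary_img):
    """For every row take the first white pixel from the left, then use a
    robust fit (cv2.fitLine with a Huber loss) to get the border line."""
    pts = []
    for y in range(binary_img.shape[0]):
        xs = np.flatnonzero(binary_img[y])
        if xs.size:
            pts.append((xs[0], y))
    pts = np.array(pts, dtype=np.float32)
    vx, vy, _, _ = cv2.fitLine(pts, cv2.DIST_HUBER, 0, 0.01, 0.01).ravel()
    return np.degrees(np.arctan2(vy, vx))
```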

Comparing Images for common features/patterns

I have a collection of more than 100 images. Although I am able to see the common patterns between all of them, I was asked to write some kind of script/program to check for similarities, or the degree of similarity, between them.
All of the images are 255x255 pixels in size. There are only two colors: black and white. In most cases the images are composed of three primitive shapes:
Squares/rectangles
Vertical/horizontal/diagonal lines
Blob/cloud like shapes stretched diagonally on the image
These shapes are in most cases in the same place in the images, but with different sizes and shapes. For example, squares appear in the corners of the image but in different sizes.
My question is : Is there any kind of the software that can give me a numerical value that would represent the degree of similarity between images?
You can compute these operations and match them:
1. Median filter / erode / dilate
2. Compute gradients of the images (Scharr, Sobel) to extract strong patterns
3. Compute the Hough line transform or contour extraction on the result of 2
4a. Compute the Mahalanobis distance on Hu moments from the result of 3
4b. (Alternative) compute histograms on the results of 3 and match them
One of the best libraries so far is OpenCV (http://www.opencv.org), which is well documented (docs.opencv.org)
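A rough sketch of steps 1-4a in Python (assuming OpenCV 4's findContours signature; cv2.matchShapes, which compares Hu moments internally, stands in here for a full Mahalanobis distance):

```python
import cv2

def shape_distance(img_a, img_b):
    """Clean each binary image, take its largest contour, and compare the
    two contours via their Hu moments (lower value = more similar)."""
    def largest_contour(img):
        img = cv2.medianBlur(img, 3)
        img = cv2.dilate(cv2.erode(img, None), None)        # morphological cleanup
        contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea)
    return cv2.matchShapes(largest_contour(img_a), largest_contour(img_b),
                           cv2.CONTOURS_MATCH_I1, 0.0)
```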

Aligning to height/depth maps

I have two 2D depth/height maps of equal dimensions (256 x 256). Each pixel/cell of the depth map contains a float value. Some pixels have no information and are currently set to NaN. The percentage of non-NaN cells can vary from ~20% to 80%. The depth maps are taken of the same area by point-sampling an underlying common surface.
The idea is that the images represent a partial, yet overlapping, sampling of an underlying surface. And I need to align these images to create a combined sampled representation of the surface. If done blindly then the combined images have discontinuities especially in the z dimension (the float value).
What would be a fast method of aligning the 2 images? The translation in the x and y directions should be minimal, only a few pixels (~0 to 10). But the float values of one image may need to be adjusted to align the images better. So minimizing the difference between the 2 images is the goal.
Thanks for any advice.
If your images are lacunar (full of voids), one way is the exhaustive computation of a matching score over the window of overlap, ruling out the voids. FFT convolution will not apply. (Workload = overlap area * X-range * Y-range.)
If both images differ only in noise, use the SAD matching score. If they also differ by the reference zero, subtract the average height before comparing.
You can achieve some acceleration by using an image pyramid, but you'll need to handle the voids.
Another approach would be to fill in the gaps with some interpolation method that somehow ensures that the interpolated values are compatible between the two images.
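For illustration, a rough NumPy sketch of the exhaustive SAD search described above (void cells are NaN and are excluded; each image's mean height over the overlap is removed first, as suggested):

```python
import numpy as np

def best_shift_sad(a, b, max_shift=10):
    """Try every (dx, dy) in a small window and return the shift with the
    lowest mean absolute difference over the valid (non-NaN) overlap."""
    best = (0, 0, np.inf)
    h, w = b.shape
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.full_like(b, np.nan)
            shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
                b[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            valid = ~np.isnan(a) & ~np.isnan(shifted)
            if not valid.any():
                continue
            da = a[valid] - a[valid].mean()          # remove reference-zero offset
            db = shifted[valid] - shifted[valid].mean()
            score = np.abs(da - db).mean()
            if score < best[2]:
                best = (dx, dy, score)
    return best   # (dx, dy, score)
```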

Matlab - How to measure the dispersion of black in a binary image?

I am comparing RGB images of small colored granules spilled randomly on a white backdrop. My current method involves importing the image into Matlab, converting to a binary image, setting a threshold and forcing all pixels above it to white. Next, I calculate the percentage of the pixels that are black. For comparing the images to one another, the measurement of % black pixels is great; however, it does not take into account how well the granules are dispersed. Although the % black from two different images may be identical, the images may be far from alike. For example, assume I have two images to compare, both showing 15% black pixels. In one picture, the black pixels are randomly distributed throughout the image. In the other, the black pixels are clumped in one corner and very sparse in the rest of the image.
What can I use in Matlab to numerically quantify how "spread out" the black pixels are for the purpose of comparing the two images?
I haven't been able to wrap my brain around this one yet, and need some help. Your thoughts/answers are most appreciated.
Found an answer to a very similar problem -> https://stats.stackexchange.com/a/13274
Basically, you would use the average distance from a central point to every black pixel as a measure of dispersion.
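A rough NumPy sketch of that idea (using the centroid of the black pixels as the central point and normalising by the image diagonal; the same idea is only a few lines of find/mean in Matlab):

```python
import numpy as np

def dispersion_from_centroid(is_black):
    """Average distance of the black pixels from their centroid,
    divided by the image diagonal so images of different sizes compare.
    `is_black` is a boolean array, True where a pixel is black."""
    ys, xs = np.nonzero(is_black)
    cy, cx = ys.mean(), xs.mean()
    mean_dist = np.hypot(ys - cy, xs - cx).mean()
    return mean_dist / np.hypot(*is_black.shape)
```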
My idea is based upon the mean free path (used in ideal gas theory / thermodynamics).
First, you must separate your foreground objects, using something like bwconncomp.
The mean free path is calculated from the mean distance between the centers of your regions. So for n regions, you take all n*(n-1)/2 pairs, calculate all the distances and average them. If the mean distance is large, your particles are well spread out. If it is small, your objects are close together.
You may want to multiply the resulting mean by n and divide it by the edge length to get a dimensionless number (independent of your image size and of the number of particles).
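A rough sketch of this answer in NumPy/SciPy terms (ndimage.label playing the role of bwconncomp, ndimage.center_of_mass that of the region centroids):

```python
import numpy as np
from scipy import ndimage

def mean_free_path_dispersion(is_black):
    """Label the connected black regions, average all pairwise centroid
    distances, then normalise by n and the edge length as suggested."""
    labels, n = ndimage.label(is_black)
    if n < 2:
        return 0.0
    centers = np.array(ndimage.center_of_mass(is_black, labels, range(1, n + 1)))
    dists = [np.hypot(*(centers[i] - centers[j]))
             for i in range(n) for j in range(i + 1, n)]
    edge_length = max(is_black.shape)
    return np.mean(dists) * n / edge_length
```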
