The paper "Fast Approximated SIFT" (M Grabner, H Grabner, ACCV 2006)
http://www.icg.tu-graz.ac.at/publications/pubobjects/mgrabner06FastApproxSIFT
describes a faster method for extracting SIFT descriptors from an image using integral histograms.
It says "for the descriptor we rotate the midpoints of each sub-patch relative to the orientation and compute the histograms of overlapping sub-patches without aligning the squared region but shifting the sub-patch histogram relative to the main orientation."
In this paper, the histograms of the 4x4 sub-patches around the keypoint can be computed easily using the integral histogram. However, the resulting histograms are not rotated by the orientation of the keypoint. Conventional SIFT rotates every pixel in the sub-patches by the orientation and then computes the histogram, but the method in the paper seems to perform the rotation after computing the non-rotated histogram, by "shifting the sub-patch histogram relative to the main orientation". I do not understand how to do this shifting.
For example, if a non-rotated sub-patch histogram has 8 bins from 0 to 2*pi with an interval of pi/4, the bin values are 2, 4, 5, 3, 6, 8, 7, 1, and the orientation of the keypoint is pi/6, how do I get the new values of the 8 bins in the rotated histogram?
As far as I understand it: they round the orientation to the nearest pi/4 step. That way you can just shift the entire array circularly, so
2 4 5 3 6 8 7 1 becomes
4 5 3 6 8 7 1 2,
which represents the histogram of the rotated patch.
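A minimal NumPy sketch of that interpretation, assuming an 8-bin histogram; the sign of the shift may need flipping depending on how the orientation and bin order are defined:

```python
import numpy as np

def shift_histogram(hist, orientation, n_bins=8):
    """Rotate a gradient-orientation histogram by circularly shifting its bins.

    The keypoint orientation is rounded to the nearest bin width
    (2*pi / n_bins), so the rotation becomes an integer number of bin shifts.
    """
    bin_width = 2 * np.pi / n_bins
    shift = int(round(orientation / bin_width)) % n_bins
    # A negative roll moves bin values towards lower indices; whether that is
    # the correct direction depends on the orientation/bin conventions used.
    return np.roll(hist, -shift)

hist = np.array([2, 4, 5, 3, 6, 8, 7, 1])
print(shift_histogram(hist, np.pi / 6))   # pi/6 rounds to one pi/4 step
# -> [4 5 3 6 8 7 1 2]
```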
I have an area of X by Y pixels and I need to fill it up pixel by pixel. The constraint is that at any given moment the drawn shape should be as round as possible.
I think this algorithm is a special case of ordered dithering, as used when converting grayscale images to one-bit, but I could not find any references, nor could I figure it out myself.
I am aware of Bresenham's circle algorithm, but it draws a circle of a given radius, not of a given area.
I created an animation of all filling percentages for a 10 by 10 pixel grid. As the full area is 10x10 = 100 px, each frame is exactly a 1% increment.
A filled disk has the equation
(X - Xc)² + (Y - Yc)² ≤ C.
When you increase C, the number of points that satisfy the equation increases, but because of symmetry it increases in bursts.
To obtain the desired filling effect, you can compute (X - Xc)² + (Y - Yc)² for every pixel, sort on this value, and let the pixels appear one by one (or in a single go if you know the desired number of pixels).
You can break ties in different ways:
keep the original order as when you computed the pixels, by using a stable sort;
shuffle the runs of equal values;
slightly alter the center coordinates so that there are no ties.
[Animations: filling with the de-centering trick; the per-pixel values; the resulting fill order.]
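A minimal NumPy sketch of this approach; the small random offset on the center implements the de-centering tie-break mentioned above, and the jitter amount is an arbitrary choice:

```python
import numpy as np

def fill_order(width, height, jitter=1e-3, seed=0):
    """Return (row, col) coordinates sorted so that the filled set stays round.

    Pixels are ordered by squared distance to a slightly jittered center;
    the jitter breaks ties so no two pixels share the same sort key.
    """
    rng = np.random.default_rng(seed)
    xc = (width - 1) / 2 + rng.uniform(-jitter, jitter)
    yc = (height - 1) / 2 + rng.uniform(-jitter, jitter)
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - xc) ** 2 + (ys - yc) ** 2
    order = np.argsort(d2, axis=None)
    return np.column_stack(np.unravel_index(order, (height, width)))

# Drawing coords[:k] gives the k% filled frame for a 10x10 grid.
coords = fill_order(10, 10)
print(coords[:5])
```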
Given a 2D pixel array where each pixel is either 0 or 1, what algorithm would output a new 2D array in which each pixel with an input value of 1 gets a new value equal to the minimum number of line segments required to reach a specific "light source" pixel, while crossing only pixels whose input value is 1? Pixels with an input value of 0 would not change.
Example of input array, magenta cross represents the "light source" pixel:
https://cdn3.imggmi.com/uploads/2019/1/8/2a5f6dd0ebdc9c72115f9ce93af3337a-full.png
Output array with output values 1 and 2 (Photoshopped, not a pixel perfect image):
https://cdn3.imggmi.com/uploads/2019/1/8/0025709aaa826c26ee0a8e17476419cb-full.png
Red region = 1 line segment away from source
Yellow region = 2 line segments away from source
White region = 3 or more line segments away from source
(The algorithm wouldn't stop at 3, it would continue until every pixel is evaluated.)
EDIT: I'm not sure whether StackOverflow is the right Stack Exchange site to post this in, if not please let me know!
Make your light-source pixel the origin of a polar coordinate system. Convert the corners of the blocking (0-valued) regions to polar coordinates.
Now, treat your point source as a searchlight, sweeping from 0 to 2*PI. The beam continues until it hits the frame edge or a black box. This defines a polygon that you fill with magenta (1 line segment, direct lighting).
That's the easy part. Now you get to repeat this for every pixel that lies on a magenta-white (1-0) boundary of the polygon. This defines a finite set of secondary polygons; fill those with yellow (code 2).
Repeat this process with the yellow-white (2-0) boundaries to identify the code-3 pixels; iterate until you run out of unlabeled pixels.
In other paradigms, I've applied interval algebra to blocking segments (e.g. where one block partially shadows another), but I think that the polar polygon attack will get you to a solution in fewer hours of coding.
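If you prefer brute force over the polar sweep, here is a rough Python sketch that produces the same labels on small grids. It is not the polygon method described above: it just tests straight-line visibility between pixels with dense sampling (so it can sneak through one-pixel diagonal gaps) and grows the labels level by level; all names and the "source counts as level 1" convention are my own choices.

```python
import numpy as np

def visible(grid, a, b):
    """True if the straight segment from a to b stays on 1-pixels.

    Dense sampling instead of a proper Bresenham/supercover walk;
    good enough as a sketch, not as a production visibility test."""
    (r0, c0), (r1, c1) = a, b
    n = 2 * int(max(abs(r1 - r0), abs(c1 - c0))) + 1
    for t in np.linspace(0.0, 1.0, n):
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        if grid[r, c] == 0:
            return False
    return True

def label_segments(grid, source):
    """Label every 1-pixel with the minimum number of straight segments
    (crossing only 1-pixels) needed to reach `source`; 0-pixels stay 0."""
    labels = np.zeros_like(grid)
    labels[source] = 1            # convention: the source itself gets level 1
    unlabeled = [p for p in zip(*np.nonzero(grid)) if tuple(p) != tuple(source)]
    frontier, level = [source], 1
    while unlabeled and frontier:
        # Pixels visible from the current frontier get the current level.
        reached = [p for p in unlabeled
                   if any(visible(grid, q, p) for q in frontier)]
        for p in reached:
            labels[p] = level
        unlabeled = [p for p in unlabeled if labels[p] == 0]
        frontier, level = reached, level + 1
    return labels
```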
I have read two references about the SIFT algorithm here and here, and I still do not understand how specific keypoints are detected, given that the algorithm works on differences of Gaussians computed at several resolutions (which they call octaves). Here are the steps of the technique as I understand them from the paper.
Given the input image, blur it with Gaussian filters using different sigmas, producing a set of Gaussian-filtered images. In the paper they use 5 Gaussian filters per octave (they state that two adjacent Gaussian-filtered images are filtered with sigma and k * sigma), and they consider 4 octaves in the algorithm. So there are a total of 20 Gaussian-filtered images (5 per octave), but we operate on the 5 Gaussian-filtered images of each octave individually.
For each octave, we compute 4 difference-of-Gaussian (DoG) images from the 5 Gaussian-filtered images by subtracting adjacent Gaussian-filtered images. So now we have a total of 16 DoG images, but we consider the 4 DoG images of each octave individually.
Find local extremum (maximum or minimum) pixels by comparing each pixel in each DoG image with its 26 neighbors. Of these, 8 are in the same scale (a 3x3 window around the pixel), 9 are in a 3x3 window in the scale above (the adjacent DoG image of the same octave), and 9 are in a 3x3 window in the scale below.
Having found these local extrema in the different octaves, we must refine them, eliminating low-contrast points and points lying on weak edges. They filter out bad candidates using a threshold on a Taylor-expansion estimate of the contrast and an eigenvalue-ratio threshold computed from the Hessian matrix.
(this is the part I don't understand well): For each interest point that survived (in each octave, I believe), they consider a neighborhood around it and compute the gradient magnitude and orientation of every pixel in that region. They build a gradient-orientation histogram covering 360 degrees and select the highest peak, as well as any peaks higher than 80% of the highest peak. The orientation of the keypoint is then refined by fitting a parabola to the 3 histogram values closest to each peak in order to interpolate the peak position (I really don't understand this step).
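For concreteness, here is my rough understanding of step 5 written as a small Python sketch. This is only how I read it, not the paper's exact procedure: as far as I know the paper uses 36 bins, but I skip the Gaussian weighting of the magnitudes and the exact neighborhood size.

```python
import numpy as np

def keypoint_orientations(patch, n_bins=36, peak_ratio=0.8):
    """Sketch of the orientation assignment on a square gradient patch.

    Builds a magnitude-weighted gradient-orientation histogram, keeps every
    local peak above 80% of the highest one, and refines each peak with a
    parabola fit through the peak bin and its two neighbors.
    """
    dy, dx = np.gradient(patch.astype(float))
    mag = np.hypot(dx, dy)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 360.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 360), weights=mag)

    orientations = []
    for i in range(n_bins):
        left, right = hist[(i - 1) % n_bins], hist[(i + 1) % n_bins]
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
            # Parabola through (i-1, left), (i, hist[i]), (i+1, right):
            # vertex offset from bin i is 0.5*(left - right)/(left - 2*hist[i] + right).
            offset = 0.5 * (left - right) / (left - 2 * hist[i] + right)
            orientations.append(((i + 0.5 + offset) * 360.0 / n_bins) % 360.0)
    return orientations
```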
What I am not understanding
1- The tutorial, and even the original paper, are not clear on how a single keypoint is reported when we are dealing with multiple octaves (image resolutions). For example, suppose I detected 1000 keypoints in the first octave, 500 in the second, 250 in the third, and 125 in the fourth. The SIFT algorithm returns the following data about the keypoints: 1- (x, y) coordinates, 2- scale (what is that?), 3- orientation, and 4- the feature vector (I easily understood how that one is built). There are also Python functions in OpenCV that draw these keypoints on the original image (thus, the first octave), but how can that work if the keypoints are detected in different octaves and thus in DoG images with different resolutions?
2- I don't understand part 5 of the algorithm very well. It is used for defining the orientation of the keypoint, right? Can somebody explain it to me in other words so that maybe I can understand?
3- For finding the local extrema per octave (step 3), they don't explain how to do that in the first and last DoG images. As we are considering 4 DoG images, it is only possible to do it in the second and third DoG images.
4- There is another thing the author wrote that completely confused my understanding of the approach:
Figure 1: For each octave of scale space, the initial image is
repeatedly convolved with Gaussians to produce the set of scale space
images shown on the left. Adjacent Gaussian images are subtracted to
produce the difference-of-Gaussian images on the right. After each
octave, the Gaussian image is down-sampled by a factor of 2, and the
process repeated.
What? Does he downsample only one Gaussian image? How can the process be repeated by doing that? I mean, the difference of Gaussians is originally obtained by filtering the INPUT IMAGE with different sigmas. So I believe it is the INPUT IMAGE, and not a Gaussian image, that must be resampled. Or did the author mean that THE GAUSSIAN IMAGES of a given octave are downsampled and the process is repeated for the next octave?
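My best guess of what the caption means, written as a sketch (OpenCV assumed; the sigma values, the number of scales, and the choice of which Gaussian image to downsample are my guesses, and the sigma bookkeeping in a real implementation is more careful than blurring the base image directly):

```python
import cv2
import numpy as np

def build_dog_pyramid(image, n_octaves=4, n_scales=5, sigma0=1.6, k=np.sqrt(2)):
    """Sketch of the scale-space construction as I read the figure caption.

    Within an octave, the base image is blurred with sigma0, k*sigma0, ...,
    and adjacent Gaussian images are subtracted to get the DoG images.
    The base of the NEXT octave is one of this octave's Gaussian images
    (the one whose blur has doubled), downsampled by 2 -- not the raw input.
    """
    base = image.astype(np.float32)
    dog_pyramid = []
    for _ in range(n_octaves):
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma0 * (k ** i))
                     for i in range(n_scales)]
        dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
        dog_pyramid.append(dogs)
        # Take the Gaussian image with twice the base blur and halve its size;
        # which index that is depends on k (here k = sqrt(2) -> index 2).
        base = cv2.resize(gaussians[2], None, fx=0.5, fy=0.5,
                          interpolation=cv2.INTER_NEAREST)
    return dog_pyramid
```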
I have 1000 2D gray-scale images and would like to cluster them in Python so that similar images end up in the same group. The images represent simple geometrical shapes including circles, triangles, etc.
If I flatten each image into a vector and then run the clustering algorithm, it becomes unmanageable. The images are 400*500, so my clustering training data would be 1000*200000, which means 200000 features!
Just wondering if anyone has come across this issue before?
This is a similar question to this one
Read my answer
Of course you don't use the raw pixels as features.
In your case I would recommend features like:
Find corners and calculate their number
Assuming each edge is a straight line, build a histogram of orientations. At each pixel calculate the gradient angle atan2(dy, dx), take the strongest 1% of gradient pixels, and build a histogram of their angles. The number of peaks in the histogram corresponds to the number of distinct edge directions (this will separate triangles, squares, circles, etc.)
Use connected-components analysis to count how many shapes you have in the image. Count the number of holes in each shape. Compute the ratio between the circumference and the area of the shape. For geometrical shapes, geometric features work extremely well.
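To make features 1 and 3 concrete, a rough OpenCV sketch (parameter values are guesses, the hole count is left out, and the image is assumed to be 8-bit with shapes darker or brighter than the background):

```python
import cv2
import numpy as np

def shape_features(gray):
    """Sketch of per-image features for clustering simple geometric shapes:
    corner count, number of shapes, and circularity (perimeter^2 / area)."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Feature: number of corners (Shi-Tomasi); maxCorners/quality are guesses.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=50, qualityLevel=0.1,
                                      minDistance=10)
    n_corners = 0 if corners is None else len(corners)

    # Features: number of shapes and their circularity via contours
    # (OpenCV >= 4 return signature assumed).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    n_shapes = len(contours)
    circ = [cv2.arcLength(c, True) ** 2 / max(cv2.contourArea(c), 1.0)
            for c in contours]   # ~4*pi for a circle, larger for polygons
    return np.array([n_corners, n_shapes, np.mean(circ) if circ else 0.0])
```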
As you asked in the comment, I am adding more info about item 2.
Please read more about the HOG feature here. I assume you are familiar with what an edge is in an image and what a gradient is.

Imagine you have a triangle in the image. Only pixels that lie on the edges of the shape will have a high gradient. Moreover, you expect all the gradients to divide into 3 different directions, one for each edge. You don't know which directions, since you don't know the orientation of the triangle, but you know there should be 3 of them. With a square there would be 2 directions, and with a circle there would be no clear direction. You want to count the number of directions.

Use the following steps. First find the pixels that have a high gradient value. Say there are only 1000 such pixels in the entire image (they lie on the edges of the shape). For each of these pixels calculate the angle of the gradient. So you have 1000 pixels, each with an angle in [0..179] (an angle of 180 is equal to 0); there are 180 different angles. Let's assume that, to reduce noise, you don't need the exact angle but only +-1 degree, so each angle is divided by 2 and rounded to the nearest integer. Now you have 1000 pixels, each with only 90 possible angle values.

Now make a histogram of angles. If the shape was a circle, you expect roughly ~11 (= 1000/90) pixels to fall into each bin of the histogram. If it was a square, you expect the histogram to be largely empty except for 2 bins with a very large number of pixels, the bins being at a distance of 45 from each other. Example: bin 13 has 400 pixels in it, bin 58 has 400 pixels in it, and the remaining 200 are noise spread over the other bins. Now you know you are facing a square, and you also know its rotation in the image.
If it was a triangle you expect 3 large bins in the histogram.
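A rough sketch of that orientation-histogram feature with OpenCV (Sobel gradients, strongest 1% of magnitudes, 2-degree bins; the "5x the uniform level" peak threshold is an arbitrary choice of mine):

```python
import cv2
import numpy as np

def edge_direction_histogram(gray, n_bins=90, top_fraction=0.01):
    """Histogram of gradient directions over the strongest 1% of pixels.

    Angles are taken modulo 180 and binned in 2-degree bins as described
    above; the number of dominant bins hints at the shape (3 ~ triangle,
    2 ~ square, flat ~ circle)."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    # Keep only the strongest gradients (they lie on the shape edges).
    thresh = np.quantile(mag, 1.0 - top_fraction)
    strong = mag >= thresh
    hist, _ = np.histogram(ang[strong], bins=n_bins, range=(0, 180))

    expected = strong.sum() / n_bins            # ~uniform level for a circle
    n_peaks = int(np.sum(hist > 5 * expected))  # peak threshold is a guess
    return hist, n_peaks
```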
I am currently working on a computer-vision program that requires me to determine the "direction" of a color blob in an image. The color blob generally follows an elliptical shape and thus can be used to track direction (with respect to an initially defined/determined orientation) through time.
The means by which I figured I would calculate changes in direction are described as follows:
Quantize possible directions (360 degrees) into N directions (potentially 8, for 45 degree angle increments).
Given a stored matrix representing the initial state (t0) of the color blob, also acquire a matrix representing the current state (tn) of the blob.
Iterate through these N directions and search for the longest stretch of the color value for that given direction. (e.g. if the ellipse is rotated 45 degrees with 0 being vertical, the longest length should be attributed to the 45 degree mark / or 225 degrees).
The concept itself isn't complicated, but I'm having trouble with the following:
Calculating the longest stretch of a value at any angle in an image. This is simple for angles such as 0, 45, 90, etc. but more difficult for the in-between angles. "Quantizing" the angles is not as easy to me as it sounds.
Please do not worry about potential issues with distinguishing angles such as 0 and 90. Inertia can be used to determine the most likely direction of the color blob (in other words, based upon past orientation states).
My main concern is identifying the "longest stretch" in the matrix.
Thank you for your help!
You can use image moments as suggested here: Matlab - Image Momentum Calculation.
In MATLAB you would use regionprops with the property 'Orientation', but the wiki article in the previous answer should give you all the information you need to code it in the language of your choice.
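For reference, a small NumPy sketch of the moments-based orientation (equivalent, up to sign conventions, to what regionprops' 'Orientation' computes), assuming the blob is given as a binary mask:

```python
import numpy as np

def blob_orientation(mask):
    """Orientation (radians) of the major axis of a binary blob,
    computed from second-order central image moments."""
    ys, xs = np.nonzero(mask)
    x_mean, y_mean = xs.mean(), ys.mean()
    mu20 = np.mean((xs - x_mean) ** 2)
    mu02 = np.mean((ys - y_mean) ** 2)
    mu11 = np.mean((xs - x_mean) * (ys - y_mean))
    # Angle of the principal axis; the sign convention depends on whether
    # y points up (math convention) or down (image rows).
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```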