Aligning to height/depth maps - algorithm

I have 2 2D depth/height maps of equal dimension (256 x 256). Each pixel/cell of the depth map image, contains a float value. There are some pixels that have no information so are set to nan currently. The percentage of non nan cells can vary from ~ 20% to 80%. The depth maps are taken of the same area through point sampling an underlying common surface.
The idea is that the images represent a partial, yet overlapping, sampling of an underlying surface. And I need to align these images to create a combined sampled representation of the surface. If done blindly then the combined images have discontinuities especially in the z dimension (the float value).
What would be a fast method of aligning the 2 images? Translation in the x and y direction should be minimal only a few pixels (~ 0 to 10 pixels). But the float values of one image may need to be adjusted to align the images better. So minimizing the difference between the 2 images is the goal.
Thnx for any advice.

If your images are lacunar, one way is the exhaustive computation of a matching score in the window of overlap, ruling out the voids. FFT convolution will not apply. (Workload = overlap area * X-range * Y-range).
If both images differ in noise only, use the SAD matching score. If they also differ by the reference zero, subtract the average height before comparing.
You can achieve some acceleration by using an image pyramid, but you'll need to handle the voids.
Other approach could be to fill in the gaps by some interpolation method that somehow ensures that the interpolated values are compatible between both images.

Related

What does the rawintden measurement in ImageJ mean?

I am having trouble understanding the rawintden measurement in imageJ. It outputs numbers as large as 300,000, but pixel intensity is only measured on a scale of 0-255. Why is this scale so much larger and what does the calculation represent?
Thank you for your help!!
The value raw integrated density (RawIntDen) is the sum of all pixel values in the ROI (region of interest). Dividing this value by the number of pixels in the ROI gives the Mean. Since it is the sum of several pixels, its value is usually larger than the bit-depth of the image.
There is another measurement called IntDen which is the Area multiplied by the Mean. In an uncalibrated image, RawIntDen and IntDen are equal.

Drawing pixel circle of given area

I have some area X by Y pixels and I need to fill it up pixel by pixel. The problem is that at any given moment the drawn shape should be as round as possible.
I think that this algorithm is subset of Ordered Dithering, when converting grayscale images to one-bit, but I could not find any references nor could I figure it out myself.
I am aware of Bresenham's Circle, but it is used to draw circle of certain radius not area.
I created animation of all filling percents for 10 by 10 pixel grid. As full area is 10x10=100px, then each frame is exactly 1% inc.
A filled disk has the equation
(X - Xc)² + (Y - Yc)² ≤ C.
When you increase C, the number of points that satisfies the equation increases, but because of symmetry it increases in bursts.
To obtain the desired filling effect, you can compute (X - Xc)² + (Y - Yc)² for every pixel, sort on this value, and let the pixels appear one by one (or in a single go if you know the desired number of pixels).
You can break ties in different ways:
keep the original order as when you computed the pixels, by using a stable sort;
shuffle the runs of equal values;
slightly alter the center coordinates so that there are no ties.
Filling with the de-centering trick.
Values:
Order:

Can't understand the SIFT keypoint extraction (not description) algorithm

I have read two references about the SIFT algorithm here and here and I am not truly understanding how just some key points are detected considering that the algorithm works on the difference of Gaussians calculated on several resolutions (they call them octaves). Here are the steps from the technique according to what I've understood from the paper.
Given the input image, blur them with Gaussian filters using different sigmas, resulting in Gaussian filtered images. In the paper, they used 5 Gaussian filters per octave (they tell that two adjacent Gaussian filtered images were filtered using sigma and k * sigma gaussian filter parameters), and they consider 4 octaves in the algorithm. So, there are a total of 20 gaussian filtered images (5 per octave), but we will act on 5 Gaussian filtered images from each octave individually.
For each octave, we calculate 4 Difference Of Gaussian images (DOG) from the 5 Gaussian filtered images by just subtracting adjacent Gaussian filtered images. So, now we have a total of 16 difference of Gaussian images, but we will consider 4 Difference of Gaussian images from each octave individually.
Find local extrema (maximum or minimum values) pixels by comparing a pixel in each DOG with 26 neighbor pixels. Among these, 8 pixels are in the same scale as the pixel (in a 3x3 window), 9 are in a 3x3 window in the scale above (Difference of Gaussian image from the same octave) and 9 others in a 3x3 window in the scale below.
Once finding these local extrema in different octaves, we must refine them, eliminating low-contrast points or weak edge points. They filter bad candidates using threshold in a Taylor expansion function and eigenvalue ratio threshold calculated in a Hessian matrix.
(this part I don't understand perfectly): For each interest point that survived (in each octave, I believe), they consider a neighborhood around it and calculate gradient magnitudes and orientation of each pixel from that region. They build a gradient orientation histogram covering 360 degrees and select the highest peak and also peaks that are higher than 80% of the highest peak. They define that the orientation of the keypoint is defined in a parabola (fitting function?) to the 3 histogram values closest to each peak to interpolate the peak position (I really dont understand this part perfectly).
What I am not understanding
1- The tutorial and even the original paper are not clear on how to detect a single key point as we are dealing with multiple octaves (images resolutions). For example, suppose I have detected 1000 key points in the first octave, 500 in the second, 250 in the third and 125 in the fourth octave. The SIFT algorithm will return me the following data about the key points: 1-(x,y) coordinates 2- scale (what is that?) 3-orientation and 4- the feature vector (which I easily understood how it is built). There are also Python functions from Opencv that can draw these keypoints using the original image (thus, the first octave), but how if the keypoints are detected in different octaves and thus the algorithm considers DOG images with different resolutions?
2- I don't understand the part 5 of the algorithm very well. Is it useful for defining the orientation of the keypoint, right? can somebody explain that to me with other words and maybe I can understand?
3- To find Local extrema per octave (step 3), they don't explain how to do that in the first and last DOG images. As we are considering 4 DOG images, It is possible to do that only in the second and third DOG images.
4- There is another thing that the author wrote that completely messed the understanding of the approach from me:
Figure 1: For each octave of scale space, the initial image is
repeatedly convolved with Gaussians to produce the set of scale space
images shown on the left. Adjacent Gaussian images are subtracted to
produce the difference-of-Gaussian images on the right. After each
octave, the Gaussian image is down-sampled by a factor of 2, and the
process repeated.
What? Does he downsample only one Gaussian image? how can the process be repeated by doing that? I mean, the difference of Gaussians is originally done by filtering the INPUT IMAGE differently. So, I believe the INPUT IMAGE, and not the Gaussian image must be resampled. Or the author forgot to write that THE GAUSSIAN IMAGES from a given octave are downsampled and the process is repeated for another octave?

Matlab - How to measure the dispersion of black in a binary image?

I am comparing RGB images of small colored granules spilled randomly on a white backdrop. My current method involves importing the image into Matlab, converting to a binary image, setting a threshold and forcing all pixels above it to white. Next, I am calculating the percentage of the pixels that are black. In comparing the images to one another, the measurement of % black pixels is great; however, it does not take into account how well the granules are dispersed. Although the % black from two different images may be identical, the images may be far from being alike. For example, assume I have two images to compare. Both show a % black pixels of 15%. In one picture, the black pixels are randomly distributed throughout the image. In the other, a clump of black pixels are in one corner and are very sparse in the rest of the image.
What can I use in Matlab to numerically quantify how "spread out" the black pixels are for the purpose of comparing the two images?
I haven't been able to wrap my brain around this one yet, and need some help. Your thoughts/answers are most appreciated.
Found an answer to a very similar problem -> https://stats.stackexchange.com/a/13274
Basically, you would use the average distance from a central point to every black pixel as a measure of dispersion.
My idea is based upon the mean free path ()used in ideal gad theory / thermodynamics)
First, you must separate your foreground objects, using something like bwconncomp.
The mean free path is calculated by the mean distance between the centers of your regions. So for n regions, you take all n/2*(n-1) pairs, calculate all the distances and average them. If the mean distance is big, your particles are well spread. If it is small, your objects are close together.
You may want to multiply the resulting mean by n and divide it by the edge length to get a dimensionless number. (Independent of your image size and independent of the number of particles)

How to get the histogram orientation of a 'one' cell according to Dalal and Triggs?

I am trying to implement the method of Dalal and Triggs. I could implement the first stage compute gradients on an image, and I could create the code who walk across the image in cells, but I don't understand the logic behind this stage.
I know is necessary identify first between a signed (0-360 degrees) or unsigned (0-180 degrees) gradients.
I know I must create a data structure to store each cell histogram, whit n bins. I know what is a histogram, hence I understand I must visit each pixel, but I I don't fully understand about the method for classify each pixel, get the gradient orientation of this pixel and build the histogram with this data.
In short HOG is nothing but a dense representation of gradient orientations weighted by their strengths over a overlapped local neighbourhoods.
You asked what is the significance of finding each pixel gradient orientation. In an image the gradient orientation at each pixel indicates the direction of the boundary(edge between two textures) of the object at that location with respect to X and Y axis. So if you group the orientations of a patch or block or part of an object it represents the distribution of edge directions of object at that region in a very strong way or unique way... Now let us take a simple example, a circle if you plot the gradient orientations of a circle as a histogram you will get a straight line (Don't imagine HOG just a simple plot of gradient orientations) because the orientations of edges of circle ranges from 0 degrees to 360 degrees if u sampled at 360 consecutive locations, For a different object it is different, HOG also do the same thing but in a more sophisticated manner by dividing image into overlapping blocks and dividing each block into cells and making the histogram weighted by the strengths of the local gradients...
Hope it is useful ...

Resources