Integral image box filtering

I'm trying to compare the performance in Halide of a 2-pass, separable approach to box filtering against an integral-image-based approach, to gain a better understanding of Halide scheduling. I cannot find any examples of integral image creation in Halide where the integral image function is used in the definition of a subsequent function.
ImageParam input(type_of<uint8_t>(), 3, "image 1");
Var x("x"), y("y"), c("c"), xi("xi"), yi("yi");
Func ip("ip");
ip(x, y, c) = cast<float>(BoundaryConditions::repeat_edge(input)(x, y, c));
Param<int> radius("radius", 15, 1, 50);
RDom imageDomain(input);
RDom r(-radius, radius, -radius, radius);
// Make an integral image
Func integralImage = ip;
integralImage(x, imageDomain.y, c) += integralImage(x, imageDomain.y - 1, c);
integralImage(imageDomain.x, y, c) += integralImage(imageDomain.x - 1, y, c);
integralImage.compute_root(); // Come up with a better schedule for this
// Apply box filter to integral image
Func outputImage;
outputImage(x,y,c) = integralImage(x+radius,y+radius,c)
+ integralImage(x-radius,y-radius,c)
- integralImage(x-radius,y+radius,c)
- integralImage(x+radius,y-radius,c);
Expr normFactor = (2*radius+1) * (2*radius+1);
outputImage(x,y,c) = outputImage(x,y,c) / normFactor;
Func result("result");
result(x,y,c) = cast<uint8_t>(outputImage(x,y,c));
result.parallel(y).vectorize(x,8);
I did find the following code in the tests:
https://github.com/halide/Halide/blob/master/test/correctness/multi_pass_reduction.cpp
But this example uses realize to compute the integral image as a buffer over a fixed domain and doesn't consume the definition of integral image as a function in the definition of a subsequent function.
When I run this code, I observe that:
The computation of the integral image is extremely slow (it drops my pipeline to 0 fps).
I get an incorrect answer; I feel like I must be somehow misdefining my integral image.
I also have a related question: how would one best schedule the computation of an integral image in this type of scenario in Halide?

My problem was in the definition of my integral image. If I change my implementation to the standard one pass definition of the integral image, I get expected behavior:
RDom intImDom(0, input.width(), 0, input.height()); // reduction domain spanning the input's x and y extents
Func integralImage;
integralImage(x,y,c) = 0.0f; // Pure definition
integralImage(intImDom.x,intImDom.y,c) = ip(intImDom.x,intImDom.y,c)
+ integralImage(intImDom.x-1,intImDom.y,c)
+ integralImage(intImDom.x,intImDom.y-1,c)
- integralImage(intImDom.x-1,intImDom.y-1,c);
integralImage.compute_root();
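For what it's worth, here is a minimal sketch (reusing ip, input, x, y and c from above) of an alternative, separable two-pass formulation of the same integral image; each scan leaves a pure dimension free to parallelize, which is one reasonable starting point for the scheduling question:
Func integralImage("integralImage");
integralImage(x, y, c) = ip(x, y, c); // pure definition: copy the (boundary-padded) input
RDom rx(1, input.width() - 1);  // scan along x, skipping column 0
integralImage(rx.x, y, c) += integralImage(rx.x - 1, y, c);
RDom ry(1, input.height() - 1); // scan along y, skipping row 0
integralImage(x, ry.x, c) += integralImage(x, ry.x - 1, c);
integralImage.compute_root();
integralImage.update(0).parallel(y); // rows are independent in the x-scan
integralImage.update(1).parallel(x); // columns are independent in the y-scan
This is only a sketch of the formulation and the scheduling handles it exposes, not a tuned schedule.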
I still have remaining questions about the most efficient algorithm/schedule in Halide for computing an integral image, but I'll re-post that as a more specific question, as my current post was kind of open-ended.
As an aside, there is a second problem in the code above in that padding of the input image is not handled correctly.

Related

How does Func.realize in Halide work?

I can understand the explanation in tutorial 6, which is:
// Func gradient("gradient");
// Var x("x"), y("y");
// gradient(x, y) = x + y;
// gradient.realize(8, 8);
//
// This does four things internally:
// 1) Generates code that can evaluate gradient over an arbitrary
// rectangle.
// 2) Allocates a new 8 x 8 image.
// 3) Runs the generated code to evaluate gradient for all x, y
// from (0, 0) to (7, 7) and puts the result into the image.
// 4) Returns the new image as the result of the realize call.
However, following that description, I can't figure out how an example like this works:
Func histogram("hist_serial");
histogram(i) = 0;
RDom r(0, input.width(), 0, input.height());
histogram(input(r.x, r.y) / 32) += 1;
histogram.vectorize(i, 8);
histogram.realize(8);
What I am confused about is: in the "gradient" example, evaluating gradient for all x, y from (0,0) to (7,7) gives us the result directly, e.g. gradient(1,1) = 1 + 1 = 2. But in the second example, evaluating histogram for i from 0 to 7 looks strange to me, because it seems we are calculating the result from back to front. A more natural way would be to evaluate the input first, then calculate the histogram.
So, how the "realize" in the second example works?
Halide automatically infers all of the values which need to be computed to produce a requested region of output. realize just asks the pipeline to compute the requested region of the output Func(s). Halide then automatically infers what regions of which earlier Funcs are required, and recursively evaluates all of those, up to the inputs, before producing the requested region of output.
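To make that concrete, here is a minimal, self-contained sketch, assuming a 640x480 8-bit input buffer (the buffer and its test pattern are made up for the example):
#include "Halide.h"
using namespace Halide;

int main() {
    Buffer<uint8_t> input(640, 480); // assumed 640x480 8-bit input
    for (int y = 0; y < input.height(); y++)
        for (int x = 0; x < input.width(); x++)
            input(x, y) = (uint8_t)((x + y) % 256); // made-up test pattern

    Func histogram("hist_serial");
    Var i("i");
    histogram(i) = 0;                              // pure definition: every bin starts at 0
    RDom r(0, input.width(), 0, input.height());
    histogram(input(r.x, r.y) / 32) += 1;          // update: scatter every pixel into a bin

    // Realizing 8 bins allocates an 8-element buffer for histogram, runs the
    // pure definition over i = 0..7, then runs the update. Because the update's
    // RDom spans the whole input, Halide walks every pixel of `input` even
    // though only 8 output values were requested.
    Buffer<int> result = histogram.realize(8);
    return 0;
}
So the input is not "evaluated first" as a separate step you write; it is pulled in automatically because the requested 8 bins depend on it.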

How to convert cv2.addWeighted and cv2.GaussianBlur into MATLAB?

I have this Python code:
cv2.addWeighted(src1, 4, cv2.GaussianBlur(src1, (0, 0), 10), -4, 128)
How can I convert it to Matlab? So far I got this:
f = imread0('X.jpg');
g = imfilter(f, fspecial('gaussian',[size(f,1),size(f,2)],10));
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+128;
I want to subtract local mean color image.
Input image:
Blending output from OpenCV:
The documentation for cv2.addWeighted gives the signature:
cv2.addWeighted(src1, alpha, src2, beta, gamma[, dst[, dtype]]) → dst
The operation performed to produce the output image is (source: opencv.org):
dst = saturate(src1*alpha + src2*beta + gamma)
Therefore, what your code is doing is exactly correct... at least for cv2.addWeighted. You take alpha, multiply it by the first image, take beta, multiply it by the second image, then add gamma on top. The only intricacy left is saturate, which means that any values beyond the dynamic range of the data type are capped at its limits. Because negatives can occur in the result, saturation simply makes any negative values 0 and any values greater than the maximum equal to that maximum. In this case, you'll want to make any values larger than 1 equal to 1. As such, it's a good idea to convert your image to double through im2double, because you want the additions and subtractions that go beyond the dynamic range to happen first, and only saturate afterwards. If you stay with the image's default precision (uint8), clipping will happen during the intermediate arithmetic, before you intend to saturate, and that will give you the wrong results. Because you're doing this double conversion, you'll also want to change the 128 you add for gamma to 0.5 to compensate.
Now, the only slight problem is your Gaussian blur. Looking at the documentation, by doing cv2.GaussianBlur(src1, (0, 0), 10), you are telling OpenCV to infer the mask size while the standard deviation is 10. MATLAB does not infer the size of the mask for you, so you need to do this yourself. A common practice is to take six times the standard deviation, floor it, and add 1, for both the horizontal and vertical dimensions of the mask. You can see my post here on the justification as to why this is common practice: By which measures should I set the size of my Gaussian filter in MATLAB?
Therefore, in MATLAB, you would do this with your Gaussian blur instead. BTW, it's simply imread, not imread0:
f = im2double(imread('http://i.stack.imgur.com/kl3Md.jpg')); %// Change - Reading image directly from StackOverflow
sigma = 10; %// Change
sz = 1 + floor(6*sigma); %// Change
g = imfilter(f, fspecial('gaussian', sz, sigma)); %// Change
%// Rest of the code is the same
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+0.5; %// Change
%// Saturate
f1(f1 > 1) = 1;
f1(f1 < 0) = 0;
I get this image:
Take note that there is a slight difference in the way this appears between OpenCV and MATLAB... especially the haloing around the eye. This is because OpenCV does something different when inferring the mask size for the Gaussian blur. I'm not sure exactly what it does, but choosing the mask size from the standard deviation as I did above is one of the most common heuristics for it. Play around with the standard deviation until you get something you like.

High Pass Butterworth Filter on images in MATLAB

I need to implement a high pass Butterworth filter in MATLAB for the purposes of image filtering. I have implemented one but it looks like it doesn't work. Here is the code I have written. Can anyone tell me what is wrong?
n=1;
d=50;
A=1.5;
im=imread('imagex.jpg');
h=size(im,1);
w=size(im,2);
[x y]=meshgrid(-floor(w/2):floor(w-1/2),-floor(h/2):floor(h-1/2));
hhp=(1./(d./(x.^2+y.^2).^0.5).^(2*n));
image_2Dfilter=fftshift(fft2(im));
Image_butterworth=image_2Dfilter;
imshow(Image_butterworth);
ifftshow(Image_butterworth);
For one thing, there is no such command called ifftshow. Secondly, you aren't filtering anything. All you're doing is visualizing the spectrum of the image.
In terms of visualizing the spectrum, how you're doing it right now is very dangerous. For one thing, you are visualizing the coefficients at each spatial frequency component, which are complex-valued in nature. If you want to visualize the spectrum in a way that makes sense to most of us, it's better to take a look at either the magnitude or the phase. Because a Butterworth filter shapes the magnitude, it's the magnitude of the spectrum we'll look at here.
You can find the magnitude of the spectrum by using the abs function. Even when you do that, if you did imshow directly on the magnitude, you will get a visualization that is zero everywhere except for the middle. This is because the DC component is so large and the rest of the spectrum is small in comparison.
Let me show you an example. This is the cameraman image that is part of the image processing toolbox:
im = imread('cameraman.tif');
figure;
imshow(im);
Now, let's visualize the spectrum, ensuring that the DC component is in the centre of the image - you already did this with fftshift. It's also a good idea to cast the image to double to ensure the best precision of data. In addition, make sure you apply abs to find the magnitude:
fftim = fftshift(fft2(double(im)));
mag = abs(fftim);
figure;
imshow(mag, []);
As you can see, it's not very useful, for the reason I mentioned. A better way to visualize the spectrum of the image is usually to apply a log transformation to it, which compresses the dynamic range so it fits better for display. In other words, you add 1 to the magnitude, then apply a logarithm so that the higher values taper off. It doesn't matter which base you use, so I'll just use the natural logarithm, which is the log command:
figure;
imshow(log(1 + mag), []);
Now that's much better. Let's get on to your filtering mechanism. Your Butterworth filter is slightly incorrect: the meshgrid of coordinates is slightly wrong. The -1 at the end of each interval needs to go outside the floor:
[x y]=meshgrid(-floor(w/2):floor(w/2)-1,-floor(h/2):floor(h/2)-1);
Remember, you are defining a symmetric interval about the centre of the image, and what you had originally wasn't correct. I'd also like to mention that this looks like a high-pass filter, so the output should look like an edge detection. In addition, the definition of the Butterworth high-pass filter is incorrect. The correct definition of the filter in the frequency domain is:
H(u,v) = 1 / (1 + B * (D0 / D(u,v))^(2*n))
D(u,v) is the distance from the centre of the image in the frequency domain, D0 is the cutoff distance, and B is a scale factor controlling the gain at the cutoff distance; n is the order of the filter. D0 in your case is d = 50. In practice, B = sqrt(2) - 1, so that at the cutoff distance D0 the gain is 1 / (1 + B) = 1 / sqrt(2) ≈ 0.707, which is the 3 dB cutoff mostly seen in electronic circuit filters. Sometimes you'll see B set to 1 for simplicity, but it's common to use B = sqrt(2) - 1.
However, your current code isn't doing any filtering. To filter in the frequency domain, you simply multiply the spectrum of the image with the spectrum of the filter itself. This is equivalent to convolution in the spatial domain. Once you do that, you simply undo the fftshift that was performed on the image, take the inverse FFT and then eliminate any imaginary components that are due to numerical imprecision. Also, let's cast to uint8 to make sure that we respect the original image type.
That can be done like so:
%// Your code with meshgrid fix
n=1;
d=50;
h=size(im,1);
w=size(im,2);
fftim = fftshift(fft2(double(im)));
[x y]=meshgrid(-floor(w/2):floor(w/2)-1,-floor(h/2):floor(h/2)-1);
%hhp=(1./(d./(x.^2+y.^2).^0.5).^(2*n));
%%%%%%// New code
B = sqrt(2) - 1; %// Define B
D = sqrt(x.^2 + y.^2); %// Define distance to centre
hhp = 1 ./ (1 + B * ((d ./ D).^(2 * n)));
out_spec_centre = fftim .* hhp;
%// Uncentre spectrum
out_spec = ifftshift(out_spec_centre);
%// Inverse FFT, get real components, and cast
out = uint8(real(ifft2(out_spec)));
%// Show image
imshow(out);
If you want to see what the filtered spectrum looks like, just do this:
figure;
imshow(log(1 + abs(out_spec_centre)), []);
We get:
This makes sense. You can see that the middle of the spectrum is slightly darker in comparison to the outer edges. That's because the high-pass Butterworth filter attenuates the low frequency terms while passing the higher ones, so the outer part of the spectrum gets visualized with higher intensity.
Now, out contains your filtered image, and we finally get this:
That looks like a fine result! However, naively casting the image to uint8 truncates any negative values to 0 and any values greater than 255 to 255. Because this is an edge detection, you want to keep both the negative and positive transitions... so a good idea is to normalize the output so that it ranges from [0,1], then multiply by 255 and cast to uint8. This way, regions with no change are visualized as gray, negative changes as dark, and positive changes as white... so you'd do something like this:
%// Your code with meshgrid fix
n=1;
d=50;
h=size(im,1);
w=size(im,2);
fftim = fftshift(fft2(double(im)));
[x y]=meshgrid(-floor(w/2):floor(w/2)-1,-floor(h/2):floor(h/2)-1);
%hhp=(1./(d./(x.^2+y.^2).^0.5).^(2*n));
%%%%%%// New code
B = sqrt(2) - 1; %// Define B
D = sqrt(x.^2 + y.^2); %// Define distance to centre
hhp = 1 ./ (1 + B * ((d ./ D).^(2 * n)));
out_spec_centre = fftim .* hhp;
%// Uncentre spectrum
out_spec = ifftshift(out_spec_centre);
%// Inverse FFT, get real components
out = real(ifft2(out_spec));
%// Normalize and cast
out = (out - min(out(:))) / (max(out(:)) - min(out(:)));
out = uint8(255*out);
%// Show image
imshow(out);
We get this:
I think that you should work a little bit differently:
n=1;
D0=50; % change the name to D0; d is usually used for (u^2+v^2)^(1/2)
A=1.5; % normally the amplitude is 1
im=imread('cameraman.tif');
[M,N]=size(im); % an easy way to get the height and width
% compute the 2D Fourier transform so we can multiply by the filter
F=fft2(double(im));
% build the filter over an M x N meshgrid of (unshifted) frequency coordinates;
% the real part of the result is taken after the inverse transform
u=0:(M-1);
v=0:(N-1);
idx=find(u>M/2);
u(idx)=u(idx)-M;
idy=find(v>N/2);
v(idy)=v(idy)-N;
[V,U]=meshgrid(v,u);
D=sqrt(U.^2+V.^2);
H =A * (1./(1 + (D0./D).^(2*n)));
% multiply element by element
G=H.*F;
g=real(ifft2(double(G)));
subplot(1,2,1); imshow(im); title('Input image');
subplot(1,2,2); imshow(g,[ ]); title('filtered image');

Matlab function gradient for fminunc

f = @(w) sum(log(1 + exp(-t .* (phis * w'))))/size(phis, 1) + coef * w*w';
options = optimset('Display', 'notify', 'MaxFunEvals', 2e+6, 'MaxIter', 2e+6);
w = fminunc(f, ones(1, size(phis, 2)), options);
phis is N x (N+1)
t is N x 1
coef is a constant
I'm trying to minimize the function f. At first I used fminsearch, but it takes a long time, so now I use fminunc. There is one problem: I need the gradient of the function to speed it up. Can you please help me construct the gradient for the function f? I always get this warning:
Warning: Gradient must be provided for trust-region algorithm;
using line-search algorithm instead.
What you are trying to do is called logistic regression, with a L2-regularization. There are far better ways to solve this problem than a call to a Matlab function, since the log-likelihood function is concave.
You should ask your question on the statistics website, or have a look at my former question there.
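For reference, a hand derivation of the gradient that the warning asks for, under the sizes stated above (phis is N x (N+1), t is N x 1, w is a 1 x (N+1) row vector, and N = size(phis, 1)):
grad f(w) = -(1/N) * sum over i of [ t_i / (1 + exp(t_i * (phis_i . w))) ] * phis_i + 2 * coef * w
where phis_i is the i-th row of phis. Returning this as a second output of the objective function and setting 'GradObj' to 'on' in optimset lets fminunc use the trust-region algorithm instead of falling back to line search.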

algorithm for scaling an image from a given pivot point

Standard scaling uses the center of an image as the pivot point and is uniform in all dimensions. I'd like to figure out a way to scale an image from an arbitrary pivot point, such that points closer to the pivot point scale less than points farther away from it.
Well, I don't know what framework/library you're using but you can think of it as:
translation to make your pivot point the center point
standard scaling
opposite translation to move the center point back to the original pivot point
Translation and scaling can both be represented as matrices (in homogeneous coordinates). Each transformation is a matrix, and you can multiply them to find the combined transformation matrix. So:
T = translation
S = scaling
T' = opposite translation
If you apply T.x, with x a point vector, it gives you the new coordinates. The same goes for S.x.
So if you want to do those operations you have to compute: T'.(S.(T.x))
Matrix multiplication is associative, so this is the same as (T'.S.T).x
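For instance, for a uniform scale factor s about a pivot point (px, py), written in homogeneous 2D coordinates, multiplying the three matrices out gives:
T'.S.T = [ s  0  px*(1-s) ]
         [ 0  s  py*(1-s) ]
         [ 0  0  1        ]
so each point maps to s*x + px*(1-s) (and likewise for y), and the pivot point itself stays fixed.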
If you are using a framework, apply the three operations (or combine them and apply the result).
If you are using crude math... go the matrix way :)
PS: If you are doing this by hand: when scaling, you will usually want to find, for each pixel of the result, the coordinates of the original point that maps to it. So you iterate over the resulting pixels and work out which coordinates (or point in between) of the original image to sample. In that case what you need is the inverse matrix, so instead of using S you use S^(-1). If you know you want to apply T'.S.T, you can compute that combined matrix and then its inverse, (T'.S.T)^(-1). Then you have your inverse matrix for finding original points given the resulting points.
This is an oversimplification, but it should help you get started. For one thing, since standard resampling is uniform, there isn't really a concept of a pivot point. If anything, implementations usually just start from a corner, as it's easier to run the for loops that way.
Generally the algorithm is something like this pseudo-code
function resample (srcImg, dstSize) {
  dstImg = makeImage(dstSize)
  for (r = 0; r < dstSize.height; ++r) {
    for (c = 0; c < dstSize.width; ++c) {
      // getResampleLoc returns a float coordinate
      resampleLoc = getResampleLoc(c, r, dstImg.size, srcImg.size)
      color = getColor(srcImg, resampleLoc)
      dstImg.setColor(c, r, color)
    }
  }
  return dstImg
}
For uniform resampling, getResampleLoc is just a simple scale of x and y from the dstImg size to the srcImg size. It returns float coordinates, which are passed to getColor. The implementation of getColor is what determines the various resampling algorithms. Basically, it blends the pixels surrounding the coordinate in some ratio. In reality, there are optimizations that can be done to make information generated inside of getColor shared between calls, but don't worry about that.
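To illustrate the uniform case just described, here is a small C++ sketch; the struct names and the half-pixel-centre convention are assumptions of mine, mirroring the pseudocode above:
struct Size2i { int width, height; };
struct Point2f { float x, y; };

// Uniform resampling: map destination pixel (c, r) to a fractional source
// coordinate by scaling with the ratio of the two sizes. The 0.5 offsets
// treat pixels as centred samples (a common, but assumed, convention).
Point2f getResampleLoc(int c, int r, Size2i dstSize, Size2i srcSize) {
    Point2f loc;
    loc.x = (c + 0.5f) * srcSize.width / dstSize.width - 0.5f;
    loc.y = (r + 0.5f) * srcSize.height / dstSize.height - 0.5f;
    return loc;
}
A pivot-aware version replaces this mapping with one where the displacement from pivotPt grows non-uniformly with distance, which is what the variant below needs.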
For you, you would need something like:
function resample (srcImg, dstSize, pivotPt) {
  dstImg = makeImage(dstSize)
  for (r = 0; r < dstSize.height; ++r) {
    for (c = 0; c < dstSize.width; ++c) {
      // getResampleLoc returns a float coordinate
      resampleLoc = getResampleLoc(c, r, dstImg.size, srcImg.size, pivotPt)
      color = getColor(srcImg, resampleLoc)
      dstImg.setColor(c, r, color)
    }
  }
  return dstImg
}
And then you just need to implement getResampleLoc to take pivotPt into account. Probably the simplest thing is to log-scale the distance to the edge.

Resources