When resizing my photo, which function should I use in MATLAB: floor, round, or ceil?

My image myimg is 256 x 256. When the scale factor is 0.8:
256 * 0.8 = 204.8
so the scaled size of myimg would be (204.8, 204.8). In this case, should I use ceil(204.8), floor(204.8), or round(204.8)?

As the previous commenter outlined, it depends on what you need and the use case. Just for anyone looking for functional clarity:
ceil(): rounds up, returning the smallest integer greater than or equal to the input.
round(): rounds to the nearest integer (in MATLAB, a fractional part of 0.5 or more rounds away from zero).
floor(): rounds down, returning the largest integer less than or equal to the input.
Example:
ceil(204.8) → 205
round(204.8) → 205 and round(204.2) → 204
floor(204.8) → 204
Extension:
In this case, if your criteria require an image of at least 80% of the original size, use ceil(). If you require an image strictly smaller than 80% of the original, floor() is the better fit. In any other case where the scenario is flexible, round() is a good option, as it gives the size closest to an exact 80% scaling.
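For completeness, a minimal MATLAB sketch of how this might look in practice (the file name is assumed):
myimg = imread('myphoto.jpg'); % hypothetical 256 x 256 image
scale = 0.8;
newSize = round([size(myimg, 1) size(myimg, 2)] * scale); % round([204.8 204.8]) -> [205 205]
resized = imresize(myimg, newSize); % swap round for ceil or floor as your criteria demand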

Related

Average set of color images and standard deviation

I am learning image analysis and trying to average a set of color images and get the standard deviation at each pixel.
I have done this, but not by averaging the RGB channels separately (e.g. rchannel = I(:,:,1)):
filelist = dir('dir1/*.jpg');
ims = zeros(215, 300, 3);
for i = 1:length(filelist)
    imname = ['dir1/' filelist(i).name];
    rgbim = im2double(imread(imname));
    ims = ims + rgbim;
end
avgset1 = ims / length(filelist);
figure;
imshow(avgset1);
I am not sure if this is correct. I am confused as to how averaging images is useful.
Also, I couldn't work out how to get the matrix holding the standard deviation.
Any help is appreciated.
If you are after the mean RGB image, then your code is correct. What I like is that you converted the images using im2double before accumulating, so everything is in double precision. As Parag said, finding the mean image is very useful, especially in machine learning. It is common to find the mean image of a set of images before doing image classification, as it allows the dynamic range of each pixel to lie within a normalized range. This helps the training of the learning algorithm converge quickly to the optimum solution and provide the best set of parameters to facilitate the best classification accuracy.
If instead you want to find the mean RGB colour, i.e. the average colour over all pixels of all images, then your code is not complete.
You have accumulated the per-pixel channel sums over all images (stored in sumrgbims in the modified code below), so the remaining step is to sum this image over the first and second dimensions and divide by the total number of pixels and images. Two chained calls to sum produce a 1 x 1 x 3 result; squeeze then removes the singleton dimensions, leaving a 3 x 1 vector holding the mean RGB colour over all images.
Therefore:
num_pixels = size(sumrgbims, 1) * size(sumrgbims, 2);
mean_colour = squeeze(sum(sum(sumrgbims, 1), 2)) / (num_pixels * length(filelist));
To address your second question, I'm assuming you want to find the standard deviation of each pixel value over all images. What you will have to do is accumulate the square of each image in addition to accumulating each image inside the loop. After that, recall that the standard deviation is the square root of the variance, and the variance is equal to the average sum of squares minus the mean squared. We have the mean image, so you just square the mean image and subtract it from the average sum of squares. Just to be sure our math is right: for a signal X with mean mu and N values,
Var(X) = (1/N) * sum(x_i^2) - mu^2
(Source: Science Buddies)
The standard deviation would simply be the square root of the above result. We would thus calculate this for each pixel independently. Therefore you can modify your loop to do that for you:
filelist = dir('set1/*.jpg');
sumrgbims = zeros(215, 300, 3);
sum2rgbims = sumrgbims; % New - for standard deviation
for i = 1:length(filelist)
    imname = ['set1/' filelist(i).name];
    rgbim = im2double(imread(imname));
    sumrgbims = sumrgbims + rgbim;
    sum2rgbims = sum2rgbims + rgbim.^2; % New
end
rgbavgset1 = sumrgbims / length(filelist);
% New - find standard deviation
rgbstdset1 = sqrt((sum2rgbims / length(filelist)) - rgbavgset1.^2);
figure;
imshow(rgbavgset1, []);
% New - display standard deviation image
figure;
imshow(rgbstdset1, []);
Also, note that I've scaled the display in each imshow call so that the smallest value maps to 0 and the largest value maps to 1. This does not change the actual contents of the images; it is just for display purposes.

How to convert cv2.addWeighted and cv2.GaussianBlur into MATLAB?

I have this Python code:
cv2.addWeighted(src1, 4, cv2.GaussianBlur(src1, (0, 0), 10), -4, 128)
How can I convert it to Matlab? So far I got this:
f = imread0('X.jpg');
g = imfilter(f, fspecial('gaussian',[size(f,1),size(f,2)],10));
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+128;
I want to subtract local mean color image.
Input image:
Blending output from OpenCV:
The documentation for cv2.addWeighted has the definition such that:
cv2.addWeighted(src1, alpha, src2, beta, gamma[, dst[, dtype]]) → dst
Also, the operation performed to produce the output image is:
dst = saturate(src1*alpha + src2*beta + gamma)
(source: opencv.org)
Therefore, what your code is doing is exactly correct... at least for cv2.addWeighted. You multiply the first image by alpha, multiply the second image by beta, add the two together, then add gamma on top. The only intricacy left to deal with is saturate, which caps any values that fall outside the dynamic range of the data type you are working with: negative values become 0 and values greater than the maximum become that maximum. In this case, you'll want any values larger than 1 to become 1. As such, it's a good idea to convert your image to double through im2double, because you want the additions and subtractions that go beyond the dynamic range to happen first, and to saturate afterwards. If you stay in the image's native precision (uint8), clipping happens at every intermediate step of the arithmetic, before your own saturate operation, and that gives the wrong results. Because of this double conversion, you'll also want to change the gamma of 128 to 0.5 to compensate.
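A quick illustration of why the double conversion matters (the values here are just for demonstration):
a = uint8(200);
b = a * 4 % b = 255: uint8 arithmetic clips at every step, so the true 800 is lost
c = double(a) * 4 % c = 800: double keeps the intermediate, so you saturate once at the end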
Now, the only slight problem is your Gaussian blur. Looking at the documentation, by doing cv2.GaussianBlur(src1, (0, 0), 10), you are telling OpenCV to infer the mask size while the standard deviation is 10. MATLAB does not infer the size of the mask for you, so you need to do this yourself. A common practice is to take six times the standard deviation, floor it, and add 1; this applies to both the horizontal and vertical dimensions of the mask. You can see my post here for the justification of why this is common practice: By which measures should I set the size of my Gaussian filter in MATLAB?
Therefore, in MATLAB, you would do this with your Gaussian blur instead. BTW, it's simply imread, not imread0:
f = im2double(imread('http://i.stack.imgur.com/kl3Md.jpg')); %// Change - Reading image directly from StackOverflow
sigma = 10; %// Change
sz = 1 + floor(6*sigma); %// Change
g = imfilter(f, fspecial('gaussian', sz, sigma)); %// Change
%// Rest of the code is the same
alpha = 4;
beta = -4;
f1 = f*alpha+g*beta+0.5; %// Change
%// Saturate
f1(f1 > 1) = 1;
f1(f1 < 0) = 0;
I get this image:
Note that there is a slight difference in the way the result appears between OpenCV and MATLAB, especially the haloing around the eye. This is because OpenCV does something different when inferring the mask size for the Gaussian blur; I'm not sure exactly what, but choosing the mask size from the standard deviation the way I did is one of the most common heuristics. Play around with the standard deviation until you get something you like.
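If you want to experiment, a small sketch along these lines (the sigma values are arbitrary) makes the comparison quick:
for sigma = [5 10 15 20]
    sz = 1 + floor(6*sigma);
    g = imfilter(f, fspecial('gaussian', sz, sigma));
    f1 = min(max(f*4 + g*(-4) + 0.5, 0), 1); % blend and saturate to [0, 1]
    figure; imshow(f1); title(['sigma = ' num2str(sigma)]);
end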

image fast feature extraction

I've got an image in MATLAB defined on a real scale (0,1) and a mask defined on an integer scale.
Example
mask = [ 1 1 1 3 4 ;
1 1 1 2 4 ;
1 1 2 2 2 ]
img = [ 0.1 0.1 0.2 0.2 0.3 ;
0.1 0.1 0.2 0.3 0.3 ;
0.1 0.1 0.3 0.3 0.3 ]
and for each region in the mask (i.e. 1,2,3,4) I want to compute a certain feature (say, the mean) on the corresponding image's intensities.
The algorithm I've used is
for i = labels
    region = img(mask==i);
    feature(i) = mean(region);
end
Now, this algorithm is pretty slow for images of size 300x400x500 with more than 40000 distinct labels (which, btw, is exactly my case).
Any suggestion on how to speed up my code?
The regionprops function in the Image Processing Toolbox should do it for you. For example, use this syntax:
stats = regionprops(L,I, 'MeanIntensity');
to get the mean value for every region. L is the array containing the labels. I is your image.
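A minimal sketch with the question's variable names (this is for the 2-D example above; a 3-D volume would need regionprops3 or one of the approaches below):
stats = regionprops(mask, img, 'MeanIntensity'); % one struct per label
meanPerLabel = [stats.MeanIntensity]; % collect into a 1 x numLabels vector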
To get the mean elegantly (though it may not be the absolute fastest approach), you can use accumarray:
meanFeatures = accumarray(mask(:), img(:), [], @mean);
If you want e.g. both the mean and the std, you use
meanAndStd = cell2mat(accumarray(mask(:), img(:), [], @(x){[mean(x), std(x)]}));
Here's a bsxfun and fast matrix-multiplication based approach to get the mean values -
num_labels = max(mask(:)); %// number of labels used
labels = 1:num_labels; %// array of all labels
%// Get a 2D mask for all labels, with each column representing a label.
%// This is a setup for use with matrix multiplication later on
mask_labels = bsxfun(@eq, mask(:), labels);
%// Perform fast matrix multiplication as a way to perform summation for
%// all labels and then do elementwise division to get the mean values
feature_out = (img(:)'*mask_labels)./sum(mask_labels,1);
I think we need to know more about:
How do you compute the mask?
What features do you want to compute on these regions? (Mean only?)
What do you mean by slow?
If the formula that assigns values to each element of the mask can be vectorized, then so can the access to your volume.
In the general case though, I would say that this would be easy to do using Mex files instead.
If you want to go down that way, I could recommend using my tiny library to ease the transition to Mex, but you'll need to compile using C++11 (-std=c++0x flag).
If you want to stay in Matlab, and that you only want to compute averages on these regions, then accumarray is the function you are looking for.
To the best of my knowledge, this is not the kind of problem MATLAB is very good at. I would try the naive, loop-based approach:
label_sum = zeros(1, num_labels);
label_size = zeros(1, num_labels);
for z = 1:500
    for x = 1:400
        for y = 1:300
            label = mask(y, x, z);
            label_sum(label) = label_sum(label) + img(y, x, z);
            label_size(label) = label_size(label) + 1;
        end
    end
end
label_mean = label_sum(label_size > 0) ./ label_size(label_size > 0);
It is not ideal, but it will probably take way less than 10 minutes...
You may use arrayfun and avoid the for loop. Using your labels vector of values to be iterated, it goes as:
features = arrayfun(@(x) mean(img(mask==x)), labels);
The @(x) syntax defines an anonymous function applied to each element of labels, replacing the for loop in this way. You may also use any function instead of mean, whether built-in or one you have created. You should check its speed, but I think this is the fastest way; a quick comparison is sketched below.
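If in doubt, a quick tic/toc comparison on a small synthetic volume settles it for your data (the sizes and label count here are made up so the test runs fast):
mask = randi(500, 100, 100, 50); % hypothetical label volume
img = rand(100, 100, 50);
labels = 1:max(mask(:));
tic; f1 = arrayfun(@(x) mean(img(mask==x)), labels); toc
tic; f2 = accumarray(mask(:), img(:), [], @mean).'; toc
max(abs(f1 - f2)) % sanity check: should be ~0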

d3.js using d3.scale.sqrt()

What does the d3.scale.sqrt() scale do? As per the documentation it is similar to d3.scale.pow().exponent(.5), so the returned scale is equivalent to the sqrt function for numbers; for example:
sqrt(0.25) returns 0.5.
so when we apply a domain similar to this:
d3.scale.sqrt().domain([1, 100]).range([10, 39])
does it mean it takes values between 1 and 100 and maps their square roots into the range 10 to 39? Could anybody clarify and provide more details on how this scale works?
The way scales work in D3 is that they map input values (defined by .domain()) to output values (defined by .range()). So
d3.scale.sqrt().domain([1, 100]).range([10, 39])
maps values from 1 to 100 to the 10 to 39 range. That is, 1 corresponds to 10 and 100 to 39. This has nothing to do with the transformation the scale applies, which only affects the distribution of values within the range. For the sqrt function, the growth is sub-linear, which means that more of the input values will fall into the latter part of the output range.
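Concretely (assuming the usual normalize-then-interpolate behaviour of d3's power scales), the mapping is:
output(x) = 10 + (sqrt(x) - sqrt(1)) / (sqrt(100) - sqrt(1)) * (39 - 10)
For example, x = 50 gives sqrt(50) ≈ 7.07, so output ≈ 10 + (6.07 / 9) * 29 ≈ 29.6, already past the midpoint of the range (24.5), which is exactly the sub-linear behaviour described above.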

Confusion with FFT algorithm

I am trying to understand the FFT algorithm and so far I think that I understand the main concept behind it. However I am confused as to the difference between 'framesize' and 'window'.
Based on my understanding, it seems that they are redundant with each other? For example, I present as input a block of samples with a framesize of 1024. So I have byte[1024] presented as input.
What then is the purpose of the windowing function? Initially, I thought its purpose was to select the block of samples from the original data.
Thanks!
What then is the purpose of the windowing function?
It's to deal with so-called "spectral leakage": the FFT assumes an infinite series that repeats the given sample frame over and over again. If you have a sine wave that is an integral number of cycles within the sample frame, then all is good, and the FFT gives you a nice narrow peak at the proper frequency. But if you have a sine wave that is not an integral number of cycles, there's a discontinuity between the last and first sample, and the FFT gives you false harmonics.
Windowing functions lower the amplitudes at the beginning and the end of the sample frame, to reduce the harmonics caused by this discontinuity.
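To see the effect numerically, here is a small MATLAB sketch (the frequency is arbitrary, and hann comes from the Signal Processing Toolbox):
n = 0:1023;
x = cos(2*pi*10.37*n/1024); % non-integer number of cycles -> leakage
X_rect = abs(fft(x)); % rectangular window, i.e. no shaping
X_hann = abs(fft(x .* hann(1024).')); % tapering the ends reduces the discontinuity
figure;
semilogy(0:511, X_rect(1:512)); hold on;
semilogy(0:511, X_hann(1:512)); hold off;
legend('rectangular', 'hann'); % the hann skirt falls off much faster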
(Some diagrams from a National Instruments webpage on windowing illustrate the two cases: an integral number of cycles versus a non-integer number of cycles.)
for additional information:
http://www.tmworld.com/article/322450-Windowing_Functions_Improve_FFT_Results_Part_I.php
http://zone.ni.com/reference/en-XX/help/371361B-01/lvanlsconcepts/char_smoothing_windows/
http://www.physik.uni-wuerzburg.de/~praktiku/Anleitung/Fremde/ANO14.pdf
A rectangular window of length M has frequency response of sin(ω*M/2)/sin(ω/2), which is zero when ω = 2*π*k/M, for k ≠ 0. For a DFT of length N, where ω = 2*π*n/N, there are nulls at n = k * N/M. The ratio N/M isn't necessarily an integer. For example, if N = 40, and M = 32, then there are nulls at multiples of 1.25, but only the integer multiples will appear in the DFT, which is bins 5, 10, 15, and 20 in this case.
Here's a plot of the 1024-point DFT of a 32-point rectangular window:
from pylab import *

M = 32
N = 1024
w = ones(M)
W = rfft(w, N)
K = N // M
nulls = abs(W[K::K])
plot(abs(W))
plot(r_[K:N//2 + 1:K], nulls, 'ro')
xticks(r_[:512:64])
grid(); axis('tight')
Note the nulls at every N/M = 32 bins. If N=M (i.e. the window length equals the DFT length), then there are nulls at all bins except at n = 0.
When you multiply a window by a signal, the corresponding operation in the frequency domain is the circular convolution of the window's spectrum with the signal's spectrum. For example, the DTFT of a sinusoid is a weighted delta function (i.e. an impulse with infinite height, infinitesimal extension, and finite area) located at the positive and negative frequency of the sinusoid. Convolving a spectrum with a delta function just shifts it to the location of the delta and scales it by the delta's weight. Therefore when you multiply a window by a sinusoid in the sample domain, the window's frequency response is scaled and shifted to the frequency of the sinusoid.
There are a couple of scenarios to examine regarding the length of a rectangular window. First let's look at the case where the window length is an integer multiple of the sinusoid's period, e.g. a 32-sample rectangular window of a cosine with a period of 32/8 = 4 samples:
x1 = cos(2*pi*8*r_[:32]/32) # ω0 = 8π/16, bin 8/32 * 1024 = 256
X1 = rfft(x1 * w, 1024)
plot(abs(X1))
xticks(r_[:513:64])
grid(); axis('tight')
As before, there are nulls at multiples of N/M = 32. But the window's spectrum has been shifted to bin 256 of the sinusoid and scaled by its magnitude, which is 0.5 split between the positive frequency and the negative frequency (I'm only plotting positive frequencies). If the DFT length had been 32, the nulls would line up at every bin, prompting the appearance that there's no leakage. But that misleading appearance is only a function of the DFT length. If you pad the windowed signal with zeros (as above), you'll get to see the sinc-like response at frequencies between the nulls.
Now let's look at a case where the window length is not an integer multiple of the sinusoid's period, e.g. a cosine with an angular frequency of 7.5π/16 (the period is 64 samples):
x2 = cos(2*pi*15*r_[:32]/64) # ω0 = 7.5π/16, bin 15/64 * 1024 = 240
X2 = rfft(x2 * w, 1024)
plot(abs(X2))
xticks(r_[-16:513:64])
grid(); axis('tight')
The center bin location is no longer at an integer multiple of 32, but shifted by a half down to bin 240. So let's see what the corresponding 32-point DFT would look like (inferring a 32-point rectangular window). I'll compute and plot the 32-point DFT of x2[n] and also superimpose a 32x decimated copy of the 1024-point DFT:
X2_32 = rfft(x2, 32)
X2_sample = X2[::32]
stem(r_[:17],abs(X2_32))
plot(abs(X2_sample), 'rs') # red squares
grid(); axis([0,16,0,11])
As you can see in the previous plot, the nulls are no longer aligned at multiples of 32, so the magnitude of the 32-point DFT is non-zero at each bin. In the 32 point DFT, the window's nulls are still spaced every N/M = 32/32 = 1 bin, but since ω0 = 7.5π/16, the center is at 'bin' 7.5, which puts the nulls at 0.5, 1.5, etc, so they're not present in the 32-point DFT.
The general message is that spectral leakage of a windowed signal is always present, but it can be masked in the DFT if the signal spectrum, window length, and DFT length come together in just the right way to line up the nulls. Beyond that, you should just ignore these DFT artifacts and concentrate on the DTFT of your signal (i.e. pad with zeros to sample the DTFT at higher resolution so you can clearly examine the leakage).
Spectral leakage caused by convolving with a window's spectrum will always be there, which is why the art of crafting particularly shaped windows is so important. The spectrum of each window type has been tailored for a specific task, such as dynamic range or sensitivity.
Here's an example comparing the output of a rectangular window vs a Hamming window:
from pylab import *
import wave

fs = 44100
M = 4096
N = 16384

# load a sample of guitar playing an open string 6
# with a fundamental frequency of 82.4 Hz
g = frombuffer(wave.open('dist_gtr_6.wav').readframes(-1), dtype='int16')
L = len(g) // 4
g_t = g[L:L+M]
g_t = g_t / float64(max(abs(g_t)))

# compute the response with rectangular vs Hamming window
g_rect = rfft(g_t, N)
g_hamm = rfft(g_t * hamming(M), N)

def make_plot():
    fmax = int(82.4 * 4.5 / fs * N) # 4 harmonics
    subplot(211); title('Rectangular Window')
    plot(abs(g_rect[:fmax])); grid(); axis('tight')
    subplot(212); title('Hamming Window')
    plot(abs(g_hamm[:fmax])); grid(); axis('tight')

if __name__ == "__main__":
    make_plot()
If you don't modify the sample values, and select the same length of data as the FFT length, this is equivalent to using a rectangular window, in which case the frame and the window are identical. However multiplying your input data by a rectangular window in the time domain is the same as convolving the input signal's spectrum with a Sinc function in the frequency domain, which will spread any spectral peaks for frequencies which are not exactly periodic in the FFT aperture across the entire spectrum.
Non-rectangular windows are often used so that the resulting FFT spectrum is convolved with something a bit more "focused" than a Sinc function.
You can also use a rectangular window that is a different size than the FFT length or aperture. In the case of a shorter data window, the FFT frame can be zero padded, which can result in a smoother looking interpolated FFT result spectrum. You can even use a rectangular window that is longer than the FFT length by wrapping data around the FFT aperture in a summed circular manner, for some interesting effects on the frequency resolution.
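For instance, a minimal MATLAB sketch of the zero-padding effect described above (the signal and lengths are arbitrary):
x = cos(2*pi*7.3*(0:31)/32); % 32 samples, non-integer number of cycles
X32 = abs(fft(x)); % 32-point FFT: coarse samples of the spectrum
X1024 = abs(fft(x, 1024)); % zero-padded: interpolates the same DTFT
figure; hold on;
plot((0:1023)/1024*32, X1024); % smooth sinc-like lobes become visible
stem(0:31, X32, 'r'); % the 32 original bins lie on that curve
hold off;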
ADDED due to a request:
Multiplying by a window in the time domain produces the same result as convolving with the transform of that window in the frequency domain.
In general, a narrower time domain window will produce a wider looking frequency domain transform. This is the reason that zero-padding produces a smoother frequency plot. The narrower time domain window produces a wider Sinc, with fatter and smoother curves relative to the frame width, than would a window the full width of the FFT frame, thus making the interpolated frequency results look smoother than a non-zero-padded FFT of the same frame length.
The converse is also true to some extent. A wider rectangular window will produce a narrower Sinc, with the nulls closer to the peak. Thus you might be able to use a carefully chosen wider window to produce a narrower looking Sinc, to null a frequency closer to a bin of interest than 1 frequency bin away. How do you use a wider window? Wrap the data around and sum, which is identical to using FT basis vectors that are not truncated to 1 FFT frame in length. However, since the FFT result vector is then shorter than the data, this is a lossy process which will introduce artifacts and some novel aliasing. But it will give you a sharper frequency selection peak at each bin, and notch filters that can be placed less than 1 bin away, say halfway between bins, etc.
