How to speed up this kind of for-loop?

I would like to compute the maximum of translated images along the direction of a given axis. I know about ordfilt2, however I would like to avoid using the Image Processing Toolbox.
So here is the code I have so far:
imInput = imread('tire.tif');
n = 10;
imMax = imInput(:, n:end);
for i = 1:(n-1)
imMax = max(imMax, imInput(:, i:end-(n-i)));
end
Is it possible to avoid using a for-loop in order to speed the computation up, and, if so, how?
First edit: Using Octave's code for im2col is actually 50% slower.
Second edit: Pre-allocating did not appear to improve the result enough.
sz = [size(imInput,1), size(imInput,2)-n+1];
range_j = 1:size(imInput, 2)-sz(2)+1;
range_i = 1:size(imInput, 1)-sz(1)+1;
B = zeros(prod(sz), length(range_j)*length(range_i));
counter = 0;
for j = range_j % left to right
for i = range_i % up to bottom
counter = counter + 1;
v = imInput(i:i+sz(1)-1, j:j+sz(2)-1);
B(:, counter) = v(:);
end
end
imMax = reshape(max(B, [], 2), sz);
Third edit: I shall show the timings.

For what it's worth, here's a vectorized solution using IM2COL function from the Image Processing Toolbox:
imInput = imread('tire.tif');
n = 10;
sz = [size(imInput,1) size(imInput,2)-n+1];
imMax = reshape(max(im2col(imInput, sz, 'sliding'),[],2), sz);
imshow(imMax)
You could perhaps write your own version of IM2COL as it simply consists of well crafted indexing, or even look at how Octave implements it.
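As a rough sketch of what such "well crafted indexing" could look like without any toolbox: build the column indices of all n shifted views at once, stack the views along a third dimension, and take the maximum over that dimension. The small magic-square image and the names cols and stack are placeholders of mine, not part of the original code.

```matlab
% Toolbox-free sliding maximum along the columns (a sketch, not im2col itself).
imInput = uint8(magic(8));            % placeholder for imread('tire.tif')
n = 3;                                % number of shifted views
h = size(imInput, 1);
w = size(imInput, 2) - n + 1;         % width of each shifted view

% cols(k, c) = c + k - 1, i.e. row k lists the columns of the k-th view
cols = bsxfun(@plus, 1:w, (0:n-1).');

% Gather all views in one indexing step; slice k equals imInput(:, k:k+w-1)
stack = reshape(imInput(:, cols.'), h, w, n);

% Maximum across the shifted views
imMax = max(stack, [], 3);
```

This trades the loop for an h-by-w-by-n temporary, so whether it is actually faster than the loop depends on n and the image size; it is worth timing both.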

Check out the answer to this question about doing a rolling median in C. I've successfully made it into a MEX function and it is way faster than even ordfilt2. It will take some work to adapt it to a max, but I'm sure it's possible.
Rolling median in C - Turlach implementation

Related

Is heap sort supposed to be very slow on MATLAB?

I wrote a heap sort function in MATLAB and it works fine, except that when the length of the input is greater than or equal to 1000, it can take a long time (e.g. an input of length 1000 takes half a second). I'm not sure whether MATLAB just doesn't run the heap sort algorithm very fast or whether my code needs to be improved.
My code is shown below:
function b = heapsort(a)
[~,n] = size(a);
b = zeros(1,n);
for i = 1:n
a = build_max_heap(a);
b(n+1-i) = a(1);
temp = a(1);
a(1) = a(n+1-i);
a(n+1-i) = temp;
a(n+1-i) = [];
a = heapify(a,1);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function a = build_max_heap(a)
[~,n] = size(a);
m = floor(n/2);
for i = m:-1:1
a = heapify(a,i);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function a = heapify(a,i)
[~,n] = size(a);
left = 2*i;
right = 2*i + 1;
if left <= n
if a(left) >= a(i)
large = left;
else
large = i;
end
else
return
end
if right <= n
if a(right) >= a(large)
large = right;
end
end
if large ~= i
temp = a(large);
a(large) = a(i);
a(i) = temp;
a = heapify(a,large);
end
end
I'm aware that the line a(n+1-i) = []; may consume a lot of time. But when I replaced [] with -999 (lower than any number in the input vector), it did not help; it took even more time.
You should use the profiler to check which lines take the most time. It's definitely a(n+1-i) = []; that's slowing down your function.
Resizing arrays in loops is very slow, so you should always try to avoid it.
A simple test:
Create a function that takes a large vector as input, and iteratively removes elements until it's empty.
Create a function that takes the same vector as input and iteratively sets each value to 0, Inf, NaN or something else.
Use timeit to check which function is faster. You'll see that the last function is approximately 100 times faster (depending on the size of the vector of course).
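The test described above could be sketched like this. The function names and the vector length are arbitrary choices of mine, and the actual ratio you measure will depend on your MATLAB version and on the vector size.

```matlab
function compare_shrink_vs_overwrite
% Time "delete one element per iteration" against "overwrite in place".
v = rand(1, 2e4);                      % arbitrary test size
tShrink    = timeit(@() shrink(v));
tOverwrite = timeit(@() overwrite(v));
fprintf('shrink: %.4g s, overwrite: %.4g s\n', tShrink, tOverwrite);
end

function v = shrink(v)
% Remove the last element until the vector is empty (resizes every pass)
while ~isempty(v)
    v(end) = [];
end
end

function v = overwrite(v)
% Keep the size fixed and just mark each element as used
for k = numel(v):-1:1
    v(k) = NaN;
end
end
```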
The reason why -999 takes more time is most likely that a no longer gets smaller and smaller, so a = heapify(a,1); no longer gets faster and faster. I haven't tested it, but if you try the following in your first function you'll probably get a much faster program (you must insert n+1-ii in other places in your code as well, but I'll leave that to you):
a(n+1-ii) = NaN;
a(1:n+1-ii) = heapify(a(1:n+1-ii),1);
Note that I changed i to ii. That's partially because I want to give you good advice, and partially to avoid being reminded not to use i and j as variables in MATLAB.
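For comparison, here is a sketch of the textbook in-place heap sort, which never resizes the array: the heap is built once, and a heapSize argument tells the sift-down step where the heap ends. This is the standard algorithm rather than a patch of the code above, and the names heapsort_inplace and sift_down are mine.

```matlab
function a = heapsort_inplace(a)
% Sort a vector in ascending order without ever changing its size.
n = numel(a);
for i = floor(n/2):-1:1                 % build the max-heap once
    a = sift_down(a, i, n);
end
for heapSize = n:-1:2
    a([1 heapSize]) = a([heapSize 1]);  % move the current max to its final slot
    a = sift_down(a, 1, heapSize - 1);  % restore the heap on the shorter prefix
end
end

function a = sift_down(a, i, heapSize)
% Iterative heapify restricted to a(1:heapSize)
while true
    left = 2*i;
    right = left + 1;
    large = i;
    if left <= heapSize && a(left) > a(large)
        large = left;
    end
    if right <= heapSize && a(right) > a(large)
        large = right;
    end
    if large == i
        return
    end
    a([i large]) = a([large i]);        % swap parent with the larger child
    i = large;
end
end
```

The loop in sift_down replaces the recursion, and building the heap only once avoids the repeated build_max_heap call inside the main loop.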

How to multiply tensors in MATLAB without looping?

Suppose I have:
A = rand(1,10,3);
B = rand(10,16);
And I want to get:
C(:,1) = A(:,:,1)*B;
C(:,2) = A(:,:,2)*B;
C(:,3) = A(:,:,3)*B;
Can I somehow multiply this in a single line so that it is faster?
What if I create new tensor b like this
for i = 1:3
b(:,:,i) = B;
end
Can I multiply A and b to get the same C but faster? Time taken in creation of b by the loop above doesn't matter since I will be needing C for many different A-s while B stays the same.
Permute the dimensions of A and B and then apply matrix multiplication:
C = B.'*permute(A, [2 3 1]);
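A quick way to convince yourself that this one-liner matches the loop from the question (the variable names C_loop and C_vec are mine):

```matlab
A = rand(1, 10, 3);
B = rand(10, 16);

% Reference: one matrix product per slice, stored as columns
C_loop = zeros(16, 3);
for k = 1:3
    C_loop(:, k) = A(:, :, k) * B;
end

% permute turns the 1x10x3 array into a 10x3 matrix, so a single
% 16x10 by 10x3 product yields all three columns at once
C_vec = B.' * permute(A, [2 3 1]);

% Largest absolute difference; expect round-off level only
maxDiff = max(abs(C_loop(:) - C_vec(:)));
disp(maxDiff)
```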
If A is a true 3D array, something like A = rand(4,10,3) and assuming that B stays as a 2D array, then each A(:,:,1)*B would yield a 2D array.
So, assuming that you want to store those 2D arrays as slices in the third dimension of output array, C like so -
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B; and so on.
To solve this in a vectorized manner, one approach would be to reshape A into a 2D array, merging the first and third dimensions, and then perform matrix multiplication. Finally, to bring the output to the same size as the C listed earlier, we need a final reshaping step.
The implementation would look something like this -
%// Get size and then the final output C
[m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
Sample run -
>> A = rand(4,10,3);
B = rand(10,16);
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B;
>> [m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
>> all(C(:)==out(:)) %// Verify results
ans =
1
As per the comments, if A is a 3D array with always a singleton dimension at the start, you can just use squeeze and then matrix-multiplication like so -
C = B.'*squeeze(A)
EDIT: @LuisMendo points out that this is indeed possible for this specific use case. However, it is not (in general) possible if the first dimension of A is not 1.
I've grappled with this for a while now, and I've never been able to come up with a solution. Performing element-wise calculations is made nice by bsxfun, but tensor multiplication is something which is woefully unsupported. Sorry, and good luck!
You can check out this mathworks file exchange file, which will make it easier for you and supports the behavior you're looking for, but I believe that it relies on loops as well. Edit: it relies on MEX/C++, so it isn't a pure MATLAB solution if that's what you're looking for.
I have to agree with @GJSein, the for loop is really fast
time
0.7050 0.3145
Here's the timer function
function time
n = 1E7;
A = rand(1,n,3);
B = rand(n,16);
t = [];
C = {};
tic
C{length(C)+1} = squeeze(cell2mat(cellfun(@(x) x*B,num2cell(A,[1 2]),'UniformOutput',false)));
t(length(t)+1) = toc;
tic
for i = 1:size(A,3)
C{length(C)+1}(:,i) = A(:,:,i)*B;
end
t(length(t)+1) = toc;
disp(t)
end

Matlab vectorization of for loops

Is there any way to vectorize such a for loop in MATLAB? It's taking a lot of time to execute.
for i = 1:numberOfFrames-1
frameDifferencesEroded(:,:,i+1) = imabsdiff(frameDifferencesErodedTemp(:,:,i+1),frameDifferencesErodedTemp(:,:,1));
for k=1:numel(frameDifferences(1,:,i))
for m=1:numel(frameDifferences(:,1,i))
if(frameDifferencesEroded(m,k,i+1)>thresold)
frameDifferences(m,k,i+1) = 255;
else
frameDifferences(m,k,i+1) = 0;
end
end
end
end
Assuming you want frameDifferencesEroded(:,:,1) and frameDifferences(:,:,1) to be all zeros, as you are not inputting values into those with your code, this might work for you -
%// Replace imabsdiff with abs(bsxfun(@minus..)), which might be faster
frameDifferencesEroded = abs(bsxfun(@minus,frameDifferencesErodedTemp, frameDifferencesErodedTemp(:,:,1)))
%// Get the thresholding done next
frameDifferences = (frameDifferencesEroded>thresold).*255
You could try something like this:
[M, N, P] = size(frameDifferences);
for i = 2:P
frameDifferencesEroded(:,:,i) = imabsdiff(frameDifferencesErodedTemp(:,:,i),frameDifferencesErodedTemp(:,:,1));
frameDifferences(:, :, i) = (frameDifferencesEroded(:, :, i) > thresold) .* 255;
end
Do you need to keep frameDifferencesEroded? If not you can make it a temporary 2-D matrix inside this loop.
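Also keep the memory layout in mind: MATLAB stores arrays in column-major order, so each slice m(:,:,i) is contiguous in memory, whereas m(i,:,:) is not. Indexing the frames along the third dimension, as done here, is therefore already the fast layout; swapping the 1st and 3rd dimensions would likely make it slower.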

How can I avoid if else statements within a for loop?

I have a code that yields a solution similar to the desired output, and I don't know how to perfect this.
The code is as follows.
N = 4; % sampling period
for nB = -30:-1;
if rem(nB,N)==0
xnB(abs(nB)) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
else
xnB(abs(nB)) = 0;
end
end
for nC = 1:30;
if rem(nC,N)==0
xnC(nC) = cos(.1*pi*nC)-(4*sin(.2*pi*nC));
else
xnC(nC) = 0;
end
end
nB = -30:-1;
nC = 1:30;
nD = 0;
xnD = 0;
plot(nA,xnA,nB,xnB,'r--o',nC,xnC,'r--o',nD,xnD,'r--o')
This produces something that is close, but not close enough for proper data recovery.
I have tried using an index that has the same length but simply starts at 1 but the output was even worse than this, though if that is a viable option please explain thoroughly, how it should be done.
I have tried running this in a single for-loop with one if-statement but there is a problem when the counter passes zero. What is a way around this that would allow me to avoid using two for-loops? (I'm fairly confident that, solving this issue would increase the accuracy of my output enough to successfully recover the signal.)
EDIT/CLARIFICATION/ADD - 1
I do in fact want to evaluate the signal at the index of zero. The if-statement cannot handle an index of zero which is an index that I'd prefer not to skip.
The goal of this code is to be able to sample a signal, and then I will build a code that will put it through a recovery filter.
EDIT/UPDATE - 2
nA = -30:.1:30; % n values for original function
xnA = cos(.1*pi*nA)-(4*sin(.2*pi*nA)); % original function
N = 4; % sampling period
n = -30:30;
xn = zeros(size(n));
xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n)));
plot(nA,xnA,n,xn,'r--o')
title('Original seq. x and Sampled seq. xp')
xlabel('n')
ylabel('x(n) and xp(n)')
legend('original','sampled');
This threw an error at the line xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n))); which read: In an assignment A(I) = B, the number of elements in B and I must be the same. I have run into this error before, but my previous encounters were usually the result of faulty looping. Could someone point out why it isn't working this time?
EDIT/Clarification - 3
N = 4; % sampling period
for nB = -30:30;
if rem(nB,N)==0
xnB(abs(nB)) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
else
xnB(abs(nB)) = 0;
end
end
The error message resulting is as follows: Attempted to access xnB(0); index must be a positive integer or logical.
EDIT/SUCCESS - 4
After taking another look at the answers posted, I realized that the negative sign in front of the cos function wasn't supposed to be in the original coding.
You could do something like the following:
nB = -30:-1;
nC = 1:30;
xnB = zeros(size(nB));
remB = rem(nB,N)==0;
xnB(remB) = -(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
xnC = zeros(size(nC));
remC = rem(nC,N)==0;
xnC(remC) = cos(.1*pi*nC(remC))-(4*sin(.2*pi*nC(remC)));
This avoids the issue of having for-loops entirely. However, this would produce the exact same output as you had before, so I'm not sure that it would fix your initial problem...
EDIT for your most recent addition:
nB = -30:30;
xnB = zeros(size(nB));
remB = rem(nB,N)==0;
xnB(remB) = -(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
In your original post you had the sign dependent on the sign of nB - if you wanted to maintain this functionality, you would do the following:
xnB(remB) = sign(nB(remB)).*(cos(.1*pi*nB(remB))-(4*sin(.2*pi*nB(remB))));
From what I understand, you want to iterate over all integer values in [-30, 30] excluding 0 using a single for loop. this can be easily done as:
for ii = [-30:-1,1:30]
%Your code
end
Resolution for edit - 2
As per your updated code, try replacing
xn(rem(n,N)==0) = -(cos(.1*pi*n)-(4*sin(.2*pi*n)));
with
xn(rem(n,N)==0) = -(cos(.1*pi*n(rem(n,N)==0))-(4*sin(.2*pi*n(rem(n,N)==0))));
This should fix the dimension mismatch.
Resolution for edit - 3
Try:
N = 4; % sampling period
for nB = -30:30;
if rem(nB,N)==0
xnB(nB-(-30)+1) = -(cos(.1*pi*nB)-(4*sin(.2*pi*nB)));
else
xnB(nB-(-30)+1) = 0;
end
end
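Putting the pieces together, a loop-free version over the whole range, including n = 0, could look like the sketch below. It assumes, following the asker's EDIT 4, that the leading minus sign was not intended; the variable name mask is mine.

```matlab
N = 4;                         % sampling period
n = -30:30;                    % sample indices, zero included

% Logical mask of the indices that fall on a sampling instant
mask = rem(n, N) == 0;

% Evaluate the signal only where the mask is true; everything else stays 0
xn = zeros(size(n));
xn(mask) = cos(.1*pi*n(mask)) - 4*sin(.2*pi*n(mask));
```

Because xn is indexed by position in n rather than by the value of n itself, negative and zero sample indices need no special handling.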

Speeding up a nested for loop

I've been working on speeding up the following function, but with no results:
function beta = beta_c(k,c,gamma)
beta = zeros(size(k));
E = @(x) (1.453*x.^4)./((1 + x.^2).^(17/6));
for ii = 1:size(k,1)
for jj = 1:size(k,2)
E_int = integral(E,k(ii,jj),10000);
beta(ii,jj) = c*gamma/(k(ii,jj)*sqrt(E_int));
end
end
end
Up to now, I solved it this way:
function beta = beta_calc(k,c,gamma)
k_1d = reshape(k,[1,numel(k)]);
E_1d = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int = zeros(1,numel(k_1d));
parfor ii = 1:numel(k_1d)
E_int(ii) = quad(E_1d,k_1d(ii),10000);
end
beta_1d = c*gamma./(k_1d.*sqrt(E_int));
beta = reshape(beta_1d,[size(k,1),size(k,2)]);
end
It seems to me that this didn't really improve performance. What do you think about this?
Could you shed some light on it?
I thank you in advance.
EDIT
I am gonna introduce some theoretical background involving my question.
Generally, beta is to be calculated as follows
Therefore, in the reduced case of unidimensional k array, E_int may be calculated as
E = 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int = 1.5 - cumtrapz(k,E);
or, alternatively as
E_int(1) = 1.5;
for jj = 2:numel(k)
E = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int(jj) = E_int(jj - 1) - integral(E,k(jj-1),k(jj));
end
Nonetheless, k is currently a matrix k(size1,size2).
Here's another approach, parallelize, because it's easy using spmd or parfor. Instead of integral consider quad, see this link for examples...
I like this question.
The problem: the function integral takes only scalars as integration limits. Hence, it is difficult to vectorize the computation of E_int.
A clue: there seems to be lot of redundancy in integrating the same function over and over from k(ii,jj) to infinity...
Proposed solution: How about sorting the values of k from smallest to largest and integrating E_sort_int(si) = integral( E, sortedK(si), sortedK(si+1) ); with sortedK( numel(k) + 1 ) = 10000;. The value of E_int for sortedK(si) is then the sum of all segments from si onward, i.e. a reverse cumulative sum: E_int = fliplr( cumsum( fliplr( E_sort_int ) ) ); (you only need to "undo" the sorting and reshape it back to the size of k).
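A minimal sketch of this idea, assuming every entry of k is positive and below the upper limit 10000. Since each k needs the integral from itself up to the limit, the segment integrals are accumulated from the largest k downward. The function name beta_sorted and the helper variables are mine.

```matlab
function beta = beta_sorted(k, c, gamma)
% Integrate E only between neighbouring sorted k values, then accumulate.
E = @(x) (1.453*x.^4)./((1 + x.^2).^(17/6));

[sortedK, order] = sort(k(:));        % ascending, as a column
edges = [sortedK; 10000];             % append the common upper limit

% One short integral per segment [edges(si), edges(si+1)]
segInt = zeros(numel(sortedK), 1);
for si = 1:numel(sortedK)
    if edges(si+1) > edges(si)        % repeated k values give an empty segment
        segInt(si) = integral(E, edges(si), edges(si+1));
    end
end

% Integral from sortedK(si) to 10000 = sum of segments si..end
tailInt = flipud(cumsum(flipud(segInt)));

% Undo the sorting and restore the shape of k
E_int = zeros(size(k));
E_int(order) = tailInt;

beta = c*gamma./(k.*sqrt(E_int));
end
```

There is still one call to integral per element, but each call now covers a short interval instead of the whole range up to 10000, which is where the redundancy was. Whether this wins in practice is worth checking with timeit on your actual k.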
