Faster matrix recursion in Matlab - performance

The matrix-recursion of the n x n matrices Y_t looks like this:
Y_{t} = A + \sum_{i=1}^{p} B_{i} * Y_{t-i}
A and B are given.
This is my attempt, but it runs slowly:
Y = zeros(n,n,T); %Going to fill the 3rd dimension for Y_t, t=1:T
Y(:,:,1:p) = initializingY; %Initial values Y_1 ... Y_p
for t=(p+1):T
Y(:,:,t) = A;
for i=1:p
Y(:,:,t) = Y(:,:,t) + B(:,:,i)*Y(:,:,t-i);
end
end
Can you think of a more efficient way to do this?

You can kill the inner loop with matrix-multiplication after some reshaping & permuting, like so -
Y = zeros(n,n,T);
%// Y(:,:,1:p) = initializingY
Br = reshape(B(:,:,1:p),n,[]); %// loop-invariant, so computed once
for t=(p+1):T
Yr = reshape(permute(Y(:,:,t-(1:p)),[1 3 2]),[],n);
Y(:,:,t) = A + Br*Yr;
end
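To convince yourself that the reshaped version computes the same recursion, a small sanity check along these lines may help. This is only a sketch: the sizes, the random inputs and the names Y0, Y1, Y2 and maxDiff are made up for illustration, and B is scaled down so the values stay bounded.

```matlab
n = 4; p = 3; T = 12;            % small, arbitrary sizes for the check
A  = rand(n);
B  = rand(n, n, p) / (n*p);      % scaled so the recursion does not blow up
Y0 = rand(n, n, p);              % hypothetical initial values Y_1 ... Y_p

% Reference: the nested-loop recursion from the question
Y1 = zeros(n, n, T);
Y1(:,:,1:p) = Y0;
for t = (p+1):T
    acc = A;
    for i = 1:p
        acc = acc + B(:,:,i) * Y1(:,:,t-i);
    end
    Y1(:,:,t) = acc;
end

% Reshaped version: [B_1 ... B_p] times the vertical stack [Y_{t-1}; ...; Y_{t-p}]
Y2 = zeros(n, n, T);
Y2(:,:,1:p) = Y0;
Br = reshape(B, n, []);          % n-by-(n*p)
for t = (p+1):T
    Yr = reshape(permute(Y2(:,:,t-(1:p)), [1 3 2]), [], n);   % (n*p)-by-n
    Y2(:,:,t) = A + Br * Yr;
end

maxDiff = max(abs(Y1(:) - Y2(:)));   % should be at round-off level
```

The two results are compared with a tolerance rather than isequal, because the single large product sums the terms in a different order than the inner loop does.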

Short of using clever mathematical tricks to reduce the number of operations, the best shot is to optimize the memory access. That is: avoid subsrefing, increase the locality of your code, reduce the cache misses by manipulating short arrays instead of large ones.
n = 50;
T = 1000;
p = 10;
Y = zeros(n,n,T);
B = zeros(n,n,p);
A = rand(n);
for t = 1:p
Y(:,:,t) = rand(n);
B(:,:,t) = rand(n);
end
fprintf('Original attempt: '); tic;
for t=(p+1):T
Y(:,:,t) = A;
for k=1:p
Y(:,:,t) = Y(:,:,t) + B(:,:,k)*Y(:,:,t-k);
end;
end;
toc;
%'This solution was taken from Divakar'
fprintf('Using reshaping: '); tic;
Br = reshape(B(:,:,1:p),n,[]);
for t=(p+1):T
Yr = reshape(permute(Y(:,:,t-(1:p)),[1 3 2]),[],n);
Y(:,:,t) = A + Br*Yr;
end;
toc;
%'proposed solution'
Y = cell(1,T);
B = cell(1,p);
A = rand(n);
for t = 1:p
Y{t} = rand(n);
B{t} = rand(n);
end
fprintf('Using cells: '); tic;
for t=(p+1):T
U = A;
for k=1:p
U = U + B{k}*Y{t-k};
end;
Y{t} = U;
end;
toc;
For the setup in my example I get roughly a two-fold speed increase over the original attempt on a decent machine (i5, 4 GB RAM, MATLAB R2012a). I am curious how well it does on your machine.

Related

Matlab: How to convert nested for loop into parfor

I am having problems with the following loop, since it takes too much time. Hence, I would like to use parallel processing, specifically the parfor function.
P = numel(scaleX); % quite BIG number
sz = P;
start = 1;
sqrL = 10; % sqr len
e = 200;
A = false(sz, sz);
for m = sz-sqrL/2:(-1)*sqrL:start
for n = M(m):-sqrL:1
temp = [scaleX(m), scaleY(m); scaleX(n), scaleY(n)];
d = pdist(temp, 'euclidean');
if d < e
A(m, n) = 1;
end
end
end
Can anyone please help me convert the outer 'for' loop into 'parfor' in this code?
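One possible way to restructure the loop, offered as a sketch rather than a tested solution. parfor wants a loop variable that runs over consecutive increasing integers, so the original descending values of m are stored in a vector (mIdx, a made-up name) and the loop runs over positions in that vector. Each iteration writes its result into its own cell (rows, also made up) and A is assembled afterwards, since the scattered writes A(m, n) do not fit parfor's slicing rules. The inputs below are small stand-ins, M is assumed to hold the largest column index to test for each row, and hypot replaces pdist because the distance between two points is just the Euclidean norm of their difference.

```matlab
% Hypothetical small inputs standing in for the question's data
P = 40;
scaleX = linspace(0, 1000, P);
scaleY = zeros(1, P);
M = P * ones(1, P);            % assumed meaning: upper column limit for row m
sqrL = 10;
e = 200;

mIdx = (P - sqrL/2):-sqrL:1;   % the original outer-loop values, kept as a vector
rows = cell(1, numel(mIdx));   % one logical row per outer iteration

parfor ii = 1:numel(mIdx)
    m = mIdx(ii);
    nIdx = M(m):-sqrL:1;       % the original inner-loop values
    % Distance from point m to all candidate points n in one shot
    d = hypot(scaleX(nIdx) - scaleX(m), scaleY(nIdx) - scaleY(m));
    r = false(1, P);
    r(nIdx(d < e)) = true;
    rows{ii} = r;
end

% Assemble the result outside the parallel loop
A = false(P, P);
for ii = 1:numel(mIdx)
    A(mIdx(ii), :) = rows{ii};
end
```

Vectorizing the inner loop may already remove most of the run time, so it is worth timing the plain for version of this code before opening a pool.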

How to remove the for loop in the following MATLAB code?

I need to perform the following computation in an image processing project. It is the logarithm of the summation of H3. I've written the following code, but this loop has a very high computation time. Is there any way to eliminate the for loop?
for k=1:i
for l=1:j
HA(i,j)=HA(i,j)+log2((H3(k,l)/probA).^q);
end;
end;
Thanks in advance!
EDIT:
for i=1:256
for j=1:240
probA = 0;
probC = 0;
subProbA = H3(1:i,1:j);
probA = sum(subProbA(:));
probC = 1-probA;
for k=1:i
for l=1:j
HA(i,j)=HA(i,j)+log2((H3(k,l)/probA).^q);
end;
end;
HA(i,j)=HA(i,j)/(1-q);
for k=i+1:256
for l=j+1:240
HC(i,j)=HC(i,j)+log2((H3(k,l)/probC).^q);
end;
end;
HC(i,j)=HC(i,j)/(1-q);
e1(i,j) = HA(i,j) + HC(i,j);
if e1(i) >= emax
emax = e1(i);
tt1 = i-1;
end;
end;
end;
Assuming the two loops are nested inside some other outer loops that iterate over i and j (though using i and j as iterators is not best practice, since they also denote the imaginary unit), and also assuming that probA and q are scalars, try this -
HA(i,j) = sum(sum(log2((H3(1:i,1:j)./probA).^q)))
Using the above code snippet, you can replace the code posted in your EDIT section with this -
for i=1:256
for j=1:240
subProbA = H3(1:i,1:j);
probA = sum(subProbA(:));
probC = 1-probA;
HA(i,j) = sum(sum(log2((subProbA./probA).^q)))./(1-q);
HC(i,j) = sum(sum(log2((H3(i+1:256,j+1:240)./probC).^q)))./(1-q);
e1(i,j) = HA(i,j) + HC(i,j);
if e1(i) >= emax
emax = e1(i);
tt1 = i-1;
end
end
end
Note that in this code, probA = 0; and probC = 0; are removed as they are over-written anyway later in the original code.
Assuming that q is a scalar value, this code removes all four for loops. Also, in your given code you are calculating the maximum value of e1 only along the first column. If that is intended, then you should move it outside the second loop.
height = 256;
width = 240;
a = repmat((1:height)',1,width);
b = repmat(1:width,height,1);
probA = arrayfun(@(ii,jj)(sum(sum(H3(1:ii,1:jj)))),a,repmat(1:width,height,1));
probC = 1 - probA;
HA = arrayfun(@(ii,jj)(sum(sum(log2((H3(1:ii,1:jj)/probA(ii,jj)).^q)))/(1-q)),a,b);
HC = arrayfun(@(ii,jj)(sum(sum(log2((H3(ii+1:height,jj+1:width)/probC(ii,jj)).^q)))/(1-q)),a,b);
e1 = HA + HC;
[emax tt_temp] = max(e1(:,1));
tt1 = tt_temp - 1;
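If H3 is strictly positive and q is a real scalar, the block sums can also be obtained from cumulative sums, because log2((x/probA)^q) = q*(log2(x) - log2(probA)). The following is a sketch under those assumptions, shown for HA only; the small random H3 and the names logSum, cnt, i0, j0 and direct are made up for the example.

```matlab
% Hypothetical test data: a small, strictly positive histogram that sums to 1
h = 6; w = 5; q = 0.7;
H3 = rand(h, w) + 0.1;
H3 = H3 / sum(H3(:));

probA  = cumsum(cumsum(H3, 1), 2);         % probA(i,j)  = sum of H3(1:i,1:j)
logSum = cumsum(cumsum(log2(H3), 1), 2);   % logSum(i,j) = sum of log2(H3(1:i,1:j))
cnt    = (1:h)' * (1:w);                   % cnt(i,j)    = i*j, elements per block

% sum over the block of log2((H3/probA)^q) = q*(logSum - cnt*log2(probA))
HA = q * (logSum - cnt .* log2(probA)) / (1 - q);

% Spot-check one entry against the direct double sum
i0 = 4; j0 = 3;
direct = sum(sum(log2((H3(1:i0,1:j0) / probA(i0,j0)).^q))) / (1 - q);
```

This replaces the per-pixel sums with a handful of whole-matrix operations. It breaks down if H3 contains zeros, since log2(0) is -Inf, so real histograms would need those bins handled separately.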

Why do I get the error "The variable in a parfor cannot be classified"?

I'm trying to use parfor to speed up my code, which takes over 96 seconds, and I have more than one image to process, but I get this error:
The variable B in a parfor cannot be classified
This is the code I've written:
Io=im2double(imread('C:My path\0.1s.tif'));
Io=double(Io);
In=Io;
sigma=[1.8 20];
[X,Y] = meshgrid(-3:3,-3:3);
G = exp(-(X.^2+Y.^2)/(2*1.8^2));
dim = size(In);
B = zeros(dim);
c = parcluster
matlabpool(c)
parfor i = 1:dim(1)
for j = 1:dim(2)
% Extract local region.
iMin = max(i-3,1);
iMax = min(i+3,dim(1));
jMin = max(j-3,1);
jMax = min(j+3,dim(2));
I = In(iMin:iMax,jMin:jMax);
% Compute Gaussian intensity weights.
H = exp(-(I-In(i,j)).^2/(2*20^2));
% Calculate bilateral filter response.
F = H.*G((iMin:iMax)-i+3+1,(jMin:jMax)-j+3+1);
B(i,j) = sum(F(:).*I(:))/sum(F(:));
end
end
matlabpool close
Any idea?
Unfortunately, it's actually dim that is confusing MATLAB in this case. You can fix it by doing
[n, m] = size(In);
parfor i = 1:n
for j = 1:m
B(i, j) = ...
end
end
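For completeness, here is how the whole loop might look with that change applied. It is a sketch: a small random matrix stands in for the image, and the pool setup is left out. As far as I understand the parfor rules, B can only be treated as a sliced output when the bounds of the nested for loop are plain variables or constants, which is why 1:m is accepted where 1:dim(2) is not.

```matlab
In = rand(12, 10);                       % hypothetical stand-in for the image
[X, Y] = meshgrid(-3:3, -3:3);
G = exp(-(X.^2 + Y.^2) / (2*1.8^2));     % spatial Gaussian weights, 7-by-7

[n, m] = size(In);                       % plain scalars instead of dim(1), dim(2)
B = zeros(n, m);

parfor i = 1:n
    for j = 1:m
        % Extract local region, clipped at the image border
        iMin = max(i-3, 1);
        iMax = min(i+3, n);
        jMin = max(j-3, 1);
        jMax = min(j+3, m);
        I = In(iMin:iMax, jMin:jMax);
        % Gaussian intensity weights
        H = exp(-(I - In(i,j)).^2 / (2*20^2));
        % Bilateral filter response: weighted average of the region
        F = H .* G((iMin:iMax)-i+3+1, (jMin:jMax)-j+3+1);
        B(i, j) = sum(F(:) .* I(:)) / sum(F(:));
    end
end
```

Since every output pixel is a weighted average with positive weights, B should stay within the range of In, which makes for a cheap sanity check.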

Using matrix structure to speed up matlab

Suppose that I have an N-by-K matrix A, N-by-P matrix B. I want to do the following calculations to get my final N-by-P matrix X.
X(n,p) = B(n,p) - dot(gamma(p,:),A(n,:))
where
gamma(p,k) = dot(A(:,k),B(:,p))/sum( A(:,k).^2 )
In MATLAB, I have my code like
for p = 1:P
for n = 1:N
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
which is highly inefficient since it uses three for loops! Is there a good way to speed up this code?
Use bsxfun for the division and matrix multiplication for the loops:
gamma = bsxfun(@rdivide, B.'*A, sum(A.^2));
x = B - A*gamma.';
And here is a test script
N = 3;
K = 4;
P = 5;
A = rand(N, K);
B = rand(N, P);
for p = 1:P
for n = 1:N
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
gamma2 = bsxfun(@rdivide, B.'*A, sum(A.^2));
X2 = B - A*gamma2.';
isequal(x, X2)
isequal(gamma, gamma2)
which returns
ans =
1
ans =
1
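On MATLAB R2016b and later, implicit expansion lets you write the division without bsxfun. The sketch below assumes such a release; the names G1, G2, X and Xref are made up (and avoid shadowing the built-in gamma function), and the comparison uses a tolerance because the vectorized and looped versions need not agree bit for bit.

```matlab
N = 3; K = 4; P = 5;
A = rand(N, K);
B = rand(N, P);

G1 = bsxfun(@rdivide, B.'*A, sum(A.^2, 1));   % P-by-K, as in the answer
G2 = (B.'*A) ./ sum(A.^2, 1);                 % same thing via implicit expansion
X  = B - A*G2.';                              % N-by-P result

% Loop reference built from the definition of X(n,p)
Xref = zeros(N, P);
for p = 1:P
    for n = 1:N
        Xref(n,p) = B(n,p) - dot(G2(p,:), A(n,:));
    end
end
maxErr = max(abs(X(:) - Xref(:)));
```

Passing the dimension explicitly to sum also keeps the code correct when A happens to have a single row, where sum(A.^2) would collapse to a scalar.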
It looks to me like you can hoist the gamma calculations out of the loop; at least, I don't see any dependencies on N in the gamma calculations.
So something like this:
for p = 1:P
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
end
for p = 1:P
for n = 1:N
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
I'm not familiar enough with your code (or matlab) to really know if you can merge the two loops, but if you can:
for p = 1:P
for k = 1:K
gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
end
for n = 1:N
x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
end
end
bsxfun is slow...
How about something like the following (I might have a transpose wrong)
modA = A * diag(1./sum(A.^2,1)); % scale column k of A by 1/sum(A(:,k).^2)
gamma = B' * modA;
x = B - A * gamma';

Averaging Matlab matrix

In the Matlab programs I use I often have to average within a matrix (interpolation). The most straightforward way is to add the matrix and a shifted copy of it (avg). However, you can do the same operation using matrix multiplication (avg2). I noticed a considerable speed increase with matrix multiplication for large matrices.
Could anyone explain why Matlab is able to process this multiplication faster than adding the same matrix? Also, what are the possible downsides of using avg2() compared with avg()?
Difference in runtime was a factor ~6 for this case (n=500).
function [] = speed()
%Speed test for averaging a matrix
n = 500;
A = rand(n,n);
tic
for i=1:100
avg(A);
end
toc
tic
for i=1:100
avg2(A);
end
toc
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end
I'm afraid I can't give you an answer about the inner workings of the functions you are using. However, as they seem overly complicated, I felt I should make you aware of an easier (and a bit faster) way of doing this averaging.
You can instead use conv2 with a kernel of [0.5;0.5]. I have extended your code below:
function [A, T1, T2 T3] = speed()
%Speed test for averaging a matrix
n = 900;
A = rand(n,n);
tic
for i=1:100
T1 = avg(A);
end
toc
tic
for i=1:100
T2 = avg2(A);
end
toc
tic
for i=1:100
T3 = conv2(A,[1;1]/2,'valid');
end
toc
if sum(sum(abs(T3-T2))) > 0
warning('Method 3 not equal the other methods')
end
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end
Results:
Elapsed time is 10.201399 seconds.
Elapsed time is 1.088003 seconds.
Elapsed time is 1.040471 seconds.
Apologies if you already knew this.
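A side note on the equality check in the script: comparing the summed absolute difference against exactly zero can raise a spurious warning, because the three methods may round differently. A small sketch of a tolerance-based comparison, using a made-up 8-by-8 input and the hypothetical names err12 and err13:

```matlab
n = 8;
A = rand(n, n);

% Method 1: add the matrix and its shifted copy
T1 = (A(2:end,:) + A(1:end-1,:)) / 2;

% Method 2: multiply by a sparse (n-1)-by-n averaging matrix
m = size(A, 1);
S = spdiags(ones(m,1)*[1 1], -1:0, m, m-1)' / 2;
T2 = S * A;

% Method 3: convolve each column with the kernel [0.5; 0.5]
T3 = conv2(A, [1;1]/2, 'valid');

err12 = max(abs(T1(:) - T2(:)));
err13 = max(abs(T1(:) - T3(:)));
if max(err12, err13) > 1e-12
    warning('Methods disagree by more than round-off');
end
```

All three produce an (n-1)-by-n result in which row j is the mean of rows j and j+1 of A.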
