Averaging Matlab matrix - performance

In the Matlab programs I use, I often have to average within a matrix (interpolation). The most straightforward way is to add the matrix to a shifted copy of itself (avg). However, you can do the same operation using matrix multiplication (avg2). For large matrices, I noticed a considerable speed increase with the matrix-multiplication version.
Could anyone explain why Matlab is able to process this multiplication faster than adding the same matrix? Also, what are the possible downsides of using avg2() with respect to avg()?
The difference in runtime was a factor of ~6 for this case (n = 500).
function [] = speed()
%Speed test for averaging a matrix
n = 500;
A = rand(n,n);
tic
for i = 1:100
    avg(A);
end
toc
tic
for i = 1:100
    avg2(A);
end
toc
end

function B = avg(A,k)
if nargin < 2, k = 1; end
if size(A,1) == 1, A = A'; end
if k < 2
    B = (A(2:end,:) + A(1:end-1,:))/2;
else
    B = avg(A,k-1);
end
if size(A,2) == 1, B = B'; end
end

function B = avg2(A,k)
if nargin < 2, k = 1; end
if size(A,1) == 1, A = A'; end
if k < 2
    m = size(A,1);
    e = ones(m,1);
    S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
    B = S*A;
else
    B = avg2(A,k-1);
end
if size(A,2) == 1, B = B'; end
end

I'm afraid I can't give you an answer about the inner workings of the functions you are using. However, as they seem overly complicated, I felt I should make you aware of an easier (and slightly faster) way of doing this averaging.
You can instead use conv2 with a kernel of [0.5;0.5]. I have extended your code below:
function [A, T1, T2, T3] = speed()
%Speed test for averaging a matrix
n = 900;
A = rand(n,n);
tic
for i = 1:100
    T1 = avg(A);
end
toc
tic
for i = 1:100
    T2 = avg2(A);
end
toc
tic
for i = 1:100
    T3 = conv2(A,[1;1]/2,'valid');
end
toc
if sum(sum(abs(T3-T2))) > 0
    warning('Method 3 not equal the other methods')
end
end

function B = avg(A,k)
if nargin < 2, k = 1; end
if size(A,1) == 1, A = A'; end
if k < 2
    B = (A(2:end,:) + A(1:end-1,:))/2;
else
    B = avg(A,k-1);
end
if size(A,2) == 1, B = B'; end
end

function B = avg2(A,k)
if nargin < 2, k = 1; end
if size(A,1) == 1, A = A'; end
if k < 2
    m = size(A,1);
    e = ones(m,1);
    S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
    B = S*A;
else
    B = avg2(A,k-1);
end
if size(A,2) == 1, B = B'; end
end
Results:
Elapsed time is 10.201399 seconds.
Elapsed time is 1.088003 seconds.
Elapsed time is 1.040471 seconds.
Apologies if you already knew this.
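One more note on avg2 as posted: it rebuilds the sparse operator S on every call inside the timing loop. A minimal sketch of hoisting S out and reusing it (same spdiags construction as above; untimed here, so treat any extra gain as an assumption to verify):
% Hypothetical variant: build the sparse averaging operator once, reuse it.
n = 500;
A = rand(n,n);
e = ones(n,1);
S = spdiags(e*[1 1],-1:0,n,n-1)'/2;   % (n-1)-by-n averaging operator
tic
for i = 1:100
    B = S*A;                          % no per-call setup cost
end
toc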

Related

Vectorize double for loops

I need to evaluate an integral, and my code is
r = 0:25;
t = 0:250;
Ti = exp(-r.^2);
T = zeros(length(r),length(t));
for n = 1:length(t)
    w = 1/2/t(n);
    for m = 1:length(r)
        T(m,n) = w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
    end
end
Currently the evaluation is fairly fast, but I wonder if there is a way to vectorize the double for-loop and make it even faster, especially the trapz call.
You can optimize it by passing a matrix argument Y to trapz(X,Y,dim) with dim = 2, i.e. the loop becomes:
r = 0:25;
t = 0:250;
Ti = exp(-r.^2);
tic
T = zeros(length(r),length(t));
for n = 1:length(t)
    w = 1/2/t(n);
    for m = 1:length(r)
        T(m,n) = w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
    end
end
toc
tic
T1 = zeros(length(r),length(t));
for n = 1:length(t)
    w = 1/2/t(n);
    Y = bsxfun(@times, Ti.*r, exp(-bsxfun(@plus,r'.^2,r.^2)*w/2).*besseli(0,bsxfun(@times,r',r*w)));
    T1(:,n) = w*trapz(r,Y,2);
end
toc
max(abs(T(:)-T1(:)))
You could probably vectorize it completely; I'll have a look later.
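For reference, here is a sketch of what the fully vectorized version could look like: build a 3-D grid over (output radius, integration radius, time) and call trapz once. This is an untested sketch; it assumes implicit expansion (R2016b or newer, use bsxfun on older releases), and t(1) = 0 still makes w infinite, exactly as in the looped versions.
w = 1./(2*t);                       % 1-by-numel(t); w(1) = Inf for t = 0
[Rm, Rk, W] = ndgrid(r, r, w);      % Rm: output r(m), Rk: integration variable
Y = exp(-(Rm.^2 + Rk.^2).*W/2) .* besseli(0, Rm.*Rk.*W);
Y = Y .* (Ti.*r);                   % fold in Ti.*r along dimension 2
T2 = squeeze(trapz(r, Y, 2)) .* w;  % numel(r)-by-numel(t), same layout as T
max(abs(T(:)-T2(:)))                % compare with the looped result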

Matlab: How to convert a nested for loop into parfor

I am having problems with the following loop, since it is taking too much time. Hence, I would like to use parallel processing, specifically the parfor construct.
P = numel(scaleX); % quite BIG number
sz = P;
start = 1;
sqrL = 10; % sqr len
e = 200;
A = false(sz, sz);
for m = sz-sqrL/2:(-1)*sqrL:start
    for n = M(m):-sqrL:1
        temp = [scaleX(m), scaleY(m); scaleX(n), scaleY(n)];
        d = pdist(temp, 'euclidean');
        if d < e
            A(m, n) = 1;
        end
    end
end
Can anyone please help me convert the outer for loop into parfor in this code?
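One way to approach this, as a minimal untested sketch (assuming M, scaleX, scaleY, sz, sqrL, start, and e are defined as in the question): parfor requires a loop over consecutive integers, so the stepped range has to be precomputed into a vector, and A(m,n) cannot be assigned directly inside parfor (it would fail classification), so each iteration returns its own row instead. The pdist call is replaced by hypot, which computes the same Euclidean distance for a single pair of points.
mVals = sz-sqrL/2:(-1)*sqrL:start;   % parfor requires consecutive integers
rows = cell(numel(mVals), 1);
parfor idx = 1:numel(mVals)
    m = mVals(idx);
    row = false(1, sz);
    for n = M(m):-sqrL:1
        d = hypot(scaleX(m)-scaleX(n), scaleY(m)-scaleY(n));
        if d < e
            row(n) = true;
        end
    end
    rows{idx} = row;                 % sliced output, one row per iteration
end
A = false(sz, sz);
for idx = 1:numel(mVals)             % assemble the rows back into A
    A(mVals(idx), :) = rows{idx};
end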

Why do I get this error: The variable in a parfor cannot be classified

I'm trying to use parfor because processing a single image takes over 96 seconds and I have more than one image to treat, but I got this error:
The variable B in a parfor cannot be classified
This is the code I've written:
Io = im2double(imread('C:My path\0.1s.tif'));
Io = double(Io);
In = Io;
sigma = [1.8 20];
[X,Y] = meshgrid(-3:3,-3:3);
G = exp(-(X.^2+Y.^2)/(2*1.8^2));
dim = size(In);
B = zeros(dim);
c = parcluster
matlabpool(c)
parfor i = 1:dim(1)
    for j = 1:dim(2)
        % Extract local region.
        iMin = max(i-3,1);
        iMax = min(i+3,dim(1));
        jMin = max(j-3,1);
        jMax = min(j+3,dim(2));
        I = In(iMin:iMax,jMin:jMax);
        % Compute Gaussian intensity weights.
        H = exp(-(I-In(i,j)).^2/(2*20^2));
        % Calculate bilateral filter response.
        F = H.*G((iMin:iMax)-i+3+1,(jMin:jMax)-j+3+1);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end
matlabpool close
Any ideas?
Unfortunately, it's actually dim that is confusing MATLAB in this case: with indexed expressions like dim(1) and dim(2) as loop bounds, the analyzer apparently cannot classify B(i,j) as a sliced variable. You can fix it by using plain scalar bounds:
[n, m] = size(In);
parfor i = 1:n
    for j = 1:m
        B(i, j) = ...
    end
end
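For completeness, a sketch of the whole fixed loop, reusing the filter body from the question verbatim (untested; note that on newer MATLAB releases parpool replaces matlabpool):
[n, m] = size(In);
B = zeros(n, m);
parfor i = 1:n
    for j = 1:m
        % Extract local region, clipped at the image border.
        iMin = max(i-3,1);
        iMax = min(i+3,n);
        jMin = max(j-3,1);
        jMax = min(j+3,m);
        I = In(iMin:iMax,jMin:jMax);
        % Gaussian intensity weights and bilateral response, as in the question.
        H = exp(-(I-In(i,j)).^2/(2*20^2));
        F = H.*G((iMin:iMax)-i+3+1,(jMin:jMax)-j+3+1);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));
    end
end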

Using matrix structure to speed up matlab

Suppose that I have an N-by-K matrix A and an N-by-P matrix B. I want to do the following calculations to get my final N-by-P matrix X.
X(n,p) = B(n,p) - dot(gamma(p,:),A(n,:))
where
gamma(p,k) = dot(A(:,k),B(:,p))/sum( A(:,k).^2 )
In MATLAB, my code looks like
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
which is highly inefficient since it uses three for loops! Is there a good way to speed up this code?
Use bsxfun for the division and matrix multiplication for the loops:
gamma = bsxfun(@rdivide, B.'*A, sum(A.^2));
x = B - A*gamma.';
And here is a test script
N = 3;
K = 4;
P = 5;
A = rand(N, K);
B = rand(N, P);
for p = 1:P
    for n = 1:N
        for k = 1:K
            gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
        end
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
gamma2 = bsxfun(@rdivide, B.'*A, sum(A.^2));
X2 = B - A*gamma2.';
isequal(x, X2)
isequal(gamma, gamma2)
which returns
ans =
1
ans =
1
It looks to me like you can hoist the gamma calculations out of the loop; at least, I don't see any dependency on the loop index n in the gamma calculations.
So something like this:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
end
for p = 1:P
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
I'm not familiar enough with your code (or matlab) to really know if you can merge the two loops, but if you can:
for p = 1:P
    for k = 1:K
        gamma(p,k) = dot(A(:,k),B(:,p))/sum(A(:,k).^2);
    end
    for n = 1:N
        x(n,p) = B(n,p) - dot(gamma(p,:),A(n,:));
    end
end
bsxfun is slow...
How about something like the following (I might have a transpose wrong)
modA = A ./ (ones(size(A,1),1) * sum(A.^2,1));  % column k of A scaled by 1/sum(A(:,k).^2)
gamma = B' * modA;
x = B - A * gamma';
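A quick way to check those transposes, assuming the test script above has already been run so that gamma2 and X2 exist:
modA = A ./ (ones(size(A,1),1) * sum(A.^2,1));
gammaAlt = B' * modA;
xAlt = B - A * gammaAlt';
max(abs(gammaAlt(:) - gamma2(:)))   % should be on the order of eps
max(abs(xAlt(:) - X2(:)))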

Permutations with order restrictions

Let L be a list of objects. Moreover, let C be a set of constraints, e.g.:
C(1) = t1 comes before t2, where t1 and t2 belong to L
C(2) = t3 comes after t2, where t3 and t2 belong to L
How can I find (in MATLAB) the set of permutations for which the constraints in C are not violated?
My first solution is naive:
orderings = perms(L);
toBeDeleted = zeros(1,size(orderings,1));
for ii = 1:size(orderings,1)
    for jj = 1:size(constraints,1)
        idxA = find(orderings(ii,:) == constraints(jj,1));
        idxB = find(orderings(ii,:) == constraints(jj,2));
        if idxA > idxB
            toBeDeleted(ii) = 1;
        end
    end
end
where constraints is a set of constraints (each constraint is on a row of two elements, specifying that the first element comes before the second element).
I was wondering whether there exists a simpler (and more efficient) solution.
Thanks in advance.
I'd say that's a pretty good solution you have so far.
There are a few optimizations I see, though. Here's my variation:
% INITIALIZE
NN = 9;
L = rand(1,NN-1);
while numel(L) ~= NN
    L = unique( randi(100,1,NN) );
end
% Some bogus constraints
constraints = [...
    L(1) L(2)
    L(3) L(6)
    L(3) L(5)
    L(8) L(4)];
% METHOD 0 (your original method)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
    for constr = 1:c
        idxA = find(orderings(perm,:) == constraints(constr,1));
        idxB = find(orderings(perm,:) == constraints(constr,2));
        if idxA > idxB
            toKeep(perm) = false;
        end
    end
end
orderings0 = orderings(toKeep,:);
toc
% METHOD 1 (your original, plus a few optimizations)
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for perm = 1:p
    for constr = 1:c
        % break on first condition breached
        if toKeep(perm)
            % find only *first* entry
            toKeep(perm) = ...
                find(orderings(perm,:) == constraints(constr,1), 1) < ...
                find(orderings(perm,:) == constraints(constr,2), 1);
        else
            break
        end
    end
end
orderings1 = orderings(toKeep,:);
toc
% METHOD 2
tic
orderings = perms(L);
p = size(orderings,1);
c = size(constraints,1);
toKeep = true(p,1);
for constr = 1:c
    % break on first condition breached
    if any(toKeep)
        % Vectorized search for constraint values
        [i1, j1] = find(orderings == constraints(constr,1));
        [i2, j2] = find(orderings == constraints(constr,2));
        % sort by rows
        [i1, j1i] = sort(i1);
        [i2, j2i] = sort(i2);
        % Check if columns meet condition
        toKeep = toKeep & j1(j1i) < j2(j2i);
    else
        break
    end
end
orderings2 = orderings(toKeep,:);
toc
% Check for equality
all(orderings2(:) == orderings1(:))
Results:
Elapsed time is 17.911469 seconds. % your method
Elapsed time is 10.477549 seconds. % your method + optimizations
Elapsed time is 2.184242 seconds. % vectorized outer loop
ans =
1
ans =
1
The whole approach, however, has one fundamental flaw IMHO: the direct use of perms. This inherently poses a limitation due to memory constraints (NN < 10, as stated in help perms).
I have a strong suspicion you can get better performance, both time-wise and memory-wise, when you put together a customized perms. Luckily, perms is not built-in, so you can start by copy-pasting that code into your custom function.
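To give an idea of the direction, here is a hypothetical sketch (constrainedPerms is my name, not MATLAB's) of a backtracking replacement for perms: it builds permutations left to right and prunes a prefix as soon as a "first column comes before second column" constraint can no longer be satisfied, so invalid orderings are never materialized.
function out = constrainedPerms(L, constraints)
%CONSTRAINEDPERMS Permutations of L satisfying "comes before" constraints.
out = recurse(L, zeros(1,0), constraints);
end

function out = recurse(remaining, prefix, constraints)
if isempty(remaining)
    out = prefix;
    return
end
out = zeros(0, numel(prefix) + numel(remaining));
for ii = 1:numel(remaining)
    cand = remaining(ii);
    % cand may be placed next only if every element that must
    % precede it is already in the prefix
    mustPrecede = constraints(constraints(:,2) == cand, 1);
    if all(ismember(mustPrecede, prefix))
        rest = remaining([1:ii-1, ii+1:end]);
        out = [out; recurse(rest, [prefix cand], constraints)]; %#ok<AGROW>
    end
end
end
Memory then scales with the number of valid orderings rather than with factorial(NN).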
