MATLAB code running slow on MacBookPro, triple while loop - performance

I have been running a MATLAB program for almost six hours now, and it is still not complete. It cycles through three nested while loops (the outer two loops run n=855 times each, the inner loop n=500 times). Is it surprising that it takes this long? Is there anything I can do to increase the speed? I am including the code below, with the variable data types underneath.
while i < (numAtoms + 1)
pointAccessible = ones(numPoints,1);
j = 1;
while j <(numAtoms + 1)
if (i ~= j)
k=1;
while k < (numPoints + 1)
if (pointAccessible(k) == 1)
sphereCoord = [cell2mat(atomX(i)) + p + sphereX(k), cell2mat(atomY(i)) + p + sphereY(k), cell2mat(atomZ(i)) + p + sphereZ(k)];
neighborCoord = [cell2mat(atomX(j)), cell2mat(atomY(j)), cell2mat(atomZ(j))];
coords(1,:) = [sphereCoord];
coords(2,:) = [neighborCoord];
if (pdist(coords) < (atomRadius(j) + p))
pointAccessible(k)=0;
end
end
k = k + 1;
end
end
j = j+1;
end
remainingPoints(i) = sum(pointAccessible);
i = i +1;
end
Variable Data Types:
numAtoms = 855
numPoints = 500
p = 1.4
atomRadius = <855 * 1 double>
pointAccessible = <500 * 1 double>
atomX, atomY, atomZ = <1 * 855 cell>
sphereX, sphereY, sphereZ = <500 * 1 double>
remainingPoints = <855 * 1 double>
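One thing that may help, sketched here under assumptions rather than measured on the real data: the innermost loop runs roughly 855 * 855 * 500 times, and each pass calls cell2mat six times and pdist once. Converting the cell arrays to numeric vectors once and testing all sphere points of atom i against neighbour j in a single array operation removes the k loop entirely. The small random inputs below are made up to keep the example self-contained; the names centers, pts and accessible are introduced for illustration only.

```matlab
% Made-up inputs with the same shapes as in the question (much smaller).
rng(0);
numAtoms = 5; numPoints = 8; p = 1.4;
pos = 4 * rand(numAtoms, 3);
atomX = num2cell(pos(:, 1).');          % 1 x numAtoms cell, as in the question
atomY = num2cell(pos(:, 2).');
atomZ = num2cell(pos(:, 3).');
atomRadius = 1 + rand(numAtoms, 1);
sphereX = randn(numPoints, 1);
sphereY = randn(numPoints, 1);
sphereZ = randn(numPoints, 1);

% Convert the cell arrays once, outside every loop.
centers = [cell2mat(atomX(:)), cell2mat(atomY(:)), cell2mat(atomZ(:))];
spherePts = [sphereX, sphereY, sphereZ];

remainingPoints = zeros(numAtoms, 1);
for i = 1:numAtoms
    % All test points around atom i at once (same formula as the question).
    pts = bsxfun(@plus, centers(i, :) + p, spherePts);
    accessible = true(numPoints, 1);
    for j = [1:i-1, i+1:numAtoms]       % every neighbour except i itself
        d = bsxfun(@minus, pts, centers(j, :));
        dist = sqrt(sum(d.^2, 2));      % distance of each point to atom j
        % A point stays accessible only if it is not inside atom j's shell.
        accessible = accessible & (dist >= atomRadius(j) + p);
    end
    remainingPoints(i) = sum(accessible);
end
```

bsxfun is used so the sketch also runs on releases without implicit expansion. It keeps the question's arithmetic unchanged (including adding p to each coordinate), so it should give the same counts as the original loops.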

Related

Solving Project Euler #12 with Matlab

I am trying to solve Problem #12 of Project Euler with Matlab and this is what I came up with to find the number of divisors of a given number:
function [Divisors] = ND(n)
p = primes(n); %returns a row vector containing all the prime numbers less than or equal to n
i = 1;
count = 0;
Divisors = 1;
while n ~= 1
while rem(n, p(i)) == 0 %rem(a, b) returns the remainder after division of a by b
count = count + 1;
n = n / p(i);
end
Divisors = Divisors * (count + 1);
i = i + 1;
count = 0;
end
end
After this, I created a function that evaluates the number of divisors of n * (n + 1) / 2 and stops once that count reaches a specified limit:
function [solution] = Solution(limit)
n = 1;
product = 0;
while(product < limit)
if rem(n, 2) == 0
product = ND(n / 2) * ND(n + 1);
else
product = ND(n) * ND((n + 1) / 2);
end
n = n + 1;
end
solution = n * (n + 1) / 2;
end
I already know the answer, and it's not what comes back from the function Solution. Could someone help me find what's wrong with the code?
When I run Solution(500) (500 is the limit specified in the problem), I get 76588876, but the correct answer is 76576500.
The trick is quite simple, although it bothered me for a while too: the increment in your while loop is misplaced. Because n is incremented after the divisor count is computed, the loop exits with n one step too far, which makes the solution slightly bigger than the true answer.
function [solution] = Solution(limit)
n = 1;
product = 0;
while(product < limit)
n = n + 1; %%%But Here
if rem(n, 2) == 0
product = ND(n / 2) * ND(n + 1);
else
product = ND(n) * ND((n + 1) / 2);
end
%n = n + 1; %%%Not Here
end
solution = n * (n + 1) / 2;
end
The output of Matlab 2015b:
>> Solution(500)
ans =
76576500

Vectorized code slower than loops? MATLAB

The problem I'm working on contains the piece of code shown below. The definition part is just there to show the sizes of the arrays. Below it I pasted a vectorized version, and it is more than 2x slower. Why does this happen? I know it can happen when vectorization requires large temporary variables, but that does not seem to be the case here.
And more generally, what (other than parfor, which I already use) can I do to speed up this code?
maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels));
umn2 = umn;
bessels = ones(xElements, xElements, levels); % 1.09 GB
posMcontainer = ones(xElements, xElements, maxN);
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
for m = 1 : 2 : n
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
mm = mm + 1;
end
end
end
end
toc % 0.520594 seconds
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
m = 1:2:n;
numOfEl = ceil(n/2);
umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
end
end
toc % 1.275926 seconds
sum(sum(umn-umn2)) % verifying that both versions give the same result
Best regards,
Alex
From the profiler: [screenshot not preserved]
Edit:
In reply to @Jason's answer, this alternative takes the same time:
for n = 1:2:maxN
nn(n) = n + 1;
numOfEl(n) = ceil(n/2);
end
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
umn2(nn(n), 1:numOfEl(n)) = bessels(i, j, nn(n)) * posMcontainer(i, j, 1:2:n);
end
end
end
Edit2:
In reply to @EBH:
The point is to do the following:
parfor i = 1 : xElements
for j = 1 : xElements
umn = complex(zeros(levels, levels)); % cleaning
for n = 0:maxN
mm = 1;
for m = -n:2:n
nn = n + 1; % for indexing
if m < 0
umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
end
if m > 0
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
if m == 0
umn(nn, mm) = bessels(i, j, nn);
end
mm = mm + 1; % for indexing
end % m
end % n
beta1 = sum(sum(Aj1.*umn));
betaSumSq1(i, j) = abs(beta1).^2;
beta2 = sum(sum(Aj2.*umn));
betaSumSq2(i, j) = abs(beta2).^2;
end % j
end % i
I sped it up as much as I could. What you have written uses only the last bessels and posMcontainer values, so it does not produce the same result. In the real code, those two containers are filled not with ones but with precalculated values.
After your edit, I can see that umn is just a temporary variable for another calculation. It can still be mostly vectorized:
betaSumSq1 = zeros(xElements); % preallocating
betaSumSq2 = zeros(xElements); % preallocating
% an index matrix to fetch the right values from negMcontainer and
% posMcontainer:
indmat = tril(repmat([0 1;1 0],ceil((maxN+1)/2),floor(levels/2)));
indmat(end,:) = [];
% an index matrix to fetch the values in correct order for umn:
b_ind = repmat([1;0],ceil((maxN+1)/2),1);
b_ind(end) = [];
tempind = logical([fliplr(indmat) b_ind indmat+triu(ones(size(indmat)))]);
% permute the arrays to prevent squeeze:
PM = permute(posMcontainer,[3 1 2]);
NM = permute(negMcontainer,[3 1 2]);
B = permute(bessels,[3 1 2]);
for k = 1 : maxN+1 % third dim
for jj = 1 : xElements % columns
b = B(:,jj,k); % get one vector of B
% perform b*NM for every row of NM*indmat, then flip the result:
neg = fliplr(bsxfun(@times,bsxfun(@times,indmat,NM(:,jj,k).'),b));
% perform b*PM for every row of PM*indmat:
pos = bsxfun(@times,bsxfun(@times,indmat,PM(:,jj,k).'),b);
temp = [neg mod(1:levels,2).'.*b pos].'; % concat neg and pos
% assign them to the right place in umn:
umn = reshape(temp(tempind.'),[levels levels]).';
beta1 = Aj1.*umn;
betaSumSq1(jj,k) = abs(sum(beta1(:))).^2;
beta2 = Aj2.*umn;
betaSumSq2(jj,k) = abs(sum(beta2(:))).^2;
end
end
This reduces the running time from ~95 seconds to less than 3 seconds (both without parfor), an improvement of almost 97%.
I suspect it is memory allocation: you are re-allocating the m array inside a loop nested three deep.
Try rearranging the code:
tic
for n = 1 : 2 : maxN
nn = n + 1;
m = 1:2:n;
numOfEl = ceil(n/2);
for j = 1 : xElements
for i = 1 : xElements
umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
end
end
toc % 1.275926 seconds
I tried this in Igor Pro, which is a similar language but with different optimizations, so direct translations don't time the same way as in Matlab (the vectorized version was slightly faster in Igor). Reordering the loops did, however, speed up the vectorized form.
In your second part of the code, that is setting umn2, inside the loops, you have:
nn = n + 1;
m = 1:2:n;
numOfEl = ceil(n/2);
Those three lines don't depend on the i and j loops; they only use n. Reordering the loops so that i and j are inside the n loop means those three lines run xElements^2 (101^2) times less often. I suspect it is the m = 1:2:n line that takes the time, since it allocates an array.

Fast way to initialize a tensor in torch7

I need to initialize a 3D tensor with an index-dependent function in torch7, i.e.
func = function(i,j,k) -- i, j, k are the indices of an element in the tensor
return i*j*k -- the operations inside func depend on i, j, k
end
then I initialize a 3D tensor A like this:
for i=1,A:size(1) do
for j=1,A:size(2) do
for k=1,A:size(3) do
A[{i,j,k}] = func(i,j,k)
end
end
end
But this code runs very slowly, and I found it takes up 92% of the total running time. Are there any more efficient ways to initialize a 3D tensor in torch7?
See the documentation for Tensor:apply:
"These functions apply a function to each element of the tensor on which the method is called (self). These methods are much faster than using a for loop in Lua."
The example in the docs initializes a 2D array based on its index i (in memory). Below is an extended example for 3 dimensions and below that one for N-D tensors. Using the apply method is much, much faster on my machine:
require 'torch'
A = torch.Tensor(100, 100, 1000)
B = torch.Tensor(100, 100, 1000)
function func(i,j,k)
return i*j*k
end
t = os.clock()
for i=1,A:size(1) do
for j=1,A:size(2) do
for k=1,A:size(3) do
A[{i, j, k}] = i * j * k
end
end
end
print("Original time:", os.difftime(os.clock(), t))
t = os.clock()
function forindices(A, func)
local i = 1
local j = 1
local k = 0
local d3 = A:size(3)
local d2 = A:size(2)
return function()
k = k + 1
if k > d3 then
k = 1
j = j + 1
if j > d2 then
j = 1
i = i + 1
end
end
return func(i, j, k)
end
end
B:apply(forindices(A, func))
print("Apply method:", os.difftime(os.clock(), t))
EDIT
This will work for any Tensor object:
function tabulate(A, f)
local idx = {}
local ndims = A:dim()
local dim = A:size()
idx[ndims] = 0
for i=1, (ndims - 1) do
idx[i] = 1
end
return A:apply(function()
for i=ndims, 0, -1 do
idx[i] = idx[i] + 1
if idx[i] <= dim[i] then
break
end
idx[i] = 1
end
return f(unpack(idx))
end)
end
-- usage for 3D case.
tabulate(A, function(i, j, k) return i * j * k end)

Can someone help me vectorize / speed up this Matlab Loop?

correlation = zeros(length(s1), 1);
sizeNum = 0;
for i = 1 : length(s1) - windowSize - delta
s1Dat = s1(i : i + windowSize);
s2Dat = s2(i + delta : i + delta + windowSize);
if length(find(isnan(s1Dat))) == 0 && length(find(isnan(s2Dat))) == 0
if(var(s1Dat) ~= 0 || var(s2Dat) ~= 0)
sizeNum = sizeNum + 1;
correlation(i) = abs(corr(s1Dat, s2Dat)) ^ 2;
end
end
end
What's happening here:
Run through every value in s1. For each position i, take the slice of s1 from i to i + windowSize.
Do the same for s2, but start the slice delta samples later.
If neither slice contains NaNs and they aren't flat, compute the correlation between them and store it in the correlation vector.
This is not an answer, I am trying to understand what is being asked.
Take some data:
N = 1e4;
s1 = cumsum(randn(N, 1)); s2 = cumsum(randn(N, 1));
s1(randi(N, 50, 1)) = NaN; s2(randi(N, 50, 1)) = NaN;
windowSize = 200; delta = 100;
Compute correlations:
tic
corr_s = zeros(N - windowSize - delta, 1);
for i = 1:(N - windowSize - delta)
s1Dat = s1(i:(i + windowSize));
s2Dat = s2((i + delta):(i + delta + windowSize));
corr_s(i) = corr(s1Dat, s2Dat);
end
inds = isnan(corr_s);
corr_s(inds) = 0;
corr_s = corr_s .^ 2; % square of correlation coefficient??? Why?
sizeNum = sum(~inds);
toc
This is what you want to do, right? A moving window correlation function? This is a very interesting question indeed …
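If a moving-window correlation is indeed the goal, one possible vectorized sketch (an assumption about what is wanted, not a tested replacement) computes the window sums with conv and builds the Pearson coefficient from them, so the loop and the per-window corr call disappear. Windows that contain a NaN, or in which either slice is flat, are set to 0, as in the loop above. The names winsum, nBad, num and den are introduced for illustration only.

```matlab
% Same kind of test data as above, smaller so it runs quickly.
rng(1);
N = 2000;
s1 = cumsum(randn(N, 1)); s2 = cumsum(randn(N, 1));
s1(randi(N, 10, 1)) = NaN; s2(randi(N, 10, 1)) = NaN;
windowSize = 200; delta = 100;

W = windowSize + 1;                  % samples per window (i : i + windowSize)
x = s1(1:N - delta);                 % x(t) is paired with y(t) = s2(t + delta)
y = s2(1 + delta:N);
bad = isnan(x) | isnan(y);           % positions that invalidate a window
x(bad) = 0; y(bad) = 0;              % keep NaN out of the window sums

% Sum of v over every run of W consecutive samples.
winsum = @(v) conv(v, ones(W, 1), 'valid');
Sx  = winsum(x);     Sy  = winsum(y);
Sxx = winsum(x.^2);  Syy = winsum(y.^2);
Sxy = winsum(x.*y);
nBad = winsum(double(bad));          % NaN count per window

% Pearson correlation from the sums; max(., 0) guards against rounding.
num = W * Sxy - Sx .* Sy;
den = sqrt(max(W * Sxx - Sx.^2, 0) .* max(W * Syy - Sy.^2, 0));
r = num ./ den;
r(nBad > 0.5 | ~isfinite(r)) = 0;    % NaN windows and flat windows give 0
corr_s = r .^ 2;
sizeNum = sum(nBad < 0.5 & den > 0); % number of windows actually correlated
```

Forming the variance as a difference of large sums loses some precision compared with corr. For data with a large offset it is probably worth subtracting the overall mean from s1 and s2 first, which does not change the correlation.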

How to accelerate matlab code?

I'm using MATLAB to implement a multilayer neural network. In the code I represent
the value of each node as netValue{k}
the weight between layer k and k + 1 as weight{k}
etc.
Since this data is three-dimensional, I use cell arrays to hold the 2-D matrices so that matrix multiplication works.
Training the model has become really slow, which I suspect is caused by the use of cells.
Can anyone tell me how to accelerate this code? Thanks
clc;
close all;
clear all;
input = [-2 : 0.4 : 2;-2:0.4:2];
ican = 4;
depth = 4; % total layers - 1, by convention
[featureNum , sampleNum] = size(input);
levelNum(1) = featureNum;
levelNum(2) = 5;
levelNum(3) = 5;
levelNum(4) = 5;
levelNum(5) = 2;
weight = cell(0);
for k = 1 : depth
weight{k} = rand(levelNum(k+1), levelNum(k)) - 2 * rand(levelNum(k+1) , levelNum(k));
threshold{k} = rand(levelNum(k+1) , 1) - 2 * rand(levelNum(k+1) , 1);
end
runCount = 0;
sumMSE = 1; % init MSE
minError = 1e-5;
afa = 0.1; % step of "gradient ascendence"
% training loop
while(runCount < 100000 & sumMSE > minError)
sumMSE = 0; % sum of MSE
for i = 1 : sampleNum % sample loop
netValue{1} = input(:,i);
for k = 2 : depth
netValue{k} = weight{k-1} * netValue{k-1} + threshold{k-1}; %calculate each layer
netValue{k} = 1 ./ (1 + exp(-netValue{k})); %apply logistic function
end
netValue{depth+1} = weight{depth} * netValue{depth} + threshold{depth}; %output layer
e = 1 + sin((pi / 4) * ican * netValue{1}) - netValue{depth + 1}; %calc error
assistS{depth} = diag(ones(size(netValue{depth+1})));
s{depth} = -2 * assistS{depth} * e;
for k = depth - 1 : -1 : 1
assistS{k} = diag((1-netValue{k+1}).*netValue{k+1});
s{k} = assistS{k} * weight{k+1}' * s{k+1};
end
for k = 1 : depth
weight{k} = weight{k} - afa * s{k} * netValue{k}';
threshold{k} = threshold{k} - afa * s{k};
end
sumMSE = sumMSE + e' * e;
end
sumMSE = sqrt(sumMSE) / sampleNum;
runCount = runCount + 1;
end
x = [-2 : 0.1 : 2;-2:0.1:2];
y = zeros(size(x));
z = 1 + sin((pi / 4) * ican .* x);
% test
for i = 1 : length(x)
netValue{1} = x(:,i);
for k = 2 : depth
netValue{k} = weight{k-1} * netValue{k-1} + threshold{k-1};
netValue{k} = 1 ./ ( 1 + exp(-netValue{k}));
end
y(:, i) = weight{depth} * netValue{depth} + threshold{depth};
end
plot(x(1,:) , y(1,:) , 'r');
hold on;
plot(x(1,:) , z(1,:) , 'g');
hold off;
Have you used the profiler to find out what functions are actually slowing down your code? It shows what lines take the most time to execute.
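For reference, a minimal sketch of how the profiler can be driven from a script. The magic/pinv calls are only a placeholder workload (an assumption for this example); the training loop would go in their place.

```matlab
% Sketch: profile a stand-in workload and list the slowest functions.
profile on                           % start collecting timing data
A = magic(300);                      % placeholder workload; run the
B = pinv(A);                         % training script here instead
profile off                          % stop collecting

stats = profile('info');             % results as a struct
tbl = stats.FunctionTable;           % one entry per profiled function
[~, order] = sort([tbl.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-40s %8.3f s  (%d calls)\n', ...
        tbl(k).FunctionName, tbl(k).TotalTime, tbl(k).NumCalls);
end
% profile viewer                     % opens the line-by-line report
```

The interactive report from profile viewer is usually the more useful view, since it breaks the time down per line.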
