I'm trying to speed up my code using parfor. The purpose of the code is to slide a 3D square window on a 3D image and for each block of mxmxm apply a function.
I wrote this code:
function [ o_image ] = SlidingWindow( i_image, i_padSize, i_fun, i_options )
%SLIDINGWINDOW Summary of this function goes here
% Detailed explanation goes here
o_image = zeros(size(i_image,1),size(i_image,2),size(i_image,3));
i_image = padarray(i_image,i_padSize,'symmetric');
i_padSize = num2cell(i_padSize);
[m,n,p] = deal(i_padSize{:});
[row,col,depth] = size(i_image);
windowShape = i_options.windowShape;
mask = i_options.mask;
parfor (i = m+1:row-m,i_options.cores)
temp = i_image(i-m:i+m,:,:);
for j = n+1:col-n
for h = p+1:depth-p
ii = i-m;
jj = j-n;
hh = h-p;
temp = temp(:,j-n:j+n, h-p:h+p);
o_image(ii,jj,hh) = parfeval(i_fun, temp, windowShape, mask);
end
end
end
end
I get one warning and one error that I don't understand how to solve.
The warning says:
the entire array or structure 'i_image' is a broadcast variable.
The error says:
the PARFOR loop can not run due to the way variable 'o_image' is used.
I don't understand how to fix these two things. Any help is greatly appreciated!
As far as I understand, parfeval takes care of running your function on the available number of workers, which is why it doesn't need to be surrounded by parfor. Assuming you already have an active parpool, changing the external parfor into for eliminates both problems.
Unfortunately, I can't support my answer with a benchmark or suggest a more fitting solution because your inputs are unknown.
It seems to me that the code can be optimized in other ways, mainly by vectorization. I would suggest you looked into the following resources:
This question, for additional info on parfeval.
Examples on how to use bsxfun and permute and benchmarks thereof: ex1, ex2, ex3.
P.S.: The 2nd part of (i = m+1:row-m,i_options.cores) seems out of place...
Related
The following is a snippet of code of Ant Colony Optimization. I've removed whatever I feel would absolutely not be necessary to understand the code. The rest I'm not sure as I'm unfamiliar with coding on matlab. However, I'm running this algorithm on 500 or so cities with 500 ants and 1000 iterations, and the code runs extremely slow as compared to other algorithm implementations on matlab. For the purposes of my project, I simply need the datasets, not demonstrate coding capability on matlab, and I had time constraints that simply did not allow me to learn matlab from scratch, as that was not taken into consideration nor expected when the deadline was given, so I got the algorithm from an online source.
Matlab recommends preallocating two variables inside a loop as they are arrays that change size I believe. However I do not fully understand the purpose of those two parts of the code, so I haven't been able to do so. I believe both arrays increment a new item every iteration of the loop, so technically they should both be zero-able and could preallocate the size of both expected final sizes based on the for loop condition, but I'm not sure. I've tried preallocating zeroes to the two arrays, but it does not seem to fix anything as Matlab still shows preallocate for speed recommendation.
I've added two comments on the two variables recommended by MATLAB to preallocate below. If someone would be kind as to skim over it and let me know if it is possible, it'd be much appreciated.
x = 10*rand(50,1);
y = 10*rand(50,1);
n=numel(x);
D=zeros(n,n);
for i=1:n-1
for j=i+1:n
D(i,j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
D(j,i)=D(i,j);
end
end
model.n=n;
model.x=x;
model.y=y;
model.D=D;
nVar=model.n;
MaxIt=100;
nAnt=50;
Q=1;
tau0=10*Q/(nVar*mean(model.D(:)));
alpha=1;
beta=5;
rho=0.6;
eta=1./model.D;
tau=tau0*ones(nVar,nVar);
BestCost=zeros(MaxIt,1);
empty_ant.Tour=[];
empty_ant.Cost=[];
ant=repmat(empty_ant,nAnt,1);
BestSol.Cost=inf;
for it=1:MaxIt
for k=1:nAnt
ant(k).Tour=randi([1 nVar]);
for l=2:nVar
i=ant(k).Tour(end);
P=tau(i,:).^alpha.*eta(i,:).^beta;
P(ant(k).Tour)=0;
P=P/sum(P);
r=rand;
C=cumsum(P);
j=find(r<=C,1,'first');
ant(k).Tour=[ant(k).Tour j];
end
tour = ant(k).Tour;
n=numel(tour);
tour=[tour tour(1)]; %MatLab recommends preallocation here
ant(k).Cost=0;
for i=1:n
ant(k).Cost=ant(k).Cost+model.D(tour(i),tour(i+1));
end
if ant(k).Cost<BestSol.Cost
BestSol=ant(k);
end
end
for k=1:nAnt
tour=ant(k).Tour;
tour=[tour tour(1)];
for l=1:nVar
i=tour(l);
j=tour(l+1);
tau(i,j)=tau(i,j)+Q/ant(k).Cost;
end
end
tau=(1-rho)*tau;
BestCost(it)=BestSol.Cost;
figure(1);
tour=BestSol.Tour;
tour=[tour tour(1)]; %MatLab recommends preallocation here
plot(model.x(tour),model.y(tour),'g.-');
end
If you change the size of an array, that means copying it to a new location in memory. This is not a huge problem for small arrays but for large arrays this slows down your code immensely. The tour arrays you're using are fixed size (51 or n+1 in this case) so you should preallocate them as zero arrays. the only thing you do is add the first element of the tour again to the end so all you have to do is set the last element of the array.
Here is what you should change:
x = 10*rand(50,1);
y = 10*rand(50,1);
n=numel(x);
D=zeros(n,n);
for i=1:n-1
for j=i+1:n
D(i,j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
D(j,i)=D(i,j);
end
end
model.n=n;
model.x=x;
model.y=y;
model.D=D;
nVar=model.n;
MaxIt=1000;
nAnt=50;
Q=1;
tau0=10*Q/(nVar*mean(model.D(:)));
alpha=1;
beta=5;
rho=0.6;
eta=1./model.D;
tau=tau0*ones(nVar,nVar);
BestCost=zeros(MaxIt,1);
empty_ant.Tour=zeros(n, 1);
empty_ant.Cost=[];
ant=repmat(empty_ant,nAnt,1);
BestSol.Cost=inf;
for it=1:MaxIt
for k=1:nAnt
ant(k).Tour=randi([1 nVar]);
for l=2:nVar
i=ant(k).Tour(end);
P=tau(i,:).^alpha.*eta(i,:).^beta;
P(ant(k).Tour)=0;
P=P/sum(P);
r=rand;
C=cumsum(P);
j=find(r<=C,1,'first');
ant(k).Tour=[ant(k).Tour j];
end
tour = zeros(n+1,1);
tour(1:n) = ant(k).Tour;
n=numel(ant(k).Tour);
tour(end) = tour(1); %MatLab recommends preallocation here
ant(k).Cost=0;
for i=1:n
ant(k).Cost=ant(k).Cost+model.D(tour(i),tour(i+1));
end
if ant(k).Cost<BestSol.Cost
BestSol=ant(k);
end
end
for k=1:nAnt
tour(1:n)=ant(k).Tour;
tour(end) = tour(1);
for l=1:nVar
i=tour(l);
j=tour(l+1);
tau(i,j)=tau(i,j)+Q/ant(k).Cost;
end
end
tau=(1-rho)*tau;
BestCost(it)=BestSol.Cost;
figure(1);
tour(1:n) = BestSol.Tour;
tour(end) = tour(1); %MatLab recommends preallocation here
plot(model.x(tour),model.y(tour),'g.-');
end
I think that the warning that the MATLAB Editor gives in this case is misplaced. The array is not repeatedly resized, it is just resized once. In principle, tour(end+1)=tour(1) is more efficient than tour=[tour,tour(1)], but in this case you might not notice the difference in cost.
If you want to speed up this code you could think of vectorizing some of its loops, and of reducing the number of indexing operations performed. For example this section:
tour = ant(k).Tour;
n=numel(tour);
tour=[tour tour(1)]; %MatLab recommends preallocation here
ant(k).Cost=0;
for i=1:n
ant(k).Cost=ant(k).Cost+model.D(tour(i),tour(i+1));
end
if ant(k).Cost<BestSol.Cost
BestSol=ant(k);
end
could be written as:
tour = ant(k).Tour;
ind = sub2ind(size(model.D),tour,circshift(tour,-1));
ant(k).Cost = sum(model.D(ind));
if ant(k).Cost < BestSol.Cost
BestSol = ant(k);
end
This rewritten code doesn't have a loop, which usually makes things a little faster, and it also doesn't repeatedly do complicated indexing (ant(k).Cost is two indexing operations, within a loop that will slow you down more than necessary).
There are more opportunities for optimization like these, but rewriting the whole function is outside the scope of this answer.
I have not tried running the code, please let me know if there are any errors when using the proposed change.
My code seems to run very slowly and I can't think of any way to make it faster. All my arrays have been preallocated. S is a large number of element (say 10000 element, for example). I know my code runs slowly because of the "for k=1:S" but i cant think of another way to perform this loop at a relatively fast speed. Can i please get help because it takes hours to run.
[M,~] = size(Sample2000_X);
[N,~] = size(Sample2000_Y);
[S,~] = size(Prediction_Point);
% Speed Preallocation
Distance = zeros(M,N);
Distance_Prediction = zeros(M,1);
for k=1:S
for i=1:M
for j=1:N
Distance(i,j) = sqrt(power((Sample2000_X(i)-Sample2000_X(j)),2)+power((Sample2000_Y(i)-Sample2000_Y(j)),2));
end
Distance_Prediction(i,1) = sqrt(power((Prediction_Point(k,1)-Sample2000_X(i)),2)+power((Prediction_Point(k,2)-Sample2000_Y(i)),2));
end
end
Thanks.
I realized the major problem was organization of my code. I was performing calculation in a loop where it was absolutely unnecessary. So i seperated the code in two blocks and it Works much faster.
for i=1:M
for j=1:N
Distance(i,j) = sqrt(power((Sample2000_X(i)-Sample2000_X(j)),2)+power((Sample2000_Y(i)-Sample2000_Y(j)),2));
end
end
for k=1:S
for i=1:M
Distance_Prediction(i,1) = sqrt(power((Prediction_Point(k,1)-Sample2000_X(i)),2)+power((Prediction_Point(k,2)-Sample2000_Y(i)),2));
end
end
Thanks to the community for the help.
Your matrix Distance does not depend on k, so you can easily calculate it outside the main for-loop, for instance using:
d = sqrt((repmat(Sample2000_X, [1,M]) - repmat(Sample2000_X', [M,1])).^2 + (repmat(Sample2000_Y, [1,N]) - repmat(Sample2000_Y', [N,1])).^2);
I assume M=N, because elsewise your code won't work. Next, you can calculate your Distance_Prediction matrix. It is rather strange that you calculate this inside the for-loop over k, because the matrix will be changed in every iteration without using it. Anyway, this will do exactly the same as your code:
for k=1:S
Distance_Prediction = sqrt((Sample2000_X - Prediction_Point(k,1)).^2 + (Sample2000_Y - Prediction_Point(k,1)).^2);
end
Currently I'm using this code for MATLAB R2015b support vector machine(SVM) 10-fold cross validation.
indices = crossvalind('Kfold',output,10);
cp = classperf(binary_output);
for i = 1:10
test = (indices == i); train = ~test;
SVMModel = fitcsvm(INPUT(train,:), output(train,:),'KernelFunction','RBF',...
'KernelScale','auto');
class = predict(SVMModel, INPUT(test,:));
classperf(cp,class,test);
end
z = cp.ErrorRate;
sensitivity = cp.Sensitivity;
specificity = cp.Specificity;
I need to extract sensitivity and specificity of this binary classification. Otherwise I'm running this code in a loop.
This structure of cross validation is so slow. Any other implementation for faster execution?
An easy way is to use the multi-threading from the crossval function.
tic
%data partition
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); %10-folds
%prediction function
f = #(xtr,ytr,xte,yte)confusionmat(yte,...
predict(fitcsvm(xtr, ytr,'KernelFunction','RBF',...
'KernelScale','auto'),xte),'order',order);
% missclassification error
cfMat = crossval(f,INPUT,output,'partition',cp);
cfMat = reshape(sum(cfMat),2,2)
toc
With the confusion matrix you can get the sensitivity and specificity easily. For the record there is no improvement of time between your script and the crossval function provided by Matlab.
An alternative to the crossval function is to use parfor
to split the computation of each iteration between different cores.
Ps: for any parallel computation, parpool has to be run before (or that might be run by some of the functions) and it takes few seconds to set it.
Where can I find the implementation of barrier function in Matlab?
I am trying to see how the algorithm interior-point is implemented, and this is what I found in the end of fmincon.m
elseif strcmpi(OUTPUT.algorithm,interiorPoint)
defaultopt.MaxIter = 1000; defaultopt.MaxFunEvals = 3000; defaultopt.TolX = 1e-10;
defaultopt.Hessian = 'bfgs';
mEq = lin_eq + sizes.mNonlinEq + nnz(xIndices.fixed); % number of equalities
% Interior-point-specific options. Default values for lbfgs memory is 10, and
% ldl pivot threshold is 0.01
options = getIpOptions(options,sizes.nVar,mEq,flags.constr,defaultopt,10,0.01);
[X,FVAL,EXITFLAG,OUTPUT,LAMBDA,GRAD,HESSIAN] = barrier(funfcn,X,A,B,Aeq,Beq,l,u,confcn,options.HessFcn, ...
initVals.f,initVals.g,initVals.ncineq,initVals.nceq,initVals.gnc,initVals.gnceq,HESSIAN, ...
xIndices,options,optionFeedback,finDiffFlags,varargin{:});
So I want to see what's in barrier but failed.
edit barrier.m
I got:
The barrier function is defined in a p-file (precisely located in MATLABROOT/toolbox/optim/optim/barrier.p).
Unfortunately the point of p-files is exactly that they are obfuscated, i.e. you cannot read the source code. This is a recurent questio on SO, see this thread for instance.
I'm afraid you cannot read what's inside barrier. Maybe if you ask the Mathworks kindly they can give you some information on the content.
Best
There is a example for Employ early bail-out in this book (http://www.amazon.com/Accelerating-MATLAB-Performance-speed-programs/dp/1482211297) (#YairAltman). for speed improvement we can convert this code:
data = [];
newData = [];
outerIdx = 1;
while outerIdx <= 20
outerIdx = outerIdx + 1;
for innerIdx = -100 : 100
if innerIdx == 0
continue % skips to next innerIdx (=1)
elseif outerIdx > 15
break % skips to next outerIdx
else
data(end+1) = outerIdx/innerIdx;
newData(end+1) = process(data);
end
end % for innerIdx
end % while outerIdx
to this code:
function bailableProcessing()
for outerIdx = 1 : 5
middleIdx = 10
while middleIdx <= 20
middleIdx = middleIdx + 1;
for innerIdx = -100 : 100
data = outerIdx/innerIdx + middleIdx;
if data == SOME_VALUE
return
else
process(data);
end
end % for innerIdx
end % while middleIdx
end % for outerIdx
end % bailableProcessing()
How we did this conversion? Why we have different middleIdx range in new code? Where is checking for innerIdx and outerIdx in new code? what is this new data = outerIdx/innerIdx + middleIdx calculation?
we have only this information for second code :
We could place the code segment that should be bailed-out within a
dedicated function and return from the function when the bail-out
condition occurs.
I am sorry that I did not clarify within the text that the second code segment is not a direct replacement of the first. If you reread the early bail-out section (3.1.3) perhaps you can see that it has two main parts:
The first part of the section (which includes the top code segment) illustrates the basic mechanism of using break/continue in order to bail-out from a complex processing loop, in order to save processing time in computing values that are not needed.
In contrast, the second part of the section deals with cases when we wish to break out of an ancestor loop that is not the direct parent loop. I mention in the text that there are three alternatives that we can use in this case, and the second code segment that you mentioned is one of them (the other alternatives are to use dedicated flags with break/continue and to use try/catch blocks). The three code segments that I provided in this second part of the section should all be equivalent to each other, but they are NOT equivalent to the code-segment at the top of the section.
Perhaps I should have clarified this in the text, or maybe I should have used the same example throughout. I will think about this for the second edition of the book (if and when it ever appears).
I have used a variant of these code segments in other sections of the book to illustrate various other aspects of performance speedups (for example, 3.1.4 & 3.1.6) - in all these cases the code segments are NOT equivalent to each other. They are merely used to illustrate the corresponding text.
I hope you like my book in general and think that it is useful. I would be grateful if you would place a positive feedback about it on Amazon (direct link).
p.s. - #SamRoberts was correct to surmise that mention of my name would act as a "bat-signal", attracting my attention :-)
it's all far more simple than you think!
How we did this conversion?
Irrationally. Those two codes are completely different.
Why we have different middleIdx range in new code?
Randomness. The point of the author is something different.
Where is checking for innerIdx and outerIdx in new code?
dont need that, as it's not intended to be the same code.
what is this new data = outerIdx/innerIdx + middleIdx calculation?
a random calculation as well as data(end+1) = outerIdx/innerIdx; in the original code.
i suppose the author wants to illustrate something far more profoundly: that if you wrap your code that does (possibly many) loops (fors/whiles, doesnt matter) inside a function and you issue a return statement if you somehow detect that you're done, it will result in an effectively "bailable" computation, e.g. the method that does the work returns earlier than it would normally do. that is illustrated here by the condition that checks on data == SOME_VALUE; you can have your favourite bailout condition there instead :-)
moreover, the keywords [continue/break] inside the first example are meant to illustrate that you can [skip the rest of/leave] the inner-most loop from whereever you call them. in principal, you can implement a bailout using these by e.g.
bailing = false;
for outer = 1:1000
for inner = 1:1000
if <somebailingcondition>
bailing = true;
break;
else
<do stuff>
end
end
if bailing
break;
end
end
but that would be very clumsy as that "cascade" of breaks will need to be as long as you have nested loops and messes up the code.
i hope that could clarify your issues.