Speed up a for loop with matrix multiplication? - performance

One part of my program contains this piece of code:
size2=2500;
gran=3;
A=ones(size2,size2);
for k=1:gran:(size2-gran)
for j=1:gran:(size2-gran)
X=rand*2*pi-pi;
for h=1:gran
for l=1:gran
A(k+l-1,j+h-1) = A(k+l-1,j+h-1) *exp(+1i*X); %phase in the square gran x gran
end
end
end
end
My pc runs this code in 0.60 seconds but I would like to know if it is possible to speed up this process.
A faster way would be to write this as a matrix multiplication but in order to write X I think I have to create a for loop.
Is there any way to improve the speed of this code?

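Your for loop can be replaced by a matrix of random phases that is constant over each gran x gran block. Draw one random phase per block, then expand it with kron to the size of the region the loop touches:
nb = floor((size2-1)/gran); % number of blocks per dimension visited by the loop
X = rand(nb)*2*pi-pi; % one phase per block
P = kron(exp(1i*X),ones(gran)); % repeat each phase factor over its block
then
n = nb*gran;
A(1:n,1:n) = A(1:n,1:n).*P;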
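As a sanity check, here is a minimal sketch that compares a loop version with a kron-based version on a small matrix. It assumes the block size gran from the question and stores one phase per block in X so that both versions use the same random numbers; the names nb, A_loop and A_vec are introduced here for illustration only.

```matlab
size2 = 10; gran = 3;
nb = floor((size2 - 1)/gran);     % blocks per dimension visited by the loop
X  = rand(nb)*2*pi - pi;          % one phase per block

% Loop version, reading the phase of block (kb,jb) from X
A_loop = ones(size2);
for kb = 1:nb
    for jb = 1:nb
        rows = (kb-1)*gran + (1:gran);
        cols = (jb-1)*gran + (1:gran);
        A_loop(rows,cols) = A_loop(rows,cols) * exp(1i*X(kb,jb));
    end
end

% Vectorized version: expand each phase factor over its gran x gran block
A_vec = ones(size2);
n = nb*gran;
A_vec(1:n,1:n) = A_vec(1:n,1:n) .* kron(exp(1i*X), ones(gran));

max_err = max(abs(A_loop(:) - A_vec(:)));   % largest deviation between both
```

Rows and columns beyond nb*gran are left at 1 in both versions, which mirrors the loop bounds 1:gran:(size2-gran) in the question.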

Related

Optimize/ Vectorize Mahalanobis distance calculations in MATLAB

I have the following piece of Matlab code, which calculates Mahalanobis distances between a vector and a matrix with several iterations. I am trying to find a faster method to do this by vectorization but without success.
S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);
S.resultat=ones(length(S.data),length(S.a))*nan;
S.b=ones(length(S.a),3,length(S.a))*nan;
for i=1:length(S.data)
for j=1:length(S.a)
S.a2=S.a;
S.a2(j,:)=S.data(i,:);
S.b(:,:,j)=S.a2;
if j==length(S.a)
for k=1:length(S.a);
S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
end
end
end
end
I have now modified the code to avoid one of the loops, but it is still very slow. If someone has an idea, I would be very grateful!
S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);
S.resultat=ones(length(S.data),length(S.a))*nan;
for i=1:length(S.data)
for j=1:length(S.a)
S.a2=S.a;
S.a2(j,:)=S.data(i,:);
S.resultat(i,j)=mahal(S.a(j,:),S.a2);
end
end
Introduction and solution code
You can replace the innermost loop that uses mahal with a partly vectorized version: it uses some pre-calculated values (with the help of bsxfun) inside a shortened and hacked version of mahal.
Basically you have a 2D array, call it A for easy reference, and a 3D array, call it B. Let the output be stored in a variable out. With these assumed names, the innermost code snippet can be extracted as follows.
Original loopy code
for k=1:size(A,1)
out(k)=mahal(A(k,:),B(:,:,k));
end
So, what I did was to hack into mahal.m and look for portions that could be vectorized when the inputs are 2D and 3D. Now, mahal uses qr inside it, which could not be vectorized. Thus, we end up with a hacked code.
Hacked code
%// Pre-calculate values that do not need to be recomputed inside the loop
meanB = mean(B,1); %// mean of B along dim-1
B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
A_B_meanB = A' - reshape(meanB,size(B,2),[]); %//'# A minus mean values of B
%// QR calculations run in a for-loop until the output is obtained
for k = 1:size(A,1)
[~,R] = qr(B_meanB(:,:,k),0);
out2(k) = sum((R'\A_B_meanB(:,k)).^2)*(size(B,1)-1); %// scale by (rows of B - 1)
end
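The QR step works because, for a centered sample Xc with n rows, Xc'*Xc equals R'*R, so the sample covariance is R'*R/(n-1). The following sketch checks this against the textbook definition of the squared Mahalanobis distance. The sample X and query point y are hypothetical, and the check uses cov rather than mahal so that it does not depend on the Statistics Toolbox.

```matlab
% Hypothetical reference sample X (n-by-p) and query point y (1-by-p)
X = rand(50,3);
y = rand(1,3);

m  = mean(X,1);                      % column means of the reference sample
Xc = bsxfun(@minus, X, m);           % centered sample
[~,R] = qr(Xc,0);                    % economy QR, so Xc'*Xc equals R'*R

% QR route: cov(X) = R'*R/(n-1), hence d2 = (n-1)*||R'\(y-m)'||^2
d2_qr  = sum((R'\(y-m)').^2)*(size(X,1)-1);

% Direct route from the definition, using the sample covariance
d2_cov = (y-m)/cov(X)*(y-m)';
```

Both routes should agree up to rounding error; the QR route avoids forming and inverting the covariance matrix explicitly.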
Now, to extend this hacked solution to the problem code, one can introduce a few more tweaks that pre-calculate more of the values used in those nested loops.
Final solution code
A = S.a; %// Get data from S
[rx,cx] = size(A); %// Get size parameters
Atr = A'; %//'# Pre-calculate transpose of A
%// Pre-calculate replicated B and the indices to be modified at each iteration
B_rep = repmat(S.a,1,1,rx);
B_idx = bsxfun(@plus,[(0:cx-1)*rx + 1]',[0:rx-1]*(rx*cx+1)); %//'
out = zeros(size(S.data,1),rx); %// initialize output array
for i=1:length(S.data)
B = B_rep;
B(B_idx) = repmat(S.data(i,:)',1,rx); %//'
meanB = mean(B,1); %// mean of B along dim-1
B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
A_B_meanB = Atr - reshape(meanB,3,[]); %// A minus mean values of B
for jj = 1:rx
[~,R] = qr(B_meanB(:,:,jj),0);
out(i,jj) = sum((R'\A_B_meanB(:,jj)).^2)*(rx-1); %//'
end
end
S.resultat = out;
Benchmarking
Here's the benchmarking code to compare the proposed solution against the code listed in the problem -
%// Random inputs
S.data=0+(20-0).*rand(1500,3); %(size 10x reduced for a quicker runtime test)
S.a=0+(20-0).*rand(250,3);
S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With original code')
tic
S.b=ones(length(S.a),3,length(S.a))*nan;
for i=1:length(S.data)
for j=1:length(S.a)
S.a2=S.a;
S.a2(j,:)=S.data(i,:);
S.b(:,:,j)=S.a2;
if j==length(S.a)
for k=1:length(S.a);
S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
end
end
end
end
toc, clear i j S.a2 k S.resultat
S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With proposed solution code')
tic
[ ... Proposed solution code ...]
toc
Runtimes -
----------------------------- With original code
Elapsed time is 17.734394 seconds.
----------------------------- With proposed solution code
Elapsed time is 6.602860 seconds.
Thus, we might get around 2.7x speedup with the proposed approach and some tweaks!

How to improve the Matlab code for faster processing time?

I developed some code for my master's project that simulates 90 years of daily data using 1000 different data sets. The code works and gives the output I want, but the processing time is very high: it takes about 8 hours to finish the simulation. Here is the code that I used:
tic
%% importing the csv file with selected column
files=dir('*_scen_*.csv');
for i=1:length(files);
LHR=importcsv(files(i).name);
%% Definable variables
% Define These Value
TAW=-216; %total available water
RAW=-129; %readily available water
KC=1.0; %crop coefficient
IRL=15; %intense rain level
RC=(80/100); %percentage of recharge
RO=(1-RC); %percentage of runoff
% The very first row of Soil Moisture Deficit
for j=1
SMD(j,i)=(LHR.RAIN(j)-LHR.PET(j));
if SMD(j,i)>0;
SMD(j,i)=0;
elseif SMD(j,i)<RAW;
SMD(j,i)=(LHR.RAIN(j)-(LHR.PET(j)*((TAW-SMD(j-1))/(TAW-RAW))));
end
end
%for the following SMD Calculation
for k=2:(length(LHR.RAIN));
SMD(k,i)=SMD(k-1,i)+(LHR.RAIN(k)-LHR.PET(k));
% The SMD conditions
if SMD(k,i)>0;
SMD(k,i)=0;
elseif SMD(k,i)<RAW;
SMD(k,i)=SMD(k-1,i)+(LHR.RAIN(k)-(LHR.PET(k)*((TAW-SMD(k-1,i))/(TAW-RAW))));
end
end
%Convert negative SMD to Positive
SMD=abs(SMD);
%%Evapotranspiration Calculation
for l=1:(length(SMD));
if SMD(l,i)<abs(RAW);
AET(l,i)=LHR.PET(l);
elseif SMD(l,i)>abs(RAW);
AET(l,i)=KC*LHR.PET(l)*((abs(TAW)-(SMD(l,i)))/(abs(TAW)-abs(RAW)));
end
end
for m=2:(length(SMD));
if SMD(m,i)<abs(RAW);
AET(m,i)=LHR.PET(m);
elseif SMD(m,i)>abs(RAW);
AET(m,i)=KC*LHR.PET(m)*((abs(TAW)-(SMD(m-1,i)))/(abs(TAW)-abs(RAW)));
end
end
%% HER calculation
for n=1:length(SMD);
if SMD(n,i)<(LHR.RAIN(n)-AET(n,i));
HER(n,i)=(LHR.RAIN(n)-AET(n,i)-SMD(n,i));
elseif SMD(n,i)>(LHR.RAIN(n)-AET(n,i));
HER(n,i)=0;
end
end
%% Calculation of recharge and runoff
for o=1:(length(HER));
if (HER(o,i)+(abs(TAW)-SMD(o,i)))<abs(TAW);
RUNOFF(o,i)=0;
elseif (HER(o,i)+(abs(TAW)-SMD(o,i)))>abs(TAW);
if HER(o,i)>IRL;
RUNOFF(o,i)=RO*HER(o,i);
elseif HER(o,i)<IRL;
RUNOFF(o,i)=0;
end
end
if (HER(o,i)+(abs(TAW)-SMD(o,i)))<abs(TAW);
RECHARGE(o,i)=0;
elseif (HER(o,i)+(abs(TAW)-SMD(o,i)))>abs(TAW);
if HER(o,i)>IRL;
RECHARGE(o,i)=RC*HER(o,i);
elseif HER(o,i)<IRL;
RECHARGE(o,i)=HER(o,i);
end
end
end
%% rainfall
for p=1:length(LHR.RAIN);
RAINFALL(p,i)=LHR.RAIN(p);
PET(p,i)=LHR.PET(p);
end
end
clear i
clear j
clear k
clear l
clear m
clear n
clear o
clear p
toc
Is there any scope for improving this code so that the processing time goes down? Sorry if the code looks unprofessional; I am a beginner at MATLAB programming.
If there is one thing Matlab is good at, it's matrix and vector computations. With your loop-like code, especially for big datasets, you are completely missing this advantage.
I didn't look into the details, but it seems like all your loops do element-wise computations and logical operations. You could replace these by matrix calculations.
For example, let's consider your first loop:
for k=2:(length(LHR.RAIN));
SMD(k,i)=SMD(k-1,i)+(LHR.RAIN(k)-LHR.PET(k));
% …
end
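Ignoring the conditions, this recurrence is a running sum and could be replaced by something like (untested) SMD(2:end,i)=SMD(1,i)+cumsum(LHR.RAIN(2:end)-LHR.PET(2:end)). Note that the conditions inside the loop feed back into later iterations, so as long as they are needed, this particular loop over k cannot be replaced that simply.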
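Because SMD(k) depends on SMD(k-1), the loop is a running sum. Here is a minimal sketch with hypothetical daily values d (standing in for RAIN minus PET) showing that cumsum reproduces the recurrence when there are no conditions, and that clamping inside the loop is not the same as clamping the finished running sum afterwards.

```matlab
d = [-1; 2; -3; 0.5; -2];            % hypothetical daily RAIN - PET values

% Loop form of the recurrence without any conditions
s_loop = zeros(size(d));
s_loop(1) = d(1);
for k = 2:numel(d)
    s_loop(k) = s_loop(k-1) + d(k);
end

% Vectorized form of the same recurrence
s_vec = cumsum(d);

% Recurrence with the "positive values become 0" condition applied each step
s_clamped = zeros(size(d));
s_clamped(1) = min(d(1), 0);
for k = 2:numel(d)
    s_clamped(k) = min(s_clamped(k-1) + d(k), 0);
end

% Clamping only at the end gives a different result
s_late = min(s_vec, 0);
```

In this example s_clamped ends at -4.5 while s_late ends at -3.5, so the state-dependent part of the SMD calculation still needs a loop over time (it can still be preallocated, and the later steps can be vectorized).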
And logical operations like this:
for k=2:(length(LHR.RAIN));
% ...
% The SMD conditions
if SMD(k,i)>0;
SMD(k,i)=0;
% ...
end
Can be replaced by this:
SMD(SMD>0)=0;
Etc.
One thing that speeds up scripts (a lot!) is preallocating your matrices before entering a loop. For example, for your SMD, AET, RECHARGE, etc. matrices, you should use something like
SMD=NaN(nrow,ncol);
where nrow and ncol are the dimensions of the final matrix (if known, of course).
Then run your loop.
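To make the two suggestions concrete, here is a small sketch that preallocates an output and replaces an if/elseif loop by logical indexing. It is modelled on the first AET loop in the question for a single scenario; the numbers are hypothetical, and RAW and TAW are written as positive values (the question uses abs(RAW) and abs(TAW)).

```matlab
% Hypothetical stand-ins for one scenario (assumed values)
RAW = 129; TAW = 216; KC = 1.0;
PET = [2; 3; 4; 5];
SMD = [50; 150; 129; 200];

AET = NaN(size(SMD));                 % preallocate the result
wet = SMD < RAW;                      % rows taking the first branch
dry = SMD > RAW;                      % rows taking the elseif branch

AET(wet) = PET(wet);
AET(dry) = KC*PET(dry).*((TAW - SMD(dry))/(TAW - RAW));
% Rows where SMD == RAW match neither branch, as in the original loop,
% and therefore keep the preallocated NaN.
```

The same pattern applies to the HER, RUNOFF and RECHARGE loops, since none of them depends on the previous row.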

Why is my Matlab for-loop code faster than my vectorized version

I had always heard that vectorized code runs faster than for loops in MATLAB. However, when I tried vectorizing my MATLAB code it seemed to run slower.
I used tic and toc to measure the times. I changed only the implementation of a single function in my program. My vectorized version ran in 47.228801 seconds and my for-loop version ran in 16.962089 seconds.
Also, in my main program I used a large number for N, N = 1000000, and DataSet's size is 1x301, and I ran each version several times for different data sets with the same size and N.
Why is the vectorized so much slower and how can I improve the speed further?
The "vectorized" version
function [RNGSet] = RNGAnal(N,DataSet)
%Creates a random number generated set of numbers to check accuracy overall
% This function will produce random numbers and normalize a new Data set
% that is derived from an old data set by multiply random numbers and
% then dividing by N/2
randData = randint(N,length(DataSet));
tempData = repmat(DataSet,N,1);
RNGSet = randData .* tempData;
RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
end
The "for-loop" version
function [RNGData] = RNGAnsys(N,Data)
%RNGAnsys This function produces statistical RNG data using a for loop
% This function will produce RNGData that will be used to plot on another
% plot that possesses the actual data
multData = zeros(N,length(Data));
for i = 1:length(Data)
photAbs = randint(N,1); % Create N number of random 0's or 1's
multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
end
Vectorization
A first glance at the for-loop code tells us that photAbs is a binary array, each column of which is scaled by the corresponding element of Data. This binary property can be exploited for vectorization, as in the code here -
function RNGData = RNGAnsys_vect1(N,Data)
%// Get the 2D Matrix of random ones and zeros
photAbsAll = randint(N,numel(Data));
%// Take care of multData internally by summing along the columns of the
%// binary 2D matrix and then multiply each element of it with each scalar
%// taken from Data by performing elementwise multiplication
sumData = Data.*sum(photAbsAll,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'
return;
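The vectorization rests on a simple identity: scaling each 0/1 column by Data(i) and then summing gives the same result as summing the 0/1 column first and scaling once. A small sketch with hypothetical sizes, using rand(...)<0.5 as a stand-in for randint:

```matlab
N = 1000;                               % number of random draws (small here)
Data = rand(1,7);                       % hypothetical 1-by-M data row
B = double(rand(N,numel(Data)) < 0.5);  % random 0/1 matrix, stand-in for randint

% Scale every column first, then sum (what the loop and repmat versions do)
s_scaled = sum(bsxfun(@times, B, Data), 1);

% Sum the 0/1 columns first, then scale once per column
s_counts = Data .* sum(B,1);

max_diff = max(abs(s_scaled - s_counts));   % rounding-level difference only
```

The second form never builds the scaled N-by-M matrix, which saves both memory and multiplications.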
After profiling, it appears that the bottleneck is the random binary array creating part. So, using a faster random binary array creator as suggested in this smart solution, the above function could be further optimized like so -
function RNGData = RNGAnsys_vect2(N,Data)
%// Create a random binary array and sum along the columns on the fly to
%// save on any variable space that would be required otherwise.
%// Also perform the elementwise multiplication as discussed before.
sumData = Data.*sum(rand(N,numel(Data))<0.5,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'
return;
The same fast binary random array creator can also be applied to the original for-loop code, which gives a fair benchmark between the optimized for-loop and the vectorized code later on. The optimized for-loop code is listed here -
function RNGData = RNGAnsys_opt1(N,Data)
multData = zeros(N,numel(Data));
for i = 1:numel(Data)
%// Create N number of random 0's or 1's using a smart approach
%// Then, multiply each element in the molar data by the random numbers
multData(:,i) = Data(i) * (rand(N,1)<.5); %// parentheses needed: * binds tighter than <
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
return;
Benchmarking
Benchmarking Code
N = 15000; %// Kept at this value as it runs out of memory with higher N's.
%// Size of dataset is more important anyway as that decides how
%// well the vectorized code performs against the for-loop code
DS_arr = [50 100 200 500 800 1500 5000]; %// Dataset sizes
timeall = zeros(2,numel(DS_arr));
for k1 = 1:numel(DS_arr)
DS = DS_arr(k1);
Data = rand(1,DS);
f = @() RNGAnsys_opt1(N,Data);%// Optimized for-loop code
timeall(1,k1) = timeit(f);
clear f
f = @() RNGAnsys_vect2(N,Data);%// Vectorized Code
timeall(2,k1) = timeit(f);
clear f
end
%// Display benchmark results
figure,hold on, grid on
plot(DS_arr,timeall(1,:),'-ro')
plot(DS_arr,timeall(2,:),'-kx')
legend('Optimized for-loop code','Vectorized code')
xlabel('Dataset size ->'),ylabel('Time(sec) ->')
avg_speedup = mean(timeall(1,:)./timeall(2,:))
title(['Average Speedup with vectorized code = ' num2str(avg_speedup) 'x'])
Results
Concluding remarks
Based on the experience I had so far with MATLAB, neither for loops nor vectorized techniques are fit for all situations, but everything is situation-specific.
Try using the MATLAB profiler to determine which line or lines of code take the most time. That way you can find out whether the repmat function is what is slowing you down, as has been suggested. Let us know what you find, I'm interested!
randData = randint(N,length(DataSet));
allocates a very large array: 301*1000000 elements, which is about 2.4 GB if they are stored as 8-byte doubles (MATLAB's default numeric type). Implicitly you create up to 4 of these monsters in your program, causing continuous cache misses.
Your for-loop code could nearly run in the processor cache (or it does on the bigger Xeons).
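One way to keep the working set small without giving up vectorization is to process the random draws in row chunks and only keep the running column counts. This is a sketch under assumptions: chunk is a hypothetical tuning parameter, the sizes are reduced so it runs quickly, and rand(...)<0.5 is used as the 0/1 generator as in the answer above.

```matlab
N     = 100000;                  % total number of draws (reduced for the sketch)
Data  = rand(1,301);             % hypothetical 1-by-301 data row
chunk = 10000;                   % rows generated per pass (assumed value)

counts = zeros(1,numel(Data));   % running count of ones per column
for first = 1:chunk:N
    rowsNow = min(chunk, N - first + 1);          % last chunk may be shorter
    counts  = counts + sum(rand(rowsNow,numel(Data)) < 0.5, 1);
end

RNGData = (Data .* counts/(N/2))';   % same normalization as in the question
```

Each pass only holds a chunk-by-301 array, so peak memory is set by chunk rather than by N.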

How to reduce for-loops in this code?

I am doing calculations that involve too many for-loops. I would appreciate any idea that could eliminate some of the loops to make the algorithm more efficient. Here is the mathematical expression I want to get: a discrete distribution of the random variable Y.
$$\Pr(Y=y) = \sum_{z=0}^{5-y} \Pr(Z=z) \sum_{x=0}^{5} \Pr(X=x) \sum_{w=0}^{5} \Pr(W=w) \sum_{r=0}^{w} \Pr(R=r \mid W=w)\,\Pr(S=z+y-x-r \mid W=w)$$
Y, Z, X, W, R, S are discrete random variables, and they are dependent. I know the expression for each term, but these are just probability calculations, not closed-form distributions.
array Y[max_Y+1]; % store the distribution of Y
temp1=0, temp2=0, temp3=0, temp4=0; % summation for partial distributions
for y = 0 : max_Y
temp1=0;
for z = 0 : 5- y
temp2=0;
for x=0:5
temp3=0;
for w=0:5
temp4=0;
for r=0:w
temp4=temp4+Pr(R=r|W=w)∙Pr(S=z+y-x-r|W=w);
end
temp3=temp3+temp4*Pr(W=w);
end
temp2= temp2+temp3*Pr(X=x);
end
temp1=temp1+temp2*Pr(Z=z);
end
Y[y]=temp1;
end
Thanks a lot!
Ester
From what I can see, in every iteration only the terms Pr(S=z+y-x-r|W=w) and Pr(Z=z) depend on your input variable y, so all other values can be precomputed using separate for-loops. Then you only need to compute Pr(S=z+y-x-r|W=w)*Pr(Z=z)*precomputed.
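One way to act on this is to notice that the two innermost sums depend on y, z and x only through t = z+y-x, so they can be tabulated once as H(t) and reused. The sketch below assumes all variables take values 0..5, that Pr(S=s|W=w) is zero outside that range, and uses made-up probability tables (pZ, pX, pW, pRgW, pSgW are hypothetical names and values; index k+1 holds the probability of value k).

```matlab
maxV = 5;                                   % variables take the values 0..5
pZ = ones(1,maxV+1)/(maxV+1);               % Pr(Z=z), hypothetical
pX = ones(1,maxV+1)/(maxV+1);               % Pr(X=x), hypothetical
pW = ones(1,maxV+1)/(maxV+1);               % Pr(W=w), hypothetical
pRgW = zeros(maxV+1);                       % pRgW(w+1,r+1) = Pr(R=r|W=w)
pSgW = zeros(maxV+1);                       % pSgW(w+1,s+1) = Pr(S=s|W=w)
for w = 0:maxV
    pRgW(w+1,1:w+1) = 1/(w+1);              % uniform on 0..w (assumed)
    pSgW(w+1,:) = (1:maxV+1)/sum(1:maxV+1); % assumed shape, sums to 1
end

% H(t+1) = sum_w Pr(W=w) * sum_r Pr(R=r|W=w)*Pr(S=t-r|W=w), for t = 0..5.
% The inner sum over r is a discrete convolution of the two rows.
H = zeros(1,maxV+1);
for w = 0:maxV
    c = conv(pRgW(w+1,:), pSgW(w+1,:));
    H = H + pW(w+1)*c(1:maxV+1);
end

% Remaining sums over z and x; terms with t = z+y-x < 0 are zero and skipped
pY = zeros(1,maxV+1);
for y = 0:maxV
    for z = 0:maxV-y
        for x = 0:min(maxV, z+y)
            pY(y+1) = pY(y+1) + pZ(z+1)*pX(x+1)*H(z+y-x+1);
        end
    end
end
```

This replaces the five nested loops by a precomputation over w and three short loops; with real tables only the definitions of pZ, pX, pW, pRgW and pSgW would change.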

Matlab - if exists a faster way to assign values to big matrix?

I am a new student learning to use Matlab.
Could anyone please tell me whether there is a faster way, possibly without loops, to assign only two values, 1 and -1, to different positions in each row of a big sparse matrix?
My code builds a matrix (bimatrix or bibimatrix) for the following condition of a MILP problem:
f^k_{ij} <= y_{ij} for every arc (i,j) and all k ~= r, in a multi-commodity flow model.
Naive approach:
bimatrix=[];
% create each row and then add to bimatrix
newrow4= zeros(1,n*(n+1)^2);
for k=1:n
for i=0:n
for j=1: n
if j~=i
%change value of some positions to -1 and 1
newrow4(i*n^2+(j-1)*n+k)=1;
newrow4((n+1)*n^2+i*n+j)=-1;
% add to bimatrix
bimatrix=[bimatrix; newrow4];
% change newrow4 back to zeros row.
newrow4(i*n^2+(j-1)*n+k)=0;
newrow4((n+1)*n^2+i*n+j)=0;
end
end
end
end
OR:
% Generate the big sparse matrix first.
bibimatrix=zeros(n^3 ,n*(n+1)^2);
t=1;
for k=1:n
for i=0:n
for j=1: n
if j~=i
%Change 2 positions in each row to -1 and 1 in each row.
bibimatrix(t,i*n^2+(j-1)*n+k)=1;
bibimatrix(t,(n+1)*n^2+i*n+j)=-1;
t=t+1
end
end
end
end
With the code above, generating this matrix in MATLAB with n~12 takes more than 3 s. I need to generate a larger matrix in less time.
Thank you.
Suggestion: Use sparse matrices.
You should be able to create two vectors containing the column number where you want your +1 and -1 in each row. Let's call these two vectors vec_1 and vec_2. You should be able to do this without loops (if not, I still think the procedure below will be faster).
Let the size of your matrix be (max_row X max_col). Then you can create your matrix like this:
bibimatrix = sparse(1:max_row,vec_1,1,max_row,max_col);
bibimatrix = bibimatrix + sparse(1:max_row, vec_2,-1,max_row,max_col)
If you want to see the entire matrix (which you don't, since it's huge) you can write: full(bibimatrix).
EDIT:
You may also do it this way:
col_vec = [vec_1, vec_2];
row_vec = [1:max_row, 1:max_row];
s = [ones(1,max_row), -1*ones(1,max_row)];
bibimatrix = sparse(row_vec, col_vec, s, max_row, max_col)
Disclaimer: I don't have MATLAB available, so it might not be error-free.
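For the index pattern in the question, vec_1 and vec_2 can be built without loops. A minimal sketch, assuming the same row order as the nested loops (k outermost, then i, then j) and using ndgrid to enumerate the index triples; J, I, K, keep and rows are names introduced here for illustration.

```matlab
n = 3;                                   % small size for checking

% Enumerate (j,i,k) with j varying fastest, then i, then k: the loop order
[J,I,K] = ndgrid(1:n, 0:n, 1:n);
keep = (J ~= I);                         % the loop skips j == i
J = J(keep); I = I(keep); K = K(keep);   % column vectors, order preserved

vec_1 = I*n^2 + (J-1)*n + K;             % column of the +1 in each row
vec_2 = (n+1)*n^2 + I*n + J;             % column of the -1 in each row
rows  = (1:numel(vec_1))';               % one matrix row per kept triple

bibimatrix = sparse([rows; rows], [vec_1; vec_2], ...
                    [ones(size(rows)); -ones(size(rows))], ...
                    n^3, n*(n+1)^2);
```

Calling sparse once with all row, column and value triplets avoids both the repeated concatenation and the dense zeros(...) allocation from the question.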

Resources