Vectorization of Matlab Code involving ODE solver at each iteration - performance

I want to write fast MATLAB code in which I need a for loop that solves an ordinary differential equation on every iteration. Is there any way to vectorize the code?
Following is the relevant part of the code:
tspan = 0:0.01:20;
dw = rand(ns,1);
M0 = repmat([0 0 1], ns, 1)';
for p = 1:ns
    [t, M(:,:,p)] = ode45(@(t,M) testfun(t,M,dw(p)), tspan, M0(:,p));
end
where
function dM = testfun(t,M,w1)
    M_x = M(1);
    M_y = M(2);
    M_z = M(3);
    dM = [w1*M_y; -w1*M_x + w1*M_z - 2*w1*M_y; -w1*M_y - (1-M_z)];
end

Try this and let me know how it works. The idea is to stack all ns three-state systems into one big state vector, so ode45 is called only once; the systems are interleaved as [Mx1; My1; Mz1; Mx2; My2; Mz2; ...], which the 1:3:end indexing below relies on.
Right-hand side of the ODE system:
function dM = testfun(t,M,w1)
    dM = zeros(length(M), 1);
    M_x = M(1:3:end, 1);
    M_y = M(2:3:end, 1);
    M_z = M(3:3:end, 1);
    dM(1:3:end) = w1.*M_y;
    dM(2:3:end) = -w1.*M_x - 2*w1.*M_y + w1.*M_z;
    dM(3:3:end) = -w1.*M_y - (1-M_z);
end
Main program:
clear all
clc
ns = input('Please tell me how many times you need to integrate the ODE system: ');
tspan = 0:0.01:20;
dw = rand(ns,1);
M0 = repmat([0; 0; 1], 1, ns);
[t, my_M] = ode45(@(t,my_M) testfun(t,my_M,dw), tspan, M0);
s = size(my_M);
for i = 1:ns
    M(:, 1:s(2)/ns, i) = my_M(:, s(2)/ns*(i-1)+1 : s(2)/ns*i);
end
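As a side note, the slicing loop at the end can be replaced by a single reshape (a sketch; it relies on ode45 flattening the matrix initial condition M0 column by column, so each consecutive group of three columns of my_M belongs to one system):
M = reshape(my_M, length(t), 3, ns);   % same 3-D layout as the loop builds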

Can anyone explain how different this hybrid PSOGA is from a normal GA?

Does this code have mutation, selection, and crossover, just like the original genetic algorithm? Since this is a hybrid algorithm (i.e. PSO with GA), does it use all the steps of the original GA or does it skip some of them? Please do tell me.
I am just new to this and still trying to understand. Thank you.
%%% Hybrid GA and PSO code
function [gbest, gBestScore, all_scores] = QAP_PSO_GA(CreatePopFcn, FitnessFcn, UpdatePosition, ...
                                                      nCity, nPlant, nPopSize, nIters)
    % Set algorithm parameters
    constant = 0.95;
    c1 = 1.5; %1.4944; %2;
    c2 = 1.5; %1.4944; %2;
    w = 0.792 * constant;
    % Allocate memory and initialize
    gBestScore = inf;
    all_scores = inf * ones(nPopSize, nIters);
    x = CreatePopFcn(nPopSize, nCity);
    v = zeros(nPopSize, nCity);
    pbest = x;
    % update lbest
    cost_p = inf * ones(1, nPopSize); %feval(FUN, pbest');
    for i = 1:nPopSize
        cost_p(i) = FitnessFcn(pbest(i, 1:nPlant));
    end
    lbest = update_lbest(cost_p, pbest, nPopSize);
    for iter = 1:nIters
        if mod(iter,1000) == 0
            parents = randperm(nPopSize);
            for i = 1:nPopSize
                x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
                % v(i,:) = pbest(parents(i),:) - x(i,:);
                % v(i,:) = (v(i,:) + v(parents(i),:))/2;
            end
        else
            % Update velocity
            v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
            % Update position
            x = x + v;
            x = UpdatePosition(x);
        end
        % Update pbest
        cost_x = inf * ones(1, nPopSize);
        for i = 1:nPopSize
            cost_x(i) = FitnessFcn(x(i, 1:nPlant));
        end
        s = cost_x < cost_p;
        cost_p = (1-s).*cost_p + s.*cost_x;
        s = repmat(s', 1, nCity);
        pbest = (1-s).*pbest + s.*x;
        % update lbest
        lbest = update_lbest(cost_p, pbest, nPopSize);
        % update global best
        all_scores(:, iter) = cost_x;
        [cost, index] = min(cost_p);
        if (cost < gBestScore)
            gbest = pbest(index, :);
            gBestScore = cost;
        end
        % draw current fitness
        figure(1);
        plot(iter, min(cost_x), 'cp', 'MarkerEdgeColor','k', 'MarkerFaceColor','g', 'MarkerSize',8)
        hold on
        str = strcat('Best fitness: ', num2str(min(cost_x)));
        disp(str);
    end
end

% Function to update lbest
function lbest = update_lbest(cost_p, x, nPopSize)
    sm(1, 1) = cost_p(1, nPopSize);
    sm(1, 2:3) = cost_p(1, 1:2);
    [cost, index] = min(sm);
    if index == 1
        lbest(1, :) = x(nPopSize, :);
    else
        lbest(1, :) = x(index-1, :);
    end
    for i = 2:nPopSize-1
        sm(1, 1:3) = cost_p(1, i-1:i+1);
        [cost, index] = min(sm);
        lbest(i, :) = x(i+index-2, :);
    end
    sm(1, 1:2) = cost_p(1, nPopSize-1:nPopSize);
    sm(1, 3) = cost_p(1, 1);
    [cost, index] = min(sm);
    if index == 3
        lbest(nPopSize, :) = x(1, :);
    else
        lbest(nPopSize, :) = x(nPopSize-2+index, :);
    end
end
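To experiment with the code, it can be called with simple stand-in handles (a sketch; CreatePopFcn, FitnessFcn and UpdatePosition here are hypothetical toy versions for a continuous test problem, not the real QAP operators):
CreatePopFcn   = @(n,d) rand(n,d);        % hypothetical: random initial population
FitnessFcn     = @(x) sum(x.^2);          % hypothetical: sphere test function
UpdatePosition = @(x) min(max(x,0),1);    % hypothetical: clamp positions to [0,1]
[gbest, gBestScore, all_scores] = QAP_PSO_GA(CreatePopFcn, FitnessFcn, ...
    UpdatePosition, 10, 10, 30, 200);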
If you are new to optimization, I recommend that you first study each algorithm separately, and only then study how GA and PSO may be combined. You will also need basic mathematical skills to understand the operators of the two algorithms and to test their efficiency (which is what really matters).
This code chunk is responsible for parent selection and crossover:
parents = randperm(nPopSize);
for i = 1:nPopSize
    x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
    % v(i,:) = pbest(parents(i),:) - x(i,:);
    % v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
It is not really obvious how the randperm-based selection works (I have little experience with MATLAB).
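For what it's worth, randperm(n) returns a random permutation of the integers 1..n, so each particle i is simply paired with one random partner parents(i); a tiny illustration:
nPopSize = 4;
parents = randperm(nPopSize)   % e.g. [3 1 4 2]
% particle 1 is then crossed with particle parents(1) = 3, and so on;
% the child is the elementwise average of the two personal bests:
% x(i,:) = (pbest(i,:) + pbest(parents(i),:)) / 2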
And this is the code that is responsible for updating the velocity and position of each particle:
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
This version of the velocity update uses what is called an inertia weight w, which basically means we preserve part of each particle's velocity history rather than recomputing it from scratch.
It is worth mentioning that the velocity update happens far more often than crossover, which only occurs every 1000 iterations.
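Written out per particle i, this is the standard inertia-weight update (with the ring-neighborhood lbest of this code in place of the more common global best; r1 and r2 stand for fresh uniform random numbers drawn for every particle and dimension):
v(i,:) = w*v(i,:) + c1*r1.*(pbest(i,:) - x(i,:)) + c2*r2.*(lbest(i,:) - x(i,:));
x(i,:) = x(i,:) + v(i,:);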

The Levenberg-Marquardt method for solving non-linear equations

I tried to implement the Levenberg-Marquardt method for solving non-linear equations in Julia, based on the Numerical Optimization using the
Levenberg-Marquardt Algorithm presentation. This is my code:
function get_J(ArrOfFunc,X,delta)
    N = length(ArrOfFunc)
    J = zeros(Float64,N,N)
    for i = 1:N
        for j = 1:N
            Temp = copy(X);
            Temp[j] = Temp[j] + delta;
            J[i,j] = (ArrOfFunc[i](Temp) - ArrOfFunc[i](X))/delta;
        end
    end
    return J
end

function get_resudial(ArrOfFunc,Arg)
    return map((x)->x(Arg),ArrOfFunc)
end

function lm_solve(Funcs,Init)
    X = copy(Init)
    delta = 0.01;
    Lambda = 0.01;
    Factor = 2;
    J = get_J(Funcs,X,delta)
    R = get_resudial(Funcs,X)
    N = 5
    for t = 1:N
        G = J'*J + Lambda.*eye(length(X))
        dC = J'*R
        C = sum(R.*R)/2;
        Xnew = X - (inv(G)\dC);
        Rnew = get_resudial(Funcs,Xnew)
        Cnew = sum(Rnew.*Rnew)/2;
        if (Cnew < C)
            X = Xnew;
            R = Rnew;
            Lambda = Lambda/Factor;
            J = get_J(Funcs,X,delta)
        else
            Lambda = Lambda*Factor;
        end
        if (maximum(abs(Rnew)) < 0.001)
            return X
        end
    end
    return X
end

function test()
    ArrOfFunc = [
        (X)->X[1]+X[2]-2;
        (X)->X[1]-X[2]
    ];
    X = lm_solve(ArrOfFunc,Float64[3;3])
    println(X)
    return X
end
But from any starting point the step is not accepted. What am I doing wrong?
Any help would be appreciated.
I have no way to test this at the moment, but one line does not make sense mathematically:
in the computation of Xnew it should be either inv(G)*dC or G\dC, but not a mix of both. Preferably the second, since solving a linear system does not require computing the inverse matrix.
With this one wrong calculation at the center of the iteration, the trajectory of the computation is almost surely going astray.
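Concretely, the step computation would read (a one-line fix, keeping the rest of lm_solve unchanged):
Xnew = X - (G\dC);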

Reduce the calculation time for the MATLAB code

To calculate an enhancement function for an input image I have written the following piece of code:
Ig = rgb2gray(imread('test.png'));
N = numel(Ig);
meanTotal = mean2(Ig);
[row,cal] = size(Ig);
IgTransformed = Ig;
n = 3;
a = 1;
b = 1;
c = 1;
k = 1;
for ii = 2:row-1
    for jj = 2:cal-1
        window = Ig(ii-1:ii+1, jj-1:jj+1);
        IgTransformed(ii,jj) = ((k*meanTotal)/(std2(window) + b)) * ...
            abs(Ig(ii,jj) - c*mean2(window)) + mean2(window).^a;
    end
end
How can I reduce the calculation time?
Obviously, one of the factors is the small (3x3) window that has to be extracted inside the loop on every iteration.
Here you go -
Igd = double(Ig);
std2v = colfilt(Igd, [3 3], 'sliding', @std);
mean2v = conv2(Igd, ones(3), 'same')/9;
Ig_out = uint8((k*meanTotal)./(std2v + b).*abs(Igd - c*mean2v) + mean2v.^a);
This will change the boundary elements too; if that is not desired, they can be set back to the original values with a few additional steps, like so -
Ig_out(:,[1 end]) = Ig(:,[1 end]);
Ig_out([1 end],:) = Ig([1 end],:);
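A quick sanity check (a sketch; it assumes the loop version from the question has been run first so that IgTransformed exists with the same constants):
err = max(max(abs(double(Ig_out(2:end-1,2:end-1)) - ...
              double(IgTransformed(2:end-1,2:end-1)))))
% err should be 0, save perhaps for off-by-one rounding at the uint8 conversion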

Replace repmat with bsxfun in MATLAB

In the following function I want to make some changes to make it faster. By itself it is fast, but I have to use it many times in a for loop, so it takes a long time. I think replacing the repmat calls with bsxfun would make it faster, but I am not sure. How can I do these replacements?
function out = lagcal(y1,y1k,source)
    kn1 = y1(:);
    kt1 = y1k(:);
    kt1x = repmat(kt1,1,length(kt1));
    eq11 = 1./(prod(kt1x - kt1x' + eye(length(kt1))));
    eq1 = eq11'*eq11;
    dist = repmat(kn1,1,length(kt1)) - repmat(kt1',length(kn1),1);
    [fixi,fixj] = find(dist==0); dist(fixi,fixj) = eps;
    mult = 1./(dist);
    eq2 = prod(dist,2);
    eq22 = repmat(eq2,1,length(kt1));
    eq222 = eq22 .* mult;
    out = eq1 .* (eq222'*source*eq222);
end
Does it really speed up my function?
Introduction and code changes
All the repmat usages in the function code expand the inputs to sizes at which the subsequent mathematical operations can be performed elementwise. This is a tailor-made situation for bsxfun. Sadly, though, the real bottleneck of the function seems to be something else. Stay on as we discuss all the performance-related aspects of the code.
Code with repmat replaced by bsxfun is presented next; the replaced lines are kept as comments for comparison -
function out = lagcal(y1,y1k,source)
    kn1 = y1(:);
    kt1 = y1k(:);
    %// kt1x = repmat(kt1,1,length(kt1));
    %// eq11 = 1./(prod(kt1x-kt1x'+eye(length(kt1))));
    eq11 = 1./prod(bsxfun(@minus,kt1,kt1.') + eye(numel(kt1)));
    eq1 = eq11'*eq11;
    %// dist = repmat(kn1,1,length(kt1))-repmat(kt1',length(kn1),1);
    dist = bsxfun(@minus,kn1,kt1.');
    [fixi,fixj] = find(dist==0);
    dist(fixi,fixj) = eps;
    mult = 1./(dist);
    eq2 = prod(dist,2);
    %// eq22 = repmat(eq2,1,length(kt1));
    %// eq222 = eq22 .* mult;
    eq222 = bsxfun(@times,eq2,mult);
    out = eq1 .* (eq222'*source*eq222);
return; %// Better this way to end a function
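If you want to try the function quickly, the inputs just have to be of compatible sizes (a sketch; as far as the sizes in the code dictate, y1 supplies n evaluation points, y1k supplies m nodes, and source must be n-by-n, giving an m-by-m output):
y1 = rand(10,1);               %// n = 10 evaluation points
y1k = rand(5,1);               %// m = 5 nodes
source = rand(10);             %// must be n-by-n
out = lagcal(y1, y1k, source); %// out is 5-by-5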
One more modification could be made here. In the last line, we could do something like the following, but the timing results don't show a huge benefit from it -
out = bsxfun(@times,eq11.',bsxfun(@times,eq11,eq222'*source*eq222));
This avoids the earlier calculation of eq1 in the original code, so you would save a little more time that way.
Benchmarking
Benchmarking of the bsxfun-modified portions of the code against the original repmat-based code is discussed next.
Benchmarking Code
N_arr = [50 100 200 500 1000 2000 3000]; %// array elements for N (datasize)
blocks = 3;
timeall = zeros(2,numel(N_arr),blocks);
for k1 = 1:numel(N_arr)
    N = N_arr(k1);
    y1 = rand(N,1);
    y1k = rand(N,1);
    source = rand(N);
    kn1 = y1(:);
    kt1 = y1k(:);

    %% Block 1 ----------------
    block = 1;
    f = @() block1_org(kt1);
    timeall(1,k1,block) = timeit(f);
    clear f
    f = @() block1_mod(kt1);
    timeall(2,k1,block) = timeit(f);
    eq11 = feval(f);
    clear f
    %% Block 1 ----------------

    eq1 = eq11'*eq11;

    %% Block 2 ----------------
    block = 2;
    f = @() block2_org(kn1,kt1);
    timeall(1,k1,block) = timeit(f);
    clear f
    f = @() block2_mod(kn1,kt1);
    timeall(2,k1,block) = timeit(f);
    dist = feval(f);
    clear f
    %% Block 2 ----------------

    [fixi,fixj] = find(dist==0);
    dist(fixi,fixj) = eps;
    mult = 1./(dist);
    eq2 = prod(dist,2);

    %% Block 3 ----------------
    block = 3;
    f = @() block3_org(eq2,mult,length(kt1));
    timeall(1,k1,block) = timeit(f);
    clear f
    f = @() block3_mod(eq2,mult);
    timeall(2,k1,block) = timeit(f);
    clear f
    %% Block 3 ----------------
end

%// Display benchmark results
figure,
for k2 = 1:blocks
    subplot(blocks,1,k2),
    title(strcat('Block',num2str(k2),' results :'),'fontweight','bold'), hold on
    plot(N_arr,timeall(1,:,k2),'-ro')
    plot(N_arr,timeall(2,:,k2),'-kx')
    legend('REPMAT Method','BSXFUN Method')
    xlabel('Datasize (N) ->'), ylabel('Time(sec) ->')
end
Associated functions
function out = block1_org(kt1)
    kt1x = repmat(kt1,1,length(kt1));
    out = 1./(prod(kt1x - kt1x' + eye(length(kt1))));
return;

function out = block1_mod(kt1)
    out = 1./prod(bsxfun(@minus,kt1,kt1.') + eye(numel(kt1)));
return;

function out = block2_org(kn1,kt1)
    out = repmat(kn1,1,length(kt1)) - repmat(kt1',length(kn1),1);
return;

function out = block2_mod(kn1,kt1)
    out = bsxfun(@minus,kn1,kt1.');
return;

function out = block3_org(eq2,mult,length_kt1)
    eq22 = repmat(eq2,1,length_kt1);
    out = eq22 .* mult;
return;

function out = block3_mod(eq2,mult)
    out = bsxfun(@times,eq2,mult);
return;
Results
[Benchmark plots of runtime versus datasize for the three blocks, comparing the REPMAT and BSXFUN methods, are not reproduced here.]
Conclusions
The bsxfun-based code shows around a 2x speedup over the repmat-based code, which is encouraging. But profiling the original code across varying datasizes shows that the multiple matrix multiplications in the final line occupy most of the function's runtime, and those are already very efficient within MATLAB. Unless you have some way to avoid those multiplications by using some other mathematical technique, they look like the bottleneck.

Simple Speed-up of code

It is well known that MATLAB is slow with for loops. I have tried to vectorize the following code without success. Perhaps I am wrong in the implementation.
for I = NS2:-1:1
    A = 0;
    for J = 1:8
        A = A + KS2(J,I)*FA(J);
    end
    S2 = S2 + ( SS2(1,I)*sin(A) + SS2(2,I)*cos(A) );
end
where:
FA  = 1x8 matrix
KS2 = 8x25 matrix
SS2 = 2x25 matrix
A   = scalar
S2  = scalar
I tried to improve it in this way:
A = 0;
J = 1:8;
for I = NS2:-1:1
    A = FA(1,J)*KS2(J,I);
    S2 = S2 + ( SS2(1,I)*sin(A) + SS2(2,I)*cos(A) );
end
However, the runtime for this improvement is similar to the original code.
Try this instead (no loops):
A = (FA*KS2).';  % A is now 25-by-1
S2 = SS2(1,:)*sin(A) + SS2(2,:)*cos(A);
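A quick way to convince yourself the two agree (a sketch with random data of the sizes stated above; NS2 = 25 is assumed so the loop covers every column, and S2 starts from zero):
FA = rand(1,8); KS2 = rand(8,25); SS2 = rand(2,25); NS2 = 25;
S2loop = 0;
for I = NS2:-1:1
    A = FA*KS2(:,I);
    S2loop = S2loop + SS2(1,I)*sin(A) + SS2(2,I)*cos(A);
end
A = (FA*KS2).';
S2 = SS2(1,:)*sin(A) + SS2(2,:)*cos(A);
abs(S2 - S2loop)   % should be ~0 up to floating-point round-off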
