Performance of updating/inserting into a sparse matrix in Matlab?

I have written a fairly large class for the calculation of measurement uncertainties, but it is painfully slow. Profiling the code shows that the slowest operation, by far, is inserting the computation results into a large sparse matrix: about 97% of all time is spent on that operation. The matrix keeps the uncertainties of all measurement data, and I cannot change the data structures without breaking a lot of other code, so my only option is to optimize the data insertion step. This is done about 5700 times in my benchmark, and every time the amount of data increases.
First solution, extremely slow:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), numel(obj.w));
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), numel(obj.w));
obj.w = [obj.w; z(:)]; % insert new measurement results into the column vector obj.w
obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; % insert new uncertainties of the measurement results
obj.dw_dw = [obj.dw_dw, transpose(dz_dw); dz_dw, dz_dz]; % insert "dependencies" of new measurements on old results
The line obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; is the slowest, by far. Perhaps it is slow because Matlab needs to allocate a new, larger buffer for obj.Uww and copy everything over. Thus I changed the code to the following:
% Preallocation in the class constructor
obj.w = spalloc(nnz_w, 1, nnz_w);
obj.Uww = spalloc(nnz_w, nnz_w, nnz_Uww);
obj.dw_dw = spalloc(nnz_w, nnz_w, nnz_dw_dw);
obj.num_w = 0; % to manually keep track of the "real" size of obj.w, obj.Uww and obj.dw_dw
The class constructor is called with the sizes the three properties w, Uww and dw_dw will have at the end of the computation (nnz_w is approximately 0.1 million, nnz_Uww is about 9 million and nnz_dw_dw is about 1.6 million). Thus, no new allocation of memory should be needed. This is the inserting step now:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), obj.num_w);
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), obj.num_w);
wInd = 1:obj.num_w;
obj.w(zInd, 1) = z(:); % insert new measurement results (zInd indexes the new rows, presumably obj.num_w + (1:numel(z)))
obj.num_w = zInd(end); % new "real" size of w, Uww and dw_dw
obj.Uww(zInd, wInd) = Uzw; % about 51% of all computation time
obj.Uww(wInd, zInd) = transpose(Uzw); % about 15% of all computation time
obj.Uww(zInd, zInd) = Uzz; % about 14.4% of all computation time
obj.dw_dw(zInd, wInd) = dz_dw; % about 13% of all computation time
obj.dw_dw(wInd, zInd) = transpose(dz_dw); % about 3.5% of all computation time
obj.dw_dw(zInd, zInd) = dz_dz; % less than 3.5% of all computation time
Still, these lines account for 97% of all computation time, with no speed improvement. Thus I tried a third version:
obj.w = [obj.w; z(:)];
[zInd_zy, yInd_zy] = ndgrid(zInd, yInd(:));
[zzInd_i, zzInd_j] = ndgrid(zInd, zInd);
[Uww_i, Uww_j, Uww_v] = find(obj.Uww); % 14% of all computation time
Uww_new = sparse( ... % this statement takes 66% of all computation time
    [Uww_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [Uww_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [Uww_v; Uzy(:); Uzy(:); Uzz(:)], ...
    numel(obj.w), numel(obj.w));
[dw_dw_i, dw_dw_j, dw_dw_v] = find(obj.dw_dw);
dw_dw_new = sparse( ... % 14% of all computation time
    [dw_dw_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
    [dw_dw_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
    [dw_dw_v; dz_dy(:); dz_dy(:); dz_dz(:)], ...
    numel(obj.w), numel(obj.w));
obj.Uww = Uww_new;
obj.dw_dw = dw_dw_new;
which is even slower than the other two versions. Why is inserting into an already preallocated array so slow? And how can I speed it up?
(All the matrices are symmetric, but I did not try to exploit that yet.)

I don't understand the details of your update pattern, but keep in mind that Matlab stores sparse matrices internally in compressed-sparse column format. So adding entries in sequence column-by-column is significantly faster than other orders. E.g., on my old version of Matlab (R2006a), this:
n = 10000;
nz = 400000;
v = floor(n*rand(nz,3)) + 1;
fprintf('Random\n');
A = sparse(n, n);
tic
for k = 1:nz
    A(v(k,1), v(k,2)) = v(k,3);
end
toc
fprintf('Row-wise\n');
v = sortrows(v);
A = sparse(n, n);
tic
for k = 1:nz
    A(v(k,1), v(k,2)) = v(k,3);
end
toc
fprintf('Column-wise\n');
v = sortrows(v, [2 1]);
A = sparse(n, n);
tic
for k = 1:nz
    A(v(k,1), v(k,2)) = v(k,3);
end
toc
gives this:
>> sparsetest
Random
Elapsed time is 19.276089 seconds.
Row-wise
Elapsed time is 20.714324 seconds.
Column-wise
Elapsed time is 1.498150 seconds.
Likely best of all would be if you can somehow just collect the nonzeros in a form suitable for spconvert or sparse and then make the whole sparse matrix at the end, but I gather that you might not be able to do that.
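For illustration, here is a minimal sketch of that collect-then-build pattern (the names cap, newI/newJ/newV, nRows and nCols are made up for the example):
% Accumulate triplets in preallocated full vectors, then build once at the end.
cap = 9e6;                       % upper bound on the total number of entries (assumption)
I = zeros(cap, 1); J = zeros(cap, 1); V = zeros(cap, 1);
top = 0;
% ... inside each update step, append the new entries:
k = numel(newI);                 % newI, newJ, newV hold this step's triplets
I(top+1:top+k) = newI(:);
J(top+1:top+k) = newJ(:);
V(top+1:top+k) = newV(:);
top = top + k;
% ... after the last step, create the sparse matrix in a single call
% (duplicate (i,j) pairs are summed automatically, as in the question):
A = sparse(I(1:top), J(1:top), V(1:top), nRows, nCols);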

@bg2b pointed out how adding data column-wise is much faster.
It turns out that adding rows to a sparse matrix or sparse vector (more precisely, writing into the lower triangular part) is painfully slow. Thus, I now store only the upper triangular part of the sparse matrix, because inserting there is fast. When I need data from the matrix, I recreate it from the upper triangular sparse matrix; see the end of the benchmarking script for this.
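For reference, the reconstruction is a one-liner (a sketch; Uww_triu is assumed to hold the upper triangular part including the diagonal):
% Rebuild the full symmetric matrix from its upper triangular storage;
% triu(Uww_triu, 1) is the strictly upper part, so transposing it fills
% the lower triangle without duplicating the diagonal.
Uww_full = Uww_triu + transpose(triu(Uww_triu, 1));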
This is my benchmarking script. It nicely shows the steep, superlinear increase in computation time as data is added.
% Benchmark the extension of sparse matrices of the form
% Uww_new = [Uww, Uwz; ...
%            Uzw, Uzz];
% where Uzw = transpose(Uwz). Uww and Uzz are always square and symmetric.
close all;
clearvars;
rng(70101557, 'twister'); % the seed is the number of the stack overflow question
density = 0.25;
nZ = 10;
iterations = 5e2;
n = nZ * iterations;
nonzeros = n*n*density;
all_Uzz = cell(iterations, 1);
all_Uwz = cell(iterations, 1);
for k = 1:iterations
    % Uzz must be symmetric!
    Uzz_nonsymmetric = sprand(nZ, nZ, density);
    all_Uzz{k} = (Uzz_nonsymmetric + transpose(Uzz_nonsymmetric))./2;
    all_Uwz{k} = sprand((k-1)*nZ, nZ, density);
end
f = figure();
ax = axes(f);
hold(ax, 'on');
grid(ax, 'on');
xlabel(ax, 'Iteration');
ylabel(ax, 'Elapsed time in seconds.');
h = gobjects(1, 0);
name = 'Optimised.';
fprintf('%s\n', name);
Uww_optimised = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_optimised, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_optimised, 1);
    Uzz_triu = triu(Uzz);
    Uww_optimised(wInd, zInd) = Uwz; % add columns
    Uww_optimised(zInd, zInd) = Uzz_triu; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uwz and Uzz.';
fprintf('%s\n', name);
Uww_wz_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_wz_zz, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_wz_zz, 1);
    Uzw = transpose(Uwz);
    Uww_wz_zz(wInd, zInd) = Uwz; % add columns
    % Uww_wz_zz(zInd, wInd) = Uzw; % add rows
    Uww_wz_zz(zInd, zInd) = Uzz; % add rows and columns
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uzw and Uzz.';
fprintf('%s\n', name);
Uww_zw_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww_zw_zz, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww_zw_zz, 1);
    Uzw = transpose(Uwz);
    % Uww_zw_zz(wInd, zInd) = Uwz;
    Uww_zw_zz(zInd, wInd) = Uzw;
    Uww_zw_zz(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Uzw, Uwz and Uzz.';
fprintf('%s\n', name);
Uww = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
    Uzz = all_Uzz{k};
    Uwz = all_Uwz{k};
    zInd = size(Uww, 1) + (1:size(Uzz, 1));
    wInd = 1:size(Uww, 1);
    Uzw = transpose(Uwz);
    Uww(wInd, zInd) = Uwz;
    Uww(zInd, wInd) = Uzw;
    Uww(zInd, zInd) = Uzz;
    elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
leg = legend(ax, h, 'Location', 'northwest');
assert(issymmetric(Uww));
assert(istriu(Uww_optimised));
assert(isequal(Uww, Uww_optimised + transpose(triu(Uww_optimised, 1))));
%% Get Uyy from Uww_optimised. Uyy is a symmetric subset of Uww
yInd = randi(size(Uww_optimised, 1), 1, nZ); % indices to extract
[yIndRowInds, yIndColInds] = ndgrid(yInd, yInd);
indsToFlip = yIndRowInds > yIndColInds;
temp = yIndColInds(indsToFlip);
yIndColInds(indsToFlip) = yIndRowInds(indsToFlip);
yIndRowInds(indsToFlip) = temp;
linInd = sub2ind(size(Uww_optimised), yIndRowInds, yIndColInds);
assert(issymmetric(linInd));
Uyy = Uww_optimised(linInd);
assert(issymmetric(Uyy));

Related

Is there a faster alternative to find() function in MATLAB?

I'm running a kinetic Monte Carlo simulation in which I have a large sparse array; I first calculate its cumsum() and then find the first element greater than or equal to a given value using find().
vecIndex = find(cumsum(R) >= threshold, 1);
Since I'm calling the function a large number of times, I'd like to speed up my code. Is there a faster way to carry out this operation?
The complete function:
function Tr = select_transition(Fr, Rt, R)
N_dep = (1/(Rt+1))*Fr;      % N flux-rate
Ga_dep = (1-(1/(Rt+1)))*Fr; % Ga flux-rate
Tr = zeros(4,1);
RVec = R(:, :, :, 3);
RVec = RVec(:);
sumR = Fr + sum(RVec); % sum of the rates of all possible transitions
format long
sumRx = rand * sumR; % for randomly selecting one of the transitions
%disp(sumRx);
if sumRx <= Fr % adatom addition
    Tr(1) = 0;
    if sumRx <= Ga_dep
        Tr(2) = 10; % Ga deposition
    elseif sumRx > Ga_dep
        Tr(2) = -10; % N deposition
    end
else
    Tr(1) = 1; % adatom hopping
    vecIndex = find(cumsum(RVec) >= sumRx - Fr, 1);
    [Tr(2), Tr(3), Tr(4)] = ind2sub(size(R(:, :, :, 3)), vecIndex); % determines specific hopping transition
end
end
If RVec is sparse, it is more efficient to extract its nonzero values and the corresponding indices and apply cumsum to those values:
Tr(1) = 1;
RVec = R(:, :, :, 3);  % note: keep the 3-D slice here, do not flatten it
[r, c, v] = find(RVec); % extract nonzeros; for an N-D array, c is a linear index over the trailing dimensions
cum = cumsum(v);        % cumsum over the stored values only
f = find(cum >= sumRx - Fr, 1);
Tr(2) = r(f);
sz = size(R);
[Tr(3), Tr(4)] = ind2sub(sz(2:3), c(f));
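For what it's worth, MATLAB sparse storage is two-dimensional only, so if R is a 4-D array the slice R(:, :, :, 3) will in practice be a full array; the trick still pays off whenever the slice contains many zeros, because find extracts only the nonzero entries and cumsum then runs over far fewer elements.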

Can anyone explain how different this hybrid PSOGA is from a normal GA?

Does this code have mutation, selection, and crossover, just like the original genetic algorithm? Since this is a hybrid algorithm (i.e. PSO combined with GA), does it use all the steps of the original GA or does it skip some of them? Please do tell me.
I am just new to this and still trying to understand. Thank you.
%%% Hybrid GA and PSO code
function [gbest, gBestScore, all_scores] = QAP_PSO_GA(CreatePopFcn, FitnessFcn, UpdatePosition, ...
nCity, nPlant, nPopSize, nIters)
% Set algorithm parameters
constant = 0.95;
c1 = 1.5; %1.4944; %2;
c2 = 1.5; %1.4944; %2;
w = 0.792 * constant;
% Allocate memory and initialize
gBestScore = inf;
all_scores = inf * ones(nPopSize, nIters);
x = CreatePopFcn(nPopSize, nCity);
v = zeros(nPopSize, nCity);
pbest = x;
% update lbest
cost_p = inf * ones(1, nPopSize); %feval(FUN, pbest');
for i = 1:nPopSize
    cost_p(i) = FitnessFcn(pbest(i, 1:nPlant));
end
lbest = update_lbest(cost_p, pbest, nPopSize);
for iter = 1:nIters
    if mod(iter, 1000) == 0
        parents = randperm(nPopSize);
        for i = 1:nPopSize
            x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
            % v(i,:) = pbest(parents(i),:) - x(i,:);
            % v(i,:) = (v(i,:) + v(parents(i),:))/2;
        end
    else
        % Update velocity
        v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
        % Update position
        x = x + v;
        x = UpdatePosition(x);
    end
    % Update pbest
    cost_x = inf * ones(1, nPopSize);
    for i = 1:nPopSize
        cost_x(i) = FitnessFcn(x(i, 1:nPlant));
    end
    s = cost_x < cost_p;
    cost_p = (1-s).*cost_p + s.*cost_x;
    s = repmat(s', 1, nCity);
    pbest = (1-s).*pbest + s.*x;
    % update lbest
    lbest = update_lbest(cost_p, pbest, nPopSize);
    % update global best
    all_scores(:, iter) = cost_x;
    [cost, index] = min(cost_p);
    if (cost < gBestScore)
        gbest = pbest(index, :);
        gBestScore = cost;
    end
    % draw current fitness
    figure(1);
    plot(iter, min(cost_x), 'cp', 'MarkerEdgeColor', 'k', 'MarkerFaceColor', 'g', 'MarkerSize', 8)
    hold on
    str = strcat('Best fitness: ', num2str(min(cost_x)));
    disp(str);
end
end
% Function to update lbest
function lbest = update_lbest(cost_p, x, nPopSize)
sm(1, 1) = cost_p(1, nPopSize);
sm(1, 2:3) = cost_p(1, 1:2);
[cost, index] = min(sm);
if index == 1
    lbest(1, :) = x(nPopSize, :);
else
    lbest(1, :) = x(index-1, :);
end
for i = 2:nPopSize-1
    sm(1, 1:3) = cost_p(1, i-1:i+1);
    [cost, index] = min(sm);
    lbest(i, :) = x(i+index-2, :);
end
sm(1, 1:2) = cost_p(1, nPopSize-1:nPopSize);
sm(1, 3) = cost_p(1, 1);
[cost, index] = min(sm);
if index == 3
    lbest(nPopSize, :) = x(1, :);
else
    lbest(nPopSize, :) = x(nPopSize-2+index, :);
end
end
If you are new to optimization, I recommend that you first study each algorithm separately; then you can study how GA and PSO may be combined. You will need basic mathematical skills to understand the operators of the two algorithms and to test their efficiency (which is what really matters).
This code chunk is responsible for parent selection and crossover:
parents = randperm(nPopSize);
for i = 1:nPopSize
    x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
    % v(i,:) = pbest(parents(i),:) - x(i,:);
    % v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
It is not really obvious how the randperm-based selection works (I have no experience with Matlab).
And this is the code that is responsible for updating the velocity and position of each particle:
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
This velocity-updating strategy uses what is called the inertia weight w, which basically means we preserve part of each particle's velocity history instead of recomputing it from scratch.
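In equation form, the update implemented above is (r1 and r2 are fresh uniform random numbers for every particle and dimension):
v <- w*v + c1*r1.*(pbest - x) + c2*r2.*(lbest - x)
x <- x + v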
It is worth mentioning that the velocity update is performed far more often than crossover, which happens only every 1000 iterations.

efficient matlab implementation for Lucas-Kanade step

I got an assignment in a video processing course - to implement the Lucas-Kanade algorithm. Since we have to do it in the pyramidal model, I first build a pyramid for each of the 2 input images, and then perform a number of LK iterations for each level. In each step (iteration), the following code runs (note: the images are zero-padded so I can handle the image edges easily):
function [du, dv] = LucasKanadeStep(I1, I2, WindowSize)
It = I2 - I1;
[Ix, Iy] = imgradientxy(I2);
% note: the window is hard-coded to 5 here; presumably WindowSize == 5 in the assignment
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
b = zeros(2,1);
% iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
    for j = 1+half_win : size(It,2)-half_win
        A(1,1) = Ixx(i,j);
        A(2,2) = Iyy(i,j);
        A(1,2) = Ixy(i,j);
        A(2,1) = Ixy(i,j);
        b(1,1) = -Ixt(i,j);
        b(2,1) = -Iyt(i,j);
        U = pinv(A)*b;
        du(i,j) = U(1);
        dv(i,j) = U(2);
    end
end
end
Mathematically, what I'm doing is solving, for every pixel (i,j), the following 2x2 optical-flow system over the window around that pixel:
[Ixx(i,j), Ixy(i,j); Ixy(i,j), Iyy(i,j)] * [du(i,j); dv(i,j)] = -[Ixt(i,j); Iyt(i,j)]
where Ixx, Iyy, Ixy, Ixt, Iyt are the windowed sums of Ix.^2, Iy.^2, Ix.*Iy, Ix.*It and Iy.*It computed above. As you can see, in the code I am solving this for each pixel separately, which takes quite a long time (the whole processing for 2 images - including building 3-level pyramids and 3 LK steps like the one above on each level - takes about 25 seconds (!) on a remote connection to my university servers).
My question: is there a way to calculate this single LK step without the nested for loops? It must be more efficient, because the next step of the assignment is to stabilize a short video using this algorithm. Thanks.
I ran your code on my system and profiled it; the profiler showed that inverting the matrix (pinv) takes most of the time. You could try to vectorise your code, but I am not sure how to do it. I do know a trick to improve the compute time, though: exploit the smallest eigenvalue of the matrix A. That is, compute the solution only when the smallest eigenvalue of A is greater than some threshold. This improves the speed because you won't be inverting the matrix for every pixel.
You do this by modifying your code to the one shown below.
function [du, dv] = LucasKanadeStep(I1, I2, WindowSize)
It = double(I2 - I1);
[Ix, Iy] = imgradientxy(I2);
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
B = zeros(2,1);
% iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
    for j = 1+half_win : size(It,2)-half_win
        A(1,1) = Ixx(i,j);
        A(2,2) = Iyy(i,j);
        A(1,2) = Ixy(i,j);
        A(2,1) = Ixy(i,j);
        B(1,1) = -Ixt(i,j);
        B(2,1) = -Iyt(i,j);
        % +++++++++++++++++++++++++++++++++++++++++++++++++++
        % Code I added; the threshold would be better defined outside the loop.
        lambda = eig(A);
        threshold = 0.2;
        if (min(lambda) > threshold)
            U = A\B;
            du(i,j) = U(1);
            dv(i,j) = U(2);
        end
        % end of addendum
        % +++++++++++++++++++++++++++++++++++++++++++++++++++
        % U = pinv(A)*B;
        % du(i,j) = U(1);
        % dv(i,j) = U(2);
    end
end
end
I have set the threshold to 0.2; you can experiment with it. By using the eigenvalue trick I was able to bring the compute time down from 37 seconds to 10 seconds; with the check in place, pinv hardly takes any time compared to before.
Hope this helped. Good luck :)
Eventually I was able to find a much more efficient solution to this problem.
It is based on the formula shown in the question. The last 3 lines are what makes the difference - we get loop-free code that runs much faster. The differences from the looped version were negligible (~10^-18 or less in absolute difference between the result matrices, ignoring the padding zone).
Here is the code:
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
half_win = floor(WindowSize/2);
% pad frames with mirror reflections of itself
I1 = padarray(I1, [half_win half_win], 'symmetric');
I2 = padarray(I2, [half_win half_win], 'symmetric');
% create derivatives (time and space)
It = I2-I1;
[Ix, Iy] = imgradientxy(I2, 'prewitt');
% calculate dP = (du, dv) according to the formula
Ixx = imfilter(Ix.*Ix, ones(WindowSize));
Iyy = imfilter(Iy.*Iy, ones(WindowSize));
Ixy = imfilter(Ix.*Iy, ones(WindowSize));
Ixt = imfilter(Ix.*It, ones(WindowSize));
Iyt = imfilter(Iy.*It, ones(WindowSize));
% calculate the whole du,dv matrices AT ONCE!
invdet = (Ixx.*Iyy - Ixy.*Ixy).^-1;
du = invdet.*(-Iyy.*Ixt + Ixy.*Iyt);
dv = invdet.*(Ixy.*Ixt - Ixx.*Iyt);
end
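One caveat with this vectorized version (my note, not part of the original answer): wherever the window has no gradient structure, the determinant Ixx.*Iyy - Ixy.*Ixy is zero and invdet becomes Inf, so du and dv turn into Inf/NaN there. A minimal guard, assuming some small threshold tau:
% Zero out the flow where the 2x2 system is (near-)singular.
tau = 1e-6;                    % threshold on the determinant (assumption)
detA = Ixx.*Iyy - Ixy.*Ixy;
valid = abs(detA) > tau;
du(~valid) = 0;
dv(~valid) = 0;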

Sum of outer products multiplied by a scalar in MATLAB

I would like to vectorize the following sum of products in order to speed up my Matlab code. Would it be possible?
for i = 1:N
    A = A + hazard(i)*Z(i,:)'*Z(i,:);
end
where hazard is a vector (N x 1) and Z is a matrix (N x p).
Thanks!
With matrix multiplication only:
A = A + Z'*diag(hazard)*Z;
Note, however, that this requires more operations than Divakar's bsxfun approach, because diag(hazard) is an NxN matrix consisting mostly of zeros.
To save some time, you could define the inner matrix as sparse using spdiags, so that multiplication can be optimized:
A = A + full(Z'*spdiags(hazard, 0, zeros(N))*Z);
Benchmarking
Timing code:
Z = rand(N,p);
hazard = rand(N,1);
timeit(@() Z'*diag(hazard)*Z)
timeit(@() full(Z'*spdiags(hazard, 0, zeros(N))*Z))
timeit(@() bsxfun(@times, Z, hazard)'*Z)
With N = 1000; p = 300;
ans =
0.1423
ans =
0.0441
ans =
0.0325
With N = 2000; p = 1000;
ans =
1.8889
ans =
0.7110
ans =
0.6600
With N = 1000; p = 2000;
ans =
1.8159
ans =
1.2471
ans =
1.2264
It is seen that the bsxfun-based approach is consistently faster.
You can use bsxfun and matrix multiplication -
A = bsxfun(@times, Z, hazard).'*Z + A
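On MATLAB R2016b or newer, implicit expansion makes the bsxfun call unnecessary; the same computation can be written as:
A = (Z .* hazard).' * Z + A;   % hazard (N x 1) expands across the columns of Z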

Classify points according to euclidean distance - optimize code

I have a matrix A consisting of 200 vectors of size d.
I want a matrix B consisting of 4096 vectors to be classified to these points according to the nearest-distance rule.
The result should thus be a vector with one entry per row of B, containing the id number (from 1 to 200) of the centroid it belongs to.
I have written this code via 2 for loops, and it takes a lot of time to run.
for i = 1:4096
    counter = 1;
    vector1 = FaceImage(i,:);
    vector2 = Centroids(1,:);
    distance = pdist([vector1; vector2], 'euclidean');
    for j = 2:200
        vector2 = Centroids(j,:);
        temp = pdist([vector1; vector2], 'euclidean');
        if temp < distance
            distance = temp;
            counter = j;
        end
    end
    Histogram(i) = counter;
end
Can somebody help me increase the efficiency of the above code, or perhaps suggest a built-in function?
Thanks
You can do this in one line with pdist2:
[~, Histogram] = pdist2(Centroids, FaceImage, 'euclidean', 'Smallest', 1);
Timing for original code:
FaceImage = rand(4096, 100);
Centroids = rand(200, 100);
tic
* your code *
toc
Elapsed time is 87.434877 seconds.
Timing for my code:
tic
[~, Histogram_2] = pdist2( Centroids, FaceImage, 'euclidean', 'Smallest', 1);
toc
Elapsed time is 0.111736 seconds.
Asserting the results are the same:
>> all(Histogram==Histogram_2)
ans =
1
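As a side note, if you have the Statistics and Machine Learning Toolbox (which pdist2 also requires), knnsearch computes the same nearest-centroid assignment even more directly, returned as a column vector:
Histogram_3 = knnsearch(Centroids, FaceImage);   % index of the nearest centroid for each row of FaceImage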
Try out this:
vector2 = Centroids(1,:);
vector = [vector2; FaceImage];
temp = pdist(vector, 'euclidean');
answer = temp(1:4096); % the first 4096 entries are the distances between vector2 and the rows of FaceImage
Now you can find the minimum of these distances, and that row + 1 will be the row of FaceImage that is closest to the point.
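To actually classify every row of FaceImage you would still need to repeat this for each centroid; a sketch of that completion (my addition, not part of the original answer):
% Distances from every centroid to every row of FaceImage, one centroid at a time.
D = zeros(200, 4096);
for j = 1:200
    temp = pdist([Centroids(j,:); FaceImage], 'euclidean');
    D(j,:) = temp(1:4096);          % first 4096 entries: centroid j vs. each row
end
[~, Histogram] = min(D, [], 1);     % nearest centroid id for each row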
