Classify points according to euclidean distance - optimize code - performance

I have a matrix A consisting of 200 vectors of size d.
I want that a matrix B consisting of 4096 vectors gets classified to these points according to the nearest distance rule.
Thus the result should have rows of size B having the id number ( from 1 to 200 ) to which it belongs.
I have written this code via 2 for loops and it takes lots of time for calculation.
for i = 1:4096
counter = 1;
vector1 = FaceImage(i,:);
vector2 = Centroids(1,:);
distance = pdist( [ vector1 ; vector2] , 'euclidean' );
for j = 2:200
vector2 = Centroids(j,:);
temp = pdist( [ vector1 ; vector2] , 'euclidean' );
if temp < distance
distance = temp;
counter = j;
end
end
Histogram( i ) = counter;
end
Can somebody help me out increasing the efficiency of the above code ... or perhaps suggest me an inbuilt function ?
Thanks

You can do this in one line with pdist2:
[~, Histogram] = pdist2( Centroids, FaceImage, 'euclidian', 'Smallest', 1);
Timing for original code:
FaceImage = rand(4096, 100);
Centroids = rand(200, 100);
tic
* your code *
toc
Elapsed time is 87.434877 seconds.
Timing for my code:
tic
[~, Histogram_2] = pdist2( Centroids, FaceImage, 'euclidean', 'Smallest', 1);
toc
Elapsed time is 0.111736 seconds.
Asserting the results are the same:
>> all(Histogram==Histogram_2)
ans =
1

Try out this
vector2 = Centroids(1,:);
vector = [ vector2 ; FaceImage ];
temp = pdist( vector , 'euclidean' );
answer = temp[1:4096]; % will contain first 4096 lines as distances between vector2 and rows of Face Image
Now you can find the minimum of these distances and that `row + 1` will be the vector that is closest to the point

Related

Performance of updating/inserting into a sparse matrix in Matlab?

i have written a fairly large class for the calculation of measurement uncertainties, but it is painfully slow. Profiling the code shows that the slowest operation, by far, is to insert the computation results into a large sparse matrix. About 97% of all time is spent on that operation. The matrix keeps the uncertainties of all measurement data, and I cannot change the data structures without breaking a lot of other code. So my only option is to optimize the data insertion step. This is done about 5700 times in my benchmark, and every time the amout of data increases.
First solution, extremely slow:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), numel(obj.w));
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), numel(obj.w));
obj.w = [obj.w; z(:)]; % insert new measurement results into the column vector obj.w
obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; % insert new uncertainties of the measurement results
obj.dw_dw = [obj.dw_dw, transpose(dz_dw); dz_dw, dz_dz]; % insert "dependencies" of new measurements on old results
The line obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; is the slowest, by far. Perhaps it is slow because Matlab needs to allocate a new, larger buffer for obj.Uww and copy everything over. Thus I changed the code to the following:
% Preallocation in the class constructor
obj.w = spalloc(nnz_w, 1, nnz_w);
obj.Uww = spalloc(nnz_w, nnz_w, nnz_Uww);
obj.dw_dw = spalloc(nnz_w, nnz_w, nnz_dw_dw);
obj.num_w = 0; % to manually keep track of the "real" size of obj.w, obj.Uww and obj.dw_dw
The class constructor is called with the sizes the three properties w, Uww and dw_dw will have at the end of the computation (nzz_w is approximately 0.1 million, nzz_Uww is at 9 million and nzz_dw_dw is about 1.6 million). Thus, no new allocation of memory should be needed. This is the inserting step now:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), obj.num_w);
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), obj.num_w);
wInd = 1:obj.num_w;
obj.w(zInd, 1) = z(:); % insert new measurement results
obj.num_w = zInd(end); % new "real" size of w, Uww and dw_dw
obj.Uww(zInd, wInd) = Uzw; % about 51% of all computation time
obj.Uww(wInd, zInd) = transpose(Uzw); % about 15% of all computation time
obj.Uww(zInd, zInd) = Uzz; % about 14.4% of all computation time
obj.dw_dw(zInd, wInd) = dz_dw; % about 13% of all computation time
obj.dw_dw(wInd, zInd) = transpose(dz_dw); % about 3.5% of all computation time
obj.dw_dw(zInd, zInd) = dz_dz; % less than 3.5% of all computation time
Still, these lines account for 97% of all computation time, and no speed improvement. Thus I tried version three:
obj.w = [obj.w; z(:)];
[zInd_zy, yInd_zy] = ndgrid(zInd, yInd(:));
[zzInd_i, zzInd_j] = ndgrid(zInd, zInd);
[Uww_i, Uww_j, Uww_v] = find(obj.Uww); % 14% of all computation time
Uww_new = sparse( ... % this statement takes 66% of all computation time
[Uww_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
[Uww_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
[Uww_v; Uzy(:); Uzy(:); Uzz(:)], ...
numel(obj.w), numel(obj.w));
[dw_dw_i, dw_dw_j, dw_dw_v] = find(obj.dw_dw);
dw_dw_new = sparse( ... % 14% of all computation time
[dw_dw_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
[dw_dw_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
[dw_dw_v; dz_dy(:); dz_dy(:); dz_dz(:)], ...
numel(obj.w), numel(obj.w));
obj.Uww = Uww_new;
obj.dw_dw = dw_dw_new;
which is even slower thant the two other versions. Why is inserting into an already preallocated array so slow? And how can I speed it up?
(All the matrices are symmetric, but I did not try to exploit that yet.)
I don't understand the details of your update pattern, but keep in mind that Matlab stores sparse matrices internally in compressed-sparse column format. So adding entries in sequence column-by-column is significantly faster than other orders. E.g., on my old version of Matlab (R2006a), this:
n=10000;
nz=400000;
v=floor(n*rand(nz,3))+1;
fprintf('Random\n');
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
fprintf('Row-wise\n');
v=sortrows(v);
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
fprintf('Column-wise\n');
v=sortrows(v, [2 1]);
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
gives this:
>> sparsetest
Random
Elapsed time is 19.276089 seconds.
Row-wise
Elapsed time is 20.714324 seconds.
Column-wise
Elapsed time is 1.498150 seconds.
Likely best of all would be if you can somehow just collect the nonzeros in a form suitable for spconvert or sparse and then make the whole sparse matrix at the end, but I gather that you might not be able to do that.
#bg2b Pointed out how adding data column-wise is much faster.
It turns out adding rows to a sparse matrix or sparse vector is painfully slow (more precisely, to the lower triangular part). Thus, I now store only the upper triangular part of the sparse matrix, because that is fast. When I need data from the matrix, I recreate it from the upper triangular sparse matrix. See the end of the benchmarking script for this.
This is my benchmarking script. It nicely shows the exponential increase in computation time for adding data.
% Benchmark the extension of sparse matrices of the form
% Uww_new = [ Uww, Uwz; ...
% Uzw, Uzz];
% where Uzw = transpose(Uwz). Uww and Uzz are always square and symmetric.
close all;
clearvars;
rng(70101557, 'twister'); % the seed is the number of the stack overflow question
density = 0.25;
nZ = 10;
iterations = 5e2;
n = nZ * iterations;
nonzeros = n*n*density;
all_Uzz = cell(iterations, 1);
all_Uwz = cell(iterations, 1);
for k = 1:iterations
% Uzz must be symmetric!
Uzz_nonsymmetric = sprand(nZ, nZ, density);
all_Uzz{k} = (Uzz_nonsymmetric + transpose(Uzz_nonsymmetric))./2;
all_Uwz{k} = sprand((k-1)*nZ, nZ, density);
end
f = figure();
ax = axes(f);
hold(ax, 'on');
grid(ax, 'on');
xlabel(ax, 'Iteration');
ylabel(ax, 'Elapsed time in seconds.');
h = gobjects(1, 0);
name = 'Optimised.';
fprintf('%s\n', name);
Uww_optimised = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_optimised, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_optimised, 1);
Uzz_triu = triu(Uzz);
Uww_optimised(wInd, zInd) = Uwz; % add columns
Uww_optimised(zInd, zInd) = Uzz_triu; % add rows and columns
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uwz and Uzz.';
fprintf('%s\n', name);
Uww_wz_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_wz_zz, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_wz_zz, 1);
Uzw = transpose(Uwz);
Uww_wz_zz(wInd, zInd) = Uwz; % add columns
% Uww_wz_zz(zInd, wInd) = Uzw; % add rows
Uww_wz_zz(zInd, zInd) = Uzz; % add rows and columns
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uzw and Uzz.';
fprintf('%s\n', name);
Uww_zw_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_zw_zz, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_zw_zz, 1);
Uzw = transpose(Uwz);
% Uww_zw_zz(wInd, zInd) = Uwz;
Uww_zw_zz(zInd, wInd) = Uzw;
Uww_zw_zz(zInd, zInd) = Uzz;
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Uzw, Uwz and Uzz.';
fprintf('%s\n', name);
Uww = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww, 1);
Uzw = transpose(Uwz);
Uww(wInd, zInd) = Uwz;
Uww(zInd, wInd) = Uzw;
Uww(zInd, zInd) = Uzz;
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
leg = legend(ax, h, 'Location', 'northwest');
assert(issymmetric(Uww));
assert(istriu(Uww_optimised));
assert(isequal(Uww, Uww_optimised + transpose(triu(Uww_optimised, 1))));
%% Get Uyy from Uww_optimised. Uyy is a symmetric subset of Uww
yInd = randi(size(Uww_optimised, 1), 1, nZ); % indices to extract
[yIndRowInds, yIndColInds] = ndgrid(yInd, yInd);
indsToFlip = yIndRowInds > yIndColInds;
temp = yIndColInds(indsToFlip);
yIndColInds(indsToFlip) = yIndRowInds(indsToFlip);
yIndRowInds(indsToFlip) = temp;
linInd = sub2ind(size(Uww_optimised), yIndRowInds, yIndColInds);
assert(issymmetric(linInd));
Uyy = Uww_optimised(linInd);
assert(issymmetric(Uyy));

efficient matlab implementation for Lukas-Kanade step

I got an assignment in a video processing course - to implement the Lucas-Kanade algorithm. Since we have to do it in the pyramidal model, I first build a pyramid for each of the 2 input images, and then for each level I perform a number of LK iterations. in each step (iteration), the following code runs (note: the images are zero-padded so I can handle the image edges easily):
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
It = I2-I1;
[Ix, Iy] = imgradientxy(I2);
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
b = zeros(2,1);
%iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
for j = 1+half_win : size(It,2)-half_win
A(1,1) = Ixx(i,j);
A(2,2) = Iyy(i,j);
A(1,2) = Ixy(i,j);
A(2,1) = Ixy(i,j);
b(1,1) = -Ixt(i,j);
b(2,1) = -Iyt(i,j);
U = pinv(A)*b;
du(i,j) = U(1);
dv(i,j) = U(2);
end
end
end
mathematically what I'm doing is calculating for every pixel (i,j) the following optical flow:
as you can see, in the code I am calculating this for each pixel, which takes quite a long time (the whole processing for 2 images - including building 3 levels pyramids and 3 LK steps like the one above on each level - takes about 25 seconds (!) on a remote connection to my university servers).
My question: Is there a way to calculate this single LK step without the nested for loops? it must be more efficient because the next step of the assignment is to stabilize a short video using this algorithm.. thanks.
I ran your code on my system and did profiling. Here is what I got.
As you can see inverting the matrix(pinv) is taking most of the time. You can try and vectorise your code I guess, but I am not sure how to do it. But I do know a trick to improve the compute time. You have to exploit the minimum variance of the matrix A. That is, compute the inverse only if the minimum variance of A is greater than some threshold. This will improve the speed as you won't be inverting the matrix for all the pixel.
You do this by modifying your code to the one shown below.
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
It = double(I2-I1);
[Ix, Iy] = imgradientxy(I2);
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
B = zeros(2,1);
%iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
for j = 1+half_win : size(It,2)-half_win
A(1,1) = Ixx(i,j);
A(2,2) = Iyy(i,j);
A(1,2) = Ixy(i,j);
A(2,1) = Ixy(i,j);
B(1,1) = -Ixt(i,j);
B(2,1) = -Iyt(i,j);
% +++++++++++++++++++++++++++++++++++++++++++++++++++
% Code I added , threshold better be outside the loop.
lambda = eig(A);
threshold = 0.2
if (min(lambda)> threshold)
U = A\B;
du(i,j) = U(1);
dv(i,j) = U(2);
end
% end of addendum
% +++++++++++++++++++++++++++++++++++++++++++++++++++
% U = pinv(A)*B;
% du(i,j) = U(1);
% dv(i,j) = U(2);
end
end
end
I have set the threshold to 0.2. You can experiment with it. By using eigen value trick I was able to get the compute time from 37 seconds to 10 seconds(shown below). Using eigen, pinv hardly takes up the time like before.
Hope this helped. Good luck :)
Eventually I was able to find a much more efficient solution to this problem.
It is based on the formula shown in the question. The last 3 lines are what makes the difference - we get a loop-free code that works way faster. There were negligible differences from the looped version (~10^-18 or less in terms of absolute difference between the result matrices, ignoring the padding zone).
Here is the code:
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
half_win = floor(WindowSize/2);
% pad frames with mirror reflections of itself
I1 = padarray(I1, [half_win half_win], 'symmetric');
I2 = padarray(I2, [half_win half_win], 'symmetric');
% create derivatives (time and space)
It = I2-I1;
[Ix, Iy] = imgradientxy(I2, 'prewitt');
% calculate dP = (du, dv) according to the formula
Ixx = imfilter(Ix.*Ix, ones(WindowSize));
Iyy = imfilter(Iy.*Iy, ones(WindowSize));
Ixy = imfilter(Ix.*Iy, ones(WindowSize));
Ixt = imfilter(Ix.*It, ones(WindowSize));
Iyt = imfilter(Iy.*It, ones(WindowSize));
% calculate the whole du,dv matrices AT ONCE!
invdet = (Ixx.*Iyy - Ixy.*Ixy).^-1;
du = invdet.*(-Iyy.*Ixt + Ixy.*Iyt);
dv = invdet.*(Ixy.*Ixt - Ixx.*Iyt);
end

K-means for color quantization - Code not vectorized

I'm doing this exercise by Andrew NG about using k-means to reduce the number of colors in an image. It worked correctly but I'm afraid it's a little slow because of all the for loops in the code, so I'd like to vectorize them. But there are those loops that I just can't seem to vectorize effectively. Please help me, thank you very much!
Also if possible please give some feedback on my coding style :)
Here is the link of the exercise, and here is the dataset.
The correct result is given in the link of the exercise.
And here is my code:
function [] = KMeans()
Image = double(imread('bird_small.tiff'));
[rows,cols, RGB] = size(Image);
Points = reshape(Image,rows * cols, RGB);
K = 16;
Centroids = zeros(K,RGB);
s = RandStream('mt19937ar','Seed',0);
% Initialization :
% Pick out K random colours and make sure they are all different
% from each other! This prevents the situation where two of the means
% are assigned to the exact same colour, therefore we don't have to
% worry about division by zero in the E-step
% However, if K = 16 for example, and there are only 15 colours in the
% image, then this while loop will never exit!!! This needs to be
% addressed in the future :(
% TODO : Vectorize this part!
done = false;
while done == false
RowIndex = randperm(s,rows);
ColIndex = randperm(s,cols);
RowIndex = RowIndex(1:K);
ColIndex = ColIndex(1:K);
for i = 1 : K
for j = 1 : RGB
Centroids(i,j) = Image(RowIndex(i),ColIndex(i),j);
end
end
Centroids = sort(Centroids,2);
Centroids = unique(Centroids,'rows');
if size(Centroids,1) == K
done = true;
end
end;
% imshow(imread('bird_small.tiff'))
%
% for i = 1 : K
% hold on;
% plot(RowIndex(i),ColIndex(i),'r+','MarkerSize',50)
% end
eps = 0.01; % Epsilon
IterNum = 0;
while 1
% E-step: Estimate membership given parameters
% Membership: The centroid that each colour is assigned to
% Parameters: Location of centroids
Dist = pdist2(Points,Centroids,'euclidean');
[~, WhichCentroid] = min(Dist,[],2);
% M-step: Estimate parameters given membership
% Membership: The centroid that each colour is assigned to
% Parameters: Location of centroids
% TODO: Vectorize this part!
OldCentroids = Centroids;
for i = 1 : K
PointsInCentroid = Points((find(WhichCentroid == i))',:);
NumOfPoints = size(PointsInCentroid,1);
% Note that NumOfPoints is never equal to 0, as a result of
% the initialization. Or .... ???????
if NumOfPoints ~= 0
Centroids(i,:) = sum(PointsInCentroid , 1) / NumOfPoints ;
end
end
% Check for convergence: Here we use the L2 distance
IterNum = IterNum + 1;
Margins = sqrt(sum((Centroids - OldCentroids).^2, 2));
if sum(Margins > eps) == 0
break;
end
end
IterNum;
Centroids ;
% Load the larger image
[LargerImage,ColorMap] = imread('bird_large.tiff');
LargerImage = double(LargerImage);
[largeRows,largeCols,NewRGB] = size(LargerImage); % RGB is always 3
% TODO: Vectorize this part!
largeRows
largeCols
NewRGB
% Replace each of the pixel with the nearest centroid
NewPoints = reshape(LargerImage,largeRows * largeCols, NewRGB);
Dist = pdist2(NewPoints,Centroids,'euclidean');
[~,WhichCentroid] = min(Dist,[],2);
NewPoints = Centroids(WhichCentroid,:);
LargerImage = reshape(NewPoints,largeRows,largeCols,NewRGB);
% for i = 1 : largeRows
% for j = 1 : largeCols
% Dist = pdist2(Centroids,reshape(LargerImage(i,j,:),1,RGB),'euclidean');
% [~,WhichCentroid] = min(Dist);
% LargerImage(i,j,:) = Centroids(WhichCentroid,:);
% end
% end
% Display new image
imshow(uint8(round(LargerImage)),ColorMap)
UPDATE: Replaced
for i = 1 : K
for j = 1 : RGB
Centroids(i,j) = Image(RowIndex(i),ColIndex(i),j);
end
end
with
for i = 1 : K
Centroids(i,:) = Image(RowIndex(i),ColIndex(i),:);
end
I think this may be vectorized further by using linear indexing, but for now I should just focus on the while loop since it takes most of the time.
Also when I tried #Dev-iL's suggestion and replaced
for i = 1 : K
PointsInCentroid = Points((find(WhichCentroid == i))',:);
NumOfPoints = size(PointsInCentroid,1);
% Note that NumOfPoints is never equal to 0, as a result of
% the initialization. Or .... ???????
if NumOfPoints ~= 0
Centroids(i,:) = sum(PointsInCentroid , 1) / NumOfPoints ;
end
end
with
E = sparse(1:size(WhichCentroid), WhichCentroid' , 1, Num, K, Num);
Centroids = (E * spdiags(1./sum(E,1)',0,K,K))' * Points ;
the results were always worse: With K = 16, the first takes 2,414s , the second takes 2,455s ; K = 32, the first takes 4,529s , the second takes 5,022s. Seems like vectorization does not help, but maybe there's something wrong with my code :( .
Replaced
for i = 1 : K
for j = 1 : RGB
Centroids(i,j) = Image(RowIndex(i),ColIndex(i),j);
end
end
with
for i = 1 : K
Centroids(i,:) = Image(RowIndex(i),ColIndex(i),:);
end
I think this may be vectorized further by using linear indexing, but for now I should just focus on the while loop since it takes most of the time.
Also when I tried #Dev-iL's suggestion and replaced
for i = 1 : K
PointsInCentroid = Points((find(WhichCentroid == i))',:);
NumOfPoints = size(PointsInCentroid,1);
% Note that NumOfPoints is never equal to 0, as a result of
% the initialization. Or .... ???????
if NumOfPoints ~= 0
Centroids(i,:) = sum(PointsInCentroid , 1) / NumOfPoints ;
end
end
with
E = sparse(1:size(WhichCentroid), WhichCentroid' , 1, Num, K, Num);
Centroids = (E * spdiags(1./sum(E,1)',0,K,K))' * Points ;
the results were always worse: With K = 16, the first takes 2,414s , the second takes 2,455s ; K = 32, the first took 4,529s , the second took 5,022s. Seems like vectorization did not help in this case.
However, when I replaced
Dist = pdist2(Points,Centroids,'euclidean');
[~, WhichCentroid] = min(Dist,[],2);
(in the while loop) with
Dist = bsxfun(#minus,dot(Centroids',Centroids',1)' / 2 , Centroids * Points' );
[~, WhichCentroid] = min(Dist,[],1);
WhichCentroid = WhichCentroid';
the code ran much faster, especially when K is large (K=32)
Thank you everyone!

Implement Adaptive watershed segmentation in Matlab

I will like to implement "Adaptive Watershed Segmentation" in Matlab.
There are six steps in this algorithm. Input is figure(a) and result is figure(d).
Would you please to help me check is there any mistake in my code, and I don't know how to implement the sixth step.
Thank you so much!
Load image:
input_image = imread('test.gif');
Step 1 : Calculate D(x,y) at each (x,y), obtain the Euclidian distance map of the binary image and assign each value of M(x,y) as 0.
DT = bwdist(input_image,'euclidean'); % Trandform distance:Euclidian distance
[h,w]=size(DT);
M = zeros(h,w);
Step 2 : Smooth the distance map using Gaussian filter to merge the adjacent maxima, set M(x,y) as 1 if D(x,y) is a local maximum, and then obtain the marker map of the distance map.
H = fspecial('gaussian');
gfDT = imfilter(DT,H);
M = imregionalmax(gfDT); % maker map, M = local maximum of gfDT
Step3 : Scan the marker map pixel by pixel. If M(x0,y0) is 1, seek the spurious maxima in its neighbourhood with a radius of D(x ,y ).When M(x,y) equals 1 and sqr((x − x0)^2 + (y − y0)^2 ) ≤ D(x0, y0) , set M(x,y) as 0 if D(x,y) < D(x0,y0).
for x0 = 1:h
for y0 = 1:w
if M(x0,y0) == 1
r = ceil(gfDT(x0,y0));
% range begin:(x0-r,y0-r) end:(x0+r,y0+r)
xb = x0-r;
if xb <= 0
xb =1;
end
yb = y0-r;
if yb <= 0
yb =1;
end
xe = x0+r;
if xe > w
xe = w;
end
ye = y0+r;
if ye > h
ye = h;
end
for x = yb:ye
for y = xb:xe
if M(x,y)==1
Pos = [x0,y0 ;x,y];
Dis = pdist(Pos,'euclidean');
IFA = Dis<= (gfDT(x0,y0));
IFB = gfDT(x,y)<gfDT(x0,y0);
if ( IFA && IFB)
M(x,y) = 0;
end
end
end
end
end
end
end
Step 4:
Calculate the inverse of the distance map,and the local maxima turn out to be the local minima.
igfDT = -(gfDT);
STep5:
Segment the distance map according to the markers by the conventional watershed algorithm and obtain the segmentation of binary image.
I2 = imimposemin(igfDT,M);
L = watershed(I2);
igfDT (L==0)=0;
Step 6 : Straighten the watershed lines by linking the ends of the watershed lines with a straight line and reclassifying the pixels along the straight line.
I don't know how to implement this step
Try distance transform and then watershed transform.
im=imread('n6BRI.gif');
imb=bwdist(im);
sigma=3;
kernel = fspecial('gaussian',4*sigma+1,sigma);
im2=imfilter(imb,kernel,'symmetric');
L = watershed(max(im2(:))-im2);
[x,y]=find(L==0);
lblImg = bwlabel(L&~im);
figure,imshow(label2rgb(lblImg,'jet','k','shuffle'));

Optimizing for loop in Matlab

I'm writing a "Peak finder" in Matlab. I've never used Matlab or anything similar before this project, so I'm new to "vectorizing" my code. Essentially, the program needs to take a video of molecules and plot circles on the molecules present in each frame of the video. If a molecule is crowded then it gets a red circle, but if it is not crowded it gets a green circle.
My problem is that some of these videos have 2000 frames and my program takes up to ~25 seconds to process a single frame, which is not practical.
Using tic and toc I've found the trouble maker: A for-loop which calls a function that contains a for-loop.
function getPeaks( orgnl_img, processed_img, cut_off, radius )
% find large peaks peaks by thresholding, i.e. you accept a peak only
% if its more than 'threshold' higher than its neighbors
threshold = 2*std2(orgnl_img);
peaks = (orgnl_img - processed_img) > threshold;
% m and n are dimensions of the frame, named peaks
[m, n] = size(peaks);
cc_centroids = regionprops(peaks, 'centroid');
% Pre-allocate arrays
x_centroid = zeros(1, length(cc_centroids));
y_centroid = zeros(1, length(cc_centroids));
for i = 1:length(cc_centroids)
% Extract the x and y components from cc_centroids struct
x_centroid(i) = cc_centroids(i).Centroid(1);
y_centroid(i) = cc_centroids(i).Centroid(2);
row = int64(x_centroid(i));
col = int64(y_centroid(i));
% Assure that script doesnt attempt to exceed frame
if col-2 > 0 && row-2 > 0 && col+2 < m && row+2 < n
region_of_interest = orgnl_img(col-2:col+2,row-2:row+2);
local_intensity = max(region_of_interest(:));
% Do not plot circle when intensity is 'cut off' or lower
if local_intensity > cut_off
dist_bool = findDistance(cc_centroids, x_centroid(i), y_centroid(i), radius);
if dist_bool == 1
color = 'g';
else
color = 'r';
end
plotCircle(color, x_centroid(i), y_centroid(i), radius)
end
end
end
end
And here is the findDistance function which contains another for-loop and determines if the circles overlap:
function dist_bool = findDistance( all_centroids, x_crrnt, y_crrnt, radius )
x_nearby = zeros(1, length(all_centroids));
y_nearby = zeros(1, length(all_centroids));
for i = 1:length(all_centroids)
if all_centroids(i).Centroid(1) < (x_crrnt+2*radius) &&...
all_centroids(i).Centroid(1) > (x_crrnt-2*radius) &&...
all_centroids(i).Centroid(2) < (y_crrnt+2*radius) &&...
all_centroids(i).Centroid(2) > (y_crrnt-2*radius)
x_nearby(i) = all_centroids(i).Centroid(1);
y_nearby(i) = all_centroids(i).Centroid(2);
pts_of_interest = [x_nearby(i),y_nearby(i);x_crrnt,y_crrnt];
dist = pdist(pts_of_interest);
if dist == 0 || dist > 2*radius
dist_bool = 1;
else
dist_bool = 0;
break
end
end
end
end
I think that there must be much improvement to be done here. I would appreciate any advice.
Thanks! :)
UPDATE: Here are the profiler results. The first section of code is the "getPeaks" function
http://i.stack.imgur.com/VaLdH.png
The hold comes from the plot circle function:
function plotCircle( color, x_centroid, y_centroid, radius )
hold on;
th = 0:pi/50:2*pi;
xunit = radius * cos(th) + x_centroid;
yunit = radius * sin(th) + y_centroid;
plot(xunit, yunit, color);
hold off;
end

Resources