Matlab: How to vectorize a nested loop over a 2D set of vectors - performance

I have a function in the following form:
function Out = DecideIfAPixelIsWithinAnEllipsoidalClass(pixel,means,VarianceCovarianceMatrix)
    ellipsoid = (pixel-means)' * (VarianceCovarianceMatrix^(-1)) * (pixel-means);
    if ellipsoid <= 1
        Out = 1;
    else
        Out = 0;
    end
end
I am doing remote-sensing processing in Matlab and I want to classify a Landsat TM image. The image has 7 bands and is 2048*2048, so I stored it in a 3-dimensional 2048*2048*7 matrix. In this function, means is a 7*1 vector calculated earlier from a sample of the class in a function named ExtractStatisticalParameters, and VarianceCovarianceMatrix is a 7*7 matrix. In fact you can see that:
ellipsoid = (pixel-means)'*(VarianceCovarianceMatrix^(-1))*(pixel-means);
is the equation of an ellipsoid. My problem is that each call can only process a single pixel (a 7*1 vector where each row is the value of the pixel in a separate band), so I need to write a loop like this:
for k1=1:2048
    for k2=1:2048
        pixel(:,1) = image(k1,k2,:);
        Out = DecideIfAPixelIsWithinAnEllipsoidalClass(pixel,means,VarianceCovarianceMatrix);
    end
end
and you know it will take a lot of time and system resources. Can you suggest a way to reduce the load on the system?

No need for loops!
pMinusMean = bsxfun( @minus, reshape( image, [], 7 ), means' ); % subtract means from all pixels
iCv = inv( VarianceCovarianceMatrix );
ell = sum( (pMinusMean * iCv) .* pMinusMean, 2 ); % note the .* the second time!
Out = reshape( ell <= 1, size(image(:,:,1)) ); % Out is a 2048-by-2048 logical image
Update:
After a (somewhat heated) debate in the comments below I add a correction made by Rody Oldenhuis:
pMinusMean = bsxfun( @minus, reshape( image, [], 7 ), means' ); % subtract means from all pixels
ell = sum( (pMinusMean / VarianceCovarianceMatrix) .* pMinusMean, 2 ); % note the .* the second time!
Out = reshape( ell <= 1, size(image(:,:,1)) );
The key issue in this change is that Matlab's inv() is slower and can be less numerically accurate than solving the system directly, so it is best to use mldivide and mrdivide (the \ and / operators) instead.
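For completeness, here is a small sanity check of the vectorized expression against the original scalar function. This is just a sketch with shrunken sizes; img, means and C are random stand-ins for the real data:
img   = rand(16, 16, 7);
means = rand(7, 1);
C     = cov(rand(100, 7));               % any symmetric positive-definite 7x7 matrix
% vectorized version (mrdivide instead of inv):
pMinusMean = bsxfun( @minus, reshape( img, [], 7 ), means' );
OutV = reshape( sum( (pMinusMean / C) .* pMinusMean, 2 ) <= 1, 16, 16 );
% original per-pixel loop:
OutL = false(16, 16);
for k1 = 1:16
    for k2 = 1:16
        p = squeeze(img(k1, k2, :));
        OutL(k1, k2) = ((p - means)' / C) * (p - means) <= 1;
    end
end
isequal(OutV, OutL)                      % should print 1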

Related

efficient matlab implementation for Lucas-Kanade step

I got an assignment in a video processing course - to implement the Lucas-Kanade algorithm. Since we have to do it in the pyramidal model, I first build a pyramid for each of the 2 input images, and then for each level I perform a number of LK iterations. In each step (iteration), the following code runs (note: the images are zero-padded so I can handle the image edges easily):
function [du,dv] = LucasKanadeStep(I1,I2,WindowSize)
    It = I2-I1;
    [Ix, Iy] = imgradientxy(I2);
    % note: the smoothing window is hard-coded to 5x5 here, while WindowSize
    % only sets the loop margins below
    Ixx = imfilter(Ix.*Ix, ones(5));
    Iyy = imfilter(Iy.*Iy, ones(5));
    Ixy = imfilter(Ix.*Iy, ones(5));
    Ixt = imfilter(Ix.*It, ones(5));
    Iyt = imfilter(Iy.*It, ones(5));
    half_win = floor(WindowSize/2);
    du = zeros(size(It));
    dv = zeros(size(It));
    A = zeros(2);
    b = zeros(2,1);
    % iterate only on the relevant parts of the images
    for i = 1+half_win : size(It,1)-half_win
        for j = 1+half_win : size(It,2)-half_win
            A(1,1) = Ixx(i,j);
            A(2,2) = Iyy(i,j);
            A(1,2) = Ixy(i,j);
            A(2,1) = Ixy(i,j);
            b(1,1) = -Ixt(i,j);
            b(2,1) = -Iyt(i,j);
            U = pinv(A)*b;
            du(i,j) = U(1);
            dv(i,j) = U(2);
        end
    end
end
Mathematically, what I am doing is solving for every pixel (i,j) the standard LK 2x2 optical-flow system over the window around it:

    [ sum(Ix.^2)   sum(Ix.*Iy) ]   [ du ]   [ -sum(Ix.*It) ]
    [ sum(Ix.*Iy)  sum(Iy.^2)  ] * [ dv ] = [ -sum(Iy.*It) ]

As you can see, in the code I am calculating this for each pixel, which takes quite a long time (the whole processing for 2 images - including building 3-level pyramids and 3 LK steps like the one above on each level - takes about 25 seconds (!) over a remote connection to my university servers).
My question: Is there a way to calculate this single LK step without the nested for loops? It must be more efficient, because the next step of the assignment is to stabilize a short video using this algorithm. Thanks.
I ran your code on my system and profiled it. As the profiler shows, inverting the matrix (pinv) takes most of the time.
You could try to vectorise your code, but I am not sure how to do it. However, I do know a trick to improve the compute time: exploit the minimum eigenvalue of the matrix A. That is, solve the system only when the minimum eigenvalue of A is greater than some threshold. This improves the speed because you won't be inverting the matrix for every pixel.
You do this by modifying your code as shown below.
function [du,dv] = LucasKanadeStep(I1,I2,WindowSize)
    It = double(I2-I1);
    [Ix, Iy] = imgradientxy(I2);
    Ixx = imfilter(Ix.*Ix, ones(5));
    Iyy = imfilter(Iy.*Iy, ones(5));
    Ixy = imfilter(Ix.*Iy, ones(5));
    Ixt = imfilter(Ix.*It, ones(5));
    Iyt = imfilter(Iy.*It, ones(5));
    half_win = floor(WindowSize/2);
    du = zeros(size(It));
    dv = zeros(size(It));
    A = zeros(2);
    B = zeros(2,1);
    threshold = 0.2;   % kept outside the loop; experiment with this value
    % iterate only on the relevant parts of the images
    for i = 1+half_win : size(It,1)-half_win
        for j = 1+half_win : size(It,2)-half_win
            A(1,1) = Ixx(i,j);
            A(2,2) = Iyy(i,j);
            A(1,2) = Ixy(i,j);
            A(2,1) = Ixy(i,j);
            B(1,1) = -Ixt(i,j);
            B(2,1) = -Iyt(i,j);
            % +++++++++++++++++++++++++++++++++++++++++++++++++++
            % Added code: solve only where A is well-conditioned
            lambda = eig(A);
            if min(lambda) > threshold
                U = A\B;
                du(i,j) = U(1);
                dv(i,j) = U(2);
            end
            % end of addendum
            % +++++++++++++++++++++++++++++++++++++++++++++++++++
            % U = pinv(A)*B;
            % du(i,j) = U(1);
            % dv(i,j) = U(2);
        end
    end
end
I have set the threshold to 0.2; you can experiment with it. By using the eigenvalue trick I was able to cut the compute time from 37 seconds to 10 seconds; with it, pinv hardly takes up any time like it did before.
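As a side note: since A is a symmetric 2x2 matrix, its minimum eigenvalue has a closed form, so the per-pixel eig call could itself be vectorized. This is only a sketch built on the filtered images above (it is not part of the timings reported here):
% for a symmetric 2x2 matrix [a b; b c] the eigenvalues are
% ((a+c) +/- sqrt((a-c)^2 + 4*b^2)) / 2, so the minimum eigenvalue of
% every per-pixel A can be computed in one shot:
tr  = Ixx + Iyy;
dif = Ixx - Iyy;
minLambda = (tr - sqrt(dif.^2 + 4*Ixy.^2)) / 2;
valid = minLambda > threshold;   % mask of pixels worth solving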
Hope this helped. Good luck :)
Eventually I was able to find a much more efficient solution to this problem.
It is based on the formula shown in the question. The last 3 lines are what makes the difference - we get loop-free code that runs far faster. The differences from the looped version were negligible (~10^-18 or less in absolute difference between the result matrices, ignoring the padding zone).
Here is the code:
function [du,dv] = LucasKanadeStep(I1,I2,WindowSize)
    half_win = floor(WindowSize/2);
    % pad frames with mirror reflections of themselves
    I1 = padarray(I1, [half_win half_win], 'symmetric');
    I2 = padarray(I2, [half_win half_win], 'symmetric');
    % create derivatives (time and space)
    It = I2-I1;
    [Ix, Iy] = imgradientxy(I2, 'prewitt');
    % calculate dP = (du, dv) according to the formula
    Ixx = imfilter(Ix.*Ix, ones(WindowSize));
    Iyy = imfilter(Iy.*Iy, ones(WindowSize));
    Ixy = imfilter(Ix.*Iy, ones(WindowSize));
    Ixt = imfilter(Ix.*It, ones(WindowSize));
    Iyt = imfilter(Iy.*It, ones(WindowSize));
    % calculate the whole du,dv matrices AT ONCE by inverting each 2x2
    % system in closed form (Cramer's rule): for A = [Ixx Ixy; Ixy Iyy],
    % inv(A) = [Iyy -Ixy; -Ixy Ixx] / (Ixx.*Iyy - Ixy.^2)
    invdet = (Ixx.*Iyy - Ixy.*Ixy).^-1;
    du = invdet.*(-Iyy.*Ixt + Ixy.*Iyt);
    dv = invdet.*(Ixy.*Ixt - Ixx.*Iyt);
end
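For anyone wanting to try it, a minimal usage sketch (the file names are placeholders; any two same-size grayscale frames will do):
I1 = im2double(imread('frame1.png'));   % placeholder file name
I2 = im2double(imread('frame2.png'));   % placeholder file name
[du, dv] = LucasKanadeStep(I1, I2, 5);
quiver(du(1:10:end, 1:10:end), dv(1:10:end, 1:10:end));   % visualize a thinned flow field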

Get better performance for converting matrix to vector

When working with images, they usually consist of 3 layers (RGB). In order to do some computation, I need to convert each layer of the image into a vector.
I1 = ones(70,50,3);       % the first image
I2 = 0.4 * ones(70,50,3); % the second image
for dd = 1:3
    ILayer1 = I1(:,:,dd);
    ILayerLinear1 = ILayer1(:);
    ILayer2 = I2(:,:,dd);
    ILayerLinear2 = ILayer2(:);
    comp = ILayerLinear1 * ILayerLinear1.';
end
Here I have replaced the main computation with a trivial placeholder, but that is not the point.
Is there a better way to avoid repeating the matrix-to-vector conversion, or to do it more efficiently? This conversion may happen many times throughout the code.
Update:
I can also define a function as follows to pass an image and get back one vectorized layer per row, but it still does not improve the code.
function V = I2V(I)
    [r,c,d] = size(I);
    V = zeros(d,r*c);
    for dd = 1:d
        layer = I(:,:,dd);
        V(dd,:) = layer(:);
    end
end
I'm not sure about the outer product, but here's everything else.
I1 = reshape(1:70*50*3, 70,50,3);       % example data, first image
I2 = 0.4*reshape(1:70*50*3, 70,50,3);   % example data, second image
i1 = reshape(I1, [], 3);                % each column is one vectorized layer
i2 = reshape(I2, [], 3);
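Each column of i1 is exactly what ILayerLinear1 was in the question (MATLAB stores arrays column-major), so the per-layer computation can reuse those columns without any further conversion. A sketch with the question's placeholder computation:
for dd = 1:3
    comp = i1(:,dd) * i1(:,dd).';   % same as ILayerLinear1 * ILayerLinear1.'
end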

Speed up an Enumeration process

After a few days of optimization, this is my code for an enumeration process that consists in finding the best combination for every row of W. The algorithm splits the matrix W into one where the elements of W are greater than LimiteInferiore (called W_legali) and one that has only elements below the limit (called W_nlegali).
Using some parameters like Media (i.e. the mean) and rho_b_legale, the algorithm minimizes the total cost function. In the last part, I find where the combination with the lowest objective-function value is and save it in W_ottimo.
As you can see the algorithm is not so "clean", and with a very large matrix (142506x3000) it is painfully slow... So, can somebody help me speed it up a little bit?
for i=1:3000
    W = PesoIncertezza * MatriceCombinazioni';
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    W_legali = W;
    W_legali(W<LimiteInferiore) = nan;
    if i==1
        Media = W_legali;
        rho_b_legale = ones(size(W_legali,1),size(MatriceCombinazioni,1));
    else
        Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni,1))+W_legali)/(size(W_tot_migl,2)+1);
        rho_b_legale = repmat(((n_b+1)/i),1,size(MatriceCombinazioni,1));
    end
    [W_legali_migl,comb] = min(C_u .* Media .* (1./rho_b_legale) + (1./rho_b_legale) .* c_0 + (c_1./(i * rho_b_legale)),[],2);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    MatriceCombinazioni_2 = MatriceCombinazioni;
    MatriceCombinazioni_2(sum(MatriceCombinazioni_2,2)<2,:) = [];
    W_nlegali = PesoIncertezza * MatriceCombinazioni_2';
    W_nlegali(W_nlegali>=LimiteInferiore) = nan;
    if i==1
        Media = W_nlegali;
        rho_b_nlegale = zeros(size(W_nlegali,1),size(MatriceCombinazioni_2,1));
    else
        Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni_2,1))+W_nlegali)/(size(W_tot_migl,2)+1);
        rho_b_nlegale = repmat(((n_b)/i),1,size(MatriceCombinazioni_2,1));
    end
    [W_nlegali_migliori,comb2] = min(C_u .* Media .* (1./rho_b_nlegale) + (1./rho_b_nlegale) .* c_0 + (c_1./(i * rho_b_nlegale)),[],2);
    z = [W_legali_migl, W_nlegali_migliori];
    [z_ott,comb3] = min(z,[],2);
    % increasing n_b
    if i==1
        n_b = zeros(size(W,1),1);
    end
    index = find(comb3==1);
    increment = ones(size(index,1),1);
    B = accumarray(index,increment);
    nzIndex = (B ~= 0);
    n_b(nzIndex) = n_b(nzIndex) + B(nzIndex);
    % use comb3 to find where the best configuration is: in W_legali or in W_nlegali?
    combinazione = comb.*logical(comb3==1) + comb2.*logical(comb3==2);
    W_ottimo = W(sub2ind(size(W),[1:size(W,1)],combinazione'))';
    W_tot_migl(:,i) = W_ottimo;
    FunzObb(:,i) = z_ott;
    [PesoCestelli] = Simulazione_GenerazioneNumeriCasuali(PianoSperimentale,NumeroCestelli,NumeroEsperimenti,Alfa);
    [PesoIncertezza_2] = Simulazione_GenerazioneIncertezza(NumeroCestelli,NumeroEsperimenti,IncertezzaCella,PesoCestelli);
    PesoIncertezza(MatriceCombinazioni(combinazione,:)~=0) = PesoIncertezza_2(MatriceCombinazioni(combinazione,:)~=0); % update just the hoppers that have been discharged
end
When you see repmat, you should think bsxfun. For example, replace:
Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni,1))+W_legali) / ...
        (size(W_tot_migl,2)+1);
with
Media = bsxfun(@plus,sum(W_tot_migl,2),W_legali) / ...
        (size(W_tot_migl,2)+1);
The purpose of bsxfun is to do a virtual "singleton expansion" like repmat, without actually replicating the array into a matrix of the same size as W_legali.
Also note that in the above code, sum(W_tot_migl,2) is computed twice. There are other small optimizations, but changing to bsxfun should give you a good improvement.
The values of 1./rho_b_legale are also effectively computed three times; store this quotient matrix. A sketch of the cached version is shown below.
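The variable names sumW and invRho are invented here for illustration, and this is only a sketch of the idea, not tested against the full loop:
sumW   = sum(W_tot_migl, 2);    % computed once per iteration, reused
invRho = 1 ./ rho_b_legale;     % reused three times in the cost function
Media  = bsxfun(@plus, sumW, W_legali) / (size(W_tot_migl,2) + 1);
[W_legali_migl, comb] = min(C_u .* Media .* invRho + c_0 .* invRho + (c_1/i) .* invRho, [], 2);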

How to calculate center of gravity in grid?

Given a grid (or table) with x*y cells, each cell contains a value. Most of these cells have a value of 0, but there may be a "hot spot" somewhere on this grid with a cell that has a high value. The neighbours of this cell then also have a value > 0; the farther away from the hot spot, the lower the value in the respective grid cell.
So this hot spot can be seen as the top of a hill, with the values decreasing the farther we get from it. At a certain distance the values drop to 0 again.
Now I need to determine the cell within the grid that represents the grid's center of gravity. In the simple example above this centroid would simply be the one cell with the highest value. However, it's not always that simple:
the decreasing values of the neighbour cells around the hot spot may not be equally distributed, or one "side of the hill" may fall to 0 sooner than another side;
there may be another hot spot/hill with values > 0 elsewhere within the grid.
I imagine this is a fairly typical problem, but unfortunately I am no math expert, so I don't know what to search for (at least I have not found an answer on Google).
Any ideas how I can solve this problem?
Thanks in advance.
You are looking for the "weighted mean" of the cell values. Assuming each cell has a value z(x,y), you can do the following:
zx(x) = sum( z(x, y) ) over all values of y
zy(y) = sum( z(x, y) ) over all values of x
meanX = sum( x * zx(x) ) / sum( zx(x) )
meanY = sum( y * zy(y) ) / sum( zy(y) )
I trust you can convert this into a language of your choice...
Example: if you know Matlab, then the above would be written as follows
zx = sum( Z, 1 );   % collapse the rows: one sum per column
zy = sum( Z, 2 );   % collapse the columns: one sum per row
[ny, nx] = size(Z); % find out the dimensions of Z
meanX = sum((1:nx).*zx) / sum(zx);
meanY = sum((1:ny).*zy') / sum(zy); % zy is a column vector, hence the transpose
This gives you meanX in the range 1 .. nx: if the center is right in the middle, the value would be (nx+1)/2. You can obviously scale this to your needs.
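For instance, mapping the index-space mean onto a physical coordinate range is a one-liner; here x0 and x1 are hypothetical bounds of the grid in real units:
x0 = 0; x1 = 10;                                 % assumed physical extent in X
xPhys = x0 + (meanX - 1) / (nx - 1) * (x1 - x0); % cell 1 -> x0, cell nx -> x1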
EDIT: one more time, in "almost real" code:
// array Z(N, M) contains values on an evenly spaced grid
// assume base-1 arrays; ii indexes the N rows (y), jj the M columns (x)
zx = zeros(M);
zy = zeros(N);
// create X profile (one entry per column):
for jj = 1 to M
    for ii = 1 to N
        zx(jj) = zx(jj) + Z(ii, jj);
    next ii
next jj
// create Y profile (one entry per row):
for ii = 1 to N
    for jj = 1 to M
        zy(ii) = zy(ii) + Z(ii, jj);
    next jj
next ii
xsum = 0;
zxsum = 0;
for jj = 1 to M
    zxsum += zx(jj);
    xsum += jj * zx(jj);
next jj
xmean = xsum / zxsum;
ysum = 0;
zysum = 0;
for ii = 1 to N
    zysum += zy(ii);
    ysum += ii * zy(ii);
next ii
ymean = ysum / zysum;
The Wikipedia entry on the center of mass may help; the section entitled "A system of particles" is all you need. Just understand that you need to do the calculation once for each dimension, of which you apparently have two.
And here is a complete Scala 2.10 program to generate a grid full of random integers (using dimensions specified on the command line) and find the center of gravity (where rows and columns are numbered starting at 1):
object Ctr extends App {
  val Array( nRows, nCols ) = args map (_.toInt)
  val grid = Array.fill( nRows, nCols )( util.Random.nextInt(10) )
  grid foreach ( row => println( row mkString "," ) )
  val sum = grid.map(_.sum).sum
  val xCtr = ( ( for ( i <- 0 until nRows; j <- 0 until nCols )
                   yield (j+1) * grid(i)(j) ).sum :Float ) / sum
  val yCtr = ( ( for ( i <- 0 until nRows; j <- 0 until nCols )
                   yield (i+1) * grid(i)(j) ).sum :Float ) / sum
  println( s"Center is ( $xCtr, $yCtr )" )
}
You could def a function to keep the calculations DRYer, but I wanted to keep it as obvious as possible. Anyway, here we run it a couple of times:
$ scala Ctr 3 3
4,1,9
3,5,1
9,5,0
Center is ( 1.8378378, 2.0 )
$ scala Ctr 6 9
5,1,1,0,0,4,5,4,6
9,1,0,7,2,7,5,6,7
1,2,6,6,1,8,2,4,6
1,3,9,8,2,9,3,6,7
0,7,1,7,6,6,2,6,1
3,9,6,4,3,2,5,7,1
Center is ( 5.2956524, 3.626087 )

Classify points according to euclidean distance - optimize code

I have a matrix A consisting of 200 vectors of size d.
I want the 4096 vectors of a matrix B to be classified to these points according to the nearest-distance rule.
Thus the result should have one entry per row of B, containing the ID number (from 1 to 200) of the vector in A it belongs to.
I have written this code with 2 for loops and it takes a lot of time to compute.
for i = 1:4096
    counter = 1;
    vector1 = FaceImage(i,:);
    vector2 = Centroids(1,:);
    distance = pdist( [vector1 ; vector2], 'euclidean' );
    for j = 2:200
        vector2 = Centroids(j,:);
        temp = pdist( [vector1 ; vector2], 'euclidean' );
        if temp < distance
            distance = temp;
            counter = j;
        end
    end
    Histogram(i) = counter;
end
Can somebody help me increase the efficiency of the above code, or perhaps suggest a built-in function?
Thanks
You can do this in one line with pdist2:
[~, Histogram] = pdist2( Centroids, FaceImage, 'euclidean', 'Smallest', 1);
Timing for original code:
FaceImage = rand(4096, 100);
Centroids = rand(200, 100);
tic
* your code *
toc
Elapsed time is 87.434877 seconds.
Timing for my code:
tic
[~, Histogram_2] = pdist2( Centroids, FaceImage, 'euclidean', 'Smallest', 1);
toc
Elapsed time is 0.111736 seconds.
Asserting the results are the same:
>> all(Histogram==Histogram_2)
ans =
1
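By the way, if pdist2 (from the Statistics Toolbox) is not available, the same nearest-centroid assignment can be done by hand using the identity ||x-c||^2 = ||x||^2 + ||c||^2 - 2*x*c'. A sketch (this assumes implicit expansion, i.e. R2016b or newer):
% squared distances between every row of FaceImage and every centroid
D2 = sum(FaceImage.^2, 2) + sum(Centroids.^2, 2).' - 2 * FaceImage * Centroids.';
[~, Histogram_3] = min(D2, [], 2);   % nearest-centroid index for each row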
Try this:
vector2 = Centroids(1,:);
vector = [ vector2 ; FaceImage ];
temp = pdist( vector, 'euclidean' );
answer = temp(1:4096); % the first 4096 entries are the distances between vector2 and the rows of FaceImage
Now you can find the minimum of these distances, and that `row + 1` will be the vector that is closest to the point.
