efficient matlab implementation for Lukas-Kanade step - algorithm

I got an assignment in a video processing course - to implement the Lucas-Kanade algorithm. Since we have to do it in the pyramidal model, I first build a pyramid for each of the 2 input images, and then for each level I perform a number of LK iterations. in each step (iteration), the following code runs (note: the images are zero-padded so I can handle the image edges easily):
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
It = I2-I1;
[Ix, Iy] = imgradientxy(I2);
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
b = zeros(2,1);
%iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
for j = 1+half_win : size(It,2)-half_win
A(1,1) = Ixx(i,j);
A(2,2) = Iyy(i,j);
A(1,2) = Ixy(i,j);
A(2,1) = Ixy(i,j);
b(1,1) = -Ixt(i,j);
b(2,1) = -Iyt(i,j);
U = pinv(A)*b;
du(i,j) = U(1);
dv(i,j) = U(2);
end
end
end
mathematically what I'm doing is calculating for every pixel (i,j) the following optical flow:
as you can see, in the code I am calculating this for each pixel, which takes quite a long time (the whole processing for 2 images - including building 3 levels pyramids and 3 LK steps like the one above on each level - takes about 25 seconds (!) on a remote connection to my university servers).
My question: Is there a way to calculate this single LK step without the nested for loops? it must be more efficient because the next step of the assignment is to stabilize a short video using this algorithm.. thanks.

I ran your code on my system and did profiling. Here is what I got.
As you can see inverting the matrix(pinv) is taking most of the time. You can try and vectorise your code I guess, but I am not sure how to do it. But I do know a trick to improve the compute time. You have to exploit the minimum variance of the matrix A. That is, compute the inverse only if the minimum variance of A is greater than some threshold. This will improve the speed as you won't be inverting the matrix for all the pixel.
You do this by modifying your code to the one shown below.
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
It = double(I2-I1);
[Ix, Iy] = imgradientxy(I2);
Ixx = imfilter(Ix.*Ix, ones(5));
Iyy = imfilter(Iy.*Iy, ones(5));
Ixy = imfilter(Ix.*Iy, ones(5));
Ixt = imfilter(Ix.*It, ones(5));
Iyt = imfilter(Iy.*It, ones(5));
half_win = floor(WindowSize/2);
du = zeros(size(It));
dv = zeros(size(It));
A = zeros(2);
B = zeros(2,1);
%iterate only on the relevant parts of the images
for i = 1+half_win : size(It,1)-half_win
for j = 1+half_win : size(It,2)-half_win
A(1,1) = Ixx(i,j);
A(2,2) = Iyy(i,j);
A(1,2) = Ixy(i,j);
A(2,1) = Ixy(i,j);
B(1,1) = -Ixt(i,j);
B(2,1) = -Iyt(i,j);
% +++++++++++++++++++++++++++++++++++++++++++++++++++
% Code I added , threshold better be outside the loop.
lambda = eig(A);
threshold = 0.2
if (min(lambda)> threshold)
U = A\B;
du(i,j) = U(1);
dv(i,j) = U(2);
end
% end of addendum
% +++++++++++++++++++++++++++++++++++++++++++++++++++
% U = pinv(A)*B;
% du(i,j) = U(1);
% dv(i,j) = U(2);
end
end
end
I have set the threshold to 0.2. You can experiment with it. By using eigen value trick I was able to get the compute time from 37 seconds to 10 seconds(shown below). Using eigen, pinv hardly takes up the time like before.
Hope this helped. Good luck :)

Eventually I was able to find a much more efficient solution to this problem.
It is based on the formula shown in the question. The last 3 lines are what makes the difference - we get a loop-free code that works way faster. There were negligible differences from the looped version (~10^-18 or less in terms of absolute difference between the result matrices, ignoring the padding zone).
Here is the code:
function [du,dv]= LucasKanadeStep(I1,I2,WindowSize)
half_win = floor(WindowSize/2);
% pad frames with mirror reflections of itself
I1 = padarray(I1, [half_win half_win], 'symmetric');
I2 = padarray(I2, [half_win half_win], 'symmetric');
% create derivatives (time and space)
It = I2-I1;
[Ix, Iy] = imgradientxy(I2, 'prewitt');
% calculate dP = (du, dv) according to the formula
Ixx = imfilter(Ix.*Ix, ones(WindowSize));
Iyy = imfilter(Iy.*Iy, ones(WindowSize));
Ixy = imfilter(Ix.*Iy, ones(WindowSize));
Ixt = imfilter(Ix.*It, ones(WindowSize));
Iyt = imfilter(Iy.*It, ones(WindowSize));
% calculate the whole du,dv matrices AT ONCE!
invdet = (Ixx.*Iyy - Ixy.*Ixy).^-1;
du = invdet.*(-Iyy.*Ixt + Ixy.*Iyt);
dv = invdet.*(Ixy.*Ixt - Ixx.*Iyt);
end

Related

Image Processing: Algorithm taking too long in MATLAB

I am working in MATLAB to process two 512x512 images, the domain image and the range image. What I am trying to accomplish is the following:
Divide both domain and range images into 8x8 pixel blocks
For each 8x8 block in the domain image, I have to apply a linear transformations to it and compare each of the 4096 transformed blocks with each of the 4096 range blocks.
Compute error in each case between the transformed block and the range image block and find the minimum error.
Finally I'll have for each 8x8 range block, the id of the 8x8 domain block for which the error was minimum (error between the range block and the transformed domain block)
To achieve this, I have written the following code:
RangeImagecolor = imread('input.png'); %input is 512x512
DomainImagecolor = imread('input.png'); %Range and Domain images are identical
RangeImagetemp = rgb2gray(RangeImagecolor);
DomainImagetemp = rgb2gray(DomainImagecolor);
RangeImage = im2double(RangeImagetemp);
DomainImage = im2double(DomainImagetemp);
%For the (k,l)th 8x8 range image block
for k = 1:64
for l = 1:64
minerror = 9999;
min_i = 0;
min_j = 0;
for i = 1:64
for j = 1:64
%here I compute for the (i,j)th domain block, the transformed domain block stored in D_trans
error = 0;
D_trans = zeros(8,8);
R = zeros(8,8); %Contains the pixel values of the (k,l)th range block
for m = 1:8
for n = 1:8
R(m,n) = RangeImage(8*k-8+m,8*l-8+n);
%ApplyTransformation can depend on (k,l) so I can't compute the transformation outside the k,l loop.
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
error = error + (R(m,n)-D_trans(m,n))^2;
end
end
if(error < minerror)
minerror = error;
min_i = i;
min_j = j;
end
end
end
end
end
As an example ApplyTransformation, one can use the identity transformation:
function [x_dash,y_dash] = Iden(x,y)
x_dash = x;
y_dash = y;
end
Now the problem I am facing is the high computation time. The order of computation in the above code is 64^5, which is of the order 10^9. This computation should take at the worst minutes or an hour. It takes about 40 minutes to compute just 50 iterations. I don't know why the code is running so slow.
Thanks for reading my question.
You can use im2col* to convert the image to column format so each block forms a column of a [64 * 4096] matrix. Then apply transformation to each column and use bsxfun to vectorize computation of error.
DomainImage=rand(512);
RangeImage=rand(512);
DomainImage_col = im2col(DomainImage,[8 8],'distinct');
R = im2col(RangeImage,[8 8],'distinct');
[x y]=ndgrid(1:8);
function [x_dash, y_dash] = ApplyTransformation(x,y)
x_dash = x;
y_dash = y;
end
[x_dash, y_dash] = ApplyTransformation(x,y);
idx = sub2ind([8 8],x_dash, y_dash);
D_trans = DomainImage_col(idx,:); %transformation is reduced to matrix indexing
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij]= min(Error,[],2); % linear index of minimum of each block;
[min_i min_j]=ind2sub([64 64],min_ij); % convert linear index to subscript
Explanation:
Our goal is to reduce number of loops as much as possible. For it we should avoid matrix indexing and instead we should use vectorization. Nested loops should be converted to one loop. As the first step we can create a more optimized loop as here:
min_ij = zeros(4096,1);
for kl = 1:4096 %%% => 1:size(D_trans,2)
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096 %%% => 1:size(R,2)
Error = 0;
for mn = 1:64
Error = Error + (R(mn,kl) - D_trans(mn,ij)).^2;
end
if(Error < minerror)
minerror = Error;
min_ij(kl) = ij;
end
end
end
We can re-arrange the loops and we can make the most inner loop as the outer loop and separate computation of the minimum from the computation of the error.
% Computation of the error
Error = zeros(4096,4096);
for mn = 1:64
for kl = 1:4096
for ij = 1:4096
Error(kl,ij) = Error(kl,ij) + (R(mn,kl) - D_trans(mn,ij)).^2;
end
end
end
% Computation of the min
min_ij = zeros(4096,1);
for kl = 1:4096
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096
if(Error(kl,ij) < minerror)
minerror = Error(kl,ij);
min_ij(kl) = ij;
end
end
end
Now the code is arranged in a way that can best be vectorized:
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij] = min(Error, [], 2);
[min_i ,min_j] = ind2sub([64 64], min_ij);
*If you don't have the Image Processing Toolbox a more efficient implementation of im2col can be found here.
*The whole computation takes less than a minute.
First things first - your code doesn't do anything. But you likely do something with this minimum error stuff and only forgot to paste this here, or still need to code that bit. Never mind for now.
One big issue with your code is that you calculate transformation for 64x64 blocks of resulting image AND source image. 64^5 iterations of a complex operation are bound to be slow. Rather, you should calculate all transformations at once and save them.
allTransMats = cell(64);
for i = 1 : 64
for j = 1 : 64
allTransMats{i,j} = getTransformation(DomainImage, i, j)
end
end
function D_trans = getTransformation(DomainImage, i,j)
D_trans = zeros(8);
for m = 1 : 8
for n = 1 : 8
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
end
end
end
This serves to get allTransMat and is OUTSIDE the k, l loop. Preferably as a simple function.
Now, you make your big k, l, i, j loop, where you compare all the elements as needed. Comparison could be also done block-wise instead of filling a small 8x8 matrix, yet doing it per element for some reason.
m = 1 : 8;
n = m;
for ...
R = RangeImage(...); % This will give 8x8 output as n and m are vectors.
D = allTransMats{i,j};
difference = sum(sum((R-D).^2));
if (difference < minDifference) ...
end
Even though this is a simple no transformations case, this speeds up code a lot.
Finally, are you sure you need to compare each block of transformed output with each block in the source? Typically you compare block1(a,b) with block2(a,b) - blocks (or pixels) on the same position.
EDIT: allTransMats requires k and l too. Ouch. There is NO WAY to make this fast for a single iteration, as you require 64^5 calls to ApplyTransformation (or a vectorization of that function, but even then it might not be fast - we would have to see the function to help here).
Therefore, I will re-iterate my advice to generate all transformations and then perform lookup: this upper part of the answer with allTransMats generation should be changed to have all 4 loops and generate allTransMats{i,j,k,l};. It WILL be slow, there is no way around that as I mentioned in the upper part of edit. But, it is a cost you pay once, as after saving the allTransMats, all further image analyses will be able to simply load it instead of generating it again.
But ... what do you even do? Transformation that depends on source and destination block indices plus pixel indices (= 6 values total) sounds like a mistake somewhere, or a prime candidate to optimize instead of all the rest.

Get better performance for converting matrix to vector

when working with images, usually they include 3 layers, (RGB). In order to do some computation, I need to convert each layer of the image into a vector.
I1 = ones(70,50,3); % the first image
I2 = 0.4 * ones(70,50,3); % the second image
for dd = 1:3
ILayer1 = I1(:,:,dd);
ILayerLinear1 = ILayer1(:);
ILayer2 = I2(:,:,dd);
ILayerLinear2 = ILayer2(:);
comp = ILayerLinear1 * ILayerLinear1.';
end
Here I have replaced the main computation part with a very simple computation, but that is not the point.
Is there a better way to not repeat the matrix-to-vector conversion, or do it more efficiently? Because it may happen multiple times through the code.
Update:
I can also define a function as follows to pass an Image and retrieve a vector, but it still is not improving the code.
function V = I2V(I)
[r,c,d] = size(I);
V = zeros(d,r*c);
for dd = 1:d
layer = I(:,:,dd);
V(dd,:) = layer(:);
end
end
I'm not sure about the outer product but, here's everything else.
I1 = reshape(1:70*50*3, 70,50,3);
I2 = 0.4*reshape(1:70*50*3, 70,50,3);
i1 = reshape(I1, [], 3);
i2 = reshape(I2, [], 3);

Implement a fast optimization algorithm using fixed point method in matlab

I am implementing a fast optimization algorithm using fixed point method in matlab. The goal of that method is that find optimal value of u. Denote u={u_i,i=1..2}. The optimal value of u can be obtained as following steps:
Sorry about my image because I cannot type mathematics equation in here.
To do that task, I tried to find u follows above steps. However, I don't know how to implement the term \sum_{j!=i} (u_j-1) in equation 25. This is my code. Please see it and could you give me some comment or suggestion about my implementation to correct them. Currently, I tried to run that code but it give an incorrect answer.
function u = compute_u_TV(Im0, N_class)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialization
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
theta=0.001;
gamma=0.01;
tau=0.1;
sigma=0.1;
N_class=2; % only have u1 and u2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Iterative segmentation process
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for i=1:N_class
v(:,:,i) = Im0/max(Im0(:)); % u between 0 and 1.
qxv(:,:,i) = zeros(size(Im0));
qyv(:,:,i) = zeros(size(Im0));
u(:,:,i) = v(:,:,i);
for iteration=1:10000
u_temp=u;
% Update v
Divqi = ( BackwardX(qxv(:,:,i)) + BackwardY(qyv(:,:,i)) );
Term = Divqi - u(:,:,i)/ (theta*gamma);
TermX = ForwardX(Term);
TermY = ForwardY(Term);
Norm = sqrt(TermX.^2 + TermY.^2);
Denom = 1 + tau*Norm;
%Equation 24
qxv(:,:,i) = (qxv(:,:,i) + tau*TermX)./Denom;
qyv(:,:,i) = (qyv(:,:,i) + tau*TermY)./Denom;
v(:,:,i) = u(:,:,i) - theta*gamma* Divqi; %Equation 23
% Update u
u(:,:,i) = (v(:,:,i) - theta* gamma* Divqi -theta*gamma*sigma*(sum(u(:))-u(:,:,i)-1))./(1+theta* gamma*sigma);
u(:,:,i) = max(u(:,:,i),0);
u(:,:,i) = min(u(:,:,i),1);
check=u_temp(:,:,i)-u(:,:,i);
if(abs(sum(check(:)))<=0.1)
break;
end
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Sub-functions- X.Berson
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [dx]=BackwardX(u);
[Ny,Nx] = size(u);
dx = u;
dx(2:Ny-1,2:Nx-1)=( u(2:Ny-1,2:Nx-1) - u(2:Ny-1,1:Nx-2) );
dx(:,Nx) = -u(:,Nx-1);
function [dy]=BackwardY(u);
[Ny,Nx] = size(u);
dy = u;
dy(2:Ny-1,2:Nx-1)=( u(2:Ny-1,2:Nx-1) - u(1:Ny-2,2:Nx-1) );
dy(Ny,:) = -u(Ny-1,:);
function [dx]=ForwardX(u);
[Ny,Nx] = size(u);
dx = zeros(Ny,Nx);
dx(1:Ny-1,1:Nx-1)=( u(1:Ny-1,2:Nx) - u(1:Ny-1,1:Nx-1) );
function [dy]=ForwardY(u);
[Ny,Nx] = size(u);
dy = zeros(Ny,Nx);
dy(1:Ny-1,1:Nx-1)=( u(2:Ny,1:Nx-1) - u(1:Ny-1,1:Nx-1) );
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% End of sub-function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
You should do
u(:,:,i) = (v(:,:,i) - theta* gamma* Divqi -theta*gamma*sigma* ...
(sum(u(:,:,1:size(u,3) ~= i),3) -1))./(1+theta* gamma*sigma);
The part you were searching for is
sum(u(:,:,1:size(u,3) ~= i),3)
Let's decompose this :
1:size(u,3) ~= i
is a vector containing all values from 1 to the max size of u on the third dimension except i.
Then
u(:,:,1:size(u,3) ~= i)
is all the matrix of the third dimension of u except for j = i
Finally,
sum(...,3)
is the sum of all the matrix by the thrid dimension.
Let me know if it does help!

Faster concatenation of cell arrays of different sizes

I have a cell array of size m x 1 and each cell is again s x t cell array (size varies). I would like to concatenate vertically. The code is as follows:
function(cell_out) = vert_cat(cell_in)
[row,col] = cellfun(#size,cell_in,'Uni',0);
fcn_vert = #(x)([x,repmat({''},size(x,1),max(cell2mat(col))-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
cell_out = vertcat(cell_out{:});
end
Step 3 takes a lot of time. Is it the right way to do or is there any another faster way to achieve this?
cellfun has been found to be slower than loops (kind of old, but agrees with what I have seen).
In addition, repmat has also been a performance hit in the past (though that may be different now).
Try this two-loop code that aims to accomplish your task:
function cellOut = vert_cat(c)
nElem = length(c);
colPad = zeros(nElem,1);
nRow = zeros(nElem,1);
for k = 1:nElem
[nRow(k),colPad(k)] = size(c{k});
end
colMax = max(colPad);
colPad = colMax - colPad;
cellOut = cell(sum(nRow),colMax);
bottom = cumsum(nRow) - nRow + 1;
top = bottom + nRow - 1;
for k = 1:nElem
cellOut(bottom(k):top(k),:) = [c{k},cell(nRow(k),colPad(k))];
end
end
My test for this code was
A = rand(20,20);
A = mat2cell(A,ones(20,1),ones(20,1));
C = arrayfun(#(c) A(1:c,1:c),randi([1,15],1,5),'UniformOutput',false);
ccat = vert_cat(c);
I used this pice of code to generate data:
%generating some dummy data
m=1000;
s=100;
t=100;
cell_in=cell(m,1);
for idx=1:m
cell_in{idx}=cell(randi(s),randi(t));
end
Applying some minor modifications, I was able to speed up the code by a factor of 5
%Minor modifications of the original code
%use arrays instead of cells for row and col
[row,col] = cellfun(#size,cell_in);
%claculate max(col) once
tcol=max(col);
%use cell instead of repmat to generate an empty cell
fcn_vert = #(x)([x,cell(size(x,1),tcol-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
cell_out = vertcat(cell_out{:});
Using simply a for loop is even faster, because the data is only moved once
%new approac. Basic idea: move every data only once
[row,col] = cellfun(#size,cell_in);
trow=sum(row);
tcol=max(col);
r=1;
cell_out2 = cell(trow,tcol);
for idx=1:numel(cell_in)
cell_out2(r:r+row(idx)-1,1:col(idx))=cell_in{idx};
r=r+row(idx);
end

Speed up an Enumeration process

After a few days of optimization this is my code for an enumeration process that consist in finding the best combination for every row of W. The algorithm separates the matrix W in one where the elements of W are grather of LimiteInferiore (called W_legali) and one that have only element below the limit (called W_nlegali).
Using some parameters like Media (aka Mean), rho_b_legali The algorithm minimizes the total cost function. In the last part, I find where is the combination with the lowest value of objective function and save it in W_ottimo
As you can see the algorithm is not so "clean" and with very large matrix (142506x3000) is damn slow...So, can somebody help me to speed it up a little bit?
for i=1:3000
W = PesoIncertezza * MatriceCombinazioni';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
W_legali = W;
W_legali(W<LimiteInferiore) = nan;
if i==1
Media = W_legali;
rho_b_legale = ones(size (W_legali,1),size(MatriceCombinazioni,1));
else
Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni,1))+W_legali)/(size(W_tot_migl,2)+1);
rho_b_legale = repmat(((n_b+1)/i),1,size(MatriceCombinazioni,1));
end
[W_legali_migl,comb] = min(C_u .* Media .* (1./rho_b_legale) + (1./rho_b_legale) .* c_0 + (c_1./(i * rho_b_legale)),[],2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
MatriceCombinazioni_2 = MatriceCombinazioni;
MatriceCombinazioni_2(sum(MatriceCombinazioni_2,2)<2,:)=[];
W_nlegali = PesoIncertezza * MatriceCombinazioni_2';
W_nlegali(W_nlegali>=LimiteInferiore) = nan;
if i==1
Media = W_nlegali;
rho_b_nlegale = zeros(size (W_nlegali,1),size(MatriceCombinazioni_2,1));
else
Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni_2,1))+W_nlegali)/(size(W_tot_migl,2)+1);
rho_b_nlegale = repmat(((n_b)/i),1,size(MatriceCombinazioni_2,1));
end
[W_nlegali_migliori,comb2] = min(C_u .* Media .* (1./rho_b_nlegale) + (1./rho_b_nlegale) .* c_0 + (c_1./(i * rho_b_nlegale)),[],2);
z = [W_legali_migl, W_nlegali_migliori];
[z_ott,comb3] = min(z,[],2);
%Increasing n_b
if i==1
n_b = zeros(size(W,1),1);
end
index = find(comb3==1);
increment = ones(size(index,1),1);
B = accumarray(index,increment);
nzIndex = (B ~= 0);
n_b(nzIndex) = n_b(nzIndex) + B(nzIndex);
%Using comb3 to find where is the best configuration, is in
%W_legali or in W_nLegali?
combinazione = comb.*logical(comb3==1) + comb2.*logical(comb3==2);
W_ottimo = W(sub2ind(size(W),[1:size(W,1)],combinazione'))';
W_tot_migl(:,i) = W_ottimo;
FunzObb(:,i) = z_ott;
[PesoCestelli] = Simulazione_GenerazioneNumeriCasuali (PianoSperimentale,NumeroCestelli,NumeroEsperimenti,Alfa);
[PesoIncertezza_2] = Simulazione_GenerazioneIncertezza (NumeroCestelli,NumeroEsperimenti,IncertezzaCella,PesoCestelli);
PesoIncertezza(MatriceCombinazioni(combinazione,:)~=0) = PesoIncertezza_2(MatriceCombinazioni(combinazione,:)~=0); %updating just the hoppers that has been discharged
end
When you see repmat you should think bsxfun. For example, replace:
Media = (repmat(sum(W_tot_migl,2),1,size(MatriceCombinazioni,1))+W_legali) / ...
(size(W_tot_migl,2)+1);
with
Media = bsxfun(#plus,sum(W_tot_migl,2),W_legali) / ...
(size(W_tot_migl,2)+1);
The purpose of bsxfun is to do a virtual "singleton expansion" like repmat, without actually replicating the array into a matrix of the same size as W_legali.
Also note that in the above code, sum(W_tot_migl,2) is computed twice. There are other small optimizations, but changing to bsxfun should give you a good improvement.
The values of 1./rho_b_legale are effectively computed three times. Store this quotient matrix.

Resources