Efficiently compute pairwise squared Euclidean distance in Matlab - performance

Given two sets of d-dimensional points, how can I most efficiently compute the pairwise squared Euclidean distance matrix in MATLAB?
Notation:
Set one is given by a (numA,d)-matrix A and set two by a (numB,d)-matrix B. The resulting distance matrix shall have size (numA,numB).
Example points:
d = 4; % dimension
numA = 100; % number of set 1 points
numB = 200; % number of set 2 points
A = rand(numA,d); % set 1 given as matrix A
B = rand(numB,d); % set 2 given as matrix B

The answer usually given here is based on bsxfun (cf. e.g. [1]). My proposed approach is based on matrix multiplication and turns out to be much faster than any comparable algorithm I could find:
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2,  B(:,idx),    ones(numB,1)];
end
distMat = helpA * helpB';
Please note:
Each triple of columns contributes B(j,idx)^2 - 2*A(i,idx)*B(j,idx) + A(i,idx)^2 = (A(i,idx) - B(j,idx))^2 to entry (i,j) of the product, so the matrix multiplication sums the squared differences over all d dimensions.
For constant d one can replace the for-loop by a hardcoded implementation, e.g. for d == 2:
helpA = [ones(numA,1), -2*A(:,1), A(:,1).^2, ...
         ones(numA,1), -2*A(:,2), A(:,2).^2 ];
Evaluation:
%% create some points
d = 2; % dimension
numA = 20000;
numB = 20000;
A = rand(numA,d);
B = rand(numB,d);
%% pairwise distance matrix
% proposed method:
tic;
helpA = zeros(numA,3*d);
helpB = zeros(numB,3*d);
for idx = 1:d
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*A(:,idx), A(:,idx).^2];
    helpB(:,3*idx-2:3*idx) = [B(:,idx).^2,  B(:,idx),    ones(numB,1)];
end
distMat = helpA * helpB';
toc;
% compare to pdist2:
tic;
pdist2(A,B).^2;
toc;
% compare to [1]:
tic;
bsxfun(@plus,dot(A,A,2),dot(B,B,2)')-2*(A*B');
toc;
% Another method: added 07/2014
% compare to ndgrid method (cf. Dan's comment)
tic;
[idxA,idxB] = ndgrid(1:numA,1:numB);
distMat = zeros(numA,numB);
distMat(:) = sum((A(idxA,:) - B(idxB,:)).^2,2);
toc;
Result:
Elapsed time is 1.796201 seconds.
Elapsed time is 5.653246 seconds.
Elapsed time is 3.551636 seconds.
Elapsed time is 22.461185 seconds.
For a more detailed evaluation w.r.t. dimension and number of data points, follow the discussion below (see the comments). It turns out that different algorithms are preferable in different settings. In non-time-critical situations, just use the pdist2 version.
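Side note: newer MATLAB releases also accept a 'squaredeuclidean' distance argument directly, which avoids the elementwise squaring; a minimal sketch, assuming a release that supports this option:
distMat = pdist2(A, B, 'squaredeuclidean'); % (numA,numB) squared distances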
Further development:
One can replace the squared Euclidean distance with any other metric based on the same principle:
help = zeros(numA,numB,d);
for idx = 1:d
    help(:,:,idx) = [ones(numA,1), A(:,idx)] * [B(:,idx)'; -ones(1,numB)];
end
distMat = sum(ANYFUNCTION(help),3);
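For example, plugging in @abs for ANYFUNCTION yields the pairwise Manhattan (cityblock) distance; a minimal sketch:
help = zeros(numA,numB,d);
for idx = 1:d
    % help(i,j,idx) holds the pairwise difference B(j,idx) - A(i,idx)
    help(:,:,idx) = [ones(numA,1), A(:,idx)] * [B(:,idx)'; -ones(1,numB)];
end
cityblockMat = sum(abs(help),3); % (numA,numB) matrix of L1 distances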
Nevertheless, this is quite time consuming. For smaller d it could be useful to replace the 3-dimensional matrix help by d 2-dimensional matrices. Especially for d = 1 it provides a way to compute the pairwise differences by a simple matrix multiplication:
pairDiffs = [ones(numA,1), A ] * [B'; -ones(1,numB)];
Do you have any further ideas?

For squared Euclidean distance one can also use the following formula
||a-b||^2 = ||a||^2 + ||b||^2 - 2<a,b>
Where <a,b> is the dot product between a and b
nA = sum( A.^2, 2 ); %// norm of A's elements
nB = sum( B.^2, 2 ); %// norm of B's elements
distMat = bsxfun( @plus, nA, nB' ) - 2 * A * B' ;
Recently, I've been told that as of R2016b this method for computing squared Euclidean distance is faster than the accepted method.
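For what it's worth, from R2016b onward implicit expansion makes the bsxfun call unnecessary; a minimal sketch of the same formula:
nA = sum(A.^2, 2);             % squared norms of A's rows
nB = sum(B.^2, 2);             % squared norms of B's rows
distMat = nA + nB.' - 2*A*B';  % (numA,numB) squared distances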

Related

Efficient way of computing multivariate gaussian varying the mean - Matlab

Is there an efficient way to compute the multivariate Gaussian (as below) that returns matrix p, i.e. making use of some sort of vectorization? I am aware that matrix p is symmetric, but for a matrix of size 40000x3, for example, this will still take quite a long time.
Matlab code example:
DataMatrix = [3 1 4; 1 2 3; 1 5 7; 3 4 7; 5 5 1; 2 3 1; 4 4 4];
[rows, cols ] = size(DataMatrix);
I = eye(cols);
p = zeros(rows);
for k = 1:rows
    p(k,:) = mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I);
end
Stage 1: Hack into source code
Iteratively we are performing mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I).
The syntax is: mvnpdf(X,Mu,Sigma).
Thus, the correspondence with our input becomes:
X = DataMatrix(:,:);
Mu = DataMatrix(k,:);
Sigma = I
For the sizes relevant to our situation, the source code mvnpdf.m reduces to -
%// Store size parameters of X
[n,d] = size(X);
%// Get vector mean, and use it to center data
X0 = bsxfun(@minus,X,Mu);
%// Make sure Sigma is a valid covariance matrix
[R,err] = cholcov(Sigma,0);
%// Create array of standardized data, and compute log(sqrt(det(Sigma)))
xRinv = X0 / R;
logSqrtDetSigma = sum(log(diag(R)));
%// Finally get the quadratic form and thus, the final output
quadform = sum(xRinv.^2, 2);
p_out = exp(-0.5*quadform - logSqrtDetSigma - d*log(2*pi)/2)
Now, if Sigma is always an identity matrix, R is an identity matrix too. Therefore, X0 / R is the same as X0, which is saved as xRinv. So, essentially quadform = sum(X0.^2, 2);
Thus, the original code -
for k = 1:rows
    p(k,:) = mvnpdf(DataMatrix(:,:),DataMatrix(k,:),I);
end
reduces to -
[n,d] = size(DataMatrix);
[R,err] = cholcov(I,0);
p_out = zeros(rows);
K = sum(log(diag(R))) + d*log(2*pi)/2;
for k = 1:rows
    X0 = bsxfun(@minus,DataMatrix,DataMatrix(k,:));
    quadform = sum(X0.^2, 2);
    p_out(k,:) = exp(-0.5*quadform - K);
end
Now, if the input matrix is of size 40000x3, you might want to stop here. But with system resources permitting, you can vectorize everything as discussed next.
Stage 2: Vectorize everything
Now that we see what's actually going on and that the computations look parallelizable, it's time to step up to bsxfun in 3D with its good friend permute for a vectorized solution, like so -
%// Get size params and R
[n,d] = size(DataMatrix);
[R,err] = cholcov(I,0);
%// Calculate constants: "logSqrtDetSigma" and "d*log(2*pi)/2"
K1 = sum(log(diag(R)));
K2 = d*log(2*pi)/2;
%// Major thing happening here as we calculate "X0" for all iterations
%// in one go with permute and bsxfun
diffs = bsxfun(@minus,DataMatrix,permute(DataMatrix,[3 2 1]));
%// "Sigma" is an identity matrix, so it plays no role in "xRinv = X0 / R"
%// Perform elementwise squaring and summing rows to get vectorized "quadform"
quadform1 = squeeze(sum(diffs.^2,2))
%// Finally use "quadform1" and get vectorized output as a 2D array
p_out = exp(-0.5*quadform1 - K1 - K2)
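A quick sanity check one might run (my addition, reusing DataMatrix and I from the question): the vectorized result should match the loopy mvnpdf version up to floating-point noise. Note that p_out here happens to be symmetric since Sigma is the identity.
p_loop = zeros(rows);
for k = 1:rows
    p_loop(k,:) = mvnpdf(DataMatrix, DataMatrix(k,:), I);
end
max(abs(p_loop(:) - p_out(:))) % expect something on the order of 1e-16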

Implementation of EM algorithm for Gaussian Mixture Models

Using the EM algorithm, I want to train a Gaussian Mixture model with four components on a given dataset. The set is three dimensional and contains 300 samples.
The problem is that after about 6 rounds of the EM algorithm, the covariance matrices sigma become close to singular according to MATLAB (rank(sigma) = 2 instead of 3). This in turn leads to undesired results like complex values when evaluating the Gaussian distribution gm(k,i).
Furthermore I used the log of the Gaussian to account for underflow troubles - see E-step. I am not sure if this is correct, and whether I have to take the exp of the responsibilities p(w_k | x^(i), theta) somewhere else?
Can you tell me if my implementation of the EM algorithm is correct so far?
And how to account for the problem with the close to singular covariance sigma?
Here is my implementation of the EM algorithm:
First I initialized the means and the covariance of the components using kmeans:
load('data1.mat');
X = Data'; % 300x3 data set
D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi
[idx,mu] = kmeans(X,K); % initial means of the components
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
    sigma(:,:,k) = cov(X(idx==k,:));
end
For the E-step I am using the following formula to calculate the responsibilities.
w_k are the k gaussian components.
x^(i) is a single datapoint (sample)
theta stands for the parameters of the gaussian mixture model: mu, Sigma, pi.
Here is the corresponding code:
% variables for convergence
converged = 0;
prevLoglikelihood = Inf;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
round = 0;
while (converged ~= 1)
    round = round + 1
    gm = zeros(K,N);     % Gaussian component in the numerator
    sumGM = zeros(N,1);  % denominator of responsibilities
    % E-step: Evaluate the responsibilities using the current parameters
    % compute the numerator and denominator of the responsibilities
    for k = 1:K
        for i = 1:N
            Xmu = X-mu;
            % I am using log to prevent underflow of the Gaussian distribution (exp("small value"))
            logPdf = log(1/sqrt(det(sigma(:,:,k))*(2*pi)^D)) + (-0.5*Xmu*(sigma(:,:,k)\Xmu'));
            gm(k,i) = log(p(k)) * logPdf;
            sumGM(i) = sumGM(i) + gm(k,i);
        end
    end
    % calculate responsibilities
    res = zeros(K,N); % responsibilities
    Nk = zeros(4,1);
    for k = 1:K
        for i = 1:N
            % I tried to use exp(gm(k,i)/sumGM(i)) to compute res but this leads to sum(pi) > 1.
            res(k,i) = gm(k,i)/sumGM(i);
        end
        Nk(k) = sum(res(k,:));
    end
Nk(k) is computed using the formula given in the M-step and is used in the M-step to calculate the new probabilities p(k).
M-step
    % M-step: Re-estimate the parameters using the current responsibilities
    for k = 1:K
        for i = 1:N
            mu(k,:) = mu(k,:) + res(k,i).*X(k,:);
            sigma(:,:,k) = sigma(:,:,k) + res(k,i).*(X(k,:)-mu(k,:))*(X(k,:)-mu(k,:))';
        end
        mu(k,:) = mu(k,:)./Nk(k);
        sigma(:,:,k) = sigma(:,:,k)./Nk(k);
        p(k) = Nk(k)/N;
    end
Now in order to check for convergence the log-likelihood is computed using this formula:
    % Evaluate the log-likelihood and check for convergence of either
    % the parameters or the log-likelihood. If not converged, go to E-step.
    loglikelihood = 0;
    for i = 1:N
        loglikelihood = loglikelihood + log(sum(gm(:,i)));
    end
    % Check for convergence of parameters
    errorLoglikelihood = abs(loglikelihood-prevLoglikelihood);
    if (errorLoglikelihood <= eps)
        converged = 1;
    end
    errorMu = abs(mu(:)-prevMu(:));
    errorSigma = abs(sigma(:)-prevSigma(:));
    errorPi = abs(p(:)-prevPi(:));
    if (all(errorMu <= eps) && all(errorSigma <= eps) && all(errorPi <= eps))
        converged = 1;
    end
    prevLoglikelihood = loglikelihood;
    prevMu = mu;
    prevSigma = sigma;
    prevPi = p;
end % while
Is there something wrong with my Matlab implementation of EM algorithm for Gaussian Mixture Models?
Previous troubles:
The problem is that I cannot check for convergence using the log-likelihood because it is -Inf. This results from rounded zero values while evaluating the gaussian in the formula of the responsibilities (see E-step).
Can you tell me if my implementation of the EM algorithm is correct so far?
And how to account for the problem with the rounded zero values?
Here is my implementation of the EM algorithm:
First I initialized the means and the covariance of the components using kmeans:
load('data1.mat');
X = Data'; % 300x3 data set
D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi
[idx,mu] = kmeans(X,K); % initial means of the components
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
    sigma(:,:,k) = cov(X(idx==k,:));
end
For the E-step I am using the following formula to calculate the responsibilities
Here is the corresponding code:
% variables for convergence
converged = 0;
prevLoglikelihood = Inf;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
round = 0;
while (converged ~= 1)
    round = round + 1
    gm = zeros(K,N);     % Gaussian component in the numerator -
                         % some values evaluate to zero
    sumGM = zeros(N,1);  % denominator of responsibilities
    % E-step: Evaluate the responsibilities using the current parameters
    % compute the numerator and denominator of the responsibilities
    for k = 1:K
        for i = 1:N
            % HERE values underflow to zero, e.g. exp(-746.6228) = 0
            gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*(X(i,:)-mu(k,:))*inv(sigma(:,:,k))*(X(i,:)-mu(k,:))');
            sumGM(i) = sumGM(i) + gm(k,i);
        end
    end
    % calculate responsibilities
    res = zeros(K,N); % responsibilities
    Nk = zeros(4,1);
    for k = 1:K
        for i = 1:N
            res(k,i) = gm(k,i)/sumGM(i);
        end
        Nk(k) = sum(res(k,:));
    end
Nk(k) is computed using the formula given in the M-step.
M-step
    % M-step: Re-estimate the parameters using the current responsibilities
    mu = zeros(K,3);
    for k = 1:K
        for i = 1:N
            mu(k,:) = mu(k,:) + res(k,i).*X(k,:);
            sigma(:,:,k) = sigma(:,:,k) + res(k,i).*(X(k,:)-mu(k,:))*(X(k,:)-mu(k,:))';
        end
        mu(k,:) = mu(k,:)./Nk(k);
        sigma(:,:,k) = sigma(:,:,k)./Nk(k);
        p(k) = Nk(k)/N;
    end
Now in order to check for convergence the log-likelihood is computed using this formula:
    % Evaluate the log-likelihood and check for convergence of either
    % the parameters or the log-likelihood. If not converged, go to E-step.
    loglikelihood = 0;
    for i = 1:N
        loglikelihood = loglikelihood + log(sum(gm(:,i)));
    end
    % Check for convergence of parameters
    errorLoglikelihood = abs(loglikelihood-prevLoglikelihood);
    if (errorLoglikelihood <= eps)
        converged = 1;
    end
    errorMu = abs(mu(:)-prevMu(:));
    errorSigma = abs(sigma(:)-prevSigma(:));
    errorPi = abs(p(:)-prevPi(:));
    if (all(errorMu <= eps) && all(errorSigma <= eps) && all(errorPi <= eps))
        converged = 1;
    end
    prevLoglikelihood = loglikelihood;
    prevMu = mu;
    prevSigma = sigma;
    prevPi = p;
end % while
After the first round the loglikelihood is around 700.
In the second round it is -Inf because some gm(k,i) values in the E-step are zero. Therefore the log is obviously negative infinity.
The zero values also lead to sumGM being zero, which in turn produces all-NaN entries inside the mu and sigma matrices.
How can I solve this problem?
Can you tell me if there is something wrong with my implementation?
Could it be solved by increasing Matlab's precision somehow?
EDIT:
I added a scaling for the exp() term in gm(k,i).
Unfortunately this doesn't help much. After some more rounds I still get the underflow problem.
scale = zeros(N,D);
for i = 1:N
    max = 0;
    for k = 1:K
        Xmu = X(i,:)-mu(k,:);
        if (norm(scale(i,:) - Xmu) > max)
            max = norm(scale(i,:) - Xmu);
            scale(i,:) = Xmu;
        end
    end
end
for k = 1:K
    for i = 1:N
        Xmu = X(i,:)-mu(k,:);
        % scale gm to prevent underflow
        Xmu = Xmu - scale(i,:);
        gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*Xmu*inv(sigma(:,:,k))*Xmu');
        sumGM(i) = sumGM(i) + gm(k,i);
    end
end
Further I noticed that kmeans initializes the means completely differently from the following rounds, where the means are computed in the M-step.
kmeans:
mu = 13.500000000000000 0.026602138870044 0.062415945993735
88.500000000000000 -0.009869960132085 -0.075177888210981
39.000000000000000 -0.042569305020309 0.043402772876513
64.000000000000000 -0.024519281362918 -0.012586980924762
after M-step:
round = 2
mu = 1.000000000000000 0.077230046948357 0.024498886414254
2.000000000000000 0.074260118474053 0.026484346404660
3.000000000000002 0.070944016105476 0.029043085983168
4.000000000000000 0.067613431480832 0.031641849205021
In the next rounds mu doesn't change at all. It stays the same as in round 2.
I guess this is caused because of the underflow in gm(k,i)?
Either my implementation of the scaling is incorrect or the whole implementation of the algorithm is wrong somewhere :(
EDIT 2
After four rounds I got NaN values and looked into gm in more detail. Looking at only one sample (and without the 0.5 factor), gm becomes zero in all components; in MATLAB, gm(:,1) = [0 0 0 0]. This in turn leads to sumGM equal to zero -> NaN because I divided by zero. More details below:
round = 1
mu = 62.0000 -0.0298 -0.0078
37.0000 -0.0396 0.0481
87.5000 -0.0083 -0.0728
12.5000 0.0303 0.0614
gm(:,1) = [11.7488, 0.0000, 0.0000, 0.0000]
round = 2
mu = 1.0000 0.0772 0.0245
2.0000 0.0743 0.0265
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = [0.0000, 0.0000, 0.0000, 0.3128]
round = 3
mu = 1.0000 0.0772 0.0245
2.0000 0.0743 0.0265
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = [0, 0, 0.0000, 0.2867]
round = 4
mu = 1.0000 0.0772 0.0245
NaN NaN NaN
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = 1.0e-105 * [0, NaN, 0, 0.5375]
First of all, the means don't seem to change and are completely different from the kmeans initialization.
And every sample (not just the first one shown here) corresponds to only one Gaussian component according to the output of gm(:,1). Shouldn't the sample be "partially distributed" among every Gaussian component?
EDIT 3:
So I guess the problem with mu not changing was the first line in the M-step: mu = zeros(K,3);.
To account for the underflow problem I am currently trying to use the log of the Gaussian:
function logPdf = logmvnpdf(X, mu, sigma, D)
    Xmu = X-mu;
    logPdf = log(1/sqrt(det(sigma)*(2*pi)^D)) + (-0.5*Xmu*inv(sigma)*Xmu');
end
The new problem is the covariance matrix sigma. Matlab claims:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate.
After 6 rounds I get imaginary values for gm (gaussian distribution).
The updated E-Step looks like this now:
gm = zeros(K,N);    % Gaussian component in the numerator
sumGM = zeros(N,1); % denominator of responsibilities
for k = 1:K
    for i = 1:N
        %gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*Xmu*inv(sigma(:,:,k))*Xmu');
        %gm(k,i) = p(k)*mvnpdf(X(i,:),mu(k,:),sigma(:,:,k));
        gm(k,i) = log(p(k)) + logmvnpdf(X(i,:), mu(k,:), sigma(:,:,k), D);
        sumGM(i) = sumGM(i) + gm(k,i);
    end
end
It looks like you should be able to use a scale factor scale(i) to bring gm(k,i) into a representable range, because if you multiply gm(k,i) by scale(i) this will end up multiplying sumGM(i) as well, and be cancelled away when you work out res(k,i) = gm(k,i) / sumGM(i).
In theory I would make scale(i) = 1 / max_k(exp(-0.5*(X(i,:)-mu(k,:))*inv(sigma(:,:,k))*(X(i,:)-mu(k,:))')), and actually calculate it without doing the exponentiation, so that you deal with its log, max_k(-0.5*(X(i,:)-mu(k,:))*inv(sigma(:,:,k))*(X(i,:)-mu(k,:))'). This gives you a common term you can subtract from each exponent before calling exp(), and it keeps at least the maximum within a representable range; anything that still underflows to zero after this correction is vanishingly small compared to the other contributions anyway.
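A minimal sketch of this idea (my wording, reusing the logmvnpdf helper from the question): subtract the per-sample maximum log-density before exponentiating, i.e. the standard log-sum-exp trick.
logGm = zeros(K,N); % log(p(k)) + log N(x_i | mu_k, sigma_k)
for k = 1:K
    for i = 1:N
        logGm(k,i) = log(p(k)) + logmvnpdf(X(i,:), mu(k,:), sigma(:,:,k), D);
    end
end
m = max(logGm, [], 1);                    % common scaling term, one per sample
res = exp(bsxfun(@minus, logGm, m));      % safe: the largest exponent is exactly 0
res = bsxfun(@rdivide, res, sum(res, 1)); % responsibilities, columns sum to 1
% log-likelihood without ever forming the underflowing densities:
loglikelihood = sum(m + log(sum(exp(bsxfun(@minus, logGm, m)), 1)));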

Efficiently Calculate Frequency Averaged Periodogram Using GPU

In Matlab I am looking for a way to most efficiently calculate a frequency averaged periodogram on a GPU.
I understand that the most important thing is to minimise for loops and use the already built-in GPU functions. However, my code still feels relatively unoptimised and I was wondering what changes I can make to gain a better speed-up.
r = 5; % Dimension
n = 100; % Time points
m = 20; % Bandwidth of smoothing
% Generate some random rxn data
X = rand(r, n);
% Generate normalised weights according to a cos window
w = cos(pi * (-m/2:m/2)/m);
w = w/sum(w);
% Generate non-smoothed Periodogram
FT = (n)^(-0.5)*(ctranspose(fft(ctranspose(X))));
Pdgm = zeros(r, r, n/2 + 1);
for j = 1:n/2 + 1
    Pdgm(:,:,j) = FT(:,j)*FT(:,j)';
end
% Finally smooth with our weights
SmPdgm = zeros(r, r, n/2 + 1);
% Take advantage of the GPU filter function
% Create new Periodogram WrapPdgm with m/2 values wrapped around in front and
% behind it (it seems like there is redundancy here)
WrapPdgm = zeros(r,r,n/2 + 1 + m);
WrapPdgm(:,:,m/2+1:n/2+m/2+1) = Pdgm;
WrapPdgm(:,:,1:m/2) = flip(Pdgm(:,:,2:m/2+1),3);
WrapPdgm(:,:,n/2+m/2+2:end) = flip(Pdgm(:,:,n/2-m/2+1:end-1),3);
% Perform filtering
for i = 1:r
    for j = 1:r
        temp = filter(w, [1], WrapPdgm(i,j,:));
        SmPdgm(i,j,:) = temp(:,:,m+1:end);
    end
end
In particular, I couldn't see a way to optimise away the for loop when calculating the initial Pdgm from the Fourier-transformed data, and the trick I play with WrapPdgm in order to take advantage of filter() on the GPU feels like it would be unnecessary if there were a smoothing function instead.
Solution Code
This seems to be pretty efficient, as the benchmark runtimes in the next section might convince us -
%// Select the portion of FT to be processed and
%// send copy to GPU for calculating everything
gFT = gpuArray(FT(:,1:n/2 + 1));
%// Perform non-smoothed Periodogram, thus removing the first loop
Pdgm1 = bsxfun(@times,permute(gFT,[1 3 2]),permute(conj(gFT),[3 1 2]));
%// Generate WrapPdgm right on GPU
WrapPdgm1 = zeros(r,r,n/2 + 1 + m,'gpuArray');
WrapPdgm1(:,:,m/2+1:n/2+m/2+1) = Pdgm1;
WrapPdgm1(:,:,1:m/2) = Pdgm1(:,:,m/2+1:-1:2);
WrapPdgm1(:,:,n/2+m/2+2:end) = Pdgm1(:,:,end-1:-1:n/2-m/2+1);
%// Perform filtering on GPU and get the final output, SmPdgm1
filt_data = filter(w,1,reshape(WrapPdgm1,r*r,[]),[],2);
SmPdgm1 = gather(reshape(filt_data(:,m+1:end),r,r,[]));
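A quick check one might add (my addition): the GPU result should match the original loopy CPU result up to floating-point noise.
max(abs(SmPdgm(:) - SmPdgm1(:))) % expect something on the order of 1e-15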
Benchmarking
Benchmarking Code
%// Input parameters
r = 50; % Dimension
n = 1000; % Time points
m = 200; % Bandwidth of smoothing
% Generate some random rxn data
X = rand(r, n);
% Generate normalised weights according to a cos window
w = cos(pi * (-m/2:m/2)/m);
w = w/sum(w);
% Generate non-smoothed Periodogram
FT = (n)^(-0.5)*(ctranspose(fft(ctranspose(X))));
tic, %// ... Code from original approach, toc
tic %// ... Code from proposed approach, toc
Runtime results thus obtained on a GTX 750 Ti GPU against an i7-4790K CPU -
------------------------------ With Original Approach on CPU
Elapsed time is 0.279816 seconds.
------------------------------ With Proposed Approach on GPU
Elapsed time is 0.169969 seconds.
To get rid of the first loop you can do the following:
Pdgm_cell = cellfun(@(x) x * x', mat2cell(FT(:, 1 : 51), [5], ones(51, 1)), 'UniformOutput', false); % 51 = n/2+1 and 5 = r for the question's sizes
Pdgm = reshape(cell2mat(Pdgm_cell),5,5,[]);
Then in your filter you can do the following:
temp = filter(w, 1, WrapPdgm, [], 3);
SmPdgm = temp(:, :, m + 1 : end);
The 3 lets the filter know to operate along the 3rd dimension of your data.
You can use pagefun on the GPU for the first loop. (Note that the implementation of cellfun is basically a hidden loop, whereas pagefun runs natively on the GPU using a batched GEMM operation). Here's how:
n = 16;
r = 8;
X = gpuArray.rand(r, n);
R = gpuArray.zeros(r, r, n/2 + 1);
for jj = 1:(n/2+1)
    R(:,:,jj) = X(:,jj) * X(:,jj)';
end
X2 = X(:,1:(n/2+1));
R2 = pagefun(@mtimes, reshape(X2, r, 1, []), reshape(X2, 1, r, []));
R - R2
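Applied to the question's complex FT (a sketch on my part): reshape does not conjugate, so take conj explicitly on the second factor to reproduce FT(:,j)*FT(:,j)'.
gFT = gpuArray(FT(:,1:n/2+1));
Pdgm = pagefun(@mtimes, reshape(gFT, r, 1, []), reshape(conj(gFT), 1, r, []));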

Fastest way to find rows without NaNs in Matlab

I would like to find the indices of rows without any NaN as fast as possible, since I need to do it thousands of times. So far I have tried the following two approaches:
find(~isnan(sum(data, 2)));
find(all(~isnan(data), 2));
Is there a clever way to speed this up or is this the best possible? The dimension of the data matrix is usually thousands by hundreds.
Edit:
Matrix multiplication can be faster than sum, making the operation almost twice as fast for matrices above 500x500 elements (on my MATLAB 2012a machine). So my solution is:
find(~isnan(data*zeros(size(data,2),1)))
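This works because NaN propagates through arithmetic: a row containing a NaN yields NaN in the product even though the multiplier is all zeros (NaN*0 is NaN). A tiny demonstration (my addition):
[1 2 3; 1 NaN 3] * zeros(3,1)
% ans =
%      0
%    NaN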
Out of the two methods you suggested (denoted f and g) in the question the first is faster (using timeit):
data=rand(4000);
nani=randi(numel(data),1,500);
data(nani)=NaN;
f = @() find(~isnan(sum(data, 2)));
g = @() find(all(~isnan(data), 2));
h = @() find(~isnan(data*zeros(size(data,2),1)));
timeit(f)
ans =
0.0263
timeit(g)
ans =
0.1489
timeit(h)
ans =
0.0146
If the NaN density is high enough, then a double loop will be the fastest method. This is because the scan of a row can be abandoned as soon as the first NaN is found. For example, consider the following speed test:
%# Preallocate some parameters
T = 5000; %# Number of rows
N = 500; %# Number of columns
X = randi(5, T, N); %# Sample data matrix
M = 100; %# Number of simulation iterations
X(X == 1) = nan; %# Randomly set some elements of X to nan
%# Your first method
tic
for m = 1:M
    Soln1 = find(~isnan(sum(X, 2)));
end
toc
%# Your second method
tic
for m = 1:M
    Soln2 = find(all(~isnan(X), 2));
end
toc
%# A double loop
tic
for m = 1:M
    Soln3 = ones(T, 1);
    for t = 1:T
        for n = 1:N
            if isnan(X(t, n))
                Soln3(t) = 0;
                break
            end
        end
    end
    Soln3 = find(Soln3);
end
toc
The results are:
Elapsed time is 0.164880 seconds.
Elapsed time is 0.218950 seconds.
Elapsed time is 0.068168 seconds. %# The double loop method
Of course, the NaN density is so high in this simulation that none of the rows are NaN-free. But you never said anything about the NaN density of your matrix, so I figured I'd post this answer for general consumption and contemplation :-)
Can you tell us more about what you want to do with the indices?
time = cputime;
A = rand(1000,100); % Some matrix data
for i = 1:100
    A(randi(20,1,100)) = NaN; % Randomly assigned NaN
    B = isnan(A);             % B has 0 and 1
    C = A(B == 0);            % C has all ~NaN elements
    ind(i,:) = find(B == 1);  % ind has all NaN indices
end
disp(cputime-time)
for 100 times in a loop, 0.1404 sec
any() is faster than all() or sum().
try:
idx = find(~any(isnan(data), 2));
Correction: it seems that the sum() approach is faster:
idx = find(~isnan(sum(data, 2)));

Summation without a for loop - MATLAB

I have 2 matrices: V which is square MxM, and K which is MxN. Calling the dimension across rows x and the dimension across columns t, I need to evaluate the integral (i.e sum) over both dimensions of K times a t-shifted version of V, the answer being a function of the shift (almost like a convolution, see below). The sum is defined by the following expression, where _{} denotes the summation indices, and a zero-padding of out-of-limits elements is assumed:
S(t) = sum_{x,tau}[V(x,t+tau) * K(x,tau)]
I manage to do it with a single loop, over the t dimension (vectorizing the x dimension):
% some toy matrices
V = rand(50,50);
K = rand(50,10);
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
    S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(@times, V(:,t:end), K(:,t)),1);
end
I have similar expressions which I managed to evaluate without a for loop, using a combination of conv2 and/or mirroring (flipping) of a single dimension. However I can't see how to avoid a for loop in this case (despite the apparent similarity to convolution).
Steps to vectorization
1] Perform sum(bsxfun(@times, V(:,t:end), K(:,t)),1) for all columns in V against all columns in K with matrix multiplication -
sum_mults = V.'*K
This gives us a 2D array with each column representing the sum(bsxfun(@times,..),1) operation from each iteration.
2] Step 1 gave us all possible summations, but the values to be summed are not aligned in the same row across iterations, so we need a bit more work before summing along rows. The rest of the work is about getting a shifted-up version. For that, you can use boolean indexing with upper and lower triangular boolean masks. Finally, we sum along each row for the final output. So this part of the code would look like so -
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
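A quick consistency check one might run (my addition, using the toy matrices from the question; note out is a column vector while S is a row vector):
max(abs(S(:) - out(:))) % expect floating-point noise, e.g. ~1e-12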
Runtime tests -
%// Inputs
V = rand(1000,1000);
K = rand(1000,200);
disp('--------------------- With original loopy approach')
tic
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
    S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(@times, V(:,t:end), K(:,t)),1);
end
toc
disp('--------------------- With proposed vectorized approach')
tic
sum_mults = V.'*K;
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
toc
Output -
--------------------- With original loopy approach
Elapsed time is 2.696773 seconds.
--------------------- With proposed vectorized approach
Elapsed time is 0.044144 seconds.
This might be cheating (using arrayfun instead of a for loop) but I believe this expression gives you what you want:
S = arrayfun(@(t) sum(sum( V(:,(t+1):(t+N)) .* K )), 1:(M-N), 'UniformOutput', true)
