Vectorizing distance calculation between vectors - image

I have a 3 X 1000 (and later 3 X 10 000) matrix cord given, which contains the three dimensional coordinates for my pixels.
My intention is to calculate the distance between all the pixels, and I do it with a for loop (see below), but I will have to calculate this for huge matrices soon, and am wondering if I could vectorize the code for making it faster...?
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
for j = 1:size(cord,2)
dist(i,j) = norm(cord(:,i)-cord(:,j));
dist(j,i) = dist(i,j);
end
end

pdist does exactly that. squareform is needed to get the result in the form of a square, symmetric matrix:
dist = squareform(pdist(cord.'));

Approach 1 (Vectorized apprach with bsxfun ) -
squeeze(sqrt(sum(bsxfun(#minus,cord,permute(cord,[1 3 2])).^2)))
Not sure if this will be faster though.
Approach 2 -
Inspired by this very smart approach and all credits to the poster. The code posted here is just slightly customized for your case and hopefully slightly better in terms of runtime. Here it is -
A = cord'; %//'
numA = size(cord,2);
helpA = ones(numA,9);
helpB = ones(numA,9);
for idx = 1:3
sqA_idx = A(:,idx).^2;
helpA(:,3*idx-1:3*idx) = [-2*A(:,idx), sqA_idx ];
helpB(:,3*idx-2:3*idx-1) = [sqA_idx , A(:,idx)];
end
dist1 = sqrt(helpA * helpB'); %// desired output

From your code, you have recognized that the dist matrix is symmetric
dist(i,j) = norm(cord(:,i)-cord(:,j));
dist(j,i) = dist(i,j);
You could change the inner loop to account for this and reduce by roughly one half the number of calculations needed
for j = i:size(cord,2)
Further, we can avoid the dist(j,i) = dist(i,j); at each iteration and just do that at the end by extracting the upper triangle part of dist and adding its transpose to the dist matrix to account for the symmetry
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
for j = i:size(cord,2)
dist(i,j) = norm(cord(:,i)-cord(:,j));
end
end
dist = dist + triu(dist)';
The above addition is fine since the main diagonal is all zeros.
It still performs poorly though and so we should take advantage of vectorization. We can do that as follows against the inner loop
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
dist(i,i+1:end) = sum((repmat(cord(:,i),1,size(cord,2)-i)-cord(:,i+1:end)).^2);
end
dist = dist + triu(dist)';
dist = sqrt(dist);
For every element in cord we need to calculate its distance with all other elements that follow it. We reproduce the element with repmat so that we can subtract it from every element that follows without the need for the loop. The differences are squared and summed and assigned to the dist matrix. We take care of the symmetry and then take the square root of the matrix to complete the norm operation.
With tic and toc, the original distance calculation with a random cord (cord = rand(3,num);) took ~93 seconds. This version took ~2.8.

Related

Efficiency of diag() - MATLAB

Motivation:
In writing out a matrix operation that was to be performed over tens of thousands of vectors I kept coming across the warning:
Requested 200000x200000 (298.0GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long
time and cause MATLAB to become unresponsive. See array size limit or
preference panel for more information.
The reason for this was my use of diag() to get the values down the diagonal of an matrix inner product. Because MATLAB is generally optimized for vector/matrix operations, when I first write code, I usually go for the vectorized form. In this case, however, MATLAB has to build the entire matrix in order to get the diagonal which causes the memory and speed issues.
Experiment:
I decided to test the use of diag() vs a for loop to see if at any point it was more efficient to use diag():
num = 200000; % Matrix dimension
x = ones(num, 1);
y = 2 * ones(num, 1);
% z = diag(x*y'); % Expression to solve
% Loop approach
tic
z = zeros(num,1);
for i = 1 : num
z(i) = x(i)*y(i);
end
toc
% Dividing the too-large matrix into process-able chunks
fraction = [10, 20, 50, 100, 500, 1000, 5000, 10000, 20000];
time = zeros(size(fraction));
for k = 1 : length(fraction)
f = fraction(k);
% Operation to time
tic
z = zeros(num,1);
for i = 1 : k
first = (i-1) * (num / f);
last = first + (num / f);
z(first + 1 : last) = diag(x(first + 1: last) * y(first + 1 : last)');
end
time(k) = toc;
end
% Plot results
figure;
hold on
plot(log10(fraction), log10(chunkTime));
plot(log10(fraction), repmat(log10(loopTime), 1, length(fraction)));
plot(log10(fraction), log10(chunkTime), 'g*'); % Plot points along time
legend('Partioned Running Time', 'Loop Running Time');
xlabel('Log_{10}(Fractional Size)'), ylabel('Log_{10}(Running Time)'), title('Running Time Comparison');
This is the result of the test:
(NOTE: The red line represents the loop time as a threshold--it's not to say that the total loop time is constant regardless of the number of loops)
From the graph it is clear that it takes breaking the operations down into roughly 200x200 square matrices to be faster to use diag than to perform the same operation using loops.
Question:
Can someone explain why I'm seeing these results? Also, I would think that with MATLAB's ever-more optimized design, there would be built-in handling of these massive matrices within a diag() function call. For example, it could just perform the i = j indexed operations. Is there a particular reason why this might be prohibitive?
I also haven't really thought of memory implications for diag using the partition method, although it's clear that as the partition size decreases, memory requirements drop.
Test of speed of diag vs. a loop.
Initialization:
n = 10000;
M = randn(n, n); %create a random matrix.
Test speed of diag:
tic;
d = diag(M);
toc;
Test speed of loop:
tic;
d = zeros(n, 1);
for i=1:n
d(i) = M(i,i);
end;
toc;
This would test diag. Your code is not a clean test of diag...
Comment on where there might be confusion
Diag only extracts the diagonal of a matrix. If x and y are vectors, and you do d = diag(x * y'), MATLAB first constructs the n by n matrix x*y' and calls diag on that. This is why, you get the error, "cannot construct 290GB matrix..." Matlab interpreter does not optimize in a crazy way, realize you only want the diagonal and construct just a vector (rather than full matrix with x*y', that does not happen.
Not sure if you're asking this, but the fastest way to calculate d = diag(x*y') where x and y are n by 1 vectors would simply be: d = x.*y

MATLAB vectorization: computing a neighborhood matrix

Given two vectors X and Y of length n, representing points on the plane, and a neighborhood radius rad, is there a vectorized way to compute the neighborhood matrix of the points?
In other words, can the following (painfully slow for large n) loop be vectorized:
neighborhood_mat = zeros(n, n);
for i = 1 : n
for j = 1 : i - 1
dist = norm([X(j) - X(i), Y(j) - Y(i)]);
if (dist < radius)
neighborhood_mat(i, j) = 1;
neighborhood_mat(j, i) = 1;
end
end
end
Approach #1
bsxfun based approach -
out = bsxfun(#minus,X,X').^2 + bsxfun(#minus,Y,Y').^2 < radius^2
out(1:n+1:end)= 0
Approach #2
Distance matrix calculation using matrix-multiplication based approach (possibly faster) -
A = [X(:) Y(:)]
A_t = A.'; %//'
out = [-2*A A.^2 ones(n,3)]*[A_t ; ones(3,n) ; A_t.^2] < radius^2
out(1:n+1:end)= 0
Approach #3
With pdist and squareform -
A = [X(:) Y(:)]
out = squareform(pdist(A))<radius
out(1:n+1:end)= 0
Approach #4
You can use pdist as with the previous approach, but avoid squareform with some logical indexing to get the final output of neighbourhood matrix as shown below -
A = [X(:) Y(:)]
dists = pdist(A)< radius
mask_lower = bsxfun(#gt,[1:n]',1:n) %//'
%// OR tril(true(n),-1)
mask_upper = bsxfun(#lt,[1:n]',1:n) %//'
%// OR mask_upper = triu(true(n),1)
%// OR mask_upper = ~mask_lower; mask_upper(1:n+1:end) = false;
out = zeros(n)
out(mask_lower) = dists
out_t = out' %//'
out(mask_upper) = out_t(mask_upper)
Note: As one can see, for the all above mentioned approaches, we are using pre-allocation for the output. A fast way to pre-allocate would be with out(n,n) = 0 and is based upon this wonderful blog on undocumented MATLAB. This should really speed up those approaches!
The following approach is great if the number of points in your neighborhoods is small or you run low on memory using the brute-force approach:
If you have the statistics toolbox installed, you can have a look at the rangesearch method.
(Free alternatives include the k-d tree implementations of a range search on the File Exchange.)
The usage of rangesearch is straightforward:
P = [X,Y];
[idx,D] = rangesearch(P, P, rad);
It returns a cell-array idx of the indices of nodes within reach and their distances D.
Depending on the size of your data, this could be beneficial in terms of speed and memory.
Instead of computing all pairwise distances and then filtering out those that are large, this algorithm builds a data structure called a k-d tree to more efficiently search close points.
You can then use this to build a sparse matrix:
I = cell2mat(idx.').';
J = runLengthDecode(cellfun(#numel,idx));
n = size(P,1);
S = sparse(I,J,1,n,n)-speye(n);
(This uses the runLengthDecode function from this answer.)
You can also have a look at the KDTreeSearcher class if your data points don't change and you want to query your data lots of times.

Compute double sum in matlab efficiently?

I am looking for an optimal way to program this summation ratio. As input I have two vectors v_mn and x_mn with (M*N)x1 elements each.
The ratio is of the form:
The vector x_mn is 0-1 vector so when x_mn=1, the ration is r given above and when x_mn=0 the ratio is 0.
The vector v_mn is a vector which contain real numbers.
I did the denominator like this but it takes a lot of times.
function r_ij = denominator(v_mn, M, N, i, j)
%here x_ij=1, to get r_ij.
S = [];
for m = 1:M
for n = 1:N
if (m ~= i)
if (n ~= j)
S = [S v_mn(i, n)];
else
S = [S 0];
end
else
S = [S 0];
end
end
end
r_ij = 1+S;
end
Can you give a good way to do it in matlab. You can ignore the ratio and give me the denominator which is more complicated.
EDIT: I am sorry I did not write it very good. The i and j are some numbers between 1..M and 1..N respectively. As you can see, the ratio r is many values (M*N values). So I calculated only the value i and j. More precisely, I supposed x_ij=1. Also, I convert the vectors v_mn into a matrix that's why I use double index.
If you reshape your data, your summation is just a repeated matrix/vector multiplication.
Here's an implementation for a single m and n, along with a simple speed/equality test:
clc
%# some arbitrary test parameters
M = 250;
N = 1000;
v = rand(M,N); %# (you call it v_mn)
x = rand(M,N); %# (you call it x_mn)
m0 = randi(M,1); %# m of interest
n0 = randi(N,1); %# n of interest
%# "Naive" version
tic
S1 = 0;
for mm = 1:M %# (you call this m')
if mm == m0, continue; end
for nn = 1:N %# (you call this n')
if nn == n0, continue; end
S1 = S1 + v(m0,nn) * x(mm,nn);
end
end
r1 = v(m0,n0)*x(m0,n0) / (1+S1);
toc
%# MATLAB version: use matrix multiplication!
tic
ninds = [1:m0-1 m0+1:M];
minds = [1:n0-1 n0+1:N];
S2 = sum( x(minds, ninds) * v(m0, ninds).' );
r2 = v(m0,n0)*x(m0,n0) / (1+S2);
toc
%# Test if values are equal
abs(r1-r2) < 1e-12
Outputs on my machine:
Elapsed time is 0.327004 seconds. %# loop-version
Elapsed time is 0.002455 seconds. %# version with matrix multiplication
ans =
1 %# and yes, both are equal
So the speedup is ~133×
Now that's for a single value of m and n. To do this for all values of m and n, you can use an (optimized) double loop around it:
r = zeros(M,N);
for m0 = 1:M
xx = x([1:m0-1 m0+1:M], :);
vv = v(m0,:).';
for n0 = 1:N
ninds = [1:n0-1 n0+1:N];
denom = 1 + sum( xx(:,ninds) * vv(ninds) );
r(m0,n0) = v(m0,n0)*x(m0,n0)/denom;
end
end
which completes in ~15 seconds on my PC for M = 250, N= 1000 (R2010a).
EDIT: actually, with a little more thought, I was able to reduce it all down to this:
denom = zeros(M,N);
for mm = 1:M
xx = x([1:mm-1 mm+1:M],:);
denom(mm,:) = sum( xx*v(mm,:).' ) - sum( bsxfun(#times, xx, v(mm,:)) );
end
denom = denom + 1;
r_mn = x.*v./denom;
which completes in less than 1 second for N = 250 and M = 1000 :)
For a start you need to pre-alocate your S matrix. It changes size every loop so put
S = zeros(m*n, 1)
at the start of your function. This will also allow you to do away with your else conditional statements, ie they will reduce to this:
if (m ~= i)
if (n ~= j)
S(m*M + n) = v_mn(i, n);
Otherwise since you have to visit every element im afraid it may not be able to get much faster.
If you desperately need more speed you can look into doing some mex coding which is code in c/c++ but run in matlab.
http://www.mathworks.com.au/help/matlab/matlab_external/introducing-mex-files.html
Rather than first jumping into vectorization of the double loop, you may want modify the above to make sure that it does what you want. In this code, there is no summing of the data, instead a vector S is being resized at each iteration. As well, the signature could include the matrices V and X so that the multiplication occurs as in the formula (rather than just relying on the value of X to be zero or one, let us pass that matrix in).
The function could look more like the following (I've replaced the i,j inputs with m,n to be more like the equation):
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
Note how the first if is outside of the inner for loop since it does not depend on j. Try the above and see what happens!
You can vectorize from within Matlab to speed up your calculations. Every time you use an operation like ".^" or ".*" or any matrix operation for that matter, Matlab will do them in parallel, which is much, much faster than iterating over each item.
In this case, look at what you are doing in terms of matrices. First, in your loop you are only dealing with the mth row of $V_{nm}$, which we can use as a vector for itself.
If you look at your formula carefully, you can figure out that you almost get there if you just write this row vector as a column vector and multiply the matrix $X_{nm}$ to it from the left, using standard matrix multiplication. The resulting vector contains the sums over all n. To get the final result, just sum up this vector.
function result = denominator_vectorized(V,X,m,n)
% get the part of V with the first index m
Vm = V(m,:)';
% remove the parts of X you don't want to iterate over. Note that, since I
% am inside the function, I am only editing the value of X within the scope
% of this function.
X(m,:) = 0;
X(:,n) = 0;
%do the matrix multiplication and the summation at once
result = 1-sum(X*Vm);
To show you how this optimizes your operation, I will compare it to the code proposed by another commenter:
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
The test:
V=rand(10000,10000);
X=rand(10000,10000);
disp('looped version')
tic
denominator(V,X,1,1)
toc
disp('matrix operation')
tic
denominator_vectorized(V,X,1,1)
toc
The result:
looped version
ans =
2.5197e+07
Elapsed time is 4.648021 seconds.
matrix operation
ans =
2.5197e+07
Elapsed time is 0.563072 seconds.
That is almost ten times the speed of the loop iteration. So, always look out for possible matrix operations in your code. If you have the Parallel Computing Toolbox installed and a CUDA-enabled graphics card installed, Matlab will even perform these operations on your graphics card without any further effort on your part!
EDIT: That last bit is not entirely true. You still need to take a few steps to do operations on CUDA hardware, but they aren't a lot. See Matlab documentation.

Summation without a for loop - MATLAB

I have 2 matrices: V which is square MxM, and K which is MxN. Calling the dimension across rows x and the dimension across columns t, I need to evaluate the integral (i.e sum) over both dimensions of K times a t-shifted version of V, the answer being a function of the shift (almost like a convolution, see below). The sum is defined by the following expression, where _{} denotes the summation indices, and a zero-padding of out-of-limits elements is assumed:
S(t) = sum_{x,tau}[V(x,t+tau) * K(x,tau)]
I manage to do it with a single loop, over the t dimension (vectorizing the x dimension):
% some toy matrices
V = rand(50,50);
K = rand(50,10);
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(#times, V(:,t:end), K(:,t)),1);
end
I have similar expressions which I managed to evaluate without a for loop, using a combination of conv2 and\or mirroring (flipping) of a single dimension. However I can't see how to avoid a for loop in this case (despite the appeared similarity to convolution).
Steps to vectorization
1] Perform sum(bsxfun(#times, V(:,t:end), K(:,t)),1) for all columns in V against all columns in K with matrix-multiplication -
sum_mults = V.'*K
This would give us a 2D array with each column representing sum(bsxfun(#times,.. operation at each iteration.
2] Step1 gave us all possible summations and also the values to be summed are not aligned in the same row across iterations, so we need to do a bit more work before summing along rows. The rest of the work is about getting a shifted up version. For the same, you can use boolean indexing with a upper and lower triangular boolean mask. Finally, we sum along each row for the final output. So, this part of the code would look like so -
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
Runtime tests -
%// Inputs
V = rand(1000,1000);
K = rand(1000,200);
disp('--------------------- With original loopy approach')
tic
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(#times, V(:,t:end), K(:,t)),1);
end
toc
disp('--------------------- With proposed vectorized approach')
tic
sum_mults = V.'*K; %//'
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
toc
Output -
--------------------- With original loopy approach
Elapsed time is 2.696773 seconds.
--------------------- With proposed vectorized approach
Elapsed time is 0.044144 seconds.
This might be cheating (using arrayfun instead of a for loop) but I believe this expression gives you what you want:
S = arrayfun(#(t) sum(sum( V(:,(t+1):(t+N)) .* K )), 1:(M-N), 'UniformOutput', true)

MATLAB loop optimization

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both "true" as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
intersect(i,j) = numel(intersect(a,b));
end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not saved as sparse in MATLAB because otherwise the performance become the worst possible.
Since the [i,j] entry counts the number of non zero elements in the element-wise multiplication of rows i and j, you can do it by multiplying matrix_logical with its transpose (you should convert to numeric data type first, e.g matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
And it works both for sparse or full representation.
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000)
for ii = 1:49999
for jj = ii:50000
intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
intersct(jj, ii) = intersct(ii, jj);
end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(martrix_logical,#distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.

Resources