Best practice when working with sparse matrices - performance

My question is twofold:
In the below, A = full(S) where S is a sparse matrix.
What's the "correct" way to access an element in a sparse matrix?
That is, what would the sparse equivalent to var = A(row, col) be?
My view on this topic: You wouldn't do anything different. var = S(row, col) is as efficient as it gets.
What's the "correct" way to add elements to a sparse matrix?
That is, what would the sparse equivalent of A(row, col) = var be? (Assuming A(row, col) == 0 to begin with)
It is known that simply doing S(row, col) = var is slow for large sparse matrices. From the documentation:
If you wanted to change a value in this matrix, you might be tempted
to use the same indexing:
B(3,1) = 42; % This code does work, however, it is slow.
My view on this topic: When working with sparse matrices, you often start with the vectors and use them to create the matrix this way: S = sparse(i,j,s,m,n). Of course, you could also have created it like this: S = sparse(A) or sprand(m,n,density) or something similar.
If you start off the first way, you would simply do:
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
If you started out not having the vectors, you would do the same thing, but use find first:
[i, j, s] = find(S);
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
Now you would of course have the vectors, and can reuse them if you're doing this operation several times. It would, however, be better to add all new elements at once rather than doing the above in a loop, because growing vectors is slow. In this case, new_i, new_j and new_s will be vectors corresponding to the new elements.
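A minimal sketch of the batch approach described above, with made-up sizes and values (the matrix S and the new entries are assumptions for illustration only). It also shows one caveat worth knowing: sparse() sums the values of repeated (i,j) pairs, so rebuilding this way adds to an existing nonzero rather than overwriting it.

```matlab
% Illustrative sizes and values only (assumptions, not from the question).
m = 5; n = 5;
S = sparse([1 3 5], [1 2 4], [10 20 30], m, n);

% New entries, all collected up front.
new_i = [2; 4; 5];
new_j = [2; 1; 5];
new_s = [7; 8; 9];

% Recover the triplets, append the whole batch, and rebuild once.
[i, j, s] = find(S);
S2 = sparse([i; new_i], [j; new_j], [s; new_s], m, n);

% Caveat: sparse() sums values that share the same (i,j) pair, so an
% index that already holds a nonzero is incremented, not overwritten.
S3 = sparse([i; 1], [j; 1], [s; 5], m, n);   % S3(1,1) becomes 10 + 5
```

If overwriting is what you need, the duplicate pairs have to be removed from the triplet vectors before the call to sparse().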

MATLAB stores sparse matrices in compressed column format. This means that when you perform an operation like A(2,2) (to get the element at row 2, column 2), MATLAB first accesses the second column and then finds the element in row 2 (row indices in each column are stored in ascending order). You can think of it as:
A2 = A(:,2);
A2(2)
If you are only accessing a single element of a sparse matrix, doing var = S(r,c) is fine. But if you are looping over the elements of a sparse matrix, you probably want to access one column at a time, and then loop over the nonzero row indices via [i,~,x]=find(S(:,c)). Or use something like spfun.
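As a rough sketch of the column-at-a-time pattern just described (the matrix and the per-column quantity are made up for illustration), the loop below visits only the stored nonzeros of each column:

```matlab
% Illustrative 3-by-3 sparse matrix (values are assumptions).
S = sparse([1 3 2 3], [1 1 3 3], [4 5 6 7], 3, 3);

colSumSq = zeros(1, size(S, 2));
for c = 1:size(S, 2)
    % Rows and values of the nonzeros stored in column c.
    [r, ~, x] = find(S(:, c));
    for k = 1:numel(r)
        % r(k) is the row index, x(k) the value S(r(k), c).
        colSumSq(c) = colSumSq(c) + x(k)^2;
    end
end
```

An empty column simply yields empty r and x, so the inner loop is skipped for it.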
You should avoid constructing a dense matrix A and then doing S = sparse(A), as this operation just squeezes out zeros. Instead, as you note, it's much more efficient to build a sparse matrix from scratch using triplet form and a call to sparse(i,j,x,m,n). MATLAB has a nice page which describes how to efficiently construct sparse matrices.
The original paper describing the implementation of sparse matrices in MATLAB is quite a good read. It provides some more info on how the sparse matrix algorithms were originally implemented.

EDIT: Answer modified according to suggestions by Oleg (see comments).
Here is my benchmark for the second part of your question. For testing direct insertion, the matrices are initialized empty with a varying nzmax. For testing rebuilding from index vectors this is irrelevant as the matrix is built from scratch at every call. The two methods were tested for doing a single insertion operation (of a varying number of elements), or for doing incremental insertions, one value at a time (up to the same numbers of elements). Due to the computational strain I lowered the number of repetitions from 1000 to 100 for each test case. I believe this is still statistically viable.
Ssize = 10000;
NumIterations = 100;
NumInsertions = round(logspace(0, 4, 10));
NumInitialNZ = round(logspace(1, 4, 4));
NumTests = numel(NumInsertions) * numel(NumInitialNZ);
TimeDirect = zeros(numel(NumInsertions), numel(NumInitialNZ));
TimeIndices = zeros(numel(NumInsertions), 1);
%% Single insertion operation (non-incremental)
% Method A: Direct insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
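% Note: with vector r and c, S(r,c) assigns every (r(k), c(l)) combination,
% not only the pairs (r(k), c(k)); S(sub2ind(size(S), r, c)) = 1 would insert pairs only.
S(r,c) = 1;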
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
s_ones = ones(NumInsertions(iInsertions), 1);
tic
i_new = [i; r];
j_new = [j; c];
s_new = [s; s_ones];
S = sparse(i_new, j_new ,s_new , Ssize, Ssize);
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
SingleOperation.TimeDirect = TimeDirect;
SingleOperation.TimeIndices = TimeIndices;
%% Incremental insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
% Method A: Direct insertion
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
S(r(ii),c(ii)) = 1;
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
i = [i; r(ii)];
j = [j; c(ii)];
s = [s; 1];
S = sparse(i, j ,s , Ssize, Ssize);
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
IncremenalInsertion.TimeDirect = TimeDirect;
IncremenalInsertion.TimeIndices = TimeIndices;
%% Plot results
% Single insertion
figure;
loglog(NumInsertions, SingleOperation.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, SingleOperation.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for single insertion operation');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
% Incremental insertions
figure;
loglog(NumInsertions, IncremenalInsertion.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, IncremenalInsertion.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for incremental insertions');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
I ran this in MATLAB R2012a. The results for a single insertion operation are summarized in this graph:
This shows that using direct insertion is much slower than using index vectors, if only a single operation is done. The growth in the case of using index vectors can be either because of growing the vectors themselves or from the lengthier sparse matrix construction, I'm not sure which. The initial nzmax used to construct the matrices seems to have no effect on their growth.
The results for doing incremental insertions are summarized in this graph:
Here we see the opposite trend: using index vectors is slower, because of the overhead of incrementally growing them and rebuilding the sparse matrix at every step. A way to understand this is to look at the first point in the previous graph: for insertion of a single element, it is more effective to use direct insertion rather than rebuilding using the index vectors. In the incremental case, this single insertion is done repeatedly, and so it becomes viable to use direct insertion rather than index vectors, against MATLAB's suggestion.
This understanding also suggests that were we to incrementally add, say, 100 elements at a time, the efficient choice would then be to use index vectors rather than direct insertion, as the first graph shows this method to be faster for insertions of this size. In between these two regimes is an area where you should probably experiment to see which method is more effective, though the results will probably show that the difference between the methods is negligible there.
Bottom line: which method should I use?
My conclusion is that this is dependent on the nature of your intended insertion operations.
If you intend to insert elements one at a time, use direct insertion.
If you intend to insert a large (>10) number of elements at a time, rebuild the matrix from index vectors.
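When entries arrive one at a time but their total number is known or can be bounded, a middle ground is to write them into preallocated triplet buffers and call sparse() once at the end. This variant was not measured in the benchmark above, so treat it as a sketch to time on your own data; the sizes and the index formulas are placeholders.

```matlab
Ssize = 1000;          % matrix dimension (illustrative)
numNew = 500;          % number of entries, known or estimated in advance

% Preallocate the triplet buffers so nothing grows inside the loop.
bufI = zeros(numNew, 1);
bufJ = zeros(numNew, 1);
bufS = zeros(numNew, 1);
count = 0;

for k = 1:numNew
    % Stand-in for whatever computation produces one entry at a time.
    count = count + 1;
    bufI(count) = mod(7*k, Ssize) + 1;
    bufJ(count) = mod(13*k, Ssize) + 1;
    bufS(count) = 1;
end

% Build the sparse matrix once, from the filled part of the buffers.
S = sparse(bufI(1:count), bufJ(1:count), bufS(1:count), Ssize, Ssize);
```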

Related

Is heap sort supposed to be very slow on MATLAB?

I wrote a heap sort function in MATLAB and it works fine, except that when the length of the input is greater than or equal to 1000, it can take a long time (e.g. an input of length 1000 takes half a second). I'm not sure whether MATLAB just doesn't run the heap sort algorithm very fast or whether my code needs to be improved.
My code is shown below:
function b = heapsort(a)
[~,n] = size(a);
b = zeros(1,n);
for i = 1:n
a = build_max_heap(a);
b(n+1-i) = a(1);
temp = a(1);
a(1) = a(n+1-i);
a(n+1-i) = temp;
a(n+1-i) = [];
a = heapify(a,1);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function a = build_max_heap(a)
[~,n] = size(a);
m = floor(n/2);
for i = m:-1:1
a = heapify(a,i);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function a = heapify(a,i)
[~,n] = size(a);
left = 2*i;
right = 2*i + 1;
if left <= n
if a(left) >= a(i)
large = left;
else
large = i;
end
else
return
end
if right <= n
if a(right) >= a(large)
large = right;
end
end
if large ~= i
temp = a(large);
a(large) = a(i);
a(i) = temp;
a = heapify(a,large);
end
end
I'm aware that the line a(n+1-i) = []; may consume a lot of time. But when I changed the [] to -999 (lower than any number in the input vector), it didn't help and took even more time.
You should use the profiler to check which lines take the most time. It's definitely a(n+1-i) = []; that's slowing down your function.
Resizing arrays in loops is very slow, so you should always try to avoid it.
A simple test:
Create a function that takes a large vector as input, and iteratively removes elements until it's empty.
Create a function that takes the same vector as input and iteratively sets each value to 0, Inf, NaN or something else.
Use timeit to check which function is faster. You'll see that the last function is approximately 100 times faster (depending on the size of the vector of course).
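The test described above could look roughly like the script below. To keep it a single script it times inline loops with tic/toc instead of wrapping them in functions for timeit, so the numbers are coarser than timeit would give; the vector length is an arbitrary choice.

```matlab
v = rand(1, 2e4);     % illustrative size

% Variant 1: delete the last element until the vector is empty.
a = v;
tic
while ~isempty(a)
    a(end) = [];
end
tDelete = toc;

% Variant 2: keep the size fixed and overwrite each value with NaN.
b = v;
tic
for k = numel(b):-1:1
    b(k) = NaN;
end
tOverwrite = toc;

fprintf('delete: %.4f s, overwrite: %.4f s\n', tDelete, tOverwrite);
```

The ratio between the two timings depends on the MATLAB version and the vector length, so it is worth running this with sizes close to your real input.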
The reason why -999 takes more time is most likely that a no longer gets smaller and smaller, so a = heapify(a,1); doesn't get faster and faster. I haven't tested it, but if you try the following in your first function you'll probably get a much faster program (you must use n+1-ii in other places in your code as well, but I'll leave that to you):
a(n+1-ii) = NaN;
a(1:n+1-ii) = heapify(a(1:n+1-ii),1);
Note that I changed i to ii. That's partially because I want to give you good advice, and partially to avoid being reminded not to use i and j as variables in MATLAB.

MATLAB vectorization: computing a neighborhood matrix

Given two vectors X and Y of length n, representing points on the plane, and a neighborhood radius rad, is there a vectorized way to compute the neighborhood matrix of the points?
In other words, can the following (painfully slow for large n) loop be vectorized:
neighborhood_mat = zeros(n, n);
for i = 1 : n
for j = 1 : i - 1
dist = norm([X(j) - X(i), Y(j) - Y(i)]);
if (dist < radius)
neighborhood_mat(i, j) = 1;
neighborhood_mat(j, i) = 1;
end
end
end
Approach #1
bsxfun based approach -
out = bsxfun(@minus,X,X').^2 + bsxfun(@minus,Y,Y').^2 < radius^2
out(1:n+1:end)= 0
Approach #2
Distance matrix calculation using matrix-multiplication based approach (possibly faster) -
A = [X(:) Y(:)]
A_t = A.'; %//'
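%// Expands |a-b|^2 = -2*a.b + |a|^2 + |b|^2; the ones blocks need as many
%// columns/rows as A has columns (2 for points in the plane)
out = [-2*A A.^2 ones(n,2)]*[A_t ; ones(2,n) ; A_t.^2] < radius^2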
out(1:n+1:end)= 0
Approach #3
With pdist and squareform -
A = [X(:) Y(:)]
out = squareform(pdist(A))<radius
out(1:n+1:end)= 0
Approach #4
You can use pdist as with the previous approach, but avoid squareform with some logical indexing to get the final output of neighbourhood matrix as shown below -
A = [X(:) Y(:)]
dists = pdist(A)< radius
mask_lower = bsxfun(@gt,[1:n]',1:n) %//'
%// OR tril(true(n),-1)
mask_upper = bsxfun(@lt,[1:n]',1:n) %//'
%// OR mask_upper = triu(true(n),1)
%// OR mask_upper = ~mask_lower; mask_upper(1:n+1:end) = false;
out = zeros(n)
out(mask_lower) = dists
out_t = out' %//'
out(mask_upper) = out_t(mask_upper)
Note: As one can see, all of the approaches above use pre-allocation for the output. A fast way to pre-allocate would be with out(n,n) = 0, which is based upon this wonderful blog on undocumented MATLAB. This should really speed up those approaches!
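A quick way to gain confidence in the vectorized versions is to compare them against the original double loop on a small, hand-checkable input. The sketch below does that for the bsxfun approach and for a distance matrix built from the expansion |a-b|^2 = |a|^2 + |b|^2 - 2*a.b; the four points and the radius are made up, and X and Y are assumed to be column vectors.

```matlab
% Illustrative points (column vectors) and radius.
X = [0; 1; 0; 3];
Y = [0; 0; 2; 4];
radius = 2.5;
n = numel(X);

% Reference: the double loop from the question.
ref = zeros(n, n);
for i = 1:n
    for j = 1:i-1
        if norm([X(j) - X(i), Y(j) - Y(i)]) < radius
            ref(i, j) = 1;
            ref(j, i) = 1;
        end
    end
end

% bsxfun-based squared distances, compared against radius^2.
out1 = bsxfun(@minus, X, X').^2 + bsxfun(@minus, Y, Y').^2 < radius^2;
out1(1:n+1:end) = 0;

% Matrix-multiplication-based squared distances.
A = [X(:) Y(:)];
sq = sum(A.^2, 2);                          % squared norm of each point
D2 = bsxfun(@plus, sq, sq') - 2*(A*A');     % all pairwise squared distances
out2 = D2 < radius^2;
out2(1:n+1:end) = 0;
```

With floating-point coordinates, pairs whose distance is extremely close to the radius can land on different sides of the threshold in the different formulations, so exact agreement is only guaranteed away from that boundary.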
The following approach is great if the number of points in your neighborhoods is small or you run low on memory using the brute-force approach:
If you have the statistics toolbox installed, you can have a look at the rangesearch method.
(Free alternatives include the k-d tree implementations of a range search on the File Exchange.)
The usage of rangesearch is straightforward:
P = [X,Y];
[idx,D] = rangesearch(P, P, rad);
It returns a cell-array idx of the indices of nodes within reach and their distances D.
Depending on the size of your data, this could be beneficial in terms of speed and memory.
Instead of computing all pairwise distances and then filtering out those that are large, this algorithm builds a data structure called a k-d tree to more efficiently search close points.
You can then use this to build a sparse matrix:
I = cell2mat(idx.').';
J = runLengthDecode(cellfun(@numel,idx));
n = size(P,1);
S = sparse(I,J,1,n,n)-speye(n);
(This uses the runLengthDecode function from this answer.)
You can also have a look at the KDTreeSearcher class if your data points don't change and you want to query your data lots of times.
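If you would rather not depend on the external runLengthDecode helper, the built-in repelem (R2015a or later) should be able to play the same role. The following is a sketch under that assumption, with made-up points; it still needs the Statistics Toolbox for rangesearch.

```matlab
% Requires rangesearch (Statistics Toolbox) and repelem (R2015a or later).
X = [0; 1; 0; 3];                 % illustrative points (column vectors)
Y = [0; 0; 2; 4];
rad = 2.5;

P = [X, Y];
n = size(P, 1);
idx = rangesearch(P, P, rad);     % n-by-1 cell; idx{k} lists the points near P(k,:)

I = cell2mat(idx.').';                        % all neighbour indices, stacked in one column
J = repelem((1:n).', cellfun(@numel, idx));   % query index k repeated numel(idx{k}) times
S = sparse(I, J, 1, n, n) - speye(n);         % each point finds itself, so remove the diagonal
```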

Fast way of computing Inverse EDF in Matlab

I am running the following code to obtain the values of the inverse EDF of a data Matrix at the data points:
function [mOUT] = InvEDF (data)
% compute inverse of EDF at data values
% function takes T*K matrix of data and returns T*K matrix of transformed
% data, keepin the order of the original series
T = size(data, 1); % number of rows
K = size(data, 2); % number of columns
mOUT=zeros(T,K);
for j = 1:K
for i = 1:T
temp = data(:,j)<=data(i,j);
mOUT(i,j) = 1/(T+1)*sum(temp);
end
end
The data matrix is usually of size 1000*10 or even 1000*30 and I am calling this function a few thousand times. Is there a faster way of doing this? Any answers are appreciated. Thanks!
You can sort the values and use the index in the sorted matrix as the count of values less than or equal. We treat each column by itself, so I will illustrate on an Mx1 matrix.
A = rand(M,1);
[B,I] = sort(A);
C(I) = 1:M;
C(i) will now contain the count of values less or equal to A(i). If you can have duplicate values you need to take that into account.
The advantage of this approach is that we can do it in O(M log M) time, whereas your original inner loop is O(M^2).
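One possible way to take duplicates into account, sketched here for a whole T-by-K matrix: within each run of equal values in the sorted column, every element should receive the position of the last element of that run, since that is the number of values less than or equal to it. The data matrix below is a made-up example with ties, and the helper variable names are my own.

```matlab
data = [3 1; 1 2; 3 2; 2 2];       % illustrative T-by-K data containing ties
[T, K] = size(data);
mOUT = zeros(T, K);

for j = 1:K
    [sorted, order] = sort(data(:, j));
    changes = diff(sorted) ~= 0;          % true where the sorted value changes
    runId   = cumsum([true; changes]);    % run number of every sorted element
    lastIdx = find([changes; true]);      % sorted position of the last element of each run
    % lastIdx(runId) is the count of values <= each sorted element;
    % 'order' maps the counts back to the original row positions.
    mOUT(order, j) = lastIdx(runId) / (T + 1);
end
```

For the first column [3; 1; 3; 2] this gives counts 4, 1, 4 and 2, which matches what the double loop in the question computes.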
Try this -
mOUT=zeros(T,K);
for j = 1:K
d1 = data(:,j);
mOUT(:,j) = sum(bsxfun(@ge,d1,d1'),2); %%//'
end
mOUT = mOUT./(T+1);

Improving performance of interpolation (Barycentric formula)

I have been given an assignment in which I am supposed to write an algorithm which performs polynomial interpolation by the barycentric formula. The formula states that:
p(x) = (SIGMA_(j=0 to n) w(j)*f(j)/(x - x(j)))/(SIGMA_(j=0 to n) w(j)/(x - x(j)))
I have written an algorithm which works just fine, and I get the polynomial output I desire. However, this requires the use of some quite long loops, and for a large number of grid points, lots of nasty loop operations will have to be done. Thus, I would appreciate it greatly if anyone has any hints as to how I may improve this, so that I can avoid all these loops.
In the algorithm, x and f stand for the given points we are supposed to interpolate. w stands for the barycentric weights, which have been calculated before running the algorithm. And grid is the linspace over which the interpolation should take place:
function p = barycentric_formula(x,f,w,grid)
%Assert x-vectors and f-vectors have same length.
if length(x) ~= length(f)
sprintf('Not equal amounts of x- and y-values. Function is terminated.')
return;
end
n = length(x);
m = length(grid);
p = zeros(1,m);
% Loops for finding polynomial values at grid points. All values are
% calculated by the barycentric formula.
for i = 1:m
var = 0;
sum1 = 0;
sum2 = 0;
for j = 1:n
if grid(i) == x(j)
p(i) = f(j);
var = 1;
else
sum1 = sum1 + (w(j)*f(j))/(grid(i) - x(j));
sum2 = sum2 + (w(j)/(grid(i) - x(j)));
end
end
if var == 0
p(i) = sum1/sum2;
end
end
This is a classic case for MATLAB 'vectorization'. I would say - just remove the loops. It is almost that simple. First, have a look at this code:
function p = bf2(x, f, w, grid)
m = length(grid);
p = zeros(1,m);
for i = 1:m
var = grid(i)==x;
if any(var)
p(i) = f(var);
else
sum1 = sum((w.*f)./(grid(i) - x));
sum2 = sum(w./(grid(i) - x));
p(i) = sum1/sum2;
end
end
end
I have removed the inner loop over j. All I did here was in fact remove the (j) indexing and change the arithmetic operators from / to ./ and from * to .* (the same, but with a dot in front to signify that the operation is performed element by element). These are called array operators, in contrast to ordinary matrix operators. Also note that treating the special case where the grid points fall onto x is very similar to what you had in the original implementation, only using a logical vector var such that x(var)==grid(i).
Now, you can also remove the outermost loop. This is a bit more tricky and there are two major approaches how you can do that in MATLAB. I will do it the simpler way, which can be less efficient, but more clear to read - using repmat:
function p = bf3(x, f, w, grid)
% Find grid points that coincide with x.
% The below compares all grid values with all x values
% and returns a matrix of 0/1. 1 is in the (row,col)
% for which grid(row)==x(col)
var = bsxfun(@eq, grid', x);
% find the logical indexes of those x entries
varx = sum(var, 1)~=0;
% and of those grid entries
varp = sum(var, 2)~=0;
% Outer-most loop removal - use repmat to
% replicate the vectors into matrices.
% Thus, instead of having a loop over j
% you have matrices of values that would be
% referenced in the loop
ww = repmat(w, numel(grid), 1);
ff = repmat(f, numel(grid), 1);
xx = repmat(x, numel(grid), 1);
gg = repmat(grid', 1, numel(x));
% perform the calculations element-wise on the matrices
sum1 = sum((ww.*ff)./(gg - xx),2);
sum2 = sum(ww./(gg - xx),2);
p = sum1./sum2;
% fix the case where grid==x and return
p(varp) = f(varx);
end
The fully vectorized version can be implemented with bsxfun rather than repmat. This can potentially be a bit faster, since the matrices are not explicitly formed. However, the speed difference may not be large for small system sizes.
Also, the first solution with one loop is not too bad performance-wise. I suggest you test both and see which is better. Maybe it is not worth fully vectorizing? The first code looks a bit more readable.
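For reference, a bsxfun-based version of the fully vectorized code might look like the sketch below. It is written as a script on made-up data (three nodes sampling x^2, with the barycentric weights computed inline, which the question assumes are already available), so the variable names other than x, f, w and grid are my own.

```matlab
% Illustrative data: three nodes sampling f(x) = x^2.
x = [0 1 2];
f = x.^2;
n = numel(x);
w = zeros(1, n);
for j = 1:n
    w(j) = 1 / prod(x(j) - x([1:j-1, j+1:n]));   % barycentric weights
end
grid = linspace(0, 2, 5);

D = bsxfun(@minus, grid(:), x(:).');   % m-by-n matrix of grid(i) - x(j)
C = bsxfun(@rdivide, w(:).', D);       % w(j) ./ (grid(i) - x(j))
p = (C * f(:)) ./ sum(C, 2);           % numerator ./ denominator for each grid point

% Grid points that coincide with a node give NaN above; copy f there instead.
[hitRow, hitCol] = find(D == 0);
p(hitRow) = f(hitCol);
```

Note that p comes out as a column vector here; transpose it if you need the 1-by-m row that the original function returns.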

MATLAB loop optimization

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both "true" as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
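% Result is named inter rather than intersect, so that the built-in
% intersect() function called below is not shadowed by a variable.
inter = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
inter(i,j) = numel(intersect(a,b));
end
end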
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not stored as a sparse matrix in MATLAB, because otherwise the performance becomes far worse.
Since the [i,j] entry counts the number of nonzero elements in the element-wise multiplication of rows i and j, you can do it by multiplying matrix_logical with its transpose (you should convert to a numeric data type first, e.g. matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
This works for both sparse and full representations.
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
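To see the two formulas at work, here is a small hand-checkable sketch. The 3-by-4 matrix is made up, and bsxfun is used in place of the two repmat calls, which is just one possible variation.

```matlab
% Illustrative logical matrix: 3 rows (sets) over 4 columns (elements).
M = logical([1 1 0 1; 0 1 1 1; 1 0 0 0]);
A = double(M);                       % convert to numeric before multiplying

inter = A * A';                      % inter(i,j) = number of columns true in both rows
rowCount = sum(A, 2);                % number of true entries per row
unionCount = bsxfun(@plus, rowCount, rowCount') - inter;   % |a| + |b| - |a and b|
jaccard = inter ./ unionCount;       % intersection size over union size
```

A row with no true entries at all would produce 0/0 = NaN in its own diagonal entry, so such rows need separate handling if they can occur.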
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000);
for ii = 1:49999
for jj = ii:50000
intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
intersct(jj, ii) = intersct(ii, jj);
end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(matrix_logical,@distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.
