Related
I have 200 vectors; each one has a length of 10000.
I want to fill a matrix such that each line represents a vector.
If your vectors are already stored in an array then you can use vcat( ) here:
A = [rand(10000)' for idx in 1:200]
B = vcat(A...)
Julia stores matrices in column-major order so you are going to have to adapt a bit to that
If you have 200 vectors of length 100000 you should make first an empty vector, a = [], this will be your matrix
Then you have to vcat the first vector to your empty vector, like so
v = your vectors, however they are defined
a = []
a = vcat(a, v[1])
Then you can iterate through vectors 2:200 by
for i in 2:200
a = hcat(a,v[i])
end
And finally transpose a
a = a'
Alternatively, you could do
a = zeros(200,10000)
for i in 1:length(v)
a[i,:] = v[i]
end
but I suppose that wont be as fast, if performance is at all an issue, because as I said, julia stores in column major order so access will be slower
EDIT from reschu's comment
a = zeros(10000,200)
for i in 1:length(v)
a[:,i] = v[i]
end
a = a'
I am looking for an optimal way to program this summation ratio. As input I have two vectors v_mn and x_mn with (M*N)x1 elements each.
The ratio is of the form:
The vector x_mn is 0-1 vector so when x_mn=1, the ration is r given above and when x_mn=0 the ratio is 0.
The vector v_mn is a vector which contain real numbers.
I did the denominator like this but it takes a lot of times.
function r_ij = denominator(v_mn, M, N, i, j)
%here x_ij=1, to get r_ij.
S = [];
for m = 1:M
for n = 1:N
if (m ~= i)
if (n ~= j)
S = [S v_mn(i, n)];
else
S = [S 0];
end
else
S = [S 0];
end
end
end
r_ij = 1+S;
end
Can you give a good way to do it in matlab. You can ignore the ratio and give me the denominator which is more complicated.
EDIT: I am sorry I did not write it very good. The i and j are some numbers between 1..M and 1..N respectively. As you can see, the ratio r is many values (M*N values). So I calculated only the value i and j. More precisely, I supposed x_ij=1. Also, I convert the vectors v_mn into a matrix that's why I use double index.
If you reshape your data, your summation is just a repeated matrix/vector multiplication.
Here's an implementation for a single m and n, along with a simple speed/equality test:
clc
%# some arbitrary test parameters
M = 250;
N = 1000;
v = rand(M,N); %# (you call it v_mn)
x = rand(M,N); %# (you call it x_mn)
m0 = randi(M,1); %# m of interest
n0 = randi(N,1); %# n of interest
%# "Naive" version
tic
S1 = 0;
for mm = 1:M %# (you call this m')
if mm == m0, continue; end
for nn = 1:N %# (you call this n')
if nn == n0, continue; end
S1 = S1 + v(m0,nn) * x(mm,nn);
end
end
r1 = v(m0,n0)*x(m0,n0) / (1+S1);
toc
%# MATLAB version: use matrix multiplication!
tic
ninds = [1:m0-1 m0+1:M];
minds = [1:n0-1 n0+1:N];
S2 = sum( x(minds, ninds) * v(m0, ninds).' );
r2 = v(m0,n0)*x(m0,n0) / (1+S2);
toc
%# Test if values are equal
abs(r1-r2) < 1e-12
Outputs on my machine:
Elapsed time is 0.327004 seconds. %# loop-version
Elapsed time is 0.002455 seconds. %# version with matrix multiplication
ans =
1 %# and yes, both are equal
So the speedup is ~133×
Now that's for a single value of m and n. To do this for all values of m and n, you can use an (optimized) double loop around it:
r = zeros(M,N);
for m0 = 1:M
xx = x([1:m0-1 m0+1:M], :);
vv = v(m0,:).';
for n0 = 1:N
ninds = [1:n0-1 n0+1:N];
denom = 1 + sum( xx(:,ninds) * vv(ninds) );
r(m0,n0) = v(m0,n0)*x(m0,n0)/denom;
end
end
which completes in ~15 seconds on my PC for M = 250, N= 1000 (R2010a).
EDIT: actually, with a little more thought, I was able to reduce it all down to this:
denom = zeros(M,N);
for mm = 1:M
xx = x([1:mm-1 mm+1:M],:);
denom(mm,:) = sum( xx*v(mm,:).' ) - sum( bsxfun(#times, xx, v(mm,:)) );
end
denom = denom + 1;
r_mn = x.*v./denom;
which completes in less than 1 second for N = 250 and M = 1000 :)
For a start you need to pre-alocate your S matrix. It changes size every loop so put
S = zeros(m*n, 1)
at the start of your function. This will also allow you to do away with your else conditional statements, ie they will reduce to this:
if (m ~= i)
if (n ~= j)
S(m*M + n) = v_mn(i, n);
Otherwise since you have to visit every element im afraid it may not be able to get much faster.
If you desperately need more speed you can look into doing some mex coding which is code in c/c++ but run in matlab.
http://www.mathworks.com.au/help/matlab/matlab_external/introducing-mex-files.html
Rather than first jumping into vectorization of the double loop, you may want modify the above to make sure that it does what you want. In this code, there is no summing of the data, instead a vector S is being resized at each iteration. As well, the signature could include the matrices V and X so that the multiplication occurs as in the formula (rather than just relying on the value of X to be zero or one, let us pass that matrix in).
The function could look more like the following (I've replaced the i,j inputs with m,n to be more like the equation):
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
Note how the first if is outside of the inner for loop since it does not depend on j. Try the above and see what happens!
You can vectorize from within Matlab to speed up your calculations. Every time you use an operation like ".^" or ".*" or any matrix operation for that matter, Matlab will do them in parallel, which is much, much faster than iterating over each item.
In this case, look at what you are doing in terms of matrices. First, in your loop you are only dealing with the mth row of $V_{nm}$, which we can use as a vector for itself.
If you look at your formula carefully, you can figure out that you almost get there if you just write this row vector as a column vector and multiply the matrix $X_{nm}$ to it from the left, using standard matrix multiplication. The resulting vector contains the sums over all n. To get the final result, just sum up this vector.
function result = denominator_vectorized(V,X,m,n)
% get the part of V with the first index m
Vm = V(m,:)';
% remove the parts of X you don't want to iterate over. Note that, since I
% am inside the function, I am only editing the value of X within the scope
% of this function.
X(m,:) = 0;
X(:,n) = 0;
%do the matrix multiplication and the summation at once
result = 1-sum(X*Vm);
To show you how this optimizes your operation, I will compare it to the code proposed by another commenter:
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
The test:
V=rand(10000,10000);
X=rand(10000,10000);
disp('looped version')
tic
denominator(V,X,1,1)
toc
disp('matrix operation')
tic
denominator_vectorized(V,X,1,1)
toc
The result:
looped version
ans =
2.5197e+07
Elapsed time is 4.648021 seconds.
matrix operation
ans =
2.5197e+07
Elapsed time is 0.563072 seconds.
That is almost ten times the speed of the loop iteration. So, always look out for possible matrix operations in your code. If you have the Parallel Computing Toolbox installed and a CUDA-enabled graphics card installed, Matlab will even perform these operations on your graphics card without any further effort on your part!
EDIT: That last bit is not entirely true. You still need to take a few steps to do operations on CUDA hardware, but they aren't a lot. See Matlab documentation.
My question is twofold:
In the below, A = full(S) where S is a sparse matrix.
What's the "correct" way to access an element in a sparse matrix?
That is, what would the sparse equivalent to var = A(row, col) be?
My view on this topic: You wouldn't do anything different. var = S(row, col) is as efficient as it gets.
What's the "correct" way to add elements to a sparse matrix?
That is, what would the sparse equivalent of A(row, col) = var be? (Assuming A(row, col) == 0 to begin with)
It is known that simply doing A(row, col) = var is slow for large sparse matrices. From the documentation:
If you wanted to change a value in this matrix, you might be tempted
to use the same indexing:
B(3,1) = 42; % This code does work, however, it is slow.
My view on this topic: When working with sparse matrices, you often start with the vectors and use them to create the matrix this way: S = sparse(i,j,s,m,n). Of course, you could also have created it like this: S = sparse(A) or sprand(m,n,density) or something similar.
If you start of the first way, you would simply do:
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
If you started out not having the vectors, you would do the same thing, but use find first:
[i, j, s] = find(S);
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
Now you would of course have the vectors, and can reuse them if you're doing this operation several times. It would however be better to add all new elements at once, and not do the above in a loop, because growing vectors are slow. In this case, new_i, new_j and new_s will be vectors corresponding to the new elements.
MATLAB stores sparse matrices in compressed column format. This means that when you perform an operations like A(2,2) (to get the element in at row 2, column 2) MATLAB first access the second column and then finds the element in row 2 (row indices in each column are stored in ascending order). You can think of it as:
A2 = A(:,2);
A2(2)
If you are only accessing a single element of sparse matrix doing var = S(r,c) is fine. But if you are looping over the elements of a sparse matrix, you probably want to access one column at a time, and then loop over the nonzero row indices via [i,~,x]=find(S(:,c)). Or use something like spfun.
You should avoid constructing a dense matrix A and then doing S = sparse(A), as this operations just squeezes out zeros. Instead, as you note, it's much more efficient to build a sparse matrix from scratch using triplet-form and a call to sparse(i,j,x,m,n). MATLAB has a nice page which describes how to efficiently construct sparse matrices.
The original paper describing the implementation of sparse matrices in MATLAB is quite a good read. It provides some more info on how the sparse matrix algorithms were originally implemented.
EDIT: Answer modified according to suggestions by Oleg (see comments).
Here is my benchmark for the second part of your question. For testing direct insertion, the matrices are initialized empty with a varying nzmax. For testing rebuilding from index vectors this is irrelevant as the matrix is built from scratch at every call. The two methods were tested for doing a single insertion operation (of a varying number of elements), or for doing incremental insertions, one value at a time (up to the same numbers of elements). Due to the computational strain I lowered the number of repetitions from 1000 to 100 for each test case. I believe this is still statistically viable.
Ssize = 10000;
NumIterations = 100;
NumInsertions = round(logspace(0, 4, 10));
NumInitialNZ = round(logspace(1, 4, 4));
NumTests = numel(NumInsertions) * numel(NumInitialNZ);
TimeDirect = zeros(numel(NumInsertions), numel(NumInitialNZ));
TimeIndices = zeros(numel(NumInsertions), 1);
%% Single insertion operation (non-incremental)
% Method A: Direct insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
S(r,c) = 1;
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
s_ones = ones(NumInsertions(iInsertions), 1);
tic
i_new = [i; r];
j_new = [j; c];
s_new = [s; s_ones];
S = sparse(i_new, j_new ,s_new , Ssize, Ssize);
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
SingleOperation.TimeDirect = TimeDirect;
SingleOperation.TimeIndices = TimeIndices;
%% Incremental insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
% Method A: Direct insertion
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
S(r(ii),c(ii)) = 1;
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
i = [i; r(ii)];
j = [j; c(ii)];
s = [s; 1];
S = sparse(i, j ,s , Ssize, Ssize);
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
IncremenalInsertion.TimeDirect = TimeDirect;
IncremenalInsertion.TimeIndices = TimeIndices;
%% Plot results
% Single insertion
figure;
loglog(NumInsertions, SingleOperation.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, SingleOperation.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for single insertion operation');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
% Incremental insertions
figure;
loglog(NumInsertions, IncremenalInsertion.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, IncremenalInsertion.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for incremental insertions');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
I ran this in MATLAB R2012a. The results for doing a single insertion operations are summarized in this graph:
This shows that using direct insertion is much slower than using index vectors, if only a single operation is done. The growth in the case of using index vectors can be either because of growing the vectors themselves or from the lengthier sparse matrix construction, I'm not sure which. The initial nzmax used to construct the matrices seems to have no effect on their growth.
The results for doing incremental insertions are summarized in this graph:
Here we see the opposite trend: using index vectors is slower, because of the overhead of incrementally growing them and rebuilding the sparse matrix at every step. A way to understand this is to look at the first point in the previous graph: for insertion of a single element, it is more effective to use direct insertion rather than rebuilding using the index vectors. In the incrementlal case, this single insertion is done repetitively, and so it becomes viable to use direct insertion rather than index vectors, against MATLAB's suggestion.
This understanding also suggests that were we to incrementally add, say, 100 elements at a time, the efficient choice would then be to use index vectors rather than direct insertion, as the first graph shows this method to be faster for insertions of this size. In between these two regimes is an area where you should probably experiment to see which method is more effective, though probably the results will show that the difference between the methods is neglibile there.
Bottom line: which method should I use?
My conclusion is that this is dependant on the nature of your intended insertion operations.
If you intend to insert elements one at a time, use direct insertion.
If you intend to insert a large (>10) number of elements at a time, rebuild the matrix from index vectors.
I have 2 matrices: V which is square MxM, and K which is MxN. Calling the dimension across rows x and the dimension across columns t, I need to evaluate the integral (i.e sum) over both dimensions of K times a t-shifted version of V, the answer being a function of the shift (almost like a convolution, see below). The sum is defined by the following expression, where _{} denotes the summation indices, and a zero-padding of out-of-limits elements is assumed:
S(t) = sum_{x,tau}[V(x,t+tau) * K(x,tau)]
I manage to do it with a single loop, over the t dimension (vectorizing the x dimension):
% some toy matrices
V = rand(50,50);
K = rand(50,10);
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(#times, V(:,t:end), K(:,t)),1);
end
I have similar expressions which I managed to evaluate without a for loop, using a combination of conv2 and\or mirroring (flipping) of a single dimension. However I can't see how to avoid a for loop in this case (despite the appeared similarity to convolution).
Steps to vectorization
1] Perform sum(bsxfun(#times, V(:,t:end), K(:,t)),1) for all columns in V against all columns in K with matrix-multiplication -
sum_mults = V.'*K
This would give us a 2D array with each column representing sum(bsxfun(#times,.. operation at each iteration.
2] Step1 gave us all possible summations and also the values to be summed are not aligned in the same row across iterations, so we need to do a bit more work before summing along rows. The rest of the work is about getting a shifted up version. For the same, you can use boolean indexing with a upper and lower triangular boolean mask. Finally, we sum along each row for the final output. So, this part of the code would look like so -
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
Runtime tests -
%// Inputs
V = rand(1000,1000);
K = rand(1000,200);
disp('--------------------- With original loopy approach')
tic
[M N] = size(K);
S = zeros(1, M);
for t = 1 : N
S(1,1:end-t+1) = S(1,1:end-t+1) + sum(bsxfun(#times, V(:,t:end), K(:,t)),1);
end
toc
disp('--------------------- With proposed vectorized approach')
tic
sum_mults = V.'*K; %//'
valid_mask = tril(true(size(sum_mults)));
sum_mults_shifted = zeros(size(sum_mults));
sum_mults_shifted(flipud(valid_mask)) = sum_mults(valid_mask);
out = sum(sum_mults_shifted,2);
toc
Output -
--------------------- With original loopy approach
Elapsed time is 2.696773 seconds.
--------------------- With proposed vectorized approach
Elapsed time is 0.044144 seconds.
This might be cheating (using arrayfun instead of a for loop) but I believe this expression gives you what you want:
S = arrayfun(#(t) sum(sum( V(:,(t+1):(t+N)) .* K )), 1:(M-N), 'UniformOutput', true)
I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both "true" as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
intersect(i,j) = numel(intersect(a,b));
end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not saved as sparse in MATLAB because otherwise the performance become the worst possible.
Since the [i,j] entry counts the number of non zero elements in the element-wise multiplication of rows i and j, you can do it by multiplying matrix_logical with its transpose (you should convert to numeric data type first, e.g matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
And it works both for sparse or full representation.
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000)
for ii = 1:49999
for jj = ii:50000
intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
intersct(jj, ii) = intersct(ii, jj);
end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(martrix_logical,#distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.