Julia : How to fill a matrix row by row in julia - matrix

I have 200 vectors; each one has a length of 10000.
I want to fill a matrix such that each line represents a vector.

If your vectors are already stored in an array then you can use vcat( ) here:
A = [rand(10000)' for idx in 1:200]
B = vcat(A...)

Julia stores matrices in column-major order so you are going to have to adapt a bit to that
If you have 200 vectors of length 100000 you should make first an empty vector, a = [], this will be your matrix
Then you have to vcat the first vector to your empty vector, like so
v = your vectors, however they are defined
a = []
a = vcat(a, v[1])
Then you can iterate through vectors 2:200 by
for i in 2:200
a = hcat(a,v[i])
end
And finally transpose a
a = a'
Alternatively, you could do
a = zeros(200,10000)
for i in 1:length(v)
a[i,:] = v[i]
end
but I suppose that wont be as fast, if performance is at all an issue, because as I said, julia stores in column major order so access will be slower
EDIT from reschu's comment
a = zeros(10000,200)
for i in 1:length(v)
a[:,i] = v[i]
end
a = a'

Related

Fast way of computing Inverse EDF in Matlab

I am running the following code to obtain the values of the inverse EDF of a data Matrix at the data points:
function [mOUT] = InvEDF (data)
% compute inverse of EDF at data values
% function takes T*K matrix of data and returns T*K matrix of transformed
% data, keepin the order of the original series
T = rows(data);
K = cols(data);
mOUT=zeros(T,K);
for j = 1:K
for i = 1:T
temp = data(:,j)<=data(i,j);
mOUT(i,j) = 1/(T+1)*sum(temp);
end
end
The data Matrix is usually of size 1000*10 or even 1000*30 and I am calling this function a few thousand times. Is there a faster way of doinf this? Any answers are appreciated. Thanks!
You can sort the values and use the index in the sorted matrix as the count of values less or equal. We treat each column by itself, so I will illustrate on a Mx1 matrix.
A = rand(M,1);
[B,I] = sort(A);
C(I) = 1:M;
C(i) will now contain the count of values less or equal to A(i). If you can have duplicate values you need to take that into account.
The advantage of this approach is that we can do it in O(M log M) time, whereas your original inner loop is O(M^2)
Try this -
mOUT=zeros(T,K);
for j = 1:K
d1 = data(:,j);
mOUT(:,j) = sum(bsxfun(#ge,d1,d1'),2); %%//'
end
mOUT = mOUT./(T+1);

Best practice when working with sparse matrices

My question is twofold:
In the below, A = full(S) where S is a sparse matrix.
What's the "correct" way to access an element in a sparse matrix?
That is, what would the sparse equivalent to var = A(row, col) be?
My view on this topic: You wouldn't do anything different. var = S(row, col) is as efficient as it gets.
What's the "correct" way to add elements to a sparse matrix?
That is, what would the sparse equivalent of A(row, col) = var be? (Assuming A(row, col) == 0 to begin with)
It is known that simply doing A(row, col) = var is slow for large sparse matrices. From the documentation:
If you wanted to change a value in this matrix, you might be tempted
to use the same indexing:
B(3,1) = 42; % This code does work, however, it is slow.
My view on this topic: When working with sparse matrices, you often start with the vectors and use them to create the matrix this way: S = sparse(i,j,s,m,n). Of course, you could also have created it like this: S = sparse(A) or sprand(m,n,density) or something similar.
If you start of the first way, you would simply do:
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
If you started out not having the vectors, you would do the same thing, but use find first:
[i, j, s] = find(S);
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
Now you would of course have the vectors, and can reuse them if you're doing this operation several times. It would however be better to add all new elements at once, and not do the above in a loop, because growing vectors are slow. In this case, new_i, new_j and new_s will be vectors corresponding to the new elements.
MATLAB stores sparse matrices in compressed column format. This means that when you perform an operations like A(2,2) (to get the element in at row 2, column 2) MATLAB first access the second column and then finds the element in row 2 (row indices in each column are stored in ascending order). You can think of it as:
A2 = A(:,2);
A2(2)
If you are only accessing a single element of sparse matrix doing var = S(r,c) is fine. But if you are looping over the elements of a sparse matrix, you probably want to access one column at a time, and then loop over the nonzero row indices via [i,~,x]=find(S(:,c)). Or use something like spfun.
You should avoid constructing a dense matrix A and then doing S = sparse(A), as this operations just squeezes out zeros. Instead, as you note, it's much more efficient to build a sparse matrix from scratch using triplet-form and a call to sparse(i,j,x,m,n). MATLAB has a nice page which describes how to efficiently construct sparse matrices.
The original paper describing the implementation of sparse matrices in MATLAB is quite a good read. It provides some more info on how the sparse matrix algorithms were originally implemented.
EDIT: Answer modified according to suggestions by Oleg (see comments).
Here is my benchmark for the second part of your question. For testing direct insertion, the matrices are initialized empty with a varying nzmax. For testing rebuilding from index vectors this is irrelevant as the matrix is built from scratch at every call. The two methods were tested for doing a single insertion operation (of a varying number of elements), or for doing incremental insertions, one value at a time (up to the same numbers of elements). Due to the computational strain I lowered the number of repetitions from 1000 to 100 for each test case. I believe this is still statistically viable.
Ssize = 10000;
NumIterations = 100;
NumInsertions = round(logspace(0, 4, 10));
NumInitialNZ = round(logspace(1, 4, 4));
NumTests = numel(NumInsertions) * numel(NumInitialNZ);
TimeDirect = zeros(numel(NumInsertions), numel(NumInitialNZ));
TimeIndices = zeros(numel(NumInsertions), 1);
%% Single insertion operation (non-incremental)
% Method A: Direct insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
S(r,c) = 1;
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
s_ones = ones(NumInsertions(iInsertions), 1);
tic
i_new = [i; r];
j_new = [j; c];
s_new = [s; s_ones];
S = sparse(i_new, j_new ,s_new , Ssize, Ssize);
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
SingleOperation.TimeDirect = TimeDirect;
SingleOperation.TimeIndices = TimeIndices;
%% Incremental insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
% Method A: Direct insertion
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
S(r(ii),c(ii)) = 1;
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
i = [i; r(ii)];
j = [j; c(ii)];
s = [s; 1];
S = sparse(i, j ,s , Ssize, Ssize);
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
IncremenalInsertion.TimeDirect = TimeDirect;
IncremenalInsertion.TimeIndices = TimeIndices;
%% Plot results
% Single insertion
figure;
loglog(NumInsertions, SingleOperation.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, SingleOperation.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for single insertion operation');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
% Incremental insertions
figure;
loglog(NumInsertions, IncremenalInsertion.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, IncremenalInsertion.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for incremental insertions');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
I ran this in MATLAB R2012a. The results for doing a single insertion operations are summarized in this graph:
This shows that using direct insertion is much slower than using index vectors, if only a single operation is done. The growth in the case of using index vectors can be either because of growing the vectors themselves or from the lengthier sparse matrix construction, I'm not sure which. The initial nzmax used to construct the matrices seems to have no effect on their growth.
The results for doing incremental insertions are summarized in this graph:
Here we see the opposite trend: using index vectors is slower, because of the overhead of incrementally growing them and rebuilding the sparse matrix at every step. A way to understand this is to look at the first point in the previous graph: for insertion of a single element, it is more effective to use direct insertion rather than rebuilding using the index vectors. In the incrementlal case, this single insertion is done repetitively, and so it becomes viable to use direct insertion rather than index vectors, against MATLAB's suggestion.
This understanding also suggests that were we to incrementally add, say, 100 elements at a time, the efficient choice would then be to use index vectors rather than direct insertion, as the first graph shows this method to be faster for insertions of this size. In between these two regimes is an area where you should probably experiment to see which method is more effective, though probably the results will show that the difference between the methods is neglibile there.
Bottom line: which method should I use?
My conclusion is that this is dependant on the nature of your intended insertion operations.
If you intend to insert elements one at a time, use direct insertion.
If you intend to insert a large (>10) number of elements at a time, rebuild the matrix from index vectors.

Algorithm to express elements of a matrix as a vector

Statement of Problem:
I have an array M with m rows and n columns. The array M is filled with non-zero elements.
I also have a vector t with n elements, and a vector omega
with m elements.
The elements of t correspond to the columns of matrix M.
The elements of omega correspond to the rows of matrix M.
Goal of Algorithm:
Define chi as the multiplication of vector t and omega. I need to obtain a 1D vector a, where each element of a is a function of chi.
Each element of chi is unique (i.e. every element is different).
Using mathematics notation, this can be expressed as a(chi)
Each element of vector a corresponds to an element or elements of M.
Matlab code:
Here is a code snippet showing how the vectors t and omega are generated. The matrix M is pre-existing.
[m,n] = size(M);
t = linspace(0,5,n);
omega = linspace(0,628,m);
Conceptual Diagram:
This appears to be a type of integration (if this is the right word for it) along constant chi.
Reference:
Link to reference
The algorithm is not explicitly stated in the reference. I only wish that this algorithm was described in a manner reminiscent of computer science textbooks!
Looking at Figure 11.5, the matrix M is Figure 11.5(a). The goal is to find an algorithm to convert Figure 11.5(a) into 11.5(b).
It appears that the algorithm is a type of integration (averaging, perhaps?) along constant chi.
It appears to me that reshape is the matlab function you need to use. As noted in the link:
B = reshape(A,siz) returns an n-dimensional array with the same elements as A, but reshaped to siz, a vector representing the dimensions of the reshaped array.
That is, create a vector siz with the number m*n in it, and say A = reshape(P,siz), where P is the product of vectors t and ω; or perhaps say something like A = reshape(t*ω,[m*n]). (I don't have matlab here, or would run a test to see if I have the product the right way around.) Note, the link does not show an example with one number (instead of several) after the matrix parameter to reshape, but I would expect from the description that A = reshape(t*ω,m*n) might also work.
You should add a pseudocode or a link to the algorithm you want to implement. From what I could understood I have developed the following code anyway:
M = [1 2 3 4; 5 6 7 8; 9 10 11 12]' % easy test M matrix
a = reshape(M, prod(size(M)), 1) % convert M to vector 'a' with reshape command
[m,n] = size(M); % Your sample code
t = linspace(0,5,n); % Your sample code
omega = linspace(0,628,m); % Your sample code
for i=1:length(t)
for j=1:length(omega) % Acces a(chi) in the desired order
chi = length(omega)*(i-1)+j;
t(i) % related t value
omega(j) % related omega value
a(chi) % related a(chi) value
end
end
As you can see, I also think that the reshape() function is the solution to your problems. I hope that this code helps,
The basic idea is to use two separate loops. The outer loop is over the chi variable values, whereas the inner loop is over the i variable values. Referring to the above diagram in the original question, the i variable corresponds to the x-axis (time), and the j variable corresponds to the y-axis (frequency). Assuming that the chi, i, and j variables can take on any real number, bilinear interpolation is then used to find an amplitude corresponding to an element in matrix M. The integration is just an averaging over elements of M.
The following code snippet provides an overview of the basic algorithm to express elements of a matrix as a vector using the spectral collapsing from 2D to 1D. I can't find any reference for this, but it is a solution that works for me.
% Amp = amplitude vector corresponding to Figure 11.5(b) in book reference
% M = matrix corresponding to the absolute value of the complex Gabor transform
% matrix in Figure 11.5(a) in book reference
% Nchi = number of chi in chi vector
% prod = product of timestep and frequency step
% dt = time step
% domega = frequency step
% omega_max = maximum angular frequency
% i = time array element along x-axis
% j = frequency array element along y-axis
% current_i = current time array element in loop
% current_j = current frequency array element in loop
% Nchi = number of chi
% Nivar = number of i variables
% ivar = i variable vector
% calculate for chi = 0, which only occurs when
% t = 0 and omega = 0, at i = 1
av0 = mean( M(1,:) );
av1 = mean( M(2:end,1) );
av2 = mean( [av0 av1] );
Amp(1) = av2;
% av_val holds the sum of all values that have been averaged
av_val_sum = 0;
% loop for rest of chi
for ccnt = 2:Nchi % 2:Nchi
av_val_sum = 0; % reset av_val_sum
current_chi = chi( ccnt ); % current value of chi
% loop over i vector
for icnt = 1:Nivar % 1:Nivar
current_i = ivar( icnt );
current_j = (current_chi / (prod * (current_i - 1))) + 1;
current_t = dt * (current_i - 1);
current_omega = domega * (current_j - 1);
% values out of range
if(current_omega > omega_max)
continue;
end
% use bilinear interpolation to find an amplitude
% at current_t and current_omega from matrix M
% f_x_y is the bilinear interpolated amplitude
% Insert bilinear interpolation code here
% add to running sum
av_val_sum = av_val_sum + f_x_y;
end % icnt loop
% compute the average over all i
av = av_val_sum / Nivar;
% assign the average to Amp
Amp(ccnt) = av;
end % ccnt loop

MATLAB loop optimization

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both "true" as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
intersect(i,j) = numel(intersect(a,b));
end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not saved as sparse in MATLAB because otherwise the performance become the worst possible.
Since the [i,j] entry counts the number of non zero elements in the element-wise multiplication of rows i and j, you can do it by multiplying matrix_logical with its transpose (you should convert to numeric data type first, e.g matrix_logical = single(matrix_logical)):
inter = matrix_logical * matrix_logical';
And it works both for sparse or full representation.
EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)); (as asked in your comment), you can use the fact that for two sets a and b, you have
length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))
so, you can do the following:
unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
If I understood you correctly, you want a logical AND of the rows:
intersct = zeros(50000, 50000)
for ii = 1:49999
for jj = ii:50000
intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
intersct(jj, ii) = intersct(ii, jj);
end
end
Doesn't avoid the double loop, but at least works without the first loop and the slow find command.
Elaborating on my comment, here is a distance function suitable for pdist()
function out = distfun(xi,xj)
out = zeros(size(xj,1),1);
for i=1:size(xj,1)
out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
end
In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.
You would also need to use squareform() to reshape the output of pdist() appropriately:
squareform(pdist(martrix_logical,#distfun));
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.

Sorting rows of two matrices using same ordering [duplicate]

Suppose I have a matrix A and I sort the rows of this matrix. How do I replicate the same ordering on a matrix B (same size of course)?
E.g.
A = rand(3,4);
[val ind] = sort(A,2);
B = rand(3,4);
%// Reorder the elements of B according to the reordering of A
This is the best I've come up with
m = size(A,1);
B = B(bsxfun(#plus,(ind-1)*m,(1:m)'));
Out of curiosity, any alternatives?
Update: Jonas' excellent solution profiled on 2008a (XP):
n = n
0.048524 1.4632 1.4791 1.195 1.0662 1.108 1.0082 0.96335 0.93155 0.90532 0.88976
n = 2m
0.63202 1.3029 1.1112 1.0501 0.94703 0.92847 0.90411 0.8849 0.8667 0.92098 0.85569
It just goes to show that loops aren't anathema to MATLAB programmers anymore thanks to JITA (perhaps).
A somewhat clearer way to do this is to use a loop
A = rand(3,4);
B = rand(3,4);
[sortedA,ind] = sort(A,2);
for r = 1:size(A,1)
B(r,:) = B(r,ind(r,:));
end
Interestingly, the loop version is faster for small (<12 rows) and large (>~700 rows) square arrays (r2010a, OS X). The more columns there are relative to rows, the better the loop performs.
Here's the code I quickly hacked up for testing:
siz = 10:100:1010;
tt = zeros(100,2,length(siz));
for s = siz
for k = 1:100
A = rand(s,1*s);
B = rand(s,1*s);
[sortedA,ind] = sort(A,2);
tic;
for r = 1:size(A,1)
B(r,:) = B(r,ind(r,:));
end,tt(k,1,s==siz) = toc;
tic;
m = size(A,1);
B = B(bsxfun(#plus,(ind-1)*m,(1:m).'));
tt(k,2,s==siz) = toc;
end
end
m = squeeze(mean(tt,1));
m(1,:)./m(2,:)
For square arrays
ans =
0.7149 2.1508 1.2203 1.4684 1.2339 1.1855 1.0212 1.0201 0.8770 0.8584 0.8405
For twice as many columns as there are rows (same number of rows)
ans =
0.8431 1.2874 1.3550 1.1311 0.9979 0.9921 0.8263 0.7697 0.6856 0.7004 0.7314
Sort() returns the index along the dimension you sorted on. You can explicitly construct indexes for the other dimensions that cause the rows to remain stable, and then use linear indexing to rearrange the whole array.
A = rand(3,4);
B = A; %// Start with same values so we can programmatically check result
[A2 ix2] = sort(A,2);
%// ix2 is the index along dimension 2, and we want dimension 1 to remain unchanged
ix1 = repmat([1:size(A,1)]', [1 size(A,2)]); %//'
%// Convert to linear index equivalent of the reordering of the sort() call
ix = sub2ind(size(A), ix1, ix2)
%// And apply it
B2 = B(ix)
ok = isequal(A2, B2) %// confirm reordering
Can't you just do this?
[val ind]=sort(A);
B=B(ind);
It worked for me, unless I'm understanding your problem wrong.

Resources