Count the number of rows between each instance of a value in a matrix - performance

Assume the following matrix:
myMatrix = [
1 0 1
1 0 0
1 1 1
1 1 1
0 1 1
0 0 0
0 0 0
0 1 0
1 0 0
0 0 0
0 0 0
0 0 1
0 0 1
0 0 1
];
Given the above (and treating each column independently), I'm trying to create a matrix that contains, for every element, the number of rows since a 1 last appeared in that column. For example, in the first column, the first four values would become 0, since there are 0 rows between each of those rows and the most recent 1.
Row 5 would become 1, row 6 = 2, row 7 = 3, row 8 = 4. Since row 9 contains a 1, it would become 0 and the count starts again with row 10. The final matrix should look like this:
FinalMatrix = [
0 1 0
0 2 1
0 0 0
0 0 0
1 0 0
2 1 1
3 2 2
4 0 3
0 1 4
1 2 5
2 3 6
3 4 0
4 5 0
5 6 0
];
What is a good way of accomplishing something like this?
EDIT: I'm currently using the following code:
[numRow,numCol] = size(myMatrix);
oneColumn = 1:numRow;
FinalMatrix = repmat(oneColumn',1,numCol);
toSubtract = zeros(numRow,numCol);
for m=1:numCol
rowsWithOnes = find(myMatrix(:,m));
for mm=1:length(rowsWithOnes);
toSubtract(rowsWithOnes(mm):end,m) = rowsWithOnes(mm);
end
end
FinalMatrix = FinalMatrix - toSubtract;
This runs about 5 times faster than the bsxfun solution posted below, measured over many trials and data sets (each about 1500 x 2500 in size). Can the code above be optimized further?

For a single column you could do this:
col = 1; %// desired column
vals = bsxfun(@minus, 1:size(myMatrix,1), find(myMatrix(:,col)));
vals(vals<0) = inf;
result = min(vals, [], 1).';
Result for first column:
result =
0
0
0
0
1
2
3
4
0
1
2
3
4
5

find + diff + cumsum based approach -
offset_array = zeros(size(myMatrix));
for k1 = 1:size(myMatrix,2)
a = myMatrix(:,k1);
widths = diff(find(diff([1 ; a])~=0));
idx = find(diff(a)==1)+1;
offset_array(idx(idx<=numel(a)),k1) = widths(1:2:end);
end
FinalMatrix1 = cumsum(double(myMatrix==0) - offset_array);
Benchmarking
The benchmarking code for comparing the above mentioned approach against the one in the question is listed here -
clear all
myMatrix = round(rand(1500,2500)); %// create random input array
for k = 1:50000
tic(); elapsed = toc(); %// Warm up tic/toc
end
disp('------------- With FIND+DIFF+CUMSUM based approach') %//'#
tic
offset_array = zeros(size(myMatrix));
for k1 = 1:size(myMatrix,2)
a = myMatrix(:,k1);
widths = diff(find(diff([1 ; a])~=0));
idx = find(diff(a)==1)+1;
offset_array(idx(idx<=numel(a)),k1) = widths(1:2:end);
end
FinalMatrix1 = cumsum(double(myMatrix==0) - offset_array);
toc
clear FinalMatrix1 offset_array idx widths a
disp('------------- With original approach') %//'#
tic
[numRow,numCol] = size(myMatrix);
oneColumn = 1:numRow;
FinalMatrix = repmat(oneColumn',1,numCol); %//'#
toSubtract = zeros(numRow,numCol);
for m=1:numCol
rowsWithOnes = find(myMatrix(:,m));
for mm=1:length(rowsWithOnes);
toSubtract(rowsWithOnes(mm):end,m) = rowsWithOnes(mm);
end
end
FinalMatrix = FinalMatrix - toSubtract;
toc
The results I got were -
------------- With FIND+DIFF+CUMSUM based approach
Elapsed time is 0.311115 seconds.
------------- With original approach
Elapsed time is 7.587798 seconds.
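For completeness, here is a loop-free sketch that was not part of the benchmark above, so no timing claim is made for it. It assumes cummax is available (MATLAB R2014b or later, or Octave); the names rowIdx, lastOne and FinalMatrix2 are made up for this illustration. The idea is to record the row index wherever a 1 occurs, carry the most recent such index down each column with a cumulative maximum, and subtract it from the current row index.

```matlab
% Example input from the question
myMatrix = [1 0 1; 1 0 0; 1 1 1; 1 1 1; 0 1 1; 0 0 0; 0 0 0; ...
            0 1 0; 1 0 0; 0 0 0; 0 0 0; 0 0 1; 0 0 1; 0 0 1];

[numRow, numCol] = size(myMatrix);

% Row index of every element, replicated across the columns
rowIdx = repmat((1:numRow).', 1, numCol);

% Keep the row index only where a 1 occurs, then carry the most recent
% such index down each column (0 means "no 1 seen yet in this column")
lastOne = cummax(rowIdx .* (myMatrix ~= 0), 1);

% Rows elapsed since the last 1
FinalMatrix2 = rowIdx - lastOne;
```

For columns that start with zeros, lastOne is 0 until the first 1, so the count simply runs up from 1, which matches the FinalMatrix shown in the question.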

Related

Quick way of finding complementary vectors in MATLAB

I have a matrix of N rows of binary vectors, i.e.
mymatrix = [ 1 0 0 1 0;
1 1 0 0 1;
0 1 1 0 1;
0 1 0 0 1;
0 0 1 0 0;
0 0 1 1 0;
.... ]
where I'd like to find the combinations of rows that, when added together, give exactly:
[1 1 1 1 1]
So in the above example, the combinations that would work are 1/3, 1/4/5, and 2/6.
The code I have for this right now is:
i = 1;
for j = 1:5
C = combnk([1:N],j); % Get every possible combination of rows
for c = 1:size(C,1)
if isequal(ones(1,5),sum(mymatrix(C(c,:),:)))
combis{i} = C(c,:);
i = i+1;
end
end
end
But as you would imagine, this takes a while, especially because of that combnk in there.
What might be a useful algorithm/function that can help me speed this up?
M = [
1 0 0 1 0;
1 1 0 0 1;
0 1 1 0 1;
0 1 0 0 1;
0 0 1 0 0;
0 0 1 1 0;
1 1 1 1 1
];
% Find all the unique combinations of rows...
S = (dec2bin(1:2^size(M,1)-1) == '1');
% Find the matching combinations...
matches = cell(0,1);
for i = 1:size(S,1)
S_curr = S(i,:);
rows = M(S_curr,:);
rows_sum = sum(rows,1);
if (all(rows_sum == 1))
matches = [matches; {find(S_curr)}];
end
end
To display the matches in a readable form:
for i = 1:numel(matches)
match = matches{i};
if (numel(match) == 1)
disp(['Match found for row: ' mat2str(match) '.']);
else
disp(['Match found for rows: ' mat2str(match) '.']);
end
end
This will produce:
Match found for row: 7.
Match found for rows: [2 6].
Match found for rows: [1 4 5].
Match found for rows: [1 3].
In terms of efficiency, on my machine this algorithm finds the matches in about 2 milliseconds. Note that it enumerates all 2^N - 1 non-empty subsets of the N rows, so it is only practical for small N.
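Since enumerating every subset grows as 2^N, a pruned search may scale better when there are many rows. The following is only a sketch under my own assumptions, not something taken from the answer above: it grows combinations one row at a time and abandons a partial combination as soon as any column sum exceeds 1. The names stack, chosen, total and first are hypothetical, and all-zero rows are never appended to a combination that already matches.

```matlab
M = [1 0 0 1 0;
     1 1 0 0 1;
     0 1 1 0 1;
     0 1 0 0 1;
     0 0 1 0 0;
     0 0 1 1 0];

target = ones(1, size(M,2));
matches = {};

% Each stack row holds {rows chosen so far, their running column sum}
stack = {[], zeros(1, size(M,2))};

while ~isempty(stack)
    chosen = stack{end,1};
    total  = stack{end,2};
    stack(end,:) = [];                  % pop the last entry

    if isequal(total, target)
        matches{end+1} = chosen;        % complete cover found
        continue
    end

    % Only try rows after the last chosen one, so that each
    % combination is generated once, in increasing row order
    if isempty(chosen), first = 1; else, first = chosen(end) + 1; end

    for r = first:size(M,1)
        newTotal = total + M(r,:);
        if all(newTotal <= 1)           % prune: no column may be hit twice
            stack(end+1,:) = {[chosen r], newTotal};
        end
    end
end
```

For the six example rows this collects the combinations [1 3], [1 4 5] and [2 6], in the order the stack happens to visit them.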

Count the frequency of matrix values including 0

I have a vector
A = [ 1 1 1 2 2 3 6 8 9 9 ]
I would like to write a loop that counts the frequency of each value in my vector within a range I choose, including values that have a frequency of 0.
For example, if I chose the range of 1:9 my results would be
3 2 1 0 0 1 0 1 2
If I picked 1:11 the result would be
3 2 1 0 0 1 0 1 2 0 0
Is this possible? Ideally I would also have to do this for giant matrices and vectors, so the fastest way to calculate this would be appreciated.
Here's an alternative suggestion to histcounts, which appears to be ~8x faster on Matlab 2015b:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11;
N = accumarray(A(:), 1, [maxRange,1])';
N =
3 2 1 0 0 1 0 1 2 0 0
Comparing the speed:
K>> tic; for i = 1:100000, N1 = accumarray(A(:), 1, [maxRange,1])'; end; toc;
Elapsed time is 0.537597 seconds.
K>> tic; for i = 1:100000, N2 = histcounts(A,1:maxRange+1); end; toc;
Elapsed time is 4.333394 seconds.
K>> isequal(N1, N2)
ans =
1
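The accumarray call above relies on the range starting at 1 and on every value of A lying inside it. A minimal sketch for an arbitrary integer range lo:hi, assuming A holds integers, is to drop the out-of-range values and shift the rest so that lo maps to index 1. The names lo, hi, inRange and counts are placeholders of mine.

```matlab
A  = [1 1 1 2 2 3 6 8 9 9];
lo = 2;                                  % first value of the chosen range
hi = 7;                                  % last value of the chosen range

% Keep only the values inside the range
inRange = A(A >= lo & A <= hi);

% Shift so that lo maps to index 1, then count per index;
% the size argument keeps the trailing zero counts
counts = accumarray(inRange(:) - lo + 1, 1, [hi - lo + 1, 1]).';
```

Here counts holds the frequencies of the values 2 through 7, in that order.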
As per the loop request, here's a looped version, which should not be too slow since the latest engine overhaul:
A = [ 1 1 1 2 2 3 6 8 9 9 ];
maxRange = 11; %// your range
output = zeros(1,maxRange); %// initialise output
for ii = 1:maxRange
tmp = A==ii; %// temporary storage
output(ii) = sum(tmp(:)); %// find the number of occurrences
end
which would result in
output =
3 2 1 0 0 1 0 1 2 0 0
Faster and non-looping would be @beaker's suggestion to use histcounts:
[N,edges] = histcounts(A,1:maxRange+1);
N =
3 2 1 0 0 1 0 1 2 0 0
where the +1 adds a final bin edge, so that the largest value in the range gets its own bin instead of being merged into the previous one.
Assuming the input A to be a sorted array and the range starts from 1 and goes until some value greater than or equal to the largest element in A, here's an approach using diff and find -
%// Inputs
A = [2 4 4 4 8 9 11 11 11 12]; %// Modified for variety
maxN = 13;
idx = [0 find(diff(A)>0) numel(A)]+1;
out = zeros(1,maxN); %// OR for better performance : out(maxN) = 0;
out(A(idx(1:end-1))) = diff(idx);
Output -
out =
0 1 0 3 0 0 0 1 1 0 3 1 0
This can be done very easily with bsxfun.
Let the data be
A = [ 1 1 1 2 2 3 6 8 9 9 ]; %// data
B = 1:9; %// possible values
Then
result = sum(bsxfun(@eq, A(:), B(:).'), 1);
gives
result =
3 2 1 0 0 1 0 1 2

Comparing Two Matrices in MATLAB which shows how much they are matched

Please assume A is a matrix of 4 x 4 which has:
A = 1 0 1 0
1 0 1 0
1 1 1 0
1 1 0 0
And B is a reference matrix (4 x 4) which is:
B = 1 0 1 0
1 0 1 0
1 0 1 0
1 1 1 0
Now, if A is compared against the reference matrix B, almost all elements are equal except A(4,3) and A(3,2). However, since B is the reference and A is being compared to it, only the mismatches at positions where B is 1 matter. In this particular example, only A(4,3) matters, not A(3,2). That is:
>> C = B ~= A
C =
0 0 0 0
0 0 0 0
0 1 0 0
0 0 1 0
A(4,3) ~= B(4,3)
Finally, we are looking for a piece of code that shows what percentage of the ones in A are equal to their corresponding elements in B. In this case the result is:
(8 / 9) * 100 = 88.89 % are matched.
Please bear in mind that speed is also important here, so quicker solutions are more appreciated. Thanks.
To get only the differing entries where there is a 1 in B, AND the mismatch mask with B. To get the percentage, count the positions where both A and B are 1, then divide by the number of ones in B (or the number of ones in A; see the note below).
A = [1 0 1 0;
1 0 1 0;
1 1 1 0;
1 1 0 0];
B = [1 0 1 0;
1 0 1 0;
1 0 1 0;
1 1 1 0];
C = (B ~= A) & B
p = sum(B(:) & A(:)) / sum(B(:)) * 100
This is the result:
C =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 0
p =
88.8889
Edit / Note: In the OP's question it's not 100% clear if he wants the percentage in relation to the sum of ones in A or B. I assumed that it is a percentage of the reference-matrix, which is B. Therefore I divide by sum(B(:)). In case you need it in reference to the ones in A, just change the last line to:
p = sum(B(:) & A(:)) / sum(A(:)) * 100
If I got it right, what you want to know is where B == 1 and A == 0.
Try this:
>> C = B & ~A
C =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 0
To get the percentage, you could try this:
>> 100 * sum(A(:) & B(:)) / sum(A(:))
ans =
88.8889
You can use matrix multiplication, which should be quite efficient, as shown next.
To get the percentage value with respect to A -
percentage_wrtA = A(:).'*B(:)/sum(A(:)) * 100;
To get the percentage value with respect to B -
percentage_wrtB = A(:).'*B(:)/sum(B(:)) * 100;
Runtime tests
Here are some quick runtime tests comparing matrix multiplication against summing the elements with (:) and ANDing -
>> M = 6000; %// Datasize
>> A = randi([0,1],M,M);
>> B = randi([0,1],M,M);
>> tic,sum(B(:) & A(:));toc
Elapsed time is 0.500149 seconds.
>> tic,A(:).'*B(:);toc
Elapsed time is 0.126881 seconds.
Try:
sum(sum(A & B))./sum(sum(A))
Output:
ans =
0.8889
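The counting steps described in the answers above can also be written with nnz, which counts non-zero elements directly. This is just a small sketch that puts the mask and both percentages side by side; the names matched, pctOfB and pctOfA are mine, and no speed claim is made relative to the timings above.

```matlab
A = [1 0 1 0;
     1 0 1 0;
     1 1 1 0;
     1 1 0 0];
B = [1 0 1 0;
     1 0 1 0;
     1 0 1 0;
     1 1 1 0];

% Positions where the reference B has a 1 but A does not
C = B & ~A;

% Number of positions where both matrices have a 1
matched = nnz(A & B);

pctOfB = matched / nnz(B) * 100;   % relative to the ones in the reference B
pctOfA = matched / nnz(A) * 100;   % relative to the ones in A
```

In this example A and B both contain nine ones, so the two percentages coincide; in general they differ.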

Efficiently unpack a vector into binary matrix Octave

On Octave I'm trying to unpack a vector in the format:
y = [ 1
2
4
1
3 ]
I want to return a matrix of dimension ( rows(y) x max value(y) ), where for each row I have a 1 in the column of the original digits value, and a zero everywhere else, i.e. for the example above
y01 = [ 1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0 ]
so far I have
y01 = zeros( m, num_labels );
for i = 1:m
for j = 1:num_labels
y01(i,j) = (y(i) == j);
end
end
which works, but is going to get slow for bigger matrices, and seems inefficient because it cycles through every single value even though the majority aren't changing.
I found this for R on another thread:
f3 <- function(vec) {
U <- sort(unique(vec))
M <- matrix(0, nrow = length(vec),
ncol = length(U),
dimnames = list(NULL, U))
M[cbind(seq_len(length(vec)), match(vec, U))] <- 1L
M
}
but I don't know R and I'm not sure if/how the solution ports to octave.
Thanks for any suggestions!
Use a sparse matrix (which also saves a lot of memory) which can be used in further calculations as usual:
y = [1; 2; 4; 1; 3]
y01 = sparse (1:rows (y), y, 1)
If you really want a full matrix, then use "full":
full (y01)
ans =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
Sparse is a more efficient way to do this when the matrix is big.
If the result matrix is not very large, you can try this:
y = [1; 2; 4; 1; 3]
I = eye(max(y));
y01 = I(y,:)
The result is the same as full(sparse(...)).
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
% Vector y to Matrix Y
Y = zeros(m, num_labels);
% Loop through each row
for i = 1:m
% Use the value of y as an index; set the value matching index to 1
Y(i,y(i)) = 1;
end
Another possibility is:
y = [1; 2; 4; 1; 3]
classes = unique(y)(:)
num_labels = length(classes)
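y01 = [1:num_labels] == y  % broadcasts the row 1:num_labels against the column y; assumes the labels are exactly 1..num_labels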
With the following detailed printout:
y =
1
2
4
1
3
classes =
1
2
3
4
num_labels = 4
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
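As for porting the R function from the question: a rough equivalent, assuming y is a numeric vector, uses the third output of unique (the position of each element within the sorted unique values) together with sub2ind for linear indexing. The names U, pos and M are only illustrative. Like the R version, and unlike the answers above, this gives one column per distinct value, so the labels need not be exactly 1..max(y).

```matlab
y = [1; 2; 4; 1; 3];

% U: sorted unique labels, pos: index of each y(i) within U
[U, ~, pos] = unique(y);

% One row per element of y, one column per distinct label
M = zeros(numel(y), numel(U));

% Set M(i, pos(i)) = 1 for every row i in a single indexed assignment
M(sub2ind(size(M), (1:numel(y)).', pos(:))) = 1;
```

With labels such as [10; 30; 10] the same code yields a two-column matrix, one column for 10 and one for 30.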

I want to replace the values of 1 in an adjacency matrix with weights given in another smaller matrix

How can I replace the values of 1 in an adjacency matrix with weights given in another matrix?
For example:
adjacent_matrix = [1 0 0 1; 0 0 1 1; 1 0 1 0; 0 1 1 0 ]
weight_matrix = [ 2 4 6 2; 4 5 1 3]
The final matrix should look like this: [2 0 0 4; 0 0 6 2; 4 0 5 0; 0 1 3 0]
Code -
out = adjacent_matrix';
out(out==1) = reshape(weight_matrix',1,numel(weight_matrix))';
out = out';
Inputs 'adjacent_matrix' and 'weight_matrix' stay the same, as suggested by @chappjc.
accumarray solution:
>> [ii,jj] = find(adjacent_matrix.');
>> out = accumarray([ii jj],reshape(weight_matrix.',[],1)).'
out =
2 0 0 4
0 0 6 2
4 0 5 0
0 1 3 0
sparse solution:
[ii,jj] = find(adjacent_matrix.');
out = full(sparse(ii,jj,weight_matrix.')).'
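All of the solutions above silently assume that weight_matrix holds exactly one weight per non-zero entry of adjacent_matrix, listed in row-by-row order. A defensive sketch that checks this before filling in the weights could look as follows; the name mask and the error message are my own additions.

```matlab
adjacent_matrix = [1 0 0 1; 0 0 1 1; 1 0 1 0; 0 1 1 0];
weight_matrix   = [2 4 6 2; 4 5 1 3];

% One weight is needed for every non-zero adjacency entry
assert(nnz(adjacent_matrix) == numel(weight_matrix), ...
       'Number of weights must equal number of non-zero adjacency entries');

% Work on the transpose so that column-major indexing walks the
% original matrix row by row
mask = (adjacent_matrix.' ~= 0);
out  = zeros(size(mask));
out(mask) = reshape(weight_matrix.', [], 1);
out = out.';
```

Using a logical mask rather than out==1 also covers adjacency matrices whose non-zero entries are not exactly 1.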
