Matlab: sorting a matrix in a unique way - algorithm

I have a problem sorting some finance data based on firm numbers. Given a matrix that looks like:
[1 3 4 7;
1 2 7 8;
2 3 7 8;]
In MATLAB I would like the matrix to be sorted as follows:
[1 0 3 4 7 0;
1 2 0 0 7 8;
0 2 3 0 7 8;]
So basically every column must contain only one distinct number (with 0 where a row lacks it).
I have tried many things, but I can't get the matrix sorted properly.

A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
%// Get a unique list of numbers in the order that you want them to appear as the new columns
U = unique(A(:))'
%// For each column (of your output, same as columns of U), find which rows have that number. Do this by making A 3D so that bsxfun compares each element with each element
temp1 = bsxfun(@eq,permute(A,[1,3,2]),U)
%// Consolidate this into a boolean matrix with the right dimensions and 1 where you'll have a number in your final answer
temp2 = any(temp1,3)
%// Finally multiply each row with U
bsxfun(@times, temp2, U)
You can do all of that in one line, but I broke it up to make it easier to understand. I suggest running each line and looking at the output to see how it works. It might seem complicated, but bsxfun is worth understanding because it is a really useful function. The first use, which also involves permute, is a bit trickier, so I suggest you first make sure you understand the last line and then work backwards.
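Put together, the three steps fit in a single statement. A sketch using the question's data (`out` is just an illustrative name); in MATLAB R2016b and later, implicit expansion would also let the outer `bsxfun(@times, X, U)` be written as `X .* U`.

```matlab
A = [1 3 4 7;
     1 2 7 8;
     2 3 7 8];
U = unique(A(:))';   % sorted unique values, one per output column
% Compare every element of A with every value in U (3-D), collapse the
% third dimension with any(), then scale each column by its value.
out = bsxfun(@times, any(bsxfun(@eq, permute(A, [1 3 2]), U), 3), U)
```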

What you are asking can also be seen as a histogram:
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
uniquevalues = unique(A(:))
N = histc(A,uniquevalues',2)
B = bsxfun(@times,N,uniquevalues')
%// bsxfun can replace the following instructions
%// (the two are equivalent only when each value appears at most once per row):
%// B = repmat(uniquevalues', size(A,1), 1)
%// B(N==0) = 0
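The commented-out repmat alternative can be run on its own as a sketch; on the question's data, where no value repeats within a row, it produces the same matrix as the bsxfun line.

```matlab
A = [1 3 4 7;
     1 2 7 8;
     2 3 7 8];
uniquevalues = unique(A(:));
N = histc(A, uniquevalues', 2);           % N(i,k): count of uniquevalues(k) in row i
B = repmat(uniquevalues', size(A,1), 1);   % every row starts with all unique values
B(N == 0) = 0                              % blank out values absent from that row
```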

Answer without assumptions - Simplified
I did not feel comfortable with my old answer, which assumed that everything is an integer and ruled out duplicates, so I came up with a different solution based on @lib's suggestion of using a histogram and counting method.
The only case I can see this not working for is when the data contains a 0: you will end up with a column of all zeros, which one might interpret as every row originally containing a zero, and that would be incorrect. You could use NaN instead of zeros in that case, but I'm not sure what this data is being fed into and whether that processing would cope with NaN.
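The NaN idea can be illustrated on the simpler membership approach from the first answer rather than on this answer's own code. A sketch, assuming each value appears at most once per row (`present` and `vals` are illustrative names): absent values become NaN, so a genuine 0 in the data stays distinguishable from padding.

```matlab
A = [0 3 4 7;
     1 0 7 8;
     2 3 7 8];
U = unique(A(:))';                                       % [0 1 2 3 4 7 8]
present = any(bsxfun(@eq, permute(A, [1 3 2]), U), 3);   % does row i contain U(k)?
out = nan(size(present));                                % NaN marks "not in this row"
vals = repmat(U, size(A,1), 1);
out(present) = vals(present)
```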
EDITED
Includes sorting of a secondary matrix, B, along with A.
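A = [-1 3 4 7 9; 0 2 2 7 8.2; 2 3 5 9 8];
B = [5 4 3 2 1; 1 2 3 4 5; 10 9 8 7 6];
keys = unique(A);                            % sorted distinct values of A
% counts(i,k): occurrences of keys(k) in row i; bin: key index of each element
[counts,bin] = histc(A,keys',2);
maxCounts = max(counts,[],1);                % output columns reserved for each key
colStart = cumsum([1, maxCounts(1:end-1)]);  % first output column of each key
A_sorted = cell(size(A,1),1);
for ii = 1:size(A,1)
    for jj = 1:numel(keys)
        temp = zeros(1,maxCounts(jj));
        temp(1:counts(ii,jj)) = keys(jj);
        A_sorted{ii} = [A_sorted{ii},temp];
    end
end
A_sorted = cell2mat(A_sorted);
B_sorted = nan(size(A_sorted));
for ii = 1:size(bin,1)
    for jj = 1:size(bin,2)
        idx = colStart(bin(ii,jj));          % column of this key, not the key index itself
        while ~isnan(B_sorted(ii,idx))       % skip slots already taken by duplicates
            idx = idx+1;
        end
        B_sorted(ii,idx) = B(ii,jj);
    end
end
B_sorted(isnan(B_sorted)) = 0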

You can create a matrix at the beginning with as many columns as the largest value in A, and treat the values in your original matrix as column indices (this requires all values to be positive integers).
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
B = zeros(size(A,1),max(A(:)));
for i = 1:size(A,1)
    B(i,A(i,:)) = A(i,:);
end
B(:,~any(B,1)) = []
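Since each value doubles as its own column index, the loop can be replaced by one indexed assignment built with `sub2ind`. A sketch, with the same assumption as the answer that every entry is a positive integer (`rowIdx` is an illustrative name):

```matlab
A = [1 3 4 7;
     1 2 7 8;
     2 3 7 8];
B = zeros(size(A,1), max(A(:)));
rowIdx = repmat((1:size(A,1))', 1, size(A,2));   % row number of each entry of A
B(sub2ind(size(B), rowIdx, A)) = A;             % value v goes to column v
B(:, ~any(B,1)) = []                            % drop columns that no row uses
```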

Related

remove non matching elements from matrix

I am trying to compare two matrices A and B. If the first two columns of a row of A match those of a row of B, I want to keep that row; all non-matching rows should be deleted from A. The third column of B should not factor into the comparison.
A = [1 2 3
     3 4 5
     6 7 8]
B = [1 2 8
     3 4 5]
Desired result:
A = [1 2 3
3 4 5]
So far I only found ways to remove duplicate entries, which is the exact opposite of what I want. How can I do this?
You can efficiently use ismember for this task:
% Input matrices
A = [1 2 3; 3 4 5; 7 8 9];
B = [1 2 8; 3 4 5];
A1 = A(:,1:2); % Extract first two columns for both matrices
B1 = B(:,1:2);
tf = ismember(A1,B1,'rows'); % Logical mask: true for rows of A1 that also appear in B1
A(tf,:) % Index to keep only matching rows
All of this can be written more compactly, but I wanted to show the step-by-step process first:
A(ismember(A(:,1:2),B(:,1:2),'rows'),:)
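`ismember` has two outputs: the first is a logical mask over the rows of A, the second gives, for each row of A, the position of its match in B (0 if none). Indexing A with that second output only works when matching rows happen to sit at the same positions in A and B. A small check that shows the difference (variable names are illustrative):

```matlab
A = [7 8 9;
     1 2 3];
B = [1 2 8];
[tf, loc] = ismember(A(:,1:2), B(:,1:2), 'rows');
% tf  = [false; true]  (row 2 of A matches)
% loc = [0; 1]         (it matches row 1 of B)
keepByMask = A(tf, :)             % [1 2 3], the matching row
keepByLoc  = A(loc(loc > 0), :)   % [7 8 9], row 1 of A, which does not match
```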
A = [1 2 3;3 4 5;7 8 9];
B = [1 2 8; 3 4 5];
tmp = min([size(A,1) size(B,1)]); % number of row pairs to compare
k = false(tmp,1); % logical mask of rows to keep
for ii = 1:tmp
if all(A(ii,1:2)==B(ii,1:2)) % compares row ii of A with row ii of B only
k(ii)=true; % keep this row
end
end
C = A(k,:) % extract requested rows
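The loop above compares row ii of A only with row ii of B, so it finds a match only when both rows sit at the same position. A sketch of a loop that checks each row of A against every row of B, using the same data with B's rows swapped (`keep` is an illustrative name):

```matlab
A = [1 2 3; 3 4 5; 7 8 9];
B = [3 4 5; 1 2 8];                         % same rows as before, different order
keep = false(size(A,1), 1);
for ii = 1:size(A,1)
    for jj = 1:size(B,1)
        if all(A(ii,1:2) == B(jj,1:2))      % first two columns match
            keep(ii) = true;
            break                           % one match is enough
        end
    end
end
C = A(keep, :)
```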

Loop over part of Matrix

I have a Matrix A. I want to iterate over the inner part of the matrix (B), while also working with the rows and columns that are not part of B.
A = [1 4 5 6 7 1;
     8 2 2 2 2 1;
     9 2 3 3 2 1;
     0 2 8 2 2 1;
     1 1 1 1 1 1];
B = [2 2 2 2;
     2 3 3 2;
     2 8 2 2];
I know it is possible to select the part of A like this:
[rows,columns] = size(A);
B = A([2:1:rows-1],[2:1:columns-1]);
for i = 1:(rows*columns)
%do loop stuff
endfor
However, this won't work because I also need the outer rows and columns for my calculations. How can I write such a loop without altering A?
So why not use two indices for the inner matrix?
%....
for i=2:rows-1
for j=2:columns-1
% here, A(i,j) are the B elements, but you
% can still access A(i-1, j+1) if you want.
end
end
%....
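A runnable sketch of the two-index loop, assuming (for illustration only) that the per-element calculation is the mean of the 3x3 neighbourhood around each inner element; `C` and `window` are hypothetical names. The result has the same size as B, and the outer rows and columns of A are read but never modified.

```matlab
A = [1 4 5 6 7 1;
     8 2 2 2 2 1;
     9 2 3 3 2 1;
     0 2 8 2 2 1;
     1 1 1 1 1 1];
[rows, columns] = size(A);
C = zeros(rows-2, columns-2);          % one result per inner element (same size as B)
for i = 2:rows-1
    for j = 2:columns-1
        window = A(i-1:i+1, j-1:j+1);  % inner element plus its 8 neighbours
        C(i-1, j-1) = mean(window(:));
    end
end
C
```

For this particular calculation, `conv2(A, ones(3)/9, 'valid')` gives the same result without a loop.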

Vectorizing range setting - MATLAB

I have the following code and need to rewrite it without a loop. How should I do it?
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
A = zeros(5,5);
for i=1:5
A(i, l1(i):l2(i)) = 1;
end
A
You can use bsxfun -
I = 1:5 % Array corresponding to iterator : "for i=1:5"
out = bsxfun(@le,l1(:),I) & bsxfun(@ge,l2(:),I)
If you need a double datatype array, convert to double, like so -
out_double = double(out)
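In MATLAB R2016b and later, implicit expansion makes the bsxfun calls unnecessary: comparing a column vector with a row vector expands both to a 5x5 result. A sketch that also rebuilds the question's loop output for comparison:

```matlab
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
I = 1:5;
out = (l1(:) <= I) & (l2(:) >= I);   % 5x1 against 1x5 expands to 5x5
% Reference result from the original loop
A = zeros(5,5);
for i = 1:5
    A(i, l1(i):l2(i)) = 1;
end
out
```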
Add one more into the mix then! This one simply uses a cumsum to generate all the 1s - so it does not use the : operator at all - It's also fully parallel :D
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
A = zeros(5,5);
L1 = l1+(1:5)*5-5; %Convert to matrix location index
L2 = l2+(1:5)*5-5; %Convert to matrix location index
A(L1) = 1; %Place 1 in that location
A(L2) = 1; %Place 1 in that location
B = cumsum(A,1) ==1 ; %So fast
Answer = (A|B)'; %Lightning fast
Answer =
1 1 1 0 0
0 1 1 1 0
0 0 1 1 0
0 1 1 1 1
1 1 1 1 0
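The cumsum trick fills the rest of the column when `l1(i) == l2(i)`, because both 1s land on the same element. A common variant (a swapped-in technique, not this answer's code) places +1 at each start and -1 just past each end, then takes a cumulative sum along each row; `accumarray` makes coinciding markers add up instead of overwriting each other, and a spare sixth column absorbs the -1 for ranges ending in column 5. Names such as `D` and `S` are illustrative.

```matlab
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
n = numel(l1);
nCols = 5;
rows = (1:n)';
% +1 at (i, l1(i)) and -1 at (i, l2(i)+1); one spare column for l2(i) == nCols
D = accumarray([rows, l1(:); rows, l2(:)+1], [ones(n,1); -ones(n,1)], [n, nCols+1]);
S = cumsum(D, 2);                % 1 inside [l1(i), l2(i)], 0 elsewhere
Answer = S(:, 1:nCols) > 0
```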
Here is how you could build the matrix without using a loop.
% Our starting values
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
% Coordinate grid of the right size (we don't need r, but I keep it there for illustration)
[r,c] = ndgrid(1:5);
% Build the logical index based on our lower and upper bounds on the column indices
idx_l1=bsxfun(@ge,c,l1');
idx_l2=bsxfun(@le,c,l2');
% The result
A = zeros(size(idx_l1));
A(idx_l1&idx_l2)=1
If the result is not square, you may need something like [r,c] = ndgrid(1:numel(l1),1:10) (here for 10 columns).
Also if your matrix size is truly huge and memory becomes an issue, you may want to stick to a loop anyway, but for 'normal size' this could be faster.
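A sketch of the non-square case, assuming 10 output columns (`nCols` is an illustrative name); only the column grid `c` is needed, so the row grid is discarded with `~`:

```matlab
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
nCols = 10;                                 % wider than max(l2) on purpose
[~, c] = ndgrid(1:numel(l1), 1:nCols);      % c(i,j) = j
A = double(bsxfun(@ge, c, l1(:)) & bsxfun(@le, c, l2(:)))
```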
Be skeptical of every vectorization. If you actually measure the time, your loop is faster than the given answers, mostly because it only performs in-place writes.
Here is another one that would probably get faster for larger sizes, but I haven't tested it:
A = zeros(5,5); % start from an empty matrix
tic
myind = [];
for i = 1:5
myind = [myind (5*(i-1))+[l1(i):l2(i)]];
end
A(myind) = 1;
toc
This gives the transpose of A, because linear indexing in MATLAB runs down the columns (column-major order).
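To get A itself rather than its transpose, use the column-major formula directly: element (i, j) of an nRows-by-nCols matrix sits at linear index i + nRows*(j-1). A sketch of the same loop with that formula (`nRows` is introduced here for illustration):

```matlab
l1 = [1 2 3 2 1];
l2 = [3 4 4 5 4];
nRows = 5;
A = zeros(nRows, 5);
myind = [];
for i = 1:nRows
    % column-major: element (i, j) is at linear index i + nRows*(j-1)
    myind = [myind, i + nRows*((l1(i):l2(i)) - 1)];
end
A(myind) = 1
```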

Efficient way of finding rows in which A>B

Suppose M is a matrix where each row represents a randomized sequence of a pool of N objects, e.g.,
1 2 3 4
3 4 1 2
2 1 3 4
How can I efficiently find all the rows in which a number A comes before a number B?
e.g., A=1 and B=2; I want to retrieve the first and the second rows (in which 1 comes before 2)
There you go:
[iA jA] = find(M.'==A);
[iB jB] = find(M.'==B);
sol = find(iA<iB)
Note that this works because, according to the problem specification, every number is guaranteed to appear once in each row.
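Another way to get the position of a value in every row at once (a different technique from the `find` answer, under the same assumption that each value occurs exactly once per row) is `max` along dimension 2 of a 0/1 mask: its second output is the column of the first maximum, i.e. where the value sits. Names such as `posA` are illustrative.

```matlab
M = [1 2 3 4;
     3 4 1 2;
     2 1 3 4];
A = 1;
B = 2;
[~, posA] = max(double(M == A), [], 2);   % column of A in each row
[~, posB] = max(double(M == B), [], 2);   % column of B in each row
sol = find(posA < posB)                   % rows where A comes before B
```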
To find rows of M with a given prefix (as requested in the comments): let prefix be a vector with the sought prefix (for example, prefix = [1 2]):
find(all(bsxfun(@eq, M(:,1:numel(prefix)).', prefix(:))))
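A self-contained run of the prefix search on the question's M, using the prefix [3 4] as an illustrative input: the transposed first columns are compared against the prefix column, and `all` keeps the rows where every position matches.

```matlab
M = [1 2 3 4;
     3 4 1 2;
     2 1 3 4];
prefix = [3 4];
k = numel(prefix);
% k-by-nRows comparison: column r holds row r's first k entries vs. the prefix
rowsWithPrefix = find(all(bsxfun(@eq, M(:,1:k).', prefix(:))))
```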
Something like the following code should work. It checks whether A comes before B in each row.
temp = [1 2 3 4;
3 4 1 2;
2 1 3 4];
A = 1;
B = 2;
orderMatch = zeros(1,size(temp,1));
for i = 1:size(temp,1)
match1= temp(i,:) == A;
match2= temp(i,:) == B;
aIndex = find(match1,1);
bIndex = find(match2,1);
if aIndex < bIndex
orderMatch(i) = 1;
end
end
solution = find(orderMatch);
orderMatch will be [1,1,0] because the first two rows have 1 coming before 2 but the third row does not, so solution is [1,2].
UPDATE
Added find on orderMatch to return row indices, as suggested by Luis.

Select rolling rows without a loop

I have a question.
Suppose I have matrix
A =
1 2 3
4 5 6
7 8 9
10 11 12
I need to select every n consecutive (rolling) rows of A and place their elements, row after row, into one row of a new matrix C.
The loop that I use is:
n = 3; %for instance every 3 rows of A
B = [];
for i = 1:n
Btemp = transpose(A(i:i+size(A,1)-n,:));
B = [B;Btemp];
end
C=B';
and that produces matrix C which is:
C =
1 2 3 4 5 6 7 8 9
4 5 6 7 8 9 10 11 12
This is what I want to do, but can I do the same job without the loop?
It takes 4 minutes to compute for an A matrix of size 3280x35.
I think you can make it very fast if you preallocate. Another trick is to take the transpose first, since MATLAB stores matrices column by column (column-major order).
tic
A = reshape(1:3280*35,[35 3280])'; %# Generate an example A (3280x35, values increasing along each row)
[nRows, nCols] = size(A);
n = 3; %for instance every 3 rows of A
B = zeros(nRows-n+1,nCols*n);
At = A';
for i = 1:size(B,1)
B(i,:) = reshape(At(:,i:i+n-1), [1 nCols*n]);
end
toc
The elapsed time is
Elapsed time is 0.004059 seconds.
I would not use reshape in the loop, but instead transform A first into one single row (a column would also work, it doesn't matter):
Ar = reshape(A',1,[]); % the ' is important here!
then the selecting of elements out of Ar is really simple:
[nrows, ncols] = size(A);
new_ncols = ncols*n;
B = zeros(nrows-(n-1),new_ncols);
for ii = 1:nrows-(n-1)
B(ii,:) = Ar(ncols*(ii-1)+(1:new_ncols)); % row ii of A starts at element ncols*(ii-1)+1 of Ar
end
Still, preallocating B gives you the largest improvement; more info at http://www.mathworks.nl/help/techdoc/matlab_prog/f8-784135.html
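Building on the flattened row `Ar`, the remaining loop can also be removed: window ii starts at element (ii-1)*ncols + 1 of `Ar` and spans n*ncols elements, so a matrix of indices selects all windows at once. A sketch on the question's example (`idx` is an illustrative name); indexing a row vector with a matrix returns a matrix of the same shape as the index.

```matlab
A = [1 2 3;
     4 5 6;
     7 8 9;
     10 11 12];
n = 3;                                   % window length in rows
[nrows, ncols] = size(A);
Ar = reshape(A.', 1, []);                % rows of A laid end to end
% Row ii of idx: (ii-1)*ncols + (1:n*ncols)
idx = bsxfun(@plus, (0:nrows-n).' * ncols, 1:n*ncols);
C = Ar(idx)
```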
I don't have MATLAB on me right now, but I think you can do this without loops like this (it reproduces the example because there n equals size(A,1)-1):
reshape(permute(cat(3,A(1:end-1,:),A(2:end,:)),[3,2,1]), [2, size(A,2)*(size(A,1) - 1)]);
and in fact, when n equals size(A,1)-1, won't this do what you want?:
A1 = A(1:end-1,:);
A2 = A(2:end,:);
answer = [reshape(A1.',1,[]) ; reshape(A2.',1,[])]

Resources