Separate matrix based on cumulative sum of column criteria (MATLAB) - algorithm

I have a dataset something like the following:
a = [1 11; 2 16; 3 9; 4 13; 5 8; 6 14];
I am looking to separate into several Matrices by the following criteria:
Starting with the first column, construct sets where the sum of the second row is in the range 19-to-25.
So the output would be something like this:
a1 = [1 11; 3 9]
a2 = [2 16; 5 8]
a3 = [6 14]
Where a1=20, a2=24, and a3 does not meet criteria but is the last.
Could this be contained and output from a FOR loop?
Edit: Criteria of how to combine: I am looking to start at the beginning (first row) and add to the next row. If the sum is greater than 25, that row would be skipped till the next iteration. Each iteration should output a seperate matrix (a1, a2, a3).

I think I have some useful pseudo code for you.
For one I would not modify the matrix by removing columns. Rather I would keep a list of used columns.
I would use the summing like this:
used = false(1,num lines)
for i=1:num lines
if used(i) continue
curr_use = i
for j=i+1
if used(j) continue
if cant_add(j) continue
Concat j to curr_use
end
used(curr_use) = true
end

Related

remove non matching elements from matrix

I am trying to compare two matrices A and B. If elements in the first two columns of A match those in B, I want to delete all non matching rows from A. The third column in B should not factor into the comparison.
A = [1 2 3 B = [1 2 8
3 4 5 3 4 5]
6 7 8]
Desired result:
A = [1 2 3
3 4 5]
So far I only found ways to remove duplicate entries, which is the exact opposite of what I want. How can I do this?
You can efficiently use ismember for this task:
% Input matrices
A = [1 2 3; 3 4 5; 7 8 9];
B = [1 2 8; 3 4 5];
A1 = A(:,1:2); % Extract first two columns for both matrices
B1 = B(:,1:2);
[~,ii] = ismember(A1,B1,'rows'); % Returns which rows in A1 are also in B1
ii = ii(ii>0); % Where ii is zero, it's a non-matching row
A(ii,:) % Index to keep only matching rows
All of this can be written more compactly, but I wanted to show the step-by-step process first:
[~,ii] = ismember(A(:,1:2),B(:,1:2),'rows');
A(ii(ii>0),:)
A = [1 2 3;3 4 5;7 8 9];
B = [1 2 8; 3 4 5];
tmp = min([size(A,1) size(B,1)]); % get size to loop over
k = false(tmp,1); % storage counter
for ii = 1:tmp
if all(A(ii,1:2)==B(ii,1:2)) % if the first two columns match
k(ii)=true; % store
end
end
C = A(k,:) % extract requested rows

Combining every column-combination of an arbitrary number of matrices

I'm trying to figure out a way to do a certain "reduction"
I have a varying number of matrices of varying size, e.g
1 2 2 2 5 6...70 70
3 7 8 9 7 7...88 89
1 3 4
2 7 7
3 8 8
9 9 9
.
.
44 49 49 49 49 49 49
50 50 50 50 50 50 50
87 87 88 89 90 91 92
What I need to do (and I hope that I'm explaining this clearly enough) is to combine any possible
combination of columns from these matrices, this means that one column might be
1
3
1
2
3
9
.
.
.
44
50
87
Which would reduce down to
1
2
3
9
.
.
.
44
50
87
The reason why I'm doing this is because I need to find the smallest unique combined column
What am I trying to accomplish
For those interested, I'm trying to find the smallest set of gene knockouts
to disable reactions. Here, every matrix represents a reactions, and the columns represent the indices of
the genes that would disable that reaction.
The method may be as brute force as needed, as these matrices rarely become overwhelmingly large,
and the reaction combinations won't be long either
The problem
I can't (as far as I know) create a for loop with an arbitrary number of iterators, and the number of
matrices (reactions to disable) is arbitrary.
Clarification
If I have matrices A,B,C with columns a1,a2...b1,b2...c1...cn what I need
are the columns [a1 b1 c1], [a1, b1, c2], ..., [a1 b1 cn] ... [an bn cn]
Solution
Courtesy of Michael Ohlrogge below.
Extension of his answer, for completeness
His solution ends with
MyProd = product(Array_of_ColGroups...)
Which gets the job done
And picking up where he left off
collection = collect(MyProd); #MyProd is an iterator
merged_cols = Array[] # the rows of 'collection' are arrays of arrays
for (i,v) in enumerate(collection)
# I apologize for this line
push!(merged_cols, sort!(unique(vcat(v...))))
end
# find all lengths so I can find which is the minimum
lengths = map(x -> length(x), merged_cols);
loc_of_shortest = find(broadcast((x,y) -> length(x) == y, merged_cols,minimum(lengths)))
best_gene_combos = merged_cols[loc_of_shortest]
tl;dr - complete solution:
# example matrices
a = rand(1:50, 8,4); b = rand(1:50, 10,5); c = rand(1:50, 12,4);
Matrices = [a,b,c];
toJagged(x) = [x[:,i] for i in 1:size(x,2)];
JaggedMatrices = [toJagged(x) for x in Matrices];
Combined = [unique(i) for i in JaggedMatrices[1]];
for n in 2:length(JaggedMatrices)
Combined = [unique([i;j]) for i in Combined, j in JaggedMatrices[n]];
end
Lengths = [length(s) for s in Combined];
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima);
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
Explanation:
Assume you have matrix a and b
a = rand(1:50, 8,4);
b = rand(1:50, 10,5);
Express them as a jagged array, columns first
A = [a[:,i] for i in 1:size(a,2)];
B = [b[:,i] for i in 1:size(b,2)];
Concatenate rows for all column combinations using a list comprehension; remove duplicates on the spot:
Combined = [unique([i;j]) for i in A, j in B];
You now have all column combinations of a and b, as concatenated rows with duplicates removed. Find the lengths easily:
Lengths = [length(s) for s in Combined];
If you have more than two matrices, perform this process iteratively in a for loop, e.g. by using the Combined matrix in place of a. e.g. if you have a matrix c:
c = rand(1:50, 12,4);
C = [c[:,i] for i in 1:size(c,2)];
Combined = [unique([i;j]) for i in Combined, j in C];
Once you have the Lengths array as a multidimensional array (as many dimensions as input matrices, where the size of each dimension is the number of columns in each matrix), you can find the column combinations that correspond to the lowest value (there may well be more than one combination), via a simple ind2sub operation:
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima)
(e.g. for a randomized run with 3 input matrices, I happened to get 4 results with the minimal length of 19. The result of ind2sub was ([4,4,3,4,4],[3,3,4,5,3],[1,3,3,3,4])
You can convert this further to a list of "Column Combination" tuples with a (somewhat ugly) list comprehension:
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
# results in:
# 5-element Array{Tuple{Int64,Int64,Int64},1}:
# (4,3,1)
# (4,3,3)
# (3,4,3)
# (4,5,3)
# (4,3,4)
Ok, let's see if I understand this. You've got n matrices and want all combinations with one column from each of the n matrices? If so, how about the product() (for Cartesian product) from the Iterators package?
using Iterators
n = 3
Array_of_Arrays = [rand(3,3) for idx = 1:n] ## arbitrary representation of your set of arrays.
Array_of_ColGroups = Array(Array, length(Array_of_Arrays))
for (idx, MyArray) in enumerate(Array_of_Arrays)
Array_of_ColGroups[idx] = [MyArray[:,jdx] for jdx in 1:size(MyArray,2)]
end
MyProd = product(Array_of_ColGroups...)
This will create an iterator object which you can then loop over to consider the specific combinations of columns.

Matlab: sorting a matrix in a unique way

I have a problem with sorting some finance data based on firmnumbers. So given is a matrix that looks like:
[1 3 4 7;
1 2 7 8;
2 3 7 8;]
On Matlab i would like the matrix to be sorted as follows:
[1 0 3 4 7 0;
1 2 0 0 7 8;
0 2 3 0 7 8;]
So basically every column needs to consist of 1 type of number.
I have tried many things but i cant get the matrix sorted properly.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
%// Get a unique list of numbers in the order that you want them to appear as the new columns
U = unique(A(:))'
%'//For each column (of your output, same as columns of U), find which rows have that number. Do this by making A 3D so that bsxfun compares each element with each element
temp1 = bsxfun(#eq,permute(A,[1,3,2]),U)
%// Consolidate this into a boolean matrix with the right dimensions and 1 where you'll have a number in your final answer
temp2 = any(temp1,3)
%// Finally multiply each line with U
bsxfun(#times, temp2, U)
So you can do that all in one line but I broke it up to make it easier to understand. I suggest you run each line and look at the output to see how it works. It might seem complicated but it's worthwhile getting to understand bsxfun as it's a really useful function. The first use which also uses permute is a bit more tricky so I suggest you first make sure you understand that last line and then work backwards.
What you are asking can also be seen as an histogram
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
uniquevalues = unique(A(:))
N = histc(A,uniquevalues' ,2) %//'
B = bsxfun(#times,N,uniquevalues') %//'
%// bsxfun can replace the following instructions:
%//(the instructions are equivalent only when each value appears only once per row )
%// B = repmat(uniquevalues', size(A,1),1)
%// B(N==0) = 0
Answer without assumptions - Simplified
I did not feel comfortable with my old answer that makes the assumption of everything being an integer and removed the possibility of duplicates, so I came up with a different solution based on #lib's suggestion of using a histogram and counting method.
The only case I can see this not working for is if a 0 is entered. you will end up with a column of all zeros, which one might interpret as all rows initially containing a zero, but that would be incorrect. you could uses nan instead of zeros in that case, but not sure what this data is being put into, and if it that processing would freak out.
EDITED
Includes sorting of secondary matrix, B, along with A.
A = [-1 3 4 7 9; 0 2 2 7 8.2; 2 3 5 9 8];
B = [5 4 3 2 1; 1 2 3 4 5; 10 9 8 7 6];
keys = unique(A);
[counts,bin] = histc(A,transpose(unique(A)),2);
A_sorted = cell(size(A,1),1);
for ii = 1:size(A,1)
for jj = 1:numel(keys)
temp = zeros(1,max(counts(:,jj)));
temp(1:counts(ii,jj)) = keys(jj);
A_sorted{ii} = [A_sorted{ii},temp];
end
end
A_sorted = cell2mat(A_sorted);
B_sorted = nan(size(A_sorted));
for ii = 1:size(bin,1)
for jj = 1:size(bin,2)
idx = bin(ii,jj);
while ~isnan(B_sorted(ii,idx))
idx = idx+1;
end
B_sorted(ii,idx) = B(ii,jj);
end
end
B_sorted(isnan(B_sorted)) = 0
You can create at the beginning a matrix with 9 columns , and treat the values in your original matrix as column indexes.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
B = zeros(3,max(A(:)))
for i = 1:size(A,1)
B(i,A(i,:)) = A(i,:)
end
B(:,~any(B,1)) = []

How to find rows of a matrix where with the same ordering of unique and duplicated elements, but not necessarily the same value

I wasn't quite sure how to phrase this question. Suppose I have the following matrix:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4]
Given row 1, I want to find all rows where:
the elements that are unique in row 1, are unique in the same column in the other row, but don't necessarily have the same value
and if there are elements with duplicate values in row 1, there are be duplicate values in the same columns in the other row, but not necessarily the same value
For example, in matrix A, if I was given row 1 I would like to find rows 4 and 6.
Can't test this right now, but I think the following will work:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj));
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = B(ii,jj-1)+1;
end
end
end
At the end of this we have an array B in which "like indices have like values" and the rows you are looking for are now identical.
Now you can do
[C, ia, ic] = unique(B,'rows','stable');
disp('The answer you want is ');
disp(ia);
And the answer you want will be in the variable ia. See http://www.mathworks.com/help/matlab/ref/unique.html#btb0_8v . I am not 100% sure that you can use the rows and stable parameters in the same call - but I think you can.
Try it and see if it works - and ask questions if you need more info.
Here is a simple method
B = NaN(size(A)); %//Preallocation
for row = 1:size(A,1)
[~,~,B(row,:)] = unique(A(row,:), 'stable');
end
find(ismember(B(2:end,:), B(1,:), 'rows')) + 1
A simple solution without loops:
row = 1; %// row used as reference
equal = bsxfun(#eq, A, permute(A, [1 3 2]));
equal = reshape(equal,size(A,1),[]); %// linearized signature of each row
result = find(ismember(equal,equal(row,:),'rows')); %// find matching rows
result = setdiff(result,row); %// remove reference row, if needed
The key is to compute a "signature" of each row, meaning the equality relationship between all combinations of its elements. This is done with bsxfun. Then, rows with the same signature can be easily found with ismember.
Thanks, Floris. The unique call didn't work correctly and I think you meant to use matrix B in it, too. Here's what I managed to do, although it's not as clean:
A=[1 0 0 1;
0 0 1 3;
0 1 0 1;
0 1 1 0;
0 1 2 2;
3 4 4 3;
5 9 9 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj));
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = max(B(ii,:))+1; % need max to generalize to more columns
end
end
end
match = zeros(size(A,1)-1,size(A,2));
for i=2:size(A,1)
for j=1:size(A,2)
if B(i,j) == B(1,j)
match(i-1,j)=1;
end
end
end
index=find(sum(match,2)==size(A,2));
In the nested loops I check if the elements in the rows below it match up in the correct column. If there is a perfect match the row should sum to the row dimension.
When I generalize this for the specific problem I'm working on the matrix fills with a certain set of base size(A,2) numbers. So for base 4 and greater, a max statement is needed in the else statement for no matches. Otherwise, for certain number combinations in a given row, a duplication of an element may occur when there is none.
A overview would be to reduce each row into a "signature" counting element repeats, i.e., your row 1 becomes 1, 2. Then check for equal signatures.

Efficient way of finding rows in which A>B

Suppose M is a matrix where each row represents a randomized sequence of a pool of N objects, e.g.,
1 2 3 4
3 4 1 2
2 1 3 4
How can I efficiently find all the rows in which a number A comes before a number B?
e.g., A=1 and B=2; I want to retrieve the first and the second rows (in which 1 comes before 2)
There you go:
[iA jA] = find(M.'==A);
[iB jB] = find(M.'==B);
sol = find(iA<iB)
Note that this works because, according to the problem specification, every number is guaranteed to appear once in each row.
To find rows of M with a given prefix (as requested in the comments): let prefix be a vector with the sought prefix (for example, prefix = [1 2]):
find(all(bsxfun(#eq, M(:,1:numel(prefix)).', prefix(:))))
something like the following code should work. It will look to see if A comes before B in each row.
temp = [1 2 3 4;
3 4 1 2;
2 1 3 4];
A = 1;
B = 2;
orderMatch = zeros(1,size(temp,1));
for i = 1:size(temp,1)
match1= temp(i,:) == A;
match2= temp(i,:) == B;
aIndex = find(match1,1);
bIndex = find(match2,1);
if aIndex < bIndex
orderMatch(i) = 1;
end
end
solution = find(orderMatch);
This will result in [1,1,0] because the first two rows have 1 coming before 2, but the third row does not.
UPDATE
added find function on ordermatch to give row indices as suggested by Luis

Resources