Efficient way of finding rows in which A>B - performance

Suppose M is a matrix where each row represents a randomized sequence of a pool of N objects, e.g.,
1 2 3 4
3 4 1 2
2 1 3 4
How can I efficiently find all the rows in which a number A comes before a number B?
e.g., A=1 and B=2; I want to retrieve the first and the second rows (in which 1 comes before 2)

There you go:
[iA jA] = find(M.'==A);
[iB jB] = find(M.'==B);
sol = find(iA<iB)
Note that this works because, according to the problem specification, every number is guaranteed to appear once in each row.
To find rows of M with a given prefix (as requested in the comments): let prefix be a vector with the sought prefix (for example, prefix = [1 2]):
find(all(bsxfun(@eq, M(:,1:numel(prefix)).', prefix(:)), 1))
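For readers who want to check the logic outside MATLAB, here is a minimal Python sketch of both ideas in this answer: comparing the positions of A and B in each row, and matching a prefix. The function names are made up for illustration, and it assumes, as the question does, that every number appears exactly once per row.

```python
def rows_where_before(m, a, b):
    """Return 1-based indices of rows in which a appears before b.

    Assumes a and b each occur exactly once in every row.
    """
    return [r for r, row in enumerate(m, start=1)
            if row.index(a) < row.index(b)]


def rows_with_prefix(m, prefix):
    """Return 1-based indices of rows that start with the given prefix."""
    n = len(prefix)
    return [r for r, row in enumerate(m, start=1) if row[:n] == list(prefix)]


M = [[1, 2, 3, 4],
     [3, 4, 1, 2],
     [2, 1, 3, 4]]

print(rows_where_before(M, 1, 2))   # rows 1 and 2
print(rows_with_prefix(M, [1, 2]))  # row 1 only
```

Row indices are 1-based here only to match the MATLAB output.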

Something like the following code should work. It checks, row by row, whether A comes before B.
temp = [1 2 3 4;
3 4 1 2;
2 1 3 4];
A = 1;
B = 2;
orderMatch = zeros(1,size(temp,1));
for i = 1:size(temp,1)
match1= temp(i,:) == A;
match2= temp(i,:) == B;
aIndex = find(match1,1);
bIndex = find(match2,1);
if aIndex < bIndex
orderMatch(i) = 1;
end
end
solution = find(orderMatch);
orderMatch will be [1,1,0] because the first two rows have 1 coming before 2 but the third row does not, so solution is [1 2].
UPDATE
Added a find call on orderMatch to give row indices, as suggested by Luis.

Related

Counting Sort - Why go in reverse order during the insertion?

I was looking at the code for Counting Sort on GeeksForGeeks and during the final stage of the algorithm where the elements from the original array are inserted into their final locations in the sorted array (the second-to-last for loop), the input array is traversed in reverse order.
I can't seem to understand why you can't just go from the beginning of the input array to the end, like so :
for i in range(len(arr)):
    output_arr[count_arr[arr[i] - min_element] - 1] = arr[i]
    count_arr[arr[i] - min_element] -= 1
Is there some subtle reason for going in reverse order that I'm missing? Apologies if this is a very obvious question. I saw Counting Sort implemented in the same style here as well.
Any comments would be helpful, thank you!
Stability. With your way, the order of equal-valued elements gets reversed instead of preserved. Going over the input backwards cancels out the backwards copying (that -= 1 thing).
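The effect on stability is easiest to see when each key carries a label. The sketch below is an illustration rather than the GeeksForGeeks code: `counting_sort_by_key` and its `reverse_pass` flag are made-up names, and keys are assumed to be integers in the range 0..max_key.

```python
def counting_sort_by_key(items, max_key, reverse_pass):
    """Counting sort of (key, label) pairs using the decrementing placement.

    reverse_pass=True walks the input from the end;
    reverse_pass=False walks it from the start, as proposed in the question.
    """
    count = [0] * (max_key + 1)
    for key, _ in items:
        count[key] += 1
    for k in range(1, max_key + 1):
        count[k] += count[k - 1]              # count[k] = number of keys <= k

    output = [None] * len(items)
    order = reversed(items) if reverse_pass else items
    for key, label in order:
        output[count[key] - 1] = (key, label)  # last free slot for this key
        count[key] -= 1
    return output


items = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort_by_key(items, 3, reverse_pass=True))
print(counting_sort_by_key(items, 3, reverse_pass=False))
```

Both calls return the keys in sorted order. Only the reverse pass keeps "b" before "d" and "a" before "c"; the forward pass swaps each pair.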
To process an array in forward order, the count / index array either needs to be one element larger so that the starting index is 0 or two local variables can be used. Example for integer array:
def countSort(arr):
    output = [0 for i in range(len(arr))]
    count = [0 for i in range(257)]        # change
    for i in arr:
        count[i+1] += 1                    # change
    for i in range(256):
        count[i+1] += count[i]             # change
    for i in range(len(arr)):
        output[count[arr[i]]] = arr[i]     # change
        count[arr[i]] += 1                 # change
    return output
arr = [4,3,0,1,3,7,0,2,6,3,5]
ans = countSort(arr)
print(ans)
or using two variables, s to hold the running sum, c to hold the current count:
def countSort(arr):
    output = [0 for i in range(len(arr))]
    count = [0 for i in range(256)]
    for i in arr:
        count[i] += 1
    s = 0
    for i in range(256):
        c = count[i]
        count[i] = s
        s = s + c
    for i in range(len(arr)):
        output[count[arr[i]]] = arr[i]
        count[arr[i]] += 1
    return output
arr = [4,3,0,1,3,7,0,2,6,3,5]
ans = countSort(arr)
print(ans)
Here we are considering a stable sort, which keeps elements with equal values in their original relative order.
For example, if we have an array like
arr   -> 5, 8, 3, 1, 1, 2, 6
index -> 0 1 2 3 4 5 6 7 8
count -> 0 2 1 1 0 1 1 0 1
Now we take the cumulative sum of all frequencies:
index -> 0 1 2 3 4 5 6 7 8
count -> 0 2 3 4 4 5 6 6 7
Each cumulative count is one past the last position that value may occupy in the output. When we traverse the original array from the end, each element goes into the last free slot for its value and the count is then decremented, so an earlier equal element lands in an earlier slot and the original order is preserved.
If we traverse from the beginning with the same decrement, the output is still sorted, but the first of several equal elements takes the last slot, so equal elements come out in reverse order and the sort is no longer stable.
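The count tables for this example can be reproduced with a few lines of Python. This is only a sketch for checking the numbers; the variable names are chosen for illustration.

```python
from itertools import accumulate

arr = [5, 8, 3, 1, 1, 2, 6]

# Frequency of each value 0..max(arr)
count = [0] * (max(arr) + 1)
for value in arr:
    count[value] += 1

# Cumulative sums: cumulative[v] = number of elements <= v
cumulative = list(accumulate(count))

# Place elements walking the input from the end, decrementing before use
positions = cumulative.copy()
output = [None] * len(arr)
for value in reversed(arr):
    positions[value] -= 1
    output[positions[value]] = value

print(count)
print(cumulative)
print(output)
```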

All possible N choose K WITHOUT recursion

I'm trying to create a function that is able to go through a row vector and output the possible combinations of an n choose k without recursion.
For example: 3 choose 2 on [a,b,c] outputs [a,b; a,c; b,c]
I found this: How to loop through all the combinations of e.g. 48 choose 5 which shows how to do it for a fixed n choose k and this: https://codereview.stackexchange.com/questions/7001/generating-all-combinations-of-an-array which shows how to get all possible combinations. Using the latter code, I managed to make a very simple and inefficient function in matlab which returned the result:
function [ combi ] = NCK(x,k)
%x - row vector of inputs
%k - number of elements in the combinations
combi = [];
letLen = 2^length(x);
for i = 0:letLen-1
temp=[0];
a=1;
for j=0:length(x)-1
if (bitand(i,2^j))
temp(a) = x(j+1); %// store the selected element in the next free slot
a=a+1;
end
end
if (nnz(temp) == k)
combi=[combi; temp];
end
end
combi = sortrows(combi);
end
This works well for very small vectors, but I need this to be able to work with vectors of at least 50 in length. I've found many examples of how to do this recursively, but is there an efficient way to do this without recursion and still be able to do variable sized vectors and ks?
Here's a simple function that will take a permutation of k ones and n-k zeros and return the next combination of nchoosek. It's completely independent of the values of n and k, taking the values directly from the input array.
function [nextc] = nextComb(oldc)
nextc = [];
o = find(oldc, 1); %// find the first one
z = find(~oldc(o+1:end), 1) + o; %// find the first zero *after* the first one
if length(z) > 0
nextc = oldc;
nextc(1:z-1) = 0;
nextc(z) = 1; %// make the first zero a one
nextc(1:nnz(oldc(1:z-2))) = 1; %// move previous ones to the beginning
else
nextc = zeros(size(oldc));
nextc(1:nnz(oldc)) = 1; %// start over
end
end
(Note that the else clause is only necessary if you want the combinations to wrap around from the last combination to the first.)
If you call this function with, for example:
A = [1 1 1 1 1 0 1 0 0 1 1]
nextCombination = nextComb(A)
the output will be:
A =
1 1 1 1 1 0 1 0 0 1 1
nextCombination =
1 1 1 1 0 1 1 0 0 1 1
You can then use this as a mask into your alphabet (or whatever elements you want combinations of).
C = ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k']
C(find(nextCombination))
ans = abcdfgjk
The first combination in this ordering is
1 1 1 1 1 1 1 1 0 0 0
and the last is
0 0 0 1 1 1 1 1 1 1 1
To generate the first combination programmatically,
n = 11; k = 8;
nextCombination = zeros(1,n);
nextCombination(1:k) = 1;
Now you can iterate through the combinations (or however many you're willing to wait for):
for c = 2:nchoosek(n,k) %// start from 2; we already have 1
nextCombination = nextComb(nextCombination);
%// do something with the combination...
end
For your example above:
nextCombination = [1 1 0];
C(find(nextCombination))
for c = 2:nchoosek(3,2)
nextCombination = nextComb(nextCombination);
C(find(nextCombination))
end
ans = ab
ans = ac
ans = bc
Note: I've updated the code; I had forgotten to include the line to move all of the 1's that occur prior to the swapped digits to the beginning of the array. The current code (in addition to being corrected above) is on ideone here. Output for 4 choose 2 is:
allCombs =
1 2
1 3
2 3
1 4
2 4
3 4
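The successor step described in this answer can also be sketched in Python, which makes it easy to check the ordering shown for 4 choose 2. This is a loose port under my own naming (`next_comb`, `combinations_in_order`), not the author's ideone code. It returns None after the last mask instead of wrapping around.

```python
def next_comb(mask):
    """Return the next 0/1 mask in the ordering above, or None after the last."""
    n = len(mask)
    try:
        first_one = mask.index(1)
    except ValueError:
        return None                       # no ones at all: nothing to advance
    # Find the first zero after the first one.
    z = first_one
    while z < n and mask[z] == 1:
        z += 1
    if z == n:
        return None                       # all ones are at the right end
    ones_before = sum(mask[:z - 1])       # ones that precede the one being moved
    nxt = list(mask)
    # Move those ones to the front, clear the rest up to z, then set position z.
    nxt[:z] = [1] * ones_before + [0] * (z - ones_before)
    nxt[z] = 1
    return nxt


def combinations_in_order(items, k):
    """Yield every k-element selection of items, iteratively."""
    mask = [1] * k + [0] * (len(items) - k)
    while mask is not None:
        yield [x for x, bit in zip(items, mask) if bit]
        mask = next_comb(mask)


print(list(combinations_in_order([1, 2, 3, 4], 2)))
```

Because only one mask is held at a time, memory use stays proportional to n, although the number of combinations still grows very quickly with n and k.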

Matlab: sorting a matrix in a unique way

I have a problem with sorting some finance data based on firm numbers. Given is a matrix that looks like:
[1 3 4 7;
1 2 7 8;
2 3 7 8;]
In MATLAB I would like the matrix to be sorted as follows:
[1 0 3 4 7 0;
1 2 0 0 7 8;
0 2 3 0 7 8;]
So basically every column needs to consist of 1 type of number.
I have tried many things but I can't get the matrix sorted properly.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
%// Get a unique list of numbers in the order that you want them to appear as the new columns
U = unique(A(:))'
%'//For each column (of your output, same as columns of U), find which rows have that number. Do this by making A 3D so that bsxfun compares each element with each element
temp1 = bsxfun(@eq,permute(A,[1,3,2]),U)
%// Consolidate this into a boolean matrix with the right dimensions and 1 where you'll have a number in your final answer
temp2 = any(temp1,3)
%// Finally multiply each line with U
bsxfun(@times, temp2, U)
So you can do that all in one line but I broke it up to make it easier to understand. I suggest you run each line and look at the output to see how it works. It might seem complicated but it's worthwhile getting to understand bsxfun as it's a really useful function. The first use which also uses permute is a bit more tricky so I suggest you first make sure you understand that last line and then work backwards.
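To see the intended result without bsxfun, the same "one column per distinct value" layout can be sketched in plain Python. The helper name `spread_by_value` is hypothetical, and the sketch assumes that 0 never occurs as a real value, since it is used as the filler.

```python
def spread_by_value(rows, fill=0):
    """Give every distinct value its own column; fill the gaps with `fill`."""
    columns = sorted({v for row in rows for v in row})
    return [[v if v in row else fill for v in columns] for row in rows]


A = [[1, 3, 4, 7],
     [1, 2, 7, 8],
     [2, 3, 7, 8]]

for row in spread_by_value(A):
    print(row)
```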
What you are asking can also be seen as a histogram
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
uniquevalues = unique(A(:))
N = histc(A,uniquevalues' ,2) %//'
B = bsxfun(@times,N,uniquevalues') %//'
%// bsxfun can replace the following instructions:
%//(the instructions are equivalent only when each value appears only once per row )
%// B = repmat(uniquevalues', size(A,1),1)
%// B(N==0) = 0
Answer without assumptions - Simplified
I did not feel comfortable with my old answer, which assumed that everything was an integer and ruled out duplicates, so I came up with a different solution based on @lib's suggestion of using a histogram and counting method.
The only case I can see this not working for is if a 0 is entered. You will end up with a column of all zeros, which one might interpret as all rows initially containing a zero, but that would be incorrect. You could use NaN instead of zeros in that case, but I'm not sure what this data is being fed into, or whether that processing would choke on NaN.
EDITED
Includes sorting of secondary matrix, B, along with A.
A = [-1 3 4 7 9; 0 2 2 7 8.2; 2 3 5 9 8];
B = [5 4 3 2 1; 1 2 3 4 5; 10 9 8 7 6];
keys = unique(A);
[counts,bin] = histc(A,transpose(unique(A)),2);
A_sorted = cell(size(A,1),1);
for ii = 1:size(A,1)
for jj = 1:numel(keys)
temp = zeros(1,max(counts(:,jj)));
temp(1:counts(ii,jj)) = keys(jj);
A_sorted{ii} = [A_sorted{ii},temp];
end
end
A_sorted = cell2mat(A_sorted);
B_sorted = nan(size(A_sorted));
for ii = 1:size(bin,1)
for jj = 1:size(bin,2)
idx = bin(ii,jj);
while ~isnan(B_sorted(ii,idx))
idx = idx+1;
end
B_sorted(ii,idx) = B(ii,jj);
end
end
B_sorted(isnan(B_sorted)) = 0
You can start by creating a matrix of zeros with max(A(:)) columns (8 in this example), and treat the values in your original matrix as column indexes.
A = [1 3 4 7;
1 2 7 8;
2 3 7 8;]
B = zeros(3,max(A(:)))
for i = 1:size(A,1)
B(i,A(i,:)) = A(i,:)
end
B(:,~any(B,1)) = []

How to find rows of a matrix with the same ordering of unique and duplicated elements, but not necessarily the same value

I wasn't quite sure how to phrase this question. Suppose I have the following matrix:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4]
Given row 1, I want to find all rows where:
the elements that are unique in row 1, are unique in the same column in the other row, but don't necessarily have the same value
and if there are elements with duplicate values in row 1, there are duplicate values in the same columns in the other row, but not necessarily the same value
For example, in matrix A, if I was given row 1 I would like to find rows 4 and 6.
Can't test this right now, but I think the following will work:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj), 1); %// first earlier match only, so B(ii,c) is a scalar
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = B(ii,jj-1)+1;
end
end
end
At the end of this we have an array B in which "like indices have like values" and the rows you are looking for are now identical.
Now you can do
[C, ia, ic] = unique(B,'rows','stable');
disp('The answer you want is ');
disp(ia);
And the answer you want will be in the variable ia. See http://www.mathworks.com/help/matlab/ref/unique.html#btb0_8v . I am not 100% sure that you can use the rows and stable parameters in the same call - but I think you can.
Try it and see if it works - and ask questions if you need more info.
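The "like indices have like values" relabelling can be checked quickly in Python. This is a sketch of the idea rather than a translation of the loop above: each value gets the next unused label the first time it appears, which avoids depending on the previous column's label. The names `pattern` and `rows_like` are made up for illustration.

```python
def pattern(row):
    """Relabel a row by order of first appearance, e.g. [3, 4, 4] -> (1, 2, 2)."""
    labels = {}
    return tuple(labels.setdefault(v, len(labels) + 1) for v in row)


def rows_like(matrix, ref):
    """1-based indices of rows, other than ref, sharing the pattern of row ref."""
    target = pattern(matrix[ref - 1])
    return [i for i, row in enumerate(matrix, start=1)
            if i != ref and pattern(row) == target]


A = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 1, 1],
     [0, 1, 2],
     [3, 4, 4]]

print(rows_like(A, 1))
```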
Here is a simple method
B = NaN(size(A)); %//Preallocation
for row = 1:size(A,1)
[~,~,B(row,:)] = unique(A(row,:), 'stable');
end
find(ismember(B(2:end,:), B(1,:), 'rows')) + 1
A simple solution without loops:
row = 1; %// row used as reference
equal = bsxfun(@eq, A, permute(A, [1 3 2]));
equal = reshape(equal,size(A,1),[]); %// linearized signature of each row
result = find(ismember(equal,equal(row,:),'rows')); %// find matching rows
result = setdiff(result,row); %// remove reference row, if needed
The key is to compute a "signature" of each row, meaning the equality relationship between all combinations of its elements. This is done with bsxfun. Then, rows with the same signature can be easily found with ismember.
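As a rough Python illustration of the signature idea, each row can be reduced to the flattened table of pairwise equality tests between its elements. The function names are hypothetical and the sketch favours clarity over speed.

```python
def signature(row):
    """Flattened table of pairwise equality tests between a row's elements."""
    return tuple(a == b for a in row for b in row)


def matching_rows(matrix, ref):
    """1-based indices of rows, other than ref, with the same signature as row ref."""
    target = signature(matrix[ref - 1])
    return [i for i, row in enumerate(matrix, start=1)
            if i != ref and signature(row) == target]


A = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 1, 1],
     [0, 1, 2],
     [3, 4, 4]]

print(matching_rows(A, 1))
```

Two rows share a signature exactly when the same pairs of columns hold equal values, whatever those values are.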
Thanks, Floris. The unique call didn't work correctly and I think you meant to use matrix B in it, too. Here's what I managed to do, although it's not as clean:
A=[1 0 0 1;
0 0 1 3;
0 1 0 1;
0 1 1 0;
0 1 2 2;
3 4 4 3;
5 9 9 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj), 1); % first earlier match only, so B(ii,c) is a scalar
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = max(B(ii,:))+1; % need max to generalize to more columns
end
end
end
match = zeros(size(A,1)-1,size(A,2));
for i=2:size(A,1)
for j=1:size(A,2)
if B(i,j) == B(1,j)
match(i-1,j)=1;
end
end
end
index=find(sum(match,2)==size(A,2));
In the nested loops I check whether the elements in each row below the first match row 1 in the same column. If there is a perfect match, that row of match sums to the number of columns, size(A,2).
When I generalize this for the specific problem I'm working on, the matrix is filled with a certain set of base-size(A,2) numbers. So for base 4 and greater, a max statement is needed in the else branch for the no-match case. Otherwise, for certain number combinations in a given row, a label may be duplicated where the row has no duplicate.
An overview would be to reduce each row to a "signature" that counts element repeats, i.e., your row 1 becomes 1, 2. Then check for equal signatures.

Filter some rows from a matrix

Suppose I have this matrix:
matrix = [2 2; 2 3; 3 4; 4 5]
And now I'd like to filter out all rows which do not begin with an even number to produce
[2 2; 2 3; 4 5]
Is there a high-level procedure for doing this, or do I have to code for it?
You can get a logical index for the rows whose first element is even, and use : to select all the columns. Here's how it's done, line by line:
octave> matrix = [2 2; 2 3; 3 4; 4 5]
matrix =
2 2
2 3
3 4
4 5
octave> ! mod (matrix(:,1), 2)
ans =
1
1
0
1
octave> matrix(! mod (matrix(:,1), 2),:)
ans =
2 2
2 3
4 5
EDIT: in the comments below it was asked for other selection methods. I'm unaware of any specific function for it, but the thing above is indexing with a function:
even_rows = matrix(! mod (matrix(:,1), 2), :) # first element is even
s3_rows = matrix(matrix(:,1) == 3, :); # first element is 3
int_rows = matrix(fix (matrix(:,1)) == matrix(:,1), :); # first element is an integer
Even if there was such a function, one would still have to write the test function, so it wouldn't be any shorter or easier to read. But if you want to write a function, you could:
function selec = select_rows (func, mt)
selec = mt(func (mt(:,1)),:);
endfunction
even_rows = select_rows (@(x) ! mod (x, 2), matrix);
s3_rows = select_rows (@(x) x == 3, matrix);
int_rows = select_rows (@(x) fix (x) == x, matrix);
EDIT2: to leave out the rows that have already matched, simply keep track of them in the mask. Example:
mask = ! mod (matrix(:,1), 2); # mask for even numbers
even = matrix(mask,:);
s3_mask = ! mask & matrix(:,1) == 3; # mask for leftovers starting with a 3
s3 = matrix(s3_mask,:);
mask = mask | s3_mask; # rows matched so far
rest = matrix(! mask, :); # get the leftovers
As above, you could write a function that does it. It would take a matrix as the first argument plus any number of function handles. It would iterate over the function handles, updating the mask every time and filling a cell array with the matrices.
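A minimal Python sketch of such a function might look like the following. The name `partition_rows` is hypothetical, plain callables stand in for Octave function handles, and a list of lists stands in for the cell array. Each row goes to the first test it passes, and anything left over ends up in a final group.

```python
def partition_rows(matrix, *predicates):
    """Split rows into one group per predicate plus a final group of leftovers.

    Each predicate receives the first element of a row. A row goes to the
    first predicate it satisfies and is not offered to later ones.
    """
    taken = [False] * len(matrix)          # rows matched so far
    groups = []
    for pred in predicates:
        hits = [not t and pred(row[0]) for t, row in zip(taken, matrix)]
        groups.append([row for h, row in zip(hits, matrix) if h])
        taken = [t or h for t, h in zip(taken, hits)]
    groups.append([row for t, row in zip(taken, matrix) if not t])
    return groups


matrix = [[2, 2], [2, 3], [3, 4], [4, 5], [5, 1]]
even, s3, rest = partition_rows(matrix,
                                lambda x: x % 2 == 0,   # first element is even
                                lambda x: x == 3)       # first element is 3
print(even, s3, rest)
```

The extra row [5, 1] is added to the question's matrix only so that the leftover group is not empty.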
