Calculate rank of binary matrix with larger size - algorithm

Hello all, I have a problem computing the rank of a binary matrix, i.e. a matrix whose entries are only 1 or 0. The rank should be based on row reduction using the Boolean XOR operation. Here is the XOR truth table:
1 xor 1 = 0
1 xor 0 = 1
0 xor 0 = 0
0 xor 1 = 1
Given a binary matrix as
A =
1 1 0 0 0 0
1 0 0 0 0 1
0 1 0 0 0 1
We can see that the third row equals the first row XORed with the second row. Hence the rank of A is only 2, not the 3 reported by MATLAB's rank function.
I have one way to compute the exact rank of a binary matrix, using this code:
B=gf(A)
rank(B)
It returns 2. However, when I try it on a large matrix, for example 400 by 400, it does not return the rank (it never stops). Could you suggest a good way to find the rank of a large binary matrix? Thanks, all.
UPDATE: these are the computation times measured with tic/toc:
N=50: Elapsed time is 0.646823 seconds.
N=100: Elapsed time is 3.123573 seconds.
N=150: Elapsed time is 7.438541 seconds.
N=200: Elapsed time is 11.349964 seconds.
N=400: Elapsed time is 66.815286 seconds.
Note that checking the rank is only one condition in my algorithm. However, it takes a very long time, so it slows down my whole method.
Based on the suggestion of R., I will use Gaussian elimination to find the rank. This is my code. However, it still calls the rank function (which costs some computation time). Could you help me modify it so that it does not use the rank function?
function rankA=GaussEliRank(A)
    mat = A;
    [m n] = size(A); % read the size of the original matrix A
    for i = 1 : n
        j = find(mat(i:m, i), 1); % finds the FIRST 1 in i-th column starting at i
        if isempty(j)
            mat = mat( sum(mat,2)>0 ,:);
            rankA=rank(mat); %%Here
            return;
        else
            j = j + i - 1; % we need to add i-1 since j starts at i
            temp = mat(j, :); % swap rows
            mat(j, :) = mat(i, :);
            mat(i, :) = temp;
            % add i-th row to all rows that contain 1 in i-th column
            % starting at j+1 - remember up to j are zeros
            for k = find(mat( (j+1):m, i ))'
                mat(j + k, :) = bitxor(mat(j + k, :), mat(i, :));
            end
        end
    end
    %remove all-zero rows if there are some
    mat = mat( sum(mat,2)>0 ,:);
    if any(sum( mat(:,1:n) ,2)==0) % no solution because matrix A contains
        error('No solution.'); % all-zero row, but with nonzero RHS
    end
    rankA=rank(mat); %%Here
end
You can check with the matrix A linked here. The correct answer for the rank of A is 393.

Once you get the matrix into row echelon form with Gaussian elimination, the rank is the number of nonzero rows. You should be able to replace the code after the loop with something like rankA=sum(sum(mat,2)>0);.
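As a rough illustration of counting pivots instead of calling rank, here is a minimal Python sketch of Gaussian elimination over GF(2). It is not the asker's MATLAB code: the function name gf2_rank and the choice to pack each row into an integer (so that XORing two rows is a single operation) are assumptions made for this example.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 rows (hypothetical helper)."""
    # Pack every non-zero row into one integer; bit k stands for one column.
    packed = [int("".join(str(bit) for bit in row), 2) for row in rows if any(row)]
    rank = 0
    while packed:
        pivot = packed.pop()
        if pivot == 0:
            continue  # this row was reduced to zero, so it adds nothing to the rank
        rank += 1
        low_bit = pivot & -pivot  # lowest set bit serves as the pivot column
        # XOR the pivot row into every remaining row that has a 1 in the pivot column.
        packed = [r ^ pivot if r & low_bit else r for r in packed]
    return rank


A = [[1, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 1]]
print(gf2_rank(A))
```

Each pivot that survives elimination counts once, which is the same as counting the nonzero rows of the echelon form.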

Related

An efficient algorithm to count the number of integer grids

Consider a square 3 by 3 grid of non-negative integers. For each row i the sum of the integers is set to be r_i. Similarly for each column j the sum of integers in that column is set to be c_j. An instance of the problem is therefore described by 6 non-negative integers.
Is there an efficient algorithm to count how many different
assignments of integers to the grid there are given the row and column
sum constraints?
Clearly one could enumerate all possible matrices of non-negative integers with values up to sum r_i and check the constraints for each, but that would be insanely slow.
Example
Say the row constraints are 1 2 3 and the column constraints are 3 2 1. The possible integer grids are:
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│0 0 1│0 0 1│0 0 1│0 1 0│0 1 0│0 1 0│0 1 0│1 0 0│1 0 0│1 0 0│1 0 0│1 0 0│
│0 2 0│1 1 0│2 0 0│0 1 1│1 0 1│1 1 0│2 0 0│0 1 1│0 2 0│1 0 1│1 1 0│2 0 0│
│3 0 0│2 1 0│1 2 0│3 0 0│2 1 0│2 0 1│1 1 1│2 1 0│2 0 1│1 2 0│1 1 1│0 2 1│
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
In practice my main interest is when the total sum of the grid will be at most 100 but a more general solution would be very interesting.
Update: my answer is wrong for this particular problem, where N is fixed (i.e. is the constant 3). In that case it is polynomial. Sorry for the misleading information.
TL;DR: I think it's at least NP-hard. There is no polynomial algorithm, but maybe there are some heuristic speedups.
For an N-by-N grid you have N equations for the row sums, N equations for the column sums and N^2 non-negativity constraints: sum_j x_ij = r_i for each row i, sum_i x_ij = c_j for each column j, and x_ij >= 0 for all i, j.
For N > 2 this system has more than one possible solution in general, because there are N^2 unknown variables x_ij and just 2N equations, and N^2 > 2N for N > 2.
You can eliminate 2N - 1 variables to be left with just one equation in K = N^2 - (2N-1) variables that add up to a sum S. Then you have to deal with the integer partition problem to find all possible combinations of K terms that give S. This problem is NP-complete. And the number of combinations depends not only on the number of terms K, but also on the order of magnitude of S.
This problem reminded me of the simplex method. My first thought was to find just one solution using something like that method and then traverse the edges of the convex polytope to find all the possible solutions. And I was hoping that there's an optimal algorithm for that. But no, the integer simplex method, which is related to integer linear programming, is NP-hard :(
I hope there are some heuristics for related problems that you can use to speed up the naive brute-force solution.
I don't know of a matching algorithm, but I don't think it would be that difficult to work one out. Given any one solution, you can derive another solution by selecting four corners of a rectangular region of your grid, increasing two diagonal corners by some value and decreasing the other two by that same value. The range for that value will be constrained by the lowest value of each diagonal pair. If you determine the size of all such ranges, you should be able to multiply them together to determine the total possible solutions.
Assuming you described your grid like a familiar spreadsheet alphabetically for columns, and numerically for rows, you could describe all possible regions in the following list:
A1:B2, A1:B3, A1:C2, A1:C3, B1:C2, B1:C3, A2:B3, A2:C3, B2:C3
For each region, we tabulate a range based on the lowest value from each diagonal corner pair. You can incrementally reduce either pair until a member reaches zero because there's no upper bound for the other pair.
Selecting the first solution of your example, we can derive all other possible solutions using this technique.
A B C
┌─────┐
1 │0 0 1│ sum=1
2 │0 2 0│ sum=2
3 │3 0 0│ sum=3
└─────┘
3 2 1 = sums
A1:B2 - 1 solution (0,0,0,2)
A1:C2 - 1 solution (0,1,0,0)
A1:B3 - 1 solution (0,0,3,0)
A1:C3 - 2 solutions (0,1,3,0), (1,0,2,1)
B1:C2 - 2 solutions (0,1,2,0), (1,0,1,1)
B1:C3 - 1 solution (0,1,0,0)
A2:B3 - 3 solutions (0,2,3,0), (1,1,2,1), (2,0,1,2)
A2:C3 - 1 solution (0,0,3,0)
B2:C3 - 1 solution (2,0,0,0)
Multiply all solution counts together and you get 2*2*3=12 solutions.
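To make the rectangle step concrete, here is a small Python sketch of a single move. It only illustrates that such a move keeps every row and column sum unchanged; it does not check the multiplication rule. The names rectangle_move and sums, and the zero-based (row, column) indexing, are assumptions made for this example.

```python
def rectangle_move(grid, top, left, bottom, right, delta):
    """Add delta to the top-left and bottom-right corners of a rectangle
    and subtract it from the other two corners (hypothetical helper)."""
    new = [list(row) for row in grid]
    new[top][left] += delta
    new[bottom][right] += delta
    new[top][right] -= delta
    new[bottom][left] -= delta
    if min(min(row) for row in new) < 0:
        raise ValueError("move would create a negative entry")
    return new


def sums(grid):
    """Return (row sums, column sums) of a grid."""
    return [sum(r) for r in grid], [sum(c) for c in zip(*grid)]


start = [[0, 0, 1],
         [0, 2, 0],
         [3, 0, 0]]
# Region A2:B3 corresponds to rows 1..2 and columns 0..1 with zero-based indices.
moved = rectangle_move(start, 1, 0, 2, 1, 1)
print(moved, sums(moved))
```

With delta = 1 on region A2:B3 the first grid of the example turns into the second one, and both share the row sums 1 2 3 and the column sums 3 2 1.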
Maybe a simple 4-nested-loop solution is fast enough, if the total sum is small?
function solve(rowsum, colsum) {
    var count = 0;
    for (var a = 0; a <= rowsum[0] && a <= colsum[0]; a++) {
        for (var b = 0; b <= rowsum[0] - a && b <= colsum[1]; b++) {
            var c = rowsum[0] - a - b;
            for (var d = 0; d <= rowsum[1] && d <= colsum[0] - a; d++) {
                var g = colsum[0] - a - d;
                for (var e = 0; e <= rowsum[1] - d && e <= colsum[1] - b; e++) {
                    var f = rowsum[1] - d - e;
                    var h = colsum[1] - b - e;
                    var i = rowsum[2] - g - h;
                    if (i >= 0 && i == colsum[2] - c - f) ++count;
                }
            }
        }
    }
    return count;
}
document.write(solve([1,2,3],[3,2,1]) + "<br>");
document.write(solve([22,33,44],[30,40,29]) + "<br>");
It won't help with the problem being #P-hard (if you allow matrices of any size; see the reference in the comment below), but there is a solution that doesn't amount to enumerating all the matrices but rather a smaller set of objects called semi-standard Young tableaux. Depending on your input it could go faster, but it is still of exponential complexity. Since this is an entire chapter in several algebraic combinatorics books and in Knuth's TAOCP volume 3, I won't go into details here, only point to the relevant Wikipedia pages.
The idea is that, using the Robinson–Schensted–Knuth correspondence, each of these matrices is in bijection with a pair of tableaux of the same shape, where one of the tableaux is filled with integers counted by the row sums and the other with integers counted by the column sums. The number of tableaux of shape U filled with numbers counted by V is called the Kostka number K(U,V). As a consequence, you end up with a formula such as
#Mat(RowSum, ColSum) = \sum_shape K(shape, RowSum)*K(shape, ColSum)
Of course if RowSum == ColSum == Sum:
#Mat(Sum, Sum) = \sum_shape K(shape, Sum)^2
Here is your example in the SageMath system:
sage: sum(SemistandardTableaux(p, [3,2,1]).cardinality()^2 for p in Partitions(6))
12
Here are some larger examples:
sage: sums = [6,5,4,3,2,1]
sage: %time sum(SemistandardTableaux(p, sums).cardinality()^2 for p in Partitions(sum(sums)))
CPU times: user 228 ms, sys: 4.77 ms, total: 233 ms
Wall time: 224 ms
8264346
sage: sums = [7,6,5,4,3,2,1]
sage: %time sum(SemistandardTableaux(p, sums).cardinality()^2 for p in Partitions(sum(sums)))
CPU times: user 1.95 s, sys: 205 µs, total: 1.95 s
Wall time: 1.94 s
13150070522
sage: sums = [5,4,4,4,4,3,2,1]
sage: %time sum(SemistandardTableaux(p, sums).cardinality()^2 for p in Partitions(sum(sums)))
CPU times: user 1.62 s, sys: 221 µs, total: 1.62 s
Wall time: 1.61 s
1769107201498
It's clear that you won't get that fast enumerating matrices.
As requested by @גלעד ברקן, here is a solution with different row and column sums:
sage: rsums = [5,4,3,2,1]; colsums = [5,4,3,3]
sage: %time sum(SemistandardTableaux(p, rsums).cardinality() * SemistandardTableaux(p, colsums).cardinality() for p in Partitions(sum(rsums)))
CPU times: user 88.3 ms, sys: 8.04 ms, total: 96.3 ms
Wall time: 92.4 ms
10233
I've tried to optimize the slow option. I took the code that enumerates all the combinations and changed it to compute only the total count. This is the fastest I could get:
private static int count(int[] rowSums, int[] colSums)
{
    int count = 0;
    int[] row0 = new int[3];
    int sum = rowSums[0];
    for (int r0 = 0; r0 <= sum; r0++)
        for (int r1 = 0, max1 = sum - r0; r1 <= max1; r1++)
        {
            row0[0] = r0;
            row0[1] = r1;
            row0[2] = sum - r0 - r1;
            count += getCombinations(rowSums[1], row0, colSums);
        }
    return count;
}
private static int getCombinations(int sum, int[] row0, int[] colSums)
{
    int count = 0;
    int max1 = Math.Min(colSums[1] - row0[1], sum);
    int max2 = Math.Min(colSums[2] - row0[2], sum);
    for (int r0 = 0, max0 = Math.Min(colSums[0] - row0[0], sum); r0 <= max0; r0++)
        for (int r1 = 0; r1 <= max1; r1++)
        {
            int r01 = r0 + r1;
            if (r01 <= sum)
                if ((r01 + max2) >= sum)
                    count++;
        }
    return count;
}
Stopwatch w2 = Stopwatch.StartNew();
int res = count(new int[] { 1, 2, 3 }, new int[] { 3, 2, 1 });//12
int res1 = count(new int[] { 22, 33, 44 }, new int[] { 30, 40, 29 });//117276
int res2 = count(new int[] { 98, 99, 100}, new int[] { 100, 99, 98});//12743775
int res3 = count(new int[] { 198, 199, 200 }, new int[] { 200, 199, 198 });//201975050
w2.Stop();
Console.WriteLine("w2:" + w2.ElapsedMilliseconds);//322 - 370 on my computer
Besides my other answer using the Robinson-Schensted-Knuth bijection, here is another solution which doesn't need advanced combinatorics, only a programming trick, and which solves this problem for arbitrarily larger matrices. The first idea that should be used for this kind of problem is recursion, avoiding recomputation thanks to memoization or, better, dynamic programming. Specifically, once you have chosen a candidate for the first row, you subtract this first row from the column sums and you are left with the same problem with one row fewer. To avoid recomputing things you store the result. You can do this
either simply in a big table (memoization),
or in a trickier way by storing all the solutions for matrices with n rows and deducing the number of solutions for matrices with n+1 rows (dynamic programming).
Here is a recursive method using memoization in Python:
# Generator for the rows of sum s which are smaller than maxrow
def choose_one_row(s, maxrow):
    if not maxrow:
        if s == 0: yield []
        else: return
    else:
        for i in range(0, maxrow[0]+1):
            for res in choose_one_row(s-i, maxrow[1:]):
                yield [i]+res
memo = dict()
def nmat(rsum, colsum):
    # sanity check: sum by row and column must match
    if sum(rsum) != sum(colsum): return 0
    # base case rsum is empty
    if not rsum: return 1
    # convert to immutable tuple for memoization
    rsum = tuple(rsum)
    colsum = tuple(colsum)
    # check whether the result is already computed
    try:
        return memo[rsum, colsum]
    except KeyError:
        pass
    # apply the recursive formula
    res = 0
    for row in choose_one_row(rsum[0], colsum):
        res += nmat(rsum[1:], tuple(a - b for a, b in zip(colsum, row)))
    # memoize the result
    memo[(tuple(rsum), tuple(colsum))] = res
    return res
Then after that:
sage: nmat([3,2,1], [3,2,1])
12
sage: %time nmat([6,5,4,3,2,1], [6,5,4,3,2,1])
CPU times: user 1.49 s, sys: 7.16 ms, total: 1.5 s
Wall time: 1.48 s
8264346
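The dynamic programming variant mentioned above is only described in words, so here is a possible bottom-up sketch in Python. It is an illustration rather than the answer's own code: the names bounded_rows and count_grids are hypothetical, and the state kept between rows is assumed to be the tuple of column sums that still has to be filled.

```python
from collections import Counter


def bounded_rows(total, caps):
    """Yield every tuple of non-negative integers that sums to `total`
    and whose entries do not exceed the matching entries of `caps`."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_rows(total - first, caps[1:]):
            yield (first,) + rest


def count_grids(row_sums, col_sums):
    """Count non-negative integer grids with the given row and column sums."""
    if sum(row_sums) != sum(col_sums):
        return 0
    # states maps "column sums still to be filled" to the number of ways to get there.
    states = Counter({tuple(col_sums): 1})
    for r in row_sums:
        next_states = Counter()
        for remaining, ways in states.items():
            for row in bounded_rows(r, remaining):
                key = tuple(c - x for c, x in zip(remaining, row))
                next_states[key] += ways
        states = next_states
    # After the last row every column must be completely used up.
    return states[tuple(0 for _ in col_sums)]


print(count_grids([1, 2, 3], [3, 2, 1]))
```

Compared with the memoized recursion, this keeps only one layer of states in memory at a time, at the cost of computing every reachable state of each layer.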

Change the size of matrix?

I have an image that I'm converting to a binary matrix with size (n,m)
I need a MATLAB function to reshape the size of this matrix to be (n,n).
Otherwise, would it be possible to make the size of image be (n,n) versus the initial (n,m)?
It's actually quite easy. Supposing that your matrix is A and is n x m. I'm assuming you'll want to zero-pad the matrix, meaning that the extra elements would be set to 0. You would simply do this:
[n,m] = size(A);
A(:,m+1:n) = 0;
The first line of code finds the number of rows n and columns m of the matrix A. Next, we set every row of columns m+1 through n to 0, which effectively makes this an n x n matrix.
Example Run
Here's an example with a 4 x 2 matrix A, and the process requires that we change the size so that A is 4 x 4.
>> A = rand(4,2)
A =
0.9575 0.9572
0.9649 0.4854
0.1576 0.8003
0.9706 0.1419
>> [n,m] = size(A);
>> A(:,m+1:n) = 0
A =
0.9575 0.9572 0 0
0.9649 0.4854 0 0
0.1576 0.8003 0 0
0.9706 0.1419 0 0
Minor Note
This assumes that the number of rows is greater than the number of columns... I'm assuming that this is a requirement on your end. If this is not the case, then the above code won't work. You can make the algorithm agnostic to the shape by zero-padding the matrix along whichever dimension has fewer entries, but I'll leave that to you as an exercise.
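For readers who want to see the shape-agnostic idea spelled out, here is a minimal Python sketch (not MATLAB) that pads along whichever dimension is shorter. The function name pad_to_square and the list-of-lists representation are assumptions made for this example.

```python
def pad_to_square(matrix, fill=0):
    """Return a square copy of `matrix` (a list of equal-length rows),
    padded with `fill` along whichever dimension is shorter."""
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    side = max(rows, cols)
    # Extend every existing row to the target width ...
    padded = [list(row) + [fill] * (side - cols) for row in matrix]
    # ... then append full rows of padding until the matrix is square.
    padded += [[fill] * side for _ in range(side - rows)]
    return padded


print(pad_to_square([[1, 2], [3, 4], [5, 6]]))
print(pad_to_square([[1, 2, 3]]))
```

The same two steps carry over to MATLAB: pad columns when there are more rows, and pad rows when there are more columns.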

Reshape vector to matrix with column-wise zero padding in matlab

for an input matrix
in = [1 1;
1 2;
1 3;
1 4;
2 5;
2 6;
2 7;
3 8;
3 9;
3 10;
3 11];
i want to get the output matrix
out = [1 5 8;
2 6 9;
3 7 10;
4 0 11];
meaning i want to reshape the second input column into an output matrix, where all values corresponding to one value in the first input column are written into one column of the output matrix.
As there can be different numbers of entries for each value in the first input column (here 4 values for "1" and "3", but only 3 for "2"), the normal reshape function is not applicable. I need to pad all columns to the maximum number of rows.
Do you have an idea how to do this matlab-ish?
The second input column can only contain positive numbers, so the padding values can be 0, -x, NaN, ...
The best i could come up with is this (loop-based):
maxNumElem = 0;
for i=in(1,1):in(end,1)
    maxNumElem = max(maxNumElem,numel(find(in(:,1)==i)));
end
out = zeros(maxNumElem,in(end,1)-in(1,1));
for i=in(1,1):in(end,1)
    tmp = in(in(:,1)==i,2);
    out(1:length(tmp),i) = tmp;
end
Either of the following approaches assumes that column 1 of in is sorted, as in the example. If that's not the case, apply this initially to sort in according to that criterion:
in = sortrows(in,1);
Approach 1 (using accumarray)
1. Compute the required number of rows, using mode;
2. Use accumarray to gather the values corresponding to each column, filled with zeros at the end. The result is a cell;
3. Concatenate horizontally the contents of all cells.
Code:
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}]; %//step 3
Alternatively, step 1 could be done with histc
n = max(histc(in(:,1), unique(in(:,1)))); %//step 1
or with accumarray:
n = max(accumarray(in(:,1), in(:,2), [], @(x) numel(x))); %//step 1
Approach 2 (using sparse)
Generate a row-index vector using this answer by @Dan, and then build your matrix with sparse:
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
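As a language-neutral illustration of the gather-and-pad idea, here is a rough Python sketch. It is not a translation of either MATLAB approach: the name group_columns is hypothetical, and it assumes one output column per ID that actually occurs, in sorted order.

```python
def group_columns(pairs, fill=0):
    """Turn (id, value) pairs into a row-major matrix with one column per id,
    padding short columns with `fill` at the bottom."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)  # keeps input order within an id
    keys = sorted(groups)
    height = max((len(groups[k]) for k in keys), default=0)
    return [[groups[k][r] if r < len(groups[k]) else fill for k in keys]
            for r in range(height)]


pairs = [(1, 1), (1, 2), (1, 3), (1, 4),
         (2, 5), (2, 6), (2, 7),
         (3, 8), (3, 9), (3, 10), (3, 11)]
for row in group_columns(pairs):
    print(row)
```

Unlike the MATLAB versions, this sketch does not require the first column to be sorted, because the dictionary collects values by ID before the padding step.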
Introduction to proposed solution and Code
Proposed here is a bsxfun based masking approach that uses the binary operators available as builtins for use with bsxfun and as such I would consider this very appropriate for problems like this. Of course, you must also be aware that bsxfun is a memory hungry tool. So, it could pose a threat if you are dealing with maybe billions of elements depending also on the memory available for MATLAB's usage.
Getting into the details of the proposed approach, we get the counts of each ID from column-1 of the input with histc. Then, the magic happens with bsxfun + @le to create a mask of positions in the output array (initialized by zeros) that are to be filled by the column-2 elements from input. That's all you need to tackle the problem with this approach.
Solution Code
counts = histc(in(:,1),1:max(in(:,1)))'; %//' counts of each ID from column1
max_counts = max(counts); %// Maximum counts for each ID
mask = bsxfun(@le,[1:max_counts]',counts); %//'# mask of locations where
%// column2 elements are to be placed
out = zeros(max_counts,numel(counts)); %// Initialize the output array
out(mask) = in(:,2); %// place the column2 elements in the output array
Benchmarking (for performance)
The benchmarking presented here compares the proposed solution in this post against the various methods presented in Luis's solution. This skips the original loopy approach presented in the problem as it appeared to be very slow for the input generated in the benchmarking code.
Benchmarking Code
num_ids = 5000;
counts_each_id = randi([10 100],num_ids,1);
num_runs = 20; %// number of iterations each approach is run for
%// Generate random input array
in = [];
for k = 1:num_ids
in = [in ; [repmat(k,counts_each_id(k),1) rand(counts_each_id(k),1)]];
end
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp('------------- With HISTC + BSXFUN Masking approach')
tic
for iter = 1:num_runs
counts = histc(in(:,1),1:max(in(:,1)))';
max_counts = max(counts);
out = zeros(max_counts,numel(counts));
out(bsxfun(@le,[1:max_counts]',counts)) = in(:,2);
end
toc
clear counts max_counts out
disp('------------- With MODE + ACCUMARRAY approach')
tic
for iter = 1:num_runs
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With HISTC + ACCUMARRAY approach')
tic
for iter = 1:num_runs
n = max(histc(in(:,1), unique(in(:,1))));
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With ARRAYFUN + Sparse approach')
tic
for iter = 1:num_runs
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
end
toc
clear a out
Results
------------- With HISTC + BSXFUN Masking approach
Elapsed time is 0.598359 seconds.
------------- With MODE + ACCUMARRAY approach
Elapsed time is 2.452778 seconds.
------------- With HISTC + ACCUMARRAY approach
Elapsed time is 2.579482 seconds.
------------- With ARRAYFUN + Sparse approach
Elapsed time is 1.455362 seconds.
slightly better, but still uses a loop :(
out=zeros(4,3);%set to zero matrix
for i = 1:max(in(:,1)); %find max in column 1, and loop for that number
    ind = find(in(:,1)==i); %
    out(1: size(in(ind,2),1),i)= in(ind,2);
end
don't know if you can avoid the loop...

Find rank of matrix in GF(2) using Gaussian Elimination

I am trying to find the rank of a binary matrix in GF(2) (Galois field). The rank function in MATLAB cannot find it. For example, take a 400 by 400 matrix such as the one here. If you use the rank function as
rank(A)
ans=357
However, the correct answer in GF(2) must be 356, as given by this code:
B=gf(A);
rank(B);
ans=356;
But this approach takes a lot of time (about 16 s). Hence, I used Gaussian elimination to find the rank in GF(2) in less time. But it does not work well: sometimes it returns the true value, but sometimes it returns a wrong one. Please look at my code and let me know what the problem is. Note that it takes very little time compared with the code above.
function rankA =GaussEliRank(A)
    tic
    mat = A;
    [m n] = size(A); % read the size of the original matrix A
    for i = 1 : n
        j = find(mat(i:m, i), 1); % finds the FIRST 1 in i-th column starting at i
        if isempty(j)
            mat = mat( sum(mat,2)>0 ,:);
            rankA=rank(mat);
            return;
        else
            j = j + i - 1; % we need to add i-1 since j starts at i
            temp = mat(j, :); % swap rows
            mat(j, :) = mat(i, :);
            mat(i, :) = temp;
            % add i-th row to all rows that contain 1 in i-th column
            % starting at j+1 - remember up to j are zeros
            for k = find(mat( (j+1):m, i ))'
                mat(j + k, :) = bitxor(mat(j + k, :), mat(i, :));
            end
        end
    end
    %remove all-zero rows if there are some
    mat = mat( sum(mat,2)>0 ,:);
    if any(sum( mat(:,1:n) ,2)==0) % no solution because matrix A contains
        error('No solution.'); % all-zero row, but with nonzero RHS
    end
    rankA=sum(sum(mat,2)>0);
end
Use the gfrank function. It is suitable for your matrix.
Usage:
gfrank(A)
ans=
356
More detail: How to find the row rank of matrix in Galois fields?

Random choosing number in array without repeated

I have an algorithm to randomly select an element t from an array without repetition. Here are the details of the algorithm.
It can be explained as follows:
1. Initialize an index array u that stores the indices of the numbers from 1 to k (lines 1 to 3).
2. Set the initial value of gamma to k and reduce it by one in each iteration. The purpose of gamma is to avoid repetition (lines 4, 9, 10).
3. Randomly choose a number t from 1 to N (at j=1, choose from 1 to k; the N numbers are non-repeated), and then move that number to the end of the array.
4. Repeat steps 2 to 3.
5. If gamma = 0, reset gamma = k.
6. The function returns t.
For example, I have an array A=[1,2,3,4,5,6,7,8,9], k=9=size(A), N=12 (from 1 to 9, each number is selected only once). Now I want to use this algorithm to randomly select a number t from array A. This is my code. However, it does not match line 6 of the algorithm. Is it right? Please look at my code and help me.
function nonRepeat
    k=9;
    u=1:k; % initial value of index
    N=12
    gamma=k;
    for j=1:N
        index=randi(gamma,1); % use other choosing
        t=u(index)
        %%swapping
        temp=u(t);
        u(t)=u(gamma);
        u(gamma)=temp;
        gamma=gamma-1;
        if gamma==0
            gamma=k;
        end
    end
end
I think index=randi(gamma,1); is not right, because the algorithm says to select the number t randomly, but you select an index randomly and assign t=u(index).
See if this works:
k = 9;
u = 1 : k;
N = 12;
gamma = k;
for j = 1 : N
    t = randi(gamma,1);
    temp = u(t);
    u(t) = u(gamma);
    u(gamma) = temp;
    gamma = gamma - 1;
    if gamma == 0
        gamma = k;
    end
end
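For comparison, here is a small Python sketch of one possible reading of the algorithm: pick a random position among the first gamma entries, report the value stored there, and swap that entry to position gamma. The original algorithm listing is not reproduced in this thread, so this reading, the function name draw_without_repeats, and the optional rng parameter are all assumptions.

```python
import random


def draw_without_repeats(k, n, rng=None):
    """Draw n values from 1..k so that no value repeats within a cycle of k draws."""
    rng = rng or random.Random()
    u = list(range(1, k + 1))
    gamma = k  # number of entries that are still eligible in the current cycle
    picks = []
    for _ in range(n):
        index = rng.randrange(gamma)  # random position among the first gamma entries
        picks.append(u[index])
        # Move the chosen value behind the eligible region so it cannot be drawn again.
        u[index], u[gamma - 1] = u[gamma - 1], u[index]
        gamma -= 1
        if gamma == 0:
            gamma = k  # start a new cycle once every value has been used
    return picks


picks = draw_without_repeats(9, 12, random.Random(0))
print(picks)
```

With k = 9 and N = 12, the first nine draws form a permutation of 1..9 and the remaining three start a fresh cycle.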
