MATLAB - Permutations of random indices in specific areas of a grid - algorithm

I have a problem in which I have 4 objects (1s) on a 100x100 grid of zeros that is split up into 16 even squares of 25x25.
I need to create a (16^4 x 4) table whose rows list all the possible assignments of the 4 objects to the 16 submatrices. The objects can be anywhere within their submatrices so long as they aren't overlapping one another. This is clearly a permutation problem, but there is added complexity because of the indexing and the fact that the positions need to be random but not overlapping within a 16th square. Would love any pointers!
What I tried to do was create a function called "top_left_corner(position)" that returns the subscript of the top left corner of the sub-matrix you are in. E.g. top_left_corner(1) = (1,1), top_left_corner(2) = (26,1), etc. Then I have:
pos = randsample(24,2);
I = pos(1)+top_left_corner(position,1);
J = pos(2)+top_left_corner(position,2);
The problem is how to generate and store permutations of this in a table as linear indices.
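For reference, here is a minimal sketch of that bookkeeping (assumptions: column-major numbering of the 16 blocks, and a helper that returns both coordinates at once instead of the two-argument form used above), together with the conversion to a linear index via sub2ind. This sketch does not yet enforce the non-overlap requirement, which the answer below handles:
top_left_corner = @(k) [mod(k-1,4)*25 + 1, floor((k-1)/4)*25 + 1]; % k = 1..16, e.g. k=2 -> (26,1)
corner = top_left_corner(position); % position = submatrix index from the question
pos = randi(25, 1, 2) - 1;          % row/column offsets 0..24 inside the 25x25 block
I = corner(1) + pos(1);
J = corner(2) + pos(2);
linIdx = sub2ind([100 100], I, J);  % store the position as a linear index into the grid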

First, the Cartesian product is generated with ndgrid in the form of a [4, 16^4] matrix perm. Then, in the while loop, random within-block offsets are generated and added to perm. If any column contains duplicated values, the random generation is repeated for those columns until no column has duplicated elements; normally no more than 2-3 iterations are needed. Since the [100, 100] array is divided into 16 blocks, kron is used to generate an index pattern matching the 16 blocks, and the sort function extracts the indices of the sorted elements. The generated random numbers then index into this pattern of 16 blocks.
C = cell(1,4);
[C{:}] = ndgrid(0:15,0:15,0:15,0:15); % all 16^4 assignments of blocks to the 4 objects
perm = reshape(cat(5,C{:}),16^4,4).'; % 4 x 16^4, one column per assignment
perm_rnd = zeros(size(perm));
c = 1:size(perm,2); % columns that still need (re)drawing
while true
% block index * 625 + a random offset in 1..625 gives a unique id per grid cell
perm_rnd(:,c) = perm(:,c) * 625 + randi(625,4,numel(c));
% columns in which two objects landed on the same cell
[~ ,c0] = find(diff(sort(perm_rnd(:,c),1),1,1)==0);
if isempty(c0)
break;
end
c = c(unique(c0)); % redraw only the offending columns
end
pattern = kron(reshape(1:16,4,4),ones(25)); % 100x100 map of block labels 1..16
[~,idx] = sort(pattern(:)); % idx((b-1)*625+p) = linear index of the p-th cell of block b
result = idx(perm_rnd).'; % 16^4 x 4 table of linear indices
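As a quick sanity check (a sketch, assuming result from above is in the workspace), one can confirm that every row holds four distinct linear indices within the 100x100 grid:
assert(all(all(diff(sort(result,2),1,2) ~= 0)));  % no two objects share a cell in any row
assert(all(result(:) >= 1 & result(:) <= 10000)); % all indices fall inside the grid
Z = zeros(100,100);
Z(result(1,:)) = 1; % place the 4 objects of the first configuration
fprintf('objects placed in first configuration: %d\n', nnz(Z));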

Related

Faster way to find the size of the intersection of any two corresponding multisets from two 3D arrays of multisets

I have two uint16 3D (GPU) arrays A and B in MATLAB, which have the same 2nd and 3rd dimension. For instance, size(A,1) = 300 000, size(B,1) = 2000, size(A,2) = size(B,2) = 20, and size(A,3) = size(B,3) = 100, to give an idea about the orders of magnitude. Actually, size(A,3) = size(B,3) is very big, say ~ 1 000 000, but the arrays are stored externally in small pieces cut along the 3rd dimension. The point is that there is a very long loop along the 3rd dimension (cf. the MWE below), so the code inside of it needs to be optimized further (if possible). Furthermore, the values of A and B can be assumed to be bounded way below 65535, but there are still hundreds of different values.
For each i,j, and d, the rows A(i,:,d) and B(j,:,d) represent multisets of the same size, and I need to find the size of the largest common submultiset (multisubset?) of the two, i.e. the size of their intersection as multisets. Moreover, the rows of B can be assumed sorted.
For example, if [2 3 2 1 4 5 5 5 6 7] and [1 2 2 3 5 5 7 8 9 11] are two such multisets, respectively, then their multiset intersection is [1 2 2 3 5 5 7], which has the size 7 (7 elements as a multiset).
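For illustration, the intersection size in this small example can be obtained by counting how often each value occurs in the two multisets and summing the elementwise minima of the counts (a minimal sketch of that idea):
a = [2 3 2 1 4 5 5 5 6 7];
b = [1 2 2 3 5 5 7 8 9 11];
vals = unique([a b]);        % every value appearing in either multiset
counts_a = histc(a, vals);   % occurrences of each value in a
counts_b = histc(b, vals);   % occurrences of each value in b
sum(min(counts_a, counts_b)) % returns 7, matching [1 2 2 3 5 5 7]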
I am currently using the following routine to do this:
s = 300000; % 1st dim. of A
n = 2000; % 1st dim. of B
c = 10; % 2nd dim. of A and B
depth = 10; % 3rd dim. of A and B (corresponds to a batch of size 10 of A and B along the 3rd dim.)
N = 100; % upper bound on the possible values of A and B
A = randi(N,s,c,depth,'uint16','gpuArray');
B = randi(N,n,c,depth,'uint16','gpuArray');
Sizes_of_multiset_intersections = zeros(s,n,depth,'uint8'); % too big to fit in GPU memory together with A and B
for d=1:depth
A_slice = A(:,:,d);
B_slice = B(:,:,d);
unique_B_values = permute(unique(B_slice),[3 2 1]); % B is smaller than A
% compute counts of the unique B-values for each multiset:
A_values_counts = permute(sum(uint8(A_slice==unique_B_values),2,'native'),[1 3 2]);
B_values_counts = permute(sum(uint8(B_slice==unique_B_values),2,'native'),[1 3 2]);
% compute the count of each unique B-value in the intersection:
Sizes_of_multiset_intersections_tmp = gpuArray.zeros(s,n,'uint8');
for i=1:n
Sizes_of_multiset_intersections_tmp(:,i) = sum(min(A_values_counts,B_values_counts(i,:)),2,'native');
end
Sizes_of_multiset_intersections(:,:,d) = gather(Sizes_of_multiset_intersections_tmp);
end
One can also easily adapt the above code to compute the result in batches along dimension 3 rather than looping over d=1:depth (i.e. a batch size of 1), though at the expense of an even bigger unique_B_values vector.
Since the depth dimension is large (even when working in batches along it), I am interested in faster alternatives to the code inside the outer loop. So my question is this: is there a faster (e.g. better vectorized) way to compute sizes of intersections of multisets of equal size?
Disclaimer: this is not a GPU-based solution (I don't have a good GPU). I find the results interesting and want to share them, but I can delete this answer if you think I should.
Below is a vectorized version of your code that gets rid of the inner loop, at the cost of having to deal with a bigger array that might not fit in memory.
The idea is to shape A_values_counts and B_values_counts as 3D arrays so that a single call to min(A_values_counts,B_values_counts) computes everything in one go thanks to implicit expansion. In the background this creates a big array of size s x n x length(unique_B_values) (probably too big most of the time).
In order to get around the memory constraint, the results are calculated in batches along the n dimension, i.e. the first dimension of B:
tic
nBatches_B = 2000;
sBatches_B = n/nBatches_B;
Sizes_of_multiset_intersections_new = zeros(s,n,depth,'uint8');
for d=1:depth
A_slice = A(:,:,d);
B_slice = B(:,:,d);
% compute counts of the unique B-values for each multiset:
unique_B_values = reshape(unique(B_slice),1,1,[]);
A_values_counts = sum(uint8(A_slice==unique_B_values),2,'native'); % s x 1 x length(uniqueB) array
B_values_counts = reshape(sum(uint8(B_slice==unique_B_values),2,'native'),1,n,[]); % 1 x n x length(uniqueB) array
% Not possible to do it all in one go, must split in batches along B
for ii = 1:nBatches_B
Sizes_of_multiset_intersections_new(:,((ii-1)*sBatches_B+1):ii*sBatches_B,d) = sum(min(A_values_counts,B_values_counts(:,((ii-1)*sBatches_B+1):ii*sBatches_B,:)),3,'native'); % Vectorized
end
end
toc
Here is a little benchmark with different numbers of batches. You can see that a minimum is found around 400 batches (batch size 50), with a decrease of around 10% in processing time (each point is an average over 3 runs). (EDIT: the x axis is the number of batches, not the batch size.)
I'd be interested in knowing how it behaves for GPU arrays as well!
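As a quick correctness check (a sketch, assuming the results of both the original loop and the vectorized version are still in the workspace):
% the looped and the vectorized versions should agree exactly
assert(isequal(Sizes_of_multiset_intersections, Sizes_of_multiset_intersections_new));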

How to traverse an image across the blocks randomly?

I have divided a 512x512 image into 2x2 pixel blocks, so I have 65536 blocks in total. Each block has four pixels.
Now I want to traverse the image in random order. For example: starting from the 6th block, then the 3rd block, then the 8th, then the 1st block, and so on until the whole image is traversed.
Important: I need to store the traversal order for later use.
Please help me write MATLAB code for this. Many thanks in advance.
Easy, let's make an example with a small 6x6 matrix:
Im = rand(6,6);
nblocks = 9;
blocksize = 2;
You will have blocks of size 2x2 (in total 3x3=9 blocks).
Reshape the matrix into an 18 x 2 matrix (numel(Im)/blocksize rows and blocksize columns):
Im = reshape(Im, numel(Im)/blocksize, blocksize);
Now generate a random permutation of indexes separated by the size of the block:
idx = randperm(nblocks) * blocksize;
Et voilà. Now you can access the 5th block in the random order just by doing:
currentblock = Im(idx(5)-blocksize+1:idx(5), :);
Use a loop to traverse each block, for example as sketched below.
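A minimal loop in that vein (reusing idx, nblocks and blocksize from above) might look like:
for k = 1:nblocks
currentblock = Im(idx(k)-blocksize+1:idx(k), :); % k-th block in the random order
% ... process currentblock here ...
end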
You can divide the image into blocks and tile them along a third dimension using this great answer. You then loop over a random permutation of the third dimension indices:
A = randn(12,12);
m = 3;
n = 6;
T = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);
% each third-dim slice is an mxn block
scan_order = randperm(size(T,3)); % random permutation of block indices
for b = scan_order
block = T(:,:,b);
% Do stuff with current block
end
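Applied to the 512x512 image with 2x2 blocks from the question, a minimal sketch (the image variable and the file name used to save the order are just placeholders) could be:
A = rand(512,512); % stand-in for the image
m = 2; n = 2;      % 2x2 pixel blocks
T = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);
scan_order = randperm(size(T,3));     % random order over all 65536 blocks
save('scan_order.mat', 'scan_order'); % store the traversal order for later use
for b = scan_order
block = T(:,:,b); % current 2x2 block
% ... process the block here ...
end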

Matrix reordering to block diagonal form

Given a sparse matrix, how can one reorder the rows and columns so that it is in a block-diagonal-like form, via row and column permutations?
The row and column permutations are not necessarily coupled as in the reverse Cuthill-McKee ordering (http://www.mathworks.com/help/matlab/ref/symrcm.html?refresh=true). In short, you can independently perform any row or column permutation.
The overall goal is to cluster all the non-zero elements towards the diagonal.
Here is one approach.
First make a graph whose vertices are the rows and the columns. Every non-zero value is an edge between its row and its column.
You can then use a standard graph theory algorithm to detect the connected components of this graph. The single element ones represent all zero rows and columns. Number the others. Those components may have unequal numbers of rows and columns. You can distribute some zero rows and columns to them to make them square.
Your square components will be your blocks, and from the numbering of those components you know what order to put them in. Now just reorder rows and columns to achieve this structure and, voila! (The remaining zero rows/columns will result in a bunch of 0 blocks at the bottom right of the diagonal.)
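A minimal sketch of this idea using MATLAB's graph and conncomp (the scrambled block-diagonal test matrix is just for illustration, and this version does not specially push all-zero rows/columns to the bottom right):
B = blkdiag(sparse(ones(3)), sparse(ones(2)), sparse(ones(4))); % hidden diagonal blocks
S = B(randperm(9), randperm(9));      % scramble rows and columns independently
[m, n] = size(S);
[r, c] = find(S);
G = graph(r, m + c, [], m + n);       % vertices 1..m are rows, m+1..m+n are columns
comp = conncomp(G);                   % connected component of every vertex
[~, rowOrder] = sort(comp(1:m));      % group rows by component
[~, colOrder] = sort(comp(m+1:end));  % group columns by component
spy(S(rowOrder, colOrder))            % non-zeros are now clustered into diagonal blocks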
Just an idea: you could make a new matrix Ab from the original block matrix A that contains only the block-sparsity structure of A. E.g.:
A = [B 0 0; 0 0 C; 0 D 0]; % with matrices 0 (zero elements), B,C and D
Ab = [1 0 0; 0 0 2; 0 3 0]; % with identifiers 1, 2 and 3 (1-->B, 2-->C, 3-->D)
Then Ab is a simple sparse matrix (size 3x3 in the example). You can then use the reverse Cuthill-McKee ordering to get the permutations you want, and apply these permutations to Ab.
p = symrcm(Ab);
Abperm = Ab(p,p);
Then use the identifiers to create the ordered block matrix Aperm from Abperm and you'll have the desired result, I believe.
You'll need to be clever in assigning the identifiers to the individual blocks and so on, but this should be possible.
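To make that last step concrete, here is a minimal sketch (the cell arrays blkRows and blkCols, holding the row and column indices of A covered by each block of Ab, are assumed bookkeeping you would build while assigning the identifiers):
p = symrcm(Ab);              % block-level permutation
rowPerm = [blkRows{p}];      % concatenate the row index ranges in the new block order
colPerm = [blkCols{p}];      % same for the columns
Aperm = A(rowPerm, colPerm); % the reordered block matrix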

How to apply a function to rows of a SciPy CSR sparse matrix?

I have a CSR matrix of counts (X_ngrams). I would like to build a sparse log-odds matrix by taking the log of the quotient of each entry and the sum across the row. Here is my best shot:
log_odds = X_ngrams.asfptype() # convert the counts to floats
row_sums = log_odds.sum(axis=1) # sum up each row
log_odds.log1p() # take log of each element
for ii in xrange(row_sums.shape[0]):
log_odds[ii,:].__add__(math.log(row_sums[ii,0]))
But that gives an error:
NotImplementedError: adding a nonzero scalar to a sparse matrix is not supported
So, my question is: how do I modify the contents of a CSR? I only want to modify the elements that are present.
Other approaches would also be welcome. The basic problem is to modify a CSR matrix based on each row's sum across the columns, touching only the elements that are actually stored.
So far as I can tell, one cannot apply an arbitrary elementwise function directly to a CSR sparse matrix. Instead, you can create a new sparse matrix with the same structure and run the calculation over the sparse data only. Here is sample code that calculates the log() of the ratio of each element to the sum across the columns of its row:
X_ngrams.sort_indices() # *MUST* have indices sorted for this to work!
row_sums = np.squeeze(np.asarray(X_ngrams.sum(axis=1),dtype=np.float64))
rows,cols = X_ngrams.nonzero()
data = np.array( [ math.log(x/row_sums[rows[ii]]) for ii,x in enumerate(X_ngrams.data)] )
new_odds = csr_matrix((data,X_ngrams.indices,X_ngrams.indptr),shape=X_ngrams.shape)
Here is a sample of the results, printing the first element of each row in both matrices:
[row][col]  row_sum  X_ngrams  new_odds
[ 0][1439] 1063 20 -3.973118
[ 1][ 13] 1677 18 -4.534390
[ 2][1439] 5323 68 -4.360285
[ 3][1439] 983 15 -4.182559
This is not fast, but I suppose it is good enough. The sample X_ngrams data set has 2,596,855 non-zero elements with shape (2257, 202262), and creating the new matrix takes 10.5 s on my 5-year-old MacBook Pro.
You can use the csr_matrix.nonzero method to get the arrays of row and column indices of the non-zero elements.

What is the fast way to calculate this summation in MATLAB?

So I have the following constraints: sum over m in U of x_mn <= 1 for every n in B, and sum over n in B of x_mn <= 1 for every m in U.
How to write this in MATLAB in an efficient way? The inputs are x_mn, M, and N. The set B={1,...,N} and the set U={1,...,M}
I did it like this (because I write x as the following vector)
x = [x_11, x_12, ..., x_1N, x_21, x_22, ..., x_M1, x_M2, ..., x_MN]:
%# first constraint
function R1 = constraint_1(M, N)
ee = eye(N);
R1 = zeros(N, N*M);
for m = 1:M
R1(:, (m-1)*N+1:m*N) = ee;
end
end
%# second constraint
function R2 = constraint_2(M, N)
ee = ones(1, N);
R2 = zeros(M, N*M);
for m = 1:M
R2(m, (m-1)*N+1:m*N) = ee;
end
end
With the above code I get a 0-1 matrix A = [R1; R2], and the constraints become A*x <= 1.
For example, for M = N = 2 I get:
A = [1 0 1 0; 0 1 0 1; 1 1 0 0; 0 0 1 1]
Then I will create a function test(x) which returns true or false depending on whether x satisfies the constraints.
I would like some help optimizing this code.
You should place your x_mn values in a matrix. After that, you can sum along each dimension to get what you want. Looking at your constraints, you will place these values in an M x N matrix, where M is the number of rows and N is the number of columns.
You can certainly place your values in a vector and construct your summations the way you intended earlier, but you would have to write for loops to subset the proper elements in each iteration, which is very inefficient. Instead, use a matrix, and use sum over the dimensions you want.
For example, let's say your values of x_mn ranged from 1 to 20. B is in the set from 1 to 5 and U is in the set from 1 to 4. As such:
X = vec2mat(1:20, 5)
X =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
vec2mat takes a vector and reshapes it into a matrix. You specify the number of columns you want as the second argument, and it creates the right number of rows to ensure a proper matrix is built. In this case, I want 5 columns, so this creates a 4 x 5 matrix.
The first constraint can be achieved by doing:
first = sum(X,1)
first =
34 38 42 46 50
sum works for vectors as well as matrices. If you supply a matrix to sum, you can specify a second argument that tells it along which dimension to sum. In this case, specifying 1 sums over all of the rows for each column; it works along the first dimension, which is the rows.
What this is doing is summing over all values of U for each element of B, which is exactly what the first constraint requires. You are simply summing every single column individually.
The second constraint can be achieved by doing:
second = sum(X,2)
second =
15
40
65
90
Here we specify 2 as the second argument so that we sum over all of the columns for each row; the second dimension goes over the columns. What this is doing is summing over all values of B for each element of U. Basically, you are simply summing every single row individually.
BTW, your code is not achieving what you think it's achieving. All you're doing is replicating the identity matrix (and rows of ones) a set number of times over groups of columns in your matrix. You are not actually performing any summations as per the constraints; you are simply building the matrices that, when multiplied with x, enforce the conditions you specified at the beginning of your post. These are the ideal matrices required to express the constraints.
Now, if you want to check to see if the first condition or second condition is satisfied, you can do:
%// First condition satisfied?
firstSatisfied = all(first <= 1);
%// Second condition satisfied
secondSatisfied = all(second <= 1);
This checks every element of first or second and verifies that the sums produced by the code above are all <= 1. If they all satisfy the constraint, we get true; otherwise, false.
Please let me know if you need anything further.
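Putting the two parts together, a minimal sketch of the test(x) function mentioned in the question (the signature and the reshape orientation are assumptions, matching the vector ordering x = [x_11, ..., x_1N, x_21, ..., x_MN]):
function ok = test(x, M, N)
X = reshape(x, N, M).';                        % rebuild the M x N matrix, X(m,n) = x_mn
ok = all(sum(X,1) <= 1) && all(sum(X,2) <= 1); % both constraints satisfied?
end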
