Monotonically increasing 2-d array - algorithm

Give an algorithm to find a given element x (give its coordinates) in an n by n matrix whose rows and columns are monotonically increasing.
My thoughts:
Reduce problem set size.
In the 1st column, find the largest element <= x. We know x must be in this row or before it (higher), since every later row starts with a value > x. In the last column of the matrix, find the smallest element >= x. We know x must be in this row or after it (lower), since every earlier row ends with a value < x. Do the same thing with the first and last rows of the matrix. We have now defined a sub-matrix such that if x is in the matrix at all, it is in this sub-matrix. Now repeat the algorithm on this sub-matrix... Something along these lines.
[YAAQ: Yet another arrays question.]

I think you cannot hope for better than O(N), which is attainable (N is the width of the matrix).
Why you cannot do better
Imagine a matrix like this:
0 0 0 0 0 0 ... 0 0 x
0 0 0 0 0 0 ... 0 x 2
0 0 0 0 0 0 ... x 2 2
.....................
0 0 0 0 0 x ... 2 2 2
0 0 0 0 x 2 ... 2 2 2
0 0 0 x 2 2 ... 2 2 2
0 0 x 2 2 2 ... 2 2 2
0 x 2 2 2 2 ... 2 2 2
x 2 2 2 2 2 ... 2 2 2
where each x is an unknown number (not necessarily the same one; it may be different in every column). Each x sits between 0s (left and above) and 2s (right and below), so you can place any of 0, 1 or 2 in each x position without breaking the monotonicity (here: non-decreasing rows and columns). So, to find out whether there is a 1 in the matrix, you have to check every x position, and there are N of them.
How to make it O(n)
Imagine you have to find, for every row, the first column index holding a number > q (a given number). Start in the upper right corner of the matrix; if the number you see is greater than q, go left; otherwise go down. Stop when you move past the last row. The cells where you went down are the places to check: in each row, that cell is the last one <= q. If any of them holds the number you are searching for, you've found it.
This algorithm is O(n), because each step goes either left or down, and in total it cannot go left more than N times or down more than N times, so there are at most 2N steps.
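A minimal Python sketch of the corner walk, written to look for x directly: move left when the current value is greater than x, down when it is smaller. It assumes non-decreasing rows and columns; the function name and the sample grid are illustrative only.

```python
def staircase_search(matrix, x):
    """Return (row, col) of x in a matrix whose rows and columns are
    non-decreasing, or None if x is absent. Uses at most rows + cols steps."""
    if not matrix or not matrix[0]:
        return None
    row, col = 0, len(matrix[0]) - 1  # start in the upper right corner
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == x:
            return (row, col)
        if value > x:
            col -= 1  # every cell below in this column is >= value > x
        else:
            row += 1  # every cell to the left in this row is <= value < x
    return None


grid = [
    [1, 4, 7, 11],
    [2, 5, 8, 12],
    [3, 6, 9, 16],
    [10, 13, 14, 17],
]
print(staircase_search(grid, 9))   # (2, 2)
print(staircase_search(grid, 15))  # None
```

Each comparison discards one row or one column, which is exactly the 2N bound argued above.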

Pick a corner element, one that is greatest in its row and smallest in its column (or the other way). Compare with x. Depending on the result of the comparison, you can exclude the row or the column from further search.
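The new matrix has the sum of its dimensions decreased by 1 compared to the original one. Apply the above iteratively. Each step removes a row or a column, so after at most 2n-1 steps either x has been found or no candidates are left: O(n) comparisons in total.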

If "the rows and columns are monotonically increasing" means that the values increase in row-major order, i.e. for any row M, (rowM,col1) < (rowM,col2) < ... < (rowM,colN) < (rowM+1,col1) < ...,
then you can simply treat the matrix as a one-dimensional array sorted from smallest to largest and do a standard binary search: sample the element 1/2(rows * cols) from the start, then sample the element 1/4(rows * cols) behind it (if the first sampled element is > x) or ahead of it (if it is < x), and so forth.
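Under that row-major assumption, the index arithmetic can be sketched in Python as follows. The function name and sample matrix are illustrative; index k of the virtual 1-D array maps to (k // cols, k % cols).

```python
def search_row_major(matrix, x):
    """Binary search in a matrix whose entries are sorted when read row by row.
    Returns (row, col) of x, or None if x is absent."""
    if not matrix or not matrix[0]:
        return None
    cols = len(matrix[0])
    lo, hi = 0, len(matrix) * cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        r, c = divmod(mid, cols)  # position of virtual index `mid` in the 2-D matrix
        value = matrix[r][c]
        if value == x:
            return (r, c)
        if value < x:
            lo = mid + 1  # x can only be ahead of mid
        else:
            hi = mid - 1  # x can only be behind mid
    return None


sorted_grid = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
print(search_row_major(sorted_grid, 11))  # (1, 2)
```

This runs in O(log(rows * cols)), but it is only valid under the stronger row-major ordering, not for the general case above.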

Related

Finding the probability that a chess knight stays on the chessboard after k moves with dynamic programming

I was trying out the "Knight Probability in Chessboard" problem from LeetCode:
Given n, k, row and column, find the probability that a knight initially placed at cell [row, column] is still on the n x n chessboard after k moves (each move is chosen uniformly at random among the 8 possible knight moves).
I wanted to do it by addition, that is, store the number of ways to reach cell [x,y] at step k in the dynamic programming table entry [x,y,k]. Then sum the counts over all cells at index k and divide the total by 8^k. For example, if I start at [0,0] with n=4, the values after successive steps are:
After step 1:
0 0 0 0
0 0 1 0
0 1 0 0
0 0 0 0
After step 2:
4 0 2 0
0 0 0 2
2 0 0 0
0 2 0 4
After step 3:
0 6 0 0
6 0 11 0
0 11 0 6
0 0 6 0
Only the first step's output seems to be correct. After the second step, the sum is 2+2+2+2+4+4=16 and the probability is 16/8^2 = 0.25. However, the actual answer is 0.125. After the third step, the sum becomes 6+6+6+6+11+11=46 and the probability is 46/8^3 = 0.0898, but the actual answer is 0.039. Where does this dynamic programming approach go wrong?
[Figure: sample calculation for step 2]
Bottom up approach:
Start by filling P(x_start, y_start, 0) = 1 and setting (x_start, y_start) in a map (from positions to booleans) previous_layer_map. Also, set the counter current_layer to 1.
Iterate through each of the n^2 positions of the board. For each of them, check in O(1) whether one of its knight moves reaches a square in previous_layer_map. If so:
If (x, y) has not been seen yet in the current layer (current_layer_map[x][y] == false), set
P(x, y, current_layer) = P(x_reached, y_reached, current_layer-1)/8
and set (x, y) in current_layer_map.
Else, set
P(x, y, current_layer) += P(x_reached, y_reached, current_layer-1)/8
After you finish iterating through each of the n^2 positions of the board, empty previous_layer_map, fill it with the elements of current_layer_map and empty current_layer_map. Also, increment the counter current_layer. Then start a new iteration, and continue until you reach the k-th layer.
Total time complexity: O(k * n^2).
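A Python sketch of the layered idea, under two stated deviations: each layer is a dict from (x, y) to probability (squares with zero probability are simply absent, playing the role of previous_layer_map), and probability is pushed forward from the previous layer instead of scanning all n^2 squares. The names are hypothetical. Each layer still has at most n^2 entries and 8 moves each, so the cost stays O(k * n^2).

```python
# The eight knight moves as (row offset, column offset).
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_probability_bottom_up(n, k, row, col):
    """Probability that a knight starting on (row, col) is still on an n x n
    board after k moves, each of the 8 moves having probability 1/8."""
    # Layer 0: the knight is on its start square with probability 1.
    previous_layer = {(row, col): 1.0}
    for _ in range(k):
        current_layer = {}
        for (x, y), p in previous_layer.items():
            for dx, dy in MOVES:
                nx, ny = x + dx, y + dy
                # Moves that leave the board are simply dropped.
                if 0 <= nx < n and 0 <= ny < n:
                    current_layer[(nx, ny)] = current_layer.get((nx, ny), 0.0) + p / 8
        previous_layer = current_layer
    # Summing the last layer gives the probability of still being on the board.
    return sum(previous_layer.values())


print(knight_probability_bottom_up(4, 3, 0, 0))  # 0.0390625
```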
Top down approach:
Let P(x, y, k) be the probability that the knight is on square (x, y) after the k-th step. Look at all squares the knight could have come from (you can get them in O(1): work out the formulas for the different cases with pen and paper, e.g. knight in a corner, on the border, in the central region, etc.). Let them be (x1, y1), ..., (xj, yj). For each of these squares, what is the probability that the knight jumps to (x, y)? Since all 8 moves are equally likely, including those that leave the board, it is always 1/8. So:
P(x, y, k) = (P(x1, y1, k-1) + ... + P(xj, yj, k-1))/8
The base case is k = 0.
P(x, y, 0) = 1 if (x, y) = (x_start, y_start), and P(x, y, 0) = 0 otherwise.
That is your recurrence formula; the requested probability is the sum of P(x, y, k) over all squares of the board. You can use dynamic programming to calculate it.
Open question: how should the time complexity of this solution be analyzed? Is it equivalent to the bottom-up approach described in my other answer?
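A memoized Python sketch of the recurrence, with hypothetical names. With memoization, each of the at most n^2 * (k+1) states (x, y, step) is computed once with eight lookups, which suggests the same O(k * n^2) bound as the bottom-up version; recursion depth grows with k, so very large k would need a higher recursion limit.

```python
from functools import lru_cache

# The eight knight moves as (row offset, column offset).
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_probability_top_down(n, k, row, col):
    @lru_cache(maxsize=None)
    def P(x, y, step):
        # Probability that the knight is on (x, y) after `step` moves.
        if step == 0:
            return 1.0 if (x, y) == (row, col) else 0.0
        total = 0.0
        for dx, dy in MOVES:
            # The move set is symmetric, so (x - dx, y - dy) covers every predecessor.
            px, py = x - dx, y - dy
            if 0 <= px < n and 0 <= py < n:
                total += P(px, py, step - 1)
        return total / 8

    # The answer is P(x, y, k) summed over every square of the board.
    return sum(P(x, y, k) for x in range(n) for y in range(n))


print(knight_probability_top_down(3, 2, 0, 0))  # 0.0625
```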
I was incorrectly incrementing the numbers. For example, in the diagram shown at the end of the original question, the red arrows increment a cell from 1 to 2. That shouldn't happen: moving from one cell to the next extends the same single path, it does not create two different paths to the next cell. The same applies to the blue arrows. So, the corrected steps are:
After step 1
0 0 0 0
0 0 1 0
0 1 0 0
0 0 0 0
After step 2
2 0 1 0
0 0 0 1
1 0 0 0
0 1 0 2
After step 3
0 2 0 0
2 0 6 0
0 6 0 2
0 0 2 0
and (2+2+2+2+6+6)/8^3 = 20/512 ≈ 0.039,
which is the correct answer!
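The corrected counting rule (each cell at step k receives the sum of the counts of the cells one knight move away at step k-1) can be checked with a short Python sketch; the function name is hypothetical. It reproduces the three corrected grids above for n=4 starting at [0,0].

```python
# The eight knight moves as (row offset, column offset).
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def path_count_grids(n, k, row, col):
    """Return k grids; entry (x, y) of grid s is the number of on-board
    paths of s+1 moves that end on (x, y)."""
    grid = [[0] * n for _ in range(n)]
    grid[row][col] = 1
    grids = []
    for _ in range(k):
        nxt = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                if grid[x][y]:
                    for dx, dy in MOVES:
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < n:
                            # Each path ending on (x, y) extends to exactly one path ending on (nx, ny).
                            nxt[nx][ny] += grid[x][y]
        grids.append(nxt)
        grid = nxt
    return grids


grids = path_count_grids(4, 3, 0, 0)
for g in grids:
    print(g)
print(sum(map(sum, grids[-1])) / 8 ** 3)  # 0.0390625
```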

Most efficient way of finding submatrices of a matrix [matlab]

Say we have a matrix of zeros and ones
0 1 1 1 0 0 0
1 1 1 1 0 1 1
0 0 1 0 0 1 0
0 1 1 0 1 1 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
and we want to find all the submatrices (we just need the row indices and column indices of the corners) with these properties:
contain at least L ones and L zeros
contain at most H elements
e.g. take the previous matrix with L=1 and H=5: the submatrix 1 2 1 4 (row indices 1 2 and column indices 1 4)
0 1 1 1
1 1 1 1
satisfies property 1 but has 8 elements (more than H=5), so it is not valid;
the submatrix 4 5 1 2 (row indices 4 5 and column indices 1 2)
0 1
0 0
is valid because it satisfies both properties.
The objective is then to find all the submatrices with min area 2*L, max area H and containing at least L ones and L zeros.
If we consider a matrix as a rectangle, it is easy to find all the possible subrectangles with max area H and min area 2*L by looking at the divisors of all the numbers from H down to 2*L.
For example, with H=5 and L=1, all the possible subrectangles/submatrices are given by the divisors of
H=5 -> divisors [1 5] -> possible rectangles of area 5 are 1x5 and 5x1
4 -> divisors [1 2 4] -> possible rectangles of area 4 are 1x4, 4x1 and 2x2
3 -> divisors [1 3] -> possible rectangles of area 3 are 3x1 and 1x3
2*L=2 -> divisors [1 2] -> possible rectangles of area 2 are 2x1 and 1x2
I wrote this code, which, for each area, finds its divisors and loops over them to find the submatrices. Take a 1x5 submatrix as an example: the code fixes the first row of the matrix and slides the submatrix one step at a time from the left edge of the matrix to the right edge, then it fixes the second row and slides the submatrix along all the columns from left to right, and so on until it reaches the last row.
It does this for all the 1x5 submatrices, then it considers the 5x1 submatrices, then the 1x4, then the 4x1, then the 2x2, etc.
On this example the code does the job (it finds all the submatrices) in 2 seconds, but for big matrices, e.g. 200x200, it takes many minutes. So I wonder whether there are more efficient ways to do the job, and which one is the most efficient.
This is my code:
clc; clear all; close all
%% INPUT
P = [0 1 1 1 0 0 0;
     1 1 1 1 0 1 1;
     0 0 1 0 0 1 0;
     0 1 1 0 1 1 1;
     0 0 0 0 0 0 1;
     0 0 0 0 0 0 1];
L = 1;  % a submatrix has to contain at least L ones and L zeros
H = 5;  % max area of a submatrix
[R, C] = size(P);  % rows and columns of P
sub = zeros(1, 6);  % indexes of each submatrix (columns 1-4), its area (5) and the counter (6)
counter = 1;  % next free row of sub (= no. of submatrices found + 1)
%% FIND ALL RECTANGLES OF AREA >= 2*L & <= H
%
% idea: all rectangles of a certain area can be found using the area's divisors
% e.g. divisors(6)=[1 2 3 6] -> rectangles: 1x6 6x1 2x3 and 3x2
tic
for sH = H:-1:2*L  % find rectangles of area H, H-1, ..., 2*L
    div_sH = divisors(sH);  % find all divisors of sH
    disp(['_______AREA ', num2str(sH), '_______'])
    for i = 1:round(length(div_sH)/2)  % cycle over all pairs of divisors
        div_small = div_sH(i);
        div_big = div_sH(end-i+1);
        if div_small <= R && div_big <= C  % rectangle with long side <= C and short side <= R
            for j = 1:R-div_small+1  % cycle over all possible rows
                for k = 1:C-div_big+1  % cycle over all possible columns
                    no_of_ones = nnz(P(j:j-1+div_small, k:k-1+div_big));  % no. of ones in the current submatrix
                    if no_of_ones >= L && no_of_ones <= sH-L  % at least L ones AND at least L zeros
                        % [row indexes, column indexes, area, position]
                        sub(counter,:) = [j, j-1+div_small, k, k-1+div_big, div_small*div_big, counter];  % save the submatrix
                        counter = counter+1;
                    end
                end
            end
            disp(['  [', num2str(div_small), 'x', num2str(div_big), '] submatrices: ', num2str(size(sub,1))])
        end
        if div_small ~= div_big  % skip this part for squares (otherwise there will be duplicates in sub)
            if div_small <= C && div_big <= R  % rectangle with long side <= R and short side <= C
                for j = 1:C-div_small+1  % cycle over all possible columns
                    for k = 1:R-div_big+1  % cycle over all possible rows
                        no_of_ones = nnz(P(k:k-1+div_big, j:j-1+div_small));
                        if no_of_ones >= L && no_of_ones <= sH-L
                            sub(counter,:) = [k, k-1+div_big, j, j-1+div_small, div_big*div_small, counter];
                            counter = counter+1;
                        end
                    end
                end
                disp(['  [', num2str(div_big), 'x', num2str(div_small), '] submatrices: ', num2str(size(sub,1))])
            end
        end
    end
end
fprintf('\ntime: %2.2fs\n\n', toc)
Here is a solution centered around 2D matrix convolution. The rough idea is to convolve P, for each submatrix shape, with a second matrix such that each element of the result gives the number of ones in the submatrix whose top left corner is at that element. This way you get all solutions for a single shape in one go, without having to loop over rows/columns, which greatly speeds things up (it takes less than a second for a 200x200 matrix on my 8-year-old laptop).
P = [0 1 1 1 0 0 0
     1 1 1 1 0 1 1
     0 0 1 0 0 1 0
     0 1 1 0 1 1 1
     0 0 0 0 0 0 1
     0 0 0 0 0 0 1];
L = 1;  % a submatrix has to contain at least L ones and L zeros
H = 5;  % max area of a submatrix
submats = [];
for sH = H:-1:2*L
    div_sH = divisors(sH);  % find all divisors of sH
    for i = 1:length(div_sH)  % cycle over all pairs of divisors, in both orientations
        % number of rows of the current submatrix
        nrows = div_sH(i);
        % number of columns of the current submatrix
        ncols = div_sH(end-i+1);
        % prepare the matrix to convolve P with
        m = zeros(nrows*2-1, ncols*2-1);
        m(1:nrows, 1:ncols) = 1;
        % number of ones in each submatrix, stored at its top left corner
        submatsums = conv2(P, m, 'same');
        % mark positions where the submatrix would extend outside P as invalid
        validsums = zeros(size(P)) - 1;
        validsums(1:(end-nrows+1), 1:(end-ncols+1)) = submatsums(1:(end-nrows+1), 1:(end-ncols+1));
        % get the indexes where the number of ones and the number of zeros are both >= L
        topLeftIdx = find(validsums >= L & validsums <= sH-L);
        % save submatrices in the following format: [linear index of top left corner, nrows, ncols]
        % You can of course use something different, but it seemed the simplest way to me
        submats = [submats; [topLeftIdx bsxfun(@times, [nrows ncols], ones(length(topLeftIdx), 1))]];
    end
end
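For readers outside MATLAB, here is a Python sketch of the same "count all placements of one shape at once" idea using a swapped-in technique: a summed-area table (2-D prefix sum) instead of conv2. After one O(R*C) precomputation, the number of ones in any rectangle takes O(1), so each shape costs O(R*C) like one conv2 call. The function name is hypothetical, and results use 1-based inclusive indices in the same order as columns 1-4 of the question's sub matrix.

```python
def find_submatrices(P, L, H):
    """Return (row_start, row_end, col_start, col_end), 1-based and inclusive,
    for every submatrix with area 2*L..H holding at least L ones and L zeros."""
    R, C = len(P), len(P[0])
    # S[i][j] = number of ones in rows 0..i-1 and columns 0..j-1 of P
    S = [[0] * (C + 1) for _ in range(R + 1)]
    for i in range(R):
        for j in range(C):
            S[i + 1][j + 1] = P[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]

    found = []
    for area in range(H, 2 * L - 1, -1):
        for h in range(1, area + 1):
            # keep only shapes h x w with h * w == area that fit inside P
            if area % h or h > R or area // h > C:
                continue
            w = area // h
            for r in range(R - h + 1):
                for c in range(C - w + 1):
                    # ones in rows r..r+h-1 and columns c..c+w-1, in O(1)
                    ones = S[r + h][c + w] - S[r][c + w] - S[r + h][c] + S[r][c]
                    if L <= ones <= area - L:
                        found.append((r + 1, r + h, c + 1, c + w))
    return found


P = [[0, 1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0, 1, 1],
     [0, 0, 1, 0, 0, 1, 0],
     [0, 1, 1, 0, 1, 1, 1],
     [0, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 0, 1]]
res = find_submatrices(P, 1, 5)
print((4, 5, 1, 2) in res)  # True: the valid 2x2 example from the question
```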
First, I suggest that you generate the allowable submatrix shapes in a single loop:
for smaller = 1:floor(sqrt(H))
    for larger = max(smaller, ceil(2*L/smaller)):floor(H/smaller)
        % add smaller x larger and larger x smaller to your shapes list
    end
end
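The shape loop can be sketched in Python as below. The function name is hypothetical; the lower bound for the larger side is taken as max(smaller, ceil(2*L/smaller)) so that every area from 2*L to H is covered and no shape is generated twice.

```python
import math


def allowed_shapes(L, H):
    """All (rows, cols) shapes with 2*L <= rows * cols <= H."""
    shapes = set()
    for smaller in range(1, math.isqrt(H) + 1):
        # smallest partner that still reaches area 2*L, and never below `smaller`
        start = max(smaller, -(-2 * L // smaller))  # -(-a // b) is ceiling division
        for larger in range(start, H // smaller + 1):
            shapes.add((smaller, larger))
            shapes.add((larger, smaller))
    return shapes


print(sorted(allowed_shapes(1, 5)))
# [(1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (3, 1), (4, 1), (5, 1)]
```

For H=5 and L=1 this yields the same nine shapes the question lists via divisors, without computing any divisors.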
Next, start with the smallest rectangles in the shapes. Note that any solution to a small rectangle can be extended in any direction, to the area limit of H, and the added elements will not invalidate the solution you found. This will identify many solutions without bothering to check the populations within.
Keep track of the solutions you've found. As you work your way toward larger rectangles, you can avoid checking anything already in your solutions set. If you keep that in a hash table, checking membership is O(1). All you'll need to check thereafter will be larger blocks of mostly-1 adjacent to mostly-0. This should speed up the processing somewhat.
Is that enough of a nudge to help?

How to optimize search of rows x columns combination in a matrix?

Given a matrix of 1's and 0's, I want to find a combination of rows and columns containing as few 0's as possible (ideally none), maximizing n_of_rows * n_of_columns picked.
For example, rows (0,1,2) and columns (0,1,3) contain only one zero, in col #0 row #1; the remaining 8 values are 1's.
1 1 0 1 0
0 1 1 1 0
1 1 0 1 1
0 0 1 0 0
The practical task is to search over thousands to millions of rows and columns, which amounts to finding the maximum biclique in a bipartite graph: rows and columns can be viewed as vertices, and the 1 values as edges.
As far as I have learned, the problem is NP-complete.
Please advise an approach or algorithm that would speed up the task and reduce the CPU and memory requirements.
Not sure you could minimise this.
However, an easy way to work this out would be...
Multiply your matrix by a column vector of n ones: this gives the number of ones in each row. Next, multiply a row vector of n ones by your matrix (from the left): this gives the total of 1's in each column. From there it's a pretty easy comparison...
i.e. for the original matrix
1 0 1
0 1 1
0 0 0
compute the row totals
1 0 1     1     2
0 1 1  x  1  =  2   (row totals)
0 0 0     1     0
and the column totals
          1 0 1
1 1 1  x  0 1 1  =  1 1 2   (column totals)
          0 0 0
NB: the maximum total is 2 (which you would keep track of as you work it out).
Actually, given the following assumptions:
1. You don't care how many 0's are in each row or column
2. You don't need to keep track of their order
then you only need to keep a running total for each row and column as you read the values in, and you don't need to store the matrix itself.
If you are given the number of rows and columns before reading in the matrix, you can use the following heuristic to reduce computation time:
Keep track of the current maximum. If the current row can no longer reach that maximum, stop counting for that row (but keep updating the column totals). The same holds for the columns.
But you still have a worst-case scenario in which all rows and columns have the same number of 1's and 0's.... :)
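Under those two assumptions, the running totals can be sketched in Python as below. The function name is hypothetical, and `rows` can be any iterable of rows (for example a generator reading a file line by line). It only computes the totals; the early-stopping heuristic is not shown.

```python
def line_totals(rows):
    """Read a 0/1 matrix one row at a time and return (row_totals, col_totals)
    without keeping the matrix itself in memory."""
    row_totals, col_totals = [], []
    for row in rows:
        if not col_totals:
            col_totals = [0] * len(row)  # sized from the first row
        row_totals.append(sum(row))
        for j, value in enumerate(row):
            col_totals[j] += value
    return row_totals, col_totals


rows, cols = line_totals(iter([[1, 0, 1], [0, 1, 1], [0, 0, 0]]))
print(rows, cols)               # [2, 2, 0] [1, 1, 2]
print(max(rows), max(cols))     # 2 2
```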

How can I get all the possible n x n matrices from two vectors?

I've been searching for an algorithm that finds all possible matrices of dimension n consistent with two arrays: one holding the sums of the rows, and the other the sums of the columns of a matrix. For example, take the following matrix of dimension 7:
matriz= [ 1 0 0 1 1 1 0
1 0 1 0 1 0 0
0 0 1 0 1 0 0
1 0 0 1 1 0 1
0 1 1 0 1 0 1
1 1 1 0 0 0 1
0 0 1 0 1 0 1 ]
The sum of the columns are:
col= [4 2 5 2 6 1 4]
The sum of the rows are:
row = [4 3 2 4 4 4 3]
Now, I want to obtain all possible matrices of "ones and zeros" where the sum of the columns and the rows fulfil the condition of "col" and "row" respectively.
I would appreciate ideas that can help solve this problem.
One obvious way is to brute-force a solution: for the first row, generate all the possibilities that have the right sum, then for each of these, generate all the possibilities for the 2nd row, and so on. Once you have generated all the rows, you check whether the column sums are right. But this will take a lot of time. My math might be rusty at this time of day, but I believe the number of distinct possibilities for a row of length n in which k bits are 1 is given by the binomial coefficient, nchoosek(n,k) in Matlab. To determine the total number of possibilities, you multiply these numbers over all rows:
>> n = 7;
>> row= [4 3 2 4 4 4 3];
>> prod(arrayfun(@(k) nchoosek(n, k), row))
ans =
3.8604e+10
This is a lot of possibilities to check! Doing the same for the columns gives
>> col= [4 2 5 2 6 1 4];
>> prod(arrayfun(@(k) nchoosek(n, k), col))
ans =
555891525
Still a large number, but 'only' a factor 70 smaller.
It might be possible to improve this brute-force method a little by checking whether the later rows are already constrained by the previous rows. In your example, if a particular combination of the first two rows has a 1 in the second column in both rows, the rest of that column must be all 0, since its sum must be 2. This reduces the number of possibilities for the remaining rows a bit. Implementing such checks might complicate things, but they might make the difference between a calculation that takes 2 days and one that takes just 1 hour.
An optimized version of this might alternatively generate rows and columns, and start with those for which the number of possibilities is the lowest. I don't know if there is a more elegant solution than this brute-force method, I would be interested to hear one.
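A minimal Python sketch of the pruned brute force: rows are generated one at a time with `itertools.combinations`, and a branch is abandoned as soon as a column is already full or still needs more ones than there are rows left. The function name is hypothetical, and this is only the simplest form of the checks suggested above; it does not reorder rows or columns by their number of possibilities.

```python
from itertools import combinations


def enumerate_01_matrices(row_sums, col_sums):
    """Yield every 0/1 matrix (list of lists) with the given row and column sums."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    if sum(row_sums) != sum(col_sums):
        return  # no matrix can satisfy both margins
    remaining = list(col_sums)  # ones still needed in each column
    current = []                # rows chosen so far

    def place(i):
        if i == n_rows:
            if all(r == 0 for r in remaining):
                yield [row[:] for row in current]
            return
        # Prune: a column cannot need more ones than there are rows left to fill.
        if any(r > n_rows - i for r in remaining):
            return
        for ones in combinations(range(n_cols), row_sums[i]):
            if any(remaining[c] == 0 for c in ones):
                continue  # one of these columns is already full
            for c in ones:
                remaining[c] -= 1
            chosen = set(ones)
            current.append([1 if c in chosen else 0 for c in range(n_cols)])
            yield from place(i + 1)
            current.pop()
            for c in ones:
                remaining[c] += 1

    yield from place(0)


for m in enumerate_01_matrices([2, 1, 1], [2, 1, 1]):
    print(m)
```

Because it is a generator, matrices can be processed one at a time instead of being held in memory, which matters when the count is large.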

Matrix, algorithm interview question

This was one of my interview questions.
We have an n x n matrix containing integers (no range provided), randomly populated. We need to devise an algorithm that finds the rows that exactly match one or more columns, and return the row number and the column number for each match. The matching elements must appear in the same order: for example, if the i-th row matches the j-th column and the i-th row contains the elements [1,4,5,6,3], then the j-th column also contains the elements [1,4,5,6,3].
My solution:
RCEQUAL(A, i1..i2, j1..j2)  // A is an n*n matrix
    if (i2-i1 == 2 && j2-j1 == 2 && b[n*i1+1..n*i2] has [j1..j2])
        use brute force to check if the rows and columns are the same
        if (any rows and columns are the same)
            store the row and column numbers in b[1..n^2]  // b[1], b[n+2], b[2n+3], ... store row numbers,
                                                           // b[2..n+1] stores the columns that
                                                           // match row 1, b[n+3..2n+2]
                                                           // those that match row 2, etc.
    else
        RCEQUAL(A, 1..n/2, 1..n/2);
        RCEQUAL(A, n/2..n, 1..n/2);
        RCEQUAL(A, 1..n/2, n/2..n);
        RCEQUAL(A, n/2..n, n/2..n);
Takes O(n^2). Is this correct? If correct, is there a faster algorithm?
You could build a trie from the data in the rows, then compare the columns against the trie.
This lets you stop as soon as the beginning of a column does not match any row, and it lets you check a column against all rows in one pass.
Of course the trie is most useful when n is big (setting up a trie for a small n is not worth it) and when many rows and columns are similar. But even in the worst case, where all integers in the matrix are different, the structure allows for a clear algorithm...
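A minimal Python sketch of the trie idea, using nested dicts as the trie; the function name and the END sentinel are illustrative choices, not part of the answer.

```python
def matching_rows_and_columns_trie(A):
    """Return sorted (row, col) pairs where row i of the n x n matrix A equals column j."""
    n = len(A)
    END = object()  # sentinel key: the row indices that end at this node
    root = {}
    for i, row in enumerate(A):
        node = root
        for value in row:
            node = node.setdefault(value, {})
        node.setdefault(END, []).append(i)

    matches = []
    for j in range(n):
        node = root
        for i in range(n):
            node = node.get(A[i][j])
            if node is None:  # no row starts with this column prefix: stop early
                break
        else:  # the whole column was found in the trie
            for i in node.get(END, []):
                matches.append((i, j))
    return sorted(matches)


print(matching_rows_and_columns_trie([[5, 6, 5], [0, 1, 6], [2, 3, 5]]))  # [(0, 2)]
```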
You could speed up the average case by computing the sum of each row and each column and restricting your brute-force comparison (which you have to do eventually) to rows whose sum matches a column's sum.
This doesn't change the worst case (everything having the same sum), but if your input is truly random that "won't happen" :-)
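The sum filter can be sketched in Python by bucketing rows by their sum, so each column is fully compared only with rows of equal sum; the function name is hypothetical.

```python
from collections import defaultdict


def matching_rows_and_columns_by_sum(A):
    """Return sorted (row, col) pairs where row i of the n x n matrix A equals column j."""
    n = len(A)
    rows_by_sum = defaultdict(list)
    for i, row in enumerate(A):
        rows_by_sum[sum(row)].append(i)

    matches = []
    for j in range(n):
        column = [A[i][j] for i in range(n)]
        # Full brute-force comparison, but only against rows with the same sum.
        for i in rows_by_sum.get(sum(column), []):
            if list(A[i]) == column:
                matches.append((i, j))
    return sorted(matches)


print(matching_rows_and_columns_by_sum([[1, 2, 3], [2, 5, 6], [3, 6, 9]]))  # [(0, 0), (1, 1), (2, 2)]
```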
This might only work on non-singular matrices (not sure), but...
Let A be a square (and possibly non-singular) NxN matrix, and let A' be the transpose of A. If we create the matrix B as the horizontal concatenation of A and A' (in other words [A A']) and put it into RREF form, we will get the identity matrix (all ones on the diagonal) in the left half and some square matrix in the right half (for non-singular A, that right half is A^-1 A').
Example:
A = 1 2
3 4
A'= 1 3
2 4
B = 1 2 1 3
3 4 2 4
rref(B) = 1 0 0 -2
0 1 0.5 2.5
On the other hand, if column j of A is equal to row i of A, then column j of A is equal to column i of A'. Column i of the right half of rref(B) is then A^-1 times column j of A, i.e. the unit vector with a single 1 in row j and zeros elsewhere.
Example
A=
1 2 3 4 5
2 6 -3 4 6
3 8 -7 6 9
4 1 7 -5 3
5 2 4 -1 -1
A'=
1 2 3 4 5
2 6 8 1 2
3 -3 -7 7 4
4 4 6 -5 -1
5 6 9 3 -1
B =
1 2 3 4 5 1 2 3 4 5
2 6 -3 4 6 2 6 8 1 2
3 8 -7 6 9 3 -3 -7 7 4
4 1 7 -5 3 4 4 6 -5 -1
5 2 4 -1 -1 5 6 9 3 -1
rref(B)=
1 0 0 0 0 1.000 -3.689 -5.921 3.080 0.495
0 1 0 0 0 0 6.054 9.394 -3.097 -1.024
0 0 1 0 0 0 2.378 3.842 -0.961 0.009
0 0 0 1 0 0 -0.565 -0.842 1.823 0.802
0 0 0 0 1 0 -2.258 -3.605 0.540 0.662
The 1.000 in the top row of the right half means that the first column of A matches one of its rows. The fact that the 1.000 is in the left-most column of the right half means that it is the first row.
Without looking at your algorithm or at the approaches in the previous answers: since the matrix has n^2 elements to begin with, I do not think any method can do better than O(n^2) :)
Only if the matrix is truly random...
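Floating-point rref output (like the 1.000 above) needs a tolerance to decide what counts as exactly 1. A Python sketch that avoids this by doing Gauss-Jordan elimination exactly with `fractions.Fraction`; like the answer, it assumes A is square and non-singular, and the function name is hypothetical. Column i of the reduced right half is the unit vector e_j exactly when row i of A equals column j of A.

```python
from fractions import Fraction


def rref_matches(A):
    """Row-reduce [A | A'] exactly and return (i, j) pairs where row i of A
    equals column j of A. Assumes A is square and non-singular."""
    n = len(A)
    # Augmented matrix B = [A | A'] with exact rational entries.
    B = [[Fraction(A[r][c]) for c in range(n)] + [Fraction(A[c][r]) for c in range(n)]
         for r in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if B[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is singular; this check assumes a non-singular matrix")
        B[col], B[pivot] = B[pivot], B[col]
        p = B[col][col]
        B[col] = [v / p for v in B[col]]  # make the pivot 1
        for r in range(n):
            if r != col and B[r][col] != 0:  # clear the rest of the pivot column
                factor = B[r][col]
                B[r] = [a - factor * b for a, b in zip(B[r], B[col])]
    matches = []
    for i in range(n):  # column i of the right half
        column = [B[r][n + i] for r in range(n)]
        if column.count(1) == 1 and column.count(0) == n - 1:
            matches.append((i, column.index(1)))  # row i of A equals column index(1) of A
    return matches


A = [[1, 2, 3, 4, 5],
     [2, 6, -3, 4, 6],
     [3, 8, -7, 6, 9],
     [4, 1, 7, -5, 3],
     [5, 2, 4, -1, -1]]
print(rref_matches(A))  # [(0, 0)]
```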
You could create a list of pointers to the columns sorted by the first element. Then create a similar list of the rows sorted by their first element. This takes O(n*logn).
Next, create an index into each sorted list, initialized to 0. If the first elements match, you must compare the whole row with the whole column. If they do not match, increment the index of the one with the smaller starting element (i.e. move to the next row or to the next column). Since each index runs from 0 to n-1 only once, you make at most 2*n comparisons, unless all the rows and columns start with the same number, but we said the matrix holds random numbers.
The time for a row/column comparison is n in the worst case, but is expected to be O(1) on average with random data.
So two O(n log n) sorts plus a scan of 2*n comparisons, each O(1) on average, give an expected running time of O(n log n). This of course assumes random data; the worst case is still O(n^3) for a large matrix with most elements having the same value.
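A Python sketch of the sort-and-merge idea, with a hypothetical function name and one deliberate change from the description: when several rows or columns share the same first element, all of them are compared pairwise instead of advancing a single index, so ties cannot hide a match.

```python
def matching_by_first_element(A):
    """Return sorted (row, col) pairs where row i of the n x n matrix A equals column j."""
    n = len(A)
    rows = sorted(range(n), key=lambda i: A[i][0])  # row indices by first element
    cols = sorted(range(n), key=lambda j: A[0][j])  # column indices by first element
    matches = []
    r = c = 0
    while r < n and c < n:
        a, b = A[rows[r]][0], A[0][cols[c]]
        if a < b:
            r += 1
        elif a > b:
            c += 1
        else:
            # Gather every row and column starting with this value, then compare fully.
            r_end = r
            while r_end < n and A[rows[r_end]][0] == a:
                r_end += 1
            c_end = c
            while c_end < n and A[0][cols[c_end]] == a:
                c_end += 1
            for i in rows[r:r_end]:
                for j in cols[c:c_end]:
                    if all(A[i][k] == A[k][j] for k in range(n)):
                        matches.append((i, j))
            r, c = r_end, c_end
    return sorted(matches)


print(matching_by_first_element([[5, 6, 5], [0, 1, 6], [2, 3, 5]]))  # [(0, 2)]
```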

Resources