Matrix, algorithm interview question - algorithm

This was one of my interview questions.
We have an n x n matrix of integers (no range is given), populated randomly. We need to devise an algorithm that finds the rows that exactly match one or more columns, and return the row number and column number of each match. The matching elements must appear in the same order. For example, if the i-th row matches the j-th column and the i-th row contains the elements [1,4,5,6,3], then the j-th column also contains the elements [1,4,5,6,3].
My solution:
RCEQUAL(A, i1..i2, j1..j2)  // A is an n x n matrix
    if (i2-i1 == 2 && j2-j1 == 2 && b[n*i1+1..n*i2] has [j1..j2])
        use brute force to check whether the rows and columns are the same
        if (any rows and columns are the same)
            store the row and column numbers in b[1..n^2]
            // b[1], b[n+2], b[2n+3], ... store the row numbers;
            // b[2..n+1] stores the columns that match row 1,
            // b[n+3..2n+2] those that match row 2, etc.
    else
        RCEQUAL(A, 1..n/2, 1..n/2);
        RCEQUAL(A, n/2..n, 1..n/2);
        RCEQUAL(A, 1..n/2, n/2..n);
        RCEQUAL(A, n/2..n, n/2..n);
Takes O(n^2). Is this correct? If correct, is there a faster algorithm?

You could build a trie from the data in the rows, then compare the columns against the trie.
This lets you exit as soon as the beginning of a column does not match any row. It also lets you check a column against all rows in one pass.
Of course, the trie is most interesting when n is big (setting up a trie for a small n is not worth it) and when many rows and columns are nearly the same. But even in the worst case, where all integers in the matrix are different, the structure allows for a clear algorithm.
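A minimal sketch of the trie idea in Python, assuming the matrix is a list of n lists of n integers and that indices are 0-based. The function name and the nested-dict trie are illustrative choices, not something given in the answer above.

```python
# Key under which a trie leaf stores the indices of the rows ending there.
# Matrix values are integers, so a string key cannot collide with them.
_ROWS = "rows"


def match_rows_to_columns_trie(matrix):
    """Return (row, column) index pairs where the row equals the column."""
    n = len(matrix)

    # Build the trie: one path per row, one level per element.
    root = {}
    for i, row in enumerate(matrix):
        node = root
        for value in row:
            node = node.setdefault(value, {})
        node.setdefault(_ROWS, []).append(i)

    # Walk each column down the trie, stopping at the first mismatch.
    matches = []
    for j in range(n):
        node = root
        for i in range(n):
            node = node.get(matrix[i][j])
            if node is None:
                break  # no row starts with this prefix
        else:
            matches.extend((r, j) for r in node[_ROWS])
    return matches
```

Each column is checked against all rows in a single walk, which is the property the answer relies on.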

You could speed up the average case by calculating the sum of each row and each column, and restricting the brute-force comparison (which you have to do eventually) to row/column pairs whose sums match.
This doesn't improve the worst case (all rows and columns having the same sum), but if your input is truly random that "won't happen" :-)
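As a rough sketch of the sum filter in Python, assuming the matrix is a list of lists of integers with 0-based indices (the function name is made up for illustration):

```python
from collections import defaultdict


def match_rows_to_columns_by_sum(matrix):
    """Return (row, column) index pairs where the row equals the column."""
    n = len(matrix)

    # Bucket the row indices by row sum.
    rows_by_sum = defaultdict(list)
    for i, row in enumerate(matrix):
        rows_by_sum[sum(row)].append(i)

    matches = []
    for j in range(n):
        column = [matrix[i][j] for i in range(n)]
        # Only rows with the same sum can possibly equal this column.
        for i in rows_by_sum.get(sum(column), []):
            if list(matrix[i]) == column:
                matches.append((i, j))
    return matches
```

The element-by-element comparison still runs for every candidate pair, so equal sums only narrow the search; they never confirm a match on their own.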

This might only work on non-singular matrices (not sure), but...
Let A be a square (and possibly non-singular) NxN matrix. Let A' be the transpose of A. If we create a matrix B that is the horizontal concatenation of A and A' (in other words, [A A']) and put it into RREF, we get a diagonal of all ones in the left half and some square matrix in the right half.
Example:
A = 1 2
    3 4
A' = 1 3
     2 4
B = 1 2 1 3
    3 4 2 4
rref(B) = 1 0 0   -2
          0 1 0.5 2.5
On the other hand, if a column of A were equal to a row of A, then that column of A would be equal to a column of A'. We would then get a single 1 (and zeros elsewhere) in one of the columns of the right half of rref(B).
Example
A=
1 2 3 4 5
2 6 -3 4 6
3 8 -7 6 9
4 1 7 -5 3
5 2 4 -1 -1
A'=
1 2 3 4 5
2 6 8 1 2
3 -3 -7 7 4
4 4 6 -5 -1
5 6 9 3 -1
B =
1 2 3 4 5 1 2 3 4 5
2 6 -3 4 6 2 6 8 1 2
3 8 -7 6 9 3 -3 -7 7 4
4 1 7 -5 3 4 4 6 -5 -1
5 2 4 -1 -1 5 6 9 3 -1
rref(B)=
1 0 0 0 0 1.000 -3.689 -5.921 3.080 0.495
0 1 0 0 0 0 6.054 9.394 -3.097 -1.024
0 0 1 0 0 0 2.378 3.842 -0.961 0.009
0 0 0 1 0 0 -0.565 -0.842 1.823 0.802
0 0 0 0 1 0 -2.258 -3.605 0.540 0.662
The 1.000 in the top row of the right half means that the first column of A matches one of its rows. The fact that the 1.000 is in the left-most column of the right half means that it is the first row.
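One way to see why this works, assuming A is non-singular (otherwise the inverse used below does not exist and the argument does not apply): row-reducing [A A'] has the same effect as multiplying it on the left by the inverse of A. Here e_k denotes the k-th standard basis vector, a symbol introduced only for this note.

```latex
\operatorname{rref}\bigl([\,A \mid A^{T}\,]\bigr)
  = A^{-1}\,[\,A \mid A^{T}\,]
  = [\,I \mid A^{-1}A^{T}\,]

A^{-1}A^{T}e_k = e_j
  \iff A^{T}e_k = A\,e_j
```

A^T e_k is row k of A written as a column, and A e_j is column j of A. So column k of the right half is the unit vector e_j exactly when row k of A equals column j of A. In the 5x5 example, j = k = 1.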

I have not looked at your algorithm or at the approaches in the previous answers, but since the matrix has n^2 elements to begin with, I do not think any method can do better than O(n^2) :)

IFF the matrix is truly random...
You could create a list of pointers to the columns sorted by the first element. Then create a similar list of the rows sorted by their first element. This takes O(n*logn).
Next create an index into each sorted list initialized to 0. If the first elements match, you must compare the whole row. If they do not match, increment the index of the one with the lowest starting element (either move to the next row or to the next column). Since each index cycles from 0 to n-1 only once, you have at most 2*n comparisons unless all the rows and columns start with the same number, but we said a matrix of random numbers.
The time for a row/column comparison is n in the worst case, but is expected to be O(1) on average with random data.
So 2 sorts of O(nlogn), and a scan of 2*n*1 gives you an expected run time of O(nlogn). This is of course assuming random data. Worst case is still going to be n**3 for a large matrix with most elements the same value.
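The two-index scan described above could look roughly like this in Python. It assumes a list-of-lists matrix with 0-based indices. The handling of several rows or columns that share a first element is an addition made here so that no pair is skipped, and the function name is illustrative.

```python
def match_rows_to_columns_sorted(matrix):
    """Return (row, column) index pairs where the row equals the column."""
    n = len(matrix)
    # Row i starts with matrix[i][0]; column j starts with matrix[0][j].
    rows = sorted(range(n), key=lambda i: matrix[i][0])
    cols = sorted(range(n), key=lambda j: matrix[0][j])

    matches = []
    r = c = 0
    while r < n and c < n:
        row_first = matrix[rows[r]][0]
        col_first = matrix[0][cols[c]]
        if row_first < col_first:
            r += 1
        elif row_first > col_first:
            c += 1
        else:
            # Collect every row and every column starting with this value.
            r_end = r
            while r_end < n and matrix[rows[r_end]][0] == row_first:
                r_end += 1
            c_end = c
            while c_end < n and matrix[0][cols[c_end]] == col_first:
                c_end += 1
            # Full comparison; all() stops at the first differing element.
            for i in rows[r:r_end]:
                for j in cols[c:c_end]:
                    if all(matrix[i][k] == matrix[k][j] for k in range(n)):
                        matches.append((i, j))
            r, c = r_end, c_end
    return matches
```

With many equal first elements the inner loops dominate, which is the n**3 worst case mentioned above.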

Related

APL: how to search for a value's index in a matrix

In APL, matrices and vectors are used to hold data. I was wondering if there was a way to search within a matrix for a given value, and have that values index returned. For example, say I have the following 2-dimensional matrices:
VALUES ← 1 2 3 4 5 6 7 8 9 10 11... all the way up to 36
KINDS ← 0 0 0 2 0 0 0 3 0 ... filled with 0's the rest of the way to 36 length.
If I laminated these two matrices with
kinds,[.5] values
so that they are laminated one on top of the other
1 2 3 4 5 6 7 8 9 10...
0 0 0 2 0 0 0 3 0 ....
is there a functionally easy way to search for the index of the 2 value in the "second row" of the newly laminated matrix? eg. the column containing
4
2
and return that matrix index?
The value 2 also appears in row 1 of your newly laminated matrix (nlm), and as you stated, you really do not want to search the whole matrix, but only the second row. So, since you're only searching within a given row, getting the column index in that row gives you the complete answer:
row←2
⎕←col←nlm[row;]⍳2
4
nlm[;col] ⍝ values in matched column
4 2

How to optimize search of rows x columns combination in a matrix?

Given a matrix of 1's and 0's, I want to find a combination of rows and columns with least or none 0's, maximizing the n_of_rows * n_of_columns picked.
For example, rows (0,1,2) and columns (0,1,3) have only one zero in col #0 row #1, and the rest 8 values are 1's.
1 1 0 1 0
0 1 1 1 0
1 1 0 1 1
0 0 1 0 0
The practical task is to search over thousands to millions of rows and columns, finding the maximal biclique in a bipartite graph: rows and columns can be viewed as vertices, and values as connections.
The problem is NP-complete, as far as I have learned.
Please advise an approach or algorithm that would speed up the task and reduce the CPU and memory requirements.
Not sure you could minimise this.
However, easy way to work this out would be...
Multiply your matrix by a column vector of n ones. This gives you the number of ones in each row. Next, multiply a row vector of n ones by your matrix (the vector goes in front of the matrix). This gives you the total of ones for each column. From there it's a pretty easy comparison.
For example, take the original matrix
1 0 1
0 1 1
0 0 0
Then do
1 0 1   1   2
0 1 1 x 1 = 2   (row totals)
0 0 0   1   0
and
        1 0 1
1 1 1 x 0 1 1 = 1 1 2   (column totals)
        0 0 0
N.B. the max sum is 2 (which you would keep track of as you work it out).
Actually given the following assumptions:
1. You don't care how many 0's are in each row or column
2. You don't need to keep track of their order....
Then you only really need to store values to count the total in each row/column as you read the values in and don't actually store the matrix itself.
If you are given the number of rows and columns prior to reading in the matrix you can do the following heuristics to reduce computational time...
Keep track of the current max. If the current row can no longer reach this max, stop counting for that row (but continue counting for the columns). The same applies, vice versa, to the columns.
But you still have a worst-case scenario in which all rows and columns have the same number of 1's and 0's. :)
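A small sketch of counting the totals while reading, in Python. It assumes the rows arrive one at a time as sequences of 0/1 values (any iterable of rows works, so the full matrix never has to be held in memory); the function name is illustrative.

```python
def row_and_column_totals(rows):
    """Return (row_totals, column_totals) for a stream of 0/1 rows."""
    row_totals = []
    column_totals = []
    for row in rows:
        row_totals.append(sum(row))
        if not column_totals:
            # The first row fixes the number of columns.
            column_totals = [0] * len(row)
        for j, value in enumerate(row):
            column_totals[j] += value
    return row_totals, column_totals


# The 3x3 example from above.
totals = row_and_column_totals([[1, 0, 1], [0, 1, 1], [0, 0, 0]])
print(totals)  # ([2, 2, 0], [1, 1, 2])
```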

How I can get the 'n' possible matrices from two vectors?

I've been searching for an algorithm for the solution of all possible matrices of dimension 'n' that can be obtained with two arrays, one of the sum of the rows, and another, of the sum of the columns of a matrix. For example, if I have the following matrix of dimension 7:
matriz= [ 1 0 0 1 1 1 0
1 0 1 0 1 0 0
0 0 1 0 1 0 0
1 0 0 1 1 0 1
0 1 1 0 1 0 1
1 1 1 0 0 0 1
0 0 1 0 1 0 1 ]
The sum of the columns are:
col= [4 2 5 2 6 1 4]
The sum of the rows are:
row = [4 3 2 4 4 4 3]
Now, I want to obtain all possible matrices of "ones and zeros" where the sum of the columns and the rows fulfil the condition of "col" and "row" respectively.
I would appreciate ideas that can help solve this problem.
One obvious way is to brute-force a solution: for the first row, generate all the possibilities that have the right sum, then for each of these, generate all the possibilities for the 2nd row, and so on. Once you have generated all the rows, you check if the sum of the columns is right. But this will take a lot of time. My math might be rusty at this time of the day, but I believe the number of distinct possibilities for a row of length n of which k bits are 1 is given by the binomial coefficient or nchoosek(n,k) in Matlab. To determine the total number of possibilities, you have to multiply this number for every row:
>> n = 7;
>> row= [4 3 2 4 4 4 3];
>> prod(arrayfun(@(k) nchoosek(n, k), row))
ans =
3.8604e+10
This is a lot of possibilities to check! Doing the same for the columns gives
>> col= [4 2 5 2 6 1 4];
>> prod(arrayfun(@(k) nchoosek(n, k), col))
ans =
555891525
Still a large number, but 'only' a factor 70 smaller.
It might be possible to improve this brute-force method a little bit by seeing if the later rows are already constrained by the previous rows. If in your example, for a particular combination of the first two rows, both rows have a 1 in the second column, the rest of this column should all be 0, since the sum must be 2. This reduces the number of possibilities for the remaining rows a bit. Implementing such checks might complicate things a bit, but they might make the difference between a calculation that takes 2 days or one that takes just 1 hour.
An optimized version of this might alternate between generating rows and columns, starting with those for which the number of possibilities is lowest. I don't know whether there is a more elegant solution than this brute-force method; I would be interested to hear of one.
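The row-by-row brute force with the column check can be sketched in Python as below. This is only an illustration of the pruning idea (written in Python rather than Matlab), the function name is made up, and the running time is still exponential, so it is practical only for small inputs.

```python
from itertools import combinations


def binary_matrices(row_sums, col_sums):
    """Yield every 0/1 matrix with the given row sums and column sums."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    remaining = list(col_sums)  # ones still needed in each column
    rows = []                   # rows chosen so far

    def place(r):
        if r == n_rows:
            if all(x == 0 for x in remaining):
                yield [list(row) for row in rows]
            return
        rows_left = n_rows - r - 1
        # A column that is already full cannot take another 1.
        open_cols = [j for j in range(n_cols) if remaining[j] > 0]
        for ones in combinations(open_cols, row_sums[r]):
            chosen = set(ones)
            # Prune: each column must still be fillable by the rows left.
            if any(remaining[j] - (j in chosen) > rows_left
                   for j in range(n_cols)):
                continue
            for j in ones:
                remaining[j] -= 1
            rows.append([1 if j in chosen else 0 for j in range(n_cols)])
            yield from place(r + 1)
            rows.pop()
            for j in ones:
                remaining[j] += 1

    yield from place(0)


print(list(binary_matrices([1, 1], [1, 1])))
# [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
```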

Nullify a 2D matrix with some set of operations

Given an N x M matrix having only positive integer values, we have to nullify the matrix
i.e make all entries 0.
We are given two operations
1) multiply each element of any one column at a time by 2.
2) Subtract 1 from all elements of any one row at a time
Find the minimum number of operations required to nullify the matrix.
I thought of doing something related to the LCM, but could not arrive at a solution.
Let's first solve for one row; we can then extend the approach to all rows. Let's take a random example:
6 11 5 13
The goal is to make all elements as 1. First we make 5 (smallest element) as 1. For this we need to subtract 4 (i.e subtract 1 four times). The resultant array is:
2 7 1 9
Now we multiply 1 with 2 and subtract all row elements by 1:
1 6 1 8
Next, we multiply 2 1's by 2 and subtract all row elements by 1:
1 5 1 7
Continuing in this manner, we get to 1 1 1 1. Now we subtract 1 to get 0 0 0 0.
Next, we move on to the other rows and do the same as above. The rows we have already nullified are all zeros, so multiplying a column by 2 while manipulating other rows does not change them.
The minimum number of operations also depends on the order in which we process the rows. I think the best choice is to start with the row whose maximum is the smallest among the remaining rows, but I still need to verify this.
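To make the single-row procedure concrete, here is a small Python sketch that replays it and counts the operations. It handles one row in isolation (the effect of column doublings on other rows is ignored), it counts the steps of this procedure rather than a proven minimum, and the function name is illustrative.

```python
def nullify_row_operations(row):
    """Count the operations the procedure above uses to zero a single row."""
    values = list(row)
    operations = 0

    # Subtract 1 until the smallest element becomes 1.
    first_subtractions = min(values) - 1
    values = [v - first_subtractions for v in values]
    operations += first_subtractions

    # Double every 1 (one column operation each), then subtract 1 from the row.
    while any(v > 1 for v in values):
        for j, v in enumerate(values):
            if v == 1:
                values[j] = 2
                operations += 1
        values = [v - 1 for v in values]
        operations += 1

    # Every element is now 1; one last subtraction gives all zeros.
    return operations + 1


print(nullify_row_operations([6, 11, 5, 13]))  # 30
```

For the example row this is 13 row subtractions plus 17 column doublings.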

Permuting rows in an array to eliminate increasing subsequences

The following problem is taken from Problems on Algorithms (Problem 653):
You are given an n x 2 matrix of numbers. Find an O(n log n) algorithm that permutes the rows in the array such that neither column of the array contains an increasing subsequence (that may not consist of contiguous array elements) longer than ⌈√n⌉.
I'm not sure how to solve this. I think that it might use some sort of divide-and-conquer recurrence, but I can't seem to find one.
Does anyone have any ideas how to solve this?
Here's my solution.
1) Sort the rows by their first element, from greatest to lowest.
1 6        5 1
3 3   -\   3 3
2 4   -/   2 4
5 1        1 6
2) Divide the rows into groups of ⌈√n⌉ rows each, plus whatever is left over (no more than ⌈√n⌉ groups in total).
5 1        5 1
3 3   -\   3 3
2 4   -/
1 6        2 4
           1 6
3) Within each group, sort the rows by their second element, from greatest to lowest.
5 1        3 3
3 3        5 1
      ->
2 4        1 6
1 6        2 4
Proof of correctness:
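An increasing subsequence in column 1 can only occur within a single group, because the groups themselves are in decreasing order of the first element, and each group has at most ⌈√n⌉ rows.
No two elements of an increasing subsequence in column 2 can lie in the same group, because each group is sorted in decreasing order of the second element, and there are no more than ⌈√n⌉ groups.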
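The three steps can be sketched in Python as follows, assuming the rows are given as a list of (first, second) pairs; the function name is illustrative. Both sorts together cost O(n log n).

```python
import math


def permute_rows(rows):
    """Reorder n (first, second) rows following the three steps above."""
    n = len(rows)
    if n == 0:
        return []

    # Group size: ceil(sqrt(n)), computed with integer arithmetic.
    size = math.isqrt(n)
    if size * size < n:
        size += 1

    # Step 1: sort by the first element, greatest to lowest.
    by_first = sorted(rows, key=lambda row: row[0], reverse=True)

    # Steps 2 and 3: cut into groups, then sort each group by the
    # second element, greatest to lowest.
    result = []
    for start in range(0, n, size):
        group = by_first[start:start + size]
        group.sort(key=lambda row: row[1], reverse=True)
        result.extend(group)
    return result


print(permute_rows([(1, 6), (3, 3), (2, 4), (5, 1)]))
# [(3, 3), (5, 1), (1, 6), (2, 4)]
```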
