Align two matrices for maximum overlap - algorithm

The following is an interview problem.
Given two N x N matrices whose entries are 0 or 1, how can we find the maximum possible number of overlapping 1s?
Note: You can only move a matrix up, down, left, or right; rotation is not allowed.
Currently I'm stuck at the most naive O(N^4) method, which aligns the top-left corner of one matrix with every possible position of the other matrix and counts the overlapping 1s.
Example:
   [0 1 0]     [0 0 1]
A: [1 0 0]  B: [0 0 1]
   [1 0 0]     [0 0 0]
Then the maximum number of overlapping 1s is 2: we align (0,2) of B with (1,0) of A, so that (0,2) and (1,0) are both 1, and (1,2) and (2,0) are both 1.
Can this be optimized below O(N^4)?
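For reference, the naive method described in the question can be sketched as follows. This is only a baseline under the question's assumptions (square 0/1 matrices, translation only); the function name `max_overlap_naive` and the offset convention (entry (i, j) of B lands on entry (i + dy, j + dx) of A) are illustrative choices, not part of the original post.

```python
def max_overlap_naive(a, b):
    """Try every translation of b over a and count coinciding 1s: O(N^4)."""
    n = len(a)
    best = 0
    for dy in range(-(n - 1), n):
        for dx in range(-(n - 1), n):
            count = 0
            # Only visit cells of b that still land inside a after the shift.
            for i in range(max(0, -dy), min(n, n - dy)):
                for j in range(max(0, -dx), min(n, n - dx)):
                    count += a[i + dy][j + dx] & b[i][j]
            best = max(best, count)
    return best


A = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
B = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
print(max_overlap_naive(A, B))
```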

If floating-point arithmetic is acceptable, this problem can be solved with 2D cross-correlation (computed internally with the fast Fourier transform) in O(N^2 log N) time. This method is used in 2D pattern searching.
A not-so-obvious tip: to implement correlation and get proper results, shift the values to make the "signals" bipolar (transform zeros to -1, or subtract the matrix average from all matrix elements).
Calculate the correlation matrix and find the index (dx,dy) of its maximum value; it should correspond to the alignment vector.
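A minimal sketch of the FFT approach with NumPy follows. It is an illustration, not the answerer's code: the function name `max_overlap_fft` and the offset convention are assumptions. It also correlates the raw 0/1 matrices rather than bipolar ones, because with 0/1 values each correlation entry is exactly the number of coinciding 1s for that shift, which is what the question asks for; the bipolar shift from the tip is more relevant when scoring pattern matches.

```python
import numpy as np


def max_overlap_fft(a, b):
    """Return (max overlap, (dy, dx)) where b[i, j] lands on a[i + dy, j + dx]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    size = 2 * n - 1  # zero-pad so the circular convolution does not wrap around
    # Cross-correlation = convolution of a with b flipped along both axes.
    fa = np.fft.rfft2(a, s=(size, size))
    fb = np.fft.rfft2(b[::-1, ::-1], s=(size, size))
    corr = np.fft.irfft2(fa * fb, s=(size, size))
    counts = np.rint(corr).astype(int)  # remove floating-point noise
    ky, kx = np.unravel_index(np.argmax(counts), counts.shape)
    # Index k in the padded result corresponds to a shift of k - (n - 1).
    return int(counts[ky, kx]), (int(ky) - (n - 1), int(kx) - (n - 1))


A = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
B = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
print(max_overlap_fft(A, B))
```

On the example from the question this reports an overlap of 2 at shift (1, -2), i.e. (0,2) of B placed on (1,0) of A.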

Related

Converting 2D integer matrix to binary tree

I have a 2D integer matrix and I have to convert it into a tree in order to perform matrix multiplication. Is there any way to do that?
Matrix:
[[1 2 1]
[2 1 2]
[2 2 2]]
The matrix can be square or rectangular.
I am trying to design an algorithm for computing the result of matrix multiplication with better time complexity, since most of the operations on a binary tree are O(h).

Sort matrix elements around the diagonal

I am looking for an algorithm that can sort the rows of a matrix so that the elements accumulate around the diagonal.
I will have a square matrix (around 80 rows/columns) containing only the values 0 and 1. There are algorithms that sort the rows in such a way that most of the elements with the value 1 are below the diagonal.
I need an algorithm that sorts the rows to minimize the mean distance of the elements to the diagonal.
Like so:
from:
0 1 0
1 0 1
1 1 0
to:
1 1 0
0 1 0
1 0 1
Since I am not familiar with this topic I hope that someone can help me. I am not looking for a complete solution. The name of such algorithm if it exists or a pseudo code would be sufficient.
Thanks a lot!
There is probably a more efficient way, but you could treat this problem as an assignment problem (trying to assign each row to a diagonal element).
This can be done in three steps:
1) Create a new matrix M where each entry M(i,j) contains the cost of assigning row i of your input matrix to the diagonal element j. For your example this matrix will be the following (average distance of the row's 1s to the diagonal element):
1   0   1
1   1   1
0.5 0.5 1.5
Example: M(0,0) = 1 is the average distance when assigning row 0 of the input matrix (0 1 0) to the diagonal element positioned at 0.
2) Run an algorithm to find the best assignment (e.g., the Hungarian algorithm). This will give you an optimal 1:1 matching between rows and columns that minimizes the sum of the costs in the matrix.
The result will be the elements (0,1), (1,2) and (2,0)
3) Rearrange your input matrix using this knowledge. So
row 0 -> row 1
row 1 -> row 2
row 2 -> row 0
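The three steps can be sketched in Python as below. This is an illustration under assumptions: the helper names `cost_matrix` and `best_row_order` are made up here, and the assignment in step 2 is solved by exhaustive search over permutations, which is only workable for tiny inputs such as the 3x3 example. For a matrix of around 80 rows, a real Hungarian-style solver (for instance `scipy.optimize.linear_sum_assignment`) would take the place of the `min(...)` call.

```python
from itertools import permutations


def cost_matrix(rows):
    """cost[i][j] = average distance of the 1s in row i to diagonal position j."""
    n = len(rows)
    cost = []
    for row in rows:
        ones = [c for c, v in enumerate(row) if v == 1]
        cost.append([
            sum(abs(c - j) for c in ones) / len(ones) if ones else 0.0
            for j in range(n)
        ])
    return cost


def best_row_order(rows):
    """Place each row at the position that minimizes the total assignment cost."""
    cost = cost_matrix(rows)
    n = len(rows)
    # perm[i] is the target position of input row i (brute force, small n only).
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    result = [None] * n
    for i, j in enumerate(best):
        result[j] = rows[i]
    return result


rows = [[0, 1, 0], [1, 0, 1], [1, 1, 0]]
for row in best_row_order(rows):
    print(row)
```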

Constrained maximization of the sum of square submatrices

I have an intensity map of an image from which I would like to select sub-regions with a large average value. To do this, I want to find the sub-regions which maximize the sum of the intensity map pixels covered by the sub-regions. To prevent an excessive number of returned sub-regions, a penalty is applied for each additional sub-region returned. Additionally, it is fine if two sub-regions overlap, but overlapping pixels count only once: the objective value is that of the union of the sub-regions.
More formally, suppose you have a matrix A containing non-negative values with dimensions m x n. You would like to cover the matrix with square sub-matrices with dimension s x s such that the sum of the values of A covered by the union of the area of the squares is maximized. For each square you add to the solution, a constant penalty p is subtracted from the objective value of the solution.
For instance, consider the following matrix:
0 0 0 0 0 0
0 1 2 2 1 0
0 1 2 2 2 0
0 0 0 0 0 0
0 3 0 0 0 0
with parameters p = 4 and s = 2. The optimal solution is the two squares S1 = [1, 2; 1, 2] and S2 = [2, 1; 2, 2] with coordinates (2:3,2:3) and (2:3,4:5) respectively (in Matlab notation). Note that in this example the greedy approach of incrementally adding the squares with maximum value until no squares can be added (without decreasing the objective value) fails.
One brute force way of solving it would be to check all possible combinations using exactly k squares. Starting from k =1, you would compute the optimal combination with exactly k squares, increment k and repeat until the objective value stops increasing. This is clearly very expensive.
You can precompute the sums of values of the (m-s+1)*(n-s+1) possible squares in time O(mn) using an integral image.
Is there an efficient solution to this?
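The integral-image precomputation mentioned in the question can be sketched as follows. The function name `square_sums` is an illustrative choice, and indices are 0-based here rather than Matlab's 1-based notation.

```python
def square_sums(a, s):
    """Sums of all s x s squares of a, computed in O(m*n) via an integral image."""
    m, n = len(a), len(a[0])
    # integral[i][j] holds the sum of a[0..i-1][0..j-1].
    integral = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            integral[i + 1][j + 1] = (a[i][j] + integral[i][j + 1]
                                      + integral[i + 1][j] - integral[i][j])
    # The square with top-left corner (i, j) follows from four lookups.
    return [[integral[i + s][j + s] - integral[i][j + s]
             - integral[i + s][j] + integral[i][j]
             for j in range(n - s + 1)]
            for i in range(m - s + 1)]


A = [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 2, 1, 0],
     [0, 1, 2, 2, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 3, 0, 0, 0, 0]]
sums = square_sums(A, 2)
print(sums[1][1], sums[1][2], sums[1][3])
```

For the example matrix, the squares at 0-based corners (1,1), (1,2) and (1,3) sum to 6, 8 and 7, which is why a greedy pick of the 8-valued square blocks the better pair.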
The problem is NP-hard. This can be proven by reduction from planar minimum vertex cover. The proof for the special case s=3, p=2, with A having only values 0 or 1, is identical to the proof for the other SO question.
As for the brute-force solution, it can be made more efficient if, instead of trying all combinations with increasing k, you add squares incrementally. When the objective value of the partial solution plus the sum of the not-yet-covered values is not greater than the best-so-far objective value, roll back to the last valid combination by removing the most recently added square(s) and try other squares. Avoid adding squares that add zero to the objective value. Also avoid adding sub-optimal squares: if, in the example from the OP, the partial solution contains square [1, 2; 1, 2], do not add square [2, 2; 2, 2], because [2, 1; 2, 2] would always be at least as good. And order the squares so that you quickly reach a good-enough solution; this lets the search terminate further attempts sooner.
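The incremental search with rollback can be sketched as a small branch-and-bound. This is a hedged illustration, not the answerer's implementation: `best_cover` is a hypothetical name, p is taken as a positive penalty, squares are tried in plain row-major order, and only two of the suggested prunings are applied (the upper bound from the not-yet-covered values, and skipping squares that add zero). Running time is still exponential in the worst case.

```python
def best_cover(a, s, p):
    """Return (objective, squares) maximizing covered sum minus p per square."""
    m, n = len(a), len(a[0])
    squares = [(i, j) for i in range(m - s + 1) for j in range(n - s + 1)]
    total = sum(map(sum, a))
    best = {"value": 0, "squares": []}  # the empty solution scores 0

    def cells(square):
        i, j = square
        return {(r, c) for r in range(i, i + s) for c in range(j, j + s)}

    def search(start, covered, covered_sum, chosen):
        value = covered_sum - p * len(chosen)
        if value > best["value"]:
            best["value"], best["squares"] = value, list(chosen)
        # Bound: even covering everything left for free cannot beat the best.
        if value + (total - covered_sum) <= best["value"]:
            return
        for k in range(start, len(squares)):
            new_cells = cells(squares[k]) - covered
            gain = sum(a[r][c] for r, c in new_cells)
            if gain == 0:  # a square that adds nothing only costs the penalty
                continue
            chosen.append(squares[k])
            search(k + 1, covered | new_cells, covered_sum + gain, chosen)
            chosen.pop()  # roll back and try the next square

    search(0, frozenset(), 0, [])
    return best["value"], best["squares"]


A = [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 2, 1, 0],
     [0, 1, 2, 2, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 3, 0, 0, 0, 0]]
print(best_cover(A, 2, 4))
```

On the OP's example with s = 2 and penalty 4 this finds the two squares at 0-based corners (1,1) and (1,3), covering 13 for an objective of 5.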

Determine if some row permutation of a matrix is Toeplitz

A Toeplitz matrix "is a matrix in which each descending diagonal from left to right is constant." Given a binary matrix M, is there an efficient algorithm to determine if there is a permutation of the rows which makes it Toeplitz?
For example, set
M = [0 1 1]
    [1 1 0]
    [1 0 1]
If you swap the first and second row you get
[1 1 0]
[0 1 1]
[1 0 1]
which is Toeplitz.
In Python you can make a random binary matrix as follows.
    import numpy as np

    n = 10
    h = 10
    M = np.random.randint(2, size=(h, n))
I would like to apply the test to M.
(Note the matrix M does not need to be square.)
This problem can be solved in linear O(h*w) time, where h is the number of rows and w is the number of columns.
Construct a graph where each vertex corresponds to a (w-1)-length substring that is either a prefix or a suffix of some row of the matrix. One vertex may correspond to several duplicate substrings. Connect these vertices with h edges. Each edge corresponds to a row of the matrix and is directed from the vertex for that row's prefix to the vertex for that row's suffix.
To determine whether some row permutation is a Toeplitz matrix, it is enough to check whether the constructed graph has an Eulerian path. To find the permutation itself, it is enough to find an Eulerian path in this graph.
We need some efficient way to interconnect vertices and edges. The straightforward approach compares each row-substring pair. This is not very interesting because of its O(h^2*w) time complexity.
Building a generalized suffix tree (or suffix array) for the rows of the matrix needs only O(h*w) time. This tree also allows interconnecting vertices and edges in linear time: each internal node with depth w-1 represents some (w-1)-length substring (vertex); each leaf attached to this node represents some row's suffix (incoming edge); and each leaf attached to this node's children represents some row containing this substring as a prefix (outgoing edge).
Another alternative is to use a hash map, with a (w-1)-length substring of a matrix row as the key and a pair of lists of row indexes (for rows where this substring is a prefix/suffix) as the value. Compared to the suffix tree/array approach, this allows a simpler implementation, needs less memory (each key needs only space for the hash value and a pointer to the beginning of the substring), and should work faster on average, but has inferior worst-case complexity: O(h^2*w).
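A sketch of the hash-map variant with an Eulerian-path search (Hierholzer's algorithm) is given below. It is an illustration under assumptions rather than the answerer's code: the function name `toeplitz_row_order` is made up, Python tuples serve as the hash keys (so each key costs O(w) to hash instead of a pointer plus hash value), and the return value is one valid top-to-bottom ordering or `None`.

```python
from collections import defaultdict


def toeplitz_row_order(rows):
    """Return the rows in an order that forms a Toeplitz matrix, or None."""
    rows = [tuple(r) for r in rows]
    if not rows:
        return []
    out_edges = defaultdict(list)  # prefix vertex -> rows (edges) leaving it
    balance = defaultdict(int)     # out-degree minus in-degree per vertex
    for r in rows:
        out_edges[r[:-1]].append(r)
        balance[r[:-1]] += 1
        balance[r[1:]] -= 1
    starts = [v for v, d in balance.items() if d == 1]
    if len(starts) > 1 or any(abs(d) > 1 for d in balance.values()):
        return None  # degree condition for an Eulerian path fails
    start = starts[0] if starts else rows[0][:-1]
    # Hierholzer: walk unused edges; emit an edge when its end vertex is exhausted.
    stack, order = [(start, None)], []
    while stack:
        vertex, edge = stack[-1]
        if out_edges[vertex]:
            r = out_edges[vertex].pop()
            stack.append((r[1:], r))
        else:
            stack.pop()
            if edge is not None:
                order.append(edge)
    if len(order) != len(rows):
        return None  # graph is not connected, some rows were never reached
    # Edges are emitted in reverse walk order, which is top-to-bottom row order.
    return [list(r) for r in order]


M = [[0, 1, 1], [1, 1, 0], [1, 0, 1]]
print(toeplitz_row_order(M))
```

Each step of the walk goes from a row's prefix to its suffix, so the next row walked is the one directly above; reversing the walk therefore lists the rows from top to bottom.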
One simple-minded approach that would work for small matrices is:
    Sort the rows of M
    For each choice of start row
        For each choice of end row
            construct a Toeplitz matrix T from the given start and end row
            Sort the rows of T and compare to M
            If you find a match then T is a permutation of M that is Toeplitz
This is based on the fact that a Toeplitz matrix is uniquely defined once you know the start and end rows.
However, this approach is not particularly efficient.
Example Python Code
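    M = [[0, 1, 1],
         [1, 1, 0],
         [1, 0, 1]]
    n = len(M)  # this example assumes a square matrix
    M2 = sorted(M)
    for start in M2:
        for end in M2:
            # v lists the diagonal values: the end (bottom) row followed by
            # the rest of the start (top) row
            v = end + start[1:]
            # Each row of T is a length-n window of v, sliding from right to left
            T = [v[s:s+n] for s in range(n-1, -1, -1)]
            if sorted(T) == M2:
                print('Found Toeplitz representation')
                print(T)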
prints
Found Toeplitz representation
[[0, 1, 1],
[1, 0, 1],
[1, 1, 0]]
Found Toeplitz representation
[[1, 0, 1],
[1, 1, 0],
[0, 1, 1]]
Found Toeplitz representation
[[1, 1, 0],
[0, 1, 1],
[1, 0, 1]]
You can run a preliminary check to eliminate impossible cases:
Compute the sum of each column of the matrix.
In any permutation of the rows, the values in a column stay in the same column.
So the difference between the sums of any two neighbouring columns should be at most 1.
Also, if i and i+1 are two neighbouring columns, then:
If sum(i+1) = sum(i) + 1, then we know that the bottom-most element in column i should be 0 and the top-most element in column (i+1) should be 1.
If sum(i+1) = sum(i) - 1, then we know that the bottom-most element in column i should be 1 and the top-most element in column (i+1) should be 0.
If sum(i+1) = sum(i), then we know that the bottom-most element in column i should be equal to the top-most element in column (i+1).
You can also conduct a similar check by summing the rows and seeing whether there is any permutation in which the difference between the sums of any two neighbouring rows is at most one.
Of course, you will still have to conduct some combinatorial search, but the above filter may reduce the number of cases to search.
This is because you now have to search for a pair of (candidate top and bottom) rows that satisfies the above 3 conditions for each pair of neighbouring columns.
Also, this optimization will not be very helpful if the number of rows is much larger than the number of columns.
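The column-sum part of this filter is cheap to express. The sketch below is only the necessary-condition check (a matrix that passes may still have no Toeplitz row permutation), and the function name `passes_column_sum_filter` is an illustrative choice.

```python
def passes_column_sum_filter(rows):
    """Necessary condition: neighbouring column sums differ by at most 1."""
    col_sums = [sum(col) for col in zip(*rows)]
    return all(abs(x - y) <= 1 for x, y in zip(col_sums, col_sums[1:]))


print(passes_column_sum_filter([[0, 1, 1], [1, 1, 0], [1, 0, 1]]))
print(passes_column_sum_filter([[1, 0, 0], [1, 0, 0], [1, 0, 0]]))
```

The reason the bound holds: in a Toeplitz matrix, column i+1 is column i shifted down by one, so it loses the bottom element of column i and gains one new top element.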

Algorithm to maximize the smallest diagonal element of a matrix

Suppose we are given a square matrix A. Our goal is to maximize the smallest diagonal element by row permutations. In other words, the given matrix A has n diagonal elements $d_i$ and thus a minimum $\min\{d_i\}$. Our purpose is to reach the matrix with the largest possible minimum diagonal element by row permutations.
This is like $\max \min\{d_i\}$ over all row permutations.
For example, suppose A = [4 3 2 1; 1 4 3 2; 2 1 4 3; 2.5 3.5 4.5 1.5]. The diagonal is [4, 4, 4, 1.5]. The minimum of the diagonal is 1.5. We can swap rows 3 and 4 to get a new matrix $\tilde{A}$ = [4 3 2 1; 1 4 3 2; 2.5 3.5 4.5 1.5; 2 1 4 3]. The new diagonal is [4, 4, 4.5, 3] with a new minimum of 3. In theory, this is the best result I can obtain because there seems to be no better option: 3 seems to be the $\max \min\{d_i\}$.
In my problem, n is much larger, like 1000. I know there are n! row permutations, so I cannot go through every permutation. I know a greedy algorithm will help: we start from the first row. If a_11 is not the largest in the first column, we swap a_11 with the largest element in the first column by a row permutation. Then we look at the second row, comparing a_22 with all remaining elements in the second column (except a_12), and swap a_22 if it is not the largest. We keep doing this until the last row.
Is there any better algorithm to do it?
This is similar to Minimum Euclidean Matching but they are not the same.
Suppose you wanted to know whether there was a better solution to your problem than 3.
Change your matrix to have a 1 for every element that is strictly greater than 3:
4   3   2   1          1 0 0 0
1   4   3   2          0 1 0 0
2.5 3.5 4.5 1.5   ->   0 1 1 0
2   1   4   3          0 0 1 0
Your problem can be interpreted as trying to find a perfect matching in the bipartite graph which has this binary matrix as its biadjacency matrix.
In this case, it is easy to see that there is no way of improving your result because there is no way of reordering rows to make the diagonal entry in the last column greater than 3.
For a larger matrix, there are efficient algorithms to find maximum matchings in bipartite graphs.
This suggests an algorithm:
Use bisection to find the largest value for which the generated graph has a perfect matching
The assignment corresponding to the perfect matching with the largest value will be equal to the best permutation of rows
EDIT
This Python code illustrates how to use the networkx library to determine whether the graph has a perfect matching for a particular cutoff value.
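    import networkx as nx

    A = [[4, 3, 2, 1],
         [1, 4, 3, 2],
         [2, 1, 4, 3],
         [2.5, 3.5, 4.5, 1.5]]
    cutoff = 3
    G = nx.DiGraph()
    for i, row in enumerate(A):
        # Unit-capacity edges from the source to each row and from each column to the sink
        G.add_edge('start', 'row' + str(i), capacity=1.0)
        G.add_edge('col' + str(i), 'end', capacity=1.0)
        for j, e in enumerate(row):
            # Row i may be matched to column j only if the entry exceeds the cutoff
            if e > cutoff:
                G.add_edge('row' + str(i), 'col' + str(j), capacity=1.0)
    # A perfect matching exists exactly when the maximum flow saturates every row
    if nx.maximum_flow_value(G, 'start', 'end') < len(A):
        print('No perfect matching')
    else:
        print('Has a perfect matching')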
For a random matrix of size 1000*1000 it takes about 1 second on my computer.
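The full bisection can be sketched without networkx as below. This is an assumption-laden illustration, not the answerer's code: the names `perfect_matching` and `max_min_diagonal` are hypothetical, the matching uses a simple recursive augmenting-path search (Kuhn's algorithm) instead of max flow, and the threshold test is `>=` so that the bisection returns the best diagonal minimum directly. For n around 1000 the recursion may need an iterative rewrite or a faster method such as Hopcroft-Karp.

```python
def perfect_matching(a, cutoff):
    """Return match_col (row placed at each position) using entries >= cutoff, or None."""
    n = len(a)
    match_col = [-1] * n

    def try_row(i, seen):
        # Look for a column for row i, re-routing earlier rows if necessary.
        for j in range(n):
            if a[i][j] >= cutoff and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    if all(try_row(i, [False] * n) for i in range(n)):
        return match_col
    return None


def max_min_diagonal(a):
    """Bisect over the distinct entries for the largest feasible cutoff."""
    values = sorted({v for row in a for v in row})
    lo, hi = 0, len(values) - 1  # the smallest entry is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if perfect_matching(a, values[mid]) is not None:
            lo = mid
        else:
            hi = mid - 1
    match_col = perfect_matching(a, values[lo])
    return values[lo], [a[i] for i in match_col]


A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
best, permuted = max_min_diagonal(A)
print(best)
for row in permuted:
    print(row)
```

Feasibility is monotone in the cutoff (a matching that works for a cutoff also works for any smaller one), which is what makes the bisection valid.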
Let $x_{ij}$ be 1 if row i is moved to row j and zero otherwise.
You're interested in the following integer program:
max z
subject to
\sum_{i=1}^n x_{ij} = 1 \forall j
\sum_{j=1}^n x_{ij} = 1 \forall i
\sum_{i=1}^n A[i,j] x_{ij} >= z \forall j
x_{ij} \in \{0,1\}
Then plug this into GLPK, Gurobi, or CPLEX. Alternatively, solve the IP using your own branch-and-bound solver.

Resources