Converting 2D integer matrix to binary tree - algorithm

I have a 2D integer matrix that I have to convert into a tree in order to perform matrix multiplication. Is there any way to do that?
Matrix:
[[1 2 1]
 [2 1 2]
 [2 2 2]]
The matrix can be square or rectangular.
I am trying to design an algorithm that computes the matrix product with better time complexity, since most operations on a binary tree are O(h), where h is the height of the tree.

Related

align 2 matrices for maximum overlap

The following is an interview problem.
Given two N×N matrices with entries 0 or 1, how can we find the maximum possible number of overlapping 1s?
Note: you can only move the matrix up, down, left, and right, so rotation is not allowed.
Currently I'm stuck at the most naive O(N^4) method: align the top-left corner of one matrix with every possible position of the other matrix and count all the overlapping 1s (see the sketch after the example).
Example:
   [0 1 0]      [0 0 1]
A: [1 0 0]   B: [0 0 1]
   [1 0 0]      [0 0 0]
Then the maximum number of overlapping 1s is 2: align (0,2) of B with (1,0) of A; then (0,2) of B and (1,0) of A are both 1, and (1,2) of B and (2,0) of A are both 1.
Can it be optimised below O(N^4)?
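For concreteness, here is a minimal sketch of that naive method (assuming NumPy; the two shift loops over O(N^2) offsets, each doing O(N^2) work, give the O(N^4) bound):

import numpy as np

# Brute-force baseline: try every relative shift of B against A and
# count coinciding 1s.
def max_overlap_naive(A, B):
    A, B = np.asarray(A), np.asarray(B)
    hA, wA = A.shape
    hB, wB = B.shape
    best = 0
    for dy in range(-hB + 1, hA):
        for dx in range(-wB + 1, wA):
            # Window where A and the shifted copy of B overlap.
            ai, aj = max(dy, 0), max(dx, 0)
            bi, bj = ai - dy, aj - dx
            h = min(hA - ai, hB - bi)
            w = min(wA - aj, wB - bj)
            if h > 0 and w > 0:
                best = max(best, int((A[ai:ai+h, aj:aj+w] & B[bi:bi+h, bj:bj+w]).sum()))
    return best

A = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
B = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
print(max_overlap_naive(A, B))  # 2, matching the example above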
If floating-point arithmetic is acceptable, this problem can be solved with 2D cross-correlation (computed internally via the fast Fourier transform) in O(N^2 log N) time. This method is used in 2D pattern searching.
A not-so-obvious tip: to implement correlation and get proper results for general pattern matching, shift the values to make the "signals" bipolar (transform zeros to -1, or subtract the matrix average from all elements).
Calculate the correlation matrix and find the index (dx, dy) of the maximum value; it corresponds to the alignment vector.
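A minimal sketch of this idea, assuming SciPy is available. Note that for this particular objective (counting coinciding 1s) the raw 0/1 cross-correlation already equals the overlap count at each shift, so the bipolar transform is not strictly needed here; it matters for general pattern matching where mismatches should be penalized:

import numpy as np
from scipy.signal import fftconvolve

def max_overlap_fft(A, B):
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Cross-correlation = convolution with B flipped in both axes;
    # fftconvolve computes it via FFT in O(N^2 log N).
    corr = fftconvolve(A, B[::-1, ::-1], mode='full')
    i, j = np.unravel_index(np.argmax(corr), corr.shape)
    # Convert the correlation index into a shift of B relative to A.
    dy, dx = i - (B.shape[0] - 1), j - (B.shape[1] - 1)
    return int(round(corr[i, j])), (dy, dx)

A = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
B = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
print(max_overlap_fft(A, B))  # (2, (1, -2)): two overlapping 1s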

Constrained maximization of the sum of square submatrices

I have an intensity map of an image from which I would like to select sub-regions with large average value. To do this, I want to find the sub-regions which maximize the sum of the intensity-map pixels they cover. To prevent an excessive number of returned sub-regions, a penalty is applied for each additional sub-region returned. Additionally, it is fine if two sub-regions overlap, but the objective value of an overlap only counts the union of the sub-regions.
More formally, suppose you have a matrix A containing non-negative values with dimensions m x n. You would like to cover the matrix with square sub-matrices with dimension s x s such that the sum of the values of A covered by the union of the area of the squares is maximized. For each square you add to the solution, a constant penalty p is subtracted from the objective value of the solution.
For instance, consider the following matrix:
0 0 0 0 0 0
0 1 2 2 1 0
0 1 2 2 2 0
0 0 0 0 0 0
0 3 0 0 0 0
with parameters p = 4 and s = 2. The optimal solution is the two squares S1 = [1, 2; 1, 2] and S2 = [2, 1; 2, 2] with coordinates (2:3,2:3) and (2:3,4:5) respectively (in Matlab notation), for an objective value of 6 + 7 - 2*4 = 5. Note that in this example the greedy approach of incrementally adding the square with maximum value until no square can be added (without decreasing the objective value) fails.
One brute-force way of solving it would be to check all possible combinations of exactly k squares: starting from k = 1, compute the optimal combination with exactly k squares, increment k, and repeat until the objective value stops increasing. This is clearly very expensive.
You can precompute the sums of values of the (m-s+1)*(n-s+1) possible squares in time O(mn) using an integral image.
Is there an efficient solution to this?
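For reference, a sketch of the integral-image precomputation mentioned above (assuming NumPy): after one O(mn) prefix-sum pass, the sum inside any s x s square takes four lookups.

import numpy as np

def square_sums(A, s):
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Integral image with a zero border: I[i, j] = sum of A[:i, :j].
    I = np.zeros((m + 1, n + 1))
    I[1:, 1:] = A.cumsum(axis=0).cumsum(axis=1)
    # sums[i, j] = sum of the s x s square whose top-left corner is (i, j).
    return I[s:, s:] - I[:-s, s:] - I[s:, :-s] + I[:-s, :-s]

A = [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 2, 1, 0],
     [0, 1, 2, 2, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 3, 0, 0, 0, 0]]
print(square_sums(A, 2)[1, 1])  # 6.0, the sum of S1 = [1, 2; 1, 2]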
The problem is NP-hard. This can be proven by reduction from planar minimum vertex cover; the proof for the special case s = 3, p = 2, and A having only values 0 or 1 is identical to the proof for another SO question.
As for the brute-force solution, it can be made more efficient if, instead of trying all combinations with increasing k, you add squares incrementally. When the objective value of the partial solution plus the sum of the not-yet-covered values is not greater than the best-so-far objective value, roll back to the last valid combination by removing the recently added square(s) and try other squares. Avoid adding squares that add nothing to the objective value. Also avoid adding sub-optimal squares: if, in the example from the OP, the partial solution contains the square [1, 2; 1, 2], do not add the square [2, 2; 2, 2], because [2, 1; 2, 2] would always be at least as good. Finally, reorder the squares so that you quickly reach a good-enough solution; this lets all further attempts terminate sooner. A sketch of this search appears below.
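A hypothetical sketch of that pruned search (exact, but exponential in the worst case), reusing A and the square_sums helper from the previous snippet; it implements the bound and the zero-gain rule described above, but not the dominated-square rule:

import numpy as np

def best_squares(A, s, p):
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    sums = square_sums(A, s)
    # Keep only squares with positive raw sum, ordered by decreasing sum
    # so that strong incumbents (and hence strong bounds) appear early.
    order = sorted(((i, j) for i in range(m - s + 1) for j in range(n - s + 1)
                    if sums[i, j] > 0), key=lambda c: -sums[c])
    best = {'value': 0.0, 'squares': []}

    def search(idx, covered, value, chosen):
        if value > best['value']:
            best['value'], best['squares'] = value, list(chosen)
        # Bound: even covering every still-uncovered cell penalty-free
        # cannot beat the incumbent.
        if idx == len(order) or value + A[~covered].sum() <= best['value']:
            return
        i, j = order[idx]
        gain = A[i:i+s, j:j+s][~covered[i:i+s, j:j+s]].sum() - p
        if gain > 0:  # skip squares that add nothing to the objective
            patch = covered[i:i+s, j:j+s].copy()
            covered[i:i+s, j:j+s] = True
            chosen.append((i, j))
            search(idx + 1, covered, value + gain, chosen)
            chosen.pop()                    # roll back and try without it
            covered[i:i+s, j:j+s] = patch
        search(idx + 1, covered, value, chosen)

    search(0, np.zeros((m, n), dtype=bool), 0.0, [])
    return best['value'], best['squares']

print(best_squares(A, 2, 4))  # (5.0, [(1, 3), (1, 1)]) on the example matrix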

divide and conquer - finding the median for an array

Say we have an array of size 2n consisting of unique elements.
Assume we split the array into 2 arrays of size n, and we have a special constant time lookup to find the kth smallest element for that particular array if 1 <= k <= n, so for [4 5 6], k=2 returns 5.
Then what is the Θ(log(n)) algorithm for finding the median? The median is defined as the nth lowest element between the 2 arrays. If the array were [1 2 3 4 5 6], the median would typically be (3+4)/2, but here we just choose the smaller of the two middle elements, which is 3.
My attempt:
2n = 6 [1 2 3 4 5 6]
n = 3 [1 2 3] [4 5 6] (not necessarily sorted, but we have the constant time lookup, so sorting is irrelevant)
Step 1) use lookup where k = n to find the kth smallest element for each array
[1 2 3] [4 5 6]
     ^       ^    (if k = 3, we get 3 for the first array and 6 for the second)
Step 2) compare the 2 values we got and choose the smaller one. 3 is the median, where the median is defined as the nth lowest element between the 2 arrays.
First off, is this the correct algorithm for Θ(log(n)) time? Secondly, what would the proof of correctness (that it finds the median) look like? I believe it would be by induction.
Selection (of which median computation is a special case) cannot be solved in O(log n) time on an unsorted array, since every element may need to be examined. You can solve it in expected O(n) time using an algorithm such as Quickselect.
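For completeness, a minimal Quickselect sketch (expected O(n); the names are illustrative):

import random

# Return the k-th smallest element (1-indexed) of a list of distinct values.
def quickselect(a, k):
    pivot = random.choice(a)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    if k <= len(lo):
        return quickselect(lo, k)
    if k > len(a) - len(hi):
        return quickselect(hi, k - (len(a) - len(hi)))
    return pivot

# The n-th smallest of the 2n elements from the question:
print(quickselect([1, 2, 3] + [4, 5, 6], 3))  # 3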

Determine if some row permutation of a matrix is Toeplitz

A Toeplitz matrix "is a matrix in which each descending diagonal from left to right is constant." Given a binary matrix M, is there an efficient algorithm to determine if there is a permutation of the rows which makes it Toeplitz?
For example, set
M = [0 1 1]
    [1 1 0]
    [1 0 1]
If you swap the first and second row you get
[1 1 0]
[0 1 1]
[1 0 1]
which is Toeplitz.
In Python you can make a random binary matrix as follows.
import numpy as np

n = 10
h = 10
M = np.random.randint(2, size=(h, n))
I would like to apply the test to M.
(Note the matrix M does not need to be square.)
This problem can be solved in linear O(h*w) time, where h is the number of rows and w is the number of columns.
Construct a graph where each vertex corresponds to a (w-1)-length substring that may be either a prefix or a suffix of some row in the matrix. One vertex may correspond to several duplicate substrings. Connect these vertices with h edges, one per row of the matrix, directed from the vertex corresponding to the row's prefix to the vertex corresponding to the row's suffix.
To determine whether some row permutation is a Toeplitz matrix, it is enough to check whether the constructed graph has an Eulerian path. To find the permutation itself, it is enough to find such a path.
We need some efficient way to interconnect vertices and edges. The straightforward approach of comparing each row-substring pair is not very interesting because of its O(h^2*w) time complexity.
Building a generalized suffix tree (or suffix array) for the rows of the matrix needs only O(h*w) time, and the tree lets vertices and edges be interconnected in linear time: each internal node at depth w-1 represents some (w-1)-length substring (a vertex); each leaf attached to this node represents some row's suffix (an incoming edge); and each leaf attached to this node's children represents some row containing this substring as a prefix (an outgoing edge).
An alternative is a hash map, with the (w-1)-length substring of a matrix row as the key and a pair of lists of row indexes (for the rows where this substring is a prefix/suffix) as the value. Compared to the suffix tree/array approach, this allows a simpler implementation, needs less memory (each key needs only space for the hash value and a pointer to the beginning of the substring), and should work faster on average, but it has an inferior worst-case complexity of O(h^2*w).
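A minimal sketch of the hash-map variant (it only decides existence via the Eulerian-path conditions; recovering the permutation would require actually walking such a path):

from collections import defaultdict

def has_toeplitz_row_permutation(M):
    rows = [tuple(r) for r in M]
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    und = defaultdict(set)  # undirected view, for the connectivity check
    for r in rows:
        pre, suf = r[:-1], r[1:]  # the two (w-1)-length substrings
        out_deg[pre] += 1
        in_deg[suf] += 1
        und[pre].add(suf)
        und[suf].add(pre)
    # Directed Eulerian path: at most one vertex with out-in = 1, at most
    # one with in-out = 1, every other vertex balanced.
    starts = ends = 0
    for v in set(out_deg) | set(in_deg):
        d = out_deg[v] - in_deg[v]
        if d == 1:
            starts += 1
        elif d == -1:
            ends += 1
        elif d != 0:
            return False
    if starts > 1 or ends > 1:
        return False
    # All vertices carrying edges must form a single connected component.
    verts = list(und)
    seen, stack = {verts[0]}, [verts[0]]
    while stack:
        for w in und[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(verts)

print(has_toeplitz_row_permutation([[0, 1, 1], [1, 1, 0], [1, 0, 1]]))  # True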
One simple-minded approach that would work for small matrices is:
Sort the rows of M
For each choice of start row
    For each choice of end row
        Construct a Toeplitz matrix T from the given start and end row
        Sort the rows of T and compare to M
        If you find a match then T is a permutation of M that is Toeplitz
This is based on the fact that a Toeplitz matrix is uniquely defined once you know the start and end rows.
However, this approach is not particularly efficient.
Example Python Code
M = [[0, 1, 1],
     [1, 1, 0],
     [1, 0, 1]]
n = len(M)
M2 = sorted(M)
for start in M2:
    for end in M2:
        # v concatenates a candidate bottom row (end) with the tail of a
        # candidate top row (start); the sliding windows of v, taken in
        # reverse, are the rows of a Toeplitz matrix.
        v = end + start[1:]
        T = [v[s:s+n] for s in range(n-1, -1, -1)]
        if sorted(T) == M2:
            print('Found Toeplitz representation')
            print(T)
prints
Found Toeplitz representation
[[0, 1, 1],
[1, 0, 1],
[1, 1, 0]]
Found Toeplitz representation
[[1, 0, 1],
[1, 1, 0],
[0, 1, 1]]
Found Toeplitz representation
[[1, 1, 0],
[0, 1, 1],
[1, 0, 1]]
You can conduct a preliminary elimination check:
Compute the column-wise sum of every column of the matrix.
In any permutation of the rows, values stay in their original columns, so the column sums are invariant.
In a Toeplitz matrix, column i+1 is column i shifted down one row (one new top element, one dropped bottom element), so the difference between the sums of any two neighbouring columns can be at most 1.
Also, if i and i+1 are two neighbouring columns, then:
If sum(i+1) = sum(i) + 1, the bottom-most element in column i must be 0 and the top-most element in column i+1 must be 1.
If sum(i+1) = sum(i) - 1, the bottom-most element in column i must be 1 and the top-most element in column i+1 must be 0.
If sum(i+1) = sum(i), the bottom-most element in column i must equal the top-most element in column i+1.
You can also conduct a similar check by summing the rows and seeing whether there is a permutation in which the difference between the sums of any two neighbouring rows is at most one.
Of course, you will still have to conduct some combinatorial search, but the above filter may reduce the search scenarios.
This is because you now have to search for a pair of (candidate top and bottom) rows that satisfies the above 3 conditions for each pair of neighbouring columns.
Also, this optimization will not be very helpful if the number of rows is much larger than the number of columns.
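A small sketch of the neighbouring-column-sum part of this filter (assuming NumPy); the three boundary conditions above would be checked on top of it:

import numpy as np

def passes_column_sum_filter(M):
    # Column sums are invariant under row permutation, and in a Toeplitz
    # matrix column i+1 is column i shifted down one row, so neighbouring
    # column sums of a binary matrix may differ by at most 1.
    sums = np.asarray(M).sum(axis=0)
    return bool(np.all(np.abs(np.diff(sums)) <= 1))

print(passes_column_sum_filter([[0, 1, 1], [1, 1, 0], [1, 0, 1]]))  # True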

Algorithm - find the smallest subset of cells representing all the rows

I have several lists, which you can consider as rows of integers.
For example :
[1 3 5]
[3 7]
[3 5 7]
[1 5 9]
[3 9]
[1 7]
[5 9 11]
I wish to find the smallest possible set of integers represented on these rows, so that :
each row has at least one of the selected integers,
in case of cardinality ties, select the set having the highest sum.
In my example, I believe the result should be [5 7 9] (preferred to [3 5 7] or [1 3 11] or ... many possibilities).
The second part (selecting the highest sum) is trivial, but generating all minimum-cardinality subsets seems hard.
Do you know a good way to achieve this?
Edit
The size of the data grows slowly with iterations, but I need exact results.
The minimum-cardinality version is NP-complete: Set Cover can be reduced to it. Requiring the maximum-sum set among those only makes the problem harder.
Btw, the other answer talking about boolean satisfiability is wrong! You need to reduce boolean satisfiability to this problem to show NP-Completeness, not the other way round.
Set cover is basically:
Given a collection of sets S1, S2, ..., Sn of subsets of a set X, find the smallest sub-collection (in terms of the number of sets) whose union covers all the elements in S1 U S2 U ... U Sn.
To reduce this to our problem,
Let S = S1 U S2 U ... U Sn = {x1, x2, ..., xm}.
Let C_i = { j such that x_i is in S_j }.
Feed the rows C_i to our problem.
Now if our problem were solvable in polynomial time, i.e. we could find a minimum-cardinality set of values hitting every C_i, then we could find a minimum set cover for the S_j, and vice versa.
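To illustrate the construction, a tiny sketch (the sets S_j here are made up for the example):

S = [{1, 2}, {2, 3}, {3, 4}]  # hypothetical Set Cover instance
universe = sorted(set().union(*S))
# One row C_i per element x_i, listing which sets S_j contain it.
C = [[j for j, Sj in enumerate(S) if x in Sj] for x in universe]
print(C)  # [[0], [0, 1], [1, 2], [2]]; the minimum hitting set {0, 2}
          # corresponds to the minimum set cover {S1, S3}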
This can normally be solved as an integer programming problem (which is NP-Hard too).
For approximate solutions, this can be viewed as a linear programming problem (which has polynomial time algorithms) and randomized rounding can be done to convert fractional values (solutions to the LP) to integers.
Also, unfortunately, it has been shown that this is NP-hard to approximate to within any constant factor (the best achievable ratio is on the order of log n).
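Since the OP needs exact results on small, slowly growing data, here is a brute-force sketch that searches by increasing cardinality and breaks ties by sum (exponential in the worst case, as expected for an NP-complete problem):

from itertools import combinations

def smallest_hitting_set(rows):
    universe = sorted(set().union(*map(set, rows)))
    for k in range(1, len(universe) + 1):
        best = None
        for cand in combinations(universe, k):
            chosen = set(cand)
            # Every row must contain at least one selected integer.
            if all(chosen & set(row) for row in rows):
                if best is None or sum(cand) > sum(best):
                    best = cand
        if best is not None:
            return list(best)
    return []

rows = [[1, 3, 5], [3, 7], [3, 5, 7], [1, 5, 9], [3, 9], [1, 7], [5, 9, 11]]
print(smallest_hitting_set(rows))  # [5, 7, 9]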
