Constrained maximization of the sum of square submatrices - algorithm

I have an intensity map of an image from which I would like to select sub-regions with a large average value. To do this, I want to find the sub-regions that maximize the sum of the intensity map pixels they cover. To prevent an excessive number of returned sub-regions, a penalty is applied for each additional sub-region returned. Two sub-regions may overlap, but pixels in the overlap count only once: the objective value is that of the union of the sub-regions.
More formally, suppose you have a matrix A containing non-negative values with dimensions m x n. You would like to cover the matrix with square sub-matrices with dimension s x s such that the sum of the values of A covered by the union of the area of the squares is maximized. For each square you add to the solution, a constant penalty p is subtracted from the objective value of the solution.
For instance, consider the following matrix:
0 0 0 0 0 0
0 1 2 2 1 0
0 1 2 2 2 0
0 0 0 0 0 0
0 3 0 0 0 0
with parameters p = 4 and s = 2. The optimal solution is the two squares S1 = [1, 2; 1, 2] and S2 = [2, 1; 2, 2] with coordinates (2:3,2:3) and (2:3,4:5) respectively (in Matlab notation), giving an objective value of 6 + 7 - 2*4 = 5. Note that in this example the greedy approach of incrementally adding the square with maximum value until no square can be added without decreasing the objective value fails: it first picks [2, 2; 2, 2] (value 8 - 4 = 4), after which no remaining square adds more than the penalty.
One brute-force way of solving it would be to check all possible combinations using exactly k squares. Starting from k = 1, you would compute the optimal combination with exactly k squares, increment k, and repeat until the objective value stops increasing. This is clearly very expensive.
You can precompute the sums of values of the (m-s+1)*(n-s+1) possible squares in time O(mn) using an integral image.
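As a minimal sketch of that precomputation (the function name `square_sums` and the 0-based indexing are assumptions made for illustration, not part of the question), the integral image can be built in one pass and each s x s sum read off with four lookups:

```python
def square_sums(A, s):
    """Return sums[r][c] = sum of the s x s square of A whose top-left cell is (r, c)."""
    m, n = len(A), len(A[0])
    # I[i][j] holds the sum of A[0..i-1][0..j-1]; the extra row/column of zeros
    # removes the need for boundary checks.
    I = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            I[i + 1][j + 1] = A[i][j] + I[i][j + 1] + I[i + 1][j] - I[i][j]
    return [[I[r + s][c + s] - I[r][c + s] - I[r + s][c] + I[r][c]
             for c in range(n - s + 1)]
            for r in range(m - s + 1)]


A = [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 2, 1, 0],
     [0, 1, 2, 2, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 3, 0, 0, 0, 0]]
sums = square_sums(A, 2)
print(sums[1][2])  # 8, the square [2, 2; 2, 2]
```

Building the table costs O(mn), and each of the (m-s+1)*(n-s+1) square sums is then O(1).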
Is there an efficient solution to this?

The problem is NP-hard. This can be proven by reduction from planar minimum vertex cover. The proof for the special case s=3, p=2, with A having only values 0 or 1, is identical to the proof for another SO question.
As for the brute-force solution, it can be made more efficient if, instead of trying all combinations with increasing k, you add squares incrementally. When the objective value of the partial solution plus the sum of not-yet-covered values is not greater than the best-so-far objective value, roll back to the last valid combination by removing the most recently added square(s) and try other squares. Avoid adding squares that add zero to the objective value. Also avoid adding sub-optimal squares: if, in the example from the OP, the partial solution contains square [1, 2; 1, 2], do not add square [2, 2; 2, 2], because [2, 1; 2, 2] would always be at least as good. Finally, reorder the squares so that you reach a good enough solution quickly; this lets you prune all further attempts sooner.
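A rough sketch of the incremental search with the rollback bound might look like the following. The function name `best_cover`, the 0-based top-left corners, and the assumption that the penalty is given as a non-negative number p are all illustrative choices. It only implements the bound and the skipping of zero-gain squares; the sub-optimal-square test and the reordering heuristic are left out.

```python
def best_cover(A, s, p):
    """Return (best objective, top-left corners) for s x s squares with penalty p >= 0."""
    m, n = len(A), len(A[0])
    corners = [(r, c) for r in range(m - s + 1) for c in range(n - s + 1)]
    total = sum(sum(row) for row in A)
    best = {"value": 0, "squares": []}  # choosing no square at all scores 0

    def search(start, covered, covered_sum, value, chosen):
        if value > best["value"]:
            best["value"], best["squares"] = value, list(chosen)
        # Bound: even if everything still uncovered were gained for free,
        # this branch could not beat the best solution so far, so roll back.
        if value + (total - covered_sum) <= best["value"]:
            return
        for k in range(start, len(corners)):
            r, c = corners[k]
            new = {(i, j) for i in range(r, r + s) for j in range(c, c + s)} - covered
            gain = sum(A[i][j] for i, j in new)
            if gain == 0:
                continue  # this square adds nothing to the objective
            chosen.append(corners[k])
            search(k + 1, covered | new, covered_sum + gain, value + gain - p, chosen)
            chosen.pop()

    search(0, set(), 0, 0, [])
    return best["value"], best["squares"]


A = [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 2, 1, 0],
     [0, 1, 2, 2, 2, 0],
     [0, 0, 0, 0, 0, 0],
     [0, 3, 0, 0, 0, 0]]
print(best_cover(A, 2, 4))  # (5, [(1, 1), (1, 3)])
```

This is still exponential in the worst case, as expected for an NP-hard problem; the bound only trims the search.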


How to solve Twisty Movement from Codeforces?

I've read the editorial, but it's very short and makes a claim I don't understand. Why is the problem equivalent to finding the longest subsequence of the form 1*2*1*2*? Can someone explain the solution step by step and justify the claims at every step?
Here is the 'solution' from the editorial, but as I said it's very short and I don't understand it. Hope someone can guide me to the solution step by step justifying the claims along the way, not like in the solution here. Thanks.
Since 1 ≤ a_i ≤ 2, it's equivalent to find a longest subsequence like
1 * 2 * 1 * 2 * . By an easy dynamic programming we can find it in
O(n) or O(n^2) time. You can see O(n^2) solution in the model
solution below. Here we introduce an O(n) approach: Since the
subsequence can be split into 4 parts (11...22...11...22...) , we
can set dp[i][j] (i = 1...n, j = 0..3) be the longest subsequence of
a[1...i] with first j parts.
I also think that the cited explanation is not super clear. Here is another take.
You can collapse an original array
1 1 2 2 2 1 1 2 2 1
into a weighted array
2 3 2 2 1
^ ^ ^ ^ ^
1 2 1 2 1
where numbers at the top represent lengths of contiguous strips of repeated values in the original array.
We can convince ourselves that
The optimal flip does not "break up" any contiguous sequences.
The optimal flip starts and ends with different values (i.e. starts with 1 and ends with 2, or starts with 2 and ends with 1).
Hence, the weighted array contains enough information to solve the problem. We want to flip a contiguous slice of the weighted array s.t. the sum of weights associated with some contiguous monotonic sequence is maximized.
Specifically, we want to perform the flip in such a way that some contiguous monotonic sequence 112, 122, 211 or 221 has maximum weight.
One way to do this with dynamic programming is by creating 4 auxiliary arrays.
A[i] : maximal weight of any 1 to the right of i.
B[i] : maximal weight of any 1 to the left of i.
C[i] : maximal weight of any 2 to the right of i.
D[i] : maximal weight of any 2 to the left of i.
Let's assume that if any of A,B,C,D is accessed out of bounds, the returned value is 0.
We initialize x = 0 and do one pass through the array Arr = [1, 2, 1, 2, 1] with weights W = [2, 3, 2, 2, 1]. At each index i, we have 2 cases:
Arr[i:i+2] == 1 2. In this case we set
x = max(x, W[i] + W[i+1] + C[i+1], W[i] + W[i+1] + B[i-1]).
Arr[i:i+2] == 2 1. In this case we set
x = max(x, W[i] + W[i+1] + A[i+1], W[i] + W[i+1] + D[i-1]).
The resulting x is our answer. This is an O(N) solution.
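For comparison, the four-part dynamic programme quoted from the editorial can be sketched in a few lines. The reason the pattern 1*2*1*2* is the right target: if a subsequence has that shape, reversing the stretch of the array that spans its middle 2* and 1* parts turns it into 1*1*2*2*, which is non-decreasing; conversely, any non-decreasing subsequence obtained after one reversal reads as 1*2*1*2* in the original order. The function name `longest_1212` is an assumption for illustration.

```python
def longest_1212(a):
    """Length of the longest subsequence of a (values 1 or 2) of the form 1*2*1*2*."""
    # dp[j]: best length so far using only parts 0..j of the pattern 1*, 2*, 1*, 2*.
    dp = [0, 0, 0, 0]
    for x in a:
        dp[0] += (x == 1)
        # Either leave part j empty (inherit dp[j-1]) or extend part j with x.
        dp[1] = max(dp[0], dp[1] + (x == 2))
        dp[2] = max(dp[1], dp[2] + (x == 1))
        dp[3] = max(dp[2], dp[3] + (x == 2))
    return dp[3]


print(longest_1212([1, 1, 2, 2, 2, 1, 1, 2, 2, 1]))  # 9
```

Only the previous state is needed at each step, so the dp[i][j] table collapses to four counters and the pass is O(n).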

Align 2 matrices for maximum overlap

So following is an interview problem.
Given two N x N matrices with entries being 0 or 1, how can we find the maximum possible number of overlapping 1s?
Note: You can only move the matrix upward, downward, leftward and rightward, so rotation is not allowed
Currently I'm stuck at the most naive O(N^4) method, which is to align the top left corner of one matrix to every possible position of the other matrix and count all the overlapping 1s. For example, given
   [0 1 0]     [0 0 1]
A: [1 0 0]  B: [0 0 1]
   [1 0 0]     [0 0 0]
the maximum number of overlapping 1s is 2: we align (0,2) of B to (1,0) of A, so that (0,2) of B and (1,0) of A are both 1, and (1,2) of B and (2,0) of A are both 1.
Can it be optimised from O(N^4)?
If floating-point arithmetic is acceptable, this problem can be solved with 2D cross-correlation (using the fast Fourier transform internally) in O(n^2 log n) time. This method is used in 2D pattern searching.
A not-so-obvious tip: to implement the correlation and get proper results, one should shift the values to make the "signals" bipolar (transform zeros to -1, or subtract the matrix average from all matrix elements).
Calculate the correlation matrix and find the index (dx, dy) of its maximum value; it should correspond to the alignment vector.
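To make the correlation table concrete, here is a small sketch that evaluates it directly instead of through an FFT, so it is not the O(n^2 log n) method itself and degrades to O(N^4) on dense inputs. It works on the raw 0/1 entries, since the plain correlation of two 0/1 matrices at a given shift is the number of overlapping 1s at that shift. The function name `max_overlap` and the (dy, dx) shift convention are assumptions for illustration.

```python
from collections import Counter


def max_overlap(A, B):
    """Return (count, (dy, dx)): the best number of overlapping 1s and the shift of B."""
    ones_a = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v]
    ones_b = [(i, j) for i, row in enumerate(B) for j, v in enumerate(row) if v]
    # Every pair (1 in A, 1 in B) votes for the shift that stacks the two cells;
    # the tally per shift is the cross-correlation value at that shift.
    votes = Counter((ia - ib, ja - jb) for ia, ja in ones_a for ib, jb in ones_b)
    if not votes:
        return 0, (0, 0)  # one of the matrices has no 1s
    shift, count = votes.most_common(1)[0]
    return count, shift


A = [[0, 1, 0],
     [1, 0, 0],
     [1, 0, 0]]
B = [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(max_overlap(A, B))  # (2, (1, -2)): move B one row down and two columns left
```

The cost is proportional to the product of the numbers of 1s in the two matrices, which is cheap when the matrices are sparse.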

Algorithm to maximize the smallest diagonal element of a matrix

Suppose we are given a square matrix A. Our goal is to maximize the smallest diagonal element by row permutations. In other words, for the given matrix A, we have n diagonal elements $d_i$ and thus a minimum $\min\{d_i\}$. Our purpose is to reach the matrix with the largest possible minimum diagonal element by row permutations.
This is like $\max \min\{d_i\}$ over all row permutations.
For example, suppose A = [4 3 2 1; 1 4 3 2; 2 1 4 3; 2.5 3.5 4.5 1.5]. The diagonal is [4, 4, 4, 1.5]. The minimum of the diagonal is 1.5. We can swap rows 3 and 4 to get a new matrix \tilde{A} = [4 3 2 1; 1 4 3 2; 2.5 3.5 4.5 1.5; 2 1 4 3]. The new diagonal is [4, 4, 4.5, 3] with a new minimum of 3. In theory, this is the best result I can obtain because there seems to be no better option: 3 seems to be the max min{d_i}.
In my problem, n is much larger, like 1000. I know there are n! row permutations, so I cannot go through each permutation. I know a greedy algorithm will help: we start from the first row. If a_11 is not the largest in the first column, we swap a_11 with the largest element in the first column by row permutation. Then we look at the second row, comparing a_22 with all remaining elements in the second column (except a_12), and swap a_22 if it is not the largest. We keep doing this until the last row.
Is there any better algorithm to do it?
This is similar to Minimum Euclidean Matching but they are not the same.
Suppose you wanted to know whether there was a better solution to your problem than 3.
Change your matrix to have a 1 for every element that is strictly greater than 3:
4   3   2   1            1 0 0 0
1   4   3   2            0 1 0 0
2.5 3.5 4.5 1.5    ->    0 1 1 0
2   1   4   3            0 0 1 0
Your problem can be interpreted as trying to find a perfect matching in the bipartite graph which has this binary matrix as its biadjacency matrix.
In this case, it is easy to see that there is no way of improving your result because there is no way of reordering rows to make the diagonal entry in the last column greater than 3.
For a larger matrix, there are efficient algorithms to determine maximum matchings in bipartite graphs.
This suggests an algorithm:
Use bisection to find the largest value for which the generated graph has a perfect matching
The assignment corresponding to the perfect matching with the largest value will be equal to the best permutation of rows
This Python code illustrates how to use the networkx library to determine whether the graph has a perfect matching for a particular cutoff value.
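import networkx as nx
A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
cutoff = 3
G = nx.DiGraph()
for i, row in enumerate(A):
    G.add_edge('start', ('row', i), capacity=1)  # source -> each row
    G.add_edge(('col', i), 'end', capacity=1)    # each column -> sink
    for j, e in enumerate(row):
        if e > cutoff:
            G.add_edge(('row', i), ('col', j), capacity=1)
# A perfect matching exists exactly when the maximum flow saturates every row.
if nx.maximum_flow_value(G, 'start', 'end') < len(A):
    print('No perfect matching')
else:
    print('Has a perfect matching')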
For a random matrix of size 1000*1000 it takes about 1 second on my computer.
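The bisection itself can be sketched end to end without networkx. This version swaps in a plain augmenting-path bipartite matching, and it treats entries greater than or equal to the threshold as edges (rather than strictly greater than a cutoff) so that it can bisect over the matrix's own values. The function names are hypothetical.

```python
def perfect_matching(A, threshold):
    """Return match[j] = row placed in position j using only entries >= threshold, or None."""
    n = len(A)
    match = [-1] * n

    def augment(i, seen):
        # Try to give row i a column, re-routing earlier rows if needed.
        for j in range(n):
            if A[i][j] >= threshold and not seen[j]:
                seen[j] = True
                if match[j] == -1 or augment(match[j], seen):
                    match[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None
    return match


def max_min_diagonal(A):
    """Bisect over the distinct entries of A for the largest feasible threshold."""
    values = sorted({e for row in A for e in row})
    lo, hi = 0, len(values) - 1
    best = perfect_matching(A, values[0])  # the smallest entry is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        match = perfect_matching(A, values[mid])
        if match is not None:
            lo, best = mid, match
        else:
            hi = mid - 1
    return values[lo], best


A = [[4, 3, 2, 1],
     [1, 4, 3, 2],
     [2, 1, 4, 3],
     [2.5, 3.5, 4.5, 1.5]]
value, rows = max_min_diagonal(A)
print(value)  # 3
```

Here `rows[j]` is the original row that ends up in position j. The recursive matching is fine for small inputs; for n around 1000 an iterative or Hopcroft-Karp matching would be a safer choice, because the recursion depth can approach n.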
Let $x_{ij}$ be 1 if row i is moved to row j and zero otherwise.
You're interested in the following integer program:
max z
subject to
\sum_{i=1}^n x_{ij} = 1 \forall j
\sum_{j=1}^n x_{ij} = 1 \forall i
\sum_{i=1}^n A[i,j] x_{ij} >= z \forall j
x_{ij} \in \{0, 1\}
Then plug this into GLPK, Gurobi, or CPLEX. Alternatively, solve the IP using your own branch-and-bound solver.

Algorithm for making two histograms proportional, minimizing units removed

Imagine you have two histograms with an equal number of bins. N observations are distributed among the bins. Each bin now has between 0 and N observations.
What algorithm would be appropriate for determining the minimum number of observations to remove from both histograms in order to make them proportional? They do not need to be equal in absolute number, only proportional to each other. That is, there must be a common factor by which all the bins in one histogram can be multiplied in order to make it equal to the other histogram.
For example, imagine the following two histograms, where the item i in each histogram refers to the number of observations in bin i for the respective histogram.
Histogram 1: 4, 7, 4, 9
Histogram 2: 2, 0, 2, 1
For these histograms, the solution would be to remove from histogram 1 all 7 observations in bin 2 and another 7 observations from bin 4, such that histogram 1 = 2 * (histogram 2).
But what general algorithm could be used to find the subsets of the two histograms that maximized the number of total observations between them while making them proportional? You can drop observations from both histograms or just one.
Seems to me that the problem is equivalent (if you consider each histogram as a N-dimensional vector), to minimizing the Manhattan length |R|, where R=xA-B, A and B are your 'vectors' and x is your proportional scale.
|R| has a single minimum (not necessarily an integer) so you can find it fairly rapidly using a simple bisection algorithm (or something akin to Newton's method).
Then, assuming you want a solution where the proportion is an integer, test the two cases ceil(x), and floor(x), to find which has the smallest Manhattan length (and that is the number of observations you need to remove).
Proof that the problem is not NP-hard:
Consider an inefficient 'solution' whereby you removed all N observations from all the bins. Now both A and B are equal to the 'zero' histogram 0 = (0,0,0,...). The two histograms are equal and thus proportional as 0 = s * 0 for all proportional values s, so a hard maximum for the number of observations to remove is N.
Now assume a more efficient solution exists with additions/removals < N and a proportional scale s > 2*N (i.e. after removal of some observations A = s * B or B = s * A). If both A = 0 and B = 0, we have the previous solution with N removals (which contradicts the assumption that there are fewer than N removals). If A = 0 and B ≠ 0 then there is no s ≠ 0 such that 0 = s * B and no s such that s * 0 = B (with a similar argument for B = 0 and A ≠ 0). So it must be the case that both A ≠ 0 and B ≠ 0. Assume for a moment that A is the histogram to be scaled (so A * s = B). A must have at least one non-zero entry A[i] with minimum value 1 (after removal of extra observations), so when scaled it will have minimum value ≥ 2*N. Therefore the equivalent entry B[i] must also have at least 2*N observations. But the total number of observations was initially N, so we would have needed to add at least N observations to B[i], which contradicts the assumption that the improved solution had fewer than N additions/removals. So no 'efficient' solution requires a proportional scale greater than 2*N.
So finding an efficient solution requires, at worst, testing the 'best fit' solution for scaling factors in the range 0 to 2*N.
The 'best fit' solution for scaling factor s in A = s * B, where A and B have M bins each, requires
Sum(i=1 to M) of { Abs(A[i]- s * B[i]) mod s + Abs(A[i]- s * B[i]) div s } additions/removals.
This is an order M operation, so testing each scaling factor in the range 0 to 2*N gives an algorithm of order O(M*N).
I am fairly certain (but haven't got a formal proof) that the scale factor cannot exceed the number of observations in the most filled bin. In practice it is typically very much smaller. For two histograms with two hundred bins and randomly chosen 30-300 observations per bin: if there were Na > Nb total observations in all the bins of A and B respectively, the scaling factor was almost always found in the range Na/Nb - 4 < s < Na/Nb + 4 (or s = 0 if Na >> Nb).
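The scan over integer scale factors can be sketched as follows, under assumptions that differ slightly from the formula above: only removals are allowed (as in the question), the scale s is an integer of at least 1, and the relation tested is A = s * B. The function names are hypothetical. For a fixed s, each bin keeps the largest b' ≤ B[i] with s * b' ≤ A[i], which maximizes the observations kept in that bin.

```python
def removals_for_scale(A, B, s):
    """Observations to remove so that A == s * B bin by bin (integer s >= 1, removals only)."""
    removed = 0
    for a, b in zip(A, B):
        keep_b = min(b, a // s)  # largest b' <= b with s * b' <= a
        removed += (a - s * keep_b) + (b - keep_b)
    return removed


def best_scale(A, B):
    """Scan s = 1..max(A) and return (fewest removals, s); ties go to the smaller s."""
    # For s > max(A) every bin would have to be emptied, so the scan can stop there.
    top = max(max(A), 1)
    return min((removals_for_scale(A, B, s), s) for s in range(1, top + 1))


hist1 = [4, 7, 4, 9]
hist2 = [2, 0, 2, 1]
print(best_scale(hist1, hist2))  # (14, 2)
```

Each call to `removals_for_scale` is O(M), so the scan is O(M * max(A)). To cover the case where the second histogram is the larger one, the same function can be called with the arguments swapped.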

Efficient way to find all zeros in a matrix?

I am trying to think of an efficient algorithm to find the number of zeros in a row of a matrix, but can only think of an O(n^2) solution (i.e. iterating over each row and column). Is there a more efficient way to count the zeros?
For example, given the matrix
3, 4, 5, 6
7, 8, 0, 9
10, 11, 12, 3
4, 0, 9, 10
I would report that there are two zeros.
Without storing any external information, no, you can't do any better than Θ(N^2). The rationale is simple - if you don't look at all N^2 locations in the matrix, then you can't guarantee that you've found all of the zeros and might end up giving the wrong answer back. For example, if I know that you look at fewer than N^2 locations, then I can run your algorithm on a matrix and see how many zeros you report. I could then look at the locations that you didn't access, replace them all with zeros, and run your algorithm again. Since your algorithm doesn't look at those locations, it can't know that they have zeros in them, and so at least one of the two runs of the algorithm would give back the wrong answer.
More generally, when designing algorithms to process data, a good way to see if you can do better than certain runtimes is to use this sort of "adversarial analysis." Ask yourself the question: if I run faster than some time O(f(n)), could an adversary manipulate the data in ways that change the answer but I wouldn't be able to detect? This is the sort of analysis that, along with some more clever math, proves that comparison-based sorting algorithms cannot do any better than Ω(n log n) in the average case.
If the matrix has some other properties to it (for example, if it's sorted), then you might be able to do a better job than running in O(N^2). As an example, suppose that you know that all rows of the matrix are sorted. Then you can easily do a binary search on each row to determine how many zeros it contains, which takes O(N log N) time and is faster.
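A minimal sketch of the sorted-rows case, assuming each row is sorted in ascending order (the function name is made up for illustration). Two binary searches per row bracket the run of zeros:

```python
from bisect import bisect_left, bisect_right


def count_zeros_sorted_rows(matrix):
    """Count zeros in a matrix whose rows are each sorted in ascending order."""
    # bisect_left finds the first position of 0, bisect_right the position just
    # past the last 0, so their difference is the number of zeros in the row.
    return sum(bisect_right(row, 0) - bisect_left(row, 0) for row in matrix)


print(count_zeros_sorted_rows([[-2, 0, 0, 5],
                               [0, 1, 2, 3],
                               [1, 2, 3, 4]]))  # 3
```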
Depending on the parameters of your setup, you might be able to get the algorithm to run faster if you assume that you're allowed to scan in parallel. For example, if your machine has K processors on it that can be dedicated to the task of scanning the matrix, then you could split the matrix into K roughly evenly-sized groups, have each processor count the number of zeros in the group, then sum the results of these computations up. This ends up giving you a runtime of Θ(N^2 / K), since the runtime is split across multiple cores.
It is always O(n^2), or rather O(n x m). You cannot get around it.
But if you know that the matrix is sparse (only a few elements have nonzero values), you can store only the nonzero values and the matrix size. Consider using hashing instead of storing the whole matrix: create a hash which maps a row number to a nested hash from column number to value.
m =
0 0 0 0
0 2 0 0
0 0 1 0
0 0 1 0
Will be represented as:
row_numbers = 4
column_numbers = 4
hash = { 1 => { 1 => 2 }, 2 => { 2 => 1 }, 3 => { 2 => 1 } }
number_of_zeros = row_numbers * column_numbers - number_of_cells_in_hash(hash)
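The same idea as a short Python sketch, using nested dicts with 0-based row and column indices; the helper names `to_sparse` and `count_zeros` are assumptions for illustration:

```python
def to_sparse(matrix):
    """Map row index -> {column index -> value}, keeping only nonzero entries."""
    sparse = {}
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v != 0:
                sparse.setdefault(i, {})[j] = v
    return sparse


def count_zeros(num_rows, num_cols, sparse):
    """Zeros = all cells minus the stored (nonzero) cells."""
    return num_rows * num_cols - sum(len(cols) for cols in sparse.values())


m = [[0, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 1, 0]]
sparse = to_sparse(m)
print(sparse)                   # {1: {1: 2}, 2: {2: 1}, 3: {2: 1}}
print(count_zeros(4, 4, sparse))  # 13
```

Building the sparse form still reads every cell once; the saving comes afterwards, when the count depends only on the number of stored entries.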
For any unsorted matrix it should be O(n), since we generally use 'n' for the total number of elements.
If the matrix contains X rows and Y columns, then X * Y = n.
E.g. a 4 x 4 unsorted matrix has 16 elements in total, so iterating with two nested loops takes 4 x 4 = 16 steps. That is O(n), because the total number of elements in the array is 16.
Many people voted for O(n^2) because they took the matrix to be n x n.
Please correct me if my understanding is wrong.
Assuming that when you say "in a row of a matrix", you mean that you have the row index i and you want to count the number of zeros in the i-th row, you can do better than O(N^2).
Suppose N is the number of rows and M is the number of columns. Store your
matrix as a single array [3,4,5,6,7,8,0,9,10,11,12,3,4,0,9,10]; then row i starts at index M*i.
Since arrays have constant-time access, this part doesn't depend on the size of the matrix. You can then iterate over the whole row by visiting the elements M*i + j for j from 0 to M-1. This is O(M), provided you know which row you want to visit and you are using an array.
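A small sketch of that row lookup on the flattened (row-major) array; the function name and the explicit `num_cols` parameter are assumptions for illustration:

```python
def zeros_in_row(flat, num_cols, i):
    """Count zeros in row i of a row-major flattened matrix with num_cols columns."""
    start = num_cols * i  # row i occupies flat[start : start + num_cols]
    return sum(1 for v in flat[start:start + num_cols] if v == 0)


flat = [3, 4, 5, 6, 7, 8, 0, 9, 10, 11, 12, 3, 4, 0, 9, 10]
print(zeros_in_row(flat, 4, 1))  # 1
```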
This is not a perfect answer for the reasons I'll explain, but it offers an alternative solution potentially faster than the one you described:
Since you don't need to know the position of the zeros in the matrix, you can flatten it into a 1D array.
After that, perform a quicksort on the elements, this may provide a performance of O(n log n), depending on the randomness of the matrix you feed in.
Finally, count the zero elements at the beginning of the array until you reach a non-zero number.
In some cases, this will be faster than checking every element, although in a worst-case scenario the quicksort will take O(n^2), which in addition to the zero counting at the end may be worse than iterating over each row and column.
Assuming the given matrix is M, do an M + (-M) operation, but do not use the default +; instead use my_add(int a, int b) such that
int my_add(int a, int b) {
    /* 1 when both operands are zero, otherwise the ordinary sum */
    return (a == 0 && b == 0) ? 1 : (a + b);
}
That will give you a matrix like
0 0 0 0
0 0 1 0
0 0 0 0
0 1 0 0
Now create s := 0 and keep adding all elements to it: s += a[i][j]
You can even do both in one pass: s += my_add(a[i][j], (-1)*a[i][j])
But it is still O(m*n).
To count the number of 1's you generally check all items in the matrix. Without operating on all elements, I don't think you can tell the number of 1's, and looping over all elements takes m*n steps. It can be faster than m*n if and only if you can leave some elements unchecked and still state the number of 1's.
However, if you move a 2x2 kernel over the matrix in hops, you get (m*n)/k iterations (here k = 4), e.g. if you operate on the neighboring elements a[i][j], a[i+1][j], a[i][j+1], a[i+1][j+1] while i < m and j < n, although every element is still read once.
