Given a matrix with integer elements, the problem is to find the maximum-sum submatrix. The problem is stated and solved here using Kadane's algorithm for a 2D matrix.
Now I want to solve this problem in higher dimensions, i.e. given a matrix in d-dimensional space, design an algorithm that solves the same problem.
I wonder if you can do it in O(n^(2d-1)) time.
Any idea is appreciated.
You can compute the sum of a d-dimensional submatrix with 2^d lookups, 2^(d-1) subtractions and 2^(d-1) - 1 additions by using a multi-dimensional summed-area table.
The summed-area table is a matrix with the same dimensionality and size as the input matrix, where each element of the summed-area table is the sum of all elements of the input matrix whose indexes are equal to or lower than that element's in every dimension. It can be calculated in a single pass over the matrix.
You can then find the maximum-sum submatrix in O(n^(2d)) by iterating over every combination of start index and submatrix size in each dimension, and computing the submatrix sum for those start indexes and sizes using the summed-area table. Basically you look up all 2^d "corners" of your submatrix in the SAT and add or subtract each value to get the submatrix sum: a corner is added if the number of dimensions in which it takes the lower (start - 1) index is even, and subtracted if that number is odd. In 2D (as on the SAT Wikipedia page) this gives sum = I(x1, y1) - I(x0 - 1, y1) - I(x1, y0 - 1) + I(x0 - 1, y0 - 1).
The submatrix with the highest total is the maximum sum submatrix.
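The corner-sign rule above can be sketched in 2D; the helper names below are illustrative, and the same sign pattern generalizes to d dimensions:

```python
# 2D summed-area table: sat[i][j] = sum of m[0..i][0..j].
def build_sat(m):
    rows, cols = len(m), len(m[0])
    sat = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            sat[i][j] = (m[i][j]
                         + (sat[i - 1][j] if i else 0)
                         + (sat[i][j - 1] if j else 0)
                         - (sat[i - 1][j - 1] if i and j else 0))
    return sat

def rect_sum(sat, r0, c0, r1, c1):
    """Sum of m[r0..r1][c0..c1], inclusive, via four corner lookups."""
    s = sat[r1][c1]
    if r0: s -= sat[r0 - 1][c1]          # corner with one "start - 1" index
    if c0: s -= sat[r1][c0 - 1]          # corner with one "start - 1" index
    if r0 and c0: s += sat[r0 - 1][c0 - 1]  # corner with two such indexes
    return s

sat = build_sat([[1, 2], [3, 4]])
print(rect_sum(sat, 1, 0, 1, 1))  # bottom row: 3 + 4 = 7
```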
Using Kadane's algorithm reduces the inner two loops (start index and submatrix size in one of the dimensions) to one, making it O(n^(2d-1)).
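For d = 2 this Kadane reduction gives the classic O(n^3) algorithm, sketched here: fix a pair of rows, collapse the strip between them into a 1D array of column sums, and run Kadane's algorithm on that array:

```python
def kadane(arr):
    """Maximum sum of a contiguous, non-empty subarray."""
    best = cur = arr[0]
    for x in arr[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_sum_submatrix(m):
    """O(rows^2 * cols): enumerate top/bottom row pairs, Kadane over columns."""
    rows, cols = len(m), len(m[0])
    best = m[0][0]
    for top in range(rows):
        col_sums = [0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += m[bottom][c]
            best = max(best, kadane(col_sums))
    return best

print(max_sum_submatrix([[1, -2, 3], [-1, 4, -5], [2, -1, 2]]))  # 4
```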
I recently came across this algorithmic question in an interview. The question goes something like:
Initially we are given a rectangle starting at the origin (0,0) and ending at (n,m). Then there are q queries of the form x=r or y=c, each of which divides the current rectangles into smaller rectangles. After each query, we have to return the size of the largest rectangle currently present.
See the diagram:
So, here we were initially given a rectangle from (0,0) to (6,6) [a square in fact!!]. Now after the 1st query x = 2 (shown as a dotted line above), the largest rectangle size is 24. After the second query y = 1, the largest rectangle size is 20. And this is how it goes on and on.
My approach to solving this:
At every query, find:
The largest interval on the x axis (maxX) [keep storing all the x = r values in a list]
The largest interval on the y axis (maxY) [keep storing all the y = c values in another list]
At every query, your answer is (maxX * maxY)
For finding maxX and maxY, I will have to iterate through the whole list each time, which is not very efficient.
So, I have 2 questions:
Is my solution correct? If not, what is the correct approach to the problem? If yes, how can I optimise my solution?
It's correct but takes O(n) time per query.
You could, for each dimension, have one binary search tree (or other sorted container with O(log n) operations) for the coordinates (initially two) and one for the interval sizes. Then for each query in that dimension:
Add the new coordinate to the coordinates.
From its neighbors, compute the interval's old size and remove that from the sizes.
Compute the two new intervals' sizes and add them to the sizes.
The largest size is at the end of the sizes.
Would be O(log n) per query.
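The per-dimension structure can be sketched as follows. Note the hedge in the comments: Python lists with `bisect` give O(n) insertion, so this only illustrates the logic; a balanced BST or skip list would make each step O(log n) as described. It also assumes query coordinates are distinct.

```python
import bisect
from collections import Counter

class Cuts:
    """Cut coordinates of one axis plus a multiset of interval sizes."""
    def __init__(self, length):
        self.coords = [0, length]          # initially one interval [0, length]
        self.sizes = Counter({length: 1})  # multiset of interval sizes

    def cut(self, x):
        i = bisect.bisect_left(self.coords, x)
        lo, hi = self.coords[i - 1], self.coords[i]
        self.coords.insert(i, x)
        # remove the old interval's size, add the two new ones
        self.sizes[hi - lo] -= 1
        if self.sizes[hi - lo] == 0:
            del self.sizes[hi - lo]
        self.sizes[x - lo] += 1
        self.sizes[hi - x] += 1

    def max_size(self):
        return max(self.sizes)

# the 6 x 6 example from the question
xs, ys = Cuts(6), Cuts(6)
xs.cut(2)
print(xs.max_size() * ys.max_size())  # 4 * 6 = 24
ys.cut(1)
print(xs.max_size() * ys.max_size())  # 4 * 5 = 20
```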
Yes, your algorithm is correct.
To optimize it, first of all, consider only one dimension, because the two dimensions in your geometry are fully orthogonal.
So, you need to have a data structure which holds a partitioning of an interval into sub-intervals, and supports fast application of these two operations:
Split a given interval into two
Find a largest interval
You can do that by using two sorted lists, one sorted by coordinate, and the other sorted by size. You should have pointers from one data structure to the other, and vice-versa.
To implement the "splitting" operation:
Find the interval which you should split, using binary search in the coordinate-sorted list
Remove the interval from both lists
Add two smaller intervals to both lists
We are given n vectors of dimension m. Within each vector, the values may be rearranged among its dimensions, with each value used in exactly one position. After rearranging all n vectors, we compute for each vector the Manhattan distance to its nearest vector. Among all rearrangement plans, we want the one that minimizes the sum of these n nearest-neighbor distances.
Is it NP-hard?
Unless I'm missing something, the optimal configuration will always be to rearrange each row so that its entries are in ascending order: for any two sequences, the Manhattan distance between them is minimized when both are sorted, so sorting every row simultaneously minimizes every pairwise distance, and in particular each nearest-neighbor distance. So the optimal runtime should be O(n m log m), the amount of time it takes to sort n lists of length m.
Given the binary pattern of a square sparse matrix, how can you move all non-zero elements towards the diagonal through row permutations? One possible cost function is the sum of the two-norms of the distances between each non-zero element and the diagonal line.
This is more of an algorithms question, but one simple way would be to take the greedy approach. Keep evaluating which row swap would result in maximal improvement in your cost function. Repeat until the cost function stabilizes.
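A minimal sketch of that greedy loop, using the squared row-column offset of each non-zero as a stand-in for the distance-to-diagonal cost (the true point-to-line distance only differs by a constant factor):

```python
def cost(m):
    """Sum of squared distances of non-zero entries from the diagonal i == j."""
    return sum((i - j) ** 2
               for i, row in enumerate(m)
               for j, v in enumerate(row) if v)

def greedy_diagonalize(m):
    """Keep applying the row swap that lowers the cost until none helps."""
    m = [row[:] for row in m]
    n = len(m)
    current = cost(m)
    improved = True
    while improved:
        improved = False
        for a in range(n):
            for b in range(a + 1, n):
                m[a], m[b] = m[b], m[a]
                c = cost(m)
                if c < current:
                    current, improved = c, True
                else:
                    m[a], m[b] = m[a], m[b] = m[b], m[a]  # revert swap
    return m

m = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
print(greedy_diagonalize(m))
```

Like any greedy local search, this can stall in a local minimum of the cost function; restarting from a few random row orders is a cheap hedge.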
My problem is: we have N points in a 2D space, each with a positive weight. Given a query consisting of two real numbers a, b and one integer k, find the position of an a x b rectangle, with edges parallel to the axes, such that the sum of the weights of the top-k points (the k highest-weight points) covered by the rectangle is maximized.
Any suggestion is appreciated.
P.S.:
There are two related problems, which are already well-studied:
Maximum region sum: find the rectangle with the highest total weight sum. Complexity: O(N log N).
Top-k query for orthogonal ranges: find the top-k points in a given rectangle. Complexity: O(log^2 N + k).
You can reduce this problem to choosing two points that pin down the rectangle: its rightmost point and its topmost point. So effectively you can try every pair of points and compute the top-k weight for the resulting rectangle (which, according to you, takes O(log^2 N + k)). Complexity: O(N^2 (log^2 N + k)).
Now, a given pair of points might not be valid: they might be too far apart, or one point might be both to the right of and above the other. So in practice this will run much faster.
My guess is the optimal solution will be a variation of maximum region sum problem. Could you point to a link describing that algorithm?
A non-optimal answer is the following:
Generate all the possible k-plets of points (there are N × (N-1) × … × (N-k+1) of them, so this is O(N^k) and can be done via recursion).
Filter this list down by eliminating all k-plets which are not enclosed in an a×b rectangle: this is O(k N^k) at worst.
Find the k-plet which has the maximum weight: this is O(k N^(k-1)) at worst.
Thus, this algorithm is O(k N^k).
Improving the algorithm
Step 2 can be integrated into step 1 by stopping the branch recursion when a set of points is already too large. This does not change the need to scan the elements at least once, but it can reduce the count significantly: think of cases where there are no solutions because all points are separated by more than the size of the rectangle; that can be detected in O(N^2).
Also, the permutation generator in step 1 can be made to return the points ordered by x or y coordinate, by pre-sorting the point array accordingly. This is useful because it lets us discard many more possibilities up front. Suppose the array is sorted by y coordinate, so the k-plets are returned ordered by y coordinate. Now, when we discard a branch because it contains a point whose y coordinate falls outside the maximal rectangle, we can also discard all of its next sibling branches, because their y coordinates will be greater than or equal to the current one, which is already out of bounds.
This adds O(N log N) for the sort, but the improvement can be quite significant in many cases -- again, when there are many outliers. The sort coordinate should be the one for which the rectangle side is smallest relative to the spread of the points in that coordinate (the maximum coordinate minus the minimum coordinate over all points).
Finally, if all the points lie within an a×b rectangle, the algorithm still performs as O(k N^k). If this is a concrete possibility, it should be checked first (an easy O(N) loop), and if it holds then it's enough to return the k points with the top weights, which is also O(N).
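The brute-force idea above can be sketched with unordered k-subsets instead of ordered k-plets, which removes the k! duplicate orderings for free (a rectangle covers a k-subset exactly when the subset's bounding box fits in a×b; `best_k_in_rect` is an illustrative name):

```python
from itertools import combinations

def best_k_in_rect(points, a, b, k):
    """points: list of (x, y, weight). Max total weight of k points that
    fit together in some axis-aligned a x b rectangle, or None if none do."""
    best = None
    for combo in combinations(points, k):
        xs = [p[0] for p in combo]
        ys = [p[1] for p in combo]
        # the k points are coverable iff their bounding box fits in a x b
        if max(xs) - min(xs) <= a and max(ys) - min(ys) <= b:
            w = sum(p[2] for p in combo)
            if best is None or w > best:
                best = w
    return best

pts = [(0, 0, 5), (1, 1, 3), (10, 10, 9), (0.5, 0.2, 2)]
print(best_k_in_rect(pts, 2, 2, 2))  # (0,0,5) + (1,1,3) -> 8
```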
Here is the interesting but complicated problem:
Suppose we have two sets of points. One set A contains points on some spatial grid, like a regular 1D or 3D grid. The other set B contains randomly placed points, and has the same size as A. Mathematically, we can order the two sets and construct a matrix of the distances between A and B: the entry D(i, j) is the distance between point i of A and point j of B.
Given some ordering, we have a matrix. The diagonal element (i, i) of the matrix is then the distance between point i of A and point i of B. The problem is how to find a good reordering/indexing such that the maximum of these distances is as small as possible; in matrix form, how to find a good reordering/indexing that makes the largest diagonal element as small as possible.
Notes from myself:
Suppose set A corresponds to the rows of the matrix and set B to its columns. Then reordering the matrix means permuting rows/columns, so our problem is equivalent to finding a good permutation that minimizes the largest diagonal element.
A greedy algorithm may be a choice, but I am trying to find an ideally perfect reordering that minimizes the largest diagonal element.
The reordering you are referring to is essentially a correspondence problem i.e. you are trying to find the closest match for each point in the other set. The greedy algorithm will work fine. The distance you are looking for is commonly referred to as the Hausdorff distance.
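A minimal sketch of that greedy correspondence: repeatedly match the closest remaining (A, B) pair. Note this is a heuristic; it is not guaranteed to minimize the largest diagonal element, for which a bottleneck assignment algorithm would be needed.

```python
def greedy_match(A, B):
    """A, B: equal-length lists of points (tuples of coordinates).
    Returns perm with perm[i] = index in B matched to A[i]."""
    dist = lambda p, q: sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    # all cross-set pairs, closest first
    pairs = sorted((dist(p, q), i, j)
                   for i, p in enumerate(A) for j, q in enumerate(B))
    perm = [None] * len(A)
    used = set()
    for _, i, j in pairs:
        if perm[i] is None and j not in used:
            perm[i] = j
            used.add(j)
    return perm

A = [(0, 0), (1, 0), (2, 0)]
B = [(2.1, 0), (0.1, 0), (1.1, 0)]
print(greedy_match(A, B))  # [1, 2, 0]: each grid point gets its nearest free B
```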