Algorithm: find sum in matrix

We are given a 2D matrix (say, of length i and width j) and an integer k.
We have to find the size of the smallest rectangle that contains a sum equal to or greater than k.
For example, k = 7:
4 1
1 1
1 1
4 4
The answer is 2, because 4 + 4 = 8 >= 7. If the last row weren't there, the answer would be 4, since 4 + 1 + 1 + 1 = 7 >= 7.
My idea is to compute prefix sums Pref[k,l] = Tab[k,l] + Pref[k-1,l] + Pref[k,l-1]
and then compare every single rectangle.
Is it possible to make this faster? My approach is T(n) = O(n^2), where n is the number of elements in the matrix.
I would like to do this in O(n) or O(n log n) time.
I would be really glad if someone could give me a tip on how to do this :)

First, create an auxiliary matrix sums, where:
sums[i,j] = A[0,0] + A[0,1] + ... + A[0,j] + A[1,0] + ... + A[1,j] + ... + A[i,j]
(that is, the sum of the sub-matrix A[0..i, 0..j]). I think this is what you meant when you said "prefix sums".
This can be calculated in linear time with dynamic programming:
sums[0,j] = A[0,0] + ... + A[0,j]
sums[i,0] = A[0,0] + ... + A[i,0]
sums[i,j] = sums[i-1,j] + sums[i,j-1] - sums[i-1,j-1] + A[i,j]
(the sums[i-1,j-1] term is subtracted because those elements are counted twice, once in each of the first two terms)
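A minimal Python sketch of this construction (prefix_sums is an illustrative name):

def prefix_sums(A):
    # sums[i][j] = sum of the sub-matrix A[0..i][0..j]; the overlap
    # sums[i-1][j-1] is subtracted because it is counted twice
    rows, cols = len(A), len(A[0])
    sums = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            sums[i][j] = A[i][j]
            if i > 0:
                sums[i][j] += sums[i - 1][j]
            if j > 0:
                sums[i][j] += sums[i][j - 1]
            if i > 0 and j > 0:
                sums[i][j] -= sums[i - 1][j - 1]
    return sums

With sums in hand, any rectangle sum is O(1): the sum of A[r1..r2][c1..c2] is sums[r2][c2] - sums[r1-1][c2] - sums[r2][c1-1] + sums[r1-1][c1-1], where terms with a negative index are dropped.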
Now, assuming all elements are non-negative, sums is a non-decreasing matrix: each row and each column is sorted.
So, iterating over the matrix again, for each pair of indices (i,j), find the value closest to, yet smaller than, sums[i,j] - k.
In a row- and column-sorted matrix this can be done in O(sqrt(n)) (see the staircase-walk sketch below).
Do it for each such (i,j) pair, and you get an O(n*sqrt(n)) solution.
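A hedged sketch of that O(sqrt(n)) step as a staircase walk over the row- and column-sorted sums matrix (largest_at_most is an illustrative name):

def largest_at_most(sums, x):
    # Walk from the top-right corner of a matrix whose rows and columns
    # are sorted ascending: step down while the entry fits, left while
    # it is too large. O(rows + cols) steps, i.e. O(sqrt(n)) for an
    # n-element roughly square matrix.
    best = None
    r, c = 0, len(sums[0]) - 1
    while r < len(sums) and c >= 0:
        if sums[r][c] <= x:
            best = sums[r][c] if best is None else max(best, sums[r][c])
            r += 1    # entries further down may be larger but still <= x
        else:
            c -= 1    # entries further down in this column are even larger
    return best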

Related

sum of maximum element of sliding window of length K

Recently I got stuck on a problem. Part of the algorithm requires computing the sum of the maximum elements of all sliding windows of length K, where K ranges over 1 <= K <= N (N being the length of the array).
For example, take the array A = 5, 3, 12, 4:
Sliding window of length 1: 5 + 3 + 12 + 4 = 24
Sliding window of length 2: 5 + 12 + 12 = 29
Sliding window of length 3: 12 + 12 = 24
Sliding window of length 4: 12
Final answer is 24,29,24,12.
I have tried an O(N^2) approach: for each sliding-window length K, I can calculate the maxima in O(N), and K goes up to N, so the overall complexity is O(N^2).
I am looking for an O(N) or O(N log N) algorithm, or something similar, as N may be up to 10^5.
Note: elements in the array can be as large as 10^9, so output the final answer modulo 10^9 + 7.
EDIT: What I actually want is the answer for each and every value of K (i.e., from 1 to N) in overall linear or O(N log N) time, not O(KN) or O(KN log N), where K = {1, 2, 3, ..., N}.
Here's an abbreviated sketch of O(n).
For each element, determine how many contiguous elements to the left are no greater (call this a), and how many contiguous elements to the right are lesser (call this b). This can be done for all elements in time O(n) -- see MBo's answer.
A particular element is the maximum of its window exactly when the window contains the element and otherwise only elements among the a to its left and the b to its right. Usefully, the number of such windows of length k (and hence the total contribution of these windows) is piecewise linear in k, with at most five pieces. For example, if a = 5 and b = 3, there are
1 window of size 1
2 windows of size 2
3 windows of size 3
4 windows of size 4
4 windows of size 5
4 windows of size 6
3 windows of size 7
2 windows of size 8
1 window of size 9.
The data structure that we need to encode this contribution efficiently is a Fenwick tree whose values are not numbers but linear functions of k. For each linear piece of the piecewise linear contribution function, we add it to the cell at the beginning of its interval and subtract it from the cell at the end (closed beginning, open end). At the end, we retrieve all of the prefix sums and evaluate them at their index k to get the final array.
(OK, have to run for now, but we don't actually need a Fenwick tree for step two, which drops the complexity to O(n) for that, and there may be a way to do step one in linear time as well.)
Python 3, lightly tested:

def left_extents(lst):
    # result[i]: start of the run of elements to the left of i that are
    # no greater than lst[i] (note the asymmetric tie-breaking, >= here
    # and > below, so equal elements are not double-counted)
    result = []
    stack = [-1]
    for i in range(len(lst)):
        while stack[-1] >= 0 and lst[i] >= lst[stack[-1]]:
            del stack[-1]
        result.append(stack[-1] + 1)
        stack.append(i)
    return result

def right_extents(lst):
    # result[i]: one past the end of the run of strictly smaller
    # elements to the right of i
    result = []
    stack = [len(lst)]
    for i in range(len(lst) - 1, -1, -1):
        while stack[-1] < len(lst) and lst[i] > lst[stack[-1]]:
            del stack[-1]
        result.append(stack[-1])
        stack.append(i)
    result.reverse()
    return result

def sliding_window_totals(lst):
    # difference arrays encoding the piecewise linear contributions
    delta_constant = [0] * (len(lst) + 2)
    delta_linear = [0] * (len(lst) + 2)
    for l, i, r in zip(left_extents(lst), range(len(lst)), right_extents(lst)):
        a = i - l
        b = r - (i + 1)
        if a > b:
            a, b = b, a
        delta_linear[1] += lst[i]                    # k*v for k = 1..a (rising)
        delta_linear[a + 1] -= lst[i]
        delta_constant[a + 1] += lst[i] * (a + 1)    # (a+1)*v for k = a+1..b+1 (flat)
        delta_constant[b + 2] += lst[i] * (b + 1)    # (a+b+2-k)*v for k = b+2..a+b+1 (falling)
        delta_linear[b + 2] -= lst[i]
        delta_linear[a + b + 2] += lst[i]            # all pieces cancel past k = a+b+1
        delta_constant[a + b + 2] -= lst[i] * (a + 1)
        delta_constant[a + b + 2] -= lst[i] * (b + 1)
    # prefix-sum the deltas and evaluate each linear function at its k
    result = []
    constant = 0
    linear = 0
    for j in range(1, len(lst) + 1):
        constant += delta_constant[j]
        linear += delta_linear[j]
        result.append(constant + linear * j)
    return result

print(sliding_window_totals([5, 3, 12, 4]))   # expected: [24, 29, 24, 12]
Let's determine for every element the interval where this element dominates (is the maximum). We can do this in linear time with forward and backward runs using a stack. Arrays L and R will contain the indices just outside the domination interval.
To get right and left indexes:
Stack.Push(0)                                //1st element index
for i = 1 to Len - 1 do
    while not Stack.Empty and X[Stack.Peek] < X[i] do
        j = Stack.Pop
        R[j] = i                             //j-th position is dominated by i-th one from the right
    Stack.Push(i)
while not Stack.Empty
    R[Stack.Pop] = Len                       //the rest of the elements are not dominated from the right

//now right to left
Stack.Push(Len - 1)                          //last element index
for i = Len - 2 downto 0 do
    while not Stack.Empty and X[Stack.Peek] < X[i] do
        j = Stack.Pop
        L[j] = i                             //j-th position is dominated by i-th one from the left
    Stack.Push(i)
while not Stack.Empty
    L[Stack.Pop] = -1                        //the rest of the elements are not dominated from the left
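A direct Python translation of the pseudocode, as a hedged sketch (domination_bounds is an illustrative name); running it on (5, 7, 3, 9, 4) reproduces the table below:

def domination_bounds(X):
    n = len(X)
    R = [n] * n      # index of the first dominating element to the right
    L = [-1] * n     # index of the first dominating element to the left
    stack = []
    for i in range(n):                  # left-to-right run fills R
        while stack and X[stack[-1]] < X[i]:
            R[stack.pop()] = i
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):      # right-to-left run fills L
        while stack and X[stack[-1]] < X[i]:
            L[stack.pop()] = i
        stack.append(i)
    return L, R

print(domination_bounds([5, 7, 3, 9, 4]))   # ([-1, -1, 1, -1, 3], [1, 3, 3, 5, 5])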
Result for the (5, 7, 3, 9, 4) array.
For example, 7 dominates on the interval 0..2, and 9 on 0..4:
i   0   1   2   3   4
X   5   7   3   9   4
R   1   3   3   5   5
L  -1  -1   1  -1   3
Now for every element we can count its contribution to every possible sum:
Element 5 dominates on interval (0,0); it is summed only in the k=1 entry.
Element 7 dominates on interval (0,2); it is summed once in the k=1 entry, twice in the k=2 entry, and once in the k=3 entry.
Element 3 dominates on interval (2,2); it is summed only in the k=1 entry.
Element 9 dominates on interval (0,4); it is summed once in the k=1 entry, twice in k=2, twice in k=3, twice in k=4, and once in k=5.
Element 4 dominates on interval (4,4); it is summed only in the k=1 entry.
In general, an element with a long domination interval in the center of a long array may contribute up to k*Value to the k-length sum (it depends on its position relative to the array ends and to other dominating elements).
k      1    2    3    4    5
----------------------------
       5
       7  2*7    7
       3
       9  2*9  2*9  2*9    9
       4
----------------------------
S(k)  28   32   25   18    9
Note that the sum of the coefficients is N*(N+1)/2 (equal to the number of possible windows), and most of the table entries are empty, so the complexity seems better than O(N^2).
(I am still unsure about the exact complexity.)
The sum of the maximums over all sliding windows of a given size can be computed in linear time using a double-ended queue that keeps elements from the current window. We maintain the deque such that the first (index 0, leftmost) element in the queue is always the maximum of the current window.
This is done by iterating over the array and in each iteration, first we remove the first element in the deque if it is no longer in the current window (we do that by checking its original position, which is also saved in the deque together with its value). Then, we remove any elements from the end of the deque that are smaller than the current element, and finally we add the current element to the end of the deque.
The complexity is O(N) for computing the maximum over all sliding windows of one size K. O(N) is the best possible time to compute the sum of the maximum values of all windows of a single size K (that is easy to see). To compute the sum for the other values of K, the simple approach is to repeat the computation for each value of K, which leads to O(N^2) overall. Is there a better way? No, because even if we save the result of the computation for one value of K, we cannot use it to compute the result for a different value of K in less than O(N) time. So the best time is O(N^2).
The following is an implementation in Python:

from collections import deque

def slide_win(l, k):
    # yields the maximum of each window of size k in O(len(l)) total time
    dq = deque()
    for i in range(len(l)):
        # drop the front element once it falls out of the current window
        if len(dq) > 0 and dq[0][1] <= i - k:
            dq.popleft()
        # drop smaller elements from the back; the deque stays
        # decreasing, so dq[0] is always the window maximum
        while len(dq) > 0 and l[i] >= dq[-1][0]:
            dq.pop()
        dq.append((l[i], i))
        if i >= k - 1:
            yield dq[0][0]

def main():
    l = [5, 3, 12, 4]
    print("l=" + str(l))
    for k in range(1, len(l) + 1):
        s = 0
        for x in slide_win(l, k):
            s += x
        print("k=" + str(k) + " Sum=" + str(s))

main()

Complexity of searching sorted matrix

Suppose we have a matrix of size NxN of numbers where all the rows and columns are in increasing order, and we want to find if it contains a value v. One algorithm is to perform a binary search on the middle row, to find the elements closest in value to v: M[row,col] < v < M[row,col+1] (if we find v exactly, the search is complete). Since the matrix is sorted we know that v is larger than all elements in the sub-matrix M[0..row, 0..col] (the top-left quadrant of the matrix), and similarly it's smaller than all elements in the sub-matrix M[row..N-1, col+1..N-1] (the bottom right quadrant). So we can recursively search the top right quadrant M[0..row-1, col+1..N-1] and the bottom left quadrant M[row+1..N-1, 0..col].
The question is what is the complexity of this algorithm ?
Example: Suppose we have the 5x5 matrix shown below and we are searching for the number 25:
0 10 20 30 40
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
In the first iteration we perform binary search on the middle row and find that the closest element smaller than 25 is 22 (at row=2, col=2). So now we know 25 is larger than all items in the top-left 3x3 quadrant:
0 10 20
1 11 21
2 12 22
Similarly, we know 25 is smaller than all elements in the bottom-right 3x2 quadrant:
32 42
33 43
34 44
So, we recursively search the remaining quadrants - the top right 2x2:
30 40
31 41
and the bottom left 2x3:
3 13 23
4 14 24
And so on. We essentially divided the matrix into 4 quadrants (which might be of different sizes depending on the result of the binary search on the middle row), and then we recursively search two of the quadrants.
The worst-case running time is Theta(n). Certainly this is as good as it gets for correct algorithms (consider an anti-diagonal, with elements less than v above and elements greater than v below). As far as upper bounds go, the bound for an n-row, m-column matrix is O(n log(2 + m/n)), as evidenced by the correct recurrence
f(n, m) = log m + max_{0 <= j <= m-1} [f(n/2, j) + f(n/2, m-1-j)],

where there are two sub-problems, not one. This recurrence is solvable by the substitution method.
Hypothesis (c to be chosen later):

    f(n, m) <= c n log(2 + m/n) - log(m) - 2

Substituting into the recurrence:

    f(n, m) = log m + max_{0 <= j <= m-1} [f(n/2, j) + f(n/2, m-1-j)]
            <= log m + max_{0 <= j <= m-1} [c (n/2) log(2 + j/(n/2)) - log(j) - 2
                                            + c (n/2) log(2 + (m-1-j)/(n/2)) - log(m-1-j) - 2]
            <= log m + c n log(2 + m/n) - 2 log(m/2) - 4    [the max is at j ≈ m/2, by the concavity of log]
            = log m + c n log(2 + m/n) - 2 log(m) - 2
            = c n log(2 + m/n) - log(m) - 2.
Set c large enough that, for all n, m,
c n log(2 + m/n) - log(m) - 2 ≥ log(m),
where log(m) is the cost of the base case n = 1.
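For concreteness, a hedged Python sketch of the quadrant algorithm from the question (search_sorted_matrix is an illustrative name; bisect handles the middle-row binary search):

import bisect

def search_sorted_matrix(M, v):
    # M: list of rows; every row and every column sorted ascending
    def rec(top, bottom, left, right):
        if top > bottom or left > right:
            return False
        row = (top + bottom) // 2
        # first column in [left, right] whose middle-row entry is >= v
        col = bisect.bisect_left(M[row], v, left, right + 1)
        if col <= right and M[row][col] == v:
            return True
        # v exceeds everything in the top-left quadrant and is below
        # everything in the bottom-right one, so only the top-right and
        # bottom-left quadrants remain
        return rec(top, row - 1, col, right) or rec(row + 1, bottom, left, col - 1)
    return rec(0, len(M) - 1, 0, len(M[0]) - 1)

On the 5x5 example above, this returns True for 24 and False for 25.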
If you find your element after s steps, then the searchable range has size N = 4^s. So the time complexity is O(log_4 N) = O(log N / log 4) = O(0.5 * log N) = O(log N).
In other words, this algorithm is two times faster than binary search, and is likewise O(log N).
A consideration on binary search on matrices:
Binary search on 2D matrices (and on ND matrices in general) is no different from binary search on a sorted 1D vector. In fact, C for instance stores them in row-major fashion (as a concatenation of rows: [[row0],[row1],...,[rowk]]).
This means one can use the well-known binary search on the matrix as follows (with complexity log(n*m)):
template<typename T>
bool binarySearch_2D(T target, T** matrix){
    int a = 0; int b = NCELLS - 1;            // NCELLS = ROWS*COLS
    bool found = false;
    while(!found && a <= b){
        int half = (a + b) / 2;
        int r = half / COLS;                  // row of the half-th element
        int c = half - (half / COLS) * COLS;  // column, i.e. half % COLS
        T v = matrix[r][c];
        if(v == target)
            found = true;
        else if(target > v)
            a = half + 1;
        else                                  // target < v
            b = half - 1;
    }
    return found;
}
The complexity of this algorithm will be:
O(log2(n*n)) = O(log2(n))
This is because you are eliminating half of the matrix in each iteration.
EDIT:
Recurrence relation, taking n to be the total number of elements in the matrix:

T(n) = T(n/2) + log(sqrt(n))
     = T(n/2) + log(n^(1/2))
     = T(n/2) + (1/2) * log(n)

Here, a = 1, b = 2. Therefore, c = log_b(a) = log_2(1) = 0, so n^c = n^0.
Also, f(n) = n^0 * (1/2) * log(n).
According to case 2 of the Master Theorem,
T(n) = O((log(n))^2)
You can use a recurrence relation and apply the Master Theorem to find the complexity.
Assume n is the number of elements in the matrix.
The cost for one step is a binary search on sqrt(n) elements, and you get two sub-problems, in the worst case of n/4 elements each: 2*T(n/4). So we have:

T(n) = 2*T(n/4) + log(sqrt(n))

which is equal to

T(n) = 2*T(n/4) + log(n)/2

Now apply Master Theorem case 1 (a = 2, b = 4, f(n) = log(n)/2, and f(n) is in O(n^(log_b(a) - eps)) = O(n^(1/2 - eps)), therefore we have case 1):

=> the total running time T(n) is in O(n^(log_b(a))) = O(n^(1/2)),

or equivalently O(sqrt(n)), which is equal to the height or width of the matrix if both sides are the same.
Let's assume that we have the following matrix:
1 2 3
4 5 6
7 8 9
Let's search for value 7 using binary search as you specified:
Search nearest value to 7 in middle row: 4 5 6, which is 6.
Hmm we have a problem, 7 is not in the following submatrix:
6
9
So what to do? One solution would be to apply binary search to all rows, which has a complexity of N log(N). So walking the matrix is a better solution (a sketch follows).
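For comparison, a sketch of "walking the matrix", the well-known O(N+M) staircase search (the function name is illustrative):

def staircase_search(M, v):
    # start at the top-right corner; every step discards a row or a column
    r, c = 0, len(M[0]) - 1
    while r < len(M) and c >= 0:
        if M[r][c] == v:
            return True
        if M[r][c] > v:
            c -= 1   # everything below in this column is even larger
        else:
            r += 1   # everything to the left in this row is even smaller
    return False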
Edit:
Recurrence relation:

T(N*N) = T(N*N/2) + log(N)

If we normalize the function to one variable with M = N^2:

T(M) = T(M/2) + log(sqrt(M))
T(M) = T(M/2) + log(M)/2

According to Master Theorem case 2, the complexity is (log(M))^2 = (2*log(N))^2 = O((log(N))^2).
Edit 2:
Sorry, I answered your question from my mobile. Now when you think about it, M[0...row-1, col+1...N-1] doesn't make much sense, right? Consider my example: if you search for a value that is smaller than all values in the middle row, you'll always end up at the leftmost number, and if you search for a value that is greater than all values in the middle row, you'll end up at the rightmost number. So the algorithm can be reworded as follows:
Search the middle row with a custom binary search that returns 1 <= idx <= N if found, and idx = 0 or idx = N+1 if not found. After the binary search, if idx = 0, continue the search in the upper submatrix M[0...row][0...N].
If the index is N+1, continue the search in the lower submatrix M[row+1...N][0...N]. Otherwise, we are done.
You suggest that the complexity should be 2T(M/4) + log(M)/2, but at each step we divide the whole matrix in two and only process one half.
Moreover, if you agree that T(N*N) = T(N*N/2) + log(N) is correct, then you can substitute all N*N expressions with M.

Counting the strictly increasing sequences

There are N candles aligned from left to right. The i-th candle from the left has height H_i and color C_i, an integer in the range 1 to K, where K is the given number of colors.
Problem: how many strictly increasing (in height) colorful subsequences are there? A subsequence is colorful if each of the K colors appears at least once in it.
For example, N = 4, K = 3:
H C
1 1
3 2
2 2
4 3
The only two valid subsequences are (1, 2, 4) and (1, 3, 4).
I think this is a Fenwick tree problem. Please suggest an approach for how to proceed with this type of problem.
For a moment, let's forget about the colors. So the problem is simpler: count the number of increasing subsequences. This problem has a standard solution:
1. Map each value to the [0 ... n-1] range (coordinate compression).
2. Let f[value] be the number of increasing subsequences that have value as their last element.
3. Initially, f is filled with 0.
4. Then iterate over the array elements from left to right and perform the following operation: f[value] += 1 + get_sum(0, value - 1) (that is, append this element to every subsequence it can extend while staying strictly increasing, plus count the element on its own), where value is the current element of the array and get_sum(a, b) returns the sum f[a] + f[a+1] + ... + f[b].
5. The answer is f[0] + f[1] + ... + f[n-1].
Using a binary indexed tree (aka Fenwick tree) for get_sum, each operation takes O(log n), giving O(n log n) total time complexity.
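A minimal sketch of this simpler problem (class and function names are illustrative):

class Fenwick:
    def __init__(self, n):
        self.t = [0] * (n + 1)
    def add(self, i, delta):        # add delta at 0-based position i
        i += 1
        while i < len(self.t):
            self.t[i] += delta
            i += i & -i
    def get_sum(self, i):           # sum of positions 0..i; 0 if i < 0
        i += 1
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def count_increasing_subsequences(A):
    order = {v: idx for idx, v in enumerate(sorted(set(A)))}   # step 1
    tree = Fenwick(len(order))
    for x in A:
        v = order[x]
        tree.add(v, 1 + tree.get_sum(v - 1))    # f[v] += 1 + get_sum(0, v-1)
    return tree.get_sum(len(order) - 1)         # f[0] + ... + f[n-1]

print(count_increasing_subsequences([1, 3, 2, 4]))   # 11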
Now let's come back to the original problem. To take the colors into account, we compute f[value, mask] instead of f[value]: the number of increasing subsequences that have value as their last element and whose set of colors is mask (a bitmask showing which colors are present). Then the update for element i looks like this:

for mask in [0 ... 2^K - 1]:
    f[value, mask or 2^(color[i] - 1)] += 1 + get_sum(0, value - 1, mask)
The answer is f[0, 2^K - 1] + f[1, 2^K - 1] + ... + f[n - 1, 2^K - 1].
You can maintain 2^K binary indexed trees to achieve O(n * log n * 2^K) time complexity, using the same idea as in the simpler problem.
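A hedged sketch of the full solution, reusing the Fenwick class from the sketch above with one tree per color mask. One detail the pseudocode glosses over: the "+1" (the element on its own) should be applied only for the empty mask, so the singleton lands in f[value, {color}]:

def count_colorful_increasing(H, C, K):
    order = {h: idx for idx, h in enumerate(sorted(set(H)))}
    n = len(order)
    trees = [Fenwick(n) for _ in range(1 << K)]   # trees[mask] stores f[., mask]
    for h, c in zip(H, C):
        v = order[h]
        bit = 1 << (c - 1)
        updates = []
        for mask in range(1 << K):
            cnt = trees[mask].get_sum(v - 1)      # get_sum(0, value-1, mask)
            if mask == 0:
                cnt += 1                          # the element on its own
            if cnt:
                updates.append((mask | bit, cnt))
        for mask, cnt in updates:                 # apply after reading every mask
            trees[mask].add(v, cnt)
    return trees[(1 << K) - 1].get_sum(n - 1)

print(count_colorful_increasing([1, 3, 2, 4], [1, 2, 2, 3], 3))   # 2, as in the example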

Maximum of sums of unsorted array and each of a number of sorted arrays

Given an unsorted array
A = a_1 ... a_n
And a set of sorted Arrays
B_i = b_i_1 ... b_i_n # for i from 1 to $large_number
I would like to find the maximums from the (not yet calculated) sum arrays
C_i = (a_1 + b_i_1) ... (a_n + b_i_n)
for each i.
Is there a trick to do better than just calculating all the C_i and finding their maximums in O($large_number * n)?
Can we do better when we know that the B arrays are just shifts of an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
(The above sequence has the possibly advantageous property that S_i - S_{i-1} > S_{i-1} - S_{i-2}.)
There are $large_number * n inputs in your first problem, so there can't be any such trick.
You can prove this with an adversary argument. Suppose you have an algorithm that solves your problem without looking at all n * $large_number entries of b. I'm going to pick a fixed a, namely (-10, -20, -30, ..., -10n). For the first $large_number * n - 1 times the algorithm looks at an entry b_(i,j), I'll answer that it's 10j, for a sum of zero. The last time it looks at an entry, I'll answer that it's 10j + 1, for a sum of 1.
If $large_number is Omega(n), your second problem requires you to look at n * $large_number entries of S, so it also can't have any such trick.
However, if you specify S, there may be something. And if $large_number <= n/2 (or whatever it is), then all of the entries of S must be sorted, so you only have to look at the last B.
If we don't know anything about the arrays, I don't think it's possible to do better than O($large_number * n).
However, if the B arrays are just shifts of an endless sequence, we can do it in O($large_number + n):
We calculate sum(B_0) in O(n).
Then sum(B_1) = (sum(B_0) - S[0]) + S[n],
and in general: sum(B_i) = (sum(B_i-1) - S[i-1]) + S[i-1+n].
So we can calculate all the other sums, and the max, in O($large_number).
This is for a general sequence; if we have some info about it, it might be possible to do better.
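A minimal sketch of this rolling sum, under the convention B_i = S[i:i+n] as a Python slice (max_shift_sum is an illustrative name):

def max_shift_sum(S, n, num_shifts):
    # sum(B_0) in O(n), then each subsequent sum in O(1)
    cur = sum(S[0:n])
    best = cur
    for i in range(1, num_shifts):
        cur += S[i - 1 + n] - S[i - 1]   # drop S[i-1], add S[i+n-1]
        best = max(best, cur)
    return best

print(max_shift_sum([j * j for j in range(10)], 3, 7))   # 149 = 36 + 49 + 64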
we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
You can easily calculate S[i:i+n] as (sum of squares from 1 to i+n) - (sum of squares from 1 to i-1)
See https://math.stackexchange.com/questions/183316/how-to-get-to-the-formula-for-the-sum-of-squares-of-first-n-numbers
With the provided example, S_1 = 0, S_2 = 1, S_3 = 4, ...
Let f(n) = SUM of S_i for i = 1 to n = (n-1)(n)(2n-1)/6.
Then sum(B_i) = f(i+n) - f(i-1).
You then add SUM(A) to each sum.
Another approach is to calculate the difference between consecutive sums:
sum(B_i) - sum(B_i-1) = sum(S[i:i+n]) - sum(S[i-1:i+n-1]) = S(i+n) - S(i-1)
That way, you can just calculate the difference of each array's sum from the previous one. In my understanding, since C_i = SUM(B_i) + SUM(A), SUM(A) is a constant that is irrelevant to finding the maximum.

possible number of rectangles in a matrix

I am going through an algorithm for calculating the sum of a region of a matrix, and I read about a solution that pre-computes the sums to get a better result. I want to calculate the number of possible rectangles (sub-matrices) in a 2D matrix of size m x n.
Can anybody explain a solution using permutations and combinations?
Start from the simplest case first:
Start with m x n; that's 1 rectangle.
Reduce n by 1; that gives another 2.
Reduce n by 1 again; that gives another 3.
Reduce n by 1 again; that gives another 4.
Do you see the pattern?
When n gets down to 1, subtract 1 from m, and start over:
Start with (m-1) x n; that's 2 rectangles.
Reduce n by 1; that gives another 4.
Reduce n by 1 again; that gives another 6.
Reduce n by 1 again; that gives another 8.
Are you seeing the pattern yet?
Now extrapolate to m-2, m-3, m-4, ..., 1.
Then start from the beginning, reducing m first, then n (or simply double all results, except the m x n one).
And the sum of all those results is your answer.
You can count all rectangles a*b with a, b >= 2 by simply choosing which two rows and which two columns bound them:
C(m,2)*C(n,2)
You can count a*1 rectangles with a>=2 via
C(m,2)*n
and 1*b rectangles with b>=2 via
m*C(n,2)
and 1*1 matrices via:
m*n
so add these for the final answer:
C(m,2)*C(n,2) + C(m,2)*n + m*C(n,2) + m*n
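The four terms collapse to one closed form, since C(m,2) + m = C(m+1,2): the total is C(m+1,2) * C(n+1,2), i.e. choosing 2 of the m+1 horizontal grid lines and 2 of the n+1 vertical grid lines. A quick sanity check in Python:

from math import comb

def count_rectangles(m, n):
    # C(m,2)*C(n,2) + C(m,2)*n + m*C(n,2) + m*n == C(m+1,2)*C(n+1,2)
    return comb(m + 1, 2) * comb(n + 1, 2)

print(count_rectangles(2, 3))   # 18, matching the countRect example below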
Number of possible rectangles of any size in an m x n matrix:
m*n + (m-1)*n + ... + 1*n + m*(n-1) + (m-1)*(n-1) + ... + 1*(n-1) + ... + m*1 + ... + 1*1
= Sum { i * j } for i in [1,m]; j in [1,n]
It is not an algorithm question, it is a counting problem.
Try counting the number of rectangles in a 1x1 matrix, then 1x2, 2x1, 3x2, etc., and you will see that
num_of_rect(m x n) = sum(i*j) for 0 < i < m+1; 0 < j < n+1
In Python 3:

def countRect(n, m):
    # sum(i * j) over i in [0, n], j in [0, m]; the i = 0 / j = 0 terms are zero
    return sum(i * j for i in range(n + 1) for j in range(m + 1))

if __name__ == "__main__":
    print(countRect(2, 3))

gives 18.
