How to find largest square of palindrome in a matrix - algorithm

I am trying to solve a problem where I am given an n x n square matrix of characters and I want to find the size of the largest palindrome square in it. A palindrome square is a square whose rows and columns are all palindromes.
For example:
Input
a g h j k
s d g d j
s e f e n
a d g d h
r y d g s
The output will be:
3
corresponding to the middle square. I am thinking of a dynamic programming solution but am unable to formulate the recurrence relation. I think the state should be a(i,j,k), where (i, j) is the bottom-right corner of the square and k is the size of the palindrome square.
Can someone help me with the recurrence relation for this problem?
EDIT:
n<500, so I believe that I can't go beyond O(n^3).

Assume first that you can solve the following subproblem:
For each cell (i, j) and each length k, is there a horizontal (and, likewise, a vertical) palindrome of length k ending at (i, j)?
Hint for that subproblem:
// palindrome[i][j][k]: the k cells of row i that end at column j form a palindrome
boolean[][][] palindrome = new boolean[n][n][n + 1];
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        palindrome[i][j][0] = true;
        palindrome[i][j][1] = true;
        // k <= j + 1 keeps the left end (j - k + 1) inside the row
        for (int k = 2; k <= j + 1; k++)
            if (data[i][j - k + 1] == data[i][j] && palindrome[i][j - 1][k - 2])
                palindrome[i][j][k] = true;
    }
}
So we can create two three-dimensional arrays, int[n][n][n + 1] row and int[n][n][n + 1] col.
For each cell (i, j) and each length k, row[i][j][k] is the number of consecutive cells (i, j), (i - 1, j), ... going upwards that each end a horizontal palindrome of length k, and col[i][j][k] is the number of consecutive cells (i, j), (i, j - 1), ... going left that each end a vertical palindrome of length k. Entries default to 0, and out-of-range neighbours count as 0.
for(int k = 1; k <= n; k++)
    if(there is a horizontal palindrome of length k ending at cell (i, j))
        row[i][j][k] = 1 + row[i - 1][j][k];
    if(there is a vertical palindrome of length k ending at cell (i, j))
        col[i][j][k] = 1 + col[i][j - 1][k];
Finally, if row[i][j][k] >= k && col[i][j][k] >= k, there is a palindrome square of size k whose bottom-right corner is (i, j).
In total, the time complexity is O(n^3).
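A minimal Python sketch of the approach described above, under a few assumptions of mine: the grid is a list of rows, the function name largest_palindrome_square is made up for illustration, and the tables are kept as plain nested lists (O(n^3) memory, which is fine for small inputs but heavy near n = 500).

```python
def largest_palindrome_square(grid):
    """Return the side of the largest square whose rows and columns are all palindromes."""
    n = len(grid)
    # hp[k][i][j]: the k cells of row i ending at column j form a palindrome.
    # vp[k][i][j]: the k cells of column j ending at row i form a palindrome.
    # Lengths 0 and 1 are always palindromes.
    hp = [[[k <= 1] * n for _ in range(n)] for k in range(n + 1)]
    vp = [[[k <= 1] * n for _ in range(n)] for k in range(n + 1)]
    for k in range(2, n + 1):
        for i in range(n):
            for j in range(n):
                if j - k + 1 >= 0:
                    hp[k][i][j] = grid[i][j - k + 1] == grid[i][j] and hp[k - 2][i][j - 1]
                if i - k + 1 >= 0:
                    vp[k][i][j] = grid[i - k + 1][j] == grid[i][j] and vp[k - 2][i - 1][j]

    best = 0
    for k in range(1, n + 1):
        # rows[i][j]: consecutive rows ending at row i with a horizontal palindrome ending at column j.
        # cols[i][j]: consecutive columns ending at column j with a vertical palindrome ending at row i.
        rows = [[0] * n for _ in range(n)]
        cols = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if hp[k][i][j]:
                    rows[i][j] = 1 + (rows[i - 1][j] if i > 0 else 0)
                if vp[k][i][j]:
                    cols[i][j] = 1 + (cols[i][j - 1] if j > 0 else 0)
                if rows[i][j] >= k and cols[i][j] >= k:
                    best = k  # k only grows, so the last hit is the largest
    return best


example = [row.split() for row in [
    "a g h j k",
    "s d g d j",
    "s e f e n",
    "a d g d h",
    "r y d g s",
]]
print(largest_palindrome_square(example))
```

For the 5 x 5 input from the question this should report the middle 3 x 3 square.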

Let's start with the complexity of validating a palindrome.
A palindrome can be verified in O(k), where k is the length of the palindrome.
You then need to run that test 2k times, once for each row and each column of the inner square (using the length of the palindrome, k, as the side of the square).
So checking one square costs k * 2k -> O(2k^2) -> O(k^2).
Then you extend the search space to the whole n x n data set; this is where a second variable is introduced.
You need to iterate over top-left columns 1 to (n-k+1) and top-left rows 1 to (n-k+1) in a nested loop.
So for a fixed k you have (n-k+1)^2 * O(k^2) -> O(n^2 * k^2).
Note: the cost of this problem depends on more than one variable.
This is also the approach I suggest you take when coding the solution: start small and build up.
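As a rough illustration of the brute-force check analysed above, here is a small Python sketch. The function names are my own, and it tries the largest k first, which is an assumption rather than something this answer prescribes.

```python
def is_palindrome(cells):
    # Works for strings and lists alike.
    return cells == cells[::-1]


def largest_palindrome_square_bruteforce(grid):
    n = len(grid)
    for k in range(n, 0, -1):                  # largest candidate size first
        for top in range(n - k + 1):           # every possible top-left corner
            for left in range(n - k + 1):
                rows = [grid[top + r][left:left + k] for r in range(k)]
                cols = [[grid[top + r][left + c] for r in range(k)] for c in range(k)]
                # 2k palindrome tests of O(k) each per candidate square
                if all(is_palindrome(line) for line in rows + cols):
                    return k
    return 0


grid = [row.split() for row in [
    "a g h j k",
    "s d g d j",
    "s e f e n",
    "a d g d h",
    "r y d g s",
]]
print(largest_palindrome_square_bruteforce(grid))
```

This is only practical for small n, but it is handy as a reference to test a faster solution against.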

I'm sure there is probably a better way, and I'm fairly sure my logic is correct, but take this at face value as it's not tested.
To keep the example simple, I'll say that (i,j) is the top-left corner, i.e. coordinates (1,1).
1 2 3 4 5 6 7 8
1 a b c d e f g f
2 d e f g h j k q
3 a b g d z f g f
4 a a a a a a a a
ie (1,1) = a, (1,5) = e and (2,1) = d
Now, instead of checking every column, you can start by checking every kth column.
For example, when k=3:
1) Create a 2D boolean array the size of the character table, with all entries TRUE.
2) Start by checking column 3, cfg, which is not a palindrome, so columns 1 and 2 no longer need to be tested.
3) Because the palindrome test failed, mark the corresponding entry (1,3) in the 2D array as FALSE (no range that uses this position needs testing, as it is not a palindrome).
4) Next check column 6, fjf, which is a palindrome, so go back and test column 5: ehz != a palindrome.
5) Set (1,5) = FALSE.
6) Then test column 8, then column 7.
NOTE: You have only had to test 5 of the 8 columns.
Since there were k adjacent columns that were palindromes, now test the corresponding rows. Start from the bottom row, in this case row 3, as it will eliminate the most other checks if it fails.
7) Check the row starting at (3,6): fgf = palindrome.
8) Check the row starting at (2,6): jkq != a palindrome.
9) Set (2,6) = FALSE.
10) Check the column starting at (2,3): fga != palindrome.
11) Set (2,3) = FALSE.
No more tests are needed for row 2, as both (2,3) and (2,6) are FALSE.
Hopefully you can make sense of that.
Note: you would probably start this at k = n and decrement k until you find a result.

Related

Counting inversions in an array of 2D pair

Problem Description:
Let there be an array of 2D pairs ((x1, y1), ..., (xn, yn)). With a fixed constant y', a pair (i, j) is called half-inverted if i < j, xi > xj, and yi ≥ y' > yj. Devise an algorithm that counts the number of half-inverted pairs. You will get full marks if your algorithm is correct and of complexity no more than O(n log n).
My idea is to treat this with a method similar to counting inversions in a normal array, but my problem is: how do we maintain the order during the merge-and-count step?
This problem can be solved with a simple modification of the familiar merge-sort inversion counting algorithm, so make sure you fully understand that algorithm as a prerequisite.
If we examine the merge step of this algorithm, we have 2 sorted halves and 2 pointers, each pointing to an element of one half. Let our left pointer be i and our right pointer be j. Using the traditional definition of an inversion, if i points to a value that is larger than the value pointed to by j, then, because the halves are sorted and all the elements of the left half come before those of the right half in the real array, all the elements from i to the end of the left half form an inversion with the value at j. So we increase our count by mid - i, where mid is the end of the left half.
Switching back to your problem, we are dealing with pairs (x,y). If we can keep our x values sorted then, using the approach described above, we can simply count the number of inversions considering only x values. Looking at your definition of half inversions, we will surely be overcounting if we only count xi > xj. We are missing the additional constraint yi >= y' > yj, and pairs that violate it must be filtered out of our count.
So, if we look back at our traditional algorithm, when our i pointer is pointing to a value greater than the value at j we also need to make sure that the y value at j is less than y'. If this is not true, then none of the x's from i to mid will match our definition of a half inversion, so we cannot count them. Now let's assume the y at j is smaller than y'. If we simply counted all the pairs from i to mid, we would still be overcounting the pairs which have yi < y'.
One way to fix this is to keep track of the number of y values in the left half from i to mid which are >= y' and add that number to our count. We can keep track of how many y >= y' we have seen in the merge step up to any i, and subtract that from the total number of y's which are >= y' in the left half. To keep track of that total, we can return it from our recursive function (total = left + right) and only use the number which came from the left half when merging. We also need to modify our base case, which is straightforward.
def count_half_inversions(l, y):
    # Note: l ends up sorted by x as a side effect of the merge sort.
    return count_rec(l, 0, len(l), l.copy(), y)[0]

def count_rec(l, begin, end, copy, y):
    if end - begin <= 1:
        # we have only 1 pair
        return (0, 1 if l[begin][1] >= y else 0)
    mid = begin + ((end - begin) // 2)
    # l and copy swap roles at each level: the halves are sorted into copy
    left = count_rec(copy, begin, mid, l, y)
    right = count_rec(copy, mid, end, l, y)
    between = merge_count(l, begin, mid, end, copy, left[1], y)
    # return (half-inversion count, number of pairs in this range with y-coordinate >= y)
    return (left[0] + right[0] + between, left[1] + right[1])

def merge_count(l, begin, mid, end, copy, left_y_count, y):
    # left_y_count: pairs with y-coordinate >= y among the unconsumed left elements
    result = 0
    i, j = begin, mid
    k = begin
    while i < mid and j < end:
        if copy[i][0] > copy[j][0]:
            if y > copy[j][1]:
                result += left_y_count
            smaller = copy[j]
            j += 1
        else:
            if copy[i][1] >= y:
                left_y_count -= 1
            smaller = copy[i]
            i += 1
        l[k] = smaller
        k += 1
    while i < mid:
        l[k] = copy[i]
        i += 1
        k += 1
    while j < end:
        l[k] = copy[j]
        j += 1
        k += 1
    return result

test_case = [(1,1), (6,4), (6,3), (1,2), (1,2), (3,3), (6,2), (0,1)]
fixed_y = 2
print(count_half_inversions(test_case, fixed_y))
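For checking the merge-sort version, a quadratic reference that applies the definition literally can help. This is only a sketch for testing (the function name is my own), not part of the O(n log n) solution.

```python
def count_half_inversions_bruteforce(pairs, y_fixed):
    """O(n^2) reference: count i < j with x_i > x_j and y_i >= y' > y_j."""
    count = 0
    for j in range(len(pairs)):
        for i in range(j):
            x_i, y_i = pairs[i]
            x_j, y_j = pairs[j]
            if x_i > x_j and y_i >= y_fixed > y_j:
                count += 1
    return count


sample = [(1, 1), (6, 4), (6, 3), (1, 2), (1, 2), (3, 3), (6, 2), (0, 1)]
print(count_half_inversions_bruteforce(sample, 2))
```

On the sample above only the last pair (0,1) can act as the right element, and the six pairs at indices 1 to 6 each form a half-inverted pair with it, so both versions should agree on 6.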

Formulating dp problem [Codeforces 414 B]

Here is the problem statement from an old contest on Codeforces:
A sequence of l integers b1, b2, ..., bl (1 ≤ b1 ≤ b2 ≤ ... ≤ bl ≤ n) is called good if each number divides (without a remainder) by the next number in the sequence. More formally, bi | b(i+1) for all i (1 ≤ i ≤ l - 1).
Given n and k, find the number of good sequences of length k. As the answer can be rather large, print it modulo 1000000007 (10^9 + 7).
I have formulated my dp[i][j] as the number of good sequences of length i which ends with the jth number, and the transition table as the following pseudocode
dp[k][n] =
    for each factor of n as i do
        for j from 1 to k - 1
            dp[k][n] += dp[j][i]
        end
    end
But in the editorial it is given as
Lets define dp[i][j] as number of good sequences of length i that ends in j.
Let's denote divisors of j by x1, x2, ..., xl. Then dp[i][j] = sigma dp[i - 1][xr]
But in my understanding, we need two sigmas, one for the divisors and the other for length. Please help me correct my understanding.
My code ->
MOD = 10 ** 9 + 7
N, K = map(int, input().split())
dp = [[0 for _ in range(N + 1)] for _ in range(K + 1)]
for k in range(1, K + 1):
    for n in range(1, N + 1):
        c = 1
        for i in range(1, n):
            if n % i != 0:
                continue
            for j in range(1, k):
                c += dp[j][i]
        dp[k][n] = c
c = 0
for i in range(1, N + 1):
    c = (c + dp[K][i]) % MOD
print(c)
Link to the problem: https://codeforces.com/problemset/problem/414/B
So let's define dp[i][j] as the number of good sequences of length exactly i that end with the value j as the last element.
Now, dp[i][j] = Sum(dp[i-1][x]) for all x such that x is a divisor of j. Note that x can be equal to j itself.
This is true because if there is some sequence of length i-1, which we have already counted, that ends with some value x, then we can simply append j to it and form a new sequence that satisfies all the conditions.
I guess your confusion is with the length. Since our current length is i, we can append j only to a sequence whose length is exactly i-1; we do not iterate over other lengths.
Hope this is clear.
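A short Python sketch of that recurrence, written as a function rather than reading stdin. The function name and the trick of pushing each value to its multiples (instead of enumerating divisors) are my own choices, not taken from the editorial.

```python
MOD = 10**9 + 7


def count_good_sequences(n, k):
    # dp[j]: number of good sequences of the current length that end in j.
    dp = [0] + [1] * n                 # length 1: one sequence per value 1..n
    for _ in range(k - 1):             # extend the length by one, k - 1 times
        nxt = [0] * (n + 1)
        for x in range(1, n + 1):
            # x divides every multiple j of x, so dp[x] contributes to nxt[j]
            for j in range(x, n + 1, x):
                nxt[j] = (nxt[j] + dp[x]) % MOD
        dp = nxt
    return sum(dp) % MOD


print(count_good_sequences(3, 2))
```

For n = 3, k = 2 the good sequences are (1,1), (1,2), (1,3), (2,2), (3,3), so the count is 5. Each length step costs about n log n operations, because the inner loop visits n/x multiples for each x.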

Pyramids dynamic programming

I encountered this question in an interview and could not figure it out. I believe it has a dynamic programming solution but it eludes me.
Given a number of bricks, output the total number of 2D pyramids possible, where a pyramid is defined as any structure in which each row of bricks has strictly fewer bricks than the row below it. You do not have to use all the bricks.
A brick is simply a square, the number of bricks in a row is the only important bit of information.
Really stuck with this one, I thought it would be easy to solve each problem 1...n iteratively and sum. But coming up with the number of pyramids possible with exactly i bricks is evading me.
Example, n = 6 (pyramids grouped by the number of bricks used, from 1 to 6):

X

XX

 X
XX   XXX

 X
XXX   XXXX

 XX     X
XXX   XXXX   XXXXX

  X
 XX     XX      X
XXX   XXXX   XXXXX   XXXXXX

So the answer is 13 possible pyramids from 6 bricks.
edit
I am positive this is a dynamic programming problem, because it makes sense (once you've determined the bottom row) to simply look up, in your memoized array, the number of pyramids that fit on top for the remaining bricks.
It also seems to make sense to consider only bottom rows of width at least n/2, because we can't have more bricks on top than in the bottom row. EXCEPT, and this is where I lose it, in a few cases you can. For example, N = 10:
X
XX
XXX
XXXX
Now the bottom row has 4 bricks, but there are 6 left to place on top.
But with n = 11 we cannot have a bottom row with fewer than n/2 bricks. There is another weird inconsistency like that with n = 4, where we cannot have a bottom row of n/2 = 2 bricks.
Let's choose a suitable definition:
f(n, m) = # pyramids out of n bricks with base of size < m
The answer you are looking for now is (given that N is your input number of bricks):
f(N, N+1) - 1
Let's break that down:
The first N is obvious: that's your number of bricks.
Your bottom row will contain at most N bricks (because that's all you have), so N+1 is a sufficient upper bound.
Finally, the - 1 is there because technically the empty pyramid is also a pyramid (and will thus be counted) but you exclude that from your solutions.
The base cases are simple:
f(n, 0) = 1 for any n >= 0
f(0, m) = 1 for any m >= 0
In both cases, it's the empty pyramid that we are counting here.
Now, all we need still is a recursive formula for the general case.
Let's assume we are given n and m and choose to have i bricks on the bottom layer. What can we place on top of this layer? A smaller pyramid, for which we have n - i bricks left and whose base has size < i. This is exactly f(n - i, i).
What is the range for i? We can choose an empty row so i >= 0. Obviously, i <= n because we have only n bricks. But also, i <= m - 1, by definition of m.
This leads to the recursive expression:
f(n, m) = sum f(n - i, i) for 0 <= i <= min(n, m - 1)
You can compute f recursively, but using dynamic programming it will be faster of course. Storing the results matrix is straightforward though, so I leave that up to you.
Coming back to the original claim that f(N, N+1)-1 is the answer you are looking for, it doesn't really matter which value to choose for m as long as it is > N. Based on the recursive formula it's easy to show that f(N, N + 1) = f(N, N + k) for every k >= 1:
f(N, N + k) = sum f(N - i, i) for 0 <= i <= min(N, N + k - 1)
= sum f(N - i, i) for 0 <= i <= N
= sum f(N - i, i) for 0 <= i <= min(N, N + 1 - 1)
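Here is one way the recursive formula for f could be memoized in Python. This is a sketch only: the wrapper name count_pyramids is my own, and functools.lru_cache stands in for the results matrix mentioned above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n, m):
    """Pyramids (including the empty one) built from at most n bricks with base size < m."""
    if n == 0 or m == 0:
        return 1                       # only the empty pyramid
    # i bricks on the bottom layer, then any smaller pyramid from the n - i bricks left
    return sum(f(n - i, i) for i in range(0, min(n, m - 1) + 1))


def count_pyramids(total_bricks):
    # Subtract 1 to exclude the empty pyramid.
    return f(total_bricks, total_bricks + 1) - 1


print(count_pyramids(6))
```

For 6 bricks this gives 13, matching the example in the question.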
In how many ways can you build a pyramid of width n? By putting any pyramid of width n-1 or less anywhere atop the layer of n bricks. So if p(n) is the number of pyramids of width n, then p(n) = sum [m=1 to n-1] (p(m) * c(n, m)), where c(n, m) is the number of ways you can place a layer of width m atop a layer of width n (I trust that you can work that one out yourself).
This, however, doesn't place a limitation on the number of bricks. Generally, in DP, any resource limitation must be modeled as a separate dimension. So your problem is now p(n, b): "How many pyramids can you build of width n with a total of b bricks"? In the recursive formula, for each possible way of building a smaller pyramid atop your current one, you need to refer to the correct amount of remaining bricks. I leave it as a challenge for you to work out the recursive formula; let me know if you need any hints.
You can think of your recursion as: given x bricks left where you used n bricks on last row, how many pyramids can you build. Now you can fill up rows from either top to bottom row or bottom to top row. I will explain the former case.
Here the recursion might look something like this (left is number of bricks left and last is number of bricks used on last row)
f(left,last)=sum (1+f(left-i,i)) for i in range [last+1,left] inclusive.
Since when you use i bricks on current row you will have left-i bricks left and i will be number of bricks used on this row.
Code:
int calc(int left, int last) {
    int total=0;
    if(left<=0) return 0; // terminal case, no pyramid with no brick
    for(int i=last+1; i<=left; i++) {
        total+=1+calc(left-i,i);
    }
    return total;
}
I will leave it to you to implement memoized or bottom-up dp version. Also you may want to start from bottom row and fill up upper rows in pyramid.
Since we are asked to count pyramids of any cardinality less than or equal to n, we may consider each cardinality in turn (pyramids of 1 element, 2 elements, 3...etc.) and sum them up. But in how many different ways can we compose a pyramid from k elements? The same number as the count of distinct partitions of k (for example, for k = 6, we can have (6), (1,5), (2,4), and (1,2,3)). A generating function/recurrence for the count of distinct partitions is described in Wikipedia and a sequence at OEIS.
Recurrence, based on the Pentagonal number Theorem:
q(k) = a_k + q(k − 1) + q(k − 2) − q(k − 5) − q(k − 7) + q(k − 12) + q(k − 15) − q(k − 22)...
where a_k is (−1)^(abs(m)) if k = 3*m^2 − m for some integer m and is 0 otherwise.
(The subtracted coefficients are generalized pentagonal numbers.)
Since the recurrence described in Wikipedia obliges the calculation of all preceding q(n)'s to arrive at a larger q(n), we can simply sum the results along the way to obtain our result.
JavaScript code:
function numPyramids(n){
    // pentagonals maps 2 * (generalized pentagonal number) to |m|
    var distinctPartitions = [1,1],
        pentagonals = {},
        m = 1,
        _m = 1,   // declared here to avoid creating an implicit global
        pentagonal_m = 2,
        result = 1;
    while (pentagonal_m / 2 <= n){
        pentagonals[pentagonal_m] = Math.abs(_m);
        m++;
        _m = m % 2 == 0 ? -m / 2 : Math.ceil(m / 2);
        pentagonal_m = _m * (3 * _m - 1);
    }
    for (var k=2; k<=n; k++){
        // a_k term of the recurrence
        distinctPartitions[k] = pentagonals[k] ? Math.pow(-1,pentagonals[k]) : 0;
        var cs = [1,1,-1,-1],
            c = 0;
        for (var i in pentagonals){
            if (i / 2 > k)
                break;
            distinctPartitions[k] += cs[c]*distinctPartitions[k - i / 2];
            c = c == 3 ? 0 : c + 1;
        }
        result += distinctPartitions[k];
    }
    return result;
}
console.log(numPyramids(6)); // 13

From a loop index k, obtain pairs i,j with i < j?

I need to traverse all pairs i,j with 0 <= i < n, 0 <= j < n and i < j for some positive integer n.
Problem is that I can only loop through another variable, say k. I can control the bounds of k. So the problem is to determine two arithmetic methods, f(k) and g(k) such that i=f(k) and j=g(k) traverse all admissible pairs as k traverses its consecutive values.
How can I do this in a simple way?
I think I got it (in Python):
def get_ij(n, k):
    j = k // (n - 1)  # // is integer (truncating) division
    i = k - j * (n - 1)
    if i >= j:
        i = (n - 2) - i
        j = (n - 1) - j
    return i, j

for n in range(2, 6):
    print(n, sorted(get_ij(n, k) for k in range(n * (n - 1) // 2)))
It basically folds the matrix so that it's (almost) rectangular. By "almost" I mean that there could be some unused entries on the far right of the bottom row.
[Pictures in the original answer illustrated how the folding works for n=4 and n=5.]
Now, iterating over the rectangle is easy, as is mapping from folded coordinates back to coordinates in the original triangular matrix.
Pros: uses simple integer math.
Cons: returns the tuples in a weird order.
I think I found another way, that gives the pairs in lexicographic order. Note that here i > j instead of i < j.
Basically the algorithm consists of the two expressions:
i = floor((1 + sqrt(1 + 8*k))/2)
j = k - i*(i - 1)/2
that give i,j as functions of k. Here k is a zero-based index.
Pros: Gives the pairs in lexicographic order.
Cons: Relies on floating-point arithmetic.
Rationale:
We want to achieve the mapping in the following table:
k -> (i,j)
0 -> (1,0)
1 -> (2,0)
2 -> (2,1)
3 -> (3,0)
4 -> (3,1)
5 -> (3,2)
....
We start by considering the inverse mapping (i,j) -> k. It isn't hard to realize that:
k = i*(i-1)/2 + j
Since j < i, it follows that the value of k corresponding to all pairs (i,j) with fixed i satisfies:
i*(i-1)/2 <= k < i*(i+1)/2
Therefore, given k, i=f(k) returns the largest integer i such that i*(i-1)/2 <= k. After some algebra:
i = f(k) = floor((1 + sqrt(1 + 8*k))/2)
After we have found the value i, j is trivially given by
j = k - i*(i-1)/2
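If the floating-point square root is a concern, the same two expressions can be evaluated with integer arithmetic only. A small Python sketch, assuming Python 3.8+ for math.isqrt; the function name is made up for illustration.

```python
import math


def pair_from_index(k):
    """Map a zero-based index k to the pair (i, j) with i > j, in lexicographic order."""
    # floor((1 + sqrt(1 + 8k)) / 2), computed with an exact integer square root
    i = (1 + math.isqrt(1 + 8 * k)) // 2
    j = k - i * (i - 1) // 2
    return i, j


print([pair_from_index(k) for k in range(6)])
```

This reproduces the table above for k = 0..5. Replacing sqrt with an integer square root gives the same floor because (1 + s) // 2 is unchanged when s is rounded down to an integer.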
I'm not sure I understand the question exactly, but to sum up: if 0 <= i < n and 0 <= j < n, then you want to traverse 0 <= k < n*n
for (int k = 0; k < n*n; k++) {
    int i = k / n;
    int j = k % n;
    // ...
}
[edit] I just saw that i < j; so this solution is not optimal, since fewer than n*n iterations are necessary ...
If we think of our solution in terms of a number triangle, where k is the sequence
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
...
Then j would be our (non zero-based) row number, that is, the greatest integer such that
j * (j - 1) / 2 < k
Solving for j:
j = ceiling ((sqrt (1 + 8 * k) - 1) / 2)
And i would be k's (zero-based) position in the row
i = k - j * (j - 1) / 2 - 1
The bounds for k are:
1 <= k <= n * (n - 1) / 2
Is it important that you actually have two arithmetic functions f(k) and g(k) doing this? Because you could first create a list such as
L = []
for i in range(n-1):
    for j in range(n):
        if j>i:
            L.append((i,j))
This will give you all the pairs you asked for. Your variable k can now just run along the index of the list. For example, if we take n=5,
for x in L:
    print(x)
gives us
(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
Suppose you have 2<=k<5, for example; then
for k in range(2, 5):
    print(L[k])
yields
(0,3), (0,4), (1,2)

Counting number of points in lower left quadrant?

I am having trouble understanding a solution to an algorithmic problem
In particular, I don't understand how or why this part of the code
s += a[i];
total += query(s);
update(s);
allows you to compute the total number of points in the lower left quadrant of each point.
Could someone please elaborate?
As an analogue for the plane problem, consider this:
1. For a point (a, b) to lie in the lower left quadrant of (x, y), we need a < x and b < y; thus, points of the form (i, P[i]) lie in the lower left quadrant of (j, P[j]) iff i < j and P[i] < P[j].
2. When iterating in ascending order of i, all points that were considered earlier lie to the left of the current point* (i, P[i]).
3. So one only has to count the P[j] less than P[i] that have been considered so far.
*The current point is the point under consideration in the current iteration of the for loop that you quoted, i.e. (i, P[i]).
Let's define another array, C[s]:
C[s] = Number of Prefix Sums of array A[1..(i - 1)] that amount to s
So the solution to #3 becomes the sum ... C[-2] + C[-1] + C[0] + C[1] + C[2] ... C[P[i] - 1], i.e. the prefix sum of C up to P[i] - 1.
Use the BIT to store the prefix sum of C, thus defining query(s) as:
query(s) = Number of Prefix Sums of array A[1..(i - 1)] that amount to a value < s
Using these definitions, s in the given code is the prefix sum up to the current index i (that is, P[i]). total accumulates the answer, and update simply adds P[i] to the BIT.
We have to repeat this method for all i, hence the for loop.
PS: It uses a data structure called a Binary Indexed Tree (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees) for operations. If you aren't acquainted with it, I'd recommend that you check the link.
EDIT:
You are given an array S and a value X. You can split S into two disjoint sets: L, which has all elements of S less than X, and H, which has those that are greater than or equal to X.
A: All elements of L are less than all elements of H.
Any subsequence T of S will have some elements of L and some elements of H. Let's say it has p elements of L and q of H. When T is sorted to give T', all p elements of L appear before the q elements of H because of A.
The median, being the central value, is the value at location m = (p + q)/2.
It is intuitive that having q > p implies that the median lies in H. As a proof:
Values in locations [1..p] in T' belong to L. Therefore, for the median to be in H, its position m should be greater than p:
m > p
(p + q)/2 > p
p + q > 2p
q > p
B: q - p > 0
To compute q - p, I replace all elements in T' with -1 if they belong to L ( < X ) and +1 if they belong to H ( >= X).
T' looks something like {-1, -1, -1... 1, 1, 1}
It has p times -1 and q times 1. Sum of T' will now give me:
Sum = p * (-1) + q * (1)
C: Sum = q - p
I can use this information to find the value in B.
All subsequences are of the form {A[i], A[i + 1], A[i + 2] ... A[j - 1]} since they are contiguous. To compute the sum of A[i] to A[j - 1], I can use prefix sums, with P[i] = A[1] + A[2] + .. A[i - 1]
The sum of the subsequence from A[i] to A[j - 1] can then be computed as P[j] - P[i] (where j is the greater of j and i)
With C and B in mind, we conclude:
Sum = P[j] - P[i] = q - p (q - p > 0)
P[j] - P[i] > 0
P[j] > P[i]
j > i and P[j] > P[i] for each solution that gives you a median >= X
In summary:
Replace all A[i] with -1 if they are less than X and +1 otherwise
Compute prefix sums of A[i]
For each pair (i, P[i]), count pairs which lie to its lower left quadrant.
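The three steps in the summary can be sketched in Python as follows. This is my own illustration, not the code the question refers to: the function name is made up, prefix sums are shifted by an offset so they can index the tree, and "median >= X" is taken to mean strictly more elements >= X than < X, as in derivation B.

```python
def count_subarrays_with_median_at_least(a, x):
    n = len(a)
    size = 2 * n + 1                   # prefix sums lie in [-n, n]
    tree = [0] * (size + 1)            # 1-based binary indexed tree of counts

    def update(pos):
        while pos <= size:
            tree[pos] += 1
            pos += pos & -pos

    def query(pos):
        # number of stored prefix sums at tree positions 1..pos
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    offset = n + 1                     # maps prefix sum s to position s + offset in 1..size
    s = 0
    update(s + offset)                 # the empty prefix, whose sum is 0
    total = 0
    for value in a:
        s += 1 if value >= x else -1
        total += query(s + offset - 1) # earlier prefix sums strictly smaller than s
        update(s + offset)
    return total


print(count_subarrays_with_median_at_least([1, 2, 3], 2))
```

For [1, 2, 3] and X = 2 the qualifying subarrays are [2], [3], [2, 3] and [1, 2, 3], so the result is 4.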
