Could someone please explain to me how this particular line works?

I was trying to solve this problem on LeetCode: https://leetcode.com/problems/maximum-product-of-word-lengths/
I tried many approaches but was not able to come up with an efficient solution.
After going through the discussion forum, I found this solution.
Could someone please tell me how this line works:
value[i] |= 1 << (tmp.charAt(j) - 'a');
This is the code:
public int maxProduct(String[] words) {
    if (words == null || words.length == 0)
        return 0;
    int len = words.length;
    // value[i] is a 26-bit mask: bit k is set if letter ('a' + k) occurs in words[i]
    int[] value = new int[len];
    for (int i = 0; i < len; i++) {
        String tmp = words[i];
        value[i] = 0;
        for (int j = 0; j < tmp.length(); j++) {
            value[i] |= 1 << (tmp.charAt(j) - 'a');
        }
    }
    int maxProduct = 0;
    for (int i = 0; i < len; i++)
        for (int j = i + 1; j < len; j++) {
            // the masks share no bit => the two words share no letter
            if ((value[i] & value[j]) == 0 && (words[i].length() * words[j].length() > maxProduct))
                maxProduct = words[i].length() * words[j].length();
        }
    return maxProduct;
}

value[i] |= 1 << (tmp.charAt(j) - 'a');
(tmp.charAt(j) - 'a') subtracts the character codes (for lowercase letters these are the ASCII values) and returns an integer. E.g. 'a' - 'a' is 0, 'b' - 'a' is 1, 'c' - 'a' is 2, and so on. It is plain arithmetic. The result is an integer that works like the index of the letter in the alphabet, and it is used as the bit position in a bit pattern.
1 << n shifts the bits of 1 to the left by n places, filling in 0s. E.g. if 1 is the 32-bit number 000000..00001, then 1 << 3 gives 000000...01000. Since you have the index of the character, you shift to that index; in this example the character must have been a d, because you shifted 3 times.
a |= b is equivalent to a = a | b. Here it uses the OR operation to combine the bit for the current character with the bit pattern of the word.
Operators have precedences that determine the order of evaluation. In this case (I suppose the code is Java), the shift operator << has higher precedence than the compound assignment |=, so the shift happens first. Putting an expression in parentheses gives it the highest precedence (as you know it from math, I guess).
So the expression is evaluated in this order:
(tmp.charAt(j) - 'a') (calculates the index of the j-th character; let's call the index X)
1 << X (calculates the bit position for that character; let's call the result Y)
value[i] |= Y, which is equivalent to value[i] = value[i] | Y (adds the bit position to the bit pattern)
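To see the three steps on a concrete word, here is a minimal C++ sketch. The name letterMask is made up for this illustration, and it assumes the word contains only the lowercase letters 'a' to 'z':

```cpp
#include <string>

// Builds the letter mask described above: bit (c - 'a') is set for every
// character c of the word. Assumes the word holds only 'a'..'z'.
int letterMask(const std::string& word) {
    int mask = 0;
    for (char c : word) {
        int x = c - 'a';  // step 1: index of the letter in the alphabet
        int y = 1 << x;   // step 2: a single bit at that index
        mask |= y;        // step 3: OR the bit into the word's pattern
    }
    return mask;
}
```

For "abd" the bits 0, 1 and 3 are set, so the mask is binary 1011, i.e. 11. Repeated letters, such as the second a in "aab", do not change the mask.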
The idea is the following:
You have a list of words, so you iterate over that list.
You keep an array value whose size equals the number of words, so value[i] is the bit pattern for words[i].
You iterate over the characters of each word and build the bit pattern for that word (the part your question is about computes the bit position in that pattern for one character of the word). The bits are ORed (|) together so that the bit pattern represents all characters found in that word.
When you have the bit patterns of two words, you can AND (&) them: if the result is 0, the words have no characters in common; otherwise they share at least one character.
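Putting both parts together, a C++ sketch of the whole approach might look like the following. It mirrors the Java code in the question rather than being an official solution, it assumes lowercase input only, and maxProductOfWordLengths is a hypothetical name:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Maximum product of the lengths of two words that share no letter.
// Assumes every word contains only 'a'..'z'.
int maxProductOfWordLengths(const std::vector<std::string>& words) {
    std::vector<int> masks(words.size(), 0);
    for (std::size_t i = 0; i < words.size(); ++i)
        for (char c : words[i])
            masks[i] |= 1 << (c - 'a');  // one bit per distinct letter

    int best = 0;
    for (std::size_t i = 0; i < words.size(); ++i)
        for (std::size_t j = i + 1; j < words.size(); ++j)
            if ((masks[i] & masks[j]) == 0)  // no common letter
                best = std::max(best, static_cast<int>(words[i].size() * words[j].size()));
    return best;
}
```

With {"abcw", "baz", "foo", "bar", "xtfn", "abcdef"}, the best disjoint pair is "abcw" and "xtfn", giving 16. The letter check for a pair is a single AND, which is what makes this cheaper than comparing the two words character by character.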

Related

Trouble understanding a part of an algorithm

The problem is to count special substrings.
A string is said to be a special string if either of two conditions is met:
All of the characters are the same, e.g. aaa.
All characters except the middle one are the same, e.g. aadaa.
This is the code I got, and I understand the two cases.
Case 1: All characters of the palindromic substring are the same.
Case 2: Count all odd-length special palindromic substrings whose middle character is different.
What I cannot understand is why I have to subtract n from the result; I don't see where the extra n is added in the algorithm.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int CountSpecialPalindrome(string str)
{
    int n = str.length();
    int result = 0;
    // sameChar[i] = length of the run of equal characters containing i
    // (set at run starts here, propagated inside runs in the second loop)
    vector<int> sameChar(n, 0);
    int i = 0;
    // traverse string characters from left to right
    while (i < n) {
        // store same character count
        int sameCharCount = 1;
        int j = i + 1;
        // count similar characters (check the bound before indexing)
        while (j < n && str[i] == str[j])
            sameCharCount++, j++;
        // Case 1:
        // the total number of substrings we can generate
        // from a run of K equal characters is K * (K + 1) / 2,
        // where K is sameCharCount
        result += (sameCharCount * (sameCharCount + 1) / 2);
        // store current same char count in sameChar[]
        sameChar[i] = sameCharCount;
        // jump to the start of the next run
        i = j;
    }
    // Case 2: Count all odd-length special palindromic substrings
    for (int j = 1; j < n; j++)
    {
        // if the current character equals the previous one, copy
        // the previous run length to the current position
        if (str[j] == str[j - 1])
            sameChar[j] = sameChar[j - 1];
        // case 2: odd length, pattern x y x with y != x
        if (j > 0 && j < (n - 1) &&
            (str[j - 1] == str[j + 1] &&
             str[j] != str[j - 1]))
            result += min(sameChar[j - 1],
                          sameChar[j + 1]);
    }
    // subtract all single-length substrings
    return result - n;
}

// driver program to test the function above
int main()
{
    string str = "abccba";
    cout << CountSpecialPalindrome(str) << endl;
    return 0;
}
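Where the extra n comes from: in case 1, a run of K equal characters contributes K(K + 1) / 2 substrings, and K of them have length 1. The run lengths add up to n, so case 1 counts each of the n single characters exactly once, and `return result - n;` removes them; as its comment says, the function counts only special substrings of length 2 or more. One way to convince yourself is a brute-force cross-check, sketched below in C++. The names isSpecial and countSpecialBrute are hypothetical, and the check is O(n^3), so it is meant only for small test strings:

```cpp
#include <cstddef>
#include <string>

// True if all characters of s are equal, or all are equal except the middle one.
bool isSpecial(const std::string& s) {
    const std::size_t n = s.size();
    for (std::size_t k = 0; k < n; ++k) {
        if (n % 2 == 1 && k == n / 2) continue;  // odd length: the middle may differ
        if (s[k] != s[0]) return false;
    }
    return true;
}

// Counts special substrings (by position) of length >= 2, i.e. the same
// quantity CountSpecialPalindrome returns after subtracting n.
int countSpecialBrute(const std::string& str) {
    int count = 0;
    for (std::size_t i = 0; i < str.size(); ++i)
        for (std::size_t len = 2; i + len <= str.size(); ++len)
            if (isSpecial(str.substr(i, len))) ++count;
    return count;
}
```

For "abccba" both approaches give 1 (only "cc"). For "aabaa" the run formula gives 3 + 1 + 3 = 7, case 2 adds min(2, 2) = 2, and 9 - 5 = 4, matching the four substrings "aa", "aa", "aba" and "aabaa".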

minimum columns to be deleted in a matrix to make it row-wise lexicographically sorted

I was trying to solve this hiring contest problem (now closed)
Lexicographic Rows
You are given a matrix of characters. In one operation you can remove
a column of the matrix. You can perform as many operations as you
want. Your task is to make the final matrix interesting, i.e. the
string formed by the characters of row i is lexicographically smaller
than or equal to the string formed by the characters of row i + 1. You need
to use the minimum number of operations possible. An empty matrix is
always an interesting matrix.
Input
The first line contains two integers n and m. The next n lines contain
m letters each.
Output
In the output, you need to print the minimum number of operations to
make the matrix interesting.
Constraints
There are only lowercase English alphabets as characters in the input.
Sample Input
3 3
cfg
agk
dlm
Sample Output
1
Explanation
Delete the first column to make the matrix interesting.
I'm pretty convinced this is a DP problem, but I had difficulty finding the optimal subproblem. I managed to pass only a couple of test cases.
I defined dp[i][j] as the minimum number of columns to be removed to have an interesting matrix.
For every character input[i][j] there are two possibilities:
if the previous entry is lexicographically valid, we can take dp[i][j - 1], and the current input isn't going to change anything;
otherwise, we check whether input[i - 1][j] and input[i][j] are in the correct order: if they are, we take dp[i][j - 1]; if not, this column is invalid too, so we add 1 to dp[i][j - 1].
My solution code:
int n, m;
cin >> n >> m;
vector<string> input(n);
for (int i = 0; i < n; ++i) {
    string temp = "";
    for (int j = 0; j < m; ++j) {
        char c;
        cin >> c;
        temp = temp + c;
    }
    input[i] = temp;
}
vector<vector<int> > dp(n, vector<int>(m, 0));
for (int i = 1; i < n; ++i) {
    for (int j = 1; j < m; ++j) {
        //Left is valid
        if (input[i - 1][j - 1] < input[i][j - 1]) {
            dp[i][j] = dp[i][j - 1];
        }
        else {
            //Current is valid
            if (input[i - 1][j] <= input[i][j]) {
                dp[i][j] = dp[i][j - 1];
            }
            else {
                dp[i][j] = dp[i][j - 1] + 1;
            }
        }
    }
}
cout << dp[n - 1][m - 1] << endl;
We can iterate through the columns left to right, choosing the ones whose inclusion wouldn't make the current matrix uninteresting. Properly implemented, this will take time linear in the size of the input.
The key fact supporting this algorithm is that, given two interesting subsets of columns, we can add the first column missing from one to the other without making it uninteresting.
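A minimal C++ sketch of this greedy, assuming all rows have the same length; minColumnsToDelete and the settled array are names introduced here for illustration. A column is kept unless it puts out of order some adjacent pair of rows that the columns kept so far have not already strictly ordered; once a pair is strictly ordered, later columns cannot unsort it:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy over columns, left to right. Returns the number of deleted columns.
int minColumnsToDelete(const std::vector<std::string>& rows) {
    if (rows.size() < 2) return 0;  // zero or one row is always interesting
    const std::size_t n = rows.size(), m = rows[0].size();
    // settled[i] becomes true once the kept columns make rows[i] < rows[i + 1]
    std::vector<bool> settled(n - 1, false);
    int deleted = 0;
    for (std::size_t c = 0; c < m; ++c) {
        bool breaksOrder = false;
        for (std::size_t i = 0; i + 1 < n; ++i)
            if (!settled[i] && rows[i][c] > rows[i + 1][c]) {
                breaksOrder = true;
                break;
            }
        if (breaksOrder) {  // keeping this column would make the matrix uninteresting
            ++deleted;
            continue;
        }
        for (std::size_t i = 0; i + 1 < n; ++i)  // keep it and record new strict orders
            if (rows[i][c] < rows[i + 1][c]) settled[i] = true;
    }
    return deleted;
}
```

On the sample (cfg, agk, dlm) the first column is deleted because c > a, the second column then orders both pairs strictly, and the function returns 1. Each column is compared once against the n - 1 adjacent pairs, so the running time is O(n * m).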

How do I obtain string subsequence indices after counting number of subsequences?

Given the following algorithm, which counts the number of times a string appears as a subsequence of another string and returns that number, how would I implement a routine that gives me the indices of each occurrence? E.g. if the string appears 4 times as a subsequence of the other, how would I find the indices of each occurrence?
[1][4][9] the first string
From my own attempts, I can see a pattern in the DP lookup table visually, but I struggle to implement it in code. How would I add backtracking that gives me the indices of each subsequence occurrence? In the example I know how many times the string appears as a subsequence, but I want the indices of each appearance. As stated, I can determine them visually from the lookup table values but struggle to code it. I know the solution lies in backtracking through the tabular lookup container.
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Number of times b occurs as a subsequence of a.
// lookup[i][j] = number of occurrences of the first j characters of b
// in the first i characters of a.
int count(string a, string b)
{
    int m = a.length();
    int n = b.length();
    vector<vector<int>> lookup(m + 1, vector<int>(n + 1, 0));
    // If first string is empty
    for (int i = 0; i <= n; ++i)
        lookup[0][i] = 0;
    // If second string is empty
    for (int i = 0; i <= m; ++i)
        lookup[i][0] = 1;
    // Fill lookup[][] bottom up
    for (int i = 1; i <= m; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            // we have two options:
            // 1. use the last characters of both strings in the solution
            // 2. ignore the last character of the first string
            if (a[i - 1] == b[j - 1])
                lookup[i][j] = lookup[i - 1][j - 1] +
                               lookup[i - 1][j];
            else
                // If the last characters are different, ignore
                // the last character of the first string
                lookup[i][j] = lookup[i - 1][j];
        }
    }
    return lookup[m][n];
}

int main(void) {
    string a = "ccaccbbbaccccca";
    string b = "abc";
    cout << count(a, b);
    return 0;
}
You can do it recursively (essentially you'll just be doing the same thing in the other direction):
def gen(i, j):
    # Returns every occurrence of b[:j] in a[:i] as a list of indices into a.
    # Assumes lookup, a and b are the table and strings from count() above.
    # If there's no match, we're done
    if lookup[i][j] == 0:
        return []
    # If one of the indices is 0, the answer is a list holding an empty list,
    # which means an empty sequence
    if i == 0 or j == 0:
        return [[]]
    # Otherwise, we just do all transitions backwards
    # and combine the results
    res = []
    if a[i - 1] == b[j - 1]:
        res = gen(i - 1, j - 1)
        for elem in res:
            elem.append(i - 1)  # record the index, not the character
    return res + gen(i - 1, j)
The idea is to do exactly the same thing we use to compute the answer, but to return a list of indices instead of the number of ways.
I haven't tested the code above, so it may contain minor bugs, but I think the idea is clear.
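For reference, the same backward walk written in C++ might look like the sketch below. The names occurrences and collect are hypothetical, the function rebuilds the lookup table itself, and it stores one index list per occurrence, so the output grows with the number of occurrences:

```cpp
#include <string>
#include <vector>

using Table = std::vector<std::vector<long long>>;

// Backward walk over the lookup table, like gen(i, j): path holds the chosen
// indices of a (largest first); finished paths are stored in increasing order.
static void collect(const std::string& a, const std::string& b, const Table& lookup,
                    int i, int j, std::vector<int>& path,
                    std::vector<std::vector<int>>& out) {
    if (lookup[i][j] == 0) return;  // no occurrence down this branch
    if (j == 0) {                   // all of b has been matched
        out.emplace_back(path.rbegin(), path.rend());
        return;
    }
    if (a[i - 1] == b[j - 1]) {     // use a[i - 1] for b[j - 1]
        path.push_back(i - 1);
        collect(a, b, lookup, i - 1, j - 1, path, out);
        path.pop_back();
    }
    collect(a, b, lookup, i - 1, j, path, out);  // skip a[i - 1]
}

// Every occurrence of b as a subsequence of a, as increasing indices into a.
std::vector<std::vector<int>> occurrences(const std::string& a, const std::string& b) {
    const int m = a.size(), n = b.size();
    Table lookup(m + 1, std::vector<long long>(n + 1, 0));
    for (int i = 0; i <= m; ++i) lookup[i][0] = 1;  // empty b occurs once
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j)
            lookup[i][j] = lookup[i - 1][j] + (a[i - 1] == b[j - 1] ? lookup[i - 1][j - 1] : 0);
    std::vector<std::vector<int>> out;
    std::vector<int> path;
    collect(a, b, lookup, m, n, path, out);
    return out;
}
```

For a = "ccaccbbbaccccca" and b = "abc" it returns 15 index lists, the first being {2, 7, 13}. Because only branches with lookup[i][j] > 0 are followed, every branch the walk enters produces at least one occurrence.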

One of the solutions for finding the longest palindromic substring that I could not understand

This article on LeetCode describes a common mistake when solving the longest palindromic substring problem:
Reverse S and become S’. Find the longest common substring between S and S’, which must also be the longest palindromic substring.
For instance:
S = “abacdfgdcaba”, S’ = “abacdgfdcaba”.
The longest common substring between S and S’ is “abacd”. Clearly, this is not a valid palindrome.
But I could not understand the following rectification well. Could anyone explain it with a step-by-step procedure or example? Thanks!
To rectify this, each time we find a longest common substring candidate, we check if the substring’s indices are the same as the reversed substring’s original indices.
I was stuck there too, so I searched and ended up here. Now I understand. Let me take the original string the author used as an example.
S = "caba", S' = "abac", so the longest common substring is "aba".
The sentence is "we check if the substring’s indices are the same as the reversed substring’s original indices."
1. What are the substring's indices?
"aba" is at [1, 2, 3] in S.
2. What are the reversed substring's original indices?
The reversed substring is "aba", and its original indices are also [1, 2, 3].
So that answer is correct.
Now let's look at another example.
S = "abacdfgdcaba", S' = "abacdgfdcaba", so the longest common substring is "abacd".
Same process:
1. What are the substring's indices?
"abacd" is at [0, 1, 2, 3, 4] in S.
2. What are the reversed substring's original indices?
The reversed substring is "abacd", but its original indices are [7, 8, 9, 10, 11].
So these two "abacd" substrings are not the same one, and the answer is not correct.
I think that sentence is a bit tricky, and the author made it a little hard to understand.
I think the sentence should be changed to "To rectify this, each time we find a longest common substring candidate, we check whether the substring is the same one as the reversed substring (i.e. occupies the same positions in S)."
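The check can be written as a one-line condition. In this C++ sketch (sameOriginalIndices is a made-up name), a common substring of length len starts at index i in S and at index j in S' = reverse(S), and n is the length of S:

```cpp
#include <cstddef>

// S'[j + k] is S[n - 1 - j - k], so the occurrence in S' covers the original
// indices n - j - len .. n - 1 - j. It is the same stretch of S exactly when
// those indices equal i .. i + len - 1, i.e. when i == n - j - len.
bool sameOriginalIndices(std::size_t n, std::size_t i, std::size_t j, std::size_t len) {
    return i + j + len == n;  // same test, written without unsigned underflow
}
```

For S = "caba" (n = 4), "aba" starts at i = 1 in S and j = 0 in S', and 1 + 0 + 3 = 4, so it passes; for the "abacd" example, 0 + 0 + 5 is not 12, so it is rejected.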
In case someone is looking for an implementation of this approach, where we use the longest common substring to find the longest palindromic substring, this is my version of it.
package leetcodeproblems;
/**
 * Created by Nandan Mankad on 23-11-19.
 */
public class LongestPalindromicSubstring {
    public static void main(String[] args) {
        /*String s = "aacdefcaa";*/
        /*String s = "dabcbae";*/
        /*String s = "babad";*/
        /*String s = "cbbd";*/
        /*String s = "cbbc";*/
String s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
        System.out.println(longestPalindrome(s));
    }

    public static String longestPalindrome(String s) {
        if (s == null || s.length() == 0) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s);
        String revString = sb.reverse().toString(); // Reversing the string - to compute LCS.
        int dp[][] = new int[s.length() + 1][s.length() + 1];
        int maxLength = 0;
        int iPos = 0;
        for (int i = 1; i < dp.length; i++) {
            for (int j = 1; j < dp.length; j++) {
                if (s.charAt(i - 1) == revString.charAt(j - 1)) { // Character matches in Original String and Reversed String.
                    dp[i][j] = dp[i - 1][j - 1] + 1; // Standard Longest Common Substring logic for Original and Reversed String.
                    int revIdx = i;
                    int forIdx = j - dp[i][j] + 1;
                    if (s.length() - revIdx + 1 == forIdx) { // Here we check if the reverse string original idx is same as original string idx.
                        if (maxLength < dp[i][j]) {
                            maxLength = dp[i][j];
                            iPos = i;
                        }
                    }
                } else {
                    dp[i][j] = 0;
                }
            }
        }
        StringBuilder res = new StringBuilder();
        while (maxLength > 0) {
            res.append(s.charAt(iPos - 1));
            maxLength--;
            iPos--;
        }
        return res.toString();
    }
}
It passes all the test cases on LeetCode for the problem. Improvements might be possible, e.g. iterating over the string in reverse instead of actually reversing it.
Time Complexity: O(N^2)
Space Complexity: O(N^2)
There are efficient algorithms for computing the longest palindromic substring by reversing the original string. For example, the following:
1) create a generalized string S1#S2$, where S2 is the reverse of S1: O(n)
2) construct a suffix array: O(n) -> not trivial, but there are easy O(nlogn) and O(nlog^2n) algorithms
3) construct an LCP array: O(n) -> trivial (there is also a trivial O(nlogn) one)
4) construct an RMQ data structure: O(n) construction and O(1) querying -> not trivial (there are trivial O(nlogn) construction and O(logn) query ones)
5) iterate over every position i in the original string S1, find the complementary position in S2 within the generalized string, and find the longest common prefix of the two suffixes: O(1) per position
In general, the mentioned approach must be modified for even and odd length palindromes. The distinction is that in the odd length palindrome you just introduce a gap when selecting the indices.
This yields an O(n) solution to the problem.
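To make the last step concrete (iterating over every position i and taking the longest common prefix with its complementary position), here is a C++ sketch that keeps the same index mapping but replaces the suffix array, LCP array and RMQ with a naive character-by-character LCP, so it runs in O(n^2) rather than O(n). The names lcpNaive and longestPalindromeViaReverse are hypothetical. For an odd-length palindrome centred at i, s[i + k] must equal s[i - k], which is r[n - 1 - i + k] in the reversed string r; for an even-length one centred between i - 1 and i, s[i + k] must equal s[i - 1 - k], i.e. r[n - i + k]:

```cpp
#include <cstddef>
#include <string>

// Naive longest common prefix of a[i..] and b[j..]. A suffix array with an
// LCP array and RMQ would answer this in O(1) after linear preprocessing.
std::size_t lcpNaive(const std::string& a, std::size_t i,
                     const std::string& b, std::size_t j) {
    std::size_t k = 0;
    while (i + k < a.size() && j + k < b.size() && a[i + k] == b[j + k]) ++k;
    return k;
}

std::string longestPalindromeViaReverse(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return s;
    const std::string r(s.rbegin(), s.rend());  // r[x] == s[n - 1 - x]
    std::size_t bestStart = 0, bestLen = 1;
    for (std::size_t i = 0; i < n; ++i) {
        // Odd length, centre at i: k >= 1 because s[i] == r[n - 1 - i].
        std::size_t k = lcpNaive(s, i, r, n - 1 - i);
        if (2 * k - 1 > bestLen) { bestLen = 2 * k - 1; bestStart = i - (k - 1); }
        // Even length, centre between i - 1 and i (the shifted complement).
        if (i > 0) {
            k = lcpNaive(s, i, r, n - i);
            if (2 * k > bestLen) { bestLen = 2 * k; bestStart = i - k; }
        }
    }
    return s.substr(bestStart, bestLen);
}
```

The one-position shift between the odd and even comparisons is the "gap" mentioned above. For "abacdfgdcaba" this returns "aba", not the invalid "abacd".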
Regarding the article:
The author mentions that finding the longest common substring is not enough since the two substrings with such lcp may not be neighbours in the original string.
Therefore, we want to find two strings A and B, one belonging to S1 and one to S2, such that lcp(A,B) is largest, but also A . rev(B) is in the original S1.
I hope I have been clear enough.
I am just starting out with LeetCode and came across the aforementioned article.
I tried to implement the exact same logic. It worked, but it is not much more efficient than the DP solution described in "Longest palindromic substring | Dynamic programming".
You can simply check whether the ranges are equal to implement the logic in the algorithm.
Here is my solution implementing the article. It takes a runtime of 364 ms, which is faster than only 21.99% of C++ online submissions.
class Solution {
public:
    string longestPalindrome(string s) {
        int max = 0, temp, mr = 0;
        string revs = s;
        reverse(revs.begin(), revs.end());
        string soln;
        int l = s.size();
        if (l == 0 || l == 1) return s;
        int dp[l+1][l+1];
        for (int row = 0; row < l; row++)
            for (int col = 0; col < l; col++) {
                if (s.at(row) != revs.at(col))
                    dp[row+1][col+1] = 0;
                else {
                    if (row == 0 or col == 0) temp = 1;
                    else temp = dp[row][col] + 1;
                    dp[row+1][col+1] = temp;
                    if (temp > max && row-temp+1 == l-col-1 && l-row-1 == col-temp+1) {
                        mr = row; max = temp;
                    }
                }
            }
        /* for(int row = 0; row < l+1; row ++){
            for(int col = 0; col < l+1; col ++){
                if(row == 0 && col > 0)
                    cout << revs.at(col-1) << ", ";
                else if(col == 0 && row > 0)
                    cout << s.at(row-1) << ", ";
                else
                    cout << dp[row][col] << ", ";
            }
            cout << endl;
        }
        cout << "\n___________\nmax:" << max <<"\nmr: " << mr << "\n"; */
        //return (max>1)?s.substr(mr-max+1,max):s.substr(0,1);
        return s.substr(mr-max+1, max);
    }
};

Split a string to a string of valid words using Dynamic Programming

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:
You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.
Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
In the event that the string is valid, make your algorithm output the corresponding sequence of words.
Let the length of your compacted document be N.
Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.
b(N) is true (since the empty string can be split into 0 words).
Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.
Eventually, you compute b(0) which tells you if the entire document can be split into words.
In Python, assuming dictionary is a set of words:
def try_to_split(doc, dictionary):
    N = len(doc)
    # b[n] is True if doc[n:] can be split into dictionary words
    b = [False] * (N + 1)
    b[N] = True  # the empty suffix splits into 0 words
    for i in range(N - 1, -1, -1):
        # try every word starting at position i
        for end in range(i + 1, N + 1):
            if doc[i:end] in dictionary and b[end]:
                b[i] = True
                break
    return b
There are some tricks to enumerate the words starting at position i efficiently, but you're asked for an O(N^2) algorithm, so you can just look up every string starting at i in the dictionary.
To generate the words, you can either modify the above algorithm to store the good words, or just generate them like this:
def generate_words(doc, b, dictionary, idx=0):
    assert b[idx]
    words = []
    length = 1
    while idx < len(doc):
        word = doc[idx: idx + length]
        if word in dictionary and b[idx + length]:
            # this word leaves a splittable remainder, so take it
            words.append(word)
            idx += length
            length = 1
        else:
            length += 1
    return words
Here b is the boolean array generated from the first part of the algorithm.
To formalize what @MinhPham suggested:
This is a dynamic programming solution.
Given a string str, let
b[i] = true if the substring str[0...i] (inclusive) can be split into valid words.
Prepend some starting character to str, say !, to represent the empty word.
str = "!" + str
The base case is the empty string, so
b[0] = true.
For the iterative case:
b[j] = true if b[i] == true and str[i+1..j] is a word, for some i < j
The O(N^2) DP is clear, but if you know the words of the dictionary in advance, I think you can use some precomputation to get it even faster, in O(N):
Aho-Corasick
A DP solution in C++:
#include <iostream>
#include <set>
#include <string>
#include <vector>
using namespace std;

int main()
{
    set<string> dict;
    dict.insert("12");
    dict.insert("123");
    dict.insert("234");
    dict.insert("12345");
    dict.insert("456");
    dict.insert("1234");
    dict.insert("567");
    dict.insert("123342");
    dict.insert("42");
    dict.insert("245436564");
    dict.insert("12334");
    string str = "123456712334245436564";
    int size = str.size();
    // dp[k] = number of words in the split found for the first k characters, -1 if none
    vector<int> dp(size+1, -1);
    dp[0] = 0;
    // res[k] = that split, with the words separated by spaces
    vector<string> res(size+1);
    for (int i = 0; i < size; ++i)
    {
        if (dp[i] != -1)
        {
            for (int j = i+1; j <= size; ++j)
            {
                const int len = j-i;
                string substr = str.substr(i, len);
                if (dict.find(substr) != dict.end())
                {
                    string space = i ? " " : "";
                    res[i+len] = res[i] + space + substr;
                    dp[i+len] = dp[i]+1;
                }
            }
        }
    }
    cout << *dp.rbegin() << endl;   // number of words for the whole string (-1 if it cannot be split)
    cout << *res.rbegin() << endl;  // the split itself
    return 0;
}
The string s[] can potentially be split in more than one way. The method below finds the maximum number of words into which s[] can be split. Below is a sketch/pseudocode of the algorithm:
bestScore[i] -> stores the maximum number of words into which the first i characters can be split (MINUS_INFINITY if they cannot be split)
bestScore[0] = 0
for (i = 1 to n){
    bestScore[i] = MINUS_INFINITY
    for (k = 1 to i){
        bestScore[i] = Max(bestScore[i], bestScore[i-k] + f(i,k))
    }
}
Where f(i,k) is defined as:
f(i,k) = 1 : if s[i-k+1 to i] is in dictionary
       = MINUS_INFINITY : otherwise
bestScore[n] stores the maximum number of words into which s[] can be split (if the value is MINUS_INFINITY, s[] cannot be split).
Clearly the running time is O(n^2)
As this looks like a textbook exercise, I will not write the code to reconstruct the actual split positions.
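For completeness, here is a C++ sketch of the bestScore recurrence that also records the chosen last word so that the split can be reconstructed. MINUS_INFINITY is modelled as INT_MIN, the dictionary is assumed to be a std::unordered_set<std::string>, and splitWithMostWords and lastLen are names introduced for illustration. The inner loop runs k up to i so that the whole prefix can be a single word:

```cpp
#include <climits>
#include <string>
#include <unordered_set>
#include <vector>

// Returns a split of s with the maximum number of dictionary words,
// or an empty vector if s cannot be split (or is empty).
std::vector<std::string> splitWithMostWords(const std::string& s,
                                            const std::unordered_set<std::string>& dict) {
    const int n = s.size();
    const int NEG = INT_MIN;  // stands in for MINUS_INFINITY
    std::vector<int> bestScore(n + 1, NEG), lastLen(n + 1, 0);
    bestScore[0] = 0;  // the empty prefix splits into 0 words
    for (int i = 1; i <= n; ++i)
        for (int k = 1; k <= i; ++k) {  // last word is s[i-k .. i-1] (0-based)
            if (bestScore[i - k] == NEG) continue;            // prefix not splittable
            if (!dict.count(s.substr(i - k, k))) continue;    // f(i,k) = MINUS_INFINITY
            if (bestScore[i - k] + 1 > bestScore[i]) {
                bestScore[i] = bestScore[i - k] + 1;
                lastLen[i] = k;  // remember the choice for reconstruction
            }
        }
    std::vector<std::string> words;
    if (bestScore[n] == NEG) return words;
    for (int i = n; i > 0; i -= lastLen[i])  // follow the stored choices backwards
        words.insert(words.begin(), s.substr(i - lastLen[i], lastLen[i]));
    return words;
}
```

Because lastLen[i] remembers which k produced bestScore[i], following it back from n yields one split with the maximum number of words, e.g. "itwasthebestoftimes" becomes it, was, the, best, of, times.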
Below is an O(n^2) solution for this problem.
#include <iostream>
#include <set>
#include <string>
#include <vector>
using namespace std;

void findstringvalid() {
    string s = "itwasthebestoftimes";
    set<string> dict;
    dict.insert("it");
    dict.insert("was");
    dict.insert("the");
    dict.insert("best");
    dict.insert("of");
    dict.insert("times");
    vector<bool> b(s.size() + 1, false);
    // spacepos[i - 1] = start index of the word that ends just before position i
    vector<int> spacepos(s.size(), -1);
    //Initialization phase
    b[0] = true; //String of size 0 is always a valid string
    for (size_t i = 1; i <= s.size(); i++) {
        for (size_t j = 0; j < i; j++) {
            //candidate word s[j .. i-1]
            if (!b[i]) {
                if (b[j]) {
                    //check if string "j to i" is in dictionary
                    string temp = s.substr(j, i - j);
                    set<string>::iterator it = dict.find(temp);
                    if (it != dict.end()) {
                        b[i] = true;
                        spacepos[i - 1] = j;
                    }
                }
            }
        }
    }
    if (b[s.size()]) {
        // Walk back from the end so that only words on a complete split are printed
        vector<string> words;
        for (int end = (int)s.size(); end > 0; end = spacepos[end - 1]) {
            int start = spacepos[end - 1];
            words.push_back(s.substr(start, end - start));
        }
        for (int k = (int)words.size() - 1; k >= 0; k--)
            cout << words[k] << " ";
    }
}
