Pair up strings to form palindromes - algorithm

Given N strings, each of length at most 1000. We can concatenate a pair of strings end to end. For example, if one is "abc" and the other is "cba", we can get "abccba" as well as "cbaabc". Some strings may be left without being concatenated to any other string. Also, no string can be concatenated with itself.
We can only concatenate two strings if the result is a palindrome. I need to find the minimum number of strings left after making such pairs (a concatenated pair counts as one string).
Example: suppose we have 9 strings:
aabbaabb
bbaabbaa
aa
bb
a
bbaa
bba
bab
ab
Then the answer here is 5.
Explanation: here are the 5 strings:
"aabbaabb" + "bbaabbaa" = "aabbaabbbbaabbaa"
"aa" + "a" = "aaa"
"bba" + "bb" = "bbabb"
"bab" + "ab" = "babab"
"bbaa"
There can be up to 1000 such strings in total.

1) Make a graph where we have one node for each word.
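2) Go through all pairs of words and check whether they form a palindrome when concatenated (in either order). If they do, connect the corresponding nodes in the graph with an edge.
3) Now use a matching algorithm to find the maximum number of edges you can match, i.e. the largest set of edges no two of which share a node: http://en.wikipedia.org/wiki/Blossom_algorithm. The answer is N minus the number of matched edges.
Time complexity: O(N) for step 1, O(N^2 * L) for step 2, where L ≤ 1000 is the maximum string length, and O(V^4) for step 3 with V = N nodes, for a total of O(N^2 * L + N^4).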
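The three steps above can be sketched in Python as follows. This is a minimal sketch, not the answerer's code: the function names are hypothetical, and step 3 uses a common array-based O(V^3) formulation of Edmonds' blossom algorithm (odd cycles are contracted through a `base` array) rather than the O(V^4) version quoted in the complexity estimate.

```python
from collections import deque


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


def max_matching(n: int, adj: list[list[int]]) -> list[int]:
    """Edmonds' blossom algorithm; match[v] is v's partner, or -1 if unmatched."""
    match = [-1] * n

    def find_augmenting_path(root: int) -> tuple[int, list[int]]:
        used = [False] * n     # v is an "even" (outer) vertex of the search tree
        parent = [-1] * n      # predecessor of v along the alternating path
        base = list(range(n))  # base vertex of the contracted blossom containing v
        used[root] = True
        queue = deque([root])

        def lca(a: int, b: int) -> int:
            # Lowest common ancestor (by blossom base) of a and b in the tree.
            seen = [False] * n
            while True:
                a = base[a]
                seen[a] = True
                if match[a] == -1:  # reached the root
                    break
                a = parent[match[a]]
            while True:
                b = base[b]
                if seen[b]:
                    return b
                b = parent[match[b]]

        def mark_path(v: int, b: int, child: int, blossom: list[bool]) -> None:
            # Walk from v up to base b, flagging blossom members, re-linking parents.
            while base[v] != b:
                blossom[base[v]] = blossom[base[match[v]]] = True
                parent[v] = child
                child = match[v]
                v = parent[match[v]]

        while queue:
            v = queue.popleft()
            for to in adj[v]:
                if base[v] == base[to] or match[v] == to:
                    continue
                if to == root or (match[to] != -1 and parent[match[to]] != -1):
                    # Edge between two even vertices closes an odd cycle: contract it.
                    cur_base = lca(v, to)
                    blossom = [False] * n
                    mark_path(v, cur_base, to, blossom)
                    mark_path(to, cur_base, v, blossom)
                    for i in range(n):
                        if blossom[base[i]]:
                            base[i] = cur_base
                            if not used[i]:
                                used[i] = True
                                queue.append(i)
                elif parent[to] == -1:
                    parent[to] = v
                    if match[to] == -1:  # free vertex: augmenting path found
                        return to, parent
                    used[match[to]] = True
                    queue.append(match[to])
        return -1, parent

    for root in range(n):
        if match[root] == -1:
            v, par = find_augmenting_path(root)
            while v != -1:  # flip matched/unmatched edges along the path
                pv = par[v]
                ppv = match[pv]
                match[v], match[pv] = pv, v
                v = ppv
    return match


def min_strings_left(words: list[str]) -> int:
    n = len(words)
    adj: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # i < j, so no string is paired with itself
            if is_palindrome(words[i] + words[j]) or is_palindrome(words[j] + words[i]):
                adj[i].append(j)
                adj[j].append(i)
    pairs = sum(1 for m in max_matching(n, adj) if m != -1) // 2
    return n - pairs  # each matched pair becomes a single string


words = ["aabbaabb", "bbaabbaa", "aa", "bb", "a", "bbaa", "bba", "bab", "ab"]
print(min_strings_left(words))  # 5
```

For the nine strings in the question, a maximum matching has 4 edges (nine nodes cannot all be paired), so 9 - 4 = 5 strings remain.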

Related

Number of substrings of a given string containing a specific character

What is the most efficient algorithm to count the number of substrings of a given string that contain a given character?
For example, for the string abb and the character b:
substrings: a, b, b, ab, bb, abb.
Answer: substrings containing b at least once = 5.
PS. I solved this by generating all the substrings and then checking them in O(n^2). I just want to know whether there is a better solution.
Suppose you need to count the substrings containing the character X.
Scan the string left to right, keeping the position of the last X in lastX, with starting value -1.
When you meet X at position i, add i+1 to the result and update lastX
(this is the number of substrings ending at the current position, and they all contain X).
When you meet another character, add lastX + 1 to the result
(this is again the number of substrings ending at the current position that contain X),
because the rightmost possible start of such a substring is the position of the last X.
The algorithm is linear.
Example:
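a X a a X a

| idx | char | good substrings ending at idx | lastX | count | overall count |
|-----|------|-------------------------------|-------|-------|---------------|
| 0 | a | - | -1 | 0 | 0 |
| 1 | X | aX, X | 1 | 2 | 2 |
| 2 | a | aXa, Xa | 1 | 2 | 4 |
| 3 | a | aXaa, Xaa | 1 | 2 | 6 |
| 4 | X | aXaaX, XaaX, aaX, aX, X | 4 | 5 | 11 |
| 5 | a | aXaaXa, XaaXa, aaXa, aXa, Xa | 4 | 5 | 16 |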
Python code:
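```python
def subcnt(s, c):
    last = -1  # index of the most recent occurrence of c (-1: none yet)
    cnt = 0
    for i, ch in enumerate(s):
        if ch == c:
            last = i
        # substrings ending at i that contain c start at 0..last: last + 1 of them
        cnt += last + 1
    return cnt


print(subcnt('abcdba', 'b'))  # 16
```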
You could turn this around and scan your string for occurrences of your letter. Every time you find an occurrence at some position i, you know that it is contained in every substring that starts at or before i and ends at or after i, so you only need to store pairs of indices to define substrings instead of storing the substrings explicitly.
That being said, you'll still need O(n²) with this approach: although you don't mind repeated substrings, as your example shows, you don't want to count the same substring twice, so you still have to make sure that you don't select the same pair of indices twice.
Let's consider the string abcdaefgabb and the given character a.
Loop over the string character by character.
If a character matches the given character, say a at index 4, then the substrings containing it range from abcda to aefgabb. So we add (4 - 0 + 1) + (10 - 4) = 11. These represent the substrings abcda, bcda, cda, da, a, ae, aef, aefg, aefga, aefgab and aefgabb.
This applies wherever you find a, e.g. at index 0 and also at index 8.
The final answer is the sum of the above calculations.
Update: you will have to maintain 2 pointers between the last occurrence of a and the current a to avoid counting duplicate substrings that start and end at the same indices.
Think of a substring as choosing two of the gaps between the letters of your string (including the gaps at the two ends of the string) and taking everything between them.
For a string of length n, there are choose(n+1, 2) substrings.
Of those, for each maximal run of k characters that doesn't include the target, there are choose(k+1, 2) substrings made only of letters from that run. All other substrings of the main string must include the target.
Answer: choose(n+1, 2) - sum(choose(k_i+1, 2)), where the k_i are the lengths of the maximal runs of letters that don't include the target.
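A short Python sketch of this complement formula, assuming the target is a single character; `count_containing` is a hypothetical name. The maximal runs that avoid the target are exactly the pieces produced by `str.split(target)` (empty pieces contribute zero), so the whole computation stays linear.

```python
def count_containing(s: str, target: str) -> int:
    def choose2(m: int) -> int:
        return m * (m - 1) // 2

    total = choose2(len(s) + 1)  # all substrings of s
    # substrings lying entirely inside a run that avoids the target
    without = sum(choose2(len(run) + 1) for run in s.split(target))
    return total - without


print(count_containing("abb", "b"))     # 5
print(count_containing("abcdba", "b"))  # 16
```

Both results agree with the question's example and with the linear scan from the earlier answer.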

Algorithm for all the possible positions of dots between letters

Say I have a word like "welcome", and I want to insert full stops between the letters of the word.
So it would be "welcome", "w.elcome", "we.lcome", ..., until it reaches "w.e.l.c.o.m.e".
I need an algorithm that will give me all the possible combinations of letters and full stops for any given word.
Consider a set S containing all the possible positions of a dot. The size of S is lengthOfTheString - 1.
Now you just need to find all possible subsets of S and, while processing each of them, put a dot at each position present in the subset.
See: how to generate a power set.
This way you can generate all the possible combinations, 2^(lengthOfTheString - 1) of them.
Example: string "abc"
```
a b c
 ^ ^
 1 2
```
Subsets:
{} abc
{1} a.bc
{2} ab.c
{1,2} a.b.c
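A small Python sketch of the subset idea, assuming the subsets of S are enumerated as bitmasks (one of the usual ways to generate a power set); `dotted_variants` is a hypothetical name. Gap i sits between `word[i]` and `word[i + 1]`, so positions 1 and 2 in the example correspond to bits 0 and 1.

```python
def dotted_variants(word: str) -> list[str]:
    """All ways to insert '.' between letters, one result per subset of gap positions."""
    if not word:
        return [""]
    gaps = len(word) - 1  # size of the set S of possible dot positions
    variants = []
    for mask in range(1 << gaps):  # bit i set => a dot goes after word[i]
        parts = [word[0]]
        for i in range(gaps):
            if (mask >> i) & 1:
                parts.append(".")
            parts.append(word[i + 1])
        variants.append("".join(parts))
    return variants


print(dotted_variants("abc"))  # ['abc', 'a.bc', 'ab.c', 'a.b.c']
```

For "welcome" there are 6 gaps, so this produces 2^6 = 64 strings, from "welcome" to "w.e.l.c.o.m.e".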

Number of possible palindrome anagrams for a given word

I have to find the number of palindromic anagrams possible for a given word.
Suppose the word is aaabbbb. My approach is:
Prepare a hash map that contains the number of times each letter appears.
For my example it will be
a--->3
b--->4
If the length of the string is even, then every letter must occur an even number of times to form a palindrome; otherwise the number of palindromic anagrams is 0.
If the length of the string is odd, then at most one letter may occur an odd number of times and all the others must occur an even number of times.
These two steps determine whether a given word can form a palindrome at all.
Now, to find the number of palindromic anagrams, what approach should I follow?
First, notice that if the word has odd length, then there must be exactly one character with an odd number of occurrences. If the word has even length, then there must be no characters with an odd number of occurrences. In either case, you're looking for how many ways you can arrange the pairs of characters in one half of the palindrome, since the other half is its mirror image and the odd character (if any) goes in the middle. You're looking for the number of permutations, since order matters:
n = number of character pairs (aaaabbb would have 3 pairs, aabbcccc would have 4 pairs)
n! / (number_of_a_pairs! * number_of_b_pairs! * ...)
So in the aaaabbb case, you're finding the permutations of aab:
3!/(2! * 1!) = 3
baa = baabaab
aba = abababa
aab = aabbbaa
And in the aabbcccc case, you're finding the permutations of abcc:
4!/(1! * 1! * 2!) = 12:
abcc
acbc
accb
bacc
bcac
bcca
cabc
cacb
cbac
cbca
ccab
ccba
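Putting the question's parity check together with the formula from this answer gives a short Python sketch; `count_palindromic_anagrams` is a hypothetical name, and it counts distinct palindromes (repeated letters are not distinguished).

```python
from collections import Counter
from math import factorial


def count_palindromic_anagrams(word: str) -> int:
    counts = Counter(word)
    odd = sum(1 for c in counts.values() if c % 2)
    # even length: no odd counts allowed; odd length: exactly one (parity forces >= 1)
    if odd > len(word) % 2:
        return 0
    pairs = [c // 2 for c in counts.values()]  # letters available for one half
    result = factorial(sum(pairs))             # n! for n pairs
    for p in pairs:
        result //= factorial(p)                # divide out orderings of identical pairs
    return result


print(count_palindromic_anagrams("aaaabbb"))   # 3
print(count_palindromic_anagrams("aabbcccc"))  # 12
```

Dividing one factorial at a time stays exact, because every partial quotient is itself a multinomial coefficient times a factorial.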

Find all substrings that don't contain the entire set of characters

This was asked to me in an interview.
I'm given a string whose characters come from the set {a,b,c} only. Find all substrings that don't contain all the characters from the set, for example, substrings that contain only a's, only b's, only c's, or only a's and b's, only b's and c's, or only c's and a's. I gave him the naive O(n^2) solution of generating all substrings and testing them.
The interviewer wanted an O(n) solution.
Edit: My attempt was to keep the last indexes of a, b and c and run a pointer from left to right; any time all 3 had been seen, I changed the start of the substring to exclude the earliest one and started counting again. It doesn't seem exhaustive.
So, for example, if the string is abbcabccaa,
let i be the pointer that traverses the string, and let start be the start of the substring.
1) i = 0, start = 0
2) i = 1, start = 0, last_index(a) = 0 --> 1 substring - a
3) i = 2, start = 0, last_index(a) = 0, last_index(b) = 1 -- > 1 substring ab
4) i = 3, start = 0, last_index(a) = 0, last_index(b) = 2 --> 1 substring abb
5) i = 4, start = 1, last_index(b) = 2, last_index(c) = 3 --> 1 substring bbc (removed a from the substring)
6) i = 5, start = 3, last_index(c) = 3, last_index(a) = 4 --> 1 substring ca (removed b from the substring)
but this isn't exhaustive.
Given that the problem in its original definition can't be solved in less than O(N^2) time, as some comments point out, I suggest a linear algorithm for counting the number of substrings (not necessarily unique in their values, but unique in their positions within the original string).
The algorithm
count = 0
For every char C in {'a','b','c'}, scan the input S and break it into maximal sequences not including C. For each such section A, add |A|*(|A|+1)/2 to count. This is the number of legal substrings inside A.
Now we have the total number of legal substrings using only {'a','b'}, only {'a','c'} and only {'b','c'}. The problem is that we counted substrings consisting of a single repeated character twice. To fix this, we iterate over S again, this time subtracting |A|*(|A|+1)/2 for every maximal sequence A of a single character that we encounter.
Return count
Example
S='aacb'
Breaking it on 'a' gives us only 'cb', so count = 3. For C='b' we have 'aac', which makes count = 3 + 6 = 9. With C='c' we get 'aa' and 'b', so count = 9 + 3 + 1 = 13. Now we do the subtraction: 'aa': -3, 'c': -1, 'b': -1. So count = 8.
The 8 substrings are:
'a'
'a' (the second char this time)
'aa'
'ac'
'aac'
'cb'
'c'
'b'
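A minimal Python sketch of this counting scheme, assuming the input uses only the letters a, b and c; `count_incomplete_substrings` is a hypothetical name. `str.split` yields the maximal sections without C, and `itertools.groupby` yields the maximal single-letter runs that are subtracted.

```python
from itertools import groupby


def count_incomplete_substrings(s: str) -> int:
    """Count substrings (by position) of s over {a, b, c} that miss at least one letter."""
    def tri(k: int) -> int:
        return k * (k + 1) // 2

    count = 0
    for c in "abc":
        # maximal sections of s that do not contain c (empty sections add 0)
        count += sum(tri(len(section)) for section in s.split(c))
    # single-letter substrings were counted twice (they miss two letters): remove one copy
    count -= sum(tri(len(list(run))) for _, run in groupby(s))
    return count


print(count_incomplete_substrings("aacb"))  # 8
```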
To get something better than O(n^2), we may need additional assumptions (for example, asking only for the longest substrings with this property).
Consider a string of the form aaaaaaaaaabbbbbbbbbb of length n. All of its n(n+1)/2 substrings are legal, so if we want to list them all we need O(n^2) time.
I came up with a linear solution for the longest substrings.
Take a set S of all substrings separated by a, all substrings separated by b, and finally all substrings separated by c. Each of those steps can be done in O(n), so we have O(3n), thus O(n).
Example:
Take aaabcaaccbaa.
In this case set S contains:
substrings separated by a: bc, ccb
substrings separated by b: aaa, caacc, aa
substrings separated by c: aaab, aa, baa.
By "set" I mean a data structure supporting insertion and lookup of an element by key in O(1), such as a hash set.
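A minimal Python sketch of this construction, using Python's built-in `set` (a hash set with expected O(1) insertion and lookup); `pieces_without_one_letter` is a hypothetical name. Hashing a piece costs time proportional to its length, but the pieces from one split have total length at most n, so the overall work stays linear.

```python
def pieces_without_one_letter(s: str, alphabet: str = "abc") -> set[str]:
    pieces: set[str] = set()
    for c in alphabet:
        # one O(n) pass per letter; empty pieces between adjacent separators are dropped
        pieces.update(p for p in s.split(c) if p)
    return pieces


print(sorted(pieces_without_one_letter("aaabcaaccbaa")))
# ['aa', 'aaa', 'aaab', 'baa', 'bc', 'caacc', 'ccb']
```

Because S is a set, a piece such as "aa" that appears in more than one split is stored once.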

Algorithm to find

The logic behind this was that (n-2)3^(n-3) has lots of repetitions, like (abc)***(abc) when abc is at the start and at the end, and these repeated strings total 3^4. Similarly, as abc moves ahead, the number of sets of (abc) increases.
You can use dynamic programming to compute the number of forbidden strings.
The algorithm follows from the observation below:
"A legal string of size n is a legal string of size n - 1 extended with one letter, such that the last three letters of the resulting string are not all distinct."
So if we had all the legal strings of size n-1 we could try extending them to obtain the legal strings of size n.
To check whether the extended string is legal we just need to know the last two letters of the previous string (of size n-1).
In the algorithm we will compute two arrays, where
different[i] # number of legal strings of length i in which last two letters are different
same[i] # number of legal strings of length i in which last two letters are the same
It can be easily proved that:
different[i+1] = different[i] + 2*same[i]
same[i+1] = different[i] + same[i]
It is the consequence of the following facts:
Any 'same' string of size i+1 can be obtained either from a 'same' string of size i (think BB -> BBB) or from a 'different' string (think AB -> ABB), and these are the only options.
Any 'different' string of size i+1 can be obtained either from a 'different' string of size i (think AB -> ABA) or from a 'same' string in two ways (AA -> AAB or AA -> AAC).
Having observed all this, it is easy to write an algorithm that computes the result in O(n) time: start from same[2] = 3 and different[2] = 6, and the number of forbidden strings of length n is then 3^n - (same[n] + different[n]).
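A Python sketch of this O(n) recurrence, assuming the alphabet is {A, B, C} and that a string is forbidden when some three consecutive letters are all distinct, as in the examples above; the function names are hypothetical. Strings shorter than three letters are all legal.

```python
def count_legal(n: int) -> int:
    """Strings of length n over {A, B, C} with no three consecutive distinct letters."""
    if n < 2:
        return 3 ** n  # too short to contain a window of three letters
    same, different = 3, 6  # length 2: AA/BB/CC, and the 6 ordered pairs of distinct letters
    for _ in range(n - 2):
        # both right-hand sides use the old values (tuple assignment)
        different, same = different + 2 * same, different + same
    return same + different


def count_forbidden(n: int) -> int:
    return 3 ** n - count_legal(n)


print(count_legal(3), count_forbidden(3))  # 21 6
```

For n = 3 the only forbidden strings are the 6 permutations of ABC, which matches 27 - 21 = 6.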
I suggest you use recursion, and look at two numbers:
F(n), the number of legal strings of length n whose last two symbols are the same.
G(n), the number of legal strings of length n whose last two symbols are different.
Is that enough to go on?
Get the ASCII values of the last three letters and add up their squares. If the sum equals the one produced by A, B and C, then the string is forbidden. For the letters A, B and C, this check works.
To do this:
1) Find out how to get characters from your string.
2) Find out how to get the ASCII value of a character.
3) Square these ASCII values.
4) Do that for the last three letters each time and add the squares together.
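A Python sketch of the sum-of-squares check, assuming the string uses only the letters A, B and C; `is_forbidden` is a hypothetical name. With that alphabet the check is exact: among the ten possible multisets of three letters, only {A, B, C} has squared ASCII codes summing to 65² + 66² + 67² = 13070 (the closest, BBB, gives 13068). The sketch tests every window of three consecutive letters, which is the same as checking the last three letters after each one is appended.

```python
def is_forbidden(s: str) -> bool:
    """True if some three consecutive letters of s (over {A, B, C}) are all distinct."""
    target = ord("A") ** 2 + ord("B") ** 2 + ord("C") ** 2  # 13070
    for i in range(len(s) - 2):
        window = s[i:i + 3]
        if sum(ord(ch) ** 2 for ch in window) == target:
            return True
    return False


print(is_forbidden("AABCA"), is_forbidden("AABBA"))  # True False
```

For larger alphabets, different triples can share a sum of squares, so a direct test such as `len(set(window)) == 3` is the safer choice there.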
