Using dynamic programming to count the number of permutations - algorithm

I have a string A of length N. I have to find number of strings (B) of length N that have M (M<=N) same characters as string A but satisfies the condition that A[i]!=B[i] for all i. Assume the characters that have to be same and the different ones are also given. What will be the recurrence relation to find number of such strings?
Example
123 is string A and M=1, and the character which is same is '1', and the new characters are '4' and '5'. The valid permutations are 451, 415, 514, 541. So it is a sort of derangement of 1 item of the given 3.
I am able to find the answer using inclusion-exclusion principle but wanted to know whether there is a recurrence relation to do the same?

Let us call g(M,N) the number of permutations satisfying your condition.
If M is 0, then the answer is N!
Otherwise, M>0 and consider placing the first character that is in string A.
There are M important positions corresponding to the places in the string where we are not allowed to place a certain character.
If we put our first character in one of these (M-1) important places (we cannot put it in position 1 due to the restriction), then we must take the place of one of the restricted characters, and so the number of restrictions reduces by 2 (1 for the character we place, and 1 for the character whose position we occupied).
If we put our first character in one of the N-M unimportant places, then we have only reduced the number of restrictions by 1.
Therefore the recurrence relation is:
g(M,N)=(M-1)g(M-2,N-1)+(N-M)g(M-1,N-1) if M>0
=N! if M=0
For your example, we wish to calculate g(1,3) (1 character matches, total of 3 characters placed)
g(1,3)=(3-1)g(0,2)
=(3-1).2!
=4

Related

Given a collection of strings, can we make all the strings equal

The allowed operations are removal of character from a string and adding that character to another string. We can repeat the operation as many times as we want.
Given list = ['CAA', 'CBB'].
We can remove 'A' from the first string and add it to the second string.
'CA', 'CBBA'.
Now, we can remove 'B" from the second string and add in the middle of string 'CA'.
So, we have 'CBA' and 'CBA'
Step-1. Compute the frequency of each character in all the strings of the list, let's say it as stringList.
Step-2: Compute the length of stringList and let's call it as length. (length is equal to the number of strings in the list.)
Step-3: Now, for the frequency of each character, Check if it is divisible by length. If any frequency is not divisible by length then it's not possible to equate the strings.
If it is possible:
Just distribute the characters equally among the strings to get an answer list.

Generating a perfect hash function given known list of strings?

Suppose I have a list of N strings, known at compile-time.
I want to generate (at compile-time) a function that will map each string to a distinct integer between 1 and N inclusive. The function should take very little time or space to execute.
For example, suppose my strings are:
{"apple", "orange", "banana"}
Such a function may return:
f("apple") -> 2
f("orange") -> 1
f("banana") -> 3
What's a strategy to generate this function?
I was thinking to analyze the strings at compile time and look for a couple of constants I could mod or add by or something?
The compile-time generation time/space can be quite expensive (but obviously not ridiculously so).
Say you have m distinct strings, and let ai, j be the jth character of the ith string. In the following, I'll assume that they all have the same length. This can be easily translated into any reasonable programming language by treating ai, j as the null character if j ≥ |ai|.
The idea I suggest is composed of two parts:
Find (at most) m - 1 positions differentiating the strings, and store these positions.
Create a perfect hash function by considering the strings as length-m vectors, and storing the parameters of the perfect hash function.
Obviously, in general, the hash function must check at least m - 1 positions. It's easy to see this by induction. For 2 strings, at least 1 character must be checked. Assume it's true for i strings: i - 1 positions must be checked. Create a new set of strings by appending 0 to the end of each of the i strings, and add a new string that is identical to one of the strings, except it has a 1 at the end.
Conversely, it's obvious that it's possible to find at most m - 1 positions sufficient for differentiating the strings (for some sets the number of course might be lower, as low as log to the base of the alphabet size of m). Again, it's easy to see so by induction. Two distinct strings must differ at some position. Placing the strings in a matrix with m rows, there must be some column where not all characters are the same. Partitioning the matrix into two or more parts, and applying the argument recursively to each part with more than 2 rows, shows this.
Say the m - 1 positions are p1, ..., pm - 1. In the following, recall the meaning above for ai, pj for pj ≥ |ai|: it is the null character.
let us define h(ai) = ∑j = 1m - 1[qj ai, pj % n], for random qj and some n. Then h is known to be a universal hash function: the probability of pair-collision P(x ≠ y &wedge; h(x) = h(y)) ≤ 1/n.
Given a universal hash function, there are known constructions for creating a perfect hash function from it. Perhaps the simplest is creating a vector of size m2 and successively trying the above h with n = m2 with randomized coefficients, until there are no collisions. The number of attempts needed until this is achieved, is expected 2 and the probability that more attempts are needed, decreases exponentially.
It is simple. Make a dictionary and assign 1 to the first word, 2 to the second, ... No need to make things complicated, just number your words.
To make the lookup effective, use trie or binary search or whatever tool your language provides.

Number of possible palindrome anagrams for a given word

I have to find No. of palindrome anagrams are possible for a given word.
Suppose the word is aaabbbb.My approach is
Prepare a hash map that contains no. of time each letter is appearing
For my example it will be
a--->3
b--->4
If length of string is even then no. of occurrence of each letter should be even to form palindrome of given word else no of
palindrome anagrams is 0
If length of string is odd then at max one occurrence of letter can be odd and other should be even.
This two above steps was for finding that weather a given word can can form palindrome or not.
Now for finding no of palindrome anagrams, what approach should I follow?
First thing to notice is that if the word is an odd length, then there must be exactly one character with an odd number of occurrences. If the word is an even length, then there must be no characters with an odd number of occurrences. In either case, you're looking for how many ways you can arrange the pairs of characters. You're looking for the number of permutations since order matters:
n = number of character pairs (aaaabbb would have 3 pairs, aabbcccc would have 4 pairs)
(n)!/( number_of_a_pairs! * number_of_b_pairs! * etc..)
So in the aaaabbb case, you're finding the permutations of aab:
3!/2!1! = 3
baa = baabaab
aba = abababa
aab = aabbbaa
And in the aabbcccc case, you're finding the permutations of abcc:
4!/2! = 12:
abcc
acbc
accb
bacc
bcac
bcca
cabc
cacb
cbac
cbca
ccab
ccba

On counting pairs of words that differ by one letter

Let us consider n words, each of length k. Those words consist of letters over an alphabet (whose cardinality is n) with defined order. The task is to derive an O(nk) algorithm to count the number of pairs of words that differ by one position (no matter which one exactly, as long as it's only a single position).
For instance, in the following set of words (n = 5, k = 4):
abcd, abdd, adcb, adcd, aecd
there are 5 such pairs: (abcd, abdd), (abcd, adcd), (abcd, aecd), (adcb, adcd), (adcd, aecd).
So far I've managed to find an algorithm that solves a slightly easier problem: counting the number of pairs of words that differ by one GIVEN position (i-th). In order to do this I swap the letter at the ith position with the last letter within each word, perform a Radix sort (ignoring the last position in each word - formerly the ith position), linearly detect words whose letters at the first 1 to k-1 positions are the same, eventually count the number of occurrences of each letter at the last (originally ith) position within each set of duplicates and calculate the desired pairs (the last part is simple).
However, the algorithm above doesn't seem to be applicable to the main problem (under the O(nk) constraint) - at least not without some modifications. Any idea how to solve this?
Assuming n and k isn't too large so that this will fit into memory:
Have a set with the first letter removed, one with the second letter removed, one with the third letter removed, etc. Technically this has to be a map from strings to counts.
Run through the list, simply add the current element to each of the maps (obviously by removing the applicable letter first) (if it already exists, add the count to totalPairs and increment it by one).
Then totalPairs is the desired value.
EDIT:
Complexity:
This should be O(n.k.logn).
You can use a map that uses hashing (e.g. HashMap in Java), instead of a sorted map for a theoretical complexity of O(nk) (though I've generally found a hash map to be slower than a sorted tree-based map).
Improvement:
A small alteration on this is to have a map of the first 2 letters removed to 2 maps, one with first letter removed and one with second letter removed, and have the same for the 3rd and 4th letters, and so on.
Then put these into maps with 4 letters removed and those into maps with 8 letters removed and so on, up to half the letters removed.
The complexity of this is:
You do 2 lookups into 2 sorted sets containing maximum k elements (for each half).
For each of these you do 2 lookups into 2 sorted sets again (for each quarter).
So the number of lookups is 2 + 4 + 8 + ... + k/2 + k, which I believe is O(k).
I may be wrong here, but, worst case, the number of elements in any given map is n, but this will cause all other maps to only have 1 element, so still O(logn), but for each n (not each n.k).
So I think that's O(n.(logn + k)).
.
EDIT 2:
Example of my maps (without the improvement):
(x-1) means x maps to 1.
Let's say we have abcd, abdd, adcb, adcd, aecd.
The first map would be (bcd-1), (bdd-1), (dcb-1), (dcd-1), (ecd-1).
The second map would be (acd-3), (add-1), (acb-1) (for 4th and 5th, value already existed, so increment).
The third map : (abd-2), (adb-1), (add-1), (aed-1) (2nd already existed).
The fourth map : (abc-1), (abd-1), (adc-2), (aec-1) (4th already existed).
totalPairs = 0
For second map - acd, for the 4th, we add 1, for the 5th we add 2.
totalPairs = 3
For third map - abd, for the 2th, we add 1.
totalPairs = 4
For fourth map - adc, for the 4th, we add 1.
totalPairs = 5.
Partial example of improved maps:
Same input as above.
Map of first 2 letters removed to maps of 1st and 2nd letter removed:
(cd-{ {(bcd-1)}, {(acd-1)} }),
(dd-{ {(bdd-1)}, {(add-1)} }),
(cb-{ {(dcb-1)}, {(acb-1)} }),
(cd-{ {(dcd-1)}, {(acd-1)} }),
(cd-{ {(ecd-1)}, {(acd-1)} })
The above is a map consisting of an element cd mapped to 2 maps, one containing one element (bcd-1) and the other containing (acd-1).
But for the 4th and 5th cd already existed, so, rather than generating the above, it will be added to that map instead, as follows:
(cd-{ {(bcd-1, dcd-1, ecd-1)}, {(acd-3)} }),
(dd-{ {(bdd-1)}, {(add-1)} }),
(cb-{ {(dcb-1)}, {(acb-1)} })
You can put each word into an array.Pop out elements from that array one by one.Then compare the resulting arrays.Finally you add back the popped element to get back the original arrays.
The popped elements from both the arrays must not be same.
Count number of cases where this occurs and finally divide it by 2 to get the exact solution
Think about how you would enumerate the language - you would likely use a recursive algorithm. Recursive algorithms map onto tree structures. If you construct such a tree, each divergence represents a difference of one letter, and each leaf will represent a word in the language.
It's been two months since I submitted the problem here. I have discussed it with my peers in the meantime and would like to share the outcome.
The main idea is similar to the one presented by Dukeling. For each word A and for each ith position within that word we are going to consider a tuple: (prefix, suffix, letter at the ith position), i.e. (A[1..i-1], A[i+1..n], A[i]). If i is either 1 or n, then the applicable substring is considered empty (these are simple boundary cases).
Having these tuples in hand, we should be able to apply the reasoning I provided in my first post to count the number of pairs of different words. All we have to do is sort the tuples by the prefix and suffix values (separately for each i) - then, words with letters equal at all but ith position will be adjacent to each other.
Here though is the technical part I am lacking. So as to make the sorting procedure (RadixSort appears to be the way to go) meet the O(nk) constraint, we might want to assign labels to our prefixes and suffixes (we only need n labels for each i). I am not quite sure how to go about the labelling stuff. (Sure, we might do some hashing instead, but I am pretty confident the former solution is viable).
While this is not an entirely complete solution, I believe it casts some light on the possible way to tackle this problem and that is why I posted it here. If anyone comes up with an idea of how to do the labelling part, I will implement it in this post.
How's the following Python solution?
import string
def one_apart(words, word):
res = set()
for i, _ in enumerate(word):
for c in string.ascii_lowercase:
w = word[:i] + c + word[i+1:]
if w != word and w in words:
res.add(w)
return res
pairs = set()
for w in words:
for other in one_apart(words, w):
pairs.add(frozenset((w, other)))
for pair in pairs:
print(pair)
Output:
frozenset({'abcd', 'adcd'})
frozenset({'aecd', 'adcd'})
frozenset({'adcb', 'adcd'})
frozenset({'abcd', 'aecd'})
frozenset({'abcd', 'abdd'})

Algorithm to find

the logic behind this was (n-2)3^(n-3) has lots of repetitons like (abc)***(abc) when abc is at start and at end and the strings repated total to 3^4 . similarly as abc moves ahead and number of sets of (abc) increase
You can use dynamic programming to compute the number of forbidden strings.
The algorithms follow from the observation below:
"Legal string of size n is the legal string of size n - 1 extended with one letter, so that the last three letters of the resulting string are not all distinct."
So if we had all the legal strings of size n-1 we could try extending them to obtain the legal strings of size n.
To check whether the extended string is legal we just need to know the last two letters of the previous string (of size n-1).
In the algorithm we will compute two arrays, where
different[i] # number of legal strings of length i in which last two letters are different
same[i] # number of legal strings of length i in which last two letters are the same
It can be easily proved that:
different[i+1] = different[i] + 2*same[i]
same[i+1] = different[i] + same[i]
It is the consequence of the following facts:
Any 'same' string of size i+1 can be obtained either from 'same' string of size i (think BB -> BBB) or from 'different' string (think AB -> ABB) and these are the only options.
Any 'different' string of size i+1 can be obtained either from 'different' string of size i (think AB-> ABA ) or from the 'same' string in two ways (AA -> AAB or AA -> AAC)
Having observed all this it is easy to write an algorithm that computes the result in O(n) time.
I suggest you use recursion, and look at two numbers:
F(n), the number of legal strings of length n whose last two symbols are the same.
G(n), the number of legal strings of length n whose last two symbols are different.
Is that enough to go on?
get the ASCII values of the last three letters and add the square values of these letters. If it gives a certain result, then it is forbidden. For A, B and C, it would be fine.
To do this:
1) find out how to get characters from your string.
2) find out how to get ASCII value of a character.
3) Multiply these ASCII values with themselves.
4) Do that for the three letters each time and add their values.

Resources