Suppose we have a string of binary values in which some portions may correspond to specific letters, for example:
A = 0
B = 10
C = 001
D = 010
E = 001
If we take the string "001010", there are 6 different possible decodings:
AABB
ADB
CAB
CD
EAB
ED
I need to compute the exact number of combinations.
I'm trying to approach the problem from a dynamic programming point of view, but I have difficulty formulating the subproblems and building the corresponding matrix.
I'd appreciate any pointers on how to formulate the algorithm correctly.
Thanks in advance.
You can use a simple recursive procedure: try to match every pattern to the beginning of the string; if there is a match, repeat recursively with the remainder of the string. When the string is empty, you have found a decoding.
Patterns = ["0", "10", "001", "010", "001"]
Letters = "ABCDE"

def Decode(In, Out):
    # Empty input: Out holds one complete decoding
    if len(In) == 0:
        print(Out)
    else:
        # Try each pattern as a prefix of the remaining input
        for i in range(len(Patterns)):
            if In.startswith(Patterns[i]):
                Decode(In[len(Patterns[i]):], Out + Letters[i])

Decode("001010", "")
Output:
AABB
ADB
CAB
CD
EAB
ED
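You can formulate a DP in which f(i) is the number of decodings of the prefix ending at index i: f(i) = sum( f(i - j) * count(matches_j) ) over all match lengths j, where count(matches_j) is the number of dictionary patterns of length j that match the substring ending at index i (C and E both match "001", so this count can be 2), and f(-1) = 1 stands for the empty prefix. Depending on the input, you might also speed this up by building a custom trie of the reversed patterns, so you only check relevant matches (e.g., walking back from index 3 finds A, then B, then D). To take your example:
f(0) = 1 * f(-1) = 1 (A)
f(1) = 1 * f(0) = 1 (A)
f(2) = 2 * f(-1) = 2 (C or E)
f(3) = 1 * f(2) + 1 * f(1) + 1 * f(0) = 4 (A, B, D)
f(4) = 0 (no pattern matches a substring ending at index 4)
f(5) = 1 * f(4) + 1 * f(3) + 1 * f(2) = 6 (A, B, D)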
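A minimal bottom-up sketch of this recurrence, assuming the patterns are given as a plain list in which duplicates such as C and E simply contribute twice (that plays the role of count(matches_j)). The function name `count_decodings` is hypothetical, and the table is shifted by one position: `f[k]` here corresponds to f(k - 1) above, so `f[0] = 1` stands for the empty prefix.

```python
def count_decodings(s, patterns):
    """Return the DP table f, where f[k] counts the decodings of the prefix s[:k]."""
    n = len(s)
    f = [0] * (n + 1)
    f[0] = 1  # the empty prefix has exactly one (empty) decoding
    for k in range(1, n + 1):
        for p in patterns:
            # p is a match of length len(p) ending at position k - 1
            if len(p) <= k and s[k - len(p):k] == p:
                f[k] += f[k - len(p)]
    return f

PATTERNS = ["0", "10", "001", "010", "001"]  # A, B, C, D, E
table = count_decodings("001010", PATTERNS)
print(table[1:])  # [1, 1, 2, 4, 0, 6], i.e. f(0) .. f(5) above
print(table[-1])  # 6
```

Scanning every pattern at each position costs O(n times the total pattern length); the trie idea above would only pay for patterns that can actually end at the current position.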
When solving DP problems, it often helps to think about a recursive solution first, then think about converting it to a DP solution.
A nice recursive insight here is that if you have a nonempty string of digits, any way of decoding it will start with some single character. You could therefore count the number of ways to decode the string by trying each character, seeing if it matches at the beginning and, if so, counting up how many ways there are to decode the rest of the string.
The reason this turns into a nice DP problem is that when you pull off a single character, you're left with a shorter string of digits that is always a suffix of the original string. So imagine that you made a table storing, for each suffix of the original string, how many ways there are to decode that suffix. If you fill in that table from right to left using the above insight, you end up with the final answer by reading off the entry corresponding to the entire string.
See if you can find a way to turn this into a concrete algorithm and to then go and code it up. Good luck!
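One way to turn the suffix idea into code, as a sketch rather than the answerer's own solution: `ways[i]` holds the number of decodings of the suffix starting at index i, and the table is filled from right to left. The function name and the letter-to-pattern dictionary are assumptions for illustration.

```python
def count_suffix_decodings(s, codes):
    """codes maps each letter to its bit pattern; returns the number of decodings of s."""
    n = len(s)
    # ways[i] = number of ways to decode the suffix s[i:]; the empty suffix has one
    ways = [0] * (n + 1)
    ways[n] = 1
    for i in range(n - 1, -1, -1):
        for pattern in codes.values():
            # Pull one letter off the front of s[i:], then reuse the shorter suffix
            if s.startswith(pattern, i):
                ways[i] += ways[i + len(pattern)]
    return ways[0]

CODES = {"A": "0", "B": "10", "C": "001", "D": "010", "E": "001"}
print(count_suffix_decodings("001010", CODES))  # 6
```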
Question link: https://www.codechef.com/problems/STR
The question is:
Little John just had his first class in school. He was taught the first 20 letters of the English alphabet and was asked to make words from these letters.
Since he doesn't know many dictionary words, he quickly finished this work by making random strings from these letters.
Now, while the other kids are busy creating their words, John gets curious and puts all the strings he created in a list, which he names X.
He picks two indices 'i' and 'j' (not necessarily distinct). He assigns A as X[i] and B as X[j]. He then concatenates both strings to create a new string C (= A + B). He calls a string a "super string" if it contains all the 20 letters of the English alphabet he has just learnt, at least once.
Given the strings of the list, can you tell him how many such unordered pairs (i, j) he can choose such that string C is a super string?
Editorial : https://discuss.codechef.com/questions/79843/str-editorial
I cannot understand the logic of the DP here. Can someone help me?
For the sake of simplicity, assume that we use the first 6 characters, 'a' to 'f', instead of 20. We store each string in 6 bits by putting a 1 for each character it contains (for example, the bitmask of "abc" is 111000).
A supermask of a string s satisfies the following:
If i-th bit of s is 1, i-th bit of the supermask is 1.
If i-th bit of s is 0, i-th bit of the supermask can be either 0 or 1.
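Supermasks of s = 111000 are 111000, 111001, ..., 111111. Let x denote the integer value of the largest possible mask, which is 63 (111111). Notice that for a mask s:
s | (x - s) = x (111000 | 000111 = 111111, i.e. 56 | 7 = 63)
So x - s is the complement of s, and a concatenation A + B is a super string exactly when the mask of B is a supermask of x - mask(A). That is why counting supermasks is the heart of the problem.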
The first solution the author suggests is this: assume that you have already calculated the supermask counts for i+1, i+2, ..., x, where 0 <= i <= x. Let bit(i, k) denote the k-th least significant bit of bitmask i, counting from 0 (for i = 111000, bit(i, 2) = 0). Finally, let dp[i] denote the count of supermasks of i. The algorithm suggests that:
An element is a supermask of itself (dp[i] = 1)
From least significant bit to most significant, whenever you encounter a 0 on index k
If you flip bit k to 1, result i' is a supermask of i (dp[i]++)
All supermasks of i' are supermasks of i (dp[i] += dp[i'], where i' = i | (1 << k))
The problem is that this solution counts the same supermasks multiple times. Consider i = 111000: it counts supermask 111111 for both i' = 111001 and i'' = 111010. You need a way to eliminate these duplicates.
The final thing the author suggests is as follows: let dp[i][j] denote the number of supermasks of i such that the rightmost j 0-bits of i are still zeros. For example, for i = 111000, dp[i][2] includes 111000 and 111100. Grouping the supermasks of i = 111000 (other than i itself) by the lowest 0-bit that gets flipped to 1 gives disjoint groups, so nothing is counted twice:
j = 0: 111001, 111011, 111101, 111111
j = 1: 111010, 111110
j = 2: 111100
Unfortunately, the author's write-up was poorly documented, and I wasn't able to understand the notation used in the final formulation. Still, I hope this explanation is enough to convey the logic.
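For reference, here is a sketch of the standard sum-over-supersets DP, which avoids the duplicate counting described above by processing one bit at a time. It is the common textbook formulation, not necessarily the editorial's exact notation. Here `sup[m]` counts the input strings whose letter mask is a supermask of m (rather than the supermasks themselves), and a pair qualifies exactly when mask(B) is a supermask of the complement of mask(A). The function name, the `alphabet` parameter (with 'a' mapped to bit 0, the mirror image of the 111000 convention above, which does not change the count), the assumption that strings only use letters from `alphabet`, and the treatment of i = j pairs (counted once, since the statement says the indices need not be distinct) are all assumptions. In pure Python the 20-letter case needs about 20 * 2^20 inner iterations, so it is slow; a compiled language is the usual choice.

```python
def count_super_pairs(strings, alphabet="abcdefghijklmnopqrst"):
    """Count unordered pairs (i, j), i == j allowed, whose concatenation uses every letter."""
    k = len(alphabet)
    full = (1 << k) - 1  # x in the text: every letter present
    bit_of = {ch: b for b, ch in enumerate(alphabet)}  # 'a' -> bit 0, 'b' -> bit 1, ...

    masks = []
    cnt = [0] * (1 << k)  # cnt[m] = number of strings whose letter set is exactly m
    for s in strings:
        m = 0
        for ch in s:
            m |= 1 << bit_of[ch]
        masks.append(m)
        cnt[m] += 1

    # Sum over supersets: after handling bits 0..b, sup[m] counts strings whose mask
    # contains m and differs from m only in bits 0..b. Each supermask is reached
    # along exactly one path, so nothing is counted twice.
    sup = cnt[:]
    for b in range(k):
        for m in range(1 << k):
            if not m & (1 << b):
                sup[m] += sup[m | (1 << b)]

    # Ordered pairs: mask[j] must be a supermask of the complement of mask[i]
    ordered = sum(sup[full ^ m] for m in masks)
    # Pairs with i == j appear once in `ordered`, pairs with i != j appear twice
    self_pairs = sum(1 for m in masks if m == full)
    return (ordered + self_pairs) // 2

print(count_super_pairs(["abc", "def", "abcdef", "ad"], alphabet="abcdef"))  # 5
```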
The Z algorithm is a string matching algorithm with O(n) complexity.
One use case is finding the longest prefix of string A that occurs in string B. For example, the longest prefix of "overdose" that occurs in "stackoverflow" is "over". You could discover this by running the Z algorithm on the combined string "overdose#stackoverflow" (where # is some character not present in either string). The Z algorithm matches the combined string against itself and builds an array z[] where z[i] is the length of the longest substring starting at index i that matches a prefix of the combined string. In our example:
index   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
string  o  v  e  r  d  o  s  e  #  s  t  a  c  k  o  v  e  r  f  l  o  w
z       -  0  0  0  0  1  0  0  0  0  0  0  0  0  4  0  0  0  0  0  1  0
(z[0] is not meaningful here; by convention it is left undefined or set to the full length, 22.)
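To make the definition of z[] concrete, here is a deliberately naive O(n^2) sketch (not the Z algorithm itself) that computes each entry by direct comparison and then reads off the longest prefix of A occurring in B. The function names and the default '#' separator are illustrative assumptions; the separator must not occur in either string.

```python
def naive_z(s):
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n  # z[0] is left as 0 (undefined by convention)
    for i in range(1, n):
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z

def longest_prefix_in(a, b, sep="#"):
    """Longest prefix of a that occurs somewhere in b (sep must not occur in a or b)."""
    z = naive_z(a + sep + b)
    # Only positions inside b matter; the separator caps every match at len(a)
    best = max(z[len(a) + 1:], default=0)
    return a[:best]

print(naive_z("overdose#stackoverflow"))
print(longest_prefix_in("overdose", "stackoverflow"))  # over
```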
There are plenty of code implementations and mathematically oriented explanations of the algorithm; here are some good ones:
http://www.geeksforgeeks.org/z-algorithm-linear-time-pattern-searching-algorithm/
http://codeforces.com/blog/entry/3107
I can see how it works, but I don't understand why. It seems almost like black magic. I have a very strong intuition that this task is supposed to take O(n^2), yet here is an algorithm that does it in O(n).
I don't find it completely intuitive either, so I think that I qualify for answering. Otherwise I'd just say that you don't understand because you're an idiot, and surely that's not the answer you're hoping for :-)
Case in point (a quote from one explanation):
Correctness is inherent in the algorithm and is pretty intuitively clear.
So, let's try to be even more intuitive...
First, I'd guess that the common intuition for O(N^2) is this: for a string of length N, if you're dropped at a random place i in the string with no other information, you have to match up to x (< N) characters to compute Z[i]. If you're dropped N times, you have to do up to N(N-1) tests, so that's O(N^2).
The Z algorithm, however, makes good use of the information you've gained from past computations.
Let's see.
First, as long as you don't have a match (Z[i]=0), you progress along the string with one comparison per character, so that's O(N).
Second, when you find a range where there's a match (at index i), the trick is to use clever deductions from the previous values Z[0...i-1] to compute each Z value inside that range in constant time, without further comparisons inside the range. New comparisons are only made to the right of the range.
That's how I understand it anyway, hope this helps.
I was looking for a deeper understanding of this algorithm, which is how I found this question.
I didn't understand the Codeforces post at first, but later I found it good enough. I also noticed that the post was not entirely accurate and omitted some steps in the reasoning, which made it a bit confusing.
Let me try to correct the inaccuracy in that post and clarify some of the steps that I think help connect the dots. In the process, I hope we can pick up some intuition from the original author. In the explanation, I'll mix quoted blocks from Codeforces with my own notes, so we can keep the original post close to the discussion.
The Z algorithm starts as:
As we iterate over the letters in the string (index i from 1 to n - 1), we maintain an interval [L, R] which is the interval with maximum R such that 1 ≤ L ≤ i ≤ R and S[L...R] is a prefix-substring (if no such interval exists, just let L = R = -1). For i = 1, we can simply compute L and R by comparing S[0...] to S[1...]. Moreover, we also get Z[1] during this.
This is simple and straightforward.
Now suppose we have the correct interval [L, R] for i - 1 and all of the Z values up to i - 1. We will compute Z[i] and the new [L, R] by the following steps:
If i > R, then there does not exist a prefix-substring of S that starts before i and ends at or after i. If such a substring existed, [L, R] would have been the interval for that substring rather than its current value. Thus we "reset" and compute a new [L, R] by comparing S[0...] to S[i...] and get Z[i] at the same time (Z[i] = R - L + 1).
The bold part in the bullet point might be confusing, but if you read it twice, it's really just repeating the definition of R.
Otherwise, i ≤ R, so the current [L, R] extends at least to i. Let k = i - L. We know that Z[i] ≥ min(Z[k], R - i + 1) because S[i...] matches S[k...] for at least R - i + 1 characters (they are in the [L, R] interval which we know to be a prefix-substring). Now we have a few more cases to consider.
The bold part is not completely accurate, because R - i + 1 can be greater than Z[k], in which case Z[i] would be Z[k].
Let's focus on the key now: Z[i] ≥ min(Z[k], R - i + 1). Why is this true? Because of the following:
Based on the definition of interval [L, R] and i ≤ R, we already confirmed that S[0...R - L] == S[L...R], hence S[0...k] == S[L...i], and S[k...R - L] == S[i...R];
Say Z[k] = x. By the definition of Z, we know S[0...x - 1] == S[k...k + x - 1] and S[x] != S[k + x];
Combining the two: when x < R - i + 1, the whole range k...k + x lies inside k...R - L, so S[k...k + x] == S[i...i + x]. Hence S[0...x - 1] == S[i...i + x - 1] while S[x] != S[i + x], which means Z[i] = Z[k] whenever Z[k] < R - i + 1.
These are the missing dots I mentioned in the beginning, and they explain both the second and the third bullet points, and partially the last bullet point. This wasn't straightforward when I read the Codeforces post. To me this is the most important part of the algorithm.
For the last bullet point, if Z[k] ≥ R - i + 1, we only know that Z[i] ≥ R - i + 1, so we use i as the new L and keep comparing characters beyond R, extending R to a new R' ≥ R.
In the whole process, R never moves to the left, and each index i ends with at most one failed comparison, so the total number of character comparisons is O(n).
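A Python sketch that follows the case analysis above with a closed interval [L, R], starting from L = R = -1 as in the quoted post, and counts character comparisons so the linear bound can be checked empirically. The function name and the comparison counter are additions for illustration, and z[0] is set to the string length by convention.

```python
def z_function(s):
    """Return (z, comparisons), maintaining the window [L, R] described above."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n  # convention: the whole string matches itself
    L = R = -1  # closed interval; -1 means no prefix-substring window yet
    comparisons = 0
    for i in range(1, n):
        if i <= R:
            k = i - L
            if z[k] < R - i + 1:
                # The match at k ends strictly inside the window, so Z[i] = Z[k]
                z[i] = z[k]
                continue
            # Otherwise Z[i] >= R - i + 1: keep the known part, compare beyond R
            L = i
        else:
            # No window covers i: start an empty window at i
            L, R = i, i - 1
        # Each successful comparison moves R right; the loop ends on the first mismatch
        while R + 1 < n:
            comparisons += 1
            if s[R + 1 - L] != s[R + 1]:
                break
            R += 1
        z[i] = R - L + 1
    return z, comparisons

print(z_function("aabxaab")[0])  # [7, 1, 0, 0, 3, 1, 0]
```

Because R never decreases and every i contributes at most one mismatch, the counter stays below 2n.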
As Ilya answered, the intuition in this algorithm is to carefully reuse every piece of information we gathered so far. I just explained it in another way. Hope it helps.
The text of Alice in Wonderland contains the word 'Wonderland' 8 times. (Let's be case-insensitive for this question.)
However, it contains the word many more times if you count non-contiguous subsequences as well as substrings, e.g.:
Either the well was very deep, or she fell very slowly, for she had
plenty of time as she went down to look about her and to WONDER what was
going to happen next. First, she tried to Look down AND make out what
she was coming to, but it was too dark to see anything;
(A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. —Wikipedia)
How many times does the book contain the word Wonderland as a subsequence? I expect this will be a big number—it's a long book with many w's and o's and n's and d's.
I tried brute-force counting (recursion emulating 10 nested loops), but it was too slow, even for that example paragraph.
Let's say you didn't want to search for wonderland, but just for w. Then you'd simply count how many times w occurred in the story.
Now let's say you want wo. Each time you find the first character of the current pattern, you add to your count:
How many times the current pattern without its first character occurs in the rest of the story, after the character you're at. So you have reduced the problem (story[1..n], pattern[1..m]) to (story[2..n], pattern[2..m]).
How many times the entire current pattern occurs in the rest of the story. So you have reduced the problem to (story[2..n], pattern[1..m]).
Now you can just add the two. There is no overcounting if we talk in terms of subproblems. Consider the example wawo. Obviously, wo occurs 2 times. You might think the counting will go like:
For the first w, add 1 because o occurs once after it and another 1 because wo occurs once after it.
For the second w, add 1 because o occurs once after it.
Answer is 3, which is wrong.
But this is what actually happens:
(wawo, wo) -> (awo, o) -> (wo, o) -> (o, o) -> (-, -) -> 1
                                            -> (-, o) -> 0
           -> (awo, wo) -> (wo, wo) -> (o, wo) -> (-, wo) -> 0
                                    -> (o, o) -> (-, -) -> 1
                                              -> (-, o) -> 0
So you can see that the answer is 2.
If you don't find a w, then the count for this position is just how many times wo occurs after this current character.
This allows for dynamic programming with memoization:
def count(story_index, pattern_index, dp):
    # story and pattern are global strings; dp is a dict used as the memo table
    if (story_index, pattern_index) not in dp:
        if pattern_index == len(pattern):
            return 1  # the whole pattern has been matched
        if story_index == len(story):
            return 0  # the story ran out first
        if story[story_index] == pattern[pattern_index]:
            # Either use this character for the pattern, or skip it
            dp[story_index, pattern_index] = (count(story_index + 1, pattern_index + 1, dp) +
                                              count(story_index + 1, pattern_index, dp))
        else:
            dp[story_index, pattern_index] = count(story_index + 1, pattern_index, dp)
    return dp[story_index, pattern_index]
Call with count(0, 0, dp), starting from an empty table dp = {}. Note that you can make the code cleaner (remove the duplicate function call).
Python code, with no memoization:
def count(story, pattern):
    if len(pattern) == 0:
        return 1
    if len(story) == 0:
        return 0
    s = count(story[1:], pattern)
    if story[0] == pattern[0]:
        s += count(story[1:], pattern[1:])
    return s
print(count('wonderlandwonderland', 'wonderland'))
Output:
17
This makes sense: for each i from 1 to 9, you can take the first i characters of the first wonderland in the story and the remaining characters of the second wonderland, giving you 9 solutions. Another 2 are the words themselves. The other six are:
wonderlandwonderland
*********    *
********    **
********    *      *
**      **    ******
***      *    ******
**      *    *******
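The count of 17 and the star diagrams can be cross-checked by brute force on this short input: enumerate every increasing choice of 10 positions and keep those that spell the word. This is only a sanity-check sketch (it examines C(20, 10) = 184756 position tuples, so it does not scale to the book), and the helper name `embeddings` is made up for illustration.

```python
from itertools import combinations

def embeddings(story, pattern):
    """All increasing position tuples in story that spell pattern (brute force)."""
    return [pos for pos in combinations(range(len(story)), len(pattern))
            if all(story[p] == ch for p, ch in zip(pos, pattern))]

story = 'wonderlandwonderland'
found = embeddings(story, 'wonderland')
print(len(found))  # 17
for pos in found:
    # '*' marks a used position, in the same style as the diagrams above
    print(''.join('*' if k in pos else ' ' for k in range(len(story))).rstrip())
```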
You're right that this will be a huge number. I suggest that you either use large integers or take the result modulo something.
The same program returns 9624 for your example paragraph.
The string "wonderland" occurs as a subsequence in Alice in Wonderland[1] 24100772180603281661684131458232 times.
The main idea is to scan the main text character by character, keeping a running count of how often each prefix of the target string (i.e.: in this case, "w", "wo", "won", ..., "wonderlan", and "wonderland") has occurred up to the current letter. These running counts are easy to compute and update. If the current letter does not occur in "wonderland", then the counts are left untouched. If the current letter is "a" then we increment the count of "wonderla"s seen by the number of "wonderl"s seen up to this point. If the current letter is "n" then we increment the count of "won"s by the count of "wo"s and the count of "wonderlan"s by the count of "wonderla"s. And so forth. When we reach end of the text, we will have the count of all prefixes of "wonderland" including the string "wonderland" itself, as desired.
The advantage of this approach is that it requires a single pass through the text and does not require O(n) recursive calls (which will likely exceed the maximum recursion depth unless you do something clever).
Code
import fileinput

target = 'wonderland'
prefixes = dict()  # letter -> prefixes of target that end with that letter
count = dict()     # prefix -> occurrences of that prefix as a subsequence so far
for i in range(len(target)):
    letter = target[i]
    prefix = target[:i+1]
    if letter not in prefixes:
        prefixes[letter] = [prefix]
    else:
        prefixes[letter].append(prefix)
    count[prefix] = 0

for line in fileinput.input():
    for letter in line.lower():
        if letter in prefixes:
            # Longest prefixes first, so one letter is never used twice
            # (this matters for targets with repeated adjacent letters, e.g. "ll")
            for prefix in reversed(prefixes[letter]):
                if len(prefix) > 1:
                    count[prefix] = count[prefix] + count[prefix[:-1]]
                else:
                    count[prefix] = count[prefix] + 1
print(count[target])
[1] Using this text from Project Gutenberg, starting with "CHAPTER I. Down the Rabbit-Hole" and ending with "THE END".
Following up on previous comments, if you are looking for an algorithm that would return 2 for the input wonderlandwonderland and 1 for wonderwonderland, then I think you could adapt the algorithm from this question:
How to find smallest substring which contains all characters from a given string?
Effectively, the change in your case would be that once an instance of the word is found, you increment a counter and repeat the whole procedure on the remaining part of the text.
Such an algorithm would be O(n) in time, where n is the length of the text, and O(m) in space, where m is the length of the searched string.
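A minimal sketch of that counting variant, assuming occurrences must not share or interleave characters (each new occurrence starts after the previous one ends). Rather than adapting the smallest-window algorithm from the linked question, it uses a simpler greedy scan: match the word's letters in order, and when the last letter is matched, count it and start over. Under this sequential interpretation, always completing the earliest possible occurrence is optimal. The function name is hypothetical, and the word is assumed to be non-empty.

```python
def count_sequential_occurrences(text, word):
    """Count occurrences of word as a subsequence in consecutive, non-overlapping stretches."""
    text, word = text.lower(), word.lower()  # case-insensitive, as in the question
    count = 0
    matched = 0  # number of leading letters of word matched so far
    for ch in text:
        if ch == word[matched]:
            matched += 1
            if matched == len(word):
                count += 1
                matched = 0  # start looking for the next occurrence
    return count

print(count_sequential_occurrences("wonderlandwonderland", "wonderland"))  # 2
print(count_sequential_occurrences("wonderwonderland", "wonderland"))  # 1
```

This runs in O(n) time and keeps only a single counter besides the word itself.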
Imagine we have an alphabet of, say, 5 chars: ABCDE.
We now want to enumerate all possible sets of 3 of those letters. Each letter can only be present once in a set, and the order of letters doesn't matter (hence the letters in the set should be sorted).
So we get the following sets:
ABC
ABD
ABE
ACD
ACE
ADE
BCD
BCE
BDE
CDE
For a total of 10 sets. The order is lexicographical.
Let's now assume that the alphabet length is N (5 in this example) and the length of the set is M (3 in this example). Knowing N and M, how could we, if at all possible:
Tell the total number of combinations in at worst O(M+N) (the answer is 10 in this example)?
Output the combination with any given number (given 1, return ABC; given 5, return ACE and so on) in at worst O(M+N)?
It's trivial to do those things with O(M^N) complexity by generating the whole list, but I wonder if there's a better solution.
The answer to the first question is straightforward: it is C(n,r), where we are to choose all combinations of r items from a set of size n. The formula is here among other places:
C(n,r) = n! / (r! (n-r)!)
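If factorials of large numbers are a concern, C(n, r) can also be computed with r multiplications and divisions, which fits the O(M + N) budget in the question if arithmetic on the numbers involved is treated as constant time. A minimal Python sketch (the name `n_choose_r` is illustrative; Python 3.8+ also ships `math.comb`, used here only as a cross-check):

```python
from math import comb

def n_choose_r(n, r):
    """C(n, r) via the multiplicative formula, using exact integer arithmetic."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)  # C(n, r) == C(n, n - r), so take the shorter loop
    result = 1
    for k in range(1, r + 1):
        # After this step result == C(n - r + k, k), so the division is exact
        result = result * (n - r + k) // k
    return result

print(n_choose_r(5, 3))  # 10
print(n_choose_r(5, 3) == comb(5, 3))  # True
```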
The ability to select the i'th combination without computing all the others will depend on having an encoding that relates the combination number i to the combination. That would be much more challenging and will require more thought ...
(EDIT)
Having given the problem more thought, a solution looks like this in Python:
from math import factorial

def combination(n,r):
    # Integer division keeps the result exact
    return factorial(n) // (factorial(r) * factorial(n-r))

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def showComb(n,r,i,a):
    # Return the i-th (1-based) combination of r letters from the first n letters of a
    if r < 1:
        return ""
    rr = r-1
    nn = max(n-1,rr)
    lasti = i
    i -= combination(nn,rr)
    j = 0
    # Skip whole blocks of combinations until the block containing i is reached
    while i > 0:
        j += 1
        nn = max(nn-1,1)
        rr = min(rr,nn) # corrected this line in second edit
        lasti = i
        i -= combination(nn,rr)
    return a[j] + showComb(n-j-1,r-1,lasti,a[(j+1):])

for i in range(10):
    print(showComb(5,3,i+1,alphabet))
... which outputs the list shown in the question.
The approach I've used is to find the first element of the i'th output set from the number of combinations of the remaining set elements: those counts tell you which element must come first for a given number i.
That is, for C(5,3), the first C(4,2) (=6) output sets have 'A' as their first character, the next C(3,2) (=3) output sets have 'B', and then C(2,2) (=1) set has 'C' as its first character.
The function then finds the remaining elements recursively. Note that showComb() is tail-recursive so it could be expressed as a loop if you preferred, but I think the recursive version is easier to understand in this case.
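Since showComb() could be expressed as a loop, here is one possible iterative version of the same block-skipping idea, as a sketch: choosing letter index c as the next element heads comb(n - c - 1, remaining - 1) combinations, so whole blocks can be skipped until the one containing i is found. It uses `math.comb` (Python 3.8+) instead of the factorial-based helper, and the name `unrank` is hypothetical.

```python
from math import comb  # Python 3.8+

def unrank(n, r, i, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Return the i-th (1-based, lexicographic) r-letter combination of the first n letters."""
    if not 1 <= i <= comb(n, r):
        raise ValueError("i must be between 1 and C(n, r)")
    rank = i - 1  # 0-based rank still to account for
    start = 0     # smallest letter index still allowed
    letters = []
    for remaining in range(r, 0, -1):
        c = start
        # Choosing letter c now leaves comb(n - c - 1, remaining - 1) ways to finish
        while rank >= comb(n - c - 1, remaining - 1):
            rank -= comb(n - c - 1, remaining - 1)
            c += 1
        letters.append(alphabet[c])
        start = c + 1
    return "".join(letters)

print([unrank(5, 3, i) for i in range(1, 11)])
```

Because c only moves forward across positions, the loop makes at most about n + r calls to comb(), which matches the O(M + N) target if each call is treated as constant time (it is not, for large arguments).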
For further testing, the following code may be useful:
import itertools

def showCombIter(n,r,i,a):
    # Reference answer: generate every combination and pick the i-th (1-based)
    return ''.join(list(itertools.combinations(a[0:n],r))[i-1])

print("\n")
# Testing for other cases
for i in range(120):
    x = showComb(10,3,i+1,alphabet)
    y = showCombIter(10,3,i+1,alphabet)
    print(i+1,"\t",x==y,"\t",x,y)
... which confirms that all 120 examples of this case are correct.
I haven't calculated the time complexity exactly, but the number of recursive calls to showComb() is about r, and the while loop executes at most n times in total. Thus, in the terminology of the question, I am pretty sure the complexity is within O(M+N), if we assume that factorial() can be calculated in constant time, which I don't think is a bad approximation unless its implementation is naive.
I agree the first part is easy; put an equation similar to this one into a language of your choice.
x=12   # number of items (n)
y=5    # size of each set (r)
z=1
base=1
until [[ $z -gt $y ]]
do
  # Multiply before dividing: base becomes C(n, z), an exact integer at every step
  base=$(( base * x / z ))
  x=$(( x - 1 ))
  z=$(( z + 1 ))
  echo base:$base
done
echo $base
The above example uses 12 items, arranged in sets of 5, for 792 combinations.
As for the second part of your question, I am still thinking about it, but it is not straightforward by any stretch.
Let me explain with an example. If n=4 and r=2, that means all 4-digit binary numbers in which two adjacent digits are 1, so the answer is 0011 0110 1011 1100 1101.
I am unable to figure out a pattern or an algorithm.
Hint: The 11 can start in position 0, 1, or 2. On either side of it, the digit must be zero, so the only "free" digits are in the remaining positions, and they can cycle through all possible values.
For example, if there are n=10 digits and you're looking for r=3 adjacent ones, the pattern is
x01110y
Where x and y cycle through all possible prefixes and suffixes for the remaining five free digits. Note that at either end, the leading or trailing zero gets dropped, leaving six free digits in x0111 and 1110y.
Here's an example using Python:
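from itertools import product

def gen(n, r):
    'Generate all n-length sequences with r fixed adjacent ones'
    if n == r:
        return [(1,) * r]
    result = set()
    # Ones at the very start: 1..10 followed by any suffix
    fixed = tuple([1] * r + [0])
    for suffix in product([0, 1], repeat=n-r-1):
        result.add(fixed + suffix)
    # Ones in the middle: 01..10 with the free digits split around it
    fixed = tuple([0] + [1] * r + [0])
    rem = n - r - 2
    for leadsize in range(0, rem + 1):  # include both ends, e.g. 0110 for n=4, r=2
        for digits in product([0, 1], repeat=rem):
            result.add(digits[:leadsize] + fixed + digits[leadsize:])
    # Ones at the very end: any prefix followed by 01..1
    fixed = tuple([0] + [1] * r)
    for prefix in product([0, 1], repeat=n-r-1):
        result.add(prefix + fixed)
    return sorted(result)

print([''.join(map(str, t)) for t in gen(4, 2)])  # ['0011', '0110', '1011', '1100', '1101']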
I would start with simplifying the problem. Once you have a solution for the simplest case, generalize it and then try to optimize it.
First design an algorithm that will find out if a given number has 'r' adjacent 1s. Once you have it, the brute-force way is to go through all the numbers with 'n' digits, checking each with the algorithm you just developed.
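A minimal Python sketch of that brute-force approach. The question leaves the exact rule open, so this assumes "has r adjacent 1s" means the string contains at least one maximal run of exactly r consecutive 1s; under that assumption it reproduces the n=4, r=2 list from the question. The helper names are illustrative, and since it checks all 2^n strings it is only practical for small n.

```python
from itertools import groupby

def has_run_of_exactly(bits, r):
    """True if bits contains a maximal block of exactly r consecutive '1' characters."""
    return any(key == '1' and len(list(group)) == r for key, group in groupby(bits))

def brute_force(n, r):
    """Check every n-digit binary string, in increasing numeric order."""
    result = []
    for value in range(2 ** n):
        bits = format(value, '0{}b'.format(n))  # zero-padded to n digits
        if has_run_of_exactly(bits, r):
            result.append(bits)
    return result

print(brute_force(4, 2))  # ['0011', '0110', '1011', '1100', '1101']
```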
Now you can look at optimizing it. For example, if you know whether 'r' is even or odd, you can reduce the set of numbers to look at. The bit-counting algorithm given by K&R runs in time proportional to the number of set bits, so you rule out half of the cases with less work than a bit-by-bit comparison. There might be a better way to reduce this as well.
A fun problem with a very simple recursive solution, in Delphi:
procedure GenerateNLengthWithROnesTogether(s: string;
  N, R, Len, OnesInRow: Integer; HasPatternAlready: Boolean);
begin
  if Len = N then
    Output(s)
  else
  begin
    HasPatternAlready := HasPatternAlready or (OnesInRow >= R);
    if HasPatternAlready or (N - Len > R) // there is still a chance to make the pattern
    then
      GenerateNLengthWithROnesTogether('0' + s, N, R, Len + 1, 0, HasPatternAlready);
    if (not HasPatternAlready) or (OnesInRow < R - 1) // only one pattern allowed
    then
      GenerateNLengthWithROnesTogether('1' + s, N, R, Len + 1, OnesInRow + 1, HasPatternAlready);
  end;
end;

begin
  GenerateNLengthWithROnesTogether('', 5, 2, 0, 0, False);
end;
program output:
N=5,R=2
11000 01100 11010 00110
10110 11001 01101 00011
10011 01011
N=7, R=3
1110000 0111000 1110100 0011100
1011100 1110010 0111010 1110110
0001110 1001110 0101110 1101110
1110001 0111001 1110101 0011101
1011101 1110011 0111011 0000111
1000111 0100111 1100111 0010111
1010111 0110111
As I've stated in the comment above, I am still unclear about the full restrictions of the output set. However, the algorithm below can be refined to cover your final case.
Before I can describe the algorithm, there is an observation: let S be 1 repeated m times, and D be the set of all possible suffixes we can use to generate valid outputs. So, the bit string S0D0 (S followed by the 0 bit, followed by the bit string D followed by the 0 bit) is a valid output for the algorithm. Also, all strings ror(S0D0, k), 0<=k<=n-m are valid outputs (ror is the rotate right function, where bits that disappear on the right side come in from left). These will generate the bit strings S0D0 to 0D0S. In addition to these rotations, the solutions S0D1 and 1D0S are valid bit strings that can be generated by the pair (S, D).
So, the algorithm is simply enumerating all valid D bit strings, and generating the above set for each (S, D) pair. If you allow more than m 1s together in the D part, it is simple bit enumeration. If not, it is a recursive definition, where D is the set of outputs of the same algorithm with n'=n-(m+2) and m' is each of {m, m-1, ..., 1}.
Of course, this algorithm will generate some duplicates. The cases I can think of are when ror(S0D0,k) matches one of the patterns S0E0, S0E1 or 1E0S. For the first case, you can stop generating more outputs for larger k values. D=E generator will take care of those. You can also simply drop the other two cases, but you need to continue rotating.
I know there is already an answer, but I wanted to see the algorithm at work, so I implemented a crude version. It turned out to have more edge cases than I realized. I haven't added a duplicate check for the last two yields of the family() function, which causes duplicates for outputs like 11011, but most duplicates are eliminated.
def ror(s, n):
    # Rotate right by n: the last n characters wrap around to the front
    return s[-n:] + s[:-n]

def family(s, d, r):
    root = s + '0' + d + '0'
    yield root  # root is always a solution
    for i in range(1, len(d)+3):
        sol = ror(root, i)
        if sol[:r] == s and sol[r] == '0' and sol[-1] == '0':
            break
        yield sol
    if d[-r:] != s:  # Make sure output is valid
        yield s + '0' + d + '1'
    if d[:r] != s:  # Make sure output is valid (todo: duplicate check)
        yield '1' + d + '0' + s

def generate(n, r):
    s = "1"*r
    if r == 0:  # no 1's allowed
        yield '0'*n
    elif n == r:  # only one combination
        yield s
    elif n == r+1:  # two cases. Cannot use family() for this
        yield s+'0'
        yield '0'+s
    else:
        # generate all sub-problem outputs
        for rr in range(r+1):
            if n-r-2 >= rr:
                for d in generate(n-r-2, rr):
                    for sol in family(s, d, r):
                        yield sol
You can use it either as [s for s in generate(6,2)] or in a loop:
for s in generate(6,3):
    print(s)