Count the number of vowels occurring in all the substrings of given string - algorithm

I am looking at this challenge:
Given a string of length N of lowercase characters containing 0 or more vowels, the task is to find the count of vowels that occurred in all the substrings of the given string.
Example
Input: str = "abc"
Output: 3
The given string "abc" contains only one vowel = 'a'.
Substrings of "abc" are
{"a", "b", "c", "ab", "bc", "abc"}
Hence, the sum of occurrences of the vowel(s) in these strings is:
3
('a' occurred 3 times).
How can the above problem be solved in O(N) time complexity?

Here are some elements to use in the algorithm:
Let's first count how many substrings can be formed from a string (ignoring vowel counts):
"a" => {"a"} => 1
"ab" => {"ab", "a", "b"} => 1+2 = 3
"abc" => {"abc", "ab", "bc", "a", "b", "c"} => 1+2+3 = 6
"abcd" => {"abcd", "abc", "bcd", "ab", "bc", "cd", "a", "b", "c", "d"} => 1+2+3+4 = 10
...
The pattern is 1+2+3+...+𝑛, where 𝑛 is the length of the string, which is 𝑛(𝑛+1)/2
Now let's take a string that just has one vowel: "klmnopqrst". Then the answer consists of counting the number of substrings which have this vowel.
We know there are 10(10+1)/2 = 55 substrings in total, but many of those counted substrings do not have a vowel. None of the substrings of "klmn" have a vowel. There are 4(4+1)/2 = 10 such substrings. Also none of the substrings of "pqrst" have a vowel. There are 5(5+1)/2 = 15 such substrings. All other substrings have the vowel. So we can make the subtraction... the output should be 55 - 10 - 15 = 30.
Therefore the general principle is: for each vowel in the input, determine how many substrings do not include that vowel -- by counting the number of substrings at the left, and those at the right of the vowel. This gives us a clue about the number of substrings that do include that vowel -- by subtracting the non-cases from the total number of substrings.
If we do this for each vowel, we will have counted the total occurrences of vowels in all the substrings.
Here is that algorithm expressed in pseudo code:
function occurrence(str):
    n := length(str)
    total := 0
    allcount := n * (n + 1) / 2
    for i := 1 to n:
        if str[i] is a vowel:
            total := total + allcount - (i - 1) * i / 2 - (n - i) * (n - i + 1) / 2
    return total
NB: note that -- as is common in pseudo code -- i is a position (starting at 1), not a zero-based index.
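For reference, the subtraction idea described above can be written in Python roughly as follows. This is a minimal sketch that uses zero-based indices; the function name `count_vowel_occurrences` and the vowel set `aeiou` are assumptions for illustration, not part of the original answer.

```python
VOWELS = set("aeiou")  # assumed vowel set


def count_vowel_occurrences(s):
    """Total number of vowel occurrences over all substrings of s, in O(N)."""
    n = len(s)
    all_count = n * (n + 1) // 2  # number of substrings of s
    total = 0
    for i, ch in enumerate(s):  # i is a zero-based index here
        if ch in VOWELS:
            left = i * (i + 1) // 2             # substrings entirely left of i
            right = (n - 1 - i) * (n - i) // 2  # substrings entirely right of i
            total += all_count - left - right
    return total


print(count_vowel_occurrences("abc"))         # 3
print(count_vowel_occurrences("klmnopqrst"))  # 30
```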

(In case trincot's answer is not enough.)
Each vowel appears in (l + 1) * (r + 1) substrings, where l is the number of characters to the left of the vowel and r the number of characters on the right of the vowel.
Example 1:
"abc"
'a': (0 + 1) * (2 + 1) = 3
Total: 3
Example 2:
"ae"
'a': (0 + 1) * (1 + 1) = 2
'e': (1 + 1) * (0 + 1) = 2
Total: 4
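The (l + 1) * (r + 1) rule can be checked against a brute-force count. A small Python sketch, where the names `count_by_position` and `brute_force` and the default vowel set are assumptions made for this illustration:

```python
def count_by_position(s, vowels="aeiou"):
    n = len(s)
    # Index i has l = i characters to its left and r = n - 1 - i to its right,
    # so the vowel at i appears in (i + 1) * (n - i) substrings.
    return sum((i + 1) * (n - i) for i, ch in enumerate(s) if ch in vowels)


def brute_force(s, vowels="aeiou"):
    # Slow reference: count the vowels in every substring explicitly.
    return sum(
        sum(ch in vowels for ch in s[a:b])
        for a in range(len(s))
        for b in range(a + 1, len(s) + 1)
    )


for word in ("abc", "ae", "klmnopqrst"):
    print(word, count_by_position(word), brute_force(word))
```

Both functions should agree on every input; the first one does a single pass over the string.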

Related

Maxsubsequence - What are the key insights for this problem?

Below is the problem assignment using tree recursion approach:
Maximum Subsequence
A subsequence of a number is a series of (not necessarily contiguous) digits of the number. For example, 12345 has subsequences that include 123, 234, 124, 245, etc. Your task is to get the maximum subsequence below a certain length.
def max_subseq(n, l):
    """
    Return the maximum subsequence of length at most l that can be found in the given number n.
    For example, for n = 20125 and l = 3, we have that the subsequences are
        2
        0
        1
        2
        5
        20
        21
        22
        25
        01
        02
        05
        12
        15
        25
        201
        202
        205
        212
        215
        225
        012
        015
        025
        125
    and of these, the maximum number is 225, so our answer is 225.
    >>> max_subseq(20125, 3)
    225
    >>> max_subseq(20125, 5)
    20125
    >>> max_subseq(20125, 6) # note that 20125 == 020125
    20125
    >>> max_subseq(12345, 3)
    345
    >>> max_subseq(12345, 0) # 0 is of length 0
    0
    >>> max_subseq(12345, 1)
    5
    """
    "*** YOUR CODE HERE ***"
There are two key insights for this problem
You need to split into the cases where the ones digit is used and the one where it is not. In the case where it is, we want to reduce l since we used one of the digits, and in the case where it isn't we do not.
In the case where we are using the ones digit, you need to put the digit back onto the end, and the way to attach a digit d to the end of a number n is 10 * n + d.
I could not understand these two insights, repeated below:
split into the cases where the ones digit is used and the one where it is not
In the case where we are using the ones digit, you need to put the digit back onto the end
My understanding of this problem:
The solution to this problem seems to be to generate all subsequences up to length l; the pseudo code looks like:
digitSequence := strconv.Itoa(n) // "20125"
printSubSequence = func(digitSequence string, currenSubSequenceSize int) { // digitSequence is "20125" and currenSubSequenceSize is say 3
printNthSubSequence(digitSequence, currenSubSequenceSize) + printSubSequence(digitSequence, currenSubSequenceSize-1)
}
where printNthSubSequence prints subsequences for (20125, 3) or (20125, 2) etc...
Finding max_subseq among all these sequences then becomes easy.
Can you help me understand the insights given in this problem, with an example (say 20125, 1)? Here is the complete question.
Something like this? As the instructions suggest, try it with and without the current digit:
function f(s, i, l){
  if (i + 1 <= l)
    return Number(s.substr(0, l));
  if (!l)
    return 0;
  return Math.max(
    // With
    Number(s[i]) + 10 * f(s, i - 1, l - 1),
    // Without
    f(s, i - 1, l)
  );
}
var input = [
  ['20125', 3],
  ['20125', 5],
  ['20125', 6],
  ['12345', 3],
  ['12345', 0],
  ['12345', 1]
];
for (let [s, l] of input){
  console.log(s + ', l: ' + l);
  console.log(f(s, s.length-1, l));
  console.log('');
}
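The two insights from the assignment can also be applied directly to the integer, without converting it to a string. The following Python sketch is one possible way to fill in `max_subseq` and is not necessarily the intended reference solution: `n % 10` is the ones digit, `n // 10` is the rest of the number, and `10 * rest + digit` puts the digit back onto the end.

```python
def max_subseq(n, l):
    # No digits left, or no room left in the subsequence
    if n == 0 or l == 0:
        return 0
    # Case 1: use the ones digit. One slot is spent, and the digit
    # is attached to the end of the best result for the other digits.
    with_last = 10 * max_subseq(n // 10, l - 1) + n % 10
    # Case 2: skip the ones digit. All l slots remain available.
    without_last = max_subseq(n // 10, l)
    return max(with_last, without_last)


print(max_subseq(20125, 3))  # 225
print(max_subseq(20125, 1))  # 5
```

For (20125, 1): using the ones digit gives 10 * max_subseq(2012, 0) + 5 = 5, and skipping it gives max_subseq(2012, 1) = 2, so the answer is 5.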

Efficient algorithm to find the n-th digit in the string 112123123412345

What is an efficient algorithm for finding the digit in nth position in the following string
112123123412345123456 ... 123456789101112 ...
Storing the entire string in memory is not feasible for very large n, so I am looking for an algorithm that can find the nth digit in the above string which works if n is very large (i.e. an alternative to just generating the first n digits of the string).
There are several levels here: the digit is part of a number x, the number x is part of a sequence 1,2,3...x...y and that sequence is part of a block of sequences that lead up to numbers like y that have z digits. We'll tackle these levels one by one.
There are 9 numbers with 1 digit:
first: 1 (sequence length: 1 * 1)
last: 9 (sequence length: 9 * 1)
average sequence length: (1 + 9) / 2 = 5
1-digit block length: 9 * 5 = 45
There are 90 numbers with 2 digits:
first: 10 (sequence length: 9 * 1 + 1 * 2)
last: 99 (sequence length: 9 * 1 + 90 * 2)
average sequence length: 9 + (2 + 180) / 2 = 100
2-digit block length: 90 * 100 = 9000
There are 900 numbers with 3 digits:
first: 100 (sequence length: 9 * 1 + 90 * 2 + 1 * 3)
last: 999 (sequence length: 9 * 1 + 90 * 2 + 900 * 3)
average sequence length: 9 + 180 + (3 + 2,700) / 2 = 1,540.5
3-digit block length: 900 * 1,540.5 = 1,386,450
If you continue to calculate these values, you'll find which block (of sequences up to how many digits) the digit you're looking for is in, and you'll know the start and end point of this block.
Say you want the millionth digit. You find that it's in the 3-digit block, and that this block is located in the total sequence at:
start of 3-digit block: 45 + 9,000 = 9,045
start of 4-digit block: 45 + 9,000 + 1,386,450 = 1,395,495
So in this block we're looking for digit number:
1,000,000 - 9,045 = 990,955
Now you can use e.g. a binary search to find which sequence the 990,955th digit is in; you start with the 3-digit number halfway in the 3-digit block:
first: 100 (sequence length: 9 + 180 + 1 * 3)
number: 550, the 451st 3-digit number (sequence length: 9 + 180 + 451 * 3)
average sequence length: 9 + 180 + (3 + 1,353) / 2 = 867
total sequence length: 451 * 867 = 391,017
Which is too small; so we try 550 * 3/2 = 825, see if that is too small or large, and go up or down in increasingly smaller steps until we know which sequence the 990,955th digit is in.
Say it's in the sequence for the number n; then we calculate the total length of all 3-digit sequences up to n-1, and this will give us the location of the digit we're looking for in the sequence for the number n. Then we can use the numbers 9*1, 90*2, 900*3 ... to find which number the digit is in, and then what the digit is.
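The three levels described above (block, sequence within the block, number within the sequence) can be put together as follows. This is a Python sketch of the approach, not the answerer's own code; the function name `nth_digit` and the 1-based position convention are assumptions.

```python
def nth_digit(n):
    """Return the n-th digit (1-based) of 1 12 123 1234 ... written without spaces."""
    # Level 1: find the block of sequences that end in a d-digit number.
    d = 1
    prefix = 0  # length of "123...": all numbers with fewer than d digits
    while True:
        rows = 9 * 10 ** (d - 1)  # how many d-digit numbers exist
        # Sequence lengths in this block: prefix + d, prefix + 2d, ..., prefix + rows*d
        block = rows * prefix + d * rows * (rows + 1) // 2
        if n <= block:
            break
        n -= block
        prefix += rows * d
        d += 1

    # Level 2: binary search for the sequence inside the block.
    def total(k):
        # Combined length of the first k sequences of this block
        return k * prefix + d * k * (k + 1) // 2

    lo, hi = 1, rows
    while lo < hi:
        mid = (lo + hi) // 2
        if total(mid) >= n:
            hi = mid
        else:
            lo = mid + 1
    n -= total(lo - 1)  # n is now a position inside "123...x"

    # Level 3: find the number inside the sequence using 9*1, 90*2, 900*3, ...
    length, count, start = 1, 9, 1
    while n > count * length:
        n -= count * length
        length += 1
        count *= 10
        start *= 10
    number = start + (n - 1) // length
    return int(str(number)[(n - 1) % length])


print(nth_digit(100))  # 1
```

Because every step is either a short loop over digit counts or a binary search, this stays fast for very large n, where generating the string is not an option.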
We have three types of structures that we would like to be able to search on, (1) the sequence of concatenating d-digit numbers, for example, single digit:
123456...
or 3-digit:
100101102103
(2) the rows in a section,
where each section builds on the previous section added to a prefix. For example, section 1:
1
12
123
...
or section 3:
1234...10111213...100
1234...10111213...100101
1234...10111213...100101102
<----- prefix ----->
and (3) the full sections, although the latter we can just enumerate since they grow exponentially and help build our section prefixes. For (1), we can use simple division if we know the digit count; for (2), we can binary search.
Here's Python code that also answers the big ones:
def getGreatest(n, d, prefix):
    rows = 9 * 10**(d - 1)
    triangle = rows * (d + rows * d) // 2
    l = 0
    r = triangle
    while l < r:
        mid = l + ((r - l) >> 1)
        triangle = mid * prefix + mid * (d + mid * d) // 2
        prevTriangle = (mid-1) * prefix + (mid-1) * (d + (mid-1) * d) // 2
        nextTriangle = (mid+1) * prefix + (mid+1) * (d + (mid+1) * d) // 2
        if triangle >= n:
            if prevTriangle < n:
                return prevTriangle
            else:
                r = mid - 1
        else:
            if nextTriangle >= n:
                return triangle
            else:
                l = mid
    return l * prefix + l * (d + l * d) // 2
def solve(n):
    debug = 1
    d = 0
    p = 0.1
    prefixes = [0]
    sections = [0]
    while sections[d] < n:
        d += 1
        p *= 10
        rows = int(9 * p)
        triangle = rows * (d + rows * d) // 2
        section = rows * prefixes[d-1] + triangle
        sections.append(sections[d-1] + section)
        prefixes.append(prefixes[d-1] + rows * d)
    section = sections[d - 1]
    if debug:
        print("section: %s" % section)
    n = n - section
    rows = getGreatest(n, d, prefixes[d - 1])
    if debug:
        print("rows: %s" % rows)
    n = n - rows
    d = 1
    while prefixes[d] < n:
        d += 1
    if prefixes[d] == n:
        return 9
    prefix = prefixes[d - 1]
    if debug:
        print("prefix: %s" % prefix)
    n -= prefix
    if debug:
        print((n, d, prefixes, sections))
    countDDigitNums = n // d
    remainder = n % d
    prev = 10**(d - 1) - 1
    num = prev + countDDigitNums
    if debug:
        print("num: %s" % num)
    if remainder:
        return int(str(num + 1)[remainder - 1])
    else:
        s = str(num)
        return int(s[len(s) - 1])
ns = [
    1, # 1
    2, # 1
    3, # 2
    100, # 1
    2100, # 2
    31000, # 2
    999999999999999999, # 4
    1000000000000000000, # 1
    999999999999999993, # 7
]
for n in ns:
    print(n)
    print(solve(n))
    print('')
Well, you have a series of sequences each increasing by a single number.
If you have "x" of them, then the sequences up to that point occupy x * (x + 1) / 2 character positions. Put another way, the x-th sequence starts at x * (x - 1) / 2 (assuming zero-based indexing). These are called triangular numbers.
So, all you need to do is to find the "x" value where the cumulative amount is closest to a given "n". Here are three ways:
Search for a closed-form solution. This exists, but the formula is rather complicated. (Here is one reference for the sum of triangular numbers.)
Pre-calculate a table in memory with values up to, say, 1,000,000. That will get you to 10^10 sizes.
Use a "binary" search and the formula. So, generate the sequence of values for 1, 2, 4, 8, and so on and then do a binary search to find the exact sequence.
Once you know the sequence where the value lies, determining the value is simply a matter of arithmetic.

Determining the pairs of integers that sum to some value in the array

I have a program that counts the number of pairs among N integers that sum to a given value. To simplify the problem, assume also that the integers are distinct.
l.Sort();
for (int i = 0; i < l.Count; ++i)
{
    int j = l.BinarySearch(value - l[i]);
    if (j > i)
    {
        Console.WriteLine("{0} {1}", i + 1, j + 1);
    }
}
To solve the problem, we sort the array (to enable binary search) and then, for every entry a[i] in the array, do a binary search for value - a[i]. If the result is an index j with j > i, we show this pair.
But this algorithm doesn't work on the following input:
1 2 3 4 4 9 56 90, because j is never greater than i.
How can I fix that?
I would go with a more efficient solution that needs more space.
Assume that the numbers are not distinct.
Create a hash table with your integers as keys and their frequencies as values.
Iterate over this hash table.
For each key k:
calculate the difference diff = value - k
look up diff in the hash table
if there is a match, check whether its frequency is > 0
if the frequency is > 0, decrement it by 1 and yield the current pair k, diff
Here is the Python code:
def count_pairs(arr, value):
    # Build a frequency table of the input numbers
    hsh = {}
    for k in arr:
        cnt = hsh.get(k, 0)
        hsh[k] = cnt + 1
    for k in arr:
        diff = value - k
        # Default to 0 so that a missing key is not compared as None
        cnt = hsh.get(diff, 0)
        if cnt > 0:
            hsh[k] -= 1
            print("Pair detected: " + str(k) + " and " + str(diff))
count_pairs([4, 2, 3, 4, 9, 1, 5, 4, 56, 90], 8)
#=> Pair detected: 4 and 4
#=> Pair detected: 3 and 5
#=> Pair detected: 4 and 4
#=> Pair detected: 4 and 4
Since "counts the number of pairs" is a rather vague description, note that the output above lists 4 pairs that are distinct by the indices of their numbers.
If you want this to work for non-distinct values (which your
question does not say, but your comment implies), binary search only the
portion of the array after i. This also eliminates the need for the
if (j > i) test.
Would show the code, but I don't know how to specify such a slice in
whatever language you're using.
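The suggestion above, to binary search only the portion of the array after i, might look like this in Python. This is a sketch that assumes the standard `bisect` module; the function name `pairs_with_sum` is made up for illustration, and each element is matched with at most one later partner.

```python
from bisect import bisect_left


def pairs_with_sum(values, target):
    a = sorted(values)
    pairs = []
    for i, x in enumerate(a):
        want = target - x
        # Restrict the search to a[i+1:], so no "j > i" test is needed
        j = bisect_left(a, want, i + 1)
        if j < len(a) and a[j] == want:
            pairs.append((x, want))
    return pairs


print(pairs_with_sum([1, 2, 3, 4, 4, 9, 56, 90], 8))  # [(4, 4)]
```

With the duplicate 4s from the question, the first 4 now finds the second one, because the search starts one position past the current index.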

given n, how to find the number of different ways to write n as the sum of 1, 3, 4 in ruby?

Problem: given n, find the number of different ways to write n as the sum of 1, 3, 4
Example: for n=5, the answer is 6
5=1+1+1+1+1
5=1+1+3
5=1+3+1
5=3+1+1
5=1+4
5=4+1
I have tried a permutation-based method, but its efficiency is very low. Is there a more efficient way to do this?
Using dynamic programming with a lookup table (implemented with a hash, as it makes the code simpler):
nums = [1, 3, 4]
n = 5
table = {0 => 1}
1.upto(n) { |i|
  table[i] = nums.map { |num| table[i-num].to_i }.reduce(:+)
}
table[n]
# => 6
Note: Just checking one of the other answers, mine was instantaneous for n=500.
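For readers who do not use Ruby, the same lookup-table idea can be sketched in Python. The function name `count_ways` and its default `parts` argument are assumptions for this illustration; `table[i]` holds the number of ordered ways to write i as a sum of the allowed parts.

```python
def count_ways(n, parts=(1, 3, 4)):
    table = [0] * (n + 1)
    table[0] = 1  # one way to write 0: the empty sum
    for i in range(1, n + 1):
        # The last summand is some p; the rest must then sum to i - p
        table[i] = sum(table[i - p] for p in parts if p <= i)
    return table[n]


print(count_ways(5))  # 6
```

Each entry is computed once from at most three earlier entries, so the running time grows linearly with n.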
def add_next sum, a1, a2
  residue = a1.inject(sum, :-)
  residue.zero? ? [a1] : a2.reject{|x| residue < x}.map{|x| a1 + [x]}
end
a = [[]]
until a == (b = a.flat_map{|a| add_next(5, a, [1, 3, 4])})
  a = b
end
a:
[
[1, 1, 1, 1, 1],
[1, 1, 3],
[1, 3, 1],
[1, 4],
[3, 1, 1],
[4, 1]
]
a.length #=> 6
I believe this problem should be addressed in two steps.
Step 1
The first step is to determine the different numbers of 1s, 3s and 4s that sum to the given number. For n = 5, there are only 3, which we could write:
[[5,0,0], [2,1,0], [1,0,1]]
These 3 elements are respectively interpreted as "five 1s, zero 3s and zero 4s", "two 1s, one 3 and zero 4s" and "one 1, zero 3s and one 4".
To compute these combinations efficiently, I first compute the possible combinations, using only 1s, that sum to each number between zero and 5 (which of course is trivial). These values are saved in a hash whose keys are the sums and whose values are the numbers of 1s needed to reach each sum:
h0 = { 0 => 0, 1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5 }
(If the first number had been 2, rather than 1, this would have been:
h0 = { 0 => 0, 2 => 1, 4 => 2 }
since there is no way to sum only 2s to equal 1 or 3.)
Next we consider using both 1 and 3 to sum to each value between 0 and 5. There are only two choices for the number of 3s used, zero or one. This gives rise to the hash:
h1 = { 0 => [[0,0]], 1 => [[1,0]], 2 => [[2,0]], 3 => [[3,0], [0,1]],
4 => [[4,0], [1,1]], 5 => [[5,0], [2,1]] }
This indicates, for example, that:
there is only 1 way to use 1 and 3 to sum to 1: 1 => [1,0], meaning one 1 and zero 3s.
there are two ways to sum to 4: 4 => [[4,0], [1,1]], meaning four 1s and zero 3s or one 1 and one 3.
Similarly, when 1, 3 and 4 can all be used, we obtain the hash:
h2 = { 5 => [[5,0,0], [2,1,0], [1,0,1]] }
Since this hash corresponds to the use of all three numbers, 1, 3 and 4, we are concerned only with the combinations that sum to 5.
In constructing h2, we can use zero 4s or one 4. If we use zero 4s, we use 1s and 3s that sum to 5. We see from h1 that there are two combinations:
5 => [[5,0], [2,1]]
For h2 we write these as:
[[5,0,0], [2,1,0]]
If one 4 is used, 1s and 3s totalling 5 - 1*4 = 1 are used. From h1 we see there is just one combination:
1 => [[1,0]]
which for h2 we write as
[[1,0,1]]
so the value for the key 5 in h2 is:
[[5,0,0], [2,1,0]] + [[1,0,1]] = [[5,0,0], [2,1,0], [1,0,1]]
Aside: because of the form I've chosen to represent hashes h1 and h2, it is actually more convenient to represent h0 as:
h0 = { 0 => [[0]], 1 => [[1]],..., 5 => [[5]] }
It should be evident how this sequential approach could be used for any collection of integers whose combinations are to be summed.
Step 2
The numbers of distinct arrangements of each array [n1, n3, n4] produced in Step 1 equals:
(n1+n3+n4)!/(n1!n3!n4!)
Note that if one of the n's were zero, these would be binomial coefficients. In fact, these are multinomial coefficients, as used in the multinomial distribution, which is a generalization of the binomial distribution. The reasoning is simple. The numerator gives the number of permutations of all the numbers. The n1 1s can be permuted n1! ways for each distinct arrangement, so we divide by n1!. The same holds for n3 and n4.
For the example of summing to 5, there are:
5!/5! = 1 distinct arrangement for [5,0,0]
(2+1)!/(2!1!) = 3 distinct arrangements for [2,1,0] and
(1+1)!/(1!1!) = 2 distinct arrangements for [1,0,1], for a total of:
1+3+2 = 6 distinct arrangements for the number 5.
Code
def count_combos(arr, n)
  a = make_combos(arr,n)
  a.reduce(0) { |tot,b| tot + multinomial(b) }
end
def make_combos(arr, n)
  arr.size.times.each_with_object([]) do |i,a|
    val = arr[i]
    if i.zero?
      a[0] = (0..n).each_with_object({}) { |t,h|
        h[t] = [[t/val]] if (t%val).zero? }
    else
      first = (i==arr.size-1) ? n : 0
      a[i] = (first..n).each_with_object({}) do |t,h|
        combos = (0..t/val).each_with_object([]) do |p,b|
          prev = a[i-1][t-p*val]
          prev.map { |pr| b << (pr +[p]) } if prev
        end
        h[t] = combos unless combos.empty?
      end
    end
  end.last[n]
end
def multinomial(arr)
  (arr.reduce(:+)).factorial/(arr.reduce(1) { |tot,n|
    tot * n.factorial })
end
and a helper:
# Fixnum was merged into Integer in Ruby 2.4, so the helper is defined on Integer
class Integer
  def factorial
    return 1 if self < 2
    (1..self).reduce(:*)
  end
end
Examples
count_combos([1,3,4], 5) #=> 6
count_combos([1,3,4], 6) #=> 9
count_combos([1,3,4], 9) #=> 40
count_combos([1,3,4], 15) #=> 714
count_combos([1,3,4], 30) #=> 974169
count_combos([1,3,4], 50) #=> 14736260449
count_combos([2,3,4], 50) #=> 72581632
count_combos([2,3,4,6], 30) #=> 82521
count_combos([1,3,4], 500) #=> 1632395546095013745514524935957247\
00017620846265794375806005112440749890967784788181321124006922685358001
(I broke the result of the last example, one long number, into two pieces for display purposes.)
count_combos([1,3,4], 500) took about 2 seconds to compute; the others were essentially instantaneous.
@sawa's method and mine gave the same results for n between 6 and 9, so I'm confident they are both correct. sawa's solution times increase much more quickly with n than mine do, because he computes and then counts all the permutations.
Edit: @Karole, who just posted an answer, and I get the same results for all my tests (including the last one!). Which answer do I prefer? Hmmm. Let me think about that.
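The counting formula from Step 2, (n1+n3+n4)!/(n1!n3!n4!), can be checked on its own with a few lines of Python. This is only an illustrative sketch; the function name `arrangements` is an assumption, and the input list is the Step 1 result for n = 5.

```python
from math import factorial


def arrangements(counts):
    # Multinomial coefficient: (sum of counts)! / (product of each count!)
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


combos = [[5, 0, 0], [2, 1, 0], [1, 0, 1]]  # Step 1 result for n = 5
print([arrangements(c) for c in combos])     # [1, 3, 2]
print(sum(arrangements(c) for c in combos))  # 6
```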
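I don't know Ruby, so I am writing it in C++.
Say n = 5, as in your example.
Use dynamic programming:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main() {
    int n;
    cin >> n;
    // D[i] = number of ordered ways to write i as a sum of 1, 3 and 4
    vector<long long> D(max(n + 1, 4));
    D[0] = 1;
    D[1] = 1;
    D[2] = 1;
    D[3] = 2;
    for (int i = 4; i <= n; i++)
        D[i] = D[i-1] + D[i-3] + D[i-4];
    cout << D[n];
}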

Enumerate all words of a language

I have a language consisting of the words having exactly two "1" and three "0". How can I efficiently enumerate the finite set of all words of this language?
Easy: write the word 11100 and count the permutations of its 5 symbols, n! = 5!.
Then divide by the number of permutations of the three 1's (3!) and of the two 0's (2!): 5! / (2! * 3!) = 120 / (2 * 6) = 10
11100
11010
11001
10110
10101
10011
01110
01101
01011
00111
Now if you need the actual values, for an arbitrary language, you have no other choice but to use a backtracking algorithm.
For this particular case, you can easily build a simple algorithm generating this language:
Here's an example using python
def GenerateLanguage(nZeros, nOnes):
    if nZeros + nOnes == 0:
        return ['']
    res = [] # Resulting list, initially empty
    if nOnes > 0: # If we have 1's left, build all the strings that start with a 1
        for l in GenerateLanguage(nZeros, nOnes - 1):
            res.append('1' + l)
    if nZeros > 0: # If we have 0's left, build all the strings that start with a 0
        for l in GenerateLanguage(nZeros - 1, nOnes):
            res.append('0' + l)
    return res
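An alternative to the recursion, for this particular language, is to choose the positions of the 1's directly. The sketch below uses `itertools.combinations` from the standard library; the function name `words` is an assumption for illustration. It is called here with the question's parameters, two 1's and three 0's.

```python
from itertools import combinations


def words(n_zeros, n_ones):
    n = n_zeros + n_ones
    result = []
    # Each choice of n_ones positions out of n gives exactly one word
    for ones_at in combinations(range(n), n_ones):
        chars = ['0'] * n
        for i in ones_at:
            chars[i] = '1'
        result.append(''.join(chars))
    return result


for w in words(3, 2):
    print(w)
```

This produces 5! / (2! * 3!) = 10 words, each exactly once, without building any intermediate partial strings.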
