number of letters to be deleted from a string so that it is divisible by another string - algorithm

I am doing this problem https://www.spoj.com/problems/DIVSTR/
We are given two strings S and T.
S is divisible by string T if there is some non-negative integer k, which satisfies the equation S=k*T
What is the minimum number of characters which should be removed from S, so that S is divisible by T?
My main idea was to walk through S with a pointer into T and count how many complete instances of T occur in S. Whenever an instance is completed, the pointer goes back to the start of T; on a mismatch, I compare T's first letter with the current letter of S.
This code works fine on the test cases they provided and on my own custom test cases, but it fails the hidden test cases.
This is the code:
def no_of_letters(string1,string2):
    # print(len(string1),len(string2))
    count = 0
    pointer = 0
    if len(string1)<len(string2):
        return len(string1)
    if (len(string1)==len(string2)) and (string1!=string2):
        return len(string1)
    for j in range(len(string1)):
        if (string1[j]==string2[pointer]) and pointer<(len(string2)-1):
            pointer+=1
        elif (string1[j]==string2[pointer]) and pointer == (len(string2)-1):
            count+=1
            pointer=0
        elif (string1[j]!=string2[pointer]):
            if string1[j]==string2[0]:
                pointer=1
            else:
                pointer = 0
    return len(string1)-len(string2)*count
One place where I think there could be confusion is when the same letters can be part of two counts, but that should not be a problem, because the answer doesn't need to take overlapping into account.
For example, S = 'akaka' and T = 'aka' give the output 2, whether we count the first 'aka' (leaving 'ka') or the second 'aka' (leaving 'ak').

I believe that the solution is much more straightforward than you make it. You're simply trying to find how many times the characters of T appear, in order, in S. Everything else is the characters you remove. For instance, given RobertBaron's example of S="akbaabka" and T="aka", you would write your routine to locate the characters a, k, a, in that order, from the start of S:
akbaabka
ak a^
# with some pointer, ptr, now at position 4, marked with a caret above
With that done, you can now recur on the remainder of the string:
find_chars(S[ptr:], T)
With each call, you look for T in S; if you find it, count 1 repetition and recur on the remainder of S; if not, return 0 (base case). As you crawl back up your recursion stack, accumulate all the 1 counts, and there is your value of k.
The number of chars to remove is len(S) - k*len(T).
Can you take it from there?
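A minimal sketch of the greedy scan described above, written as a single loop instead of recursion. The name min_deletions is an illustrative choice, not part of the original answer, and an empty T is assumed to mean that every character of S must be removed.

```python
def min_deletions(s: str, t: str) -> int:
    """Fewest characters to delete from s so that the rest is t repeated k times."""
    if not t:
        return len(s)  # assumption: nothing can be built from an empty t
    k = 0    # complete copies of t matched so far
    ptr = 0  # index of the next character of t we are waiting for
    for ch in s:
        if ch == t[ptr]:
            ptr += 1
            if ptr == len(t):  # one full copy of t found, start the next one
                k += 1
                ptr = 0
    return len(s) - k * len(t)


print(min_deletions("akbaabka", "aka"))  # 2
```

Characters that do not match the awaited letter of T are simply skipped, which is what lets the scan treat T as a subsequence rather than a contiguous substring.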

Related

Longest odd Palindromic Substring with middle index i

Longest Odd Palindromes
Problem Description
Given a string S(consisting of only lower case characters) and Q queries.
In each query you will be given an integer i and your task is to find the length of the longest odd palindromic substring whose middle index is i. Note:
1.) Assume 1 based indexing.
2.) Longest odd palindrome: A palindrome substring whose length is odd.
Problem Constraints
1<=|s|,Q<=1e5
1<=i<=|s|
Input Format
First argument A is string S.
Second argument B is an array of integers where B[i] denotes the query index of ith query.
Output Format
Return an array of integers where ith integer denotes the answer of ith query.
Is there any better way to solve this question other than brute force, that is, generating all the palindromic substrings and checking them?
There is Manacher's algorithm that calculates number of palindromes centered at i-th index in linear time.
After precalculation stage you can answer query in O(1). I changed result array to contain lengths of the longest palindromes centered at every position.
Python code (link contains C++ one)
def manacher_odd(s):
    n = len(s)
    odds = []
    l, r = 0, -1
    for i in range(n):
        k = min(odds[l+r-i], r-i+1) if i<=r else 1
        while (i+k < n) and (i-k >= 0) and (s[i+k]==s[i-k]):
            k += 1
        odds.append(k)
        if (i+k-1 > r):
            l, r = i-k+1, i+k-1
    for i in range(n):
        odds[i] = 2 * odds[i] - 1
    return odds
print(manacher_odd("abaaaba"))
[1, 3, 1, 7, 1, 3, 1]
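As a sketch of how the precomputed array answers the queries from the problem statement, the function below repeats the same odd-length Manacher pass and then looks up each 1-based query index. The name longest_odd_palindromes and the argument layout (string first, then the list of queries) are assumptions made for illustration.

```python
def longest_odd_palindromes(s, queries):
    """Answer each 1-based query i with the longest odd palindrome centered at i."""
    n = len(s)
    half = [0] * n  # half[i] = number of odd palindromes centered at i
    l, r = 0, -1    # bounds of the rightmost palindrome found so far
    for i in range(n):
        # Reuse the mirrored position inside [l, r] when possible.
        k = min(half[l + r - i], r - i + 1) if i <= r else 1
        while i + k < n and i - k >= 0 and s[i + k] == s[i - k]:
            k += 1
        half[i] = k
        if i + k - 1 > r:
            l, r = i - k + 1, i + k - 1
    # Each query is an O(1) lookup; convert the radius to a length.
    return [2 * half[q - 1] - 1 for q in queries]


print(longest_odd_palindromes("abaaaba", [1, 4, 2, 6]))  # [1, 7, 3, 3]
```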
There are 2 possible optimizations they might be looking for.
First, you can do an initial run over S first, cleverly building a lookup table, and then your query will just use that, which I think would be faster if B is long.
Alternatively, if not doing a look up, then while you're searching at index i, you'll potentially search neighboring indexes at the same time. As you check i, you can also be checking i+1, i-1, i+2, i-2, etc... as you go, and save that answer for later. This seems the less promising route to me, so I want to dive into the first idea more.
Finally, if B is quite short, then the best answer might be brute force, actually. It's good to know when to keep it simple.
Initial run method
One optimization that comes to mind for a pre-process run is as follows:
1. Search the next unknown index by brute force, looking forwards and backwards while recording the frequency of each letter (including the middle one). If a palindrome of length 1 or 3 was found, move to the next index and repeat.
2. If a palindrome of length 5 or longer was found, take every letter that showed up more than twice and calculate the midpoints of its pairs that lie to the right of the current index.
3. Any point between the current index and the index of the last palindrome letter that isn't in the midpoints list has a palindrome length of 1.
4. This means you'll search all the midpoints found in (2). After that, you'll continue searching from the index of the last letter of the palindrome found in (2).
An example
Let's say S starts with: ```a, b, c, d, a, f, g, f, g, f, a, d, c, b, a, ...``` and you have checked from ```i = 2``` up to ```i = 7``` but found nothing except a run of 3 at ```i = 7```. Now, you check index ```i = 8```. You will find a palindrome extending out 7 letters in each direction, for a total of 15, but as you check, note any letters that show up more than twice. In this case, there are 3 ```f```s and 4 ```a```s. Find any mid points these pairs have that are right of the current index (8). In this case, 2 ```f```s have a mid point of i=9, the 2 right-most ```a```s have a midpoint of i=13. Once you're done looking at i=8, then you can skip any index not on your list, all the way up to the last letter you found in i=8. For example, we only have to check i=9 and i=13, and then start from i=15, checking every step. We've been able to skip checking i=10, 11, 12, and 14.

Counting Substrings: In a given text, find the number of substrings that start with an A and end with a B

For example, there are four such substrings in CABAAXBYA.
My original brute-force algorithm works like this: in an outer for loop, whenever I encounter an A, I enter an inner for loop that scans the rest of the text for B's. Each time a B is found, I increment the count. At the end, the value stored in the count variable is the required result.
While reading about string matching algorithms, I came across the point that traversing right to left rather than left to right can make an algorithm more efficient. Here, however, the substring isn't given as a parameter to the function that computes the required value.
My question is: if I traverse the string from right to left instead of left to right, will it make my algorithm more efficient in any case?
Here is one way in which iterating backwards through the string could result in O(n) computation instead of your original O(n^2) work:
A = "CABAAXBYA"
count = 0 # Number of B's seen
total = 0
for a in reversed(A):
    if a=='B':
        count += 1
    elif a=='A':
        total += count
print(total)
This works by keeping track in count of the number of B's to the right of the current point.
(Of course, you could also get the same result with forwards iteration by counting the number of A's to the left instead:
count = 0 # Number of A's seen
total = 0
for a in A:
    if a=='A':
        count += 1
    elif a=='B':
        total += count
print(total)
)
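To check that the single-pass count agrees with the quadratic approach from the question, here is a small sketch that wraps both in functions. The names count_ab_linear and count_ab_brute are illustrative only.

```python
def count_ab_linear(text):
    """O(n): scan right to left, tracking how many B's lie to the right."""
    b_seen = 0
    total = 0
    for ch in reversed(text):
        if ch == 'B':
            b_seen += 1
        elif ch == 'A':
            total += b_seen  # this A pairs with every B to its right
    return total


def count_ab_brute(text):
    """O(n^2): for every A, count the B's that come after it."""
    total = 0
    for i, ch in enumerate(text):
        if ch == 'A':
            total += sum(1 for later in text[i + 1:] if later == 'B')
    return total


print(count_ab_linear("CABAAXBYA"), count_ab_brute("CABAAXBYA"))  # 4 4
```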

Reconstructing a string of words using a dictionary into an English sentence

I am completely stumped. The question is: given a string like "thisisasentence" and a function isWord() that returns true if its argument is an English word, split the string into words. I get stuck on "this is a sent".
How can I recursively return and keep track of where I am each time?
You need backtracking, which is easily achievable using recursion. Key observation is that you do not need to keep track of where you are past the moment when you are ready to return a solution.
You have a valid "split" when one of the following is true:
The string w is empty (base case), or
You can split non-empty w into substrings p and s, such that p+s=w, p is a word, and s can be split into a sentence (recursive call).
An implementation can return a list of words when a successful split is found, or null when it cannot be found. The base case will always return an empty list; the recursive case will, upon finding a p, s split that results in a non-null return for s, construct a list with p prefixed to the list returned from the recursive call.
The recursive case will have a loop in it, trying all possible prefixes of w. To speed things up a bit, the loop could terminate upon reaching the prefix that is equal in length to the longest word in the dictionary. For example, if the longest word has 12 characters, you know that trying prefixes 13 characters or longer will not result in a match, so you could cut enumeration short.
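A minimal sketch of the recursion described above, returning a list of words or None. The tiny word set, the is_word callable built from it, and the max_word_len parameter are stand-ins for the real isWord() and dictionary, which the question does not show.

```python
def split_into_words(text, is_word, max_word_len=None):
    """Return a list of words whose concatenation is text, or None if impossible."""
    if not text:
        return []  # base case: the empty string splits into no words
    limit = len(text) if max_word_len is None else min(len(text), max_word_len)
    for end in range(1, limit + 1):
        prefix, rest = text[:end], text[end:]
        if is_word(prefix):
            tail = split_into_words(rest, is_word, max_word_len)
            if tail is not None:  # the remainder also splits: success
                return [prefix] + tail
            # otherwise fall through and try a longer prefix (backtracking)
    return None


# Hypothetical dictionary, for illustration only.
demo_words = {"this", "is", "a", "sent", "sentence"}
print(split_into_words("thisisasentence", demo_words.__contains__, max_word_len=8))
# ['this', 'is', 'a', 'sentence']
```

Note how the dead end "sent" + "ence" is abandoned without any extra bookkeeping: the failed recursive call returns None and the loop just moves on to the next prefix length.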
Just adding to the answer above.
According to my experience, many people understand recursion better when they see a «linearized» version of a recursive algorithm, which means «implemented as a loop over a stack». Linearization is applicable to any recursive task.
Assuming that isWord() has two parameters (1st: string to test; 2nd: its length) and returns a boolean-compatible value, a C implementation of backtracking is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int isWord(const char *s, int len); /* assumed to be provided elsewhere */

void doSmth(char *phrase, int *words, int total) {
    int i;
    for (i = 0; i < total; ++i)
        printf("%.*s ", words[i + 1] - words[i], phrase + words[i]);
    printf("\n");
}
void parse(char *phrase) {
    int current, length, *words;
    if (phrase) {
        words = (int*)calloc((length = strlen(phrase)) + 2, sizeof(int));
        current = 1;
        while (current) {
            for (++words[current]; words[current] <= length; ++words[current])
                if (isWord(phrase + words[current - 1],
                           words[current] - words[current - 1])) {
                    words[current + 1] = words[current];
                    current++;
                }
            if (words[--current] == length)
                doSmth(phrase, words, current); /** parse successful! **/
        }
        free(words);
    }
}
As can be seen, for each word, a pair of stack values is used: the first is an offset to the current word's first character, whereas the second is a potential offset of the character exactly after the current word's last one (thus being the next word's first character). The second value of the current word (the one whose pair is at the top of our «stack») is iterated through all characters left in the phrase.
When a word is accepted, a new second value (equalling the current, to only look at positions after it) is pushed to the stack, making the former second the first in a new pair. If the current word (the one just found) completes the phrase, something useful is performed; see doSmth().
If there are no more acceptable words in the remaining part of our phrase, the current word is considered unsuitable, and its second value is discarded from the stack, effectively repeating a search for words at a previous starting location, while the ending location is now farther than the word previously accepted there.
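For comparison, here is a sketch of the same «loop over a stack» idea in Python. It is a simplified variant rather than a translation of the C code: each stack entry stores a start offset together with the words chosen so far, instead of sharing one array of offsets. The function name all_splits and the demo word set are hypothetical.

```python
def all_splits(text, is_word):
    """Enumerate every way to split text into words, using an explicit stack."""
    results = []
    stack = [(0, [])]  # each entry: (offset of next word, words chosen so far)
    while stack:
        start, words = stack.pop()
        if start == len(text):
            results.append(words)  # the whole phrase has been consumed
            continue
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if is_word(piece):
                stack.append((end, words + [piece]))
    return results


# Hypothetical dictionary, for illustration only.
demo_words = {"this", "is", "a", "sent", "sentence"}
print(all_splits("thisisasentence", demo_words.__contains__))
# [['this', 'is', 'a', 'sentence']]
```

Entries that lead nowhere, such as the one ending in "sent", are popped and produce no new entries, which plays the role of discarding a value from the stack in the C version.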

Algorithm for traveling through a sequence of digits

Does anybody know an efficient algorithm for traveling through a sequence of digits by looking for a certain combination, e.g.:
There is this given sequence and I want to find the index of a certain combination of 21??73 in e.g.
... 124321947362862188734738 ...
So I have a pattern 21??73 and need to find out where is the index of:
219473
218873
I assume that there is way to not touch every single digit.
EDIT:
"Lasse V. Karlsen" has brought up an important point that I did forget.
There is no overlapping allowed, e.g.
21217373215573
212173 is ok, then the next would be 215573
Seems like you are looking for the regular expression 21..73, where . stands for "any character" (1).
Next you just need to iterate over all matches of this regex.
Most high level languages already have a regex library built in that is simple and easy to use for such tasks.
Note that many regex libraries already take care of "no overlapping" for you, including Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "21217373215573";
Matcher m = Pattern.compile("21..73").matcher(s);
while (m.find()) System.out.println(m.group());
Will yield the required output of:
212173
215573
(1) This assumes your sequence is of digits in the first place, as your question implies.
Depending on what language you are using, you could use regular expressions of the sort 21\d{2}73 which will look for 21, followed by two digits which are in turn followed by 73. Languages such as C# allow you to get the index of the match, as shown here.
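Since the question asks for the index of each match, here is a minimal sketch in Python using the standard re module. re.finditer scans left to right and returns non-overlapping matches, which fits the no-overlap requirement; the function name match_starts is an illustrative choice.

```python
import re


def match_starts(sequence, pattern=r"21\d\d73"):
    """Return the start index of every non-overlapping match of pattern."""
    return [m.start() for m in re.finditer(pattern, sequence)]


print(match_starts("124321947362862188734738"))  # [4, 14]
print(match_starts("21217373215573"))            # [0, 8]
```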
Alternatively, you could construct your own finite state machine, which could be something of the sort:
string input = ...
int index = 0
while (index <= input.length - 6)
    if (input[index] == '2' && input[index + 1] == '1' && input[index + 4] == '7' && input[index + 5] == '3')
        print(index)
        index += 6
    else
        index++
Since you don't know where these combinations start and you are not looking just for the first one, there is no way to avoid touching each digit (except maybe the last n-1 digits, where n is the length of the combination, because a full combination no longer fits there).
I just don't know a better way than reading the whole sequence, because you can have
... 84452121737338494684 ...
and then you have two overlapping combinations. If you are not looking for overlapping combinations, it's just an easier version of the problem, but overlap is a possibility in your example.
Some non-overlap algorithm pseudo-code:
start := -1; i := 0
for each digit in sequence
    if sequence[digit] = combination[i]
        if start = -1
            start := digit
        endif
        i++
        if i >= length(combination)
            possibleCombinations.add(start)
            start := -1
            i := 0
        endif
    else
        start := -1
        i := 0
    endif
end
This should be O(n), the same complexity as looking for one value in an unsorted array. If you are looking for overlapping combinations like in my example, the complexity is a bit higher: you have to check each possible start, which adds an inner loop over the start values found so far. That loop checks whether each combination continues, keeping the start value if it does and discarding it when the combination is broken. The complexity is then something like O(n*length(combination)), because there cannot be more active starts than the length of the combination.
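If overlapping combinations do need to be reported, one common way to get them from a regex engine is to wrap the pattern in a zero-width lookahead, so that each match consumes no digits and the scan advances one position at a time. The sketch below uses Python's re module; the function name overlapping_starts is an illustrative choice, and the pattern is the 21??73 combination from the question.

```python
import re


def overlapping_starts(sequence, pattern=r"21\d\d73"):
    """Return every index where pattern matches, allowing overlaps."""
    lookahead = re.compile("(?=" + pattern + ")")  # zero-width: consumes nothing
    return [m.start() for m in lookahead.finditer(sequence)]


digits = "84452121737338494684"
print(overlapping_starts(digits))                          # [4, 6]
print([m.start() for m in re.finditer(r"21\d\d73", digits)])  # [4]
```

The second line shows the non-overlapping result for the same input: once 212173 is consumed, the overlapping 217373 is no longer visible to the scan.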

Scope of variables and the digits function

My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of said construct. The following code depicts an attempt to extract digits from a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
    temp = n/10^(l-i)
    if temp < 1 # ith digit is 0
        a = push!(a,0)
    else # ith digit is != 0
        push!(a,floor(temp))
        # update n
        n = n - a[i]*10^(l-i)
    end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a, this does not seem to be the case. Would be grateful if anyone can explain to me how this works :)
2) The second question is about the syntax used when explaining functions. There is apparently a function called digits that extracts digits from a number and puts them in an array. Using the help function I get
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain to me how to interpret the information given about functions in Julia? How am I to interpret digits(n[, base][, pad])? How does one correctly call the digits function? It can't be like this: digits(40125[, 10])?
I'm unable to reproduce your result; running your code gives me
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable, and return a sequence that only contains one number: itself). So the for loop is never run.
/ between integers does floating point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place. So it will be equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct, I think that it is the above which is giving you trouble.
You could use something like
n = 654068
a = Int64[]
while n != 0
    push!(a, n % 10)
    n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits, [, base] just means that base is an optional parameter. The description should probably be digits(n[, base[, pad]]), since it's not possible to specify pad unless you specify base. Also note that digits will return the least significant digit first, what we get if we remove the reverse! from the code above.
Is this cheating?:
n = 654068
nstr = string(n)
a = map((x) -> x |> string |> int , collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8

Resources