I'm trying to solve a fairly complex problem involving a bracket sequence made up only of '(' and ')'. Specifically, I need to implement a segment tree that checks in logarithmic time whether a sequence of brackets is valid.
Valid sequence of bracket is a string that is either:
The empty string
A string (B) where B is valid sequence of brackets
LR, the concatenation of two strings L and R which are both valid sequences of brackets.
Now with those conditions I need to implement segment tree that for query will check if the range [A,B] is valid sequence of brackets.
What I actually need in my program is the maximum index K such that K <= B and the sequence [A,K] is a valid sequence of brackets. I don't think I can implement a segment tree that gives me the index K, so I will try to implement one that only checks whether the sequence is valid.
Example
sequence = "()((()))(())" query_1 = [1,2], answer for query 1 is true because the sequence "()" is valid.
query_2=[1,6]= "()((()" this sequence is not valid and the query should return false;
What I have tried
I implemented check with stack that checks if sequence of brackets is valid or not, but that check works in O(N) and I think that we can check this in O(logN)
Thanks in advance.
Make all your ( be equal to +1, and all the ) equal to -1. Now run through the parenthesis string and create a new array out of it by cumulatively adding the value of the bracket (+1/-1). Like so:
( ) ( ( ( ) ) ) ( ( ) )
1 0 1 2 3 2 1 0 1 2 1 0
Now if some [i,j] is a valid sequence, then for every x such that i<=x<=j, [i,x] must contain at least as many open brackets ( as closed brackets ). If ()) was a segment, its final value would be (1)+(-1)+(-1) = -1, which is negative. This implies that the segment has more ) than (, which is impossible in a valid sequence. Similarly, for a segment ())((), although the final sum of the segment is zero, the sum at index three (1-indexed) is -1, so this segment is invalid too.
So for any query [i,j] we just have to check whether there exists any point x in that segment such that sum([i,x]) < 0. If there is, return false. Also check that the number of ( is equal to the number of ), i.e. that sum([i,j]) = 0.
For this, maintain a range-minimum segment tree over the sum array. Find the minimum value in range [i,j] (call it val). If val - sum[i-1] < 0, return false. Then check ( == ) again, i.e. sum[j] - sum[i-1] == 0.
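A minimal sketch of the approach described above, in Python, assuming 1-indexed inclusive queries as in the question. The class name BracketChecker and its method names are made up for illustration, and the iterative bottom-up range-minimum tree is just one possible way to build the segment tree.

```python
class BracketChecker:
    """Range validity checks on a fixed bracket string (hypothetical helper)."""

    def __init__(self, s):
        n = len(s)
        # prefix[i] = (+1 per '(' and -1 per ')') summed over s[0:i]; prefix[0] = 0
        self.prefix = [0] * (n + 1)
        for i, ch in enumerate(s, start=1):
            self.prefix[i] = self.prefix[i - 1] + (1 if ch == '(' else -1)
        # Bottom-up segment tree storing range minima of the prefix array.
        self.size = n + 1
        self.tree = [0] * (2 * self.size)
        self.tree[self.size:] = self.prefix
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    def _range_min(self, lo, hi):
        """Minimum of prefix[lo..hi], both ends inclusive, in O(log n)."""
        lo += self.size
        hi += self.size + 1
        best = float('inf')
        while lo < hi:
            if lo & 1:
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = min(best, self.tree[hi])
            lo >>= 1
            hi >>= 1
        return best

    def is_valid(self, i, j):
        """True if the brackets at positions i..j (1-indexed) form a valid sequence."""
        base = self.prefix[i - 1]
        # Balanced overall, and no prefix of the segment dips below its start level.
        return self.prefix[j] == base and self._range_min(i, j) >= base


checker = BracketChecker("()((()))(())")
print(checker.is_valid(1, 2))  # True
print(checker.is_valid(1, 6))  # False
```

Because the string is fixed here, a sparse table would work as well; a segment tree only becomes necessary if brackets can change between queries.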
I am doing this problem https://www.spoj.com/problems/DIVSTR/
We are given two strings S and T.
S is divisible by string T if there is some non-negative integer k, which satisfies the equation S=k*T
What is the minimum number of characters which should be removed from S, so that S is divisible by T?
My main idea was to match T against S using a pointer and count the instances of T occurring in S. When an instance is complete, the pointer goes back to the start of T; on a mismatch, T's first letter is compared with the current letter of S.
This code is working totally fine with test cases they provided and custom test cases I gave, but it could not get through hidden test cases.
this is the code
def no_of_letters(string1,string2):
    # print(len(string1),len(string2))
    count = 0
    pointer = 0
    if len(string1)<len(string2):
        return len(string1)
    if (len(string1)==len(string2)) and (string1!=string2):
        return len(string1)
    for j in range(len(string1)):
        if (string1[j]==string2[pointer]) and pointer<(len(string2)-1):
            pointer+=1
        elif (string1[j]==string2[pointer]) and pointer == (len(string2)-1):
            count+=1
            pointer=0
        elif (string1[j]!=string2[pointer]):
            if string1[j]==string2[0]:
                pointer=1
            else:
                pointer = 0
    return len(string1)-len(string2)*count
One place where I think there could be confusion is when the same letters can be part of two counts, but that should not be a problem, because our answer doesn't need to take overlapping into account.
for example, S = 'akaka' T= 'aka' will give the output 2, irrespective of considering first 'aka',ka as count or second ak,'aka'.
I believe that the solution is much more straightforward than you make it. You're simply trying to find how many times the characters of T appear, in order, in S. Everything else is the characters you remove. For instance, given RobertBaron's example of S="akbaabka" and T="aka", you would write your routine to locate the characters a, k, a, in that order, from the start of S:
akbaabka
ak a
    ^
# with some pointer, ptr, now at position 4, marked with the caret above
With that done, you can now recur on the remainder of the string:
find_chars(S[ptr:], T)
With each call, you look for T in S; if you find it, count 1 repetition and recur on the remainder of S; if not, return 0 (base case). As you crawl back up your recursion stack, accumulate all the 1 counts, and there is your value of k.
The quantity of chars to remove is len(S) - k*len(T).
Can you take it from there?
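As a rough sketch of the idea above, here is one way it could look in Python. It uses a single loop instead of the recursion described, which counts the same repetitions; the function name min_removals is made up for illustration, and T is assumed to be non-empty.

```python
def min_removals(s, t):
    """Characters to delete from s so that what remains is k copies of t.

    Greedy scan: match the characters of t in order inside s, and start
    over at the beginning of t every time a full copy has been found.
    Assumes t is non-empty.
    """
    count = 0  # complete copies of t found so far (the k in the answer)
    ptr = 0    # next position in t that we are waiting to match
    for ch in s:
        if ch == t[ptr]:
            ptr += 1
            if ptr == len(t):
                count += 1
                ptr = 0
        # On a mismatch nothing is reset: the character is simply one we remove.
    return len(s) - count * len(t)


print(min_removals("akbaabka", "aka"))  # 2
print(min_removals("akaka", "aka"))     # 2
```

The key difference from the code in the question is that a mismatch does not reset the pointer, because the characters of T do not have to be adjacent in S.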
What is the most efficient algorithm to count the number of substrings of a given string that contain a given character?
e.g. for the string abb and the character b
substrings: a, b, b, ab, bb, abb.
Answer: substrings containing b at least once = 5.
PS. I solved this question by generating all the substrings and then checking each one, in O(n^2). I just want to know whether there is a better solution.
Suppose you need to count the substrings containing the character X.
Scan the string left to right, keeping the position of the last X seen in lastX, with starting value -1.
When you meet X at position i, add i+1 to the result and update lastX
(this is the number of substrings ending at the current position, and they all contain X).
When you meet another character, add lastX + 1 to the result
(this is again the number of substrings ending at the current position and containing X),
because the rightmost possible start of such a substring is the position of the last X.
The algorithm is linear.
Example:
a X a a X a

idx  char  good substrings ending at idx   lastX  count  overall count
0    a     -                               -1     0      0
1    X     aX X                             1     2      2
2    a     aXa Xa                           1     2      4
3    a     aXaa Xaa                         1     2      6
4    X     aXaaX XaaX aaX aX X              4     5      11
5    a     aXaaXa XaaXa aaXa aXa Xa         4     5      16
Python code:
def subcnt(s, c):
    last = -1
    cnt = 0
    for i in range(len(s)):
        if s[i] == c:
            last = i
        cnt += last + 1
    return cnt

print(subcnt('abcdba', 'b'))
You could turn this around and scan your string for occurrences of your letter. Every time you find an occurrence in some position i, you know that it is contained by definition in all the substrings that contain it (i.e. all substrings which start before or at i and end at or after i), so you only need to store pairs of indices to define substrings instead of storing substrings explicitly.
That being said, you'll still need O(n²) with this approach because although you don't mind repeated substrings as your example shows, you don't want to count the same substring twice, so you still have to make sure that you don't select the same pair of indices twice.
Let's consider the string as abcdaefgabb and the given character as a.
Loop over the string char by char.
If a character matches a given character, let's say a at index 4, so number of substrings which will contain a is from abcda to aefgabb. So, we add (4-0 + 1) + (10 - 4) = 11. These represent substrings as abcda,bcda,cda,da,a,ae,aef,aefg,aefga,aefgab and aefgabb.
This applies to wherever you find a, like you find it at index 0 and also at index 8.
Final answer is the sum of above mentioned math operations.
Update: You will have to maintain 2 pointers between the last occurrence of a and the current a to avoid counting duplicate substrings which start and end at the same indices.
Think of a substring as selecting two elements from the gaps between the letters in your string and including everything between them (where there are gaps on the extreme ends of the string).
For a string of length n, there are choose(n+1,2) substrings.
Of those, for each run of k characters that doesn't include the target, there are choose(k+1,2) substrings that only include letters from that substring. All other substrings of the main string must include the target.
Answer: choose(n+1,2) - sum(choose(k_i+1,2)), where the k_i are the lengths of runs of letters that don't include the target.
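The formula above can be sketched in Python as follows. The function name count_with_char is made up for illustration, and choose(k+1,2) is written out as k*(k+1)//2.

```python
def count_with_char(s, c):
    """Count substrings of s (by position) that contain the character c."""
    n = len(s)
    total = n * (n + 1) // 2  # choose(n+1, 2): all substrings of s
    run = 0                   # length of the current run of characters != c
    for ch in s:
        if ch == c:
            total -= run * (run + 1) // 2  # substrings lying inside that run
            run = 0
        else:
            run += 1
    total -= run * (run + 1) // 2  # the final run, if the string does not end in c
    return total


print(count_with_char("abb", "b"))     # 5
print(count_with_char("abcdba", "b"))  # 16
```

This is a single pass with constant extra memory, and it agrees with the running-sum approach on the examples given earlier.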
Suppose we have a sequence of x numbers and x-1 operators (+ or -), where the order of the numbers and the operators is fixed. For example 5-2-1+3. Different placements of parentheses give different values. For example (5 - 2)-1+3 = 5, 5-(2-1)+3=7 and so on. I am interested in the maximum value, ideally computed in linear time and memory.
I think that this problem can be solved with dynamic programming, but I can't find a formulation that works.
What you need here is certainly a dynamic programming algorithm.
It would work recursively, finding the maximum value that can be obtained for every range.
Algorithm:
You could separate the numbers and the operators into different lists (if the first number is positive add + to the list first).
max_sum(expression, operators):
    if len(expression) == 1: return expression[0]
    max_value = -float('inf') # minus infinity
    length = len(expression)
    for i in range(1, length): # start at 1 so that the left part is never empty
        left_exp = max_sum(expression[0:i], operators[0:i])
        right_exp = max_sum(expression[i:length], operators[i:length])
        value = operators[i].apply(left_exp, right_exp)
        if value >= max_value:
            max_value = value
    return max_value
The main idea of the algorithm is that it checks the maximum sums in every possible range division, goes all the way down recursively and then returns the maximum sum it got.
The pseudo-code doesn't take into account the case where you get a maximum value by subtracting the minimum value of the right expression, but with a few tweaks I think you could fix it pretty fast.
I tried to make the pseudo-code as easy to convert to real code as possible off the top of my head; I hope this helps you.
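One possible way to apply the tweak mentioned above is to track both the minimum and the maximum of every range, so that a subtraction can use the minimum of its right-hand side. The sketch below is in Python with memoization; the function name max_value and the input layout (a list of numbers plus a list of the operators between them) are assumptions for illustration. It runs in cubic time, not the linear time asked for in the question.

```python
from functools import lru_cache


def max_value(nums, ops):
    """Largest value of nums[0] ops[0] nums[1] ... over all parenthesizations.

    nums: list of numbers; ops: list of '+' / '-' with len(ops) == len(nums) - 1.
    """

    @lru_cache(maxsize=None)
    def extremes(lo, hi):
        """(min, max) obtainable from nums[lo..hi], both ends inclusive."""
        if lo == hi:
            return nums[lo], nums[lo]
        best_min, best_max = float('inf'), float('-inf')
        for k in range(lo, hi):  # ops[k] is the operator applied last
            lmin, lmax = extremes(lo, k)
            rmin, rmax = extremes(k + 1, hi)
            if ops[k] == '+':
                cand_min, cand_max = lmin + rmin, lmax + rmax
            else:
                # Subtracting the largest right value gives the minimum,
                # subtracting the smallest right value gives the maximum.
                cand_min, cand_max = lmin - rmax, lmax - rmin
            best_min = min(best_min, cand_min)
            best_max = max(best_max, cand_max)
        return best_min, best_max

    return extremes(0, len(nums) - 1)[1]


print(max_value([5, 2, 1, 3], ['-', '-', '+']))  # 7
```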
Let an expression be a sequence of operator-number pairs: it starts with an operator followed by a number, and ends with an operator followed by a number. Your example 5-2-1+3 can be made into an expression by placing a + at the beginning: +5-2-1+3.
Let the head of an expression be its first operator-number pair, and its tail, the rest. The head of +5-2-1+3 is +5 and the tail, -2-1+3.
In this context, let parenthesizing an expression mean placing an opening parenthesis just after the first operator and a closing parenthesis at the end of the expression, like so: +(5-2-1+3). Parenthesizing an expression with a positive head doesn't do anything. Parenthesizing an expression with a negative head is equivalent to changing every sign of its tail: -(5 -2-1+3) = -5 +2+1-3.
If you want to get an extremum by parenthesizing some of its subexpressions, then you can first make some simplifications. It's easy to see that any subexpression of the form +x1+x2+...+xn won't be split: all of its elements will be used together towards the extremum. Similarly, any subexpression of the form -x1-x2-...-xn won't be split, but may be parenthesized (-(x1-x2-...-xn)). Therefore, you can first simplify any subexpression of the first form into +X, where X is the sum of its elements, and any subexpression of the second form into -x1-X, where X is the sum of its tail elements.
The resulting expression cannot have 3 consecutive - operators or 2 consecutive + operators. Now, start from the end, find the first subexpression of the form -a-b, -a+b-c, or -a+b, and compute its potential minimum and its potential maximum:
min(-a-b) = -a-b
max(-a-b) = -(a-b)
min(-a+b-c) = -(a+b)-c
max(-a+b-c) = -a+b-c if b>=c, max(-a+b-c) = -(a+b-c) if b<=c
min(-a+b) = -(a+b)
max(-a+b) = -a+b
Repeat by treating that subexpression as a single operator-number pair in the next one, albeit with two possible values (its two extrema). This way, the extrema of each subsequent subexpression are computed until you get to the main expression, of which you can simply compute the maximum. Note that the main expression may have a positive first pair, which makes it a special case, but that's easy to take into account: just add it to the maximum.
The expression consists of numbers (0-9) separated by one of the two operators '*' and '+'. There are no spaces between the characters.
Example: 1+2*3+4*5
We need to find out the maximum and minimum value we can get by using brackets at appropriate places.
Maximum value:105 = (1+2)*(3+4)*5
Minimum value: 27 = 1+2*3+4*5
I am looking for a recursive way to do it? Any ideas would be appreciated.
Minimization:
The main idea of the solution: instead of thinking how to add parentheses, let's think about which operation was the last one. Let's write a recursive function minimize(expr). What should it do? If it is given one number, it should just return it. Otherwise, we can iterate over all operators in it, call minimize for the part expression to the left and to the right of the operator and combine the result. Now we just need to pick the smallest value.
Here is some pseudo code:
int minimize(string expr)
    if isNumber(expr) then // If it is one number, return it.
        return value(expr)
    int res = infinity
    for int i <- 0 .. length expr - 1
        if expr[i] == '+' then
            res = min(res, minimize(expr[0 .. i - 1]) +
                           minimize(expr[i + 1 .. length expr - 1]))
        if expr[i] == '*' then
            res = min(res, minimize(expr[0 .. i - 1]) *
                           minimize(expr[i + 1 .. length expr - 1]))
    return res
Maximization:
Pretty much the same, but we should take maximum instead of minimum at each step.
Why is it correct? When we multiply and add non-negative numbers, the larger (or smaller) the operands are, the larger (or smaller) the result is.
We can also use memoization to avoid recomputing the result for the same expression more than once, which gives polynomial time complexity.
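A memoized version of the recursion above could look like the following Python sketch. It assumes, as in the question, that every number is a single digit, so digits sit at even positions of the string and operators at odd positions; the function name min_and_max is made up for illustration.

```python
from functools import lru_cache


def min_and_max(expr):
    """Return (minimum, maximum) over all ways to parenthesize expr.

    Assumes single-digit non-negative numbers separated by '+' or '*'.
    """
    nums = [int(ch) for ch in expr[0::2]]  # digits at even positions
    ops = expr[1::2]                       # operators at odd positions

    @lru_cache(maxsize=None)
    def solve(lo, hi):
        """(min, max) for the sub-expression over nums[lo..hi], inclusive."""
        if lo == hi:
            return nums[lo], nums[lo]
        best_min, best_max = float('inf'), float('-inf')
        for k in range(lo, hi):  # ops[k] is the last operation performed
            lmin, lmax = solve(lo, k)
            rmin, rmax = solve(k + 1, hi)
            if ops[k] == '+':
                cand_min, cand_max = lmin + rmin, lmax + rmax
            else:
                # Valid because all operands are non-negative.
                cand_min, cand_max = lmin * rmin, lmax * rmax
            best_min = min(best_min, cand_min)
            best_max = max(best_max, cand_max)
        return best_min, best_max

    return solve(0, len(nums) - 1)


print(min_and_max("1+2*3+4*5"))  # (27, 105)
```

Indexing the cache by the pair (lo, hi) rather than by substring keeps the number of distinct states quadratic in the number of operands.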
Is there a way to do arbitrary-size, arbitrary-base conversions in constant working space? That is, to convert a sequence of n numbers in the range [1,m] to a sequence of ceiling(n*log(m)/log(p)) numbers in the range [1,p] using a 1-to-1 mapping that (preferably but not necessarily) preserves lexicographical order and gives sequential results?
I'm particularly interested in solutions that are viable as a pipe function, i.e. that are able to handle a larger dataset than can be stored in RAM.
I have found a number of solutions that require "working space" proportional to the size of the input but none yet that can get away with constant "working space".
Does dropping the sequential constraint make any difference? That is: allow lexicographically sequential inputs to result in non lexicographically sequential outputs:
F(1,2,6,4,3,7,8) -> (5,6,3,2,1,3,5,2,4,3)
F(1,2,6,4,3,7,9) -> (5,6,3,2,1,3,5,2,4,5)
some thoughts:
might this work?
streamBasen -> convert(n, lcm(n,p)) -> convert(lcm(n,p), p) -> streamBasep
(where lcm is least common multiple)
I don't think it's possible in the general case. If m is a power of p (or vice versa), or if they're both powers of a common base, you can do it, since each group of log_m(p) digits is then independent. However, in the general case, suppose you're converting the number a_1 a_2 a_3 ... a_n. The equivalent number in base p is
sum(a_i * m^(i-1) for i in 1..n)
If we've processed the first i digits, then we have the ith partial sum. To compute the (i+1)th partial sum, we need to add a_(i+1) * m^i. In the general case, this number is going to have non-zero digits in most places, so we'll need to modify all of the digits we've processed so far. In other words, we'll have to process all of the input digits before we'll know what the final output digits will be.
In the special case where m and p are both powers of a common base, or equivalently if log_m(p) is a rational number, then m^i will only have a few non-zero digits in base p near the front, so we can safely output most of the digits we've computed so far.
I think there is a way of doing radix conversion in a stream-oriented fashion in lexicographic order. However, what I've come up with isn't sufficient for actually doing it, and it has a couple of assumptions:
The length of the positional numbers are already known.
The numbers described are integers. I've not considered what happens with the maths and negative indices.
We have a sequence of values a of length p, where each value is in the range [0,m-1]. We want a sequence of values b of length q in the range [0,n-1]. We can work out the kth digit of our output sequence b from a as follows:
b_k = floor[ sum(a_i * m^i for i in 0 to p-1) / n^k ] mod n
Let's rearrange that sum into two parts, splitting it at an arbitrary point z
b_k = floor[ ( sum(a_i * m^i for i in z to p-1) + sum(a_i * m^i for i in 0 to z-1) ) / n^k ] mod n
Suppose that we don't yet know the values of a between [0,z-1] and can't compute the second sum term. We're left with having to deal with ranges. But that still gives us information about b_k.
The minimum value b_k can be is:
b_k >= floor[ sum(a_i * m^i for i in z to p-1) / n^k ] mod n
and the maximum value b_k can be is:
b_k <= floor[ ( sum(a_i * m^i for i in z to p-1) + m^z - 1 ) / n^k ] mod n
We should be able to do a process like this:
1. Initialise z to be p. We will count down from p as we receive each character of a.
2. Initialise k to the index of the most significant value in b. If my brain is still working, ceil[ log_n(m^p) ].
3. Read a value of a. Decrement z.
4. Compute the min and max value for b_k.
5. If the min and max are the same, output b_k, and decrement k. Goto 4. (It may be possible that we already have enough values for several consecutive values of b_k)
6. If z!=0 then we expect more values of a. Goto 3.
7. Hopefully, at this point we're done.
I've not yet considered how to compute the range values efficiently, but I'm reasonably confident that computing the sum from the incoming characters of a can be done much more cheaply than storing all of a. Without doing the maths, though, I won't make any hard claims about it!
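To make the process above concrete, here is a small Python sketch under stated assumptions: the input digits arrive most significant first, their count is known up front, and Python's arbitrary-precision integers hold the running sum. That last point means this is not constant working space; it only illustrates when an output digit becomes known. The function name stream_convert is made up, and a digit is emitted once the whole quotient floor(value / n^k) agrees at both ends of the range, which is a slightly stricter test than comparing the values after mod n.

```python
def stream_convert(digits, m, n):
    """Convert base-m digits (most significant first) to base-n digits.

    An output digit is emitted as soon as the input read so far fixes it.
    """
    p = len(digits)
    # Number of output digits: smallest q with n**q >= m**p.
    q = 1
    while n ** q < m ** p:
        q += 1
    out = []
    k = q - 1   # index of the next output digit to determine
    high = 0    # sum of a_i * m**i over the digits read so far
    z = p
    for d in digits:
        z -= 1
        high += d * m ** z
        lo, hi = high, high + m ** z - 1  # range of values still possible
        # b_k is known once both ends of the range agree on floor(value / n**k).
        while k >= 0 and lo // n ** k == hi // n ** k:
            out.append((lo // n ** k) % n)
            k -= 1
    return out


print(stream_convert([1, 2, 3], 10, 7))  # [0, 2, 3, 4]
```

Once the last digit has been read, lo and hi coincide, so every remaining output digit is emitted.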
Yes, it is possible
For every I character(s) you read in, you will write out O character(s)
based on Ceiling(Length * log(In) / log(Out)).
Allocate enough space
Set x to 1
Loop over digits from end to beginning # Horner's method
    Set a to x * digit
    Set t to O - 1
    Loop while a > 0 and t >= 0
        Set a to a + out digit at position t
        Set out digit at position t to a mod to base
        Set a to a / to base
        Set t to t - 1
    Set x to x * from base
Return converted digit(s)
Thus, for base 16 to 2 (which is easy), using "192FE" we read '1' and convert it, then repeat on '9', then '2' and so on giving us '0001', '1001', '0010', '1111', and '1110'.
Note that for bases that are not powers of a common base, such as base 17 to base 2, this would mean reading 1 character and writing 5.
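A rough Python rendering of the pseudo-code above, for one block of input digits. The function name convert_block is made up; the output length is computed with integer arithmetic rather than logarithms to avoid floating-point rounding; and the step that moves to the next output position after each carry is an assumption about what the pseudo-code intends.

```python
def convert_block(digits, from_base, to_base):
    """Convert a block of digits (most significant first) between bases."""
    # Smallest length O such that to_base**O >= from_base**len(digits).
    out_len = 1
    while to_base ** out_len < from_base ** len(digits):
        out_len += 1
    out = [0] * out_len
    x = 1  # weight of the current input digit: from_base**position
    for digit in reversed(digits):
        a = x * digit
        t = out_len - 1
        # Add a into the output digits, propagating the carry leftwards.
        while a > 0 and t >= 0:
            a += out[t]
            out[t] = a % to_base
            a //= to_base
            t -= 1
        x *= from_base
    return out


print(convert_block([9], 16, 2))  # [1, 0, 0, 1]
```

For the base 16 to base 2 example, calling this once per hex digit reproduces the groups '0001', '1001', '0010', '1111' and '1110'.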